Difference between revisions of "Emacs"

 
(141 intermediate revisions by 6 users not shown)
Line 1: Line 1:
  +
{{TOCD}}
'''Emacs''' stuff:
 
== Quickstart for non-emacs users ==
 
If you just want to get it set up for dix editing with the minimum of hassle, here is a howto. Assuming you have emacs installed, first execute (paste) the following commands in your terminal:
 
   
  +
Info on using '''Emacs''' for Apertium-related tasks.
   
  +
==Quickstart==
mkdir ~/.elisp
 
cd ~/.elisp
 
wget http://www.thaiopensource.com/download/nxml-mode-20030901.tar.gz
 
tar xzvf nxml-mode-20030901.tar.gz
 
rm -f nxml-mode-20030901.tar.gz
 
wget http://apertium.svn.sourceforge.net/viewvc/apertium/trunk/apertium-tools/dix.el
 
cd ..
 
touch ~/.emacs
 
   
  +
There is an init file in git that will give your Emacs some useful Apertium-related packages and settings, including:
Then open the file ~/.emacs in an editor (like vi) and enter the following:
 
  +
* dix-mode, for XML dictionary and transfer editing
  +
* cg-mode, for Constraint Grammar rule editing and testing
  +
* hfst-mode, for lexc/twol syntax highlighting
  +
* tab-completion
   
  +
To get that set up, simply
 
<pre>
 
<pre>
  +
mkdir -p ~/.emacs.d/
; Start of DIX mode setup
 
  +
curl https://raw.githubusercontent.com/unhammer/dix/master/init-apertium.el > ~/.emacs.d/init-apertium.el
(add-to-list 'load-path "~/.elisp") ; path to the folder where you have dix.el
 
  +
echo '(load "~/.emacs.d/init-apertium.el")' >> ~/.emacs.d/init.el
(load "~/.elisp/nxml-mode-20030901/rng-auto.el") ; full path to the _file_ rng-auto.el which you just extracted
 
  +
emacs
(autoload 'dix-mode "dix"
 
  +
</pre>
"dix-mode is a minor mode for editing Apertium XML dictionary files." t)
 
  +
The last line starts up Emacs, which will download the new packages since it's the first startup. (The next startups will be much faster.)
(add-hook 'nxml-mode-hook
 
(lambda () (and buffer-file-name
 
(string-match "\\.dix$" buffer-file-name)
 
(dix-mode 1))))
 
(add-to-list 'auto-mode-alist '("\\.dix\\'" . nxml-mode))
 
   
  +
If you ever want to update your installed Emacs packages, you do <code>M-x list-packages</code>, then <code>U x</code>.
   
; Start of CUA mode setup - to make Emacs behave like other editors - see http://www.emacswiki.org/CuaMode
 
(cua-mode t)
 
(setq cua-auto-tabify-rectangles nil) ;; Don't tabify after rectangle commands
 
(transient-mark-mode 1) ;; No region when it is not highlighted
 
(setq cua-keep-region-after-copy t) ;; Standard Windows behaviour
 
</pre>
 
   
  +
The rest of this page gives some documentation of the various modes.
   
  +
===Mac OS X===
  +
If you're on Mac, the built-in emacs is ancient. Don't use that. Instead, get https://emacsformacosx.com/
   
  +
You can make an alias to start this emacs from the command line with e.g.
=== On Debian/Ubuntu ===
 
  +
<pre>alias em="open -a /Applications/Emacs.app"</pre>
   
  +
(If you prefer having non-GUI emacs, use <code>alias em="/Applications/Emacs.app/Contents/MacOS/Emacs -nw"</code> instead, since <code>open -a</code> does not pass <code>-nw</code> on to Emacs.)
sudo apt-get install nxml-mode
 
mkdir ~/.elisp
 
cd ~/.elisp
 
wget http://apertium.svn.sourceforge.net/viewvc/apertium/trunk/apertium-tools/dix.el
 
touch ~/.emacs
 
cd ..
 
   
  +
===Validation slow?===
Then open the file ~/.emacs in an editor (like vi) and enter the following:
 
   
  +
The above init-apertium.el turns on on-the-fly XML validation, which can be slow on old computers. If editing large .dix files seems too slow, try turning off one or both of the validators by putting
<pre>
 
(add-to-list 'load-path "~/.elisp") ; path to the folder where you have dix.el
 
(autoload 'dix-mode "dix"
 
"dix-mode is a minor mode for editing Apertium XML dictionary files." t)
 
(add-hook 'nxml-mode-hook
 
(lambda () (and buffer-file-name
 
(string-match "\\.dix$" buffer-file-name)
 
(dix-mode 1))))
 
(add-to-list 'auto-mode-alist '("\\.dix\\'" . nxml-mode))
 
   
  +
<pre>
(cua-mode t)
 
  +
(add-hook 'nxml-mode-hook (lambda () (rng-validate-mode 0)) 'append)
(setq cua-auto-tabify-rectangles nil) ;; Don't tabify after rectangle commands
 
(transient-mark-mode 1) ;; No region when it is not highlighted
 
(setq cua-keep-region-after-copy t) ;; Standard Windows behaviour
 
 
</pre>
 
</pre>
  +
in your ~/.emacs.d/init.el
  +
  +
== XML (dix, transfer, …) editing ==
   
== nxml-mode ==
+
=== nxml-mode ===
Emacs has a nice xml editing mode called
+
Emacs has a nice xml editing mode (included as of version 23) called
 
[http://www.emacswiki.org/emacs/NxmlMode nXML], with syntax highlighting, movement commands to navigate through the XML (out of, into, across elements, etc.). It also has validation, and can auto-complete using the XML schema if a schema file is available.
 
   
  +
[[#Quickstart|init-apertium.el]] turns on nxml-mode for the common Apertium XML file extensions.
Note: since the dix-files can often get rather huge, syntax highlighting can make nXML a bit slow (at least if you're eg. planning on running a keyboard macro 10000 times). To speed it up, just temporarily turn off syntax highlighting by typing <code>M-x set-variable RET nxml-syntax-highlight-flag RET nil RET</code>. Alternatively, use the dix.el function <code>C-c H</code> (<code>dix-toggle-syntax-highlighting</code>).
 
   
  +
==== keybindings ====
If your emacs doesn't turn on nxml-mode automatically when you open an xml-file, you can add the following line to your <code>~/.emacs</code> file:
 
  +
init-apertium.el also turns on the variable <code>nxml-sexp-element-flag</code>, which lets you use the following handy keys:
  +
* '''C-M-f''' to move forward one element (e.g. from &lt;e&gt; to &lt;/e&gt;)
  +
* '''C-M-b''' to move backward one element (e.g. from &lt;/e&gt; to &lt;e&gt;)
  +
* '''C-M-d''' to move into one element (e.g. from &lt;e&gt; to &lt;p&gt;)
  +
* '''M-S-d''' (meta-shift-d) to move into one element backwards (e.g. from after &lt;/e&gt; to after &lt;/p&gt;)
  +
* '''C-M-u''' to move out of one element (e.g. from &lt;p&gt; to &lt;e&gt;)
  +
* '''C-M-k''' to kill (cut) one element
  +
and <code>nxml-slash-auto-complete-flag</code> which lets you type
  +
* '''&lt;/''' to write the end tag of whatever element you're in (e.g. after typing &lt;e&gt;&lt;p&gt;…&lt;/p&gt;&lt;/, it'll complete with e&gt;)
   
  +
=== dix-mode ===
  +
[[Image:dix.el.png|thumb|300px|right|Screenshot of an older version of dix.el in Aquamacs (fullscreen). Upper left window has output from dix-view-pardef, lower left shows rng schema completion. There is a red underline since a <code>p</code> can't be an empty element, as noted by the message in the minibuffer]]
  +
  +
[https://github.com/unhammer/dix dix.el] is a minor mode under nxml-mode which gives some handy Apertium-related functions for XML editing. It is installed and turned on for the relevant file extensions by [[#Quickstart|init-apertium.el]].
  +
  +
There are some [http://www.youtube.com/playlist?list=PL99D23BDD6C7756E9 short screencasts here] showing off some usage.
  +
  +
I use [[Format dictionaries|Apertium-dixtools]]-formatted dix with one line per &lt;e&gt;. Not all functions have been tested in more verbose formats, but I've tried to make the functions use XML-movements, so they should mostly work no matter how you format your files.
  +
  +
When you open emacs (after the [[#Quickstart|Quickstart]] setup) and load a .dix-file, you should see a menu named ''dix''. Most of the functions added by dix-mode are shown in this menu (which also shows their keyboard shortcuts). Hovering over a menu-item might give a little popup-help. The ''Help for dix-mode'' entry will show all the user functions defined by dix-mode. The keyboard shortcuts are in general a lot more useful than the menu bar, which is mostly there in case you forget which buttons to press... ''Remember: C is Control, S is Shift, M is Alt'' (well, M stands for Meta, but that's typically Alt).
  +
  +
  +
Some useful functions in dix-mode:
  +
  +
* Movement and editing:
  +
** The '''space bar''' inserts a &lt;b/&gt; in &lt;r&gt;, &lt;l&gt; or &lt;i&gt; elements; a <code>_</code> in par/pardef names; otherwise a plain space. This [https://asciinema.org/a/26142 works with the . (repeat) command] as well, if you use the vim keybindings.
  +
** '''M-n''' and '''M-p''' move to the next and previous "important bits" of &lt;e&gt;-elements (just try it!).
  +
  +
* Copying elements and adding restrictions:
  +
** '''C-c C''' just creates a copy of the current &lt;e&gt; element, putting it below the current one
  +
** '''C-c L''' and '''C-c R''' also make a copy of the current &lt;e&gt; element, but with an LR or RL restriction
  +
** '''C-TAB''' cycles between the restriction possibilities LR, RL or none for the current &lt;e&gt; element
  +
** '''C-S-TAB''', used with elements that have the slr/srl attribute, will swap the sense translation of this &lt;e&gt; with the &lt;e&gt; above
  +
  +
* Creating elements from plain text:
  +
** '''C-c g''' in a monodix guesses the pardef for a word based on the suffix. Write a word at the bottom of a dix file, place point somewhere in the middle of the word, and hit C-c g; it will try to find words earlier in the file that have the same ending (the characters after point).
  +
*** [http://www.youtube.com/watch?v=OrmSahK_5Gk&hd=1&list=SP99D23BDD6C7756E9 screencast]
  +
** '''C-c x''' in a monodix or bidix turns a word-list into &lt;e&gt; entries using the above &lt;e&gt; entry as a template. Words should be written one per line. You can use it in a bidix by writing the left-side, then a colon (:) then the right-side. ''Assumes that the entry used as a template is written all on one line.''
  +
*** [http://www.youtube.com/watch?v=OPaFn8mBDfg&hd=1&list=SP99D23BDD6C7756E9 screencast]
  +
  +
* Pardef viewing and manipulation:
  +
** '''C-c G''' will go to the pardef of the nearest &lt;par&gt;
  +
*** the place you left is saved in the standard emacs fashion, so you can go back by pressing '''C-u C-SPACE'''
  +
** '''C-c V''' will show the pardef of the nearest &lt;par&gt; in another window
  +
** '''C-c S''' will sort a pardef by its right-hand-side, &lt;r&gt;.
  +
*** You can also do '''M-x dix-sort-e-by-l''' to sort the selected &lt;e&gt; elements by the contents of their &lt;l&gt; element
  +
** '''C-c D''' (in a pardef or an &lt;e&gt;) will print a list of all pardefs which have the same suffixes as this one (where a 'suffix' is the contents of an &lt;l&gt;-element), useful for finding duplicates. Note: it ignores the tags
  +
** Inside a pardef, '''C-c A''' shows all usages of that pardef within the dictionaries represented by the variable `dix-dixfiles'
  +
  +
  +
  +
''Note: capital letters mean you have to press shift.'' If you fancy other keyboard shortcuts, copy the relevant <code>define-key</code> entries from the bottom of <code>dix.el</code> and put them in your ~/.emacs, e.g. to add '''F12''' as an alternative to '''C-c V''':
 
<pre>
 
<pre>
(add-to-list 'auto-mode-alist '("\\.dix\\'" . nxml-mode))
+
(add-hook 'dix-mode-hook (lambda nil (define-key dix-mode-map (kbd "<f12>") 'dix-view-pardef)))
 
</pre>
 
</pre>
  +
(the whole add-hook thing is needed since dix-mode is not loaded until the first .dix-file is loaded)
   
If your emacs doesn't even come with nxml-mode, download nxml-mode-20030901.tar.gz (or whatever the newest version is) from http://www.thaiopensource.com/download/, extract somewhere, and add the following to your <code>.emacs</code> file:
 
   
  +
Also, if you like having all &lt;i&gt; elements aligned at eg. column 25, select a region and do '''M-x align''' to achieve that (this also aligns &lt;p&gt; to 10 and &lt;r&gt; to 44, for bidix). These numbers are customizable with '''M-x customize-group RET dix'''. (Ie. there's no extra indentation function, but then, nxml already has that.)
  +
  +
==== dix-mode for transfer rules ====
  +
[[Image:Dix-mode_transfer_rule_number.jpg|thumb|300px|right|Useful in transfer mode too!]]
  +
There are some transfer-specific functions in dix-mode that make it worth turning on in transfer mode files too, e.g. '''C-c n''', which lets you enter a rule number to go to (useful when tracing with <code>apertium-transfer -t</code>). The [[#Quickstart|init-apertium.el]] from the Quickstart section will turn on nxml-mode and dix-mode in transfer files (ie. all files with the suffix .t1x, .t2x, .t3x, etc.).
  +
  +
'''M-n''' and '''M-p''' (go to next/previous useful position) should also Do What You Mean in transfer files.
  +
  +
  +
<br style="clear:both" />
  +
  +
=== Validation (Relax NG-schemas) ===
  +
  +
nxml-mode uses Compact Relax NG schemas (<code>.rnc</code> files) for validation (without these, XML is only checked for well-formedness by nxml-mode).
  +
  +
dix.el should find the .rnc's installed by lttoolbox/apertium if you've installed with packages (or even with "sudo make install" to /usr/local). If not, you'll have to copy the schemas.xml included with https://github.com/unhammer/dix into some folder, editing the paths to .rnc's in there, and put <code>(add-to-list 'rng-schema-locating-files "/path/to/your/schemas.xml")</code> in your ~/.emacs.d/init.el.
  +
  +
You can toggle validation using the XML menu at the top of the screen, or the keyboard shortcut <code>C-c C-v</code>.
  +
  +
It can be a bit slow with big files; a
  +
<pre>(add-hook 'nxml-mode-hook (lambda () (rng-validate-mode 0)) 'append)</pre>
  +
will turn it off by default (or just do <code>C-c C-v</code> to turn it off once).
  +
  +
Validation can also provide "intelligent" tab completion of elements and attributes (see the function <code>nxml-complete</code>).
  +
  +
=== Linting with flycheck ===
  +
The package https://github.com/unhammer/flycheck-apertium/ (installed by [[#Quickstart|init-apertium.el]]) gives on-the-fly linting of dix files and transfer files.
  +
  +
For dix files, it assumes /usr/share/lttoolbox/dix.xsd exists (ie. you've installed lttoolbox from packages).
  +
  +
For transfer files, it assumes you've got https://github.com/ggm/vm-for-transfer-cpp compiled and installed to your $PATH; this gives some extra info on transfer errors. If you've got the binary somewhere outside your $PATH, set it like this:
 
<pre>
 
<pre>
  +
(setq flycheck-apertium-transfervm-executable "/home/me/src/vm-for-transfer-cpp/apertium-compile-transfer")
(load "/path/to/nxml-mode-20030901/rng-auto.el") ; full path to the _file_ rng-auto.el which you just extracted
 
 
</pre>
 
</pre>
  +
Note that the line numbers given by transfervm are at the end of the matching rule, not always at the exact line where the error occurred. But it's better than segfaults.
  +
  +
If any of the above files don't exist, the checker will just silently not run.
   
== dix-mode ==
+
=== Yasnippet ===
  +
[https://github.com/capitaomorte/yasnippet/ Yasnippet] is a snippet-expansion package for Emacs. It lets you write boilerplate faster.
[[Image:dix.el.png|thumb|300px|right|Screenshot of dix.el in Aquamacs (fullscreen). Upper left window has output from dix-view-pardef, lower left shows rng schema completion. There is a red underline since a <code>p</code> can't be an empty element, as noted by the message in the minibuffer]]
 
  +
This section shows how to use the snippets made for dix-mode. There's a short screencast of it at https://asciinema.org/a/11192
   
  +
To use, install yasnippet by doing <code>M-x package-refresh-contents</code> and <code>M-x package-install RET yasnippet RET</code> (assuming you've added [http://melpa.milkbox.net/#/getting-started melpa] to your package-archives; this happens automatically when you add [[#Quickstart|init-apertium.el]]).
In svn there is a minor mode for editing .dix files, [https://apertium.svn.sourceforge.net/svnroot/apertium/trunk/apertium-tools/dix.el dix.el] (or use <code>svn co https://apertium.svn.sourceforge.net/svnroot/apertium/trunk/apertium-tools</code>). It needs nxml-mode (see above).
 
   
  +
Then put this into ~/.emacs.d/init.el to make the snippets available in dix-mode:
Put the following in your <code>~/.emacs</code> file to use it:
 
 
<pre>
 
<pre>
  +
(eval-after-load 'yasnippet
(add-to-list 'load-path "/path/to/dix.el-folder") ; ie. path to the _folder_ containing dix.el
 
  +
'(progn
(autoload 'dix-mode "dix"
 
  +
(setq yas-verbosity 1)
"dix-mode is a minor mode for editing Apertium XML dictionary files." t)
 
  +
(yas-reload-all)
(add-hook 'nxml-mode-hook
 
  +
(remhash 'nxml-mode yas--tables) ; until https://github.com/AndreaCrotti/yasnippet-snippets/issues/41 is solved
(lambda () (and buffer-file-name
 
  +
(add-to-list 'yas-key-syntaxes 'dix-yas-skip-backwards-to-key)
(string-match "\\.dix$" buffer-file-name)
 
  +
; The default is to use a point-and-click menu when there are several choices, I prefer ido:
(dix-mode 1))))
 
  +
(setq yas-prompt-functions '(yas-ido-prompt yas-completing-prompt yas-dropdown-prompt yas-no-prompt))
  +
))
  +
(add-hook 'dix-mode-hook 'yas-minor-mode)
 
</pre>
 
</pre>
   
  +
== C++ ==
I use [[Format dictionaries|Apertium-dixtools]]-formatted dix, not all functions have been tested in the regular format.
 
  +
See [[Emacs C style for Apertium hacking]].
   
  +
== HFST ==
''Note: there's now a menu-bar, if you forget the keyboard shortcuts :-)''
 
  +
[[Image:Hfst-mode.png|thumb|300px|right|Screenshot of syntax highlighting in hfst-mode.el]]
   
  +
* [https://github.com/unhammer/hfst-mode HFST-mode] is for editing the lexc and twol files used with [[HFST]]
The minor mode adds keyboard shortcuts <code>C-c L</code> and <code>C-c R</code> which make LR or RL restricted copies of &lt;e&gt;'s (use <code>C-TAB</code> to cycle between restriction possibilities LR, RL or none, <code>C-c C</code> creates a copy without modifying restrictions), <code>C-c G</code> which finds the pardef of a dictionary entry (and lets you go back with <code>C-u C-SPC</code>) and <code>C-c S</code> which sorts a pardef by its right-hand-side &lt;r&gt;. <code>M-n</code> and <code>M-p</code> move to the next and previous "important bits" of &lt;e&gt;-elements (just try it!). Inside a pardef, <code>C-c A</code> shows all usages of that pardef within the dictionaries represented by the variable `dix-dixfiles', while <code>C-c D</code> gives you a list of all pardefs which use these suffixes (where a suffix is the contents of an &lt;l&gt;-element). The space bar inserts a &lt;b/&gt; in &lt;r&gt;, &lt;l&gt; or &lt;i&gt; elements (o/w a regular space).
 
  +
* [http://www.cis.uni-muenchen.de/~wastl/emacs/sfst.el SFST-mode] is for editing [[SFST]] files
   
  +
HFST-mode provides go-to-lexicon on M-. (and back with M-,), and will syntax highlight occurrences of Multichar_Symbols in lexicons, so you can easily tell if you've mistyped or forgotten to define a symbol.
Also, if you like having all &lt;i&gt; elements aligned at eg. column 25, the minor mode lets you do <code>M-x align</code> on a region to achieve that, and also aligns &lt;p&gt; to 10 and &lt;r&gt; to 44 (for bidix). These numbers are customizable with <code>M-x customize-group RET dix</code>. (Ie. there's no extra indentation function, but then nxml already has that.)
 
   
== Validation (Relax NG-schemas) ==
 
nxml-mode uses compact Relax NG schemas for validation (without these, XML is only checked for well-formedness by nxml-mode).
 
   
  +
<br style="clear:both" />
(There is a non-compact [http://www.student.uib.no/~kun041/doc/dix.rng dix.rng here], while transfer.rng and modes.rng are in trunk/apertium/apertium.)
 
   
  +
== CG ==
You can make compact Relax NG schemas (<code>.rnc</code>) using [http://www.thaiopensource.com/relaxng/trang.html trang]. Use a script like this to keep all your rnc's up-to-date:
 
  +
[[Image:cg.el.cg-output.screenshot.png|thumb|300px|right|Screenshot after running the grammar ('''C-c C-c''') on an rlx file in cg.el]]
   
  +
There is a [https://github.com/GrammarSoft/cg3/blob/main/emacs/cg.el CG-mode for emacs] in the vislcg3 repository (see [[Constraint Grammar]]). It's installed by [[#Quickstart|init-apertium.el]].
cd /path/to/trunk/apertium/apertium
 
for DTD in `ls *.dtd`; do
 
OUT=`echo $DTD | sed 's/dtd$/rnc/'`;
 
CMD="java -jar /path/to/trang.jar $DTD $OUT"
 
echo $CMD
 
eval $CMD
 
done
 
   
Note: if you want to auto-complete using the schema (keyboard shortcut: C-RET), you should have <code>(add-to-list 'nxml-completion-hook 'rng-complete)</code> somewhere in your <code>~/.emacs</code>.
 
   
  +
You can use '''C-;''' (alternative keybinding '''M-#''') to quickly comment/uncomment a rule ([https://asciinema.org/a/25236 quick demo]). '''C-M-a/e''' move back and forth full rules (alternatively, '''M-a/e''' moves back/forth by "sentences" which includes commented rules).
You can toggle validation using the XML menu at the top of the screen, or the keyboard shortcut <code>C-c C-v</code>.
 
   
== See also ==
 
* [[Emacs C style for Apertium hacking]]
 
* [http://www.cis.uni-muenchen.de/~wastl/emacs/sfst.el SFST/HFST mode for emacs] (see [[SFST]] and [[HFST]])
 
* [http://www.student.uib.no/~kun041/doc/cg.el CG-mode for emacs] (see [[Constraint Grammar]])
 
   
  +
If you want to test the CG while you're working on it from within Emacs, you can add a line like
  +
<pre>
  +
# -*- cg-pre-pipe: "apertium -d . nb-nn-morph|cg-conv -a 2>/dev/null" -*-
  +
</pre>
  +
to the top of your CG file (replace <code>nb-nn-morph</code> for whatever mode that runs everything up until <code>cg-proc</code> in your regular mode, or just use something like <code>lt-proc some.automorf.bin|cg-conv -a 2>/dev/null</code>). Then close and re-open the file, and hit '''!''' when you're asked whether you approve of the command (you only have to do this once).
  +
  +
Now you can do '''C-c C-i''' to type in some test text, then '''C-c C-c''' (either in that buffer or in the CG buffer) to test the CG on the text. You can do '''C-c c''' to toggle if you want to test the text for every change you do (some might find that annoying). You can click REMOVE, SELECT, MAP, ADD etc. in the output to go to the corresponding line, or use '''C-c C-n''' / '''C-c C-p''' to go back and forth between occurrences (also works for warnings and compile errors).
  +
  +
[[Image:Cg.el.hiding.screenshot.png|thumb|300px|right|Screenshot after running the grammar ('''C-c C-c''') and hitting '''u''' to hide everything but analyses containing ''det dem'']]
  +
  +
If you have a lot of input sentences you want to test at once, you can hide all analyses, except ones matching some regex. Select the output buffer, then hit '''u''' and type in a regex for analyses you want to see (e.g. <code>vblex</code>, or <code>\b\(sg\|pl\)\b</code> to match pl or sg but not the string "place"). Now you should see only the wordforms in the output buffer, except for analyses containing your exceptions. Type '''h''' to toggle between a full view and hiding (click a word when hiding and press '''h''' to ensure you're scrolled into the analysis of that word). See also the variable <code>cg-sent-tag</code> which is used to keep linebreaks after certain tags; if you use a non-Apertium sentence tag you may want to put in your ~/.emacs something like <code>(setq cg-sent-tag "\\bpunct\\b")</code> (if your sentence tag was <code>punct</code>).
  +
  +
[[Image:Cg-mode-hide-unhide.gif|Demo of hide/unhide (h/u) in cg-mode output, rule trace and editing]]
  +
  +
  +
There is also error underlining (using the builtin flymake mode); the following is a plain emacs setup using only [unhammer.org/cg-init.el this init file], and the toolbar enabled for buttons:
  +
  +
[[Image:Cg-flymake.gif|Demo error underlining and input example testing]]
  +
  +
  +
<br style="clear:both" />
  +
  +
== IRC ==
  +
Do <code>M-x erc</code> to start the [[IRC]] client. See http://www.emacswiki.org/emacs/ErcBasics and http://emacs-fu.blogspot.com/2009/06/erc-emacs-irc-client.html for more info.
  +
  +
== See also ==
  +
* [http://www.emacswiki.org/emacs/ZenCoding ZenCoding] lets you type <code>section#main>e*2</code> and it turns it into the full <pre><section id="main"><e></e><e></e></section></pre>, etc.
  +
* [http://www.emacswiki.org/emacs/Yasnippet YASnippet] is a template system ([http://www.youtube.com/watch?v=76Ygeg9miao automatically expand abbreviations])
  +
* [[Text Editors Compatible With Different Scripts]] about RTL, bidi support
   
 
[[Category: Writing dictionaries]]
 
 
[[Category: Development]]
 
  +
[[Category:Documentation in English]]

Latest revision as of 12:46, 23 March 2022

Info on using Emacs for Apertium-related tasks.

Quickstart

There is an init file in git that will give your Emacs some useful Apertium-related packages and settings, including:

  • dix-mode, for XML dictionary and transfer editing
  • cg-mode, for Constraint Grammar rule editing and testing
  • hfst-mode, for lexc/twol syntax highlighting
  • tab-completion

To get that set up, simply

mkdir -p ~/.emacs.d/
curl https://raw.githubusercontent.com/unhammer/dix/master/init-apertium.el > ~/.emacs.d/init-apertium.el
echo '(load "~/.emacs.d/init-apertium.el")' >> ~/.emacs.d/init.el
emacs

The last line starts up Emacs, which will download the new packages since it's the first startup. (The next startups will be much faster.)
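
If you would rather not have Emacs complain at startup when the download failed or the file was moved, the load line in ~/.emacs.d/init.el can be guarded. This is only a sketch, not something the init file itself requires; the variable name my-apertium-init is made up for illustration:

```elisp
;; Load the Apertium setup only if the file is actually there, so a
;; missing download does not stop the rest of init.el from loading.
;; `my-apertium-init' is a hypothetical local name.
(let ((my-apertium-init (expand-file-name "~/.emacs.d/init-apertium.el")))
  (when (file-exists-p my-apertium-init)
    (load my-apertium-init)))
```

Use this instead of the plain load line, not in addition to it, or the file will be loaded twice.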

If you ever want to update your installed Emacs packages, you do M-x list-packages, then U x.


The rest of this page gives some documentation of the various modes.

Mac OS X

If you're on Mac, the built-in emacs is ancient. Don't use that. Instead, get https://emacsformacosx.com/

You can make an alias to start this emacs from the command line with e.g.

alias em="open -a /Applications/Emacs.app"

(If you prefer having non-GUI emacs, use alias em="/Applications/Emacs.app/Contents/MacOS/Emacs -nw" instead, since open -a does not pass -nw on to Emacs.)

Validation slow?

The above init-apertium.el turns on on-the-fly XML validation, which can be slow on old computers. If editing large .dix files seems too slow, try turning off one or both of the validators by putting

(add-hook 'nxml-mode-hook (lambda () (rng-validate-mode 0)) 'append)

in your ~/.emacs.d/init.el
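
If you want to keep validation for small files and only skip it for big ones, something along these lines might work. It is a sketch under assumptions: the names my-rng-validate-size-limit and my-maybe-disable-rng-validation and the 500000-character threshold are invented here and are not part of init-apertium.el.

```elisp
;; Hypothetical threshold (in characters); tune it for your machine.
(defvar my-rng-validate-size-limit 500000
  "Buffers larger than this many characters are not validated on the fly.")

(defun my-maybe-disable-rng-validation ()
  "Turn off `rng-validate-mode' in the current buffer if it is large."
  (when (> (buffer-size) my-rng-validate-size-limit)
    (rng-validate-mode 0)))

;; Appended so it runs after hooks that may have turned validation on.
(add-hook 'nxml-mode-hook #'my-maybe-disable-rng-validation 'append)
```

In a large file you can still turn validation back on by hand with C-c C-v.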

XML (dix, transfer, …) editing

nxml-mode

Emacs has a nice xml editing mode (included as of version 23) called nXML, with syntax highlighting, movement commands to navigate through the XML (out of, into, across elements, etc.). It also has validation, and can auto-complete using the XML schema if a schema file is available.

init-apertium.el turns on nxml-mode for the common Apertium XML file extensions.

keybindings

init-apertium.el also turns on the variable nxml-sexp-element-flag, which lets you use the following handy keys:

  • C-M-f to move forward one element (e.g. from <e> to </e>)
  • C-M-b to move backward one element (e.g. from </e> to <e>)
  • C-M-d to move into one element (e.g. from <e> to <p>)
  • M-S-d (meta-shift-d) to move into one element backwards (e.g. from after </e> to after </p>)
  • C-M-u to move out of one element (e.g. from <p> to <e>)
  • C-M-k to kill (cut) one element

and nxml-slash-auto-complete-flag which lets you type

  • </ to write the end tag of whatever element you're in (e.g. after typing <e><p>…</p></, it'll complete with e>)
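
If you are not using init-apertium.el, the two variables can also be set by hand. A minimal sketch for ~/.emacs.d/init.el (both are ordinary nxml-mode variables; how init-apertium.el itself sets them is not shown here):

```elisp
;; Treat whole elements as "sexps", so that C-M-f, C-M-b, C-M-k etc.
;; move over or kill complete elements instead of single tags.
(setq nxml-sexp-element-flag t)

;; Typing </ inserts the rest of the matching end tag.
(setq nxml-slash-auto-complete-flag t)
```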

dix-mode

Screenshot of an older version of dix.el in Aquamacs (fullscreen). Upper left window has output from dix-view-pardef, lower left shows rng schema completion. There is a red underline since a p can't be an empty element, as noted by the message in the minibuffer

dix.el is a minor mode under nxml-mode which gives some handy Apertium-related functions for XML editing. It is installed and turned on for the relevant file extensions by init-apertium.el.

There are some short screencasts here showing off some usage.

I use Apertium-dixtools-formatted dix with one line per <e>. Not all functions have been tested in more verbose formats, but I've tried to make the functions use XML-movements, so they should mostly work no matter how you format your files.

When you open emacs (after the Quickstart setup above) and load a .dix-file, you should see a menu named dix. Most of the functions added by dix-mode are shown in this menu (which also shows their keyboard shortcuts). Hovering over a menu-item might give a little popup-help. The Help for dix-mode entry will show all the user functions defined by dix-mode. The keyboard shortcuts are in general a lot more useful than the menu bar, which is mostly there in case you forget which buttons to press... Remember: C is Control, S is Shift, M is Alt (well, M stands for Meta, but that's typically Alt).


Some useful functions in dix-mode:

  • Movement and editing:
    • The space bar inserts a <b/> in <r>, <l> or <i> elements; a _ in par/pardef names; otherwise a plain space. This works with the . (repeat) command as well, if you use the vim keybindings.
    • M-n and M-p move to the next and previous "important bits" of <e>-elements (just try it!).
  • Copying elements and adding restrictions:
    • C-c C just creates a copy of the current <e> element, putting it below the current one
    • C-c L and C-c R also make a copy of the current <e> element, but with an LR or RL restriction
    • C-TAB cycles between the restriction possibilities LR, RL or none for the current <e> element
    • C-S-TAB, used with elements that have the slr/srl attribute, will swap the sense translation of this <e> with the <e> above
  • Creating elements from plain text:
    • C-c g in a monodix guesses the pardef for a word based on the suffix. Write a word at the bottom of a dix file, place point somewhere in the middle of the word, and hit C-c g; it will try to find words earlier in the file that have the same ending (the characters after point).
    • C-c x in a monodix or bidix turns a word-list into <e> entries using the above <e> entry as a template. Words should be written one per line. You can use it in a bidix by writing the left-side, then a colon (:) then the right-side. Assumes that the entry used as a template is written all on one line.
  • Pardef viewing and manipulation:
    • C-c G will go to the pardef of the nearest <par>
      • the place you left is saved in the standard emacs fashion, so you can go back by pressing C-u C-SPACE
    • C-c V will show the pardef of the nearest <par> in another window
    • C-c S will sort a pardef by its right-hand-side, <r>.
      • You can also do M-x dix-sort-e-by-l to sort the selected <e> elements by the contents of their <l> element
    • C-c D (in a pardef or an <e>) will print a list of all pardefs which have the same suffixes as this one (where a 'suffix' is the contents of an <l>-element), useful for finding duplicates. Note: it ignores the tags
    • Inside a pardef, C-c A shows all usages of that pardef within the dictionaries represented by the variable `dix-dixfiles'


Note: capital letters mean you have to press shift. If you fancy other keyboard shortcuts, copy the relevant define-key entries from the bottom of dix.el and put them in your ~/.emacs, e.g. to add F12 as an alternative to C-c V:

(add-hook 'dix-mode-hook (lambda nil (define-key dix-mode-map (kbd "<f12>") 'dix-view-pardef)))

(the whole add-hook thing is needed since dix-mode is not loaded until the first .dix-file is loaded)
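
The same pattern can give a key to a command that is only mentioned as an M-x command above, such as dix-sort-e-by-l. This is a sketch: the function name my-dix-extra-keys and the choice of F9 are arbitrary, so check the dix menu for clashes with keys you already use.

```elisp
;; A named function instead of a lambda, so that re-evaluating the
;; init file does not add a second copy of it to the hook.
(defun my-dix-extra-keys ()
  "Add personal key bindings to `dix-mode-map'."
  ;; F9 is a hypothetical choice; pick any key you have free.
  (define-key dix-mode-map (kbd "<f9>") #'dix-sort-e-by-l))

(add-hook 'dix-mode-hook #'my-dix-extra-keys)
```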


Also, if you like having all <i> elements aligned at eg. column 25, select a region and do M-x align to achieve that (this also aligns <p> to 10 and <r> to 44, for bidix). These numbers are customizable with M-x customize-group RET dix. (Ie. there's no extra indentation function, but then, nxml already has that.)

dix-mode for transfer rules

Useful in transfer mode too!

There are some transfer-specific functions in dix-mode that make it worth turning on in transfer mode files too, e.g. C-c n, which lets you enter a rule number to go to (useful when tracing with apertium-transfer -t). The .emacs in the Quickstart section will turn on nxml-mode and dix-mode in transfer files (ie. all files with the suffix .t1x, .t2x, .t3x, etc.).
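If you maintain your own init file instead, the following is a minimal sketch of how dix-mode could be turned on for both dix and transfer files. It assumes dix.el is already on your load-path with dix-mode autoloaded; the regular expressions are illustrative guesses at the file suffixes, not copied from init-apertium.el.

```elisp
;; Open transfer files (.t1x, .t2x, .t3x, ...) in nxml-mode.
(add-to-list 'auto-mode-alist '("\\.t[0-9]x\\'" . nxml-mode))

;; Turn on dix-mode in nxml buffers visiting dix or transfer files.
(add-hook 'nxml-mode-hook
          (lambda ()
            (when (and buffer-file-name
                       (string-match "\\.\\(dix\\|t[0-9]x\\)\\'"
                                     buffer-file-name))
              (dix-mode 1))))
```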

M-n and M-p (go to next/previous useful position) should also Do What You Mean in transfer files.



Validation (Relax NG-schemas)

nxml-mode uses Compact Relax NG schemas (.rnc files) for validation (without these, XML is only checked for well-formedness by nxml-mode).

dix.el should find the .rnc files installed by lttoolbox/apertium if you've installed from packages (or even with "sudo make install" to /usr/local). If not, copy the schemas.xml included with https://github.com/unhammer/dix into some folder, edit the paths to the .rnc files in it, and put (add-to-list 'rng-schema-locating-files "/path/to/your/schemas.xml") in your ~/.emacs.d/init.el.
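A slightly more defensive way to write that init.el line is sketched below. The location ~/.emacs.d/schemas.xml is only an example path; rng-schema-locating-files is defined in the built-in rng-loc library, so the addition is delayed until that library has been loaded.

```elisp
;; Register a hand-edited schemas.xml with nxml-mode's schema locator.
;; The path is hypothetical; point it at wherever you copied the file.
(with-eval-after-load 'rng-loc
  (add-to-list 'rng-schema-locating-files
               (expand-file-name "~/.emacs.d/schemas.xml")))
```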

You can toggle validation using the XML menu at the top of the screen, or the keyboard shortcut C-c C-v.

It can be a bit slow with big files; a

(add-hook 'nxml-mode-hook (lambda () (rng-validate-mode 0)) 'append)

will turn it off by default (or just do C-c C-v to turn it off once).
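If you only want validation off for big files, a size check can be added. This is a sketch, not part of dix.el: the function name and the 1 MB threshold are arbitrary choices for illustration.

```elisp
(defun my-nxml-maybe-disable-validation ()
  "Turn off `rng-validate-mode' in buffers larger than about 1 MB."
  (when (> (buffer-size) (* 1024 1024))
    (rng-validate-mode 0)))

;; Appended so that it runs after the hooks that switch validation on.
(add-hook 'nxml-mode-hook #'my-nxml-maybe-disable-validation 'append)
```

Smaller files keep validation (and the schema-based completion) as before.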

Validation can also provide "intelligent" tab completion of elements and attributes (see the function nxml-complete).

Linting with flycheck

The package https://github.com/unhammer/flycheck-apertium/ (installed by init-apertium.el) gives on-the-fly linting of dix files and transfer files.

For dix files, it assumes /usr/share/lttoolbox/dix.xsd exists (i.e. you've installed lttoolbox from packages).

For transfer files, it assumes you've got https://github.com/ggm/vm-for-transfer-cpp compiled and installed to your $PATH; this gives some extra info on transfer errors. If you've got the binary somewhere outside your $PATH, set it like this:

  (setq flycheck-apertium-transfervm-executable  "/home/me/src/vm-for-transfer-cpp/apertium-compile-transfer")

Note that the line numbers given by transfervm are at the end of the matching rule, not always at the exact line where the error occurred. But it's better than segfaults.

If any of the above files don't exist, the checker will just silently not run.
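Since a missing file gives no error message, a small helper can show which prerequisite is absent. The following is a hypothetical sketch using only built-in functions; the helper name is made up, and the executable name apertium-compile-transfer is taken from the example path above.

```elisp
(defun my-flycheck-apertium-check-prereqs ()
  "Report whether the files the Apertium checkers rely on can be found."
  (interactive)
  (message "dix.xsd: %s; transfer compiler: %s"
           (if (file-exists-p "/usr/share/lttoolbox/dix.xsd")
               "found"
             "missing")
           ;; Prefer the user-configured executable, if any.
           (or (executable-find
                (or (bound-and-true-p
                     flycheck-apertium-transfervm-executable)
                    "apertium-compile-transfer"))
               "not found")))
```

Run it with M-x my-flycheck-apertium-check-prereqs and read the result in the echo area.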

Yasnippet

Yasnippet is a snippet-expansion package for Emacs. It lets you write boilerplate faster. This section shows how to use the snippets made for dix-mode. There's a short screencast of it at https://asciinema.org/a/11192

To use, install yasnippet by doing M-x package-refresh-contents and M-x package-install RET yasnippet RET (assuming you've added melpa to your package-archives; this happens automatically when you add init-apertium.el).
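If you are not using init-apertium.el, adding MELPA by hand might look like the sketch below. It is the commonly used setup for the built-in package.el, not something copied from init-apertium.el.

```elisp
(require 'package)
;; Add MELPA at the end of the archive list, after the default GNU ELPA.
(add-to-list 'package-archives
             '("melpa" . "https://melpa.org/packages/") t)
(package-initialize)
```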

Then put this into ~/.emacs.d/init.el to make the snippets available in dix-mode:

(eval-after-load 'yasnippet
  '(progn
     (setq yas-verbosity 1)
     (yas-reload-all)
     (remhash 'nxml-mode yas--tables) ; until https://github.com/AndreaCrotti/yasnippet-snippets/issues/41 is solved
     (add-to-list 'yas-key-syntaxes 'dix-yas-skip-backwards-to-key)
     ; The default is to use a point-and-click menu when there are several choices, I prefer ido:
     (setq yas-prompt-functions '(yas-ido-prompt yas-completing-prompt yas-dropdown-prompt yas-no-prompt))
  ))
(add-hook 'dix-mode-hook 'yas-minor-mode)

C++

See Emacs C style for Apertium hacking.

HFST

Screenshot of syntax highlighting in hfst-mode.el

HFST-mode provides go-to-lexicon on M-. (and back with M-,), and will syntax highlight occurrences of Multichar_Symbols in lexicons, so you can easily tell if you've mistyped or forgotten to define a symbol.



CG

Screenshot after running the grammar (C-c C-c) on an rlx file in cg.el

There is a CG-mode for emacs in the vislcg3 repository (see Constraint Grammar). It's installed by init-apertium.el.


You can use C-; (alternative keybinding M-#) to quickly comment/uncomment a rule (quick demo). C-M-a and C-M-e move backward and forward by full rules (alternatively, M-a and M-e move backward and forward by "sentences", which include commented rules).


If you want to test the CG while you're working on it from within Emacs, you can add a line like

# -*- cg-pre-pipe: "apertium -d . nb-nn-morph|cg-conv -a 2>/dev/null" -*-

to the top of your CG file (replace nb-nn-morph with whatever mode runs everything up until cg-proc in your regular mode, or just use something like lt-proc some.automorf.bin|cg-conv -a 2>/dev/null). Then close and re-open the file, and hit ! when you're asked whether you approve of the command (you only have to do this once).

Now you can do C-c C-i to type in some test text, then C-c C-c (either in that buffer or in the CG buffer) to test the CG on the text. You can do C-c c to toggle whether the text is tested on every change you make (some might find that annoying). You can click REMOVE, SELECT, MAP, ADD etc. in the output to go to the corresponding line, or use C-c C-n / C-c C-p to go back and forth between occurrences (also works for warnings and compile errors).
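If several CG files in one directory share the same pipeline, the cg-pre-pipe setting could probably be kept in a .dir-locals.el file instead of in a header line in each file. This is an untested sketch: it assumes the major mode symbol is cg-mode, and Emacs will most likely still ask you to approve the value the first time.

```elisp
;; .dir-locals.el in the directory holding the CG files (a sketch).
;; Assumes the major mode symbol is cg-mode; the pipeline is the
;; example used in the header line above.
((cg-mode . ((cg-pre-pipe . "apertium -d . nb-nn-morph|cg-conv -a 2>/dev/null"))))
```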

Screenshot after running the grammar (C-c C-c) and hitting u to hide everything but analyses containing det dem

If you have a lot of input sentences you want to test at once, you can hide all analyses, except ones matching some regex. Select the output buffer, then hit u and type in a regex for analyses you want to see (e.g. vblex, or \b\(sg\|pl\)\b to match pl or sg but not the string "place"). Now you should see only the wordforms in the output buffer, except for analyses containing your exceptions. Type h to toggle between a full view and hiding (click a word when hiding and press h to ensure you're scrolled into the analysis of that word). See also the variable cg-sent-tag which is used to keep linebreaks after certain tags; if you use a non-Apertium sentence tag you may want to put in your ~/.emacs something like (setq cg-sent-tag "\\bpunct\\b") (if your sentence tag was punct).

Demo of hide/unhide (h/u) in cg-mode output, rule trace and editing


There is also error underlining (using the builtin flymake mode); the following is a plain emacs setup using only this init file (unhammer.org/cg-init.el), and the toolbar enabled for buttons:

Demo error underlining and input example testing



IRC

Do M-x erc to start the IRC client. See http://www.emacswiki.org/emacs/ErcBasics and http://emacs-fu.blogspot.com/2009/06/erc-emacs-irc-client.html for more info.

See also