Emacs

From Apertium
Revision as of 17:39, 4 May 2012 by Unhammer

Info on using Emacs for Apertium-related tasks.

dix and transfer

Quickstart for non-emacs users

If you just want to get emacs set up for dix editing with the minimum of hassle, here is a howto. This assumes you have emacs version 23 or higher installed (but see discussion page if you're stuck with an old version). First execute (paste) the following commands in your terminal:

mkdir -p ~/.emacs.d
cd ~/.emacs.d
wget -O dix.el http://apertium.svn.sourceforge.net/viewvc/apertium/trunk/apertium-tools/dix.el
cd ..
touch ~/.emacs
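
Since dix.el is fetched from a web interface, it may be worth confirming that what landed in ~/.emacs.d is Emacs Lisp rather than an HTML page. A minimal sketch, assuming a POSIX shell; check_download is a hypothetical helper for illustration, not part of Apertium or dix.el:

```shell
# check_download FILE: succeed if FILE is non-empty and does not look like HTML.
# (Hypothetical helper, for illustration only.)
check_download() {
    [ -s "$1" ] || return 1                 # missing or empty file
    if head -n 5 "$1" | grep -Eqi '<html|<!doctype'; then
        return 1                            # first lines look like a web page
    fi
    return 0
}
```

For example, check_download ~/.emacs.d/dix.el && echo "dix.el looks fine" prints the message only when the file passes both checks.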

Then open the file ~/.emacs in an editor (like vi) and enter the following:

; Start of dix-mode setup
(add-to-list 'load-path "~/.emacs.d") ; path to the folder where you have dix.el
(autoload 'dix-mode "dix" 
   "dix-mode is a minor mode for editing Apertium XML dictionary files."  t)

(add-to-list 'auto-mode-alist '("\\.dix\\'" . nxml-mode)) ; turn on nxml-mode for dix-files
(add-to-list 'auto-mode-alist '("\\.t[0-9]x\\'" . nxml-mode)) ; turn on nxml-mode for transfer files
(add-hook 'nxml-mode-hook               ; turn on dix-mode for transfer or dix-files after nxml-mode
 	  (lambda () (and buffer-file-name
 			  (string-match "\\.\\(dix\\|t[0-9]x\\)$" buffer-file-name)
 			  (dix-mode 1))))

; turn on schema-based completion with C-RET:
(if (boundp 'nxml-completion-hook)
    (add-to-list 'nxml-completion-hook 'rng-complete)
  (setq nxml-completion-hook '(rng-complete)))

 ; Start of CUA mode setup - to make Emacs behave like other editors - see http://www.emacswiki.org/CuaMode
(cua-mode t)
(setq cua-auto-tabify-rectangles nil) ; Don't tabify after rectangle commands
(setq cua-keep-region-after-copy t) ; Standard Windows behaviour

See also the Validation quickstart for auto-validation and schema-based completion.

nxml-mode

Emacs has a nice XML editing mode called nXML, with syntax highlighting and movement commands for navigating the XML (out of, into, across elements, etc.). It can also validate, and can auto-complete using the XML schema if a schema file is available.

If your emacs doesn't turn on nxml-mode automatically when you open an xml-file, you can add the following line to your ~/.emacs file:

(add-to-list 'auto-mode-alist '("\\.dix\\'" . nxml-mode))

Emacs 23 or newer includes nxml-mode, but if your version of emacs doesn't: download nxml-mode-20041004.tar.gz (or whatever the newest version is) from http://www.thaiopensource.com/download/, extract somewhere, and add the following to your .emacs file:

  (load "/path/to/nxml-mode-20041004/rng-auto.el") ; full path to the _file_ rng-auto.el which you just extracted

dix-mode

Screenshot of an older version of dix.el in Aquamacs (fullscreen). Upper left window has output from dix-view-pardef, lower left shows rng schema completion. There is a red underline since a p can't be an empty element, as noted by the message in the minibuffer

In svn there is a minor mode for editing .dix files, dix.el (or use svn co https://apertium.svn.sourceforge.net/svnroot/apertium/trunk/apertium-tools). It needs nxml-mode (see above, installed by default in emacs version 23 or above). There are some short screencasts here.

Put the following in your ~/.emacs file to use it:

 (add-to-list 'load-path "/path/to/dix.el-folder") ; ie. path to the _folder_ containing dix.el
 (autoload 'dix-mode "dix" 
   "dix-mode is a minor mode for editing Apertium XML dictionary files."  t)
 (add-hook 'nxml-mode-hook
 	  (lambda () (and buffer-file-name
 			  (string-match "\\.dix$" buffer-file-name)
 			  (dix-mode 1))))

I use Apertium-dixtools-formatted dix, so not all functions have been tested in the regular format. I've tried to make the functions use XML movements, though, so they should mostly work no matter how you format your files.

When you open emacs (after adding the above lines to ~/.emacs) and load a .dix-file, you should see a menu named dix. Most of the functions added by dix-mode are shown in this menu (which also shows their keyboard shortcuts). Hovering over a menu-item might give a little popup-help. The Help for dix-mode entry will show all the user functions defined by dix-mode. The keyboard shortcuts are in general a lot more useful than the menu bar, which is mostly there in case you forget which buttons to press... Remember: C is Control, S is Shift, M is alt (well, M stands for Meta, but that's typically alt).


Some useful functions in dix-mode:

  • Movement and editing:
    • The space bar inserts a <b/> in <r>, <l> or <i> elements; a _ in par/pardef names; otherwise a plain space.
    • M-n and M-p move to the next and previous "important bits" of <e>-elements (just try it!).
  • Copying elements and adding restrictions:
    • C-c C just creates a copy of the current <e> element, putting it below the current one
    • C-c L and C-c R also make a copy of the current <e> element, but with an LR or RL restriction
    • C-TAB cycles between the restriction possibilities LR, RL or none for the current <e> element
    • C-S-TAB, used with elements that have the slr/srl attribute, will swap the sense translation of this <e> with the <e> above
  • Creating elements from plain text:
    • C-c g in a monodix guesses the pardef for a word based on its suffix. Write a word at the bottom of a dix file, place point somewhere in the middle of the word, and hit C-c g; it will try to find words earlier in the file that have the same ending (the characters after point)
    • C-c x in a monodix or bidix turns a word-list into <e> entries using the above <e> entry as a template. Words should be written one per line. You can use it in a bidix by writing the left-side, then a colon (:) then the right-side. Assumes that the entry used as a template is written all on one line.
  • Pardef viewing and manipulation:
    • C-c G will go to the pardef of the nearest <par>
      • the place you left is saved in the standard emacs fashion, so you can go back by pressing C-u C-SPACE
    • C-c V will show the pardef of the nearest <par> in another window
    • C-c S will sort a pardef by its right-hand-side, <r>.
      • You can also do M-x dix-sort-e-by-l to sort the selected <e> elements by the contents of their <l> element
    • C-c D (in a pardef or an <e>) will print a list of all pardefs which have the same suffixes as this one (where a 'suffix' is the contents of an <l>-element), useful for finding duplicates. Note: it ignores the tags
    • Inside a pardef, C-c A shows all usages of that pardef within the dictionaries represented by the variable `dix-dixfiles'
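
For bidix use, the colon-separated word list that C-c x expects can be assembled from two plain word lists. A minimal sketch, assuming a POSIX shell and two files with one word per line in matching order; make_bidix_wordlist and the file names are hypothetical:

```shell
# make_bidix_wordlist LEFT RIGHT: print "left:right" pairs, one per line.
# LEFT and RIGHT are files with one word per line, in the same order.
make_bidix_wordlist() {
    paste -d: "$1" "$2"
}
```

Running make_bidix_wordlist left.txt right.txt prints one left:right pair per line; the output can then be pasted below the template <e> entry before using C-c x.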


Note: capital letters mean you have to press Shift. If you fancy other keyboard shortcuts, copy the relevant define-key entries from the bottom of dix.el into your ~/.emacs. For example, to add F12 as an alternative to C-c V:

(add-hook 'dix-mode-hook (lambda nil (define-key dix-mode-map (kbd "<f12>") 'dix-view-pardef)))

(the whole add-hook thing is needed since dix-mode is not loaded until the first .dix-file is loaded)


Also, if you like having all <i> elements aligned at e.g. column 25, select a region and do M-x align (this also aligns <p> to column 10 and <r> to column 44, for bidix). These numbers are customizable with M-x customize-group RET dix. (I.e. there's no extra indentation function, but then, nxml already has that.)

dix-mode for transfer rules

Useful in transfer mode too!

There are some transfer-specific functions in dix-mode that make it worth turning on in transfer mode files too, e.g. C-c n, which lets you enter a rule number to go to (useful when tracing with apertium-transfer -t). The .emacs in the Quickstart section will turn on nxml-mode and dix-mode in transfer files (ie. all files with the suffix .t1x, .t2x, .t3x, etc.).

M-n and M-p (go to next/previous useful position) should also Do What You Mean in transfer files.

Validation (Relax NG-schemas)

Validation quickstart

Download and extract trang by executing (pasting) this into your terminal:

cd
wget http://jing-trang.googlecode.com/files/trang-20091111.zip
unzip trang-20091111.zip

Then copy this script to a file like "makeschema.sh", making sure to set APERTIUMSRC to the folder containing the apertium source, and TRANGJAR to the path to the trang jar-file you just extracted:

#!/bin/bash

## Set these to the correct paths:
APERTIUMSRC="$HOME/apertium-svn/trunk/apertium"
TRANGJAR="$HOME/trang-20091111/trang.jar"
SCHEMAFILE="$HOME/.emacs.d/schemas.xml"
# Change SCHEMAFILE if you want to put your schema locating file somewhere else.
# Note: use $HOME rather than ~ here, since ~ is not expanded inside quotes

## No changes needed below

echo "Creating ${SCHEMAFILE}"
cat > ${SCHEMAFILE} <<EOF
<?xml version="1.0"?>
<locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
  <typeId id="dix" uri="${APERTIUMSRC}/apertium/dix.rnc"/>
  <typeId id="transfer" uri="${APERTIUMSRC}/apertium/transfer.rnc"/>
  <typeId id="interchunk" uri="${APERTIUMSRC}/apertium/interchunk.rnc"/>
  <typeId id="postchunk" uri="${APERTIUMSRC}/apertium/postchunk.rnc"/>
  <typeId id="format" uri="${APERTIUMSRC}/apertium/format.rnc"/>
  <typeId id="tagger" uri="${APERTIUMSRC}/apertium/tagger.rnc"/>
  <typeId id="modes" uri="${APERTIUMSRC}/apertium/modes.rnc"/>

  <documentElement localName="dictionary" typeId="dix"/>
  <documentElement localName="transfer" typeId="transfer"/>
  <documentElement localName="interchunk" typeId="interchunk"/>
  <documentElement localName="postchunk" typeId="postchunk"/>
  <documentElement localName="format" typeId="format"/>
  <documentElement localName="tagger" typeId="tagger"/>
  <documentElement localName="modes" typeId="modes"/>

  <uri pattern="*.dix" typeId="dix"/>
  <uri pattern="*.t1x" typeId="transfer"/>
  <uri pattern="*.t2x" typeId="interchunk"/>
  <uri pattern="*.t3x" typeId="interchunk"/>
  <!-- Some pairs have t3x as postchunk, others t4x or even t5x... but
       if one of the documentElement rules match, these rules are
       ignored since they're below them. -->
</locatingRules>
EOF

echo "Creating rnc files in ${APERTIUMSRC}/apertium"
cd "${APERTIUMSRC}/apertium" || exit 1
# Convert every DTD in this folder to a compact Relax NG (.rnc) schema
for DTD in *.dtd; do
    OUT="${DTD%.dtd}.rnc"
    echo "java -jar ${TRANGJAR} ${DTD} ${OUT}"
    java -jar "${TRANGJAR}" "$DTD" "$OUT"
done

echo "Now inform nxml-mode about ${SCHEMAFILE} by appending this to ~/.emacs:"
cat <<EOF

(add-hook 'nxml-mode-hook
	  (lambda ()
	    (add-to-list 'rng-schema-locating-files "${SCHEMAFILE}")))
EOF

Run it like

sh makeschema.sh

and add the hook to your ~/.emacs as instructed by the script.

More about nxml validation

nxml-mode uses compact Relax NG schemas (.rnc files) for validation (without these, XML is only checked for well-formedness by nxml-mode).

You can make compact Relax NG schemas using trang, see the above script.
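
To convert a single DTD by hand instead of running the whole script, the same trang call can be wrapped in a small function. A minimal sketch, assuming a POSIX shell with java on the PATH; rnc_name and convert_dtd are hypothetical helper names, and the TRANGJAR default is simply the path used in the quickstart:

```shell
# rnc_name FILE.dtd: print the matching output name, e.g. dix.dtd -> dix.rnc.
rnc_name() {
    printf '%s\n' "${1%.dtd}.rnc"
}

# convert_dtd FILE.dtd: run trang to write FILE.rnc next to the DTD.
# TRANGJAR defaults to the quickstart location; adjust it to your setup.
convert_dtd() {
    java -jar "${TRANGJAR:-$HOME/trang-20091111/trang.jar}" "$1" "$(rnc_name "$1")"
}
```

For example, convert_dtd dix.dtd would write dix.rnc in the current folder, using the same java -jar invocation as the quickstart script.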

Note: if you want to auto-complete using the schema (keyboard shortcut: C-RET), you should have (add-to-list 'nxml-completion-hook 'rng-complete) somewhere in your ~/.emacs.

You can toggle validation using the XML menu at the top of the screen, or the keyboard shortcut C-c C-v.

See http://www.dpawson.co.uk/relaxng/nxml/schemaloc.html#d574e168 for how to write a schemas.xml file to automatically find the right schema, or just use the quickstart script above.

C++

See Emacs C style for Apertium hacking.

HFST

CG

See also