Difference between revisions of "XML editors"

From Apertium
Latest revision as of 14:25, 29 December 2020

If you are editing Apertium language data (e.g. dix and transfer files), you should use a real XML editor. These can show the errors as you type, so you won't have to parse the output of make.

There are two main kinds of XML errors:

  • well-formedness errors are things like a missing " or end-tag; most editors can catch these, or at least reveal a forgotten " through syntax highlighting;
  • validation errors require some knowledge of what a .dix or .t1x file should look like, and can tell you things like "you've referred to a pardef that hasn't been defined". Not all editors are able to do full validation.
    • Editors that use libxml2/xmllint for validation are affected by a bug in libxml2 (https://sourceforge.net/p/apertium/tickets/69/) for files more than 65535 lines long. Programs like apertium-validate-dictionary are also affected.
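
The difference between the two kinds of errors can be sketched in a few lines of Python. This is only an illustration, not how any of the editors below work: the function name `check_dix` and the sample strings are made up for this example, and the "validation" step is a simplified stand-in that only looks for `<par>` references to undefined paradigms rather than applying the real DTD.

```python
import xml.etree.ElementTree as ET


def check_dix(text):
    """Return a list of problems found in a dictionary given as a string."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError as err:
        # Well-formedness error: the parser cannot even build a tree.
        return [f"not well-formed: {err}"]
    # Simplified stand-in for validation (NOT real DTD validation):
    # every <par n="..."/> must name a paradigm declared by a <pardef n="...">.
    defined = {pardef.get("n") for pardef in root.iter("pardef")}
    return [
        f"undefined pardef: {par.get('n')}"
        for par in root.iter("par")
        if par.get("n") not in defined
    ]


# Hypothetical sample data, trimmed down to the parts that matter here.
GOOD = (
    '<dictionary><pardefs><pardef n="house__n"/></pardefs>'
    '<section id="main" type="standard">'
    '<e lm="house"><i>house</i><par n="house__n"/></e>'
    "</section></dictionary>"
)
MISSING_QUOTE = '<dictionary><e lm="house></e></dictionary>'
UNDEFINED_PARDEF = GOOD.replace('<par n="house__n"/>', '<par n="hose__n"/>')

for sample in (GOOD, MISSING_QUOTE, UNDEFINED_PARDEF):
    print(check_dix(sample))
```

The second sample fails before any tree exists, which is why syntax highlighting alone is often enough to spot it; the third is perfectly well-formed and is only caught by a check that knows what a dictionary should contain.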


Some popular editors for XML – all the following can do XML syntax highlighting out of the box, which does indicate some well-formedness errors:

  • Apertium-viewer – an apertium-specific GUI for testing sentences throughout the pipeline, and editing source files (written in Java)
    • Does real XML validation with no setup required
    • Uses libxml2, so try to keep your validation errors below line 65535 :-)
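  • Gedit – a simple GUI code editor (written in C/Python)
    • sudo apt-get install gedit gedit-developer-plugins
    • There's an XML validation plugin at https://launchpad.net/gedit-xmltools but it only works with gedit2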
  • XML Copy Editor – a simple GUI editor (written in C++) purely meant for XML
    • sudo apt-get install xmlcopyeditor
    • XML Copy Editor will check well-formedness (that you have your brackets and quotes in place) out of the box
    • To get validation you may have to click XML→Associate→System DTD and select dix.dtd from lttoolbox (typically in /usr/local/share/lttoolbox or /usr/share/lttoolbox). This will insert a DOCTYPE line in your XML file, but that's fine.
    • Uses libxml2, so try to keep your validation errors below line 65535 :-)
  • Jedit – a "programmer's" GUI editor (written in Java) with frillions of options and menus
    • sudo apt-get install jedit
    • Install the plugin named "XML" to get validation (you can put <!DOCTYPE dictionary SYSTEM "/usr/share/lttoolbox/dix.dtd"> before the root <dictionary> element to make it use the DTD automatically).
    • There seems to be a bug that gives wrong line numbers on some validation errors (it is unclear whether this uses libxml2 or something else)
  • Kate – another "programmer's" GUI editor (written in C++/Qt) from KDE
    • sudo apt-get install kate
    • Supposedly, enabling the XML validation plugin and putting <!DOCTYPE dictionary SYSTEM "/usr/share/lttoolbox/dix.dtd"> before the root <dictionary> element should make it use the DTD and catch validation errors, but in practice it claims the document is valid when it is not.
  • Vim – a lightweight, modal editor
    • Does syntax highlighting for XML out of the box if you have "syntax on" in your ~/.vimrc
    • See Vim for more information
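  • Emacs – a self-documenting, extensible lisp machine
    • The built-in nxml-mode does syntax highlighting and well-formedness checking; see Emacs#nxml-mode
    • See Emacs#Validation_quickstart for how to make Emacs/nxml use the DTDs for validation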
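
Since several of the editors above rely on libxml2, it can be handy to know in advance whether a file crosses the 65535-line mark. A minimal sketch, assuming a plain line count is what matters; the names `count_lines` and `exceeds_libxml2_limit` are invented for this example and are not part of any Apertium tool.

```python
LIBXML2_LINE_LIMIT = 65535  # the threshold mentioned in the libxml2 bug above


def count_lines(path):
    """Count the lines in a file without loading it all into memory."""
    with open(path, "rb") as handle:
        return sum(1 for _ in handle)


def exceeds_libxml2_limit(path):
    """True if the file is long enough to run into the libxml2 bug."""
    return count_lines(path) > LIBXML2_LINE_LIMIT
```

Calling `exceeds_libxml2_limit` on a .dix file before opening it in a libxml2-based editor tells you whether its validation messages may be unreliable for that file.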


Converting DTD to XSD/RNC/RNG

An XML editor can check if your XML is well-formed (the brackets match up and so on), but to check for validity, you need to give it the schema for the file type you're editing. Some editors can read the DTD schemas in the lttoolbox/apertium directories, while others require other schema formats.

The Java program "trang" can convert the dix and transfer DTDs to other formats like XSD, RNC or RNG, if your favourite editor doesn't support DTDs.

cd
wget http://jing-trang.googlecode.com/files/trang-20091111.zip
unzip trang-20091111.zip
cd trang-20091111

java -jar trang.jar -I dtd -O xsd ~/src/apertium/trunk/lttoolbox/lttoolbox/dix.dtd ~/src/apertium/trunk/lttoolbox/lttoolbox/dix.xsd
java -jar trang.jar -I dtd -O rng ~/src/apertium/trunk/lttoolbox/lttoolbox/dix.dtd ~/src/apertium/trunk/lttoolbox/lttoolbox/dix.rng
java -jar trang.jar -I dtd -O rnc ~/src/apertium/trunk/lttoolbox/lttoolbox/dix.dtd ~/src/apertium/trunk/lttoolbox/lttoolbox/dix.rnc

Both jedit and xmlcopyeditor can also convert between DTD and other formats.
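
The three trang invocations above differ only in the output format, so they can be generated rather than typed out. The following is a sketch under a few assumptions: the function name `trang_commands` is made up, `trang.jar` is assumed to be in the current directory, and each output file is assumed to sit next to the DTD with the format name as its extension, as in the commands above. It only builds the argument lists; it does not run Java.

```python
from pathlib import Path


def trang_commands(dtd_path, formats=("xsd", "rng", "rnc"), jar="trang.jar"):
    """Build one trang command (as an argument list) per output format."""
    dtd = Path(dtd_path)
    return [
        # -I and -O are the input/output format flags used in the commands above.
        ["java", "-jar", jar, "-I", "dtd", "-O", fmt,
         str(dtd), str(dtd.with_suffix("." + fmt))]
        for fmt in formats
    ]


for command in trang_commands("dix.dtd"):
    print(" ".join(command))
```

Each list can be passed to `subprocess.run` as-is if you want the script to perform the conversion itself.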

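See also

  • Easy dictionary maintenance – GUI for editing dictionaries
  • Dixtools: Enhance – interactive tool to add new words to a dictionary (asks you for a word that inflects the same, finds the paradigm for you)
  • Text Editors Compatible With Different Scripts – about bidi/RTL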