Difference between revisions of "User:Rcrowther"

From Apertium
m
 
(6 intermediate revisions by the same user not shown)
Line 1: Line 1:
  +
[[Installation (français)|En français]]
  +
{{Main page header}}
   
  +
== To try Apertium ==
'''(proposed page 'Apertium workflow reference', or similar title)'''
 
  +
You can go online to the [https://apertium.org front page] :)
   
  +
There are several applications which work from the desktop without full installation. For these and more graphical user interfaces, services, plugins, etc. go to [[Tools]].
   
  +
If you would like install instructions for 'Apertium viewer', 'apy' (the Apertium server) etc. go to [[Tools]]. The install instructions can be found with the tool descriptions.
You can invent your own tags to pass information along.
 
   
Another example:
 
   
: "I may give only this advice"
 
   
  +
== For those who want to install Apertium locally, and developers==
If this is being translated to a Subject-Object-Verb language (English is a Subject-Verb-Object language, but many languages are not), it will need rearranging. If the words are simple in grammar, they may be left as the morphological analysis has found them. More complex word sequences are gathered into a chunk and, if necessary, tagged,
 
  +
How to install Apertium core<ref>Apertium is a big system. There are many plugins, scripts, and extension projects. The core, the code which translates, is a multi-step set of tools joined by a stream format and, nowadays, invoked by scripts called 'modes'. You may also see the names 'lt-toolbox'/'lt-tools', 'apertium-lex-tools', and the simple title 'apertium'. These refer to groupings of the tools.
   
  +
Packaged or compiled, these tools can be installed as one unit. From here on, we call them 'Apertium core'.
: I <a verb>{may give only} <some object>{this advice}
 
  +
</ref> and language data on your system (developers may also want to consider their operating environment<ref>
  +
Apertium is written to be platform-independent. However, it can be difficult to maintain platform-independence over a project this wide. If you intend to do something deep with Apertium, you will gain more help from the tools if you use the [http://ubuntu.com Ubuntu], or a similar Debian-based, operating system.
   
  +
In no way does this mean that the Apertium project favours this platform.
The "may" and "only" words also need to be handled. They could be cut, which is also a job for this module. If they are not cut, then they will need to be tagged, so they can be re-ordered. To see what happens next, look at the next module, 'interchunk'.
 
  +
</ref>).
   
====Typical stream output====
 
From,
 
   
  +
===Installing: a summary===
: 'a poppy',
 
  +
Most people will need to,
   
  +
====Install Apertium Core by packaging/virtual environment====
The curly brackets are the stream representation for a 'chunk',
 
  +
* Linux systems: [[Install Apertium core using packaging]]
  +
* Windows and Apple systems: [[Apertium VirtualBox]]
   
  +
==== For translators: Install language data/dictionaries/pairs from repositories ====
<pre>
 
  +
[[Install language data using packaging]], including hints about the Apertium package repository.
^Det_nom<SN><DET><f><sg>{^uno<det><ind><3><4>$ ^amapola<n><3><4>$}$
 
</pre>
 
   
  +
==== For language developers: Install language data/dictionaries/pairs by compiling ====
====Tool used====
 
  +
* Start a new language pair: [[How to bootstrap a new pair]]
: apertium-transfer
 
  +
* Work on an existing language pair: [[Install language data by compiling]]
   
   
====Auto Mode====
 
: xxx-yyy-chunker
 
   
  +
===Alternatives===
   
  +
====Installing Apertium core by compiling====
====Configuration Files====
 
: apertium-xxx-yyy.xxx-yyy.t1x
 
   
  +
Apertium maintains a package repository that is up-to-date and reliable. If you do not want to work in core, or develop languages, please use either packaging or a virtual environment. The packages stay up-to-date and are stable. A compile will waste your time.
   
  +
However, if you are planning to work on Apertium core, or have an operating system not covered above, go right ahead, [[Install Apertium core by compiling]]<ref name="about installing">Most people know the word 'install'. It means 'put code in my operating system'. When developing, it is not usual to fully 'install'. You get the code working enough to get results.
====Links====
 
[[Chunking]]
 
[[Chunking:_A_full_example]]
 
   
  +
This is relevant to Apertium, which needs a rapid cycle for re-compiles. If you follow instructions to compile code, you will be discouraged from 'installing' builds. When we use the word 'install', we mean 'get code working on my computer'.</ref>
   
  +
== Notes ==
  +
<references/>
   
  +
== Installation Videos ==
   
  +
Most of these videos have been produced by Google Code-In students.
   
  +
* Using Apertium Virtualbox on Windows: https://youtu.be/XCUWMCJkRDo
===Transfer 2/InterChunk===
 
  +
* Installing Apertium on Ubuntu (Romanian, English): https://www.youtube.com/watch?v=vy7rWy2u_m0
In a three-stage transfer, the second transfer stage orders chunked and tagged items from stage one.
 
  +
* Ubuntu'ya Apertium Kurulumu / Apertium installation on Ubuntu (Turkish, English subtitles): https://www.youtube.com/watch?v=I__-BiQe7zg
  +
* Apertium on Slitaz (English): https://youtu.be/fCluA03oIXY
  +
* How to Install Apertium On Macintosh: https://www.youtube.com/watch?v=oSuovCCsa68
   
  +
[[Category:Installation]]
Configuring this stage is not necessary for making a basic pair.
 
  +
[[Category:Documentation in English]]
   
The detection is of patterns/sequences of chunks. This module can not match words in chunks, only the marks added to chunks.
 
   
  +
= Minimal installation from SVN=
For language pairs with no major reordering between chunks, this module is not needed. If the 't2x' file is not configured, the module passes data unaltered. For example, 'en-es' has a Postchunk module (see next section), but not an Interchunk module.
 
  +
This page is deprecated, and the information split across other pages.
   
  +
It used to contain instructions on how to compile Apertium core. For this, please see [[Install Apertium core by compiling]]
====Technical Description====
 
Reorder or modify chunk sequences (e.g. transfer noun gender to related adjectives).
 
   
  +
How to create language builds with new and existing repository information. Please see [[Install language data by compiling]]
   
  +
And details about the HFST and CG modules. Please see [[Installation of grammar libraries]]
====Example====
 
From the previous example,
 
   
  +
Or start from the information root at [[Installation]]?
: "I may give only this advice"
 
 
If this is being translated to a Subject-Object-Verb language (English is a Subject-Verb-Object language, but many languages are not), it will need rearranging. At the very least, the target dictionary will need,
 
 
: I advice give
 
 
and it is in this module the words are rearranged.
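
The reordering described above can be sketched in a few lines of Python. This is only an illustration, not how apertium-interchunk is implemented: the chunk tags ('SUBJ', 'VERB', 'OBJ'), the function name and the rule itself are hypothetical, and a real rule would be written in the 't2x' file. Like the module, the sketch looks only at the marks on the chunks, never at the words inside them.

```python
# Hypothetical sketch: reorder a sequence of chunks by their tags only.
# Chunk tags and the rule are illustrative, not taken from a real .t2x file.

def reorder_chunks(chunks, pattern, new_order):
    """Reorder `chunks` wherever their tags match `pattern`.

    chunks    -- list of (tag, words) tuples, e.g. ("SUBJ", "I")
    pattern   -- list of tags to look for, e.g. ["SUBJ", "VERB", "OBJ"]
    new_order -- 0-based positions within the match, e.g. [0, 2, 1]
    """
    result = []
    i = 0
    n = len(pattern)
    while i < len(chunks):
        window = chunks[i:i + n]
        # Only the chunk tags are compared; the words are never inspected.
        if [tag for tag, _ in window] == pattern:
            result.extend(window[j] for j in new_order)
            i += n
        else:
            result.append(chunks[i])
            i += 1
    return result


sentence = [("SUBJ", "I"), ("VERB", "give"), ("OBJ", "advice")]
reordered = reorder_chunks(sentence, ["SUBJ", "VERB", "OBJ"], [0, 2, 1])
print(" ".join(words for _, words in reordered))  # I advice give
```

Chunk sequences that do not match the pattern pass through unaltered, which mirrors the behaviour of an unconfigured stage.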
 
 
====Typical stream output====
 
The module only reorders chunks. It has no effect on the form of the stream. But see 'mode'.
 
 
====Tool used====
 
: apertium-interchunk
 
 
 
====Auto Mode====
 
: xxx-yyy-interchunk
 
 
 
====Configuration Files====
 
: apertium-xxx-yyy.xxx-yyy.t2x
 
 
 
====Links====
 
[[Chunking:_A_full_example]]
 
 
 
 
===Transfer 3/PostChunk===
 
In a three-stage transfer, the third transfer stage interferes with the resolution and writing of chunks.
 
 
Configuring this stage is not necessary for making a basic pair.
 
 
Detection is not by pattern matching; it is by the name/lemma of the chunk itself. Position marks refer to the words/lexical units inside the chunks. The module will not write chunks, only lexical units and blanks.
 
 
So PostChunk is less abstracted than Transfer 2/InterChunk processing.
 
 
For language pairs with no rewriting of chunks, this module is not needed. If the 't3x' file is not configured, the module defaults to resolving and removing chunk data.
 
 
 
====Technical Description====
 
Substitute fully-tagged target-language forms into the chunks.
 
 
 
====Example====
 
Reducing a previous example, text arrives, prepared by the Chunker, labelled as feminine and singular. The following stripped version of the input shows the 'chunk' marks,
 
 
: ^<f><sg>{^uno<det><ind>$ ^amapola<n>$}$
 
 
Now the postchunk module must render this. In English, the chunk had no gender, so neither did the 'a' word/determiner. Now it has a gender, and the chunker stages have defined where these tags should be applied. In some cases of translation, the chunks may also be reordered.
 
 
 
====Typical stream output====
 
From,
 
 
: "a poppy"
 
 
Chunk marks have been removed from the stream (compare to 'chunker' output above), and tags distributed,
 
 
<pre>
 
^Uno<det><ind><f><sg>$ ^amapola<n><f><sg>$
 
</pre>
 
 
The output looks much the same as before the chunker stages. However, tags may have been added and deleted, and chunks of words reordered, to suit the target language.
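
The default resolution of a chunk can be sketched as follows. The sketch assumes the fuller chunk form from the chunker output earlier on this page, where numbered tags such as '<3>' and '<4>' inside the chunk are 1-based references to the tags on the chunk itself. The function name is hypothetical, and the code is not taken from apertium-postchunk. It also ignores capitalisation, so it does not reproduce the capitalised 'Uno' shown above.

```python
import re


def resolve_chunk(chunk):
    """Strip the chunk marks and replace numbered tags with the chunk's own tags.

    Assumes one chunk of the form ^Name<tag><tag>{ ...lexical units... }$,
    where <N> inside the braces refers to the N-th tag of the chunk (1-based).
    """
    m = re.fullmatch(r"\^([^<{]*)((?:<[^>]+>)*)\{(.*)\}\$", chunk)
    if m is None:
        raise ValueError("not a chunk: " + chunk)
    chunk_tags = re.findall(r"<([^>]+)>", m.group(2))

    def substitute(ref):
        # <3> becomes the third tag of the chunk, and so on.
        index = int(ref.group(1))
        return "<" + chunk_tags[index - 1] + ">"

    return re.sub(r"<(\d+)>", substitute, m.group(3))


chunk = "^Det_nom<SN><DET><f><sg>{^uno<det><ind><3><4>$ ^amapola<n><3><4>$}$"
print(resolve_chunk(chunk))
# ^uno<det><ind><f><sg>$ ^amapola<n><f><sg>$
```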
 
 
 
====Tool used====
 
: apertium-postchunk
 
 
 
====Auto Mode====
 
: xxx-yyy-postchunk
 
 
 
====Configuration Files====
 
: apertium-xxx-yyy.xxx-yyy.t3x
 
 
 
====Links====
 
[[Chunking:_A_full_example]]
 
 
 
 
 
 
===Morphological Generator===
 
'Generate' the surface forms of the translated words.
 
 
At this point, the text stream contains target language lemmas and tags, perhaps modified and prepared by the Lexical Selector and/or chunker stages. But this is not the final form. The Morphological Generator stage needs to take the lemma and tags, then generate the target-language surface form, e.g. it needs to take the lemma 'knife' and the tag '<pl>' (for plural), then generate 'knives'.
 
 
For this, the target language monodix is used in the direction right/left (in reverse of left->right reading). 'surface form' <- 'lexical unit'.
 
 
====Technical Description====
 
Use the lemma and tags ('lexical unit') to deliver the correct target-language surface form.
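
As a rough illustration of generation as a lookup from lexical unit to surface form, here is a minimal Python sketch. The table is a toy stand-in for a compiled target-language monodix, and the function name and entries are hypothetical. Unknown forms are marked with '#', similar to the mark Apertium uses for forms it cannot generate.

```python
import re

# Toy generation table: (lemma, tags) -> surface form.
# A stand-in for a compiled monodix; the entries are illustrative only.
GENERATION_TABLE = {
    ("knife", ("n", "sg")): "knife",
    ("knife", ("n", "pl")): "knives",
}


def generate(lexical_unit):
    """Turn a lexical unit such as '^knife<n><pl>$' into a surface form."""
    m = re.fullmatch(r"\^([^<$]+)((?:<[^>]+>)+)\$", lexical_unit)
    if m is None:
        raise ValueError("not a lexical unit: " + lexical_unit)
    lemma = m.group(1)
    tags = tuple(re.findall(r"<([^>]+)>", m.group(2)))
    surface = GENERATION_TABLE.get((lemma, tags))
    # Mark forms that are missing from the table instead of failing.
    return surface if surface is not None else "#" + lemma


print(generate("^knife<n><pl>$"))  # knives
```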
 
 
 
====Example====
 
The output, now translated,
 
 
: "He that travels into a country, before he has some entrance into the language, goes to school, and not to travel"
 
 
becomes,
 
 
: "Él aquello viaja a un país, antes de quei tiene alguna entrada a la lengua, va a escuela, y no para viajar"
 
 
This may be changed, for a few surface forms, by the Post Generator, and then formatted.
 
 
====Typical stream output====
 
The translation, now stripped of stream formatting, but before Post Generator and formatting. See the example above.
 
 
 
 
====Tool used====
 
: lt-proc -g
 
 
The switch/option,
 
 
: -g, --generation: morphological generation
 
 
 
====Auto Mode====
 
Post Chunker debug output (before Post Generator),
 
 
: xxx-yyy-dgen
 
 
 
====Configuration Files====
 
: apertium-xxx-yyy.yyy.dix
 
 
 
====Links====
 
[[Monodix_basics]]
 
[[Apertium_New_Language_Pair_HOWTO]]
 
 
 
 
 
 
===PostGenerator===
 
Corrects or localises spelling where the adjustment relies on the next word.
 
 
Configuring this stage is not necessary for making a basic pair.
 
 
The post-generator uses a dictionary very similar to a mono-dictionary, used as a generator. So it is capable of creating, modifying and removing text. It can use paradigms. However, please read the rest of this section! The generator must be triggered using an '<a/>' tag in the bidix.
 
 
The module was originally provided to convert Spanish-like 'de el' into 'del'. It also performs a good job on placing 'a'/'an' determiners before English nouns ('an apple'). Here you can see the two main features of the post-generator. First, it works on the text as generated, so can be used when the form of a word is closely related to the final form of the following word. Note that 'de el' and 'a'/'an' could not be handled earlier in the text stream/Apertium workflow, because we have no idea what the surface forms will be. These forms are only available after the generating monodix. The second feature is that, in general, the post-generator works by inflecting/selecting/replacing a word based on the following word.
 
 
The post-generator is sometimes referred to in documentation as intended for 'orthography'. Orthography is the set of conventions for spelling, hyphenation, and other graphical display features, i.e. the language side of typography. Perhaps that was the original intention for the Post Generator, but the module at the time of writing is unsuitable for many orthographic tasks. It displays several unexpected behaviours. Attempts at elision and compression, other than a 'de el'->'del' style of elision across the forward word boundary, are likely to fail. However, the module is so useful for these two cases alone that it is an established stage in the workflow.
 
 
 
====Technical Description====
 
Make final adjustments where a generated surface form relies on the next surface form.
 
 
====Example====
 
The example from the manual is Spanish,
 
 
: "de el"
 
 
which becomes,
 
 
: "del"
 
 
And the template includes an example in English,
 
 
: "a"
 
 
which becomes "an" before a vowel,
 
 
: "an apple" (but "a car")
 
 
Both examples are beyond pure orthography, but depend on the final surface forms and on those forms being next to each other.
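
The two adjustments above can be sketched in Python to show the common idea: a word is replaced based on the surface form that follows it. This is a simplified illustration with a hypothetical function name, not the post-generator's dictionary mechanism, and the vowel test is a crude assumption (it would get 'a university' or 'an hour' wrong).

```python
VOWELS = "aeiou"


def post_generate(text, lang):
    """Adjust a word according to the surface form of the next word.

    lang -- "es" applies the 'de el' -> 'del' rule,
            "en" applies the 'a' -> 'an' rule.
    """
    words = text.split()
    out = []
    i = 0
    while i < len(words):
        word = words[i]
        following = words[i + 1] if i + 1 < len(words) else None
        if lang == "es" and word == "de" and following == "el":
            # Two words collapse into one across the word boundary.
            out.append("del")
            i += 2
            continue
        if (lang == "en" and word in ("a", "A")
                and following and following[0].lower() in VOWELS):
            # The determiner changes, the following word is untouched.
            word += "n"
        out.append(word)
        i += 1
    return " ".join(out)


print(post_generate("de el libro", "es"))  # del libro
print(post_generate("a apple", "en"))      # an apple
print(post_generate("a car", "en"))        # a car
```

The rules are kept per language because the same surface form can mean different things: Spanish 'a' is a preposition and must not be touched by the English rule.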
 
 
The Post Generator handles difficult cases. For example, we translate into English,
 
 
: Un peligro inminente
 
 
The Post Generator will successfully handle the determiner, translating to,
 
 
: An imminent danger
 
 
It would be very difficult to handle this action earlier in the Apertium workflow. Doing so might also obscure the intent of the code, and perhaps limit other work we need to do.
 
 
But the Post Generator is not useful for some actions. English sometimes hyphenates groups of words,
 
 
: "But all this while, when I speak of vain-glory..."
 
 
Other common hyphenated groups are "misty-eyed", "follow-up", and "computer-aided". Finding a rule for this form of hyphenation is not easy. Let us imagine a rule exists. Unfortunately, the Post Generator could not handle the insertion of the hyphen, because it is made to recognise the following blank and either replace the first word, or the words as a whole. Language pair en-es handles the above examples by recognising these word groups in the initial monodix, not by Post Generator manipulation.
 
 
For the same reason, the Post-Generator can not handle another orthographic action in English; the use of the apostrophe. For example,
 
 
: "that is but a circle of tales"
 
 
will often become,
 
 
: that's but a circle of tales
 
 
The rule is clear and, for this example, reaches across a following blank. But the Post Generator can give unintended results when manipulating a single letter (such as an 's') and is more predictable with a full replacement. Also, there are many such compressions and elisions in English ('where is' -> 'where's' etc.), so it may be better to handle these with more general rules earlier in the workflow, or to consider whether the translation is better without them.
 
 
 
 
====Typical stream output====
 
The translation, now stripped of stream formatting e.g.
 
 
: "that is but a circle of tales"
 
 
gives,
 
 
<pre>
 
Aquello es pero un círculo de cuentos
 
</pre>
 
 
====Tool used====
 
: lt-proc -p
 
 
The switch/option,
 
 
: -p, --postgeneration
 
 
 
====Auto Mode====
 
Input,
 
 
: xxx-yyy-dgen
 
 
 
which is the debug output of the Post Chunker. Output, excluding formatting, is the finished product, so,
 
 
: xxx-yyy
 
 
 
====Configuration Files====
 
The files are in the mono-dictionary folders. For the source language,
 
 
: apertium-xxx.post-xxx.dix
 
 
 
For the target language,
 
 
: apertium-yyy.post-yyy.dix
 
 
 
 
==Links==
 
The post generator usually only contains a handful of rules, often common, so is not covered by much documentation. For depth, try the [https://wiki.apertium.org/w/images/d/d0/Apertium2-documentation.pdf Apertium 2.0: Official documentation] (Sect. 3.1.2). For a quick-reference example,
 
 
[[Post-generator]]
 
 
 
==References==
 
Text examples from,
 
* Francis Bacon (1625). [http://www.gutenberg.org/files/575/575-h/575-h.htm#link2H_4_0037 THE ESSAYS OR COUNSELS, CIVIL AND MORAL]. Project Gutenberg
 

Latest revision as of 12:55, 24 April 2017

En français



To try Apertium

You can go online to the front page :)

There are several applications which work from the desktop without full installation. For these and more graphical user interfaces, services, plugins, etc. go to Tools.

If you would like install instructions for 'Apertium viewer', 'apy' (the Apertium server) etc. go to Tools. The install instructions can be found with the tool descriptions.


For those who want to install Apertium locally, and developers

How to install Apertium core[1] and language data on your system (developers may also want to consider their operating environment[2]).


Installing: a summary

Most people will need to,

Install Apertium Core by packaging/virtual environment

For translators: Install language data/dictionaries/pairs from repositories

Install language data using packaging, including hints about the Apertium package repository.

For language developers: Install language data/dictionaries/pairs by compiling


Alternatives

Installing Apertium core by compiling

Apertium maintains a package repository that is up-to-date and reliable. If you do not want to work in core, or develop languages, please use either packaging or a virtual environment. The packages stay up-to-date and are stable. A compile will waste your time.

However, if you are planning to work on Apertium core, or have an operating system not covered above, go right ahead, Install Apertium core by compiling[3]

Notes

  1. Apertium is a big system. There are many plugins, scripts, and extension projects. The core, the code which translates, is a multi-step set of tools joined by a stream format and, nowadays, invoked by scripts called 'modes'. You may also see the names 'lt-toolbox'/'lt-tools', 'apertium-lex-tools', and the simple title 'apertium'. These refer to groupings of the tools. Packaged or compiled, these tools can be installed as one unit. From here on, we call them 'Apertium core'.
  2. Apertium is written to be platform-independent. However, it can be difficult to maintain platform-independence over a project this wide. If you intend to do something deep with Apertium, you will gain more help from the tools if you use the Ubuntu, or a similar Debian-based, operating system. In no way does this mean that the Apertium project favours this platform.
  3. Most people know the word 'install'. It means 'put code in my operating system'. When developing, it is not usual to fully 'install'. You get the code working enough to get results. This is relevant to Apertium, which needs a rapid cycle for re-compiles. If you follow instructions to compile code, you will be discouraged from 'installing' builds. When we use the word 'install', we mean 'get code working on my computer'.

Installation Videos

Most of these videos have been produced by Google Code-In students.


Minimal installation from SVN

This page is deprecated, and the information split across other pages.

It used to contain instructions on how to compile Apertium core. For this, please see Install Apertium core by compiling

How to create language builds with new and existing repository information. Please see Install language data by compiling

And details about the HFST and CG modules. Please see Installation of grammar libraries

Or start from the information root at Installation?