Infrastructure discussion
Latest revision as of 14:29, 13 April 2008
Modularity
To what extent is the Apertium system modular? To be specific: in our work we do not use agglutinative paradigms, but cascaded finite-state transducers (the Sámi languages we analyse all have far too much non-concatenative morphology to be treated with such methods). We use the Xerox tools, but the open-source SFST tools would be a possible alternative. Also, our disambiguation and syntactic analysis do not use HMMs, but rather constraint grammar (the vislcg variety). So the question is: is the system modular enough that one or more Apertium components can be taken out and replaced with other components? Trondtr 14:03, 24 February 2008 (UTC).
- The Apertium system is implemented as a Unix pipeline. Currently most language pairs have the following components:
  - Deformat
  - Morphological analyser
  - POS tagger
  - Transfer (syntactic + lexical)
  - Generation
  - Post-generation
  - Reformat
- The more advanced pairs replace the "Transfer" stage with a three-stage transfer, which incorporates a (rule-based) chunker and the ability to move around chunks (NP, V, etc.) rather than lexical units.
- All modules communicate using text streams, so it would definitely be possible to, say, use your morphological analysis and tagging together with our format management and transfer, e.g.:
  - Apertium deformat
  - SFST morphological analysis
  - VISL constraint grammar
  - Transfer (syntactic + lexical)
  - SFST morphological generation
  - Reformat
- Francis Tyers 14:20, 24 February 2008 (UTC)
- Incompatibilities between text formats can be handled either in the code (e.g. by adding a mode to SFST that outputs Apertium-style analyses) or with perl/python/awk scripts. - Francis Tyers 14:23, 24 February 2008 (UTC)
- Thank you for the answer. This was exactly the answer I hoped for. This really makes it a modular system: different projects could try out different modules, with the core translation engine probably serving as the common ground. Trondtr 17:00, 25 February 2008 (UTC).
- See Apertium and Constraint Grammar. - Francis Tyers 15:29, 13 April 2008 (BST)
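The point above that all modules communicate over text streams can be sketched in Python roughly as follows. This is only an illustration: `run_pipeline` is a hypothetical helper, not part of Apertium, and the commands passed to it are placeholders for whichever analyser, tagger, or transfer programs a project actually uses. Unlike a real shell pipe, it runs the stages one after another rather than concurrently.

```python
import subprocess


def run_pipeline(commands, text):
    """Feed `text` through each command in turn, as `a | b | c` would.

    `commands` is a list of argv lists. Each stage reads the previous
    stage's output on stdin and writes its own result to stdout.
    (Hypothetical helper for illustration; stages run sequentially.)
    """
    stream = text
    for argv in commands:
        result = subprocess.run(
            argv,
            input=stream,
            capture_output=True,
            encoding="utf-8",  # assumption: every stage reads and writes UTF-8
            check=True,        # raise if a stage exits with an error
        )
        stream = result.stdout
    return stream
```

Because every stage only sees text on stdin and stdout, swapping one module for another amounts to changing one entry in the `commands` list, provided the replacement reads and writes the stream format its neighbours expect.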
Example formats
- xfst
    orða    orð+N+Neu+Pl+Gen+Indef
    orða    orða+V+Inf
    orða    orða+V+Ind+Prs+1Sg
    orða    orða+V+Imp+Sg
- Apertium
    before tagging: ^orða/orða<vblex><inf>/orð<n><nt><pl><gen><ind>$
    after tagging : ^orð<n><nt><pl><gen><ind>$
- vislcg
    "<orða>"
        "orða" V Ind Prs Pl
        "orð" N Neu Pl Gen Indef
        "orða" V Ind Prs 1Sg
        "orða" V Inf
        "orða" V Imp Sg
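As a rough illustration of the kind of conversion script mentioned in the discussion, the sketch below turns analyses in the first format (surface form followed by `lemma+Tag+Tag`) into Apertium-style lexical units. The function name `xfst_to_apertium` and the `TAG_MAP` table are hypothetical. Only the tag correspondences that can be read off the examples above are listed; lowercasing any other tag is purely an assumption, and a real converter would need a complete, agreed tag table. The sketch also assumes that neither surface forms nor lemmas contain spaces or `+`.

```python
# Tag correspondences read off the examples above; this table is incomplete.
TAG_MAP = {
    "N": "n", "Neu": "nt", "Pl": "pl", "Gen": "gen", "Indef": "ind",
    "V": "vblex", "Inf": "inf",
}


def xfst_to_apertium(lines):
    """Group 'surface lemma+Tag+Tag' lines into ^surface/analysis/...$ units."""
    units = {}  # surface form -> list of analyses, in order of first appearance
    for line in lines:
        if not line.strip():
            continue  # skip blank lines between words
        surface, analysis = line.split()
        lemma, *tags = analysis.split("+")
        # Unknown tags are lowercased: an assumption, not an Apertium rule.
        rendered = lemma + "".join(f"<{TAG_MAP.get(t, t.lower())}>" for t in tags)
        units.setdefault(surface, []).append(rendered)
    return ["^" + "/".join([surface] + analyses) + "$"
            for surface, analyses in units.items()]
```

Called with the two lines `orða orða+V+Inf` and `orða orð+N+Neu+Pl+Gen+Indef`, this yields the same lexical unit as the "before tagging" line in the Apertium example.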