Difference between revisions of "Infrastructure discussion"
Revision as of 14:23, 24 February 2008
Modularity
To what extent is the Apertium system modular? To be specific: in our work we do not use agglutinative paradigms but cascaded finite-state transducers (the Sámi languages we analyse all have too much non-concatenative morphology to be treated with such methods). We use Xerox tools, but the open-source sfst tools would be a possible alternative. Also, our disambiguation and syntactic analysis do not use HMMs but rather constraint grammar (the vislcg variety). Now, the question is whether the system is modular enough that one or more Apertium components can be taken out and replaced with other components. Trondtr 14:03, 24 February 2008 (UTC).
- The Apertium system is implemented as a unix pipeline. Currently most language pairs have the following components:
- Deformat
- Morphological analyser
- POS Tagger
- Transfer (syntactic + lexical)
- Generation
- Post-generation
- Reformat
- The more advanced pairs replace the "Transfer" stage with a three-stage transfer, which incorporates a chunker (rule-based) and the ability to move around chunks (NP, V, etc.) rather than lexical units.
- All modules communicate using text streams, so it would definitely be possible to use, say, your morphological analysis and tagging, and then our format handling and transfer, e.g.:
- Apertium deformat
- sfst morphological analysis
- VISL constraint grammar
- Transfer (syntactic + lexical)
- sfst morphological generation
- Reformat
- -Francis Tyers 14:20, 24 February 2008 (UTC)
- Incompatibilities between text formats can be handled either in the code (e.g. by adding a mode to SFST that outputs Apertium-style analyses) or with perl/python/awk scripts. - Francis Tyers 14:23, 24 February 2008 (UTC)
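To make the text-stream point concrete, here is a minimal sketch of chaining stages where each stage is any program that reads stdin and writes stdout. The function name `run_pipeline` and the two placeholder stages are hypothetical stand-ins written for this illustration; they are not Apertium, sfst or VISL commands. Unlike a real shell pipeline, the sketch runs the stages one after another rather than concurrently.

```python
import subprocess
import sys


def run_pipeline(stages, text):
    """Pass text through each stage in order.

    Each stage is an argv list for a program that reads stdin and
    writes stdout, which is the only contract the modules share.
    """
    for argv in stages:
        result = subprocess.run(
            argv, input=text, capture_output=True, text=True, check=True
        )
        text = result.stdout
    return text


# Hypothetical placeholder stages (NOT real Apertium/sfst/VISL commands):
# any stdin-to-stdout filter could be swapped in at the same position.
upper = [sys.executable, "-c",
         "import sys; sys.stdout.write(sys.stdin.read().upper())"]
exclaim = [sys.executable, "-c",
           "import sys; sys.stdout.write(sys.stdin.read() + '!')"]

if __name__ == "__main__":
    print(run_pipeline([upper, exclaim], "orda"))
```

Replacing a module then amounts to replacing one entry in the `stages` list, provided the new program reads and writes the format its neighbours expect.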
Example formats
- xsft
orða orð+N+Neu+Pl+Gen+Indef
orða orða+V+Inf
orða orða+V+Ind+Prs+1Sg
orða orða+V+Imp+Sg
- Apertium
^orða/orða<vblex><inf>/orð<n><nt><pl><ind><gen>$
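As a rough illustration of the script approach, the following sketch converts analyses in the first format into the Apertium-style stream format. The tag table contains only the correspondences that can be read off the two examples above (`N`/`n`, `Neu`/`nt`, `Pl`/`pl`, `Gen`/`gen`, `Indef`/`ind`, `V`/`vblex`, `Inf`/`inf`); the names `TAG_MAP`, `convert_analysis` and `to_apertium` are hypothetical. Tags not in the table raise an error rather than being guessed. Note that the sketch keeps the input order of tags and analyses, so its output differs in ordering from the Apertium example above; a real converter would probably need to reorder tags as well.

```python
# Tag correspondences attested in the two example formats only.
TAG_MAP = {
    "N": "n",
    "Neu": "nt",
    "Pl": "pl",
    "Gen": "gen",
    "Indef": "ind",
    "V": "vblex",
    "Inf": "inf",
}


def convert_analysis(analysis, tag_map=TAG_MAP):
    """Turn 'lemma+Tag1+Tag2' into 'lemma<tag1><tag2>'."""
    lemma, *tags = analysis.split("+")
    for tag in tags:
        if tag not in tag_map:
            raise ValueError(f"no Apertium tag known for {tag!r}")
    return lemma + "".join(f"<{tag_map[tag]}>" for tag in tags)


def to_apertium(lines, tag_map=TAG_MAP):
    """Group analyses by surface form and emit one ^form/a1/a2$ unit each."""
    readings = {}  # surface form -> list of converted analyses, in input order
    for line in lines:
        if not line.strip():
            continue
        # Surface form and analysis are separated by whitespace.
        surface, analysis = line.split(None, 1)
        converted = convert_analysis(analysis.strip(), tag_map)
        readings.setdefault(surface, []).append(converted)
    return [
        "^" + "/".join([surface] + analyses) + "$"
        for surface, analyses in readings.items()
    ]


if __name__ == "__main__":
    sample = ["orða\torð+N+Neu+Pl+Gen+Indef", "orða\torða+V+Inf"]
    for unit in to_apertium(sample):
        print(unit)
```

Passing the tag table as a parameter keeps the language-specific part in one place, so extending the mapping does not require touching the conversion logic.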