Google Summer of Code/Report 2010


Tokenisation with HFST (aikoniv)

apertium-pl-cs (Aha)

apertium-fr-pt (Jalopuera)

This project was mentored by Francis Tyers and Gema Ramírez Sánchez and was worked on by Sean Healy.

apertium-fr-pt advanced a reasonable amount during the project. The transfer lexicon contains around 16,000 entries, all of which are also covered by the morphological analysers/generators. The pair is testvoc clean, but it does not yet pass a corpus check because some rules are missing (for example, for verbal participles). Work has started on some of these rules, but they are still incomplete.

The transfer lexicon was produced with apertium-dixtools from the bilingual dictionaries of the apertium-fr-es (French–Spanish) and apertium-es-pt (Spanish–Portuguese) pairs, using Spanish as the pivot language. This lexicon was then reviewed and corrected. The transfer rules from the apertium-fr-es pair were copied and their Spanish side was "translated" into Portuguese, which gives an adequate basis for further work. Some extra rules were added for common French–Portuguese patterns that the copied rules did not cover.
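Crossing two dictionaries through a pivot language amounts to a join on the pivot-side lemma. The following is a minimal Python sketch under stated assumptions: the toy word lists, the `(lemma, tag, translation)` tuple format and the `cross`/`ambiguous` helpers are all hypothetical, and this is not how apertium-dixtools is actually implemented (real Apertium bilingual dictionaries are XML `.dix` files with much richer tagging). It also hints at why the crossed lexicon needed review: a word with several translations fans out into multiple candidates.

```python
from collections import defaultdict

# Hypothetical toy entries: (source lemma, POS tag, target lemma).
# Illustrates the pivot idea only; NOT apertium-dixtools' real data or algorithm.
FR_ES = [
    ("maison", "n", "casa"),
    ("chien", "n", "perro"),
    ("manger", "vblex", "comer"),
    ("avocat", "n", "abogado"),
    ("avocat", "n", "aguacate"),
]
ES_PT = [
    ("casa", "n", "casa"),
    ("perro", "n", "cão"),
    ("comer", "vblex", "comer"),
    ("abogado", "n", "advogado"),
    ("aguacate", "n", "abacate"),
]


def cross(src_piv, piv_tgt):
    """Join two bilingual lists on the pivot lemma and its POS tag."""
    index = defaultdict(list)
    for piv, pos, tgt in piv_tgt:
        index[(piv, pos)].append(tgt)
    crossed = []
    for src, pos, piv in src_piv:
        # Every pivot translation yields one candidate source-target entry.
        for tgt in index.get((piv, pos), []):
            crossed.append((src, pos, tgt))
    return crossed


def ambiguous(entries):
    """Return source entries with more than one candidate (to be reviewed)."""
    seen = defaultdict(set)
    for src, pos, tgt in entries:
        seen[(src, pos)].add(tgt)
    return {key: sorted(tgts) for key, tgts in seen.items() if len(tgts) > 1}


if __name__ == "__main__":
    fr_pt = cross(FR_ES, ES_PT)
    for entry in fr_pt:
        print(entry)
    print(ambiguous(fr_pt))
```

Here French "avocat" (lawyer/avocado) produces two Portuguese candidates; in a real pair, a reviewer would decide which should be the default translation.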

The GSoC weekly plan was not followed, and some parts were not completed. The transfer rules are not yet sufficiently advanced, and the system is not in a releasable state. No evaluation has been performed. However, enough progress was made to earn a pass grade, since the remaining work needed to bring the system to release and evaluate it is estimated at under a week.

apertium-fin-sme (pyry`)

Java runtime port (Kanmuri)

VM for transfer (darthxaher)

Easy dictionary maintenance (AlessioJr)

Post-edition tool (unaszole)

Multiword handling (skh)