Ideas for Google Summer of Code

This is the ideas page for Google Summer of Code. Here you can find ideas for interesting projects that would make Apertium more useful for people and improve or expand our functionality. If you have an idea, please add it below. If you think you could mentor someone in a particular area, or just have interests or ideas for that, add your name to "Interested parties" using ~~~

The page is intended as an overview of the kind of projects we have in mind. If one of them particularly piques your interest, please come and discuss it with us on #apertium on irc.freenode.net, mail the mailing list, or draw attention to yourself in some other way.

You could also take a look at some open bugs.

Difficulty = 1 (Very Hard) ... 4 (Entry level)

See the list sorted by: difficulty level, theme.

Notes

Simard, Michel (1998). "Automatic Insertion of Accents in French Texts". Proceedings of EMNLP-3. Granada, Spain.
Rada F. Mihalcea (2002) "Diacritics Restoration: Learning from Letters versus Learning from Words". Lecture Notes in Computer Science 2276/2002, pp. 96--113
G. De Pauw, P.W. Wagacha, G.M. de Schryver (2007) "Automatic diacritic restoration for resource-scarce languages". Proceedings of Text, Speech and Dialogue, Tenth International Conference. pp. 170--179
P.W. Wagacha, G. De Pauw, P.W. Githinji (2006) "A grapheme-based approach to accent restoration in Gĩkũyũ". Proceedings of the Fifth International Conference on Language Resources and Evaluation
D. Yarowsky (1994) "A Comparison Of Corpus-Based Techniques For Restoring Accents In Spanish And French Text". Proceedings, 2nd annual workshop on very large corpora. pp. 19--32

Lexical selection

Ide, N. and Véronis, J. (1998) "Word Sense Disambiguation: The State of the Art". Computational Linguistics 24(1)

Automated lexical extraction

M. Forsberg, H. Hammarström, A. Ranta (2006) "Morphological Lexicon Extraction from Raw Text Data". FinTAL 2006, LNAI 4139, pp. 488--499.

Support for agglutinative languages

Beesley, K. R. and Karttunen, L. (2000) "Finite-State Non-Concatenative Morphotactics". SIGPHON-2000, Proceedings of the Fifth Workshop of the ACL Special Interest Group in Computational Phonology, pp. 1--12.

Transfer rule learning

Sánchez-Martínez, F. and Forcada, M.L. (2007) "Automatic induction of shallow-transfer rules for open-source machine translation", in Proceedings of TMI 2007, pp. 181--190

Compounding and de-compounding

Koehn, P. and Knight, K. (2003) "Empirical Methods for Compound Splitting". 11th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2003).
Brown, R. (2002) "Corpus-Driven Splitting of Compound Words". TMI 2002
Moa, H. (2005) "Compounds and other oddities in machine translation". Proceedings of the 15th NODALIDA conference, Joensuu 2005.

Multi-engine machine translation

Sergei Nirenburg and Robert Frederking (1994) "Toward Multi-Engine Machine Translation". Proceedings of the workshop on Human Language Technology. pp. 147--151
Shyamsundar Jayaraman and Alon Lavie (2005) "Multi-Engine Machine Translation Guided by Explicit Word Matching". ACL 2005
