Ideas for Google Summer of Code
Revision as of 23:37, 7 March 2009 by Francis Tyers
This is the ideas page for Google Summer of Code. Here you can find ideas for interesting projects that would make Apertium more useful for people and improve or expand its functionality. If you have an idea, please add it below. If you think you could mentor someone in a particular area, or just have interests or ideas in that area, add your name to "Interested parties" using ~~~
The page is intended as an overview of the kind of projects we have in mind. If one of them particularly piques your interest, please come and discuss it with us on #apertium on irc.freenode.net, mail the mailing list, or draw attention to yourself in some other way.
You could also take a look at some open bugs.
- Difficulty = 1 (Very Hard) ... 4 (Entry level)
See the list sorted by: difficulty level, theme.
Notes
Further reading
- Accent and diacritic restoration
- Simard, Michel (1998). "Automatic Insertion of Accents in French Texts". Proceedings of EMNLP-3. Granada, Spain.
- Rada F. Mihalcea (2002) "Diacritics Restoration: Learning from Letters versus Learning from Words". Lecture Notes in Computer Science 2276/2002, pp. 96--113
- G. De Pauw; P.W. Wagacha; G.M. de Schryver (2007) "Automatic diacritic restoration for resource-scarce languages". Proceedings of Text, Speech and Dialogue, Tenth International Conference. pp. 170--179
- P.W. Wagacha; G. De Pauw; P.W. Githinji (2006) "A grapheme-based approach to accent restoration in Gĩkũyũ". Proceedings of the Fifth International Conference on Language Resources and Evaluation
- D. Yarowsky (1994) "A Comparison Of Corpus-Based Techniques For Restoring Accents In Spanish And French Text". Proceedings, 2nd annual workshop on very large corpora. pp. 19--32
- Lexical selection
- Ide, N. and Véronis, J. (1998) "Word Sense Disambiguation: The State of the Art". Computational Linguistics 24(1)
- Automated lexical extraction
- M. Forsberg, H. Hammarström and A. Ranta (2006) "Morphological Lexicon Extraction from Raw Text Data". FinTAL 2006, LNAI 4139, pp. 488--499.
- Support for agglutinative languages
- Beesley, K. R. and Karttunen, L. (2000) "Finite-State Non-Concatenative Morphotactics". SIGPHON-2000, Proceedings of the Fifth Workshop of the ACL Special Interest Group in Computational Phonology, pp. 1--12.
- Transfer rule learning
- Sánchez-Martínez, F. and Forcada, M.L. (2007) "Automatic induction of shallow-transfer rules for open-source machine translation". Proceedings of TMI 2007, pp. 181--190
- Compounding and de-compounding
- Koehn, P. and Knight, K. (2003) "Empirical Methods for Compound Splitting". 11th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2003).
- Brown, R. (2002) "Corpus-Driven Splitting of Compound Words". TMI 2002
- Moa, H. (2005) "Compounds and other oddities in machine translation". Proceedings of the 15th NODALIDA conference, Joensuu 2005.
- Multi-engine machine translation
- Sergei Nirenburg and Robert Frederking (1994) "Toward Multi-Engine Machine Translation". Proceedings of the workshop on Human Language Technology. pp. 147--151
- Shyamsundar Jayaraman and Alon Lavie (2005) "Multi-Engine Machine Translation Guided by Explicit Word Matching". ACL 2005