Difference between revisions of "Task ideas for Google Code-in/Grow bilingual"

From Apertium
(Created page with "# select a language pair, ideally such that the source language is a language you know (L₂) and the target language a language you use every day (L₁), such that it has rat...")
 
Line 1: Line 1:
# select a language pair, ideally such that the source language is a language you know (L₂) and the target language a language you use every day (L₁), such that it has rather good monolingual dictionaries in Apertium but no reasonable bilingual dictionary (these language pairs are usually in the incubator), for instance apertium-spa-pol
+
# '''select a language pair''', ideally such that the source language is a language you know (L₂) and the target language a language you use every day (L₁), such that it has rather good monolingual dictionaries in Apertium but no reasonable bilingual dictionary (these language pairs are usually in the incubator), for instance apertium-spa-pol
# Install Apertium locally from the Subversion repository; install the language pair; make sure that it works and/or get [http://wiki.apertium.org/wiki/Apertium_VirtualBox Apertium VirtualBox] and update, check out & compile the language pair.
+
# '''Install Apertium''' locally from the Subversion repository; install the language pair; make sure that it works and/or get [http://wiki.apertium.org/wiki/Apertium_VirtualBox Apertium VirtualBox] and update, check out & compile the language pair.
# Using a large enough corpus of representative text in the source language (e.g. plain text taken from Wikipedia, newspapers, literature, etc.) detect the 50 most frequent unknown words (source words which are not in the bilingual dictionaries of the language pair).
+
# Using a large enough corpus of representative text in the source language (e.g. plain text taken from Wikipedia, newspapers, literature, etc.) '''detect the 200 most frequent unknown words''' (source words which are not in the bilingual dictionaries of the language pair).
# add these correspondences to the bilingual dictionary (so that they are not unknown anymore).
+
# '''add these correspondences to the bilingual dictionary''' (so that they are not unknown anymore).
# Compile and test again
+
# '''Compile and test again'''
# Submit a patch to your mentor (or commit it if you have already gained developer access)
+
# '''Submit''' a patch to your mentor (or commit it if you have already gained developer access)
   
 
[[Category:Tasks for Google Code-in|Grow bilingual]]
 
[[Category:Tasks for Google Code-in|Grow bilingual]]

Revision as of 04:51, 14 December 2017

  1. Select a language pair, ideally one whose source language is a language you know (L₂) and whose target language is a language you use every day (L₁). The pair should have rather good monolingual dictionaries in Apertium but no reasonable bilingual dictionary (such pairs are usually in the incubator), for instance apertium-spa-pol.
  2. Install Apertium locally from the Subversion repository, install the language pair, and make sure that it works; alternatively (or additionally), get Apertium VirtualBox, update it, and check out and compile the language pair.
  3. Using a large enough corpus of representative text in the source language (e.g. plain text taken from Wikipedia, newspapers, or literature), detect the 200 most frequent unknown words (source words which are not in the bilingual dictionaries of the language pair).
  4. Find the target-language equivalents of these words and add the correspondences to the bilingual dictionary (so that they are no longer unknown).
  5. Compile and test again.
  6. Submit a patch to your mentor (or commit it if you have already gained developer access).
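
The counting part of step 3 could be sketched roughly as follows. This is a minimal sketch, not a prescribed method: it assumes the corpus has already been run through the language pair and that untranslated words show up in the output with a leading marker (`*` for words unknown to the analyser, `@` for lemmas missing from the bilingual dictionary). The function name `most_frequent_unknown` and the `markers` parameter are hypothetical names introduced here for illustration.

```python
import re
from collections import Counter


def most_frequent_unknown(translated_text, n=200, markers="*@"):
    """Return the n most frequent marker-prefixed words as (word, count) pairs.

    Assumption: a word the pair could not translate appears in the output
    prefixed with one of the characters in `markers`.
    """
    # Match one marker character followed by a run of word characters.
    pattern = re.compile("[" + re.escape(markers) + r"](\w+)")
    # Lowercase so that "Foo" and "foo" are counted as the same word.
    counts = Counter(word.lower() for word in pattern.findall(translated_text))
    return counts.most_common(n)


# Small demonstration on a made-up output string.
sample = "*Foo bar @baz *foo @baz @baz"
for word, count in most_frequent_unknown(sample, n=2):
    print(f"{count}\t{word}")
```

Note that a plain regular expression like this also picks up unrelated tokens such as e-mail domains or user handles, so the resulting list is best reviewed by hand before any entries are added to the bilingual dictionary.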