Difference between revisions of "Polish and Russian/Project description"

From Apertium

Revision as of 16:56, 23 August 2016

Commitment

The list of all commits: https://apertium.projectjj.com/gsoc2016/maryszmary.html

Monolingual Polish package: https://svn.code.sf.net/p/apertium/svn/languages/apertium-pol/

Monolingual Russian package: https://svn.code.sf.net/p/apertium/svn/languages/apertium-rus/

Bilingual Polish-Russian package: https://svn.code.sf.net/p/apertium/svn/incubator/apertium-pol-rus/

Project description

Description of the main package components

Monolingual and bilingual dictionaries


Constraint grammar and transfer rules


Corpora and language data

The corpora used for testing were the Russian National Corpus and a Polish Wikinews corpus.


Auxiliary scripts

Statistics

Initially, the goal of the project was to achieve 90% coverage of the corpora used. This turned out to be a challenging task for three months of work, partly because of the peculiarities of Slavic morphology and morphophonology, and partly because few bilingual electronic dictionaries are available and those that exist are sparse. As a result, from the end of July onward our main task was to reduce the number of errors rather than to achieve high coverage.

Coverage            Polish → Russian    Russian → Polish
Trimmed coverage    85.1%               83.8%

Coverage            Russian             Polish
Raw coverage        94.2%               87.6%
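
The page does not say how these figures were measured. As a rough illustration only, the sketch below computes coverage as the share of tokens that receive at least one analysis. It relies on the Apertium stream format, in which an unknown word is output as ^word/*word$. The function name, the sample sentence, and its tags are made up for the example, and escaped characters in the stream are ignored for simplicity.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
# (simplification: backslash-escaped characters are not handled)
LEXICAL_UNIT = re.compile(r"\^([^/$^]*)/([^$]*)\$")


def coverage(analysed_text: str) -> float:
    """Return the percentage of lexical units that have a known analysis."""
    units = LEXICAL_UNIT.findall(analysed_text)
    if not units:
        return 0.0
    # An unknown word appears as ^word/*word$, so its analysis starts with '*'.
    known = sum(1 for _surface, analyses in units if not analyses.startswith("*"))
    return 100.0 * known / len(units)


# Hypothetical analyser output: three known tokens and one unknown token.
sample = "^To/to<prn>$ ^jest/być<vbser>$ ^abcde/*abcde$^./.<sent>$"
print(f"{coverage(sample):.1f}%")  # prints 75.0%
```

Under this reading, raw coverage would be measured on the output of the full monolingual analyser, and trimmed coverage on the output of an analyser restricted to words that are also present in the bilingual dictionary.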

The number of lemmas in the bilingual dictionary: 48,836.

The number of lemmas in the Polish dictionary: 10,023.

Future work

Resources