Search results

Using linguistic resources
...ich can be inserted into them. This data might consist of wordlists, word-corpora derived from web-crawlers such as [http://borel.slu.edu/crubadan/ Crubadán

13 KB (2,112 words) - 12:11, 26 May 2023
Google Code-in/Application 2015
...ents immensely good at helping us out with these: for instance, annotating corpora that are needed to train Apertium modules, or finding bugs in the handling

7 KB (1,111 words) - 10:10, 15 November 2015
Multi-engine translation synthesiser
...maximum usage of available resources for marginalised languages. Parallel corpora, user-feedback, other translation systems.

5 KB (802 words) - 07:04, 10 May 2012
Google Summer of Code/Application 2009
...ger and an initial set of translation rules from monolingual and bilingual corpora.

10 KB (1,543 words) - 19:50, 12 April 2021
Crossdics Example
...uistics resources: morphological and bilingual dictionaries, cross models, corpora, etc.</description>

6 KB (689 words) - 22:58, 25 October 2018
Préparation de données pour Moses
For the parallel corpus, we are going to use Europarl; the page [[corpora]] (in English only) lists some others:

5 KB (699 words) - 07:52, 8 October 2014
Running the MaxEnt rule learning
TRAIN=/home/philip/Apertium/corpora/raw/setimes-hr-mk-nikola

3 KB (520 words) - 21:25, 14 February 2014
Google Code-in/Application 2014
...ents immensely good at helping us out with these: for instance, annotating corpora that are needed to train Apertium modules, or finding bugs in the handling

6 KB (987 words) - 10:21, 7 November 2014
Calculating coverage
$ bzcat ~/corpora/nno.txt.bz2 |./make-freqlist.sh > nno.freqlist

4 KB (583 words) - 15:18, 10 January 2022
Preparing data for Moses factored training using Apertium
For the parallel corpus we're going to use Europarl, the page [[corpora]] lists some others:

4 KB (647 words) - 07:45, 8 October 2014
Xml grep
Some corpora are formatted in XML and put e.g. the real text contents inside a particula

5 KB (863 words) - 09:04, 10 October 2017
Running the monolingual rule learning
MODEL=/home/philip/Apertium/corpora/language-models/mk/setimes.mk.5.blm

4 KB (503 words) - 19:01, 17 August 2018
Apertium cat-srd/ Apertium ita-srd: relata finale
...morphological disambiguator that will be useful for disambiguating the corpora and will be indispensable for developing other pairs...

13 KB (2,173 words) - 19:17, 24 June 2018
Automatic postediting at GSoC 2018
...nually filtered and corrected source - target pairs from OpenSubtitles2018 corpora preprocessed with bicleaner + both ways Apertium translations (ukr -> rus,

7 KB (1,033 words) - 15:27, 15 August 2018
Development
...UD Annotatrix] - in-browser software for annotating Universal Dependencies corpora.

2 KB (251 words) - 10:07, 27 June 2022
Turkish and Kyrgyz/Final report
...lexicon database with part of speech. We achieved coverage of % on SETimes corpora. And I am really happy with kymorph. Special thanks to firspeaker.

5 KB (680 words) - 07:14, 26 August 2011
Dravidian languages
Once a transducer has ~80% coverage on a range of medium-large corpora we can say it is "working". Over 90% and it can be considered to be "produc

19 KB (2,201 words) - 09:21, 9 December 2019
Indirect contribution guide
* corpora.

9 KB (1,494 words) - 05:58, 18 March 2015
Apertium Turkic/TODO
* consider including the web concordancer on the site (and consider what corpora to provide search access to...)

4 KB (514 words) - 21:24, 19 August 2015
Tagger training
Some pre-processed corpora can be found [http://corpora.informatik.uni-leipzig.de/download.html here] and [http://wt.jrc.it/lt/Acqu

7 KB (1,058 words) - 07:37, 4 July 2016
