Search results

  • == Corpora and Coverage == Our main corpora consisted of [https://www.rfa.org/uyghur/ RFA], [http://uy.ts.cn/ Tanritor]
    5 KB (607 words) - 13:25, 12 August 2018
  • Once a transducer has ~80% coverage on a range of medium-large corpora we can say it is "working". Over 90% and it can be considered to be "produc
    10 KB (1,263 words) - 06:04, 23 December 2014
  • == Corpora, sets and alignment == The parallel corpora for the Macedonian-English pair, a total of 207,778 parallel sentences, c
    5 KB (620 words) - 12:21, 27 July 2012
  • == Corpora == * wikipage: <section begin=azadliq2012-wikipage />RFERL corpora<section end=azadliq2012-wikipage />
    1,013 bytes (115 words) - 22:42, 12 August 2014
  • * Evaluate system on corpora === Various Potential Corpora ===
    10 KB (1,483 words) - 07:00, 14 August 2018
  • === Corpora === * [https://corpora.uni-leipzig.de/en?corpusId=ind_mixed_2013 Leipzig Corpora Collection - Indonesian]
    5 KB (629 words) - 13:08, 21 December 2019
  • ===Corpora===
    2 KB (172 words) - 17:09, 27 March 2017
  • ...ine. Rules can be manually written, or learnt from monolingual or parallel corpora. {{main|Learning rules from parallel and non-parallel corpora}}
    19 KB (2,820 words) - 15:26, 11 April 2023
  • ...language model for the target language in order to create pseudo-parallel corpora, and use them in the same way as parallel ones. IRSTLM is a tool for building n-gram language models from corpora. It supports different smoothing and interpolation methods, including Witten–Bell
    3 KB (364 words) - 23:25, 23 August 2012
  • * a parallel corpus (see [[Corpora]]). We will then work through the example with [[Corpora|EuroParl]] and Apertium's English-to-Spanish pair.
    9 KB (1,445 words) - 14:05, 7 October 2014
  • $ xzcat ~/corpora/nob/*ntb*.xz | head -100000 | apertium -d . nob-nno_e > 2019-09-30.before $ xzcat ~/corpora/nob/*ntb*.xz | head -100000 | apertium -d . nob-nno_e > 2019-09-30.after
    2 KB (327 words) - 08:02, 1 October 2019
  • Once a transducer has ~80% coverage on a range of medium-large corpora we can say it is "working". Over 90% and it can be considered to be "produc
    22 KB (2,532 words) - 11:36, 30 July 2018
  • ...like the page on [[Aromanian]]—i.e., all available dictionaries, grammars, corpora, machine translators, etc., print or digital, where available, whether Free
    68 KB (10,323 words) - 15:37, 25 October 2014
  • ...lexical and contrastive analysis, the creation of corpora made up of texts written in the LSC variant and extracted from on-line magazines proved invaluable ...Mario Casu's Logudorese-Italian dictionary and the in-depth analysis of the parallel corpora, which allowed us to understand, case by case, what the ma
    13 KB (1,910 words) - 11:34, 23 August 2016
  • * Corpus testvoc = apertium-tat-rus/testvoc/corpus/trimmed-coverage.sh. Corpora can be found in the turkiccorpora repository. |colspan="2" rowspan="4"| Corpus testvoc clean on all of the available corpora ||rowspan="4"| ||rowspan="4" colspan="2" style="text-align: center"| ✗||r
    8 KB (1,006 words) - 12:48, 9 March 2018
  • or adapting the software to fit your needs. Existing free (GPL) data and corpora easily reusable to feed Apertium's dictionaries are also welcome. ...particular needs. Making free (GPL) data and corpora available for reuse to improve Apertium's dictionaries is likewise appreciated.
    26 KB (3,122 words) - 06:25, 27 May 2021
  • === Annotated corpora ===
    3 KB (241 words) - 20:44, 9 September 2020
  • ===Application for "Interface for creating tagged corpora" GSOC 2013===
    2 KB (200 words) - 08:21, 13 January 2015
  • ...pertium-kaz/stats|~{{:apertium-kaz/stats/average}}%]] coverage over random corpora ...pertium-tat/stats|~{{:apertium-tat/stats/average}}%]] coverage over random corpora
    4 KB (586 words) - 01:53, 10 March 2018
  • ===Corpora=== * [http://corpora.uni-leipzig.de/en?corpusId=hin_news_2011 Hindi News Corpus]
    4 KB (557 words) - 05:45, 25 August 2021
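
Two of the hits above quote the rule of thumb that a transducer with roughly 80% coverage on a range of medium-large corpora is "working", and that over 90% it is better still. As a hedged sketch of how such naive coverage can be estimated (not taken from any of the pages listed; the function name and sample string are invented for illustration), one can count lexical units in lt-proc-style analyser output whose analysis is not an unknown-word mark (`*`):

```python
import re

def naive_coverage(ltproc_output: str) -> float:
    """Fraction of lexical units that received a real analysis.

    Assumes lt-proc-style lexical units of the form ^surface/analysis$,
    where an unknown word's analysis begins with '*'.
    """
    units = re.findall(r"\^(.*?)\$", ltproc_output)
    if not units:
        return 0.0
    known = sum(
        1 for u in units
        if "/" in u and not u.split("/", 1)[1].startswith("*")
    )
    return known / len(units)

# Invented example: three tokens, one unknown ("blorf")
sample = "^the/the<det><def><sp>$ ^cat/cat<n><sg>$ ^blorf/*blorf$"
```

For `sample` above, two of the three units carry a real analysis, giving a naive coverage of about 0.67; in practice one would pipe a whole corpus through the analyser first, as in the `xzcat ... | apertium -d .` pipeline quoted in one of the hits.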
