Search results
- ...t that they could increase the coverage significantly, because the testing corpora are either news or WP). (8 KB, 1,205 words; 21:50, 19 July 2012)
- ...rds is a semi-standard convention (it's occurring at least some in all the corpora). We should figure out where this is happening and see if it's something w... (28 KB, 769 words; 11:34, 13 April 2013)
- {{see-also|Corpora}} (11 KB, 1,750 words; 13:24, 10 December 2010)
- .... (2008) "Automatic induction of bilingual resources from aligned parallel corpora: application to shallow-transfer machine translation". ''Machine Translation'' (8 KB, 1,301 words; 09:43, 6 October 2014)
- WORDLIST=/home/spectre/corpora/afrikaans-meester-utf8.txt (11 KB, 1,852 words; 07:04, 8 October 2014)
- ...ns: apertium-kaz-tat has at least 15000 top stems, 95% coverage on all the corpora we have, and no more than 15% Word-Error-Rate on any randomly selected text... (4 KB, 603 words; 21:20, 31 August 2015)
- * Optimised for small corpora (under 100k parallel sentences) (869 bytes, 111 words; 15:06, 29 June 2020)
- ...li08j.pdf Automatic induction of bilingual resources from aligned parallel corpora: application to shallow-transfer machine translation]". ''Machine Translation'' (8 KB, 1,273 words; 09:32, 3 May 2024)
- The corpora used for this task can be found here: http://www.statmt.org/europarl/v7/sl-... (6 KB, 625 words; 16:54, 1 July 2013)
- Before you start, you first need a [[Corpora|corpus]]. Look in apertium-eo-en/corpa/enwiki.crp.txt.bz2 (run bunzip2 -c e... (6 KB, 966 words; 20:16, 23 July 2021)
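The snippet above is cut off mid-command, but it refers to unpacking a bzip2-compressed corpus. As a minimal sketch (the file names below are illustrative stand-ins, not the actual enwiki corpus file), `bunzip2 -c` streams the decompressed text to stdout while leaving the `.bz2` archive in place:

```shell
# Illustrative round-trip; sample.crp.txt stands in for a real corpus file.
printf 'a tiny sample corpus line\n' > sample.crp.txt
bzip2 -k sample.crp.txt                  # -k keeps the original, writes sample.crp.txt.bz2
bunzip2 -c sample.crp.txt.bz2 > restored.crp.txt   # -c decompresses to stdout
cmp sample.crp.txt restored.crp.txt && echo "round-trip OK"
```

Redirecting stdout (rather than decompressing in place) is useful when the `.bz2` corpus should stay untouched in the repository.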
- ...learning to construct such n-level transducers, working with some learning corpora, and mostly using the OSTIA state-merging algorithm. (6 KB, 842 words; 06:41, 20 October 2014)
- -x, --xml Output corpora in XML format (9 KB, 1,003 words; 11:02, 30 August 2011)
- Before you begin, you first need a [[Corpora|corpus]]. Look in apertium-eo-en/corpa/enwiki.crp.txt.bz2? Run... (7 KB, 1,057 words; 11:52, 7 October 2014)
- * Efficiency: Make it scale up to corpora of millions of words. This might involve doing (a) pre-analysis of the corp... (3 KB, 549 words; 02:11, 10 March 2018)
- ...to make a translation guesser using the existing bidix and two monolingual corpora in a similar way. (4 KB, 558 words; 13:07, 26 June 2020)
- ...ingual dictionaries: At the beginning we started using Chinese and Spanish corpora in order to obtain lots of Chinese-Spanish word pairs. Using the Stanford S... (7 KB, 830 words; 21:33, 30 September 2013)
- ...story] (or [https://github.com/taruen/apertiumpp/tree/master/data4apertium/corpora/jam from here]) as possible; minimum one sentence. ...stvoc]] clean, and has a coverage of around 80% or more on a range of free corpora. (6 KB, 1,024 words; 15:22, 20 April 2021)
- * [[Corpora]] (1 KB, 164 words; 05:20, 4 December 2019)
- ...l trained on prepared datasets made from parsed, syntax-labelled corpora (mostly UD treebanks). The classifier analyzes the given sequence of morpho... (5 KB, 764 words; 01:40, 8 March 2018)
- collecting Tatar and Bashkir corpora, scraping a parallel corpus, making a frequency dictionary (2 KB, 228 words; 10:55, 9 May 2018)