Search results
- Word Sense Disambiguation for WordNet corpora ...oblem is that SMT requires a lot of data in the form of parallel corpora, since such systems are very data-hungry, and many languages cannot afford it. Whi... 8 KB (1,094 words) - 13:10, 14 April 2019
- Once a transducer has ~80% coverage on a range of medium-large corpora we can say it is "working". Over 90% and it can be considered to be "produc... 20 KB (2,336 words) - 18:10, 14 April 2015
- ...t consisting of a set of linguistic resources (dictionaries, cross models, corpora, links to other LRDs, etc.). ...uistics resources: morphological and bilingual dictionaries, cross models, corpora, etc. 8 KB (902 words) - 09:19, 6 October 2014
- Learning rules from parallel and non-parallel corpora ...sing statistics and requires 1) slightly pre-processed dictionaries and 2) corpora to train the module. The module is turned off in most cases as it does n... 4 KB (625 words) - 08:36, 29 April 2015
- -lm 0:5:/home/fran/corpora/europarl/europarl.lm:0 >log 2>&1 & ... script to generate relative frequency lists of in-domain and out-domain corpora. 5 KB (680 words) - 11:53, 26 September 2016
- xzcat corpora/nno.xz | tr -d '#@/' | apertium -d . nno-nob-dgen | grep '.\{0,6\}[#@/].\{0... 9 KB (1,400 words) - 22:30, 18 January 2021
- Overall accuracy (over parallel corpora): WER/PER/BLEU ...generalised script that supports hfst and lttoolbox binaries and arbitrary corpora would be good. It should also (optionally) output hitparades (e.g., freque... 2 KB (246 words) - 02:32, 1 June 2019
- ...an give unsatisfactory results (WER ≈ 30%, coverage below 85% in Wikipedia corpora). Both were published in 2009 and, apparently, no one has worked on them si... 16 KB (2,285 words) - 06:46, 12 April 2019
- Corpora in { gap } are large collections of texts enhanced with special markup. The Corpora in { gap } are large collections of { gap } with { gap }. They allow lingui... 9 KB (1,368 words) - 09:04, 23 April 2015
- corpora. ...the process, creating an easy-to-use framework for using constraints with corpora in order to obtain information about words of interest; and later to provid... 6 KB (928 words) - 13:57, 3 April 2009
- large monolingual corpora of the language; parallel corpora of the language and some other language 1 KB (202 words) - 19:55, 12 April 2021
- ...ment specifying a set of linguistic resources (dictionaries, cross models, corpora, other LRD files, etc.). 5 KB (633 words) - 13:29, 6 October 2017
- ...ger and an initial set of translation rules from monolingual and bilingual corpora. 8 KB (1,255 words) - 19:50, 12 April 2021
- ...are creating invaluable linguistic resources such as disambiguated tagged corpora. 2 KB (269 words) - 21:26, 5 April 2013
- Had an idea of fixing out-of-sync corpora automatically and started an "MVP" here: https://github.com/frankier/aperti ...opt (rather than the quality of its output). It could be used to help keep corpora, tagger models and morphologies in sync (though poking and possible automat... 3 KB (456 words) - 18:17, 29 August 2016
- all pronouns from Crimean Tatar corpora are translated without debug symbols; analyse corpora with crh-morph mode 4 KB (496 words) - 18:27, 19 June 2017
- ...an vocabulary were used to good effect to reach a high coverage on all the corpora. 4 KB (551 words) - 23:52, 28 August 2017
- ...ing with dictionaries, lexical selection rules, transfer rules, scripting, corpora. The objective is to facilitate the generation of varieties for languages w... ...cial languages of the EU? Then this task will be hard. Pairs which have huge corpora of parallel texts, like the 24 official EU languages or the 3 EU working la... 2 KB (377 words) - 19:18, 25 January 2023
- ...edia dumps are useful for quickly getting a corpus. They are also the best corpora for making your language pair are useful for Wikipedia's Content Translat... $ zcat ~/Nedlastingar/cx-corpora.nb2nn.text.tmx.gz \ ... 3 KB (436 words) - 05:40, 10 April 2019
- ...(grammatical descriptions, wordlists, dictionaries, spellcheckers, papers, corpora, etc.), along with the licences they are under. See for example the page... 14 KB (2,007 words) - 03:06, 27 October 2013
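Several results above judge an analyser by its coverage on corpora (e.g. ~80% on medium-large corpora counting as "working"). The following is a minimal sketch of that measurement in Python. It assumes the analyser output is in Apertium's stream format, where an unanalysed word appears as `^form/*form$`. The function name `coverage`, the threshold constant and the sample sentence are hypothetical, and escaped `\^`, `\$` and `\/` characters are not handled.

```python
import re

# One lexical unit in Apertium stream format, e.g. ^cats/cat<n><pl>$.
# Assumption: no escaped \^, \$ or \/ characters occur in the input.
LU_PATTERN = re.compile(r"\^(.*?)\$")

WORKING_THRESHOLD = 0.80  # "~80% coverage" from the search result above


def coverage(stream: str) -> float:
    """Return the fraction of lexical units that received an analysis.

    An unknown word is assumed to be marked with '*' right after the
    first '/', as in ^blorf/*blorf$.
    """
    units = LU_PATTERN.findall(stream)
    if not units:
        return 0.0
    known = sum(1 for unit in units if "/*" not in unit)
    return known / len(units)


# Hypothetical analyser output: three known words and one unknown word.
sample = "^the/the<det>$ ^cats/cat<n><pl>$ ^blorf/*blorf$ ^sat/sit<vblex><past>$"
cov = coverage(sample)
print(f"coverage: {cov:.0%}, working: {cov >= WORKING_THRESHOLD}")
```

With this sample it prints `coverage: 75%, working: False`. A real coverage script would run the analyser over a whole corpus and usually also list the most frequent unknown words.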