Search results

  • Word Sense Disambiguation for WordNet corpora ...oblem is that SMT requires a lot of data in the form of parallel-language corpora, since these systems are very dependent on data, and many languages cannot afford it. Whi
    8 KB (1,094 words) - 13:10, 14 April 2019
  • Once a transducer has ~80% coverage on a range of medium-large corpora we can say it is "working". Over 90% and it can be considered to be "produc
    20 KB (2,336 words) - 18:10, 14 April 2015
  • ...t consisting of a set of linguistic resources (dictionaries, cross models, corpora, links to other LRDs, etc.). ...uistics resources: morphological and bilingual dictionaries, cross models, corpora, etc.</description>
    8 KB (902 words) - 09:19, 6 October 2014
  • * [[Learning rules from parallel and non-parallel corpora]] ...sing statistics and requires 1) slightly pre-processed dictionaries and 2) corpora to train the module. '''The module is turned off in most cases as it does n
    4 KB (625 words) - 08:36, 29 April 2015
  • -lm 0:5:/home/fran/corpora/europarl/europarl.lm:0 >log 2>&1 & ...e> script to generate relative frequency lists of in-domain and out-domain corpora.
    5 KB (680 words) - 11:53, 26 September 2016
  • xzcat corpora/nno.xz | tr -d '#@/' | apertium -d . nno-nob-dgen | grep '.\{0,6\}[#@/].\{0
    9 KB (1,400 words) - 22:30, 18 January 2021
  • ** Overall accuracy (over parallel corpora): WER/PER/BLEU ...generalised script that supports hfst and lttoolbox binaries and arbitrary corpora would be good. It should also (optionally) output hitparades (e.g., freque
    2 KB (246 words) - 02:32, 1 June 2019
  • ...an give unsatisfactory results (WER ≈ 30%, coverage below 85% in Wikipedia corpora). Both were published in 2009 and, apparently, no one has worked on them si
    16 KB (2,285 words) - 06:46, 12 April 2019
  • Corpora in { gap } are large collections of texts enhanced with special markup. The Corpora in { gap } are large collections of { gap } with { gap }. They allow lingui
    9 KB (1,368 words) - 09:04, 23 April 2015
  • corpora. ...the process, creating an easy-to-use framework for using constraints with corpora in order to obtain information about words of interest; and later to provid
    6 KB (928 words) - 13:57, 3 April 2009
  • * large monolingual corpora of the language * parallel corpora of the language and some other language
    1 KB (202 words) - 19:55, 12 April 2021
  • ...ment specifying a set of linguistic resources (dictionaries, cross models, corpora, other LRD files, etc).
    5 KB (633 words) - 13:29, 6 October 2017
  • ...ger and an initial set of translation rules from monolingual and bilingual corpora.
    8 KB (1,255 words) - 19:50, 12 April 2021
  • Had an idea of fixing out-of-sync corpora automatically and started an "MVP" here: https://github.com/frankier/aperti ...opt (rather than the quality of its output). It could be used to help keep corpora, tagger models and morphologies in sync (though poking and possible automat
    3 KB (456 words) - 18:17, 29 August 2016
  • # all pronouns from Crimean Tatar corpora are translated without debug symbols * analyse corpora with crh-morph mode
    4 KB (496 words) - 18:27, 19 June 2017
  • ...an vocabulary were used to good effect to reach a high coverage on all the corpora.
    4 KB (551 words) - 23:52, 28 August 2017
  • ...ing with dictionaries, lexical selection rules, transfer rules, scripting, corpora. The objective is to facilitate the generation of varieties for languages w ...cial languages of EU? : Then this task will be hard. Pairs which have huge corpora of parallel texts, like the 24 official EU languages or the 3 EU working la
    2 KB (377 words) - 19:18, 25 January 2023
  • ...are creating invaluable linguistic resources such as disambiguated tagged corpora. [[Category:Ideas for Google Summer of Code|Interface for creating tagged corpora]]
    2 KB (269 words) - 21:26, 5 April 2013
  • ...edia dumps are useful for quickly getting a corpus. They are also the best corpora for making your language pair are useful for Wikipedia's [[Content Translat $ zcat ~/Nedlastingar/cx-corpora.nb2nn.text.tmx.gz \
    3 KB (436 words) - 05:40, 10 April 2019
  • ...(grammatical descriptions, wordlists, dictionaries, spellcheckers, papers, corpora, etc.), along with the licences they are under. See for example the page [[
    14 KB (2,007 words) - 03:06, 27 October 2013
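Several of the results above quote coverage figures for transducers (for example ~80% on medium-large corpora, or coverage below 85% on Wikipedia corpora). The sketch below shows one way such a figure might be computed. It assumes "coverage" means naive coverage, i.e. the share of lexical units that receive at least one analysis, and that the input is analyser output in the Apertium stream format `^surface/analysis$`, with unknown words marked as `^surface/*surface$`. The function name `naive_coverage` is hypothetical, and the regular expression deliberately ignores backslash escapes, so it is a simplification rather than a full stream parser.

```python
import re

# One lexical unit in the (simplified) Apertium stream format:
#   ^surface/analysis1/analysis2$
# Group 1 is the surface form, group 2 is everything after the first '/'.
# Assumption: escaped characters (e.g. \^ or \$) are not handled.
LU_PATTERN = re.compile(r"\^([^/$]*)/([^$]*)\$")


def naive_coverage(stream: str) -> float:
    """Return the fraction of lexical units with at least one analysis.

    Assumes unknown words are emitted as ^surface/*surface$, i.e. their
    analysis part starts with '*'. Returns 0.0 for input with no units.
    """
    total = 0
    known = 0
    for match in LU_PATTERN.finditer(stream):
        total += 1
        # An analysis starting with '*' marks an unknown word.
        if not match.group(2).startswith("*"):
            known += 1
    return known / total if total else 0.0
```

For example, a stream of four lexical units in which exactly one is marked with `*` yields a coverage of 0.75.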

View (previous 20 | next 20) (20 | 50 | 100 | 250 | 500)