Search results

Narimann/GSOC 2019 proposal: Kazakh-Turkish and Turkish-Kazakh
Word Sense Disambiguation for WordNet corpora ...oblem is that SMT requires a lot of data in the form of parallel languages corpora, since they very addicted to data, and many languages cannot afford it. Whi

8 KB (1,094 words) - 13:10, 14 April 2019
Semitic languages
Once a transducer has ~80% coverage on a range of medium-large corpora we can say it is "working". Over 90% and it can be considered to be "produc

20 KB (2,336 words) - 18:10, 14 April 2015
Linguistic Resources Document
...t consisting of a set of linguistic resources (dictionaries, cross models, corpora, links to other LRDs, etc.). ...uistics resources: morphological and bilingual dictionaries, cross models, corpora, etc.</description>

8 KB (902 words) - 09:19, 6 October 2014
Lexical selection
* [[Learning rules from parallel and non-parallel corpora]] ...sing statistics and requires 1) slightly pre-processed dictionaries and 2) corpora to train the module. '''The module is turned off in most cases as it does n

4 KB (625 words) - 08:36, 29 April 2015
Extracting bilingual dictionaries with Giza++
-lm 0:5:/home/fran/corpora/europarl/europarl.lm:0 >log 2>&1 & ...e> script to generate relative frequency lists of in-domain and out-domain corpora.

5 KB (680 words) - 11:53, 26 September 2016
Testvoc
xzcat corpora/nno.xz | tr -d '#@/' | apertium -d . nno-nob-dgen | grep '.\{0,6\}[#@/].\{0

9 KB (1,400 words) - 22:30, 18 January 2021
Meta-evaluation
** Overall accuracy (over parallel corpora): WER/PER/BLEU ...generalised script that supports hfst and lttoolbox binaries and arbitrary corpora would be good. It should also (optionally) output hitparades (e.g., freque

2 KB (246 words) - 02:32, 1 June 2019
Hectoralos/GSOC 2019 proposal: Catalan-Italian and Catalan-Portuguese
...an give unsatisfactory results (WER ≈ 30%, coverage below 85% in Wikipedia corpora). Both were published in 2009 and, apparently, no one has worked on them si

16 KB (2,285 words) - 06:46, 12 April 2019
Assimilation Evaluation Toolkit
Corpora in { gap } are large collections of texts enhanced with special markup. The Corpora in { gap } are large collections of { gap } with { gap }. They allow lingui

9 KB (1,368 words) - 09:04, 23 April 2015
Automated extraction of lexical resources
corpora. ...the process, creating an easy-to-use framework for using constraints with corpora in order to obtain information about words of interest; and later to provid

6 KB (928 words) - 13:57, 3 April 2009
Task ideas for Google Code-in/Documentation of resources
* large monolingual corpora of the language * parallel corpora of the language and some other language

1 KB (202 words) - 19:55, 12 April 2021
Crossdics
...ment specifying a set of linguistic resources (dictionaries, cross models, corpora, other LRD files, etc).

5 KB (633 words) - 13:29, 6 October 2017
Google Summer of Code/Application 2008
...ger and an initial set of translation rules from monolingual and bilingual corpora.

8 KB (1,255 words) - 19:50, 12 April 2021
Ideas for Google Summer of Code/Interface for creating tagged corpora
...are creating invaluable linguistic resources such as disambiguated tagged corpora. [[Category:Ideas for Google Summer of Code|Interface for creating tagged corpora]]

2 KB (269 words) - 21:26, 5 April 2013
Frankier/GSOC 2016 submission
Had an idea of fixing out-of-sync corpora automatically and started an "MVP" here: https://github.com/frankier/aperti ...opt (rather than the quality of its output). It could be used to help keep corpora, tagger models and morphologies in sync (though poking and possible automat

3 KB (456 words) - 18:17, 29 August 2016
Crimean Tatar and Turkish/Work plan
# all pronouns from Crimean Tatar corpora are translated without debug symbols * analyse corpora with crh-morph mode

4 KB (496 words) - 18:27, 19 June 2017
Crimean Tatar and Turkish/GSoC Report
...an vocabulary were used to good effect to reach a high coverage on all the corpora.

4 KB (551 words) - 23:52, 28 August 2017
Ideas for Google Summer of Code/Add a new variety to an existing language
...ing with dictionaries, lexical selection rules, transfer rules, scripting, corpora. The objective is to facilitate the generation of varieties for languages w ...cial languages of EU? : Then this task will be hard. Pairs which have huge corpora of parallel texts, like the 24 official EU languages or the 3 EU working la

2 KB (377 words) - 19:18, 25 January 2023
Wikipedia dumps
...edia dumps are useful for quickly getting a corpus. They are also the best corpora for making your language pair are useful for Wikipedia's [[Content Translat $ zcat ~/Nedlastingar/cx-corpora.nb2nn.text.tmx.gz \

3 KB (436 words) - 05:40, 10 April 2019
Task ideas for Google Code-in (2012)
...(grammatical descriptions, wordlists, dictionaries, spellcheckers, papers, corpora, etc.), along with the licences they are under. See for example the page [[

14 KB (2,007 words) - 03:06, 27 October 2013
