Search results

Talk:Task ideas for Google Code-in
|title=vim mode/tools for annotating dependency corpora in CG3 format |title=vim mode/tools for annotating dependency corpora in CoNLL-U format

397 KB (52,731 words) - 11:22, 10 December 2019
Talk:Ideas for Google Summer of Code
...ce of software that generates shallow-transfer rules from aligned parallel corpora. It could greatly speed up the creation of new language pairs by generating ...sfer-training-tools generates shallow-transfer rules from aligned parallel corpora. It uses a small set of lexicalised categories, categories that are usuall

71 KB (10,374 words) - 21:14, 18 January 2021
Turkic languages
The ultimate goal is to have multi-purpose transducers and annotated corpora (i.e. treebanks) for a variety of Turkic languages. These can then be pair Once a transducer has ~80% coverage on a range of medium-large corpora we can say it is "working". Over 90% and it can be considered to be "produc

35 KB (3,577 words) - 15:24, 1 October 2021
Germanic languages
Once a transducer has ~80% coverage on a range of medium-large corpora we can say it is "working". Over 90% and it can be considered to be "produc

32 KB (3,684 words) - 06:16, 28 December 2018
Ideas for Google Code-In (2011)
...(grammatical descriptions, wordlists, dictionaries, spellcheckers, papers, corpora, etc.) for Aromanian, along with the licences they are under. || || [[User: ...=center| {{sc|research}} || 3. Easy || Create manually tagged corpora: Occitan || Fix tagging errors in a piece of analysed text, for use in tag

187 KB (21,006 words) - 22:14, 12 November 2012
User:N0nick/Application
====Morphological Analysers and Corpora==== Both said projects have collected and published vast Hebrew corpora files, collected from various sources.

13 KB (2,014 words) - 20:05, 4 June 2011
Languages
...however, a language package should have over 60% coverage on a variety of corpora and should probably have at least 2500 stems to be considered minimally use * The coverage of the transducer on a variety of corpora

15 KB (1,783 words) - 22:33, 1 February 2019
Learning rules from parallel and non-parallel corpora
== Estimating rules using parallel corpora == ...see [[Running the monolingual rule learning]] if you only have monolingual corpora).

14 KB (2,181 words) - 19:01, 17 August 2018
User:Naan Dhaan/User friendly lexical training
...in the face of updates to the third-party tools. Also, train on different corpora and add lexical selection rules to the languages which have few to no lexic * initiated non-parallel corpora training script (bash)

4 KB (645 words) - 16:41, 24 August 2021
Languages of the Volga-Kama region
Once a transducer has ~80% coverage on a range of medium-large corpora we can say it is "working". Over 90% and it can be considered to be "produc === Corpora and corpora projects ===

9 KB (987 words) - 23:25, 22 December 2014
Contributing to an existing pair
* directory es-tagger-data : Contains data needed for the Spanish tagger (corpora, etc.) * directory ca-tagger-data : Contains data needed for the Catalan tagger (corpora, etc.)

50 KB (7,915 words) - 00:04, 10 March 2019
User:Deltamachine/proposal2018
...a are still less specific than Mediawiki articles.</p> <p>'''+''' parallel corpora are more likely to contain less noise.</p> <p>'''-''' the target side might ...parably small amount of postedited data and more or less suitable parallel corpora. I'm still looking for data, but current situation looks like this:

16 KB (2,445 words) - 09:19, 26 March 2018
Ideas for Google Summer of Code
| name = Dictionary induction from parallel corpora / Revive ReTraTos | description = Extract dictionaries from parallel corpora

23 KB (3,198 words) - 09:15, 4 March 2024
Corpora formats
...quirements for corpora, and a number of different formats for storing such corpora have sprung up. Some examples include: ...ng on). The following is an idea Jonathan has for implementing a standard corpora format for use by apertium.

5 KB (813 words) - 00:08, 28 December 2011
Sardu abbarra bivu!
...millions of words as in the statistical methods: it takes only two smaller corpora and a dictionary containing rules to conjugate verbs and to match nouns and ...anguage pairs of the same linguistic family without the need of linguistic corpora. The experience of Apertium with several minoritised languages such as Occi

15 KB (2,339 words) - 00:41, 4 June 2018
Uralic languages
Once a transducer has ~80% coverage on a range of medium-large corpora we can say it is "working". Over 90% and it can be considered to be "produc

22 KB (2,520 words) - 23:09, 22 December 2014
Romance languages
=== Annotated corpora ===

18 KB (2,312 words) - 18:25, 18 September 2016
User:Daedalus/GSoC2024Proposal
...t bidirectional dictionaries for a language pair, given a pair of parallel corpora - i.e., the same content in two different languages using a single program. ...the source code can be accessed [https://github.com/gs-chaitanya/parallel-corpora-alignment here]. I built a Python script that used the Apertium monolingual

6 KB (918 words) - 06:00, 2 April 2024
User:Sushain/GermanicLanguages
Once a transducer has ~80% coverage on a range of medium-large corpora we can say it is "working". Over 90% and it can be considered to be "produc

26 KB (3,036 words) - 07:04, 14 December 2014
Balkan languages
Once a transducer has ~80% coverage on a range of medium-large corpora we can say it is "working". Over 90% and it can be considered to be "produc

12 KB (1,308 words) - 19:27, 27 August 2017
