Search results

Turkic languages
The ultimate goal is to have multi-purpose transducers and annotated corpora (i.e. treebanks) for a variety of Turkic languages. These can then be pair Once a transducer has ~80% coverage on a range of medium-large corpora we can say it is "working". Over 90% and it can be considered to be "produc

35 KB (3,577 words) - 15:24, 1 October 2021
Germanic languages
Once a transducer has ~80% coverage on a range of medium-large corpora we can say it is "working". Over 90% and it can be considered to be "produc

32 KB (3,684 words) - 06:16, 28 December 2018
Ideas for Google Code-In (2011)
...(grammatical descriptions, wordlists, dictionaries, spellcheckers, papers, corpora, etc.) for Aromanian, along with the licences they are under. || || [[User: ...=center| {{sc|research}} || 3. Easy || Create manually tagged corpora: Occitan || Fix tagging errors in a piece of analysed text, for use in tag

187 KB (21,006 words) - 22:14, 12 November 2012
Languages
...however, a language package should have over 60% coverage on a variety of corpora and should probably have at least 2500 stems to be considered minimally use * The coverage of the transducer on a variety of corpora

15 KB (1,783 words) - 22:33, 1 February 2019
Learning rules from parallel and non-parallel corpora
== Estimating rules using parallel corpora == ...see [[Running the monolingual rule learning]] if you only have monolingual corpora).

14 KB (2,181 words) - 19:01, 17 August 2018
Languages of the Volga-Kama region
Once a transducer has ~80% coverage on a range of medium-large corpora we can say it is "working". Over 90% and it can be considered to be "produc === Corpora and corpora projects ===

9 KB (987 words) - 23:25, 22 December 2014
Contributing to an existing pair
* directory es-tagger-data : Contains data needed for the Spanish tagger (corpora, etc.) * directory ca-tagger-data : Contains data needed for the Catalan tagger (corpora, etc.)

50 KB (7,915 words) - 00:04, 10 March 2019
Ideas for Google Summer of Code
| name = Dictionary induction from parallel corpora / Revive ReTraTos | description = Extract dictionaries from parallel corpora

23 KB (3,198 words) - 09:15, 4 March 2024
Corpora formats
...quirements for corpora, and a number of different formats for storing such corpora have sprung up. Some examples include: ...ng on). The following is an idea Jonathan has for implementing a standard corpora format for use by apertium.

5 KB (813 words) - 00:08, 28 December 2011
Sardu abbarra bivu!
...millions of words as in the statistical methods: it takes only two smaller corpora and a dictionary containing rules to conjugate verbs and to match nouns and ...anguage pairs of the same linguistic family without the need of linguistic corpora. The experience of Apertium with several minoritised languages such as Occi

15 KB (2,339 words) - 00:41, 4 June 2018
Uralic languages
Once a transducer has ~80% coverage on a range of medium-large corpora we can say it is "working". Over 90% and it can be considered to be "produc

22 KB (2,520 words) - 23:09, 22 December 2014
Romance languages
=== Annotated corpora ===

18 KB (2,312 words) - 18:25, 18 September 2016
Balkan languages
Once a transducer has ~80% coverage on a range of medium-large corpora we can say it is "working". Over 90% and it can be considered to be "produc

12 KB (1,308 words) - 19:27, 27 August 2017
English and Kazakh
* Collect parallel kaz-eng corpora! By new plan, we focused on adding vocabulary from 4 corpora.

20 KB (2,856 words) - 06:26, 27 May 2021
Mandarin Chinese
* [http://corpus.leeds.ac.uk/query-zh.html A Collection of Chinese Corpora and Frequency Lists.] ===Corpora===

16 KB (2,148 words) - 03:28, 16 December 2015
Xhosa
....za/Faculties/ART/Xhosa/Pages/Research-.aspx "Cross linguistics upon Xhosa Corpora Research"] == Monolingual/Parallel Corpora ==

4 KB (566 words) - 05:57, 18 April 2020
Generating lexical-selection rules from a parallel corpus
{{deprecated2|Learning rules from parallel and non-parallel corpora}} * a parallel corpus (see [[Corpora]])

15 KB (2,206 words) - 13:58, 7 October 2014
Building dictionaries
==Getting corpora== WORDLIST=/home/spectre/corpora/afrikaans-meester-utf8.txt

16 KB (2,566 words) - 21:36, 15 March 2020
UD annotatrix/UD annotatrix at GSoC 2017
...ert the data between the formats. It also allows to either upload or paste corpora in plain text and then convert them into CoNLL-U. ...des support for saving user corpora on server and then accessing the saved corpora via unique URL.

6 KB (930 words) - 15:59, 29 August 2017
Getting started with Annotatrix
...is an open source tool included on the Apertium project that let you train corpora and manage related files with a friendly user interface and letting you foc ...rom this view you are able to see corpora and training details, insert new corpora and train them easily

8 KB (1,376 words) - 11:14, 29 October 2014
