Search results

  • The ultimate goal is to have multi-purpose transducers and annotated corpora (i.e. treebanks) for a variety of Turkic languages. These can then be pair Once a transducer has ~80% coverage on a range of medium-large corpora we can say it is "working". Over 90% and it can be considered to be "produc
    35 KB (3,577 words) - 15:24, 1 October 2021
  • Once a transducer has ~80% coverage on a range of medium-large corpora we can say it is "working". Over 90% and it can be considered to be "produc
    32 KB (3,684 words) - 06:16, 28 December 2018
  • ...(grammatical descriptions, wordlists, dictionaries, spellcheckers, papers, corpora, etc.) for Aromanian, along with the licences they are under. ... Research · 3. Easy · Create manually tagged corpora: Occitan · Fix tagging errors in a piece of analysed text, for use in tag
    187 KB (21,006 words) - 22:14, 12 November 2012
  • ...however, a language package should have over 60% coverage on a variety of corpora and should probably have at least 2500 stems to be considered minimally use * The coverage of the transducer on a variety of corpora
    15 KB (1,783 words) - 22:33, 1 February 2019
  • == Estimating rules using parallel corpora == ...see [[Running the monolingual rule learning]] if you only have monolingual corpora).
    14 KB (2,181 words) - 19:01, 17 August 2018
  • Once a transducer has ~80% coverage on a range of medium-large corpora we can say it is "working". Over 90% and it can be considered to be "produc === Corpora and corpora projects ===
    9 KB (987 words) - 23:25, 22 December 2014
  • * directory es-tagger-data : Contains data needed for the Spanish tagger (corpora, etc.) * directory ca-tagger-data : Contains data needed for the Catalan tagger (corpora, etc.)
    50 KB (7,915 words) - 00:04, 10 March 2019
  • Name: Dictionary induction from parallel corpora / Revive ReTraTos · Description: Extract dictionaries from parallel corpora
    23 KB (3,198 words) - 09:15, 4 March 2024
  • ...quirements for corpora, and a number of different formats for storing such corpora have sprung up. Some examples include: ...ng on). The following is an idea Jonathan has for implementing a standard corpora format for use by apertium.
    5 KB (813 words) - 00:08, 28 December 2011
  • ...millions of words as in the statistical methods: it takes only two smaller corpora and a dictionary containing rules to conjugate verbs and to match nouns and ...anguage pairs of the same linguistic family without the need of linguistic corpora. The experience of Apertium with several minoritised languages such as Occi
    15 KB (2,339 words) - 00:41, 4 June 2018
  • Once a transducer has ~80% coverage on a range of medium-large corpora we can say it is "working". Over 90% and it can be considered to be "produc
    22 KB (2,520 words) - 23:09, 22 December 2014
  • === Annotated corpora ===
    18 KB (2,312 words) - 18:25, 18 September 2016
  • Once a transducer has ~80% coverage on a range of medium-large corpora we can say it is "working". Over 90% and it can be considered to be "produc
    12 KB (1,308 words) - 19:27, 27 August 2017
  • * Collect parallel kaz-eng corpora! By new plan, we focused on adding vocabulary from 4 corpora.
    20 KB (2,856 words) - 06:26, 27 May 2021
  • * [http://corpus.leeds.ac.uk/query-zh.html A Collection of Chinese Corpora and Frequency Lists.] ===Corpora===
    16 KB (2,148 words) - 03:28, 16 December 2015
  • ....za/Faculties/ART/Xhosa/Pages/Research-.aspx "Cross linguistics upon Xhosa Corpora Research"] == Monolingual/Parallel Corpora ==
    4 KB (566 words) - 05:57, 18 April 2020
  • {{deprecated2|Learning rules from parallel and non-parallel corpora}} * a parallel corpus (see [[Corpora]])
    15 KB (2,206 words) - 13:58, 7 October 2014
  • ==Getting corpora== WORDLIST=/home/spectre/corpora/afrikaans-meester-utf8.txt
    16 KB (2,566 words) - 21:36, 15 March 2020
  • ...ert the data between the formats. It also allows users to either upload or paste corpora in plain text and then convert them into CoNLL-U. ...des support for saving user corpora on the server and then accessing the saved corpora via a unique URL.
    6 KB (930 words) - 15:59, 29 August 2017
  • ...is an open source tool included in the Apertium project that lets you train corpora and manage related files through a friendly user interface, letting you foc ...rom this view you are able to see corpora and training details, insert new corpora and train them easily
    8 KB (1,376 words) - 11:14, 29 October 2014
