Search results

  • |title=vim mode/tools for annotating dependency corpora in CG3 format |title=vim mode/tools for annotating dependency corpora in CoNLL-U format
    397 KB (52,731 words) - 11:22, 10 December 2019
  • ...ce of software that generates shallow-transfer rules from aligned parallel corpora. It could greatly speed up the creation of new language pairs by generating ...sfer-training-tools generates shallow-transfer rules from aligned parallel corpora. It uses a small set of lexicalised categories, categories that are usuall
    71 KB (10,374 words) - 21:14, 18 January 2021
  • The ultimate goal is to have multi-purpose transducers and annotated corpora (i.e. treebanks) for a variety of Turkic languages. These can then be pair Once a transducer has ~80% coverage on a range of medium-large corpora we can say it is "working". Over 90% and it can be considered to be "produc
    35 KB (3,577 words) - 15:24, 1 October 2021
  • Once a transducer has ~80% coverage on a range of medium-large corpora we can say it is "working". Over 90% and it can be considered to be "produc
    32 KB (3,684 words) - 06:16, 28 December 2018
  • ...(grammatical descriptions, wordlists, dictionaries, spellcheckers, papers, corpora, etc.) for Aromanian, along with the licences they are under. || || [[User: ...=center| {{sc|research}} || 3. Easy || Create manually tagged corpora: Occitan || Fix tagging errors in a piece of analysed text, for use in tag
    187 KB (21,006 words) - 22:14, 12 November 2012
  • ====Morphological Analysers and Corpora==== Both said projects have collected and published vast Hebrew corpora files, collected from various sources.
    13 KB (2,014 words) - 20:05, 4 June 2011
  • ...however, a language package should have over 60% coverage on a variety of corpora and should probably have at least 2500 stems to be considered minimally use * The coverage of the transducer on a variety of corpora
    15 KB (1,783 words) - 22:33, 1 February 2019
  • == Estimating rules using parallel corpora == ...see [[Running the monolingual rule learning]] if you only have monolingual corpora).
    14 KB (2,181 words) - 19:01, 17 August 2018
  • ...in the face of updates to the third-party tools. Also, train on different corpora and add lexical selection rules to the languages which have few to no lexic * initiated non-parallel corpora training script(bash)
    4 KB (645 words) - 16:41, 24 August 2021
  • Once a transducer has ~80% coverage on a range of medium-large corpora we can say it is "working". Over 90% and it can be considered to be "produc === Corpora and corpora projects ===
    9 KB (987 words) - 23:25, 22 December 2014
  • * directory es-tagger-data : Contains data needed for the Spanish tagger (corpora, etc.) * directory ca-tagger-data : Contains data needed for the Catalan tagger (corpora, etc.)
    50 KB (7,915 words) - 00:04, 10 March 2019
  • ...a are still less specific than Mediawiki articles.</p> <p>'''+''' parallel corpora are more likely to contain less noise.</p> <p>'''-''' the target side might ...parably small amount of postedited data and more or less suitable parallel corpora. I'm still looking for data, but current situation looks like this:
    16 KB (2,445 words) - 09:19, 26 March 2018
  • | name = Dictionary induction from parallel corpora / Revive ReTraTos | description = Extract dictionaries from parallel corpora
    23 KB (3,198 words) - 09:15, 4 March 2024
  • ...quirements for corpora, and a number of different formats for storing such corpora have sprung up. Some examples include: ...ng on). The following is an idea Jonathan has for implementing a standard corpora format for use by apertium.
    5 KB (813 words) - 00:08, 28 December 2011
  • ...millions of words as in the statistical methods: it takes only two smaller corpora and a dictionary containing rules to conjugate verbs and to match nouns and ...anguage pairs of the same linguistic family without the need of linguistic corpora. The experience of Apertium with several minoritised languages such as Occi
    15 KB (2,339 words) - 00:41, 4 June 2018
  • Once a transducer has ~80% coverage on a range of medium-large corpora we can say it is "working". Over 90% and it can be considered to be "produc
    22 KB (2,520 words) - 23:09, 22 December 2014
  • === Annotated corpora ===
    18 KB (2,312 words) - 18:25, 18 September 2016
  • ...t bidirectional dictionaries for a language pair, given a pair of parallel corpora - i.e., the same content in two different languages using a single program. ...the source code can be accessed [https://github.com/gs-chaitanya/parallel-corpora-alignment here]. I built a Python script that used the Apertium monolingual
    6 KB (918 words) - 06:00, 2 April 2024
  • Once a transducer has ~80% coverage on a range of medium-large corpora we can say it is "working". Over 90% and it can be considered to be "produc
    26 KB (3,036 words) - 07:04, 14 December 2014
  • Once a transducer has ~80% coverage on a range of medium-large corpora we can say it is "working". Over 90% and it can be considered to be "produc
    12 KB (1,308 words) - 19:27, 27 August 2017
