Search results

  • The ultimate goal is to have multi-purpose transducers and annotated corpora (i.e. treebanks) for a variety of Turkic languages. These can then be pair Once a transducer has ~80% coverage on a range of medium-large corpora we can say it is "working". Over 90% and it can be considered to be "produc
    35 KB (3,577 words) - 15:24, 1 October 2021
  • Once a transducer has ~80% coverage on a range of medium-large corpora we can say it is "working". Over 90% and it can be considered to be "produc
    32 KB (3,684 words) - 06:16, 28 December 2018
  • ...(grammatical descriptions, wordlists, dictionaries, spellcheckers, papers, corpora, etc.) for Aromanian, along with the licences they are under. || || [[User: ...=center| {{sc|research}} || 3. Easy || Create manually tagged corpora: Occitan || Fix tagging errors in a piece of analysed text, for use in tag
    187 KB (21,006 words) - 22:14, 12 November 2012
  • ...however, a language package should have over 60% coverage on a variety of corpora and should probably have at least 2500 stems to be considered minimally use * The coverage of the transducer on a variety of corpora
    15 KB (1,783 words) - 22:33, 1 February 2019
  • == Estimating rules using parallel corpora == ...see [[Running the monolingual rule learning]] if you only have monolingual corpora).
    14 KB (2,181 words) - 19:01, 17 August 2018
  • Once a transducer has ~80% coverage on a range of medium-large corpora we can say it is "working". Over 90% and it can be considered to be "produc === Corpora and corpora projects ===
    9 KB (987 words) - 23:25, 22 December 2014
  • * directory es-tagger-data : Contains data needed for the Spanish tagger (corpora, etc.) * directory ca-tagger-data : Contains data needed for the Catalan tagger (corpora, etc.)
    50 KB (7,915 words) - 00:04, 10 March 2019
  • | name = Dictionary induction from parallel corpora / Revive ReTraTos | description = Extract dictionaries from parallel corpora
    23 KB (3,198 words) - 09:15, 4 March 2024
  • ...quirements for corpora, and a number of different formats for storing such corpora have sprung up. Some examples include: ...ng on). The following is an idea Jonathan has for implementing a standard corpora format for use by apertium.
    5 KB (813 words) - 00:08, 28 December 2011
  • ...millions of words as in the statistical methods: it takes only two smaller corpora and a dictionary containing rules to conjugate verbs and to match nouns and ...anguage pairs of the same linguistic family without the need of linguistic corpora. The experience of Apertium with several minoritised languages such as Occi
    15 KB (2,339 words) - 00:41, 4 June 2018
  • Once a transducer has ~80% coverage on a range of medium-large corpora we can say it is "working". Over 90% and it can be considered to be "produc
    22 KB (2,520 words) - 23:09, 22 December 2014
  • === Annotated corpora ===
    18 KB (2,312 words) - 18:25, 18 September 2016
  • Once a transducer has ~80% coverage on a range of medium-large corpora we can say it is "working". Over 90% and it can be considered to be "produc
    12 KB (1,308 words) - 19:27, 27 August 2017
  • * Collect parallel kaz-eng corpora! By new plan, we focused on adding vocabulary from 4 corpora.
    20 KB (2,856 words) - 06:26, 27 May 2021
  • * [http://corpus.leeds.ac.uk/query-zh.html A Collection of Chinese Corpora and Frequency Lists.] ===Corpora===
    16 KB (2,148 words) - 03:28, 16 December 2015
  • ....za/Faculties/ART/Xhosa/Pages/Research-.aspx "Cross linguistics upon Xhosa Corpora Research"] == Monolingual/Parallel Corpora ==
    4 KB (566 words) - 05:57, 18 April 2020
  • {{deprecated2|Learning rules from parallel and non-parallel corpora}} * a parallel corpus (see [[Corpora]])
    15 KB (2,206 words) - 13:58, 7 October 2014
  • ==Getting corpora== WORDLIST=/home/spectre/corpora/afrikaans-meester-utf8.txt
    16 KB (2,566 words) - 21:36, 15 March 2020
  • ...ert the data between the formats. It also allows to either upload or paste corpora in plain text and then convert them into CoNLL-U. ...des support for saving user corpora on server and then accessing the saved corpora via unique URL.
    6 KB (930 words) - 15:59, 29 August 2017
  • ...is an open source tool included in the Apertium project that lets you train corpora and manage related files with a friendly user interface and letting you foc ...rom this view you are able to see corpora and training details, insert new corpora and train them easily
    8 KB (1,376 words) - 11:14, 29 October 2014
  • * [[Learning rules from parallel and non-parallel corpora]] – this is the current documentation on training/inferring rules ** preprocess corpora
    4 KB (541 words) - 13:46, 29 March 2021
  • ===Corpora=== * [http://corpora.uni-leipzig.de/en?corpusId=hin_news_2011 Hindi News Corpus] Creative Common
    6 KB (806 words) - 00:45, 7 December 2018
  • Apertium-regtest is a program for managing regression tests and [[Corpus test|corpora]]. # in the browser, select one or all of the corpora to rerun tests for
    11 KB (1,823 words) - 12:17, 6 June 2023
  • ===Corpora=== * [http://childes.talkbank.org/access/French/ CHILDES Corpora]. [http://talkbank.org/share/rules.html ''Requires reference'']
    15 KB (2,081 words) - 07:14, 12 August 2020
  • ..., Tommi Pirinen, Jonathan Washington. "Finite-state morphologies and text corpora as resources for improving morphological descriptions". [https://sites.goog ...f Inferring shallow-transfer machine translation rules from small parallel corpora]". In Journal of Artificial Intelligence Research. volume 34, p. 605-635.
    33 KB (4,418 words) - 11:52, 29 December 2021
  • Once a transducer has ~80% coverage on a range of medium-large corpora we can say it is "working". Over 90% and it can be considered to be "produc
    12 KB (1,017 words) - 09:06, 18 January 2022
  • ====Training on Corpora with Ambiguous Lexical Units==== ...m-tagger</code> '', the tagger prints warnings about ambiguous analyses in corpora to stderr.''
    20 KB (3,229 words) - 20:06, 12 March 2018
  • * Tufiş, D., A. M. Barbu, V. Pătraşcu, G. Rotariu, and C. Popescu. "Corpora and Corpus-Based Morpho-Lexical Processing." ''Recent Advances in Roma ===Corpora===
    7 KB (889 words) - 09:53, 28 November 2018
  • ...why the errors are so high. Re-evaluation will be done as soon as the two corpora are manually realigned. ...nslated corpora, and the label will be extracted from the manually written corpora. This method might provide better results since the model will be trained o
    6 KB (838 words) - 17:47, 25 July 2012
  • ==Corpora== The tagged corpora used in the experiments are found in the monolingual packages in [[language
    16 KB (1,448 words) - 16:50, 22 August 2017
  • ...n be found [https://github.com/taruen/apertiumpp/tree/master/data4apertium/corpora here]. ...m/corpora/jam/uzb.txt | apertium -d . uzb-kaa) -ref ../../../data4apertium/corpora/jam/kaa.txt
    5 KB (515 words) - 14:34, 1 September 2019
  • ===Corpora=== * [http://opus.nlpl.eu/ Hindi-English Parallel Corpora]
    8 KB (1,079 words) - 11:17, 3 December 2018
  • ===Corpora === * [http://www.elra.info/en/catalogues/free-resources/nepali-corpora/ ''"Nepali"'']
    8 KB (948 words) - 19:59, 30 December 2017
  • === Obtaining corpora (and getAlignmentWithText.pl) === The corpora need to be untarred, and inserted into a new, common directory.
    7 KB (973 words) - 02:52, 20 May 2021
  • Automatic shallow-transfer rules generation from parallel corpora ...in statistical machine translation, that have been extracted from parallel corpora and extended with a set of restrictions controlling their application.
    4 KB (525 words) - 19:21, 17 September 2009
  • ~/source/corpora/lm/en.blm europarl.en-es.es.multi-trimmed -f > europarl.en-es.es.annotated MODEL=/home/philip/Apertium/corpora/language-models/en/setimes.en.5.blm
    12 KB (1,634 words) - 18:26, 26 September 2016
  • ...e project was a success. All the goals have been achieved: the creation of corpora in LSC; italian monodix: apertium-srd-srd.dix: 51,743 words; apertium-ita-i ...rned to use markup languages (XML and HTML) for the creation of linguistic corpora. At present, I attend a Master’s Degree in Translation of specialized tex
    21 KB (3,171 words) - 14:34, 3 April 2017
  • == Corpora and Coverage == ...he help of mentors on Kipchak languages. Most frequent unknown tokens from corpora of each language (mostly consisting of Wikipedia entries, Bible and Quran)
    7 KB (798 words) - 18:30, 26 August 2019
  • ===Corpora=== * [https://korpora.zim.uni-duisburg-essen.de/Limas/ Corpora from Limas z.Hd. Prof. Dr. Bernhard Schröder Universität Duisburg-Essen,
    8 KB (900 words) - 10:15, 4 December 2018
  • == Corpora and Coverage == Our main corpora consisted of [https://www.rfa.org/uyghur/ RFA], [http://uy.ts.cn/ Tanritor]
    5 KB (607 words) - 13:25, 12 August 2018
  • == Corpora == ...8ba2a9c0e50bc885bfad3bfbff3b4afbd.pdf Building Open Javanese and Sundanese Corpora for Multilingual Text-to-Speech]
    7 KB (881 words) - 13:11, 12 December 2018
  • Once a transducer has ~80% coverage on a range of medium-large corpora we can say it is "working". Over 90% and it can be considered to be "produc
    10 KB (1,263 words) - 06:04, 23 December 2014
  • == Corpora, sets and alignment == The parallel corpora for the Macedonian - English pair, a total of 207.778 parallel sentences, c
    5 KB (620 words) - 12:21, 27 July 2012
  • == Corpora == * wikipage: <section begin=azadliq2012-wikipage />RFERL corpora<section end=azadliq2012-wikipage />
    1,013 bytes (115 words) - 22:42, 12 August 2014
  • * Evaluate system on corpora === Various Potential Corpora ===
    10 KB (1,483 words) - 07:00, 14 August 2018
  • === Corpora === * [https://corpora.uni-leipzig.de/en?corpusId=ind_mixed_2013 Leipzig Corpora Collection - Indonesian]
    5 KB (629 words) - 13:08, 21 December 2019
  • ===Corpora===
    2 KB (172 words) - 17:09, 27 March 2017
  • ...ine. Rules can be manually written, or learnt from monolingual or parallel corpora. {{main|Learning rules from parallel and non-parallel corpora}}
    19 KB (2,820 words) - 15:26, 11 April 2023
  • * a parallel corpus (see [[Corpora]]) We will then work through the example with [[Corpora|EuroParl]] and Apertium's English to Spanish pair.
    9 KB (1,445 words) - 14:05, 7 October 2014
  • ...language model for the target language in order to create pseudo-parallel corpora, and use them in the same way as parallel ones. IRSTLM is a tool for building n-gram language models from corpora. It supports different smoothing and interpolation methods, including Writt
    3 KB (364 words) - 23:25, 23 August 2012
