Search results

  • The ultimate goal is to have multi-purpose transducers and annotated corpora (i.e. treebanks) for a variety of Turkic languages. These can then be pair Once a transducer has ~80% coverage on a range of medium-large corpora we can say it is "working". Over 90% and it can be considered to be "produc
    35 KB (3,577 words) - 15:24, 1 October 2021
  • Once a transducer has ~80% coverage on a range of medium-large corpora we can say it is "working". Over 90% and it can be considered to be "produc
    32 KB (3,684 words) - 06:16, 28 December 2018
  • ...(grammatical descriptions, wordlists, dictionaries, spellcheckers, papers, corpora, etc.) for Aromanian, along with the licences they are under. || || [[User: ...=center| {{sc|research}} || 3. Easy || Create manually tagged corpora: Occitan || Fix tagging errors in a piece of analysed text, for use in tag
    187 KB (21,006 words) - 22:14, 12 November 2012
  • ...however, a language package should have over 60% coverage on a variety of corpora and should probably have at least 2500 stems to be considered minimally use * The coverage of the transducer on a variety of corpora
    15 KB (1,783 words) - 22:33, 1 February 2019
  • == Estimating rules using parallel corpora == ...see [[Running the monolingual rule learning]] if you only have monolingual corpora).
    14 KB (2,181 words) - 19:01, 17 August 2018
  • Once a transducer has ~80% coverage on a range of medium-large corpora we can say it is "working". Over 90% and it can be considered to be "produc === Corpora and corpora projects ===
    9 KB (987 words) - 23:25, 22 December 2014
  • * directory es-tagger-data : Contains data needed for the Spanish tagger (corpora, etc.) * directory ca-tagger-data : Contains data needed for the Catalan tagger (corpora, etc.)
    50 KB (7,915 words) - 00:04, 10 March 2019
  • | name = Dictionary induction from parallel corpora / Revive ReTraTos | description = Extract dictionaries from parallel corpora
    23 KB (3,198 words) - 09:15, 4 March 2024
  • ...quirements for corpora, and a number of different formats for storing such corpora have sprung up. Some examples include: ...ng on). The following is an idea Jonathan has for implementing a standard corpora format for use by apertium.
    5 KB (813 words) - 00:08, 28 December 2011
  • ...millions of words as in the statistical methods: it takes only two smaller corpora and a dictionary containing rules to conjugate verbs and to match nouns and ...anguage pairs of the same linguistic family without the need for linguistic corpora. The experience of Apertium with several minoritised languages such as Occi
    15 KB (2,339 words) - 00:41, 4 June 2018
  • Once a transducer has ~80% coverage on a range of medium-large corpora we can say it is "working". Over 90% and it can be considered to be "produc
    22 KB (2,520 words) - 23:09, 22 December 2014
  • === Annotated corpora ===
    18 KB (2,312 words) - 18:25, 18 September 2016
  • Once a transducer has ~80% coverage on a range of medium-large corpora we can say it is "working". Over 90% and it can be considered to be "produc
    12 KB (1,308 words) - 19:27, 27 August 2017
  • * Collect parallel kaz-eng corpora! Under the new plan, we focused on adding vocabulary from 4 corpora.
    20 KB (2,856 words) - 06:26, 27 May 2021
  • * [http://corpus.leeds.ac.uk/query-zh.html A Collection of Chinese Corpora and Frequency Lists.] ===Corpora===
    16 KB (2,148 words) - 03:28, 16 December 2015
  • ....za/Faculties/ART/Xhosa/Pages/Research-.aspx "Cross linguistics upon Xhosa Corpora Research"] == Monolingual/Parallel Corpora ==
    4 KB (566 words) - 05:57, 18 April 2020
  • {{deprecated2|Learning rules from parallel and non-parallel corpora}} * a parallel corpus (see [[Corpora]])
    15 KB (2,206 words) - 13:58, 7 October 2014
  • ==Getting corpora== WORDLIST=/home/spectre/corpora/afrikaans-meester-utf8.txt
    16 KB (2,566 words) - 21:36, 15 March 2020
  • ...ert the data between the formats. It also allows users to either upload or paste corpora in plain text and then convert them into CoNLL-U. ...des support for saving user corpora on the server and then accessing the saved corpora via a unique URL.
    6 KB (930 words) - 15:59, 29 August 2017
  • ...is an open source tool included in the Apertium project that lets you train corpora and manage related files through a friendly user interface, letting you foc ...rom this view you are able to see corpora and training details, insert new corpora and train them easily
    8 KB (1,376 words) - 11:14, 29 October 2014
  • * [[Learning rules from parallel and non-parallel corpora]] – this is the current documentation on training/inferring rules ** preprocess corpora
    4 KB (541 words) - 13:46, 29 March 2021
  • Apertium-regtest is a program for managing regression tests and [[Corpus test|corpora]]. # in the browser, select one or all of the corpora to rerun tests for
    11 KB (1,823 words) - 12:17, 6 June 2023
  • ===Corpora=== * [http://corpora.uni-leipzig.de/en?corpusId=hin_news_2011 Hindi News Corpus] Creative Common
    6 KB (806 words) - 00:45, 7 December 2018
  • ===Corpora=== * [http://childes.talkbank.org/access/French/ CHILDES Corpora]. [http://talkbank.org/share/rules.html ''Requires reference'']
    15 KB (2,081 words) - 07:14, 12 August 2020
  • ..., Tommi Pirinen, Jonathan Washington. "Finite-state morphologies and text corpora as resources for improving morphological descriptions". [https://sites.goog ...f Inferring shallow-transfer machine translation rules from small parallel corpora]". In Journal of Artificial Intelligence Research. volume 34, p. 605-635.
    33 KB (4,418 words) - 11:52, 29 December 2021
  • Once a transducer has ~80% coverage on a range of medium-large corpora we can say it is "working". Over 90% and it can be considered to be "produc
    12 KB (1,017 words) - 09:06, 18 January 2022
  • ====Training on Corpora with Ambiguous Lexical Units==== ...m-tagger</code> '', the tagger prints warnings about ambiguous analyses in corpora to stderr.''
    20 KB (3,229 words) - 20:06, 12 March 2018
  • * Tufiş, D., A. M. Barbu, V. Pătraşcu, G. Rotariu, and C. Popescu. "Corpora and Corpus-Based Morpho-Lexical Processing." ''Recent Advances in Roma ===Corpora===
    7 KB (889 words) - 09:53, 28 November 2018
  • ===Corpora=== * [http://opus.nlpl.eu/ Hindi-English Parallel Corpora]
    8 KB (1,079 words) - 11:17, 3 December 2018
  • ...why the errors are so high. Re-evaluation will be done as soon as the two corpora are manually realigned. ...nslated corpora, and the label will be extracted from the manually written corpora. This method might provide better results since the model will be trained o
    6 KB (838 words) - 17:47, 25 July 2012
  • ==Corpora== The tagged corpora used in the experiments are found in the monolingual packages in [[language
    16 KB (1,448 words) - 16:50, 22 August 2017
  • ...n be found [https://github.com/taruen/apertiumpp/tree/master/data4apertium/corpora here]. ...m/corpora/jam/uzb.txt | apertium -d . uzb-kaa) -ref ../../../data4apertium/corpora/jam/kaa.txt
    5 KB (515 words) - 14:34, 1 September 2019
  • === Obtaining corpora (and getAlignmentWithText.pl) === The corpora need to be untarred, and inserted into a new, common directory.
    7 KB (973 words) - 02:52, 20 May 2021
  • ===Corpora === * [http://www.elra.info/en/catalogues/free-resources/nepali-corpora/ ''"Nepali"'']
    8 KB (948 words) - 19:59, 30 December 2017
  • Automatic shallow-transfer rules generation from parallel corpora ...in statistical machine translation, that have been extracted from parallel corpora and extended with a set of restrictions controlling their application.
    4 KB (525 words) - 19:21, 17 September 2009
  • ~/source/corpora/lm/en.blm europarl.en-es.es.multi-trimmed -f > europarl.en-es.es.annotated MODEL=/home/philip/Apertium/corpora/language-models/en/setimes.en.5.blm
    12 KB (1,634 words) - 18:26, 26 September 2016
  • ...e project was a success. All the goals have been achieved: the creation of corpora in LSC; Italian monodix: apertium-srd-srd.dix: 51,743 words; apertium-ita-i ...rned to use markup languages (XML and HTML) for the creation of linguistic corpora. At present, I am enrolled in a Master’s Degree in Translation of specialized tex
    21 KB (3,171 words) - 14:34, 3 April 2017
  • ===Corpora=== * [https://korpora.zim.uni-duisburg-essen.de/Limas/ Corpora from Limas z.Hd. Prof. Dr. Bernhard Schröder Universität Duisburg-Essen,
    8 KB (900 words) - 10:15, 4 December 2018
  • == Corpora and Coverage == Our main corpora consisted of [https://www.rfa.org/uyghur/ RFA], [http://uy.ts.cn/ Tanritor]
    5 KB (607 words) - 13:25, 12 August 2018
  • == Corpora == ...8ba2a9c0e50bc885bfad3bfbff3b4afbd.pdf Building Open Javanese and Sundanese Corpora for Multilingual Text-to-Speech]
    7 KB (881 words) - 13:11, 12 December 2018
  • == Corpora and Coverage == ...he help of mentors on Kipchak languages. Most frequent unknown tokens from corpora of each language (mostly consisting of Wikipedia entries, Bible and Quran)
    7 KB (798 words) - 18:30, 26 August 2019
  • Once a transducer has ~80% coverage on a range of medium-large corpora we can say it is "working". Over 90% and it can be considered to be "produc
    10 KB (1,263 words) - 06:04, 23 December 2014
  • == Corpora == * wikipage: <section begin=azadliq2012-wikipage />RFERL corpora<section end=azadliq2012-wikipage />
    1,013 bytes (115 words) - 22:42, 12 August 2014
  • * Evaluate system on corpora === Various Potential Corpora ===
    10 KB (1,483 words) - 07:00, 14 August 2018
  • === Corpora === * [https://corpora.uni-leipzig.de/en?corpusId=ind_mixed_2013 Leipzig Corpora Collection - Indonesian]
    5 KB (629 words) - 13:08, 21 December 2019
  • == Corpora, sets and alignment == The parallel corpora for the Macedonian - English pair, a total of 207,778 parallel sentences, c
    5 KB (620 words) - 12:21, 27 July 2012
  • ===Corpora===
    2 KB (172 words) - 17:09, 27 March 2017
  • ...ine. Rules can be manually written, or learnt from monolingual or parallel corpora. {{main|Learning rules from parallel and non-parallel corpora}}
    19 KB (2,820 words) - 15:26, 11 April 2023
  • * a parallel corpus (see [[Corpora]]) We will work through the example using [[Corpora|EuroParl]] and Apertium's English to Spanish pair.
    9 KB (1,445 words) - 14:05, 7 October 2014
  • ...language model for the target language in order to create pseudo-parallel corpora, and use them in the same way as parallel ones. IRSTLM is a tool for building n-gram language models from corpora. It supports different smoothing and interpolation methods, including Writt
    3 KB (364 words) - 23:25, 23 August 2012
  • $ xzcat ~/corpora/nob/*ntb*.xz | head -100000 | apertium -d . nob-nno_e > 2019-09-30.before $ xzcat ~/corpora/nob/*ntb*.xz | head -100000 | apertium -d . nob-nno_e > 2019-09-30.after
    2 KB (327 words) - 08:02, 1 October 2019
  • Once a transducer has ~80% coverage on a range of medium-large corpora we can say it is "working". Over 90% and it can be considered to be "produc
    22 KB (2,532 words) - 11:36, 30 July 2018
  • ...like the page on [[Aromanian]]—i.e., all available dictionaries, grammars, corpora, machine translators, etc., print or digital, where available, whether Free
    68 KB (10,323 words) - 15:37, 25 October 2014
  • ...exical and contrastive analysis, the creation of corpora made up of texts written in the LSC variant, extracted from on-l ...Logudorese-Italian vocabulary by Mario Casu and the in-depth analysis of the parallel corpora, which allowed us to understand which, case by case, was the ma
    13 KB (1,910 words) - 11:34, 23 August 2016
  • * Corpus testvoc = apertium-tat-rus/testvoc/corpus/trimmed-coverage.sh. Corpora can be found in the turkiccorpora repository. |colspan="2" rowspan="4"| Corpus testvoc clean on all of the available corpora ||rowspan="4"| ||rowspan="4" colspan="2" style="text-align: center"| ✗||r
    8 KB (1,006 words) - 12:48, 9 March 2018
  • or adapting the software to fit your needs. Existing free (GPL) data and corpora easily reusable to feed Apertium's dictionaries are also welcome. ...articular needs. Making free (GPL) data and corpora available that can be reused to improve the dictionaries of Aperti
    26 KB (3,122 words) - 06:25, 27 May 2021
  • === Annotated corpora ===
    3 KB (241 words) - 20:44, 9 September 2020
  • ===Application for "Interface for creating tagged corpora" GSOC 2013===
    2 KB (200 words) - 08:21, 13 January 2015
  • ...pertium-kaz/stats|~{{:apertium-kaz/stats/average}}%]] coverage over random corpora ...pertium-tat/stats|~{{:apertium-tat/stats/average}}%]] coverage over random corpora
    4 KB (586 words) - 01:53, 10 March 2018
  • ===Corpora=== * [http://corpora.uni-leipzig.de/en?corpusId=hin_news_2011 Hindi News Corpus]
    4 KB (557 words) - 05:45, 25 August 2021
  • ...aled and combined. The formulas for combining them can be learnt from gold corpora unsupervisedly. * gold-standard tagged corpora and
    5 KB (816 words) - 02:32, 13 February 2018
  • == Corpora ==
    2 KB (242 words) - 19:49, 3 January 2018
  • === Corpora ===
    7 KB (943 words) - 20:51, 6 September 2018
  • == Corpora ==
    4 KB (479 words) - 02:06, 28 February 2020
  • *Trainer on_fly working and training corpora '''Done''' ***Now the trainer is able to train corpora using just the keyboard, with a friendly user interface, cleaned corpus to
    12 KB (1,602 words) - 15:47, 10 October 2013
  • == Corpora ==
    2 KB (286 words) - 10:51, 4 June 2017
  • ...do some of the tests like generation testing or coverage testing, we need corpora, right? Have no fear, for `aq-wikicrp` is here! Let us get a Maltese wikipe ...t you'd expect, tests the dictionary for coverage. Using our newly created corpora, we can test the coverage! Feel free to use either one, but be consistent;
    12 KB (1,931 words) - 17:06, 24 October 2018
  • === Corpora ===
    773 bytes (75 words) - 19:17, 8 June 2014
  • === Corpora ===
    2 KB (302 words) - 16:23, 26 December 2017
  • == Corpora ==
    2 KB (278 words) - 00:24, 15 June 2021
  • == Corpora ==
    2 KB (290 words) - 02:07, 24 July 2019
  • == Corpora ==
    1 KB (158 words) - 03:34, 13 July 2021
  • ...tion quality. This will involve improving coverage to 95-98% on a range of corpora and decreasing word error rate by 30-50%. For example if the current word e ...cial languages of EU? : Then this task will be hard. Pairs which have huge corpora of parallel texts, like the 24 official EU languages or the 3 EU working la
    2 KB (383 words) - 19:46, 2 March 2023
  • === Corpora ===
    4 KB (538 words) - 02:40, 27 December 2016
  • === Corpora ===
    3 KB (390 words) - 09:39, 27 December 2017
  • == Corpora ==
    1 KB (146 words) - 20:16, 24 March 2020
  • ===Corpora===
    8 KB (1,143 words) - 18:45, 11 August 2015
  • * http://permalink.gmane.org/gmane.science.linguistics.corpora/22281 Arabic names from dbpedia ===Corpora===
    3 KB (437 words) - 10:23, 21 November 2021
  • === Corpora ===
    2 KB (194 words) - 04:52, 31 December 2017
  • == Corpora ==
    4 KB (440 words) - 21:41, 15 December 2019
  • == Corpora ==
    2 KB (272 words) - 21:51, 15 December 2019
  • == Corpora ==
    2 KB (213 words) - 17:55, 16 December 2017
  • == Corpora ==
    2 KB (250 words) - 16:26, 11 April 2015
  • == Corpora ==
    3 KB (342 words) - 21:33, 15 December 2019
  • == Corpora ==
    867 bytes (90 words) - 20:14, 24 March 2020
  • ...r the lexical selection and contrastive analysis, creating corpora consisting of texts written in the LSC variant, taken from online magazines, was providential ...io Casu's Logudorese-Italian vocabulary and the in-depth analysis of parallel corpora, which allowed us to understand what, case by case, the greatest number
    7 KB (1,110 words) - 11:34, 23 August 2016
  • == Corpora ==
    3 KB (367 words) - 06:16, 1 October 2021
  • ...Pirinen, Jonathan Washington (2015). "Finite-state morphologies and text corpora as resources for improving morphological descriptions". [https://sites.goog
    13 KB (1,710 words) - 20:32, 30 August 2018
  • == Corpora ==
    1 KB (173 words) - 06:04, 16 December 2014
  • == Corpora ==
    1 KB (135 words) - 06:03, 16 December 2014
  • == Corpora ==
    2 KB (199 words) - 06:51, 6 July 2018
  • == Corpora ==
    1 KB (173 words) - 06:03, 16 December 2014
  • == Corpora ==
    1 KB (177 words) - 06:01, 16 December 2014
  • == Corpora ==
    2 KB (211 words) - 06:02, 16 December 2014
  • == Corpora ==
    1 KB (173 words) - 06:06, 16 December 2014
  • ...is a program for aligning words and sequences of words in sentence-aligned corpora. If you have a parallel corpus you can use GIZA++ to make bilingual dictionar *[[Corpora]]
    4 KB (589 words) - 11:51, 29 April 2015
  • == Corpora ==
    417 bytes (42 words) - 20:17, 24 March 2020
  • == Corpora ==
    2 KB (246 words) - 21:49, 15 December 2019
  • == Corpora ==
    1 KB (154 words) - 06:02, 16 December 2014
  • ==Parallel corpora==
    766 bytes (87 words) - 08:07, 20 January 2009
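Several of the results above quote naive coverage bars for a transducer: roughly 80% on a range of medium-large corpora to count as "working", over 90% for the next level up, and, on another page, over 60% plus at least 2500 stems to be minimally usable. The sketch below shows one way such a figure could be computed. It assumes the analyser output is in Apertium's stream format, where an unknown word appears as `^word/*word$`, and it ignores escaped `^`, `$` and `/` characters. The function names and band labels are hypothetical and not part of any Apertium tool.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
# Simplifying assumption: no escaped '^', '$' or '/' inside a unit.
LU_PATTERN = re.compile(r"\^([^$]*)\$")


def naive_coverage(stream: str) -> float:
    """Return the share of lexical units that received at least one analysis.

    Unknown words are written as ^word/*word$, so a unit whose first
    analysis starts with '*' is counted as unknown.
    """
    total = 0
    known = 0
    for match in LU_PATTERN.finditer(stream):
        parts = match.group(1).split("/")
        total += 1
        # parts[0] is the surface form; parts[1:] are the analyses.
        if len(parts) > 1 and not parts[1].startswith("*"):
            known += 1
    return known / total if total else 0.0


def coverage_band(coverage: float) -> str:
    """Map a coverage ratio onto the rough bands quoted in the results (hypothetical labels)."""
    if coverage > 0.90:
        return "over 90%"
    if coverage >= 0.80:
        return "working (~80%)"
    if coverage > 0.60:
        return "minimally usable (>60%, also needs >= 2500 stems)"
    return "below 60%"


if __name__ == "__main__":
    sample = "^the/the<det><def>$ ^cat/cat<n><sg>$ ^blorp/*blorp$ ^sat/sit<vblex><past>$"
    cov = naive_coverage(sample)
    # 3 of 4 units are analysed, so this prints 75.0% and the >60% band.
    print(f"{cov:.1%}", coverage_band(cov))
```

Counting per token rather than per distinct word form keeps frequent words weighted by how often they occur, which matches the idea of measuring coverage over running corpus text.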
