Search results

Page title matches

  • ...f the big language data sets. You do not want to add to or modify language data, you want to use it. <span style="color:darkorange;">'''Data may be outdated'''</span>, use only for system assessment. See the main sec
    3 KB (445 words) - 12:38, 24 April 2017
  • ...]. The instructions are very different. This page is for existing language data. ...mar or HFST. If that happens, follow instructions under [[Install language data by compiling#Missing dependencies | missing dependencies]].
    5 KB (843 words) - 19:44, 2 March 2023

Page text matches

  • * an Apertium language pair Make a folder called data-en-es. We are going to keep all the generated files there.
    15 KB (2,206 words) - 13:58, 7 October 2014
  • * A language pair (e.g. apertium-br-fr) ** The language pair should have the following two modes:
    12 KB (1,634 words) - 18:26, 26 September 2016
  • Your language pair should be fully set up in the direction that you're training for, and * an Apertium language pair
    14 KB (2,181 words) - 19:01, 17 August 2018
  • * Train a target side language model (http://hermes.fbk.eu/people/bertoldi/teaching/lab_2010-2011/img/irst * The language pair must support the pretransfer and multi modes. See apertium-sh-mk/modes
    4 KB (503 words) - 19:01, 17 August 2018
  • This is a guide on how to add linguistic data directly to an existing language pair in Apertium. It gets a bit technical – if you just want to notify us ...t-of-speech tagger, which is in charge of the disambiguation of the source language text.
    50 KB (7,915 words) - 00:04, 10 March 2019
  • ...iew to the kind of data and resources that can be useful in building a new language pair for Apertium, and how to go about building them if they do not already Each Apertium language pair requires 3 dictionary files. For instance, for the English-Afrikaans
    13 KB (2,112 words) - 12:11, 26 May 2023
  • ...nguage (<code>SL</code>) will be trained using information from the target language (<code>TL</code>). ==Language pair==
    11 KB (1,470 words) - 08:16, 8 October 2014
  • ...converted or expanded in the [[incubator]]. Consider doing or improving a language pair (see [[incubator]], [[nursery]] and [[staging]] for pairs that need wo == Language Data ==
    23 KB (3,198 words) - 09:15, 4 March 2024
  • ...language pair XX-YY by adding 50 words to its vocabulary || Add words to language pair XX-YY and test that the new vocabulary works. [[/Add words|Read more]] ...language pair || Add or correct a structural transfer rule to an existing language pair and test that it works. [[/Add transfer rule|Read more]]... || [[User
    68 KB (10,323 words) - 15:37, 25 October 2014
  • ...of any language in Russia in areas smaller than the Federal Subjects. The data is in Russian and comes from the official 2010 Russian Census website. ===Complete guide to accessing the data===
    3 KB (561 words) - 17:58, 14 January 2018
  • ...f the big language data sets. You do not want to add to or modify language data, you want to use it. <span style="color:darkorange;">'''Data may be outdated'''</span>, use only for system assessment. See the main sec
    3 KB (445 words) - 12:38, 24 April 2017
  • ...rtium machine translation system from scratch. You can check the [[list of language pairs]] that have already been started. ...translation systems. The only thing you need to do is write the data. The data consists, on a basic level, of three dictionaries and a few rules (to deal
    19 KB (3,164 words) - 20:58, 2 April 2021
  • ...]. The instructions are very different. This page is for existing language data. ...mar or HFST. If that happens, follow instructions under [[Install language data by compiling#Missing dependencies | missing dependencies]].
    5 KB (843 words) - 19:44, 2 March 2023
  • ...on Ubuntu/Debian, using the Voikko plugins and Giellatekno/Divvun language data. ==Install the language data==
    4 KB (596 words) - 21:02, 2 April 2021
  • |title=Add recursive transfer support to a language pair that doesn't support it |description=Make a branch of an Apertium language pair that doesn't support recursive transfer and call it "recursive transfe
    32 KB (4,862 words) - 06:23, 5 December 2019
  • * https://apertium.org is the official site, and offers all the released language pairs ...Apertium platform, and also offers a simple web interface to the released language pairs
    6 KB (848 words) - 12:51, 1 April 2024
  • ...rtium.org page uses an installation which currently only runs ''released'' language pairs (also available from https://apertium.org/apy if you prefer). However $ curl -G --data "lang=kir&modes=morph&q=алдым" https://beta.apertium.org/apy/analyse
    37 KB (5,132 words) - 16:36, 5 June 2020
  • ...chine translation to understand the general meaning of the text in foreign language. The other approach is instead that of "dissemination" in which the MT is a ...(coding and decoding), data (linguistic data) and support tools to convert data and make them compatible with the engine. Even if most RBMT systems are pri
    21 KB (3,171 words) - 14:34, 3 April 2017
  • [[Target-language tagger training|In English]] ...t changez les variables <code>DATA</code> et <code>DIRECTION</code>. <code>DATA</code> doit pointer vers le répertoire contenant les données de la paire
    12 KB (1,625 words) - 08:20, 8 October 2014
  • ...epository scheme. (Originally, all monolingual language data was found in language pairs, meaning that there was a lot of duplication.) If you feel something ...hat constitutes a minimally-useful language package; generally, however, a language package should have over 60% coverage on a variety of corpora and should pr
    15 KB (1,783 words) - 22:33, 1 February 2019
  • ====When running configure script for language pair data==== ====Workaround when language pairs need updated configure.ac's====
    20 KB (3,153 words) - 08:13, 24 May 2019
  • DATA=/home/philip/Apertium/gsoc2013/monolingual/data ...atterns-frac-maxent.py $DATA/setimes.sh-mk.freq $DATA/setimes.sh-mk.ambig $DATA/setimes.sh-mk.annotated > events 2>ngrams
    3 KB (520 words) - 21:25, 14 February 2014
  • ...to be translated. For example, HTML tags must not be translated into another language, but only the text of the Web page. ...e same software are used for every language pair. It is the format of the data to be translated that determines which deformatter to use.
    58 KB (8,365 words) - 20:16, 26 June 2018
  • Owing to the different syntactic structure of the phrases in each language, some Although the details of the modules and the linguistic data is presented in
    58 KB (8,964 words) - 11:11, 14 May 2016
  • ...Iberian peninsula, but is now being used to translate between more distant language pairs. ...ngineering ([http://www.prompsit.com http://www.prompsit.com]). Linguistic data are being developed by Transducens, the Seminario
    26 KB (3,122 words) - 06:25, 27 May 2021
  • ...ngsnes (ed.) Bauta: Janne Bondi Johannessen in memoriam, Oslo Studies in Language 11(2), 2020. 489–501. (ISSN 1890-9639 / ISBN 978-82-91398-12-9) ...system/files/swj1419.pdf The apertium bilingual dictionaries on the web of data]. Semantic Web, 9(2), 231-240.
    33 KB (4,418 words) - 11:52, 29 December 2021
  • ...tion of each module with more precision. They may also introduce technical language which linguists and/or computer coders would use. The technical description References to 'xxx' and 'yyy' refer to a language code, for example 'en-es'; 'English' to 'Spanish'.
    29 KB (4,687 words) - 16:28, 5 June 2020
  • ...of any language in Russia in areas smaller than the Federal Subjects. The data is in Russian and comes from the official 2010 Russian Census website. Here are the steps to access the data:
    2 KB (296 words) - 21:12, 13 January 2018
  • ...//d3js.org/ D3.js] tool that depicts all Apertium [[list of language pairs|language pairs]] in an interactive graph initially developed sometime before the [[G === Updating language data by scraping ===
    5 KB (702 words) - 01:34, 9 December 2018
  • === Language pairs === .../github.com/apertium/apertium-urd-hin?files=1 apertium-urd-hin] Linguistic data for the Apertium Urdu-Hindi machine translator
    6 KB (806 words) - 00:45, 7 December 2018
  • '''Apertium New Language Pair HOWTO''' ...rtium machine translation system from scratch. You can check the [[list of language pairs]] that have already been started.
    36 KB (5,933 words) - 16:14, 22 February 2021
  • The number of language pairs in development for Apertium is increasing, and so is the complexity o language pairs. With better tools, more people will be able to develop language pairs.
    29 KB (4,382 words) - 07:53, 6 October 2019
  • ...he implementation of the algorithms must be free/open-source, but also the data themselves. Nowadays, there are many machine translation packages of this t ...morphologically rich languages, which even with large corpora suffer from data sparseness.
    6 KB (905 words) - 17:26, 18 October 2010
  • ...-supervised.make this one] from en-eo. You will need modify it to fit your language pair. This usually means editing the first few lines. ===Tagger data directory===
    3 KB (537 words) - 13:44, 18 June 2014
  • |Language You will need to install NLTK and NLTK data. Unfortunately, they both only support Python versions 2.6-2.7. If you are
    14 KB (2,232 words) - 12:51, 26 September 2018
  • ...uide on how to use a development version of Apertium to make a change in a language pair. ...ou should try this to make sure things work before you move on to whatever language pair you plan on working on.
    10 KB (1,626 words) - 17:46, 13 January 2020
  • ...http://wiki.apertium.org/wiki/Mandarin_Chinese#In_Apertium some linguistic data in Apertium]. ...fers to the most commonly spoken form of Chinese that is the sole official language of China and Taiwan. It is also known as Putonghua or Standard Chinese ([[W
    16 KB (2,148 words) - 03:28, 16 December 2015
  • ...mpire, as did all Romance languages. There are currently 4 released French language pairs ...the sixth most spoken language in the world and is the second most studied language worldwide.
    15 KB (2,081 words) - 07:14, 12 August 2020
  • ...MT based on corpora: adding new languages is very easy. To create a new language pair, in fact, it is not necessary to include corpora with millions of word ...airs can be added by creating dictionaries and rules containing linguistic data in XML format.
    15 KB (2,339 words) - 00:41, 4 June 2018
  • ...ind that are incorrectly translated, to getting involved in creating a new language pair or programming on tools or user interfaces. Here are some question fre Our language agnostic tools are native and written in [https://en.wikipedia.org/wiki/C++
    7 KB (1,139 words) - 06:27, 27 May 2021
  • ...are basically for Anel, Aizhan and Assem who have started to develop this language pair... And Aida too... === Download apertium, lttoolbox and eng-kaz data from SVN ===
    20 KB (2,856 words) - 06:26, 27 May 2021
  • ...ll these language pairs. This means that the data can be re-used by other language projects (e.g. in developing spelling or grammar checkers, thesauri, etc). This project was accepted as part of our "adopt a language pair" idea
    12 KB (1,917 words) - 15:54, 12 September 2009
  • *'''langpair''': language pair to use for translation curl -G --data "langpair=eng|spa&q=run" http://localhost:2737/dictionaryLookup
    5 KB (712 words) - 21:27, 16 August 2016
  • ...appear at the beginning of a sentence. The unique thing about the Persian language, though, is that it uses prepositions, which is quite uncommon in many SOV l ...designed a Two-sided morphology analyst of nouns and adjectives in Persian language, using Xerox Finite State Technology as giving input word (adjective or nou
    16 KB (2,597 words) - 20:58, 12 January 2013
  • * Apertium language pairs .../engine of Apertium installed (including the requirement lttoolbox, but no language pairs yet).
    9 KB (1,367 words) - 09:17, 26 May 2021
  • ...of the main five data files in any language pair (see also: [[Apertium New Language Pair HOWTO]]). ....dix'' where ''apertium-A-B'' is the name of the [[List of language pairs| language pair]]. For example file ''apertium-af-nl.af-nl.dix'' is the bilingual dict
    7 KB (1,244 words) - 16:41, 17 March 2018
  • ...getting new contributors to Apertium and to helping spread our passion for language technology. ...of other things, live in our '''[[subversion|svn repo]]'''. The language data is found in the following places:
    7 KB (1,091 words) - 19:54, 12 April 2021
  • ...olving the antecedent of the anaphors in text becomes essential in several language pairs. ...ge it to the correct anaphor''' using a macro in the transfer rules of the language pair. (t1x)
    20 KB (3,107 words) - 21:13, 24 June 2022
  • ...nders, especially for Indian Languages because we still do not have enough data ...oreign languages. I am especially interested in MT systems where the source language is English and the target languages are Indian Languages. It is impossible
    6 KB (923 words) - 17:57, 3 April 2010
  • First, make a directory called <code><lang>-tagger-data</code>. Put your corpus into there with a name like <code><lang>.crp.txt</c ...cifies how to generate the probability file. You can grab one from another language package. For <code>apertium-en-af</code> I took the Makefile from <code>ape
    7 KB (1,177 words) - 08:34, 8 October 2014
  • '''Track:''' Data Science Dynamic Language Interpreter implementation
    8 KB (1,094 words) - 13:10, 14 April 2019
  • ==Install language module== A language module supporting spelling may be installed, either from our repository, or
    3 KB (387 words) - 12:21, 26 September 2016
  • ...ion of machine translation. The tasks consist of sentences in the original language, reference translation with keywords omitted and the machine translation of ...various { gap } in order to discover phenomena and patterns in the natural language.
    9 KB (1,368 words) - 09:04, 23 April 2015
  • ...duce translations which are less fluent, but more preserving of the source language meaning. ...er and number between a determiner and head noun will remain in the target language output.
    12 KB (1,464 words) - 12:00, 31 January 2012
  • ...duce translations which are less fluent, but more preserving of the source language meaning. ...er and number between a determiner and head noun will remain in the target language output.
    11 KB (1,519 words) - 06:51, 11 May 2013
  • ...duce translations which are less fluent, but more preserving of the source language meaning. ...er and number between a determiner and head noun will remain in the target language output.
    11 KB (1,519 words) - 18:27, 16 October 2015
  • More convincing if you have a language pair on the computer somewhere :) ...this should work for both packaged and compiled Apertium. Without language data you can't see a translation, but you can see the help. Try,
    2 KB (368 words) - 06:02, 24 April 2017
  • ...probably try this to make sure things work before you move on to whatever language pair you plan on working on. Note that some existing language pairs have external dependencies, like HFST or Constraint Grammar. The [[In
    10 KB (1,715 words) - 12:29, 28 May 2018
  • ...tended to show how you can make an "indirect" contribution, by documenting language resources, helping us to build bilingual test sets, translating, promoting, ...first language, and translate them to the other. A translation in a third language may be useful in enlisting help, but is not required.
    9 KB (1,494 words) - 05:58, 18 March 2015
  • ...ed translation, morphological analysis, natural language processing, human language technologies ...Spanish–Catalan) but which has been expanded to deal with more divergent language pairs (such as English-Catalan and even Basque→English). The platform pro
    10 KB (1,500 words) - 16:23, 18 February 2016
  • '''apertium-get''' is a little script to fetch and compile language data, with monolingual dependencies, from Github. ...d and compiled by just going to the directory where you want your language data to be, and running
    2 KB (317 words) - 20:45, 23 March 2019
  • ...probably just search for, tick off and install Apertium and your favorite language pairs in Synaptic. There's a friendly [https://help.ubuntu.com/community/Sy Step 2: '''Download apertium, lttoolbox and language pairs from SVN.'''
    3 KB (475 words) - 16:28, 27 April 2017
  • ==== Data preparation ==== There were three attempts to extract postediting operations for each language pair: with threshold = 0.8 and -m, -M = (1, 3).
    7 KB (1,033 words) - 15:27, 15 August 2018
  • <li>- 4: preprocessing : dictionary data needs some changes to be used in a graph, this step prepares it for further ...recommends what languages will be the most efficient to enrich particular language pair</li>
    19 KB (2,541 words) - 15:44, 12 August 2018
  • ...d was exposed to different languages. This led to me being fascinated with language translation and I wanted to contribute to help in making communication easi I am going to work on “ Adopt an unreleased language pair: Hindi - Telugu”. I want to get the pair released in both the direct
    9 KB (1,391 words) - 16:41, 31 March 2020
  • == Language data packages == If you've installed tools with install-nightly.sh, you can install language data with
    4 KB (665 words) - 11:57, 18 November 2022
  • ...um project is a project which works on open-source machine translation and language technology. We try and focus our efforts on lesser-resourced and marginalis ...versitat d'Alacant] (Alacant, Spain) and [http://www.prompsit.com Prompsit Language Engineering].
    10 KB (1,543 words) - 19:50, 12 April 2021
  • ...f language pairs that may be used to infer new entries for existing or new language pairs using graphs. ...a graph and relevant information is stated about them. The cloud of linked data is intended to be navigated by software agents primarily. In the case of Ap
    3 KB (452 words) - 19:50, 24 March 2020
  • ...oject goal is to create a machine translation package for Sicilian-Spanish language pair on the base of Apertium’s machine translation system. This project i ...he Sicilian dictionary was the abundance of spelling forms in the Sicilian language. For instance, one Sicilian verb with the meaning 'to join' can have the fo
    9 KB (1,370 words) - 13:58, 23 August 2016
  • ...language particularly suitable for various reasons. First, because it is a language in process of standardization, so both the linguistic resources (written do ...he near future, it will be possible to operate in the translation of other language pairs as Sardinian-Catalan and Sardinian-Spanish.
    7 KB (1,110 words) - 11:34, 23 August 2016
  • ...declarative language. A good intro would be to look through [[Apertium New Language Pair HOWTO]], see also [[Contributing to an existing pair]]. If the pair ha #* If there is no translation, translate it into the languages of your language pair first.
    6 KB (1,024 words) - 15:22, 20 April 2021
  • ...rs independent free-software developers. There are currently 40 published language pairs within the project (including a number of "firsts" — for example Sp natural language processing, machine translation, grammar, python, c++, linguistics, languag
    7 KB (1,111 words) - 10:10, 15 November 2015
  • ==Install language module== * To install Kazakh language module, first get it
    4 KB (492 words) - 02:54, 10 March 2018
  • You can replace cy-en by different language pair. For the list of language pairs go [http://wiki.apertium.org/wiki/List_of_language_pairs#Trunk_.28rel === Install language-pair data ===
    5 KB (808 words) - 02:48, 9 March 2018
  • 1. All needed data for North Sami, Kurmanji, Breton, Kazakh and English was prepared: there ar ...Also the testpack for two language pairs was built: it contains all needed data for sme-nob and kmr-eng, the labeller and installation script.
    5 KB (764 words) - 01:40, 8 March 2018
  • #* If you can't understand the language the website is written in, ask for help in IRC or use a translator and look ...er when calling <code>Writer()</code>. For example if we want to write the data every 30 seconds call <code>Writer(30)</code>.</li>
    14 KB (2,389 words) - 05:20, 29 March 2019
  • ...family of some three dozen related languages descended from a Proto-Uralic language and spoken by more than 25 million people throughout Europe and Northern As ...ter plan involves generating independent finite-state transducers for each language, and then making individual dictionaries and transfer rules for every pair.
    22 KB (2,520 words) - 23:09, 22 December 2014
  • ...e Summer of Code 2018. It also includes information on the upgrade of four language pairs which was carried out during the same period. For a more detailed wor ...tem and develop it to bring it to release quality. In addition, four other language pairs have been upgraded to the monolingual package system to ease future d
    7 KB (1,071 words) - 10:48, 14 August 2018
  • ...l be available. For various reasons, the author has successfully developed language pairs using public repository versions of Apertium core. ...tes and Apertium tools. You also get, for optional install; release-level language pairs, service providers, constraint grammar code, and more. All under pack
    6 KB (1,006 words) - 18:26, 27 April 2021
  • ...m project develops a free/open-source platform for machine translation and language technology. We try to focus our efforts on lesser-resourced and marginalise ...ped around the world, largely in universities and companies (e.g. Prompsit Language Engineering), but also independent free-software developers play a huge rol
    13 KB (2,013 words) - 12:21, 20 June 2019
  • ...m project develops a free/open-source platform for machine translation and language technology. We try and focus our efforts on lesser-resourced and marginalis ...ped around the world, largely in universities and companies (e.g. Prompsit Language Engineering), but also independent free-software developers play a huge rol
    11 KB (1,802 words) - 19:51, 12 April 2021
  • ===Download and compile data=== ...</code> and <code>apertium-is-en</code>. You can find others at: [[list of language pairs]] and [[list of dictionaries]].
    4 KB (647 words) - 07:45, 8 October 2014
  • ...) constitute a group of related languages and a branch of the Afro-Asiatic language family. Spoken by more than 470 million people throughout North Africa and ...ter plan involves generating independent finite-state transducers for each language, and then making individual dictionaries and transfer rules for every pair.
    20 KB (2,336 words) - 18:10, 14 April 2015
  • ...dictionary for the pair X→Y. Below is listed development progress for each language's transducers and dictionary pairs. !rowspan=2| Language
    18 KB (2,312 words) - 18:25, 18 September 2016
  • == Improving language pairs by mining MediaWiki Content Translation postedits == ...and bidix entries to improve the performance of an Apertium language pair. Data is available from Wikimedia content translation through an [API https://www
    3 KB (383 words) - 19:56, 24 March 2020
  • ...language, as Apertium offers the only machine translation system for this language pair. The idea is to make Occitan output easier to postedit and French outp ...guage data], [https://github.com/apertium/apertium-fra the French language data], and [https://github.com/apertium/apertium-oci-fra the Apertium Occitan-F
    2 KB (213 words) - 19:48, 24 March 2020
  • === Altai Language Resources === Crúbadán language data for Southern Altai. Kevin Scannell. 2015. The Crúbadán Project. oai:cruba
    2 KB (217 words) - 06:57, 5 December 2017
  • ...in some cases data or tools from Freeling could be useful to apertium, and data from apertium could be useful to Freeling. Also, to install the data, I had to change the lines in freeling/data/Makefile.am that looked like
    5 KB (720 words) - 02:20, 10 March 2018
  • ...Everything in Apertium is free/open source: engine, data for more than 29 language pairs and tools to translate at a speed of more than 20,000 words per secon === Useful data ===
    1 KB (175 words) - 14:19, 25 July 2012
  • (in this example, I use eng as language resp. eng-deu as pair) the file ./eng-tagger-data/eng.dic for some reasons is empty (has a file size of 0).
    1 KB (165 words) - 14:16, 28 August 2016
  • ...ter plan involves generating independent finite-state transducers for each language, and then making individual dictionaries and transfer rules for every pair. ...dictionary for the pair X→Y. Below is listed development progress for each language's transducers and dictionary pairs.
    22 KB (2,532 words) - 11:36, 30 July 2018
  • ...e>[http://www.ethnologue.com/subgroups/dravidian dra]</code>) constitute a language family of about 70 languages spoken primarily in South Asia. The four most ...ter plan involves generating independent finite-state transducers for each language, and then making individual dictionaries and transfer rules for every pair.
    19 KB (2,201 words) - 09:21, 9 December 2019
  • ...ter plan involves generating independent finite-state transducers for each language, and then making individual dictionaries and transfer rules for every pair. ...ictionary for the pair X→Y. Below is listed development progress for each language's transducers and dictionary pairs.
    35 KB (3,577 words) - 15:24, 1 October 2021
  • ...y aimed at related-language pairs but expanded to deal with more divergent language pairs (such as English-Catalan). The platform provides * a language-independent machine translation engine
    776 bytes (114 words) - 19:07, 12 September 2018
  • ...on-months (four people, 18 months) to develop (both engine, and linguistic data). It was widely used, with thousands of requests per day. ...sh State to rewrite the code as open-source, and to convert the linguistic data. After one person year, the first version of the Spanish--Catalan translato
    12 KB (1,679 words) - 12:00, 31 January 2012
  • ...m project develops a free/open-source platform for machine translation and language technology. We try to focus our efforts on lesser-resourced and marginalise ...ped around the world, largely in universities and companies (e.g. Prompsit Language Engineering), but independent free-software developers also play a huge rol
    11 KB (1,680 words) - 12:22, 20 June 2019
  • ...on-months (four people, 18 months) to develop (both engine, and linguistic data). It was widely used, with thousands of requests per day. ...sh State to rewrite the code as open-source, and to convert the linguistic data. After one person year, the first version of the Spanish--Catalan translato
    12 KB (1,683 words) - 08:42, 10 May 2013
  • ...on-months (four people, 18 months) to develop (both engine, and linguistic data). It was widely used, with thousands of requests per day. ...sh State to rewrite the code as open-source, and to convert the linguistic data. After one person year, the first version of the Spanish--Catalan translato
    12 KB (1,683 words) - 11:00, 30 October 2015
  • ...ll the unigram models from “A set of open-source tools for Turkish natural language processing.”<ref name="trmorph-tools">http://coltekin.net/cagri/papers/tr ...tuff.”<ref name="prerequisites">[[Installation#If you want to add language data / do more advanced stuff]]</ref>
    20 KB (3,229 words) - 20:06, 12 March 2018
  • ...ter plan involves generating independent finite-state transducers for each language, and then making individual dictionaries and transfer rules for every pair. ...dictionary for the pair X→Y. Below is listed development progress for each language's transducers and dictionary pairs.
    10 KB (1,263 words) - 06:04, 23 December 2014
  • '''Language pair packages''' are standalone JARs that can be run independently as well Since JAR files are nothing but renamed ZIP files, you can easily edit language pair packages to fit your needs. Note that the packages are ready to be use
    11 KB (1,497 words) - 08:23, 7 April 2020
  • ...ogue.com/subgroups/germanic gem]) constitute a branch of the Indo-European language family spoken primarily in Europe, Anglo-America and Australasia. The commo ...ter plan involves generating independent finite-state transducers for each language, and then making individual dictionaries and transfer rules for every pair.
    32 KB (3,684 words) - 06:16, 28 December 2018
  • ...s one of the official languages of India, and has around 33 million native language speakers globally. .../ktpress.org.in/pdf/evolution_of_oriya_language.pdf The Evolution of Oriya Language and Script], ''Utkal University, Cuttack,''
    13 KB (1,770 words) - 06:56, 3 December 2017
  • Make a program which tests Apertium data files for suspicious or unrecommended constructs (likely to be bugs). Some ...x]] (dix) dictionary data, perhaps also transfer rules. The [[Apertium New Language Pair HOWTO]] should introduce most of the terminology and background you ne
    5 KB (789 words) - 10:36, 31 May 2016
  • ...cant] (Alacant, Spain); the other one is [http://www.prompsit.com Prompsit Language Engineering]. These two organizations are currently responsible for most of ...systems to translate less-closely related languages. We have 10 published language pairs, and three more currently in development.
    8 KB (1,255 words) - 19:50, 12 April 2021
  • ...the mnemonic (starting on the first column) must be kept unchanged from one language to another, while the string farther to the right is translated. By default, as for lttoolbox, apertium, and the language pairs, the installation is done in <code>/usr/local/bin</code> and <code>/u
    5 KB (789 words) - 12:16, 15 June 2018
  • ...r/>words !! data-sort-type="number"|WER !! data-sort-type="number"|PWER !! data-sort-type="number"|BLEU !! Reference / Notes ...forms that get some analysis, may give an indication of the maturity of a language pair.
    9 KB (1,233 words) - 09:10, 21 November 2021
  • ...Javanese language]]) is an [[Wikipedia:Austronesian languages|Austronesian language]] from Indonesia, spoken by the Javanese people from the central and easter Its language code is '''jv''' and '''jav'''.
    7 KB (881 words) - 13:11, 12 December 2018
  • ...e language pairs (which haven't been started or currently have very little data in Apertium) and write a usable version which provides intelligible output * If there is some data for the language pair in the Apertium Github server, check it out and install it.
    2 KB (238 words) - 13:45, 24 February 2023
  • ...guage pairs <code>aa-bb</code> and <code>bb-cc</code> it will create a new language pair for <code>aa-cc</code>. * '''sl-tl''': source language (sl) and target language (tl).
    5 KB (633 words) - 13:29, 6 October 2017
  • ...eof, and following that the development of a prototype pair for a minority language of Russia. Russia has a long history of work in machine translation, but ve ...h oil, as Tatarstan and Sakha) students with good knowledge of a minorised language seldom have a computer and/or access to the internet. That is the case at l
    18 KB (2,991 words) - 22:24, 3 August 2013
  • * Individual repos for each pair, language module, and tool (preserving all commit history). ...ch|talk]]) 13:04, 7 February 2018 (CET) To install apertium and one or two language pairs, you (just) have to follow few wiki pages and then, you get the only
    22 KB (3,325 words) - 14:06, 12 March 2018
  • ...D0%BE%D1%81%D1%81%D0%B8%D0%B8 Šupaškar Apertium Workshop]. Russian part of language pair was created using [[lttoolbox]], and all files, needed for Russian, we === Some data ===
    3 KB (299 words) - 06:39, 30 January 2012
  • ...tps://apertium.github.io/apertium-on-github/source-browser.html. It houses language pairs which haven't completely matured and are under work. ==Specific resources per language==
    10 KB (1,336 words) - 20:40, 11 December 2019
  • for every sentence s in the source language corpus: for every sentence in the source language corpus:
    6 KB (838 words) - 17:47, 25 July 2012
  • Apertium has some naming conventions for the various files used in language data: Files compiled when you do "make" in a language pair:
    890 bytes (126 words) - 10:10, 14 March 2017
  • {{see-also|Incubator|Specific resources per language}} ...Pair HOWTO|making a language pair]], feel free to make a new page for the language in question and paste it there. Stuff like basic dictionaries, paradigms, r
    1 KB (164 words) - 05:20, 4 December 2019
  • ;Get some data! Now try it on your own data.
    5 KB (822 words) - 19:43, 9 March 2020
  • == Data sources == * Often a word can be disambiguated using its translation in another language, for example the triple (estació, gare, station) defines a building meanin
    5 KB (949 words) - 15:27, 15 June 2020
  • ...t plan on working on the core C++ packages (but only want to work on / use language pairs), you can install all prerequisites with yum/zypper, using [[User:Tin For a list of available language pairs and other packages, see https://build.opensuse.org/project/show/home:
    1 KB (231 words) - 10:03, 12 January 2022
  • ...you have something, immediately try to invoke a tool. Without language data you can't see a translation, but you can see the help. Try, ...language data by compiling]]. Or, if your system has packaging, download a language package (but beware, a package manager may pull in an old package of Apertiu
    5 KB (821 words) - 02:55, 27 July 2022
  • I’m a sociolinguist working on language maintenance and shift. I'm very interested in creating resources for minori '''1.2 Bring a released language pair up to state-of-the-art quality''': I'd like to improve the pairs Catal
    16 KB (2,285 words) - 06:46, 12 April 2019
  • ...erator.<ref>Typically this goes for both translation directions, although a language pair only released for one direction might only be trimmed in that directio ...at when post-editing, the post-editor has to constantly look at the source language text (whereas an unknown word would be possible to translate there and then
    4 KB (679 words) - 16:06, 3 May 2020
  • ...ly most important one. This session will cover the question of why we need data consistency, what we mean by quality and how to perform an evaluation. The In contrast to many other types of systems for natural language processing — such as morphological analysers and part-of-speech taggers,
    18 KB (2,490 words) - 12:00, 31 January 2012
  • ...ly most important one. This session will cover the question of why we need data consistency, what we mean by quality and how to perform an evaluation. The In contrast to many other types of systems for natural language processing — such as morphological analysers and part-of-speech taggers,
    18 KB (2,493 words) - 08:39, 10 May 2013
  • ...eir buddies (both incoming and outgoing messages). If the user has set the language pair eng-spa (English → Spanish) for incoming messages from buddy1, th *'''/apertium_check''' Shows the current language pairs associated with the buddy whose conversation you issued the command o
    8 KB (1,263 words) - 02:18, 9 March 2018
  • ...ly most important one. This session will cover the question of why we need data consistency, what we mean by quality and how to perform an evaluation. The In contrast to many other types of systems for natural language processing — such as morphological analysers and part-of-speech taggers,
    18 KB (2,493 words) - 10:59, 30 October 2015
  • ...u can distinguish an element from an attribute and can recognise character data. If you want a quick recap, this should help: :<element attribute="value">character data</element>
    11 KB (1,851 words) - 07:42, 16 February 2015
  • ...t. It most likely won't let you in order to guarantee the integrity of the data. Morph testing isn't supported by the language we're using, but it is as simple to run as regression testing. One simply r
    12 KB (1,931 words) - 17:06, 24 October 2018
  • ...Machine Translation] - This looks interesting, 200K sentences of bilingual data collected, we should contact the authors to see if we can access it [https: ...eb interface [http://nmt.cloudtrans.org/ here], but unclear wrt details of data/evals [https://scholar.googleusercontent.com/scholar.bib?q=info:A6cMdf1SuHw
    10 KB (1,483 words) - 07:00, 14 August 2018
  • Websites referencing Apertium categorised by language of the website. News about Apertium categorised by language of report.
    13 KB (1,689 words) - 21:42, 28 February 2021
  • ...ium-init to bootstrap a new language pair (optionally with new monolingual data packages as well). ...is script in your working directory where you will be downloading language data. You can get the script from https://apertium.org/apertium-init
    5 KB (824 words) - 15:30, 20 April 2021
  • ...m project develops a free/open-source platform for machine translation and language technology. We try and focus our efforts on lesser-resourced and marginalis ...eloped around the world, both in universities and companies (e.g. Prompsit Language Engineering) and by a growing number of independent free-software developers.
    6 KB (1,057 words) - 15:34, 28 October 2013
  • !rowspan=2| Language ==Existing language pairs==
    5 KB (538 words) - 15:52, 11 April 2015
  • ...ipedia:Indonesian language]]) is an Austronesian language and the official language of Indonesia. Since it is a register of [[Malay]], it is also often general In [[Apertium]], there is a language pair of [[Indonesian and Malaysian]] already in the [[Trunk|trunk category]
    5 KB (629 words) - 13:08, 21 December 2019
  • | width=320 | '''[[Apertium New Language Pair HOWTO]]''' | [[Become a language pair developer for Apertium]]
    13 KB (1,601 words) - 23:31, 23 July 2021
  • If you're working on language data, <code>sudo</code> is pretty much only for running package managers like <c ...exception is <code>sudo make install</code>, but when working on language data you should never have to do this.
    856 bytes (144 words) - 12:52, 3 May 2018
  • ...rs independent free-software developers. There are currently 40 published language pairs within the project (including a number of "firsts" — for example Sp ...ommunication) often occurs at this age, and if we can show them that their language is useful, and other people care, and there is no barrier for its use in th
    6 KB (987 words) - 10:21, 7 November 2014
  • ...e official web site – it serves only the ''released'' (stable) versions of language pairs ** This is the official "beta" site – it serves the latest work in all language pairs (so things may work better, but also may have weird bugs). You can al
    3 KB (457 words) - 07:42, 18 June 2021
  • ...ecifies the parameters and data files specific to that language pair. Each language pair can contain a number of modes; most of these are used for debugging ea ...b server. We use apertium-nn-nb as an example, but it should work with any language pair; the modules lt-proc/cg-proc/apertium-{tagger,pretransfer,transfer,int
    13 KB (2,039 words) - 11:56, 3 June 2022
  • ...ding period &mdash; and for documentation. Anyone thinking of working on a language pair should make sure that they read about [[testvoc]] and other quality co ...all]] Apertium and a language pair; read through the [[:Category:HOWTO|new language pair HOWTO]]. This might even give you some more ideas!
    9 KB (1,509 words) - 23:51, 27 February 2023
  • ...thub. What this actually means is that you can set an apertium language or language pair on github to automatically build and test on each commit. You only nee This is an example for a monolingual data using hfst (from [apertium-fin]):
    2 KB (249 words) - 06:26, 27 May 2021
  • Apertium language data for Iraqi Turkmen. [[Category:Language data]]
    1 KB (144 words) - 20:07, 15 July 2021
  • ...temen kan maken. The only thing you have to do yourself is write the data. The data consists of 3 main parts: the dictionaries, and a few rules (woordv ...ems of the original language (source language='sl') or the target language (target language='tl') can be chosen and changed.
    36 KB (5,761 words) - 14:34, 4 December 2011
  • ...temen kan maken. The only thing you have to do yourself is write the data. The data consists of 3 main parts: the dictionaries, and a few rules (woordv ...ems of the original language (source language='sl') or the target language (target language='tl') can be chosen and changed.
    36 KB (5,767 words) - 07:07, 16 February 2015
  • ...textbook distinction in language, isn't it? When you start exploring real data the boundaries fade very fast and everything looks much more complicated.
    22 KB (2,150 words) - 20:21, 24 April 2013
  • ...statistical parser, which in turn can serve different purposes of natural language processing. For creating a good treebank, manual annotation and/or disambig ...interface supports working with the CoNLL-U and CG3 formats and converting the data between them. It also allows corpora to be either uploaded or pasted in pl
    6 KB (930 words) - 15:59, 29 August 2017
  • ...d of existing trained models. Successful tries are saved into new training data.<ref>https://static.googleusercontent.com/media/research.google.com/en//pub ...butions can also be found [https://github.com/tesseract-ocr/tesseract/wiki/Data-Files-Contributions here].
    2 KB (305 words) - 14:36, 28 October 2018
  • ...er]] or [[CG]] files. It creates fully working Makefiles and stub language data, so you can compile and test straight away (assuming you've [[Installation|
    744 bytes (108 words) - 20:38, 13 January 2021
  • | 64 || Apertium-tolk should give proper warning when no linguistic data is installed || 2008-03-31 || Wynand Winte ...rg/cgi-bin/bugzilla/index.cgi here]. Please feel free to report your bug in any language you are comfortable with.
    12 KB (1,254 words) - 22:08, 7 March 2018
  • | clip || - || N/A || part → value || Obtains the part in the only language there is (inter/post-chunk) and pushes the value onto the stack ...|| - || link-to || part, pos → value || Obtains the 'part' in source language in position 'pos' and pushes the 'value' onto the stack. An optional operan
    14 KB (2,020 words) - 13:58, 7 October 2014
  • While training can be done directly in the language directory, it is a better idea to train the tagger with copies of the files ...e the training directory (replace <code>lang</code> with the corresponding language code).
    4 KB (651 words) - 13:36, 23 August 2017
  • {{Language Kashmiri is an Indo-Aryan language spoken in the Kashmir Valley and regions around it that were historically a
    6 KB (811 words) - 10:42, 2 July 2018
  • ..., transfer rules, scripting, corpora. The objective is to make an Apertium language pair state-of-the-art, or close to state-of-the-art in terms of translation ...ge pair of your choice in Apertium and install it. (see [[Install language data by compiling]])
    2 KB (383 words) - 19:46, 2 March 2023
  • ...our language data directory (replacing "apertium-foo" for your monolingual data dir):
    725 bytes (111 words) - 09:24, 2 March 2016
  • By default, as for lttoolbox, apertium, and the language pairs, the installation is done in <code>/usr/local/bin</code> and <code>/u ...ium</code> command, there is the '''<code>-f</code>''' option to translate data produced in this format without having to call "by hand" a deformatter and
    5 KB (780 words) - 11:48, 15 June 2018
  • ...;13:00 || '''Practical''': Installing Apertium and creating a language pair ....sf.net/p/apertium/svn/branches/courses/helsinki_2013/slides/session7a.pdf Data consistency, quality] and [https://svn.code.sf.net/p/apertium/svn/branches/
    8 KB (720 words) - 15:18, 20 March 2015
  • # Most language pairs don't need to specify anything else for install-data-local: install-data-local: install-modes
    4 KB (612 words) - 13:09, 18 February 2015
  • This page contains data for CLD2 coverage. If need help to obtain CLD2 coverage of a certain language, contact [[User:Wei2912]].
    75 KB (7,440 words) - 17:12, 8 August 2014
  • Where LANGUAGE_PAIR is the language pair (e.g. en-eo) wget http://sunsite.unc.edu/pub/Linux/system/keyboards/console-data-1999.08.29.tar.gz
    2 KB (281 words) - 02:58, 9 March 2018
  • ** Select a language ** Use the Apertium morphological analyser to analyse the test data
    1 KB (213 words) - 21:13, 18 March 2019
  • ...is it possible to achieve pretty good results with a very small amount of data (as in the case of Breton) ...ad of the original syntax module in kmr-eng pipeline. The testpack for two language pairs was built. All code was cleaned up, some docstrings were written. Als
    6 KB (833 words) - 12:56, 22 August 2017
  • ...s, data, and other system resources with applications, software tools, and data of the Unix-like environment. Therefore it is possible to launch Windows ap Now you're ready to download and build language pairs and use them under Cygwin's shell.
    12 KB (1,883 words) - 22:06, 7 March 2018
  • If you want to work on Apertium language pairs or tools, some knowledge of the Unix shell / command-line scripting w ...hell/ shell scripting] and [https://hacker-tools.github.io/data-wrangling/ Data wrangling] are relevant and succinct
    746 bytes (101 words) - 09:20, 8 February 2019
  • * es-tagger-data directory: contains the data needed for the Spanish tagger (corpus, etc.) * ca-tagger-data directory: contains the data needed for the Catalan tagger (corpus, etc.)
    54 KB (8,480 words) - 18:55, 10 April 2017
  • .../presentation/d/1LBcBs3KdzfS7vl6Sxe0UtOMLpWNMM6ciGS_YPCnxTr0 Reading-bound data as inline secondary tags]", Tino Didriksen *** "Reading-bound data is best transported as inline secondary tags, proven both by practical expe
    3 KB (509 words) - 15:49, 2 July 2020
  • ** We can haz. Data is now checked in on Victorio at /langtech/trunk/words/dicts/algu, with a r ...ns Finnish and Northern Sámi. Ryan can contact them if it seems like their data would be of use.
    16 KB (2,457 words) - 08:19, 12 April 2017
  • ...m project develops a free/open-source platform for machine translation and language technology. We try and focus our efforts on lesser-resourced and marginalis ...ped around the world, largely in universities and companies (e.g. Prompsit Language Engineering), but also independent free-software developers play a huge rol
    3 KB (424 words) - 19:24, 29 October 2010
  • ...m project develops a free/open-source platform for machine translation and language technology. We try to focus our efforts on lesser-resourced and marginalise ...ped around the world, largely in universities and companies (e.g. Prompsit Language Engineering), but independent free-software developers also play a huge rol
    9 KB (1,376 words) - 15:24, 22 March 2013
  • ...ces for the involvement of Croatian researchers and developers in Apertium language pairs involving Croatian as part of the [http://cordis.europa.eu/projects/r ...for the more inclusive ISO-639-2 code hbs to be used to refer to it in all language pairs developed inside Apertium for components of this macrolanguage.
    6 KB (987 words) - 22:27, 3 August 2013
  • ** a language pair, ** the reference files for a language.
    8 KB (1,327 words) - 21:34, 17 February 2019
  • ...t plan on working on the core C++ packages (but only want to work on / use language pairs), you can install all prerequisites with apt-get, using [[User:Tino D # or, to get all dependencies for building a language from git:
    2 KB (311 words) - 21:05, 2 April 2021
  • ...nslation pairs as a service and provides '''translate''' and '''detect''' (language recognition) capabilities over an '''XML-RPC''' interface, as well as '''RE ...for discussion). It also manages a ''resource pool'' of e.g. language pair data, both (eagerly) pre-allocated and (lazily) allocated at need, up to a high
    13 KB (1,764 words) - 03:29, 6 November 2019
  • ...rossdics|crossdics]] package) to get cheap bilingual dictionaries from any language pair available in [http://www.omegawiki.org OmegaWiki] database. ...downloads/omegawiki-lexical.sql.gz download] the latest version of lexical data from the OmegaWiki database (see also [http://www.omegawiki.org/Help:Downlo
    2 KB (202 words) - 00:55, 24 January 2018
  • Different language-pair packages use different strategies to generate .dix dictionaries ([[mon ...t versions of a translator (for instance, for two different varieties of a language, such as Brazilian and European Portuguese) whose names could be ideally ti
    11 KB (1,733 words) - 08:24, 25 April 2016
  • * '''sl''': source language (for example, in morphological and bilingual dictionaries) * '''tl''': target language (for example, in bilingual dictionaries)
    8 KB (902 words) - 09:19, 6 October 2014
  • ...ictionary for languages A and C is built from dictionaries for A-B and B-C language pairs. (or some other Unicode language installed - I use eo.UTF-8) and run the tests again.
    8 KB (1,070 words) - 01:29, 26 October 2018
  • If you are the unlucky owner of a language pair where you must maintain the synthetic adjective tag (<sint>) in the bi -prepare attempts to detect and insert autoconcord data into the monodices,
    7 KB (1,185 words) - 08:39, 6 October 2014
  • !rowspan=3|System !!colspan=7|Language ...ives]/[words with a correct analysis from the morphological parser]). This data is also available in box plot form [https://frankier.github.io/apertium-tag
    16 KB (1,448 words) - 16:50, 22 August 2017
  • The purpose of this project is to allow Apertium language-pair developers to better translate "separable" or "discontiguous" multiwor * (for language developers: have the language-data writer write it explicitly in the .lsx file)
    1 KB (205 words) - 18:36, 15 November 2017
  • ...an, Portuguese, there is support for generating a particular standard of a language (e.g. Brazilian Portuguese, Valencian). The way this is done may need to be * Language specific sections of monodix files.
    3 KB (461 words) - 15:31, 26 September 2016
  • * you get to decide what kinds of crazy half-finished language pairs to serve (or you can just serve a few of the high-quality ones that y Now install APY and the language pairs you want:
    5 KB (653 words) - 21:00, 2 April 2021
  • ...ary is to model the rules that govern the internal structure of words in a language. ...o begin with, some terminology; if you are familiar with graphs (as in the data structure), this might help. A finite-state automaton can be visualised as
    15 KB (2,200 words) - 12:04, 6 October 2014
  • * you get to decide what kinds of crazy half-finished language pairs to serve (or you can just serve a few of the high-quality ones that y Now install APY and the language pairs you want:
    5 KB (640 words) - 21:02, 2 April 2021
  • ...ter], which has been trained for more than 100 languages using web crawled data. Details are in his paper linked below. You can try the system [http://l ...issue is to optimize smoothing of the statistical models on a language-by-language basis.
    2 KB (307 words) - 19:50, 24 March 2020
  • Choose a language pair. For this example, it will be Italian (it) and English (en). To use ...ing a few million lines of xml. It will refer frequently to s1 (the first language of the two in the filename jrc-lang1-lang2.xml, which is jrc-en-it.xml in t
    7 KB (973 words) - 02:52, 20 May 2021
  • This is a test of all .t1x files in all language pairs in http://apertium.svn.sourceforge.net/viewvc/apertium/trunk/. There Please find your favorite language pair below and check.
    67 KB (9,057 words) - 06:52, 24 September 2013
  • A corpus should be easily parsed by software that needs to get data from it. There is also metadata that should be stored in the corpus, and t * language of content (per article)!
    5 KB (813 words) - 00:08, 28 December 2011
  • The language pair seems to work OK in the sh→sl sense but not so well in sl→sh (appa Improving this language pair would be nice for the first milestone of the project Abu-MaTran (June
    2 KB (380 words) - 22:26, 3 August 2013
  • ** Learn shift/reduce using target-language information ? *: If a language uses CG, the rule SN -> @A→ @N would only match where CG mapped @A→ (an
    5 KB (788 words) - 10:50, 9 February 2015
  • ...se dictionaries where each node is a word in a language, and each arc is a language pair. For example like: http://i.imgur.com/SFOsRMv.png * Only one word per input language
    3 KB (487 words) - 00:02, 22 March 2018
  • == New Language or Pair Package == Import, push new branch data, push new upstream tag:
    8 KB (1,106 words) - 19:51, 26 April 2018
  • Bonus: use closely related language treebanks in UDPipe; transfer the lemmas, assume the POS tags remain the sa '''Week 6:''' stealing Apertium data
    4 KB (657 words) - 08:58, 3 April 2017
  • ...ioni possibili) and ensures that all of them have an equivalent in the ''Target Language''. The best result would be no errors at all in the Testvoc. ...Testvoc concern the verb "stare". We do not believe they are "real" errors, since they cannot be reproduced.
    13 KB (1,910 words) - 11:34, 23 August 2016
  • ...e ('lt-toolbox', 'apertium-lex-tools') is a collection of tools which pipe data one to another. You can use these tools individually. There are many instru However, to ease the use of the tools, Apertium language-builds pre-configure chains of tools into scripts. These pre-configured cha
    6 KB (992 words) - 17:25, 22 September 2016
  • The lack of documentation regarding the language pair, the monolingual dictionaries or even the tagger has made me put an ef ...r to create wikitables with a lot of information about transfer rules from data embedded into the rule files (T1X, T2X and T3X). There are other scripts th
    5 KB (887 words) - 22:24, 31 August 2017
  • ...modular, documented, open platform for machine translation and other human language processing tasks</li> <li>To favour the interchange and reuse of existing linguistic data.</li>
    8 KB (1,215 words) - 18:14, 3 March 2018
  • The Java port needs the C++ binaries for preparing/developing a language pair, i.a. to compile transfer files and train the tagger. ...ed Apertium JAR file, only dependent on JRE and an additional JAR file per language pair.
    9 KB (1,370 words) - 09:49, 7 April 2020
  • ...modular, documented, open platform for machine translation and other human language processing tasks</li> <li>To favour the interchange and reuse of existing linguistic data.</li>
    9 KB (1,356 words) - 18:34, 3 March 2018
  • ...[How to bootstrap a new pair]]. For existing pairs, see [[Install language data by compiling]],
    717 bytes (103 words) - 22:05, 7 March 2018
  • It requires internet permission to enable users to download language pairs (and developers to showcase their work from a phone). * language detection - for example using https://code.google.com/p/language-detection/
    3 KB (449 words) - 01:06, 4 June 2020
  • ...stem. Currently the transfer system becomes the main bottleneck for language pairs with complex transfer systems because of the XML processing associated ...he very moment the user inserts or deletes text. This allows further data mining on the edits to detect commonly modified structures in a given trans
    16 KB (2,571 words) - 12:21, 20 June 2019
  • ...lect Corpus and Lexicon. In Proceedings of the International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan, 2018.] :* A Parallel corpus of arz-ara-apc/ajp (2,994 sentences). The data was manually translated by professional translators. Sentences are collecte
    2 KB (192 words) - 11:11, 19 January 2022
  • ...modular, documented, open platform for machine translation and other human language processing tasks</li> <li>To favour the interchange and reuse of existing linguistic data.</li>
    8 KB (1,214 words) - 22:30, 3 August 2013
  • ! Language ..., the package 'spa' shown [https://github.com/tesseract-ocr/tesseract/wiki/Data-Files here<sup>3</sup>], to be able to identify by the app texts in Spanish
    3 KB (450 words) - 16:23, 10 December 2018
  • ...article/download/3355/1843 . I ([[User:Mlforcada|Mlforcada]]) believe this language is much easier to write; it should be upgraded and documented. The preproce [[Category:Ideas for Google Summer of Code|Plain-text formats for Apertium data]]
    2 KB (324 words) - 11:37, 16 February 2016
  • ...he parts about lttoolbox/apertium, just install the language pair/language data itself if you ran [https://apertium.projectjj.com/osx/install-release.sh in
    2 KB (355 words) - 19:36, 12 May 2019
  • If you are editing Apertium language data (e.g. [[dix]] and [[transfer]] files), you should use a real XML editor. Th
    5 KB (783 words) - 14:25, 29 December 2020
  • ...m_New_Language_Pair_HOWTO]] – using lt-comp, lt-proc etc. to test language data
    443 bytes (64 words) - 16:56, 27 April 2017
  • If you want to work on Apertium language data and/or tools, you most likely want to use the binaries from Tino Didriksens
    2 KB (279 words) - 20:52, 2 April 2021
  • Apertium has migrated all the language data, the core, and a few tools to [https://github.com/apertium GitHub]. Many to
    1 KB (215 words) - 04:45, 9 March 2018
  • ...şmak'' (if that is what you want), you should read the [[Minimal installation from SVN|check out the language data from SVN]] page and compile (still apertium/lttoolbox
    2 KB (313 words) - 21:02, 2 April 2021
  • * All the language data files:
    972 bytes (144 words) - 12:09, 26 September 2016
  • * Maintain consistency in the data present in the <r> tag in pardef entries. ...h modes.xml present in the same directory as the other files for the given language pair, this function checks and prompts in case a file defined in a program.
    9 KB (1,459 words) - 19:41, 15 May 2021
  • == Compiling the language pair == If you don't need to work on monolingual data use the nightly repos:
    1 KB (163 words) - 16:53, 28 May 2017
  • ...air setup nowadays is using transducers from Giellatekno and pair-specific data in Apertium. This is a tricky set up because there is a lot of machinery ar
    6 KB (984 words) - 17:56, 12 March 2016
  • * Prefer containers over home made data structures. It's going to make it impossible to build for language pair authors.
    5 KB (823 words) - 15:40, 26 September 2016
  • ...held back validation scripts for a few languages & give them reproducible language models ...Note that averaged here refers to averaging over time so that new training data isn’t given too much weight.
    2 KB (254 words) - 15:18, 13 June 2016
  • ...ima lemmas are linked to paradigms that allow us to describe how a given word inflects without writing out every individual ending. ...to see. In Serbo-Croatian this is videti. Serbo-Croatian is a null-subject language; this means that it doesn't typically use personal pronouns before the conj
    26 KB (4,259 words) - 07:00, 16 February 2015
  • Then, ''after'' reordering (for instance, into a Turkic-style language) to generate ''sister my Wales in lives'', ** I disagree. One of the key aspects of "my way" is that non-textual data between block tags are ''not'' sent through the translation chain at all, m
    9 KB (1,486 words) - 19:56, 24 March 2020
  • Metadixes are currently used in some language pairs, such as English-Catalan and Occitan-Catalan. linguistic data are compiled these dictionaries are pre-processed, so
    5 KB (744 words) - 08:26, 25 April 2016
  • '''Corpora and language data'''
    4 KB (570 words) - 18:43, 23 August 2016
  • ...dictionaries and transfer rules. The induction systems and open linguistic data can be used with the [[Apertium]] toolbox to build open-source MT systems. ...bes how to use ReTraTos to create a bilingual dictionary for your Apertium language pair. You will need:
    8 KB (1,253 words) - 09:42, 6 October 2014
  • Many language pairs in Apertium are unique, such as Breton-French, and many of them are u * Contact [User:mlforcada Mikel L. Forcada] to obtain the data cited in the paper.
    2 KB (238 words) - 19:49, 24 March 2020
  • Other deformatters and reformatters were written directly in C or C++ language without using XML files. So, they don't follow format specification descri ...ated from a format specification in XML. Rules for format, like linguistic data, are specified in XML, and they contain regular expressions with flex synta
    13 KB (1,781 words) - 09:49, 6 October 2014