Search results


Page title matches

Install language data using packaging
...f the big language data sets. You do not want to add to or modify language data, you want to use it. '''Data may be outdated''', use only for system assessment. See the main sec

3 KB (445 words) - 12:38, 24 April 2017
Install language data by compiling
...]. The instructions are very different. This page is for existing language data. ...mar or HFST. If that happens, follow instructions under [[Install language data by compiling#Missing dependencies | missing dependencies]].

5 KB (843 words) - 19:44, 2 March 2023

Page text matches

Generating lexical-selection rules from a parallel corpus
* an Apertium language pair Make a folder called data-en-es. We are going to keep all the generated files there.

15 KB (2,206 words) - 13:58, 7 October 2014
Generating lexical-selection rules from monolingual corpora
* A language pair (e.g. apertium-br-fr) ** The language pair should have the following two modes:

12 KB (1,634 words) - 18:26, 26 September 2016
Learning rules from parallel and non-parallel corpora
Your language pair should be fully set up in the direction that you're training for, and * an Apertium language pair

14 KB (2,181 words) - 19:01, 17 August 2018
Running the monolingual rule learning
* Train a target side language model (http://hermes.fbk.eu/people/bertoldi/teaching/lab_2010-2011/img/irst * The language pair must support the pretransfer and multi modes. See apertium-sh-mk/modes

4 KB (503 words) - 19:01, 17 August 2018
Contributing to an existing pair
This is a guide on how to add linguistic data directly to an existing language pair in Apertium. It gets a bit technical – if you just want to notify us ...t-of-speech tagger, which is in charge of the disambiguation of the source language text.

50 KB (7,915 words) - 00:04, 10 March 2019
Using linguistic resources
...iew to the kind of data and resources that can be useful in building a new language pair for Apertium, and how to go about building them if they do not already Each Apertium language pair requires 3 dictionary files. For instance, for the English-Afrikaans

13 KB (2,112 words) - 12:11, 26 May 2023
Target-language tagger training
...nguage (<code>SL</code>) will be trained using information from the target language (<code>TL</code>). ==Language pair==

11 KB (1,470 words) - 08:16, 8 October 2014
Ideas for Google Summer of Code
...converted or expanded in the [[incubator]]. Consider doing or improving a language pair (see [[incubator]], [[nursery]] and [[staging]] for pairs that need wo == Language Data ==

23 KB (3,198 words) - 09:15, 4 March 2024
Task ideas for Google Code-in (2013)
...language pair XX-YY by adding 50 words to its vocabulary || Add words to language pair XX-YY and test that the new vocabulary works. [[/Add words|Read more]] ...language pair || Add or correct a structural transfer rule to an existing language pair and test that it works. [[/Add transfer rule|Read more]]... || [[User

68 KB (10,323 words) - 15:37, 25 October 2014
Languages Of Russia
...of any language in Russia in areas smaller than the Federal Subjects. The data is in Russian and comes from the official 2010 Russian Census website. ===Complete guide to accessing the data===

3 KB (561 words) - 17:58, 14 January 2018
The quick and dirty guide to making a new language pair
...rtium machine translation system from scratch. You can check the [[list of language pairs]] that have already been started. ...translation systems. The only thing you need to do is write the data. The data consists, on a basic level, of three dictionaries and a few rules (to deal

19 KB (3,164 words) - 20:58, 2 April 2021
Using Giellatekno Divvun spellers with LibreOffice-Voikko on Debian
...on Ubuntu/Debian, using the Voikko plugins and Giellatekno/Divvun language data. ==Install the language data==

4 KB (596 words) - 21:02, 2 April 2021
Task ideas for Google Code-in
|title=Add recursive transfer support to a language pair that doesn't support it |description=Make a branch of an Apertium language pair that doesn't support recursive transfer and call it "recursive transfe

32 KB (4,862 words) - 06:23, 5 December 2019
Installation
* https://apertium.org is the official site, and offers all the released language pairs ...Apertium platform, and also offers a simple web interface to the released language pairs

6 KB (848 words) - 12:51, 1 April 2024
Apertium-apy
...rtium.org page uses an installation which currently only runs ''released'' language pairs (also available from https://apertium.org/apy if you prefer). However $ curl -G --data "lang=kir&modes=morph&q=алдым" https://beta.apertium.org/apy/analyse

37 KB (5,132 words) - 16:36, 5 June 2020
Grfro3d/proposal apertium cat-srd and ita-srd
...chine translation to understand the general meaning of the text in a foreign language. The other approach is instead that of "dissemination" in which the MT is a ...(coding and decoding), data (linguistic data) and support tools to convert data and make them compatible with the engine. Even if most RBMT systems are pri

21 KB (3,171 words) - 14:34, 3 April 2017
Entraînement d'un tagueur de langue cible (Target-language tagger training)
[[Target-language tagger training|In English]] ...and change the variables <code>DATA</code> and <code>DIRECTION</code>. <code>DATA</code> must point to the directory containing the pair's data

12 KB (1,625 words) - 08:20, 8 October 2014
Languages
...epository scheme. (Originally, all monolingual language data was found in language pairs, meaning that there was a lot of duplication.) If you feel something ...hat constitutes a minimally-useful language package; generally, however, a language package should have over 60% coverage on a variety of corpora and should pr

15 KB (1,783 words) - 22:33, 1 February 2019
Installation troubleshooting
====When running configure script for language pair data==== ====Workaround when language pairs need updated configure.ac's====

20 KB (3,153 words) - 08:13, 24 May 2019
Running the MaxEnt rule learning
DATA=/home/philip/Apertium/gsoc2013/monolingual/data ...atterns-frac-maxent.py $DATA/setimes.sh-mk.freq $DATA/setimes.sh-mk.ambig $DATA/setimes.sh-mk.annotated > events 2>ngrams

3 KB (520 words) - 21:25, 14 February 2014
Transfer rules examples
...to be translated. For example, HTML tags must not be translated into another language; only the text of the Web page should be. ...e same software are used for every language pair. It is the format of the data to be translated that determines which deformatter is used.

58 KB (8,365 words) - 20:16, 26 June 2018
Documentation of Matxin 1.0
Owing to the different syntactic structure of the phrases in each language, some Although the details of the modules and the linguistic data is presented in

58 KB (8,964 words) - 11:11, 14 May 2016
Flyer
...Iberian peninsula, but is now being used to translate between more distant language pairs. ...ngineering ([http://www.prompsit.com http://www.prompsit.com]). Linguistic data are being developed by Transducens, the Seminario

26 KB (3,122 words) - 06:25, 27 May 2021
Publications
...ngsnes (ed.) Bauta: Janne Bondi Johannessen in memoriam, Oslo Studies in Language 11(2), 2020. 489–501. (ISSN 1890-9639 / ISBN 978-82-91398-12-9) ...system/files/swj1419.pdf The apertium bilingual dictionaries on the web of data]. Semantic Web, 9(2), 231-240.

33 KB (4,418 words) - 11:52, 29 December 2021
Workflow reference
...tion of each module with more precision. They may also introduce technical language which linguists and/or computer coders would use. The technical description References to 'xxx' and 'yyy' refer to a language code, for example 'en-es'; 'English' to 'Spanish'.

29 KB (4,687 words) - 16:28, 5 June 2020
Finding numbers of speakers from the Russian census
...of any language in Russia in areas smaller than the Federal Subjects. The data is in Russian and comes from the official 2010 Russian Census website. Here are the steps to access the data:

2 KB (296 words) - 21:12, 13 January 2018
Pairviewer
...//d3js.org/ D3.js] tool that depicts all Apertium [[list of language pairs|language pairs]] in an interactive graph initially developed sometime before the [[G === Updating language data by scraping ===

5 KB (702 words) - 01:34, 9 December 2018
Hindi
=== Language pairs === .../github.com/apertium/apertium-urd-hin?files=1 apertium-urd-hin] Linguistic data for the Apertium Urdu-Hindi machine translator

6 KB (806 words) - 00:45, 7 December 2018
Apertium New Language Pair HOWTO
'''Apertium New Language Pair HOWTO''' ...rtium machine translation system from scratch. You can check the [[list of language pairs]] that have already been started.

36 KB (5,933 words) - 16:14, 22 February 2021
Easy dictionary maintenance
The number of language pairs in development for Apertium is increasing, and so is the complexity o language pairs. With better tools, more people will be able to develop language pairs.

29 KB (4,382 words) - 07:53, 6 October 2019
Freerbmt11
...he implementation of the algorithms must be free/open-source, but also the data themselves. Nowadays, there are many machine translation packages of this t ...morphologically rich languages, which even with large corpora suffer from data sparseness.

6 KB (905 words) - 17:26, 18 October 2010
Supervised tagger training
...-supervised.make this one] from en-eo. You will need to modify it to fit your language pair. This usually means editing the first few lines. ===Tagger data directory===

3 KB (537 words) - 13:44, 18 June 2014
Sentence segmenting
|Language You will need to install NLTK and NLTK data. Unfortunately, they both only support Python versions 2.6-2.7. If you are

14 KB (2,232 words) - 12:51, 26 September 2018
Become a language pair developer for Apertium
...uide on how to use a development version of Apertium to make a change in a language pair. ...ou should try this to make sure things work before you move on to whatever language pair you plan on working on.

10 KB (1,626 words) - 17:46, 13 January 2020
Mandarin Chinese
...http://wiki.apertium.org/wiki/Mandarin_Chinese#In_Apertium some linguistic data in Apertium]. ...fers to the most commonly spoken form of Chinese that is the sole official language of China and Taiwan. It is also known as Putonghua or Standard Chinese ([[W

16 KB (2,148 words) - 03:28, 16 December 2015
French
...mpire, as did all Romance languages. There are currently 4 released French language pairs ...the sixth most spoken language in the world and is the second most studied language worldwide.

15 KB (2,081 words) - 07:14, 12 August 2020
Sardu abbarra bivu!
...MT based on corpora: adding new languages is very easy. To create a new language pair, in fact, it is not necessary to include corpora with millions of word ...airs can be added by creating dictionaries and rules containing linguistic data in XML format.

15 KB (2,339 words) - 00:41, 4 June 2018
Frequently Asked Questions
...ind that are incorrectly translated, to getting involved in creating a new language pair or programming on tools or user interfaces. Here are some question fre Our language-agnostic tools are native and written in [https://en.wikipedia.org/wiki/C++

7 KB (1,139 words) - 06:27, 27 May 2021
English and Kazakh
...are basically for Anel, Aizhan and Assem who have started to develop this language pair... And Aida too... === Download apertium, lttoolbox and eng-kaz data from SVN ===

20 KB (2,856 words) - 06:26, 27 May 2021
Google Summer of Code/Wrap-up Report 2009
...ll these language pairs. This means that the data can be re-used by other language projects (e.g. in developing spelling or grammar checkers, thesauri, etc). This project was accepted as part of our "adopt a language pair" idea

12 KB (1,917 words) - 15:54, 12 September 2009
GSOC'16 Kira's results. Apertium website improvements: Docs diff
*'''langpair''': language pair to use for translation curl -G --data "langpair=eng|spa&q=run" http://localhost:2737/dictionaryLookup

5 KB (712 words) - 21:27, 16 August 2016
Farsi/About
...appear at the beginning of a sentence. The unique thing about the Persian language, though, is that it uses prepositions, which is quite uncommon in many SOV l ...designed a two-sided morphological analyser of nouns and adjectives in the Persian language, using Xerox Finite State Technology as giving input word (adjective or nou

16 KB (2,597 words) - 20:58, 12 January 2013
Apertium guide for Windows users
* Apertium language pairs .../engine of Apertium installed (including the required lttoolbox, but no language pairs yet).

9 KB (1,367 words) - 09:17, 26 May 2021
Bilingual dictionary
...of the main five data files in any language pair (see also: [[Apertium New Language Pair HOWTO]]). ....dix'' where ''apertium-A-B'' is the name of the [[List of language pairs| language pair]]. For example, the file ''apertium-af-nl.af-nl.dix'' is the bilingual dict

7 KB (1,244 words) - 16:41, 17 March 2018
Task ideas for Google Code-in/Getting started
...getting new contributors to Apertium and to helping spread our passion for language technology. ...of other things, live in our '''[[subversion|svn repo]]'''. The language data is found in the following places:

7 KB (1,091 words) - 19:54, 12 April 2021
Anaphora resolution module
...olving the antecedent of the anaphors in text becomes essential in several language pairs. ...ge it to the correct anaphor''' using a macro in the transfer rules of the language pair. (t1x)

20 KB (3,107 words) - 21:13, 24 June 2022
Ankush/Application
...nders, especially for Indian languages, because we still do not have enough data ...oreign languages. I am especially interested in MT systems where the source language is English and the target languages are Indian languages. It is impossible

6 KB (923 words) - 17:57, 3 April 2010
Unsupervised tagger training
First, make a directory called <code><lang>-tagger-data</code>. Put your corpus into there with a name like <code><lang>.crp.txt</c ...cifies how to generate the probability file. You can grab one from another language package. For <code>apertium-en-af</code> I took the Makefile from <code>ape

7 KB (1,177 words) - 08:34, 8 October 2014
Narimann/GSOC 2019 proposal: Kazakh-Turkish and Turkish-Kazakh
'''Track:''' Data Science Dynamic Language Interpreter implementation

8 KB (1,094 words) - 13:10, 14 April 2019
Using Apertium spellers with LibreOffice-Voikko on Debian
==Install language module== A language module supporting spelling may be installed, either from our repository, or

3 KB (387 words) - 12:21, 26 September 2016
Assimilation Evaluation Toolkit
...ion of machine translation. The tasks consist of sentences in the original language, reference translation with keywords omitted and the machine translation of ...various { gap } in order to discover phenomena and patterns in the natural language.

9 KB (1,368 words) - 09:04, 23 April 2015
Курсы машинного перевода для языков России/Session 0 (Machine translation courses for the languages of Russia)
...duce translations which are less fluent, but more preserving of the source language meaning. ...er and number between a determiner and head noun will remain in the target language output.

12 KB (1,464 words) - 12:00, 31 January 2012
Helsinki Apertium Workshop/Session 0
...duce translations which are less fluent, but more preserving of the source language meaning. ...er and number between a determiner and head noun will remain in the target language output.

11 KB (1,519 words) - 06:51, 11 May 2013
Tartu Apertium Course/Session 0
...duce translations which are less fluent, but more preserving of the source language meaning. ...er and number between a determiner and head noun will remain in the target language output.

11 KB (1,519 words) - 18:27, 16 October 2015
Install quick tests
More convincing if you have a language pair on the computer somewhere :) ...this should work for both packaged and compiled Apertium. Without language data you can't see a translation, but you can see the help. Try,

2 KB (368 words) - 06:02, 24 April 2017
Apertium kullanarak dil çifti geliştir (Develop a language pair using Apertium)
...probably try this to make sure things work before you move on to whatever language pair you plan on working on. Note that some existing language pairs have external dependencies, like HFST or Constraint Grammar. The [[In

10 KB (1,715 words) - 12:29, 28 May 2018
Indirect contribution guide
...tended to show how you can make an "indirect" contribution, by documenting language resources, helping us to build bilingual test sets, translating, promoting, ...first language, and translate them to the other. A translation in a third language may be useful in enlisting help, but is not required.

9 KB (1,494 words) - 05:58, 18 March 2015
Google Summer of Code/Application 2016
...ed translation, morphological analysis, natural language processing, human language technologies ...Spanish–Catalan) but which has been expanded to deal with more divergent language pairs (such as English-Catalan and even Basque→English). The platform pro

10 KB (1,500 words) - 16:23, 18 February 2016
Apertium on Ubuntu or Debian
...probably just search for, tick off and install Apertium and your favorite language pairs in Synaptic. There's a friendly [https://help.ubuntu.com/community/Sy Step 2: '''Download apertium, lttoolbox and language pairs from SVN.'''

3 KB (475 words) - 16:28, 27 April 2017
Apertium-get
'''apertium-get''' is a little script to fetch and compile language data, with monolingual dependencies, from Github. ...d and compiled by just going to the directory where you want your language data to be, and running

2 KB (317 words) - 20:45, 23 March 2019
Automatic postediting at GSoC 2018
==== Data preparation ==== There were three attempts to extract postediting operations for each language pair: with threshold = 0.8 and -m, -M = (1, 3).

7 KB (1,033 words) - 15:27, 15 August 2018
Bilingual dictionary enrichment via graph completion
<li>- 4: preprocessing : dictionary data needs some changes to be used in a graph, this step prepares it for further ...recommends what languages will be the most efficient to enrich particular language pair</li>

19 KB (2,541 words) - 15:44, 12 August 2018
Chebrolutejasvi/GSOC 2020 proposal: Hindi-Telugu
...d was exposed to different languages. This led to me being fascinated with language translation, and I wanted to contribute to help in making communication easi I am going to work on “Adopt an unreleased language pair: Hindi-Telugu”. I want to get the pair released in both the direct

9 KB (1,391 words) - 16:41, 31 March 2020
Apertium on Mac OS X
== Language data packages == If you've installed tools with install-nightly.sh, you can install language data with

4 KB (665 words) - 11:57, 18 November 2022
Google Summer of Code/Application 2009
...um project is a project which works on open-source machine translation and language technology. We try and focus our efforts on lesser-resourced and marginalis ...versitat d'Alacant] (Alacant, Spain) and [http://www.prompsit.com Prompsit Language Engineering].

10 KB (1,543 words) - 19:50, 12 April 2021
Ideas for Google Summer of Code/Bilingual dictionary enrichment via graph completion
...f language pairs that may be used to infer new entries for existing or new language pairs using graphs. ...a graph and relevant information is stated about them. The cloud of linked data is intended to be navigated by software agents primarily. In the case of Ap

3 KB (452 words) - 19:50, 24 March 2020
Siciliano y castellano/Informe final (Sicilian and Spanish/Final report)
...oject goal is to create a machine translation package for the Sicilian-Spanish language pair on the basis of Apertium’s machine translation system. This project i ...he Sicilian dictionary was the abundance of spelling forms in the Sicilian language. For instance, one Sicilian verb with the meaning 'to join' can have the fo

9 KB (1,370 words) - 13:58, 23 August 2016
Sardinian and Italian/Final Report
...language particularly suitable for various reasons. First, because it is a language in the process of standardization, so both the linguistic resources (written do ...he near future, it will be possible to carry out translation for other language pairs such as Sardinian-Catalan and Sardinian-Spanish.

7 KB (1,110 words) - 11:34, 23 August 2016
Ideas for Google Summer of Code/Adopt a language pair
...declarative language. A good intro would be to look through [[Apertium New Language Pair HOWTO]], see also [[Contributing to an existing pair]]. If the pair ha #* If there is no translation, translate it into the languages of your language pair first.

6 KB (1,024 words) - 15:22, 20 April 2021
Google Code-in/Application 2015
...rs independent free-software developers. There are currently 40 published language pairs within the project (including a number of "firsts" — for example Sp natural language processing, machine translation, grammar, python, c++, linguistics, languag

7 KB (1,111 words) - 10:10, 15 November 2015
Using Apertium spellers with LibreOffice-Voikko on Debian/Manual compilation
==Install language module== * To install the Kazakh language module, first get it

4 KB (492 words) - 02:54, 10 March 2018
Apertium on openSUSE
You can replace cy-en with a different language pair. For the list of language pairs, go [http://wiki.apertium.org/wiki/List_of_language_pairs#Trunk_.28rel === Install language-pair data ===

5 KB (808 words) - 02:48, 9 March 2018
Shallow syntactic function labeller
1. All needed data for North Sami, Kurmanji, Breton, Kazakh and English was prepared: there ar ...Also the testpack for two language pairs was built: it contains all needed data for sme-nob and kmr-eng, the labeller and installation script.

5 KB (764 words) - 01:40, 8 March 2018
Writing a scraper
#* If you can't understand the language the website is written in, ask for help in IRC or use a translator and look ...er when calling <code>Writer()</code>. For example if we want to write the data every 30 seconds call <code>Writer(30)</code>.</li>

14 KB (2,389 words) - 05:20, 29 March 2019
Uralic languages
...family of some three dozen related languages descended from a Proto-Uralic language and spoken by more than 25 million people throughout Europe and Northern As ...ter plan involves generating independent finite-state transducers for each language, and then making individual dictionaries and transfer rules for every pair.

22 KB (2,520 words) - 23:09, 22 December 2014
Romanian and Catalan/GSOC 2018
...e Summer of Code 2018. It also includes information on the upgrade of four language pairs which was carried out during the same period. For a more detailed wor ...tem and develop it to bring it to release quality. In addition, four other language pairs have been upgraded to the monolingual package system to ease future d

7 KB (1,071 words) - 10:48, 14 August 2018
Install Apertium core using packaging
...l be available. For various reasons, the author has successfully developed language pairs using public repository versions of Apertium core. ...tes and Apertium tools. You also get, as optional installs: release-level language pairs, service providers, constraint grammar code, and more. All under pack

6 KB (1,006 words) - 18:26, 27 April 2021
Google Summer of Code/Application 2011
...m project develops a free/open-source platform for machine translation and language technology. We try to focus our efforts on lesser-resourced and marginalise ...ped around the world, largely in universities and companies (e.g. Prompsit Language Engineering), but also independent free-software developers play a huge rol

13 KB (2,013 words) - 12:21, 20 June 2019
Google Summer of Code/Application 2010
...m project develops a free/open-source platform for machine translation and language technology. We try and focus our efforts on lesser-resourced and marginalis ...ped around the world, largely in universities and companies (e.g. Prompsit Language Engineering), but also independent free-software developers play a huge rol

11 KB (1,802 words) - 19:51, 12 April 2021
Preparing data for Moses factored training using Apertium
===Download and compile data=== ...</code> and <code>apertium-is-en</code>. You can find others at: [[list of language pairs]] and [[list of dictionaries]].

4 KB (647 words) - 07:45, 8 October 2014
Romance languages
...dictionary for the pair X→Y. Below is listed development progress for each language's transducers and dictionary pairs. !rowspan=2| Language

18 KB (2,312 words) - 18:25, 18 September 2016
Semitic languages
...) constitute a group of related languages and a branch of the Afro-Asiatic language family. Spoken by more than 470 million people throughout North Africa and ...ter plan involves generating independent finite-state transducers for each language, and then making individual dictionaries and transfer rules for every pair.

20 KB (2,336 words) - 18:10, 14 April 2015
Ideas for Google Summer of Code/automatic-postediting
== Improving language pairs by mining MediaWiki Content Translation postedits == ...and bidix entries to improve the performance of an Apertium language pair. Data is available from Wikimedia content translation through an [API https://www

3 KB (383 words) - 19:56, 24 March 2020
Ideas for Google Summer of Code/Apertium Occitan French
...language, as Apertium offers the only machine translation system for this language pair. The idea is to make Occitan output easier to postedit and French outp ...guage data], [https://github.com/apertium/apertium-fra the French language data], and [https://github.com/apertium/apertium-oci-fra the Apertium Occitan-F

2 KB (213 words) - 19:48, 24 March 2020
Altay
=== Altai Language Resources === Crúbadán language data for Southern Altai. Kevin Scannell. 2015. The Crúbadán Project. oai:cruba

2 KB (217 words) - 06:57, 5 December 2017
Freeling
...in some cases data or tools from Freeling could be useful to apertium, and data from apertium could be useful to Freeling. Also, to install the data, I had to change the lines in freeling/data/Makefile.am that looked like

5 KB (720 words) - 02:20, 10 March 2018
Fisl13
...Everything in Apertium is free/open source: engine, data for more than 29 language pairs and tools to translate at a speed of more than 20,000 words per secon === Useful data ===

1 KB (175 words) - 14:19, 25 July 2012
Error: A new ambiguity class was found
(in this example, I use eng as language resp. eng-deu as pair) the file ./eng-tagger-data/eng.dic for some reasons is empty (has a file size of 0).

1 KB (165 words) - 14:16, 28 August 2016
Dravidian languages
...e>[http://www.ethnologue.com/subgroups/dravidian dra]</code>) constitute a language family of about 70 languages spoken primarily in South Asia. The four most ...ter plan involves generating independent finite-state transducers for each language, and then making individual dictionaries and transfer rules for every pair.

19 KB (2,201 words) - 09:21, 9 December 2019
Turkic languages
...ter plan involves generating independent finite-state transducers for each language, and then making individual dictionaries and transfer rules for every pair. ...ictionary for the pair X→Y. Below is listed development progress for each language's transducers and dictionary pairs.

35 KB (3,577 words) - 15:24, 1 October 2021
Iranian languages
...ter plan involves generating independent finite-state transducers for each language, and then making individual dictionaries and transfer rules for every pair. ...dictionary for the pair X→Y. Below is listed development progress for each language's transducers and dictionary pairs.

22 KB (2,532 words) - 11:36, 30 July 2018
Apertium
...y aimed at related-language pairs but expanded to deal with more divergent language pairs (such as English-Catalan). The platform provides * a language-independent machine translation engine

776 bytes (114 words) - 19:07, 12 September 2018
Курсы машинного перевода для языков России/Session 8 (Machine translation courses for the languages of Russia)
...on-months (four people, 18 months) to develop (both engine, and linguistic data). It was widely used, with thousands of requests per day. ...sh State to rewrite the code as open-source, and to convert the linguistic data. After one person year, the first version of the Spanish--Catalan translato

12 KB (1,679 words) - 12:00, 31 January 2012
Google Summer of Code/Application 2012
...m project develops a free/open-source platform for machine translation and language technology. We try to focus our efforts on lesser-resourced and marginalise ...ped around the world, largely in universities and companies (e.g. Prompsit Language Engineering), but independent free-software developers also play a huge rol

11 KB (1,680 words) - 12:22, 20 June 2019
Helsinki Apertium Workshop/Session 8
...on-months (four people, 18 months) to develop (both engine, and linguistic data). It was widely used, with thousands of requests per day. ...sh State to rewrite the code as open-source, and to convert the linguistic data. After one person year, the first version of the Spanish--Catalan translato

12 KB (1,683 words) - 08:42, 10 May 2013
Tartu Apertium Course/Session 8
...on-months (four people, 18 months) to develop (both engine, and linguistic data). It was widely used, with thousands of requests per day. ...sh State to rewrite the code as open-source, and to convert the linguistic data. After one person year, the first version of the Spanish--Catalan translato

12 KB (1,683 words) - 11:00, 30 October 2015
Unigram tagger
...ll the unigram models from “A set of open-source tools for Turkish natural language processing.”<ref name="trmorph-tools">http://coltekin.net/cagri/papers/tr ...tuff.”<ref name="prerequisites">[[Installation#If you want to add language data / do more advanced stuff]]</ref>

20 KB (3,229 words) - 20:06, 12 March 2018
Odia
...s one of the official languages of India, and has around 33 million native language speakers globally. .../ktpress.org.in/pdf/evolution_of_oriya_language.pdf The Evolution of Oriya Language and Script], ''Utkal University, Cuttack,''

13 KB (1,770 words) - 06:56, 3 December 2017
Celtic languages
...ter plan involves generating independent finite-state transducers for each language, and then making individual dictionaries and transfer rules for every pair. ...dictionary for the pair X→Y. Below is listed development progress for each language's transducers and dictionary pairs.

10 KB (1,263 words) - 06:04, 23 December 2014
Language pair packages
'''Language pair packages''' are standalone JARs that can be run independently as well Since JAR files are nothing but renamed ZIP files, you can easily edit language pair packages to fit your needs. Note that the packages are ready to be use

11 KB (1,497 words) - 08:23, 7 April 2020
Germanic languages
...ogue.com/subgroups/germanic gem]) constitute a branch of the Indo-European language family spoken primarily in Europe, Anglo-America and Australasia. The commo ...ter plan involves generating independent finite-state transducers for each language, and then making individual dictionaries and transfer rules for every pair.

32 KB (3,684 words) - 06:16, 28 December 2018
Ideas for Google Summer of Code/lint for Apertium
Make a program which tests Apertium data files for suspicious or unrecommended constructs (likely to be bugs). Some ...x]] (dix) dictionary data, perhaps also transfer rules. The [[Apertium New Language Pair HOWTO]] should introduce most of the terminology and background you ne

5 KB (789 words) - 10:36, 31 May 2016
Google Summer of Code/Application 2008
...cant] (Alacant, Spain); the other one is [http://www.prompsit.com Prompsit Language Engineering]. These two organizations are currently responsible for most of ...systems to translate less-closely related languages. We have 10 published language pairs, and three more currently in development.

8 KB (1,255 words) - 19:50, 12 April 2021
Translating mnemonic files
...the mnemonic (starting on the first column) must be kept unchanged from one language to another, while the string farther to the right is translated. By default, as for lttoolbox, apertium, and the language pairs, the installation is done in <code>/usr/local/bin</code> and <code>/u

5 KB (789 words) - 12:16, 15 June 2018
Translation quality statistics
...r/>words !! data-sort-type="number"|WER !! data-sort-type="number"|PWER !! data-sort-type="number"|BLEU !! Reference / Notes ...forms that get some analysis, may give an indication of the maturity of a language pair.

9 KB (1,233 words) - 09:10, 21 November 2021
Javanese
...Javanese language]]) is an [[Wikipedia:Austronesian languages|Austronesian language]] from Indonesia, spoken by the Javanese people from the central and easter Its language code is '''jv''' and '''jav'''.

7 KB (881 words) - 13:11, 12 December 2018
Ideas for Google Summer of Code/Apertium African
...e language pairs (which haven't been started or currently have very little data in Apertium) and write a usable version which provides intelligible output * If there is some data for the language pair on the Apertium GitHub server, check it out and install it.

2 KB (238 words) - 13:45, 24 February 2023
Crossdics
...guage pairs <code>aa-bb</code> and <code>bb-cc</code> it will create a new language pair for <code>aa-cc</code>. * '''sl-tl''': source language (sl) and target language (tl).

5 KB (633 words) - 13:29, 6 October 2017
PMC proposals/Apertium Workshop in Russia
...eof, and following that the development of a prototype pair for a minority language of Russia. Russia has a long history of work in machine translation, but ve ...h oil, as Tatarstan and Sakha) students with good knowledge of a minorised language seldom have a computer and/or access to the internet. That is the case at l

18 KB (2,991 words) - 22:24, 3 August 2013
PMC proposals/Move Apertium to Github
* Individual repos for each pair, language module, and tool (preserving all commit history). ...ch|talk]]) 13:04, 7 February 2018 (CET) To install apertium and one or two language pairs, you (just) have to follow a few wiki pages and then, you get the only

22 KB (3,325 words) - 14:06, 12 March 2018
Удмуртско-русский переводчик (Udmurt-Russian translator)
...D0%BE%D1%81%D1%81%D0%B8%D0%B8 Šupaškar Apertium Workshop]. The Russian part of the language pair was created using [[lttoolbox]], and all files needed for Russian, we === Some data ===

3 KB (299 words) - 06:39, 30 January 2012
Specific resources per language
...tps://apertium.github.io/apertium-on-github/source-browser.html. It houses language pairs which haven't completely matured and are under work. ==Specific resources per language==

10 KB (1,336 words) - 20:40, 11 December 2019
Resources
{{see-also|Incubator|Specific resources per language}} ...Pair HOWTO|making a language pair]], feel free to make a new page for the language in question and paste it there. Stuff like basic dictionaries, paradigms, r

1 KB (164 words) - 05:20, 4 December 2019
Lexical feature transfer - First report
for every sentence s in the source language corpus: for every sentence in the source language corpus:

6 KB (838 words) - 17:47, 25 July 2012
File names
Apertium has some naming conventions for the various files used in language data: Files compiled when you do "make" in a language pair:

890 bytes (126 words) - 10:10, 14 March 2017
UDPipe
;Get some data! Now try it on your own data.

5 KB (822 words) - 19:43, 9 March 2020
Semantic tagging
== Data sources == * Often a word can be disambiguated using its translation in another language, for example the triple (estació, gare, station) defines a building meanin

5 KB (949 words) - 15:27, 15 June 2020
Prerequisites for RPM
...t plan on working on the core C++ packages (but only want to work on / use language pairs), you can install all prerequisites with yum/zypper, using [[User:Tin For a list of available language pairs and other packages, see https://build.opensuse.org/project/show/home:

1 KB (231 words) - 10:03, 12 January 2022
Install Apertium core by compiling
...you have something, immediately, it to try invoke a tool. Without language data you can't see a translation, but you can see the help. Try, ...language data by compiling]]. Or, if your system has packaging, download a language package (but beware, a package manager may pull in an old package of Apertiu

5 KB (821 words) - 02:55, 27 July 2022
Plugin for Pidgin
...eir buddies (both incoming and outgoing messages). If the user has set the language pair eng-spa (English → Spanish) for incoming messages from buddy1, th *'''/apertium_check''' Shows the current language pairs associated with the buddy whose conversation you issued the command o

8 KB (1,263 words) - 02:18, 9 March 2018
Tartu Apertium Course/Session 7
...ly most important one. This session will cover the question of why we need data consistency, what we mean by quality and how to perform an evaluation. The In contrast to many other types of systems for natural language processing — such as morphological analysers and part-of-speech taggers,

18 KB (2,493 words) - 10:59, 30 October 2015
Hectoralos/GSOC 2019 proposal: Catalan-Italian and Catalan-Portuguese
I’m a sociolinguist working on language maintenance and shift. I'm very interested in creating resources for minori '''1.2 Bring a released language pair up to state-of-the-art quality''': I'd like to improve the pairs Catal

16 KB (2,285 words) - 06:46, 12 April 2019
Why we trim
...erator.<ref>Typically this goes for both translation directions, although a language pair only released for one direction might only be trimmed in that directio ...at when post-editing, the post-editor has to constantly look at the source language text (whereas an unknown word would be possible to translate there and then

4 KB (679 words) - 16:06, 3 May 2020
Курсы машинного перевода для языков России/Session 7 (Machine translation courses for the languages of Russia/Session 7)
...ly most important one. This session will cover the question of why we need data consistency, what we mean by quality and how to perform an evaluation. The In contrast to many other types of systems for natural language processing — such as morphological analysers and part-of-speech taggers,

18 KB (2,490 words) - 12:00, 31 January 2012
Helsinki Apertium Workshop/Session 7
...ly most important one. This session will cover the question of why we need data consistency, what we mean by quality and how to perform an evaluation. The In contrast to many other types of systems for natural language processing — such as morphological analysers and part-of-speech taggers,

18 KB (2,493 words) - 08:39, 10 May 2013
Monodix basics
...u can distinguish an element from an attribute and can recognise character data. If you want a quick recap, this should help: :<element attribute="value">character data</element>

11 KB (1,851 words) - 07:42, 16 February 2015
Apertium-quality/Quickstart
...t. It most likely won't let you in order to guarantee the integrity of the data. Morph testing isn't supported by the language we're using, but it is as simple to run as regression testing. One simply r

12 KB (1,931 words) - 17:06, 24 October 2018
Uighur and Turkish/Paper
...Machine Translation] - This looks interesting, 200K sentences of bilingual data collected, we should contact the authors to see if we can access it [https: ...eb interface [http://nmt.cloudtrans.org/ here], but unclear wrt details of data/evals [https://scholar.googleusercontent.com/scholar.bib?q=info:A6cMdf1SuHw

10 KB (1,483 words) - 07:00, 14 August 2018
Press
Websites referencing Apertium categorised by language of the website. News about Apertium categorised by language of report.

13 KB (1,689 words) - 21:42, 28 February 2021
How to bootstrap a new pair
...ium-init to bootstrap a new language pair (optionally with new monolingual data packages as well). ...is script in your working directory where you will be downloading language data. You can get the script from https://apertium.org/apertium-init

5 KB (824 words) - 15:30, 20 April 2021
Indonesian
...ipedia:Indonesian language]]) is an Austronesian language and the official language of Indonesia. Since it is a register of [[Malay]], it is also often general In [[Apertium]], there is a language pair of [[Indonesian and Malaysian]] already in the [[Trunk|trunk category]

5 KB (629 words) - 13:08, 21 December 2019
Traductions en français (French translations)
| width=320 | '''[[Apertium New Language Pair HOWTO]]''' | [[Become a language pair developer for Apertium]]

13 KB (1,601 words) - 23:31, 23 July 2021
Google Code-in/Application 2013
...m project develops a free/open-source platform for machine translation and language technology. We try and focus our efforts on lesser-resourced and marginalis ...eloped around the world, both in universities and companies (e.g. Prompsit Language Engineering) and by a growing numbers independent free-software developers.

6 KB (1,057 words) - 15:34, 28 October 2013
Mongolic languages
!rowspan=2| Language ==Existing language pairs==

5 KB (538 words) - 15:52, 11 April 2015
Sudo
If you're working on language data, <code>sudo</code> is pretty much only for running package managers like <c ...exception is <code>sudo make install</code>, but when working on language data you should never have to do this.

856 bytes (144 words) - 12:52, 3 May 2018
Google Code-in/Application 2014
...rs independent free-software developers. There are currently 40 published language pairs within the project (including a number of "firsts" — for example Sp ...ommunication) often occurs at this age, and if we can show them that their language is useful, and other people care, and there is no barrier for its use in th

6 KB (987 words) - 10:21, 7 November 2014
Interfaces
...e official web site – it serves only the ''released'' (stable) versions of language pairs ** This is the official "beta" site – it serves the latest work in all language pairs (so things may work better, but also may have weird bugs). You can al

3 KB (457 words) - 07:42, 18 June 2021
Daemon
...ecifies the parameters and data files specific to that language pair. Each language pair can contain a number of modes; most of these are used for debugging ea ...b server. We use apertium-nn-nb as an example, but it should work with any language pair; the modules lt-proc/cg-proc/apertium-{tagger,pretransfer,transfer,int

13 KB (2,039 words) - 11:56, 3 June 2022
Top tips for GSOC applications
...ding period — and for documentation. Anyone thinking of working on a language pair should make sure that they read about [[testvoc]] and other quality co ...all]] Apertium and a language pair; read through the [[:Category:HOWTO|new language pair HOWTO]]. This might even give you some more ideas!

9 KB (1,509 words) - 23:51, 27 February 2023
Integrating Tesseract OCR into Apertium
...d of existing trained models. Successful tries are saved into new training data.<ref>https://static.googleusercontent.com/media/research.google.com/en//pub ...butions can also be found [https://github.com/tesseract-ocr/tesseract/wiki/Data-Files-Contributions here].

2 KB (305 words) - 14:36, 28 October 2018
Apertium-init
...er]] or [[CG]] files. It creates fully working Makefiles and stub language data, so you can compile and test straight away (assuming you've [[Installation|

744 bytes (108 words) - 20:38, 13 January 2021
Travis settings for Apertium
...thub. What this actually means is that you can set an apertium language or language pair on github to automatically build and test on each commit. You only nee This is an example for a monolingual data using hfst (from [apertium-fin]):

2 KB (249 words) - 06:26, 27 May 2021
Apertium-tki
Apertium language data for Iraqi Turkmen. [[Category:Language data]]

1 KB (144 words) - 20:07, 15 July 2021
Apertium Nieuw talenpaar HOWTO (Apertium New Language Pair HOWTO)
...tems can be made. The only thing you have to do yourself is write the data. The data consists of 3 main parts: the dictionaries, and a few rules (word... ...ems of the source language (source language='sl') or the target language (target language='tl') can be chosen and changed.

36 KB (5,761 words) - 14:34, 4 December 2011
Nieuw talenpaar maken (Creating a new language pair)
...tems can be made. The only thing you have to do yourself is write the data. The data consists of 3 main parts: the dictionaries, and a few rules (word... ...ems of the source language (source language='sl') or the target language (target language='tl') can be chosen and changed.

36 KB (5,767 words) - 07:07, 16 February 2015
Морфологический трансдуктор русского языка (Morphological transducer for Russian)
...textbook distinction in language, isn't it? When you start exploring real data the boundaries fade very fast and everything looks much more complicated.

22 KB (2,150 words) - 20:21, 24 April 2013
UD annotatrix/UD annotatrix at GSoC 2017
...statistical parser, which in turn can serve different purposes of natural language processing. For creating a good treebank, manual annotation and/or disambig ...interface allows users to work with CoNLL-U and CG3 formats, and to convert the data between the formats. It also allows users to either upload or paste corpora in pl

6 KB (930 words) - 15:59, 29 August 2017
Bugzilla
| 64 || Apertium-tolk should give proper warning when no linguistic data is installed || 2008-03-31 || Wynand Winte ...rg/cgi-bin/bugzilla/index.cgi here]. Please feel free to report your bug in any language you are comfortable with.

12 KB (1,254 words) - 22:08, 7 March 2018
VM for transfer
| clip || - || N/A || part → value || Obtains the part in the only language there is (inter/post-chunk) and pushes the value onto the stack ...|| - || link-to || part, pos → value || Obtains the 'part' in source language in position 'pos' and pushes the 'value' onto the stack. An optional operan

14 KB (2,020 words) - 13:58, 7 October 2014
Perceptron tagger
While training can be done directly in the language directory, it is a better idea to train the tagger with copies of the files ...e the training directory (replace <code>lang</code> with the corresponding language code).

4 KB (651 words) - 13:36, 23 August 2017
Kashmiri
{{Language Kashmiri is an Indo-Aryan language spoken in the Kashmir Valley and regions around it that were historically a

6 KB (811 words) - 10:42, 2 July 2018
Ideas for Google Summer of Code/Make a language pair state-of-the-art
..., transfer rules, scripting, corpora. The objective is to make an Apertium language pair state-of-the-art, or close to state-of-the-art in terms of translation ...ge pair of your choice in Apertium and install it. (see [[Install language data by compiling]])

2 KB (383 words) - 19:46, 2 March 2023
Comment contribuer à une paire de langues existante (How to contribute to an existing language pair)
* es-tagger-data directory: contains the data needed for the Spanish tagger (corpus, etc.) * ca-tagger-data directory: contains the data needed for the Catalan tagger (corpus, etc.)

54 KB (8,480 words) - 18:55, 10 April 2017
Online Apertium Workshop 2020
.../presentation/d/1LBcBs3KdzfS7vl6Sxe0UtOMLpWNMM6ciGS_YPCnxTr0 Reading-bound data as inline secondary tags]", Tino Didriksen *** "Reading-bound data is best transported as inline secondary tags, proven both by practical expe

3 KB (509 words) - 15:49, 2 July 2020
North Saami and Finnish
** We can haz. Data is now checked in on Victorio at /langtech/trunk/words/dicts/algu, with a r ...ns Finnish and Northern Sámi. Ryan can contact them if it seems like their data would be of use.

16 KB (2,457 words) - 08:19, 12 April 2017
Trigger build on file save
...our language data directory (replacing "apertium-foo" for your monolingual data dir):

725 bytes (111 words) - 09:24, 2 March 2016
Translating man pages
By default, as for lttoolbox, apertium, and the language pairs, the installation is done in <code>/usr/local/bin</code> and <code>/u ...ium</code> command, there is the '''<code>-f</code>''' option to translate data produced in this format without having to call "by hand" a deformatter and

5 KB (780 words) - 11:48, 15 June 2018
Helsinki Apertium Workshop/Programme
...;13:00  ||   '''Practical''': Installing Apertium and creating a language pair ....sf.net/p/apertium/svn/branches/courses/helsinki_2013/slides/session7a.pdf Data consistency, quality] and [https://svn.code.sf.net/p/apertium/svn/branches/

8 KB (720 words) - 15:18, 20 March 2015
Writing Makefiles
# Most language pairs don't need to specify anything else for install-data-local: install-data-local: install-modes

4 KB (612 words) - 13:09, 18 February 2015
Apertium-apy/Language identification
This page contains data for CLD2 coverage. If you need help obtaining CLD2 coverage of a certain language, contact [[User:Wei2912]].

75 KB (7,440 words) - 17:12, 8 August 2014
Apertium on SliTaz
Where LANGUAGE_PAIR is the language pair (e.g. en-eo) wget http://sunsite.unc.edu/pub/Linux/system/keyboards/console-data-1999.08.29.tar.gz

2 KB (281 words) - 02:58, 9 March 2018
Ideas for Google Summer of Code/Unsupervised weighting of automata
** Select a language ** Use the Apertium morphological analyser to analyse the test data

1 KB (213 words) - 21:13, 18 March 2019
Shallow syntactic function labeller/Workplan
...is it possible to achieve pretty good results having a very small amount of data (like in the case of Breton) ...ad of the original syntax module in kmr-eng pipeline. The testpack for two language pairs was built. All code was cleaned up, some docstrings were written. Als

6 KB (833 words) - 12:56, 22 August 2017
Apertium on Windows
...s, data, and other system resources with applications, software tools, and data of the Unix-like environment. Therefore it is possible to launch Windows ap Now you're ready to download and build language pairs and use them under Cygwin's shell.

12 KB (1,883 words) - 22:06, 7 March 2018
Shell scripting
If you want to work on Apertium language pairs or tools, some knowledge of the Unix shell / command-line scripting w ...hell/ shell scripting] and [https://hacker-tools.github.io/data-wrangling/ Data wrangling] are relevant and succinct

746 bytes (101 words) - 09:20, 8 February 2019
Apertium-service
...nslation pairs as a service and provides '''translate''' and '''detect''' (language recognition) capabilities over an '''XML-RPC''' interface, as well as '''RE ...for discussion). It also manages a ''resource pool'' of e.g. language pair data, both (eagerly) pre-allocated and (lazily) allocated at need, up to a high

13 KB (1,764 words) - 03:29, 6 November 2019
Google Code-in/Application 2010
...m project develops a free/open-source platform for machine translation and language technology. We try and focus our efforts on lesser-resourced and marginalis ...ped around the world, largely in universities and companies (e.g. Prompsit Language Engineering), but also independent free-software developers play a huge rol

3 KB (424 words) - 19:24, 29 October 2010
Google Summer of Code/Application 2013
...m project develops a free/open-source platform for machine translation and language technology. We try to focus our efforts on lesser-resourced and marginalise ...ped around the world, largely in universities and companies (e.g. Prompsit Language Engineering), but independent free-software developers also play a huge rol

9 KB (1,376 words) - 15:24, 22 March 2013
PMC proposals/New naming of the Bosnian-Croatian-Montenegrin-Serbian Sprachbund
...ces for the involvement of Croatian researchers and developers in Apertium language pairs involving Croatian as part of the [http://cordis.europa.eu/projects/r ...for the more inclusive ISO-639-2 code hbs to be used to refer to it in all language pairs developed inside Apertium for components of this macrolanguage.

6 KB (987 words) - 22:27, 3 August 2013
Listing Apertium element using command-line
** a language pair, ** the reference files for a language.

8 KB (1,327 words) - 21:34, 17 February 2019
Prerequisites for Debian
...t plan on working on the core C++ packages (but only want to work on / use language pairs), you can install all prerequisites with apt-get, using [[User:Tino D # or, to get all dependencies for building a language from git:

2 KB (311 words) - 21:05, 2 April 2021
Autoconcord
If you are the unlucky owner of a language pair where you must maintain the synthetic adjective tag (<sint>) in the bi -prepare attempts to detect and insert autoconcord data into the monodices,

7 KB (1,185 words) - 08:39, 6 October 2014
Comparison of part-of-speech tagging systems
!rowspan=3|System !!colspan=7|Language ...ives]/[words with a correct analysis from the morphological parser]). This data is also available in box plot form [https://frankier.github.io/apertium-tag

16 KB (1,448 words) - 16:50, 22 August 2017
Getting bilingual dictionaries from OmegaWiki
...rossdics|crossdics]] package) to get cheap bilingual dictionaries from any language pair available in [http://www.omegawiki.org OmegaWiki] database. ...downloads/omegawiki-lexical.sql.gz download] the latest version of lexical data from the OmegaWiki database (see also [http://www.omegawiki.org/Help:Downlo

2 KB (202 words) - 00:55, 24 January 2018
Unification of metadix and parametrized dictionaries
Different language-pair packages use different strategies to generate .dix dictionaries ([[mon ...t versions of a translator (for instance, for two different varieties of a language, such as Brazilian and European Portuguese) whose names could be ideally ti

11 KB (1,733 words) - 08:24, 25 April 2016
Linguistic Resources Document
* '''sl''': source language (for example, in morphological and bilingual dictionaries) * '''tl''': target language (for example, in bilingual dictionaries)

8 KB (902 words) - 09:19, 6 October 2014
Apertium-dixtools
...ictionary for languages A and C is built from dictionaries for A-B and B-C language pairs. (or some other Unicode language installed - I use eo.UTF-8) and run the tests again.

8 KB (1,070 words) - 01:29, 26 October 2018
Packaging
== New Language or Pair Package == Import, push new branch data, push new upstream tag:

8 KB (1,106 words) - 19:51, 26 April 2018
Vin-ivar/proposal ud apertium
Bonus: use closely related language treebanks in UDPipe; transfer the lemmas, assume the POS tags remain the sa '''Week 6:''' stealing Apertium data

4 KB (657 words) - 08:58, 3 April 2017
Apertium separable/report2017
The purpose of this project is to allow Apertium language-pair developers to better translate "separable" or "discontiguous" multiwor * (for language developers: have the language-data writer write it explicitly in the .lsx file)

1 KB (205 words) - 18:36, 15 November 2017
Dictionary maintenance
...an, Portuguese, there is support for generating a particular standard of a language (e.g. Brazilian Portuguese, Valencian). The way this is done may need to be * Language specific sections of monodix files.

3 KB (461 words) - 15:31, 26 September 2016
Apertium-apy/Debian
* you get to decide what kinds of crazy half-finished language pairs to serve (or you can just serve a few of the high-quality ones that y Now install APY and the language pairs you want:

5 KB (653 words) - 21:00, 2 April 2021
Morphological dictionary
...ary is to model the rules that govern the internal structure of words in a language. ...o begin with, some terminology; if you are familiar with graphs (as in the data structure), this might help. A finite-state automaton can be visualised as

15 KB (2,200 words) - 12:04, 6 October 2014
Apertium-apy/Fedora
* you get to decide what kinds of crazy half-finished language pairs to serve (or you can just serve a few of the high-quality ones that y Now install APY and the language pairs you want:

5 KB (640 words) - 21:02, 2 April 2021
Ideas for Google Summer of Code/Automatic diacritic restoration
...ter], which has been trained for more than 100 languages using web crawled data. Details are in his paper linked below. You can try the system [http://l ...issue is to optimize smoothing of the statistical models on a language-by-language basis.

2 KB (307 words) - 19:50, 24 March 2020
Getting started with induction tools
Choose a language pair. For this example, it will be Italian (it) and English (en). To use ...ing a few million lines of xml. It will refer frequently to s1 (the first language of the two in the filename jrc-lang1-lang2.xml, which is jrc-en-it.xml in t

7 KB (973 words) - 02:52, 20 May 2021
Bytecode for transfer/Evaluation
This is a test of all .t1x files in all language pairs in http://apertium.svn.sourceforge.net/viewvc/apertium/trunk/. There Please find your favorite language pair below and check.

67 KB (9,057 words) - 06:52, 24 September 2013
Corpora formats
A corpus should be easily parsed by software that needs to get data from it. There is also metadata that should be stored in the corpus, and t * language of content (per article)!

5 KB (813 words) - 00:08, 28 December 2011
PMC proposals/Stable version of apertium-sh-sl
The language pair seems to work OK in the sh→sl sense but not so well in sl→sh (appa Improving this language pair would be nice for the first milestone of the project Abu-MaTran (June

2 KB (380 words) - 22:26, 3 August 2013
Recursive transfer
** Learn shift/reduce using target-language information ? *: If a language uses CG, the rule SN -> @A→ @N would only match where CG mapped @A→ (an

5 KB (788 words) - 10:50, 9 February 2015
Bilingual dictionary discovery
...se dictionaries where each node is a word in a language, and each arc is a language pair. For example like: http://i.imgur.com/SFOsRMv.png * Only one word per input language

3 KB (487 words) - 00:02, 22 March 2018
Ideas for Google Summer of Code/Plain-text formats for Apertium data
...article/download/3355/1843 . I ([[User:Mlforcada|Mlforcada]]) believe this language is much easier to write; it should be upgraded and documented. The preproce [[Category:Ideas for Google Summer of Code|Plain-text formats for Apertium data]]

2 KB (324 words) - 11:37, 16 February 2016
Prerequisites for Mac OS X
...he parts about lttoolbox/apertium, just install the language pair/language data itself if you ran [https://apertium.projectjj.com/osx/install-release.sh in

2 KB (355 words) - 19:36, 12 May 2019
Sardo e italiano/Rapporto finale (Sardinian and Italian/Final report)
...ions possible) and ensures that all of them have an equivalent in the ''Target Language''. The best result would be to have no errors at all in Testvoc. ...Testvoc concern the verb "stare". We do not believe these are "real" errors, given that they cannot be reproduced.

13 KB (1,910 words) - 11:34, 23 August 2016
Modes introduction
...e ('lt-toolbox', 'apertium-lex-tools') is a collection of tools which pipe data one to another. You can use these tools individually. There are many instru However, to ease the use of the tools, Apertium language-builds pre-configure chains of tools into scripts. These pre-configured cha

6 KB (992 words) - 17:25, 22 September 2016
English and Catalan/GSOC 2017
The lack of documentation regarding the language pair, the monolingual dictionaries or even the tagger has made me put an ef ...r to create wikitables with a lot of information about transfer rules from data embedded into the rule files (T1X, T2X and T3X). There are other scripts th

5 KB (887 words) - 22:24, 31 August 2017
Bylaws/Draft
...modular, documented, open platform for machine translation and other human language processing tasks</li> <li>To favour the interchange and reuse of existing linguistic data.</li>

8 KB (1,215 words) - 18:14, 3 March 2018
Lttoolbox-java
The Java port needs the C++ binaries for preparing/developing a language pair, i.a. to compile transfer files and train the tagger. ...ed Apertium JAR file, only dependent on JRE and an additional JAR file per language pair.

9 KB (1,370 words) - 09:49, 7 April 2020
By-laws/Draft
...modular, documented, open platform for machine translation and other human language processing tasks</li> <li>To favour the interchange and reuse of existing linguistic data.</li>

9 KB (1,356 words) - 18:34, 3 March 2018
Minimal installation from SVN
...[How to bootstrap a new pair]]. For existing pairs, see [[Install language data by compiling]],

717 bytes (103 words) - 22:05, 7 March 2018
Apertium Android
It requires internet permission to enable users to download language pairs (and developers to showcase their work from a phone). * language detection - for example using https://code.google.com/p/language-detection/

3 KB (449 words) - 01:06, 4 June 2020
Google Summer of Code/Report 2010
...stem. Currently the transfer system becomes the main bottleneck in the case of language pairs with complex transfer systems because of the XML processing associated ...he very moment the user inserts or deletes text. This allows for further data mining on the edits to detect commonly modified structures in a given trans

16 KB (2,571 words) - 12:21, 20 June 2019
Apertium-arz-ara
...lect Corpus and Lexicon. In Proceedings of the International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan, 2018.] :* A Parallel corpus of arz-ara-apc/ajp (2,994 sentences). The data was manually translated by professional translators. Sentences are collecte

2 KB (192 words) - 11:11, 19 January 2022
Bylaws
...modular, documented, open platform for machine translation and other human language processing tasks</li> <li>To favour the interchange and reuse of existing linguistic data.</li>

8 KB (1,214 words) - 22:30, 3 August 2013
Documentation for integrating Tesseract (OCR) into Apertium
! Language ..., the package 'spa' shown [https://github.com/tesseract-ocr/tesseract/wiki/Data-Files here<sup>3</sup>], to be able to identify by the app texts in Spanish

3 KB (450 words) - 16:23, 10 December 2018
Polish and Russian/Project description
'''Corpora and language data'''

4 KB (570 words) - 18:43, 23 August 2016
ReTraTos
...dictionaries and transfer rules. The induction systems and open linguistic data can be used with the [[Apertium]] toolbox to build open-source MT systems. ...bes how to use ReTraTos to create a bilingual dictionary for your Apertium language pair. You will need:

8 KB (1,253 words) - 09:42, 6 October 2014
Ideas for Google Summer of Code/Appraise gisting
Many language pairs in Apertium are unique, such as Breton-French, and many of them are u * Contact [User:mlforcada Mikel L. Forcada] to obtain the data cited in the paper.

2 KB (238 words) - 19:49, 24 March 2020
Format handling
Other deformatters and reformatters were written directly in C or C++ language without using XML files. So, they don't follow format specification descri ...ated from a format specification in XML. Rules for format, like linguistic data, are specified in XML, and they contain regular expressions with flex synta

13 KB (1,781 words) - 09:49, 6 October 2014
XML editors
If you are editing Apertium language data (e.g. [[dix]] and [[transfer]] files), you should use a real XML editor. Th

5 KB (783 words) - 14:25, 29 December 2020
Command line
...m_New_Language_Pair_HOWTO]] – using lt-comp, lt-proc etc. to test language data

443 bytes (64 words) - 16:56, 27 April 2017
Installation/Developers
If you want to work on Apertium language data and/or tools, you most likely want to use the binaries from Tino Didriksens

2 KB (279 words) - 20:52, 2 April 2021
Migrating tools to GitHub
Apertium has migrated all the language data, the core, and a few tools to [https://github.com/apertium GitHub]. Many to

1 KB (215 words) - 04:45, 9 March 2018
Debian için Gereksinimler (Prerequisites for Debian)
...if you want to ...'', you should read the [[Minimal installation from SVN|check out the language data from SVN]] page and compile (still apertium/lttoolbox

2 KB (313 words) - 21:02, 2 April 2021
Setting up a build environment for a language pair
* All the language data files:

972 bytes (144 words) - 12:09, 26 September 2016
Lint
* Maintain consistency in the data present in the <r> tag in pardef entries. ...h modes.xml present in the same directory as the other files for the given language pair, this function checks and prompts in case a file defined in a program.

9 KB (1,459 words) - 19:41, 15 May 2021
Compiling the language pair
== Compiling the language pair == If you don't need to work on monolingual data, use the nightly repos:

1 KB (163 words) - 16:53, 28 May 2017
Apertium-kir
'''Kymorph''' is a morphological analyser/generator for the [[Kyrgyz language]] and is currently working. It is intended to be compatible with transducers for # Get CG3 format of CoNLL-U data

1 KB (218 words) - 14:51, 24 April 2024
Integration and tagset conversion with Giellatekno
...air setup nowadays is using transducers from Giellatekno and pair-specific data in Apertium. This is a tricky set up because there is a lot of machinery ar

6 KB (984 words) - 17:56, 12 March 2016
Code style
* Prefer containers over homemade data structures. It's going to make it impossible to build for language pair authors.

5 KB (823 words) - 15:40, 26 September 2016
CG tagging hybrid and tagger improvements/Work plan
...held back validation scripts for a few languages & give them reproducible language models ...Note that averaged here refers to averaging over time so that new training data isn’t given too much weight.

2 KB (254 words) - 15:18, 13 June 2016
Uputstvo za novi jezički par za Apertium
...ima lemmas are linked to paradigms, which allow us to describe how a given word inflects without writing out every individual ending. ...to see. In Serbo-Croatian this is videti. Serbo-Croatian is a null-subject language, which means that it doesn't typically use personal pronouns before the conj

26 KB (4,259 words) - 07:00, 16 February 2015
Ideas for Google Summer of Code/superblank handling algorithm
Then, ''after'' reordering (for instance, into a Turkic-style language) to generate ''sister my Wales in lives'', ** I disagree. One of the key aspects of "my way" is that non-textual data between block tags are ''not'' sent through the translation chain at all, m

9 KB (1,486 words) - 19:56, 24 March 2020
Metadix
Metadixes are currently used in some language pairs, such as English-Catalan and Occitan-Catalan. ...linguistic data are compiled, these dictionaries are pre-processed, so

5 KB (744 words) - 08:26, 25 April 2016

Retrieved from "https://wiki.apertium.org/wiki/Special:Search"
