Search results

Language pair packages
'''Language pair packages''' are standalone JARs that can be run independently as well Since JAR files are nothing but renamed ZIP files, you can easily edit language pair packages to fit your needs. Note that the packages are ready to be use

11 KB (1,497 words) - 08:23, 7 April 2020
Germanic languages
...ogue.com/subgroups/germanic gem]) constitute a branch of the Indo-European language family spoken primarily in Europe, Anglo-America and Australasia. The commo ...ter plan involves generating independent finite-state transducers for each language, and then making individual dictionaries and transfer rules for every pair.

32 KB (3,684 words) - 06:16, 28 December 2018
Odia
...s one of the official languages of India, and has around 33 million native language speakers globally. .../ktpress.org.in/pdf/evolution_of_oriya_language.pdf The Evolution of Oriya Language and Script], ''Utkal University, Cuttack,''

13 KB (1,770 words) - 06:56, 3 December 2017
Ideas for Google Summer of Code/lint for Apertium
Make a program which tests Apertium data files for suspicious or unrecommended constructs (likely to be bugs). Some ...x]] (dix) dictionary data, perhaps also transfer rules. The [[Apertium New Language Pair HOWTO]] should introduce most of the terminology and background you ne

5 KB (789 words) - 10:36, 31 May 2016
Google Summer of Code/Application 2008
...cant] (Alacant, Spain); the other one is [http://www.prompsit.com Prompsit Language Engineering]. These two organizations are currently responsible for most of ...systems to translate less-closely related languages. We have 10 published language pairs, and three more currently in development.

8 KB (1,255 words) - 19:50, 12 April 2021
Translating mnemonic files
...the mnemonic (starting on the first column) must be kept unchanged from one language to another, while the string farther to the right is translated. By default, as for lttoolbox, apertium, and the language pairs, the installation is done in <code>/usr/local/bin</code> and <code>/u

5 KB (789 words) - 12:16, 15 June 2018
Translation quality statistics
...r/>words !! data-sort-type="number"|WER !! data-sort-type="number"|PWER !! data-sort-type="number"|BLEU !! Reference / Notes ...forms that get some analysis, may give an indication of the maturity of a language pair.

9 KB (1,233 words) - 09:10, 21 November 2021
Javanese
...Javanese language]]) is an [[Wikipedia:Austronesian languages|Austronesian language]] from Indonesia, spoken by the Javanese people from the central and easter Its language codes are '''jv''' and '''jav'''.

7 KB (881 words) - 13:11, 12 December 2018
Ideas for Google Summer of Code/Apertium African
...e language pairs (which haven't been started or currently have very little data in Apertium) and write a usable version which provides intelligible output * If there is some data for the language pair in the Apertium Github server, check it out and install it.

2 KB (238 words) - 13:45, 24 February 2023
Crossdics
...guage pairs <code>aa-bb</code> and <code>bb-cc</code> it will create a new language pair for <code>aa-cc</code>. * '''sl-tl''': source language (sl) and target language (tl).

5 KB (633 words) - 13:29, 6 October 2017
PMC proposals/Apertium Workshop in Russia
...eof, and following that the development of a prototype pair for a minority language of Russia. Russia has a long history of work in machine translation, but ve ...h oil, as Tatarstan and Sakha) students with good knowledge of a minorised language seldom have a computer and/or access to the internet. That is the case at l

18 KB (2,991 words) - 22:24, 3 August 2013
PMC proposals/Move Apertium to Github
* Individual repos for each pair, language module, and tool (preserving all commit history). ...ch|talk]]) 13:04, 7 February 2018 (CET) To install apertium and one or two language pairs, you (just) have to follow few wiki pages and then, you get the only

22 KB (3,325 words) - 14:06, 12 March 2018
Удмуртско-русский переводчик
...D0%BE%D1%81%D1%81%D0%B8%D0%B8 Šupaškar Apertium Workshop]. Russian part of language pair was created using [[lttoolbox]], and all files, needed for Russian, we === Some data ===

3 KB (299 words) - 06:39, 30 January 2012
Specific resources per language
...tps://apertium.github.io/apertium-on-github/source-browser.html. It houses language pairs which haven't completely matured and are under work. ==Specific resources per language==

10 KB (1,336 words) - 20:40, 11 December 2019
Lexical feature transfer - First report
for every sentence s in the source language corpus: for every sentence in the source language corpus:

6 KB (838 words) - 17:47, 25 July 2012
File names
Apertium has some naming conventions for the various files used in language data: Files compiled when you do "make" in a language pair:

890 bytes (126 words) - 10:10, 14 March 2017
Resources
{{see-also|Incubator|Specific resources per language}} ...Pair HOWTO|making a language pair]], feel free to make a new page for the language in question and paste it there. Stuff like basic dictionaries, paradigms, r

1 KB (164 words) - 05:20, 4 December 2019
UDPipe
;Get some data! Now try it on your own data.

5 KB (822 words) - 19:43, 9 March 2020
Semantic tagging
== Data sources == * Often a word can be disambiguated using its translation in another language, for example the triple (estació, gare, station) defines a building meanin

5 KB (949 words) - 15:27, 15 June 2020
Prerequisites for RPM
...t plan on working on the core C++ packages (but only want to work on / use language pairs), you can install all prerequisites with yum/zypper, using [[User:Tin For a list of available language pairs and other packages, see https://build.opensuse.org/project/show/home:

1 KB (231 words) - 10:03, 12 January 2022
Install Apertium core by compiling
...you have something, immediately, is to try to invoke a tool. Without language data you can't see a translation, but you can see the help. Try, ...language data by compiling]]. Or, if your system has packaging, download a language package (but beware, a package manager may pull in an old package of Apertiu

5 KB (821 words) - 02:55, 27 July 2022
Why we trim
...erator.<ref>Typically this goes for both translation directions, although a language pair only released for one direction might only be trimmed in that directio ...at when post-editing, the post-editor has to constantly look at the source language text (whereas an unknown word would be possible to translate there and then

4 KB (679 words) - 16:06, 3 May 2020
Курсы машинного перевода для языков России/Session 7
...ly most important one. This session will cover the question of why we need data consistency, what we mean by quality and how to perform an evaluation. The In contrast to many other types of systems for natural language processing — such as morphological analysers and part-of-speech taggers,

18 KB (2,490 words) - 12:00, 31 January 2012
Helsinki Apertium Workshop/Session 7
...ly most important one. This session will cover the question of why we need data consistency, what we mean by quality and how to perform an evaluation. The In contrast to many other types of systems for natural language processing — such as morphological analysers and part-of-speech taggers,

18 KB (2,493 words) - 08:39, 10 May 2013
Plugin for Pidgin
...eir buddies (both incoming and outgoing messages). If the user has set the language pair eng-spa (English → Spanish) for incoming messages from buddy1, th *'''/apertium_check''' Shows the current language pairs associated with the buddy whose conversation you issued the command o

8 KB (1,263 words) - 02:18, 9 March 2018
Tartu Apertium Course/Session 7
...ly most important one. This session will cover the question of why we need data consistency, what we mean by quality and how to perform an evaluation. The In contrast to many other types of systems for natural language processing — such as morphological analysers and part-of-speech taggers,

18 KB (2,493 words) - 10:59, 30 October 2015
Hectoralos/GSOC 2019 proposal: Catalan-Italian and Catalan-Portuguese
I’m a sociolinguist working on language maintenance and shift. I'm very interested in creating resources for minori '''1.2 Bring a released language pair up to state-of-the-art quality''': I'd like to improve the pairs Catal

16 KB (2,285 words) - 06:46, 12 April 2019
Monodix basics
...u can distinguish an element from an attribute and can recognise character data. If you want a quick recap, this should help: :<element attribute="value">character data</element>

11 KB (1,851 words) - 07:42, 16 February 2015
Apertium-quality/Quickstart
...t. It most likely won't let you in order to guarantee the integrity of the data. Morph testing isn't supported by the language we're using, but it is as simple to run as regression testing. One simply r

12 KB (1,931 words) - 17:06, 24 October 2018
Uighur and Turkish/Paper
...Machine Translation] - This looks interesting, 200K sentences of bilingual data collected, we should contact the authors to see if we can access it [https: ...eb interface [http://nmt.cloudtrans.org/ here], but unclear wrt details of data/evals [https://scholar.googleusercontent.com/scholar.bib?q=info:A6cMdf1SuHw

10 KB (1,483 words) - 07:00, 14 August 2018
Press
Websites referencing Apertium categorised by language of the website. News about Apertium categorised by language of report.

13 KB (1,689 words) - 21:42, 28 February 2021
How to bootstrap a new pair
...ium-init to bootstrap a new language pair (optionally with new monolingual data packages as well). ...is script in your working directory where you will be downloading language data. You can get the script from https://apertium.org/apertium-init

5 KB (824 words) - 15:30, 20 April 2021
Google Code-in/Application 2013
...m project develops a free/open-source platform for machine translation and language technology. We try and focus our efforts on lesser-resourced and marginalis ...eloped around the world, both in universities and companies (e.g. Prompsit Language Engineering) and by a growing number of independent free-software developers.

6 KB (1,057 words) - 15:34, 28 October 2013
Mongolic languages
!rowspan=2| Language ==Existing language pairs==

5 KB (538 words) - 15:52, 11 April 2015
Indonesian
...ipedia:Indonesian language]]) is an Austronesian language and the official language of Indonesia. Since it is a register of [[Malay]], it is also often general In [[Apertium]], there is a language pair of [[Indonesian and Malaysian]] already in the [[Trunk|trunk category]

5 KB (629 words) - 13:08, 21 December 2019
Traductions en français
| width=320 | '''[[Apertium New Language Pair HOWTO]]''' | [[Become a language pair developer for Apertium]]

13 KB (1,601 words) - 23:31, 23 July 2021
Sudo
If you're working on language data, <code>sudo</code> is pretty much only for running package managers like <c ...exception is <code>sudo make install</code>, but when working on language data you should never have to do this.

856 bytes (144 words) - 12:52, 3 May 2018
Google Code-in/Application 2014
...rs independent free-software developers. There are currently 40 published language pairs within the project (including a number of "firsts" — for example Sp ...ommunication) often occurs at this age, and if we can show them that their language is useful, and other people care, and there is no barrier for its use in th

6 KB (987 words) - 10:21, 7 November 2014
Interfaces
...e official web site – it serves only the ''released'' (stable) versions of language pairs ** This is the official "beta" site – it serves the latest work in all language pairs (so things may work better, but also may have weird bugs). You can al

3 KB (457 words) - 07:42, 18 June 2021
Daemon
...ecifies the parameters and data files specific to that language pair. Each language pair can contain a number of modes; most of these are used for debugging ea ...b server. We use apertium-nn-nb as an example, but it should work with any language pair; the modules lt-proc/cg-proc/apertium-{tagger,pretransfer,transfer,int

13 KB (2,039 words) - 11:56, 3 June 2022
Top tips for GSOC applications
...ding period — and for documentation. Anyone thinking of working on a language pair should make sure that they read about [[testvoc]] and other quality co ...all]] Apertium and a language pair; read through the [[:Category:HOWTO|new language pair HOWTO]]. This might even give you some more ideas!

9 KB (1,509 words) - 23:51, 27 February 2023
Travis settings for Apertium
...thub. What this actually means is that you can set an apertium language or language pair on github to automatically build and test on each commit. You only nee This is an example for monolingual data using hfst (from [apertium-fin]):

2 KB (249 words) - 06:26, 27 May 2021
Apertium-tki
Apertium language data for Iraqi Turkmen. [[Category:Language data]]

1 KB (144 words) - 20:07, 15 July 2021
Apertium Nieuw talenpaar HOWTO
...tems can be made. The only thing you have to do yourself is write the data. The data consists of 3 main parts, the dictionaries, and a few rules (woordv ...ems of the source language ('sl') or the target language ('tl') can be chosen and changed.

36 KB (5,761 words) - 14:34, 4 December 2011
Nieuw talenpaar maken
...tems can be made. The only thing you have to do yourself is write the data. The data consists of 3 main parts, the dictionaries, and a few rules (woordv ...ems of the source language ('sl') or the target language ('tl') can be chosen and changed.

36 KB (5,767 words) - 07:07, 16 February 2015
Морфологический трансдуктор русского языка
...textbook distinction in language, isn't it? When you start exploring real data the boundaries fade very fast and everything looks much more complicated.

22 KB (2,150 words) - 20:21, 24 April 2013
UD annotatrix/UD annotatrix at GSoC 2017
...statistical parser, which in turn can serve different purposes of natural language processing. For creating a good treebank, manual annotation and/or disambig ...interface allows users to work with CoNLL-U and CG3 formats, and to convert the data between the formats. It also allows users to either upload or paste corpora in pl

6 KB (930 words) - 15:59, 29 August 2017
Integrating Tesseract OCR into Apertium
...d of existing trained models. Successful tries are saved into new training data.<ref>https://static.googleusercontent.com/media/research.google.com/en//pub ...butions can also be found [https://github.com/tesseract-ocr/tesseract/wiki/Data-Files-Contributions here].

2 KB (305 words) - 14:36, 28 October 2018
Apertium-init
...er]] or [[CG]] files. It creates fully working Makefiles and stub language data, so you can compile and test straight away (assuming you've [[Installation|

744 bytes (108 words) - 20:38, 13 January 2021
Bugzilla
| 64 || Apertium-tolk should give proper warning when no linguistic data is installed || 2008-03-31 || Wynand Winte ...rg/cgi-bin/bugzilla/index.cgi here]. Please feel free to report your bug in any language you are comfortable with.

12 KB (1,254 words) - 22:08, 7 March 2018
