Search results

Page title matches

Install language data using packaging
...f the big language data sets. You do not want to add to or modify language data, you want to use it. '''Data may be outdated''', use only for system assessment. See the main sec

3 KB (445 words) - 12:38, 24 April 2017
Install language data by compiling
...]. The instructions are very different. This page is for existing language data. ...mar or HFST. If that happens, follow instructions under [[Install language data by compiling#Missing dependencies | missing dependencies]].

5 KB (843 words) - 19:44, 2 March 2023

Page text matches

Generating lexical-selection rules from a parallel corpus
* an Apertium language pair Make a folder called data-en-es. We are going to keep all the generated files there.

15 KB (2,206 words) - 13:58, 7 October 2014
Generating lexical-selection rules from monolingual corpora
* A language pair (e.g. apertium-br-fr) ** The language pair should have the following two modes:

12 KB (1,634 words) - 18:26, 26 September 2016
Learning rules from parallel and non-parallel corpora
Your language pair should be fully set up in the direction that you're training for, and * an Apertium language pair

14 KB (2,181 words) - 19:01, 17 August 2018
Talk:Task ideas for Google Code-in
...SVN made it so this script which is very handy for downloading an Apertium language/pair doesn't fetch the newest packages anymore. This also means that beta.a |description=Currently, some Apertium pairs/language modules use CI but it's very inconsistent and doesn't come by default. Aper

397 KB (52,731 words) - 11:22, 10 December 2019
Running the monolingual rule learning
* Train a target side language model (http://hermes.fbk.eu/people/bertoldi/teaching/lab_2010-2011/img/irst * The language pair must support the pretransfer and multi modes. See apertium-sh-mk/modes

4 KB (503 words) - 19:01, 17 August 2018
Contributing to an existing pair
This is a guide on how to add linguistic data directly to an existing language pair in Apertium. It gets a bit technical – if you just want to notify us ...t-of-speech tagger, which is in charge of the disambiguation of the source language text.

50 KB (7,915 words) - 00:04, 10 March 2019
Using linguistic resources
...iew to the kind of data and resources that can be useful in building a new language pair for Apertium, and how to go about building them if they do not already Each Apertium language pair requires 3 dictionary files. For instance, for the English-Afrikaans

13 KB (2,112 words) - 12:11, 26 May 2023
User:Srbhr/GSOC 2020 Proposal: Automatic PostEditing
== Title: Automatic Post-Editing/Improving Language Pairs by Mining Post-Edits == ...stha University, New Delhi. I'm interested in Machine Learning and Natural Language Processing, and always seek to find ways to improve stuff based on them. I

23 KB (3,704 words) - 17:29, 30 March 2020
Talk:Ideas for Google Summer of Code
...'Rationale''': One of the most frustrating things when developing a new language pair is that you have to get the tags ''just right'' in order to be able to ...adequate translation. For instance, you might want to know in an ergative language if an absolutive is subject or object while translating. A shallow function

71 KB (10,374 words) - 21:14, 18 January 2021
User:Ksnmi/Application
...interested in Language, and machine translation is a part of handling the language change. I have been working with understanding both theoretically as well a ...the kind of projects whose implementation will help translation on all the language pairs on apertium at the end.

32 KB (5,064 words) - 09:19, 21 March 2014
Target-language tagger training
...nguage (<code>SL</code>) will be trained using information from the target language (<code>TL</code>). ==Language pair==

11 KB (1,470 words) - 08:16, 8 October 2014
User:Shash42/GSoC 2020 Proposal: Bilingual Dictionary Discovery
I am interested in Natural Language Processing, Deep Learning, and Applied Math. I am fascinated by computation ...s inherently targeted at low-resource, similar yet mutually unintelligible language-pairs. It is a dream organization for me because I get to contribute to a c

24 KB (3,699 words) - 22:49, 31 March 2020
Ideas for Google Summer of Code
...converted or expanded in the [[incubator]]. Consider doing or improving a language pair (see [[incubator]], [[nursery]] and [[staging]] for pairs that need wo == Language Data ==

23 KB (3,198 words) - 09:15, 4 March 2024
User:RomanZegarski/GSoC2011 proposal
...between languages and creating rules making possible to translate from one language to another one is intriguing process. ...allows to pass information regardless of language in which was created and language known by person retrieving it. Even if translation isn't perfect it gives a

7 KB (1,021 words) - 19:30, 7 April 2011
User:Deltamachine/proposal2018
<li>Theory of Language (Phonetics, Morphology, Syntax, Semantics)</li> <li>Language Diversity and Typology</li>

16 KB (2,445 words) - 09:19, 26 March 2018
Task ideas for Google Code-in (2013)
...language pair XX-YY by adding 50 words to its vocabulary || Add words to language pair XX-YY and test that the new vocabulary works. [[/Add words|Read more]] ...language pair || Add or correct a structural transfer rule to an existing language pair and test that it works. [[/Add transfer rule|Read more]]... || [[User

68 KB (10,323 words) - 15:37, 25 October 2014
User:Mjaskowski
I know as well how much time does one need to learn yet another language. I can only imagine problems arising if one wants to learn a language which is not as popular as languages I have mentioned above (for example, t

19 KB (3,209 words) - 18:45, 9 April 2010
Languages Of Russia
...of any language in Russia in areas smaller than the Federal Subjects. The data is in Russian and comes from the official 2010 Russian Census website. ===Complete guide to accessing the data===

3 KB (561 words) - 17:58, 14 January 2018
The quick and dirty guide to making a new language pair
...rtium machine translation system from scratch. You can check the [[list of language pairs]] that have already been started. ...translation systems. The only thing you need to do is write the data. The data consists, on a basic level, of three dictionaries and a few rules (to deal

19 KB (3,164 words) - 20:58, 2 April 2021
User talk:Muki987
The Chinese police, according to official data, 1,317 people detained. `According to official data the Chinese police detained 1,317 people.'

85 KB (13,901 words) - 20:42, 19 June 2009
Using Giellatekno Divvun spellers with LibreOffice-Voikko on Debian
...on Ubuntu/Debian, using the Voikko plugins and Giellatekno/Divvun language data. ==Install the language data==

4 KB (596 words) - 21:02, 2 April 2021
Task ideas for Google Code-in
|title=Add recursive transfer support to a language pair that doesn't support it |description=Make a branch of an Apertium language pair that doesn't support recursive transfer and call it "recursive transfe

32 KB (4,862 words) - 06:23, 5 December 2019
Installation
* https://apertium.org is the official site, and offers all the released language pairs ...Apertium platform, and also offers a simple web interface to the released language pairs

6 KB (848 words) - 12:51, 1 April 2024
Apertium-apy
...rtium.org page uses an installation which currently only runs ''released'' language pairs (also available from https://apertium.org/apy if you prefer). However $ curl -G --data "lang=kir&modes=morph&q=алдым" https://beta.apertium.org/apy/analyse

37 KB (5,132 words) - 16:36, 5 June 2020
Grfro3d/proposal apertium cat-srd and ita-srd
...chine translation to understand the general meaning of the text in foreign language. The other approach is instead that of "dissemination" in which the MT is a ...(coding and decoding), data (linguistic data) and support tools to convert data and make them compatible with the engine. Even if most RBMT systems are pri

21 KB (3,171 words) - 14:34, 3 April 2017
User:Rcrowther
</ref> and language data on your system (developers may also want to consider their operating enviro ==== For translators: Install language data/dictionaries/pairs from repositories ====

4 KB (643 words) - 12:55, 24 April 2017
Entraînement d'un tagueur de langue cible
[[Target-language tagger training|In English]] ...change the variables <code>DATA</code> and <code>DIRECTION</code>. <code>DATA</code> must point to the directory containing the data for the pair

12 KB (1,625 words) - 08:20, 8 October 2014
Languages
...epository scheme. (Originally, all monolingual language data was found in language pairs, meaning that there was a lot of duplication.) If you feel something ...hat constitutes a minimally-useful language package; generally, however, a language package should have over 60% coverage on a variety of corpora and should pr

15 KB (1,783 words) - 22:33, 1 February 2019
Installation troubleshooting
====When running configure script for language pair data==== ====Workaround when language pairs need updated configure.ac's====

20 KB (3,153 words) - 08:13, 24 May 2019
User:Khannatanmai/GSoC2020Proposal Trimming
...ence if one learns to create good tools for MT, they learn most of Natural Language Processing. A tool which is rule-based and open source really helps the community with language pairs that are resource- poor and gives them free translations for their ne

30 KB (4,918 words) - 16:55, 31 March 2020
Running the MaxEnt rule learning
DATA=/home/philip/Apertium/gsoc2013/monolingual/data ...atterns-frac-maxent.py $DATA/setimes.sh-mk.freq $DATA/setimes.sh-mk.ambig $DATA/setimes.sh-mk.annotated > events 2>ngrams

3 KB (520 words) - 21:25, 14 February 2014
User:Shraier/GSoC2012-Application1
...translation makes you notice these little differences by making the cross-language variation explicit. Since I have been involved in the translation area for ...of the field to refine the results of the automatically produced data. All data is organized in the XML files, which are humanly readable and editable. I b

14 KB (2,289 words) - 11:27, 6 April 2012
User:Khannatanmai/GSoC2019Proposal
...ence if one learns to create good tools for MT, they learn most of Natural Language Processing. A tool which is rule-based and open source really helps the community with language pairs that are resource- poor and gives them free translations for their ne

26 KB (4,048 words) - 18:50, 18 March 2020
Transfer rules examples
...to be translated. For example, HTML tags must not be translated in another language, but only the text of the Web page. ...e same software are used for every language pairs. It is the format of the data to be translated which will take to use a particular deformatter.

58 KB (8,365 words) - 20:16, 26 June 2018
Documentation of Matxin 1.0
Owing to the different syntactic structure of the phrases in each language, some Although the details of the modules and the linguistic data is presented in

58 KB (8,964 words) - 11:11, 14 May 2016
Flyer
...Iberian peninsula, but is now being used to translate between more distant language pairs. ...ngineering ([http://www.prompsit.com http://www.prompsit.com]). Linguistic data are being developed by Transducens, the Seminario

26 KB (3,122 words) - 06:25, 27 May 2021
User:Shraier/GSoC2012-Application2
...translation makes you notice these little differences by making the cross-language variation explicit. Since I have been involved in the translation area for ...of the field to refine the results of the automatically produced data. All data is organized in the XML files, which are humanly readable and editable. I b

14 KB (2,245 words) - 11:33, 6 April 2012
Publications
...ngsnes (ed.) Bauta: Janne Bondi Johannessen in memoriam, Oslo Studies in Language 11(2), 2020. 489–501. (ISSN 1890-9639 / ISBN 978-82-91398-12-9) ...system/files/swj1419.pdf The apertium bilingual dictionaries on the web of data]. Semantic Web, 9(2), 231-240.

33 KB (4,418 words) - 11:52, 29 December 2021
Workflow reference
...tion of each module with more precision. They may also introduce technical language which linguists and/or computer coders would use. The technical description References to 'xxx' and 'yyy' refer to a language code, for example 'en-es'; 'English' to 'Spanish'.

29 KB (4,687 words) - 16:28, 5 June 2020
User:Aha/GsocApplication
...translation, being a sub-field of NLP, enables to explore the grammar of a language and deal with it from a computational perspective. I really like the idea o ...vated community like Apertium's it is possible to accomplish such numerous language-pair translation. The project supports both widely spoken languages and min

11 KB (1,672 words) - 20:56, 9 April 2010
Finding numbers of speakers from the Russian census
...of any language in Russia in areas smaller than the Federal Subjects. The data is in Russian and comes from the official 2010 Russian Census website. Here are the steps to access the data:

2 KB (296 words) - 21:12, 13 January 2018
User:Deltamachine/proposal2017
<li>Theory of Language (Phonetics, Morphology, Syntax, Semantics)</li> <li>Language Diversity and Typology</li>

13 KB (2,187 words) - 09:56, 23 March 2018
User:Saswata Bose/GSoC2024Proposal
* Apertium allows one, as a language lover, to work very closely on a language both from a linguistic and a computational perspective. I, being a Research * ''Interested Task: '' Add a new variety to an existing language

9 KB (1,397 words) - 15:31, 2 April 2024
Pairviewer
...//d3js.org/ D3.js] tool that depicts all Apertium [[list of language pairs|language pairs]] in an interactive graph initially developed sometime before the [[G === Updating language data by scraping ===

5 KB (702 words) - 01:34, 9 December 2018
Hindi
=== Language pairs === .../github.com/apertium/apertium-urd-hin?files=1 apertium-urd-hin] Linguistic data for the Apertium Urdu-Hindi machine translator

6 KB (806 words) - 00:45, 7 December 2018
Apertium New Language Pair HOWTO
'''Apertium New Language Pair HOWTO''' ...rtium machine translation system from scratch. You can check the [[list of language pairs]] that have already been started.

36 KB (5,933 words) - 16:14, 22 February 2021
User:Gang Chen/GSoC 2013 Application: "Sliding Window PoS Tagger"
...language is a very complicated system, and translating a sentence from one language to another has always been a challenging task. Meanwhile, the translation n ...are approximations to the real structural rules in a language system, many language pairs have proved a high translation quality. So hopefully, we can expect t

21 KB (3,340 words) - 10:56, 28 May 2018
User:Mfoat/GSoC 2012 Application
...periments and develops all the possible methods, tools and technologies of data processing in natural languages (NLP). Therefore, it can be viewed as some ...pporting mainly the European languages; first of all they are focused on a language pair including English. Besides, those systems are not open, which makes it

10 KB (1,535 words) - 09:26, 5 April 2012
User talk:Rlopez/Application
I am master student majoring in Natural Language Processing, and I like many tasks of this area. The machine translation is ...the third case (she'll or shell) the translation is ambiguous. The Trigram Language Model can help to resolve the ambiguity. I think that there are few similar

14 KB (2,151 words) - 13:35, 21 March 2014
User talk:Rlopez/Application GSoC-2014
I am master student majoring in Natural Language Processing, and I like many tasks of this area. The machine translation is ...the third case (she'll or shell) the translation is ambiguous. The Trigram Language Model can help to resolve the ambiguity. I think that there are few similar

14 KB (2,151 words) - 16:14, 21 March 2014
Easy dictionary maintenance
The number of language pairs in development for Apertium is increasing, and so is the complexity o language pairs. With better tools, more people will be able to develop language pairs.

29 KB (4,382 words) - 07:53, 6 October 2019
User:Maharaj/GSoC2024Proposal
Native Language: Bodo ...carried out. Rule-based translation system provides opportunities to write data and linguistic rules of languages.

9 KB (1,205 words) - 04:13, 2 April 2024
Freerbmt11
...he implementation of the algorithms must be free/open-source, but also the data themselves. Nowadays, there are many machine translation packages of this t ...morphologically rich languages, which even with large corpora suffer from data sparseness.

6 KB (905 words) - 17:26, 18 October 2010
Supervised tagger training
...-supervised.make this one] from en-eo. You will need modify it to fit your language pair. This usually means editing the first few lines. ===Tagger data directory===

3 KB (537 words) - 13:44, 18 June 2014
User:OmarKassem/Proposal
'''GSOC 2019 : Light alternative format for all XML files in an Apertium language pair'''[http://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code#Light ...expression to its referring entity. This is an important task for natural language.

9 KB (1,494 words) - 00:52, 20 April 2019
Talk:PMC proposals/Allow some code under github.com/apertium
* linguistic data for the engine * tools for creating/learning/managing/serving linguistic data

5 KB (733 words) - 03:00, 10 August 2015
Sentence segmenting
|Language You will need to install NLTK and NLTK data. Unfortunately, they both only support Python versions 2.6-2.7. If you are

14 KB (2,232 words) - 12:51, 26 September 2018
User:Pmodi/GSOC 2020 proposal: Hindi-Punjabi
...esource languages gives the speakers of those languages access to valuable data and can help in several domains, such as education, news, judiciary, etc. T ...ecause of the level of understanding it provides instead of simply blaming data for poor results, it actually shows that it can perform better for low reso

27 KB (4,091 words) - 19:32, 4 June 2020
User:GD/proposal
<li> Morphology, Syntax, Semantics, Typology/Language Diversity </li> ...ngines. I think rule-based translation very promising if we provide enough data and an effective analysis 

9 KB (1,413 words) - 11:49, 6 May 2018
Talk:Documentation
2.2. Data Stream without format. ...s code1 and code2. The note about regression tests would be removed if the language pair has none of course.

7 KB (799 words) - 06:26, 27 May 2021
Become a language pair developer for Apertium
...uide on how to use a development version of Apertium to make a change in a language pair. ...ou should try this to make sure things work before you move on to whatever language pair you plan on working on.

10 KB (1,626 words) - 17:46, 13 January 2020
Mandarin Chinese
...http://wiki.apertium.org/wiki/Mandarin_Chinese#In_Apertium some linguistic data in Apertium]. ...fers to the most commonly spoken form of Chinese that is the sole official language of China and Taiwan. It is also known as Putonghua or Standard Chinese ([[W

16 KB (2,148 words) - 03:28, 16 December 2015
French
...mpire, as did all Romance languages. There are currently 4 released French language pairs ...the sixth most spoken language in the world and is the second most studied language worldwide.

15 KB (2,081 words) - 07:14, 12 August 2020
User:Aditya
...hich are mostly commercial. But Apertium is an open-source and uses single language-independent specification, to allow for the ease of contributing to Apertiu ...n work properly and Suggestion styling if people choose an unlikely source language

13 KB (2,022 words) - 18:40, 27 March 2018
Sardu abbarra bivu!
...MT based on corpora: adding new languages is very easy. To create a new language pair, in fact, it is not necessary to include corpora with millions of word ...airs can be added by creating dictionaries and rules containing linguistic data in XML format.

15 KB (2,339 words) - 00:41, 4 June 2018
User:Shraier/Application
...translation makes you notice these little differences by making the cross-language variation explicit. ...of the field to refine the results of the automatically produced data. All data is organized in the XML which is humanly readable and editable. I believe i

11 KB (1,778 words) - 17:20, 26 April 2011
User:Chy/Gsoc 2010 Application/Java port of Apertium
...orced to limit our knowledge due to localization issues. We use different language,characters to express our selves in day to day life. To tackle these issues ...a programmer, and last and foremost I am interested on learning linguistic data processing where Apertium is great application to start with.

6 KB (977 words) - 07:49, 20 April 2010
Frequently Asked Questions
...ind that are incorrectly translated, to getting involved in creating a new language pair or programming on tools or user interfaces. Here are some question fre Our language agnostic tools are native and written in [https://en.wikipedia.org/wiki/C++

7 KB (1,139 words) - 06:27, 27 May 2021
English and Kazakh
...are basically for Anel, Aizhan and Assem who have started to develop this language pair... And Aida too... === Download apertium, lttoolbox and eng-kaz data from SVN ===

20 KB (2,856 words) - 06:26, 27 May 2021
Google Summer of Code/Wrap-up Report 2009
...ll these language pairs. This means that the data can be re-used by other language projects (e.g. in developing spelling or grammar checkers, thesauri, etc). This project was accepted as part of our "adopt a language pair" idea

12 KB (1,917 words) - 15:54, 12 September 2009
User:Skh/Application GSoC 2010
program. My courses so far include formal languages, data structures and ...t system used internally at SuSE, I was working on the workflow definition language and the core workflow engine. http://swamp.sf.net 

15 KB (2,372 words) - 19:57, 8 April 2010
User:Francis Tyers/Apertium 4
== Linguistic data == * At least one state-of-the-art language pair (wrt. Google) using all available modules.

3 KB (447 words) - 12:12, 27 June 2020
GSOC'16 Kira's results. Apertium website improvements: Docs diff
*'''langpair''': language pair to use for translation curl -G --data "langpair=eng|spa&q=run" http://localhost:2737/dictionaryLookup

5 KB (712 words) - 21:27, 16 August 2016
User:Ksingla025/Application
...ollected a sample of 2000 tweets to analyze the common patterns, some chat data, and also made a literature survey to check for types of non-standard input sample data : https://docs.google.com/document/d/1fGFO6V-lKcvqgzaQRfxEfLWGXF6AxqTIODKda

5 KB (817 words) - 21:23, 14 March 2014
Farsi/About
...appear at the beginning of a sentence. The unique thing about the persian language though, is that they use prepositions which is quite uncommon in many SOV l ...designed a Two-sided morphology analyst of nouns and adjectives in Persian language, using Xerox Finite State Technology as giving input word (adjective or nou

16 KB (2,597 words) - 20:58, 12 January 2013
User:Darthxaher/Application2010
...ect (2009) titled '''''Conversion of Anubadok: Creating an English Bengali Language Pair''''' under Apertium. The project was a great experience for me. I had ...ing offered by Apertium will have far reaching effect in the local Bengali Language adoption and localization of open source softwares.

16 KB (2,533 words) - 01:16, 10 April 2010
User:Arghya1998/proposal
...achine translation can aid a lot of these problems and breaking the “language barrier” across not just the country and the globe and connect people The project would bring a lot of developers at ease. Python is a high-level language with a lot of features that make it easier to grasp for developers. Python

15 KB (2,338 words) - 14:45, 27 March 2018
User:Gang Chen/GSoC 2013 Progress
2. en-es language pair(for experiment) https://svn.code.sf.net/p/apertium/svn/branches/aper 3. es-ca language pair(for experiment) https://svn.code.sf.net/p/apertium/svn/branches/aper

14 KB (1,896 words) - 08:43, 7 October 2013
Apertium guide for Windows users
* Apertium language pairs .../engine of Apertium installed (including the requirement lttoolbox, but no language pairs yet).

9 KB (1,367 words) - 09:17, 26 May 2021
User:Blanda.alex
= Google Summer of Code 2012 Application - adopting a new language pair fr-ro = ...next year's final thesis I will be working on a project related to natural language processing and pattern recognition.

8 KB (1,170 words) - 00:36, 13 April 2012
User:Marcriera/Proposal2018
...en source project; it is also a very welcoming family of collaborators and language enthusiasts. After successfully participating in GSoC 2017 with Apertium an I am interested in upgrading several language pairs to ease future development and bring one of them (Romanian-Catalan) t

11 KB (1,500 words) - 15:44, 30 April 2018
Bilingual dictionary
...of the main five data files in any language pair (see also: [[Apertium New Language Pair HOWTO]]). ....dix'' where ''apertium-A-B'' is the name of the [[List of language pairs| language pair]]. For example file ''apertium-af-nl.af-nl.dix'' is the bilingual dict

7 KB (1,244 words) - 16:41, 17 March 2018
Task ideas for Google Code-in/Getting started
...getting new contributors to Apertium and to helping spread our passion for language technology. ...of other things, live in our '''[[subversion|svn repo]]'''. The language data is found in the following places:

7 KB (1,091 words) - 19:54, 12 April 2021
Anaphora resolution module
...olving the antecedent of the anaphors in text becomes essential in several language pairs. ...ge it to the correct anaphor''' using a macro in the transfer rules of the language pair. (t1x)

20 KB (3,107 words) - 21:13, 24 June 2022
User:Francis Tyers/Sandbox
...e bilingual dictionary, collocations (n-grams) are extracted from a source language corpus. * Translations are scored on a target language corpus. -- The target language model training corpora would need to be preprocessed in some cases, to, for

51 KB (7,047 words) - 08:49, 9 June 2011
Ankush/Application
...nders , specially for Indian Languages because we still do not have enough data ...oreign languages. I am specially interested in MT systems where the source language is English and the target languages are Indian Languages. It is impossible

6 KB (923 words) - 17:57, 3 April 2010
Unsupervised tagger training
First, make a directory called <code><lang>-tagger-data</code>. Put your corpus into there with a name like <code><lang>.crp.txt</c ...cifies how to generate the probability file. You can grab one from another language package. For <code>apertium-en-af</code> I took the Makefile from <code>ape

7 KB (1,177 words) - 08:34, 8 October 2014
Narimann/GSOC 2019 proposal: Kazakh-Turkish and Turkish-Kazakh
'''Track:''' Data Science Dynamic Language Interpreter implementation

8 KB (1,094 words) - 13:10, 14 April 2019
User:Khannatanmai/New Apertium stream format
...the stream, and in the future one can add any amount of information in the language models or the translation modules. Later you can see how this formalism loo ...ynamic. All current pipes will continue to work as-is, unmodified. All old data and files remain valid.'''

24 KB (4,167 words) - 09:20, 17 July 2020
Using Apertium spellers with LibreOffice-Voikko on Debian
==Install language module== A language module supporting spelling may be installed, either from our repository, or

3 KB (387 words) - 12:21, 26 September 2016
Assimilation Evaluation Toolkit
...ion of machine translation. The tasks consist of sentences in the original language, reference translation with keywords omitted and the machine translation of ...various { gap } in order to discover phenomena and patterns in the natural language.

9 KB (1,368 words) - 09:04, 23 April 2015
User:Aboelhamd/proposal
...y in Egypt. Recently I have been granted a scholarship to study masters in data science at Innopolis University in Russia. ...subjects I loved the most were artificial intelligence, machine learning, data mining and deep learning, and that's because of the great potential in the

18 KB (2,903 words) - 22:18, 8 April 2019
User:Violet
You should now have the core/engine of Apertium installed (but no language pairs yet). == Apertium Language Pairs Installation ==

5 KB (822 words) - 04:50, 23 November 2011
Talk:Google Summer of Code/Application 2019
...modular, documented, open platform for machine translation and other human language processing tasks * To favour the interchange and reuse of existing linguistic data.

15 KB (2,462 words) - 16:57, 31 January 2019
User:Rroychoudhury/GSoC 2020 Proposal
...: Undergraduate Researcher at Jadavpur University specialising in Natural Language Processing Professional Interests : Natural language Processing ,Computational Linguistics , Sentiment Analysis , Statistical a

16 KB (2,215 words) - 14:49, 31 March 2020
User:Lguyogiro/GSoC2023Proposal
...icient digital data, parallel, corpora, etc. to work well for alternative, data-hungry approaches. ...loper can track it down and fix it, instead of simply blaming bad training data.

9 KB (1,296 words) - 17:42, 4 April 2023
Helsinki Apertium Workshop/Session 0
...duce translations which are less fluent, but more preserving of the source language meaning. ...er and number between a determiner and head noun will remain in the target language output.

11 KB (1,519 words) - 06:51, 11 May 2013
Tartu Apertium Course/Session 0
...duce translations which are less fluent, but more preserving of the source language meaning. ...er and number between a determiner and head noun will remain in the target language output.

11 KB (1,519 words) - 18:27, 16 October 2015
Курсы машинного перевода для языков России/Session 0
...duce translations which are less fluent, but more preserving of the source language meaning. ...er and number between a determiner and head noun will remain in the target language output.

12 KB (1,464 words) - 12:00, 31 January 2012
Install quick tests
More convincing if you have a language pair on the computer somewhere :) ...this should work for both packaged and compiled Apertium. Without language data you can't see a translation, but you can see the help. Try,

2 KB (368 words) - 06:02, 24 April 2017
User:Shobhit Gautam 1503
I am interested in NLP, data mining, math as well system design. I love to do competitive challenges as ...stic diversity, simply because endangered languages don’t offer sufficient data.

11 KB (1,787 words) - 17:11, 13 April 2021
Apertium kullanarak dil çifti geliştir
...probably try this to make sure things work before you move on to whatever language pair you plan on working on. Note that some existing language pairs have external dependencies, like HFST or Constraint Grammar. The [[In

10 KB (1,715 words) - 12:29, 28 May 2018
Indirect contribution guide
...tended to show how you can make an "indirect" contribution, by documenting language resources, helping us to build bilingual test sets, translating, promoting, ...first language, and translate them to the other. A translation in a third language may be useful in enlisting help, but is not required.

9 KB (1,494 words) - 05:58, 18 March 2015
User:Raveesh/Application
...effort and can be accessed in a place where there are no speakers of this language. The challenge in itself is very interesting and its application can be see ...t from the past 6 months. I could relate to the project- “Bring a released language pair up to state-of-the-art quality (Hindi-English)”. I have been working

10 KB (1,482 words) - 22:05, 21 May 2014
Google Summer of Code/Application 2016
...ed translation, morphological analysis, natural language processing, human language technologies ...Spanish–Catalan) but which has been expanded to deal with more divergent language pairs (such as English-Catalan and even Basque→English). The platform pro

10 KB (1,500 words) - 16:23, 18 February 2016
Apertium on Ubuntu or Debian
...probably just search for, tick off and install Apertium and your favorite language pairs in Synaptic. There's a friendly [https://help.ubuntu.com/community/Sy Step 2: '''Download apertium, lttoolbox and language pairs from SVN.'''

3 KB (475 words) - 16:28, 27 April 2017
Apertium-get
'''apertium-get''' is a little script to fetch and compile language data, with monolingual dependencies, from Github. ...d and compiled by just going to the directory where you want your language data to be, and running

2 KB (317 words) - 20:45, 23 March 2019
User:N0nick/Application
=Apertium Summer of Code application:  New Maltese-Hebrew language pair= ...and inter-connected world, the spread of information is still limited by a language barrier. 

13 KB (2,014 words) - 20:05, 4 June 2011

File:Altai-alphabet-pronunciation-writing-system omniglot-com.png

==== Altai Language Resources ==== Crúbadán language data for Southern Altai. Kevin Scannell. 2015. The Crúbadán Project. oai:cruba

(588 × 481 (23 KB)) - 07:30, 5 December 2017

Automatic postediting at GSoC 2018
==== Data preparation ==== There were three attempts to extract postediting operations for each language pair: with threshold = 0.8 and -m, -M = (1, 3).

7 KB (1,033 words) - 15:27, 15 August 2018
Bilingual dictionary enrichment via graph completion
<li>- 4: preprocessing : dictionary data needs some changes to be used in a graph, this step prepares it for further ...recommends what languages will be the most efficient to enrich particular language pair</li>

19 KB (2,541 words) - 15:44, 12 August 2018
User:Mono/GSoC 2017
...tium is a free/open-source platform for rule-based machine translation and language technology which is aimed providing support for lesser-resourced and margin ...t lets the user to input a URL, choose a source language and a destination language and translate the webpage. This feature has been successfully completed as

16 KB (2,280 words) - 01:40, 8 March 2018
User:Khannatanmai/GSoC2020Proposal GapFilling
...ence if one learns to create good tools for MT, they learn most of Natural Language Processing. A tool which is rule-based and open source really helps the community with language pairs that are resource- poor and gives them free translations for their ne

6 KB (984 words) - 15:20, 21 March 2020
User:Khannatanmai/GSoC2020Proposal DistributedRepresentations
...ence if one learns to create good tools for MT, they learn most of Natural Language Processing. A tool which is rule-based and open source really helps the community with language pairs that are resource- poor and gives them free translations for their ne

7 KB (1,078 words) - 07:16, 28 March 2020
User:Chebrolutejasvi/GSoC2020Proposal
...d was exposed to different languages. This led to me being fascinated with language translation and I wanted to contribute to help in making communication easi I am going to work on “ Adopt an unreleased language pair: Hindi - Telugu”. I want to get the pair released in both the direct

9 KB (1,387 words) - 16:27, 31 March 2020
Chebrolutejasvi/GSOC 2020 proposal: Hindi-Telugu
...d was exposed to different languages. This led to me being fascinated with language translation and I wanted to contribute to help in making communication easi I am going to work on “ Adopt an unreleased language pair: Hindi - Telugu”. I want to get the pair released in both the direct

9 KB (1,391 words) - 16:41, 31 March 2020
Apertium on Mac OS X
== Language data packages == If you've installed tools with install-nightly.sh, you can install language data with

4 KB (665 words) - 11:57, 18 November 2022
User:Nmathur54
== Morphological Analyzer of Braj Language == ...language, and I am interested in making of morphological analyzer for Braj Language. And this rise my more interest in NLP, Machine Learning etc.

6 KB (873 words) - 18:25, 27 March 2019
User talk:Nmathur54
== Morphological Analyzer of Braj Language == ...language, and I am interested in making of morphological analyzer for Braj Language. And this rise my more interest in NLP, Machine Learning etc.

6 KB (870 words) - 18:42, 25 March 2019
Category talk:GSoC 2019 student proposals
== Morphological Analyzer of Braj Language == ...language, and I am interested in making of morphological analyzer for Braj Language. And this rise my more interest in NLP, Machine Learning etc.

6 KB (870 words) - 18:37, 25 March 2019
User:Aikoniv/GSoC20010Application
...translation system that is far from perfection has much to offer society. Language barriers are as big an issue today as they have ever been in hindering frui ...or creating new Apertium language pairs. And more immediately, the sme-nob language pair in the incubator will no longer require pipeline hacks to coerce the c

16 KB (2,502 words) - 19:03, 8 April 2010
Google Summer of Code/Application 2009
...um project is a project which works on open-source machine translation and language technology. We try and focus our efforts on lesser-resourced and marginalis ...versitat d'Alacant] (Alacant, Spain) and [http://www.prompsit.com Prompsit Language Engineering].

10 KB (1,543 words) - 19:50, 12 April 2021
Ideas for Google Summer of Code/Bilingual dictionary enrichment via graph completion
...f language pairs that may be used to infer new entries for existing or new language pairs using graphs. ...a graph and relevant information is stated about them. The cloud of linked data is intended to be navigated by software agents primarily. In the case of Ap

3 KB (452 words) - 19:50, 24 March 2020
User:Popcorndude
Some sort of language model that takes a list of LUs and dependency relations and determines the Given a syntactic parser for one language and a fairly small parallel corpus it seems like it should be possible to l

6 KB (972 words) - 18:06, 23 December 2022
User:JCentelles/GSoCapplication
...ments and most people have to face the challenge of communicating in other language that their own. So the question is, who is not interested in having access ...resources (I provide) to analyze Chinese and generate Spanish (well, this language is already in Apertium). Then, I will work with an statistical system + pos

11 KB (1,666 words) - 05:22, 13 May 2013
Siciliano y castellano/Informe final
...oject goal is to create a machine translation package for Sicilian-Spanish language pair on the base of Apertium’s machine translation system. This project i ...he Sicilian dictionary was the abundance of spelling forms in the Sicilian language. For instance, one Sicilian verb with the meaning 'to join' can have the fo

9 KB (1,370 words) - 13:58, 23 August 2016
User:Kamush/GSoC2021Proposal
'''Develop a prototype MT system for Kazakh - Uzbek language pair''' ...ntribute to the platform by extending the list of language pairs my native language - Uzbek has so far.

6 KB (854 words) - 15:35, 20 April 2021
Talk:Morphological dictionary
...tasks is the construction of efficient lexical processors from linguistic data. ...xical forms involves drawing correspondences between a lexical form in one language, and the translation in another. This final operation is crucial in constru

18 KB (2,967 words) - 19:24, 11 December 2012
Sardinian and Italian/Final Report
...language particularly suitable for various reasons. First, because it is a language in process of standardization, so both the linguistic resources (written do ...he near future, it will be possible to operate in the translation of other language pairs as Sardinian-Catalan and Sardinian-Spanish.

7 KB (1,110 words) - 11:34, 23 August 2016
User:Uliana/gsoc-propuesta
Qualification: Major in Natural Language Processing 2015: Awardee of graduates’ competition „Natural Language Processing” (''a competition for students hold by National Research Unive

11 KB (1,652 words) - 15:56, 24 March 2016
User:MitchJ/Application
...ovide an accurate, universal automated translation engine and accompanying language-specific datasets. ...venience and necessity required by increasingly globalised communication. Language extinction is occurring all around the world, including my native Australia

9 KB (1,302 words) - 04:11, 8 April 2011
User:Commial/AWI
====Rewrite language.php file as an abstract script, and interface modules for Apertium, Aspell "language.php" has been separated into 2 parts : environment management and translation

32 KB (4,699 words) - 16:32, 19 August 2011
User:Shrey1608
...ulture. Following the principles of preserving culture and heritage through language, Apertium connects both ancient and modern through advances in machine tran ...plays an important role out here. Meeting different people with different language makes it difficult to communicate but it is overcome by the translation t

9 KB (1,415 words) - 00:29, 29 March 2020
Ideas for Google Summer of Code/Adopt a language pair
...declarative language. A good intro would be to look through [[Apertium New Language Pair HOWTO]], see also [[Contributing to an existing pair]]. If the pair ha #* If there is no translation, translate it into the languages of your language pair first.

6 KB (1,024 words) - 15:22, 20 April 2021
User:Aidana/Proposal
...elps people from whole world to understand and get information in foreign language very quickly and easy. Building machine translation systems is very interes I interested in task “Adopt an unreleased language pair”, as a language pair I choose is Kazakh-English, in Kazakh-English translation direction. I

9 KB (1,088 words) - 19:28, 24 March 2016
Google Code-in/Application 2015
...rs independent free-software developers. There are currently 40 published language pairs within the project (including a number of "firsts" — for example Sp natural language processing, machine translation, grammar, python, c++, linguistics, languag

7 KB (1,111 words) - 10:10, 15 November 2015
User:Irene/proposal
...rently one of more successful translation endeavors—and while it lacks the data and traffic that is available to Google Translate, it stands out from corpo ...s; and (3) providing support for discontiguous multiwords in some existing language pairs. See work plan for details.

9 KB (1,382 words) - 20:53, 3 April 2017
User:Rupjyoti/Proposal2
I have an urge to improve open source language translation with Apertium. ...a very noble goal, which is bringing languages with low resource language data to life by linking them with machine translation of high resource languages

4 KB (630 words) - 11:39, 8 April 2019
User:Nikant/GsocApplication
...computer understanding and interpreting the grammar and other aspects of a language just like a human. I then started experimenting with machine learning and c ...machine translation of this pair do not work very well. The Hindi-English language pair still lies in the incubator stage in the Apertium directory. It needs

12 KB (1,877 words) - 06:42, 30 April 2013
Using Apertium spellers with LibreOffice-Voikko on Debian/Manual compilation
==Install language module== * To install Kazakh language module, first get it

4 KB (492 words) - 02:54, 10 March 2018
User:Sakshi.iiita/Application
...one area which if exploited could bring in miraculous software for solving language related problems. ...ate it at such a large scale will surely bear fruits. It’s quite tough as “language” as a whole is ambiguous and bridging two ambiguous things is in itself a

10 KB (1,711 words) - 04:39, 8 April 2010
User:Youssefsan
...languages ([[List of language pairs]]) and uncovered ([[Wikipedia:Romance language]]) *Have a look at [[Language and pair maintainer]]

12 KB (1,702 words) - 20:47, 12 December 2013
User:Mk20
[[Category:Language families]] ...building up their own arrangement of jargon and sentence structure, every language got explicit and extraordinary to a gathering of individuals or human progr

9 KB (599 words) - 11:47, 9 December 2019
Apertium on openSUSE
You can replace cy-en by different language pair. For the list of language pairs go [http://wiki.apertium.org/wiki/List_of_language_pairs#Trunk_.28rel === Install language-pair data ===

5 KB (808 words) - 02:48, 9 March 2018
Shallow syntactic function labeller
1. All needed data for North Sami, Kurmanji, Breton, Kazakh and English was prepared: there ar ...Also the testpack for two language pairs was built: it contains all needed data for sme-nob and kmr-eng, the labeller and installation script.

5 KB (764 words) - 01:40, 8 March 2018
User:Kiara
*'''langpair''': language pair to use for translation curl --data 'context=otro+mundo&word=*mundo&newWord=MUNDO&langpair=esp|eng&g-recaptcha-

10 KB (1,458 words) - 01:44, 8 March 2018
User:Fpetkovski/GSoC-2012 Application
...translation can be thought of as one of the greatest challenges in natural language processing. It is the single most useful application of NLP and building a ...on of work has been put into both developing the platform and creating the language resources. However, there is always more work to be done and being a part o

11 KB (1,655 words) - 18:18, 5 April 2012
Writing a scraper
#* If you can't understand the language the website is written in, ask for help in IRC or use a translator and look ...er when calling <code>Writer()</code>. For example if we want to write the data every 30 seconds call <code>Writer(30)</code>.</li>

14 KB (2,389 words) - 05:20, 29 March 2019
User:Rcrowther/project proposal
Apertium builds a bridge between two different disciples, the study of language, and computing. While there are substantial areas of crossover, this means The editor will be a minimal table display. No source data persistence beyond the original file is intended. No alternate views or MVC

10 KB (1,589 words) - 13:01, 23 January 2017
User:Sphinx/GSoC 2013 Application: "Chinese(simple)-Chinese(traditional) language pair"
...ically. And give me a chance to contribute from a entry level, adopting an language pair. ...nese-Spanish. That is the first step to bring Chinese into the translation language group.

7 KB (1,021 words) - 16:16, 16 May 2013
Uralic languages
...family of some three dozen related languages descended from a Proto-Uralic language and spoken by more than 25 million people throughout Europe and Northern As ...ter plan involves generating independent finite-state transducers for each language, and then making individual dictionaries and transfer rules for every pair.

22 KB (2,520 words) - 23:09, 22 December 2014
User:Sphinx/Application for "Adopt a language pair" GSOC 2013
...ically. And give me a chance to contribute from a entry level, adopting an language pair. ...nese-Spanish. That is the first step to bring Chinese into the translation language group.

6 KB (968 words) - 07:07, 30 April 2013
Romanian and Catalan/GSOC 2018
...e Summer of Code 2018. It also includes information on the upgrade of four language pairs which was carried out during the same period. For a more detailed wor ...tem and develop it to bring it to release quality. In addition, four other language pairs have been upgraded to the monolingual package system to ease future d

7 KB (1,071 words) - 10:48, 14 August 2018
User:Padth4i/GSoC 2020 Proposal: Improving upon Malayalam English language pair
...urces available. This project solves this problem by acting as a collected data set of dictionaries and transfer rules that can be used by other projects f ...ges, which is currently in the “Incubator” stage. Malayalam is a Dravidian language spoken commonly in Kerala and the union territories of Lakshadweep and Pudu

4 KB (679 words) - 07:23, 24 March 2020
User:Prondubuisi/GSOC 2020 proposal:English-Igbo pair
- improve my Language skills(Igbo and English) - Contribute my quota to the sustenance of my Native language(Igbo)

8 KB (1,262 words) - 07:04, 27 March 2020
Install Apertium core using packaging
...l be available. For various reasons, the author has successfully developed language pairs using public repository versions of Apertium core. ...tes and Apertium tools. You also get, for optional install; release-level language pairs, service providers, constraint grammar code, and more. All under pack

6 KB (1,006 words) - 18:26, 27 April 2021
Google Summer of Code/Application 2011
...m project develops a free/open-source platform for machine translation and language technology. We try to focus our efforts on lesser-resourced and marginalise ...ped around the world, largely in universities and companies (e.g. Prompsit Language Engineering), but also independent free-software developers play a huge rol

13 KB (2,013 words) - 12:21, 20 June 2019
User:Iamas/GSoC13 Application: "Improved Bilingual Dictionary Induction"
...ically diverse country like India. Machine Translation can help reduce the language barrier. That motivated me to study Computational Linguistics in IIIT-H. I *High-quality dictionaries are based on corpora. This linguistic data decreases the role of human intuition during lexicographic process.

7 KB (1,010 words) - 17:50, 3 May 2013
User:Agneet42/proposal
...f us as thinking creatures with the world around us, the subtle nuances of language (which are different even in similar tongues, say the Latin-derived Spanish ...have deep-rooted interests coupled with experience in the field of Natural Language processing. And I hope to make a difference in the field of machine transla

13 KB (1,923 words) - 13:52, 3 April 2017
Google Summer of Code/Application 2010
...m project develops a free/open-source platform for machine translation and language technology. We try and focus our efforts on lesser-resourced and marginalis ...ped around the world, largely in universities and companies (e.g. Prompsit Language Engineering), but also independent free-software developers play a huge rol

11 KB (1,802 words) - 19:51, 12 April 2021
User:Ggregori
*I have been reviewing NLP and Python using 'Natural Language Processing with Python' book. ...ader reads a file, converts some of its contents and fills the appropriate data structures.

15 KB (2,393 words) - 05:10, 27 August 2011
Preparing data for Moses factored training using Apertium
===Download and compile data=== ...</code> and <code>apertium-is-en</code>. You can find others at: [[list of language pairs]] and [[list of dictionaries]].

4 KB (647 words) - 07:45, 8 October 2014
Romance languages
...dictionary for the pair X→Y. Below is listed development progress for each language's transducers and dictionary pairs. !rowspan=2| Language

18 KB (2,312 words) - 18:25, 18 September 2016
Semitic languages
...) constitute a group of related languages and a branch of the Afro-Asiatic language family. Spoken by more than 470 million people throughout North Africa and ...ter plan involves generating independent finite-state transducers for each language, and then making individual dictionaries and transfer rules for every pair.

20 KB (2,336 words) - 18:10, 14 April 2015
User:Sushain/SemeticLanguages
...) constitute a group of related languages and a branch of the Afro-Asiatic language family. Spoken by more than 470 million people throughout North Africa and ...ter plan involves generating independent finite-state transducers for each language, and then making individual dictionaries and transfer rules for every pair.

19 KB (2,259 words) - 07:52, 3 January 2014
User:Pankajksharma/Application
...sed by it's recipients. Also advancement in MT would cause in reducing the language barrier in the exchange process of ideas. ...a given sentence S in a source language and it's translation T in another language, the idea is to find the translation (T') of another sentence S'. The condi

17 KB (2,712 words) - 08:31, 2 June 2016
User:Niks/Application
...re great tools for sharing ideas. A large credit of human progress goes to language evaluation and the way the ideas are shared. There is large diversity in na ...eople also becomes difficult. Machine Translation can help in lowering the language barrier more economically while taking lesser time than traditional human t

12 KB (1,883 words) - 12:27, 23 March 2014
User:Irene/workplan
| 1 || 5/30 - 6/4 || some data, find test corpus || || ...multiwords from dictionaries, set up testing framework, support/preparing data for English separable verbs || ||

4 KB (506 words) - 18:45, 17 August 2017
Ideas for Google Summer of Code/automatic-postediting
== Improving language pairs by mining MediaWiki Content Translation postedits == ...and bidix entries to improve the performance of an Apertium language pair. Data is available from Wikimedia content translation through an [API https://www

3 KB (383 words) - 19:56, 24 March 2020
Ideas for Google Summer of Code/Apertium Occitan French
...language, as Apertium offers the only machine translation system for this language pair. The idea is to make Occitan output easier to postedit and French outp ...guage data], [https://github.com/apertium/apertium-fra the French language data], and [https://github.com/apertium/apertium-oci-fra the Apertium Occitan-F

2 KB (213 words) - 19:48, 24 March 2020
Altay
=== Altai Language Resources === Crúbadán language data for Southern Altai. Kevin Scannell. 2015. The Crúbadán Project. oai:cruba

2 KB (217 words) - 06:57, 5 December 2017
User:Arinkverma
</li><li>'''Mathematics and graphing Language''': SciLab and Matlab </li><li>'''Database language''': SQL

4 KB (512 words) - 11:34, 11 April 2013
Freeling
...in some cases data or tools from Freeling could be useful to apertium, and data from apertium could be useful to Freeling. Also, to install the data, I had to change the lines in freeling/data/Makefile.am that looked like

5 KB (720 words) - 02:20, 10 March 2018
Fisl13
...Everything in Apertium is free/open source: engine, data for more than 29 language pairs and tools to translate at a speed of more than 20,000 words per secon === Useful data ===

1 KB (175 words) - 14:19, 25 July 2012
User:Francis Tyers/Experiments
* <s>why when we add more data, do the results get worse ? </s> * run br-fr test with huge data.

16 KB (1,524 words) - 10:49, 22 November 2012
Error: A new ambiguity class was found
(in this example, I use eng as language resp. eng-deu as pair) the file ./eng-tagger-data/eng.dic for some reasons is empty (has a file size of 0).

1 KB (165 words) - 14:16, 28 August 2016
User:Nikita Medyankin/GSoC 2016 WTR Proposal
...to be an improvement to the architecture of Apertium and would benefit all language pairs. ...u need to read some documentation, wikis, sites, and the like in a foreign language. Any improvements to the translation quality we can think of will make the

10 KB (1,671 words) - 18:37, 24 March 2016
Iranian languages
...ter plan involves generating independent finite-state transducers for each language, and then making individual dictionaries and transfer rules for every pair. ...dictionary for the pair X→Y. Below is listed development progress for each language's transducers and dictionary pairs.

22 KB (2,532 words) - 11:36, 30 July 2018
Dravidian languages
...e>[http://www.ethnologue.com/subgroups/dravidian dra]</code>) constitute a language family of about 70 languages spoken primarily in South Asia. The four most ...ter plan involves generating independent finite-state transducers for each language, and then making individual dictionaries and transfer rules for every pair.

19 KB (2,201 words) - 09:21, 9 December 2019
User:Kanjbaba
...master’s degree at the University of Helsinki, where I majored in Finnish language and culture but also focused on other Finno-Ugric languages, particularly N ...nline forums and channels without having to use Russian as an intermediate language. This would promote the use of the languages in wider domains and prevent f

10 KB (1,561 words) - 15:46, 27 March 2018
Turkic languages
...ter plan involves generating independent finite-state transducers for each language, and then making individual dictionaries and transfer rules for every pair. ...ictionary for the pair X→Y. Below is listed development progress for each language's transducers and dictionary pairs.

35 KB (3,577 words) - 15:24, 1 October 2021
Apertium
...y aimed at related-language pairs but expanded to deal with more divergent language pairs (such as English-Catalan). The platform provides * a language-independent machine translation engine

776 bytes (114 words) - 19:07, 12 September 2018
Курсы машинного перевода для языков России/Session 8
...on-months (four people, 18 months) to develop (both engine, and linguistic data). It was widely used, with thousands of requests per day. ...sh State to rewrite the code as open-source, and to convert the linguistic data. After one person year, the first version of the Spanish--Catalan translato

12 KB (1,679 words) - 12:00, 31 January 2012
Google Summer of Code/Application 2012
...m project develops a free/open-source platform for machine translation and language technology. We try to focus our efforts on lesser-resourced and marginalise ...ped around the world, largely in universities and companies (e.g. Prompsit Language Engineering), but independent free-software developers also play a huge rol

11 KB (1,680 words) - 12:22, 20 June 2019
Helsinki Apertium Workshop/Session 8
...on-months (four people, 18 months) to develop (both engine, and linguistic data). It was widely used, with thousands of requests per day. ...sh State to rewrite the code as open-source, and to convert the linguistic data. After one person year, the first version of the Spanish--Catalan translato

12 KB (1,683 words) - 08:42, 10 May 2013
Tartu Apertium Course/Session 8
...on-months (four people, 18 months) to develop (both engine, and linguistic data). It was widely used, with thousands of requests per day. ...sh State to rewrite the code as open-source, and to convert the linguistic data. After one person year, the first version of the Spanish--Catalan translato

12 KB (1,683 words) - 11:00, 30 October 2015
Unigram tagger
...ll the unigram models from “A set of open-source tools for Turkish natural language processing.”<ref name="trmorph-tools">http://coltekin.net/cagri/papers/tr ...tuff.”<ref name="prerequisites">[[Installation#If you want to add language data / do more advanced stuff]]</ref>

20 KB (3,229 words) - 20:06, 12 March 2018
User:Nstsj/Proposal
The main problem of such languages is that they lack written (and annotated) data, thus stopping us from applying most of ML-methods (for example, neural net ...ple at Apertium are doing a lot of good work making low-resourced-language data available and I'd like to contribute to that.

5 KB (776 words) - 19:50, 21 May 2019
User:Sereni
...o speaks to the idea of free and accessible information, with texts in one language instantly understandable for speakers of others. I believe MT can contribut ...ch evaluation would point out the pairs ready for release, thus increasing language cover, and it would also provide a quantitative scale for quality measureme

8 KB (1,303 words) - 06:34, 13 May 2014
User:Rafi kamal/Application
I'm from Bangladesh and Bangla is my native language. But I have to use English for a lot of purposes. For example, the medium o And lastly, I'm planning to do research on natural language processing in Bangla to develop a Bangla search engine. I hope working expe

10 KB (1,432 words) - 10:24, 15 May 2014
Odia
...s one of the official languages of India, and has around 33 million native language speakers globally. .../ktpress.org.in/pdf/evolution_of_oriya_language.pdf The Evolution of Oriya Language and Script], ''Utkal University, Cuttack,''

13 KB (1,770 words) - 06:56, 3 December 2017
Celtic languages
...ter plan involves generating independent finite-state transducers for each language, and then making individual dictionaries and transfer rules for every pair. ...dictionary for the pair X→Y. Below is listed development progress for each language's transducers and dictionary pairs.

10 KB (1,263 words) - 06:04, 23 December 2014
User:Deepakjoy
 3.) Automatic language detection using libtextcat, to make it even more of a single-click service. .... Link addresses will automatically be modified so that they send the link data to Geriaoueg.

8 KB (1,367 words) - 04:53, 3 April 2010
User:Rahul/GSOCApplication
...h lab. Also machine translation tool is very helpful for the people having language problem, this is also a way to give something back to the society. ...of just the syntactic tags. It might not be enough if we want to reorder a language having long distance relationship. So using dependency relation along with

14 KB (2,232 words) - 19:03, 7 April 2011
Language pair packages
'''Language pair packages''' are standalone JARs that can be run independently as well Since JAR files are nothing but renamed ZIP files, you can easily edit language pair packages to fit your needs. Note that the packages are ready to be use

11 KB (1,497 words) - 08:23, 7 April 2020
Germanic languages
...ogue.com/subgroups/germanic gem]) constitute a branch of the Indo-European language family spoken primarily in Europe, Anglo-America and Australasia. The commo ...ter plan involves generating independent finite-state transducers for each language, and then making individual dictionaries and transfer rules for every pair.

32 KB (3,684 words) - 06:16, 28 December 2018
Ideas for Google Summer of Code/lint for Apertium
Make a program which tests Apertium data files for suspicious or unrecommended constructs (likely to be bugs). Some ...x]] (dix) dictionary data, perhaps also transfer rules. The [[Apertium New Language Pair HOWTO]] should introduce most of the terminology and background you ne

5 KB (789 words) - 10:36, 31 May 2016
User:Eden/GSOC2020Proposal English-Swahili
Create a usable ‘English-Swahili’ language pair. ...ar: daily communication with my mentors and having enough Swahili language data.

6 KB (988 words) - 13:21, 31 March 2020
User:Ergaurav3/GSOC Application1:Unify the metadix formats
The improvement in the current language pair and the addition of the new language pair is a continuous process in the Apertium project. ...nary in the language pair which is increasing with the addition of the new language pair.

13 KB (2,203 words) - 19:23, 28 March 2014
User:Anarsaikhan
...tage. Founded on the principles of preserving culture and heritage through language, Apertium connects the realms of the ancient and modern through advances in ...nguages, we can potentially discover what is and isn't possible in a human language. This, in turn, tells us important things about the human mind. The fewer l

11 KB (1,714 words) - 13:51, 28 March 2018
Google Summer of Code/Application 2008
...cant] (Alacant, Spain); the other one is [http://www.prompsit.com Prompsit Language Engineering]. These two organizations are currently responsible for most of ...systems to translate less-closely related languages. We have 10 published language pairs, and three more currently in development.

8 KB (1,255 words) - 19:50, 12 April 2021
User:Srj31/GSOC 2020 proposal:Bengali-Hindi pair
...te to this platform and I will have the opportunity to further create more language pairs for the various languages of India. I plan on working on Adopting the unreleased language pair Hindi-Bengali and get the pair released in both directions having a WE

12 KB (1,663 words) - 12:53, 31 March 2020
User:Sushain/GermanicLanguages
...ogue.com/subgroups/germanic gem]) constitute a branch of the Indo-European language family spoken primarily in Europe, Anglo-America and Australasia. The commo ...ter plan involves generating independent finite-state transducers for each language, and then making individual dictionaries and transfer rules for every pair.

26 KB (3,036 words) - 07:04, 14 December 2014
User:Rupjyoti/Proposal
I have an urge to improve open source language translation with Apertium. ...a very noble goal, which is bringing languages with low resource language data to life by linking them with machine translation of high resource languages

5 KB (690 words) - 16:58, 7 April 2019
User:Asfrent/GSoC Log
* added language pairs es-ro, ro-es, en-es, es-en. ...ules.xml'' in stage1 of language pair es-en. Regenerated new tests for the language pair, ran memcheck, all memory tests pass.

6 KB (876 words) - 00:28, 27 June 2014
User:Mlforcada/sandbox/GSoC
...ment of many language pairs. || Knowledge of XML, XSLT and one programming language that allows XML processing and easy writing of a user interface || Mikel L

2 KB (326 words) - 13:00, 19 March 2010
Translating mnemonic files
...the mnemonic (starting on the first column) must be kept unchanged from a language to another, while the string farther to the right is translated. By default, as for lttoolbox, apertium, and the language pairs, the installation is done in <code>/usr/local/bin</code> and <code>/u

5 KB (789 words) - 12:16, 15 June 2018
Translation quality statistics
...r/>words !! data-sort-type="number"|WER !! data-sort-type="number"|PWER !! data-sort-type="number"|BLEU !! Reference / Notes ...forms that get some analysis, may give an indication of the maturity of a language pair.

9 KB (1,233 words) - 09:10, 21 November 2021
User:Jimregan/LG Article
...of unstable translators in various stages of development. (See: [[List of language pairs]]). ...uent development has been funded by the university, as well as by Prompsit Language Engineering. While Apertium 1 was designed with the Romance languages of Sp

9 KB (1,365 words) - 17:00, 13 July 2008
User:Oldtrafford.kedar
...es which interested me were Anusaaraka and Apertium. Anusaaraka only gives language access but doesn't give translation. Also it is not very user friendly as i • Converting WX resources to Unicode data.

7 KB (1,030 words) - 12:41, 9 April 2010
User:Mary.szmary/proposal
...ic faculty, so working with language material and understanding more about language structure while contributing to machine translation systems is one of my pr ...ore widespread nowadays corpus-based translation, it requires working with language structure, which it's attracts me as a linguist.

6 KB (969 words) - 01:16, 27 March 2016
Javanese
...Javanese language]]) is an [[Wikipedia:Austronesian languages|Austronesian language]] from Indonesia, spoken by the Javanese people from the central and easter Its language code is '''jv''' and '''jav'''.

7 KB (881 words) - 13:11, 12 December 2018
User:Francis Tyers
...alysers, part-of-speech taggers, etc., the idea is to model as much of the language as possible, the wider the coverage the better. An Apertium MT system on th ...re complicated. Rules match on source language patterns, and output target language patterns. For most pairs, these patterns are modelled on part of speech, or

12 KB (1,835 words) - 00:06, 1 July 2020
Ideas for Google Summer of Code/Apertium African
...e language pairs (which haven't been started or have currently very little data in Apertium) and write a usable version which provides intelligible output * If there is some data for the language pair in the Apertium Github server, check it out and install it.

2 KB (238 words) - 13:45, 24 February 2023
Crossdics
...guage pairs <code>aa-bb</code> and <code>bb-cc</code> it will create a new language pair for <code>aa-cc</code>. * '''sl-tl''': source language (sl) and target language (tl).

5 KB (633 words) - 13:29, 6 October 2017
User:Sl33k/Application
...at this picture is likely to change[1]. As a student interested in natural language processing, MT gives me a great platform to work closely with linguistics a ...al as a great open source MT engine and also the converting the linguistic data by the tools in a fairly comprehensive way even at the first glance. Its co

5 KB (743 words) - 10:36, 1 May 2011
PMC proposals/Apertium Workshop in Russia
...eof, and following that the development of a prototype pair for a minority language of Russia. Russia has a long history of work in machine translation, but ve ...h oil, as Tatarstan and Sakha) students with good knowledge of a minorised language seldom have a computer and/or access to the internet. That is the case at l

18 KB (2,991 words) - 22:24, 3 August 2013
PMC proposals/Move Apertium to Github
* Individual repos for each pair, language module, and tool (preserving all commit history). ...ch|talk]]) 13:04, 7 February 2018 (CET) To install apertium and one or two language pairs, you (just) have to follow few wiki pages and then, you get the only

22 KB (3,325 words) - 14:06, 12 March 2018
User:Mohitraj
...rocessing. Previously i have completed courses on XML, Python programming, Language Technologies and Machine Translation. I have worked towards the development 1. 9th IASNLP-2018: IIIT-Hyderabad Advanced School on Natural Language Processing

4 KB (514 words) - 18:25, 27 March 2019
User:Natasha singh/GSoC2023Proposal
...slation systems for less-resourced languages, which do not have sufficient data to train a good ML or DL based NLP model. ...rve as the stepping stone in extending various NLP applications in Kumaoni language which will in turn help facilitate communication and access to information

3 KB (459 words) - 03:05, 4 April 2023
Удмуртско-русский переводчик
...D0%BE%D1%81%D1%81%D0%B8%D0%B8 Šupaškar Apertium Workshop]. Russian part of language pair was created using [[lttoolbox]], and all files, needed for Russian, we === Some data ===

3 KB (299 words) - 06:39, 30 January 2012
User:Ergaurav3/GSOC Application2:Plain-text formats for Apertium data
Plain-text formats for Apertium data The improvement in the current language pair and the addition of the new language pair is a continuous process in the Apertium project.

12 KB (1,985 words) - 14:22, 21 March 2014
User:Elmurod1202/GSoC2020Progress
...les have to be recompiled. Type make in the directory where the linguistic data are saved”''' ** Installed language data by compiling

11 KB (1,500 words) - 15:20, 5 September 2020
User:OverPowered/GSoC2021 Progress Report
...any anything to the right-click menu and `clipboardWrite` to allow copying data with a button. ...ely without javascript by wrapping every word up in a <hover> tag with two data attributes to represent information in it and its position. All the html wi

11 KB (1,763 words) - 08:02, 14 August 2021
User:Eiji
Language: Japanese, English I am intrigued by natural language processing and its usage. NLP is widely used and it improves human and mach

5 KB (847 words) - 11:48, 20 March 2023
Specific resources per language
...tps://apertium.github.io/apertium-on-github/source-browser.html. It houses language pairs which haven't completely matured and are under work. ==Specific resources per language==

10 KB (1,336 words) - 20:40, 11 December 2019
Resources
{{see-also|Incubator|Specific resources per language}} ...Pair HOWTO|making a language pair]], feel free to make a new page for the language in question and paste it there. Stuff like basic dictionaries, paradigms, r

1 KB (164 words) - 05:20, 4 December 2019
Lexical feature transfer - First report
for every sentence s in the source language corpus: for every sentence in the source language corpus:

6 KB (838 words) - 17:47, 25 July 2012
File names
Apertium has some naming conventions for the various files used in language data: Files compiled when you do "make" in a language pair:

890 bytes (126 words) - 10:10, 14 March 2017
User:Darthxaher/Sandbox
=== Data Structure === ...ia.org/wiki/Stack-oriented_programming_language Stack-oriented programming language]

1 KB (156 words) - 09:14, 16 April 2010
UDPipe
;Get some data! Now try it on your own data.

5 KB (822 words) - 19:43, 9 March 2020
Semantic tagging
== Data sources == * Often a word can be disambiguated using its translation in another language, for example the triple (estació, gare, station) defines a building meanin

5 KB (949 words) - 15:27, 15 June 2020
User:Eden/GSOC2019 English-Lingala
I’m planning to start the ‘English-Lingala’ language pair. ...ime contributor to Apertium, mainly by creating new English/French-African Language pairs.

7 KB (1,168 words) - 09:53, 28 March 2020
Prerequisites for RPM
...t plan on working on the core C++ packages (but only want to work on / use language pairs), you can install all prerequisites with yum/zypper, using [[User:Tin For a list of available language pairs and other packages, see https://build.opensuse.org/project/show/home:

1 KB (231 words) - 10:03, 12 January 2022
Install Apertium core by compiling
...you have something, immediately, it to try invoke a tool. Without language data you can't see a translation, but you can see the help. Try, ...language data by compiling]]. Or, if your system has packaging, download a language package (but beware, a package manager may pull in a old package of Apertiu

5 KB (821 words) - 02:55, 27 July 2022
Курсы машинного перевода для языков России/Session 7
...ly most important one. This session will cover the question of why we need data consistency, what we mean by quality and how to perform an evaluation. The In contrast to many other types of systems for natural language processing — such as morphological analysers and part-of-speech taggers,

18 KB (2,490 words) - 12:00, 31 January 2012
Helsinki Apertium Workshop/Session 7
...ly most important one. This session will cover the question of why we need data consistency, what we mean by quality and how to perform an evaluation. The In contrast to many other types of systems for natural language processing — such as morphological analysers and part-of-speech taggers,

18 KB (2,493 words) - 08:39, 10 May 2013
Plugin for Pidgin
...eir buddies (both incoming and outgoing messages). If the user has set the language pair eng-spa (English → Spanish) for incoming messages from buddy1, th *'''/apertium_check''' Shows the current language pairs associated with the buddy whose conversation you issued the command o

8 KB (1,263 words) - 02:18, 9 March 2018
Tartu Apertium Course/Session 7
...ly most important one. This session will cover the question of why we need data consistency, what we mean by quality and how to perform an evaluation. The In contrast to many other types of systems for natural language processing — such as morphological analysers and part-of-speech taggers,

18 KB (2,493 words) - 10:59, 30 October 2015
User:Zu-ann
...s, grammars and vocabularies, so this is a practical use of the linguistic data we have, and I find it fascinating. ...speakers of these languages can have machine translations for their native language, as well as other people interested in minority languages. I would be happy

6 KB (874 words) - 16:01, 27 March 2018
Hectoralos/GSOC 2019 proposal: Catalan-Italian and Catalan-Portuguese
I’m a sociolinguist working on language maintenance and shift. I'm very interested in creating resources for minori '''1.2 Bring a released language pair up to state-of-the-art quality''': I'd like to improve the pairs Catal

16 KB (2,285 words) - 06:46, 12 April 2019
Why we trim
...erator.<ref>Typically this goes for both translation direction, although a language pair only released for one direction might only be trimmed in that directio ...at when post-editing, the post-editor has to constantly look at the source language text (whereas an unknown word would be possible to translate there and then

4 KB (679 words) - 16:06, 3 May 2020
User:Hiten
...on system particularly appeals to low-resource languages that have limited data availability. Due to this limited availability, the rule based approach is ...open-source HIN-MWR translator will aid developers in creating additional language pairs related to Marwari.

7 KB (1,043 words) - 15:03, 19 April 2023
User:Vyhuholl/GSoC Proposal 2018: Esperanto and Russian
I am a linguist and I am interested in computational linguistics and natural language processing. I'm interested in adopting an unreleased language pair(Esperanto-Russian).

3 KB (449 words) - 19:49, 26 March 2018
Uighur and Turkish/Paper
...Machine Translation] - This looks interesting, 200K sentences of bilingual data collected, we should contact the authors to see if we can access it [https: ...eb interface [http://nmt.cloudtrans.org/ here], but unclear wrt details of data/evals [https://scholar.googleusercontent.com/scholar.bib?q=info:A6cMdf1SuHw

10 KB (1,483 words) - 07:00, 14 August 2018
Press
Websites referencing Apertium categorised by language of the website. News about Apertium categorised by language of report.

13 KB (1,689 words) - 21:42, 28 February 2021
How to bootstrap a new pair
...ium-init to bootstrap a new language pair (optionally with new monolingual data packages as well). ...is script in your working directory where you will be downloading language data. You can get the script from https://apertium.org/apertium-init

5 KB (824 words) - 15:30, 20 April 2021
Monodix basics
...u can distinguish an element from an attribute and can recognise character data. If you want a quick recap, this should help: :<element attribute="value">character data</element>

11 KB (1,851 words) - 07:42, 16 February 2015
