Search results

Press
Websites referencing Apertium categorised by language of the website. News about Apertium categorised by language of report.

13 KB (1,689 words) - 21:42, 28 February 2021
How to bootstrap a new pair
...ium-init to bootstrap a new language pair (optionally with new monolingual data packages as well). ...is script in your working directory where you will be downloading language data. You can get the script from https://apertium.org/apertium-init

5 KB (824 words) - 15:30, 20 April 2021
Google Code-in/Application 2013
...m project develops a free/open-source platform for machine translation and language technology. We try and focus our efforts on lesser-resourced and marginalis ...eloped around the world, both in universities and companies (e.g. Prompsit Language Engineering) and by a growing number of independent free-software developers.

6 KB (1,057 words) - 15:34, 28 October 2013
User:Davidho/Application
...e machine translation engine and has been expanded to treat more divergent language pairs. It is well-designed and allows everyone to contribute to it. This en ... Second, the linguistic data files are encoded in XML-based formats. XML files are easy to understand, w

7 KB (1,097 words) - 02:39, 21 March 2014
Mongolic languages
!rowspan=2| Language ==Existing language pairs==

5 KB (538 words) - 15:52, 11 April 2015
User:Spiegelian
== GSoC application: apertium hbs-eng, adopting a language pair == One of my majors is Linguistics, the other is English Language and Literature. Other than the simple fact that machine translation gives q

6 KB (987 words) - 15:28, 16 May 2014
User:Amanmehta/Application
...lation fascinates me. The core problem, that translation of a text from one language to another can’t be solved by simple substitution of words, catches my inte ... I plan to “Adopt an unreleased language pair”, or to be precise, three language pairs: mar-hin, guj-hin, mar-guj. Mar-hin and guj-hin pairs are in incubato

11 KB (1,617 words) - 11:06, 29 April 2017
Indonesian
...ipedia:Indonesian language]]) is an Austronesian language and the official language of Indonesia. Since it is a register of [[Malay]], it is also often general ... In [[Apertium]], there is a language pair of [[Indonesian and Malaysian]] already in the [[Trunk|trunk category]

5 KB (629 words) - 13:08, 21 December 2019
Traductions en français (French translations)
| width=320 | '''[[Apertium New Language Pair HOWTO]]''' | [[Become a language pair developer for Apertium]]

13 KB (1,601 words) - 23:31, 23 July 2021
User:Commial/GSoCApplication2011
...and everyone, and of course we don't have the time, or the inclination, to learn a language just for a job, just for an e-mail answer … A machine translation become ...an. Indeed, we tell the computer to mimic the human in its own domain, the language.

10 KB (1,635 words) - 09:42, 8 April 2011
User:Frankier
...re to be done and different potential ways to apply NLP techniques to help language learners). ...ne translation and NLP is that a rule-based system can explain itself to a language learner (some statistical/ML approaches can learn rules - such hybrid syste

2 KB (288 words) - 18:21, 22 August 2016
Sudo
If you're working on language data, <code>sudo</code> is pretty much only for running package managers like <c ...exception is <code>sudo make install</code>, but when working on language data you should never have to do this.

856 bytes (144 words) - 12:52, 3 May 2018
User:Elmurod1202/GSoC2020 Final Report
..."State-of-the-art Morphological Analyser for Uzbek language and improved language pairs uz-kk, uz-ky, uz-tr". After discussions with mentors, the best path t ...rtium-tur-uzb) translation pair, Southeast European Times (SETimes) website data collection in Turkish was used (around 3.7M tokens).

5 KB (722 words) - 16:16, 5 September 2020
User:Memduh/GSoC 2017
The study of natural language processing is fascinating to me, and machine learning is a remarkably pract ...at the output of the system becomes intelligible, valid text in the target language.

4 KB (575 words) - 10:03, 16 April 2017
User:Sokureo
...s a computational linguist, it would be great to apply my knowledge of language theory to machine translation. ...o useful: machine learning is successful when we have access to tons of data, but that is not what we have when dealing with dead or minority languages. That

6 KB (925 words) - 16:09, 27 March 2018
Interfaces
...e official web site – it serves only the ''released'' (stable) versions of language pairs ** This is the official "beta" site – it serves the latest work in all language pairs (so things may work better, but also may have weird bugs). You can al

3 KB (457 words) - 07:42, 18 June 2021
Daemon
...ecifies the parameters and data files specific to that language pair. Each language pair can contain a number of modes; most of these are used for debugging ea ...b server. We use apertium-nn-nb as an example, but it should work with any language pair; the modules lt-proc/cg-proc/apertium-{tagger,pretransfer,transfer,int

13 KB (2,039 words) - 11:56, 3 June 2022
Top tips for GSOC applications
...ding period — and for documentation. Anyone thinking of working on a language pair should make sure that they read about [[testvoc]] and other quality co ...all]] Apertium and a language pair; read through the [[:Category:HOWTO|new language pair HOWTO]]. This might even give you some more ideas!

9 KB (1,509 words) - 23:51, 27 February 2023
User:Daedalus/GSoC2024Proposal
...nasi. I love learning new languages, and I have a keen interest in Natural Language Processing and Linguistics. I have also contributed to Apertium previously ...to translation is very interesting and more interpretable compared to the data-hungry, uninterpretable black boxes that modern-day machine learning-based

6 KB (918 words) - 06:00, 2 April 2024
User:David Nemeskey/GSOC progress 2013
...ings of the 9th International Workshop on Finite State Methods and Natural Language Processing, pages 39--47.</ref> and understand what it does. ! Language

34 KB (5,431 words) - 16:27, 29 October 2013
Google Code-in/Application 2014
...rs independent free-software developers. There are currently 40 published language pairs within the project (including a number of "firsts" — for example Sp ...ommunication) often occurs at this age, and if we can show them that their language is useful, and other people care, and there is no barrier for its use in th

6 KB (987 words) - 10:21, 7 November 2014
Морфологический трансдуктор русского языка (Russian morphological transducer)
...textbook distinction in language, isn't it? When you start exploring real data the boundaries fade very fast and everything looks much more complicated.

22 KB (2,150 words) - 20:21, 24 April 2013
UD annotatrix/UD annotatrix at GSoC 2017
...statistical parser, which in turn can serve different purposes of natural language processing. For creating a good treebank, manual annotation and/or disambig ...interface allows users to work with CoNLL-U and CG3 formats, and to convert the data between the formats. It also allows users to either upload or paste corpora in pl

6 KB (930 words) - 15:59, 29 August 2017
Integrating Tesseract OCR into Apertium
...d of existing trained models. Successful tries are saved into new training data.<ref>https://static.googleusercontent.com/media/research.google.com/en//pub ...butions can also be found [https://github.com/tesseract-ocr/tesseract/wiki/Data-Files-Contributions here].

2 KB (305 words) - 14:36, 28 October 2018
User:AMR-KELEG/GSoC19 Proposal
...rning engineer. My role was developing a sentiment analysis model for Arabic. ...urses, I had to use Python/R and Tableau to perform analysis on different data-sets.

8 KB (1,258 words) - 15:30, 27 April 2020
User:Popcorndude/Recursive Transfer/Progress
* Converting another language pair | Complete, fully documented system with full ruleset for at least one language pair

14 KB (2,141 words) - 21:26, 13 August 2019
Apertium-init
...er]] or [[CG]] files. It creates fully working Makefiles and stub language data, so you can compile and test straight away (assuming you've [[Installation|

744 bytes (108 words) - 20:38, 13 January 2021
Travis settings for Apertium
...thub. What this actually means is that you can set an apertium language or language pair on github to automatically build and test on each commit. You only nee ... This is an example for a monolingual data using hfst (from [apertium-fin]):

2 KB (249 words) - 06:26, 27 May 2021

...h he/she is going to provide Input Text data and also needs to specify the language into which the Translation needs to be done. After that the user will enter the text data in the box located at the Left hand side of the page and clicks on the "Tra

(1,280 × 800 (96 KB)) - 18:52, 2 April 2010

Apertium-tki
Apertium language data for Iraqi Turkmen. [[Category:Language data]]

1 KB (144 words) - 20:07, 15 July 2021
Apertium Nieuw talenpaar HOWTO (Apertium New Language Pair HOWTO)
...tems can be made. The only thing you have to do yourself is write the data. The data consists of 3 main parts: the dictionaries, and some rules (woordv ...ems of the source language (source language='sl') or the target language (target language='tl') can choose and change.

36 KB (5,761 words) - 14:34, 4 December 2011
Nieuw talenpaar maken (Creating a new language pair)
...tems can be made. The only thing you have to do yourself is write the data. The data consists of 3 main parts: the dictionaries, and some rules (woordv ...ems of the source language (source language='sl') or the target language (target language='tl') can choose and change.

36 KB (5,767 words) - 07:07, 16 February 2015
User:Youssefsan/GCI 2012
** Language pages [[French]], [[Spanish]], [[Nahuatl]], [[Dutch]] * Language pairs:

2 KB (218 words) - 16:46, 9 December 2012
Perceptron tagger
While training can be done directly in the language directory, it is a better idea to train the tagger with copies of the files ...e the training directory (replace <code>lang</code> with the corresponding language code).

4 KB (651 words) - 13:36, 23 August 2017
Kashmiri
{{Language Kashmiri is an Indo-Aryan language spoken in the Kashmir Valley and regions around it that were historically a

6 KB (811 words) - 10:42, 2 July 2018
User:Oğuz/GSoC 2019
== Proposal: Bringing 4 language pairs up to release quality == ...stvoc and lexical selection that will result in a valid text in the target language.

4 KB (614 words) - 13:00, 7 April 2019
Ideas for Google Summer of Code/Make a language pair state-of-the-art
..., transfer rules, scripting, corpora. The objective is to make an Apertium language pair state-of-the-art, or close to state-of-the-art in terms of translation ...ge pair of your choice in Apertium and install it. (see [[Install language data by compiling]])

2 KB (383 words) - 19:46, 2 March 2023
Bugzilla
| 64 || Apertium-tolk should give proper warning when no linguistic data is installed || 2008-03-31 || Wynand Winte ...rg/cgi-bin/bugzilla/index.cgi here]. Please feel free to report your bug in any language you are comfortable with.

12 KB (1,254 words) - 22:08, 7 March 2018
VM for transfer
| clip || - || N/A || part → value || Obtains the part in the only language there is (inter/post-chunk) and pushes the value onto the stack ...|| - || link-to || part, pos → value || Obtains the 'part' in source language in position 'pos' and pushes the 'value' onto the stack. An optional operan

14 KB (2,020 words) - 13:58, 7 October 2014
User:David Nemeskey/GSOC proposal 2013
...ion is a very complex problem that depends on almost all fields of natural language processing. As such, it is a very "enabling" field, and can benefit from th ...ings of the 9th International Workshop on Finite State Methods and Natural Language Processing, pages 39--47.</ref>. However, the library currently used to par

10 KB (1,561 words) - 15:22, 28 May 2013
Talk:Writing Makefiles
install-data-local: Most language pairs have lines like

3 KB (482 words) - 15:54, 24 March 2014
Apertium-apy/Language identification
This page contains data for CLD2 coverage. If you need help obtaining CLD2 coverage of a certain language, contact [[User:Wei2912]].

75 KB (7,440 words) - 17:12, 8 August 2014
Apertium on SliTaz
Where LANGUAGE_PAIR is the language pair (e.g. en-eo) ... wget http://sunsite.unc.edu/pub/Linux/system/keyboards/console-data-1999.08.29.tar.gz

2 KB (281 words) - 02:58, 9 March 2018
Ideas for Google Summer of Code/Unsupervised weighting of automata
** Select a language ** Use the Apertium morphological analyser to analyse the test data

1 KB (213 words) - 21:13, 18 March 2019
Apertium on Windows
...s, data, and other system resources with applications, software tools, and data of the Unix-like environment. Therefore it is possible to launch Windows ap Now you're ready to download and build language pairs and use them under Cygwin's shell.

12 KB (1,883 words) - 22:06, 7 March 2018
Shallow syntactic function labeller/Workplan
...is it possible to achieve pretty good results with a very small amount of data (like in the case of Breton) ...ad of the original syntax module in kmr-eng pipeline. The testpack for two language pairs was built. All code was cleaned up, some docstrings were written. Als

6 KB (833 words) - 12:56, 22 August 2017
Comment contribuer à une paire de langues existante (How to contribute to an existing language pair)
* es-tagger-data directory: contains the data needed for the Spanish tagger (corpus, etc.) * ca-tagger-data directory: contains the data needed for the Catalan tagger (corpus, etc.)

54 KB (8,480 words) - 18:55, 10 April 2017
Shell scripting
If you want to work on Apertium language pairs or tools, some knowledge of the Unix shell / command-line scripting w ...hell/ shell scripting] and [https://hacker-tools.github.io/data-wrangling/ Data wrangling] are relevant and succinct

746 bytes (101 words) - 09:20, 8 February 2019
North Saami and Finnish
** We can haz. Data is now checked in on Victorio at /langtech/trunk/words/dicts/algu, with a r ...ns Finnish and Northern Sámi. Ryan can contact them if it seems like their data would be of use.

16 KB (2,457 words) - 08:19, 12 April 2017
Online Apertium Workshop 2020
.../presentation/d/1LBcBs3KdzfS7vl6Sxe0UtOMLpWNMM6ciGS_YPCnxTr0 Reading-bound data as inline secondary tags]", Tino Didriksen *** "Reading-bound data is best transported as inline secondary tags, proven both by practical expe

3 KB (509 words) - 15:49, 2 July 2020
Trigger build on file save
...our language data directory (replacing "apertium-foo" for your monolingual data dir):

725 bytes (111 words) - 09:24, 2 March 2016
User:Popcorndude/Unit-Testing
tsv-file: past-tense-tests.tsv # read the test data from a tab-separated list ...as a test that can pass or fail) or in interactive mode (which updates the data to reflect the state of the translator).

9 KB (1,402 words) - 16:40, 2 March 2021
Translating man pages
By default, as for lttoolbox, apertium, and the language pairs, the installation is done in <code>/usr/local/bin</code> and <code>/u ...ium</code> command, there is the '''<code>-f</code>''' option to translate data produced in this format without having to call "by hand" a deformatter and

5 KB (780 words) - 11:48, 15 June 2018
Helsinki Apertium Workshop/Programme
...;13:00  ||   '''Practical''': Installing Apertium and creating a language pair ....sf.net/p/apertium/svn/branches/courses/helsinki_2013/slides/session7a.pdf Data consistency, quality] and [https://svn.code.sf.net/p/apertium/svn/branches/

8 KB (720 words) - 15:18, 20 March 2015
Writing Makefiles
# Most language pairs don't need to specify anything else for install-data-local: install-data-local: install-modes

4 KB (612 words) - 13:09, 18 February 2015
Google Summer of Code/Application 2013
...m project develops a free/open-source platform for machine translation and language technology. We try to focus our efforts on lesser-resourced and marginalise ...ped around the world, largely in universities and companies (e.g. Prompsit Language Engineering), but independent free-software developers also play a huge rol

9 KB (1,376 words) - 15:24, 22 March 2013
PMC proposals/New naming of the Bosnian-Croatian-Montenegrin-Serbian Sprachbund
...ces for the involvement of Croatian researchers and developers in Apertium language pairs involving Croatian as part of the [http://cordis.europa.eu/projects/r ...for the more inclusive ISO-639-2 code hbs to be used to refer to it in all language pairs developed inside Apertium for components of this macrolanguage.

6 KB (987 words) - 22:27, 3 August 2013
User:SilentFlame
...iterature which comes back to translating their literature to their native language, and this is where I have always liked to work. Machine translation is one of the most important fields of Natural Language Processing (NLP) and also employs almost all the fields of NLP. At the same

11 KB (1,849 words) - 10:47, 26 August 2017
User:SilentFlame/proposal
...iterature which comes back to translating their literature to their native language, and this is where I have always liked to work. Machine translation is one of the most important fields of Natural Language Processing (NLP) and also employs almost all the fields of NLP. At the same

11 KB (1,834 words) - 15:03, 2 April 2017
Listing Apertium element using command-line
** a language pair, ** the reference files for a language.

8 KB (1,327 words) - 21:34, 17 February 2019
Prerequisites for Debian
...t plan on working on the core C++ packages (but only want to work on / use language pairs), you can install all prerequisites with apt-get, using [[User:Tino D # or, to get all dependencies for building a language from git:

2 KB (311 words) - 21:05, 2 April 2021
User:Ifeanyi/proposal
GSOC 2021: Create a usable version of this language pair: English--Igbo ...you a solution in case you are stuck on a particular issue. I love the Igbo language so much that I am willing to get involved or participate in anything that conc

6 KB (826 words) - 15:41, 7 April 2021
Apertium-service
...nslation pairs as a service and provides '''translate''' and '''detect''' (language recognition) capabilities over an '''XML-RPC''' interface, as well as '''RE ...for discussion). It also manages a ''resource pool'' of e.g. language pair data, both (eagerly) pre-allocated and (lazily) allocated at need, up to a high

13 KB (1,764 words) - 03:29, 6 November 2019
Google Code-in/Application 2010
...m project develops a free/open-source platform for machine translation and language technology. We try and focus our efforts on lesser-resourced and marginalis ...ped around the world, largely in universities and companies (e.g. Prompsit Language Engineering), but also independent free-software developers play a huge rol

3 KB (424 words) - 19:24, 29 October 2010
Talk:Apertium New Language Pair HOWTO
etc. instead of all those different commands, for the language pairs privileged enough to have fancy makefiles. :sloppiness on my side. If all data+build script is available for the user, this kind of error disappears. [[Us

14 KB (2,149 words) - 16:12, 27 April 2017
Autoconcord
If you are the unlucky owner of a language pair where you must maintain the synthetic adjective tag (<sint>) in the bi -prepare attempts to detect and insert autoconcord data into the monodices,

7 KB (1,185 words) - 08:39, 6 October 2014
User:Ragib06/Application2011
I found an incomplete task on Bengali-English language pair in Apertium. I also checked that it was a GSoC project of 2009. Among ...er, there has been a GSoC project back in 2009 on adopting Bengali-English language pair. But that project was not complete enough to release bn-en from Aperti

9 KB (1,374 words) - 07:51, 9 April 2011
User:Ggregori/Application
...lation is trying to make a computer understand a, by definition ambiguous, language and the relation between different languages, therefore my interest in the ...and develop more complex things, and later port it to the C++ programming language.

10 KB (1,650 words) - 11:41, 28 April 2011
Comparison of part-of-speech tagging systems
!rowspan=3|System !!colspan=7|Language ...ives]/[words with a correct analysis from the morphological parser]). This data is also available in box plot form [https://frankier.github.io/apertium-tag

16 KB (1,448 words) - 16:50, 22 August 2017
User:Shardulc
...eChain'''). This requires the additional argument '''src''' for the source language of possible translation chains. The returned JS Object contains a mapping from language pairs to mode names (used internally by Apertium).

6 KB (724 words) - 03:21, 6 January 2017
User:Vaydheesh/Proposal
...hi, Bagheli, Chhattisgarhi, Bombay Hindi. Due to so much variation in a language, linguistics has always fascinated me. Upon combining this with my passion ... During my projects on Machine Learning, I came across Natural Language Processing, which opened the world of Computational Linguistics for me. While br

10 KB (1,492 words) - 13:17, 9 April 2019
Getting bilingual dictionaries from OmegaWiki
...rossdics|crossdics]] package) to get cheap bilingual dictionaries from any language pair available in [http://www.omegawiki.org OmegaWiki] database. ...downloads/omegawiki-lexical.sql.gz download] the latest version of lexical data from the OmegaWiki database (see also [http://www.omegawiki.org/Help:Downlo

2 KB (202 words) - 00:55, 24 January 2018
User:Padth4i
'''Courses I've Taken''': Data Structures, Algorithms, Object Oriented Programming, Maths (Calculus, Matri == GSoC 2020: Improving Malayalam - English language pair ==

886 bytes (114 words) - 07:51, 24 March 2020
Unification of metadix and parametrized dictionaries
Different language-pair packages use different strategies to generate .dix dictionaries ([[mon ...t versions of a translator (for instance, for two different varieties of a language, such as Brazilian and European Portuguese) whose names could be ideally ti

11 KB (1,733 words) - 08:24, 25 April 2016
Linguistic Resources Document
* '''sl''': source language (for example, in morphological and bilingual dictionaries) * '''tl''': target language (for example, in bilingual dictionaries)

8 KB (902 words) - 09:19, 6 October 2014
Apertium-dixtools
...ictionary for languages A and C is built from dictionaries for A-B and B-C language pairs. (or some other Unicode language installed - I use eo.UTF-8) and run the tests again.

8 KB (1,070 words) - 01:29, 26 October 2018
Getting started with induction tools
Choose a language pair. For this example, it will be Italian (it) and English (en). To use ...ing a few million lines of xml. It will refer frequently to s1 (the first language of the two in the filename jrc-lang1-lang2.xml, which is jrc-en-it.xml in t

7 KB (973 words) - 02:52, 20 May 2021
Bytecode for transfer/Evaluation
This is a test of all .t1x files in all language pairs in http://apertium.svn.sourceforge.net/viewvc/apertium/trunk/. There Please find your favorite language pair below and check.

67 KB (9,057 words) - 06:52, 24 September 2013
User:Ruthenian8/GSOC 2021 progress report
...nalysis for multiple Nakh-Daghestanian languages and develop corresponding language pairs. It covers all the parts of speech present in the language. 

3 KB (452 words) - 09:17, 18 August 2021
Corpora formats
A corpus should be easily parsed by software that needs to get data from it. There is also metadata that should be stored in the corpus, and t * language of content (per article)!

5 KB (813 words) - 00:08, 28 December 2011
User:Ahmed Siam/GSoC2023Proposal
* Native Language: Arabic * Second Language: English

4 KB (512 words) - 15:34, 29 March 2023
PMC proposals/Stable version of apertium-sh-sl
The language pair seems to work OK in the sh→sl sense but not so well in sl→sh (appa Improving this language pair would be nice for the first milestone of the project Abu-MaTran (June

2 KB (380 words) - 22:26, 3 August 2013
Recursive transfer
** Learn shift/reduce using target-language information ? *: If a language uses CG, the rule SN -> @A→ @N would only match where CG mapped @A→ (an

5 KB (788 words) - 10:50, 9 February 2015
Bilingual dictionary discovery
...se dictionaries where each node is a word in a language, and each arc is a language pair. For example like: http://i.imgur.com/SFOsRMv.png * Only one word per input language

3 KB (487 words) - 00:02, 22 March 2018
Packaging
== New Language or Pair Package == Import, push new branch data, push new upstream tag:

8 KB (1,106 words) - 19:51, 26 April 2018
Vin-ivar/proposal ud apertium
Bonus: use closely related language treebanks in UDPipe; transfer the lemmas, assume the POS tags remain the sa '''Week 6:''' stealing Apertium data

4 KB (657 words) - 08:58, 3 April 2017
Apertium separable/report2017
The purpose of this project is to allow Apertium language-pair developers to better translate "separable" or "discontiguous" multiwor ... * (for language developers: have the language-data writer write it explicitly in the .lsx file)

1 KB (205 words) - 18:36, 15 November 2017
Dictionary maintenance
...an, Portuguese, there is support for generating a particular standard of a language (e.g. Brazilian Portuguese, Valencian). The way this is done may need to be * Language specific sections of monodix files.

3 KB (461 words) - 15:31, 26 September 2016
Apertium-apy/Debian
* you get to decide what kinds of crazy half-finished language pairs to serve (or you can just serve a few of the high-quality ones that y Now install APY and the language pairs you want:

5 KB (653 words) - 21:00, 2 April 2021
Morphological dictionary
...ary is to model the rules that govern the internal structure of words in a language. ...o begin with, some terminology; if you are familiar with graphs (as in the data structure), this might help. A finite-state automaton can be visualised as

15 KB (2,200 words) - 12:04, 6 October 2014
Apertium-apy/Fedora
* you get to decide what kinds of crazy half-finished language pairs to serve (or you can just serve a few of the high-quality ones that y Now install APY and the language pairs you want:

5 KB (640 words) - 21:02, 2 April 2021
Ideas for Google Summer of Code/Automatic diacritic restoration
...ter], which has been trained for more than 100 languages using web crawled data. Details are in his paper linked below. You can try the system [http://l ...issue is to optimize smoothing of the statistical models on a language-by-language basis.

2 KB (307 words) - 19:50, 24 March 2020
Ideas for Google Summer of Code/Plain-text formats for Apertium data
...article/download/3355/1843 . I ([[User:Mlforcada|Mlforcada]]) believe this language is much easier to write; it should be upgraded and documented. The preproce [[Category:Ideas for Google Summer of Code|Plain-text formats for Apertium data]]

2 KB (324 words) - 11:37, 16 February 2016
User:Oğuz/GSoC 2018
...nsfer and lexical selection that will result in a valid text in the target language. Data for machine-learned disambiguation.

3 KB (415 words) - 20:28, 25 March 2018
Prerequisites for Mac OS X
...he parts about lttoolbox/apertium, just install the language pair/language data itself if you ran [https://apertium.projectjj.com/osx/install-release.sh in

2 KB (355 words) - 19:36, 12 May 2019
User:ScoopGracie/PMC/Proposed bylaws
...modular, documented, open platform for machine translation and other human language processing tasks</li> <li>To favour the interchange and reuse of existing linguistic data.</li>

9 KB (1,508 words) - 21:40, 22 March 2020
User:Wei2912
...ei2912/WiktionaryCrawler is a crawler for Wiktionary which aims to extract data from pages. It was created for a GCI task which you can read about at [[Tas ...ies, then crawls these subcategories for pages. It then passes the page to language-specific parsers which turn it into the [[Speling format]].

2 KB (380 words) - 08:13, 29 May 2021
User talk:Popcorndude/Recursive Transfer
...f either a main verb / auxiliary or an adjective+copula. In transfer to a language like English, Spanish, French, or Turkish, the person of this possessed for * To what extent is it possible and desirable to put parts of this data in the monolingual repositories?

11 KB (1,582 words) - 20:16, 9 May 2019
Lttoolbox-java
The Java port needs the C++ binaries for preparing/developing a language pair, i.a. to compile transfer files and train the tagger. ...ed Apertium JAR file, only dependent on JRE and an additional JAR file per language pair.

9 KB (1,370 words) - 09:49, 7 April 2020
Sardo e italiano/Rapporto finale (Sardinian and Italian/Final report)
...ions possible) and ensures that all of them have an equivalent in the ''Target Language''. The best result would be to have no errors at all in the Testvoc. ...Testvoc concern the verb "stare". We do not believe these are "real" errors, given that they cannot be reproduced.

13 KB (1,910 words) - 11:34, 23 August 2016
User:Ozgay
...nsfer and lexical selection that will result in a valid text in the target language. Data for machine-learned disambiguation.

2 KB (279 words) - 21:05, 8 April 2019
Minimal installation from SVN
...[How to bootstrap a new pair]]. For existing pairs, see [[Install language data by compiling]],

717 bytes (103 words) - 22:05, 7 March 2018
Modes introduction
...e ('lt-toolbox', 'apertium-lex-tools') is a collection of tools which pipe data one to another. You can use these tools individually. There are many instru However, to ease the use of the tools, Apertium language-builds pre-configure chains of tools into scripts. These pre-configured cha

6 KB (992 words) - 17:25, 22 September 2016
Apertium Android
It requires internet permission to enable users to download language pairs (and developers to showcase their work from a phone). * language detection - for example using https://code.google.com/p/language-detection/

3 KB (449 words) - 01:06, 4 June 2020
User:Kevin Scannell
...elic]] machine translation, which was taken from an ad hoc system for this language pair that I created in 2005. Since then, I've developed a more mature [htt More about me: for about 15 years I have been working on developing language technology for under-resourced languages around the world. I've developed

1 KB (202 words) - 22:22, 19 January 2017
User:Mary.szmary/proposal2017
...statistical parser, which in turn can serve different purposes of natural language processing. For creating a good treebank, manual annotation and/or disambig ...ling to be involved in it. So, Apertium will be more likely to receive the data for its needs as a side product than by trying to get people to doing Apert

9 KB (1,483 words) - 22:04, 2 April 2017
Apertium-arz-ara
...lect Corpus and Lexicon. In Proceedings of the International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan, 2018.] :* A Parallel corpus of arz-ara-apc/ajp (2,994 sentences). The data was manually translated by professional translators. Sentences are collecte

2 KB (192 words) - 11:11, 19 January 2022
User:Mlforcada/sandbox/governance
...modular, documented, open platform for machine translation and other human language processing tasks ## To favour the interchange and reuse of existing linguistic data.

7 KB (1,064 words) - 13:52, 24 February 2010
User talk:Sambit/GSoC proposal 2017: Odia and English
"Bootstrapped new language pair(odi.eng) with existing eng monodix." – the language code is "ori", no? https://en.wikipedia.org/wiki/ISO_639:ori ...with English: We're normally skeptical about it, since English has so much data that corpus-based methods work very well and it's very difficult to get hig

810 bytes (131 words) - 17:31, 5 April 2017
Google Summer of Code/Report 2010
...stem. Currently the transfer system becomes the main bottleneck in case of language pair with complex transfer systems because of the XML processing associated ...he very moment the user inserts or deletes text. This allows for a further data mining on the edits to detect commonly modified structures in a given trans

16 KB (2,571 words) - 12:21, 20 June 2019
English and Catalan/GSOC 2017
The lack of documentation regarding the language pair, the monolingual dictionaries or even the tagger has made me put an ef ...r to create wikitables with a lot of information about transfer rules from data embedded into the rule files (T1X, T2X and T3X). There are other scripts th

5 KB (887 words) - 22:24, 31 August 2017
Bylaws
...modular, documented, open platform for machine translation and other human language processing tasks</li> <li>To favour the interchange and reuse of existing linguistic data.</li>

8 KB (1,214 words) - 22:30, 3 August 2013
Talk:PMC proposals/Move Apertium to Github
A monorepo with all the linguistic data, pairs and language modules. Other folders in SVN like the core engine and peripheral tools (e. Individual repos for each pair, language module and tools. A couple of “meta-repos” that contain submodules poin

6 KB (979 words) - 17:55, 1 February 2018
User:Dtr5
...hough eu is badly configured). If you want to use a pair that has a common language, like es-pt, only the pt configuration file will be required. Follow the same procedure with the other language (5), provide a bidirectional dix (6), and press the upload button.

9 KB (1,410 words) - 13:52, 22 December 2015
Bylaws/Draft
...modular, documented, open platform for machine translation and other human language processing tasks</li> <li>To favour the interchange and reuse of existing linguistic data.</li>

8 KB (1,215 words) - 18:14, 3 March 2018
Documentation for integrating Tesseract (OCR) into Apertium
! Language ..., the package 'spa' shown [https://github.com/tesseract-ocr/tesseract/wiki/Data-Files here3], to be able to identify by the app texts in Spanish

3 KB (450 words) - 16:23, 10 December 2018
By-laws/Draft
...modular, documented, open platform for machine translation and other human language processing tasks</li> <li>To favour the interchange and reuse of existing linguistic data.</li>

9 KB (1,356 words) - 18:34, 3 March 2018
Migrating tools to GitHub
Apertium has migrated all the language data, the core, and a few tools to [https://github.com/apertium GitHub]. Many to

1 KB (215 words) - 04:45, 9 March 2018
Ideas for Google Summer of Code/Appraise gisting
Many language pairs in Apertium are unique, such as Breton-French, and many of them are u * Contact [User:mlforcada Mikel L. Forcada] to obtain the data cited in the paper.

2 KB (238 words) - 19:49, 24 March 2020
User:Mark Mandel
...ry much, but one of my projects is the LDC's [http://lrwiki.ldc.upenn.edu/ Language Resource Wiki]. If you want to contact me directly, my username is mamandel

439 bytes (69 words) - 20:49, 27 March 2010
User:Shash42/GSoC 2020:Bilingual Dictionary Discovery
I am interested in Natural Language Processing, Deep Learning, and Applied Math. I am fascinated by computation ...ng this journey, I have self-learned a lot about linguistics. My algorithm/data structures journey too revolved around an olympiad, i.e. the International

2 KB (300 words) - 22:46, 31 March 2020
Talk:Turkic languages
...to have a 'Resources' page/section for Turkic languages, as it is done on language pages. http://starling.rinet.ru/cgi-bin/bdescr.cgi?root=config&morpho=0&basename=\data\alt\turcet

9 KB (1,248 words) - 23:52, 20 June 2016
User:Shash42
I am interested in Natural Language Processing, Deep Learning, and Applied Math. I am fascinated by computation ...ng this journey, I have self-learned a lot about linguistics. My algorithm/data structures journey too revolved around an olympiad, i.e. the International

2 KB (300 words) - 22:53, 31 March 2020
Setting up a build environment for a language pair
* All the language data files:

972 bytes (144 words) - 12:09, 26 September 2016
XML editors
If you are editing Apertium language data (e.g. [[dix]] and [[transfer]] files), you should use a real XML editor. Th

5 KB (783 words) - 14:25, 29 December 2020
Compiling the language pair
== Compiling the language pair == If you don't need to work on monolingual data use the nightly repos:

1 KB (163 words) - 16:53, 28 May 2017
Installation/Developers
If you want to work on Apertium language data and/or tools, you most likely want to use the binaries from Tino Didriksens

2 KB (279 words) - 20:52, 2 April 2021
Integration and tagset conversion with Giellatekno
...air setup nowadays is using transducers from Giellatekno and pair-specific data in Apertium. This is a tricky set up because there is a lot of machinery ar

6 KB (984 words) - 17:56, 12 March 2016
Debian için Gereksinimler (Requirements for Debian)
...if you want to ''...'', you should read the [[Minimal installation from SVN|check out the language data from SVN]] page and compile (still apertium/lttoolbox

2 KB (313 words) - 21:02, 2 April 2021
Code style
* Prefer containers over homemade data structures. It's going to make it impossible to build for language pair authors.

5 KB (823 words) - 15:40, 26 September 2016
Lint
* Maintain consistency in the data present in the <r> tag in pardef entries. ...h modes.xml present in the same directory as the other files for the given language pair, this function checks and prompts in case a file defined in a program.

9 KB (1,459 words) - 19:41, 15 May 2021
Uputstvo za novi jezički par za Apertium (Guide to a new language pair for Apertium)
...lemmas are linked to paradigms, which allow us to describe how a given word inflects without writing out every individual ending. ...to see. In Serbo-Croatian this is videti. Serbo-Croatian is a null-subject language; this means that it doesn't typically use personal pronouns before the conj

26 KB (4,259 words) - 07:00, 16 February 2015
CG tagging hybrid and tagger improvements/Work plan
...held back validation scripts for a few languages & give them reproducible language models ...Note that averaged here refers to averaging over time so that new training data isn’t given too much weight.

2 KB (254 words) - 15:18, 13 June 2016
User:Kamush/GSoC2021ProgresReport
* Collect data in both languages * Bootstrapping a new language pair apertium-kaz-uzb

10 KB (1,179 words) - 11:51, 31 August 2021
Metadix
Metadixes are currently used in some language pairs, such as English-Catalan and Occitan-Catalan. linguistic data are compiled these dictionaries are pre-processed, so

5 KB (744 words) - 08:26, 25 April 2016
Ideas for Google Summer of Code/superblank handling algorithm
Then, ''after'' reordering (for instance, into a Turkic-style language) to generate ''sister my Wales in lives'', ** I disagree. One of the key aspects of "my way" is that non-textual data between block tags are ''not'' sent through the translation chain at all, m

9 KB (1,486 words) - 19:56, 24 March 2020
Apertium-kir
'''Kymorph''' is a morphological analyser/generator for the [[Kyrgyz language]], currently working. It is intended to be compatible with transducers for # Get CG3 format of conllu data

1 KB (218 words) - 14:51, 24 April 2024
Format handling
Other deformatters and reformatters were written directly in C or C++ language without using XML files. So, they don't follow format specification descri ...ated from a format specification in XML. Rules for format, like linguistic data, are specified in XML, and they contain regular expressions with flex synta

13 KB (1,781 words) - 09:49, 6 October 2014
Polish and Russian/Project description
'''Corpora and language data'''

4 KB (570 words) - 18:43, 23 August 2016
ReTraTos
...dictionaries and transfer rules. The induction systems and open linguistic data can be used with the [[Apertium]] toolbox to build open-source MT systems. ...bes how to use ReTraTos to create a bilingual dictionary for your Apertium language pair. You will need:

8 KB (1,273 words) - 09:32, 3 May 2024
Command line
...m_New_Language_Pair_HOWTO]] – using lt-comp, lt-proc etc. to test language data

443 bytes (64 words) - 16:56, 27 April 2017
User:Srbhr
...lus, Statistics and Probability, Linear Algebra), Algorithms and Analysis, Data Structures, OOPs. ...chnical Knowledge''': Python, C/C++, JavaScript, Machine Learning, Natural Language Processing, Deep Learning.

1 KB (168 words) - 15:41, 24 March 2020
