Search results

  • Websites referencing Apertium categorised by language of the website. News about Apertium categorised by language of report.
    13 KB (1,689 words) - 21:42, 28 February 2021
  • ...ium-init to bootstrap a new language pair (optionally with new monolingual data packages as well). ...is script in your working directory where you will be downloading language data. You can get the script from https://apertium.org/apertium-init
    5 KB (824 words) - 15:30, 20 April 2021
  • ...m project develops a free/open-source platform for machine translation and language technology. We try and focus our efforts on lesser-resourced and marginalis ...eloped around the world, both in universities and companies (e.g. Prompsit Language Engineering) and by a growing number of independent free-software developers.
    6 KB (1,057 words) - 15:34, 28 October 2013
  • ...e machine translation engine and has been expanded to treat more divergent language pairs. It is well-designed and allows everyone to contribute to it. This en Second, the linguistic data files are encoded in XML-based formats. XML files are easy to understand, w
    7 KB (1,097 words) - 02:39, 21 March 2014
  • !rowspan=2| Language ==Existing language pairs==
    5 KB (538 words) - 15:52, 11 April 2015
  • == GSoC application: apertium hbs-eng, adopting a language pair == One of my majors is Linguistics, the other is English Language and Literature. Other than the simple fact that machine translation gives q
    6 KB (987 words) - 15:28, 16 May 2014
  • ...lation fascinates me. The core problem, that translating a text from one language to another can’t be solved by simple substitution of words, catches my inte I plan to “Adopt an unreleased language pair”, or to be precise, three language pairs: mar-hin, guj-hin, mar-guj. Mar-hin and guj-hin pairs are in incubato
    11 KB (1,617 words) - 11:06, 29 April 2017
  • ...ipedia:Indonesian language]]) is an Austronesian language and the official language of Indonesia. Since it is a register of [[Malay]], it is also often general In [[Apertium]], there is a language pair of [[Indonesian and Malaysian]] already in the [[Trunk|trunk category]
    5 KB (629 words) - 13:08, 21 December 2019
  • | width=320 | '''[[Apertium New Language Pair HOWTO]]''' | [[Become a language pair developer for Apertium]]
    13 KB (1,601 words) - 23:31, 23 July 2021
  • ...and everyone, and of course we don't have the time, or the inclination, to learn a language just for a job, just for an e-mail answer … A machine translation become ...an. Indeed, we tell the computer to mimic the human in its own domain, the language.
    10 KB (1,635 words) - 09:42, 8 April 2011
  • ...re to be done and different potential ways to apply NLP techniques to help language learners). ...ne translation and NLP is that a rule based system can explain itself to a language learner (some statistical/ML approaches can learn rules - such hybrid syste
    2 KB (288 words) - 18:21, 22 August 2016
  • If you're working on language data, <code>sudo</code> is pretty much only for running package managers like <c ...exception is <code>sudo make install</code>, but when working on language data you should never have to do this.
    856 bytes (144 words) - 12:52, 3 May 2018
  • ..."State-of-the-art Morphological Analyser for Uzbek language and improved language pairs uz-kk, uz-ky, uz-tr". After discussions with mentors, the best path t ...rtium-tur-uzb) translation pair, Southeast European Times (SETimes) website data collection in Turkish was used (around 3.7M tokens).
    5 KB (722 words) - 16:16, 5 September 2020
  • The study of natural language processing is fascinating to me, and machine learning is a remarkably pract ...at the output of the system becomes intelligible, valid text in the target language.
    4 KB (575 words) - 10:03, 16 April 2017
  • ...s a computational linguist, it would be great to apply my knowledge of language theory to machine translation. ...o useful: machine learning succeeds when we have access to tons of data, but that is not what we have when dealing with dead or minority languages. That
    6 KB (925 words) - 16:09, 27 March 2018
  • ...e official web site – it serves only the ''released'' (stable) versions of language pairs ** This is the official "beta" site – it serves the latest work in all language pairs (so things may work better, but also may have weird bugs). You can al
    3 KB (457 words) - 07:42, 18 June 2021
  • ...ecifies the parameters and data files specific to that language pair. Each language pair can contain a number of modes; most of these are used for debugging ea ...b server. We use apertium-nn-nb as an example, but it should work with any language pair; the modules lt-proc/cg-proc/apertium-{tagger,pretransfer,transfer,int
    13 KB (2,039 words) - 11:56, 3 June 2022
  • ...ding period, and for documentation. Anyone thinking of working on a language pair should make sure that they read about [[testvoc]] and other quality co ...all]] Apertium and a language pair; read through the [[:Category:HOWTO|new language pair HOWTO]]. This might even give you some more ideas!
    9 KB (1,509 words) - 23:51, 27 February 2023
  • ...nasi. I love learning new languages, and I have a keen interest in Natural Language Processing and Linguistics. I have also contributed to Apertium previously ...to translation is very interesting and more interpretable compared to the data-hungry, uninterpretable black boxes that modern-day machine learning-based
    6 KB (918 words) - 06:00, 2 April 2024
  • ...ings of the 9th International Workshop on Finite State Methods and Natural Language Processing, pages 39--47.</ref> and understand what it does. ! Language
    34 KB (5,431 words) - 16:27, 29 October 2013
  • ...rs independent free-software developers. There are currently 40 published language pairs within the project (including a number of "firsts" — for example Sp ...ommunication) often occurs at this age, and if we can show them that their language is useful, and other people care, and there is no barrier for its use in th
    6 KB (987 words) - 10:21, 7 November 2014
  • ...textbook distinction in language, isn't it? When you start exploring real data the boundaries fade very fast and everything looks much more complicated.
    22 KB (2,150 words) - 20:21, 24 April 2013
  • ...statistical parser, which in turn can serve different purposes of natural language processing. For creating a good treebank, manual annotation and/or disambig ...interface allows working with CoNLL-U and CG3 formats and converting data between them. It also allows either uploading or pasting corpora in pl
    6 KB (930 words) - 15:59, 29 August 2017
  • ...d of existing trained models. Successful tries are saved into new training data.<ref>https://static.googleusercontent.com/media/research.google.com/en//pub ...butions can also be found [https://github.com/tesseract-ocr/tesseract/wiki/Data-Files-Contributions here].
    2 KB (305 words) - 14:36, 28 October 2018
  • ...rning engineer. My role was developing a sentiment analysis model for the Arabic language. ...urses, I had to use Python/R and Tableau to perform analysis on different datasets.
    8 KB (1,258 words) - 15:30, 27 April 2020
  • * Converting another language pair | Complete, fully documented system with full ruleset for at least one language pair
    14 KB (2,141 words) - 21:26, 13 August 2019
  • ...er]] or [[CG]] files. It creates fully working Makefiles and stub language data, so you can compile and test straight away (assuming you've [[Installation|
    744 bytes (108 words) - 20:38, 13 January 2021
  • ...thub. What this actually means is that you can set an apertium language or language pair on github to automatically build and test on each commit. You only nee This is an example for monolingual data using hfst (from [apertium-fin]):
    2 KB (249 words) - 06:26, 27 May 2021
  • File:Pet1.png
    ...h he/she is going to provide the input text and also needs to specify the language into which it should be translated. After that, the user enters the text in the box on the left-hand side of the page and clicks on the "Tra
    (1,280 × 800 (96 KB)) - 18:52, 2 April 2010
  • Apertium language data for Iraqi Turkmen. [[Category:Language data]]
    1 KB (144 words) - 20:07, 15 July 2021
  • ...tems can be built. The only thing you have to do yourself is write the data. The data consists of 3 important parts: the dictionaries and a few rules (word ...ems of the original language (source language='sl') or the target language (target language='tl') can be selected and changed.
    36 KB (5,761 words) - 14:34, 4 December 2011
  • ...tems can be built. The only thing you have to do yourself is write the data. The data consists of 3 important parts: the dictionaries and a few rules (word ...ems of the original language (source language='sl') or the target language (target language='tl') can be selected and changed.
    36 KB (5,767 words) - 07:07, 16 February 2015
  • ** Language pages [[French]], [[Spanish]], [[Nahuatl]], [[Dutch]] * Language pairs:
    2 KB (218 words) - 16:46, 9 December 2012
  • While training can be done directly in the language directory, it is a better idea to train the tagger with copies of the files ...e the training directory (replace <code>lang</code> with the corresponding language code).
    4 KB (651 words) - 13:36, 23 August 2017
  • {{Language Kashmiri is an Indo-Aryan language spoken in the Kashmir Valley and regions around it that were historically a
    6 KB (811 words) - 10:42, 2 July 2018
  • == Proposal: Bringing 4 language pairs up to release quality == ...stvoc and lexical selection that will result in a valid text in the target language.
    4 KB (614 words) - 13:00, 7 April 2019
  • ..., transfer rules, scripting, corpora. The objective is to make an Apertium language pair state-of-the-art, or close to state-of-the-art in terms of translation ...ge pair of your choice in Apertium and install it. (see [[Install language data by compiling]])
    2 KB (383 words) - 19:46, 2 March 2023
  • | 64 || Apertium-tolk should give proper warning when no linguistic data is installed || 2008-03-31 || Wynand Winte ...rg/cgi-bin/bugzilla/index.cgi here]. Please feel free to report your bug in any language you are comfortable with.
    12 KB (1,254 words) - 22:08, 7 March 2018
  • | clip || - || N/A || part → value || Obtains the part in the only language there is (inter/post-chunk) and pushes the value onto the stack ...|| - || link-to || part, pos → value || Obtains the 'part' in source language in position 'pos' and pushes the 'value' onto the stack. An optional operan
    14 KB (2,020 words) - 13:58, 7 October 2014
  • ...ion is a very complex problem that depends on almost all fields of natural language processing. As such, it is a very "enabling" field, and can benefit from th ...ings of the 9th International Workshop on Finite State Methods and Natural Language Processing, pages 39--47.</ref>. However, the library currently used to par
    10 KB (1,561 words) - 15:22, 28 May 2013
  • install-data-local: Most language pairs have lines like
    3 KB (482 words) - 15:54, 24 March 2014
  • This page contains data for CLD2 coverage. If you need help obtaining CLD2 coverage of a certain language, contact [[User:Wei2912]].
    75 KB (7,440 words) - 17:12, 8 August 2014
  • Where LANGUAGE_PAIR is language pair (e.g. en-eo) wget http://sunsite.unc.edu/pub/Linux/system/keyboards/console-data-1999.08.29.tar.gz
    2 KB (281 words) - 02:58, 9 March 2018
  • ** Select a language ** Use the Apertium morphological analyser to analyse the test data
    1 KB (213 words) - 21:13, 18 March 2019
  • ...s, data, and other system resources with applications, software tools, and data of the Unix-like environment. Therefore it is possible to launch Windows ap Now you're ready to download and build language pairs and use them under Cygwin's shell.
    12 KB (1,883 words) - 22:06, 7 March 2018
  • ...is it possible to achieve pretty good results with a very small amount of data (as in the case of Breton) ...ad of the original syntax module in kmr-eng pipeline. The testpack for two language pairs was built. All code was cleaned up, some docstrings were written. Als
    6 KB (833 words) - 12:56, 22 August 2017
  • * es-tagger-data directory: contains the data needed for the Spanish tagger (corpora, etc.) * ca-tagger-data directory: contains the data needed for the Catalan tagger (corpora, etc.)
    54 KB (8,480 words) - 18:55, 10 April 2017
  • If you want to work on Apertium language pairs or tools, some knowledge of the Unix shell / command-line scripting w ...hell/ shell scripting] and [https://hacker-tools.github.io/data-wrangling/ Data wrangling] are relevant and succinct
    746 bytes (101 words) - 09:20, 8 February 2019
  • ** We can haz. Data is now checked in on Victorio at /langtech/trunk/words/dicts/algu, with a r ...ns Finnish and Northern Sámi. Ryan can contact them if it seems like their data would be of use.
    16 KB (2,457 words) - 08:19, 12 April 2017
  • .../presentation/d/1LBcBs3KdzfS7vl6Sxe0UtOMLpWNMM6ciGS_YPCnxTr0 Reading-bound data as inline secondary tags]", Tino Didriksen *** "Reading-bound data is best transported as inline secondary tags, proven both by practical expe
    3 KB (509 words) - 15:49, 2 July 2020
  • ...our language data directory (replacing "apertium-foo" for your monolingual data dir):
    725 bytes (111 words) - 09:24, 2 March 2016
  • tsv-file: past-tense-tests.tsv # read the test data from a tab-separated list ...as a test that can pass or fail) or in interactive mode (which updates the data to reflect the state of the translator).
    9 KB (1,402 words) - 16:40, 2 March 2021
  • By default, as for lttoolbox, apertium, and the language pairs, the installation is done in <code>/usr/local/bin</code> and <code>/u ...ium</code> command, there is the '''<code>-f</code>''' option to translate data produced in this format without having to call "by hand" a deformatter and
    5 KB (780 words) - 11:48, 15 June 2018
  • ...13:00 || '''Practical''': Installing Apertium and creating a language pair ....sf.net/p/apertium/svn/branches/courses/helsinki_2013/slides/session7a.pdf Data consistency, quality] and [https://svn.code.sf.net/p/apertium/svn/branches/
    8 KB (720 words) - 15:18, 20 March 2015
  • # Most language pairs don't need to specify anything else for install-data-local: install-data-local: install-modes
    4 KB (612 words) - 13:09, 18 February 2015
  • ...m project develops a free/open-source platform for machine translation and language technology. We try to focus our efforts on lesser-resourced and marginalise ...ped around the world, largely in universities and companies (e.g. Prompsit Language Engineering), but independent free-software developers also play a huge rol
    9 KB (1,376 words) - 15:24, 22 March 2013
  • ...ces for the involvement of Croatian researchers and developers in Apertium language pairs involving Croatian as part of the [http://cordis.europa.eu/projects/r ...for the more inclusive ISO-639-2 code hbs to be used to refer to it in all language pairs developed inside Apertium for components of this macrolanguage.
    6 KB (987 words) - 22:27, 3 August 2013
  • ...iterature which comes back to translating their literature to their native language, and this is where I have always liked to work. Machine translation is one of the most important fields of Natural Language Processing (NLP) and also employs almost all the fields of NLP. At the same
    11 KB (1,849 words) - 10:47, 26 August 2017
  • ...iterature which comes back to translating their literature to their native language, and this is where I have always liked to work. Machine translation is one of the most important fields of Natural Language Processing (NLP) and also employs almost all the fields of NLP. At the same
    11 KB (1,834 words) - 15:03, 2 April 2017
  • ** a language pair, ** the reference files for a language.
    8 KB (1,327 words) - 21:34, 17 February 2019
  • ...t plan on working on the core C++ packages (but only want to work on / use language pairs), you can install all prerequisites with apt-get, using [[User:Tino D # or, to get all dependencies for building a language from git:
    2 KB (311 words) - 21:05, 2 April 2021
  • <b>GSOC 2021: Create a usable version of this language pair: English--Igbo</b> ...you a solution in case you are stuck on a particular issue. I love the Igbo language so much that I am willing to get involved or participate in anything that conc
    6 KB (826 words) - 15:41, 7 April 2021
  • ...nslation pairs as a service and provides '''translate''' and '''detect''' (language recognition) capabilities over an '''XML-RPC''' interface, as well as '''RE ...for discussion). It also manages a ''resource pool'' of e.g. language pair data, both (eagerly) pre-allocated and (lazily) allocated at need, up to a high
    13 KB (1,764 words) - 03:29, 6 November 2019
  • ...m project develops a free/open-source platform for machine translation and language technology. We try and focus our efforts on lesser-resourced and marginalis ...ped around the world, largely in universities and companies (e.g. Prompsit Language Engineering), but also independent free-software developers play a huge rol
    3 KB (424 words) - 19:24, 29 October 2010
  • etc. instead of all those different commands, for the language pairs privileged enough to have fancy makefiles. :sloppiness on my side. If all data+build script is available for the user, this kind of error disappears. [[Us
    14 KB (2,149 words) - 16:12, 27 April 2017
  • If you are the unlucky owner of a language pair where you must maintain the synthetic adjective tag (<sint>) in the bi -prepare attempts to detect and insert autoconcord data into the monodices,
    7 KB (1,185 words) - 08:39, 6 October 2014
  • I found an incomplete task on Bengali-English language pair in Apertium. I also checked that it was a GSoC project of 2009. Among ...er, there has been a GSoC project back in 2009 on adopting Bengali-English language pair. But that project was not complete enough to release bn-en from Aperti
    9 KB (1,374 words) - 07:51, 9 April 2011
  • ...lation is trying to make a computer understand a, by definition ambiguous, language and the relation between different languages, therefore my interest in the ...and develop more complex things, and later port it to the C++ programming language.
    10 KB (1,650 words) - 11:41, 28 April 2011
  • !rowspan=3|System !!colspan=7|Language ...ives]/[words with a correct analysis from the morphological parser]). This data is also available in box plot form [https://frankier.github.io/apertium-tag
    16 KB (1,448 words) - 16:50, 22 August 2017
  • ...eChain'''). This requires the additional argument '''src''' for the source language of possible translation chains. The returned JS Object contains a mapping from language pairs to mode names (used internally by Apertium).
    6 KB (724 words) - 03:21, 6 January 2017
  • ...hi, Bagheli, Chhattisgarhi, Bombay Hindi. With so much variation in a language, linguistics has always fascinated me. Upon combining this with my passion During my projects on Machine Learning, I came across Natural Language Processing, which opened the world of Computer Linguistics for me. While br
    10 KB (1,492 words) - 13:17, 9 April 2019
  • ...rossdics|crossdics]] package) to get cheap bilingual dictionaries from any language pair available in [http://www.omegawiki.org OmegaWiki] database. ...downloads/omegawiki-lexical.sql.gz download] the latest version of lexical data from the OmegaWiki database (see also [http://www.omegawiki.org/Help:Downlo
    2 KB (202 words) - 00:55, 24 January 2018
  • '''Courses I've Taken''': Data Structures, Algorithms, Object Oriented Programming, Maths (Calculus, Matri == GSoC 2020: Improving Malayalam - English language pair ==
    886 bytes (114 words) - 07:51, 24 March 2020
  • Different language-pair packages use different strategies to generate .dix dictionaries ([[mon ...t versions of a translator (for instance, for two different varieties of a language, such as Brazilian and European Portuguese) whose names could be ideally ti
    11 KB (1,733 words) - 08:24, 25 April 2016
  • * '''sl''': source language (for example, in morphological and bilingual dictionaries) * '''tl''': target language (for example, in bilingual dictionaries)
    8 KB (902 words) - 09:19, 6 October 2014
  • ...ictionary for languages A and C is built from dictionaries for A-B and B-C language pairs. (or some other Unicode language installed - I use eo.UTF-8) and run the tests again.
    8 KB (1,070 words) - 01:29, 26 October 2018
  • Choose a language pair. For this example, it will be Italian (it) and English (en). To use ...ing a few million lines of xml. It will refer frequently to s1 (the first language of the two in the filename jrc-lang1-lang2.xml, which is jrc-en-it.xml in t
    7 KB (973 words) - 02:52, 20 May 2021
  • This is a test of all .t1x files in all language pairs in http://apertium.svn.sourceforge.net/viewvc/apertium/trunk/. There Please find your favorite language pair below and check.
    67 KB (9,057 words) - 06:52, 24 September 2013
  • ...nalysis for multiple Nakh-Daghestanian languages and develop corresponding language pairs. It covers all the parts of speech present in the language.
    3 KB (452 words) - 09:17, 18 August 2021
  • A corpus should be easily parsed by software that needs to get data from it. There is also metadata that should be stored in the corpus, and t * language of content (per article)!
    5 KB (813 words) - 00:08, 28 December 2011
  • * Native Language: Arabic * Second Language: English
    4 KB (512 words) - 15:34, 29 March 2023
  • The language pair seems to work OK in the sh→sl sense but not so well in sl→sh (appa Improving this language pair would be nice for the first milestone of the project Abu-MaTran (June
    2 KB (380 words) - 22:26, 3 August 2013
  • ** Learn shift/reduce using target-language information ? *: If a language uses CG, the rule SN -> @A→ @N would only match where CG mapped @A→ (an
    5 KB (788 words) - 10:50, 9 February 2015
  • ...se dictionaries where each node is a word in a language, and each arc is a language pair. For example like: http://i.imgur.com/SFOsRMv.png * Only one word per input language
    3 KB (487 words) - 00:02, 22 March 2018
  • == New Language or Pair Package == Import, push new branch data, push new upstream tag:
    8 KB (1,106 words) - 19:51, 26 April 2018
  • Bonus: use closely related language treebanks in UDPipe; transfer the lemmas, assume the POS tags remain the sa '''Week 6:''' stealing Apertium data
    4 KB (657 words) - 08:58, 3 April 2017
  • The purpose of this project is to allow Apertium language-pair developers to better translate "separable" or "discontiguous" multiwor * (for language developers: have the language-data writer write it explicitly in the .lsx file)
    1 KB (205 words) - 18:36, 15 November 2017
  • ...an, Portuguese, there is support for generating a particular standard of a language (e.g. Brazilian Portuguese, Valencian). The way this is done may need to be * Language specific sections of monodix files.
    3 KB (461 words) - 15:31, 26 September 2016
  • * you get to decide what kinds of crazy half-finished language pairs to serve (or you can just serve a few of the high-quality ones that y Now install APY and the language pairs you want:
    5 KB (653 words) - 21:00, 2 April 2021
  • ...ary is to model the rules that govern the internal structure of words in a language. ...o begin with, some terminology; if you are familiar with graphs (as in the data structure), this might help. A finite-state automaton can be visualised as
    15 KB (2,200 words) - 12:04, 6 October 2014
  • * you get to decide what kinds of crazy half-finished language pairs to serve (or you can just serve a few of the high-quality ones that y Now install APY and the language pairs you want:
    5 KB (640 words) - 21:02, 2 April 2021
  • ...ter], which has been trained for more than 100 languages using web crawled data. Details are in his paper linked below. You can try the system [http://l ...issue is to optimize smoothing of the statistical models on a language-by-language basis.
    2 KB (307 words) - 19:50, 24 March 2020
  • ...article/download/3355/1843 . I ([[User:Mlforcada|Mlforcada]]) believe this language is much easier to write; it should be upgraded and documented. The preproce [[Category:Ideas for Google Summer of Code|Plain-text formats for Apertium data]]
    2 KB (324 words) - 11:37, 16 February 2016
  • ...nsfer and lexical selection that will result in a valid text in the target language. Data for machine-learned disambiguation.
    3 KB (415 words) - 20:28, 25 March 2018
  • ...he parts about lttoolbox/apertium, just install the language pair/language data itself if you ran [https://apertium.projectjj.com/osx/install-release.sh in
    2 KB (355 words) - 19:36, 12 May 2019
  • ...modular, documented, open platform for machine translation and other human language processing tasks</li> <li>To favour the interchange and reuse of existing linguistic data.</li>
    9 KB (1,508 words) - 21:40, 22 March 2020
  • ...ei2912/WiktionaryCrawler is a crawler for Wiktionary which aims to extract data from pages. It was created for a GCI task which you can read about at [[Tas ...ies, then crawls these subcategories for pages. It then passes the page to language-specific parsers which turn it into the [[Speling format]].
    2 KB (380 words) - 08:13, 29 May 2021
  • ...f either a main verb / auxiliary or an adjective+copula. In transfer to a language like English, Spanish, French, or Turkish, the person of this possessed for * To what extent is it possible and desirable to put parts of this data in the monolingual repositories?
    11 KB (1,582 words) - 20:16, 9 May 2019
  • The Java port needs the C++ binaries for preparing/developing a language pair, i.a. to compile transfer files and train the tagger. ...ed Apertium JAR file, only dependent on JRE and an additional JAR file per language pair.
    9 KB (1,370 words) - 09:49, 7 April 2020
  • ...ions possible) and ensures that all of them have an equivalent in the ''Target Language''. The best result would be no errors at all in Testvoc. ...Testvoc concern the verb "stare". We do not believe they are "real" errors, given that they cannot be reproduced.
    13 KB (1,910 words) - 11:34, 23 August 2016
  • ...nsfer and lexical selection that will result in a valid text in the target language. Data for machine-learned disambiguation.
    2 KB (279 words) - 21:05, 8 April 2019
  • ...[How to bootstrap a new pair]]. For existing pairs, see [[Install language data by compiling]],
    717 bytes (103 words) - 22:05, 7 March 2018
  • ...e ('lt-toolbox', 'apertium-lex-tools') is a collection of tools which pipe data one to another. You can use these tools individually. There are many instru However, to ease the use of the tools, Apertium language-builds pre-configure chains of tools into scripts. These pre-configured cha
    6 KB (992 words) - 17:25, 22 September 2016
  • It requires internet permission to enable users to download language pairs (and developers to showcase their work from a phone). * language detection - for example using https://code.google.com/p/language-detection/
    3 KB (449 words) - 01:06, 4 June 2020
  • ...elic]] machine translation, which was taken from an ad hoc system for this language pair that I created in 2005. Since then, I've developed a more mature [htt More about me: for about 15 years I have been working on developing language technology for under-resourced languages around the world. I've developed
    1 KB (202 words) - 22:22, 19 January 2017
  • ...statistical parser, which in turn can serve different purposes of natural language processing. For creating a good treebank, manual annotation and/or disambig ...ling to be involved in it. So, Apertium will be more likely to receive the data for its needs as a side product than by trying to get people to doing Apert
    9 KB (1,483 words) - 22:04, 2 April 2017
  • ...lect Corpus and Lexicon. In Proceedings of the International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan, 2018.] :* A Parallel corpus of arz-ara-apc/ajp (2,994 sentences). The data was manually translated by professional translators. Sentences are collecte
    2 KB (192 words) - 11:11, 19 January 2022
  • ...modular, documented, open platform for machine translation and other human language processing tasks ## To favour the interchange and reuse of existing linguistic data.
    7 KB (1,064 words) - 13:52, 24 February 2010
  • "Bootstrapped new language pair(odi.eng) with existing eng monodix." – the language code is "ori", no? https://en.wikipedia.org/wiki/ISO_639:ori ...with English: We're normally skeptical about it, since English has so much data that corpus-based methods work very well and it's very difficult to get hig
    810 bytes (131 words) - 17:31, 5 April 2017
  • ...stem. Currently the transfer system becomes the main bottleneck in the case of language pairs with complex transfer systems because of the XML processing associated ...he very moment the user inserts or deletes text. This allows for further data mining on the edits to detect commonly modified structures in a given trans
    16 KB (2,571 words) - 12:21, 20 June 2019
  • The lack of documentation regarding the language pair, the monolingual dictionaries or even the tagger has made me put an ef ...r to create wikitables with a lot of information about transfer rules from data embedded into the rule files (T1X, T2X and T3X). There are other scripts th
    5 KB (887 words) - 22:24, 31 August 2017
  • ...modular, documented, open platform for machine translation and other human language processing tasks</li> <li>To favour the interchange and reuse of existing linguistic data.</li>
    8 KB (1,214 words) - 22:30, 3 August 2013
  • A monorepo with all the linguistic data, pairs and language modules. Other folders in SVN like the core engine and peripheral tools (e. Individual repos for each pair, language module and tools. A couple of “meta-repos” that contain submodules poin
    6 KB (979 words) - 17:55, 1 February 2018
  • ...hough eu is badly configured). If you want to use a pair that has a common language, like es-pt, only the pt configuration file will be required. Follow the same procedure with the other language (5), provide a bidirectional dix (6), and press the upload button.
    9 KB (1,410 words) - 13:52, 22 December 2015
  • ...modular, documented, open platform for machine translation and other human language processing tasks</li> <li>To favour the interchange and reuse of existing linguistic data.</li>
    8 KB (1,215 words) - 18:14, 3 March 2018
  • ! Language ..., the package 'spa' shown [https://github.com/tesseract-ocr/tesseract/wiki/Data-Files here<sup>3</sup>], to be able to identify by the app texts in Spanish
    3 KB (450 words) - 16:23, 10 December 2018
  • ...modular, documented, open platform for machine translation and other human language processing tasks</li> <li>To favour the interchange and reuse of existing linguistic data.</li>
    9 KB (1,356 words) - 18:34, 3 March 2018
  • Apertium has migrated all the language data, the core, and a few tools to [https://github.com/apertium GitHub]. Many to
    1 KB (215 words) - 04:45, 9 March 2018
  • Many language pairs in Apertium are unique, such as Breton-French, and many of them are u * Contact [User:mlforcada Mikel L. Forcada] to obtain the data cited in the paper.
    2 KB (238 words) - 19:49, 24 March 2020
  • ...ry much, but one of my projects is the LDC's [http://lrwiki.ldc.upenn.edu/ Language Resource Wiki]. If you want to contact me directly, my username is mamandel
    439 bytes (69 words) - 20:49, 27 March 2010
  • I am interested in Natural Language Processing, Deep Learning, and Applied Math. I am fascinated by computation ...ng this journey, I have self-learned a lot about linguistics. My algorithm/data structures journey too revolved around an olympiad, i.e. the International
    2 KB (300 words) - 22:46, 31 March 2020
  • ...to have a 'Resources' page/section for Turkic languages, as it is done on language pages. http://starling.rinet.ru/cgi-bin/bdescr.cgi?root=config&morpho=0&basename=\data\alt\turcet
    9 KB (1,248 words) - 23:52, 20 June 2016
  • I am interested in Natural Language Processing, Deep Learning, and Applied Math. I am fascinated by computation ...ng this journey, I have self-learned a lot about linguistics. My algorithm/data structures journey too revolved around an olympiad, i.e. the International
    2 KB (300 words) - 22:53, 31 March 2020
  • * All the language data files:
    972 bytes (144 words) - 12:09, 26 September 2016
  • If you are editing Apertium language data (e.g. [[dix]] and [[transfer]] files), you should use a real XML editor. Th
    5 KB (783 words) - 14:25, 29 December 2020
  • == Compiling the language pair == If you don't need to work on monolingual data use the nightly repos:
    1 KB (163 words) - 16:53, 28 May 2017
  • If you want to work on Apertium language data and/or tools, you most likely want to use the binaries from Tino Didriksen's
    2 KB (279 words) - 20:52, 2 April 2021
  • ...air setup nowadays is using transducers from Giellatekno and pair-specific data in Apertium. This is a tricky set up because there is a lot of machinery ar
    6 KB (984 words) - 17:56, 12 March 2016
  • ...if you want to, you should read the [[Minimal installation from SVN|check out the language data from SVN]] page and compile it (still apertium/lttoolbox
    2 KB (313 words) - 21:02, 2 April 2021
  • * Prefer containers over homemade data structures. It's going to make it impossible to build for language pair authors.
    5 KB (823 words) - 15:40, 26 September 2016
  • * Maintain consistency in the data present in the <r> tag in pardef entries. ...h modes.xml present in the same directory as the other files for the given language pair, this function checks and prompts in case a file defined in a program.
    9 KB (1,459 words) - 19:41, 15 May 2021
  • ...lemmas are linked to paradigms, which let us describe how a given word changes without writing out every individual ending. ...to see. In Serbo-Croatian this is videti. Serbo-Croatian is a null-subject language; this means that it doesn't typically use personal pronouns before the conj
    26 KB (4,259 words) - 07:00, 16 February 2015
  • ...held back validation scripts for a few languages & give them reproducible language models ...Note that averaged here refers to averaging over time so that new training data isn’t given too much weight.
    2 KB (254 words) - 15:18, 13 June 2016
  • * Collect data in both languages * Bootstrapping a new language pair apertium-kaz-uzb
    10 KB (1,179 words) - 11:51, 31 August 2021
  • Metadixes are currently used in some language pairs, such as English-Catalan and Occitan-Catalan. linguistic data are compiled these dictionaries are pre-processed, so
    5 KB (744 words) - 08:26, 25 April 2016
  • Then, ''after'' reordering (for instance, into a Turkic-style language) to generate ''sister my Wales in lives'', ** I disagree. One of the key aspects of "my way" is that non-textual data between block tags are ''not'' sent through the translation chain at all, m
    9 KB (1,486 words) - 19:56, 24 March 2020
  • '''Kymorph''' is a morphological analyser/generator for the [[Kyrgyz language]], currently working. It is intended to be compatible with transducers for # Get CG3 format of conllu data
    1 KB (218 words) - 14:51, 24 April 2024
  • Other deformatters and reformatters were written directly in C or C++ without using XML files, so they don't follow the format specification descri ...ated from a format specification in XML. Rules for format, like linguistic data, are specified in XML, and they contain regular expressions with flex synta
    13 KB (1,781 words) - 09:49, 6 October 2014
  • '''Corpora and language data'''
    4 KB (570 words) - 18:43, 23 August 2016
  • ...dictionaries and transfer rules. The induction systems and open linguistic data can be used with the [[Apertium]] toolbox to build open-source MT systems. ...bes how to use ReTraTos to create a bilingual dictionary for your Apertium language pair. You will need:
    8 KB (1,273 words) - 09:32, 3 May 2024
  • ...m_New_Language_Pair_HOWTO]] – using lt-comp, lt-proc etc. to test language data
    443 bytes (64 words) - 16:56, 27 April 2017
  • ...lus, Statistics and Probability, Linear Algebra), Algorithms and Analysis, Data Structures, OOPs. ...chnical Knowledge''': Python, C/C++, JavaScript, Machine Learning, Natural Language Processing, Deep Learning.
    1 KB (168 words) - 15:41, 24 March 2020
