Search results

UD annotatrix/UD annotatrix at GSoC 2017
...statistical parser, which in turn can serve different purposes of natural language processing. For creating a good treebank, manual annotation and/or disambig ...interface allows to work with CoNLL-U and CG3 formats, and to convert the data between the formats. It also allows to either upload or paste corpora in pl

6 KB (930 words) - 15:59, 29 August 2017
Integrating Tesseract OCR into Apertium
...d of existing trained models. Successful tries are saved into new training data.<ref>https://static.googleusercontent.com/media/research.google.com/en//pub ...butions can also be found [https://github.com/tesseract-ocr/tesseract/wiki/Data-Files-Contributions here].

2 KB (305 words) - 14:36, 28 October 2018
User:AMR-KELEG/GSoC19 Proposal
...rning engineer. My role was developing sentiment analysis model for Arabic language. ...urses, I had to use python/ R and Tableau to perform analysis on different data-sets.

8 KB (1,258 words) - 15:30, 27 April 2020
Ideas for Google Summer of Code/Make a language pair state-of-the-art
..., transfer rules, scripting, corpora. The objective is to make an Apertium language pair state-of-the-art, or close to state-of-the-art in terms of translation ...ge pair of your choice in Apertium and install it. (see [[Install language data by compiling]])

2 KB (383 words) - 19:46, 2 March 2023
Bugzilla
| 64 || Apertium-tolk should give proper warning when no linguistic data is installed || 2008-03-31 || Wynand Winte ...rg/cgi-bin/bugzilla/index.cgi here]. Please feel free to report your bug in any language you are comfortable with.

12 KB (1,254 words) - 22:08, 7 March 2018
VM for transfer
| clip || - || N/A || part → value || Obtains the part in the only language there is (inter/post-chunk) and pushes the value onto the stack ...|| - || link-to || part, pos → value || Obtains the 'part' in source language in position 'pos' and pushes the 'value' onto the stack. An optional operan

14 KB (2,020 words) - 13:58, 7 October 2014
User:David Nemeskey/GSOC proposal 2013
...ion is a very complex problem that depends on almost all fields of natural language processing. As such, it is a very "enabling" field, and can benefit from th ...ings of the 9th International Workshop on Finite State Methods and Natural Language Processing, pages 39--47.</ref>. However, the library currently used to par

10 KB (1,561 words) - 15:22, 28 May 2013
Perceptron tagger
While training can be done directly in the language directory, it is a better idea to train the tagger with copies of the files ...e the training directory (replace <code>lang</code> with the corresponding language code).

4 KB (651 words) - 13:36, 23 August 2017
Kashmiri
{{Language Kashmiri is an Indo-Aryan language spoken in the Kashmir Valley and regions around it that were historically a

6 KB (811 words) - 10:42, 2 July 2018
User:Oğuz/GSoC 2019
== Proposal: Bringing 4 language pairs up to release quality == ...stvoc and lexical selection that will result in a valid text in the target language.

4 KB (614 words) - 13:00, 7 April 2019
Ideas for Google Summer of Code/Unsupervised weighting of automata
** Select a language ** Use the Apertium morphological analyser to analyse the test data

1 KB (213 words) - 21:13, 18 March 2019
Apertium on Windows
...s, data, and other system resources with applications, software tools, and data of the Unix-like environment. Therefore it is possible to launch Windows ap Now you're ready to download and build language pairs and use them under Cygwin's shell.

12 KB (1,883 words) - 22:06, 7 March 2018
Shallow syntactic function labeller/Workplan
...is it possible to achieve pretty good results having very small amount of data (like in case of Breton) ...ad of the original syntax module in kmr-eng pipeline. The testpack for two language pairs was built. All code was cleaned up, some docstrings were written. Als

6 KB (833 words) - 12:56, 22 August 2017
Comment contribuer à une paire de langues existante
* es-tagger-data directory: contains the data needed for the Spanish tagger (corpus, etc.) * ca-tagger-data directory: contains the data needed for the Catalan tagger (corpus, etc.)

54 KB (8,480 words) - 18:55, 10 April 2017
Shell scripting
If you want to work on Apertium language pairs or tools, some knowledge of the Unix shell / command-line scripting w ...hell/ shell scripting] and [https://hacker-tools.github.io/data-wrangling/ Data wrangling] are relevant and succinct

746 bytes (101 words) - 09:20, 8 February 2019
North Saami and Finnish
** We can haz. Data is now checked in on Victorio at /langtech/trunk/words/dicts/algu, with a r ...ns Finnish and Northern Sámi. Ryan can contact them if it seems like their data would be of use.

16 KB (2,457 words) - 08:19, 12 April 2017
Online Apertium Workshop 2020
.../presentation/d/1LBcBs3KdzfS7vl6Sxe0UtOMLpWNMM6ciGS_YPCnxTr0 Reading-bound data as inline secondary tags]", Tino Didriksen *** "Reading-bound data is best transported as inline secondary tags, proven both by practical expe

3 KB (509 words) - 15:49, 2 July 2020
Trigger build on file save
...our language data directory (replacing "apertium-foo" for your monolingual data dir):

725 bytes (111 words) - 09:24, 2 March 2016
User:Popcorndude/Unit-Testing
tsv-file: past-tense-tests.tsv # read the test data from a tab-separated list ...as a test that can pass or fail) or in interactive mode (which updates the data to reflect the state of the translator).

9 KB (1,402 words) - 16:40, 2 March 2021
Translating man pages
By default, as for lttoolbox, apertium, and the language pairs, the installation is done in <code>/usr/local/bin</code> and <code>/u ...ium</code> command, there is the '''<code>-f</code>''' option to translate data produced in this format without having to call "by hand" a deformatter and

5 KB (780 words) - 11:48, 15 June 2018
