Search results

Northern Sámi and Norwegian/Regression tests
===Unknown===

38 KB (6,273 words) - 11:01, 24 December 2020
Spanish and Esperanto/Notoj pri versioj
Number of unknown words (marked with a star) in test: 117 Percentage of unknown words: 3,87 % 

6 KB (845 words) - 20:08, 3 October 2011
Compounds
Both [[lttoolbox]] and [[HFST]] have methods for dynamically analysing unknown compound words into their constituent parts. See below for how it's done in ..., and only do compounding if the other methods would give an unknown word. Unknown words are made up of strings of characters from <alphabet>, separated

16 KB (2,689 words) - 09:07, 6 April 2021
Spanish and Esperanto/Quality tests
Number of unknown words (marked with a star) in test: 117 Percentage of unknown words: 3,87 % 

98 KB (16,331 words) - 20:28, 30 September 2011
Evaluation
Note: Reference translation MUST have no unknown-word marks, even if systems that do not mark unknown words with a star.

6 KB (981 words) - 09:13, 21 November 2021
Starting a new language with lttoolbox
hsb.dix:25: element s: validity error : IDREF attribute n references an unknown ID "nom" hsb.dix:33: element s: validity error : IDREF attribute n references an unknown ID "nom"

19 KB (3,440 words) - 12:10, 26 September 2016
Matching unknown words
...interchunk, this is quite easy, as each unknown word has the chunk lemma 'unknown', but it's un- or under-documented how this should be done using apertium-t

2 KB (209 words) - 11:06, 24 March 2012
Как использовать lttoolbox, чтобы разработать новый морфологический анализатор
hsb.dix:25: element s: validity error : IDREF attribute n references an unknown ID "nom" hsb.dix:33: element s: validity error : IDREF attribute n references an unknown ID "nom"

25 KB (2,260 words) - 18:36, 12 January 2012
English and Catalan/Transfer Rules
|unknown |REGLA: unknown

45 KB (7,840 words) - 10:56, 18 September 2017
Ideas for Google Summer of Code/Detect hidden unknown words
'''Detect hidden unknown words by using the probabilities of the HMM-based part-of-speech tagger in ...orms for which there exists at least one lexical form cannot be considered unknown and there is no way to know whether the set of possible lexical forms provi

2 KB (277 words) - 19:51, 24 March 2020
Task ideas for Google Code-in/Add words to monolingual dictionary
...m Wikipedia, newspapers, literature, etc.) '''detect the 250 most frequent unknown words''' (words in the source document which are not in the dictionary). S ...opriate <code>.dix</code> or <code>.lexc</code> file) so that they are not unknown anymore. Make sure to categorise stems correctly (this can be hard, so ple

2 KB (299 words) - 19:44, 30 December 2019
Task ideas for Google Code-in/Grow bilingual
...m Wikipedia, newspapers, literature, etc.) '''detect the 200 most frequent unknown words''' (words in the source document which are not in the bilingual dicti ...ropriate <code>.dix</code> file) in [[bidix]] format (so that they are not unknown anymore), as well as the monolingual analysers if needed. Make sure to cat

2 KB (320 words) - 15:01, 19 January 2020
Apertium et les contraintes grammaticales (vislcg3)
<pre>LIST unknown = ("\\*.*"r) ; </pre> <pre>SELECT proper-name IF (1 unknown);</pre>

8 KB (1,211 words) - 23:02, 4 April 2021
Курсы машинного перевода для языков России/Session 8
...high, and compares with commercial systems -- over 95% coverage (around 5 unknown words out of 100 words), and between 3-7% word-error rate (out of 100 words ...final coverage of the system was around 90%, e.g. over a set of corpora 10 unknown words out of 100 on average. The word-error rate was around 17%, meaning th

12 KB (1,679 words) - 12:00, 31 January 2012
Publications
...in extending dictionaries by assigning stems and inflectional paradigms to unknown words] (pp.19-26.). EAMT 2014 – 17th Annual conference of the European As ...sites/default/files/FreeRBMT-2012.pdf#33 Choosing the correct paradigm for unknown words in rule-based machine translation systems]. Third International Works

33 KB (4,418 words) - 11:52, 29 December 2021
Bosnian-Croatian-Montenegrin-Serbian and Slovenian
echo unknown; coverage = 1 - unknown / total

6 KB (625 words) - 16:54, 1 July 2013
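
The coverage formula quoted in the snippet above (coverage = 1 - unknown / total) can be sketched in a few lines of Python. This is only an illustration, not the script from that page: the function name `naive_coverage` is hypothetical, and it assumes whitespace-separated output in which unknown words carry a leading star, as several snippets in this listing describe.

```python
def naive_coverage(translated_text: str) -> float:
    """Return 1 - unknown / total for star-marked output.

    Assumption: tokens are separated by whitespace and an unknown
    word is any token that begins with '*'.
    """
    tokens = translated_text.split()
    if not tokens:
        raise ValueError("cannot compute coverage of empty text")
    # Count the tokens flagged as unknown by a leading star.
    unknown = sum(1 for token in tokens if token.startswith("*"))
    return 1 - unknown / len(tokens)


if __name__ == "__main__":
    sample = "the *fizikalne and *matematične words"
    print(f"coverage: {naive_coverage(sample):.2%}")
```

Splitting on whitespace is a simplification; punctuation attached to a word is counted as part of that token, so the figure is a rough estimate.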
Improved corpus-based paradigm matching
...orpus, using your existing analyser, and tagger to give possible values to unknown words <spectie> you would assign possible values for case/number/gender to the unknown surface forms of *fizikalne and *matematične based

4 KB (611 words) - 14:26, 10 February 2015
Helsinki Apertium Workshop/Session 8
...high, and compares with commercial systems -- over 95% coverage (around 5 unknown words out of 100 words), and between 3-7% word-error rate (out of 100 words ...final coverage of the system was around 90%, e.g. over a set of corpora 10 unknown words out of 100 on average. The word-error rate was around 17%, meaning th

12 KB (1,683 words) - 08:42, 10 May 2013
Task ideas for Google Code-in/Add words
...from Wikipedia, newspapers, literature, etc.) detect the 50 most frequent unknown words (source words which are not in the dictionaries of the language pair # add these words to the source dictionary (so that they are not unknown anymore), add the correspondence to the bilingual dictionary, and add the w

2 KB (271 words) - 05:34, 17 December 2015
Tatar and Russian
...rus-nova.txt, -ref = tat-rus-posted.txt). WER / PER results are given when unknown-word marks (stars) are not removed. ...if it's usable by that time. || '''Midterm evaluation''' Results when unknown word-marks (stars) are not removed tat-rus/texts/text1.* (full coverage

8 KB (1,006 words) - 12:48, 9 March 2018
