Search results

Lttoolbox-java
-e: morphological analysis, with compound analysis on unknown words -n: morph. generation without unknown word marks

9 KB (1,370 words) - 09:49, 7 April 2020
Tartu Apertium Course/Session 8
...high, and compares with commercial systems -- over 95% coverage (around 5 unknown words out of 100 words), and between 3-7% word-error rate (out of 100 words ...final coverage of the system was around 90%, e.g. over a set of corpora 10 unknown words out of 100 on average. The word-error rate was around 17%, meaning th

12 KB (1,683 words) - 11:00, 30 October 2015
Automated extraction of lexical resources
...verb conjugations, declensions, etc. More generically, upon finding a new unknown word, we can productively generate all its inflections according to every p ...ome constraints, which we can use in order to gather information about an unknown word. More generically, we can gather information about a word knowing whic

6 KB (928 words) - 13:57, 3 April 2009
Matxin New Language Pair HOWTO
...smi="v|tv|fut|p1|sg" si="root" UpCase="none" lem="iç" mi="v|tv|fut|p1|sg" unknown="transfer"> ...="0" slem="bira" smi="n|acc" si="dobj" UpCase="none" lem="bira" mi="n|acc" unknown="transfer">

53 KB (8,811 words) - 04:05, 21 January 2017
Norwegian Nynorsk and Norwegian Bokmål/arkiv
...analysis. This could happen by changing lt-proc (fst_processor.cc) so that unknown words are sent to a decompounding-function that tries various strategies (l * If the first member is unknown, choose the analysis with the longest last member.

13 KB (2,051 words) - 10:24, 22 September 2010
English and Catalan/Transfer Rules
|unknown |REGLA: unknown

45 KB (7,840 words) - 10:56, 18 September 2017
Ideas for Google Summer of Code/Detect hidden unknown words
'''Detect hidden unknown words by using the probabilities of the HMM-based part-of-speech tagger in ...orms for which there exists at least one lexical form cannot be considered unknown and there is no way to know whether the set of possible lexical forms provi

2 KB (277 words) - 19:51, 24 March 2020
Task ideas for Google Code-in/Add words to monolingual dictionary
...m Wikipedia, newspapers, literature, etc.) '''detect the 250 most frequent unknown words''' (words in the source document which are not in the dictionary). S ...opriate <code>.dix</code> or <code>.lexc</code> file) so that they are not unknown anymore. Make sure to categorise stems correctly (this can be hard, so ple

2 KB (299 words) - 19:44, 30 December 2019
Task ideas for Google Code-in/Grow bilingual
...m Wikipedia, newspapers, literature, etc.) '''detect the 200 most frequent unknown words''' (words in the source document which are not in the bilingual dicti ...ropriate <code>.dix</code> file) in [[bidix]] format (so that they are not unknown anymore), as well as the monolingual analysers if needed. Make sure to cat

2 KB (320 words) - 15:01, 19 January 2020
Apertium et les contraintes grammaticales (vislcg3)
<pre>LIST unknown = ("\\*.*"r) ; </pre> <pre>SELECT proper-name IF (1 unknown);</pre>

8 KB (1,211 words) - 23:02, 4 April 2021
Курсы машинного перевода для языков России/Session 8
...high, and compares with commercial systems -- over 95% coverage (around 5 unknown words out of 100 words), and between 3-7% word-error rate (out of 100 words ...final coverage of the system was around 90%, e.g. over a set of corpora 10 unknown words out of 100 on average. The word-error rate was around 17%, meaning th

12 KB (1,679 words) - 12:00, 31 January 2012
Publications
...in extending dictionaries by assigning stems and inflectional paradigms to unknown words] (pp.19-26.). EAMT 2014 – 17th Annual conference of the European As ...sites/default/files/FreeRBMT-2012.pdf#33 Choosing the correct paradigm for unknown words in rule-based machine translation systems]. Third International Works

33 KB (4,418 words) - 11:52, 29 December 2021
Bosnian-Croatian-Montenegrin-Serbian and Slovenian
echo unknown; coverage = 1 - unknown / total

6 KB (625 words) - 16:54, 1 July 2013
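
The snippet above gives coverage as `1 - unknown / total`. A minimal sketch of that calculation follows. It assumes the input is analyser output in the Apertium stream format, where each lexical unit is written `^surface/analysis$` and an unknown word has an analysis starting with `*`. The function name `coverage` and the sample strings are illustrative and do not come from the linked page. Escaped `\$` characters inside a unit are not handled.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
# (assumption: no escaped "$" inside a unit)
LEXICAL_UNIT = re.compile(r"\^([^$]*)\$")


def coverage(analysed: str) -> float:
    """Return 1 - unknown / total for a string of analysed lexical units."""
    units = LEXICAL_UNIT.findall(analysed)
    total = len(units)
    if total == 0:
        # No lexical units found; report zero coverage rather than dividing by zero.
        return 0.0
    # An unknown word carries an analysis that begins with "*", e.g. ^blarg/*blarg$
    unknown = sum(1 for unit in units if "/*" in unit)
    return 1 - unknown / total


if __name__ == "__main__":
    sample = "^the/the<det>$ ^blarg/*blarg$ ^house/house<n>$ ^is/be<vbser>$"
    print(f"coverage = {coverage(sample):.2f}")
```

With one unknown unit out of four, the sample yields a coverage of 0.75.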
Improved corpus-based paradigm matching
...orpus, using your existing analyser, and tagger to give possible values to unknown words <spectie> you would assign possible values for case/number/gender to the unknown surface forms of *fizikalne and *matematične based

4 KB (611 words) - 14:26, 10 February 2015
Helsinki Apertium Workshop/Session 8
...high, and compares with commercial systems -- over 95% coverage (around 5 unknown words out of 100 words), and between 3-7% word-error rate (out of 100 words ...final coverage of the system was around 90%, e.g. over a set of corpora 10 unknown words out of 100 on average. The word-error rate was around 17%, meaning th

12 KB (1,683 words) - 08:42, 10 May 2013
Ideas for Google Summer of Code/Sliding-window part-of-speech tagger
...ent for the current hidden-Markov-model tagger. It should have support for unknown words, and also for "forbid" descriptions (not described in the paper). The

2 KB (251 words) - 00:37, 6 April 2013
LTTB1059
ERROR LTTB1059: Transducer has features that are unknown to this version of lttoolbox - upgrade!

122 bytes (18 words) - 23:43, 25 August 2023
Monodix basics
...f>See [[Alphabet]] for how the alphabet affects blanks and tokenisation of unknown words.</ref> it will look something like:

11 KB (1,851 words) - 07:42, 16 February 2015
Task ideas for Google Code-in (2013)
...input string and for each word returns whether the word is correct, and if unknown returns suggestions. Whether segmentation is done by the client or by aper

68 KB (10,323 words) - 15:37, 25 October 2014
Ideas for Google Summer of Code
...specified in the alphabet, it is dealt with as whitespace, so e.g. you get unknown words split in two so you can end up with stuff like ^G$ö^k$ı^rmak$ which

23 KB (3,198 words) - 09:15, 4 March 2024
