Search results

Page title matches

Ideas for Google Summer of Code/Detect hidden unknown words
'''Detect hidden unknown words by using the probabilities of the HMM-based part-of-speech tagger in ...orms for which there exists at least one lexical form cannot be considered unknown and there is no way to know whether the set of possible lexical forms provi

2 KB (277 words) - 19:51, 24 March 2020
Matching unknown words
...interchunk, this is quite easy, as each unknown word has the chunk lemma 'unknown', but it's un- or under-documented how this should be done using apertium-t

2 KB (209 words) - 11:06, 24 March 2012

Page text matches

Apertium-recursive/Example
<li>^unknown<NP><m><sg>{^hole<n><sg>/agujero<n><m><sg>$}$</li></ol> <li>^unknown<DP><m><sg>{^a<det><ind><sg>/uno<det><ind><GD><sg>$ ^unknown<NP><m><sg>{^hole<n><sg>/agujero<n><m><sg>$}$}$</li></ol>

33 KB (6,259 words) - 06:07, 1 June 2023
Turkic MT Improvements GSoC2019 report
...the project, with the help of mentors on Kipchak languages. Most frequent unknown tokens from corpora of each language (mostly consisting of Wikipedia entrie Number of unknown words (marked with a star) in test: 124

7 KB (798 words) - 18:30, 26 August 2019
Hectoralos/GSOC 2020 work plan control
Number of unknown words (marked with a star) in test: 78<br> Percentage of unknown words: 6.61 %<br>

17 KB (2,274 words) - 06:14, 27 August 2020
Google Summer of Code/Midterm report 2011
Number of unknown words (marked with a star) in test: 4 Percentage of unknown words: 1.10 %

4 KB (404 words) - 00:19, 3 July 2012
Uighur and Turkish/GSoC2018 report
Here is the WER result before I added the unknown words/wrote some CG rules for the text: Number of unknown words (marked with a star) in test:

5 KB (607 words) - 13:25, 12 August 2018
English and Esperanto/Evaluation
Number of unknown words (marked with a star) in test: 23 Percentage of unknown words: 7,10 %

39 KB (5,922 words) - 07:37, 20 March 2014
Kazakh and Tatar/Work plan
Number of unknown words (marked with a star) in test: 2 Percentage of unknown words: 0.37 %

6 KB (728 words) - 19:47, 8 May 2014
English and Italian/Google Translate
Number of unknown words (marked with a star) in test: Percentage of unknown words: 0.00 %

4 KB (563 words) - 16:24, 21 March 2014
Installation troubleshooting
.../usr/lib/gcc/x86_64-unknown-linux-gnu/4.8.2/crtendS.o /usr/lib/gcc/x86_64-unknown-linux-gnu/4.8.2/../../../../lib/crtn.o -O3 -mtune=nocona -Wl,-soname -Wl g++: error: /usr/lib/gcc/x86_64-unknown-linux-gnu/4.8.2/../../../../lib/crti.o: No such file or directory

20 KB (3,153 words) - 08:13, 24 May 2019
Translation quality statistics
...orpus on which further translations will be made. Evaluations not allowing unknown words will give a better indication of "best-case" working of transfer rule ! Translator !! Date !! Version !! Direction !! Unknown<br/>words !! data-sort-type="number"|WER !! data-sort-type="number"|PWER !!

9 KB (1,233 words) - 09:10, 21 November 2021
Calculating coverage
unknown=$(grep -c '/\*' $outfile)
known_percent=$(calc -p "round( 100*($total-$unknown-$bidix_unknown)/$total, 3)")

4 KB (583 words) - 15:18, 10 January 2022
Named entity recognition
...r rules that work on <np>-tagged words do not apply when the word is unknown. Another is that proper nouns can be ambiguous with other, known words, and ==Unknown proper nouns in transfer==

3 KB (492 words) - 16:52, 10 March 2018
English and Kazakh
...known noun2, prep det unknown adjec noun, prep det unknown noun, sup-adjec unknown nom. ***Another possibility (Aida): detect unknown capitalized words (possible?). We tried with regular expressions but they

20 KB (2,856 words) - 06:26, 27 May 2021
Курсы машинного перевода для языков России/Session 7
...line coverage for releasing a new prototype translator is around 80%, or 2 unknown words in 10 for a given corpus. This is not enough to make revision practic Number of unknown words (marked with a star) in test:

18 KB (2,490 words) - 12:00, 31 January 2012
Helsinki Apertium Workshop/Session 7
...line coverage for releasing a new prototype translator is around 80%, or 2 unknown words in 10 for a given corpus. This is not enough to make revision practic Number of unknown words (marked with a star) in test:

18 KB (2,493 words) - 08:39, 10 May 2013
Tartu Apertium Course/Session 7
...line coverage for releasing a new prototype translator is around 80%, or 2 unknown words in 10 for a given corpus. This is not enough to make revision practic Number of unknown words (marked with a star) in test:

18 KB (2,493 words) - 10:59, 30 October 2015
Swedish and Danish/Evaluation
Number of unknown words (marked with a star) in test: 271 Percentage of unknown words: 23.42 %

27 KB (4,372 words) - 11:33, 14 October 2009
Charlifter
Number of unknown words (marked with a star) in test: Percentage of unknown words: 0.00 %

3 KB (341 words) - 02:07, 10 March 2018
Mandarin Chinese
...gernt.com/dict.shtml English-Chinese Online Dictionary] TigerNT '''license unknown''' ...ses.b5.gz Chinese Community Information Center Corpora] FTP .gz '''license unknown'''

16 KB (2,148 words) - 03:28, 16 December 2015
Aragonese and Catalan/Evaluation
Number of unknown words (marked with a star) in test: 156 Percentage of unknown words: 11.86 %

2 KB (236 words) - 08:55, 16 January 2016
Apertium-uzb-kaa
Number of unknown words (marked with a star) in test: 284 Percentage of unknown words: 91.91 %

5 KB (515 words) - 14:34, 1 September 2019
Why we trim
...d into misunderstanding the content, instead of observing that there is an unknown word. ...rmous increase in transfer complexity – all tags have to be presumed to be unknown, and developer time is wasted on bug-hunting and workarounds instead of imp

4 KB (679 words) - 16:06, 3 May 2020
Norwegian Nynorsk and Norwegian Bokmål
Number of unknown words (marked with a star) in test: 653 Percentage of unknown words: 17.48 %

23 KB (3,704 words) - 11:56, 16 December 2020
Apertium and Constraint Grammar
==Matching unknown words in Apertium== lttoolbox prepends a star to unknown words, so you can match unknown words using a simple regexp matching that star:

7 KB (1,116 words) - 20:57, 2 April 2021
Constraint-based lexical selection module
...the OR operation, the rules would try to match precisely a sequence of one unknown word followed by one known one. ====Matching an unknown word====

19 KB (2,820 words) - 15:26, 11 April 2023
French and Esperanto/Quality tests
Number of unknown words (marked with a star) in test: 203<br/> Percentage of unknown words: 8,24 %<br/>

81 KB (13,134 words) - 16:48, 30 September 2011
List of symbols
| <code>GD</code> || Gender to be determined || ||  ...ity to be determined || if the sub-category is (currently) unknown || 

38 KB (4,492 words) - 15:36, 9 May 2024
Unsupervised tagger training
...we had approx 13,000 entries. Approx half of the training sentences had an unknown word. With this we got very poor tagger performance. Then we added 7,000 pr ...My dix is not big enough, and approx half of the training sentences has an unknown word. Can't I just grep these sentences away, and then train on the rest?

7 KB (1,177 words) - 08:34, 8 October 2014
Unigram tagger
and the unknown analysis string <code>a<c></code> a score of ...er scores to unknown analysis strings with frequent <math>a</math> than to unknown analysis strings with infrequent <math>a</math> .

20 KB (3,229 words) - 20:06, 12 March 2018
Alphabet
...words as opposed to "blank" chars. Its main effect is on tokenisation of ''unknown'' words, since non-alphabet characters may still be part of a ''known'' wor ...a word not in the dictionary, but composed of alphabetic chars, we get an unknown-word analysis:

2 KB (400 words) - 08:52, 28 April 2014
Crimean Tatar and Turkish/Work plan
Number of tokenised words unknown to analyser: 63730 — 43.1 % of tokens had * unknown to bidix: 112 — 0.1 % of tokens had @

4 KB (496 words) - 18:27, 19 June 2017
Task ideas for Google Code-in
...%BBB% and run it through Apertium's %AAA%-%BBB% translator to identify 50 unknown forms. Add the stems of these forms to the analyser in an appropriate way ...%BBB% and run it through Apertium's %AAA%-%BBB% translator to identify 50 unknown forms. Add the stems of these forms to the analyser in an appropriate way

32 KB (4,862 words) - 06:23, 5 December 2019
Measuring coverage of HFST transducer
echo "TOP UNKNOWN WORDS:"
UNKNOWN=`cat /tmp/$LG.parade.txt | grep '\*' | wc -l`

864 bytes (139 words) - 02:14, 6 September 2019
Incorporating guessing into Apertium
...etc. By the time you finish you should have a reasonable model of missing unknown words. <match case="Aa" unknown="true"><add-reading tags="np.ant"/></match>

4 KB (558 words) - 13:07, 26 June 2020
Contributing
==Adding/fixing unknown words== If you have some words that are unknown in a certain language pair, you can help out by simply writing a list of wo

3 KB (549 words) - 09:17, 26 May 2021
Starting a new language with lttoolbox
hsb.dix:25: element s: validity error : IDREF attribute n references an unknown ID "nom"
hsb.dix:33: element s: validity error : IDREF attribute n references an unknown ID "nom"

19 KB (3,440 words) - 12:10, 26 September 2016
Matching unknown words
...interchunk, this is quite easy, as each unknown word has the chunk lemma 'unknown', but it's un- or under-documented how this should be done using apertium-t

2 KB (209 words) - 11:06, 24 March 2012
Как использовать lttoolbox, чтобы разработать новый морфологический анализатор
hsb.dix:25: element s: validity error : IDREF attribute n references an unknown ID "nom"
hsb.dix:33: element s: validity error : IDREF attribute n references an unknown ID "nom"

25 KB (2,260 words) - 18:36, 12 January 2012
Morphology of Turkmen
=== Unknown ===

4 KB (682 words) - 11:14, 16 April 2012
Xml grep
==I get "Unknown option --xpath"==

5 KB (863 words) - 09:04, 10 October 2017
Jorgal
==How do I see unknown words?==

2 KB (331 words) - 12:03, 28 February 2017
Apertium-apy
*'''-f --missing-freqs:''' path to sqlite3 database of words that were unknown (requires <code>sudo apt-get install sqlite3</code>) *'''markUnknown=no''' (optional): include this to remove "*" in front of unknown words

37 KB (5,132 words) - 16:36, 5 June 2020
Begiak
=== Unknown ===

8 KB (1,234 words) - 17:01, 3 December 2020
Northern Sámi and Norwegian/Regression tests
===Unknown===

38 KB (6,273 words) - 11:01, 24 December 2020
Spanish and Esperanto/Notoj pri versioj
Number of unknown words (marked with a star) in test: 117<br/> Percentage of unknown words: 3,87 %<br/>

6 KB (845 words) - 20:08, 3 October 2011
Compounds
Both [[lttoolbox]] and [[HFST]] have methods for dynamically analysing unknown compound words into their constituent parts. See below for how it's done in ..., and only do compounding if the other methods would give an unknown word. Unknown words are made up of strings of characters from <alphabet>, separated

16 KB (2,689 words) - 09:07, 6 April 2021
Spanish and Esperanto/Quality tests
Number of unknown words (marked with a star) in test: 117<br/> Percentage of unknown words: 3,87 %<br/>

98 KB (16,331 words) - 20:28, 30 September 2011
Evaluation
Note: The reference translation MUST have no unknown-word marks, even for systems that do not mark unknown words with a star.

6 KB (981 words) - 09:13, 21 November 2021
Lttoolbox-java
-e: morphological analysis, with compound analysis on unknown words -n: morph. generation without unknown word marks

9 KB (1,370 words) - 09:49, 7 April 2020
Tartu Apertium Course/Session 8
...high, and compares with commercial systems -- over 95% coverage (around 5 unknown words out of 100 words), and between 3-7% word-error rate (out of 100 words ...final coverage of the system was around 90%, e.g. over a set of corpora 10 unknown words out of 100 on average. The word-error rate was around 17%, meaning th

12 KB (1,683 words) - 11:00, 30 October 2015
