Unigram tagger

From Apertium
Revision as of 21:55, 17 January 2016 by M5w (talk | contribs) (→‎File Format)

apertium-tagger from “m5w/apertium”[1] supports all the unigram models from “A set of open-source tools for Turkish natural language processing.”[2]

Installation

First, install all prerequisites. See “If you want to add language data / do more advanced stuff.”[3]

Then, replace <directory> with the directory you’d like to clone “m5w/apertium”[1] into and clone the repository.

git clone https://github.com/m5w/apertium.git <directory>

Then, configure your environment[4] and finally configure, build, and install[5] “m5w/apertium.”[1]

Usage

See apertium-tagger -h.

Training a Model on a Hand-Tagged Corpus

First, get a hand-tagged corpus as you would for any non-unigram model.

$ cat handtagged.txt
^a/a<a>$
^a/a<b>$
^a/a<b>$
^aa/a<a>+a<a>$
^aa/a<a>+a<b>$
^aa/a<a>+a<b>$
^aa/a<b>+a<a>$
^aa/a<b>+a<a>$
^aa/a<b>+a<a>$
^aa/a<b>+a<b>$
^aa/a<b>+a<b>$
^aa/a<b>+a<b>$
^aa/a<b>+a<b>$

Example 2.1.1: handtagged.txt : a Hand-Tagged Corpus for apertium-tagger

Then, replace MODEL with the unigram model from “A set of open-source tools for Turkish natural language processing”[2] you’d like to use, replace SERIALISED_BASIC_TAGGER with the filename to which you’d like to write the model, and train the tagger.

$ apertium-tagger -s 0 -u MODEL SERIALISED_BASIC_TAGGER handtagged.txt

Disambiguation

Either write your input to a file or pipe it to the tagger.

$ cat raw.txt
^a/a<a>/a<b>/a<c>$
^aa/a<a>+a<a>/a<a>+a<b>/a<b>+a<a>/a<b>+a<b>/a<a>+a<c>/a<c>+a<a>/a<c>+a<c>$

Example 2.2.1: raw.txt : Input for apertium-tagger

Replace MODEL with the unigram model from “A set of open-source tools for Turkish natural language processing”[2] you’d like to use, replace SERIALISED_BASIC_TAGGER with the file to which you wrote the unigram model, and disambiguate the input.

$ apertium-tagger -gu MODEL SERIALISED_BASIC_TAGGER raw.txt
^a/a<b>$
^aa/a<b>+a<b>$
$ echo '^a/a<a>/a<b>/a<c>$
^aa/a<a>+a<a>/a<a>+a<b>/a<b>+a<a>/a<b>+a<b>/a<a>+a<c>/a<c>+a<a>/a<c>+a<c>$' | \
apertium-tagger -gu MODEL SERIALISED_BASIC_TAGGER
^a/a<b>$
^aa/a<b>+a<b>$

Unigram Models

See section 5.3 of “A set of open-source tools for Turkish natural language processing.”[2]

Model 1

See section 5.3.1 of “A set of open-source tools for Turkish natural language processing.”[2]

This model assigns each analysis string a score of

\[ \mathrm{score} = f(T)\text{,} \]

with additive smoothing.

Consider the following corpus.

$ cat handtagged.txt
^a/a<a>$
^a/a<b>$
^a/a<b>$

Example 3.1.1: handtagged.txt : A Hand-Tagged Corpus for apertium-tagger

Given the lexical unit ^a/a<a>/a<b>/a<c>$, the tagger assigns the analysis string a<a> a score of

\[
\begin{aligned}
\mathrm{score} &= \mathrm{tokenCount\_T} + 1\\
&= 1 + 1\\
&= 2\text{,}
\end{aligned}
\]

where \(\mathrm{tokenCount\_T}\) is the frequency of \(T\) in the corpus.

The tagger then assigns the analysis string a<b> a score of

\[
\begin{aligned}
\mathrm{score} &= 2 + 1\\
&= 3
\end{aligned}
\]

and the unknown analysis string a<c> a score of

\[
\begin{aligned}
\mathrm{score} &= 0 + 1\\
&= 1\text{.}
\end{aligned}
\]
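
The arithmetic above can be sketched in a few lines of Python. This is only an illustration of add-one smoothing over analysis-string counts, not the tagger's actual code: the function names are hypothetical, and the parser assumes a simplified stream format with no escaped slashes.

```python
from collections import Counter


def parse_lexical_unit(unit):
    """Split '^surface/analysis1/analysis2$' into (surface, [analyses])."""
    surface, *analyses = unit.strip()[1:-1].split("/")
    return surface, analyses


def train_model_1(corpus_lines):
    """Count how often each analysis string occurs in a hand-tagged corpus."""
    counts = Counter()
    for line in corpus_lines:
        _, analyses = parse_lexical_unit(line)
        counts[analyses[0]] += 1  # assumes exactly one analysis per unit
    return counts


def score_model_1(counts, analysis):
    """Add-one smoothed frequency, so an unseen analysis string scores 1."""
    return counts[analysis] + 1


def disambiguate(counts, unit):
    """Pick the highest-scoring analysis of an ambiguous lexical unit."""
    _, analyses = parse_lexical_unit(unit)
    return max(analyses, key=lambda a: score_model_1(counts, a))


corpus = ["^a/a<a>$", "^a/a<b>$", "^a/a<b>$"]
counts = train_model_1(corpus)
scores = {a: score_model_1(counts, a) for a in ("a<a>", "a<b>", "a<c>")}
print(scores)  # {'a<a>': 2, 'a<b>': 3, 'a<c>': 1}
print(disambiguate(counts, "^a/a<a>/a<b>/a<c>$"))  # a<b>
```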

If ./autogen.sh is passed the option --enable-debug, the tagger prints such calculations to standard error.

$ ./autogen.sh --enable-debug
$ make
$ echo '^a/a<a>/a<b>/a<c>$' | apertium-tagger -gu 1 SERIALISED_BASIC_TAGGER


score("a<a>") ==
  2 ==
  2.000000000000000000
score("a<b>") ==
  3 ==
  3.000000000000000000
score("a<c>") ==
  1 ==
  1.000000000000000000
^a<b>$

Training on Corpora with Ambiguous Lexical Units

Consider the following corpus.

$ cat handtagged.txt
^a/a<a>$
^a/a<a>/a<b>$
^a/a<b>$
^a/a<b>$

Example 3.1.1.1: handtagged.txt : a Hand-Tagged Corpus for apertium-tagger

The tagger expects lexical units of 1 analysis string, or lexical units of size 1. However, the size of the lexical unit ^a/a<a>/a<b>$ is 2. For this lexical unit,

\[ P(\texttt{a<a>}) = P(\texttt{a<b>}) = \frac{1}{2}\text{;} \]

the tagger must effectively increment the frequency of both analysis strings by 0.500000000000000000. However, the tagger can’t increment the analysis strings’ frequencies by a non-integral number because model 1 represents analysis strings’ frequencies as std::size_t.[6]

Instead, the tagger multiplies all the stored analysis strings’ frequencies by this lexical unit’s size and increments the frequency of each of this lexical unit’s analysis strings by 1.

\[
\begin{aligned}
f(\texttt{a<a>}) &= (1)(2) + 1 = 2 + 1 = 3\\
f(\texttt{a<b>}) &= (0)(2) + 1 = 0 + 1 = 1
\end{aligned}
\]

The tagger could then increment the analysis strings’ frequencies of another lexical unit of size 2 without multiplying any of the stored analysis strings’ frequencies. To account for this, the tagger stores the least common multiple of all lexical units’ sizes; only if the LCM isn’t divisible by a lexical unit’s size does the tagger multiply all the analysis strings’ frequencies.

After incrementing the frequencies of the analysis strings of the lexical unit ^a/a<a>/a<b>$, the tagger increments the frequency of the analysis string a<b> of the lexical unit ^a/a<b>$ by

\[ \frac{\mathrm{LCM}}{\mathrm{TheLexicalUnit.size}} = \frac{2}{1} = 2\text{.} \]

If the tagger gets another lexical unit of size 2, it would increment the frequency of each of the lexical unit’s analysis strings by

\[ \frac{\mathrm{LCM}}{\mathrm{TheLexicalUnit.size}} = \frac{2}{2} = 1\text{,} \]

and if it gets a lexical unit of size 3, it would multiply all the analysis strings’ frequencies by 3 and then increment the frequency of each of the lexical unit’s analysis strings by

\[ \frac{\mathrm{LCM}}{\mathrm{TheLexicalUnit.size}} = \frac{6}{3} = 2\text{.} \]

Each model supports functions to increment all their stored analysis strings’ frequencies, so models 2 and 3 support this algorithm as well.
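
The LCM bookkeeping described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the tagger's implementation: the class and method names are hypothetical, and the rescaling factor is taken to be the new LCM divided by the old one, which equals the lexical unit's size in both cases worked through above (1 to 2, and 2 to 6).

```python
from math import gcd


class IntegerFrequencies:
    """Fractional counts kept as integers scaled by a running LCM."""

    def __init__(self):
        self.frequencies = {}  # analysis string -> scaled integer frequency
        self.lcm = 1           # LCM of all lexical unit sizes seen so far

    def add_lexical_unit(self, analyses):
        size = len(analyses)
        if self.lcm % size != 0:
            # Rescale every stored frequency so 1/size becomes an integer.
            new_lcm = self.lcm * size // gcd(self.lcm, size)
            factor = new_lcm // self.lcm
            for analysis in self.frequencies:
                self.frequencies[analysis] *= factor
            self.lcm = new_lcm
        # Each analysis string receives an equal share of one observation.
        increment = self.lcm // size
        for analysis in analyses:
            self.frequencies[analysis] = self.frequencies.get(analysis, 0) + increment


model = IntegerFrequencies()
for unit in (["a<a>"], ["a<a>", "a<b>"], ["a<b>"], ["a<b>"]):
    model.add_lexical_unit(unit)
print(model.frequencies, model.lcm)  # {'a<a>': 3, 'a<b>': 5} 2
```

Dividing each stored value by the LCM recovers the fractional frequencies, here 1.5 for a<a> and 2.5 for a<b>.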

TODO: If one passes the -d option to apertium-tagger, the tagger prints warnings about ambiguous analyses in corpora to stderr.

$ apertium-tagger -ds 0 -u 1 handtagged.txt
apertium-tagger: handtagged.txt: 2:13: unexpected analysis "a<b>" following analysis "a<a>"
^a/a<a>/a<b>$
            ^

File Format

The tagger represents this model as std::map<Analysis, std::size_t>.[7][6][8]

Given the hand-tagged corpus Example 2.1.1: handtagged.txt, the tagger represents the model as

\[
\begin{aligned}
&\texttt{std::map<Analysis, std::size\_t> Model}\\
&\qquad\begin{array}{ll}
\texttt{a<a>} & 1\\
\texttt{a<b>} & 2\\
\texttt{a<a>+a<a>} & 1\\
\texttt{a<a>+a<b>} & 2\\
\texttt{a<b>+a<a>} & 3\\
\texttt{a<b>+a<b>} & 4\text{.}
\end{array}
\end{aligned}
\]

The tagger then serialises the model as

0x01 // size of the number of unique analysis strings in bytes
0x06 // number of unique analysis strings
0x01 // size of the number of morphemes in the analysis string a<a> in bytes
0x01 // number of morphemes in the analysis string a<a>
0x01 // size of the length of the lemma of the first morpheme of the analysis string a<a> in bytes
0x01 // length of the lemma of the first morpheme of the analysis string a<a>
0x01 // size of the first character of the lemma of the first morpheme of the analysis string a<a> in bytes
0x61 // first character of the lemma of the first morpheme of the analysis string a<a>
0x01 // size of the number of tags in the first morpheme of the analysis string a<a> in bytes
0x01 // number of tags in the first morpheme of the analysis string a<a>
0x01 // size of the length of the first tag of the first morpheme of the analysis string a<a> in bytes
0x01 // length of the first tag of the first morpheme of the analysis string a<a>
0x01 // size of the first character of the first tag of the first morpheme of the analysis string a<a> in bytes
0x61 // first character of the first tag of the first morpheme of the analysis string a<a>
0x01 // size of the frequency of the analysis string a<a> in bytes
0x01 // frequency of the analysis string a<a>

0x01 // size of the number of morphemes in the analysis string a<a>+a<a> in bytes
0x02 // number of morphemes in the analysis string a<a>+a<a>
. . .

or, more concisely, as

0000000: 0106 0101 0101 0161 0101 0101 0161 0101  .......a.....a..
0000010: 0102 0101 0161 0101 0101 0161 0101 0161  .....a.....a...a
0000020: 0101 0101 0161 0101 0102 0101 0161 0101  .....a.......a..
0000030: 0101 0161 0101 0161 0101 0101 0162 0102  ...a...a.....b..
0000040: 0101 0101 0161 0101 0101 0162 0102 0102  .....a.....b....
0000050: 0101 0161 0101 0101 0162 0101 0161 0101  ...a.....b...a..
0000060: 0101 0161 0103 0102 0101 0161 0101 0101  ...a.......a....
0000070: 0162 0101 0161 0101 0101 0162 0104       .b...a.....b..

.
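
The layout above, in which every integer is preceded by one byte giving its size, can be reproduced with a short Python sketch. It is an illustration only: the function names are hypothetical, big-endian order for multi-byte values is an assumption (the example contains only one-byte values), and the entries are passed in the order in which they appear in the dump rather than derived from the ordering of Analysis.

```python
import re


def serialise_int(value):
    """One byte holding the value's size in bytes, then the value itself."""
    size = max(1, (value.bit_length() + 7) // 8)
    # Byte order is an assumption; the example only shows 1-byte values.
    return bytes([size]) + value.to_bytes(size, "big")


def serialise_string(text):
    """Length first, then each character as its own size-prefixed integer."""
    return serialise_int(len(text)) + b"".join(serialise_int(ord(c)) for c in text)


def serialise_analysis(analysis):
    """Morpheme count, then each morpheme's lemma and tags."""
    morphemes = analysis.split("+")
    out = serialise_int(len(morphemes))
    for morpheme in morphemes:
        lemma = morpheme.split("<", 1)[0]
        tags = re.findall(r"<([^>]*)>", morpheme)
        out += serialise_string(lemma)
        out += serialise_int(len(tags))
        for tag in tags:
            out += serialise_string(tag)
    return out


def serialise_model(model):
    """model: list of (analysis string, frequency) pairs in storage order."""
    out = serialise_int(len(model))
    for analysis, frequency in model:
        out += serialise_analysis(analysis) + serialise_int(frequency)
    return out


model = [("a<a>", 1), ("a<a>+a<a>", 1), ("a<a>+a<b>", 2),
         ("a<b>", 2), ("a<b>+a<a>", 3), ("a<b>+a<b>", 4)]
data = serialise_model(model)
print(len(data))       # 126
print(data[:8].hex())  # 0106010101010161
```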

Model 2

See section 5.3.2 of “A set of open-source tools for Turkish natural language processing.”[2]

Consider Example 3.1.1: handtagged.txt.

The tag string <b> is twice as frequent as <a>. However, model 1 scores b<a> and b<b> equally because neither analysis string appears in the corpus.

This model splits each analysis string into a root, \(r\), and the part of the analysis string that isn’t the root, \(a\). An analysis string’s root is its first lemma. The \(r\) of a<b>+c<d> is a, and its \(a\) is <b>+c<d>. The tagger assigns each analysis string a score of \(P(r|a)f(a)\) with add-one smoothing. (Without additive smoothing, this model would be the same as model 1.)[9] The tagger assigns higher scores to unknown analysis strings with frequent \(a\) than to unknown analysis strings with infrequent \(a\).

Given the lexical unit ^b/b<a>/b<b>$, the tagger assigns the analysis string b<a> a score of

\[
\begin{aligned}
\mathrm{score} &= \frac{(\mathrm{tokenCount\_r\_a} + 1)(\mathrm{tokenCount\_a} + 1)}{\mathrm{tokenCount\_a} + 1 + \mathrm{typeCount\_a}}\\
&= \frac{(0 + 1)(1 + 1)}{1 + 1 + 2}\\
&= \frac{(1)(2)}{4}\\
&= \frac{1}{2}\text{,}
\end{aligned}
\]

where \(\mathrm{tokenCount\_r\_a}\) is the frequency of the \(r,a\) in the corpus, \(\mathrm{tokenCount\_a}\) is the frequency of the \(a\) in the corpus, and \(\mathrm{typeCount\_a}\) is the size of the parameter vector of all \(r\) preceding the \(a\).

Note that \(\mathrm{typeCount\_a}\) counts the analysis string being scored. For example, the tagger would assign the known analysis string a<a> a score of

\[
\begin{aligned}
\mathrm{score} &= \frac{(1 + 1)(1 + 1)}{1 + 1 + 1}\\
&= \frac{(2)(2)}{3}\\
&= \frac{4}{3}\text{.}
\end{aligned}
\]

The tagger assigns the analysis string b<b> a score of

\[
\begin{aligned}
\mathrm{score} &= \frac{(0 + 1)(2 + 1)}{2 + 1 + 2}\\
&= \frac{(1)(3)}{5}\\
&= \frac{3}{5}\text{.}
\end{aligned}
\]
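
The three scores above can be checked with a small Python sketch. It is not the tagger's code: the class and helper names are hypothetical, and typeCount_a is computed as the number of distinct roots seen with the \(a\), plus one when the root being scored is new, which is how this sketch reads the note that the count includes the analysis string being scored.

```python
from collections import Counter, defaultdict
from fractions import Fraction


def split_root(analysis):
    """Split 'a<b>+c<d>' into the root 'a' and the remainder '<b>+c<d>'."""
    cut = analysis.index("<")
    return analysis[:cut], analysis[cut:]


class Model2:
    def __init__(self, analyses):
        self.pair_counts = Counter()           # tokenCount_r_a
        self.rest_counts = Counter()           # tokenCount_a
        self.roots_by_rest = defaultdict(set)  # roots seen with each a
        for analysis in analyses:
            root, rest = split_root(analysis)
            self.pair_counts[(root, rest)] += 1
            self.rest_counts[rest] += 1
            self.roots_by_rest[rest].add(root)

    def score(self, analysis):
        root, rest = split_root(analysis)
        # typeCount_a includes the root being scored (assumption, see above).
        type_count = len(self.roots_by_rest.get(rest, set()) | {root})
        token_count_a = self.rest_counts[rest]
        return Fraction(
            (self.pair_counts[(root, rest)] + 1) * (token_count_a + 1),
            token_count_a + 1 + type_count,
        )


model2 = Model2(["a<a>", "a<b>", "a<b>"])
for analysis in ("b<a>", "a<a>", "b<b>"):
    print(analysis, model2.score(analysis))
# b<a> 1/2
# a<a> 4/3
# b<b> 3/5
```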

File Format

Model 3

See section 5.3.3 of “A set of open-source tools for Turkish natural language processing.”[2]

Consider Example 3.1.1: handtagged.txt.

The morpheme a<b> is twice as frequent as the morpheme a<a>. However, model 2 scores the analysis strings a<a>+a<a> and a<b>+a<a> equally because the \(a\) of neither appears in the corpus.

This model splits each analysis string into an \(r\), a first inflection, \(i_0\), and a sequence of derivation-inflection pairs, \((d_1,i_1)...(d_n,i_n)\). The \(r\) of the analysis string a<b>+c<d> is a, its \(i_0\) is <b>, and its \((d_1,i_1)...(d_n,i_n)\) is c<d>, where its \(d_1\) is c, and its \(i_1\) is <d>. The tagger assigns each analysis string a score of \(P(r|i_0)f(i_0)\prod_{i = 1}^n P(d_i|i_{i-1})P(i_i|d_i)\) with add-one smoothing. The tagger assigns higher scores to unknown analysis strings with frequent \(r,i_0\) than to unknown analysis strings with infrequent \(r,i_0\).

Given the lexical unit ^aa/a<a>+a<a>/a<b>+a<a>$, the tagger assigns the analysis string a<a>+a<a> a score of

\[
\begin{aligned}
\mathrm{score} ={}& \frac{(\mathrm{tokenCount\_r\_i\_0} + 1)(\mathrm{tokenCount\_i\_0} + 1)}{\mathrm{tokenCount\_i\_0} + 1 + \mathrm{typeCount\_i\_0}}\\
& \prod_{i = 1}^n \frac{\mathrm{tokenCount\_d\_i}(d_n, i_{n - 1}) + 1}{\mathrm{tokenCount\_i}(i_{n - 1}) + 1 + \mathrm{typeCount\_i}(i_{n - 1}, d_n)} \cdot \frac{\mathrm{tokenCount\_i\_d}(i_n, d_n) + 1}{\mathrm{tokenCount\_d}(d_n) + 1 + \mathrm{typeCount\_d}(d_n, i_n)}\\
={}& \frac{(1 + 1)(1 + 1)}{1 + 1 + 1} \cdot \frac{0 + 1}{0 + 1 + 1} \cdot \frac{0 + 1}{0 + 1 + 1}\\
={}& \frac{(2)(2)}{3} \cdot \frac{1}{2} \cdot \frac{1}{2}\\
={}& \frac{4}{3} \cdot \frac{1}{4}\\
={}& \frac{1}{3}\text{,}
\end{aligned}
\]

where
\(\mathrm{tokenCount\_r\_i\_0}\) is the frequency of the \(r,i_0\) in the corpus,
\(\mathrm{tokenCount\_i\_0}\) is the frequency of the \(i_0\) in the corpus,
\(\mathrm{typeCount\_i\_0}\) is the size of the parameter vector of \(r\) preceding the \(i_0\),
\(\mathrm{tokenCount\_d\_i}(d_n, i_{n - 1})\) is the frequency of the \(d_n\) following the \(i_{n - 1}\) in the corpus,
\(\mathrm{tokenCount\_i}(i_{n - 1})\) is the frequency of non-final \(i_{n - 1}\) in the corpus,
\(\mathrm{typeCount\_i}(i_{n - 1}, d_n)\) is the size of the parameter vector of \(d\) following the \(i_{n - 1}\),
\(\mathrm{tokenCount\_i\_d}(i_n, d_n)\) is the frequency of the \(i_n\) following the \(d_n\) in the corpus,
\(\mathrm{tokenCount\_d}(d_n)\) is the frequency of the \(d_n\) in the corpus,
and \(\mathrm{typeCount\_d}(d_n, i_n)\) is the size of the parameter vector of \(i\) following the \(d_n\).

The tagger assigns the analysis string a<b>+a<a> a score of

\[
\begin{aligned}
\mathrm{score} ={}& \frac{(2 + 1)(2 + 1)}{2 + 1 + 1} \cdot \frac{0 + 1}{0 + 1 + 1} \cdot \frac{0 + 1}{0 + 1 + 1}\\
={}& \frac{(3)(3)}{4} \cdot \frac{1}{2} \cdot \frac{1}{2}\\
={}& \frac{9}{4} \cdot \frac{1}{4}\\
={}& \frac{9}{16}\text{.}
\end{aligned}
\]
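
The two scores above can be reproduced with the following Python sketch. It is a reading of the formula under stated assumptions, not the tagger's implementation: all names are hypothetical, every morpheme is assumed to carry at least one tag, and each type count is taken to include the item being scored, as in model 2.

```python
from collections import Counter, defaultdict
from fractions import Fraction


def split_analysis(analysis):
    """Return (r, i_0, [(d_1, i_1), ...]) for 'r<i_0>+d_1<i_1>+...'."""
    parts = []
    for morpheme in analysis.split("+"):
        cut = morpheme.index("<")
        parts.append((morpheme[:cut], morpheme[cut:]))
    (root, first_inflection), pairs = parts[0], parts[1:]
    return root, first_inflection, pairs


def smoothed(pair_count, context_count, seen, candidate):
    """(count + 1) / (context + 1 + types), counting the candidate as a type."""
    return Fraction(pair_count + 1, context_count + 1 + len(seen | {candidate}))


class Model3:
    def __init__(self, analyses):
        self.r_i0 = Counter()                 # tokenCount_r_i_0
        self.i0 = Counter()                   # tokenCount_i_0
        self.roots_by_i0 = defaultdict(set)   # for typeCount_i_0
        self.d_after_i = Counter()            # tokenCount_d_i
        self.nonfinal_i = Counter()           # tokenCount_i
        self.ds_by_i = defaultdict(set)       # for typeCount_i
        self.i_after_d = Counter()            # tokenCount_i_d
        self.d = Counter()                    # tokenCount_d
        self.is_by_d = defaultdict(set)       # for typeCount_d
        for analysis in analyses:
            root, previous, pairs = split_analysis(analysis)
            self.r_i0[(root, previous)] += 1
            self.i0[previous] += 1
            self.roots_by_i0[previous].add(root)
            for derivation, inflection in pairs:
                self.d_after_i[(previous, derivation)] += 1
                self.nonfinal_i[previous] += 1
                self.ds_by_i[previous].add(derivation)
                self.i_after_d[(derivation, inflection)] += 1
                self.d[derivation] += 1
                self.is_by_d[derivation].add(inflection)
                previous = inflection

    def score(self, analysis):
        root, previous, pairs = split_analysis(analysis)
        result = smoothed(self.r_i0[(root, previous)], self.i0[previous],
                          self.roots_by_i0.get(previous, set()), root)
        result *= self.i0[previous] + 1
        for derivation, inflection in pairs:
            result *= smoothed(self.d_after_i[(previous, derivation)],
                               self.nonfinal_i[previous],
                               self.ds_by_i.get(previous, set()), derivation)
            result *= smoothed(self.i_after_d[(derivation, inflection)],
                               self.d[derivation],
                               self.is_by_d.get(derivation, set()), inflection)
            previous = inflection
        return result


model3 = Model3(["a<a>", "a<b>", "a<b>"])
print(model3.score("a<a>+a<a>"))  # 1/3
print(model3.score("a<b>+a<a>"))  # 9/16
```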

File Format

Notes

  1. https://github.com/m5w/apertium
  2. http://coltekin.net/cagri/papers/trmorph-tools.pdf
  3. Installation#If you want to add language data / do more advanced stuff
  4. Minimal installation from SVN#Set up environment
  5. Minimal installation from SVN#Configure, build, and install
  6. http://en.cppreference.com/w/cpp/types/size_t
  7. http://en.cppreference.com/w/cpp/container/map
  8. https://github.com/m5w/apertium/blob/master/apertium/analysis.h
  9. \[ \begin{aligned} \mathrm{score} &= \frac{(\mathrm{tokenCount\_r\_a})(\mathrm{tokenCount\_a})}{\mathrm{tokenCount\_a}}\\ &= \mathrm{tokenCount\_r\_a} = \mathrm{tokenCount\_T} \end{aligned} \]