Difference between revisions of "Lttoolbox and lexc"

Latest revision as of 08:10, 30 December 2014

A .lexc file defines the morphotactics of a language: how its morphemes are joined together.

This page describes how lttoolbox and HFST's lexc are similar, so that people more familiar with one can get to grips more easily with the other.

Terminology

{|
! lttoolbox !! lexc !! Notes
|-
| Paradigm || Continuation lexicon || Each time you see LEXICON foo, think <pardef n="foo"/>
|-
| Section || Root lexicon ||
|-
| Left || Lower || Both left and lower correspond to the surface form
|-
| Right || Upper || Both right and upper correspond to the lexical form
|-
| Symbol || Multichar symbol || A sequence of one or more characters that is treated as a single symbol
|}
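
The correspondence can be made concrete with a small sketch. The Python helper below is purely illustrative (the names lexc_symbol and pardef_entry_to_lexc are hypothetical and not part of lttoolbox or HFST). It assumes a paradigm entry whose left side is plain text and whose right side consists only of symbols, and writes it as a lexc entry: the lexical side (lttoolbox right) goes before the colon and the surface side (lttoolbox left) after it.

```python
from typing import Sequence


def lexc_symbol(name: str) -> str:
    """Write an lttoolbox symbol name such as "n" as the escaped lexc form %<n%>."""
    return f"%<{name}%>"


def pardef_entry_to_lexc(left: str, right_symbols: Sequence[str],
                         continuation: str = "#") -> str:
    """Turn one <e><p><l>..</l><r>..</r></p></e> entry into a lexc line.

    Assumption: the right side holds only <s n="..."/> symbols, no plain text.
    """
    lexical = "".join(lexc_symbol(s) for s in right_symbols)
    return f"{lexical}:{left} {continuation} ;"


# The two entries of a regular noun paradigm (empty ending and "s").
print(pardef_entry_to_lexc("", ["n", "sg"]))   # %<n%>%<sg%>: # ;
print(pardef_entry_to_lexc("s", ["n", "pl"]))  # %<n%>%<pl%>:s # ;
```

The continuation "#" marks the end of the word in lexc; an entry that continues into another paradigm would name that lexicon instead.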

Example

lttoolbox

<dictionary>
  <sdefs>
    <sdef n="n"/>
    <sdef n="pl"/>
    <sdef n="sg"/>
  </sdefs>
  <pardefs> 
    <pardef n="RegNounInfl">
      <e><p><l/><r><s n="n"/><s n="sg"/></r></p></e>
      <e><p><l>s</l><r><s n="n"/><s n="pl"/></r></p></e>
    </pardef>
  </pardefs>
  <section id="Root" type="standard">
    <e lm="cat"><i>cat</i><par n="RegNounInfl"/></e> <!-- A noun -->
  </section>
</dictionary>

To compile and use this dictionary:

$ lt-comp lr test.dix test.bin
Root@standard 7 7

$ echo "cat" | lt-proc test.bin
^cat/cat<n><sg>$

$ echo "cats" | lt-proc test.bin
^cats/cat<n><pl>$
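
To see why these two analyses come out, it can help to expand the paradigm by hand. The following Python sketch is not how lt-comp works internally (lt-comp builds a finite-state transducer); it only enumerates the surface and lexical forms that the dictionary above describes. The function names are hypothetical, and the sketch assumes the simple entry shape used here: an <i> stem followed by a single <par> reference.

```python
import xml.etree.ElementTree as ET

DIX = """<dictionary>
  <sdefs>
    <sdef n="n"/>
    <sdef n="pl"/>
    <sdef n="sg"/>
  </sdefs>
  <pardefs>
    <pardef n="RegNounInfl">
      <e><p><l/><r><s n="n"/><s n="sg"/></r></p></e>
      <e><p><l>s</l><r><s n="n"/><s n="pl"/></r></p></e>
    </pardef>
  </pardefs>
  <section id="Root" type="standard">
    <e lm="cat"><i>cat</i><par n="RegNounInfl"/></e>
  </section>
</dictionary>"""


def side_to_text(elem):
    """Flatten an <l>, <r> or <i> element; each <s n="x"/> is written as <x>."""
    if elem is None:
        return ""
    out = elem.text or ""
    for child in elem:
        if child.tag == "s":
            out += f"<{child.get('n')}>"
        out += child.tail or ""
    return out


def expand_dictionary(dix_xml):
    """Return (surface, lexical) pairs for entries of the form <i>stem</i><par/>."""
    root = ET.fromstring(dix_xml)
    # Collect every paradigm as a list of (left, right) pairs.
    pardefs = {}
    for pardef in root.iter("pardef"):
        pairs = []
        for e in pardef.findall("e"):
            p = e.find("p")
            pairs.append((side_to_text(p.find("l")), side_to_text(p.find("r"))))
        pardefs[pardef.get("n")] = pairs
    # Attach each paradigm pair to the stem of the entries that reference it.
    forms = []
    for section in root.iter("section"):
        for e in section.findall("e"):
            stem = side_to_text(e.find("i"))
            for left, right in pardefs[e.find("par").get("n")]:
                forms.append((stem + left, stem + right))
    return forms


for surface, lexical in expand_dictionary(DIX):
    print(f"^{surface}/{lexical}$")
```

Because <i> is the identity, the stem "cat" appears on both sides, and each paradigm entry appends its left side to the surface form and its right side to the lexical form.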

lexc

See also: Apertium-specific conventions for lexc

Multichar_Symbols

%<n%>
%<pl%> 
%<sg%>

LEXICON Root

NounRoot ;

LEXICON NounRoot

cat RegNounInfl ; ! A noun

LEXICON RegNounInfl

%<n%>%<sg%>:   # ;
%<n%>%<pl%>:s   # ;

To compile and use this dictionary:

$ hfst-lexc test.lexc -o test.gen.hfst
$ hfst-invert -i test.gen.hfst -o test.mor.hfst

$ echo "cat" | hfst-lookup test.mor.hfst 
cat	cat<n><sg>

$ echo "cats" | hfst-lookup test.mor.hfst 
cats	cat<n><pl>
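
The way continuation lexicons chain together, and the reason for the hfst-invert step, can be sketched in a few lines of Python. This is an illustration only and does not use HFST: the LEXICONS table is a hand-written transcription of the lexc file above (with <n> standing for the multichar symbol %<n%>), and the function name walk is hypothetical. Each entry is a triple of lexical side, surface side and continuation, and "#" ends the word.

```python
# Hand-transcribed from the lexc example: (lexical side, surface side, continuation).
LEXICONS = {
    "Root": [("", "", "NounRoot")],
    "NounRoot": [("cat", "cat", "RegNounInfl")],
    "RegNounInfl": [("<n><sg>", "", "#"), ("<n><pl>", "s", "#")],
}


def walk(lexicon="Root", lexical="", surface=""):
    """Follow continuation lexicons from Root and yield (lexical, surface) pairs."""
    if lexicon == "#":  # "#" marks the end of the word
        yield lexical, surface
        return
    for lex_part, surf_part, continuation in LEXICONS[lexicon]:
        yield from walk(continuation, lexical + lex_part, surface + surf_part)


# Generation direction: lexical form -> surface form.
generator = dict(walk())

# Analysis direction: swapping the two sides mirrors what the invert step is for.
analyser = {surface: lexical for lexical, surface in generator.items()}

print(generator)
print(analyser["cats"])
```

A real transducer can map one surface form to several analyses, so a plain dictionary is only adequate for a toy example like this one where every form is unambiguous.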