Difference between revisions of "User:Firespeaker/Apertium-turkic talk outline"


Revision as of 16:01, 28 September 2012

Sketch for a talk on "Writing Turkic-language morphological transducers using HFST (for MT)" (http://cl.indiana.edu/wiki/Fall2012ClingDing) on October 2nd.

Morphological transducers: what and why

  • slide 1: definition, example (sample input/output)
  • slide 2: use in rule-based MT (RBMT), specifically Apertium
  • slide 3: other uses: spell checkers, ...?
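
As a rough illustration of the "sample input/output" idea for slide 1, the sketch below treats a transducer as a finite relation between surface forms and analyses that can be read in either direction. It is a toy lookup table, not HFST; the Turkish forms are real, but the tag names (`<n>`, `<pl>`, `<loc>`, `<nom>`) and the function names `analyse` and `generate` are assumptions in Apertium's angle-bracket style, not taken from apertium-turkic.

```python
# Toy stand-in for a morphological transducer: a finite relation between
# surface forms and lexical analyses, usable in both directions.
# Tag names are assumed for illustration (Apertium-style angle brackets).
PAIRS = [
    ("ev", "ev<n><nom>"),
    ("evler", "ev<n><pl><nom>"),
    ("evde", "ev<n><loc>"),
    ("evlerde", "ev<n><pl><loc>"),
]


def analyse(surface):
    """Surface form -> list of lexical analyses (empty if unknown)."""
    return [lex for surf, lex in PAIRS if surf == surface]


def generate(lexical):
    """Lexical analysis -> list of surface forms (empty if unknown)."""
    return [surf for surf, lex in PAIRS if lex == lexical]


if __name__ == "__main__":
    print(analyse("evlerde"))          # analysis direction
    print(generate("ev<n><pl><nom>"))  # generation direction
```

A real transducer encodes this relation compactly as a finite-state network instead of listing every pair, which is what makes it practical for agglutinative languages.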

Turkic languages

Geographical/demographic overview of Turkic languages

  • slides 4, 5?
    • a map, numbers of speakers, Wikipedia presence

Morphological and phonological properties encountered in Turkic languages

(these are all to be taken as "challenges for morphological transducers")

  • slide 5: Agglutination
  • slide 6: Vowel harmony
  • slide 7: Consonantal processes
  • slide 8: "buffer" segments
  • slide 9: Cyrillic orthographical issues
  • something on morphosyntactic issues that have come up a lot
    • no suffix can attach to "any word", "any part of speech", or even e.g. "all nouns"; suffixes often recur only in very specific environments, so it's almost as if we have dozens of POSes
      • we don't want to overanalyse (or overgenerate)
        • disambiguation issues
        • testvoc issues
    • Adjective classes (e.g., whether used as <attr>/<subst>/<advl>, +comparative, etc.)
    • Non-finite verb forms
    •  ?

Developing a morphological transducer

  • Important resources to start with:
    • a corpus
    • some grammars and dictionaries
    • linguistic knowledge of the language (if you want to get into it deeply)
    • native speakers!
      • ability to work with informants
      • patience!
      • cf. Chuvash (i.e., the native speakers hopefully agree on forms)

HFST and how we use it

  • slide: HFST: what and who
  • slide: our purposes: using two two-level systems together for a three-level system (?):
    • slide: overview of lexc and why it was chosen
    • slide: overview of twol and why it was chosen

Examples: how morphophonological issues above are dealt with

  • bing
  • bang
  • bam
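
One way the vowel-harmony and consonantal processes listed above could be illustrated is with a two-stage toy pipeline in the spirit of the lexc + twol split: a morphotactics stage that concatenates a stem with suffixes containing archiphonemes, and a phonology stage that realises those archiphonemes on the surface. This is a minimal sketch for Turkish only, under several assumptions: the archiphoneme symbols `A` and `D`, the boundary symbol `>`, the tag names, and the function names `lexc_stage` and `twol_stage` are all hypothetical, and the second stage is a sequential left-to-right rewrite, whereas real twol rules are parallel constraints.

```python
# Toy two-stage pipeline loosely mirroring a lexc (morphotactics) +
# twol (phonology) setup. All symbols and names here are assumptions
# made for illustration; this is not HFST and not apertium-turkic code.

BACK_VOWELS = set("aıou")
FRONT_VOWELS = set("eiöü")
VOWELS = BACK_VOWELS | FRONT_VOWELS
VOICELESS = set("çfhkpsşt")

# Suffixes in an intermediate (morphophonemic) form:
#   ">" marks a morpheme boundary,
#   "A" is a low vowel that harmonises in backness (a/e),
#   "D" is a dental stop that assimilates in voicing (d/t).
SUFFIXES = {"<pl>": ">lAr", "<loc>": ">DA"}


def lexc_stage(stem, tags):
    """Stem + tags -> intermediate form, e.g. 'kitap>lAr>DA'."""
    return stem + "".join(SUFFIXES[t] for t in tags)


def twol_stage(intermediate):
    """Intermediate form -> surface form, scanning left to right."""
    out = []
    for ch in intermediate:
        if ch == ">":
            continue  # boundaries are not realised on the surface
        if ch == "A":
            # Vowel harmony: copy backness from the nearest preceding vowel.
            last_vowel = next((c for c in reversed(out) if c in VOWELS), None)
            out.append("a" if last_vowel in BACK_VOWELS else "e")
        elif ch == "D":
            # Voicing assimilation: t after a voiceless consonant, else d.
            out.append("t" if out and out[-1] in VOICELESS else "d")
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    for stem, tags in [("ev", ["<pl>", "<loc>"]),
                       ("kitap", ["<pl>", "<loc>"]),
                       ("kitap", ["<loc>"])]:
        print(stem, tags, "->", twol_stage(lexc_stage(stem, tags)))
```

With these definitions the three sample inputs come out as evlerde, kitaplarda and kitapta. Keeping the suffix inventory free of surface variants is the point of the split: the first stage only states which morphemes may follow which, and the second handles all alternations.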

State of affairs now with apertium-turkic