User:Firespeaker/Apertium-turkic talk outline

From Apertium


Revision as of 15:48, 28 September 2012 by Firespeaker (talk | contribs) (→‎Morphological and phonological properties encountered in Turkic languages)



Contents

1 Morphological transducers: what and why
2 Turkic languages
- 2.1 Geographical/demographic overview of Turkic languages
- 2.2 Morphological and phonological properties encountered in Turkic languages
3 Developing a morphological transducer
- 3.1 HFST and how we use it
- 3.2 Examples: how morphophonological issues above are dealt with
4 State of affairs now with apertium-turkic

Morphological transducers: what and why

slide 1: definition, example (sample input/output)
slide 2: use in RBMT, specifically apertium
slide 3: other uses: spell checkers, ...?
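A possible sample input/output for slide 1, as a minimal sketch: a toy analyser that maps a surface form to a lexical form (lemma plus tags). Turkish is used only because its forms are easy to check; the stems, the suffix tables, and the function name `analyse` are all illustrative assumptions, and the tags merely imitate Apertium's `<tag>` style. A real transducer is a compiled finite-state network that also runs in the generation direction, which this sketch does not attempt.

```python
# Toy lexicon: stem -> part-of-speech tag (illustrative entries only).
STEMS = {"ev": "<n>", "kitap": "<n>"}

# Suffix allomorphs listed explicitly; the empty string means "no suffix".
PLURAL = {"": "", "ler": "<pl>", "lar": "<pl>"}
CASE = {"": "<nom>", "de": "<loc>", "da": "<loc>", "te": "<loc>", "ta": "<loc>"}


def analyse(surface):
    """Return every analysis of the form stem + (plural) + (case).

    No vowel harmony is checked, so an ill-formed string such as
    "evlarda" is also accepted; a real transducer would reject it.
    """
    results = []
    for stem, pos in STEMS.items():
        if not surface.startswith(stem):
            continue
        rest = surface[len(stem):]
        for plural, plural_tag in PLURAL.items():
            if not rest.startswith(plural):
                continue
            tail = rest[len(plural):]
            if tail in CASE:
                results.append(stem + pos + plural_tag + CASE[tail])
    return results


print(analyse("evlerde"))  # "in the houses"
print(analyse("kitap"))    # "book"
```

The same lookup idea underlies the other uses on slide 3: a spell checker, for instance, only needs to know whether the list of analyses is empty.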

Turkic languages

Geographical/demographic overview of Turkic languages

slides 4, 5?
- a map, numbers of speakers, wikipedia presence

Morphological and phonological properties encountered in Turkic languages

slide 5: Agglutination
slide 6: Vowel harmony
slide 7: Consonantal processes
slide 8: "buffer" segments
slide 9: Cyrillic orthographical issues
something on morpho-syntactic issues that've come up a lot
- no suffix can attach to "any word", "any part of speech" or even e.g. "all nouns"; often suffixes recur in very specific sorts of places; it's almost like we have dozens of POSes
- Adjective classes (e.g., whether used as <attr>/<subst>/<advl>, +comparative, etc.)
- Non-finite verb forms
- ?
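The phonological properties on slides 6 to 8 can be illustrated with a small sketch, again using Turkish as the example language. The suffix templates use the symbols A (a/e), I (ı/i/u/ü), D (d/t) and (y) for a buffer consonant; this notation is common in descriptions of Turkish but is an assumption here, not necessarily the notation used in apertium-turkic. The function name `realise` is hypothetical. Stem-final alternations (e.g. kitap, kitabı) are deliberately not handled.

```python
BACK = set("aıou")
FRONT = set("eiöü")
ROUNDED = set("oöuü")
VOWELS = BACK | FRONT
VOICELESS = set("çfhkpsşt")


def last_vowel(word):
    """Return the last vowel in word; harmony is decided by this vowel."""
    for ch in reversed(word):
        if ch in VOWELS:
            return ch
    raise ValueError("no vowel in " + repr(word))


def realise(stem, template):
    """Attach a suffix template to a lowercase stem, one symbol at a time."""
    out = stem
    i = 0
    while i < len(template):
        if template.startswith("(y)", i):
            # Buffer segment: appears only after a vowel-final stem.
            if out[-1] in VOWELS:
                out += "y"
            i += 3
            continue
        ch = template[i]
        if ch == "A":
            # Two-way harmony: back vs. front.
            out += "a" if last_vowel(out) in BACK else "e"
        elif ch == "I":
            # Four-way harmony: backness and rounding.
            v = last_vowel(out)
            if v in BACK:
                out += "u" if v in ROUNDED else "ı"
            else:
                out += "ü" if v in ROUNDED else "i"
        elif ch == "D":
            # Consonantal process: voicing assimilation to the preceding segment.
            out += "t" if out[-1] in VOICELESS else "d"
        else:
            out += ch
        i += 1
    return out


for stem, template in [("ev", "lAr"), ("kitap", "DA"), ("araba", "(y)I"), ("göz", "(y)I")]:
    print(stem, "+", template, "->", realise(stem, template))
```

Processing the template left to right means each harmonising vowel sees the vowels already produced, so chained suffixes would harmonise with each other as well as with the stem.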

Developing a morphological transducer

Important resources to start with:
- a corpus
- some grammars and dictionaries
- linguistic knowledge of the language (if you want to get into it deeply)
- native speakers!
  - ability to work with informants
  - patience!
  - cf. Chuvash (i.e., the native speakers hopefully agree on forms)

HFST and how we use it

slide: HFST: what and who
slide: our purposes: using two two-level systems together for a three-level system (?):
- slide: overview of lexc and why it was chosen
- slide: overview of twol and why it was chosen
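One way to picture the "three-level" idea is as two mappings chained together: a lexc-like stage relating lexical forms to an intermediate form with a morpheme boundary and unresolved symbols, and a twol-like stage relating that intermediate form to the surface. The sketch below imitates this in plain Python under loud assumptions: it does not use HFST, the lexicon names, the `>` boundary, and the symbols A and D are invented for illustration, and the rule stage runs sequentially, whereas real twol rules are parallel constraints.

```python
# lexc-like stage: each lexicon lists (lexical, intermediate, continuation).
LEXICON = {
    "Root": [("ev<n>", "ev", "Number"), ("kitap<n>", "kitap", "Number")],
    "Number": [("", "", "Case"), ("<pl>", ">lAr", "Case")],
    "Case": [("<nom>", "", None), ("<loc>", ">DA", None)],
}

VOWELS = "aeıioöuü"


def expand(name="Root"):
    """Yield (lexical, intermediate) pairs by following continuation classes."""
    if name is None:
        yield "", ""
        return
    for lexical, intermediate, continuation in LEXICON[name]:
        for lex_rest, mid_rest in expand(continuation):
            yield lexical + lex_rest, intermediate + mid_rest


def apply_rules(intermediate):
    """twol-like stage: resolve A and D, then drop the boundary symbol."""
    out = ""
    for ch in intermediate:
        if ch == ">":
            continue  # the morpheme boundary has no surface realisation
        if ch == "A":
            vowel = next(c for c in reversed(out) if c in VOWELS)
            out += "a" if vowel in "aıou" else "e"
        elif ch == "D":
            out += "t" if out[-1] in "çfhkpsşt" else "d"
        else:
            out += ch
    return out


# Chaining both stages gives a surface -> lexical lookup table.
table = {apply_rules(mid): lex for lex, mid in expand()}
for surface, lexical in sorted(table.items()):
    print(surface, "->", lexical)
```

Keeping the two stages separate is the point of the design: the lexicon only says which morphemes may follow which, and the phonology only says how abstract symbols are spelled out, so neither needs to list allomorphs.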

Examples: how morphophonological issues above are dealt with

bing
bang
bam

State of affairs now with apertium-turkic

