Difference between revisions of "User:Firespeaker/Apertium-turkic talk outline"
Firespeaker (talk | contribs) |
Revision as of 15:51, 28 September 2012
Morphological transducers: what and why
- slide 1: definition, example (sample input/output)
- slide 2: use in RBMT, specifically apertium
- slide 3: other uses: spell checkers, ...?
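A possible sample input/output for slide 1, sketched in Python rather than with a real transducer. The Turkish forms are real, but the table `ANALYSES`, the function names, and the exact tag names (`<n>`, `<pl>`, `<loc>`, `<nom>`) are illustrative assumptions, not taken from any actual Apertium dictionary. The point is only that a transducer is one relation between surface forms and lexical forms that can be read in either direction.

```python
# Toy stand-in for a morphological transducer: a finite table relating
# surface forms to lexical forms (lemma + tags). A real transducer encodes
# this relation as a finite-state network, not as a dictionary.
# All entries and tag names here are illustrative assumptions.
ANALYSES = {
    "ev": ["ev<n><nom>"],
    "evler": ["ev<n><pl><nom>"],
    "evlerde": ["ev<n><pl><loc>"],
}


def analyse(surface):
    """Surface form -> list of lexical forms (empty list if unknown)."""
    return ANALYSES.get(surface, [])


def generate(lexical):
    """Lexical form -> list of surface forms (the same relation, inverted)."""
    return [s for s, analyses in ANALYSES.items() if lexical in analyses]


print(analyse("evlerde"))
print(generate("ev<n><pl><nom>"))
```

In RBMT the analysis direction would be used on the source-language side and the generation direction on the target-language side.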
Turkic languages
Geographical/demographic overview of Turkic languages
- slides 4, 5?
- a map, numbers of speakers, wikipedia presence
Morphological and phonological properties encountered in Turkic languages
(these are all to be taken as "challenges for morphological transducers")
- slide 5: Agglutination
- slide 6: Vowel harmony
- slide 7: Consonantal processes
- slide 8: "buffer" segments
- slide 9: Cyrillic orthographical issues
- something on morpho-syntactic issues that've come up a lot
  - no suffix can attach to "any word", "any part of speech" or even e.g. "all nouns"; often suffixes recur in very specific sorts of places; it's almost like we have dozens of POSes
    - We don't want to overanalyse(/overgenerate)
      - disambig issues
      - testvoc issues
  - Adjective classes (e.g., whether used as <attr>/<subst>/<advl>, +comparative, etc.)
  - Non-finite verb forms
  - ?
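To make the vowel harmony and consonantal process slides concrete, here is a minimal Python sketch using Turkish as the example language. It assumes suffixes are stored with placeholder symbols: `A` for a vowel realised as a/e, and `D` for a consonant realised as d/t. Those symbols, the function names, and the restriction to two alternations are assumptions made for this sketch. It ignores rounding harmony, "buffer" segments, and loanwords that break harmony.

```python
BACK_VOWELS = set("aıou")
FRONT_VOWELS = set("eiöü")
VOICELESS = set("çfhkpsşt")


def last_vowel_is_back(stem):
    """Return True if the last vowel in `stem` is a back vowel."""
    for ch in reversed(stem):
        if ch in BACK_VOWELS:
            return True
        if ch in FRONT_VOWELS:
            return False
    raise ValueError(f"no vowel found in {stem!r}")


def attach(stem, suffix):
    """Attach `suffix` to `stem`, resolving the placeholders A and D.

    A -> a after a back vowel, e after a front vowel (vowel harmony)
    D -> t after a voiceless consonant, d otherwise (voicing assimilation)
    """
    out = stem
    for ch in suffix:
        if ch == "A":
            out += "a" if last_vowel_is_back(out) else "e"
        elif ch == "D":
            out += "t" if out[-1] in VOICELESS else "d"
        else:
            out += ch
    return out


print(attach("ev", "lAr"))                 # plural of 'house'
print(attach("kitap", "DA"))               # locative of 'book'
print(attach(attach("ev", "lAr"), "DA"))   # agglutination: plural + locative
```

Because each placeholder is resolved against the output built so far, stacking suffixes (agglutination) needs no extra machinery: the locative simply harmonises with whatever the plural produced.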
Developing a morphological transducer
- Important resources to start with:
- a corpus
- some grammars and dictionaries
- linguistic knowledge of the language (if you want to get into it deeply)
- native speakers!
- ability to work with informants
- patience!
- cf. Chuvash (i.e., the native speakers hopefully agree on forms)
HFST and how we use it
- slide: HFST: what and who
- slide: our purposes: using two two-level systems together for a three-level system (?):
- slide: overview of lexc and why it was chosen
- slide: overview of twol and why it was chosen
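One way to picture the lexc half of the "three-level" setup is as a set of lexicons linked by continuation classes. The Python sketch below imitates that idea only; it is not lexc syntax and does not use HFST. The lexicon names, tags, and the simplified notation (`>` for a morpheme boundary, `A` and `D` for underspecified segments) are all assumptions made for illustration. It produces pairs of a lexical form and an intermediate form; a separate rule component (twol in the outline above) would then map each intermediate form to a surface form.

```python
# Each lexicon lists entries of the form
#   (lexical-side string, intermediate-side string, next lexicon or None).
# Names, tags and notation are hypothetical, chosen for this sketch.
LEXICONS = {
    "Root": [
        ("ev<n>", "ev", "Number"),
        ("kitap<n>", "kitap", "Number"),
    ],
    "Number": [
        ("", "", "Case"),            # singular: no overt suffix
        ("<pl>", ">lAr", "Case"),
    ],
    "Case": [
        ("<nom>", "", None),
        ("<loc>", ">DA", None),
    ],
}


def expand(lexicon="Root", lexical="", intermediate=""):
    """Yield (lexical, intermediate) pairs by following continuation classes."""
    if lexicon is None:
        yield lexical, intermediate
        return
    for lex, mid, nxt in LEXICONS[lexicon]:
        yield from expand(nxt, lexical + lex, intermediate + mid)


pairs = dict(expand())
for lexical, intermediate in pairs.items():
    print(lexical, "->", intermediate)
```

Since a suffix is only reachable through the lexicons that point to it, a suffix cannot attach to "any word"; in this picture, avoiding overgeneration means adding more, narrower continuation classes.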
Examples: how morphophonological issues above are dealt with
- bing
- bang
- bam