Difference between revisions of "User:Firespeaker/Steps for writing a morphological transducer"

From Apertium

Revision as of 07:35, 14 January 2013

This is a short guide....

This outlines steps that should be followed more or less in order; however, it can be an iterative process, and sometimes you need to go ahead a step or two to figure out what you did wrong or missed a couple steps back.

Document Resources

Document Morphotactics

Phonology

with clear rules for any condition, documenting variation

Word classes

Not just nouns/verbs/adjectives/etc., but types of these, how they pattern

Decide what the best formalism is

Start writing morphophonology

Add some nouns

Add some noun morphology

For example, plural. Where do cases come in relation to plural?
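
The ordering question can be tried out on paper before committing to a formalism. Below is a minimal Python sketch, not a real transducer: it assumes a made-up language in which the plural suffix comes before the case suffix, and every stem, suffix, and tag in it is invented for illustration.

```python
# Toy morphotactics sketch: stem -> optional plural -> optional case.
# All morphemes and tags below are invented, not taken from any real language.
from itertools import product

STEMS = {"kata": "kata<n>"}            # surface stem -> lemma plus word-class tag
NUMBER = {"": "", "lar": "<pl>"}       # empty suffix = singular, left untagged
CASE = {"": "<nom>", "da": "<loc>"}    # empty suffix = nominative


def generate(stems, number, case):
    """Return {surface form: analysis} with the number slot before the case slot."""
    forms = {}
    for (stem, lemma), (num, num_tag), (cas, cas_tag) in product(
        stems.items(), number.items(), case.items()
    ):
        forms[stem + num + cas] = lemma + num_tag + cas_tag
    return forms


forms = generate(STEMS, NUMBER, CASE)
for surface, analysis in sorted(forms.items()):
    print(surface, analysis)
```

If the language puts case before plural instead, swapping the order of concatenation in `generate` is the only change needed, which makes it easy to compare the generated forms with attested ones.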

Add some verbs

Add some verbal morphology

For example, simple present tense.

Figure out more complex categories

Major evaluation

You should be evaluating all along, but at this point stop and do a major evaluation. Run your transducer over several large corpora (if available). What's missing? Major chunks of morphology? Some common words? Try to fill these in, and your coverage should jump by several percent.
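
A rough way to put a number on coverage is sketched below. It assumes the analyser output is in Apertium stream format (`^surface/analysis1/analysis2$`) with unknown words marked by a leading `*`; the function name and the sample string are made up, and escaped `/`, `^`, or `$` characters are not handled.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
# Simplification: escaped delimiters (e.g. \/) are not handled.
LU = re.compile(r"\^([^/$]*)((?:/[^/$]*)*)\$")


def coverage(stream):
    """Return (known, total) token counts; an analysis starting with '*' is unknown."""
    known = total = 0
    for _surface, rest in LU.findall(stream):
        analyses = [a for a in rest.split("/") if a]
        total += 1
        if analyses and not analyses[0].startswith("*"):
            known += 1
    return known, total


sample = "^the/the<det>$ ^dog/dog<n><sg>$ ^blorf/*blorf$"
known, total = coverage(sample)
print(f"{known}/{total} tokens analysed ({100 * known / total:.1f}%)")
```

This is naive coverage: a token counts as covered if it gets any analysis at all, whether or not that analysis is correct.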

Disambiguation

This is a good point to stop and do a different type of evaluation. Go through the analyses of a couple of sentences (ideally several paragraphs' worth) and check for words that aren't being analysed correctly. Focus on words with multiple analyses where only one is correct in context. (Words with no correct analysis also need work, but that's part of expanding coverage.)
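
To find the words worth looking at, it can help to list only the tokens that received more than one analysis. The sketch below again assumes Apertium stream format output and ignores escaped delimiters; the function name and the sample are hypothetical.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
# Simplification: escaped delimiters (e.g. \/) are not handled.
LU = re.compile(r"\^([^/$]*)((?:/[^/$]*)*)\$")


def ambiguous(stream):
    """Return (surface, analyses) pairs for tokens with more than one analysis."""
    result = []
    for surface, rest in LU.findall(stream):
        analyses = [a for a in rest.split("/") if a]
        if len(analyses) > 1:
            result.append((surface, analyses))
    return result


sample = "^saw/saw<n>/see<vblex><past>$ ^dogs/dog<n><pl>$"
for surface, analyses in ambiguous(sample):
    print(surface, "->", ", ".join(analyses))
```

Deciding which of the listed analyses is right in context still has to be done by hand, or by a separate disambiguation step.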

Expand coverage

Tweak morphophonology

Add lexemes in bulk

based on frequency lists, words in a corpus not covered, things that occur to you off the top of your head
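
One way to get the "words in a corpus not covered" list is to count the unknown tokens in the analyser output and work down from the most frequent. This is a minimal sketch assuming Apertium stream format with `*`-marked unknowns; lowercasing before counting is an assumption that will not suit every language or script.

```python
import re
from collections import Counter

# Matches only unknown lexical units such as ^blorf/*blorf$ and captures the surface form.
# Simplification: escaped delimiters (e.g. \/) are not handled.
UNKNOWN = re.compile(r"\^([^/$]*)/\*[^$]*\$")


def unknown_frequencies(stream):
    """Count unknown surface forms, case-folded with str.lower()."""
    return Counter(surface.lower() for surface in UNKNOWN.findall(stream))


sample = "^Blorf/*Blorf$ ^the/the<det>$ ^blorf/*blorf$ ^zib/*zib$"
freqs = unknown_frequencies(sample)
for word, count in freqs.most_common():
    print(count, word)
```

The top of this list tends to mix missing stems with known stems whose morphology isn't handled yet, so it feeds both this step and the morphophonology tweaking.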