Task ideas for Google Code-in/Extracting paradigm sketches from dictionaries

From Apertium

Objective

The objective of this task is to take (or make) a dictionary in text format and extract the paradigm sketches from it. A paradigm sketch is the morphological description that comes after the headword. It is usually not a complete paradigm, but it should give the distinctive forms of the paradigm, which can then be used to generate full paradigms semi-automatically.

Example

Let's look at an example from Avar:

# бардах (-алъ, -алъул, -ал) диал. глиняный кувшин для содержания воды
# баржа (-ялъ, -ялъул, -би) баржа; ~ хадуб цІазе буксировать баржу; 
# баркала (-ялъ, -ялъул) 1. благодарность;
# барагъи (-ялъ, -ялъул) уст. бараги (ткань)
# барак (-алъ, -алъул, ал) барак; цІулал ~ал деревянные бараки; хІалтІухъаби ~ахъ руго рабочие живут в бараках
# баракат (-алъ, -алъул) см. баркат
# баракатаб см. баркатаб

The paradigm sketches for this subsection would be:

(-алъ, -алъул, -ал)
(-ялъ, -ялъул, -би)
(-ялъ, -ялъул)
(-алъ, -алъул, ал)
(-алъ, -алъул)
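
The extraction above can be sketched in a few lines of Python. This is only a minimal illustration, not a prescribed method: it assumes each entry sits on one line, optionally starts with a "# " list marker, has a single-word headword, and puts the sketch in parentheses directly after the headword. The names ENTRY_RE, extract_sketch and unique_sketches are made up for this example, and real dictionaries will likely need a looser pattern.

```python
import re

# Assumed entry layout (hypothetical, based on the Avar sample):
# optional "# " marker, a single-word headword, then "(suffix, suffix, ...)".
ENTRY_RE = re.compile(r"^(?:#\s*)?(?P<headword>\S+)\s+\((?P<sketch>[^)]*)\)")


def extract_sketch(line):
    """Return (headword, [suffixes]), or None if the entry has no sketch."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        return None
    suffixes = [s.strip() for s in match.group("sketch").split(",") if s.strip()]
    return match.group("headword"), suffixes


def unique_sketches(lines):
    """Collect the distinct sketches in order of first appearance."""
    seen = []
    for line in lines:
        result = extract_sketch(line)
        if result is not None and result[1] not in seen:
            seen.append(result[1])
    return seen


if __name__ == "__main__":
    sample = [
        "# бардах (-алъ, -алъул, -ал) диал. глиняный кувшин для содержания воды",
        "# баркала (-ялъ, -ялъул) 1. благодарность;",
        "# баракатаб см. баркатаб",
    ]
    for sketch in unique_sketches(sample):
        print("(" + ", ".join(sketch) + ")")
```

Entries without a sketch, such as the cross-reference for баракатаб, are skipped. Suffixes are kept exactly as written in the source, so an inconsistency like "ал" without its hyphen shows up as a separate sketch that can be reviewed by hand.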

You will also be expected to find the grammatical meaning of each suffix and to group the suffixes by word category. The suffixes and categories should be described in the dictionary.