Difference between revisions of "Scottish Gaelic and Irish"
(→Tagger) |
m (Irish doesn't have synthetic adjectives) |
||
(27 intermediate revisions by 2 users not shown) | |||
Line 3: | Line 3: | ||
==Todo== |
==Todo== |
||
===Irish dictionary=== |
|||
* Add ability to analyse initial mutations to the monolingual dictionary. |
|||
::-- I have most of the work done for this -- [[User:Jimregan|Jimregan]] |
|||
# <s>put the paradigms in 1 entry-per-line format</s> |
|||
⚫ | |||
# noun paradigms |
|||
::-- |
|||
## some have only one entry -- these are defective? -- e.g. <code>bá__n_m</code> |
|||
* Improve the tagger -- write restrictions/constraints, and then retrain. |
|||
## some have three entries -- defective also? -- e.g. <code>band/ia__n_m</code> |
|||
⚫ | |||
# verb paradigms |
|||
⚫ | ** We only want stuff in the Irish analyser that we can translate into Scottish Gaelic -- so, in order for a word to be included, it should be in both the Irish monolingual, bilingual and the translation in the Scottish Gaelic monolingual. With the words for which we don't have translations we can just comment them out |
||
## sort the entries so that the order makes sense |
|||
::-- Count me out on this one; I will suggest using <e i="yes"> etc. instead of xml comments -- [[User:Jimregan|Jimregan]] |
|||
## <s>is there an imperative p1.sg ???</s> |
|||
# adjective paradigms |
|||
## some paradigms have more entries than others, e.g. <code>ca/s__adj</code> has 3, and <code>bré/an__adj</code> has 4 |
|||
# are some proper nouns marked with common noun paradigms instead of proper noun paradigms ? |
|||
## find out with <code>cat apertium-ga-gd.ga.dix | grep '<e lm="[A-Z]'</code> |
|||
# sort the entries in the <section id="main"> by a) part-of-speech, b) alphabetical order |
|||
# I think we're missing possessives, demonstratives, quantifiers and perhaps some definite/indefinite pronouns
|||
==Old todo== |
|||
⚫ | |||
⚫ | ** We only want stuff in the Irish analyser that we can translate into Scottish Gaelic -- so, in order for a word to be included, it should be in the Irish monolingual dictionary and the bilingual dictionary, and its translation should be in the Scottish Gaelic monolingual dictionary. Words for which we don't have translations can be commented out -- or moved to a separate file in <code>dev/</code>
||
⚫ | |||
* Do some fixing of the bilingual dictionary |
* Do some fixing of the bilingual dictionary |
||
** <s>There are some entries with unknown gender on the Scottish Gaelic side.</s> |
|||
** Some restrictions probably need adding. |
** Some restrictions probably need adding. |
||
** Some conjunctions are marked "cnj" and not subdivided for "cnjcoo", "cnjsub" etc. |
** Some conjunctions are marked "cnj" and not subdivided for "cnjcoo", "cnjsub" etc. |
||
* Making constraint grammar rules more CG-like |
|||
::-- I'll take this one too -- [[User:Jimregan|Jimregan]] |
|||
* Write rules to do initial mutations for generation. |
* Write rules to do initial mutations for generation. |
||
* Write some transfer rules. |
* Write some transfer rules. |
||
** For example to do tenses, number agreement, etc. |
** For example to do tenses, number agreement, etc. |
||
::-- We can probably take most of this stuff from another language pair and add the consonant etc. stuff later; for the most part, adjective chunks etc. should be the same as those in at least one other pair (I'll scout around for which) -- [[User:Jimregan|Jimregan]] |
|||
==Testing== |
||
* [[/Regression tests]] |
|||
==Initial mutations== |
|||
* [[/Pending tests]] |
|||
As members of the group of Celtic languages, both Scottish Gaelic and Irish exhibit initial consonant mutation. There follows a brief description of how the analysis, disambiguation and generation of this phenomenon is dealt with in the <code>apertium-ga-gd</code> package. |
|||
===Analysis and disambiguation=== |
|||
===Generation=== |
|||
Generation of initial mutations takes place in two files, where <math>x</math> is the code of the language that is being generated (<code>ga</code> for Irish, <code>gd</code> for Scottish Gaelic). |
|||
* <code>apertium-ga-gd.pre-<math>x</math>.t1x</code> — Transfer rules which add tags defining the mutation to the beginning of words which should be mutated. |
|||
* <code>apertium-ga-gd.muta-<math>x</math>.dix</code> — A post-generation dictionary which takes the tag and the initial letter of the word and outputs the mutated form. |
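The two steps above (a transfer rule marks the word, then a post-generation lookup rewrites the marked initial) can be illustrated with a small Python sketch. This is not the content of either file: the marker names <code>&lt;len&gt;</code> (lenition) and <code>&lt;ecl&gt;</code> (eclipsis), the marker syntax, and the function name are assumptions made for illustration, and only the basic Irish consonant mutations are covered. Exceptions such as unlenited <code>sc</code>, <code>sp</code>, <code>st</code> are deliberately ignored.

```python
import re

# Hypothetical markers: "<len>" = lenition, "<ecl>" = eclipsis.
# The real tag names used in apertium-ga-gd may differ.

# Irish consonants that take an "h" after them when lenited.
LENITABLE = set("bcdfgmpst")

# Irish eclipsis: the letter(s) written in front of the original initial.
ECLIPSIS_PREFIX = {"b": "m", "c": "g", "d": "n", "f": "bh",
                   "g": "n", "p": "b", "t": "d"}

# A marker followed by the first letter of the word it applies to.
MARKED_INITIAL = re.compile(r"<(len|ecl)>(\w)")


def apply_mutations(text):
    """Rewrite each marker + initial letter as the mutated initial."""
    def mutate(match):
        tag, initial = match.groups()
        key = initial.lower()
        if tag == "len" and key in LENITABLE:
            return initial + "h"                   # cat -> chat
        if tag == "ecl" and key in ECLIPSIS_PREFIX:
            return ECLIPSIS_PREFIX[key] + initial  # bád -> mbád
        return initial  # unaffected letter: just drop the marker
    return MARKED_INITIAL.sub(mutate, text)


if __name__ == "__main__":
    print(apply_mutations("mo <len>cat"))
    print(apply_mutations("ár <ecl>bád"))
    print(apply_mutations("i <ecl>Gaillimh"))
```

Keeping the mutation as a lookup on tag plus initial letter mirrors the division of labour described above: the transfer stage only decides that a mutation applies, and the spelling change is handled in one place afterwards.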
|||
==See also== |
==See also== |
||
* [[Scottish Gaelic]] |
* [[Scottish Gaelic]] |
||
==Notes== |
|||
<references/> |
|||
==External links== |
==External links== |
||
* E. Uí Dhonnchadha and J. Van Genabith (2006) "[http://www.sdjt.si/bib/lrec06/pdf/193_pdf.pdf A Part-of-speech tagger for Irish using Finite-State Morphology and Constraint Grammar Disambiguation]". |
|||
* [http://en.wikipedia.org/wiki/Differences_between_Scottish_Gaelic_and_Irish Wikipedia: Differences between Scottish Gaelic and Irish] |
* [http://en.wikipedia.org/wiki/Differences_between_Scottish_Gaelic_and_Irish Wikipedia: Differences between Scottish Gaelic and Irish] |
||
* [http://www.smo.uhi.ac.uk/gaidhlig/ga-ge/coimeas.html Comparison of Irish and Scottish Gaelic] |
|||
* [http://www.smo.uhi.ac.uk/gaidhlig/ga-ge/faclair.html Faclair Gàidhlig-Gaeilge] |
|||
[[Category:Scottish Gaelic and Irish|*]] |
Latest revision as of 09:59, 8 May 2011
Todo
Irish dictionary
- put the paradigms in 1 entry-per-line format (done)
- noun paradigms
  - some have only one entry -- these are defective? -- e.g. bá__n_m
  - some have three entries -- defective also? -- e.g. band/ia__n_m
- verb paradigms
  - sort the entries so that the order makes sense
  - is there an imperative p1.sg ??? (done)
- adjective paradigms
  - some paradigms have more entries than others, e.g. ca/s__adj has 3, and bré/an__adj has 4
- are some proper nouns marked with common noun paradigms instead of proper noun paradigms?
  - find out with: cat apertium-ga-gd.ga.dix | grep '<e lm="[A-Z]'
- sort the entries in the <section id="main"> by a) part-of-speech, b) alphabetical order
- I think we're missing possessives, demonstratives, quantifiers and perhaps some definite/indefinite pronouns
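The capital-letter check in the list above only shows which entries start with an upper-case lemma; it does not say which paradigm they use. A minimal Python sketch of a fuller check follows. It assumes entries of the form `<e lm="..."><i>...</i><par n="..."/></e>` and assumes that proper noun paradigm names contain `__np`; both the naming convention and the sample entries (including the paradigm names in them) are made up for illustration and should be checked against the real `apertium-ga-gd.ga.dix`.

```python
import xml.etree.ElementTree as ET

# Made-up sample; the paradigm names here are illustrative only.
SAMPLE_DIX = """<dictionary>
  <section id="main" type="standard">
    <e lm="Éire"><i>Éire</i><par n="Éire__np"/></e>
    <e lm="Doire"><i>Doir</i><par n="bail/e__n_m"/></e>
    <e lm="cat"><i>cat</i><par n="cat__n_m"/></e>
  </section>
</dictionary>"""


def suspicious_proper_nouns(dix_xml, np_marker="__np"):
    """List (lemma, paradigm) pairs for capitalised lemmas whose
    paradigm name does not contain np_marker (an assumed convention)."""
    root = ET.fromstring(dix_xml)
    hits = []
    for entry in root.iter("e"):
        lemma = entry.get("lm", "")
        # Entries inside paradigm definitions have no lm and are skipped.
        if not lemma[:1].isupper():
            continue
        for par in entry.iter("par"):
            name = par.get("n", "")
            if np_marker not in name:
                hits.append((lemma, name))
    return hits


if __name__ == "__main__":
    for lemma, paradigm in suspicious_proper_nouns(SAMPLE_DIX):
        print(lemma, paradigm)
```

Unlike the `[A-Z]` pattern in the grep command, `str.isupper()` also catches accented capitals such as `É`, which matters for Irish lemmas.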
Old todo
- Perform an intersection on the monolingual dictionaries. (Making them consistent)
  - We only want stuff in the Irish analyser that we can translate into Scottish Gaelic -- so, in order for a word to be included, it should be in the Irish monolingual dictionary and the bilingual dictionary, and its translation should be in the Scottish Gaelic monolingual dictionary. Words for which we don't have translations can be commented out -- or moved to a separate file in dev/
- Add all missing closed categories to the monolingual dictionaries.
- Do some fixing of the bilingual dictionary
- Some restrictions probably need adding.
- Some conjunctions are marked "cnj" and not subdivided for "cnjcoo", "cnjsub" etc.
- Making constraint grammar rules more CG-like
- Write rules to do initial mutations for generation.
- Write some transfer rules.
- For example to do tenses, number agreement, etc.
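The intersection item in the list above can be sketched in Python as follows. The sketch assumes monolingual entries carry the lemma in an `lm` attribute and that bilingual entries have the shape `<e><p><l>...</l><r>...</r></p></e>`, with Irish on the left. The function names and the sample data are hypothetical, and multiword entries (which contain `<b/>`) are not handled.

```python
import xml.etree.ElementTree as ET

# Made-up sample data standing in for the three dictionaries.
GA_DIX = """<dictionary><section id="main" type="standard">
  <e lm="cat"><i>cat</i><par n="cat__n_m"/></e>
  <e lm="madra"><i>madra</i><par n="madra__n_m"/></e>
  <e lm="bád"><i>bád</i><par n="bád__n_m"/></e>
</section></dictionary>"""

BIDIX = """<dictionary><section id="main" type="standard">
  <e><p><l>cat<s n="n"/></l><r>cat<s n="n"/></r></p></e>
  <e><p><l>madra<s n="n"/></l><r>cù<s n="n"/></r></p></e>
</section></dictionary>"""

GD_DIX = """<dictionary><section id="main" type="standard">
  <e lm="cat"><i>cat</i><par n="cat__n_m"/></e>
</section></dictionary>"""


def mono_lemmas(dix_xml):
    """Set of lemmas (lm attributes) in a monolingual dictionary."""
    root = ET.fromstring(dix_xml)
    return {e.get("lm") for e in root.iter("e") if e.get("lm")}


def bidix_pairs(dix_xml):
    """Set of (left, right) lemma pairs in a bilingual dictionary."""
    pairs = set()
    for p in ET.fromstring(dix_xml).iter("p"):
        left, right = p.find("l"), p.find("r")
        if left is not None and right is not None:
            # .text is the lemma text before the first <s .../> tag.
            pairs.add(((left.text or "").strip(),
                       (right.text or "").strip()))
    return pairs


def split_translatable(ga_xml, bidix_xml, gd_xml):
    """Return (keep, drop): Irish lemmas with a full path to Scottish
    Gaelic, and Irish lemmas to comment out or move to dev/."""
    ga, gd = mono_lemmas(ga_xml), mono_lemmas(gd_xml)
    keep = {l for l, r in bidix_pairs(bidix_xml) if l in ga and r in gd}
    return keep, ga - keep


if __name__ == "__main__":
    keep, drop = split_translatable(GA_DIX, BIDIX, GD_DIX)
    print("keep:", sorted(keep))
    print("drop:", sorted(drop))
```

Returning the dropped lemmas alongside the kept ones makes it easy to do either of the two things the item suggests: comment the entries out in place, or move them to a separate file.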
Testing
See also
Notes