Difference between revisions of "Icelandic and English"

From Apertium
==Pending tasks==


* Write a [[TSX format|TSX file]] to group together fine tags into coarse tags.
* <s>Tag a corpus with IceTagger</s> and train the <code>apertium-tagger</code>
* Post-edit automatically-generated bilingual dictionaries
* Try to convert some IceTagger constraint rules to work in [[constraint grammar]]
* Use IceParser to parse a corpus and extract the most frequent chunk patterns (sequences of coarse POS tags inside a chunk) and the most frequent phrase patterns (sequences of chunks).
* Merge the analysed corpus (IceMorphy full-form list) with the Apertium dictionary. This will require matching partial information to paradigms, perhaps using [[extract]].
* In the Icelandic analyser (<code>apertium-is-en.is.dix</code>):
** Fix verb paradigms; most are missing:
*** Participle forms (declensions) and infinitives
*** The active and middle voice distinction (see file <code>dev/is-tags.mapping.txt</code> for tag mapping details)
*** The supine
** Check declensions for adjectives to see if any parts can be factored out (e.g. the weak/strong inflection)
*** <s>Move 'sta' and 'vei' tags to the end of the declension.</s>
* Write transfer rules &mdash; start with most frequent patterns (as found from the corpus)


==Notes==

* ind(is) → def(en): almenningur, alþjóð, alþýða

==Resources==

===Corpora===

* MediaWiki l10n, KDE4, OpenSubtitles, etc., from OPUS (~60k sentences)
* [http://eng.menntamalaraduneyti.is/Acts/ Parliamentary Acts] (?? sentences)

===Bilingual dictionaries===

Revision as of 11:17, 1 March 2010

Pending tasks


Notes

  • ind(is) → def(en): almenningur, alþjóð, alþýða

Resources

Bilingual dictionaries

Example phrase

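  • Hver maður er borinn frjáls og jafn öðrum að virðingu og réttindum. (Roughly: "Every person is born free and equal to others in dignity and rights"; the opening of Article 1 of the Universal Declaration of Human Rights.)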

IceFormat

Hver foken maður nken er sfg3en 
borinn sþgken frjáls lkensf 
og c jafn aa öðrum fokfþ að c 
virðingu nveþ og c réttindum nhfþ . . 
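
The tagger output above alternates tokens and tags. As a minimal sketch, assuming every token and every tag is a single whitespace-free field (multiword tokens would break this), the pairs could be recovered as follows. The function name `pair_tokens` is made up for illustration and is not part of IceTagger or Apertium.

```python
def pair_tokens(tagged: str) -> list[tuple[str, str]]:
    """Split 'token tag token tag ...' text into (token, tag) pairs.

    Assumption: tokens and tags never contain whitespace.
    """
    fields = tagged.split()
    if len(fields) % 2 != 0:
        raise ValueError("expected an even number of fields (token/tag pairs)")
    # Even positions hold tokens, odd positions hold their tags.
    return list(zip(fields[0::2], fields[1::2]))


sample = """Hver foken maður nken er sfg3en
borinn sþgken frjáls lkensf
og c jafn aa öðrum fokfþ að c
virðingu nveþ og c réttindum nhfþ . ."""

pairs = pair_tokens(sample)
for token, tag in pairs:
    print(f"{token}\t{tag}")
```

Keeping the pairs as plain tuples makes it easy to feed them into a later tag-mapping step, for example one driven by <code>dev/is-tags.mapping.txt</code>.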
{*SUBJ> [NP Hver foken maður nken NP] *SUBJ>}
[VPb er sfg3en VPb]
{*COMP< [VPp borinn sþgken VPp] *COMP<}
{*COMP< [APs [AP frjáls lkensf AP] [CP og c CP] [AP jafn lkensf AP] APs] *COMP<}
[NP öðrum fokfþ NP]
[SCP að c SCP]
[NPs [NP virðingu nveþ NP] [CP og c CP] [NP réttindum nhfþ NP] NPs]
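
One of the pending tasks is to find the most frequent phrase patterns in parsed text. The sketch below shows one possible way to read the outermost chunk labels from bracketed output like the lines above and count the resulting sequences. It assumes that chunks open with `[LABEL` and close with `LABEL]` as separate whitespace-delimited fields, and that function markers such as `{*SUBJ>` can be ignored; the helper name `top_level_chunks` is hypothetical.

```python
from collections import Counter


def top_level_chunks(parsed: str) -> list[str]:
    """Return the labels of the outermost chunks, in order of appearance."""
    labels = []
    depth = 0
    for field in parsed.split():
        if field.startswith("[") and len(field) > 1:
            # An opening bracket such as "[NP"; record it only at top level.
            if depth == 0:
                labels.append(field[1:])
            depth += 1
        elif field.endswith("]") and len(field) > 1:
            # A closing bracket such as "NP]".
            depth -= 1
    return labels


sample = """{*SUBJ> [NP Hver foken maður nken NP] *SUBJ>}
[VPb er sfg3en VPb]
{*COMP< [VPp borinn sþgken VPp] *COMP<}
{*COMP< [APs [AP frjáls lkensf AP] [CP og c CP] [AP jafn lkensf AP] APs] *COMP<}
[NP öðrum fokfþ NP]
[SCP að c SCP]
[NPs [NP virðingu nveþ NP] [CP og c CP] [NP réttindum nhfþ NP] NPs]"""

pattern = tuple(top_level_chunks(sample))

# With a real corpus, one pattern per sentence would be added here.
counts = Counter([pattern])
print(counts.most_common(1))
```

Counting tuples of labels keeps each sentence pattern hashable, so `Counter.most_common` can rank patterns directly once more sentences are added.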

Apertium

^Hver<prn><ind><m><sg><nom>$ ^maður<n><m><sg><nom><ind>$ ^vera<vbser><pri><p3><sg>$ 
^bera<vblex><pp><m><sg><nom>$ ^frjáls<adj><sta><pst><m><sg><nom>$ 
^og<cnjcoo>$ ^jafn<adj><sta><pst><m><sg><nom>$ 
^annar<prn><ind><m><pl><dat>$ ^að<pr>$ 
^virðing<n><f><sg><dat><def>$ ^og<cnjcoo>$ ^réttindi<n><nt><pl><dat><ind>$ ^.<sent>$
^prn_nom<SN><@SUBJ→>{^Hver<prn><ind><m><sg><nom>$ ^maður<n><m><sg><nom><ind>$}$ 
^verb<SV>{^vera<vbser><pri><p3><sg>$ ^bera<vblex><pp><m><sg><nom>$}$  
^adj_cc_adj<SA>{^frjáls<adj><sta><pst><m><sg><nom>$ ^og<cnjcoo>$ ^jafn<adj><sta><pst><m><sg><nom>$}$
^nom<SN>{^annar<prn><ind><m><pl><dat>$}$
^að<Prep>{^að<pr>$}$
^nom_cc_nom{^virðing<n><f><sg><dat><def>$ ^og<cnjcoo>$ ^réttindi<n><nt><pl><dat><ind>$}$
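
For experimenting with output like the above, a small reader for simple lexical units of the form `^text<tag>...$` can be handy. This is only a sketch: it handles unambiguous, unchunked units, ignores escaped characters and multiple analyses, and the names `LU`, `TAG` and `parse_units` are illustrative, not part of any Apertium tool.

```python
import re

# A simple lexical unit: "^", some text, one or more <tag> groups, then "$".
LU = re.compile(r"\^([^<$^]*)((?:<[^<>]*>)+)\$")
TAG = re.compile(r"<([^<>]*)>")


def parse_units(stream: str) -> list[tuple[str, list[str]]]:
    """Return (text, [tags]) for each simple lexical unit found in the stream."""
    return [(m.group(1), TAG.findall(m.group(2))) for m in LU.finditer(stream)]


sample = (
    "^Hver<prn><ind><m><sg><nom>$ ^maður<n><m><sg><nom><ind>$ "
    "^vera<vbser><pri><p3><sg>$ ^.<sent>$"
)

units = parse_units(sample)
for text, tags in units:
    print(text, tags)
```

When run on a chunked line, the pattern skips the chunk header (which is followed by `{` rather than `$`) and still picks up the lexical units inside the braces.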

See also

External links