Kurmanji and English/Final report

From Apertium
{{TOCD}}

This is the report for my 2016 Google Summer of Code project, Kurmanji-English Machine Translation.


== What was done ==
My project was to significantly improve the pre-existing pair, bringing it close to release quality. I worked on adding vocabulary, disambiguation rules in Constraint Grammar (CG), transfer rules and lexical selection rules.

Vocabulary was added from a number of sources; a few thousand entries came from the Kurmanji analyzer and POS tagger developed by Walther et al.


; Adherence to work plan
I largely followed the work plan and met its goals. Adding vocabulary was sometimes harder than planned: for lack of digital resources, I had to add translations one by one, using Kurdish-Turkish dictionaries and translating the meanings into English.


== Statistics ==
{| class="wikitable"
!
! Monodix
! Bidix
! CG rules
! Transfer rules
! Paradigms
! Bilingual coverage
|-
! Before
| 1433
| 11421
| 3
| 9
| 83
| 57%
|-
! After
| 17715
| 15597
| 97
| 23
| 157
| 85%
|}
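The report does not describe how bilingual coverage was measured. As a rough sketch only (not necessarily the method behind the figures above), coverage can be approximated on translated output as the share of whitespace-separated tokens that carry neither Apertium's <code>*</code> mark (unknown word) nor its <code>@</code> mark (missing from the bilingual dictionary). The function name <code>naive_coverage</code> is hypothetical:

```shell
# Hypothetical helper: estimate naive coverage from Apertium output on stdin.
# A token counts as "not covered" if it starts with '*' (unknown to the
# analyser) or '@' (analysed, but missing from the bilingual dictionary).
naive_coverage() {
  tr -s '[:space:]' '\n' | awk '
    NF       { total++ }       # count non-empty tokens, one per line
    /^[*@]/  { uncovered++ }   # count tokens carrying a gap mark
    END {
      if (total > 0)
        printf "%.1f%%\n", 100 * (total - uncovered) / total
    }'
}
```

For example, <code>apertium -d . kmr-eng < corpus.txt | naive_coverage</code>, where <code>corpus.txt</code> is a placeholder for any Kurmanji text. Punctuation attached to a word is counted as part of that token, so the result is only an approximation.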


== Future work ==
The most immediate priorities for future work are adding more transfer rules, to improve translation quality, and raising the coverage of the bilingual dictionary further, to around 90%.
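One common way to decide which words to add next is to translate a representative corpus and rank the unknown words by frequency. A minimal sketch, assuming the pair has been built as described under ''Testing the Product'' and that unknown words appear with a leading <code>*</code> in the output; the function name <code>top_unknown</code> is hypothetical:

```shell
# Hypothetical helper: print the N most frequent unknown words ('*'-marked
# tokens) in Apertium output read from stdin, most frequent first.
top_unknown() {
  tr -s '[:space:]' '\n' |
    grep '^\*' |                  # keep only unknown-word tokens
    sed 's/^\*//' |               # strip the leading '*' mark
    sort | uniq -c | sort -rn |   # count each word, rank by frequency
    head -n "${1:-20}"            # show the top N (default 20)
}
```

For instance, <code>apertium -d . kmr-eng < corpus.txt | top_unknown 50</code>, with <code>corpus.txt</code> standing in for any Kurmanji text.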



== List of commits ==
My commits are listed at the link below, under the folders named ''kmr'', the ISO 639-3 code for Kurmanji (Northern Kurdish).


https://apertium.projectjj.com/gsoc2016/memduhg.html


== Testing the Product ==

The translation pair can be installed using the commands below.

<pre>
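# Check out the pair and the two monolingual packages it depends on
svn co https://svn.code.sf.net/p/apertium/svn/incubator/apertium-kmr-eng/
svn co https://svn.code.sf.net/p/apertium/svn/incubator/apertium-eng_feil/
svn co https://svn.code.sf.net/p/apertium/svn/incubator/apertium-kmr/

# Build the Kurmanji monolingual data
cd apertium-kmr
./autogen.sh
make
cd ..

# Build the English monolingual data
cd apertium-eng_feil
./autogen.sh
make
cd ..

# Configure the pair against both monolingual directories, then compile it
cd apertium-kmr-eng
./autogen.sh --with-lang1=../apertium-kmr --with-lang2=../apertium-eng_feil
make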
</pre>
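Once the pair has been compiled, it can help to confirm that the <code>kmr-eng</code> mode exists before trying a translation. A small sketch, assuming (as in the Apertium builds we are aware of) that compiled modes are written to a <code>modes/</code> subdirectory as <code>NAME.mode</code>, which is where <code>apertium -d .</code> looks for them; <code>list_modes</code> is a hypothetical name, not part of Apertium:

```shell
# Hypothetical helper: list the mode names available in a pair directory,
# assuming compiled modes live in DIR/modes/NAME.mode.
list_modes() {
  for f in "${1:-.}"/modes/*.mode; do
    [ -e "$f" ] || continue   # no match: the glob stays literal, so skip it
    basename "$f" .mode
  done
}
```

Run inside apertium-kmr-eng, <code>list_modes . | grep -x kmr-eng</code> prints <code>kmr-eng</code> only if that mode was built; depending on the pair's configuration, extra debugging modes may be listed too.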


From the apertium-kmr-eng folder, piping text into <code>apertium -d . kmr-eng</code> (for example with <code>echo</code> or <code>cat</code>) outputs an English translation:
<pre>
echo "Ez gelek kefxweş im ku min ji bo GSoC kar kir" | apertium -d . kmr-eng
I very happy #be that I #for *GSoC worked#
</pre>
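Marked tokens in Apertium's default output show where the pair still has gaps: <code>*</code> marks a word unknown to the Kurmanji analyser, <code>@</code> a word missing from the bilingual dictionary, and <code>#</code> a word the English generator could not produce (as with <code>#be</code> and <code>#for</code> above). A minimal sketch for listing them from a translation, with a hypothetical function name:

```shell
# Hypothetical helper: print each marked token in Apertium output read from
# stdin as "KIND<TAB>WORD", where KIND names the stage that failed:
#   *word -> unknown        (not in the Kurmanji analyser)
#   @word -> no-bidix       (analysed, but missing from the bilingual dictionary)
#   #word -> no-generation  (transferred, but not generated in English)
flagged_tokens() {
  tr -s '[:space:]' '\n' | awk '{
    mark = substr($0, 1, 1)
    word = substr($0, 2)
    if (mark == "*")      print "unknown\t" word
    else if (mark == "@") print "no-bidix\t" word
    else if (mark == "#") print "no-generation\t" word
  }'
}
```

This sketch only looks at a mark at the start of a token, so a trailing mark such as the one on <code>worked#</code> in the sample output is not classified.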
[[Category:Kurdish and English|*]]

Latest revision as of 10:29, 23 August 2016
