Difference between revisions of "Kurmanji and English/Final report"

Latest revision as of 10:29, 23 August 2016

This is the report for my 2016 Google Summer of Code project, Kurmanji-English Machine Translation.

What was done

My project was to significantly improve the pre-existing Kurmanji-English pair, bringing it to around release quality. I worked on adding vocabulary, Constraint Grammar (CG) disambiguation rules, transfer rules and lexical selection.

The vocabulary was added from a number of sources; a few thousand entries came from the work of Walther et al. on their Kurmanji analyzer and POS tagger.

Adherence to work plan

I have largely followed and met the goals of the work plan. However, in some cases I had difficulty meeting the vocabulary goals: because of the lack of digital resources, I had to add translations one by one, using Kurdish-Turkish dictionaries and translating the meanings into English.

Statistics

         Monodix   Bidix   CG Rules   Transfer   Paradigms   Bilingual Coverage
Before   1433      11421   3          9          83          57%
After    17715     15597   97         23         157         85%

Future work

The most immediate priorities for future work are adding more transfer rules, to improve translation quality, and raising the coverage of the bilingual dictionary from 85% to around 90%.


List of commits

My commits are listed at the link below, under the folders whose names include kmr, the ISO 639-3 code for Kurmanji Kurdish.

https://apertium.projectjj.com/gsoc2016/memduhg.html

Testing the Product

The translation pair can be installed using the commands below.

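# Check out the language pair and both monolingual packages
svn co https://svn.code.sf.net/p/apertium/svn/incubator/apertium-kmr-eng/
svn co https://svn.code.sf.net/p/apertium/svn/incubator/apertium-eng_feil/
svn co https://svn.code.sf.net/p/apertium/svn/incubator/apertium-kmr/

# Build the Kurmanji monolingual package
cd apertium-kmr
./autogen.sh
make
cd ..

# Build the English monolingual package
cd apertium-eng_feil
./autogen.sh
make
cd ..

# Configure the pair against the two monolingual packages, then compile it
cd apertium-kmr-eng
./autogen.sh --with-lang1=../apertium-kmr --with-lang2=../apertium-eng_feil
make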

In the apertium-kmr-eng folder, piping text (with echo or cat) into

apertium -d . kmr-eng

outputs an English translation.

echo "Ez gelek kefxweş im ku min ji bo GSoC kar kir" | apertium -d . kmr-eng
I very happy #be that I #for *GSoC  worked#
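
The symbols in the output above follow Apertium's usual marking conventions: a leading * flags a word the analyser does not know, a leading # flags a word that could not be generated, and a leading @ flags a lemma missing from the bilingual dictionary. As a rough way to track how many such words remain, the sketch below counts tokens that start with a given marker. The helper name count_marked is hypothetical and not part of Apertium, and the sample string is simply the output shown above; in practice the function would read from the apertium pipe instead.

```shell
#!/bin/sh
# count_marked MARKER: hypothetical helper (not an Apertium tool).
# Reads translated text on stdin, splits it into one token per line,
# and prints how many tokens begin with MARKER ('*', '#' or '@').
count_marked() {
    # tr turns spaces and tabs into newlines and squeezes repeats;
    # grep -c counts matching lines; '|| true' keeps a zero count
    # from being reported as a failure.
    tr -s ' \t' '\n\n' | grep -c "^[$1]" || true
}

# Sample output copied from the translation example above.
sample='I very happy #be that I #for *GSoC  worked#'

printf '%s\n' "$sample" | count_marked '*'   # unknown words
printf '%s\n' "$sample" | count_marked '#'   # generation failures
printf '%s\n' "$sample" | count_marked '@'   # missing bidix entries
```

With the pair built, the same function could be fed live output, for example by appending `| count_marked '*'` to the echo command above. Only markers at the start of a token are counted, so the trailing # in "worked#" is ignored.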