Developing a language pair using Apertium

This guide teaches you how to make a change to a language pair using the development version of Apertium.

Introduction

What you will need:

  • a text editor (editors with good XML syntax highlighting make the work easier; the XML Editors page lists some preferred options)
  • a good command of the languages in the pair you will contribute to
  • an IRC client, so that you can ask questions when you get stuck

If you want your changes to be saved and added to Apertium, you need to get a GitHub account and be added to the project as a contributor.


Preparation

The first thing you need to do is install Apertium's core components. The Installation page shows how to do this. In general, installation is done through packaging on non-Apple Unix-based systems, and through a virtual environment (Virtualbox) on Apple and Windows systems.


The Install language data by compiling page shows how to install an example language pair, e.g. apertium-eo-en or apertium-en-ca, both of which are known and stable. Try this first to make sure everything works before you move on to the language pair you plan to work on.

Note that some existing language pairs have external dependencies, like HFST or Constraint Grammar. The Installation page links to their installation procedures (if using packaging or a virtual environment, they are either a one-click install, or pre-installed).


Get your language pair(s)

Using the same terminal, you can easily download and add the language pairs you want using a command like:

svn checkout svn://svn.code.sf.net/p/apertium/svn/SVN_MODULE/PAIR_NAME

Replace SVN_MODULE with the name of the svn subdirectory that contains the chosen language pair.

Replace PAIR_NAME with the name of the chosen language pair.

For example, if you wanted to retrieve the language pair Spanish/English (which is in trunk) and French/Portuguese (which was in staging as of June 2012) you could type:

svn checkout svn://svn.code.sf.net/p/apertium/svn/trunk/apertium-en-es
svn checkout svn://svn.code.sf.net/p/apertium/svn/staging/apertium-fr-pt
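
The substitution of SVN_MODULE and PAIR_NAME can be sketched in a few lines of Python. This is only an illustration of how the command is put together: the function name checkout_command is hypothetical, and the repository root is copied from the commands above.

```python
# Repository root as it appears in the checkout commands in this guide.
SVN_ROOT = "svn://svn.code.sf.net/p/apertium/svn"


def checkout_command(svn_module: str, pair_name: str) -> str:
    """Build the svn checkout command for a pair in a given SVN module.

    Hypothetical helper: it only formats a string, it does not run svn.
    """
    return f"svn checkout {SVN_ROOT}/{svn_module}/{pair_name}"


# The two examples from the text.
print(checkout_command("trunk", "apertium-en-es"))
print(checkout_command("staging", "apertium-fr-pt"))
```

The printed lines can be pasted into the terminal as they are.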

You can go to [1] to see a list of SVN modules; clicking on each one shows a list of language pairs. Language pairs are sorted into SVN modules based on how "complete" they are: trunk is release quality, staging is very close to release quality, nursery is for pairs that are 1-3 months of concentrated work away from release quality, and incubator holds fragments and anything not complete enough to live in the other modules.


Compiling a language pair

Compiling a language pair is similar to compiling lttoolbox and apertium, except that you don't need to run "sudo ldconfig", and you should not run "sudo make install", since it is easier and faster to run the pair from the source folder:

cd PAIR_NAME
PKG_CONFIG_PATH=/usr/local/lib/pkgconfig ./autogen.sh
make -j3

Do this for each language pair, replacing PAIR_NAME with the appropriate name.

Changing Things Around

When you want to make a change in Apertium, you more than likely want to add a word to an existing language pair. For a full explanation go to Contributing to an existing pair. You can also check out the Contact page for Apertium mailing lists and live help through IRC.

IMPORTANT: Adding a word has no effect until you recompile the module after making the change. In the terminal, go to the pair's directory as before, type make, and press the "Enter" key; your computer will then rebuild the necessary files.


There are 3 major steps in adding a new word to a language pair:

1. Add an entry to the dictionary for the first language that will be used.

2. Add an entry to the dictionary for the second language that will be used.

3. Add an entry to the bilingual dictionary for the pair.

You will need to find the module you want to work with on your computer and open the three dictionaries; for example: apertium-es-ca.es.dix, apertium-es-ca.es-ca.dix, and apertium-es-ca.ca.dix. Note: Each dictionary has the suffix ".dix". Open these files in a text editor or a specialized XML editor.
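
The naming pattern in the example above can be sketched as follows. The helper dictionary_files is hypothetical, and the pattern is inferred only from the apertium-es-ca example; pairs that are organised differently may not follow it.

```python
def dictionary_files(lang1: str, lang2: str) -> dict:
    """Return the three dictionary file names for a pair.

    Assumption: the pair follows the apertium-es-ca naming shown above.
    """
    pair = f"{lang1}-{lang2}"
    base = f"apertium-{pair}"
    return {
        "first": f"{base}.{lang1}.dix",     # monolingual, first language
        "bilingual": f"{base}.{pair}.dix",  # bilingual dictionary
        "second": f"{base}.{lang2}.dix",    # monolingual, second language
    }


for role, name in dictionary_files("es", "ca").items():
    print(role, name)
```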

Step 1: Adding to the First Dictionary

When adding an entry, you have to enter three things: the lemma (the word as you would find it in a dictionary), the part between <i> and </i>, which contains the prefix of the word that is common to all inflected forms, and the <par> element, which refers to the inflection paradigm of the word. All entries have a basic structure like:

      <e lm="(lemma)">
        <i>(prefix)</i>
        <par n="(paradigm)"/>
      </e>

A good example of this would be:

      <e lm="cósmico">
        <i>cósmic</i>
        <par n="absolut/o__adj"/>
      </e>
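
If you add many words, you may prefer to generate entries rather than type them. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the function name monolingual_entry is hypothetical, and the output is a single unindented line that you would still paste into the dictionary by hand.

```python
import xml.etree.ElementTree as ET


def monolingual_entry(lemma: str, prefix: str, paradigm: str) -> str:
    """Build an <e> entry with the structure shown above (hypothetical helper)."""
    entry = ET.Element("e", {"lm": lemma})
    common = ET.SubElement(entry, "i")
    common.text = prefix  # the part shared by all inflected forms
    ET.SubElement(entry, "par", {"n": paradigm})  # reference to the paradigm
    return ET.tostring(entry, encoding="unicode")


print(monolingual_entry("cósmico", "cósmic", "absolut/o__adj"))
```

The serializer may lay out whitespace differently from the hand-written example, but the elements and attributes are the same.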

Start by opening your first language's dictionary file. For example: apertium-en-es.es.dix (an XML file).

Then, create a new entry with the basic structure next to a similar entry in the dictionary. The order of entries doesn't matter.

Now replace (lemma) with your word. Note: Do not keep the parentheses in the entry; the value goes directly between the quotation marks.

Next, replace (prefix) in the space between <i> and </i> with the prefix of your word.

Finally, enter the paradigm name between the quotation marks in <par>. The paradigm is named after another word that is already in the dictionary and inflects in the same way. For example: <par n="absolut/o__adj"/> for cósmico. This entry means that the adjective "cósmico" inflects like the adjective "absoluto": the forms cósmico, cósmica, cósmicos, and cósmicas correspond to the forms absoluto, absoluta, absolutos, and absolutas and have the morphological analyses adj m sg, adj f sg, adj m pl, and adj f pl respectively.
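
To make the role of the paradigm concrete, here is a small sketch of how a prefix combines with paradigm endings. The table ABSOLUT_O_ADJ is a simplified, hand-written stand-in based only on the four forms listed above; it is not read from a real dictionary, where the paradigm may contain more.

```python
# Hypothetical, simplified stand-in for the paradigm "absolut/o__adj":
# each ending is paired with the analysis it produces.
ABSOLUT_O_ADJ = [
    ("o", ["adj", "m", "sg"]),
    ("a", ["adj", "f", "sg"]),
    ("os", ["adj", "m", "pl"]),
    ("as", ["adj", "f", "pl"]),
]


def expand(prefix: str, paradigm: list) -> list:
    """Attach every ending of the paradigm to the prefix."""
    return [(prefix + ending, tags) for ending, tags in paradigm]


for form, tags in expand("cósmic", ABSOLUT_O_ADJ):
    print(form, " ".join(tags))
```

Running the same function with the prefix "absolut" gives the forms of absoluto, which is why both words can share one paradigm.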

Now save your altered dictionary. Do not change the file name, directory, or file type.

To finish, open the terminal, navigate to the directory that holds your module, and type make. Press the "Enter" key and let your computer recompile the module with the changes you just made.

Step 2: Adding to the Second Dictionary

Using the same structure, you can create the entry in your second language's dictionary that is the equivalent to your entry in the first dictionary.

The second language's dictionary file will have a name like apertium-en-es.en.dix.

Save your changes and recompile the module.

Final Step: The Bilingual Dictionary

Adding entries to the bilingual dictionary is considerably easier than adding to the other two dictionaries. An entry in this dictionary has this basic structure:

      <e>
        <p>
          <l>(lemmafromfirst)<s n="(partofspeech)"/></l>
          <r>(lemmafromsecond)<s n="(partofspeech)"/></r>
        </p>
      </e>

Simply add an entry and replace (lemmafromfirst) with the lemma you added to the first dictionary, (lemmafromsecond) with the lemma from the second, and (partofspeech) with the part of speech for each word.
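
The same structure can be generated with a short Python sketch. The function name bilingual_entry is hypothetical, and the lemma pair cósmico / cosmic and the tag adj are used purely for illustration; check an existing entry in your own bilingual dictionary for the tags it actually uses.

```python
import xml.etree.ElementTree as ET


def bilingual_entry(left_lemma: str, right_lemma: str,
                    left_pos: str, right_pos: str) -> str:
    """Build a bilingual <e><p><l>..</l><r>..</r></p></e> entry (hypothetical helper)."""
    entry = ET.Element("e")
    pair = ET.SubElement(entry, "p")
    sides = (("l", left_lemma, left_pos), ("r", right_lemma, right_pos))
    for tag, lemma, pos in sides:
        side = ET.SubElement(pair, tag)
        side.text = lemma                      # lemma from the monolingual dictionary
        ET.SubElement(side, "s", {"n": pos})   # part-of-speech symbol
    return ET.tostring(entry, encoding="unicode")


# Illustrative values only.
print(bilingual_entry("cósmico", "cosmic", "adj", "adj"))
```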

Save this dictionary and recompile the module one last time.

You can also add rules to a language pair, but that is not covered in this guide. See Contributing to an existing pair for a fuller explanation.

Errors

You may well run into an error in your changes.

To see how the translator analyses a word, and so track down an error, type the following in the terminal (the example is taken from Contributing to an existing pair; see that page for more help):

$ echo "gener" | apertium-destxt | lt-proc ca-es.automorf.bin

You can replace ca-es with the translation direction you want to test.

The output in Apertium should be:

^gener/gener<n><m><sg>$^./.<sent>$[][]

The string structure is: ^word/lemma<morphological analysis>$. The <sent> tag is the analysis of the full stop, as every sentence end is represented as a full stop by the system, whether or not explicitly indicated in the sentence.

The analysis of an unknown word is (ignoring the full stop information):

^genoma/*genoma$

and the analysis of an ambiguous word:

^casa/casa<n><f><sg>/casar<vblex><pri><p3><sg>/casar<vblex><imp><p2><sg>$

Each lexical form (lemma plus morphological analysis) is presented as a possible analysis of the word casa.
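
When you check many words, it can help to split the analyser output into its parts. The following is a minimal sketch that parses only the simple cases shown above; the function name parse_analyses is hypothetical, and the sketch does not handle escaped characters or any other feature of the stream format.

```python
import re

# One unit looks like ^surface/reading1/reading2$ in the examples above.
UNIT = re.compile(r"\^([^/$]*)/([^$]*)\$")
TAG = re.compile(r"<([^>]+)>")


def parse_analyses(stream: str) -> list:
    """Return (surface form, list of readings) for each ^...$ unit."""
    result = []
    for surface, rest in UNIT.findall(stream):
        readings = []
        for reading in rest.split("/"):
            if reading.startswith("*"):
                # A leading * marks a word the analyser does not know.
                readings.append({"lemma": reading[1:], "tags": [], "known": False})
            else:
                lemma = reading.split("<", 1)[0]
                readings.append({"lemma": lemma, "tags": TAG.findall(reading), "known": True})
        result.append((surface, readings))
    return result


for surface, readings in parse_analyses("^casa/casa<n><f><sg>/casar<vblex><pri><p3><sg>$"):
    for reading in readings:
        print(surface, reading["lemma"], reading["tags"])
```

A word that comes back with "known" set to False is the first thing to look for after adding an entry: it usually means the module was not recompiled or the entry did not match.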

If you are still stuck, remember that you can always ask questions on IRC.

Show the world

Now that you have contributed to a language pair, you may want to upload your changes to GitHub. Committing your change is even easier than making it.

First, you need to get a free GitHub account. Then contact an Apertium administrator and ask for commit access to the project you are working on.

Once you have access, all you need to do is open a terminal, go to the folder of the pair you changed, and type:

git add FILE
git commit -m "first commit"
git push origin master

Replace FILE with the file or files you changed. For example, if apertium-tur-tat.tur-tat.dix and apertium-tur-tat.tur-tat.t1x were changed, you would type:

git add apertium-tur-tat.tur-tat.dix apertium-tur-tat.tur-tat.t1x

When committing, you normally write a short note describing the edit; this is the commit message. As you contribute to the development effort, take care that your commit messages are clear and informative.

You are now an Apertium pair developer!

See also