Apertium Turkic

Revision as of 06:00, 21 July 2014

The Apertium Turkic working group includes everyone who works on Turkic-language resources as part of the Apertium project. The resources we develop include not just machine translation systems but also their underlying components, which can be repurposed: morphological transducers, disambiguators, and dictionaries.

You can browse our projects, evaluate our publications, see a list of our contributors, or contact us about a mistake you noticed, a project you'd like to see, or your interest in helping out. Our work is showcased at turkic.apertium.org.

Translation pairs

We have done quite a bit of work on Machine Translation systems involving Turkic languages. This section provides a short overview of some of them, roughly in order of how well they work.

Released

  • Our Kazakh-Tatar system was developed largely by Ilnar, who did the majority of the work on it as his GSoC 2012 project. The project was overseen by Fran and by Jonathan, who did a lot of work on the transducers (especially Kazakh). The system was deemed production-ready and released during the summer of 2013, and work is ongoing to increase its accuracy.

Approaching production quality

The following pairs are all approaching production quality, but their development has stalled and each needs a varying amount of work to get there.

  • The Turkish-Kyrgyz pair was developed in the summer of 2011 by Mirlan Ipasov under the supervision of Jonathan, and was our first Turkic-Turkic pair using HFST. Mirlan and Jonathan's work on the Kyrgyz transducer paved the way for other Turkic pairs. The pair needs some work to be brought up to date to work with newer transducers.
  • The Kazakh-Kyrgyz pair was largely developed by Qantörö under the supervision of Jonathan, but is not yet production-ready.
  • The Uzbek-Turkish pair was largely developed by Akın under the supervision of Gianluca, but is not yet production-ready.

Under development

The following pairs are under active development, but are still some way from being production-ready:

  • The English-Kazakh pair is being worked on by Aida Sundetova under the supervision of Mikel Forcada.
  • The Qaraqalpaq-Kazakh pair was originally put together by Atabek, Fran, and Jonathan, and is being developed further by Beknazar.

Prototypes

The following pairs are prototypes that could blossom if given proper attention.

  • The Tatar-Bashqort pair was developed by Röstäm, Ilnar, Jonathan, and Fran. It has very promising results as a prototype system, but the Bashqort transducer still needs a lot of work.
  • Chuvash-Turkish
  • The Khalkha-Kazakh pair is being developed by Jonathan for fun. He's currently looking for someone who knows Khalkha well to contribute.
  • Chuvash-Tatar
  • Tatar-Turkish
  • The Azeri-Turkish pair was originally developed by Gianluca, but azmorph has since become obsolete.
  • The Turkmen-Turkish pair needs some attention.
  • The Kazakh-Uyghur pair was thrown together by Fran and Jonathan with some assistance from Märdan.

Planned for the future

These are pairs that Apertium Turkic developers would like to see exist at some point.

  • Uzbek-Kyrgyz
  • Qaraqalpaq-Uzbek
  • Kazakh-Kumyk
  • Kazakh-Nogay

People

Active contributors

Francis Morton Tyers (wiki · email)
IRC nick: spectie, spectei, spectre

Jonathan North Washington (wiki · email)
IRC nick: firespeaker, jonorthwash, kd5cfx
Turkic projects involved in (role):

Pairs:

Transducers:

  • Kazakh (developed much of morphotactics and morphophonology)
  • Kyrgyz (developed almost entirety of morphotactics and morphophonology)
  • Tatar (helped develop morphotactics and morphophonology)
  • Bashqort (helped develop morphotactics and morphophonology)
  • Chuvash (helped develop morphotactics and morphophonology)
  • Turkmen (helped develop morphotactics and morphophonology)
  • Kumyk (helped develop morphotactics and morphophonology)
  • Nogay (helped develop morphotactics and morphophonology)
  • Karakalpak (developed most of the morphophonology)

Other Turkic projects interested in:

  • Uzbek-Kyrgyz
  • Qaraqalpaq-Uzbek

Ilnar Salimzyanov (wiki · email)
IRC nick: selimcan
  • Tatar-Bashqort

Gianluca Grossi (wiki)
IRC nick: zfe

Mikel Forcada
IRC nick: mlforcada

Aida Sundetova
IRC nick: Aida

Contributors emeritus

The following contributors are not currently active, but are always welcome back!

  • Mirlan Ipasov (IRC nick: gantu): Turkish-Kyrgyz
  • Hèctor Alòs i Font: Chuvash-Turkish
  • Röstäm Batalov: Tatar-Bashqort
  • Akın Dalkı (IRC nick: akindalki): Uzbek-Turkish
  • Qantörö Erqulov (IRC nick: kantoro): Kazakh-Kyrgyz


Other contributors

  • Sushain Cherivirala (IRC nick: sushain, sushain97): apertium-apy, apertium-html-tools

We also appreciate the assistance of everyone who's helped with localising apertium-html-tools.

About our website

The turkic.apertium.org website is powered by apertium-apy and apertium-html-tools, both developed largely by Sushain as part of GCI 2013. It runs on a virtual host donated to us by Bytemark.

Publications

  • Washington, Jonathan N., Ilnar Salimzyanov, and Francis M. Tyers (2014). "Finite-state morphological transducers for three Kypchak languages". Proceedings of the 9th Conference on Language Resources and Evaluation, LREC 2014. Poster, Paper
  • Salimzyanov, Ilnar, Jonathan Washington, and Francis Tyers (2013). "A free/open-source Kazakh-Tatar machine translation system". MT Summit XIV. Paper
  • Tyers, Francis, Ilnar Salimzyanov, Jonathan Washington, and Rustam Batalov (2012). "A proto-type Bashkir-Tatar machine translation system". LREC 2012. Slides

Contact

Feel free to contact us if you find a mistake, there's a project you would like to see us work on, or you would like to help out.

To contact the Apertium Turkic team, you can find us on Apertium's IRC channel, send one of us a message through the wiki, or send an email to contact@turkic.apertium.org. Don't worry, we're friendly :)

We maintain a low-traffic mailing list (apertium-turkic@lists.sourceforge.net) where occasional discussion and announcements occur. See our archives or subscribe to join in on the fun!