Difference between revisions of "Apertium Turkic/TODO"

 
(20 intermediate revisions by 3 users not shown)
Line 1: Line 1:
{{TOCD}}

This is a general to-do list for the [[Apertium Turkic]] working group.


== Website ==
== Website ==
Get [http://turkic.apertium.com/ http://turkic.apertium.com/] up and running.
This section outlines what's left to get [http://turkic.apertium.com/ http://turkic.apertium.com/] up and running.


=== software infrastructure ===
=== software infrastructure ===
* <s>Get apertium-apy working stably</s>
* <s>Get apertium-apy working stably</s>
* [http://www.google-melange.com/gci/task/view/google/gci2013/5827136816414720 merge simple-html and html-tools so that simple-html can be automatically extracted from html-tools]
* [http://www.google-melange.com/gci/task/view/google/gci2013/5827136816414720 merge simple-html and html-tools so that simple-html can be automatically extracted from html-tools]
* [http://www.google-melange.com/gci/task/view/google/gci2013/6612775656751104 apache forwarding for html-tools]
* <s>[http://www.google-melange.com/gci/task/view/google/gci2013/6612775656751104 apache forwarding for html-tools]</s> (unnecessary!)
* [http://www.google-melange.com/gci/task/view/google/gci2013/5833268486209536 init scripts] and [http://www.google-melange.com/gci/task/view/google/gci2013/5346872029872128 cron testers] for html-tools, gateway, and apertium-apy
* [http://www.google-melange.com/gci/task/view/google/gci2013/5833268486209536 init scripts] and [http://www.google-melange.com/gci/task/view/google/gci2013/5346872029872128 cron testers] for apertium-html-tools, gateway, and apertium-apy
** find some way to have it retry restarting if it fails because the port is still reserved by the OS
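
One possible shape for the retry item above, as a minimal standalone sketch rather than anything taken from apertium-apy or the existing init scripts: keep trying to bind the port, and back off while the OS still reports it as in use. The function name `bind_with_retry` and its parameters are hypothetical.

```python
import errno
import socket
import time


def bind_with_retry(port, attempts=5, delay=1.0, host="127.0.0.1"):
    """Bind a listening TCP socket, retrying while the port is still in use.

    Hypothetical helper for illustration; not part of apertium-apy.
    """
    for attempt in range(1, attempts + 1):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        # SO_REUSEADDR allows rebinding a port whose old connections are
        # still in TIME_WAIT on most Unix-like systems.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            sock.bind((host, port))
            sock.listen()
            return sock
        except OSError as err:
            sock.close()
            # Give up on unrelated errors, or once the attempts run out.
            if err.errno != errno.EADDRINUSE or attempt == attempts:
                raise
            time.sleep(delay)
```

An init script could wrap the service start in a similar loop; whether the wait belongs in the script or in the service itself is left open here.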


=== optional: spell checker stuff ===
=== optional: spell checker and language detection stuff ===
* [http://www.google-melange.com/gci/task/view/google/gci2013/5891237794021376 spell checking mode in apertium-apy]
* [http://www.google-melange.com/gci/task/view/google/gci2013/5891237794021376 spell checking mode in apertium-apy]
* [http://www.google-melange.com/gci/task/view/google/gci2013/5434520367005696 integrate spell checker interface into html-tools]
* [http://www.google-melange.com/gci/task/view/google/gci2013/5434520367005696 integrate spell checker interface into html-tools]
* <s>[http://www.google-melange.com/gci/task/view/google/gci2013/5157832131346432 get language detection interface working]</s>
* [http://www.google-melange.com/gci/task/view/google/gci2013/5720782084767744 language detection mode in apertium-apy] (prototype done)


=== what to include ===
=== what to include ===
make the following pairs available to the site:
make the following pairs available to the site:
* pairs: '''kaz-tat''', '''tur-kir''', kaz-kir, tat-bak, kaz-kaa, tuk-tur?, tur-uzb?
* pairs: '''kaz-tat''', '''tur-kir''', kaz-kir, tat-bak, kaz-kaa, tuk-tur?, tur-uzb?, kaz-eng?
* transducers: '''kaz''', '''tat''', kir, tur, bak, chv, kum, nog, kaa, uzb?, tuk?
* transducers: '''kaz''', '''tat''', kir, tur, bak, chv, kum, nog, kaa, uzb?, tuk?


=== prettifying ===
=== prettifying ===
* [https://www.google-melange.com/gci/task/view/google/gci2013/4981943422681088 localised language names in analysis and generation]
* <s>[http://www.google-melange.com/gci/task/view/google/gci2013/4981943422681088 localised language names in analysis, generation, and spell-check modes]</s>
* <s>[http://www.google-melange.com/gci/task/view/google/gci2013/5253445619548160 get a working theme together]</s>
* add a note (localised to various languages) along the lines of "Found a mistake? Help us fix it!" with link to [[Apertium Turkic]]
* <s>make sandbox mode disabled unless an appropriate switch is passed to apertium-html-tools</s>
* <s>add a note (localised to various languages) along the lines of "Found a mistake? Help us fix it!" with link to [[Apertium Turkic]]</s>

=== future ===
* consider including the web concordancer on the site (and consider what corpora to provide search access to...)


== Things that need to be figured out ==
== Things that need to be figured out ==
* [http://www.google-melange.com/gci/task/view/google/gci2013/5872152972623872 How can we count lexc stems effectively?] - JNW's bash script can be generalised (and rewritten in python), and it'll come close
* <s>[http://www.google-melange.com/gci/task/view/google/gci2013/5872152972623872 How can we count lexc stems effectively?] - JNW's bash script can be generalised (and rewritten in python), and it'll come close</s> see [[The_Right_Way_to_count_lexc_stems|The Right Way to count lexc stems]]


=== Issues introduced by new build process ===
=== Issues introduced by new build process ===
* How can we do single-category testvoc now?
* How can we do single-category testvoc now?
** Since Turkic languages have very few paradigms, we can just use a representative stem for each paradigm and do a testvoc on that prefix of the source-language transducer. Instructions to come.
* How can we make vanilla transducers (without MT-specific "wrong" POSes)?
* How can we make vanilla transducers (without MT-specific "wrong" POSes)?
** The problem is that "! Use/xxx-yyy" lines can't just be grepped out in the vanilla transducer anymore, since those are needed for the xxx-yyy transducers. That is, we're no longer just copying the lexc file, but copying the full transducer (no trimming before compilation), and trimming the transducer directly (based on the bidix) for use in pairs. Ideas: [[Apertium Turkic/Use/MT]]
* How can we count trimmed stems?
* How can we count trimmed stems?
** Counting unique stems on each side of the bidix should give us the equivalent.
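
The representative-stem idea for single-category testvoc could start from something like the sketch below, which picks the first stem seen for each continuation class in a lexc file. It assumes that a continuation class stands in for a paradigm, that stem entries have the simple form `upper:lower CLASS ;`, and that the caller knows which lexicons hold stems. The function name and the sample lexicon are illustrative only, and running testvoc on the selected stems is not covered.

```python
import re

# Matches simple lexc entries such as "алма:алма N1 ;" (assumed subset of lexc).
ENTRY = re.compile(r"^\s*([^\s!;:]+)(?::([^\s!;]*))?\s+([^\s!;]+)\s*;")
LEXICON = re.compile(r"^LEXICON\s+(\S+)")

# Illustrative sample only; not copied from any real lexc file.
SAMPLE_LEXC = """\
LEXICON Root
Nouns ;
Verbs ;

LEXICON Nouns
алма:алма N1 ; ! "apple"
үй:үй N1 ; ! "house"

LEXICON Verbs
кел:кел V-IV ;
ал:ал V-TV ;
"""


def representative_stems(lexc_text, stem_lexicons):
    """Map each continuation class to the first stem that points to it."""
    reps = {}
    current = None
    for line in lexc_text.splitlines():
        line = line.split("!", 1)[0]  # drop comments (does not handle %! escapes)
        header = LEXICON.match(line)
        if header:
            current = header.group(1)
            continue
        if current not in stem_lexicons:
            continue
        entry = ENTRY.match(line)
        if entry:
            # Keep only the first stem seen for each continuation class.
            reps.setdefault(entry.group(3), entry.group(1))
    return reps
```

Calling `representative_stems(SAMPLE_LEXC, {"Nouns", "Verbs"})` yields one stem per continuation class, which could then be fed to whatever testvoc procedure is eventually documented.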

[[Category:TODO lists]]

Latest revision as of 21:24, 19 August 2015

This is a general to-do list for the Apertium Turkic working group.

Website

This section outlines what's left to get http://turkic.apertium.com/ up and running.

software infrastructure

optional: spell checker and language detection stuff

what to include

Make the following pairs and transducers available on the site:

  • pairs: kaz-tat, tur-kir, kaz-kir, tat-bak, kaz-kaa, tuk-tur?, tur-uzb?, kaz-eng?
  • transducers: kaz, tat, kir, tur, bak, chv, kum, nog, kaa, uzb?, tuk?

prettifying

future

  • consider including the web concordancer on the site (and consider what corpora to provide search access to...)

Things that need to be figured out

Issues introduced by new build process

  • How can we do single-category testvoc now?
    • Since Turkic languages have very few paradigms, we can just use a representative stem for each paradigm and do a testvoc on that prefix of the source-language transducer. Instructions to come.
  • How can we make vanilla transducers (without MT-specific "wrong" POSes)?
    • The problem is that "! Use/xxx-yyy" lines can't just be grepped out in the vanilla transducer anymore, since those are needed for the xxx-yyy transducers. That is, we're no longer just copying the lexc file, but copying the full transducer (no trimming before compilation), and trimming the transducer directly (based on the bidix) for use in pairs. Ideas: Apertium Turkic/Use/MT
  • How can we count trimmed stems?
    • Counting unique stems on each side of the bidix should give us the equivalent.
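
Counting unique stems on each side of the bidix could look roughly like the sketch below. It assumes a stem is identified by its lemma plus its first tag, reads only `<p>` pairs with `<l>` and `<r>` children, and ignores `<i>` identity entries and `<g>` groups. The function names and the sample entries are illustrative only.

```python
import xml.etree.ElementTree as ET

# Illustrative sample only; not copied from any real bidix.
SAMPLE_BIDIX = """\
<dictionary>
  <section id="main" type="standard">
    <e><p><l>алма<s n="n"/></l><r>алма<s n="n"/></r></p></e>
    <e><p><l>үй<s n="n"/></l><r>өй<s n="n"/></r></p></e>
    <e><p><l>үй<s n="n"/></l><r>йорт<s n="n"/></r></p></e>
  </section>
</dictionary>
"""


def side_key(side):
    """Return (lemma, first tag) for an <l> or <r> element."""
    parts = [side.text or ""]
    first_tag = None
    for child in side:
        if child.tag == "b":  # <b/> stands for a space in multiword lemmas
            parts.append(" ")
        elif child.tag == "s" and first_tag is None:
            first_tag = child.get("n")
        parts.append(child.tail or "")
    return "".join(parts), first_tag


def count_bidix_stems(xml_text):
    """Return (unique left-side stems, unique right-side stems)."""
    root = ET.fromstring(xml_text)
    left, right = set(), set()
    for pair in root.iter("p"):
        l_side, r_side = pair.find("l"), pair.find("r")
        if l_side is not None and r_side is not None:
            left.add(side_key(l_side))
            right.add(side_key(r_side))
    return len(left), len(right)
```

In the sample, the left side repeats one stem across two entries, so the two sides give different counts; whether the lemma plus first tag is the right notion of "stem" for trimmed counts is an open assumption.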