Difference between revisions of "Apertium-uzb"
Firespeaker (talk | contribs) (Created page with '== Developers == === Dealing with Cyrillic and Latin === '''The plan''': there will be two separate lexcs and twols (.lat and .cyr) with the continuation lexica and rules and al…') |
Firespeaker (talk | contribs) |
Revision as of 21:23, 18 May 2013
Developers
Dealing with Cyrillic and Latin
The plan: there will be two separate lexc and twol files (.lat and .cyr), each with its own continuation lexica and rules, though a single twol may be enough considering how simple things are. There will also be a master .dix in Latin, with the Cyrillic forms given in comments in a standardised format (the other way around is also possible).
There will also be a simple script that checks the master .dix for entries lacking a Cyrillic comment in the standard format, generates the missing comments automatically, and updates the Cyrillic dix, marking each converted word with "TOCHECK" (or something similar) in a comment. Someone then goes through everything marked "TOCHECK", fixes the conversion where needed, and removes the marker.
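The checking script could look roughly like the sketch below. Everything specific in it is an assumption made for illustration: the comment format `<!-- cyr: ... -->`, one `<e>` entry per line with an `lm` attribute, the function names `to_cyrillic` and `annotate`, and the deliberately naive, incomplete Latin-to-Cyrillic table (lowercase only). It covers only the comment-generation step, not the update of the Cyrillic dix.

```python
import re

# Naive, incomplete Latin-to-Cyrillic table (assumption, lowercase only).
# Real conversion has ambiguities, which is why output is marked TOCHECK.
DIGRAPHS = {"sh": "ш", "ch": "ч", "oʻ": "ў", "o'": "ў", "gʻ": "ғ", "g'": "ғ",
            "ya": "я", "yu": "ю", "yo": "ё"}
SINGLES = {"a": "а", "b": "б", "d": "д", "e": "е", "f": "ф", "g": "г",
           "h": "ҳ", "i": "и", "j": "ж", "k": "к", "l": "л", "m": "м",
           "n": "н", "o": "о", "p": "п", "q": "қ", "r": "р", "s": "с",
           "t": "т", "u": "у", "v": "в", "x": "х", "y": "й", "z": "з"}

def to_cyrillic(word):
    """Convert a Latin-script word, trying two-letter sequences first."""
    out = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in DIGRAPHS:
            out.append(DIGRAPHS[pair])
            i += 2
        elif word[i] == "e" and i == 0:
            out.append("э")  # word-initial e
            i += 1
        else:
            out.append(SINGLES.get(word[i], word[i]))
            i += 1
    return "".join(out)

ENTRY = re.compile(r"<e[\s>]")
LEMMA = re.compile(r'lm="([^"]*)"')
CYR_COMMENT = re.compile(r"<!--\s*cyr:")  # hypothetical standard format

def annotate(lines):
    """Append a TOCHECK Cyrillic comment to entries that have none."""
    result = []
    for line in lines:
        lemma = LEMMA.search(line)
        if ENTRY.search(line) and lemma and not CYR_COMMENT.search(line):
            body = line.rstrip("\n")
            ending = line[len(body):]
            comment = f" <!-- cyr: {to_cyrillic(lemma.group(1))} TOCHECK -->"
            line = body + comment + ending
        result.append(line)
    return result
```

Entries that already carry a `cyr:` comment are left alone, so the script can be rerun after each batch of new entries without touching conversions that have already been checked.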
This lets us trivially "convert" the dix to Cyrillic, and even convert the stems in lexc when we copy or update it from -uzb.
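For the lexc side, stem conversion could be sketched as follows. This assumes plain stem lines of the form `upper:lower ContinuationClass ; ! comment`, with no escaped or multicharacter symbols in the stem; the function name `convert_lexc_line` is hypothetical, and `convert` stands for whatever Latin-to-Cyrillic function is used (here a small lookup table standing in for checked conversions).

```python
import re

# Matches "stem ContClass ;" or "upper:lower ContClass ;", optionally
# followed by a comment. Lines starting with "!" or lacking ";" do not match.
STEM_LINE = re.compile(r"^(\s*)([^\s:!;]+)(?::([^\s!;]+))?(\s+\S+\s*;.*)$")

def convert_lexc_line(line, convert):
    """Convert the stem(s) of one lexc entry line; leave other lines as-is."""
    body = line.rstrip("\n")
    ending = line[len(body):]
    match = STEM_LINE.match(body)
    if not match:
        return line
    indent, upper, lower, rest = match.groups()
    stem = convert(upper)
    if lower is not None:
        stem += ":" + convert(lower)
    return indent + stem + rest + ending

# Usage with a stand-in converter (already-checked conversions).
checked = {"kitob": "китоб"}
convert = lambda s: checked.get(s, s)
print(convert_lexc_line('kitob:kitob N1 ; ! "book"', convert))
```

Continuation class names and trailing comments are passed through unchanged, so only the stems differ between the .lat and .cyr files.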