Difference between revisions of "Languages"
Firespeaker (talk | contribs) (→Languages by coverage: Russian stems) |
Firespeaker (talk | contribs) m (→Languages by coverage: rus) |
Revision as of 22:58, 11 November 2013
- If you are looking for the category, click here
Languages is a module of the Apertium SVN repository where monolingual language data lives. Monolingual language data in Apertium is slowly being moved to this new repository scheme. If you feel something is missing, please feel free to contact us.
New monolingual packages should be developed in incubator until they're minimally useful, at which point they can go in languages. There is no fixed criterion for what constitutes a minimally-useful language package; generally, however, a language package should have over 60% coverage on a variety of corpora and should probably have at least 2500 stems to be considered minimally useful.
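The rule of thumb above can be sketched as a small shell check. This is only an illustration: the function name `is_minimally_useful` is hypothetical, and the thresholds are the informal ones from the paragraph above (over 60% coverage, at least 2500 stems), not a fixed criterion.

```shell
# Hypothetical helper, not part of Apertium: applies the informal
# thresholds from the text (coverage over 60%, at least 2500 stems).
# Usage: is_minimally_useful COVERAGE_PERCENT STEM_COUNT
is_minimally_useful() {
    coverage="$1"
    stems="$2"
    # awk handles the decimal comparison; exit status 0 means "passes"
    awk -v c="$coverage" -v s="$stems" 'BEGIN { exit !(c > 60 && s >= 2500) }'
}

# Example with the Nogay figures from the table on this page
if is_minimally_useful 81.4 1385; then
    echo "candidate for languages"
else
    echo "keep in incubator"
fi
```

With those figures the check fails on the stem count even though coverage is well above 60%, which is why both numbers are worth looking at together.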
The languages module can be found in svn at https://svn.code.sf.net/p/apertium/svn/languages .
Languages by coverage
Module | Language | Entries | Coverage |
---|---|---|---|
apertium-bak | Bashkir | 46,501 | ~66% |
apertium-bul | Bulgarian | 8,578 | - |
apertium-chv | Chuvash | 10,267 | ~85% |
apertium-dan | Danish | - | - |
apertium-gla | Scottish Gaelic | - | - |
apertium-hbs | Serbo-Croatian | 58,004 | - |
apertium-hye | Armenian | - | - |
apertium-kaz | Kazakh | 36,595 | ~94.5% |
apertium-kir | Kyrgyz | 14,424 | ~90.4% |
apertium-kum | Kumyk | 4,918 | ~90.2% |
apertium-lvs | Latvian | - | - |
apertium-mkd | Macedonian | 30,686 | - |
apertium-mlt | Maltese | - | - |
apertium-nog | Nogay | 1,385 | ~81.4% |
apertium-rus | Russian | 126,833 | - |
apertium-sqi | Albanian | - | - |
apertium-slv | Slovenian | 20,596 | - |
apertium-tat | Tatar | 55,702 | ~91% |
apertium-tuk | Turkmen | 2,988 | ~70.7% |
apertium-urd | Urdu | - | - |
apertium-uzb | Uzbek | 34,470 | ~82.9% |
Languages by family
- Turkic:
- Indo-European
  - Slavic: Russian, Serbo-Croatian, Macedonian, Czech, Bulgarian, Ukrainian
  - Celtic: Scottish Gaelic
  - Germanic
    - West Germanic: Dutch
    - North Germanic: Danish, Icelandic, Norwegian (nno, nob), Swedish, Faroese
  - Indic: Urdu, Bengali, Hindi, Sanskrit
  - Baltic: Latvian
  - Other: Albanian, Armenian, Greek
- Semitic: Maltese
Requiring a monolingual package as a dependency of a pair
Say you want apertium-fie-bar to depend on some monolingual data from the apertium-bar package, e.g. apertium-bar/bar.lrx and maybe other such files.
Assuming apertium-bar is set up correctly, and you've installed a recent version of apertium (-r48374 or later), you can put the following line into the configure.ac of apertium-fie-bar:
AP_CHECK_LING([2], [apertium-bar])
and in the Makefile.am, you can write rules like this:
bar-fie.lrx.bin: $(AP_SRC2)/bar.lrx
	lrx-comp $< $@

bar-fie.automorf.bin: $(AP_LIB2)/bar.automorf.bin
	cp $< $@
Similarly for apertium-fie (with AP_CHECK_LING([1], [apertium-fie])). By convention, a language pair called apertium-fie-bar should use the number 1 for fie and 2 for bar.
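The numbering convention can be illustrated with a short shell sketch that derives the two configure.ac lines from a pair name. The function `ling_checks` is a hypothetical helper written for this page, and it assumes the pair name has the simple form apertium-XXX-YYY.

```shell
# Hypothetical helper: prints the AP_CHECK_LING lines that the
# convention implies for a pair named apertium-<lang1>-<lang2>.
ling_checks() {
    pair="${1#apertium-}"   # e.g. "fie-bar"
    lang1="${pair%%-*}"     # part before the first hyphen -> number 1
    lang2="${pair#*-}"      # part after the first hyphen  -> number 2
    echo "AP_CHECK_LING([1], [apertium-$lang1])"
    echo "AP_CHECK_LING([2], [apertium-$lang2])"
}

ling_checks apertium-fie-bar
```

The printed lines are what you would paste into configure.ac by hand; nothing in Apertium generates them for you as far as this page describes.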
Now if you've run "make install" in apertium-bar before running autogen.sh in apertium-fie-bar, apertium-fie-bar will use the bar.lrx and bar.automorf.bin which are installed by apertium-bar.
If you make a lot of changes to apertium-bar and want to avoid having to "make install" for each and every change, you can do this in apertium-fie-bar:
./autogen.sh --with-lang2=/path/to/apertium-bar
(./configure --with-lang2=/path/to/apertium-bar should also work). Now each time you make, the "AP_SRC2" and "AP_LIB2" variables will both point to /path/to/apertium-bar instead of the "make install"-ed files. You can set it back to default by just running plain autogen.sh (or ./configure) again.
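The behaviour described above can be modelled in a few lines of shell. This is a sketch of the described outcome only, not the real AP_CHECK_LING macro; the function `resolve_lang2` and the example install paths are assumptions made for illustration.

```shell
# Hypothetical model of the lookup described in the text.
# $1 = value given to --with-lang2 (empty string if not given)
# $2 = source directory of the installed apertium-bar
# $3 = binary directory of the installed apertium-bar
resolve_lang2() {
    override="$1"
    installed_src="$2"
    installed_lib="$3"
    if [ -n "$override" ]; then
        # With --with-lang2, both variables point at the working copy
        echo "AP_SRC2=$override"
        echo "AP_LIB2=$override"
    else
        # Without it, the "make install"-ed locations are used
        echo "AP_SRC2=$installed_src"
        echo "AP_LIB2=$installed_lib"
    fi
}

# Assumed example paths, shown only to make the two cases concrete
resolve_lang2 "/path/to/apertium-bar" \
    "/usr/local/share/apertium/apertium-bar" \
    "/usr/local/lib/apertium/apertium-bar"
resolve_lang2 "" \
    "/usr/local/share/apertium/apertium-bar" \
    "/usr/local/lib/apertium/apertium-bar"
```

The point to notice is that the override collapses source and binary locations into one directory, so the working copy of apertium-bar needs to be compiled for rules that read from AP_LIB2 to find their inputs.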
See Installation_troubleshooting#AP_CHECK_LING_not_found_when_running_configure_or_autogen.sh if you run into errors about AP_CHECK_LING.
Making a monolingual package dependable for pairs
In apertium-bar, there should be a file apertium-bar.pc.in. This has to have the following lines:
dir=@libdir@/apertium/apertium-bar
srcdir=@datarootdir@/apertium/apertium-bar
These should correspond to where the binaries and source files respectively are installed by Makefile.am (typically named apertium_bardir and apertium_bar_srcdir). The configure.ac should have a line saying something like AC_OUTPUT([Makefile apertium-bar.pc]). See https://svn.code.sf.net/p/apertium/svn/languages/apertium-nob for a working example.
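To make the role of the two apertium-bar.pc.in lines concrete, the following shell sketch imitates the placeholder substitution that configure performs when it generates apertium-bar.pc. The function `render_pc` is hypothetical, and the directory values assume an install prefix of /usr/local; a real configure run may write unexpanded references such as ${prefix} instead of absolute paths.

```shell
# Hypothetical stand-in for configure's @variable@ substitution.
# $1 = value for @libdir@, $2 = value for @datarootdir@
# Reads the template on stdin and writes the result to stdout.
render_pc() {
    sed -e "s|@libdir@|$1|g" -e "s|@datarootdir@|$2|g"
}

# The two required lines from apertium-bar.pc.in, rendered with
# assumed /usr/local directories
printf '%s\n' \
    'dir=@libdir@/apertium/apertium-bar' \
    'srcdir=@datarootdir@/apertium/apertium-bar' |
    render_pc /usr/local/lib /usr/local/share
```

If the rendered dir and srcdir do not match the directories that Makefile.am actually installs into, a dependent pair will look for bar.lrx or bar.automorf.bin in the wrong place.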
Compiled / binary files should be listed in TARGETS_COMMON as usual, while any source files can be installed using install-data-local, e.g.:
apertium_bar_srcdir=$(prefix)/share/apertium/$(BASENAME)/

install-data-local:
	test -d $(DESTDIR)$(apertium_bar_srcdir) || mkdir -p $(DESTDIR)$(apertium_bar_srcdir)
	$(INSTALL_DATA) $(BASENAME).$(LANG1).dix $(DESTDIR)$(apertium_bar_srcdir)