Difference between revisions of "Languages"

From Apertium
Line 17:
  | apertium-bak|| [[Bashkir]] ||align="right"| {{#lst:apertium-bak/stats|stems}} ||align="center"| [[Apertium-bak#Current_State|~{{:Apertium-bak/stats/average}}%]]
  |-
- | apertium-ben|| Bengali ||align="right"| - ||align="center"| -
+ | apertium-ben|| [[Bengali]] ||align="right"| {{#lst:apertium-ben/stats|stems}} ||align="center"| -
  |-
  | apertium-bul|| Bulgarian ||align="right"| {{#lst:apertium-bul/stats|stems}} ||align="center"| -
Line 35:
  | apertium-hbs|| Serbo-Croatian ||align="right"| {{#lst:apertium-hbs/stats|stems}} ||align="center"| -
  |-
- | apertium-hin|| Hindi ||align="right"| - ||align="center"| -
+ | apertium-hin|| [[Hindi]] ||align="right"| {{#lst:apertium-hin/stats|stems}} ||align="center"| -
  |-
  | apertium-hye|| Armenian ||align="right"| - ||align="center"| -
Line 63:
  | apertium-rus|| [[Russian]] ||align="right"| {{#lst:apertium-rus/stats|stems}} ||align="center"| -
  |-
- | apertium-san|| Sanskrit ||align="right"| - ||align="center"| -
+ | apertium-san|| [[Sanskrit]] ||align="right"| {{#lst:apertium-san/stats|stems}} ||align="center"| -
  |-
  | apertium-slv|| Slovenian ||align="right"| {{#lst:apertium-slv/stats|stems}} ||align="center"| -
Line 79:
  | apertium-ukr|| [[Ukrainian]] ||align="right"| {{#lst:apertium-ukr/stats|stems}} ||align="center"| -
  |-
- | apertium-urd|| Urdu ||align="right"| - ||align="center"| -
+ | apertium-urd|| [[Urdu]] ||align="right"| {{#lst:apertium-urd/stats|stems}} ||align="center"| -
  |-
  | apertium-uzb|| [[Uzbek]] ||align="right"| {{#lst:apertium-uzb/stats|stems}} ||align="center"| [[Apertium-uzb#Current_State|~{{:Apertium-uzb/stats/average}}%]]

Revision as of 01:53, 23 November 2013


Languages is the module of the Apertium SVN repository where monolingual language data lives. Monolingual language data in Apertium is gradually being moved to this new repository scheme. If you think something is missing, please contact us.

New monolingual packages should be developed in incubator until they're minimally useful, at which point they can go in languages. There is no fixed criterion for what constitutes a minimally-useful language package; generally, however, a language package should have over 60% coverage on a variety of corpora and should probably have at least 2500 stems to be considered minimally useful.

The languages module can be found in svn at https://svn.code.sf.net/p/apertium/svn/languages .


Languages by coverage

Module        Language           Entries  Coverage
apertium-bak  Bashkir             46,501  ~66%
apertium-ben  Bengali              8,230  -
apertium-bul  Bulgarian            8,578  -
apertium-ces  Czech               41,199  ~90.5%
apertium-chv  Chuvash             10,267  ~85%
apertium-dan  Danish              52,133  -
apertium-ell  Greek                    -  -
apertium-fao  Faroese                  -  -
apertium-gla  Scottish Gaelic          -  -
apertium-hbs  Serbo-Croatian      58,004  -
apertium-hin  Hindi               37,833  -
apertium-hye  Armenian                 -  -
apertium-isl  Icelandic                -  -
apertium-kaz  Kazakh              36,595  ~94.5%
apertium-kir  Kyrgyz              14,424  ~90.4%
apertium-kum  Kumyk                4,918  ~90.2%
apertium-lvs  Latvian                  -  -
apertium-mkd  Macedonian          30,686  -
apertium-mlt  Maltese                  -  -
apertium-nld  Dutch                    -  -
apertium-nno  Norwegian Nynorsk  182,497  -
apertium-nob  Norwegian Bokmål   246,281  -
apertium-nog  Nogay                1,385  ~81.4%
apertium-rus  Russian            126,833  -
apertium-san  Sanskrit           123,373  -
apertium-slv  Slovenian           20,596  -
apertium-sqi  Albanian                 -  -
apertium-swe  Swedish                  -  -
apertium-tat  Tatar               55,702  ~91%
apertium-tuk  Turkmen              2,988  ~70.7%
apertium-tur  Turkish             17,221  ~87.3%
apertium-ukr  Ukrainian           10,709  -
apertium-urd  Urdu                14,943  -
apertium-uzb  Uzbek               34,470  ~82.9%

Languages by family

Requiring a monolingual package as a dependency of a pair

Say you want apertium-fie-bar to depend on some monolingual data from the apertium-bar package, e.g. apertium-bar/bar.rlx and maybe other such files.

This requires a recent version of apertium (-r48374 or later), and that you've exported PKG_CONFIG_PATH as described at Minimal_installation_from_SVN.
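As a rough sketch of the PKG_CONFIG_PATH step, something like the following could go in your shell setup. The prefix `$HOME/local` and the two `pkgconfig` subdirectories are assumptions for illustration only; the authoritative instructions are on Minimal_installation_from_SVN. The package name apertium-bar is the placeholder used throughout this page.

```shell
# Hypothetical install prefix (assumption); use the prefix you passed to ./autogen.sh or ./configure.
PREFIX="${PREFIX:-$HOME/local}"

# Prepend the prefix's pkg-config directories, keeping any entries that are already set.
PKG_CONFIG_PATH="$PREFIX/lib/pkgconfig:$PREFIX/share/pkgconfig${PKG_CONFIG_PATH:+:$PKG_CONFIG_PATH}"
export PKG_CONFIG_PATH

# If pkg-config is installed, check whether the monolingual package is visible.
if command -v pkg-config >/dev/null 2>&1; then
    if pkg-config --exists apertium-bar; then
        echo "apertium-bar is visible to pkg-config"
    else
        echo "apertium-bar not found; check PKG_CONFIG_PATH and run make install in apertium-bar"
    fi
fi
```

If the check reports that the package is not found after a "make install", the most likely cause is that the .pc file was installed under a different prefix from the one on PKG_CONFIG_PATH.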

Assuming apertium-bar is set up correctly (see next section), you can put the following line into the configure.ac of apertium-fie-bar:

AP_CHECK_LING([2], [apertium-bar])

and in the Makefile.am, you can write rules like this:

bar-fie.rlx.bin: $(AP_SRC2)/bar.rlx
	cg-comp $< $@

bar-fie.automorf.bin: $(AP_LIB2)/bar.automorf.bin
	cp $< $@

Similarly for apertium-fie (with AP_CHECK_LING([1], [apertium-fie])). By convention, a language pair called apertium-fie-bar should use the number 1 for fie and 2 for bar (though variants like 1b are possible too). Also by convention, AP_SRC should point to source files and AP_LIB to compiled binaries (this is the responsibility of the monolingual package, e.g. apertium-bar).

Now if you've typed "make install" in apertium-bar before running autogen.sh in apertium-fie-bar, apertium-fie-bar will use the bar.rlx and bar.automorf.bin which are installed by apertium-bar.


If you often make a lot of changes to apertium-bar and want to avoid having to "make install" for each and every change, you can do this in apertium-fie-bar:

./autogen.sh --with-lang2=/path/to/apertium-bar

Now each time you run make, the AP_SRC2 and AP_LIB2 variables will both point to /path/to/apertium-bar instead of the installed ("make install"-ed) files. You can restore the default by running plain ./autogen.sh (or ./configure) again.
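The development loop with --with-lang2 can be sketched as a small shell helper. The checkout paths, the `run` wrapper, the `DRY_RUN` switch and the function name `build_against_checkout` are all hypothetical and exist only for this illustration; the only commands taken from this page are ./autogen.sh --with-lang2=... and make.

```shell
# Hypothetical checkout locations (assumptions); adjust to your own layout.
LANG_DIR="${LANG_DIR:-$HOME/apertium/apertium-bar}"
PAIR_DIR="${PAIR_DIR:-$HOME/apertium/apertium-fie-bar}"

# Print each command; execute it only when DRY_RUN=0 (the default is a dry run).
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# Configure the pair against the local apertium-bar checkout, then build it.
build_against_checkout() {
    run cd "$PAIR_DIR" &&
    run ./autogen.sh "--with-lang2=$LANG_DIR" &&
    run make
}
```

Calling `build_against_checkout` prints the three commands it would run; calling it as `DRY_RUN=0 build_against_checkout` actually runs them. After the first configure, later edits in apertium-bar should only need a plain make in the pair directory.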


See Installation_troubleshooting#AP_CHECK_LING_not_found_when_running_configure_or_autogen.sh if you run into errors about AP_CHECK_LING.

Making a monolingual package usable as a dependency of pairs

In apertium-bar, there should be a file apertium-bar.pc.in. This has to have the following lines:

dir=@libdir@/apertium/apertium-bar
srcdir=@datarootdir@/apertium/apertium-bar

These should correspond to where the binaries and source files respectively are installed by the Makefile.am in the monolingual package (typically the makefile names these directories apertium_bardir and apertium_bar_srcdir).

The configure.ac should have a line saying something like AC_OUTPUT([Makefile apertium-bar.pc]). See https://svn.code.sf.net/p/apertium/svn/languages/apertium-nob for a working example.
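To see what the two variables end up as, the sketch below renders a minimal .pc file by hand and reads the values back. It rests on several assumptions: the prefix /usr/local is hypothetical, the sed call is only a stand-in for the substitution configure performs, the Name/Description/Version fields are placeholders, and reading the values with pkg-config --variable is presumably similar to what AP_CHECK_LING does rather than a copy of it.

```shell
# Hypothetical prefix (assumption); a real build takes this from ./configure.
PREFIX="${PREFIX:-/usr/local}"
WORK="$(mktemp -d)"

# A minimal template containing the two required variables from this page.
cat > "$WORK/apertium-bar.pc.in" <<'EOF'
dir=@libdir@/apertium/apertium-bar
srcdir=@datarootdir@/apertium/apertium-bar

Name: apertium-bar
Description: Placeholder monolingual package (illustration only)
Version: 0.0.0
EOF

# Stand-in for configure: replace the @...@ placeholders with concrete paths.
sed -e "s|@libdir@|$PREFIX/lib|g" \
    -e "s|@datarootdir@|$PREFIX/share|g" \
    "$WORK/apertium-bar.pc.in" > "$WORK/apertium-bar.pc"

# If pkg-config is installed, read the variables back from the rendered file.
if command -v pkg-config >/dev/null 2>&1; then
    PKG_CONFIG_PATH="$WORK" pkg-config --variable=dir apertium-bar
    PKG_CONFIG_PATH="$WORK" pkg-config --variable=srcdir apertium-bar
fi
# The temporary directory in $WORK can be removed once you are done inspecting it.
```

Running the same two pkg-config queries against an installed package (without overriding PKG_CONFIG_PATH) is a quick way to confirm that dir and srcdir match the directories the Makefile.am installs into.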

Compiled / binary files should be listed in TARGETS_COMMON as usual, while any source files can be installed using install-data-local, e.g.:

apertium_bar_srcdir=$(prefix)/share/apertium/$(BASENAME)/
install-data-local:
	test -d $(DESTDIR)$(apertium_bar_srcdir) || mkdir -p $(DESTDIR)$(apertium_bar_srcdir)
	$(INSTALL_DATA) $(BASENAME).$(LANG1).dix $(DESTDIR)$(apertium_bar_srcdir)

Now if the apertium-fie-bar pair depends on apertium-bar as its lang2, it can refer to binaries (apertium-bar's TARGETS_COMMON) using $(AP_LIB2) and source files using $(AP_SRC2), e.g. $(AP_SRC2)/apertium-$(LANG2).$(LANG2).dix for the dix file in the install-data-local example above.

See also