Difference between revisions of "Beginner's Constraint Grammar HOWTO"
(→Usage) |
(wget -> curl) |
||
(10 intermediate revisions by 4 users not shown) | |||
Line 1: | Line 1: | ||
[[Installation et fonctionnement de Constraint Grammar|En français]] |
|||
''The installation part for Apertium and language pairs described below refer to Ubuntu distribution. For others Linux distributions or others operating systems, let see the general [[Installation]] page''. |
|||
==Download== |
==Download== |
||
;Apertium |
;Apertium |
||
Sourced from [[Install Apertium core using packaging]] |
|||
How to download Apertium for [http://www.ubuntu.com/ Ubuntu]. First open your terminal and copy/paste |
|||
First, remove any Apertium packages you have installed from operating system repositories. They will be out-of-date, sometimes by years. |
|||
Add the repository, |
|||
<pre> |
|||
# Pick one: |
|||
# Nightly, unstable, new, almost always use this: |
|||
First we have to install prerequisites. |
|||
curl -sS https://apertium.projectjj.com/apt/install-nightly.sh | sudo bash |
|||
*Open terminal and copy/paste this code : |
|||
'''sudo apt-get install subversion build-essential g++ pkg-config libxml2 \''' |
|||
# Release, stable, old: |
|||
'''libxml2-dev libxml2-utils xsltproc flex automake autoconf libtool libpcre3-dev ''' |
|||
curl -sS https://apertium.projectjj.com/apt/install-release.sh | sudo bash |
|||
</pre> |
|||
*Then terminal will ask for your password like this: '''[sudo] password for user'''.When you write it press '''Enter'''. |
|||
If you have already prerequisites, it will show you '''X upgraded, X newly installed, X to remove and X not upgraded.''' |
|||
If you don't have it,you have to wait until terminal show you '''user@ubuntu:~$'''.This mean the process is ready(downaload and instal prerequisites) and terminal wait for your next step, which is to copy/paste this code: |
|||
You should see messages. |
|||
'''svn co http://apertium.svn.sourceforge.net/svnroot/apertium/trunk apertium''' |
|||
Install dev tools, |
|||
This will download apertium from SVN.The process will take a few minutes. When the downloading ends we are ready to install apertium. |
|||
<pre> |
|||
sudo apt-get -f install apertium-all-dev |
|||
</pre> |
|||
====About the Debian repository install==== |
|||
Check the script installed Apertium repository details, |
|||
<pre> |
|||
apt-cache policy | grep apertium |
|||
</pre> |
|||
Unfortunately, due to the seamless upgrading of Debian packaging, it is difficult to see which packages the new repository has added, and where. Even Synaptic, the wonder GUI, has no way through. You could try this brute force commandline, |
|||
<pre> |
|||
find /var/lib/apt/lists/ |grep projectjj.*Packages | xargs grep -h Package |
|||
</pre> |
|||
Which will, if nothing else, tell you a lot about byways of the Apertium project. |
|||
;Constraint grammar |
;Constraint grammar |
||
To use CG we must have lttoolbox(we have it),apertium(we have it too) and ICU(we have to install it now). |
To use CG we must have lttoolbox (we have it), apertium (we have it too) and ICU (we have to install it now). |
||
How to install ICU for |
How to install ICU for Ubuntu. Open terminal and copy/paste this code: |
||
apt-get install libicu-dev |
|||
Now we can install and CG. |
Now we can install apertium, lttoolbox and CG. |
||
==Install== |
==Install== |
||
Line 35: | Line 61: | ||
;Apertium |
;Apertium |
||
Before installing apertium we have to install lttoolbox(which has been downloaded |
Before installing apertium we have to install lttoolbox(which has been downloaded with apertium at same time).To do that you have to copy/paste this code: |
||
'''cd apertium''' |
'''cd apertium''' |
||
Line 96: | Line 122: | ||
=Usage= |
=Usage= |
||
For the examples below, we use the language pair apertium-es-ca, but the principles should be applicable to any language pair. First we have to compile this pair. Go into the directory from where you |
For the examples below, we use the language pair apertium-es-ca, but the principles should be applicable to any language pair. First we have to compile this pair. Go into the directory from where you installed Apertium, then |
||
cd apertium/apertium-es-ca |
cd apertium/apertium-es-ca |
||
Line 110: | Line 136: | ||
^vino/vino<n><m><sg>/venir<vblex><ifi><p3><sg>$ ^a/a<pr>$ ^la/el<det><def><f><sg>/lo<prn><pro><p3><f><sg>$ ^playa/playa<n><f><sg>$ |
^vino/vino<n><m><sg>/venir<vblex><ifi><p3><sg>$ ^a/a<pr>$ ^la/el<det><def><f><sg>/lo<prn><pro><p3><f><sg>$ ^playa/playa<n><f><sg>$ |
||
Here we have ambiguities,one between a noun and a verb and other between a determiner and a pronoun.We can write some rules which can impose |
Here we have ambiguities,one between a noun and a verb and other between a determiner and a pronoun.We can write some rules which can impose to categorize between two ambiguities.First we define our categories, these can be tags, wordforms or lemmas. It might help to think of them as "coarse tags", which may involve a set of fine tags or lemmas. So, create a file grammar.txt, and add the following text: |
||
'''DELIMITERS = "<$.>" ;''' |
|||
'''LIST NOUN = n;''' |
|||
'''LIST VERB = vblex;''' |
|||
'''LIST DET = det;''' |
|||
'''LIST PRN = prn;''' |
|||
'''LIST PREP = pr;''' |
|||
'''SECTION''' |
|||
DELIMITERS = "<$.>" ; |
|||
LIST NOUN = n; |
|||
LIST VERB = vblex; |
|||
LIST DET = det; |
|||
LIST PRN = prn; |
|||
LIST PREP = pr; |
|||
SECTION |
|||
So first rule is states "When the current lexical unit can be a pronoun or a determiner, and it is followed on the right by a lexical unit which could be a noun, choose the determiner". We have to add this rule to the file, and compile using cg-comp: |
So first rule is states "When the current lexical unit can be a pronoun or a determiner, and it is followed on the right by a lexical unit which could be a noun, choose the determiner". We have to add this rule to the file, and compile using cg-comp: |
||
Line 132: | Line 151: | ||
# 1 |
|||
SELECT DET IF |
|||
(0 DET) |
|||
(0 PRN) |
|||
(1 NOUN) ; |
|||
compile with: |
|||
'''SELECT DET IF''' |
|||
''' (0 DET)''' |
|||
''' (0 PRN)''' |
|||
''' (1 NOUN) ;''' |
|||
adding: |
|||
'''$ ./cg-comp grammar.txt grammar.bin''' |
|||
'''Sections: 1, Rules: 1, Sets: 6, Tags: 7''' |
|||
$ ./cg-comp grammar.txt grammar.bin |
|||
Sections: 1, Rules: 1, Sets: 6, Tags: 7 |
|||
To try what we have done copy/paste this code: |
To try what we have done copy/paste this code: |
||
$ echo "vino a la playa" | lt-proc es-ca.automorf.bin | cg-proc grammar.bin |
$ echo "vino a la playa" | lt-proc es-ca.automorf.bin | cg-proc grammar.bin |
||
^vino/vino<n><m><sg>/venir<vblex><ifi><p3><sg>$ ^a/a<pr>$ ^la/el<det><def><f><sg>$ ^playa/playa<n><f><sg>$ |
^vino/vino<n><m><sg>/venir<vblex><ifi><p3><sg>$ ^a/a<pr>$ ^la/el<det><def><f><sg>$ ^playa/playa<n><f><sg>$ |
||
Line 161: | Line 173: | ||
rule: |
rule: |
||
# 2 |
|||
REMOVE NOUN IF |
|||
(0 NOUN) |
|||
'''REMOVE NOUN IF''' |
|||
(0 VERB) |
|||
(1 PREP) |
|||
(2 DET) ; |
|||
''' (0 VERB)''' |
|||
''' (1 PREP)''' |
|||
''' (2 DET) ;''' |
|||
re-compile the grammar and test: |
re-compile the grammar and test: |
||
$ echo "vino a la playa" | lt-proc es-ca.automorf.bin | cg-proc grammar.bin |
|||
^vino/venir<vblex><ifi><p3><sg>$ ^a/a<pr>$ ^la/el<det><def><f><sg>$ ^playa/playa<n><f><sg>$ |
|||
'''^vino/venir<vblex><ifi><p3><sg>$ ^a/a<pr>$ ^la/el<det><def><f><sg>$ ^playa/playa<n><f><sg>$''' |
|||
Third rule is states "Remove interjection if the preceeding word is a modal verb." |
|||
For example: |
|||
'''echo "Ах боли ме" | lt-proc bg-mk.automorf.bin''' |
|||
'''^Ах/*Ах$ ^боли/*боли$ ^ме/clitic<prn><pers><clt><p1><mfn><sg><acc>''' |
|||
Fourth rule is states "Select preposition reading when the word could be a preposition or an adverb and it is followed by a Noun or Pre-Noun." |
|||
For example: |
|||
'''echo "oколо 2 млрд. евро" | lt-proc bg-mk.automorf.bin''' |
|||
'''^около/около<pr>/около<adv>$ ^2/2<num>$ ^млрд./милиард<num><mfn><sg><nom><ind>$ ^евро/евро<n><nt><sg><nom><ind>''' |
|||
Fifth rule is state "Remove 2nd person singular verb reading if there is no second person singular pronoun." |
|||
For example: |
|||
'''echo "Всеки трябва да заяви около" | lt-proc bg-mk.automorf.bin''' |
|||
'''^Всеки/Всеки<prn><tot><mfn><pl><nom>/Всеки<prn><tot><m><sg><nom>$ ^трябва/трябва<vbmod><pres><p3><sg>/трябва<vbmod><aor><p2>''' |
|||
'''<sg>/трябва<vbmod><aor><p3><sg>$ ^да/да<part>/да<ij>$ ^заяви/заяви<vblex><perf><tv><imp><sg>/заяви<vblex><perf><tv><pres><p3><sg>''' |
|||
'''/заяви<vblex><perf><tv><aor><p2><sg>/заяви<vblex><perf><tv><aor><p3><sg>$ ^около/около<pr>/около<adv>$''' |
|||
Third rule states "Remove interjection if the preceeding word is a modal verb." |
|||
[[Category:Documentation]] |
[[Category:Documentation in English]] |
Latest revision as of 20:55, 2 April 2021
The installation instructions for Apertium and language pairs below refer to the Ubuntu distribution. For other Linux distributions or other operating systems, see the general Installation page.
Download
- Apertium
Sourced from Install Apertium core using packaging.
First, remove any Apertium packages you have installed from operating system repositories. They will be out-of-date, sometimes by years.
Add the repository,
# Pick one:
# Nightly, unstable, new, almost always use this:
curl -sS https://apertium.projectjj.com/apt/install-nightly.sh | sudo bash
# Release, stable, old:
curl -sS https://apertium.projectjj.com/apt/install-release.sh | sudo bash
You should see messages as the script runs.
Install dev tools,
sudo apt-get -f install apertium-all-dev
About the Debian repository install
Check that the script installed the Apertium repository details,
apt-cache policy | grep apertium
Unfortunately, due to the seamless upgrading of Debian packaging, it is difficult to see which packages the new repository has added, and where. Even Synaptic, the wonder GUI, offers no way to find out. You could try this brute-force command line,
find /var/lib/apt/lists/ | grep 'projectjj.*Packages' | xargs grep -h Package
This will, if nothing else, tell you a lot about the byways of the Apertium project.
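If the full output of that command is too noisy, a shorter listing of package names may be enough. The following is only a sketch: the function name list_repo_packages is made up for this example, and it assumes the repository's list files are stored uncompressed with names that contain "projectjj" and end in "Packages".

```shell
# Hypothetical helper: print the unique package names found in the
# projectjj list files under a directory (default: /var/lib/apt/lists).
list_repo_packages() {
  dir="${1:-/var/lib/apt/lists}"
  # Only the "Package: name" lines are kept; the prefix is then stripped.
  find "$dir" -type f -name '*projectjj*Packages' \
      -exec grep -h '^Package: ' {} + 2>/dev/null \
    | sed 's/^Package: //' \
    | sort -u
}

list_repo_packages
```

On a machine where the repository has not been added, the call simply prints nothing.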
- Constraint grammar
To use CG we must have lttoolbox (we have it), apertium (we have it too) and ICU (we have to install it now).
To install ICU on Ubuntu, open a terminal and copy/paste this command:
sudo apt-get install libicu-dev
Now we can install apertium, lttoolbox and CG.
Install
- Apertium
Before installing apertium we have to install lttoolbox (which was downloaded at the same time as apertium). To do that, copy/paste these commands:
cd apertium
cd lttoolbox/
PKG_CONFIG_PATH=/usr/local/lib/pkgconfig ./autogen.sh
make
sudo make install
sudo ldconfig
The terminal will ask for your password again: [sudo] password for user: Type it and press Enter.
Wait until the terminal shows the prompt user@ubuntu:~/apertium/lttoolbox$, then copy/paste these commands:
cd ..
cd apertium/
PKG_CONFIG_PATH=/usr/local/lib/pkgconfig ./autogen.sh
make
sudo make install
sudo ldconfig
This will start installing apertium. You have to wait a few minutes. When the terminal shows you
vasil@ubuntu:~/apertium/apertium$ sudo ldconfig
vasil@ubuntu:~/apertium/apertium$
the process is finished.
- Constraint grammar
To install CG, open a terminal and copy/paste these commands:
$ svn co --username anonymous --password anonymous http://beta.visl.sdu.dk/svn/visl/tools/vislcg3/trunk vislcg3
$ cd vislcg3
$ sh autogen.sh --prefix=<prefix>
$ make
$ make install
If <prefix> is a system directory, run the last command as sudo make install. It will then ask for your password: [sudo] password for user: Type it and press Enter.
We are ready.
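Before moving on, it can help to confirm that the tools used in the rest of this page are actually reachable. This is a minimal sketch, assuming the install steps put lt-proc, cg-comp and cg-proc on your PATH; the function name check_tools is made up for this example.

```shell
# Hypothetical helper: report which of the named commands are on the PATH.
# Returns 0 if all were found, 1 if at least one is missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found: $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# The three tools used in the Usage section below.
check_tools lt-proc cg-comp cg-proc || echo "Some tools are missing; check the install steps above."
```

If cg-comp is reported missing but you built vislcg3 yourself, it may still be available as ./cg-comp inside the build directory.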
Usage
For the examples below, we use the language pair apertium-es-ca, but the principles should be applicable to any language pair. First we have to compile this pair. Go into the directory where you installed Apertium, then
cd apertium/apertium-es-ca
sh autogen.sh
make
Let's check that what we installed is working. First copy/paste this command:
echo "vino a la playa" | lt-proc es-ca.automorf.bin
This should give you:
^vino/vino<n><m><sg>/venir<vblex><ifi><p3><sg>$ ^a/a<pr>$ ^la/el<det><def><f><sg>/lo<prn><pro><p3><f><sg>$ ^playa/playa<n><f><sg>$
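In this output each lexical unit sits between ^ and $, and its readings are separated by /. A rough way to see how many readings each word has is sketched below. The function name count_readings is made up for this example, and the sketch assumes that no surface form or lemma contains an escaped / or $.

```shell
# Hypothetical helper: read an lt-proc analysis on stdin and print
# each surface form with its number of readings.
count_readings() {
  # grep -o puts every ^...$ unit on its own line;
  # awk splits on "/": field 1 is "^surface", the rest are readings.
  grep -o '\^[^$]*\$' \
    | awk -F/ '{ printf "%s: %d\n", substr($1, 2), NF - 1 }'
}

analysis='^vino/vino<n><m><sg>/venir<vblex><ifi><p3><sg>$ ^a/a<pr>$ ^la/el<det><def><f><sg>/lo<prn><pro><p3><f><sg>$ ^playa/playa<n><f><sg>$'
printf '%s\n' "$analysis" | count_readings
# vino: 2
# a: 1
# la: 2
# playa: 1
```

Any word with a count above 1 is a candidate for a disambiguation rule.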
Here we have two ambiguities: one between a noun and a verb, and the other between a determiner and a pronoun. We can write rules that choose between the readings in each case. First we define our categories; these can be tags, wordforms or lemmas. It might help to think of them as "coarse tags", which may involve a set of fine tags or lemmas. So, create a file grammar.txt, and add the following text:
DELIMITERS = "<$.>" ;
LIST NOUN = n;
LIST VERB = vblex;
LIST DET = det;
LIST PRN = prn;
LIST PREP = pr;
SECTION
The first rule states: "When the current lexical unit can be a pronoun or a determiner, and it is followed on the right by a lexical unit which could be a noun, choose the determiner". We have to add this rule to the file and compile it using cg-comp:
rule:
# 1
SELECT DET IF
    (0 DET)
    (0 PRN)
    (1 NOUN) ;
compile with:
$ ./cg-comp grammar.txt grammar.bin
Sections: 1, Rules: 1, Sets: 6, Tags: 7
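The steps so far (definitions, first rule, compilation) can also be collected into one script. This is only a sketch: it assumes cg-comp is on your PATH rather than in the current directory, and it skips compilation with a message if the tool is not found.

```shell
# Write the definitions and rule 1 from this section to grammar.txt.
# The quoted 'EOF' keeps every character literal, so the shell never
# tries to expand anything inside the grammar.
cat > grammar.txt <<'EOF'
DELIMITERS = "<$.>" ;

LIST NOUN = n;
LIST VERB = vblex;
LIST DET = det;
LIST PRN = prn;
LIST PREP = pr;

SECTION

# 1
SELECT DET IF
    (0 DET)
    (0 PRN)
    (1 NOUN) ;
EOF

# Compile only if cg-comp can be found (assumption: it is on the PATH).
if command -v cg-comp >/dev/null 2>&1; then
  cg-comp grammar.txt grammar.bin
else
  echo "cg-comp not found; grammar.txt written but not compiled" >&2
fi
```

Keeping the grammar in a script like this makes it easy to regenerate and recompile after each new rule.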
To try what we have done, copy/paste this command:
$ echo "vino a la playa" | lt-proc es-ca.automorf.bin | cg-proc grammar.bin
^vino/vino<n><m><sg>/venir<vblex><ifi><p3><sg>$ ^a/a<pr>$ ^la/el<det><def><f><sg>$ ^playa/playa<n><f><sg>$
The second rule states: "When the current lexical unit can be a noun or a verb, if the subsequent two units to the right are preposition and determiner, remove the noun reading." Now we have to add this rule:
rule:
# 2
REMOVE NOUN IF
    (0 NOUN)
    (0 VERB)
    (1 PREP)
    (2 DET) ;
re-compile the grammar and test:
$ echo "vino a la playa" | lt-proc es-ca.automorf.bin | cg-proc grammar.bin
^vino/venir<vblex><ifi><p3><sg>$ ^a/a<pr>$ ^la/el<det><def><f><sg>$ ^playa/playa<n><f><sg>$
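To check what the two rules changed, the raw analysis can be compared with the disambiguated output. The sketch below works on the two strings shown on this page, so it does not need Apertium to run. The function names readings and removed_readings are made up for this example, and the same assumption applies as before: no escaped / or $ in the stream.

```shell
# Hypothetical helper: print one "surface<TAB>reading" line per reading.
readings() {
  printf '%s\n' "$1" \
    | grep -o '\^[^$]*\$' \
    | tr -d '^$' \
    | awk -F/ '{ for (i = 2; i <= NF; i++) print $1 "\t" $i }'
}

# Hypothetical helper: print readings present in $1 but absent from $2.
removed_readings() {
  # -F fixed strings, -x whole-line match, -v invert the match.
  readings "$1" | grep -Fxv "$(readings "$2")"
}

before='^vino/vino<n><m><sg>/venir<vblex><ifi><p3><sg>$ ^a/a<pr>$ ^la/el<det><def><f><sg>/lo<prn><pro><p3><f><sg>$ ^playa/playa<n><f><sg>$'
after='^vino/venir<vblex><ifi><p3><sg>$ ^a/a<pr>$ ^la/el<det><def><f><sg>$ ^playa/playa<n><f><sg>$'

removed_readings "$before" "$after"
```

For these two strings it prints the noun reading of vino and the pronoun reading of la, which are the readings removed by rules 2 and 1 respectively.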
The third rule states: "Remove interjection if the preceding word is a modal verb."
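The page does not give the grammar text for this rule, so the following is only a guess at how it might look, written in the same way as rules 1 and 2. The set names IJ and MODAL and the file name rule3-sketch.txt are made up for this example, and the tags ij and vbmod are assumed to be the interjection and modal verb tags in the language pair's dictionary.

```shell
# Write a standalone sketch of rule 3 to its own file, so that the
# grammar.txt built earlier is left untouched.
cat > rule3-sketch.txt <<'EOF'
DELIMITERS = "<$.>" ;

LIST IJ = ij;        # assumed interjection tag
LIST MODAL = vbmod;  # assumed modal verb tag

SECTION

# 3 (sketch): remove the interjection reading when the unit
# immediately to the left can be a modal verb.
REMOVE IJ IF
    (-1 MODAL) ;
EOF

# Compile only if cg-comp can be found (assumption: it is on the PATH).
if command -v cg-comp >/dev/null 2>&1; then
  cg-comp rule3-sketch.txt rule3-sketch.bin
else
  echo "cg-comp not found; rule3-sketch.txt written but not compiled" >&2
fi
```

The position -1 refers to the unit immediately to the left, mirroring the way 1 and 2 refer to units on the right in the earlier rules.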