Apertium-ara-heb

From Apertium

Latest revision as of 09:23, 20 April 2015

Resources

* http://babelnet.org/search?word=قال&lang=AR&langTrans=HE is NC (non-commercial licence)
* http://compling.hss.ntu.edu.sg/omw/: the Hebrew and Arabic wordnets are licensed GPL and CC-BY-SA, respectively.

Using http://compling.hss.ntu.edu.sg/omw/wns/arb.zip and http://compling.hss.ntu.edu.sg/omw/wns/heb.zip, we can match up several thousand Arabic-Hebrew lemma pairs (lemmas listed under the same synset ID):

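$ get () { awk -vpos=$1 -F'\t' '$1~"-"pos"$" && $2=="lemma"' | sort; }   # lemma rows whose synset ID ends in -POS, sorted for join
for pos in n v a r; do
  echo -n "$pos "
  # join on column 1 (synset ID) and count the resulting Arabic-Hebrew lemma pairs
  join -j1 -t$'\t' <(get $pos < arb/wn-data-arb.tab) <(get $pos < heb/wn-data-heb.tab) | wc -l
done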
n 4687
v 1316
a 115
r 82
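The counts above come from `join`, which prints one line for every pairing of an Arabic and a Hebrew lemma that share a synset ID, so a synset with several lemmas on either side contributes more than one line. A minimal sketch for inspecting the pairs themselves, assuming the same three-column tab layout (synset ID, `lemma`, word) that `get` filters on; the `pairs` function name is hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical helper: list the matched Arabic/Hebrew lemma pairs instead of counting them.
# Assumes rows of the form: synset-ID <TAB> lemma <TAB> word (the rows get keeps).

# Same filter as above: lemma rows whose synset ID ends in "-POS", sorted for join.
get () { awk -v pos="$1" -F'\t' '$1 ~ "-"pos"$" && $2 == "lemma"' | sort; }

# pairs POS ARABIC_TAB HEBREW_TAB
# join prints: synset ID, "lemma", Arabic word, "lemma", Hebrew word;
# cut keeps only the synset ID, the Arabic word and the Hebrew word.
pairs () {
  join -j1 -t$'\t' <(get "$1" < "$2") <(get "$1" < "$3") | cut -f1,3,5
}
```

For example, `pairs n arb/wn-data-arb.tab heb/wn-data-heb.tab > ara-heb.n.tsv` writes the candidate noun pairs (the output file name is only an example), and `cut -f1 ara-heb.n.tsv | sort -u | wc -l` counts how many distinct synsets they come from.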

(We may have to strip the pronunciation diacritics (niqqud) off the Hebrew lemmas; that should be scriptable.)
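One way this could be scripted, as a sketch: Hebrew points and cantillation marks are combining characters in the Unicode range U+0591 to U+05C7, which UTF-8 encodes as two-byte sequences starting with 0xD6 or 0xD7. The hypothetical `strip_niqqud` filter below deletes exactly those byte pairs with `sed` in the C locale and keeps the letters and the non-mark characters in that range (maqaf U+05BE, paseq U+05C0, sof pasuq U+05C3, nun hafukha U+05C6). It assumes the input is UTF-8.

```bash
#!/usr/bin/env bash
# Hypothetical filter: strip Hebrew points and cantillation marks from UTF-8 text.
# Removed (combining marks): U+0591-U+05BD, U+05BF, U+05C1, U+05C2, U+05C4, U+05C5, U+05C7.
# Kept: letters, plus maqaf U+05BE, paseq U+05C0, sof pasuq U+05C3, nun hafukha U+05C6.
strip_niqqud () {
  # LC_ALL=C makes sed match raw bytes; bash's $'...' turns each \xHH into that byte.
  #   U+0591-U+05BD, U+05BF = 0xD6 followed by 0x91-0xBD or 0xBF
  #   U+05C1/2/4/5/7        = 0xD7 followed by 0x81, 0x82, 0x84, 0x85 or 0x87
  LC_ALL=C sed $'s/\xd6[\x91-\xbd\xbf]//g; s/\xd7[\x81\x82\x84\x85\x87]//g'
}
```

For example, `strip_niqqud < heb/wn-data-heb.tab > heb/wn-data-heb.plain.tab` would write an unpointed copy (the output name is hypothetical) that can be used in place of the original Hebrew file in the matching script. Matching byte sequences rather than characters keeps the filter independent of which locales are installed.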