Apertium-ara-heb
Latest revision as of 09:23, 20 April 2015
Resources
- http://www.mideastweb.org/arabic_hebrew_eng.htm tiny ara-heb-eng dictionary
- https://en.wiktionary.org/wiki/Appendix:Afro-Asiatic_Swadesh_lists why not
- http://www.morim.com/shemi.pdf some proper nouns transliterated
- https://glosbe.com/he/ar/ translation memories and dictionaries
  - glosbe uses mostly open-source stuff … can we use what they use?
- http://web.archive.org/web/20000816051011/http://www.arava.org/Info_Center/Glossary/glossary.html glossary of environmental terms eng-heb-ara
- http://starling.rinet.ru/cgi-bin/response.cgi?root=config&morpho=0&basename=\data\semham\semet&first=1 big and scary database of Semitic etymology
  - http://www.semiticroots.net/index.php?r=root/view&id=1 same thing maybe?
- http://babelnet.org/search?word=قال&lang=AR&langTrans=HE is under a non-commercial (NC) licence
- http://compling.hss.ntu.edu.sg/omw/ (Open Multilingual Wordnet): the Hebrew and Arabic wordnets are GPL and CC-BY-SA, respectively
Using http://compling.hss.ntu.edu.sg/omw/wns/arb.zip and http://compling.hss.ntu.edu.sg/omw/wns/heb.zip we can match up several thousand entries (Arabic/Hebrew lemma pairs that share a synset ID):
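```shell
$ # get POS: the "lemma" rows for one part of speech (n, v, a, r), sorted for join
$ get () { awk -vpos=$1 -F'\t' '$1~"-"pos"$" && $2=="lemma"' | sort; }
$ # count Arabic/Hebrew lemma pairs that share a synset ID, per part of speech
$ for pos in n v a r; do
    echo -n "$pos "
    join -j1 -t$'\t' <(get $pos < arb/wn-data-arb.tab) <(get $pos < heb/wn-data-heb.tab) | wc -l
  done
n 4687
v 1316
a 115
r 82
```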
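The loop above only counts matches. To get something that could go into the bilingual dictionary, the same join can print the paired lemmas as candidate Apertium `.dix` entries. This is a minimal sketch under several assumptions: `wn_lemmas`, `apertium_tag` and `wn_bidix` are hypothetical helper names; the mapping from WordNet POS letters to Apertium tags (n→n, v→vblex, a→adj, r→adv) is our guess; column 2 of the tab files may be plain `lemma` (as the command above expects) or language-prefixed like `arb:lemma`, so both are accepted; and `sort` and `join` both run with `LC_ALL=C` so they agree on ordering.

```shell
# Hypothetical helpers (names are ours, not part of the wordnet or Apertium tooling).

# wn_lemmas POS < FILE: print "synset<TAB>lemma" for one WordNet POS (n, v, a, r),
# sorted on the synset ID for join(1). Column 2 may be plain "lemma" (as in the
# command above) or language-prefixed ("arb:lemma"); both are accepted.
wn_lemmas () {
  awk -v pos="$1" -F'\t' '$1 ~ ("-" pos "$") && ($2 == "lemma" || $2 ~ /:lemma$/) { print $1 "\t" $3 }' |
    LC_ALL=C sort -t$'\t' -k1,1
}

# apertium_tag POS: assumed mapping from WordNet POS letters to Apertium tags.
apertium_tag () {
  case $1 in
    n) echo n ;;
    v) echo vblex ;;
    a) echo adj ;;
    r) echo adv ;;
  esac
}

# wn_bidix ARB.tab HEB.tab: one candidate <e> entry per Arabic/Hebrew lemma pair
# sharing a synset; a synset with several lemmas per side yields every combination.
wn_bidix () {
  local arb=$1 heb=$2 pos
  for pos in n v a r; do
    LC_ALL=C join -t$'\t' -j 1 \
        <(wn_lemmas "$pos" < "$arb") <(wn_lemmas "$pos" < "$heb") |
      awk -F'\t' -v t="$(apertium_tag "$pos")" '{
        l = $2; r = $3
        gsub(/ /, "<b/>", l); gsub(/ /, "<b/>", r)  # blanks in multiword lemmas become <b/>
        printf "<e><p><l>%s<s n=\"%s\"/></l><r>%s<s n=\"%s\"/></r></p></e>\n", l, t, r, t
      }'
  done | LC_ALL=C sort -u
}
```

Running `wn_bidix arb/wn-data-arb.tab heb/wn-data-heb.tab` should print one `<e>` line per lemma pair. Because every Arabic lemma of a synset is paired with every Hebrew lemma of it, the output is a list of candidates to review, not a finished bidix section.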
(We may have to strip the pronunciation diacritics (niqqud) off the Hebrew first; that should be scriptable.)