Asturian
Revision as of 16:06, 11 June 2008
Resources
- Asturian Wiktionary: 120 nouns with their genders and plural forms
  - Retrieved all pages, converted them into speling format, and derived paradigms.
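- Asturian Wikipedia: feminine nouns extracted from a corpus of Wikipedia text (ast.crp.txt) by collecting the words that follow the articles la (singular) and les (plural), then matching them against paradigms from the existing apertium-es-ast dictionary with extract: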
```shell
# Split the corpus so that each line starts with a determiner.
# (GNU sed: \b is a word boundary, so the "el" in "Miguel" is not split;
#  \n in the replacement inserts a newline.)
sed -E 's/\b(el|la|lo|las|les|los) /\n\1 /g' ast.crp.txt > ast.dets.txt

# Keep only the lines that start with a determiner
grep -E '^(el|la|lo|les|las|los) ' ast.dets.txt > dets.txt

# Lines starting with the feminine plural article "les", followed by one word (hopefully a noun)
grep '^les ' dets.txt | grep -v 'les y' | cut -f1,2 -d' ' | sort -u > det.les.txt

# Lines starting with the feminine singular article "la", followed by one word (hopefully a noun).
# The trailing space in '^la ' stops the pattern from also matching "las ..." lines.
grep '^la ' dets.txt | grep -v 'la súa' | cut -f1,2 -d' ' | sort -u > det.la.txt

# Combine the two previous files
cat det.la.txt det.les.txt > det.la_les.txt

# Get extract-style paradigms from the existing dictionary
python /home/fran/scripts/apertium2extract.py /home/fran/svnroot/apertium/trunk/incubator/apertium-es-ast.ast.dix > EXT.PDMS.FEM.TXT

# Apply extract to the word list, which (hopefully) contains only singular and plural feminine nouns
extract -nobad -utf8 -e -u -id EXT.PDMS.FEM.TXT det.la_les.txt | awk -F' ' '{print $2"; "$1"; "$3}' | sort -u > extract.la_les.out.txt

# Keep the lines where both the singular and the plural were found
grep ',' extract.la_les.out.txt > EX.txt

# Re-organise the columns (printed to standard output)
sed 's/;/\t/g' EX.txt | awk '{print $3"; "$2"; "$1}' | sed 's/;/\t\t\t/g'
```
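The first steps of the pipeline above (split the text at articles, then keep the article plus the following word) can also be done in a single awk pass over whitespace-separated tokens. This avoids the problem of a plain substring sed such as `s/el /\nel /g` firing inside words like "Miguel". A minimal sketch, assuming plain-text input; the function name `det_pairs` is hypothetical, and like the commands above it only looks at lowercase articles:

```shell
#!/bin/sh
# Hypothetical helper (not part of the original workflow): print
# "article word" for every lowercase article followed by another token.
det_pairs() {
  awk '{
    for (i = 1; i < NF; i++) {
      if ($i ~ /^(el|la|lo|las|les|los)$/) {
        w = $(i + 1)
        sub(/[.,;:!?)]+$/, "", w)   # drop trailing punctuation from the word
        if (w != "") print $i, w
      }
    }
  }'
}

# Example: "Miguel" is a single token, so it is never split at "el".
printf 'Vi la casa de Miguel el pintor.\n' | det_pairs
```

This prints `la casa` and `el pintor`; the output can then be filtered by article and deduplicated with `sort -u` as in the pipeline.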
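To see what the la/les pairing is after, here is a much cruder stand-in for the extract step: instead of matching full paradigms, it applies a single suffix rule (feminine -a becomes -es, as in casa/cases) and keeps only the singulars whose plural was also seen after "les". This is a swapped-in heuristic for illustration, not what extract does; `pair_la_les` is a hypothetical name, and the input is assumed to be "article word" lines like those in det.la_les.txt:

```shell
#!/bin/sh
# Hypothetical stand-in for extract: one suffix rule (-a -> -es) instead of
# real paradigms. Reads "article word" lines and prints "singular<TAB>plural".
pair_la_les() {
  awk '
    $1 == "la"  { sg[$2] = 1 }   # words seen after the singular article
    $1 == "les" { pl[$2] = 1 }   # words seen after the plural article
    END {
      for (w in sg) {
        if (w ~ /a$/) {
          p = substr(w, 1, length(w) - 1) "es"
          if (p in pl) print w "\t" p
        }
      }
    }
  ' | sort
}

# Example: only casa/cases is paired.
printf 'la casa\nles cases\nla vaca\nles vaques\nla xente\n' | pair_la_les
```

The pair vaca/vaques is missed because the spelling changes (c to qu) before -es; handling such alternations is the reason for matching against paradigms from the dictionary rather than a single rule.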
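For the Wiktionary resource listed above, the conversion of noun, gender and plural into speling format could look roughly like this. A minimal sketch under two assumptions not taken from this page: the nouns were first saved as `singular gender plural` lines, and a speling line reads `lemma; form; tags; pos` with dot-joined tags. The function name `to_speling` is hypothetical:

```shell
#!/bin/sh
# Hypothetical converter. ASSUMED input: "singular gender plural" per line.
# ASSUMED output layout: "lemma; form; tags; pos", tags joined with dots.
to_speling() {
  awk '
    NF == 3 {
      print $1 "; " $1 "; " $2 ".sg; n"   # singular form
      print $1 "; " $3 "; " $2 ".pl; n"   # plural form
    }
  '
}

# Example: one feminine and one masculine noun.
printf 'casa f cases\nllibru m llibros\n' | to_speling
```

Each noun yields two lines, one per number, which is the shape from which a paradigm per plural pattern can then be derived.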