Difference between revisions of "User:Jimregan"
		
		
		
		
		
		
| m (Enriched Corpus of the Frequency Dictionary/A Grammar of the Polish Language) | m (anaphora) | ||
| (27 intermediate revisions by 4 users not shown) | |||
| Line 1: | Line 1: | ||
| {{TOCD}} | |||
| [[User:Jimregan/apertium-en-pl.pl.dix|Polish monodix]] | |||
| [[IRC]] nick: jimregan | |||
| [[User:Jimregan/apertium-en-pl.en-pl.dix|Polish-English monodix]] | |||
| Melange link_id: jimregan | |||
| == One-liners == | |||
| ==Polish-English texts under free licences== | |||
| <pre> | |||
| {{see-also|Corpora}} | |||
| filtered-expand () { if [ $1 = "rl" ]; then dir=":<:"; else dir=":>:";fi; lt-expand $2 |grep -v __REGEXP__|grep -v '^:<:'|grep -v ':>:$' |grep -v $dir ; } | |||
| *[http://www.oreilly.com/openbook/freedom/index.html Free As In Freedom] - [http://stallman.helion.pl/ W obronie wolności] ("In the Defense of Freedom") | |||
| select-expand () { if [ $1 = "lr" ]; then dir=":<:"; else dir=":>:";fi; lt-expand $2 |grep -v __REGEXP__|grep -v '^:<:'|grep -v ':>:$' |grep -v $dir ; } | |||
| *[http://www.gutenberg.org/dirs/etext04/lchch10.txt Chess and Checkers: the Way to Mastership] - [http://www.gutenberg.org/files/15201/15201-8.txt Szachy i Warcaby: Droga do mistrzostwa] | |||
| *[http://en.wikisource.org/wiki/The_Tragedy_of_Romeo_and_Juliet The Tragedy of Romeo and Juliet] - [http://pl.wikisource.org/wiki/Romeo_i_Julia Romeo i Julia] | |||
| *[http://en.wikisource.org/wiki/Robinson_Crusoe Robinson Crusoe] - [http://pl.wikisource.org/wiki/Robinson_Cruzoe Przypadki Robinsona Cruzoe] | |||
| **Wikisource has a mechanism where they ''try'' to present automatic bilingual editions of any works they have: see [http://pl.wikisource.org/wiki/Robinson_Cruzoe?match=en Robinson Crusoe] for example. Unfortunately, it doesn't work, as different choices have been made in the laying out of different language editions. But it looks interesting. | |||
| list-multiple () { select-expand $1 $2 | awk -F':|:<:|:>:' -v dir="$1" '{ if (dir == "lr") print "^" $1 "$"; else print "^" $2 "$" }' | lt-proc -b $3 | awk -F/ '(NF > 2) { print $0 }' ; } | |||
| ==Polish texts under free licences== | |||
| </pre> | |||
| *[http://www.mimuw.edu.pl/polszczyzna/ Enriched Corpus of the Frequency Dictionary] - Monolingual corpus of Polish. Manually tagged. | |||
| == Anaphora resolution == | |||
| ⚫ | |||
| *[http://free.of.pl/g/grzegorj/gram/gram00.html A Grammar of the Polish Language] by Grzegorz Jagodziński | |||
| <pre> | |||
| [00:17]  <jimregan> anaphora is one of those polarising things about MT | |||
| [00:18]  <jimregan> RBMT is like a 1920s man's man: a man's a man, even if he's a woman | |||
| [00:18]  <jimregan> SMT is like a Thai prostitute: sometimes it's a man, sometimes it's a woman, sometimes it's both | |||
| </pre> | |||
| == Random IRC == | |||
| <pre> | |||
| <jimregan>      pl->cs adjectives up 17% | |||
| <jimregan>      like they friggin' shares or somthing | |||
| <jimregan>      *they're | |||
| <jimregan>      *something | |||
| <jimregan>      damn | |||
| <spectie>       haha | |||
| <Kanmuri>       "In market news today, PLCS ADJ was up 17%, while JR TYPNG was down 25%" ;D | |||
| </pre> | |||
| == On spectie and questions... == | |||
| <pre> | |||
| <jimregan2> still though | |||
| <jimregan2> if the question ever comes about how to shoot your own leg off, you'd happily discuss aiming techniques | |||
| <spectie> haha | |||
| <spectie> ...or chop of your own legs with an axe while sitting in a wheelchair... | |||
| <jimregan2> Fuck! You've /thought/ about it! | |||
| <spectie> http://jayg123.googlepages.com/bestexitinterviewever | |||
| <spectie> LOL | |||
| </pre> | |||
| ⚫ | |||
| *[http://www.mimuw.edu.pl/~jsbien/BW/SSSP/SSSP.tex Polish-Swahili] | |||
| ==Untranslatable==  | |||
|  - Wczoraj, bandyta napadł mię (Yesterday, a bandit attacked me) | |||
|  - Co się stało? (What happened?) | |||
|  - Mówił pieniądze albo śmierć (He said "money or death") | |||
|  - A co zrobiłeś? (What did you do?) | |||
|  - Ale śmierdziałem! (Oh, but I stank) | |||
| (Voiced consonants in Polish become devoiced at the end of words, so "śmierdź" and "śmierć" sound the same.) | |||
| [[Category:Users|Jimregan]] | [[Category:Users|Jimregan]] | ||
Latest revision as of 23:02, 28 March 2012
IRC nick: jimregan
Melange link_id: jimregan
One-liners
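# Expand dictionary $2; drop regex entries and malformed lines, then drop ":<:" lines if $1 is "rl", otherwise ":>:" lines.
filtered-expand () { if [ "$1" = "rl" ]; then dir=":<:"; else dir=":>:"; fi; lt-expand "$2" | grep -v __REGEXP__ | grep -v '^:<:' | grep -v ':>:$' | grep -v "$dir"; }
# Same filter chain, but drop ":<:" lines if $1 is "lr", otherwise ":>:" lines.
select-expand () { if [ "$1" = "lr" ]; then dir=":<:"; else dir=":>:"; fi; lt-expand "$2" | grep -v __REGEXP__ | grep -v '^:<:' | grep -v ':>:$' | grep -v "$dir"; }
# Take one side of each entry from dictionary $2 (left for "lr", otherwise right), run it through lt-proc -b with $3, and print lines with more than two "/"-separated fields.
list-multiple () { select-expand "$1" "$2" | awk -F':|:<:|:>:' -v dir="$1" '{ if (dir == "lr") print "^" $1 "$"; else print "^" $2 "$" }' | lt-proc -b "$3" | awk -F/ '(NF > 2) { print $0 }'; }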
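The grep chain can be tried without an Apertium install. The following is only a rough sketch: it assumes lt-expand prints one entry per line as surface:analysis, with :>: or :<: in place of the plain colon for direction-restricted entries. The four sample entries and the names sample_expand and select_sample are made up for illustration and are not part of the page's one-liners.

```shell
# Hypothetical stand-in for `lt-expand some.dix`: four made-up entries.
sample_expand () {
  printf '%s\n' \
    'cat:cat<n><sg>' \
    'colour:>:color<n><sg>' \
    'color:<:colour<n><sg>' \
    '__REGEXP__[0-9]+:__REGEXP__[0-9]+<num>'
}

# Same grep chain as select-expand, reading the sample instead of lt-expand.
select_sample () {
  if [ "$1" = "lr" ]; then dir=":<:"; else dir=":>:"; fi
  sample_expand | grep -v __REGEXP__ | grep -v '^:<:' | grep -v ':>:$' | grep -v "$dir"
}

select_sample lr   # keeps the plain entry and the ":>:" entry
select_sample rl   # keeps the plain entry and the ":<:" entry
```

Swapping sample_expand for a real lt-expand call on a dictionary file gives back the behaviour of select-expand.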
Anaphora resolution
[00:17] <jimregan> anaphora is one of those polarising things about MT
[00:18] <jimregan> RBMT is like a 1920s man's man: a man's a man, even if he's a woman
[00:18] <jimregan> SMT is like a Thai prostitute: sometimes it's a man, sometimes it's a woman, sometimes it's both
Random IRC
<jimregan> pl->cs adjectives up 17%
<jimregan> like they friggin' shares or somthing
<jimregan> *they're
<jimregan> *something
<jimregan> damn
<spectie> haha
<Kanmuri> "In market news today, PLCS ADJ was up 17%, while JR TYPNG was down 25%" ;D
On spectie and questions...
<jimregan2> still though
<jimregan2> if the question ever comes about how to shoot your own leg off, you'd happily discuss aiming techniques
<spectie> haha
<spectie> ...or chop of your own legs with an axe while sitting in a wheelchair...
<jimregan2> Fuck! You've /thought/ about it!
<spectie> http://jayg123.googlepages.com/bestexitinterviewever
<spectie> LOL
Polish dictionaries
Untranslatable
 - Wczoraj, bandyta napadł mię (Yesterday, a bandit attacked me)
 - Co się stało? (What happened?)
 - Mówił pieniądze albo śmierć (He said "money or death")
 - A co zrobiłeś? (What did you do?)
 - Ale śmierdziałem! (Oh, but I stank)
(Voiced consonants in Polish are devoiced at the end of words, so "śmierdź" ("stink!") and "śmierć" ("death") sound the same.)

