Difference between revisions of "User:Unhammer"
Latest revision as of 18:11, 12 December 2022
I am Kevin Brubeck Unhammer.
In Apertium, I work on
- Northern Sámi and Norwegian (which uses HFST for analysis),
- Norwegian Nynorsk and Norwegian Bokmål, originally with support from GSoC (see project application), lately as part of NTB/NPK's Nynorskroboten (https://www.nm.no/app/uploads/2020/03/nt-02-19.pdf#page=17),
- Lt-trim and Emacs modes,
- Apy,
- the build system / autotools,
- and have dabbled with Faroese and English (using Matxin and CG), Georgian, Dutch, Maltese, …
I've studied computational linguistics / NLP at the University of Bergen, developed grammar checkers and Norwegian WordNets for Kaldera språkteknologi AS, and worked on Saami grammar checking, machine translation and corpus crawling for the University of Tromsø.
Me on the web:
- IRC nick: unhammer (most likely to reply between 8 and 15 CET)
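- Google/GSoC/GCI link_id: unhammer
- home page: https://unhammer.org/k
- dusty old blog: https://unhammer.wordpress.com
- GitHub: https://github.com/unhammer
- nn.wikipedia page: https://nn.wikipedia.org/wiki/Brukar:Unhammer
- email me
  - PGP/GPG key: https://unhammer.org/766AC60C_unhammer@mm.st.asc.txt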
I have an Apertium wishlist (User:Unhammer/wishlist).
♪ Unhaaamer Unhaaamer, He beat the Hun by luck.
Unhaaamer Unhaaamer, he's smarter than a duck ♪
Quotes
They've a temper, some of them—particularly verbs: they're the proudest
—adjectives you can do anything with, but not verbs—however, I can
manage the whole lot of them! Impenetrability! That's what I say!
— Humpty Dumpty
<Unhammer> Every time I start working on a new Apertium lang. pair, I get water damage in my apartment.
<spectie> Unhammer, are you sure you want to start working on ht-en
<spectie> what with your precarious plumbing situation ?
...
<Unhammer> back to sme-nob, hopefully averting more water damage
<Claude_Royet-Journoud> Une liste d'infinitifs prolonge l'accident.
(A list of infinitives prolongs the accident.)
But how powerful, how stimulating to the very faculty that produced it,
was the invention of the adjective: no spell or incantation in Faerie is more potent.
–J.R.R. Tolkien
<miri> now the internet is back
<miri> but there's no water in my building
<miri> I hope there is no correlation ;)
<Unhammer> [-#Ipmil-] {+Ipmil+}
<Unhammer> blasphemy
the warm soft short pants of the quick-scribbler: the vocative lapse from which it begins and the accusative hole in which it ends itself
– JJ
There are a number of languages spoken by human beings in this world.
– Harald Tveit Alvestrand, in RFC 1766, "Tags for the Identification of Languages"
IRC looks much better with some rage (http://www.vidarholen.net/contents/rage/).
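Er man nihilistisk nok, kunne man også ta det uskyldigste av alle ord, infinitivsmerket, og misbruke og skjende det på denne måte: Det begynte «å» regne. Kan man se på verden med mindre begeistring?
(If one is nihilistic enough, one could also take the most innocent of all words, the infinitive marker, and misuse and defile it in this way: It began «to» rain. Can one look at the world with less enthusiasm?)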
–Bjørneboe
Compounding is fun
lemurtvillingene: lem|urt|villingene
nyrestaurert: nyre|staur|ert
angrepsoppstillinger: angrep|sopp|stillinger
snusleverandør: snus|leve|rand|ør
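Splits like these fall out of any decompounder that happily accepts every way of tiling a word with dictionary entries. Below is a minimal Python sketch of that naive search, assuming a tiny hypothetical lexicon (not an Apertium dictionary) that is just big enough to reproduce two of the splits above; `segmentations` is an illustrative name, not part of any Apertium tool.

```python
from functools import lru_cache

# Hypothetical toy lexicon; a real analyser would use a full dictionary
# plus restrictions on which forms may occur inside compounds.
LEXICON = frozenset({
    "lem", "lemur", "urt", "tvillingene", "villingene",
    "nyre", "ny", "staur", "ert", "restaurert",
})


def segmentations(word: str) -> list[list[str]]:
    """Return every way to split `word` into a sequence of lexicon entries."""

    @lru_cache(maxsize=None)
    def split_from(start: int) -> tuple[tuple[str, ...], ...]:
        if start == len(word):
            return ((),)  # exactly one way to split the empty remainder
        results = []
        for end in range(start + 1, len(word) + 1):
            part = word[start:end]
            if part in LEXICON:
                # Combine this part with every split of the rest of the word.
                for rest in split_from(end):
                    results.append((part,) + rest)
        return tuple(results)

    return [list(parts) for parts in split_from(0)]


if __name__ == "__main__":
    for w in ("lemurtvillingene", "nyrestaurert"):
        print(w, [("|".join(s)) for s in segmentations(w)])
```

Both the intended split (lemur|tvillingene, ny|restaurert) and the silly one come out; something beyond the bare lexicon (compound-form restrictions, weights, or a preference for fewer parts) is needed to rank them.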
$ echo bildreportagen |apertium -d . swe-dan
billede #rids mugmide tagene
^einannan/ein<n><m><sg><ind><cmp>+nan<n><m><sg><ind><cmp>+nan<n><m><sg><ind>$
…
$ echo nannannannannan|apertium -d . nno-nob-morph
^nannannannannan/nan<n><m><sg><ind><cmp>+nan<n><m><sg><ind><cmp>+nan<n><m><sg><ind><cmp>+nan<n><m><sg><ind><cmp>+nan<n><m><sg><ind>$^./.<sent><clb>$
$ echo fornybart|apertium -d . nob-dan
fodernyoverskæg
Let's try compounding on verb+noun:
$ echo regionspresident | apertium -d . nob-nno_e
region spreie seie dent
noun+verb:
$ echo forringelse | apertium -d . nob-nno_e
for ringel sjå
noun+verb+adj:
$ echo autoritært|lt-proc -we nob-dan.automorf.bin
^autoritært/auto<n><mf><sp><cmp>+ri<vblex><inf><cmp>+tære<adj><pp><pl>/auto<n><mf><sp><cmp>+ri<vblex><inf><cmp>+tære<adj><pp><nt><sg><ind>/auto<n><mf><sp><cmp>+ri<vblex><inf><cmp>+tære<adj><pp><mf><sg><ind>$
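The analyses above are in Apertium's stream format: each lexical unit is `^surface/analysis1/analysis2$`, alternative analyses are separated by `/`, and compound parts are joined by `+`. When hunting for silly splits it helps to pull that apart programmatically. This is a minimal Python sketch under simplifying assumptions: `parse_stream`, `Part` and `LexicalUnit` are hypothetical names, and escaped characters, superblanks and multiword lemma queues are deliberately not handled.

```python
import re
from dataclasses import dataclass

# A lexical unit looks like ^surface/analysis1/analysis2$.
# Simplifications: escaped characters (\^ \$ \/ \< \>), superblanks and
# multiword lemma queues (lemma<tags># queue) are not handled.
LU_RE = re.compile(r"\^([^$]*)\$")
TAG_RE = re.compile(r"<([^>]+)>")


@dataclass
class Part:
    """One compound element, e.g. ri<vblex><inf><cmp>."""
    lemma: str
    tags: list[str]


@dataclass
class LexicalUnit:
    surface: str
    analyses: list[list[Part]]  # each analysis: one or more '+'-joined parts


def parse_part(chunk: str) -> Part:
    lemma = chunk.split("<", 1)[0]  # the lemma is everything before the first tag
    return Part(lemma=lemma, tags=TAG_RE.findall(chunk))


def parse_stream(line: str) -> list[LexicalUnit]:
    """Parse every ^...$ unit in a line of analyser output."""
    units = []
    for match in LU_RE.finditer(line):
        surface, *analyses = match.group(1).split("/")
        units.append(LexicalUnit(
            surface=surface,
            analyses=[[parse_part(c) for c in a.split("+")] for a in analyses],
        ))
    return units


if __name__ == "__main__":
    line = ("^autoritært/auto<n><mf><sp><cmp>+ri<vblex><inf><cmp>+tære<adj><pp><pl>"
            "/auto<n><mf><sp><cmp>+ri<vblex><inf><cmp>+tære<adj><pp><nt><sg><ind>$")
    for unit in parse_stream(line):
        for analysis in unit.analyses:
            # Show each proposed split with the tags of its last (head) part.
            print(unit.surface, "|".join(p.lemma for p in analysis), analysis[-1].tags)
```

Run on the autoritært output, every analysis comes back as the same auto|ri|tære split, differing only in the tags on the last part.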
What if we allow a double consonant to become single before the compound border? Then we can analyse compounds where a word ending in a double consonant is followed by a word starting with the same consonant (topp + prøve written as topprøve):
$ echo topprøve|apertium -d . nob-nno_e-morph|cg-conv
"<topprøve>"
	"røve" adj pp pl
		"topp" n m sg ind cmp
	"prøve" n m sg ind
		"topp" n m sg ind cmp detriple
But of course, the analyser's decompounding doesn't know that the second word actually has to start with that same consonant:
$ echo HurtigrutenLive|apertium -d . nob-nno_e
hurr TigruteinLive
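A minimal Python sketch of the double-to-single rule and the missing check, assuming a hypothetical toy lexicon (the entry "tigruten" is there only to trigger the bogus split). It is not how lttoolbox or HFST implement this; `analyse` and `require_same_consonant` are illustrative names. With the check off, lowercase "hurtigruten" gets the bogus hurr|tigruten split next to hurtig|ruten. With the check on, a shortened part is only accepted when the next part starts with the dropped consonant, so topp|prøve survives and hurr|tigruten does not.

```python
from functools import lru_cache

VOWELS = set("aeiouyæøå")

# Hypothetical toy lexicon, only for illustration.
LEXICON = frozenset({"topp", "prøve", "røve", "hurr", "hurtig", "ruten", "tigruten"})


def ends_in_double_consonant(word: str) -> bool:
    return len(word) >= 2 and word[-1] == word[-2] and word[-1] not in VOWELS


def analyse(surface: str, require_same_consonant: bool) -> list[list[str]]:
    """All splits of `surface` into lexicon words, letting a non-final part
    that ends in a double consonant be written with only one of them."""

    @lru_cache(maxsize=None)
    def from_pos(start: int) -> tuple[tuple[str, ...], ...]:
        if start == len(surface):
            return ((),)
        results = []
        for word in sorted(LEXICON):
            # Plain match: the word is spelled out in full.
            if surface.startswith(word, start):
                results.extend((word,) + rest for rest in from_pos(start + len(word)))
            # Shortened match, e.g. "topp" written as "top" before the border.
            if ends_in_double_consonant(word):
                short = word[:-1]
                end = start + len(short)
                if surface.startswith(short, start) and end < len(surface):
                    if require_same_consonant and surface[end] != word[-1]:
                        continue  # the next part must start with the dropped consonant
                    results.extend((word,) + rest for rest in from_pos(end))
        return tuple(results)

    return [list(r) for r in from_pos(0)]


if __name__ == "__main__":
    for flag in (False, True):
        print(flag, analyse("topprøve", flag), analyse("hurtigruten", flag))
```

The same two analyses of topprøve as in the cg-conv output above (topp+røve and the de-tripled topp+prøve) come out either way; only the constrained version drops hurr.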