Talk:Northern Sámi and Norwegian
Revision as of 07:59, 10 February 2010
Transfer strategy
So far I've been thinking this:
- t1x: chunking
- Turn adjectives and nouns into SN chunks and give them the right gender and number
- Derivations into phrases?
- t2x: movement
- Put adpositions in front of SN chunks
- In general move SN chunks around verbs, adverbs etc. to get right word order
- Guess definiteness from word order, case, syntactic function
- t3x: cleanup
- E.g. if definiteness changed, make sure the adjective tags are consistent
- We could also do:
- t1x: light chunking (SN, ...)
- t2x: more chunking (Relatives, subordinate clauses)
- t3x: moving chunks around (word order)
- t4x: cleanup.
- Francis Tyers 18:32, 18 January 2010 (UTC)
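A toy Python model of the first division of labour above, only meant to make the stage boundaries concrete. Everything in it is an assumption for illustration: the `Word`/`Chunk` classes, the tag names (`adj`, `n`, `nt`, `sg`), and the premise that t2x has already decided the SN's definiteness. It does not use Apertium's stream format or transfer-rule XML. t1x builds SN chunks and copies gender and number from the head noun; t3x then makes the adjectives agree with whatever definiteness was decided in between.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Word:
    lemma: str
    pos: str                        # toy tags: "adj", "n", "vblex", ...
    gender: Optional[str] = None    # filled in on nouns by the dictionary
    number: Optional[str] = None
    definite: Optional[bool] = None


@dataclass
class Chunk:
    name: str                       # "SN" for noun phrases, else the word's tag
    words: List[Word]
    gender: Optional[str] = None
    number: Optional[str] = None
    definite: Optional[bool] = None


def t1x_chunk(words: List[Word]) -> List[Chunk]:
    """Stage 1: adj* n -> SN, copying gender and number from the head noun."""
    chunks: List[Chunk] = []
    pending: List[Word] = []        # adjectives still waiting for a head noun
    for w in words:
        if w.pos == "adj":
            pending.append(w)
        elif w.pos == "n":
            chunks.append(Chunk("SN", pending + [w], w.gender, w.number))
            pending = []
        else:
            # adjectives with no head noun stay as one-word chunks
            chunks.extend(Chunk(a.pos, [a]) for a in pending)
            pending = []
            chunks.append(Chunk(w.pos, [w]))
    chunks.extend(Chunk(a.pos, [a]) for a in pending)
    return chunks


def t3x_cleanup(chunks: List[Chunk]) -> List[Chunk]:
    """Stage 3: once an earlier stage has set an SN's definiteness,
    make every adjective inside it carry the same value."""
    for c in chunks:
        if c.name == "SN" and c.definite is not None:
            for w in c.words:
                if w.pos == "adj":
                    w.definite = c.definite
    return chunks


# Toy input: Norwegian "stor hus" ('big house')
chunks = t1x_chunk([Word("stor", "adj"), Word("hus", "n", "nt", "sg")])
chunks[0].definite = True           # stands in for a t2x definiteness guess
t3x_cleanup(chunks)
print(chunks[0].words[0].definite)  # True
```

In this sketch, keeping gender, number and definiteness on the chunk (not only on its words) is what lets the later stages treat the SN as a unit without re-reading its contents.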
So t1x to t4x are separate files, is that it? There are both easy and hard issues when it comes to phrases, which speaks in favour of 4 stages. But what is the clear-cut criterion for light vs. heavy chunking? Trondtr 12:26, 19 January 2010 (UTC).
- We'll need rules to cover both compounding and derivation, which speaks for a 4-stage setup (e.g. each noun could be a compound, multiplying each noun rule by two, or more if we have longer compounds?). We still need to figure out which phenomena go in which stage, though. unhammer 13:09, 19 January 2010 (UTC)
- t1x
- (de-)compounding,
- derivation,
- simple noun phrases (heads and their simple modifiers/specifiers: adj nom, adj adj nom, det adj adj nom, num adj nom),
- simple periphrastic verb combinations (verb, vaux pp, vaux inf)
- t2x
- relatives (SN "who" SV -> SN)
- co-ordination (SN "and" SN -> SN)
- genitive modifiers (SN SN-Gen: "[University of Reykjavik] [big old library]-GEN")
- t3x
- move postpositions (SN ADPOS -> ADPOS SN) "[1 big house which is on the hill] [2 in]"
- V2? --unhammer 13:04, 20 January 2010 (UTC) +1 Francis Tyers
- Insert dropped pronouns? (Or tags for them?) --unhammer 14:25, 20 January 2010 (UTC) +1 Francis Tyers
- t4x
- Insert prepositions.
- Insert articles? --unhammer 13:32, 20 January 2010 (UTC)
- Cleanup
- t1x
- Francis Tyers 14:37, 19 January 2010 (UTC)
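One way to see why co-ordination sits in t2x and postposition movement in t3x: once `SN "and" SN` has been collapsed into a single SN, the movement rule only has to match one SN before the postposition, however long the co-ordination was. A toy Python sketch of that ordering; chunks are plain (name, words) pairs, and the chunk names `cnjcoo`, `post` and `PR` are assumptions for illustration, not the pair's actual rules.

```python
from typing import List, Tuple

Chunk = Tuple[str, List[str]]   # (chunk name, words); toy stand-in for a chunk


def t2x_coordinate(chunks: List[Chunk]) -> List[Chunk]:
    """SN cnjcoo SN -> SN, applied left to right so longer lists collapse too."""
    out: List[Chunk] = []
    i = 0
    while i < len(chunks):
        name, words = chunks[i]
        if (name == "cnjcoo" and out and out[-1][0] == "SN"
                and i + 1 < len(chunks) and chunks[i + 1][0] == "SN"):
            _, left = out.pop()
            out.append(("SN", left + words + chunks[i + 1][1]))
            i += 2              # the right-hand SN has been consumed
        else:
            out.append((name, words))
            i += 1
    return out


def t3x_move_postpositions(chunks: List[Chunk]) -> List[Chunk]:
    """SN post -> PR SN: one rule suffices, since t2x left a single SN."""
    out: List[Chunk] = []
    for name, words in chunks:
        if name == "post" and out and out[-1][0] == "SN":
            sn = out.pop()
            out.extend([("PR", words), sn])
        else:
            out.append((name, words))
    return out


chunks = [("SN", ["big", "house"]), ("cnjcoo", ["and"]),
          ("SN", ["small", "barn"]), ("post", ["in"])]
print(t3x_move_postpositions(t2x_coordinate(chunks)))
```

Relative clauses (SN "who" SV -> SN) could be collapsed in t2x the same way, so that t3x again sees only one SN.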
Level | Description | Test case |
---|---|---|
t1x | (de-)compounding | Politiijastašuvnna |
Wishlist / Difficulties with the architecture / Ugly hacks
Clipping a substring in transfer (or any better solution)
For inserting prepositions, we first tried simply adding them to the chunk name in t1x (adj_nom => til_adj_nom) and reading them off in t4x. However, since apertium-interchunk has no function to remove the first n letters of a string, we had no general way in t2x or t3x to, e.g., switch the preposition or remove it based on a larger context.
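What was missing is essentially a "strip a known prefix from the chunk name" operation. A Python sketch of that operation, only to show what a t2x/t3x rule would have needed; the underscore-separated naming scheme and the preposition list are hypothetical, and nothing here is an apertium-interchunk function.

```python
from typing import Optional, Tuple

# Hypothetical set of Norwegian prepositions that t1x might prefix to a chunk name.
PREPOSITIONS = {"til", "i", "på", "med", "fra"}


def split_prep(chunk_name: str) -> Tuple[Optional[str], str]:
    """Split 'til_adj_nom' into ('til', 'adj_nom'); a name with no known
    preposition prefix comes back unchanged, with None as the preposition."""
    head, sep, rest = chunk_name.partition("_")
    if sep and head in PREPOSITIONS:
        return head, rest
    return None, chunk_name


def switch_prep(chunk_name: str, new_prep: Optional[str]) -> str:
    """Replace the preposition prefix, or drop it when new_prep is None."""
    _, base = split_prep(chunk_name)
    return f"{new_prep}_{base}" if new_prep else base


print(split_prep("til_adj_nom"))         # ('til', 'adj_nom')
print(switch_prep("til_adj_nom", "på"))  # på_adj_nom
```

Splitting on the first separator instead of removing a fixed n letters also copes with prepositions of different lengths (i, til, fra).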
Having the preposition in a tag is rather ugly.
We ended up just adding it as a separate chunk; however, this means that all t2x/t3x rules working on e.g. SN now have to be duplicated to cover the possibility of PR SN too.
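The duplication grows quickly: if each SN slot in a pattern may or may not be preceded by a PR chunk, a pattern with k SN slots needs 2^k rule variants (the compounding concern earlier on this page has the same shape). A small Python sketch that enumerates the variants, using a hypothetical space-separated pattern notation rather than real rule syntax:

```python
from itertools import product
from typing import List


def pr_variants(pattern: str) -> List[str]:
    """Expand every SN in a space-separated pattern into either 'SN' or 'PR SN'."""
    slots = pattern.split()
    choices = [("SN", "PR SN") if s == "SN" else (s,) for s in slots]
    return [" ".join(combo) for combo in product(*choices)]


for variant in pr_variants("SN SV SN"):
    print(variant)
```

For "SN SV SN" this yields four patterns: SN SV SN, SN SV PR SN, PR SN SV SN and PR SN SV PR SN.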
No UTF in sdefs
@←SPRED is not a valid sdef name. Is this just because it is an XML ID?
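If the `n` attribute of `<sdef>` is declared as type ID in the dictionary DTD (which is what the question suggests; worth checking in the DTD itself), its value has to match the XML Name production. Under XML 1.0 (Fifth Edition) that rules out `@` anywhere in a name, and also `←` (U+2190), which falls just outside the allowed range U+2070 to U+218F, while letters such as `č` and `á` are fine. If so, the problem would be these particular symbols rather than non-ASCII characters in general. A small Python check against the Name production, as a sketch:

```python
# NameStartChar and NameChar ranges from XML 1.0 (Fifth Edition), productions [4] and [4a]
NAME_START = [
    (ord(":"), ord(":")), (ord("A"), ord("Z")), (ord("_"), ord("_")),
    (ord("a"), ord("z")), (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF),
    (0x370, 0x37D), (0x37F, 0x1FFF), (0x200C, 0x200D), (0x2070, 0x218F),
    (0x2C00, 0x2FEF), (0x3001, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFFD),
    (0x10000, 0xEFFFF),
]
# Extra characters allowed after the first position
NAME_EXTRA = [
    (ord("-"), ord("-")), (ord("."), ord(".")), (ord("0"), ord("9")),
    (0xB7, 0xB7), (0x300, 0x36F), (0x203F, 0x2040),
]


def _in_ranges(cp: int, ranges) -> bool:
    return any(lo <= cp <= hi for lo, hi in ranges)


def is_xml_name(s: str) -> bool:
    """True if s matches the XML 1.0 Name production (required of ID values)."""
    if not s or not _in_ranges(ord(s[0]), NAME_START):
        return False
    return all(_in_ranges(ord(ch), NAME_START + NAME_EXTRA) for ch in s[1:])


for name in ["@←SPRED", "←SPRED", "SPRED", "čáhci"]:
    print(name, is_xml_name(name))
```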