Ideas for Google Summer of Code/Discontiguous multiwords
Revision as of 15:51, 13 February 2010
Here is a cheap hack for analysing discontiguous multiword units when translating from Germanic languages.
For example:
vísa manninum frá landinu -> vísa# frá manninum landinu
'deport the man from the country'
vísa ekki frá -> vísa# frá ekki
'deport not'
The idea is to mark verbs that can be part of a discontiguous multiword
with an initial '=', and particles/adverbs that can be part of one with
an initial '~'. For example:
1) vísa/=vísa manninum frá/~frá landinu .
2) vísa/=vísa manninum undan/~undan landinu .
3) vísa/=vísa manninum upp/~upp landinu .
We will use Constraint Grammar rules to select the multiword ('=')
reading of the verb if a matching particle follows it, and remove it
otherwise.
LIST VISAPART = "~frá" "~upp" ;
REMOVE ("=vísa") (NOT 1* VISAPART);
SELECT ("=vísa") (1* VISAPART);
etc.
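To make the intended effect of the two rules concrete, here is a minimal Python simulation of them. It is not vislcg3 and does not parse CG syntax; the cohort representation (a list of base-form strings per word) and the names match_right and disambiguate are assumptions made for illustration. As in CG, a reading is never removed if it is the only one left.

```python
# Hypothetical simulation of the two CG rules above; not vislcg3 itself.
# A sentence is a list of cohorts; each cohort is a list of readings,
# identified here only by their base form.

VISAPART = {"~frá", "~upp"}   # LIST VISAPART = "~frá" "~upp" ;
TARGET = "=vísa"              # the ("=vísa") target of both rules


def match_right(sentence, i, tags):
    """Like the CG context (1* SET): some cohort after i has a reading in SET."""
    return any(r in tags for cohort in sentence[i + 1:] for r in cohort)


def disambiguate(sentence):
    """Apply the REMOVE/SELECT rules for =vísa; returns a new, pruned sentence."""
    out = [list(cohort) for cohort in sentence]   # copy; input is untouched
    for i, cohort in enumerate(out):
        if TARGET not in cohort or len(cohort) == 1:
            continue  # no =vísa reading, or nothing to choose between
        if match_right(out, i, VISAPART):
            out[i] = [TARGET]          # SELECT ("=vísa") (1* VISAPART);
        else:
            cohort.remove(TARGET)      # REMOVE ("=vísa") (NOT 1* VISAPART);
    return out


# Sentences 1) and 2) from above
s1 = [["vísa", "=vísa"], ["manninum"], ["frá", "~frá"], ["landinu"], ["."]]
s2 = [["vísa", "=vísa"], ["manninum"], ["undan", "~undan"], ["landinu"], ["."]]
print(disambiguate(s1)[0])   # ['=vísa']
print(disambiguate(s2)[0])   # ['vísa']
```

For sentence 2, ~undan is not in VISAPART, so the '=vísa' reading is removed; for sentences 1 and 3 it is selected. The particle cohorts are left ambiguous here, since the further rules covered by "etc." are not specified.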
We will then use a new mode of pretransfer (I suggest -m) to join the
two parts:
=vísa manninum ~frá landinu -> vísa# frá manninum landinu
That is: if an LU starts with '=', buffer the following LUs until one
starting with '~' or the sentence end '.' is read.
The '.<sent>' is treated as a hard delimiter, so that if no particle is
found in the sentence, the buffered part is output with the initial '='
removed.
Any initial '~' or '=' found without its partner will be stripped.
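The buffering behaviour described above can be sketched in Python as follows. This is a hypothetical illustration, not code from apertium-pretransfer: LUs are reduced to bare strings (no ^...$ stream format and no tags), a plain '.' stands in for '.<sent>', and join_multiwords and strip_marks are made-up names.

```python
# Hypothetical sketch of the proposed pretransfer joining mode.
# LUs are plain strings; "." stands in for the hard delimiter .<sent>.

SENT_END = "."


def strip_marks(lu):
    """Drop a leading '=' or '~' from an LU that did not find its partner."""
    return lu[1:] if lu[:1] in ("=", "~") else lu


def join_multiwords(lus):
    out = []
    head = None   # verb part (without '=') while buffering, else None
    buffer = []   # LUs read between the verb and its particle
    for lu in lus:
        if head is None:
            if lu.startswith("="):
                head, buffer = lu[1:], []       # start buffering
            else:
                out.append(strip_marks(lu))     # stray '~' is stripped
        elif lu.startswith("~"):
            # Particle found: emit the joined multiword, then the buffer.
            out.append(f"{head}# {lu[1:]}")
            out.extend(buffer)
            head = None
        elif lu == SENT_END:
            # Hard delimiter: no particle in this sentence.
            out.append(head)
            out.extend(buffer)
            out.append(lu)
            head = None
        else:
            buffer.append(strip_marks(lu))      # a second '=' is stripped
    if head is not None:  # input ended while still buffering
        out.append(head)
        out.extend(buffer)
    return out


print(" ".join(join_multiwords(["=vísa", "manninum", "~frá", "landinu", "."])))
# vísa# frá manninum landinu .
```

Because the joined LU is emitted as soon as the particle is read, the buffer never grows past one sentence, which is what makes the '.<sent>' delimiter a safe fallback.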
Benefits: Can be implemented now in a backwards compatible way.
Drawbacks: Might be too simple? Creates more dependencies on CG?