Difference between revisions of "Named entity recognition"
Revision as of 23:01, 24 December 2007
Named entity recognition is the task of recognising named entities, such as proper nouns, in text. When working with long rules, proper nouns can prevent the rules from being applied: names of people, companies, places and so on are often missing from the dictionaries and are therefore left unanalysed. For example, a sentence like:
- Die man het John gesien.
would be analysed something like (simplifying slightly):
- Die<det> man<n> hê<vbhaver> *John gesien<vblex><past>
If we have a rule that says something like:
- <vbhaver> <noun phrase> <vblex><past> → <vbhaver> <vblex><past> <noun phrase>
This rule will not apply, because "John" has not been given any analysis. As a result the translation will be worse, because the word reordering has not taken place. So, instead of getting:
- The man had seen John.
We would get:
- The man had John seen.
This is less than ideal. What we need is something that can tag "John" as a proper noun (<np>), so that the rules can be applied as intended.
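The effect of the missing tag can be illustrated with a minimal Python sketch of the reordering rule. It is not the actual transfer implementation: the token representation, the `reorder` function and the simplification that a single noun or proper noun counts as a noun phrase are all assumptions made for this example.

```python
from typing import List, Tuple

# A token is (form, tags); an empty tag tuple stands for an unknown word.
Token = Tuple[str, Tuple[str, ...]]

def is_noun_phrase(tok: Token) -> bool:
    # Assumption for this sketch: one noun or proper noun is a noun phrase.
    return bool(tok[1]) and tok[1][0] in ("n", "np")

def reorder(tokens: List[Token]) -> List[Token]:
    """Apply <vbhaver> <noun phrase> <vblex><past>
    -> <vbhaver> <vblex><past> <noun phrase>."""
    out = list(tokens)
    for i in range(len(out) - 2):
        aux, np, verb = out[i], out[i + 1], out[i + 2]
        if (aux[1][:1] == ("vbhaver",)
                and is_noun_phrase(np)
                and verb[1] == ("vblex", "past")):
            # Swap the noun phrase and the main verb.
            out[i + 1], out[i + 2] = verb, np
    return out

unknown_john = [("Die", ("det",)), ("man", ("n",)), ("hê", ("vbhaver",)),
                ("John", ()), ("gesien", ("vblex", "past"))]
tagged_john = [("Die", ("det",)), ("man", ("n",)), ("hê", ("vbhaver",)),
               ("John", ("np",)), ("gesien", ("vblex", "past"))]

print([form for form, _ in reorder(unknown_john)])  # order unchanged
print([form for form, _ in reorder(tagged_john)])   # verb moved before "John"
```

With "John" left unknown the pattern never matches, so the order is untouched; once it carries the `np` tag, the verb and the proper noun are swapped.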
Examples
The problem becomes more acute in languages where proper nouns are inflected for case, for example Polish or Serbo-Croatian:
- Władysława → Władysław<np><ant><m><sg><gen>
and
- Marijom → Marija<np><ant><f><sg><ins>
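One way to picture what is needed for such forms is a small suffix table that proposes a lemma and tags for an unknown inflected name. The Python sketch below is purely illustrative: the table `CASE_ENDINGS`, the language labels `pl` and `sh`, and the function `guess_analyses` are hypothetical, and the table covers only the two examples above. A rule as crude as "strip a final -a" would overgenerate badly on real text.

```python
# Hypothetical suffix table: (language, surface ending, lemma ending, tags).
# Only the two patterns from the examples above are listed.
CASE_ENDINGS = [
    ("pl", "a", "", ("np", "ant", "m", "sg", "gen")),    # Władysława -> Władysław
    ("sh", "om", "a", ("np", "ant", "f", "sg", "ins")),  # Marijom -> Marija
]

def guess_analyses(surface: str, lang: str) -> list:
    """Return candidate analyses for an unknown, possibly inflected name."""
    candidates = []
    for entry_lang, ending, lemma_ending, tags in CASE_ENDINGS:
        if entry_lang != lang:
            continue
        if surface.endswith(ending) and len(surface) > len(ending):
            # Replace the case ending with the ending of the base form.
            lemma = surface[: len(surface) - len(ending)] + lemma_ending
            candidates.append(lemma + "".join(f"<{t}>" for t in tags))
    return candidates

print(guess_analyses("Władysława", "pl"))
print(guess_analyses("Marijom", "sh"))
```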
Pipeline
The named entity recogniser should probably run between tagging and transfer, and operate only on unknown words.
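A stage in that position could look roughly like the Python sketch below. It works on the simplified notation used earlier on this page (`word<tag>` for analysed words, `*word` for unknown ones), not on a real stream format. The function name `recognise_named_entities` and the capitalisation heuristic are assumptions for illustration only.

```python
def recognise_named_entities(tagged: str) -> str:
    """Mark capitalised unknown words (prefixed with '*') as <np>.

    Words that already have an analysis are passed through untouched,
    so the stage only ever acts on unknown words.
    """
    out = []
    for tok in tagged.split():
        word = tok[1:]
        if tok.startswith("*") and word[:1].isupper():
            # Heuristic assumption: a capitalised unknown word is a proper noun.
            out.append(f"{word}<np>")
        else:
            out.append(tok)
    return " ".join(out)

tagger_output = "Die<det> man<n> hê<vbhaver> *John gesien<vblex><past>"
print(recognise_named_entities(tagger_output))
```

The output of this stage would then be handed to transfer, where rules that expect `<np>` can match. A capitalisation test alone is weak for sentence-initial words, which is one reason a real recogniser would need more evidence than this.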
Further reading
- Babych, B. and Hartley, T. (2003) "Improving machine translation quality with automatic named entity recognition". Procs. EACL-EAMT 2003: Improving MT through other language technology tools, Budapest, Hungary, April 2003.