Difference between revisions of "Task ideas for Google Code-in/Intersection of ATT format transducers"

Revision as of 10:33, 27 November 2013

Objective

The objective of these tasks is to write code to intersect two finite-state transducers. One transducer is a morphological dictionary, the other transducer is a bilingual dictionary which is converted into prefixes.

The intersection of the morphological dictionary with the prefixes of the bilingual dictionary will give us the set of strings in the morphological dictionary which have translations in the bilingual dictionary.

Example

Input

Monolingual transducer

Bilingual transducer

0	1	b	b	
1	2	e	e	
1	3	a	a	
1	4	u	u	
2	5	e	e	
2	6	d	d	
3	6	t	t	
4	6	g	g	
5	6	r	r	
6	7	ε	<n>	
6	8	s	<n>	
7	9	ε	<sg>	
8	9	ε	<pl>	
9

0	1	b	o	
0	2	b	s	
1	3	e	h	
2	4	a	a	
3	5	d	e	
4	6	t	g	
5	7	<n>	<n>	
6	8	<n>	u	
8	9	ε	z	
9	10	ε	a	
10	11	ε	r	
11	7	ε	<n>	
7

Output

Trimmed monolingual transducer
0 1 b b 1 2 e e 1 3 a a 2 4 d d 3 4 t t 4 5 s <n> 4 6 ε <n> 5 7 ε <pl> 6 7 ε <sg> 7

Steps

The bilingual dictionary (or one side of it). Note that the loops at the end show that these are prefixes, so will match any prefix of a lexical unit.

Load both transducers
Convert the bilingual dictionary transducer into prefixes by selecting only one side of it and adding a loop.
Perform the intersection:

Tasks

Implementation in python

Come up with a data structure for storing the transducers
Write a function to convert a transducer to a prefix transducer (e.g. make the final loopback over any symbol)
Write the intersection algorithm

Implementation in C++

Look up the Transducer class in lttoolbox
Write a function to convert a transducer to a prefix transducer (e.g. make the final loopback over any symbol)
Write the intersection algorithm as a method.

Front-end program in C++

Write a command-line program in the style of lt-comp and lt-proc to load two transducers and perform the intersection.

Compound support in python

Compound support in C++

#-type multiword support in python

The # is used for "multiwords" (words that consist of several words), when you have inflection in the middle of the word, and an invariant part at the end.

In the analyser, the # appears after the the inflected part has been fully read, before the invariant part, so if we want to have both "coffee with milk" and "coffees with milk" as possible single-unit inputs, the analyser would have these possible outputs:

coffee<n><sg># with milk
coffee<n><pl># with milk

But in the bidix, the invariant part is represented before the tags, ie.

coffee# with milk<n><sg>
coffee# with milk<n><pl>

To intersect the analyser with the bidix, we actually need to intersect it with a transformed bidix that has the #-part moved after the tags.

@@ Line 99: / Line 99: @@
 ===Compound support in C++===
+===#-type multiword support in python===
+The '''#''' is used for "multiwords" (words that consist of several words), when you have inflection in the middle of the word, and an invariant part at the end.
+In the analyser, the # appears after the the inflected part has been fully read, before the invariant part, so if we want to have both "coffee with milk" and "coffees with milk" as possible single-unit inputs, the analyser would have these possible outputs:
+<pre>coffee<n><sg># with milk
+coffee<n><pl># with milk</pre>
+But in the bidix, the invariant part is represented before the tags, ie.
+<pre>coffee# with milk<n><sg>
+coffee# with milk<n><pl></pre>
+To intersect the analyser with the bidix, we actually need to intersect it with a transformed bidix that has the #-part moved after the tags.
+===#-type multiword support in C++===
 ==Further reading==

Difference between revisions of "Task ideas for Google Code-in/Intersection of ATT format transducers"

Revision as of 10:33, 27 November 2013

Contents

Objective

Example

Input

Output

Steps

Tasks

Implementation in python

Implementation in C++

Front-end program in C++

Compound support in python

Compound support in C++

#-type multiword support in python

#-type multiword support in C++

Further reading

Navigation menu

Personal tools

Namespaces

Variants

Views

More

Search

Navigation

Tools