Difference between revisions of "Task ideas for Google Code-in/Intersection of ATT format transducers"

From Apertium
Jump to navigation Jump to search
Line 104: Line 104:
 
* [http://www.cs.um.edu.mt/gordon.pace/Research/Software/Relic/Transformations/FSA/intersection.html Intersection of Finite State Automata]
 
* [http://www.cs.um.edu.mt/gordon.pace/Research/Software/Relic/Transformations/FSA/intersection.html Intersection of Finite State Automata]
 
* [https://svn.code.sf.net/p/apertium/svn/branches/lt-trim lt-trim (implemented with HFST)]
 
* [https://svn.code.sf.net/p/apertium/svn/branches/lt-trim lt-trim (implemented with HFST)]
  +
* [[Talk:Automatically trimming a monodix]]
   
 
[[Category:Tasks for Google Code-in|Intersection of ATT format transducers]]
 
[[Category:Tasks for Google Code-in|Intersection of ATT format transducers]]

Revision as of 08:34, 24 November 2013

Objective

The objective of these tasks is to write code to intersect two finite-state transducers. One transducer is a morphological dictionary, the other transducer is a bilingual dictionary which is converted into prefixes.

The intersection of the morphological dictionary with the prefixes of the bilingual dictionary will give us the set of strings in the morphological dictionary which have translations in the bilingual dictionary.

Example

Input

Monolingual transducer Bilingual transducer
0	1	b	b	
1	2	e	e	
1	3	a	a	
1	4	u	u	
2	5	e	e	
2	6	d	d	
3	6	t	t	
4	6	g	g	
5	6	r	r	
6	7	ε	<n>	
6	8	s	<n>	
7	9	ε	<sg>	
8	9	ε	<pl>	
9
0	1	b	o	
0	2	b	s	
1	3	e	h	
2	4	a	a	
3	5	d	e	
4	6	t	g	
5	7	<n>	<n>	
6	8	<n>	u	
8	9	ε	z	
9	10	ε	a	
10	11	ε	r	
11	7	ε	<n>	
7

Output

Trimmed monolingual transducer
0	1	b	b
1	2	e	e
1	3	a	a
2	4	d	d
3	4	t	t
4	5	s	<n>
4	6	ε	<n>
5	7	ε	<pl>
6	7	ε	<sg>
7

Steps

Imonodix.png
The bilingual dictionary (or one side of it). Note that the loops at the end show that these are prefixes, so will match any prefix of a lexical unit.
Trimmed.png
  1. Load both transducers
  2. Convert the bilingual dictionary transducer into prefixes by selecting only one side of it and adding a loop.
  3. Perform the intersection:

Tasks

Implementation in python

  1. Come up with a data structure for storing the transducers
  2. Write a function to convert a transducer to a prefix transducer (e.g. make the final loopback over any symbol)
  3. Write the intersection algorithm

Implementation in C++

  1. Look up the Transducer class in lttoolbox
  2. Write a function to convert a transducer to a prefix transducer (e.g. make the final loopback over any symbol)
  3. Write the intersection algorithm as a method.

Front-end program in C++

  1. Write a command-line program in the style of lt-comp and lt-proc to load two transducers and perform the intersection.

Compound support in python

Compound support in C++

Further reading