Objective
The objective of these tasks is to write code to intersect two finite-state transducers. One transducer is a morphological dictionary; the other is a bilingual dictionary, which is first converted into a prefix transducer.
The intersection of the morphological dictionary with the prefixes of the bilingual dictionary will give us the set of strings in the morphological dictionary which have translations in the bilingual dictionary.
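At the level of plain strings, the idea can be illustrated as follows. This is only a sketch: the two lists are hypothetical stand-ins for the paths of the transducers (the words are the ones used in the example on this page), and `has_translation` is an illustrative name, not part of any library.

```python
# String-level illustration only: the real dictionaries are transducers,
# not lists of strings.
analyses = ["bed<n><sg>", "bed<n><pl>", "bat<n><sg>", "beer<n><sg>", "bug<n><pl>"]
bilingual_prefixes = ["bed<n>", "bat<n>"]

def has_translation(analysis, prefixes):
    """True if some bilingual entry is a prefix of the analysis."""
    return any(analysis.startswith(prefix) for prefix in prefixes)

kept = [a for a in analyses if has_translation(a, bilingual_prefixes)]
print(kept)  # ['bed<n><sg>', 'bed<n><pl>', 'bat<n><sg>']
```

The analyses of "beer" and "bug" are dropped because no bilingual entry is a prefix of them.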
Example
Input
Monolingual transducer
0 1 b b
1 2 e e
1 3 a a
1 4 u u
2 5 e e
2 6 d d
3 6 t t
4 6 g g
5 6 r r
6 7 ε <n>
6 8 s <n>
7 9 ε <sg>
8 9 ε <pl>
9
Bilingual transducer
0 1 b o
0 2 b s
1 3 e h
2 4 a a
3 5 d e
4 6 t g
5 7 <n> <n>
6 8 <n> u
8 9 ε z
9 10 ε a
10 11 ε r
11 7 ε <n>
7
Output
Trimmed monolingual transducer
0 1 b b
1 2 e e
1 3 a a
2 4 d d
3 4 t t
4 5 s <n>
4 6 ε <n>
5 7 ε <pl>
6 7 ε <sg>
7
Steps
- Load both transducers
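- Convert the bilingual dictionary transducer into a prefix transducer by keeping only one side of it (in the example above, the left-hand side) and adding a loop over any symbol at its final states.
- Perform the intersection of the monolingual transducer with the prefix transducer.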
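The first two steps might be sketched in Python as below. Several things here are assumptions rather than anything fixed by this page: the text format is taken to be the one used in the example (four whitespace-separated fields per transition, a lone state number for a final state), state 0 is taken to be the initial state, and `<ANY>` is a hypothetical wildcard label chosen for illustration. The function names are likewise illustrative.

```python
ANY = "<ANY>"   # hypothetical wildcard label: stands for "any symbol"

def load_transducer(text):
    """Parse lines of 'source target input output'. A line holding a single
    state number marks that state as final. State 0 is assumed initial."""
    transitions, finals = [], set()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 4:
            src, dst, sym_in, sym_out = fields
            transitions.append((int(src), int(dst), sym_in, sym_out))
        elif len(fields) == 1:
            finals.add(int(fields[0]))
    return transitions, finals

def to_prefix_acceptor(transitions, finals):
    """Keep only the input side of each arc and add an ANY self-loop on
    every final state, so anything may follow a complete bilingual entry."""
    arcs = [(src, dst, sym_in) for src, dst, sym_in, _ in transitions]
    arcs += [(f, f, ANY) for f in sorted(finals)]
    return arcs, set(finals)

# The bilingual transducer from the example.
BILINGUAL = """\
0 1 b o
0 2 b s
1 3 e h
2 4 a a
3 5 d e
4 6 t g
5 7 <n> <n>
6 8 <n> u
8 9 ε z
9 10 ε a
10 11 ε r
11 7 ε <n>
7
"""

transitions, finals = load_transducer(BILINGUAL)
prefix_arcs, prefix_finals = to_prefix_acceptor(transitions, finals)
print(prefix_arcs[-1])   # (7, 7, '<ANY>'), the loop added on the final state
```

Epsilon arcs on the kept side are preserved as ordinary arcs labelled `ε`, so the intersection step has to follow them without consuming a symbol.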
Tasks
Implementation in Python
- Come up with a data structure for storing the transducers
- Write a function to convert a transducer to a prefix transducer (e.g. by adding a loop over any symbol at each final state)
- Write the intersection algorithm
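One possible shape for the intersection is sketched below; it is not the lttoolbox algorithm. It assumes transitions stored as plain tuples, state 0 as the initial state, and a hypothetical `<ANY>` wildcard on the loop of the prefix acceptor. The output side of the monolingual transducer is matched against the prefix acceptor, tracking a set of acceptor states so that epsilon arcs and nondeterminism are handled. The result is the trimmed product; it is not minimised, so its state numbers will differ from the example output.

```python
from collections import deque

EPSILON, ANY = "ε", "<ANY>"   # ANY is a hypothetical wildcard label

def eps_closure(states, arcs):
    """Acceptor states reachable from `states` through ε arcs only."""
    closure, stack = set(states), list(states)
    while stack:
        p = stack.pop()
        for src, dst, sym in arcs:
            if src == p and sym == EPSILON and dst not in closure:
                closure.add(dst)
                stack.append(dst)
    return frozenset(closure)

def step(states, symbol, arcs):
    """Acceptor states reachable from `states` by reading `symbol`."""
    moved = {d for s, d, sym in arcs if s in states and sym in (symbol, ANY)}
    return eps_closure(moved, arcs)

def intersect(mono_arcs, mono_finals, prefix_arcs, prefix_finals, start=0):
    """Restrict the monolingual transducer to paths whose output begins
    with a string of the prefix acceptor. Returns (arcs, finals)."""
    first = (start, eps_closure({start}, prefix_arcs))
    edges, seen, queue = [], {first}, deque([first])
    while queue:                      # forward pass over state pairs
        state = queue.popleft()
        m, ps = state
        for src, dst, sym_in, sym_out in mono_arcs:
            if src != m:
                continue
            nxt_ps = ps if sym_out == EPSILON else step(ps, sym_out, prefix_arcs)
            if not nxt_ps:
                continue              # no bilingual entry continues this way
            nxt = (dst, nxt_ps)
            edges.append((state, nxt, sym_in, sym_out))
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    accepting = {s for s in seen if s[0] in mono_finals and s[1] & prefix_finals}
    useful, changed = set(accepting), True
    while changed:                    # backward pass: drop dead ends
        changed = False
        for s, d, _, _ in edges:
            if d in useful and s not in useful:
                useful.add(s)
                changed = True
    ids = {first: 0}                  # renumber the surviving pairs
    def number(pair):
        return ids.setdefault(pair, len(ids))
    arcs = [(number(s), number(d), i, o)
            for s, d, i, o in edges if s in useful and d in useful]
    return arcs, {ids[s] for s in ids if s in accepting}

# The two transducers from the example (bilingual side already converted).
mono_arcs = [(0, 1, "b", "b"), (1, 2, "e", "e"), (1, 3, "a", "a"),
             (1, 4, "u", "u"), (2, 5, "e", "e"), (2, 6, "d", "d"),
             (3, 6, "t", "t"), (4, 6, "g", "g"), (5, 6, "r", "r"),
             (6, 7, "ε", "<n>"), (6, 8, "s", "<n>"),
             (7, 9, "ε", "<sg>"), (8, 9, "ε", "<pl>")]
prefix_arcs = [(0, 1, "b"), (0, 2, "b"), (1, 3, "e"), (2, 4, "a"),
               (3, 5, "d"), (4, 6, "t"), (5, 7, "<n>"), (6, 8, "<n>"),
               (8, 9, "ε"), (9, 10, "ε"), (10, 11, "ε"), (11, 7, "ε"),
               (7, 7, ANY)]

trimmed_arcs, trimmed_finals = intersect(mono_arcs, {9}, prefix_arcs, {7})
for arc in trimmed_arcs:
    print(*arc)
```

On the example data the result should accept only "bed", "beds", "bat" and "bats"; the paths for "beer" and "bug" are dropped because the acceptor has no matching state to move to. Tracking pairs of states rather than filtering the original arcs avoids accepting mixed paths when several prefixes share a monolingual state.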
Implementation in C++
- Look up the Transducer class in lttoolbox
- Write a function to convert a transducer to a prefix transducer (e.g. by adding a loop over any symbol at each final state)
- Write the intersection algorithm as a method.
Front-end program in C++
- Write a program to load two transducers and perform the intersection.
Further reading