User:Fpetkovski/GSOC 2012 Report
Latest revision as of 11:18, 9 February 2015
Documentation / HOWTO
- Corpus based preposition selection - HOWTO
- Building a pseudo-parallel corpus
- Ideas for Google Summer of Code/Corpus-based lexicalised feature transfer
Reports
TODO
- Try generating a corpus from a monolingual SL corpus:
  - Оваа лабавост на регулативите се одразува врз третманот на уапсените корисници на дрога. ("This laxity of the regulations is reflected in the treatment of arrested drug users.")
    - Run through lexical transfer (mk-en-biltrans)
    - Run through apertium-lex-tools/scripts/biltrans-to-multitrans.py
    - Run through the rest of the pipeline, from apertium-transfer -b onwards
    - Run through apertium-lex-learner/irstlm-ranker
    - Run through
  - This will give:
    - SL:TL selection possibilities
    - probabilities from the TL language model for each selection
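A rough picture of what the biltrans-to-multitrans step contributes: after lexical transfer a lexical unit can carry several translations, and the TL language model needs one candidate sentence per combination of choices in order to score them. The sketch below assumes that this enumeration is the script's job and works on a simplified stream format (`^source/target1/target2$`, no escaping or superblanks). `expand_biltrans` is a hypothetical name, not the script's actual code, and the example line is made up.

```python
import itertools
import re
from typing import List, Tuple

# One lexical unit in a simplified Apertium-style stream: ^source/target1/target2$
# (assumption: no escaped characters, superblanks or nested structures).
LU_PATTERN = re.compile(r"\^(.*?)\$")


def expand_biltrans(line: str) -> List[List[Tuple[str, str]]]:
    """Return one candidate per combination of translation choices.

    Each candidate is a list of (source, target) pairs, one per lexical unit.
    Units with a single translation contribute it to every candidate.
    """
    units = []
    for match in LU_PATTERN.finditer(line):
        source, *targets = match.group(1).split("/")
        units.append([(source, target) for target in targets])
    # Cartesian product over the units: n1 * n2 * ... candidates in total.
    return [list(choice) for choice in itertools.product(*units)]


if __name__ == "__main__":
    # Made-up line with two ambiguous units, so four candidates are printed.
    line = "^лабавост<n>/looseness<n>/laxity<n>$ ^третман<n>/treatment<n>/handling<n>$"
    for number, candidate in enumerate(expand_biltrans(line)):
        print(number, " ".join(target for _, target in candidate))
```

The number of candidates grows multiplicatively with the number of ambiguous units, which is worth keeping in mind for long sentences.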
 
- Select a subset for training where one translation has a substantially higher proportion of the probability mass than the rest.
  - Work out what "substantially" should mean in practice (e.g. which threshold to use).
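One way to make the selection criterion concrete, assuming the ranker provides a base-10 log-probability from the TL language model for each candidate (its actual output format may differ): normalise the candidates of each sentence into shares of the probability mass, and keep a sentence only if its best candidate's share reaches a threshold. The function names and the default threshold of 0.9 are hypothetical placeholders; choosing that value is exactly the open question above.

```python
from typing import Dict, List, Sequence, Tuple


def probability_shares(log10_scores: Sequence[float]) -> List[float]:
    """Turn one sentence's candidate log10-probabilities into shares summing to 1."""
    top = max(log10_scores)
    # Shift by the maximum before exponentiating so long sentences do not underflow.
    weights = [10 ** (score - top) for score in log10_scores]
    total = sum(weights)
    return [weight / total for weight in weights]


def select_confident(
    candidates: Dict[str, Sequence[float]], threshold: float = 0.9
) -> List[Tuple[str, int, float]]:
    """Return (sentence id, best candidate index, its share) for every sentence
    whose best candidate holds at least `threshold` of the probability mass."""
    selected = []
    for sentence_id, scores in candidates.items():
        shares = probability_shares(scores)
        best = max(range(len(shares)), key=lambda i: shares[i])
        if shares[best] >= threshold:
            selected.append((sentence_id, best, shares[best]))
    return selected


if __name__ == "__main__":
    scores = {"s1": [-20.0, -22.0], "s2": [-20.0, -20.1]}
    # s1's best candidate holds about 99% of the mass; s2's only about 56%.
    print(select_confident(scores, threshold=0.9))
```

Shares depend only on the differences between a sentence's candidate scores, so the criterion is unaffected by the absolute size of the log-probabilities, which grows with sentence length.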
 
- Improve the current method:
  - Split the test corpus in two (dev, test)
  - Rerun the experiments and check the results on the test corpus
  - Look at the dev corpus to see what kinds of patterns occur in lines that aren't getting matched
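A minimal sketch of the dev/test split, assuming the test corpus is available as aligned (source, reference) pairs so that both sides are split together. A seeded shuffle keeps the split reproducible across reruns while avoiding a split that puts, say, all of one document in the same half. `split_dev_test` is a hypothetical helper.

```python
import random
from typing import List, Sequence, Tuple

Pair = Tuple[str, str]  # (source sentence, reference translation)


def split_dev_test(pairs: Sequence[Pair], seed: int = 0) -> Tuple[List[Pair], List[Pair]]:
    """Split aligned pairs into two halves (dev, test) with a seeded shuffle.

    Both halves keep the original corpus order; with an odd count, test gets
    the extra pair.
    """
    indices = list(range(len(pairs)))
    random.Random(seed).shuffle(indices)
    half = len(indices) // 2
    dev = [pairs[i] for i in sorted(indices[:half])]
    test = [pairs[i] for i in sorted(indices[half:])]
    return dev, test
```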
 
- Look at combining the 1-feature with the 2-feature model as backoff.
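A sketch of one possible backoff scheme, assuming the more specific 2-feature model is consulted first and the 1-feature model is used only when the 2-feature model has no entry for the context (the note above does not fix the direction). The models are represented as plain dictionaries from a (source word, features) key to the selected translation; these types, `select_translation`, and the final fallback to a default translation are illustrative assumptions.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical model representations: context key -> selected translation.
TwoFeatureModel = Dict[Tuple[str, str, str], str]  # (source, feature 1, feature 2)
OneFeatureModel = Dict[Tuple[str, str], str]       # (source, feature)


def select_translation(
    source: str,
    features: Sequence[str],
    two_feature: TwoFeatureModel,
    one_feature: OneFeatureModel,
    default: str,
) -> str:
    """Choose a translation for `source`, backing off from 2 features to 1."""
    # Most specific evidence first: the first two context features together.
    if len(features) >= 2:
        key2 = (source, features[0], features[1])
        if key2 in two_feature:
            return two_feature[key2]
    # Back off: try each single feature in order of preference.
    for feature in features:
        key1 = (source, feature)
        if key1 in one_feature:
            return one_feature[key1]
    # Neither model covers this context: keep the default translation.
    return default
```

Trying single features in a fixed order is the simplest tie-break; if the models store counts or probabilities, picking the best-scoring single-feature match would be an alternative.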
 
- Evaluation
  - Try paired bootstrap resampling between the best system and the default translation, for both WER and BLEU.
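A sketch of paired bootstrap resampling for WER: both systems are scored on the same resampled sets of test sentences, and the result is the fraction of samples on which system A (e.g. the best system) has the lower corpus WER than system B (e.g. the default translation). It assumes whitespace-tokenised, sentence-aligned hypotheses and references; all function names are hypothetical. BLEU would follow the same loop, with each sentence's n-gram match counts and lengths summed over the sample before computing corpus BLEU; that part is omitted to keep the sketch short.

```python
import random
from typing import List, Sequence, Tuple

Stats = List[Tuple[int, int]]  # per sentence: (word edits, reference length)


def word_edit_distance(hyp: Sequence[str], ref: Sequence[str]) -> int:
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(ref) + 1))
    for i, hyp_word in enumerate(hyp, start=1):
        curr = [i] + [0] * len(ref)
        for j, ref_word in enumerate(ref, start=1):
            curr[j] = min(
                prev[j] + 1,                           # extra hypothesis word
                curr[j - 1] + 1,                       # missing reference word
                prev[j - 1] + (hyp_word != ref_word),  # substitution or match
            )
        prev = curr
    return prev[-1]


def sentence_stats(hyps: Sequence[str], refs: Sequence[str]) -> Stats:
    """Per-sentence sufficient statistics for corpus-level WER."""
    return [
        (word_edit_distance(h.split(), r.split()), len(r.split()))
        for h, r in zip(hyps, refs)
    ]


def corpus_wer(stats: Stats, indices: Sequence[int]) -> float:
    """Total edits divided by total reference words over the chosen sentences."""
    edits = sum(stats[i][0] for i in indices)
    words = sum(stats[i][1] for i in indices)
    return edits / words if words else 0.0


def paired_bootstrap_wer(
    hyps_a: Sequence[str],
    hyps_b: Sequence[str],
    refs: Sequence[str],
    samples: int = 1000,
    seed: int = 0,
) -> float:
    """Fraction of resampled test sets on which system A has lower WER than B."""
    stats_a = sentence_stats(hyps_a, refs)
    stats_b = sentence_stats(hyps_b, refs)
    rng = random.Random(seed)
    n = len(refs)
    wins = 0
    for _ in range(samples):
        # Draw n sentence indices with replacement; both systems are scored
        # on the same draw, which is what makes the test "paired".
        indices = [rng.randrange(n) for _ in range(n)]
        if corpus_wer(stats_a, indices) < corpus_wer(stats_b, indices):
            wins += 1
    return wins / samples
```

A result of 0.95 or higher is commonly read as system A being better at the p < 0.05 level.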
 
- Check the bidix entries that were added automatically

