User:Francis Tyers/Apertium 4
Revision as of 21:25, 21 June 2020
Wish list for Apertium 4:
Software engineering
- Organise the different parts of the engine code into more coherent modules
- Better test coverage of existing modules, particularly the newer ones
Linguistic data
- Extract multiwords from lexicons into "separable" FSTs
- Train taggers for all languages using available corpora and a TLM
- At least one language pair that is state of the art compared with Google Translate.
Engine
- Better support for Unicode in lttoolbox.
- Use embeddings for morphological disambiguation and lexical selection
- Pass the surface form through the pipeline as far as transfer (so that modules can look up surface-form embeddings)
- Retire the HMM tagger
- Be able to train the weights for morphological analysis, morphological disambiguation, lexical selection and transfer jointly, end to end.
  - e.g. can we treat the modules of the pipeline as a neural net and train their weights via backprop?
- Fully functional recursive transfer
- Per-session state. This could be stored in something like a special blank that can be updated, and might hold information such as the domain.
Neural system
- apertium-neural: a basic NMT implementation that works within the Apertium ecosystem (C++, autotools, bash, apy, html-tools), for communities that want to build their own NMT systems while still taking advantage of our ecosystem. We should be a one-stop shop for MT for marginalised languages.
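As a toy illustration of the wish to use embeddings for morphological disambiguation, the sketch below chooses between the analyses of an ambiguous lexical unit in the Apertium stream format (`^surface/analysis1/analysis2$`). It scores each analysis by the cosine similarity between an embedding of its first tag and the averaged embeddings of the preceding surface forms. The surface forms are exactly what would need to be passed along the pipeline. Everything here is an assumption for illustration: the embedding tables, their two-dimensional values and the function names (`parse_lu`, `disambiguate`, etc.) are hypothetical, real vectors would be learned from corpora rather than hand-picked, and the parser ignores escaped characters and blanks.

```python
import math
import re

# Hypothetical toy embeddings (2-d, hand-picked); real ones would be trained on corpora.
SURFACE_EMB = {
    "the": [1.0, 0.0],
    "she": [0.0, 1.0],
}
TAG_EMB = {
    "n": [1.0, 0.0],      # in this toy space, nouns sit near determiners
    "vblex": [0.0, 1.0],  # and lexical verbs sit near pronouns
}

# One lexical unit: ^surface/analysis1/analysis2...$ (escaped characters not handled).
LU_RE = re.compile(r"\^([^/$]*)((?:/[^/$]*)*)\$")


def parse_lu(text):
    """Split a lexical unit into its surface form and its list of analyses."""
    m = LU_RE.fullmatch(text)
    if m is None:
        raise ValueError(f"not a lexical unit: {text!r}")
    analyses = [a for a in m.group(2).split("/") if a]
    return m.group(1), analyses


def first_tag(analysis):
    """Return the first tag of an analysis, e.g. 'n' for 'house<n><pl>'."""
    m = re.search(r"<([^>]+)>", analysis)
    return m.group(1) if m else None


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def context_vector(words, dim=2):
    """Average the embeddings of the known surface forms in the context."""
    vecs = [SURFACE_EMB[w] for w in words if w in SURFACE_EMB]
    if not vecs:
        return [0.0] * dim
    return [sum(xs) / len(vecs) for xs in zip(*vecs)]


def disambiguate(left_context, lu):
    """Pick the analysis whose first-tag embedding is closest to the context."""
    _surface, analyses = parse_lu(lu)
    ctx = context_vector(left_context)

    def score(analysis):
        return cosine(TAG_EMB.get(first_tag(analysis), [0.0, 0.0]), ctx)

    return max(analyses, key=score)


if __name__ == "__main__":
    lu = "^houses/house<n><pl>/house<vblex><pres><p3><sg>$"
    print(disambiguate(["the"], lu))  # house<n><pl>
    print(disambiguate(["she"], lu))  # house<vblex><pres><p3><sg>
```

Here the scorer is a fixed lookup table. In the end-to-end setting wished for above, these vectors would instead be parameters learned jointly with the other modules.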
End user
- Format handling
- User-based chunking: better treatment of "no-translate"
- User dictionaries
- Better handling of code-switching/mixed texts and informal text.

