User:Francis Tyers/Apertium 4
Revision as of 20:02, 21 June 2020 by Francis Tyers
Wish list for Apertium 4:
- Make the different parts of the engine code more coherent in terms of modules
- Extract multiwords from lexicons into "separable" FSTs
- Train taggers for all languages
- At least one language pair with state-of-the-art quality (compared with Google)
- Use embeddings for morphological disambiguation and lexical selection
- Pass the surface form along the pipeline as far as transfer (so that modules can look up surface-form embeddings)
- Retire the HMM tagger
- Be able to train weights for morphological analysis, morphological disambiguation, lexical selection and transfer end to end
- Fully functional recursive transfer
- Neural system
- There should be a basic NMT implementation that works within the Apertium ecosystem (C++, autotools, bash, apy, html-tools), so that communities that want to build their own NMT systems can still take advantage of our ecosystem. We should be a one-stop shop for MT for marginalised languages.
- Format handling
- Better treatment of "no-translate"
- User dictionaries
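
As a rough illustration of the "pass the surface form until transfer" item above: in the Apertium stream format the analyser emits lexical units of the form ^surface/reading1/reading2$, and the surface form is normally dropped once a reading has been chosen. The sketch below is hypothetical and is not existing Apertium code. It assumes unescaped input (no \/ or \^ sequences), and the function names and the "pick the first reading" chooser are invented for the example. It keeps the surface form next to the selected reading so that a later module could use it as a lookup key.

```python
import re

# One lexical unit in the analyser output: ^surface/reading1/reading2$
# Simplifying assumption: no escaped characters inside the unit.
LU_PATTERN = re.compile(r"\^([^/$]*)/([^$]*)\$")


def parse_units(stream):
    """Return a list of (surface, [readings]) tuples found in the stream."""
    units = []
    for match in LU_PATTERN.finditer(stream):
        surface = match.group(1)
        readings = match.group(2).split("/")
        units.append((surface, readings))
    return units


def disambiguate_keep_surface(stream, choose):
    """Replace each ambiguous unit by ^surface/chosen_reading$.

    `choose` is a hypothetical callback (surface, readings) -> reading;
    a real system might score the readings with surface-form embeddings.
    """
    def rewrite(match):
        surface = match.group(1)
        readings = match.group(2).split("/")
        return "^{}/{}$".format(surface, choose(surface, readings))

    return LU_PATTERN.sub(rewrite, stream)


def first_reading(surface, readings):
    # Placeholder chooser: always take the first reading.
    return readings[0]


# Illustrative input, written by hand for this example.
example = "^vino/vino<n><m><sg>/venir<vblex><ifi><p3><sg>$ ^blanco/blanco<adj><m><sg>$"
print(disambiguate_keep_surface(example, first_reading))
# prints: ^vino/vino<n><m><sg>$ ^blanco/blanco<adj><m><sg>$
```

Downstream modules would then have to accept the two-part unit instead of the bare ^lemma<tags>$ form, which is why this is a pipeline-wide change and not something local to the tagger.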