Difference between revisions of "Ideas for Google Summer of Code/FieldWorks data extraction"
TommiPirinen (talk | contribs) m (flex data as corpus)
Popcorndude (talk | contribs) m (categorize)
Revision as of 19:52, 24 March 2020
FieldWorks stores a lot of the kind of data we want for building monodixes (monolingual dictionaries).
Things we might be able to get:
- Lexicon entries
- Morphology
- Bidix entries
- Reference corpus / gold standards
- any number of things that might be extractable from glossed text
Coding Challenge
Write a script that reads a FieldWorks file and outputs the headword and part of speech of each lexicon entry.
Downloading FieldWorks and making up your own data to test this is fine (you'll probably end up doing a lot of it over the course of the project).
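As a starting point, here is a minimal sketch of what such a script could look like. It assumes the lexicon has first been exported from FieldWorks in the LIFT XML interchange format, rather than read from the native project file, whose layout is different and not covered here. The element names (`entry`, `lexical-unit`, `form`, `text`, `sense`, `grammatical-info`) follow the LIFT layout as commonly documented and should be checked against a real export. The sample data and the helper name `extract_entries` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the assumed LIFT layout (made-up test data,
# not taken from any real FieldWorks project).
SAMPLE_LIFT = """<lift version="0.13">
  <entry id="book_1">
    <lexical-unit>
      <form lang="en"><text>book</text></form>
    </lexical-unit>
    <sense id="s1">
      <grammatical-info value="Noun"/>
    </sense>
  </entry>
  <entry id="read_1">
    <lexical-unit>
      <form lang="en"><text>read</text></form>
    </lexical-unit>
    <sense id="s2">
      <grammatical-info value="Verb"/>
    </sense>
  </entry>
  <entry id="and_1">
    <lexical-unit>
      <form lang="en"><text>and</text></form>
    </lexical-unit>
  </entry>
</lift>
"""


def extract_entries(root):
    """Yield (headword, part_of_speech) for each <entry> under root.

    Assumption: the headword is the first form of <lexical-unit>, and the
    part of speech is the 'value' attribute of <grammatical-info> inside
    each <sense>. Entries without any such tag get "?".
    """
    for entry in root.iter("entry"):
        text = entry.find("lexical-unit/form/text")
        headword = (text.text or "").strip() if text is not None else ""
        tags = []
        for info in entry.findall("sense/grammatical-info"):
            value = info.get("value")
            if value and value not in tags:
                tags.append(value)  # keep distinct tags in document order
        yield headword, "/".join(tags) if tags else "?"


if __name__ == "__main__":
    # Print one tab-separated line per lexicon entry in the sample.
    for headword, pos in extract_entries(ET.fromstring(SAMPLE_LIFT)):
        print(f"{headword}\t{pos}")
```

To run it on an actual export, replace `ET.fromstring(SAMPLE_LIFT)` with `ET.parse(path).getroot()`, where `path` points at the exported file. Entries with several senses may carry more than one part of speech, which this sketch joins with a slash.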
Links
- http://software.sil.org/fieldworks/resources/tutorial/lexicon/
  - Description of lexicon features
- http://software.sil.org/fieldworks/resources/tutorial/grammar/
  - Links to morphological stuff
- http://downloads.sil.org/FieldWorks/WW-ConceptualIntro/ConceptualIntroduction.htm
  - Long list of data we might be able to get
- https://github.com/sillsdev/FieldWorks
  - FieldWorks internals (might need this to figure out formats, but hopefully not)