Difference between revisions of "Google Season of Docs 2022/Organize and Update Apertium User Documentation"

Line 117: Line 117:
  ! Details
  ! Deliverable
+ |-
+ | '''Phase 1: Reference'''
+ |
+ |
+ |
  |-
  | Week 1
Line 133: Line 138:
  * (see [[#Formal_descriptions]])
  | Up-to-date formal documentation of main pipeline modules and common build scripts
+ |-
+ | '''Phase 2: Tutorials'''
+ |
+ |
+ |
  |-
  | Weeks 4-5
Line 142: Line 152:
  * Instructions for going from a linguistic paradigm to monodix/lexc/lexd
  * Introduction to twol
+ | Information sufficient to get a beginner set up and contributing to lexicons
+ |-
+ | Weeks 6-7
+ June 5-18
+ | Transfer tutorials
+ |
+ * How to go from a word-order or agreement difference to a working transfer rule in either formalism
+ | Systematic tutorial for writing transfer rules
+ |-
+ | Weeks 8-9
+ June 19-July 2
+ | Other tutorials
+ |
+ * Lexical selection
+ * Training a tagger
+ * Writing CG rules
+ * Anaphora resolution
+ * Separable
+ | End-to-end tutorial for the translation pipeline
+ |-
+ | '''Phase 3: Explanation'''
+ |
+ |
+ |
+ |-
+ | Week 10
+ July 3-9
+ | Theoretical background
+ |
+ * RBMT
+ * FSTs
+ * other things, if time
+ | Introductions to why Apertium uses the technology that it does
+ |-
+ | '''Phase 4: How-to guides and code structure'''
+ |
+ |
+ |
  |}

Revision as of 23:18, 23 March 2022

About Apertium

About the project

The problem

Apertium's wiki and other documentation are out of date, poorly organized, hard to find, and not user-friendly.

The problems range from documentation of individual tools that does not reflect their current state to our best how-to guides describing how things were done a decade ago. Documentation is scattered across the Apertium wiki, individual GitHub repos, an out-of-date PDF "Book", and even published papers and third-party sites.

As a result, new users and contributors waste time reading out-of-date materials, and even long-time contributors have no way to learn about changes to the tools they use.

The solution

The solution to the above problem is to create updated documentation for all pipeline modules and/or a full tutorial.

Ideally, documentation for a given tool will exist in a single place, and a full tutorial will likewise have a single unified source. Where the same material must appear in more than one place, one copy should be generated from another or from a shared source. For example, if we want tools to be documented both in their GitHub repos and on the wiki, we should generate one set of documentation from the other (or both from a third source). If we want a full tutorial to be on the wiki and also available as a PDF, we should designate one source as the original and generate the others from it.
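
As a minimal sketch of the single-source idea, assume (hypothetically) that each piece of documentation is kept as a small structured record. The Python below renders one such record as both wikitext and Markdown. The `Section` record, the function names, and the sample text are illustrative assumptions, not an existing Apertium tool or file format.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    """One documentation section in a hypothetical canonical source."""
    title: str
    paragraphs: list = field(default_factory=list)
    bullets: list = field(default_factory=list)


def to_wikitext(section, level=2):
    """Render a section as MediaWiki wikitext (== Title == headings, * bullets)."""
    marker = "=" * level
    lines = [f"{marker} {section.title} {marker}", ""]
    for paragraph in section.paragraphs:
        lines += [paragraph, ""]
    lines += [f"* {bullet}" for bullet in section.bullets]
    return "\n".join(lines).rstrip() + "\n"


def to_markdown(section, level=2):
    """Render the same section as Markdown (## Title headings, - bullets)."""
    lines = [f"{'#' * level} {section.title}", ""]
    for paragraph in section.paragraphs:
        lines += [paragraph, ""]
    lines += [f"- {bullet}" for bullet in section.bullets]
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    # Placeholder content only; not real Apertium documentation.
    example = Section(
        title="Example tool",
        paragraphs=["Placeholder description."],
        bullets=["first option", "second option"],
    )
    print(to_wikitext(example))
    print(to_markdown(example))
```

Because both outputs come from the same record, an update to the record reaches the wiki copy and the repository copy together, which is the property the paragraph above asks for.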

The scope

  • Overview of the Apertium platform
  • All stages of the Apertium pipeline
  • The main approaches to and tools for each stage

Measuring success

Existing Documentation

Formal Descriptions

Source Mostly Complete Partial
2.0 docs
  • stream format
  • transfer
  • monodix
  • bidix
  • tagger
  • lrx
  • format handling
wiki
  • recursive
  • anaphora
  • separable
  • makefiles and modes
github
  • lexd
external sources
  • HFST (probably don't redo)
  • CG3 (link to, don't redo)

missing:

  • build scripts (filter-rules, etc.)
  • spellchecker
  • postgenerator?

Tutorials

Even things in the "substantive" column will likely need a fair amount of work for the purposes of this project.

Source Substantive Fragmentary
Apertium wiki
  • monodix
  • bidix
  • init
  • transfer
  • recursive
  • anaphora
User:Firespeaker's course wiki
  • lexd
  • bidix
  • lrx
  • recursive
  • CG3

missing:

  • HFST
  • tagger
  • separable
  • anaphora

Timeline

{|
! Time Period
! Goal
! Details
! Deliverable
|-
| '''Phase 1: Reference'''
|
|
|
|-
| Week 1

May 1-7
| Gather and convert existing documentation
|
* Set up repo for canonical copy
* Copy all existing docs to canonical repo
* Delete outdated info
| Single canonical source containing existing info
|-
| Weeks 2-3

May 8-21
| Fill in gaps in formal docs
|
* (see [[#Formal_descriptions]])
| Up-to-date formal documentation of main pipeline modules and common build scripts
|-
| '''Phase 2: Tutorials'''
|
|
|
|-
| Weeks 4-5

May 22-June 4
| Dictionary tutorials
|
* Basic introduction to shell and common Apertium-related commands
* Guidance for selecting arguments for apertium-init
* Instructions for going from a linguistic paradigm to monodix/lexc/lexd
* Introduction to twol
| Information sufficient to get a beginner set up and contributing to lexicons
|-
| Weeks 6-7

June 5-18
| Transfer tutorials
|
* How to go from a word-order or agreement difference to a working transfer rule in either formalism
| Systematic tutorial for writing transfer rules
|-
| Weeks 8-9

June 19-July 2
| Other tutorials
|
* Lexical selection
* Training a tagger
* Writing CG rules
* Anaphora resolution
* Separable
| End-to-end tutorial for the translation pipeline
|-
| '''Phase 3: Explanation'''
|
|
|
|-
| Week 10

July 3-9
| Theoretical background
|
* RBMT
* FSTs
* other things, if time
| Introductions to why Apertium uses the technology that it does
|-
| '''Phase 4: How-to guides and code structure'''
|
|
|
|}
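
The date ranges in the timeline follow from a May 1 start and the number of weeks in each period. The short Python check below recomputes them. The year 2022 is an assumption taken from the project title (the table itself gives no year), and the names `START`, `PERIODS`, `date_ranges`, and `format_range` are illustrative.

```python
from datetime import date, timedelta

MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")

START = date(2022, 5, 1)  # assumption: year taken from the project title

# (label, length in weeks) for each dated period in the timeline
PERIODS = [
    ("Week 1", 1),
    ("Weeks 2-3", 2),
    ("Weeks 4-5", 2),
    ("Weeks 6-7", 2),
    ("Weeks 8-9", 2),
    ("Week 10", 1),
]


def date_ranges(start, periods):
    """Return (label, first_day, last_day) for back-to-back periods."""
    ranges = []
    first = start
    for label, weeks in periods:
        last = first + timedelta(weeks=weeks) - timedelta(days=1)
        ranges.append((label, first, last))
        first = last + timedelta(days=1)
    return ranges


def format_range(first, last):
    """Format a range in the timeline's style, e.g. 'May 1-7' or 'May 22-June 4'."""
    start_text = f"{MONTHS[first.month - 1]} {first.day}"
    if first.month == last.month:
        return f"{start_text}-{last.day}"
    return f"{start_text}-{MONTHS[last.month - 1]} {last.day}"


if __name__ == "__main__":
    for label, first, last in date_ranges(START, PERIODS):
        print(f"{label}: {format_range(first, last)}")
```

Changing `START` or a week count in `PERIODS` shifts every later range, which makes it easy to regenerate the dates if the schedule moves.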

Budget