Getting started with Annotatrix

From Apertium

Revision as of 12:28, 15 August 2013

Annotatrix is an open-source tool, part of the Apertium project, that lets you train corpora and manage the related files through a friendly user interface. It lets you focus on the disambiguation process without having to think about what is happening under the hood.

Run Annotatrix locally

To run Annotatrix locally, follow the installation tutorial in the project's README file.

Annotatrix repository: http://sourceforge.net/p/apertium/svn/46512/tree/branches/apertium-annotatrix/web/

How to use Annotatrix

The Annotatrix welcome view has links to the tagger index, the login and sign-up pages, the admin site, and the view for inserting a new corpus.

Annotatrix main index.png

When you open any of the other views, the system asks you to log in. If you already have an account, log in as a user. Otherwise, click the button in the upper right corner to sign up for one.

Once you complete the sign-up form, you will receive an email with an activation link that is valid for one week, so make sure to activate your account before the link expires.
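The one-week expiry rule amounts to a simple time comparison. The sketch below is a hypothetical illustration, not Annotatrix's code. The function name `activation_link_valid` and the choice to treat the exact one-week mark as expired are assumptions.

```python
from datetime import datetime, timedelta

# Assumption: one week, as stated above; the real setting lives in Annotatrix's
# configuration, which this sketch does not reproduce.
ACTIVATION_WINDOW = timedelta(days=7)


def activation_link_valid(sent_at: datetime, now: datetime) -> bool:
    """Return True while the link is still inside its one-week window (hypothetical helper)."""
    return now - sent_at < ACTIVATION_WINDOW


sent = datetime(2013, 8, 1, 12, 0)
print(activation_link_valid(sent, datetime(2013, 8, 7, 12, 0)))  # six days later
print(activation_link_valid(sent, datetime(2013, 8, 9, 12, 0)))  # eight days later
```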

Annotatrix login.png

Once you are logged in, the tagger index shows the latest corpora and trainings on the system. From there you can view corpus and training details, insert new corpora, and train them.

Annotatrix empty tagger index.png

Annotatrix tagger index populated.png

To insert a new corpus, follow the corresponding link, which opens the following view:

Annotatrix tagger insert corpus.png

Clicking Train this corpus opens the training setup view. There you can select a mode from the language pairs already installed on the system and train the corpus with that mode. You can also upload your own tsx and dix files.

Annotatrix tagger new trainer.png

Once you have selected the mode and, optionally, uploaded your files, you can go to the Trainer and start disambiguating the corpus. Uploading tsx and dix files is optional. If you skip it, the language pair's default tsx and dix files are used instead.
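The fallback rule for tsx and dix files can be sketched as follows. This is a minimal illustration under assumptions: the names `LanguagePair` and `files_for_training` and the file paths are hypothetical, and Annotatrix's actual implementation may differ.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class LanguagePair:
    # Hypothetical record of an installed language pair and its default files.
    name: str
    default_tsx: str
    default_dix: str


def files_for_training(pair: LanguagePair,
                       uploaded_tsx: Optional[str] = None,
                       uploaded_dix: Optional[str] = None) -> Tuple[str, str]:
    """Prefer the user's uploaded files; otherwise fall back to the pair's defaults."""
    tsx = uploaded_tsx if uploaded_tsx is not None else pair.default_tsx
    dix = uploaded_dix if uploaded_dix is not None else pair.default_dix
    return tsx, dix


# Illustrative paths only.
pair = LanguagePair("en-es", "en-es.tsx", "en-es.dix")
print(files_for_training(pair, uploaded_tsx="my-rules.tsx"))
```

Each file falls back independently, so a user can upload only a tsx file and still get the default dix file.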

Annotatrix tagger trainer upper.png

The top of the page shows information such as the corpus title, the corpus language, and the mode used for this training. The left side shows the tagged corpus with the ambiguous words in bold. The right side has a panel with the disambiguation options for each ambiguous word.
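Apertium tools exchange text in the Apertium stream format. In this format each lexical unit is written as `^surface/analysis1/analysis2$`, and a word is ambiguous when it has more than one analysis. The sketch below finds such words. It assumes that the bold words in the Trainer correspond to these units, which this page does not confirm. The function name is hypothetical, and escaped characters such as `\/` or `\$` are not handled.

```python
import re
from typing import List, Tuple

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2...$
# Simplification: escaped delimiters (\^, \/, \$) are not handled.
LEXICAL_UNIT = re.compile(r"\^([^$]*)\$")


def ambiguous_units(stream: str) -> List[Tuple[str, List[str]]]:
    """Return (surface form, analyses) for every unit with more than one analysis."""
    result = []
    for match in LEXICAL_UNIT.finditer(stream):
        surface, *analyses = match.group(1).split("/")
        if len(analyses) > 1:
            result.append((surface, analyses))
    return result


stream = "^the/the<det><def><sp>$ ^can/can<n><sg>/can<vaux><pres>$ ^rusts/rust<vblex><pri><p3><sg>$"
print(ambiguous_units(stream))
```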

Select the bold word you want to disambiguate. Then press the number of the correct alternative, as listed in the right panel, using either the numeric keypad or the number keys on the main keyboard.

You can select an ambiguous word by clicking on it, or move between ambiguous words with the left and right arrow keys.
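The keyboard interaction described above can be modelled as a small state machine. This sketch is hypothetical. The class name, the 1-based numbering of alternatives, and the use of the browser's `ArrowLeft`/`ArrowRight` key names are assumptions and are not taken from Annotatrix.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TrainerState:
    # Hypothetical model: one list of candidate analyses per ambiguous word.
    alternatives: List[List[str]]
    selected: int = 0
    choices: Dict[int, str] = field(default_factory=dict)

    def handle_key(self, key: str) -> None:
        if not self.alternatives:
            return
        if key == "ArrowRight":
            # Move to the next ambiguous word, stopping at the last one.
            self.selected = min(self.selected + 1, len(self.alternatives) - 1)
        elif key == "ArrowLeft":
            # Move to the previous ambiguous word, stopping at the first one.
            self.selected = max(self.selected - 1, 0)
        elif len(key) == 1 and key in "123456789":
            # Assumption: alternatives are numbered from 1 in the right panel.
            n = int(key)
            options = self.alternatives[self.selected]
            if n <= len(options):
                self.choices[self.selected] = options[n - 1]


state = TrainerState([["can<n><sg>", "can<vaux><pres>"], ["fish<n><sg>", "fish<vblex><inf>"]])
for key in ["2", "ArrowRight", "1"]:
    state.handle_key(key)
print(state.choices)
```

Numbers with no matching alternative are ignored rather than raising an error, so a stray key press leaves the current choice unchanged.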

Annotatrix tagger trainer bottom.png

The tagged corpus is paginated to make large files easier to handle. You can move between pages with the next and previous page links (when they are available) or jump directly to a specific page.
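The navigation logic for a paginated view can be sketched as below. The function name, the page size, and the 1-based page numbers are assumptions for illustration. `None` marks a link that should not be shown.

```python
import math
from typing import Optional, TypedDict


class PageLinks(TypedDict):
    total_pages: int
    previous: Optional[int]
    next: Optional[int]


def page_links(current: int, total_items: int, per_page: int) -> PageLinks:
    """Work out which navigation links a page should show (pages numbered from 1)."""
    total_pages = max(1, math.ceil(total_items / per_page))
    if not 1 <= current <= total_pages:
        raise ValueError(f"page {current} out of range 1..{total_pages}")
    return {
        "total_pages": total_pages,
        "previous": current - 1 if current > 1 else None,
        "next": current + 1 if current < total_pages else None,
    }


print(page_links(1, 45, 20))
```

Jumping directly to a page is just a call with a different `current` value, validated against the same range.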

Once you have disambiguated at least one word, you can save the current training status. When you finish the training, the prob file and the log file are generated automatically.
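The save and finish conditions can be sketched as below. The `Training` class is hypothetical. Treating a training as finished only when every ambiguous word is resolved is an assumption, because the page does not say how the end of a training is detected. Generating the prob and log files is not modelled here.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Training:
    # Hypothetical model of a training session over a tagged corpus.
    ambiguous_total: int
    resolved: Dict[int, str] = field(default_factory=dict)

    def resolve(self, index: int, analysis: str) -> None:
        """Record the chosen analysis for the ambiguous word at `index`."""
        if not 0 <= index < self.ambiguous_total:
            raise IndexError(f"no ambiguous word at position {index}")
        self.resolved[index] = analysis

    def can_save(self) -> bool:
        # Saving becomes available once at least one word is disambiguated.
        return len(self.resolved) >= 1

    def is_finished(self) -> bool:
        # Assumption: finished when every ambiguous word has a chosen analysis.
        return len(self.resolved) == self.ambiguous_total


training = Training(ambiguous_total=2)
training.resolve(0, "can<vaux><pres>")
print(training.can_save(), training.is_finished())
```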

You can check the details of an existing training at any time in the training detail view.

Annotatrix tagger trainer detail finished.png

This view shows the training status. The corpus text is on the left and the tagged corpus is in the centre, with the ambiguous words in bold. The right side shows the key training information, such as links to download the log and prob files and whether the training has finished.

Once a corpus has been inserted into the system, you can see its details in the corpus detail view.

Annotatrix tagger corpus detail.png

Annotatrix also has an admin site where the site administrator can add corpora and trainings manually and inspect the system from the backend.

Annotatrix admin site.png