Difference between revisions of "Easy dictionary maintenance"

From Apertium

Revision as of 18:35, 4 July 2010

Introduction

This space will report developments in the project. It is also a space to post comments and suggestions.

Original Ideas
http://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code
http://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code/Easy_dictionary_maintenance
Original GSOC2010 Application
http://wiki.apertium.org/wiki/User:Alessiojr/Easy_dictionary_-_Application-GSOC2010
Student Information
Student: Alessio Miranda Junior
E-mail: alessio@inf.ufpr.br or alessio@alessiojr.com
Msn: msn@juninho.com.br
IRC: AlessioJr
GTalk: alessiojunin@gmail.com

Description

Abstract:

The idea is to develop a GUI tool to manage Apertium monolingual and bilingual XML files, with the following objectives:
  • Create an alternative way to edit dix files using GUI resources.
  • Initially support monolingual dictionaries, while preserving the particular format of each file.
  • Minimize the direct manipulation of XML files by providing features that reduce this need.
  • Make use of DixTools to encourage code reuse.

Why?

The number of language pairs in development for Apertium is increasing, and so is their complexity. This growing complexity makes the work harder, so the need for supporting tools is evident. The proposed tool aims to make dictionary management easier and will probably increase the likelihood that new language pairs get developed. With better tools, more people will be able to develop language pairs.

Who can use it?

I believe that the whole Apertium community will benefit, directly or indirectly. Directly, developers of language pairs will find their task easier: with a good tool to help with the work, creating or maintaining a language becomes easier and will probably take less time to give better results. Indirectly, users will benefit from the better and more robust results.

What is the plan?

  • We are planning to create a GUI with features that facilitate the common tasks of a user who wishes to manipulate an existing language pair or dictionary. These features will also be of great value to new users, who will have an intuitive tool for starting new language pairs.
  • DixTools, a tool developed for Apertium, already solves half the problem: it loads the XML into memory and does the reverse, writing the XML back out in a suitable format.
  • We believe that the main challenge of this task is to find a way to extend DixTools by adapting its existing classes into a persistence layer connected to a GUI application framework, supporting the integration of elements and providing tools to search, filter, integrate and change them.
  • The application is being developed for manipulating monolingual dictionaries, but its architecture will have to support future extensions (web and collaborative) and bilingual dictionaries.
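The load-into-memory and write-back cycle described above can be sketched as follows. This is a minimal illustration using only the JDK's built-in DOM parser, not the DixTools classes themselves; the class name DixRoundTrip, its methods, and the tiny sample dictionary are all assumptions made for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of DixTools.
class DixRoundTrip {
    // A tiny hand-written monolingual dictionary, used only for illustration.
    static final String SAMPLE =
        "<dictionary>"
      + "<alphabet>abc</alphabet>"
      + "<sdefs><sdef n=\"n\"/><sdef n=\"sg\"/><sdef n=\"pl\"/></sdefs>"
      + "<section id=\"main\" type=\"standard\">"
      + "<e lm=\"cat\"><i>cat</i><par n=\"house__n\"/></e>"
      + "</section>"
      + "</dictionary>";

    // Parse the XML text into an in-memory DOM tree.
    static Document load(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse dictionary", e);
        }
    }

    // Collect the "n" attribute of every <sdef> element, in document order.
    static List<String> symbolNames(Document doc) {
        List<String> names = new ArrayList<>();
        NodeList sdefs = doc.getElementsByTagName("sdef");
        for (int i = 0; i < sdefs.getLength(); i++) {
            names.add(((Element) sdefs.item(i)).getAttribute("n"));
        }
        return names;
    }

    // Serialize the in-memory tree back to XML text.
    static String save(Document doc) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not serialize dictionary", e);
        }
    }

    public static void main(String[] args) {
        Document doc = load(SAMPLE);
        System.out.println(symbolNames(doc)); // prints [n, sg, pl]

        // Edit the dictionary in memory, then write it back out.
        Element adj = doc.createElement("sdef");
        adj.setAttribute("n", "adj");
        doc.getElementsByTagName("sdefs").item(0).appendChild(adj);
        System.out.println(save(doc));
    }
}
```

A GUI layer would sit on top of methods like these, so that the user edits symbols through forms while the XML is only read and written at the edges.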

Development Report

What we are using?

Development Paradigm:

Model-View-Controller concept. The solid line represents a direct association, the dashed an indirect association via an observer (for example).
  • Model–View–Controller (MVC)
MVC is an architectural pattern used in software engineering. The pattern isolates "domain logic" (the application logic for the user) from input and presentation (GUI), permitting independent development, testing and maintenance of each.
The model layer is used to manage information and notify observers when that information changes. The model is the domain-specific representation of the data upon which the application operates. Domain logic adds meaning to raw data (for example, calculating whether today is the user's birthday, or the totals, taxes, and shipping charges for shopping cart items). When a model changes its state, it notifies its associated views so they can be refreshed. Many applications use a persistent storage mechanism such as a database to store data; in that case the model also knows how to persist itself.
The view layer renders the model into a form suitable for interaction, typically a user interface element. Multiple views can exist for a single model for different purposes. A viewport typically has a one to one correspondence with a display surface and knows how to render to it.
The controller layer receives input and initiates a response by making calls on model objects. A controller accepts input from the user and instructs the model and viewport to perform actions based on that input.
An MVC application may be a collection of model/view/controller triplets, each responsible for a different UI element.
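The three roles can be sketched in a few lines of Java. This is only an illustration of the pattern, assuming hypothetical class names (SymbolModel, SymbolView, SymbolController) that are not taken from the project; the observer link between model and view uses the JDK's java.beans.PropertyChangeSupport.

```java
import java.beans.PropertyChangeListener;
import java.beans.PropertyChangeSupport;
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; a sketch of the pattern, not project code.
class MvcSketch {
    /** Model: holds the data and notifies observers when it changes. */
    static class SymbolModel {
        private final PropertyChangeSupport changes = new PropertyChangeSupport(this);
        private String name = "";

        void addListener(PropertyChangeListener listener) {
            changes.addPropertyChangeListener(listener);
        }

        String getName() {
            return name;
        }

        void setName(String newName) {
            String old = name;
            name = newName;
            // Listeners are only called when the value actually changed.
            changes.firePropertyChange("name", old, newName);
        }
    }

    /** View: renders the model; here it just records what it would display. */
    static class SymbolView {
        final List<String> rendered = new ArrayList<>();

        void bind(SymbolModel model) {
            model.addListener(evt -> rendered.add("Symbol: " + evt.getNewValue()));
        }
    }

    /** Controller: turns raw user input into calls on the model. */
    static class SymbolController {
        private final SymbolModel model;

        SymbolController(SymbolModel model) {
            this.model = model;
        }

        void userTyped(String text) {
            model.setName(text.trim());
        }
    }

    public static void main(String[] args) {
        SymbolModel model = new SymbolModel();
        SymbolView view = new SymbolView();
        view.bind(model);
        new SymbolController(model).userTyped("  adj ");
        System.out.println(view.rendered); // prints [Symbol: adj]
    }
}
```

The view never calls the controller and the model never references the view directly, which is the indirect (dashed) association in the diagram.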

Programming Language:

  • Java

Persistence:

  • XML (Apertium XML Files)
  • Database: JavaDB or Postgres

Frameworks, APIs:

  • DixTools
A package of Java console tools that help in the development of Apertium XML files.
Basic JPA class structure.
  • JPA (Java Persistence API)
JPA simplifies the entity persistence model and adds new capabilities. Developers can directly map persistent objects (POJO classes) to a relational database. The Java Persistence API has standardized the object-relational mapping technique. You can use JPA in Swing applications or web-based applications.
  • JPA supports pluggable, third-party persistence providers such as Hibernate and TopLink
  • JPA applications can also run outside a container, so developers can use JPA capabilities in desktop applications
  • No need to write deployment descriptors. Annotation-based metadata is supported in JPA applications
  • Annotation defaults can be used in model classes, which saves a lot of development time
  • Provides cleaner, easier, standardized object-relational mapping
  • JPA supports inheritance, polymorphism, and polymorphic queries.
  • JPA also supports named (static) and dynamic queries.
  • JPQL, the query language provided by JPA and derived from EJB QL, is very powerful
  • JPA helps you build a persistence layer that is vendor neutral, so any persistence provider can be used
  • NetBeans Platform
NetBeans Platform Reference
The NetBeans Platform is a generic framework for commercial and open source desktop Swing applications. It provides the “plumbing” that you would otherwise need to write yourself, such as the code for managing windows, connecting actions to menu items, and updating applications at runtime. The NetBeans Platform provides all of these out of the box on top of a reliable, flexible, and well-tested modular architecture. In this refcard, you are introduced to the key concerns of the NetBeans Platform, so that you can save years of work when developing robust and extensible applications.
The key benefits of the NetBeans Platform:
  • Open source
  • Multiplatform
  • Modular architecture.
  • Reliance on the Swing UI toolkit in combination with "Matisse" GUI Builder.
  • Designed with the idea that Software should be re-usable.
  • Generic Desktop framework
  • NetBeans platform provides the basic underpinning
  • NetBeans platform is a set of frameworks built into a single integrated software
    • Collection of libraries
    • Swing Extensions
    • NetBeans platform toolkit
  • Modules, modules and some more modules.
  • Modular architecture gives extensibility and helps to maintain the compatibility

How To?

Prototype 1 - Refactor and First Release

New DixTools Architecture
New Integrated Architecture
DixToolsSuite Components Architecture

Timeline

Week | Stage | Description
1, 2 | Analysis of technology in handling memory | Investigate and select an effective way to view and manipulate Apertium XML files in memory using Java. Analyze which technologies best complement the functionality of DixTools during XML manipulation: possibly database integration, VTD-XML, or extending the DixTools classes. Test and choose the best alternative.
2, 3 | Development of first prototype | Develop an interface that exposes a core set of features such as load, save, list, search and filter elements.
Prototype Milestone 1

Month Activities

  • Refactor Apertium-DixTools:
    • Separate the model classes (Java Beans) from the control classes into a new JAR package.
    • Integrate the Java Beans with JPA features
    • Write code to import/export XML to and from the database
    • First prototype
    • Test features
    • Integrate with the platform
    • First CRUD prototype with sdefs
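To make the idea of a CRUD prototype over sdefs concrete, here is a minimal sketch. It assumes a simplified bean (Sdef, with a name and a comment) and an in-memory map standing in for the real JPA-backed persistence layer; the class and method names are hypothetical and not taken from DixToolsSuite.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: an in-memory stand-in for the persistence layer.
class SdefCrudSketch {
    /** Plain Java bean for one symbol definition (simplified). */
    static class Sdef {
        private final String name;
        private String comment;

        Sdef(String name, String comment) {
            this.name = name;
            this.comment = comment;
        }

        String getName() { return name; }
        String getComment() { return comment; }
        void setComment(String comment) { this.comment = comment; }
    }

    /** Store keyed by symbol name; insertion order is preserved for listing. */
    static class SdefStore {
        private final Map<String, Sdef> byName = new LinkedHashMap<>();

        // Create: refuses duplicates and reports whether the symbol was added.
        boolean create(Sdef sdef) {
            return byName.putIfAbsent(sdef.getName(), sdef) == null;
        }

        // Read: returns null when the symbol is unknown.
        Sdef read(String name) {
            return byName.get(name);
        }

        // Update: changes the comment of an existing symbol.
        boolean update(String name, String newComment) {
            Sdef sdef = byName.get(name);
            if (sdef == null) {
                return false;
            }
            sdef.setComment(newComment);
            return true;
        }

        // Delete: reports whether anything was removed.
        boolean delete(String name) {
            return byName.remove(name) != null;
        }

        List<Sdef> list() {
            return new ArrayList<>(byName.values());
        }
    }

    public static void main(String[] args) {
        SdefStore store = new SdefStore();
        store.create(new Sdef("n", "Noun"));
        store.create(new Sdef("adj", "Adjective"));
        store.update("n", "Common noun");
        store.delete("adj");
        for (Sdef s : store.list()) {
            System.out.println(s.getName() + ": " + s.getComment());
        }
    }
}
```

Keeping the bean free of storage logic is what allows the same class to be annotated for JPA later without touching the GUI code that uses it.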

Prototype 1

It is called DixToolsSuite and currently uses an embedded Java database, so no database needs to be installed. For now, importing large dictionaries may be slow (I will fix this later; performance should be better with real database systems such as Postgres). On my PC, importing a 2 MB dictionary takes about 3 minutes.

How to Use
  • Operation:
    • Import at least one dix file. (For the first few attempts, test with small dictionaries.)
    • Open the Project Window. Select a dix from the combo box and click Open.
    • The fields are filled in.
    • To delete all dictionaries, click Reset / (Delete all).
Download Link

The prototype is available in SVN. It is an installer with versions for Windows and Linux (V0.3). The full project source, developed with NetBeans 6.9, is also available. Prototype Link

Features
  • Import / Export Dix Files
  • Select an Imported Dix
  • Show/Edit/Delete Symbols (Sdefs)
  • Show some statistics
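As an illustration of the kind of statistics such a feature might show, the sketch below counts symbols, paradigms and section entries in a small dictionary. It works directly on the XML with the JDK's DOM parser, whereas the prototype works from its database; the class name DixStats, the chosen counters, and the sample data are assumptions made for the example.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper; the counters shown are illustrative only.
class DixStats {
    // Small hand-written dictionary: 2 symbols, 1 paradigm, 2 section entries.
    static final String SAMPLE =
        "<dictionary>"
      + "<sdefs><sdef n=\"n\"/><sdef n=\"sg\"/></sdefs>"
      + "<pardefs><pardef n=\"house__n\">"
      + "<e><p><l/><r><s n=\"n\"/><s n=\"sg\"/></r></p></e>"
      + "</pardef></pardefs>"
      + "<section id=\"main\" type=\"standard\">"
      + "<e lm=\"house\"><i>house</i><par n=\"house__n\"/></e>"
      + "<e lm=\"cat\"><i>cat</i><par n=\"house__n\"/></e>"
      + "</section>"
      + "</dictionary>";

    static Map<String, Integer> count(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse dictionary", e);
        }
        Map<String, Integer> stats = new LinkedHashMap<>();
        stats.put("symbols", doc.getElementsByTagName("sdef").getLength());
        stats.put("paradigms", doc.getElementsByTagName("pardef").getLength());

        // <e> also appears inside paradigms, so only count those under <section>.
        int entries = 0;
        NodeList sections = doc.getElementsByTagName("section");
        for (int i = 0; i < sections.getLength(); i++) {
            Element section = (Element) sections.item(i);
            entries += section.getElementsByTagName("e").getLength();
        }
        stats.put("entries", entries);
        return stats;
    }

    public static void main(String[] args) {
        System.out.println(count(SAMPLE)); // prints {symbols=2, paradigms=1, entries=2}
    }
}
```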

TODO: Problems to Fix

  • Improve the performance of the internal classes when saving a large dictionary
  • Use a test framework for development
  • Make it easy to choose between JavaDB, MySQL and Postgres
  • Final implementation for managing multiple dictionaries in the database
  • Automatically handle importing the same dix file again
  • DicManager
  • Platform bugs
    • Opening a file when the component is closed.

Prototype 2 - Implementing Real Functionality

Timeline

Week | Stage | Description
5, 6 | Simple Structures | Implementation of symbols, alphabet and statistics features. Requires sketching experiments to design the user interface.
7 | Paradigms | First implementation of features for paradigms. Requires sketching experiments to design the user interface.
8 | Lemmas | First implementation of features for lemmas. Requires sketching experiments to design the user interface.
Prototype Milestone 2: Version for testing with huge dictionaries and a complete editing test with the basic features.

Real Actions

TODO

Prototype 3 - Real Driver Test

Timeline

Week | Stage | Description
9 | Paradigms | With feedback from the community, adjust the interface and implementation, and probably add new features.
10 | Lemmas | With feedback from the community, adjust the interface and implementation, and probably add new features.
11 | Pre-Release | Reserve time to improve integration functionality.
Prototype Release Candidate
12 | Makeup | Fix remaining bugs, make final adjustments and write documentation in the wiki.
Final Release

Real Actions

TODO