https://wiki.apertium.org/w/api.php?action=feedcontributions&user=Tino+Didriksen&feedformat=atomApertium - User contributions [en]2024-03-28T17:14:17ZUser contributionsMediaWiki 1.34.1https://wiki.apertium.org/w/index.php?title=Ideas_for_Google_Summer_of_Code&diff=75357Ideas for Google Summer of Code2023-10-26T11:09:23Z<p>Tino Didriksen: Snip completed l10n/i18n</p>
<hr />
<div>{{TOCD}}<br />
This is the ideas page for [[Google Summer of Code]]. Here you can find ideas for interesting projects that would make Apertium more useful for people and improve or expand our functionality.<br />
<br />
'''Current Apertium contributors''': If you have an idea, please add it below. If you think you could mentor someone in a particular area, add your name to "Interested mentors" using <code><nowiki>~~~</nowiki></code>.<br />
<br />
'''Prospective GSoC contributors''': The page is intended as an overview of the kind of projects we have in mind. If one of them particularly piques your interest, please come and discuss with us on <code>#apertium</code> on <code>irc.oftc.net</code> ([[IRC|more on IRC]]), mail the [[Contact|mailing list]], or draw attention to yourself in some other way. <br />
<br />
Note that if you have an idea that isn't mentioned here, we would be very interested to hear about it.<br />
<br />
Here are some more things you could look at:<br />
<br />
* [[Top tips for GSOC applications]] <br />
* Get in contact with one of our long-serving [[List of Apertium mentors|mentors]] &mdash; they are nice, honest!<br />
* Pages in the [[:Category:Development|development category]]<br />
* Resources that could be converted or expanded in the [[incubator]]. Consider doing or improving a language pair (see [[incubator]], [[nursery]] and [[staging]] for pairs that need work)<br />
* Unhammer's [[User:Unhammer/wishlist|wishlist]]<br />
<!--* The open issues [https://github.com/search?q=org%3Aapertium&state=open&type=Issues on Github] - especially the [https://github.com/search?q=org%3Aapertium+label%3A%22good+first+issue%22&state=open&type=Issues Good First Issues]. --><br />
<br />
__TOC__<br />
<br />
If you're a prospective GSoC contributor trying to propose a topic, the recommended way is to request a wiki account and then go to <pre>http://wiki.apertium.org/wiki/User:[[your username]]/GSoC2023Proposal</pre> and click the "create" button near the top of the page. It's also nice to include <code><nowiki>[[</nowiki>[[:Category:GSoC_2023_student_proposals|Category:GSoC_2023_student_proposals]]<nowiki>]]</nowiki></code> to help organize submitted proposals.<br />
<br />
== Language Data ==<br />
<br />
Can you read or write a language other than English (and we do mean any language)? If so, you can help with one of these and we can help you figure out the technical parts.<br />
<br />
{{IdeaSummary<br />
| name = Develop a morphological analyser<br />
| difficulty = easy<br />
| length = either<br />
| skills = XML or HFST or lexd<br />
| description = Write a morphological analyser and generator for a language that does not yet have one<br />
| rationale = A key part of an Apertium machine translation system is a morphological analyser and generator. The objective of this task is to create an analyser for a language that does not yet have one.<br />
| mentors = [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Jonathan Washington]], [[User: Sevilay Bayatlı|Sevilay Bayatlı]], Hossep, nlhowell, [[User:Popcorndude]]<br />
| more = /Morphological analyser<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = apertium-separable language-pair integration<br />
| difficulty = Medium<br />
| length = short<br />
| skills = XML, a scripting language (Python, Perl), some knowledge of linguistics and/or at least one relevant natural language<br />
| description = Choose a language you can identify as having a good number of "multiwords" in the lexicon. Modify all language pairs in Apertium to use the [[Apertium-separable]] module to process the multiwords, and clean up the dictionaries accordingly.<br />
| rationale = Apertium-separable is a relatively new module for processing lexical items with discontiguous dependencies, an area where Apertium has traditionally fallen short. Despite all the module has to offer, many translation pairs still don't use it.<br />
| mentors = [[User:Firespeaker|Jonathan Washington]], [[User:Popcorndude]]<br />
| more = /Apertium separable<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Bring an unreleased translation pair to releasable quality<br />
| difficulty = Medium<br />
| length = long<br />
| skills = shell scripting<br />
| description = Take an unstable language pair and improve its quality, focusing on testvoc<br />
| rationale = Many Apertium language pairs have large dictionaries and have otherwise seen much development, but are not of releasable quality. The point of this project would be to bring one translation pair to releasable quality. This would entail obtaining good naïve coverage and a clean [[testvoc]].<br />
| mentors = [[User:Firespeaker|Jonathan Washington]], [[User:Sevilay Bayatlı|Sevilay Bayatlı]], [[User:Unhammer]], [[User:hectoralos|Hèctor Alòs i Font]]<br />
| more = /Make a language pair state-of-the-art<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Develop a prototype MT system for a strategic language pair<br />
| difficulty = Medium<br />
| length = long<br />
| skills = XML, some knowledge of linguistics and of one relevant natural language <br />
| description = Create a translation pair based on two existing language modules, focusing on the dictionary and structural transfer<br />
| rationale = Choose a strategic set of languages to develop an MT system for, such that you know the target language well and morphological transducers for each language are part of Apertium. Develop an Apertium MT system by focusing on writing a bilingual dictionary and structural transfer rules. Expanding the transducers and disambiguation, and writing lexical selection rules and multiword sequences may also be part of the work. The pair may be an existing prototype, but if it's a heavily developed but unreleased pair, consider applying for "Bring an unreleased translation pair to releasable quality" instead.<br />
| mentors = [[User:Firespeaker|Jonathan Washington]], [[User:Sevilay Bayatlı| Sevilay Bayatlı]], [[User:Unhammer]], [[User:hectoralos|Hèctor Alòs i Font]]<br />
| more = /Adopt a language pair<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Add a new variety to an existing language<br />
| difficulty = easy<br />
| length = either<br />
| skills = XML, some knowledge of linguistics and of one relevant natural language <br />
| description = Add a language variety to one or more released pairs, focusing on the dictionary and lexical selection<br />
| rationale = Take a released language, and define a new language variety for it: e.g. Quebec French or Provençal Occitan. Then add the new variety to one or more released language pairs, without diminishing the quality of the pre-existing variety(ies). The objective is to facilitate the generation of varieties for languages with a weak standardisation and/or pluricentric languages.<br />
| mentors = [[User:hectoralos|Hèctor Alòs i Font]], [[User:Firespeaker|Jonathan Washington]], [[User:piraye|Sevilay Bayatlı]]<br />
| more = /Add a new variety to an existing language<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Leverage and integrate language preferences into language pairs<br />
| difficulty = easy<br />
| length = short<br />
| skills = XML, some knowledge of linguistics and of one relevant natural language <br />
| description = Update language pairs with lexical and orthographical variations to leverage the new [[Dialectal_or_standard_variation|preferences]] functionality<br />
| rationale = Currently, preferences are implemented via language variants, which rely on multiple dictionaries, so compilation time grows exponentially every time a new preference is introduced.<br />
| mentors = [[User:Xavivars|Xavi Ivars]] [[User:Unhammer]]<br />
| more = /Use preferences in SPA-CAT<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Add Capitalization Handling Module to a Language Pair<br />
| difficulty = easy<br />
| length = short<br />
| skills = XML, knowledge of some relevant natural language<br />
| description = Update a language pair to make use of the new [[Capitalization_restoration|Capitalization handling module]]<br />
| rationale = Correcting capitalization via transfer rules is tedious and error-prone, but putting it in a separate set of rules should allow it to be handled in a more concise and maintainable way. Additionally, capitalization rules could potentially be moved to monolingual modules, reducing duplicated effort across translators.<br />
| mentors = [[User:Popcorndude]]<br />
| more = /Capitalization<br />
}}<br />
<br />
== Data Extraction ==<br />
<br />
A lot of the language data we need to make our analyzers and translators work already exists in other forms and we just need to figure out how to convert it. If you know of another source of data that isn't listed, we'd love to hear about it.<br />
<br />
{{IdeaSummary<br />
| name = dictionary induction from wikis<br />
| difficulty = Medium<br />
| length = either<br />
| skills = MySQL, MediaWiki syntax, Perl, maybe C++ or Java; Java, Scala, and RDF for DBpedia extraction<br />
| description = Extract dictionaries from linguistic wikis<br />
| rationale = Wiki dictionaries and encyclopedias (e.g. OmegaWiki, Wiktionary, Wikipedia, DBpedia) contain information (e.g. bilingual equivalences, morphological features, conjugations) that could be exploited to speed up the development of dictionaries for Apertium. This task aims at automatically building dictionaries by extracting different pieces of information from wiki structures such as interlingual links and infoboxes, and/or from DBpedia RDF datasets.<br />
| mentors = [[User:Firespeaker|Jonathan Washington]], [[User:Popcorndude]]<br />
| more = /Dictionary induction from wikis<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Dictionary induction from parallel corpora / Revive ReTraTos<br />
| difficulty = Medium<br />
| length = short<br />
| skills = C++, perl, python, xml, scripting, machine learning<br />
| description = Extract dictionaries from parallel corpora<br />
| rationale = Given a pair of monolingual modules and a parallel corpus, we should be able to run a program to align tagged sentences and give us the best entries that are missing from bidix. [[ReTraTos]] did this back in 2008, but it hasn't been maintained since. We want a program which builds and runs today, and does all the steps for the user.<br />
| mentors = [[User:Unhammer]], [[User:Popcorndude]]<br />
| more = /Dictionary induction from parallel corpora<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Extract morphological data from FLEx<br />
| difficulty = hard<br />
| length = long<br />
| skills = python, XML parsing<br />
| description = Write a program to extract data from [https://software.sil.org/fieldworks/ SIL FieldWorks] and convert as much as possible to monodix (and maybe bidix).<br />
| rationale = There's a lot of potentially useful data in FieldWorks files that might be enough to build a whole monodix for some languages but it's currently really hard to use<br />
| mentors = [[User:Popcorndude|Popcorndude]], [[User:TommiPirinen|Flammie]]<br />
| more = /FieldWorks_data_extraction<br />
}}<br />
<br />
== Tooling ==<br />
<br />
These are projects for people who would be comfortable digging through our C++ codebases (you will be doing a lot of that).<br />
<br />
{{IdeaSummary<br />
| name = Python API for Apertium<br />
| difficulty = medium<br />
| length = either<br />
| skills = C++, Python<br />
| description = Update the Python API for Apertium to expose all Apertium modes and test with all major OSes<br />
| rationale = The current Python API misses out on a lot of functionality, like phonemicisation, segmentation, and transliteration, and doesn't work for some OSes <s>like Debian</s>.<br />
| mentors = [[User:Francis Tyers|Francis Tyers]]<br />
| more = /Python API<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Robust tokenisation in lttoolbox<br />
| difficulty = Medium<br />
| length = long<br />
| skills = C++, XML, Python<br />
| description = Improve the longest-match left-to-right tokenisation strategy in [[lttoolbox]] to handle spaceless orthographies.<br />
| rationale = One of the most frustrating things about working with Apertium on texts "in the wild" is the way that tokenisation works. If a letter is not specified in the alphabet, it is treated as whitespace, so unknown words get split in two and you can end up with output like <code>^G$ö^k$ı^rmak$</code>, which is terrible for further processing. Additionally, the system is nearly impossible to use for languages that don't use spaces, such as Japanese.<br />
| mentors = [[User:Francis Tyers|Francis Tyers]], [[User:TommiPirinen|Flammie]]<br />
| more = /Robust tokenisation<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = rule visualization tools<br />
| difficulty = Medium<br />
| length = either<br />
| skills = python? javascript? XML<br />
| description = make tools to help visualize the effect of various rules<br />
| rationale = TODO see https://github.com/Jakespringer/dapertium for an example<br />
| mentors = [[User:Firespeaker|Jonathan Washington]], [[User:Sevilay Bayatlı|Sevilay Bayatlı]], [[User:Popcorndude]]<br />
| more = /Visualization tools<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Extend Weighted transfer rules<br />
| difficulty = Medium<br />
| length = short<br />
| skills = C++, python<br />
| description = The weighted transfer module is already applied to the chunker transfer rules. The idea here is to extend that module so it also applies to interchunk and postchunk transfer rules. <br />
| rationale = As a resource see https://github.com/aboelhamd/Weighted-transfer-rules-module<br />
| mentors = [[User: Sevilay Bayatlı|Sevilay Bayatlı]]<br />
| more = /Make a module <br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Automatic Error-Finder / Pseudo-Backpropagation<br />
| difficulty = Hard<br />
| length = long<br />
| skills = python?<br />
| description = Develop a tool to locate the approximate source of translation errors in the pipeline.<br />
| rationale = Being able to generate a list of probable error sources automatically makes it possible to prioritize issues by frequency, frees up developer time, and is a first step towards automated generation of better rules.<br />
| mentors = [[User:Popcorndude]]<br />
| more = /Backpropagation<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Language Server Protocol<br />
| difficulty = Medium<br />
| length = short<br />
| skills = any programming language<br />
| description = Build a [https://microsoft.github.io/language-server-protocol/ Language Server] for the various Apertium rule formats<br />
| rationale = We have some static analysis tools and syntax highlighters already and it would be great if we could combine and expand them to support more text editors.<br />
| mentors = [[User:Popcorndude]]<br />
| more = /Language Server Protocol<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = WASM Compilation<br />
| difficulty = hard<br />
| length = long<br />
| skills = C++, Javascript<br />
| description = Compile the pipeline modules to WASM and provide JS wrappers for them.<br />
| rationale = There are situations where it would be nice to be able to run the entire pipeline in the browser<br />
| mentors = [[User:Tino Didriksen|Tino Didriksen]]<br />
| more = /WASM<br />
}}<br />
<br />
== Web ==<br />
<br />
If you know Python and JavaScript, here are some ideas for improving our [https://apertium.org website]. Some of these should be fairly short, so it would be a good idea to talk to the mentors about doing a couple of them together.<br />
<br />
{{IdeaSummary<br />
| name = Web API extensions<br />
| difficulty = medium<br />
| length = short<br />
| skills = Python<br />
| description = Update the web API for Apertium to expose all Apertium modes <br />
| rationale = The current Web API misses out on a lot of functionality, like phonemicisation, segmentation, transliteration, and paradigm generation.<br />
| mentors = [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Jonathan Washington]], [[User:Xavivars|Xavi Ivars]]<br />
| more = /Apertium APY<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Website Improvements: Misc<br />
| difficulty = Medium<br />
| length = short<br />
| skills = html, css, js, python<br />
| description = Improve elements of Apertium's web infrastructure<br />
| rationale = Apertium's website infrastructure [[Apertium-html-tools]] and its supporting API [[APy|Apertium APy]] have numerous open issues. This project would entail choosing a subset of open issues and features that could realistically be completed in the summer. You're encouraged to speak with the Apertium community to see which features and issues are the most pressing.<br />
| mentors = [[User:Firespeaker|Jonathan Washington]], [[User:Xavivars|Xavi Ivars]]<br />
| more = /Website improvements<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Website Improvements: Dictionary Lookup<br />
| difficulty = Medium<br />
| length = short<br />
| skills = html, css, js, python<br />
| description = Finish implementing dictionary lookup mode in Apertium's web infrastructure<br />
| rationale = Apertium's website infrastructure [[Apertium-html-tools]] and its supporting API [[APy|Apertium APy]] have numerous open issues, including half-completed features like dictionary lookup. This project would entail completing the dictionary lookup feature. Some additional features which would be good to work on include automatic reverse lookups (so that a user has a better understanding of the results), grammatical information (such as the gender of nouns or the conjugation paradigms of verbs), and information about MWEs.<br />
| mentors = [[User:Firespeaker|Jonathan Washington]], [[User:Xavivars|Xavi Ivars]], [[User:Popcorndude]]<br />
| more = https://github.com/apertium/apertium-html-tools/issues/105 the open issue on GitHub<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Website Improvements: Spell checking<br />
| difficulty = Medium<br />
| length = short<br />
| skills = html, js, css, python<br />
| description = Add a spell-checking interface to Apertium's web tools<br />
| rationale = [[Apertium-html-tools]] has seen some prototypes for spell-checking interfaces (all in stale PRs and branches on GitHub), but none have ended up being quite ready to integrate into the tools. This project would entail polishing up or recreating an interface, and making sure [[APy]] has a mode that allows access to Apertium voikkospell modules. The end result should be a slick, easy-to-use interface for proofing text, with intuitive underlining of text deemed to be misspelled and intuitive presentation and selection of alternatives. [https://github.com/apertium/apertium-html-tools/issues/390 the open issue on GitHub]<br />
| mentors = [[User:Firespeaker|Jonathan Washington]], [[User:Xavivars|Xavi Ivars]]<br />
| more = /Spell checker web interface<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Website Improvements: Suggestions<br />
| difficulty = Medium<br />
| length = short<br />
| skills = html, css, js, python<br />
| description = Finish implementing a suggestions interface for Apertium's web infrastructure<br />
| rationale = Some work has been done to add a "suggestions" interface to Apertium's website infrastructure [[Apertium-html-tools]] and its supporting API [[APy|Apertium APy]], whereby users can suggest corrected translations. This project would entail finishing that feature. There are some related [https://github.com/apertium/apertium-html-tools/issues/55 issues] and [https://github.com/apertium/apertium-html-tools/pull/252 PRs] on GitHub.<br />
| mentors = [[User:Firespeaker|Jonathan Washington]], [[User:Xavivars|Xavi Ivars]]<br />
| more = /Website improvements<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Website Improvements: Orthography conversion interface<br />
| difficulty = Medium<br />
| length = short<br />
| skills = html, js, css, python<br />
| description = Add an orthography conversion interface to Apertium's web tools<br />
| rationale = Several Apertium language modules (like Kazakh, Kyrgyz, Crimean Tatar, and Hñähñu) have orthography conversion modes in their mode definition files. This project would be to expose those modes through [[APy|Apertium APy]] and provide a simple interface in [[Apertium-html-tools]] to use them.<br />
| mentors = [[User:Firespeaker|Jonathan Washington]], [[User:Xavivars|Xavi Ivars]]<br />
| more = /Website improvements<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Add support for NMT to web API<br />
| difficulty = Medium<br />
| length = long<br />
| skills = python, NMT<br />
| description = Add support for a popular NMT engine to Apertium's web API<br />
| rationale = Currently, Apertium's web API, [[APy|Apertium APy]], supports only Apertium language modules. But the front end could just as easily interface with an API that supports trained NMT models. The point of the project is to add support for one popular NMT package (e.g. translateLocally/Bergamot, OpenNMT or JoeyNMT) to the APy.<br />
| mentors = [[User:Firespeaker|Jonathan Washington]], [[User:Xavivars|Xavi Ivars]]<br />
| more = <br />
}}<br />
<br />
== Integrations ==<br />
<br />
In addition to incorporating data from other projects, it would be nice if we could also make our data useful to them.<br />
<br />
{{IdeaSummary<br />
| name = OmniLingo and Apertium<br />
| difficulty = medium<br />
| length = either<br />
| skills = JS, Python<br />
| description = OmniLingo is a language learning system for practicing listening comprehension using Common Voice data. There is a lot of text processing involved (for example tokenisation) that could be aided by Apertium tools. <br />
| rationale = <br />
| mentors = [[User:Francis Tyers|Francis Tyers]]<br />
| more = /OmniLingo<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Support for Enhanced Dependencies in UD Annotatrix<br />
| difficulty = medium<br />
| length = either<br />
| skills = NodeJS<br />
| description = UD Annotatrix is an annotation interface for Universal Dependencies, but does not yet support all functionality.<br />
| rationale = <br />
| mentors = [[User:Francis Tyers|Francis Tyers]]<br />
| more = /Annotatrix enhanced dependencies<br />
}}<br />
<br />
<!--<br />
This one was done, but could do with more work. Not sure if it's a full gsoc though?<br />
<br />
{{IdeaSummary<br />
| name = User-friendly lexical selection training<br />
| difficulty = Medium<br />
| skills = Python, C++, shell scripting<br />
| description = Make it so that training/inference of lexical selection rules is a more user-friendly process<br />
| rationale = Our lexical selection module allows for inferring rules from corpora and word alignments, but the procedure is currently a bit messy, with various scripts involved that require lots of manual tweaking, and many third party tools to be installed. The goal of this task is to make the procedure as user-friendly as possible, so that ideally only a simple config file would be needed, and a driver script would take care of the rest.<br />
| mentors = [[User:Unhammer|Unhammer]], [[User:Mlforcada|Mikel Forcada]]<br />
| more = /User-friendly lexical selection training<br />
}}<br />
--><br />
<br />
{{IdeaSummary<br />
| name = UD and Apertium integration<br />
| difficulty = Entry level<br />
| length = short<br />
| skills = python, javascript, HTML, (C++)<br />
| description = Create a range of tools for making Apertium compatible with Universal Dependencies<br />
| rationale = Universal Dependencies is a fast-growing project aimed at creating a unified annotation scheme for treebanks. This includes both part-of-speech and morphological features. Their annotated corpora could be extremely useful to Apertium for training translation models. In addition, Apertium's rule-based morphological descriptions could be useful for software that relies on Universal Dependencies.<br />
| mentors = [[User:Francis Tyers]], [[User:Firespeaker| Jonathan Washington]], [[User:Popcorndude]]<br />
| more = /UD and Apertium integration <br />
}}<br />
<br />
[[Category:Development]]<br />
[[Category:Google Summer of Code]]</div>Tino Didriksenhttps://wiki.apertium.org/w/index.php?title=Project_Management_Committee&diff=74368Project Management Committee2023-05-29T15:14:33Z<p>Tino Didriksen: </p>
<hr />
<div>The '''Project Management Committee''' is a group of seven Apertium committers, elected to be in charge of things like granting commit rights, signing off on releases, managing repositories and web sites, distributing funds and so on. See the [[Bylaws#Project Management Committee|Bylaws]] for details.<br />
<br />
After the elections in [[PMC_election|April 2022]], the committee is composed of the following members:<br />
<br />
{| class="wikitable"<br />
|-<br />
! Name !! Status<br />
|-<br />
| Francis M. Tyers || president<br />
|-<br />
| Mikel L. Forcada || elected<br />
|-<br />
| Tino Didriksen || elected<br />
|-<br />
| Xavier Ivars || elected<br />
|-<br />
| Jonathan North Washington || elected<br />
|-<br />
| Kevin Brubeck Unhammer || elected<br />
|-<br />
| <s>Tanmai Khanna</s> || elected; stepped down 2023-05-29<br />
|}<br />
<br />
[[Category:Project Management Committee]][[Category:Governance]]</div>Tino Didriksenhttps://wiki.apertium.org/w/index.php?title=Google_Summer_of_Code/Application_2023&diff=74181Google Summer of Code/Application 20232023-01-28T23:21:08Z<p>Tino Didriksen: /* Program Retention Survey */</p>
<hr />
<div>== Register org ==<br />
<br />
=== Years previously participated in GSoC ===<br />
2021, 2020, 2019, 2018, 2017, 2016, 2014, 2013, 2012, 2011, 2010, 2009<br />
<br />
== Org Profile ==<br />
<br />
=== Website URL ===<br />
[http://wiki.apertium.org http://wiki.apertium.org]<br />
<br />
=== Logo ===<br />
[https://upload.wikimedia.org/wikipedia/commons/thumb/b/b4/Apertium_logo.svg/1214px-Apertium_logo.svg.png https://upload.wikimedia.org/wikipedia/commons/thumb/b/b4/Apertium_logo.svg/1214px-Apertium_logo.svg.png]<br />
<br />
=== Tagline ===<br />
A free/open-source machine translation platform<br />
<br />
=== Primary Open Source License ===<br />
GNU General Public License version 3<br />
<br />
=== Year organisation started ===<br />
<br />
2006 (???)<br />
<br />
=== Link to source code ===<br />
<br />
https://github.com/apertium/<br />
<br />
=== Organisation categories ===<br />
<br />
* Science and medicine (healthcare, biotech, life sciences, academic research, etc.)<br />
* Other<br />
<br />
=== Organisation technologies ===<br />
C++, python, bash, XML, javascript <br />
<br />
=== Organisation topics ===<br />
machine translation, natural language processing, less-resourced languages, language technology<br />
<br />
=== Organisation description ===<br />
<br />
Apertium is a free/open-source machine translation platform, and the organisation focuses on primarily symbolic language technology for less-resourced languages.<br />
<br />
=== Contributor guidance ===<br />
<br />
https://wiki.apertium.org/wiki/Top_tips_for_GSOC_applications<br />
<br />
=== Communication Methods ===<br />
<br />
* Chat: https://wiki.apertium.org/wiki/IRC<br />
* Mailing List / Forum: apertium-stuff@lists.sourceforge.net<br />
<br />
== Organisation questionnaire ==<br />
=== Why does your org want to participate in Google Summer of Code? ===<br />
Apertium has been part of GSoC for over a decade and it has been a great experience. Apertium loves GSoC: it supports free/open-source (FOS) software as much as we do! Apertium needs GSoC: it offers an incredible opportunity (and resources!) allowing us to spread the word about our project, to attract new developers and consolidate the contribution of existing developers through mentoring, and to improve the platform in many ways: improving the engine, generating new tools and user interfaces, making Apertium available to other applications, improving the quality of the languages currently supported, adding new languages to it. Apertium loves less-resourced languages and GSoC gives an opportunity for developers speaking them to generate FOS language technologies for them. Apertium will gain: more developers getting to know FOS software and the ethos that comes with it, contributing to it, and especially contributors who are passionate about languages and computers.<br />
<br />
<br />
=== What would your org consider to be a successful GSoC program? ===<br />
<br />
<!-- New contributors, new features completed, more code written, better being able to guide new developers into open source world, etc. --><br />
<br />
A successful GSoC would see any combination of newly released language pairs, the addition of new technologies to the Apertium framework, the addition of features to our web infrastructure, and a fresh round of developers becoming excited by Apertium. We would especially be happy to see a successful project form the basis of a published academic paper and to gain new long-term contributors.<br />
<br />
=== How will you keep mentors engaged with their GSoC contributors? ===<br />
We select our mentors from among very active developers, with long-term commitment to this 18-year-old project — they are people we know well and whom we have met face-to-face at conferences, workshops, or even in daily life; some of them teach and do research at universities or work at companies using Apertium. For this reason, it is quite unlikely for mentors to disappear, since most of them have been embedded in our community for years. However, there is always the possibility that some problem comes up, so we also assign back-up mentors to all contributors, in many cases more than one back-up. If a mentor cannot continue for whatever reason, one of the backup co-mentors will take over, and one of the organisation administrators (themselves experienced GSoC mentors) will take on the role of second backup mentor. <br />
<br />
=== How will you keep your GSoC contributors on schedule to complete their projects? ===<br />
<br />
Apertium only accepts applications with a well-defined weekly schedule, clear milestones and deliverables, and, if possible, a section on risk management (risks, their probability, their severity, & mitigating actions). Applications should also plan for holidays, exams, and other absences. Contributors will be encouraged to let us know if they need to reschedule or take a break. Contributors may also need consultation when they are stuck or when personal matters interfere with their work: we will, as we have in the past, try our best to reach out to them, be open and friendly, and provide as much support as we can to help them out. We've been in situations like this too! Detailed scheduling will avoid both mentors and contributors wasting time. If a mentor reports the unscheduled disappearance of a contributor (unexpected 72-hour silence), the contributor will be contacted by the administrators. If silence persists, their task will be frozen and we will report to Google, to proceed according to the rules of GSoC.<br />
<br />
=== How will you get your GSoC contributors involved in your community during GSoC? ===<br />
<br />
First, we encourage all prospective contributors to visit our IRC channel (irc.oftc.net#apertium) as often as possible, even before the start of the program, since that will help them find a suitable mentor and a useful project that they can work on. We advise them strongly to read our wiki pages and manuals, use our system, try to break it and fix it, and finally tell us about it. As a result, contributors get familiar with Apertium before the coding period starts, which increases their chances of ending up with a successful project. In addition, we define coding challenges for each of the proposed projects, which serve both as an entry task, and as a means for getting our contributors familiar with Apertium and involved in our community in the early stages of the program. Finally, during the coding stage, we are available to talk to our contributors on a daily basis and give them suggestions and advice when they get stuck.<br />
<br />
=== Anything else we should know? (optional) ===<br />
<br />
<br />
=== Is your organization part of any government? ===<br />
No<br />
<br />
== Program Application ==<br />
<br />
=== Ideas list ===<br />
<br />
https://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code<br />
<br />
=== Mentors ===<br />
(How many Mentors does your Organization have available to participate in this program?)<br />
<br />
* Daniel<br />
* Jonathan<br />
(add your names here!)<br />
<br />
=== Program Retention Survey ===<br />
<br />
(We're looking for more details on how many of your students/GSoC contributors from the above program are still active in your community today.)<br />
<br />
* Number of accepted students/contributors: 10<br />
* Number of those participants that are still active today: 3</div>Tino Didriksenhttps://wiki.apertium.org/w/index.php?title=PMC_election/2022&diff=74038PMC election/20222022-05-02T08:51:14Z<p>Tino Didriksen: </p>
<hr />
<div>== 2022 ==<br />
<br />
A call for candidates was made on [https://www.mail-archive.com/apertium-stuff@lists.sourceforge.net/msg09011.html 2022-04-19]. After a week, on [https://www.mail-archive.com/apertium-stuff@lists.sourceforge.net/msg09025.html 2022-04-26] there were 7 candidates for PMC and 2 candidates for President. In the interest of efficiency, Tino Didriksen proposed to yield his president candidacy and settle the election by [https://en.wikipedia.org/wiki/Unanimous_consent unanimous consent].<br />
<br />
Given the lack of dissent from Apertium voters not running for any seats, unanimous consent was achieved on 2022-04-30.<br />
<br />
New PMC took office on 2022-05-02.<br />
<br />
{| class="wikitable"<br />
|-<br />
! Name !! status<br />
|-<br />
| Francis M. Tyers || president<br />
|-<br />
| Mikel L. Forcada || elected<br />
|-<br />
| Tino Didriksen || elected<br />
|-<br />
| Xavier Ivars || elected<br />
|-<br />
| Jonathan North Washington || elected<br />
|-<br />
| Kevin Brubeck Unhammer || elected<br />
|-<br />
| Tanmai Khanna || elected<br />
|}<br />
<br />
== Previous years ==<br />
* [[PMC election/2013]]<br />
* [[PMC election/2014]]<br />
* [[PMC election/2017]]<br />
* [[PMC election/2020]]<br />
<br />
[[Category:Project Management Committee]]</div>Tino Didriksenhttps://wiki.apertium.org/w/index.php?title=PMC_election&diff=74037PMC election2022-05-02T08:50:10Z<p>Tino Didriksen: Changed redirect target from PMC election/2020 to PMC election/2022</p>
<hr />
<div>#REDIRECT [[PMC election/2022]]</div>Tino Didriksenhttps://wiki.apertium.org/w/index.php?title=Project_Management_Committee&diff=74036Project Management Committee2022-04-29T10:22:58Z<p>Tino Didriksen: </p>
<hr />
<div>The '''Project Management Committee''' is a group of seven Apertium committers, elected to be in charge of things like granting commit rights, signing off on releases, managing repositories and web sites, distributing funds and so on. See the [[Bylaws#Project Management Committee|Bylaws]] for details.<br />
<br />
After the election in [[PMC_election|April 2022]], the committee is composed of the following members:<br />
<br />
{| class="wikitable"<br />
|-<br />
! Name !! status<br />
|-<br />
| Francis M. Tyers || president<br />
|-<br />
| Mikel L. Forcada || elected<br />
|-<br />
| Tino Didriksen || elected<br />
|-<br />
| Xavier Ivars || elected<br />
|-<br />
| Jonathan North Washington || elected<br />
|-<br />
| Kevin Brubeck Unhammer || elected<br />
|-<br />
| Tanmai Khanna || elected<br />
|}<br />
<br />
[[Category:Project Management Committee]][[Category:Governance]]</div>Tino Didriksenhttps://wiki.apertium.org/w/index.php?title=Google_Season_of_Docs_2022/Organize_and_Update_Apertium_User_Documentation&diff=73949Google Season of Docs 2022/Organize and Update Apertium User Documentation2022-03-25T17:59:18Z<p>Tino Didriksen: Protected "Google Season of Docs 2022/Organize and Update Apertium User Documentation": Locked for Google's review ([Edit=Allow only administrators] (indefinite) [Move=Allow only administrators] (indefinite))</p>
<hr />
<div><br />
== About Apertium ==<br />
<br />
Apertium (current version 3.8, first release 2004) is a free and open source (mainly GPLv3) rule-based machine translation and language technology platform. We have over 500 languages and pairs, maintained using 15+ different tools, with contributors from all around the globe.<br />
<br />
== About the project ==<br />
<br />
=== The problem ===<br />
[https://wiki.apertium.org Apertium's wiki] and other documentation are out of date, poorly organized, not visible enough, and just plain not user-friendly.<br />
<br />
This ranges from documentation of individual tools not reflecting their current state to our best how-to guides reflecting how things were done a decade ago. Documentation is scattered across the Apertium wiki, individual GitHub repos, an out-of-date PDF "Book", and even published papers and third-party sites.<br />
<br />
The result is new users and contributors wasting time reading out-of-date materials, and even long-time contributors having no way to be aware of changes to the tools they use.<br />
<br />
=== The solution ===<br />
<br />
Following the 4-part division proposed by https://documentation.divio.com into Reference, Tutorials, How-to Guides, and Explanations, this project will gather and reorganize existing documentation into a single, easily-located, authoritative source to replace the existing hodge-podge of often unmaintained fragments.<br />
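The four-part split maps naturally onto the top-level layout of the planned canonical repository. As a minimal sketch of what Phase 1's repo setup could look like (the directory names are illustrative only, nothing has been decided):

```shell
# Hypothetical skeleton for the canonical docs repo, mirroring the
# Reference / Tutorials / How-to Guides / Explanations division.
# Directory names are illustrative, not final.
mkdir -p docs/reference docs/tutorials docs/how-to-guides docs/explanations
ls docs
```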
<br />
The majority of existing documents will fall under Reference and Tutorials, which will then be expanded and updated to reflect the current state of all the commonly used components of a translation system.<br />
<br />
How-to Guides and Explanations, on the other hand, will be gathered and corrected where outdated, but expansion of this material will primarily take the form of examples and guidelines for future contributors.<br />
<br />
=== The scope ===<br />
<br />
* Overview of the Apertium platform<br />
* Reference documentation and tutorials for all stages of the Apertium pipeline<br />
* Organized collection of how-to guides and background material<br />
<br />
=== Measuring success ===<br />
<br />
Unfortunately, the only metric we have is how many people contact us either via mailing list or IRC, and that number has fallen drastically during the Covid-19 pandemic. But from both feedback and direct questioning, we know contributors (potential and current) manage to find incorrect or outdated documentation, or are puzzled about as-yet undocumented features and behavior.<br />
<br />
So the way we would measure success is that the number of contributors who wind up following an outdated tutorial drops close to zero.<br />
<br />
=== Timeline ===<br />
<br />
Our technical writer is an active contributor who is very familiar with the various components of Apertium. We estimate that this project will take 3-4 months. A conservative timeline is given below.<br />
<br />
{| class="wikitable" border="1"<br />
|-<br />
! Time Period<br />
! Goal<br />
! Details<br />
! Deliverable<br />
|-<br />
| '''Phase 1: Reference'''<br />
|<br />
|<br />
|<br />
|-<br />
| Week 1<br />
May 1-7<br />
| Gather and convert existing documentation<br />
|<br />
* Set up repo for canonical copy<br />
* Copy all existing docs to canonical repo<br />
* Delete outdated info<br />
| Single canonical source containing existing info<br />
|-<br />
| Weeks 2-4<br />
May 8-28<br />
| Fill in gaps in formal docs<br />
|<br />
* (see [[#Formal_descriptions]])<br />
| Up-to-date formal documentation of main pipeline modules and common build scripts<br />
|-<br />
| '''Phase 2: Tutorials'''<br />
|<br />
|<br />
|<br />
|-<br />
| Weeks 5-7<br />
May 29-June 18<br />
| Dictionary tutorials<br />
|<br />
* Basic introduction to shell and common Apertium-related commands<br />
* Guidance for selecting arguments for apertium-init<br />
* Instructions for going from a linguistic paradigm to monodix/lexc/lexd<br />
* Introduction to twol<br />
| Information sufficient to get a beginner set up and contributing to lexicons<br />
|-<br />
| Weeks 8-10<br />
June 19-July 2<br />
| Transfer tutorials<br />
|<br />
* How to go from a word-order or agreement difference to a working transfer rule in either formalism<br />
| Systematic tutorial for writing transfer rules<br />
|-<br />
| Weeks 11-13<br />
July 3-23<br />
| Other tutorials<br />
|<br />
* Lexical selection<br />
* Training a tagger<br />
* Writing CG rules<br />
* Anaphora resolution<br />
* Separable<br />
| End-to-end tutorial for the translation pipeline<br />
|-<br />
| '''Phase 3: Explanation'''<br />
|<br />
|<br />
|<br />
|-<br />
| Weeks 14-15<br />
July 24-August 6<br />
| Theoretical background<br />
|<br />
* RBMT<br />
* FSTs<br />
* other things, if time<br />
| Introductions to why Apertium uses the technology that it does<br />
|-<br />
| '''Phase 4: How-to guides and code structure'''<br />
|<br />
|<br />
|<br />
|-<br />
| Weeks 16-18<br />
August 7-27<br />
| How-to and code<br />
|<br />
* A few how-to guides and make it easy to add more<br />
* For each core repo:<br />
** Document listing the general purpose of each source file<br />
** Doc-comment for each noteworthy function<br />
** Outline of the operation and control flow of each class corresponding to an executable<br />
| Guidelines for contributing to the code<br />
|}<br />
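As an illustration of the per-function doc-comment style Phase 4 aims for, here is a hedged C++ sketch; the function and its behavior are invented for this example and do not correspond to any actual Apertium source file:

```cpp
#include <string>

/**
 * Strip the delimiters from an Apertium stream-format token such as
 * "^lemma<n><sg>$", returning the inner lemma-and-tags portion.
 * Input that does not look like a delimited token is returned unchanged.
 * (Invented example function, shown only to illustrate doc-comment style.)
 */
std::string stripToken(const std::string& tok) {
    if (tok.size() >= 2 && tok.front() == '^' && tok.back() == '$') {
        return tok.substr(1, tok.size() - 2);
    }
    return tok;
}
```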
<br />
== Budget ==<br />
<br />
{| class="wikitable" border="1"<br />
|-<br />
! Budget item<br />
! Amount<br />
|-<br />
| Paying technical writer<br />
| $6000<br />
|-<br />
! TOTAL:<br />
! $6000<br />
|}<br />
<br />
We considered adding a $500 "just in case" line item, but we can't imagine anything else to cover. We've never paid org mentors, and we don't need to restore from ancient archives or broken hardware - and even if we did, it'd likely be faster to just rewrite that part.<br />
<br />
== Additional information ==<br />
<br />
Apertium has participated in Google Summer of Code 12 times: 2009, 2010, 2011, 2012, 2013, 2014, 2016, 2017, 2018, 2019, 2020, and 2021.<br />
<br />
The technical writer participated in GSoC as a student in 2019 and 2021, and as a mentor in 2020.<br />
<br />
== Appendix: Survey of existing documentation ==<br />
<br />
=== Formal descriptions ===<br />
<br />
{| class="wikitable" border="1"<br />
|-<br />
! Source<br />
! Mostly Complete<br />
! Partial<br />
|-<br />
| [https://wiki.apertium.org/w/images/d/d0/Apertium2-documentation.pdf 2.0 docs]<br />
|<br />
* stream format<br />
* transfer<br />
* monodix<br />
* bidix<br />
|<br />
* tagger<br />
* lrx<br />
* format handling<br />
|-<br />
| wiki<br />
|<br />
* recursive<br />
* anaphora<br />
* regtest<br />
|<br />
* separable<br />
* makefiles and modes<br />
|-<br />
| github<br />
|<br />
* lexd<br />
|<br />
|-<br />
| external sources<br />
|<br />
* HFST (probably don't redo)<br />
* CG3 (link to, don't redo)<br />
|<br />
|}<br />
<br />
missing:<br />
<br />
* common build scripts (filter-rules, etc)<br />
* postgenerator?<br />
<br />
=== Tutorials ===<br />
<br />
{| class="wikitable" border="1"<br />
|-<br />
! Source<br />
! Substantive<br />
! Fragmentary<br />
|-<br />
| Apertium wiki<br />
|<br />
* monodix<br />
* bidix<br />
* init<br />
|<br />
* transfer<br />
* recursive<br />
* anaphora<br />
|-<br />
| [[User:Firespeaker]]'s course wiki<br />
|<br />
* lexd<br />
* bidix<br />
* lrx<br />
* recursive<br />
|<br />
* CG3<br />
|}<br />
<br />
missing:<br />
<br />
* HFST<br />
* tagger<br />
* separable<br />
* regtest</div>Tino Didriksenhttps://wiki.apertium.org/w/index.php?title=Google_Season_of_Docs_2022/Organize_and_Update_Apertium_User_Documentation&diff=73944Google Season of Docs 2022/Organize and Update Apertium User Documentation2022-03-25T07:06:27Z<p>Tino Didriksen: /* About Apertium */</p>
<hr />
<div><br />
== About Apertium ==<br />
<br />
Apertium (current version 3.8, first release 2004) is a free and open source (mainly GPLv3) rule-based machine translation and language technology platform. We have over 500 languages and pairs, maintained using 15+ different tools.<br />
<br />
== About the project ==<br />
<br />
=== The problem ===<br />
[https://wiki.apertium.org Apertium's wiki] and other documentation are out of date, poorly organized, not visible enough, and just plain not user-friendly.<br />
<br />
This ranges from documentation of individual tools not reflecting their current state to our best how-to guides reflecting how things were done a decade ago. Documentation is scattered across the Apertium wiki, individual GitHub repos, an out-of-date PDF "Book", and even published papers and third-party sites.<br />
<br />
The result is new users and contributors wasting time reading out-of-date materials, and even long-time contributors having no way to be aware of changes to the tools they use.<br />
<br />
=== The solution ===<br />
<br />
Following the 4-part division proposed by https://documentation.divio.com into Reference, Tutorials, How-to Guides, and Explanations, this project will gather and reorganize existing documentation into a single, easily-located, authoritative source to replace the existing hodge-podge of often unmaintained fragments.<br />
<br />
The majority of existing documents will fall under Reference and Tutorials, which will then be expanded and updated to reflect the current state of all the commonly used components of a translation system.<br />
<br />
How-to Guides and Explanations, on the other hand, will be gathered and corrected where outdated, but expansion of this material will primarily take the form of examples and guidelines for future contributors.<br />
<br />
=== The scope ===<br />
<br />
* Overview of the Apertium platform<br />
* Reference documentation and tutorials for all stages of the Apertium pipeline<br />
* Organized collection of how-to guides and background material<br />
<br />
=== Measuring success ===<br />
<br />
Unfortunately, the only metric we have is how many people contact us either via mailing list or IRC, and that number has fallen drastically during the Covid-19 pandemic. But from both feedback and direct questioning, we know contributors (potential and current) manage to find incorrect or outdated documentation.<br />
<br />
So the way we would measure success is that the number of contributors who wind up following an outdated tutorial drops close to zero.<br />
<br />
=== Timeline ===<br />
<br />
Our technical writer is an active contributor who is very familiar with the various components of Apertium. We estimate that this project will take 3-4 months. A conservative timeline is given below.<br />
<br />
{| class="wikitable" border="1"<br />
|-<br />
! Time Period<br />
! Goal<br />
! Details<br />
! Deliverable<br />
|-<br />
| '''Phase 1: Reference'''<br />
|<br />
|<br />
|<br />
|-<br />
| Week 1<br />
May 1-7<br />
| Gather and convert existing documentation<br />
|<br />
* Set up repo for canonical copy<br />
* Copy all existing docs to canonical repo<br />
* Delete outdated info<br />
| Single canonical source containing existing info<br />
|-<br />
| Weeks 2-4<br />
May 8-28<br />
| Fill in gaps in formal docs<br />
|<br />
* (see [[#Formal_descriptions]])<br />
| Up-to-date formal documentation of main pipeline modules and common build scripts<br />
|-<br />
| '''Phase 2: Tutorials'''<br />
|<br />
|<br />
|<br />
|-<br />
| Weeks 5-7<br />
May 29-June 18<br />
| Dictionary tutorials<br />
|<br />
* Basic introduction to shell and common Apertium-related commands<br />
* Guidance for selecting arguments for apertium-init<br />
* Instructions for going from a linguistic paradigm to monodix/lexc/lexd<br />
* Introduction to twol<br />
| Information sufficient to get a beginner set up and contributing to lexicons<br />
|-<br />
| Weeks 8-9<br />
June 19-July 2<br />
| Transfer tutorials<br />
|<br />
* How to go from a word-order or agreement difference to a working transfer rule in either formalism<br />
| Systematic tutorial for writing transfer rules<br />
|-<br />
| Weeks 10-12<br />
July 3-23<br />
| Other tutorials<br />
|<br />
* Lexical selection<br />
* Training a tagger<br />
* Writing CG rules<br />
* Anaphora resolution<br />
* Separable<br />
| End-to-end tutorial for the translation pipeline<br />
|-<br />
| '''Phase 3: Explanation'''<br />
|<br />
|<br />
|<br />
|-<br />
| Weeks 13-14<br />
July 24-August 6<br />
| Theoretical background<br />
|<br />
* RBMT<br />
* FSTs<br />
* other things, if time<br />
| Introductions to why Apertium uses the technology that it does<br />
|-<br />
| '''Phase 4: How-to guides and code structure'''<br />
|<br />
|<br />
|<br />
|-<br />
| Weeks 15-17<br />
August 7-27<br />
| How-to and code<br />
|<br />
* A few how-to guides and make it easy to add more<br />
* For each core repo:<br />
** Document listing the general purpose of each source file<br />
** Doc-comment for each noteworthy function<br />
** Outline of the operation and control flow of each class corresponding to an executable<br />
| Guidelines for contributing to the code<br />
|}<br />
<br />
== Budget ==<br />
<br />
{| class="wikitable" border="1"<br />
|-<br />
! Budget item<br />
! Amount<br />
|-<br />
| Paying technical writer<br />
| $6000<br />
|-<br />
! TOTAL:<br />
! $6000<br />
|}<br />
<br />
We considered adding a $500 "just in case" line item, but we can't think of anything else it would need to cover. We've never paid org mentors, and we don't need to restore data from ancient archives or broken hardware - and even if we did, it'd likely be faster to just rewrite that part.<br />
<br />
== Additional information ==<br />
<br />
Apertium has participated in Google Summer of Code 12 times: 2009, 2010, 2011, 2012, 2013, 2014, 2016, 2017, 2018, 2019, 2020, and 2021.<br />
<br />
The technical writer participated in GSoC as a student in 2019 and 2021, and as a mentor in 2020.<br />
<br />
== Appendix: Survey of existing documentation ==<br />
<br />
=== Formal descriptions ===<br />
<br />
{| class="wikitable" border="1"<br />
|-<br />
! Source<br />
! Mostly Complete<br />
! Partial<br />
|-<br />
| [https://wiki.apertium.org/w/images/d/d0/Apertium2-documentation.pdf 2.0 docs]<br />
|<br />
* stream format<br />
* transfer<br />
* monodix<br />
* bidix<br />
|<br />
* tagger<br />
* lrx<br />
* format handling<br />
|-<br />
| wiki<br />
|<br />
* recursive<br />
* anaphora<br />
* regtest<br />
|<br />
* separable<br />
* makefiles and modes<br />
|-<br />
| github<br />
|<br />
* lexd<br />
|<br />
|-<br />
| external sources<br />
|<br />
* HFST (probably don't redo)<br />
* CG3 (link to, don't redo)<br />
|<br />
|}<br />
<br />
missing:<br />
<br />
* common build scripts (filter-rules, etc)<br />
* postgenerator?<br />
<br />
=== Tutorials ===<br />
<br />
{| class="wikitable" border="1"<br />
|-<br />
! Source<br />
! Substantive<br />
! Fragmentary<br />
|-<br />
| Apertium wiki<br />
|<br />
* monodix<br />
* bidix<br />
* init<br />
|<br />
* transfer<br />
* recursive<br />
* anaphora<br />
|-<br />
| [[User:Firespeaker]]'s course wiki<br />
|<br />
* lexd<br />
* bidix<br />
* lrx<br />
* recursive<br />
|<br />
* CG3<br />
|}<br />
<br />
missing:<br />
<br />
* HFST<br />
* tagger<br />
* separable<br />
* regtest</div>Tino Didriksenhttps://wiki.apertium.org/w/index.php?title=Google_Season_of_Docs_2022/Organize_and_Update_Apertium_User_Documentation&diff=73940Google Season of Docs 2022/Organize and Update Apertium User Documentation2022-03-24T21:27:14Z<p>Tino Didriksen: /* Budget */</p>
<hr />
<div><br />
== About Apertium ==<br />
<br />
Apertium (current version 3.6, first release 2004) is a free and open source (mainly GPLv3) rule-based machine translation and language technology platform. We have over 500 languages and pairs, maintained using 15+ different tools.<br />
<br />
== About the project ==<br />
<br />
=== The problem ===<br />
[https://wiki.apertium.org Apertium's wiki] and other documentation are out of date, poorly organized, not visible enough, and just plain not user-friendly.<br />
<br />
This ranges from documentation of individual tools not reflecting their current state, to our best how-to guides reflecting how things were done a decade ago. Documentation is scattered between the Apertium wiki, individual GitHub repos, an out-of-date pdf "Book", and even published papers and third party sites.<br />
<br />
The result is new users and contributors wasting time reading out-of-date materials, and even long-time contributors having no way to be aware of changes to the tools they use.<br />
<br />
=== The solution ===<br />
<br />
The solution to the above problem is to gather the existing documentation and tutorials into a single authoritative source and update them to match the current state of Apertium.<br />
<br />
<!--The solution to the above problem is to create updated documentation for all pipeline modules and/or a full tutorial.<br />
<br />
Ideally documentation on a given tool will exist in a single place, and a full tutorial will also have a single unified source. One possibility is to generate one set of docs from another, or from a single unified source. For example, if we want tools to be documented in both their GitHub repos and on the wiki, we should generate one set of documentation from the other (or a third source). If we want a full tutorial to be on the wiki but also available in PDF format, then we should designate one source as the original and generate the others from them. --><br />
<br />
=== The scope ===<br />
<br />
* Overview of the Apertium platform<br />
* All stages of the Apertium pipeline<br />
* The main approaches to and tools for each stage<br />
<br />
=== Measuring success ===<br />
<br />
Unfortunately, the only metric we have is how many people contact us, either via the mailing list or on IRC, and that number has fallen drastically during the Covid-19 pandemic. But from both feedback and direct questioning, we know that contributors (potential and current) regularly end up following incorrect or outdated documentation.<br />
<br />
We would therefore measure success by whether the number of contributors who wind up following an outdated tutorial drops close to zero.<br />
<br />
== Existing Documentation ==<br />
<br />
=== Formal Descriptions ===<br />
<br />
{| class="wikitable" border="1"<br />
|-<br />
! Source<br />
! Mostly Complete<br />
! Partial<br />
|-<br />
| [https://wiki.apertium.org/w/images/d/d0/Apertium2-documentation.pdf 2.0 docs]<br />
|<br />
* stream format<br />
* transfer<br />
* monodix<br />
* bidix<br />
|<br />
* tagger<br />
* lrx<br />
* format handling<br />
|-<br />
| wiki<br />
|<br />
* recursive<br />
* anaphora<br />
|<br />
* separable<br />
* makefiles and modes<br />
|-<br />
| github<br />
|<br />
* lexd<br />
|<br />
|-<br />
| external sources<br />
|<br />
* HFST (probably don't redo)<br />
* CG3 (link to, don't redo)<br />
|<br />
|}<br />
<br />
missing:<br />
<br />
* common build scripts (filter-rules, etc)<br />
* postgenerator?<br />
<br />
=== Tutorials ===<br />
<br />
{| class="wikitable" border="1"<br />
|-<br />
! Source<br />
! Substantive<br />
! Fragmentary<br />
|-<br />
| Apertium wiki<br />
|<br />
* monodix<br />
* bidix<br />
* init<br />
|<br />
* transfer<br />
* recursive<br />
* anaphora<br />
|-<br />
| [[User:Firespeaker]]'s course wiki<br />
|<br />
* lexd<br />
* bidix<br />
* lrx<br />
* recursive<br />
|<br />
* CG3<br />
|}<br />
<br />
missing:<br />
<br />
* HFST<br />
* tagger<br />
* separable<br />
<br />
== Timeline ==<br />
<br />
This follows the 4-part division of https://documentation.divio.com<br />
<br />
{| class="wikitable" border="1"<br />
|-<br />
! Time Period<br />
! Goal<br />
! Details<br />
! Deliverable<br />
|-<br />
| '''Phase 1: Reference'''<br />
|<br />
|<br />
|<br />
|-<br />
| Week 1<br />
May 1-7<br />
| Gather and convert existing documentation<br />
|<br />
* Set up repo for canonical copy<br />
* Copy all existing docs to canonical repo<br />
* Delete outdated info<br />
| Single canonical source containing existing info<br />
|-<br />
| Weeks 2-4<br />
May 8-28<br />
| Fill in gaps in formal docs<br />
|<br />
* (see [[#Formal_descriptions]])<br />
| Up-to-date formal documentation of main pipeline modules and common build scripts<br />
|-<br />
| '''Phase 2: Tutorials'''<br />
|<br />
|<br />
|<br />
|-<br />
| Weeks 5-7<br />
May 29-June 18<br />
| Dictionary tutorials<br />
|<br />
* Basic introduction to shell and common Apertium-related commands<br />
* Guidance for selecting arguments for apertium-init<br />
* Instructions for going from a linguistic paradigm to monodix/lexc/lexd<br />
* Introduction to twol<br />
| Information sufficient to get a beginner set up and contributing to lexicons<br />
|-<br />
| Weeks 8-10<br />
June 19-July 2<br />
| Transfer tutorials<br />
|<br />
* How to go from a word-order or agreement difference to a working transfer rule in either formalism<br />
| Systematic tutorial for writing transfer rules<br />
|-<br />
| Weeks 11-13<br />
July 3-23<br />
| Other tutorials<br />
|<br />
* Lexical selection<br />
* Training a tagger<br />
* Writing CG rules<br />
* Anaphora resolution<br />
* Separable<br />
| End-to-end tutorial for the translation pipeline<br />
|-<br />
| '''Phase 3: Explanation'''<br />
|<br />
|<br />
|<br />
|-<br />
| Weeks 14-15<br />
July 24-August 6<br />
| Theoretical background<br />
|<br />
* RBMT<br />
* FSTs<br />
* other things, if time<br />
| Introductions to why Apertium uses the technology that it does<br />
|-<br />
| '''Phase 4: How-to guides and code structure'''<br />
|<br />
|<br />
|<br />
|-<br />
| Weeks 16-18<br />
August 7-27<br />
| How-to and code<br />
|<br />
* A few how-to guides and make it easy to add more<br />
* For each core repo:<br />
** Document listing the general purpose of each source file<br />
** Doc-comment for each noteworthy function<br />
** Outline of the operation and control flow of each class corresponding to an executable<br />
| Guidelines for contributing to the code<br />
|}<br />
<br />
== Budget ==<br />
<br />
{| class="wikitable" border="1"<br />
|-<br />
! Budget item<br />
! Amount<br />
|-<br />
| Paying technical writer<br />
| $6000<br />
|-<br />
! TOTAL:<br />
! $6000<br />
|}<br />
<br />
We considered adding a $500 "just in case" line item, but we can't think of anything else it would need to cover. We've never paid org mentors, and we don't need to restore data from ancient archives or broken hardware - and even if we did, it'd likely be faster to just rewrite that part.<br />
<br />
== Additional Information ==<br />
<br />
Apertium has participated in Google Summer of Code 12 times: 2009, 2010, 2011, 2012, 2013, 2014, 2016, 2017, 2018, 2019, 2020, and 2021.<br />
<br />
The technical writer participated in GSoC as a student in 2019 and 2021, and as a mentor in 2020.</div>Tino Didriksenhttps://wiki.apertium.org/w/index.php?title=Google_Season_of_Docs_2022/Organize_and_Update_Apertium_User_Documentation&diff=73935Google Season of Docs 2022/Organize and Update Apertium User Documentation2022-03-24T11:16:02Z<p>Tino Didriksen: /* Measuring success */</p>
<hr />
<div><br />
== About Apertium ==<br />
<br />
Apertium is a free and open source machine translation and language technology platform. We have over 500 languages and pairs, maintained using 15+ different tools.<br />
<br />
== About the project ==<br />
<br />
=== The problem ===<br />
[https://wiki.apertium.org Apertium's wiki] and other documentation are out of date, poorly organized, not visible enough, and just plain not user-friendly.<br />
<br />
This ranges from documentation of individual tools not reflecting their current state, to our best how-to guides reflecting how things were done a decade ago. Documentation is scattered between the Apertium wiki, individual GitHub repos, an out-of-date pdf "Book", and even published papers and third party sites.<br />
<br />
The result is new users and contributors wasting time reading out-of-date materials, and even long-time contributors having no way to be aware of changes to the tools they use.<br />
<br />
=== The solution ===<br />
<br />
The solution to the above problem is to create updated documentation for all pipeline modules and/or a full tutorial.<br />
<br />
Ideally documentation on a given tool will exist in a single place, and a full tutorial will also have a single unified source. One possibility is to generate one set of docs from another, or from a single unified source. For example, if we want tools to be documented in both their GitHub repos and on the wiki, we should generate one set of documentation from the other (or a third source). If we want a full tutorial to be on the wiki but also available in PDF format, then we should designate one source as the original and generate the others from them. <br />
<br />
=== The scope ===<br />
<br />
* Overview of the Apertium platform<br />
* All stages of the Apertium pipeline<br />
* The main approaches to and tools for each stage<br />
<br />
=== Measuring success ===<br />
<br />
Unfortunately, the only metric we have is how many people contact us, either via the mailing list or on IRC, and that number has fallen drastically during the Covid-19 pandemic. But from both feedback and direct questioning, we know that contributors (potential and current) regularly end up following incorrect or outdated documentation.<br />
<br />
We would therefore measure success by whether the number of contributors who wind up following an outdated tutorial drops close to zero.<br />
<br />
== Existing Documentation ==<br />
<br />
=== Formal Descriptions ===<br />
<br />
{| class="wikitable" border="1"<br />
|-<br />
! Source<br />
! Mostly Complete<br />
! Partial<br />
|-<br />
| [https://wiki.apertium.org/w/images/d/d0/Apertium2-documentation.pdf 2.0 docs]<br />
|<br />
* stream format<br />
* transfer<br />
* monodix<br />
* bidix<br />
|<br />
* tagger<br />
* lrx<br />
* format handling<br />
|-<br />
| wiki<br />
|<br />
* recursive<br />
* anaphora<br />
|<br />
* separable<br />
* makefiles and modes<br />
|-<br />
| github<br />
|<br />
* lexd<br />
|<br />
|-<br />
| external sources<br />
|<br />
* HFST (probably don't redo)<br />
* CG3 (link to, don't redo)<br />
|<br />
|}<br />
<br />
missing:<br />
<br />
* build scripts (filter-rules, etc)<br />
* spellchecker<br />
* postgenerator?<br />
<br />
=== Tutorials ===<br />
<br />
Even things in the "substantive" column will likely need a fair amount of work for the purposes of this project.<br />
<br />
{| class="wikitable" border="1"<br />
|-<br />
! Source<br />
! Substantive<br />
! Fragmentary<br />
|-<br />
| Apertium wiki<br />
|<br />
* monodix<br />
* bidix<br />
* init<br />
|<br />
* transfer<br />
* recursive<br />
* anaphora<br />
|-<br />
| [[User:Firespeaker]]'s course wiki<br />
|<br />
* lexd<br />
* bidix<br />
* lrx<br />
* recursive<br />
|<br />
* CG3<br />
|}<br />
<br />
missing:<br />
<br />
* HFST<br />
* tagger<br />
* separable<br />
<br />
== Timeline ==<br />
<br />
This follows the 4-part division of https://documentation.divio.com<br />
<br />
{| class="wikitable" border="1"<br />
|-<br />
! Time Period<br />
! Goal<br />
! Details<br />
! Deliverable<br />
|-<br />
| '''Phase 1: Reference'''<br />
|<br />
|<br />
|<br />
|-<br />
| Week 1<br />
May 1-7<br />
| Gather and convert existing documentation<br />
|<br />
* Set up repo for canonical copy<br />
* Copy all existing docs to canonical repo<br />
* Delete outdated info<br />
| Single canonical source containing existing info<br />
|-<br />
| Weeks 2-4<br />
May 8-28<br />
| Fill in gaps in formal docs<br />
|<br />
* (see [[#Formal_descriptions]])<br />
| Up-to-date formal documentation of main pipeline modules and common build scripts<br />
|-<br />
| '''Phase 2: Tutorials'''<br />
|<br />
|<br />
|<br />
|-<br />
| Weeks 5-7<br />
May 29-June 18<br />
| Dictionary tutorials<br />
|<br />
* Basic introduction to shell and common Apertium-related commands<br />
* Guidance for selecting arguments for apertium-init<br />
* Instructions for going from a linguistic paradigm to monodix/lexc/lexd<br />
* Introduction to twol<br />
| Information sufficient to get a beginner set up and contributing to lexicons<br />
|-<br />
| Weeks 8-10<br />
June 19-July 2<br />
| Transfer tutorials<br />
|<br />
* How to go from a word-order or agreement difference to a working transfer rule in either formalism<br />
| Systematic tutorial for writing transfer rules<br />
|-<br />
| Weeks 11-13<br />
July 3-23<br />
| Other tutorials<br />
|<br />
* Lexical selection<br />
* Training a tagger<br />
* Writing CG rules<br />
* Anaphora resolution<br />
* Separable<br />
| End-to-end tutorial for the translation pipeline<br />
|-<br />
| '''Phase 3: Explanation'''<br />
|<br />
|<br />
|<br />
|-<br />
| Weeks 14-15<br />
July 24-August 6<br />
| Theoretical background<br />
|<br />
* RBMT<br />
* FSTs<br />
* other things, if time<br />
| Introductions to why Apertium uses the technology that it does<br />
|-<br />
| '''Phase 4: How-to guides and code structure'''<br />
|<br />
|<br />
|<br />
|-<br />
| Weeks 16-18<br />
August 7-27<br />
| How-to and code<br />
|<br />
* A few how-to guides and make it easy to add more<br />
* For each core repo:<br />
** Document listing the general purpose of each source file<br />
** Doc-comment for each noteworthy function<br />
** Outline of the operation and control flow of each class corresponding to an executable<br />
| Guidelines for contributing to the code<br />
|}<br />
<br />
== Budget ==</div>Tino Didriksenhttps://wiki.apertium.org/w/index.php?title=Google_Season_of_Docs_2022/Organize_and_Update_Apertium_User_Documentation&diff=73934Google Season of Docs 2022/Organize and Update Apertium User Documentation2022-03-24T11:02:56Z<p>Tino Didriksen: /* About Apertium */</p>
<hr />
<div><br />
== About Apertium ==<br />
<br />
Apertium is a free and open source machine translation and language technology platform. We have over 500 languages and pairs, maintained using 15+ different tools.<br />
<br />
== About the project ==<br />
<br />
=== The problem ===<br />
[https://wiki.apertium.org Apertium's wiki] and other documentation are out of date, poorly organized, not visible enough, and just plain not user-friendly.<br />
<br />
This ranges from documentation of individual tools not reflecting their current state, to our best how-to guides reflecting how things were done a decade ago. Documentation is scattered between the Apertium wiki, individual GitHub repos, an out-of-date pdf "Book", and even published papers and third party sites.<br />
<br />
The result is new users and contributors wasting time reading out-of-date materials, and even long-time contributors having no way to be aware of changes to the tools they use.<br />
<br />
=== The solution ===<br />
<br />
The solution to the above problem is to create updated documentation for all pipeline modules and/or a full tutorial.<br />
<br />
Ideally documentation on a given tool will exist in a single place, and a full tutorial will also have a single unified source. One possibility is to generate one set of docs from another, or from a single unified source. For example, if we want tools to be documented in both their GitHub repos and on the wiki, we should generate one set of documentation from the other (or a third source). If we want a full tutorial to be on the wiki but also available in PDF format, then we should designate one source as the original and generate the others from them. <br />
<br />
=== The scope ===<br />
<br />
* Overview of the Apertium platform<br />
* All stages of the Apertium pipeline<br />
* The main approaches to and tools for each stage<br />
<br />
=== Measuring success ===<br />
<br />
== Existing Documentation ==<br />
<br />
=== Formal Descriptions ===<br />
<br />
{| class="wikitable" border="1"<br />
|-<br />
! Source<br />
! Mostly Complete<br />
! Partial<br />
|-<br />
| [https://wiki.apertium.org/w/images/d/d0/Apertium2-documentation.pdf 2.0 docs]<br />
|<br />
* stream format<br />
* transfer<br />
* monodix<br />
* bidix<br />
|<br />
* tagger<br />
* lrx<br />
* format handling<br />
|-<br />
| wiki<br />
|<br />
* recursive<br />
* anaphora<br />
|<br />
* separable<br />
* makefiles and modes<br />
|-<br />
| github<br />
|<br />
* lexd<br />
|<br />
|-<br />
| external sources<br />
|<br />
* HFST (probably don't redo)<br />
* CG3 (link to, don't redo)<br />
|<br />
|}<br />
<br />
missing:<br />
<br />
* build scripts (filter-rules, etc)<br />
* spellchecker<br />
* postgenerator?<br />
<br />
=== Tutorials ===<br />
<br />
Even things in the "substantive" column will likely need a fair amount of work for the purposes of this project.<br />
<br />
{| class="wikitable" border="1"<br />
|-<br />
! Source<br />
! Substantive<br />
! Fragmentary<br />
|-<br />
| Apertium wiki<br />
|<br />
* monodix<br />
* bidix<br />
* init<br />
|<br />
* transfer<br />
* recursive<br />
* anaphora<br />
|-<br />
| [[User:Firespeaker]]'s course wiki<br />
|<br />
* lexd<br />
* bidix<br />
* lrx<br />
* recursive<br />
|<br />
* CG3<br />
|}<br />
<br />
missing:<br />
<br />
* HFST<br />
* tagger<br />
* separable<br />
<br />
== Timeline ==<br />
<br />
This follows the 4-part division of https://documentation.divio.com<br />
<br />
{| class="wikitable" border="1"<br />
|-<br />
! Time Period<br />
! Goal<br />
! Details<br />
! Deliverable<br />
|-<br />
| '''Phase 1: Reference'''<br />
|<br />
|<br />
|<br />
|-<br />
| Week 1<br />
May 1-7<br />
| Gather and convert existing documentation<br />
|<br />
* Set up repo for canonical copy<br />
* Copy all existing docs to canonical repo<br />
* Delete outdated info<br />
| Single canonical source containing existing info<br />
|-<br />
| Weeks 2-4<br />
May 8-28<br />
| Fill in gaps in formal docs<br />
|<br />
* (see [[#Formal_descriptions]])<br />
| Up-to-date formal documentation of main pipeline modules and common build scripts<br />
|-<br />
| '''Phase 2: Tutorials'''<br />
|<br />
|<br />
|<br />
|-<br />
| Weeks 5-7<br />
May 29-June 18<br />
| Dictionary tutorials<br />
|<br />
* Basic introduction to shell and common Apertium-related commands<br />
* Guidance for selecting arguments for apertium-init<br />
* Instructions for going from a linguistic paradigm to monodix/lexc/lexd<br />
* Introduction to twol<br />
| Information sufficient to get a beginner set up and contributing to lexicons<br />
|-<br />
| Weeks 8-10<br />
June 19-July 2<br />
| Transfer tutorials<br />
|<br />
* How to go from a word-order or agreement difference to a working transfer rule in either formalism<br />
| Systematic tutorial for writing transfer rules<br />
|-<br />
| Weeks 11-13<br />
July 3-23<br />
| Other tutorials<br />
|<br />
* Lexical selection<br />
* Training a tagger<br />
* Writing CG rules<br />
* Anaphora resolution<br />
* Separable<br />
| End-to-end tutorial for the translation pipeline<br />
|-<br />
| '''Phase 3: Explanation'''<br />
|<br />
|<br />
|<br />
|-<br />
| Weeks 14-15<br />
July 24-August 6<br />
| Theoretical background<br />
|<br />
* RBMT<br />
* FSTs<br />
* other things, if time<br />
| Introductions to why Apertium uses the technology that it does<br />
|-<br />
| '''Phase 4: How-to guides and code structure'''<br />
|<br />
|<br />
|<br />
|-<br />
| Weeks 16-18<br />
August 7-27<br />
| How-to and code<br />
|<br />
* A few how-to guides and make it easy to add more<br />
* For each core repo:<br />
** Document listing the general purpose of each source file<br />
** Doc-comment for each noteworthy function<br />
** Outline of the operation and control flow of each class corresponding to an executable<br />
| Guidelines for contributing to the code<br />
|}<br />
<br />
== Budget ==</div>Tino Didriksenhttps://wiki.apertium.org/w/index.php?title=Ideas_for_Google_Summer_of_Code&diff=73884Ideas for Google Summer of Code2022-02-21T13:24:17Z<p>Tino Didriksen: Missing ]</p>
<hr />
<div>{{TOCD}}<br />
This is the ideas page for [[Google Summer of Code]], where you can find ideas for interesting projects that would make Apertium more useful for people and improve or expand our functionality. If you have an idea, please add it below; if you think you could mentor someone in a particular area, add your name to "Interested mentors" using <nowiki>~~~</nowiki> <br />
<br />
The page is intended as an overview of the kind of projects we have in mind. If one of them particularly piques your interest, please come and discuss with us on <code>#apertium</code> on <code>irc.oftc.net</code>, mail the [[Contact|mailing list]], or draw attention to yourself in some other way. <br />
<br />
Note that if you have an idea that isn't mentioned here, we would be very interested to hear about it.<br />
<br />
Here are some more things you could look at:<br />
<br />
* [[Top tips for GSOC applications]] <br />
* Get in contact with one of our long-serving [[List of Apertium mentors|mentors]] &mdash; they are nice, honest!<br />
* Pages in the [[:Category:Development|development category]]<br />
* Resources that could be converted or expanded in the [[incubator]]. Consider doing or improving a language pair (see [[incubator]], [[nursery]] and [[staging]] for pairs that need work)<br />
* Unhammer's [[User:Unhammer/wishlist|wishlist]]<br />
<!--* The open issues [https://github.com/search?q=org%3Aapertium&state=open&type=Issues on Github] - especially the [https://github.com/search?q=org%3Aapertium+label%3A%22good+first+issue%22&state=open&type=Issues Good First Issues]. --><br />
<br />
__TOC__<br />
<br />
If you're a student trying to propose a topic, the recommended way is to request a wiki account and then go to <pre>http://wiki.apertium.org/wiki/User:[[your username]]/GSoC2021Proposal</pre> and click the "create" button near the top of the page. It's also nice to include <code><nowiki>[[Category:GSoC_2021_student_proposals]]</nowiki></code> to help organize submitted proposals.<br />
<br />
== Ideas ==<br />
<br />
{{IdeaSummary<br />
| name = Python API for Apertium<br />
| difficulty = medium<br />
| skills = C++, Python<br />
| description = Update the Python API for Apertium to expose all Apertium modes and test with all major OSes<br />
| rationale = The current Python API misses out on a lot of functionality, like phonemicisation, segmentation, and transliteration, and doesn't work for some OSes <s>like Debian</s>.<br />
| mentors = [[User:Francis Tyers|Francis Tyers]]<br />
| more = /Python API<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = OmniLingo and Apertium<br />
| difficulty = medium<br />
| skills = JS, Python<br />
| description = OmniLingo is a language learning system for practising listening comprehension using Apertium data. There is a lot of text processing involved (for example tokenisation) that could be aided by Apertium tools. <br />
| rationale = <br />
| mentors = [[User:Francis Tyers|Francis Tyers]]<br />
| more = /OmniLingo<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Web API extensions<br />
| difficulty = medium<br />
| skills = Python<br />
| description = Update the web API for Apertium to expose all Apertium modes <br />
| rationale = The current Web API misses out on a lot of functionality, like phonemicisation, segmentation, and transliteration<br />
| mentors = [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Jonathan Washington]], [[User:Xavivars|Xavi Ivars]]<br />
| more = /Apertium APY<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Develop a morphological analyser<br />
| difficulty = easy<br />
| skills = XML or HFST or lexd<br />
| description = Write a morphological analyser and generator for a language that does not yet have one<br />
| rationale = A key part of an Apertium machine translation system is a morphological analyser and generator. The objective of this task is to create an analyser for a language that does not yet have one.<br />
| mentors = [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Jonathan Washington]], [[User: Sevilay Bayatlı|Sevilay Bayatlı]], Hossep, nlhowell, [[User:Popcorndude]]<br />
| more = /Morphological analyser<br />
}}<br />
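To make the paradigm idea concrete, here is a toy sketch in Python of what a paradigm-based lexicon entry encodes (this is not the monodix/lexd formalisms themselves, and the tag names are made up for illustration):<br />

```python
def expand(lemma, stem, paradigm):
    """Expand one lexicon entry against a paradigm, producing a
    surface-form -> analysis mapping, roughly what compiling an
    <e> entry against a <pardef> in a monodix does."""
    return {stem + suffix: lemma + tags for suffix, tags in paradigm}

# A toy English noun paradigm: (suffix, tag string) pairs.
paradigm_n = [("", "<n><sg>"), ("s", "<n><pl>")]

print(expand("cat", "cat", paradigm_n))
# {'cat': 'cat<n><sg>', 'cats': 'cat<n><pl>'}
```

The real tools compile such descriptions into finite-state transducers, so analysis and generation are the same device run in opposite directions.<br />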
<br />
{{IdeaSummary<br />
| name = Support for Enhanced Dependencies in UD Annotatrix<br />
| difficulty = medium<br />
| skills = NodeJS<br />
| description = UD Annotatrix is an annotation interface for Universal Dependencies, but does not yet support all functionality<br />
| rationale = <br />
| mentors = [[User:Francis Tyers|Francis Tyers]]<br />
| more = /Morphological analyser<br />
}}<br />
<br />
<!--<br />
This one was done, but could do with more work. Not sure if it's a full gsoc though?<br />
<br />
{{IdeaSummary<br />
| name = User-friendly lexical selection training<br />
| difficulty = Medium<br />
| skills = Python, C++, shell scripting<br />
| description = Make it so that training/inference of lexical selection rules is a more user-friendly process<br />
| rationale = Our lexical selection module allows for inferring rules from corpora and word alignments, but the procedure is currently a bit messy, with various scripts involved that require lots of manual tweaking, and many third party tools to be installed. The goal of this task is to make the procedure as user-friendly as possible, so that ideally only a simple config file would be needed, and a driver script would take care of the rest.<br />
| mentors = [[User:Unhammer|Unhammer]], [[User:Mlforcada|Mikel Forcada]]<br />
| more = /User-friendly lexical selection training<br />
}}<br />
--><br />
<br />
{{IdeaSummary<br />
| name = Robust tokenisation in lttoolbox<br />
| difficulty = Medium<br />
| skills = C++, XML, Python<br />
| description = Improve the longest-match left-to-right tokenisation strategy in [[lttoolbox]] to be fully Unicode compliant.<br />
| rationale = One of the most frustrating things about working with Apertium on texts "in the wild" is the way that tokenisation works. If a letter is not specified in the alphabet, it is treated as whitespace, so unknown words get split apart and you can end up with output like ^G$ö^k$ı^rmak$, which is terrible for further processing. <br />
| mentors = [[User:Francis Tyers|Francis Tyers]], [[User:TommiPirinen|Flammie]]<br />
| more = /Robust tokenisation<br />
}}<br />
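The failure mode can be sketched in a few lines of Python (a toy model of the longest-match strategy, not lttoolbox internals):<br />

```python
# Maximal runs of "letters" become tokens; any character outside the
# alphabet is emitted on its own, mimicking the current behaviour.
def tokenise(text, is_letter):
    tokens, cur = [], ""
    for ch in text:
        if is_letter(ch):
            cur += ch
        else:
            if cur:
                tokens.append(cur)
                cur = ""
            tokens.append(ch)
    if cur:
        tokens.append(cur)
    return tokens

ascii_letters = set("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")

# With an ASCII-only alphabet the Turkish name is shattered, which is
# exactly what yields output like ^G$ö^k$ı^rmak$ downstream.
print(tokenise("Gökırmak", ascii_letters.__contains__))
# ['G', 'ö', 'k', 'ı', 'rmak']

# A Unicode-aware letter test keeps the word intact.
print(tokenise("Gökırmak", str.isalpha))
# ['Gökırmak']
```

The project would replace the hard-coded alphabet check with a Unicode-compliant notion of what counts as a letter.<br />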
<br />
{{IdeaSummary<br />
| name = apertium-separable language-pair integration<br />
| difficulty = Medium<br />
| skills = XML, a scripting language (Python, Perl), some knowledge of linguistics and/or at least one relevant natural language<br />
| description = Choose a language you can identify as having a good number of "multiwords" in the lexicon. Modify all language pairs in Apertium to use the [[Apertium-separable]] module to process the multiwords, and clean up the dictionaries accordingly.<br />
| rationale = Apertium-separable is a newish module to process lexical items with discontiguous dependencies, an area where Apertium has traditionally fallen short. Despite all the module has to offer, many translation pairs still don't use it.<br />
| mentors = [[User:Firespeaker|Jonathan Washington]], [[User:Popcorndude]]<br />
| more = /Apertium separable<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = UD and Apertium integration<br />
| difficulty = Entry level<br />
| skills = python, javascript, HTML, (C++)<br />
| description = Create a range of tools for making Apertium compatible with Universal Dependencies<br />
| rationale = Universal Dependencies is a fast-growing project aimed at creating a unified annotation scheme for treebanks. This includes both part-of-speech and morphological features. Their annotated corpora could be extremely useful for Apertium for training models for translation. In addition, Apertium's rule-based morphological descriptions could be useful for software that relies on Universal Dependencies.<br />
| mentors = [[User:Francis Tyers]], [[User:Firespeaker| Jonathan Washington]], [[User:Popcorndude]]<br />
| more = /UD and Apertium integration <br />
}}<br />
<br />
{{IdeaSummary<br />
| name = rule visualization tools<br />
| difficulty = Medium<br />
| skills = python? javascript? XML<br />
| description = make tools to help visualize the effect of various rules<br />
| rationale = TODO see https://github.com/Jakespringer/dapertium for an example<br />
| mentors = [[User:Firespeaker|Jonathan Washington]], [[User:Sevilay Bayatlı|Sevilay Bayatlı]], [[User:Popcorndude]]<br />
| more = /Visualization tools<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = dictionary induction from wikis<br />
| difficulty = Medium<br />
| skills = MySQL, mediawiki syntax, perl, maybe C++ or Java; Java, Scala, RDF, and DBpedia to use DBpedia extraction<br />
| description = Extract dictionaries from linguistic wikis<br />
| rationale = Wiki dictionaries and encyclopedias (e.g. omegawiki, wiktionary, wikipedia, dbpedia) contain information (e.g. bilingual equivalences, morphological features, conjugations) that could be exploited to speed up the development of dictionaries for Apertium. This task aims at automatically building dictionaries by extracting different pieces of information from wiki structures such as interlingual links, infoboxes and/or from dbpedia RDF datasets.<br />
| mentors = [[User:Firespeaker|Jonathan Washington]], [[User:Popcorndude]]<br />
| more = /Dictionary induction from wikis<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Dictionary induction from parallel corpora / Revive ReTraTos<br />
| difficulty = Hard<br />
| skills = C++, perl, python, xml, scripting, machine learning<br />
| description = Extract dictionaries from parallel corpora<br />
| rationale = Given a pair of monolingual modules and a parallel corpus, we should be able to run a program to align tagged sentences and give us the best entries that are missing from bidix. [[ReTraTos]] did this back in 2008, but it is very much a program from 2008. We want a program which builds and runs in 2022, and does all the steps for the user.<br />
| mentors = [[User:Unhammer]], [[User:Popcorndude]]<br />
| more = /Dictionary induction from parallel corpora<br />
}}<br />
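As a sketch of what a revived tool might do, the toy below scores source–target lemma pairs from sentence-aligned corpora by their Dice co-occurrence coefficient and reports high-scoring pairs absent from the bidix. This is a hypothetical stand-in for illustration, not ReTraTos's actual algorithm; a real tool would use proper word alignment (e.g. eflomal or GIZA++) over tagged sentences.<br />

```python
from collections import Counter
from itertools import product

def bidix_candidates(parallel, bidix, threshold=0.9):
    """Score (source lemma, target lemma) pairs by the Dice coefficient
    over sentence co-occurrence; return pairs not already in the bidix."""
    src_count, trg_count, pair_count = Counter(), Counter(), Counter()
    for src_sent, trg_sent in parallel:
        src_set, trg_set = set(src_sent), set(trg_sent)
        src_count.update(src_set)
        trg_count.update(trg_set)
        pair_count.update(product(src_set, trg_set))
    candidates = []
    for (s, t), c in pair_count.items():
        dice = 2 * c / (src_count[s] + trg_count[t])
        if dice >= threshold and (s, t) not in bidix:
            candidates.append((dice, s, t))
    return sorted(candidates, reverse=True)

# Toy Catalan-English corpus of lemmas; ("gos", "dog") is already in the
# bidix, so only the missing correspondences are reported.
corpus = [
    (["gos", "negre"], ["black", "dog"]),
    (["gos", "blanc"], ["white", "dog"]),
    (["gat", "negre"], ["black", "cat"]),
]
print(bidix_candidates(corpus, bidix={("gos", "dog")}))
```

On this tiny corpus the missing entries negre–black, gat–cat and blanc–white all score a perfect 1.0, while noise pairs like gos–black fall below the threshold.<br />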
<br />
{{IdeaSummary<br />
| name = Bring an unreleased translation pair to releasable quality<br />
| difficulty = Medium<br />
| skills = shell scripting<br />
| description = Take an unstable language pair and improve its quality, focusing on testvoc<br />
| rationale = Many Apertium language pairs have large dictionaries and have otherwise seen much development, but are not of releasable quality. The point of this project would be to bring one translation pair to releasable quality. This would entail obtaining good naïve coverage and a clean [[testvoc]].<br />
| mentors = [[User:Firespeaker|Jonathan Washington]], [[User:Sevilay Bayatlı|Sevilay Bayatlı]], [[User:Hectoralos|Hèctor Alòs i Font]]<br />
| more = /Make a language pair state-of-the-art<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Develop a prototype MT system for a strategic language pair<br />
| difficulty = Medium<br />
| skills = XML, some knowledge of linguistics and of one relevant natural language <br />
| description = Create a translation pair based on two existing language modules, focusing on the dictionary and structural transfer<br />
| rationale = Choose a strategic set of languages to develop an MT system for, such that you know the target language well and morphological transducers for each language are part of Apertium. Develop an Apertium MT system by focusing on writing a bilingual dictionary and structural transfer rules. Expanding the transducers and disambiguation, and writing lexical selection rules and multiword sequences may also be part of the work. The pair may be an existing prototype, but if it's a heavily developed but unreleased pair, consider applying for "Bring an unreleased translation pair to releasable quality" instead.<br />
| mentors = [[User:Firespeaker|Jonathan Washington]], [[User:Sevilay Bayatlı| Sevilay Bayatlı]], [[User:Hectoralos|Hèctor Alòs i Font]]<br />
| more = /Adopt a language pair<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Website Improvements: Misc<br />
| difficulty = Medium<br />
| skills = html, css, js, python<br />
| description = Improve elements of Apertium's web infrastructure<br />
| rationale = Apertium's website infrastructure [[Apertium-html-tools]] and its supporting API [[APy|Apertium APy]] have numerous open issues. This project would entail choosing a subset of open issues and features that could realistically be completed in the summer. You're encouraged to speak with the Apertium community to see which features and issues are the most pressing.<br />
| mentors = [[User:Firespeaker|Jonathan Washington]], [[User:Xavivars|Xavi Ivars]]<br />
| more = /Website improvements<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Website Improvements: Dictionary Lookup<br />
| difficulty = Medium<br />
| skills = html, css, js, python<br />
| description = Finish implementing dictionary lookup mode in Apertium's web infrastructure<br />
| rationale = Apertium's website infrastructure [[Apertium-html-tools]] and its supporting API [[APy|Apertium APy]] have numerous open issues, including half-completed features like dictionary lookup. This project would entail completing the dictionary lookup feature. Some additional features which would be good to work on include automatic reverse lookups (so that a user has a better understanding of the results), grammatical information (such as the gender of nouns or the conjugation paradigms of verbs), and information about MWEs. See [https://github.com/apertium/apertium-html-tools/issues/105 the open issue on GitHub].<br />
| mentors = [[User:Firespeaker|Jonathan Washington]], [[User:Xavivars|Xavi Ivars]], [[User:Popcorndude]]<br />
| more = /Website improvements<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Website Improvements: Spell checking<br />
| difficulty = Medium<br />
| skills = html, js, css, python<br />
| description = Add a spell-checking interface to Apertium's web tools<br />
| rationale = [[Apertium-html-tools]] has seen some prototypes for spell-checking interfaces (all in stale PRs and branches on GitHub), but none have ended up being quite ready to integrate into the tools. This project would entail polishing up or recreating an interface, and making sure [[APy]] has a mode that allows access to Apertium voikospell modules. The end result should be a slick, easy-to-use interface for proofing text, with intuitive underlining of text deemed to be misspelled and intuitive presentation and selection of alternatives. [https://github.com/apertium/apertium-html-tools/issues/390 the open issue on GitHub]<br />
| mentors = [[User:Firespeaker|Jonathan Washington]], [[User:Xavivars|Xavi Ivars]]<br />
| more = /Spell checker web interface<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Website Improvements: Suggestions<br />
| difficulty = Medium<br />
| skills = html, css, js, python<br />
| description = Finish implementing a suggestions interface for Apertium's web infrastructure<br />
| rationale = Some work has been done to add a "suggestions" interface to Apertium's website infrastructure [[Apertium-html-tools]] and its supporting API [[APy|Apertium APy]], whereby users can suggest corrected translations. This project would entail finishing that feature. There are some related [https://github.com/apertium/apertium-html-tools/issues/55 issues] and [https://github.com/apertium/apertium-html-tools/pull/252 PRs] on GitHub.<br />
| mentors = [[User:Firespeaker|Jonathan Washington]], [[User:Xavivars|Xavi Ivars]]<br />
| more = /Website improvements<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Website Improvements: Orthography conversion interface<br />
| difficulty = Medium<br />
| skills = html, js, css, python<br />
| description = Add an orthography conversion interface to Apertium's web tools<br />
| rationale = Several Apertium language modules (like Kazakh, Kyrgyz, Crimean Tatar, and Hñähñu) have orthography conversion modes in their mode definition files. This project would be to expose those modes through [[APy|Apertium APy]] and provide a simple interface in [[Apertium-html-tools]] to use them.<br />
| mentors = [[User:Firespeaker|Jonathan Washington]], [[User:Xavivars|Xavi Ivars]]<br />
| more = /Website improvements<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Extend Weighted transfer rules<br />
| difficulty = Medium<br />
| skills = C++, python<br />
| description = The weighted transfer module is currently applied only to the chunker transfer rules. The idea here is to extend it to the interchunk and postchunk transfer rules as well. <br />
| rationale = As a resource see https://github.com/aboelhamd/Weighted-transfer-rules-module<br />
| mentors = [[User: Sevilay Bayatlı|Sevilay Bayatlı]]<br />
| more = /Make a module <br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Automatic Error-Finder / Backpropagation<br />
| difficulty = Medium<br />
| skills = python?<br />
| description = Develop a tool to locate the approximate source of translation errors in the pipeline.<br />
| rationale = Being able to generate a list of probable error sources automatically makes it possible to prioritize issues by frequency, frees up developer time, and is a first step towards automated generation of better rules.<br />
| mentors = [[User:Popcorndude]]<br />
| more = /Backpropagation<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Add support for NMT to web API<br />
| difficulty = Medium<br />
| skills = python, NMT<br />
| description = Add support for a popular NMT engine to Apertium's web API<br />
| rationale = Currently Apertium's web API, [[APy|Apertium APy]], supports only Apertium language modules, but the front end could just as easily interface with an API that supports trained NMT models. The point of the project is to add support for one popular NMT package (e.g. translateLocally/Bergamot, OpenNMT or JoeyNMT) to APy.<br />
| mentors = [[User:Firespeaker|Jonathan Washington]]<br />
| more = <br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Localization (l10n/i18n) of Apertium tools<br />
| difficulty = Medium<br />
| skills = C++<br />
| description = All our command line tools are currently hardcoded as English-only and it would be good if this were otherwise. [https://github.com/apertium/organisation/issues/28#issuecomment-803474833 Coding Challenge]<br />
| rationale = ...<br />
| mentors = [[User:Tino_Didriksen|Tino Didriksen]]<br />
| more = https://github.com/apertium/organisation/issues/28 Github<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Leverage and integrate language preferences into language pairs<br />
| difficulty = easy<br />
| skills = XML, some knowledge of linguistics and of one relevant natural language <br />
| description = Update language pairs with lexical and orthographic variations to leverage the new [[Dialectal_or_standard_variation|preferences]] functionality<br />
| rationale = Currently, preferences are implemented via language variants, which rely on multiple dictionaries, exponentially increasing compilation time every time a new preference is introduced.<br />
| mentors = [[User:Xavivars|Xavi Ivars]]<br />
| more = /Use preferences in SPA-CAT<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Extract morphological data from FLEx<br />
| difficulty = hard<br />
| skills = python, XML parsing<br />
| description = Write a program to extract data from [https://software.sil.org/fieldworks/ SIL FieldWorks] and convert as much as possible to monodix (and maybe bidix).<br />
| rationale = There's a lot of potentially useful data in FieldWorks files that might be enough to build a whole monodix for some languages, but it's currently really hard to use.<br />
| mentors = [[User:Popcorndude|Popcorndude]], [[User:TommiPirinen|Flammie]]<br />
| more = /FieldWorks_data_extraction<br />
}}<br />
<br />
[[Category:Development]]<br />
[[Category:Google Summer of Code]]</div>Tino Didriksenhttps://wiki.apertium.org/w/index.php?title=Ideas_for_Google_Summer_of_Code&diff=73878Ideas for Google Summer of Code2022-02-21T12:15:48Z<p>Tino Didriksen: Remove finished Unit Testing Framework (https://github.com/apertium/apertium-regtest)</p>
<hr />
<div>{{TOCD}}<br />
This is the ideas page for [[Google Summer of Code]], here you can find ideas on interesting projects that would make Apertium more useful for people and improve or expand our functionality. If you have an idea please add it below, if you think you could mentor someone in a particular area, add your name to "Interested mentors" using <nowiki>~~~</nowiki> <br />
<br />
The page is intended as an overview of the kind of projects we have in mind. If one of them particularly piques your interest, please come and discuss with us on <code>#apertium</code> on <code>irc.oftc.net</code>, mail the [[Contact|mailing list]], or draw attention to yourself in some other way. <br />
<br />
Note that if you have an idea that isn't mentioned here, we would be very interested to hear about it.<br />
<br />
Here are some more things you could look at:<br />
<br />
* [[Top tips for GSOC applications]] <br />
* Get in contact with one of our long-serving [[List of Apertium mentors|mentors]] &mdash; they are nice, honest!<br />
* Pages in the [[:Category:Development|development category]]<br />
* Resources that could be converted or expanded in the [[incubator]]. Consider doing or improving a language pair (see [[incubator]], [[nursery]] and [[staging]] for pairs that need work)<br />
* Unhammer's [[User:Unhammer/wishlist|wishlist]]<br />
<!--* The open issues [https://github.com/search?q=org%3Aapertium&state=open&type=Issues on Github] - especially the [https://github.com/search?q=org%3Aapertium+label%3A%22good+first+issue%22&state=open&type=Issues Good First Issues]. --><br />
<br />
__TOC__<br />
<br />
If you're a student trying to propose a topic, the recommended way is to request a wiki account and then go to <pre>http://wiki.apertium.org/wiki/User:[[your username]]/GSoC2021Proposal</pre> and click the "create" button near the top of the page. It's also nice to include <code><nowiki>[[Category:GSoC_2021_student_proposals]]</nowiki></code> to help organize submitted proposals.<br />
<br />
== Ideas ==<br />
<br />
{{IdeaSummary<br />
| name = Python API for Apertium<br />
| difficulty = medium<br />
| skills = C++, Python<br />
| description = Update the Python API for Apertium to expose all Apertium modes and test with all major OSes<br />
| rationale = The current Python API misses out on a lot of functionality, like phonemicisation, segmentation, and transliteration, and doesn't work for some OSes <s>like Debian</s>.<br />
| mentors = [[User:Francis Tyers|Francis Tyers]]<br />
| more = /Python API<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = OmniLingo and Apertium<br />
| difficulty = medium<br />
| skills = JS, Python<br />
| description = OmniLingo is a language learning system for practising listening comprehension using Apertium data. There is a lot of text processing involved (for example tokenisation) that could be aided by Apertium tools. <br />
| rationale = <br />
| mentors = [[User:Francis Tyers|Francis Tyers]]<br />
| more = /OmniLingo<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Web API extensions<br />
| difficulty = medium<br />
| skills = Python<br />
| description = Update the web API for Apertium to expose all Apertium modes <br />
| rationale = The current Web API misses out on a lot of functionality, like phonemicisation, segmentation, and transliteration<br />
| mentors = [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Jonathan Washington]], [[User:Xavivars|Xavi Ivars]]<br />
| more = /Apertium APY<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Develop a morphological analyser<br />
| difficulty = easy<br />
| skills = XML or HFST or lexd<br />
| description = Write a morphological analyser and generator for a language that does not yet have one<br />
| rationale = A key part of an Apertium machine translation system is a morphological analyser and generator. The objective of this task is to create an analyser for a language that does not yet have one.<br />
| mentors = [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Jonathan Washington]], [[User: Sevilay Bayatlı|Sevilay Bayatlı]], Hossep, nlhowell<br />
| more = /Morphological analyser<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Support for Enhanced Dependencies in UD Annotatrix<br />
| difficulty = medium<br />
| skills = NodeJS<br />
| description = UD Annotatrix is an annotation interface for Universal Dependencies, but does not yet support all functionality<br />
| rationale = <br />
| mentors = [[User:Francis Tyers|Francis Tyers]]<br />
| more = /Morphological analyser<br />
}}<br />
<br />
<!--<br />
This one was done, but could do with more work. Not sure if it's a full gsoc though?<br />
<br />
{{IdeaSummary<br />
| name = User-friendly lexical selection training<br />
| difficulty = Medium<br />
| skills = Python, C++, shell scripting<br />
| description = Make it so that training/inference of lexical selection rules is a more user-friendly process<br />
| rationale = Our lexical selection module allows for inferring rules from corpora and word alignments, but the procedure is currently a bit messy, with various scripts involved that require lots of manual tweaking, and many third party tools to be installed. The goal of this task is to make the procedure as user-friendly as possible, so that ideally only a simple config file would be needed, and a driver script would take care of the rest.<br />
| mentors = [[User:Unhammer|Unhammer]], [[User:Mlforcada|Mikel Forcada]]<br />
| more = /User-friendly lexical selection training<br />
}}<br />
--><br />
<br />
{{IdeaSummary<br />
| name = Robust tokenisation in lttoolbox<br />
| difficulty = Medium<br />
| skills = C++, XML, Python<br />
| description = Improve the longest-match left-to-right tokenisation strategy in [[lttoolbox]] to be fully Unicode compliant.<br />
| rationale = One of the most frustrating things about working with Apertium on texts "in the wild" is the way that the tokenisation works. If a letter is not specified in the alphabet, it is treated as whitespace, so unknown words get split apart and you can end up with output like ^G$ö^k$ı^rmak$, which is terrible for further processing. <br />
| mentors = [[User:Francis Tyers|Francis Tyers]], [[User:TommiPirinen|Flammie]]<br />
| more = /Robust tokenisation<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = apertium-separable language-pair integration<br />
| difficulty = Medium<br />
| skills = XML, a scripting language (Python, Perl), some knowledge of linguistics and/or at least one relevant natural language<br />
| description = Choose a language you can identify as having a good number of "multiwords" in the lexicon. Modify all language pairs in Apertium to use the [[Apertium-separable]] module to process the multiwords, and clean up the dictionaries accordingly.<br />
| rationale = Apertium-separable is a newish module to process lexical items with discontiguous dependencies, an area where Apertium has traditionally fallen short. Despite all the module has to offer, many translation pairs still don't use it.<br />
| mentors = [[User:Firespeaker|Jonathan Washington]]<br />
| more = /Apertium separable<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = UD and Apertium integration<br />
| difficulty = Entry level<br />
| skills = python, javascript, HTML, (C++)<br />
| description = Create a range of tools for making Apertium compatible with Universal Dependencies<br />
| rationale = Universal Dependencies is a fast-growing project aimed at creating a unified annotation scheme for treebanks. This includes both part-of-speech and morphological features. Their annotated corpora could be extremely useful for Apertium for training models for translation. In addition, Apertium's rule-based morphological descriptions could be useful for software that relies on Universal Dependencies.<br />
| mentors = [[User:Francis Tyers]] [[User:Firespeaker| Jonathan Washington]]<br />
| more = /UD and Apertium integration <br />
}}<br />
<br />
{{IdeaSummary<br />
| name = rule visualization tools<br />
| difficulty = Medium<br />
| skills = python? javascript? XML<br />
| description = make tools to help visualize the effect of various rules<br />
| rationale = TODO see https://github.com/Jakespringer/dapertium for an example<br />
| mentors = [[User:Firespeaker|Jonathan Washington]], [[User:Sevilay Bayatlı|Sevilay Bayatlı]]<br />
| more = /Visualization tools<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = dictionary induction from wikis<br />
| difficulty = Medium<br />
| skills = MySQL, mediawiki syntax, perl, maybe C++ or Java; Java, Scala, RDF, and DBpedia to use DBpedia extraction<br />
| description = Extract dictionaries from linguistic wikis<br />
| rationale = Wiki dictionaries and encyclopedias (e.g. omegawiki, wiktionary, wikipedia, dbpedia) contain information (e.g. bilingual equivalences, morphological features, conjugations) that could be exploited to speed up the development of dictionaries for Apertium. This task aims at automatically building dictionaries by extracting different pieces of information from wiki structures such as interlingual links, infoboxes and/or from dbpedia RDF datasets.<br />
| mentors = [[User:Firespeaker|Jonathan Washington]]<br />
| more = /Dictionary induction from wikis<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Dictionary induction from parallel corpora / Revive ReTraTos<br />
| difficulty = Hard<br />
| skills = C++, perl, python, xml, scripting, machine learning<br />
| description = Extract dictionaries from parallel corpora<br />
| rationale = Given a pair of monolingual modules and a parallel corpus, we should be able to run a program to align tagged sentences and give us the best entries that are missing from bidix. [[ReTraTos]] did this back in 2008, but it is very much a program from 2008. We want a program which builds and runs in 2022, and does all the steps for the user.<br />
| mentors = [[User:Unhammer]]<br />
| more = /Dictionary induction from parallel corpora<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Bring an unreleased translation pair to releasable quality<br />
| difficulty = Medium<br />
| skills = shell scripting<br />
| description = Take an unstable language pair and improve its quality, focusing on testvoc<br />
| rationale = Many Apertium language pairs have large dictionaries and have otherwise seen much development, but are not of releasable quality. The point of this project would be to bring one translation pair to releasable quality. This would entail obtaining good naïve coverage and a clean [[testvoc]].<br />
| mentors = [[User:Firespeaker|Jonathan Washington]], [[User:Sevilay Bayatlı|Sevilay Bayatlı]], [[User:Hectoralos|Hèctor Alòs i Font]]<br />
| more = /Make a language pair state-of-the-art<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Develop a prototype MT system for a strategic language pair<br />
| difficulty = Medium<br />
| skills = XML, some knowledge of linguistics and of one relevant natural language <br />
| description = Create a translation pair based on two existing language modules, focusing on the dictionary and structural transfer<br />
| rationale = Choose a strategic set of languages to develop an MT system for, such that you know the target language well and morphological transducers for each language are part of Apertium. Develop an Apertium MT system by focusing on writing a bilingual dictionary and structural transfer rules. Expanding the transducers and disambiguation, and writing lexical selection rules and multiword sequences may also be part of the work. The pair may be an existing prototype, but if it's a heavily developed but unreleased pair, consider applying for "Bring an unreleased translation pair to releasable quality" instead.<br />
| mentors = [[User:Firespeaker|Jonathan Washington]], [[User:Sevilay Bayatlı| Sevilay Bayatlı]], [[User:Hectoralos|Hèctor Alòs i Font]]<br />
| more = /Adopt a language pair<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Website Improvements: Misc<br />
| difficulty = Medium<br />
| skills = html, css, js, python<br />
| description = Improve elements of Apertium's web infrastructure<br />
| rationale = Apertium's website infrastructure [[Apertium-html-tools]] and its supporting API [[APy|Apertium APy]] have numerous open issues. This project would entail choosing a subset of open issues and features that could realistically be completed in the summer. You're encouraged to speak with the Apertium community to see which features and issues are the most pressing.<br />
| mentors = [[User:Firespeaker|Jonathan Washington]], [[User:Xavivars|Xavi Ivars]]<br />
| more = /Website improvements<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Website Improvements: Dictionary Lookup<br />
| difficulty = Medium<br />
| skills = html, css, js, python<br />
| description = Finish implementing dictionary lookup mode in Apertium's web infrastructure<br />
| rationale = Apertium's website infrastructure [[Apertium-html-tools]] and its supporting API [[APy|Apertium APy]] have numerous open issues, including half-completed features like dictionary lookup. This project would entail completing the dictionary lookup feature. Some additional features which would be good to work on include automatic reverse lookups (so that a user has a better understanding of the results), grammatical information (such as the gender of nouns or the conjugation paradigms of verbs), and information about MWEs. See [https://github.com/apertium/apertium-html-tools/issues/105 the open issue on GitHub].<br />
| mentors = [[User:Firespeaker|Jonathan Washington]], [[User:Xavivars|Xavi Ivars]]<br />
| more = /Website improvements<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Website Improvements: Spell checking<br />
| difficulty = Medium<br />
| skills = html, js, css, python<br />
| description = Add a spell-checking interface to Apertium's web tools<br />
| rationale = [[Apertium-html-tools]] has seen some prototypes for spell-checking interfaces (all in stale PRs and branches on GitHub), but none have ended up being quite ready to integrate into the tools. This project would entail polishing up or recreating an interface, and making sure [[APy]] has a mode that allows access to Apertium voikospell modules. The end result should be a slick, easy-to-use interface for proofing text, with intuitive underlining of text deemed to be misspelled and intuitive presentation and selection of alternatives. [https://github.com/apertium/apertium-html-tools/issues/390 the open issue on GitHub]<br />
| mentors = [[User:Firespeaker|Jonathan Washington]], [[User:Xavivars|Xavi Ivars]]<br />
| more = /Spell checker web interface<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Website Improvements: Suggestions<br />
| difficulty = Medium<br />
| skills = html, css, js, python<br />
| description = Finish implementing a suggestions interface for Apertium's web infrastructure<br />
| rationale = Some work has been done to add a "suggestions" interface to Apertium's website infrastructure [[Apertium-html-tools]] and its supporting API [[APy|Apertium APy]], whereby users can suggest corrected translations. This project would entail finishing that feature. There are some related [https://github.com/apertium/apertium-html-tools/issues/55 issues] and [https://github.com/apertium/apertium-html-tools/pull/252 PRs] on GitHub.<br />
| mentors = [[User:Firespeaker|Jonathan Washington]], [[User:Xavivars|Xavi Ivars]]<br />
| more = /Website improvements<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Website Improvements: Orthography conversion interface<br />
| difficulty = Medium<br />
| skills = html, js, css, python<br />
| description = Add an orthography conversion interface to Apertium's web tools<br />
| rationale = Several Apertium language modules (like Kazakh, Kyrgyz, Crimean Tatar, and Hñähñu) have orthography conversion modes in their mode definition files. This project would be to expose those modes through [[APy|Apertium APy]] and provide a simple interface in [[Apertium-html-tools]] to use them.<br />
| mentors = [[User:Firespeaker|Jonathan Washington]], [[User:Xavivars|Xavi Ivars]]<br />
| more = /Website improvements<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Extend Weighted transfer rules<br />
| difficulty = Medium<br />
| skills = C++, python<br />
| description = The weighted transfer module is currently applied only to the chunker transfer rules. The idea here is to extend it to the interchunk and postchunk transfer rules as well. <br />
| rationale = As a resource see https://github.com/aboelhamd/Weighted-transfer-rules-module<br />
| mentors = [[User: Sevilay Bayatlı|Sevilay Bayatlı]]<br />
| more = /Make a module <br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Automatic Error-Finder / Backpropagation<br />
| difficulty = Medium<br />
| skills = python?<br />
| description = Develop a tool to locate the approximate source of translation errors in the pipeline.<br />
| rationale = Being able to generate a list of probable error sources automatically makes it possible to prioritize issues by frequency, frees up developer time, and is a first step towards automated generation of better rules.<br />
| mentors = ???<br />
| more = /Backpropagation<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Add support for NMT to web API<br />
| difficulty = Medium<br />
| skills = python, NMT<br />
| description = Add support for a popular NMT engine to Apertium's web API<br />
| rationale = Currently Apertium's web API, [[APy|Apertium APy]], supports only Apertium language modules, but the front end could just as easily interface with an API that supports trained NMT models. The point of the project is to add support for one popular NMT package (e.g. translateLocally/Bergamot, OpenNMT or JoeyNMT) to APy.<br />
| mentors = [[User:Firespeaker|Jonathan Washington]]<br />
| more = <br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Localization (l10n/i18n) of Apertium tools<br />
| difficulty = Medium<br />
| skills = C++<br />
| description = All our command line tools are currently hardcoded as English-only and it would be good if this were otherwise. [https://github.com/apertium/organisation/issues/28#issuecomment-803474833 Coding Challenge]<br />
| rationale = ...<br />
| mentors = [[User:Tino_Didriksen|Tino Didriksen]]<br />
| more = https://github.com/apertium/organisation/issues/28 Github<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Leverage and integrate language preferences into language pairs<br />
| difficulty = easy<br />
| skills = XML, some knowledge of linguistics and of one relevant natural language <br />
| description = Update language pairs with lexical and orthographic variations to leverage the new [[Dialectal_or_standard_variation|preferences]] functionality<br />
| rationale = Currently, preferences are implemented via language variants, which rely on multiple dictionaries, exponentially increasing compilation time every time a new preference is introduced.<br />
| mentors = [[User:Xavivars|Xavi Ivars]]<br />
| more = /Use preferences in SPA-CAT<br />
}}<br />
<br />
[[Category:Development]]<br />
[[Category:Google Summer of Code]]</div>Tino Didriksenhttps://wiki.apertium.org/w/index.php?title=Ideas_for_Google_Summer_of_Code&diff=73877Ideas for Google Summer of Code2022-02-21T12:13:38Z<p>Tino Didriksen: Remove finished Apertium Browser Plugin (https://github.com/apertium/apertium-webext)</p>
<hr />
<div>{{TOCD}}<br />
This is the ideas page for [[Google Summer of Code]], where you can find ideas for interesting projects that would make Apertium more useful for people and improve or expand our functionality. If you have an idea, please add it below; if you think you could mentor someone in a particular area, add your name to "Interested mentors" using <nowiki>~~~</nowiki> <br />
<br />
The page is intended as an overview of the kind of projects we have in mind. If one of them particularly piques your interest, please come and discuss with us on <code>#apertium</code> on <code>irc.oftc.net</code>, mail the [[Contact|mailing list]], or draw attention to yourself in some other way. <br />
<br />
Note that if you have an idea that isn't mentioned here, we would be very interested to hear about it.<br />
<br />
Here are some more things you could look at:<br />
<br />
* [[Top tips for GSOC applications]] <br />
* Get in contact with one of our long-serving [[List of Apertium mentors|mentors]] &mdash; they are nice, honest!<br />
* Pages in the [[:Category:Development|development category]]<br />
* Resources that could be converted or expanded in the [[incubator]]. Consider doing or improving a language pair (see [[incubator]], [[nursery]] and [[staging]] for pairs that need work)<br />
* Unhammer's [[User:Unhammer/wishlist|wishlist]]<br />
<!--* The open issues [https://github.com/search?q=org%3Aapertium&state=open&type=Issues on Github] - especially the [https://github.com/search?q=org%3Aapertium+label%3A%22good+first+issue%22&state=open&type=Issues Good First Issues]. --><br />
<br />
__TOC__<br />
<br />
If you're a student trying to propose a topic, the recommended way is to request a wiki account and then go to <pre>http://wiki.apertium.org/wiki/User:[[your username]]/GSoC2021Proposal</pre> and click the "create" button near the top of the page. It's also nice to include <code><nowiki>[[Category:GSoC_2021_student_proposals]]</nowiki></code> to help organize submitted proposals.<br />
<br />
== Ideas ==<br />
<br />
{{IdeaSummary<br />
| name = Python API for Apertium<br />
| difficulty = medium<br />
| skills = C++, Python<br />
| description = Update the Python API for Apertium to expose all Apertium modes and test with all major OSes<br />
| rationale = The current Python API misses out on a lot of functionality, like phonemicisation, segmentation, and transliteration, and doesn't work for some OSes <s>like Debian</s>.<br />
| mentors = [[User:Francis Tyers|Francis Tyers]]<br />
| more = /Python API<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = OmniLingo and Apertium<br />
| difficulty = medium<br />
| skills = JS, Python<br />
| description = OmniLingo is a language learning system for practising listening comprehension using Apertium data. There is a lot of text processing involved (for example tokenisation) that could be aided by Apertium tools. <br />
| rationale = <br />
| mentors = [[User:Francis Tyers|Francis Tyers]]<br />
| more = /OmniLingo<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Web API extensions<br />
| difficulty = medium<br />
| skills = Python<br />
| description = Update the web API for Apertium to expose all Apertium modes <br />
| rationale = The current Web API misses out on a lot of functionality, like phonemicisation, segmentation, and transliteration<br />
| mentors = [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Jonathan Washington]], [[User:Xavivars|Xavi Ivars]]<br />
| more = /Apertium APY<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Develop a morphological analyser<br />
| difficulty = easy<br />
| skills = XML or HFST or lexd<br />
| description = Write a morphological analyser and generator for a language that does not yet have one<br />
| rationale = A key part of an Apertium machine translation system is a morphological analyser and generator. The objective of this task is to create an analyser for a language that does not yet have one.<br />
| mentors = [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Jonathan Washington]], [[User: Sevilay Bayatlı|Sevilay Bayatlı]], Hossep, nlhowell<br />
| more = /Morphological analyser<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Support for Enhanced Dependencies in UD Annotatrix<br />
| difficulty = medium<br />
| skills = NodeJS<br />
| description = UD Annotatrix is an annotation interface for Universal Dependencies, but does not yet support all functionality<br />
| rationale = <br />
| mentors = [[User:Francis Tyers|Francis Tyers]]<br />
| more = /Morphological analyser<br />
}}<br />
<br />
<!--<br />
This one was done, but could do with more work. Not sure if it's a full gsoc though?<br />
<br />
{{IdeaSummary<br />
| name = User-friendly lexical selection training<br />
| difficulty = Medium<br />
| skills = Python, C++, shell scripting<br />
| description = Make it so that training/inference of lexical selection rules is a more user-friendly process<br />
| rationale = Our lexical selection module allows for inferring rules from corpora and word alignments, but the procedure is currently a bit messy, with various scripts involved that require lots of manual tweaking, and many third party tools to be installed. The goal of this task is to make the procedure as user-friendly as possible, so that ideally only a simple config file would be needed, and a driver script would take care of the rest.<br />
| mentors = [[User:Unhammer|Unhammer]], [[User:Mlforcada|Mikel Forcada]]<br />
| more = /User-friendly lexical selection training<br />
}}<br />
--><br />
<br />
{{IdeaSummary<br />
| name = Robust tokenisation in lttoolbox<br />
| difficulty = Medium<br />
| skills = C++, XML, Python<br />
| description = Improve the longest-match left-to-right tokenisation strategy in [[lttoolbox]] to be fully Unicode compliant.<br />
| rationale = One of the most frustrating things about working with Apertium on texts "in the wild" is the way tokenisation works. If a letter is not specified in the alphabet, it is treated as whitespace, so unknown words get split apart and you can end up with output like ^G$ö^k$ı^rmak$, which is terrible for further processing.<br />
| mentors = [[User:Francis Tyers|Francis Tyers]], [[User:TommiPirinen|Flammie]]<br />
| more = /Robust tokenisation<br />
}}<br />
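The failure mode above can be illustrated with a toy longest-match tokeniser. This is a simplified sketch, not the actual [[lttoolbox]] code; the alphabet sets and the example word are invented for illustration.<br />

```python
# Toy sketch of longest-match left-to-right tokenisation -- NOT the
# real lttoolbox code. Characters outside the declared alphabet are
# treated as separators, which is exactly the problem described above.
def tokenise(text, words, alphabet):
    tokens = []
    i = 0
    while i < len(text):
        if text[i] not in alphabet:
            tokens.append(text[i])  # passed through, cutting the word
            i += 1
            continue
        match = None
        for j in range(len(text), i, -1):  # try longest match first
            if text[i:j] in words:
                match = text[i:j]
                break
        if match is not None:
            tokens.append(f"^{match}$")
            i += len(match)
        else:
            j = i
            while j < len(text) and text[j] in alphabet:
                j += 1  # unknown span runs to the next separator
            tokens.append(f"^{text[i:j]}/*{text[i:j]}$")
            i = j
    return tokens

ascii_alpha = set("abcdefghijklmnopqrstuvwxyz"
                  "ABCDEFGHIJKLMNOPQRSTUVWXYZ")
# ö and ı are outside the ASCII-only alphabet, so the name is shattered:
print(tokenise("Gökırmak", set(), ascii_alpha))
# With a Unicode-aware alphabet, the same input stays one unknown token:
print(tokenise("Gökırmak", set(), ascii_alpha | set("öı")))
```

With a fully Unicode-compliant alphabet the whole word survives as a single (unknown) lexical unit, which is the behaviour this project would make the default.<br />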
<br />
{{IdeaSummary<br />
| name = apertium-separable language-pair integration<br />
| difficulty = Medium<br />
| skills = XML, a scripting language (Python, Perl), some knowledge of linguistics and/or at least one relevant natural language<br />
| description = Choose a language you can identify as having a good number of "multiwords" in the lexicon. Modify all language pairs in Apertium to use the [[Apertium-separable]] module to process the multiwords, and clean up the dictionaries accordingly.<br />
| rationale = Apertium-separable is a newish module to process lexical items with discontiguous dependencies, an area where Apertium has traditionally fallen short. Despite all the module has to offer, many translation pairs still don't use it.<br />
| mentors = [[User:Firespeaker|Jonathan Washington]]<br />
| more = /Apertium separable<br />
}}<br />
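To make "discontiguous dependencies" concrete, here is a toy sketch of rejoining a phrasal verb whose particle appears later in the stream. This is in no way the actual [[Apertium-separable]] implementation; the input stream, the lemmas and the merged "take# out" output format are invented for illustration only.<br />

```python
import re

# Toy sketch (not apertium-separable) of merging a verb and a detached
# particle in an Apertium-style stream into one lexical unit.
LU = re.compile(r"\^([^/]+)/([^$]+)\$")

def join_separable(stream, verb_lemma, particle_lemma):
    lus = LU.findall(stream)  # [(surface, analysis), ...]
    out = []
    pending = None
    for surface, analysis in lus:
        lemma = re.match(r"[^<]+", analysis).group(0)
        if lemma == verb_lemma:
            pending = (surface, analysis)
            out.append(None)  # placeholder, filled when particle found
        elif pending and lemma == particle_lemma:
            v_surf, v_ana = pending
            tags = v_ana[len(verb_lemma):]  # e.g. "<vblex><past>"
            idx = out.index(None)
            out[idx] = f"^{v_surf} ... {surface}/{verb_lemma}# {particle_lemma}{tags}$"
            pending = None
        else:
            out.append(f"^{surface}/{analysis}$")
    # Simplification: if no particle ever arrives, the verb is dropped.
    return " ".join(t for t in out if t is not None)

print(join_separable(
    "^took/take<vblex><past>$ ^the/the<det><def>$ "
    "^trash/trash<n><sg>$ ^out/out<pr>$",
    "take", "out"))
```

The point of the sketch is only that "took ... out" must be recognised as one lexical item even with arbitrary material in between, which is what the real module handles and what dictionaries need to be cleaned up to exploit.<br />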
<br />
{{IdeaSummary<br />
| name = UD and Apertium integration<br />
| difficulty = Entry level<br />
| skills = python, javascript, HTML, (C++)<br />
| description = Create a range of tools for making Apertium compatible with Universal Dependencies<br />
| rationale = Universal dependencies is a fast growing project aimed at creating a unified annotation scheme for treebanks. This includes both part-of-speech and morphological features. Their annotated corpora could be extremely useful for Apertium for training models for translation. In addition, Apertium's rule-based morphological descriptions could be useful for software that relies on Universal dependencies.<br />
| mentors = [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Jonathan Washington]]<br />
| more = /UD and Apertium integration <br />
}}<br />
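As a flavour of what such tooling involves, here is a minimal sketch of mapping an Apertium analysis string to a UD-style part of speech and feature set. The mapping table is a tiny invented example, not an existing converter or a complete tagset correspondence.<br />

```python
# Toy sketch: convert an Apertium lexical-unit analysis like
# "cat<n><pl>" into a UD-style (lemma, UPOS, features) triple.
# TAG_MAP below covers only a handful of tags, for illustration.
TAG_MAP = {
    "n":     ("NOUN", None),
    "vblex": ("VERB", None),
    "sg":    (None, ("Number", "Sing")),
    "pl":    (None, ("Number", "Plur")),
    "past":  (None, ("Tense", "Past")),
}

def apertium_to_ud(analysis):
    """'cat<n><pl>' -> ('cat', 'NOUN', {'Number': 'Plur'})"""
    lemma, _, rest = analysis.partition("<")
    tags = rest.rstrip(">").split("><") if rest else []
    upos, feats = "X", {}
    for t in tags:
        pos, feat = TAG_MAP.get(t, (None, None))
        if pos:
            upos = pos
        if feat:
            feats[feat[0]] = feat[1]
    return lemma, upos, feats

print(apertium_to_ud("cat<n><pl>"))
print(apertium_to_ud("take<vblex><past>"))
```

A real converter would need the full tag inventories on both sides, plus handling of multiwords and ambiguous analyses, which is where the project work lies.<br />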
<br />
{{IdeaSummary<br />
| name = rule visualization tools<br />
| difficulty = Medium<br />
| skills = python? javascript? XML<br />
| description = make tools to help visualize the effect of various rules<br />
| rationale = TODO; see https://github.com/Jakespringer/dapertium for an example<br />
| mentors = [[User:Firespeaker|Jonathan Washington]], [[User:Sevilay Bayatlı|Sevilay Bayatlı]]<br />
| more = /Visualization tools<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = dictionary induction from wikis<br />
| difficulty = Medium<br />
| skills = MySQL, mediawiki syntax, perl, maybe C++ or Java; Java, Scala, RDF, and DBpedia to use DBpedia extraction<br />
| description = Extract dictionaries from linguistic wikis<br />
| rationale = Wiki dictionaries and encyclopedias (e.g. omegawiki, wiktionary, wikipedia, dbpedia) contain information (e.g. bilingual equivalences, morphological features, conjugations) that could be exploited to speed up the development of dictionaries for Apertium. This task aims at automatically building dictionaries by extracting different pieces of information from wiki structures such as interlingual links, infoboxes and/or from dbpedia RDF datasets.<br />
| mentors = [[User:Firespeaker|Jonathan Washington]]<br />
| more = /Dictionary induction from wikis<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Dictionary induction from parallel corpora / Revive ReTraTos<br />
| difficulty = Hard<br />
| skills = C++, perl, python, xml, scripting, machine learning<br />
| description = Extract dictionaries from parallel corpora<br />
| rationale = Given a pair of monolingual modules and a parallel corpus, we should be able to run a program that aligns tagged sentences and gives us the best entries that are missing from the bidix. [[ReTraTos]] did this back in 2008, but it's from 2008. We want a program which builds and runs in 2022, and does all the steps for the user.<br />
| mentors = [[User:Unhammer]]<br />
| more = /Dictionary induction from parallel corpora<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = unit testing framework<br />
| difficulty = Medium<br />
| skills = perl<br />
| description = adapt https://github.com/TinoDidriksen/regtest for general Apertium use. [https://github.com/TinoDidriksen/regtest/wiki Screenshots of regtest action]<br />
| rationale = We are gradually improving our quality control, with (semi-)automated tests, but these are done on the Wiki on an ad-hoc basis. Having a unified testing framework would allow us to be able to more easily track quality improvements over all language pairs, and more easily deal with regressions.<br />
| mentors = [[User:Xavivars|Xavi Ivars]]<br />
| more = /Unit testing<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Bring an unreleased translation pair to releasable quality<br />
| difficulty = Medium<br />
| skills = shell scripting<br />
| description = Take an unstable language pair and improve its quality, focusing on testvoc<br />
| rationale = Many Apertium language pairs have large dictionaries and have otherwise seen much development, but are not of releasable quality. The point of this project would be bring one translation pair to releasable quality. This would entail obtaining good naïve coverage and a clean [[testvoc]].<br />
| mentors = [[User:Firespeaker|Jonathan Washington]], [[User:Sevilay Bayatlı|Sevilay Bayatlı]], [[User:Hectoralos|Hèctor Alòs i Font]]<br />
| more = /Make a language pair state-of-the-art<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Develop a prototype MT system for a strategic language pair<br />
| difficulty = Medium<br />
| skills = XML, some knowledge of linguistics and of one relevant natural language <br />
| description = Create a translation pair based on two existing language modules, focusing on the dictionary and structural transfer<br />
| rationale = Choose a strategic set of languages to develop an MT system for, such that you know the target language well and morphological transducers for each language are part of Apertium. Develop an Apertium MT system by focusing on writing a bilingual dictionary and structural transfer rules. Expanding the transducers and disambiguation, and writing lexical selection rules and multiword sequences may also be part of the work. The pair may be an existing prototype, but if it's a heavily developed but unreleased pair, consider applying for "Bring an unreleased translation pair to releasable quality" instead.<br />
| mentors = [[User:Firespeaker|Jonathan Washington]], [[User:Sevilay Bayatlı| Sevilay Bayatlı]], [[User:Hectoralos|Hèctor Alòs i Font]]<br />
| more = /Adopt a language pair<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Website Improvements: Misc<br />
| difficulty = Medium<br />
| skills = html, css, js, python<br />
| description = Improve elements of Apertium's web infrastructure<br />
| rationale = Apertium's website infrastructure [[Apertium-html-tools]] and its supporting API [[APy|Apertium APy]] have numerous open issues. This project would entail choosing a subset of open issues and features that could realistically be completed in the summer. You're encouraged to speak with the Apertium community to see which features and issues are the most pressing.<br />
| mentors = [[User:Firespeaker|Jonathan Washington]], [[User:Xavivars|Xavi Ivars]]<br />
| more = /Website improvements<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Website Improvements: Dictionary Lookup<br />
| difficulty = Medium<br />
| skills = html, css, js, python<br />
| description = Finish implementing dictionary lookup mode in Apertium's web infrastructure<br />
| rationale = Apertium's website infrastructure [[Apertium-html-tools]] and its supporting API [[APy|Apertium APy]] have numerous open issues, including half-completed features like dictionary lookup. This project would entail completing the dictionary lookup feature. Some additional features which would be good to work on include automatic reverse lookups (so that a user has a better understanding of the results), grammatical information (such as the gender of nouns or the conjugation paradigms of verbs), and information about MWEs. See [https://github.com/apertium/apertium-html-tools/issues/105 the open issue on GitHub].<br />
| mentors = [[User:Firespeaker|Jonathan Washington]], [[User:Xavivars|Xavi Ivars]]<br />
| more = /Website improvements<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Website Improvements: Spell checking<br />
| difficulty = Medium<br />
| skills = html, js, css, python<br />
| description = Add a spell-checking interface to Apertium's web tools<br />
| rationale = [[Apertium-html-tools]] has seen some prototypes for spell-checking interfaces (all in stale PRs and branches on GitHub), but none have ended up being quite ready to integrate into the tools. This project would entail polishing up or recreating an interface, and making sure [[APy]] has a mode that allows access to Apertium voikospell modules. The end result should be a slick, easy-to-use interface for proofing text, with intuitive underlining of text deemed to be misspelled and intuitive presentation and selection of alternatives. [https://github.com/apertium/apertium-html-tools/issues/390 the open issue on GitHub]<br />
| mentors = [[User:Firespeaker|Jonathan Washington]], [[User:Xavivars|Xavi Ivars]]<br />
| more = /Spell checker web interface<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Website Improvements: Suggestions<br />
| difficulty = Medium<br />
| skills = html, css, js, python<br />
| description = Finish implementing a suggestions interface for Apertium's web infrastructure<br />
| rationale = Some work has been done to add a "suggestions" interface to Apertium's website infrastructure [[Apertium-html-tools]] and its supporting API [[APy|Apertium APy]], whereby users can suggest corrected translations. This project would entail finishing that feature. There are some related [https://github.com/apertium/apertium-html-tools/issues/55 issues] and [https://github.com/apertium/apertium-html-tools/pull/252 PRs] on GitHub.<br />
| mentors = [[User:Firespeaker|Jonathan Washington]], [[User:Xavivars|Xavi Ivars]]<br />
| more = /Website improvements<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Website Improvements: Orthography conversion interface<br />
| difficulty = Medium<br />
| skills = html, js, css, python<br />
| description = Add an orthography conversion interface to Apertium's web tools<br />
| rationale = Several Apertium language modules (like Kazakh, Kyrgyz, Crimean Tatar, and Hñähñu) have orthography conversion modes in their mode definition files. This project would be to expose those modes through [[APy|Apertium APy]] and provide a simple interface in [[Apertium-html-tools]] to use them.<br />
| mentors = [[User:Firespeaker|Jonathan Washington]], [[User:Xavivars|Xavi Ivars]]<br />
| more = /Website improvements<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Extend Weighted transfer rules<br />
| difficulty = Medium<br />
| skills = C++, python<br />
| description = The weighted transfer module is currently applied only to the chunker transfer rules. The idea here is to extend it so that it also applies to the interchunk and postchunk transfer rules.<br />
| rationale = As a resource see https://github.com/aboelhamd/Weighted-transfer-rules-module<br />
| mentors = [[User: Sevilay Bayatlı|Sevilay Bayatlı]]<br />
| more = /Make a module <br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Automatic Error-Finder / Backpropagation<br />
| difficulty = Medium<br />
| skills = python?<br />
| description = Develop a tool to locate the approximate source of translation errors in the pipeline.<br />
| rationale = Being able to generate a list of probable error sources automatically makes it possible to prioritize issues by frequency, frees up developer time, and is a first step towards automated generation of better rules.<br />
| mentors = ???<br />
| more = /Backpropagation<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Add support for NMT to web API<br />
| difficulty = Medium<br />
| skills = python, NMT<br />
| description = Add support for a popular NMT engine to Apertium's web API<br />
| rationale = Currently Apertium's web API, [[APy|Apertium APy]], supports only Apertium language modules, but the front end could just as easily interface with an API that supports trained NMT models. The point of the project is to add support for one popular NMT package (e.g., translateLocally/Bergamot, OpenNMT or JoeyNMT) to the APy.<br />
| mentors = [[User:Firespeaker|Jonathan Washington]]<br />
| more = <br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Localization (l10n/i18n) of Apertium tools<br />
| difficulty = Medium<br />
| skills = C++<br />
| description = All our command line tools are currently hardcoded as English-only and it would be good if this were otherwise. [https://github.com/apertium/organisation/issues/28#issuecomment-803474833 Coding Challenge]<br />
| rationale = ...<br />
| mentors = [[User:Tino_Didriksen|Tino Didriksen]]<br />
| more = https://github.com/apertium/organisation/issues/28 Github<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Leverage and integrate language preferences into language pairs<br />
| difficulty = easy<br />
| skills = XML, some knowledge of linguistics and of one relevant natural language <br />
| description = Update language pairs with lexical and orthographic variations to leverage the new [[Dialectal_or_standard_variation|preferences]] functionality<br />
| rationale = Currently, preferences are implemented via language variants, which rely on multiple dictionaries, exponentially increasing compilation time every time a new preference is introduced.<br />
| mentors = [[User:Xavivars|Xavi Ivars]]<br />
| more = /Use preferences in SPA-CAT<br />
}}<br />
<br />
[[Category:Development]]<br />
[[Category:Google Summer of Code]]</div>Tino Didriksenhttps://wiki.apertium.org/w/index.php?title=Prerequisites_for_RPM&diff=73846Prerequisites for RPM2022-01-12T10:03:38Z<p>Tino Didriksen: Document that PowerTools is needed</p>
<hr />
<div>This page shows how to install the standard dependencies of apertium (and related packages) on RHEL / CentOS / Fedora / OpenSUSE and operating systems based on those. For RHEL/CentOS we require some dependencies from [https://docs.fedoraproject.org/en-US/epel/ EPEL], and for CentOS 8 we also require the PowerTools repo.<br />
<br />
<br />
If you don't plan on working on the core C++ packages (but only want to work on / use language pairs), you can install all prerequisites with yum/zypper, using [[User:Tino Didriksen]]'s repository. The first line here adds this repository to yum/zypper, then we can just install the usual way:<br />
<pre><br />
# Pick one<br />
# Release, stable:<br />
curl -sS https://apertium.projectjj.com/rpm/install-release.sh | sudo bash<br />
# Or nightly, unstable:<br />
curl -sS https://apertium.projectjj.com/rpm/install-nightly.sh | sudo bash<br />
<br />
# RHEL/CentOS:<br />
sudo yum install apertium-all-devel<br />
# Fedora:<br />
sudo dnf install apertium-all-devel<br />
# OpenSUSE:<br />
sudo zypper install apertium-all-devel<br />
</pre><br />
<br />
For a list of available language pairs and other packages, see https://build.opensuse.org/project/show/home:TinoDidriksen:release and https://build.opensuse.org/project/show/home:TinoDidriksen:nightly<br />
<br />
If you want to ''work on'' a language pair, you'll have to [[Install language data by compiling|check out the language data from GitHub]] and compile it.<br />
<br />
Otherwise, e.g. if you want to work on the core C++ packages, continue to [[Install Apertium core by compiling]].<br />
<br />
[[Category:Installation]]<br />
[[Category:Documentation in English]]</div>Tino Didriksenhttps://wiki.apertium.org/w/index.php?title=Prerequisites_for_RPM&diff=73845Prerequisites for RPM2022-01-12T09:54:34Z<p>Tino Didriksen: Document that EPEL is needed</p>
<hr />
<div>This page shows how to install the standard dependencies of apertium (and related packages) on RHEL / CentOS / Fedora / OpenSUSE and operating systems based on those. For RHEL/CentOS we require some dependencies from [https://docs.fedoraproject.org/en-US/epel/ EPEL].<br />
<br />
<br />
If you don't plan on working on the core C++ packages (but only want to work on / use language pairs), you can install all prerequisites with yum/zypper, using [[User:Tino Didriksen]]'s repository. The first line here adds this repository to yum/zypper, then we can just install the usual way:<br />
<pre><br />
# Pick one<br />
# Release, stable:<br />
curl -sS https://apertium.projectjj.com/rpm/install-release.sh | sudo bash<br />
# Or nightly, unstable:<br />
curl -sS https://apertium.projectjj.com/rpm/install-nightly.sh | sudo bash<br />
<br />
# RHEL/CentOS:<br />
sudo yum install apertium-all-devel<br />
# Fedora:<br />
sudo dnf install apertium-all-devel<br />
# OpenSUSE:<br />
sudo zypper install apertium-all-devel<br />
</pre><br />
<br />
For a list of available language pairs and other packages, see https://build.opensuse.org/project/show/home:TinoDidriksen:release and https://build.opensuse.org/project/show/home:TinoDidriksen:nightly<br />
<br />
If you want to ''work on'' a language pair, you'll have to [[Install language data by compiling|check out the language data from GitHub]] and compile it.<br />
<br />
Otherwise, e.g. if you want to work on the core C++ packages, continue to [[Install Apertium core by compiling]].<br />
<br />
[[Category:Installation]]<br />
[[Category:Documentation in English]]</div>Tino Didriksenhttps://wiki.apertium.org/w/index.php?title=IRC/Matrix&diff=73469IRC/Matrix2021-05-28T19:50:33Z<p>Tino Didriksen: Freenode -> OFTC</p>
<hr />
<div>If you want persistent IRC history/logs and notifications without having to have a computer online all the time, but you don't have/know how to set up a server, you can use the Matrix network to stay connected. <br />
<br />
[[Image:Riot-matrix-Step1join.png|thumb|400px|right|Before logging in]]<br />
<br />
<br />
'''To get started''', open<br />
<br />
https://riot.im/app/#/room/#oftc_#apertium:matrix.org<br />
<br />
<br />
It'll say "''Click here'' to join the discussion". If you don't have a Matrix account already, just do that, and enter a username, click Continue and fill out the captcha and you're in!<br />
<br />
(If you already have a Matrix account, instead click "Login" and enter your details.)<br />
<br />
[[Image:Riot-matrix-step2profit.png|thumb|400px|right|Logged in]]<br />
<br />
Once you're in, you should click the cogwheel (or open https://riot.im/app/#/settings) as soon as possible to '''set a password and e-mail''' for your Matrix account.<br />
<br />
<br />
The web client can send Desktop notifications if you use Firefox at least (see your [https://riot.im/app/#/settings settings] if it's not enabled), but there is also a regular [https://riot.im/desktop.html desktop version of Riot] for Mac, Windows and GNU/Linux.<br />
<br />
<br />
== Details ==<br />
<br />
'''Element''' (formerly Riot) is a client for the Matrix network. Matrix is sort of a supercharged IRC network/protocol, which "bridges" into regular IRC networks like OFTC but also provides a host of other features.<br />
<br />
Read more about the relation between Matrix and IRC at https://opensource.com/article/17/5/introducing-riot-IRC – including how to change or register your IRC nick.<br />
<br />
See https://matrix.org/ for the "backend" bits.<br />
<br />
<br />
Note that your IRC chats will be going through the matrix.org server. For public, logged channels like #apertium, this isn't any concern, but for one-on-one conversations there will be one more server that technically could log things (although one-on-one conversations on IRC can potentially be logged by OFTC too). Matrix is free and open source, so you can set up your own Matrix server, but that seems to take away the point of this being a low-maintenance way to get persistent IRC connections (and in that case, https://weechat.org/ is much simpler to set up).<br />
<br />
OTOH, if you're chatting with other Matrix users, it actually becomes more secure, since Matrix provides end-to-end encryption between Matrix users.<br />
<br />
<br />
== Remove [m] from your IRC nick ==<br />
* Open a private chat with <code>@oftc-irc:matrix.org</code> and tell it <code>!nick irc.oftc.net NewNickGoesHere</code><br />
* See also: https://github.com/matrix-org/matrix-appservice-irc/blob/master/HOWTO.md#changing-nicks<br />
* See also: https://opensource.com/article/17/5/introducing-riot-IRC for information about how to change your IRC nick (and more details in general on using chatting through Matrix).<br />
<br />
== Join new channels ==<br />
Use this template (replace "ChannelName" with the name of the channel you want to join):<br />
<br />
https://riot.im/app/#/room/#oftc_#ChannelName:matrix.org<br />
<br />
[[Category:Users]] <br />
[[Category:Contact]]</div>Tino Didriksenhttps://wiki.apertium.org/w/index.php?title=User:Firespeaker/HFST_bug&diff=73460User:Firespeaker/HFST bug2021-05-27T06:33:12Z<p>Tino Didriksen: Freenode -> OFTC</p>
<hr />
<div>{{TOCD}}<br />
In 2011, a bug in how HFST handles words containing spaces was [http://sourceforge.net/p/hfst/bugs/59/ documented and resolved] (apparently in [http://hfst.svn.sourceforge.net/viewvc/hfst?view=revision&revision=1518 r1518]?), but it introduced a new bug. This page documents the new [incorrect!] behaviour. It appears to only affect transducers written in lexc.<br />
<br />
[https://sourceforge.net/p/hfst/bugs/153/ A bug report] was filed in January of 2013 along with a patch for a test case. In March of 2013, [[User:Francis Tyers|spectie]] posted a patch that fixed the bug but introduced an issue with newlines and full stops. As of today, the bug has still not been fixed.<br />
<br />
== text.lexc ==<br />
Make sure to include the space in '<code>% </code>' under <code>Multichar_Symbols</code>.<br />
<pre><br />
Multichar_Symbols<br />
<br />
% <br />
<br />
LEXICON Root<br />
<br />
erke:erke # ;<br />
erke% me:erke% me # ;<br />
medvedev:medvedev # ;<br />
</pre><br />
<br />
== Compiling ==<br />
# <code>$ hfst-lexc test.lexc -o test.hfst</code><br />
# <code>$ hfst-invert test.hfst | hfst-fst2fst -w -o test.hfst.ol</code><br />
<br />
== Testing ==<br />
=== Some correctly analysed forms ===<br />
* <code>$ echo "erke" | hfst-proc test.hfst.ol</code><br />
: <code>^erke/erke$</code><br />
* <code>$ echo "erke me" | hfst-proc test.hfst.ol </code><br />
: <code>^erke me/erke me$</code><br />
* <code>$ echo "medvedev" | hfst-proc test.hfst.ol</code><br />
: <code>^medvedev/medvedev$</code><br />
=== The incorrectly analysed form ===<br />
* <code>$ echo "erke medvedev" | hfst-proc test.hfst.ol</code><br />
: <code>^erke medvedev/<span style="color: red">*</span>erke medvedev$</code><br />
<br />
=== Expected output ===<br />
This form is analysed correctly by a transducer identical to the one above except with the "erke me" form removed:<br />
* <code>$ echo "erke medvedev" | hfst-proc test2.hfst.ol</code><br />
: <code>^erke/erke$ ^medvedev/medvedev$</code><br />
<br />
== Another test case ==<br />
This one is meant to be more familiar to English-speakers :)<br />
<br />
<pre><br />
Multichar_Symbols<br />
<br />
% <br />
<br />
LEXICON Root<br />
<br />
word:word #;<br />
word% form:word% form #;<br />
formation:formation #;<br />
</pre><br />
<br />
* <code>$ echo "word formation" | hfst-proc test3.hfst.ol</code><br />
: <code>^word formation/<span style="color:red;">*</span>word formation$</code><br />
* <code>$ echo "formation word" | hfst-proc test3.hfst.ol</code><br />
: <code>^formation/formation$ ^word/word$</code><br />
<br />
<br />
== Notes ==<br />
<ul><br />
<li>This doesn't seem to affect transducers written in other formats. E.g., the transducer that results from <code>apertium-eng-kaz.eng.dix</code> outputs the following:</li><br />
<ul><br />
<li><code>$ echo "right there" | apertium -d . eng-kaz-morph</code></li><br />
: <code>^right there/right there<adv>$^./.<sent>$</code><br />
<li><code>$ echo "right the" | apertium -d . eng-kaz-morph</code></li><br />
: <code>^right/right<adj>/right<adv>/right<n><sg>$ ^the/the<det><def><sp>$^./.<sent>$</code><br />
<li><code>$ echo "right therein" | apertium -d . eng-kaz-morph</code></li><br />
: <code>^right/right<adj>/right<adv>/right<n><sg>$ ^therein/*therein$^./.<sent>$</code><br />
</ul><br />
</ul><br />
<br />
== Other materials ==<br />
* [http://wiki.apertium.org/wiki/Talk:Ideas_for_Google_Summer_of_Code/Closer_integration_with_HFST spectie explains the bug to firespeaker]<br />
* irc.oftc.net#hfst</div>Tino Didriksenhttps://wiki.apertium.org/w/index.php?title=Guide_Apertium_pour_les_utilisateurs_de_Windows&diff=73459Guide Apertium pour les utilisateurs de Windows2021-05-27T06:27:59Z<p>Tino Didriksen: Text replacement - "(chat|irc)\.freenode\.net" to "irc.oftc.net"</p>
<hr />
<div>[[Apertium guide for Windows users|In English]]<br />
<br />
Ce tutoriel va vous guider pour utiliser l'installateur d'Apertium afin d'installer :<br />
* Apertium<br />
* les paires de langues d'Apertium<br />
* TortoiseSVN (client SVN graphique pour Windows)<br />
* Notepad++ (bon éditeur de texte open source)<br />
<br />
Avec l'installateur d'Apertium pour Windows, il n'est pas nécessaire d'installer des outils comme Cygwin, Subversion, GCC, etc... (qu'un autre guide [[Installation sur Windows en utilisant cygwin]] vous demande d'installer).<br />
<br />
== Installation d'Apertium ==<br />
<br />
1. Téléchargez l'installateur depuis [http://sourceforge.net/projects/apertium/files/apertium-win32-installer/ApertiumSetup.exe/download Sourceforge]. Démarrez-le et lisez la licence d'utilisation de l'installateur d'Apertium (beaucoup de parties sont sous GNU GPL), puis cliquez "I Agree" pour continuer.<br />
<br />
[[File:Apertium_guide_for_Windows_users_Apertium_1.png]]<br />
<br />
2. Sélectionnez les composants à installer (Tous sont recommandés pour les utilisateurs non expérimentés), puis cliquez "Next" pour aller à la page suivante.<br />
<br />
[[File:Apertium_guide_for_Windows_users_Apertium_2.png]]<br />
<br />
3. Choisissez l'emplacement désiré pour l'installation. Il est recommandé de n'utiliser aucun caractère spécial (ex. : espace, caractère accentué, caractère d'un alphabet non européen) pour définir cet emplacement.<br />
<br />
[[File:Apertium_guide_for_Windows_users_Apertium_3.png]]<br />
<br />
4. L'installateur de Cygwin apparaîtra. Choisissez le miroir le plus proche et continuez.<br />
<br />
[[File:Apertium_guide_for_Windows_users_Apertium_4.png]]<br />
<br />
5. Il vous sera demandé de sélectionner les paquets de Cygwin (vous pouvez les laisser tous non marqués). Cliquez "Next" pour continuer.<br />
<br />
[[File:Apertium_guide_for_Windows_users_Apertium_5.png]]<br />
<br />
6. L'installateur listera les paquets qu'il est nécessaire d'installer pour résoudre les problèmes de dépendances. Cliquez "Next" pour continuer.<br />
<br />
[[File:Apertium_guide_for_Windows_users_Apertium_6.png]]<br />
<br />
7. Il y aura une boîte d'alerte à la première installation, ignorez-la en cliquant "OK"<br />
<br />
[[File:Apertium_guide_for_Windows_users_Apertium_7.png]]<br />
<br />
8. L'installateur de Cygwin vous demandera de créer une icône sur le Bureau et un raccourci dans le menu Démarrer. L'installateur d'Apertium va aussi créer le raccourci du menu de démarrage, il n'est donc pas nécessaire de les sélectionner. <br />
<br />
[[File:Apertium_guide_for_Windows_users_Apertium_8.png]]<br />
<br />
9. Une console apparaîtra pour recompiler Apertium et lttoolbox. L'opération prendra de 30 à 60 minutes (il est bon de faire d'autres choses en attendant).<br />
<br />
=== Test rapide pour vérifier le bon fonctionnement ===<br />
<br />
Le moteur d'Apertium devrait à présent être installé (mais aucune paire de langues à ce stade). <br />
<br />
Après l'installation, la fenêtre noire du terminal "Cygwin" devrait être visible (vous pouvez aussi trouver un lien pour l'ouvrir dans votre menu Démarrer, sous "Apertium").<br />
<br />
Si vous tapez <code>apertium</code> puis enfoncez la touche entrée, vous devriez voir<br />
<br />
<code>USAGE: apertium [-d datadir] [-f format] [-u] <direction> [in [out]]</code> et d'autre texte du même genre. Si ce n'est pas le cas, quelque chose s'est probablement mal passé...<br />
<br />
== Installation des paires de langues d'Apertium ==<br />
<br />
1. Téléchargez l'installateur depuis [http://sourceforge.net/projects/apertium/files/apertium-win32-installer/ApertiumLanguageSetup.exe/download Sourceforge]. Démarrez-le et lisez la licence d'utilisation de l'installateur d'Apertium (beaucoup de parties sont sous GNU GPL), puis cliquez "I Agree" pour continuer.<br />
<br />
[[File:Apertium_Language_Pairs_Installer_1.png]]<br />
<br />
2. Sélectionnez les paires de langues que vous aimeriez installer et cliquez "Next" pour continuer. '''L'installation complète prend quelques heures et n'est pas recommandée pour un utilisateur normal.'''<br />
<br />
[[File:Apertium_Language_Pairs_Installer_2.png]]<br />
<br />
3. Choisissez l'emplacement pour l'installation et cliquez "Next". (Ça devrait être l'endroit où Apertium a été installé).<br />
<br />
[[File:Apertium_Language_Pairs_Installer_3.png]]<br />
<br />
4. Cliquez Finish pour finir le processus d'installation.<br />
<br />
[[File:Apertium_Language_Pairs_Installer_4.png]]<br />
<br />
=== Test rapide pour voir si ça fonctionne ===<br />
<br />
Après l'installation, la fenêtre noire du terminal "Cygwin" devrait être visible (vous pouvez aussi trouver un lien pour l'ouvrir dans votre menu Démarrer, sous "Apertium").<br />
<br />
Si vous tapez <code>apertium -l</code> suivi de la touche entrée, vous devriez voir une liste de codes des paires de langues installées. Si on trouve dans la liste <code>es-ca</code>, vous pouvez essayer de taper<br />
<br />
<code>echo hola mundo | apertium es-ca</code><br />
<br />
Vous devriez voir <code>hola món</code> ; fonctionnement similaire avec d'autres paires de langues.<br />
<br />
=== Traduire un fichier ===<br />
<br />
Exemple : traduire un fichier HTML de l'asturien vers l'espagnol :<br />
<pre><br />
apertium -f html ast-es $(cygpath c:\\mystuff\\asturian_input.html) $(cygpath c:\\mystuff\\spanish_output.html)<br />
</pre><br />
<br />
(Le cygpath est malheureusement nécessaire pour récupérer un chemin d'accès Windows depuis Cygwin. Un utilisateur de Windows expérimenté devrait probablement pouvoir écrire un script pour en faire une option du menu contextuel (clic droit de la souris).)<br />
<br />
Veuillez noter que le backslash <code>\</code> doit être « échappé » en le tapant deux fois !<br />
<br />
<u>Note du traducteur :</u> On peut également utiliser la syntaxe UNIX pour les chemins d'accès en remplaçant chaque <code>\</code> du chemin d'accès Windows par un <code>/</code>. Je ne sais pas si ça résout le problème de cygpath.<br />
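À titre d'illustration, la conversion effectuée par cygpath suit en gros le schéma <code>c:\...</code> → <code>/cygdrive/c/...</code> ; le petit script Python ci-dessous (exemple hypothétique, qui ne fait pas partie d'Apertium) reproduit ce schéma :

```python
from pathlib import PureWindowsPath

def vers_chemin_cygwin(chemin_windows):
    """Approximation du comportement de cygpath (exemple hypothétique)."""
    p = PureWindowsPath(chemin_windows)
    lecteur = p.drive.rstrip(":").lower()  # "c:" -> "c"
    return "/cygdrive/" + lecteur + "/" + "/".join(p.parts[1:])

print(vers_chemin_cygwin(r"c:\mystuff\asturian_input.html"))
# /cygdrive/c/mystuff/asturian_input.html
```

Notez que le vrai cygpath gère aussi d'autres cas (chemins relatifs, partages réseau) que ce croquis ignore.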
<br />
=== Développer davantage une paire de langues ===<br />
<br />
L'installateur met les données des paires de langues dans le répertoire Cygwin <code>/home/apertium-install/languages</code>, qui est en réalité implanté dans le répertoire Windows <code>C:\Apertium\home\apertium-install\languages</code>. À l'intérieur, on peut trouver des répertoires pour toutes les paires de langues installées, et vous pouvez modifier les données linguistiques et recompiler etc... comme décrit dans [[Créer une nouvelle paire de langues]]<br />
<br />
: ou était-ce <code>/home/apertium-installation/...</code> ?<br />
<br />
== Installation de TortoiseSVN (optionnel) ==<br />
<br />
''Voir aussi : [[Utiliser SVN avec TortoiseSVN]]''<br />
<br />
1. Téléchargez TortoiseSVN depuis [http://tortoisesvn.net/downloads la page officielle de téléchargement]. Lancez-le et cliquez "Next" pour sauter l'étape d'introduction.<br />
<br />
[[File:Apertium_guide_for_Windows_users_TortoiseSVN_1.png]]<br />
<br />
2. Lisez la licence d'utilisation et sélectionnez "I accept the terms in the License Agreement", cliquez "Next" pour continuer.<br />
<br />
[[File:Apertium_guide_for_Windows_users_TortoiseSVN_2.png]]<br />
<br />
3. Sélectionnez les composants à installer, vous pouvez laisser tel quel et cliquer "Next" pour continuer.<br />
<br />
[[File:Apertium_guide_for_Windows_users_TortoiseSVN_3.png]]<br />
<br />
4. L'installateur montrera la page de fin, cliquez "Next" et il vous demandera de redémarrer. '''Vous devez redémarrer votre machine pour commencer à utiliser TortoiseSVN.'''<br />
<br />
[[File:Apertium_guide_for_Windows_users_TortoiseSVN_4.png]]<br />
<br />
5. TortoiseSVN apparaîtra ensuite comme un sous-menu du menu contextuel des fichiers et répertoires. Pour tester SVN, cliquez sur "SVN checkout..."<br />
<br />
[[File:Apertium_guide_for_Windows_users_TortoiseSVN_5.png]]<br />
<br />
6. Vous pouvez maintenant taper l'URL d'un dépôt dans la boîte de dialogue Checkout et cliquer OK pour démarrer le test de SVN.<br />
<br />
[[File:Apertium_guide_for_Windows_users_TortoiseSVN_6.png]]<br />
<br />
== Installation de Notepad++ (facultatif) ==<br />
<br />
1. Téléchargez Notepad++ depuis [http://notepad-plus-plus.org/download le site officiel]. Lancez-le et cliquez "Next" pour continuer.<br />
<br />
[[File:Apertium_guide_for_Windows_users_NotepadPlus_1.png]]<br />
<br />
2. Lisez la licence d'utilisation et cliquez "I agree" pour continuer.<br />
<br />
[[File:Apertium_guide_for_Windows_users_NotepadPlus_2.png]]<br />
<br />
3. Choisissez l'emplacement pour l'installation (valeur par défaut recommandée) et cliquez "Next" pour continuer.<br />
<br />
[[File:Apertium_guide_for_Windows_users_NotepadPlus_3.png]]<br />
<br />
4. Sélectionnez les composants à installer (contentez-vous des choix par défaut) et cliquez "Next" pour continuer.<br />
<br />
[[File:Apertium_guide_for_Windows_users_NotepadPlus_4.png]]<br />
<br />
5. L'installateur commencera à faire son travail. Ensuite, vous pourrez choisir de démarrer Notepad++ pour la première fois (en cliquant sur la case à cocher), puis cliquez sur "Finish".<br />
<br />
[[File:Apertium_guide_for_Windows_users_NotepadPlus_5.png]]<br />
<br />
6. Vous pouvez éditer n'importe quel fichier avec Notepad++ par un clic droit puis un clic sur l'élément du menu nommé "Edit with Notepad++".<br />
<br />
[[File:Apertium_guide_for_Windows_users_NotepadPlus_6.png]]<br />
<br />
== Comment valider les changements à SVN en utilisant TortoiseSVN ==<br />
<br />
Vous avez besoin d'un accès en mise à jour avant de suivre cette partie du tutoriel. Vous pouvez demander cet accès de l'une des manières suivantes :<br />
* IRC: #apertium sur irc.oftc.net<br />
* Mailing List: apertium-stuff AT lists.sourceforge POINT net <br />
<br />
1. Cliquez droit sur le répertoire dont vous voulez valider les changements et cliquez "SVN Commit..." (Les paires de langues seront normalement implantées dans C:\Apertium\home\apertium-install\languages)<br />
<br />
[[File:TortoiseSVN_Commit_1.png]]<br />
<br />
2. Tapez un message décrivant vos changements et sélectionnez les fichiers que vous voulez mettre à jour. Ensuite, cliquez "OK" pour démarrer le processus.<br />
<br />
[[File:TortoiseSVN_Commit_2.png]]<br />
<br />
3. Il y aura une boîte de dialogue demandant votre nom d'utilisateur et votre mot de passe. Donnez ces informations et cliquez sur "OK".<br />
<br />
[[File:TortoiseSVN_Commit_3.png]]<br />
<br />
4. Vous verrez un état d'avancement de la mise à jour. Attendez et cliquez "OK" une fois que le processus sera fini.<br />
<br />
[[File:TortoiseSVN_Commit_4.png]]<br />
<br />
<br />
[[Category:Installation]]<br />
[[Category:Documentation en français]]</div>Tino Didriksenhttps://wiki.apertium.org/w/index.php?title=D-Bus_service_for_Apertium&diff=73458D-Bus service for Apertium2021-05-27T06:27:55Z<p>Tino Didriksen: Text replacement - "(chat|irc)\.freenode\.net" to "irc.oftc.net"</p>
<hr />
<div>{{TOCD}}<br />
[http://dbus.freedesktop.org/ D-Bus] is a simple inter-process communication system. We are in the process of developing D-Bus services for Apertium which will make programmatic access to the Apertium tools easier.<br />
We have started developing simple D-Bus bindings for Apertium which allow for:<br />
* discovery of details of the current Apertium installation and,<br />
* translations via a programmatic interface.<br />
<br />
The D-Bus bindings are needed for some of the tools, such as [[Apertium-view]] and [[Apertium-tolk]].<br />
<br />
==Prerequisites==<br />
<br />
* apertium (>= 3.0.0)<br />
* python <br />
* dbus<br />
* python-dbus<br />
<br />
==Installing==<br />
<br />
'''Note:''' After Apertium's migration to GitHub, this tool is '''read-only''' on the SourceForge repository and does not exist on GitHub. If you are interested in migrating this tool to GitHub, see [[Migrating tools to GitHub]].<br />
<br />
The package is available from [[SVN]] in the <code>apertium-dbus</code> module. The process for installation is the standard:<br />
<br />
<pre><br />
$ svn co https://svn.code.sf.net/p/apertium/svn/trunk/apertium-dbus<br />
</pre><br />
<br />
<pre><br />
$ ./autogen.sh<br />
$ make <br />
$ make install<br />
</pre><br />
<br />
The current package is in the process of being ported to Python3, but should work. Do "svn up -r25849" if you want to get the last working Python2 version instead.<br />
<br />
===Check that it works===<br />
You can check that the bindings work by issuing the command:<br />
:<code>dbus-send --print-reply --dest=org.apertium.info / org.apertium.Info.modes</code><br />
This should return an array of strings of all the Apertium modes installed on your system. It should look something like<br />
<pre><br />
$ dbus-send --print-reply --dest=org.apertium.info / org.apertium.Info.modes <br />
method return sender=:1.567 -> dest=:1.599 reply_serial=2<br />
array [<br />
string "en-ca"<br />
string "ca-en"<br />
string "en-af"<br />
string "af-en"<br />
]<br />
</pre><br />
<br />
To translate from the command line (assuming apertium-en-ca is installed), try e.g.<br />
<pre><br />
$ dbus-send --print-reply --dest=org.apertium.mode / org.apertium.Translate.translate string:en-ca dict:string:string:"mark_unknown","true" string:'My hovercraft is full of eels'<br />
</pre><br />
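For scripted use, the invocation above can be assembled programmatically. The helper below is a hypothetical sketch (not part of apertium-dbus); it only builds the <code>dbus-send</code> argument list shown above, ready to pass to <code>subprocess.run</code> (the shell quotes around <code>"mark_unknown","true"</code> are not needed when no shell is involved):

```python
def dbus_send_translate(pair, text, mark_unknown=True):
    """Build the argv for the dbus-send translation call above (sketch)."""
    flag = "true" if mark_unknown else "false"
    return [
        "dbus-send", "--print-reply",
        "--dest=org.apertium.mode", "/",           # service name and object path
        "org.apertium.Translate.translate",        # interface.member
        "string:" + pair,                          # e.g. "en-ca"
        "dict:string:string:mark_unknown," + flag, # options dictionary
        "string:" + text,
    ]

# e.g.: subprocess.run(dbus_send_translate("en-ca", "My hovercraft is full of eels"))
```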
<br />
If the above two commands don't work, then it's quite possible that a Python error from our side snuck in. Try running info.py directly; that is<br />
:<code>python info.py -p /usr/local</code><br />
where the prefix for Apertium in the above example is <code>/usr/local</code>. If you get a Python error, please post the error on this page or post a bug report in [http://bugs.apertium.org/cgi-bin/bugzilla/index.cgi our bug tracker]. If the service starts up without errors, try executing<br />
:<code>dbus-send --print-reply --dest=org.apertium.info / org.apertium.Info.modes</code><br />
again. If there is no output, then open a new terminal and run<br />
:<code>dbus-monitor</code>.<br />
This neat utility shows you the activity on the D-Bus. Now try executing <code>dbus-send --print-reply --dest=org.apertium.info / org.apertium.Info.modes</code> again.<br />
<br />
If you have no luck, come and talk to us in <code>#apertium</code> at <code>irc.oftc.net</code><br />
<br />
==Installing into a prefix==<br />
:''This is unfinished''<br />
<br />
*Do: mkdir -p <prefix>/share/dbus-1/services/<br />
<br />
*Make a session-local.conf file like it says [http://www.linuxfromscratch.org/blfs/view/svn/general/dbus.html here] under "Configuration Information".<br />
**Change the <servicedir> element value to <prefix>/share/dbus-1/services/<br />
* Restart dbus: <code>/etc/init.d/dbus restart</code><br />
* Log out and log back in again<br />
<br />
==Interfaces==<br />
Currently, Apertium offers two D-Bus services:<br />
* <code>org.apertium.info</code> has a single object <code>/</code>, which offers rudimentary information about the Apertium installation.<br />
* <code>org.apertium.translate</code> contains an object for each Apertium mode installed in the system. <br />
<br />
==Issues==<br />
If you make a change to any of the D-Bus configuration files, you will need to restart both the system-wide and session D-Bus daemons. The system-wide daemon can be restarted on a Debian/Ubuntu system with:<br />
*<code>/etc/init.d/dbus restart</code><br />
The only real way to restart your session daemon is to log out and log back in again. You will likely run into strange problems if you attempt to kill the session D-Bus daemon.<br />
<br />
==Filesystem layout==<br />
<br />
* <code>/usr/share/dbus-1/services/</code> &mdash; DBUS <code>.service</code> files.<br />
* <code>/usr/share/apertium/dbus-1/</code> &mdash; Python code that actually does the service (<code>info.py</code> and <code>mode.py</code>)<br />
<br />
==Examples==<br />
<br />
There are some simple examples in various languages on the page [[D-Bus examples]].<br />
<br />
==External links==<br />
<br />
* [http://www.pygtk.org/articles/applets_arturogf/x207.html Autotoolising python]<br />
<br />
[[Category:Development]]<br />
[[Category:Services]]<br />
[[Category:Documentation in English]]</div>Tino Didriksenhttps://wiki.apertium.org/w/index.php?title=Radnik&diff=73457Radnik2021-05-27T06:27:38Z<p>Tino Didriksen: Text replacement - "(chat|irc)\.freenode\.net" to "irc.oftc.net"</p>
<hr />
<div>{{Github-migration-check}}<br />
'''Radnik''' (from Serbo-Croatian for "worker") is a build/compile/svn bot. It sits on <code>#apertium</code>, <code>irc.oftc.net</code>. <br />
<br />
It is linked with the online test site at [http://xixona.dlsi.ua.es/testing/ xixona]. If you do an svn commit, and don't want to wait 12 hours for the next update, hop onto irc and use radnik to do a rebuild.<br />
<br />
For example:<br />
<pre><br />
<spectie> radnik, rebuild apertium-en-af<br />
-radnik- Building apertium-en-af<br />
-radnik- Your build probably completely successfully.<br />
</pre><br />
<br />
<pre><br />
<spectie> radnik, clean apertium-fr-nl<br />
-radnik- Cleaning apertium-fr-nl<br />
-radnik- Module cleaned<br />
</pre><br />
<br />
<pre><br />
<youssef> radnik: info apertium-fr-nl<br />
-radnik- SVN Information for apertium-fr-nl<br />
-radnik- Ruta: .<br />
-radnik- URL: http://apertium.svn.sourceforge.net/svnroot/apertium/apertium-fr-nl<br />
-radnik- UUID en el repositorio: 72bbbca6-d526-0410-a7d9-f06f51895060<br />
-radnik- Revisión: 613<br />
-radnik- Tipo de nodo: directorio<br />
-radnik- Agendado: normal<br />
-radnik- Autor del último cambio: youssefsan<br />
-radnik- Revisión del último cambio: 613<br />
-radnik- Fecha de último cambio: 2007-06-10 13:24:43 +0200 (dom, 10 jun 2007)<br />
</pre><br />
<br />
Radnik also rebuilds on each CIA commit message. Unfortunately CIA can be quite slow, so you may need to wait 10 minutes or so between the commit and the build. Going on irc offers instant gratification.<br />
<br />
The bot will only talk to a set of pre-defined users, if you wish to be added, please contact [[User:Francis Tyers|fran]] (irc: <code>spectre</code>, or <code>spectie</code>).<br />
<br />
==Limitations==<br />
<br />
It would be ideal if it could get a message straight from the svn post commit script, rather than having it go SF svn post-commit -> Email -> CIA -> IRC -> radnik. Unfortunately sourceforge only allows a certain number of pre-defined scripts in their SVN post-commit, so this is a little hack.<br />
<br />
[[Category:Bots]]</div>Tino Didriksenhttps://wiki.apertium.org/w/index.php?title=Questions_fr%C3%A9quentes&diff=73456Questions fréquentes2021-05-27T06:27:34Z<p>Tino Didriksen: Text replacement - "(chat|irc)\.freenode\.net" to "irc.oftc.net"</p>
<hr />
<div>{{Otherlang|Frequently Asked Questions|In English}}<br />
<br />
== Pourquoi utilisez-vous XML et pas une base de données ? ==<br />
<br />
XML n'est-il pas un format vraiment peu efficace pour stocker des dictionnaires, avec tous ces espaces et ces balises compliqués à lire ? Ne serait-il pas mieux d'avoir toute l'information dans une base de données, comme Postgres ou MySQL ? Ou même dans des fichiers texte ordinaires ?<br />
<br />
;Réponse<br />
<br />
* Chaque élément de données est explicitement étiqueté avec une balise descriptive nommée avec une signification claire associée<br />
* La structure des documents peut être facilement validée au moyen de DTDs ou de schémas<br />
* Plusieurs technologies existent pour XML (conversion depuis et vers XML, interopérabilité).<br />
* XML est assez facile à traiter avec des outils de traitement de texte comme sed, cut et awk.<br />
<br />
Vous pouvez lire plus de détails tant pratiques que théoriques sur notre format pour mémoriser les dictionnaires ici : ''[[Dictionnaire morphologique]]''.<br />
<br />
== Est-ce qu'Apertium supporte les verbes séparables ? ==<br />
<br />
Plusieurs langues, par exemple la plupart des langues germaniques (à l'exception de l'anglais) et le hongrois, possèdent un phénomène appelé "verbes séparables", également appelé "prépositions attachées" ou par d'autres noms. C'est lorsque l'infinitif du verbe possède une partie qui est détachée et déplacée lorsque le verbe est conjugué. Par exemple en afrikaans, le verbe "annoncer" est "aankondig". La partie ''aan'' est séparée lorsque le verbe est conjugué, donc par exemple :<br />
:Les astronomes '''annoncent''' [la découverte].<br />
:Sterrekundiges '''kondig''' [die ontdekking] '''aan'''.<br />
<br />
Toutefois, au passé on aurait :<br />
<br />
:Les astronomes ont '''annoncé''' [la découverte]. <br />
:Sterrekundiges het [die ontdekking] '''aangekondig'''.<br />
<br />
Tout seul, "kondig" ne signifie rien. <br />
<br />
;Réponse<br />
<br />
Essentiellement non : pour l'instant on ne supporte pas les verbes séparables. Le problème pour Apertium se produit quand la partie non séparée ne signifie rien : il est pour l'instant impossible d'analyser un mot en deux parties quand elles sont séparées par quelque chose d'aussi nébuleux qu'un groupe nominal (NP). Il y a un certain nombre de hacks qui peuvent être essayés pour contourner cette déficience, mais aucun ne fonctionne proprement. Si vous souhaitez plus d'informations là-dessus, ou avez des idées sur la manière de traiter ça ou de s'en accommoder, veuillez voir notre page ''[[Separable verbs (lien direct ?)]]''.<br />
<br />
== Comment puis-je contribuer à ce projet ? ==<br />
<br />
Indépendamment du type de contribution que vous voulez faire, les deux premières choses à faire sont de vous inscrire à la mailing list [https://lists.sourceforge.net/lists/listinfo/apertium-stuff apertium-stuff], qui est l'endroit où vont la plupart des discussions, et de venir flâner sur le canal IRC <code>#apertium</code> sur <code>irc.oftc.net</code>.<br />
<br />
Pour décider sur quoi vous voulez contribuer, regardez ''[[Développement (français)]]'' et ''[[Projects (à traduire ?)]]'' pour quelques idées qu'on a eues en programmation et pour l'extension du moteur de traduction, et regardez dans ''[[Classement des paires de langues selon leur état d'avancement|les différentes branches]]'' ou dans la [[Liste des paires de langues]] si vous êtes intéressé par les problèmes linguistiques. Si vous trouvez quelque chose qui vous intéresse ou pique votre curiosité, envoyez juste un email ou demandez à quelqu'un sur l'irc et on sera heureux de vous aider.<br />
<br />
[[Category:Documentation]]<br />
[[Category:Documentation en français]]</div>Tino Didriksenhttps://wiki.apertium.org/w/index.php?title=Ideas_for_Google_Summer_of_Code&diff=73455Ideas for Google Summer of Code2021-05-27T06:27:25Z<p>Tino Didriksen: Text replacement - "(chat|irc)\.freenode\.net" to "irc.oftc.net"</p>
<hr />
<div>{{TOCD}}<br />
This is the ideas page for [[Google Summer of Code]], where you can find ideas on interesting projects that would make Apertium more useful for people and improve or expand our functionality. If you have an idea, please add it below; if you think you could mentor someone in a particular area, add your name to "Interested mentors" using <nowiki>~~~</nowiki> <br />
<br />
The page is intended as an overview of the kind of projects we have in mind. If one of them particularly piques your interest, please come and discuss with us on <code>#apertium</code> on <code>irc.oftc.net</code>, mail the [[Contact|mailing list]], or draw attention to yourself in some other way. <br />
<br />
Note that if you have an idea that isn't mentioned here, we would be very interested to hear about it.<br />
<br />
Here are some more things you could look at:<br />
<br />
* [[Top tips for GSOC applications]] <br />
* Get in contact with one of our long-serving [[List of Apertium mentors|mentors]] &mdash; they are nice, honest!<br />
* Pages in the [[:Category:Development|development category]]<br />
* Resources that could be converted or expanded in the [[incubator]]. Consider doing or improving a language pair (see [[incubator]], [[nursery]] and [[staging]] for pairs that need work)<br />
* Unhammer's [[User:Unhammer/wishlist|wishlist]]<br />
* The open issues [https://github.com/search?q=org%3Aapertium&state=open&type=Issues on Github] - especially the [https://github.com/search?q=org%3Aapertium+label%3A%22good+first+issue%22&state=open&type=Issues Good First Issues].<br />
<br />
__TOC__<br />
<br />
If you're a student trying to propose a topic, the recommended way is to request a wiki account and then go to <pre>http://wiki.apertium.org/wiki/User:[[your username]]/GSoC2021Proposal</pre> and click the "create" button near the top of the page. It's also nice to include <code><nowiki>[[Category:GSoC_2021_student_proposals]]</nowiki></code> to help organize submitted proposals.<br />
<br />
== Ideas ==<br />
<br />
{{IdeaSummary<br />
| name = Python API for Apertium<br />
| difficulty = medium<br />
| skills = C++, Python<br />
| description = Update the Python API for Apertium to expose all Apertium modes and test with all major OSes<br />
| rationale = The current Python API misses out on a lot of functionality, like phonemicisation, segmentation, and transliteration, and doesn't work for some OSes <s>like Debian</s>.<br />
| mentors = [[User:Francis Tyers|Francis Tyers]]<br />
| more = /Python API<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = OmniLingo and Apertium<br />
| difficulty = medium<br />
| skills = JS, Python<br />
| description = OmniLingo is a language learning system for practising listening comprehension using Apertium data. There is a lot of text processing involved (for example tokenisation) that could be aided by Apertium tools. <br />
| rationale = <br />
| mentors = [[User:Francis Tyers|Francis Tyers]]<br />
| more = /OmniLingo<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Web API extensions<br />
| difficulty = medium<br />
| skills = Python<br />
| description = Update the web API for Apertium to expose all Apertium modes <br />
| rationale = The current Web API misses out on a lot of functionality, like phonemicisation, segmentation, and transliteration<br />
| mentors = [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Jonathan Washington]], [[User:Xavivars|Xavi Ivars]]<br />
| more = /Apertium APY<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Develop a morphological analyser<br />
| difficulty = easy<br />
| skills = XML or HFST or lexd<br />
| description = Write a morphological analyser and generator for a language that does not yet have one<br />
| rationale = A key part of an Apertium machine translation system is a morphological analyser and generator. The objective of this task is to create an analyser for a language that does not yet have one.<br />
| mentors = [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Jonathan Washington]], [[User: Sevilay Bayatlı|Sevilay Bayatlı]], Hossep, nlhowell<br />
| more = /Morphological analyser<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Support for Enhanced Dependencies in UD Annotatrix<br />
| difficulty = medium<br />
| skills = NodeJS<br />
| description = UD Annotatrix is an annotation interface for Universal Dependencies, but does not yet support all functionality<br />
| rationale = <br />
| mentors = [[User:Francis Tyers|Francis Tyers]]<br />
| more = /Morphological analyser<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = User-friendly lexical selection training<br />
| difficulty = Medium<br />
| skills = Python, C++, shell scripting<br />
| description = Make it so that training/inference of lexical selection rules is a more user-friendly process<br />
| rationale = Our lexical selection module allows for inferring rules from corpora and word alignments, but the procedure is currently a bit messy, with various scripts involved that require lots of manual tweaking, and many third party tools to be installed. The goal of this task is to make the procedure as user-friendly as possible, so that ideally only a simple config file would be needed, and a driver script would take care of the rest.<br />
| mentors = [[User:Unhammer|Unhammer]], [[User:Mlforcada|Mikel Forcada]]<br />
| more = /User-friendly lexical selection training<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Robust tokenisation in lttoolbox<br />
| difficulty = Medium<br />
| skills = C++, XML, Python<br />
| description = Improve the longest-match left-to-right tokenisation strategy in [[lttoolbox]] to be fully Unicode compliant.<br />
| rationale = One of the most frustrating things about working with Apertium on texts "in the wild" is the way that the tokenisation works. If a letter is not specified in the alphabet, it is dealt with as whitespace, so unknown words get split in two and you can end up with stuff like ^G$ö^k$ı^rmak$, which is terrible for further processing. <br />
| mentors = [[User:Francis Tyers|Francis Tyers]], [[User:TommiPirinen|Flammie]]<br />
| more = /Robust tokenisation<br />
}}<br />
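The artefact described in the rationale above is easy to reproduce with a toy model (a hypothetical sketch, not lttoolbox code): if every character outside the declared alphabet is treated as whitespace, a word containing ö or ı is torn into pieces, which is exactly the ^G$ö^k$ı^rmak$ pattern.

```python
# Toy model of alphabet-based tokenisation -- NOT the lttoolbox implementation.
import string

ALPHABET = set(string.ascii_letters)  # ASCII-only alphabet: ö and ı fall outside it

def naive_tokens(text):
    """Split on every character that is not in the declared alphabet."""
    tokens, current = [], ""
    for ch in text:
        if ch in ALPHABET:
            current += ch
        elif current:
            tokens.append(current)
            current = ""
    if current:
        tokens.append(current)
    return tokens

print(naive_tokens("Gökırmak"))  # ['G', 'k', 'rmak'] -- the name is split apart
```

A Unicode-compliant tokeniser would instead keep Gökırmak together as a single unknown token.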
<br />
{{IdeaSummary<br />
| name = apertium-separable language-pair integration<br />
| difficulty = Medium<br />
| skills = XML, a scripting language (Python, Perl), some knowledge of linguistics and/or at least one relevant natural language<br />
| description = Choose a language you can identify as having a good number of "multiwords" in the lexicon. Modify all language pairs in Apertium to use the [[Apertium-separable]] module to process the multiwords, and clean up the dictionaries accordingly.<br />
| rationale = Apertium-separable is a newly developed module to process lexical items with discontinuous dependencies, an area where Apertium has traditionally fallen short. Despite all the module has to offer, it has only been put to use in small test cases, and hasn't been integrated into any translation pair's development cycle.<br />
| mentors = [[User:Firespeaker|Jonathan Washington]]<br />
| more = /Apertium separable<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = UD and Apertium integration<br />
| difficulty = Entry level<br />
| skills = python, javascript, HTML, (C++)<br />
| description = Create a range of tools for making Apertium compatible with Universal Dependencies<br />
| rationale = Universal Dependencies is a fast-growing project aimed at creating a unified annotation scheme for treebanks. This includes both part-of-speech and morphological features. Their annotated corpora could be extremely useful for Apertium for training models for translation. In addition, Apertium's rule-based morphological descriptions could be useful for software that relies on Universal Dependencies.<br />
| mentors = [[User:Francis Tyers]] [[User:Firespeaker| Jonathan Washington]]<br />
| more = /UD and Apertium integration <br />
}}<br />
<br />
{{IdeaSummary<br />
| name = rule visualization tools<br />
| difficulty = Medium<br />
| skills = python? javascript? XML<br />
| description = make tools to help visualize the effect of various rules<br />
| rationale = TODO see https://github.com/Jakespringer/dapertium for an example<br />
| mentors = [[User:Firespeaker|Jonathan Washington]], [[User:Sevilay Bayatlı|Sevilay Bayatlı]]<br />
| more = /Visualization tools<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = dictionary induction from wikis<br />
| difficulty = Medium<br />
| skills = MySQL, mediawiki syntax, perl, maybe C++ or Java; Java, Scala, RDF, and DBpedia to use DBpedia extraction<br />
| description = Extract dictionaries from linguistic wikis<br />
| rationale = Wiki dictionaries and encyclopedias (e.g. omegawiki, wiktionary, wikipedia, dbpedia) contain information (e.g. bilingual equivalences, morphological features, conjugations) that could be exploited to speed up the development of dictionaries for Apertium. This task aims at automatically building dictionaries by extracting different pieces of information from wiki structures such as interlingual links, infoboxes and/or from dbpedia RDF datasets.<br />
| mentors = [[User:Firespeaker|Jonathan Washington]]<br />
| more = /Dictionary induction from wikis<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = unit testing framework<br />
| difficulty = Medium<br />
| skills = perl<br />
| description = Adapt https://github.com/TinoDidriksen/regtest for general Apertium use. [https://github.com/TinoDidriksen/regtest/wiki Screenshots of regtest in action]<br />
| rationale = We are gradually improving our quality control, with (semi-)automated tests, but these are done on the Wiki on an ad-hoc basis. Having a unified testing framework would allow us to be able to more easily track quality improvements over all language pairs, and more easily deal with regressions.<br />
| mentors = [[User:Xavivars|Xavi Ivars]]<br />
| more = /Unit testing<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Bring an unreleased translation pair to releasable quality<br />
| difficulty = Medium<br />
| skills = shell scripting<br />
| description = Take an unstable language pair and improve its quality, focusing on testvoc<br />
| rationale = Many Apertium language pairs have large dictionaries and have otherwise seen much development, but are not of releasable quality. The point of this project would be to bring one translation pair to releasable quality. This would entail obtaining good naïve coverage and a clean [[testvoc]].<br />
| mentors = [[User:Firespeaker|Jonathan Washington]], [[User:Sevilay Bayatlı|Sevilay Bayatlı]], [[User:Hectoralos|Hèctor Alòs i Font]]<br />
| more = /Make a language pair state-of-the-art<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Develop a prototype MT system for a strategic language pair<br />
| difficulty = Medium<br />
| skills = XML, some knowledge of linguistics and of one relevant natural language <br />
| description = Create a translation pair based on two existing language modules, focusing on the dictionary and structural transfer<br />
| rationale = Choose a strategic set of languages to develop an MT system for, such that you know the target language well and morphological transducers for each language are part of Apertium. Develop an Apertium MT system by focusing on writing a bilingual dictionary and structural transfer rules. Expanding the transducers and disambiguation, and writing lexical selection rules and multiword sequences may also be part of the work. The pair may be an existing prototype, but if it's a heavily developed but unreleased pair, consider applying for "Bring an unreleased translation pair to releasable quality" instead.<br />
| mentors = [[User:Firespeaker|Jonathan Washington]], [[User:Sevilay Bayatlı| Sevilay Bayatlı]], [[User:Hectoralos|Hèctor Alòs i Font]]<br />
| more = /Adopt a language pair<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Website Improvements: Misc<br />
| difficulty = Medium<br />
| skills = html, css, js, python<br />
| description = Improve elements of Apertium's web infrastructure<br />
| rationale = Apertium's website infrastructure [[Apertium-html-tools]] and its supporting API [[APy|Apertium APy]] have numerous open issues. This project would entail choosing a subset of open issues and features that could realistically be completed in the summer. You're encouraged to speak with the Apertium community to see which features and issues are the most pressing.<br />
| mentors = [[User:Firespeaker|Jonathan Washington]], [[User:Xavivars|Xavi Ivars]]<br />
| more = /Website improvements<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Website Improvements: Dictionary Lookup<br />
| difficulty = Medium<br />
| skills = html, css, js, python<br />
| description = Finish implementing dictionary lookup mode in Apertium's web infrastructure<br />
| rationale = Apertium's website infrastructure [[Apertium-html-tools]] and its supporting API [[APy|Apertium APy]] have numerous open issues, including half-completed features like dictionary lookup. This project would entail completing the dictionary lookup feature. Some additional features which would be good to work on include automatic reverse lookups (so that a user has a better understanding of the results), grammatical information (such as the gender of nouns or the conjugation paradigms of verbs), and information about MWEs. See [https://github.com/apertium/apertium-html-tools/issues/105 the open issue on GitHub].<br />
| mentors = [[User:Firespeaker|Jonathan Washington]], [[User:Xavivars|Xavi Ivars]]<br />
| more = /Website improvements<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Website Improvements: Spell checking<br />
| difficulty = Medium<br />
| skills = html, js, css, python<br />
| description = Add a spell-checking interface to Apertium's web tools<br />
| rationale = [[Apertium-html-tools]] has seen some prototypes for spell-checking interfaces (all in stale PRs and branches on GitHub), but none have ended up being quite ready to integrate into the tools. This project would entail polishing up or recreating an interface, and making sure [[APy]] has a mode that allows access to Apertium voikkospell modules. The end result should be a slick, easy-to-use interface for proofing text, with intuitive underlining of text deemed to be misspelled and intuitive presentation and selection of alternatives. See [https://github.com/apertium/apertium-html-tools/issues/390 the open issue on GitHub].<br />
| mentors = [[User:Firespeaker|Jonathan Washington]], [[User:Xavivars|Xavi Ivars]]<br />
| more = /Spell checker web interface<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Website Improvements: Suggestions<br />
| difficulty = Medium<br />
| skills = html, css, js, python<br />
| description = Finish implementing a suggestions interface for Apertium's web infrastructure<br />
| rationale = Some work has been done to add a "suggestions" interface to Apertium's website infrastructure [[Apertium-html-tools]] and its supporting API [[APy|Apertium APy]], whereby users can suggest corrected translations. This project would entail finishing that feature. There are some related [https://github.com/apertium/apertium-html-tools/issues/55 issues] and [https://github.com/apertium/apertium-html-tools/pull/252 PRs] on GitHub.<br />
| mentors = [[User:Firespeaker|Jonathan Washington]], [[User:Xavivars|Xavi Ivars]]<br />
| more = /Website improvements<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Website Improvements: Orthography conversion interface<br />
| difficulty = Medium<br />
| skills = html, js, css, python<br />
| description = Add an orthography conversion interface to Apertium's web tools<br />
| rationale = Several Apertium language modules (like Kazakh, Kyrgyz, Crimean Tatar, and Hñähñu) have orthography conversion modes in their mode definition files. This project would be to expose those modes through [[APy|Apertium APy]] and provide a simple interface in [[Apertium-html-tools]] to use them.<br />
| mentors = [[User:Firespeaker|Jonathan Washington]], [[User:Xavivars|Xavi Ivars]]<br />
| more = /Website improvements<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Apertium Browser Plugin<br />
| difficulty = Medium<br />
| skills = html, css, js, python<br />
| description = Expand functionality of Geriaoueg vocabulary assistant<br />
| rationale = [[Geriaoueg]] is a vocabulary assistant with Firefox/Chrom[e/ium] plugins. These plugins interface with Apertium's web API, [[APy|Apertium APy]], and allow a user to look up (in Apertium's dictionaries) word forms from a web page they're viewing. A Firefox/Chrom[e/ium] plugin should also be able to provide in-browser website translation. This project is to clean up the dictionary lookup functionality and add translation support to the plugins. Some APy features may need to be tweaked, but most of the work in this project will be solely in the plugins.<br />
| mentors = [[User:Firespeaker|Jonathan Washington]], [[User:Xavivars|Xavi Ivars]], [[User:Tino_Didriksen|Tino Didriksen]]<br />
| more = /Geriaoueg browser plugin<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Extend Weighted transfer rules<br />
| difficulty = Medium<br />
| skills = C++, python<br />
| description = The weighted transfer module currently applies only to chunker-level transfer rules; the idea here is to extend it to cover interchunk and postchunk transfer rules as well.<br />
| rationale = As a resource, see https://github.com/aboelhamd/Weighted-transfer-rules-module<br />
| mentors = [[User: Sevilay Bayatlı|Sevilay Bayatlı]]<br />
| more = /Make a module <br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Automatic Error-Finder / Backpropagation<br />
| difficulty = Medium<br />
| skills = python?<br />
| description = Develop a tool to locate the approximate source of translation errors in the pipeline.<br />
| rationale = Being able to generate a list of probable error sources automatically makes it possible to prioritize issues by frequency, frees up developer time, and is a first step towards automated generation of better rules.<br />
| mentors = ???<br />
| more = /Backpropagation<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Add support for NMT to web API<br />
| difficulty = Medium<br />
| skills = python, NMT<br />
| description = Add support for a popular NMT engine to Apertium's web API<br />
| rationale = Currently, Apertium's web API, [[APy|Apertium APy]], supports only Apertium language modules, but the front end could just as easily interface with an API that serves trained NMT models. The point of the project is to add support for one popular NMT package (e.g., OpenNMT or JoeyNMT) to APy.<br />
| mentors = [[User:Firespeaker|Jonathan Washington]]<br />
| more = <br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Localization (l10n/i18n) of Apertium tools<br />
| difficulty = Medium<br />
| skills = C++<br />
| description = All our command line tools are currently hardcoded as English-only and it would be good if this were otherwise. [https://github.com/apertium/organisation/issues/28#issuecomment-803474833 Coding Challenge]<br />
| rationale = ...<br />
| mentors = [[User:Tino_Didriksen|Tino Didriksen]]<br />
| more = [https://github.com/apertium/organisation/issues/28 GitHub issue]<br />
}}<br />
<br />
[[Category:Development]]<br />
[[Category:Google Summer of Code]]</div>Tino Didriksenhttps://wiki.apertium.org/w/index.php?title=Frequently_Asked_Questions&diff=73454Frequently Asked Questions2021-05-27T06:27:16Z<p>Tino Didriksen: Text replacement - "(chat|irc)\.freenode\.net" to "irc.oftc.net"</p>
<hr />
<div>{{Otherlang|Questions fréquentes|{{French}}}}<br />
<br />
There are many ways to contribute to Apertium, from sending us lists of words or phrases you find that are incorrectly translated, to getting involved in creating a new language pair or programming on tools or user interfaces. Here are some questions frequently asked by users.<br />
<br />
== The Blurb ==<br />
Apertium is a ''[https://en.wikipedia.org/wiki/Rule-based_machine_translation rule-based machine translation]'' toolchain and ecosystem, with many of our tools based on [https://en.wikipedia.org/wiki/Finite-state_transducer finite-state transducers].<br />
<br />
Our language agnostic tools are native and written in [https://en.wikipedia.org/wiki/C++ C++]. The various development helpers are mostly in [https://python.org/ Python].<br />
<br />
Our language data is in various formats, including XML and other human-editable texts. Language data is split into single-language packages that can analyse and generate a given language, and translation pairs that perform transfer and transformation between two languages. The single-language packages are shared amongst many pairs.<br />
<br />
If you wish to contribute to the language agnostic native tools you'll need to know C++.<br />
<br />
If you wish to contribute language data to Apertium, your contributions should fit in our [[Apertium_system_architecture|existing pipeline]]. That is, it should be rule-based and deterministic. We will happily [[Contact|help you learn]] our formats and methods, and we know from experience it is possible to learn and use Apertium in short time.<br />
<br />
We do not currently include any [https://en.wikipedia.org/wiki/Statistical_machine_translation statistical] or [https://en.wikipedia.org/wiki/Neural_machine_translation neural] machine translation tools or methods. We are often asked if contributions can be made with statistical or neural systems, but for now they cannot.<br />
<br />
For more information about how to contribute, see [[Contributing]].<br />
<br />
==How do I start off?==<br />
<br />
Regardless of the kind of contribution you want to do, the two things to start with are to subscribe to the mailing list [https://lists.sourceforge.net/lists/listinfo/apertium-stuff apertium-stuff], which is where most of the discussion goes on. Also, come and idle on the [[IRC|IRC channel]] <code>#apertium</code> on <code>irc.oftc.net</code>.<br />
<br />
To decide what you want to contribute, take a look at ''[[Development]]'' and ''[[Projects]]'' for some ideas we've had around programming, extending the engine, and have a look at the ''[[Incubator]]'' if you're interested in linguistic issues. If you can't find anything that interests you or piques your interest, just send an email or ask someone on IRC and they'll be happy to help.<br />
<br />
==How do I add or fix words?==<br />
If you have some words that are unknown in a certain language pair, you can help out by simply writing a list of words and their translations, e.g.<br />
<pre><br />
house; noun; casa; noun f<br />
dog; noun; perro; noun m<br />
</pre><br />
<br />
into a file, and sending that to the [[mailing list]]. Most likely you want to send to the one called "apertium-stuff"; [https://lists.sourceforge.net/lists/listinfo/apertium-stuff subscribe here], then attach the file and send it to apertium-stuff@lists.sourceforge.net.<br />
<br />
You can also send a spreadsheet file, if you prefer that.<br />
<br />
==How can I contribute my knowledge?==<br />
The [[Indirect contribution guide]] has some tips on how to contribute your knowledge of a language to create resources that we use in Apertium, such as <br />
* Writing contrastive analyses<br />
* Cataloguing resources<br />
* Hand-translating text<br />
* Converting dictionaries<br />
* Contributing to related projects<br />
<br />
==How do I get more involved?==<br />
The first thing you should do if you want to get more involved is to introduce yourself on the [[mailing list]] and hang out on our [[IRC]] channel. There is also a [[list of Apertium mentors]].<br />
<br />
Next, you should [[Installation|install apertium, lttoolbox and some language pair]] to play around with. <br />
<br />
If you want to create or contribute to a language pair, go through the [[New language pair HOWTO]]. This is required reading for anyone who wants to get involved with developing Apertium language pairs. Also, take a look at [[Contributing to an existing pair]], meant for those who want to contribute to existing language pairs. You can improve the quality of the translation for an existing pair by correcting errors in the dictionaries. You will find some hints on the page [[Finding_errors_in_dictionaries]].<br />
<br />
Next up, the [https://www.abumatran.eu/wp-content/uploads/2014/12/abumatran-apertium-workshop-data-guide.pdf Apertium EU Workshop site] is a comprehensive guide to rule based machine translation with Apertium (originally made for a four-day course on Apertium for people with little background in machine translation); print this out and read it on the bus/train/boat<br />
<br />
If you're a student, [[Google Summer of Code]] (or [https://codein.withgoogle.com/ Google Code-In] for high-school students) is a good way to get involved with Apertium, and the ideas page there has lots of project tips if you're more interested in programming than linguistics/language pairs. If you are on the task of requesting a wiki account and adopting a page, contact a mentor to request an account to gain access to edit the wiki.<br />
<br />
==Why are you using XML and not a database?==<br />
XML seems a really inefficient format for storing dictionaries: with all those spaces and tags, the files are large and complicated to read. Would it not be better to have all the information in a database, like Postgres or MySQL? Or even in ordinary text files?<br />
<br />
* Each data item is explicitly tagged with a descriptive tag named with a clear meaning associated with it<br />
* Document structure can be easily validated using DTDs or schemas<br />
* Several technologies exist for XML (conversion to and from XML, interoperability).<br />
* XML is quite easy to process with standard text-processing tools like sed, cut and awk.<br />
* You can read more practical and theoretical details about the format we use to store the dictionaries here: ''[[Morphological dictionary]]''.<br />
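For instance, the dictionaries yield to ordinary text tools even without an XML library. A minimal sketch (the file name and the two entries below are made up for illustration; real dictionaries follow the format described in ''[[Morphological dictionary]]''):<br />

```shell
# Create a toy lttoolbox-style dictionary (contents are illustrative only)
cat > /tmp/sample.dix <<'EOF'
<dictionary>
  <section id="main" type="standard">
    <e lm="house"><i>house</i></e>
    <e lm="dog"><i>dog</i></e>
  </section>
</dictionary>
EOF

# Pull out the lemmas with grep and sed alone
grep -o 'lm="[^"]*"' /tmp/sample.dix | sed 's/lm="//; s/"//'
# prints:
# house
# dog
```

The explicit <code>lm</code> attribute is what makes this kind of one-liner possible; a binary database would need a query client instead.<br />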
<br />
==Does Apertium support separable verbs?==<br />
Several languages, for example most of the Germanic languages (with the exception of English) and Hungarian, have a phenomenon called "separable verbs", also called "attached prepositions" or by other names. This is when the verb's infinitive has a part that is detached and displaced when the verb is conjugated. For example, in Afrikaans the verb "to announce" is "aankondig". The part "aan" is separated when the verb is conjugated, for example:<br />
<br />
* Sterrekundiges '''kondig''' [die ontdekking] '''aan'''.<br />
* Astronomers '''announce''' [the discovery]. <br />
<br />
The stem "kondig" does not by itself mean anything, only in conjunction with the particle "aan"; however, this is not always the case. The past participle is formed by inserting "ge" between the particle and the stem, for example:<br />
<br />
* Sterrekundiges '''het''' [die ontdekking] '''aangekondig'''.<br />
* Astronomers '''have announced''' [the discovery].<br />
<br />
The answer is yes, we do have a module created for exactly this purpose: [[Apertium separable|Apertium-separable]]. However, this module has not yet been incorporated into many of our existing pairs.<br />
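As a rough illustration of the kind of rejoining such a module performs (a simplified sketch only, not the actual apertium-separable algorithm; the sentence is the example from above):<br />

```shell
# Simplified illustration: rejoin the separated verb "kondig ... aan"
# into the infinitive "aankondig" so a dictionary can look it up.
# NOT the real apertium-separable implementation, just the idea.
echo "Sterrekundiges kondig die ontdekking aan" \
  | sed 's/kondig \(.*\) aan$/aankondig \1/'
# prints: Sterrekundiges aankondig die ontdekking
```

The real module works on the Apertium stream format and handles arbitrary lemmas and intervening material, rather than one hard-coded verb.<br />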
<br />
[[Category:Documentation in English]]</div>Tino Didriksenhttps://wiki.apertium.org/w/index.php?title=MediaWiki:Sitenotice&diff=73453MediaWiki:Sitenotice2021-05-27T06:27:15Z<p>Tino Didriksen: Text replacement - "(chat|irc)\.freenode\.net" to "irc.oftc.net"</p>
<hr />
<div><!--<br />
<div style="background-color: #fffbdd; border-radius: 2px; border: 1px rgba(27,31,35,0.15) solid; color: #735c0f; padding: 5px; margin-top: 1em; text-align:center;"><b>Apertium has moved from SourceForge to [https://github.com/apertium/ GitHub].</b><br />If you have any questions, please come and [[IRC|talk to us]] on <code>#apertium</code> on <code>irc.oftc.net</code> or contact the [[GitHub migration team]].</div><br />
--></div>Tino Didriksenhttps://wiki.apertium.org/w/index.php?title=Contact_(fran%C3%A7ais)&diff=73452Contact (français)2021-05-27T06:26:54Z<p>Tino Didriksen: Text replacement - "(chat|irc)\.freenode\.net" to "irc.oftc.net"</p>
<hr />
<div>[[Contact|In English]]<br />
<br />
{{Main page header fr}}<br />
<br />
== Rester en contact ==<br />
<br />
* Listes de discussion :<br />
** [https://lists.sourceforge.net/lists/listinfo/apertium-stuff apertium-stuff] — liste générale tous sujets. (multilingue) — ([https://sourceforge.net/mailarchive/forum.php?forum_name=apertium-stuff la liste des archives]) ([http://news.gmane.org/gmane.comp.nlp.apertium gmane])<br />
** [http://groups.google.com/group/apertium_eo?hl=eo apertium_eo] — liste de diffusion pour l'[[Esperanto|espéranto]] (surtout en espéranto, mais les autres langues sont bienvenues)<br />
** [https://lists.sourceforge.net/lists/listinfo/apertium-turkic apertium-turkic] — liste de diffusion pour les traducteurs automatiques des langues turciques (multilingue) — ([https://sourceforge.net/mailarchive/forum.php?forum_name=apertium-turkic la liste des archives]) ([http://blog.gmane.org/gmane.science.linguistics.turkic.mt gmane])<br />
** [https://lists.sourceforge.net/lists/listinfo/apertium-uralic apertium-uralic] &mdash; liste de diffusion pour les traducteurs automatiques des langues ouraliennes (multilingue) &mdash; ([https://sourceforge.net/mailarchive/forum.php?forum_name=apertium-uralic list archives]) ([http://blog.gmane.org/gmane.science.linguistics.uralic.mt gmane])<br />
** [https://lists.sourceforge.net/lists/listinfo/apertium-celtic apertium-celtic] — liste de diffusion pour les traducteurs automatiques des langues celtiques (multilingue) — ([https://sourceforge.net/mailarchive/forum.php?forum_name=apertium-celtic la liste des archives]) ([http://blog.gmane.org/gmane.science.linguistics.celtic.nlp gmane])<br />
** [https://lists.sourceforge.net/lists/listinfo/apertium-persian persian-nlp] — liste de diffusion pour le persan (multilingue) — ([https://sourceforge.net/mailarchive/forum.php?forum_name=apertium-persian la liste des archives])<br />
** [http://groups.google.com/group/apertium/about Апертиум Россия] &mdash; Группа по созданию систем машинного перевода на Apertium для языков России (multilingue) &mdash;<br />
* [[IRC]]: <code>irc.oftc.net</code> <code>#apertium</code> (multilingue)<br />
** Vous pouvez utiliser notre [http://xixona.dlsi.ua.es/cgi-bin/cgiirc/irc.cgi client CGI::IRC] ou, si vous utilisez Mozilla, Firefox, Opera ou Seamonkey, vous pouvez essayer de taper [irc://irc.oftc.net/#apertium irc://irc.oftc.net/#apertium] (cela démarrera le client [https://addons.mozilla.org/en-US/firefox/addon/16/ Chatzilla] ou le client spécifique à Opera).<br />
** Il y a des robots collecteurs d'aide sur irc: [[radnik]] et [[eleda]].<br />
** Utilisez le pastebin de http://apertium.codepad.org/ pour partager du code et pour les copies de gros morceaux.<br />
* Si vous trouvez un bug, faites un rapport dans [[Bugzilla]] !<br />
<br />
== Voir aussi ==<br />
<br />
* [[Language and pair maintainer]] : liste des paires de langues qu'une personne déclare maintenir, avec les sens de traduction et les responsables des paires concernées.<br />
<br />
[[Category:Contact]]<br />
[[Category:Documentation en français]]</div>Tino Didriksenhttps://wiki.apertium.org/w/index.php?title=Eleda&diff=73451Eleda2021-05-27T06:26:26Z<p>Tino Didriksen: Text replacement - "(chat|irc)\.freenode\.net" to "irc.oftc.net"</p>
<hr />
<div>{{Github-unmigrated-tool}}<br />
'''Eleda''' (from a spirit that reflects one of the manifestations of God in Yoruba) is a translation bot on <code>#apertium</code>, <code>irc.oftc.net</code>. The bot allows you to "follow" what a user is saying, translating it as it goes. <br />
<br />
== Usage ==<br />
<br />
<pre><br />
-eleda- Apertium translation bot<br />
-eleda- .follow <nick> <direction> - Translate a user.<br />
-eleda- .unfollow <nick> - Stop translating a user.<br />
-eleda- .following - List users currently being followed.<br />
-eleda- .listpairs - List available pairs.<br />
</pre><br />
<br />
==Following a user==<br />
<br />
<pre><br />
<spectie> @follow moyogo fr-ca<br />
-eleda- following moyogo, direction fr-ca<br />
<moyogo> J'écris une phrase en français.<br />
-eleda- Escric una frase en francès.<br />
<moyogo> et une autre<br />
-eleda- i una altra<br />
</pre><br />
<br />
<pre><br />
<spectie> @unfollow moyogo<br />
-eleda- unfollowing moyogo<br />
</pre><br />
<br />
==Installation==<br />
<br />
Download from http://apertium.svn.sourceforge.net/viewvc/apertium/trunk/apertium-tools/bots/<br />
<br />
[[Category:Bots]]<br />
[[Category:Tools]]</div>Tino Didriksenhttps://wiki.apertium.org/w/index.php?title=Talk:Documentation&diff=73450Talk:Documentation2021-05-27T06:26:22Z<p>Tino Didriksen: Text replacement - "(chat|irc)\.freenode\.net" to "irc.oftc.net"</p>
<hr />
<div><br />
==New 'indexes' section==<br />
The new 'indexes' section ought to be auto-generated, for easy maintenance. But the current state of article categorisation means the display would be a visual disaster. For now, hand-edited. <br />
===============================================================================<br />
Official documentation (214 pages) has an error on page 12:<br />
2.2. Data Stream without format.<br />
Into that text a picture is projected.<br />
[[User:Muki987|Muki987]] 11:07, 9 April 2009 (UTC)<br />
<br />
<br />
==Standard format for README files==<br />
A proposal for a standard format for README files in Apertium. This is for a language pair between Language1 and Language2, using the ISO codes code1 and code2. The note about regression tests would be removed if the language pair has none of course.<br />
<br />
<pre><br />
Language1 and Language2<br />
<br />
apertium-code1-code2<br />
===============================================================================<br />
<br />
This is an Apertium language pair for translating between Language1 and <br />
Language2. What you can use this language package for:<br />
<br />
* Translating between Language1 and Language2<br />
* Morphological analysis of Language1 and Language2<br />
* Part-of-speech tagging of Language1 and Language2<br />
<br />
For information on the latter two points, see subheading "For more <br />
information" below<br />
<br />
Requirements<br />
===============================================================================<br />
<br />
You will need the following software installed:<br />
<br />
* lttoolbox (>= 3.3.0)<br />
* apertium (>= 3.3.0)<br />
* vislcg3 (>= 0.9.9.10297)<br />
<br />
If this does not make any sense, we recommend you look at: www.apertium.org<br />
<br />
Compiling<br />
===============================================================================<br />
<br />
Given the requirements being installed, you should be able to just run:<br />
<br />
$ ./configure <br />
$ make<br />
# make install<br />
<br />
You can use ./autogen.sh instead of ./configure if you're compiling from<br />
SVN. If you installed any prerequisite language packages using a --prefix<br />
to ./configure, make sure to give the same --prefix to ./configure here.<br />
<br />
Testing<br />
===============================================================================<br />
<br />
If you are in the source directory after running make, the following<br />
commands should work:<br />
<br />
$ echo "My hovercraft is full of eels" | apertium -d . code1-code2<br />
My skeertuig is vol palings<br />
<br />
$ echo "My skeertuig is vol palings" | apertium -d . code2-code1<br />
My hovercraft is full of eels<br />
<br />
After installing somewhere in $PATH, you should be able to do e.g.<br />
<br />
$ echo "My hovercraft is full of eels" | apertium code1-code2<br />
My skeertuig is vol palings<br />
<br />
The following command runs tests which are on the Apertium wiki page:<br />
<br />
$ ./regression-tests.sh <br />
<br />
Files and data<br />
===============================================================================<br />
<br />
* apertium-code1-code2.code1.dix - Monolingual dictionary for Language1<br />
* apertium-code1-code2.code1-code2.dix - Bilingual dictionary <br />
* apertium-code1-code2.code2.dix - Monolingual dictionary for Language2<br />
* code1-code2.prob - Tagger model for Language1<br />
* code2-code1.prob - Tagger model for Language2<br />
* apertium-code1-code2.code1.tsx - Tagger training rules for Language1<br />
* apertium-code1-code2.code2.tsx - Tagger training rules for Language2<br />
* modes.xml - Translation modes<br />
<br />
For more information<br />
===============================================================================<br />
<br />
* http://wiki.apertium.org/wiki/Installation<br />
* http://wiki.apertium.org/wiki/apertium-code1-code2<br />
* http://wiki.apertium.org/wiki/Using_an_lttoolbox_dictionary<br />
<br />
Help and support<br />
===============================================================================<br />
<br />
If you need help using this language pair or data, you can contact:<br />
<br />
* Mailing list: apertium-stuff@lists.sourceforge.net<br />
* IRC: #apertium on irc.oftc.net<br />
<br />
See also the file AUTHORS included in this distribution.<br />
</pre><br />
<br />
<br />
Here's a template for monolingual packages:<br />
<br />
<pre><br />
Language1<br />
<br />
apertium-code1<br />
===============================================================================<br />
<br />
This is an Apertium monolingual language package for Language1. What<br />
you can use this language package for:<br />
<br />
* Morphological analysis of Language1<br />
* Morphological generation of Language1<br />
* Part-of-speech tagging of Language1<br />
<br />
Requirements<br />
===============================================================================<br />
<br />
You will need the following software installed:<br />
<br />
* lttoolbox (>= 3.3.0)<br />
* apertium (>= 3.3.0)<br />
* vislcg3 (>= 0.9.9.10297)<br />
<br />
If this does not make any sense, we recommend you look at: www.apertium.org<br />
<br />
Compiling<br />
===============================================================================<br />
<br />
Given the requirements being installed, you should be able to just run:<br />
<br />
$ ./configure<br />
$ make<br />
<br />
You can use ./autogen.sh instead of ./configure if you're compiling from<br />
SVN.<br />
<br />
If you're doing development, you don't have to install the data, you<br />
can use it directly from this directory.<br />
<br />
If you are installing this language package as a prerequisite for an<br />
Apertium translation pair, then do (typically as root / with sudo):<br />
<br />
# make install<br />
<br />
You can give a --prefix to ./configure to install as a non-root user,<br />
but make sure to use the same prefix when installing the translation<br />
pair and any other language packages.<br />
<br />
Testing<br />
===============================================================================<br />
<br />
If you are in the source directory after running make, the following<br />
commands should work:<br />
<br />
$ echo "My skeertuig is vol palings" | apertium -d . code1-morph<br />
^My/My<det><pos><sp>/Prpers<prn><obj><p1><mf><sg>$ ^skeertuig/skeertuig<n><sg>$ ^is/wees<vbser><pres>$ ^vol/vol<adj><pred>$ ^palings/paling<n><pl>$<br />
<br />
$ echo "My skeertuig is vol palings" | apertium -d . code1-tagger<br />
^My/My<det><pos><sp>$ ^skeertuig/skeertuig<n><sg>$ ^is/wees<vbser><pres>$ ^vol/vol<adj><pred>$ ^palings/paling<n><pl>$<br />
<br />
Files and data<br />
===============================================================================<br />
<br />
* apertium-code1.code1.dix - Monolingual dictionary<br />
* code1.prob - Tagger model<br />
* apertium-code1.code1.rlx - Constraint Grammar disambiguation rules<br />
* modes.xml - Translation modes<br />
<br />
For more information<br />
===============================================================================<br />
<br />
* http://wiki.apertium.org/wiki/Installation<br />
* http://wiki.apertium.org/wiki/apertium-code1<br />
* http://wiki.apertium.org/wiki/Using_an_lttoolbox_dictionary<br />
<br />
Help and support<br />
===============================================================================<br />
<br />
If you need help using this language pair or data, you can contact:<br />
<br />
* Mailing list: apertium-stuff@lists.sourceforge.net<br />
* IRC: #apertium on irc.oftc.net<br />
<br />
See also the file AUTHORS included in this distribution.<br />
</pre></div>Tino Didriksenhttps://wiki.apertium.org/w/index.php?title=English_and_Kazakh&diff=73449English and Kazakh2021-05-27T06:26:16Z<p>Tino Didriksen: Text replacement - "(chat|irc)\.freenode\.net" to "irc.oftc.net"</p>
<hr />
<div>{{TOCD}}<br />
= Starting work on Apertium English to Kazakh =<br />
<br />
These notes are basically for Anel, Aizhan and Assem who have started to develop this language pair... And Aida too...<br />
<br />
== Installing what is needed ==<br />
<br />
=== Operating System ===<br />
<br />
Install a suitable GNU/Linux system such as Debian, Ubuntu, Mint...<br />
<br />
=== Install vislcg3, hfst, apertium, lttoolbox essentials, etc. ===<br />
<br />
Open a terminal window and type<br />
<pre><br />
curl -sS https://apertium.projectjj.com/apt/install-nightly.sh | sudo bash<br />
<br />
sudo apt-get -f install locales build-essential \<br />
automake subversion pkg-config \<br />
gawk libtool apertium-all-dev<br />
</pre><br />
''Enter your password when asked, and wait until the packages are downloaded and installed.''<br />
<br />
If you don't already have a directory for sources, make one in your home directory and enter it:<br />
<br />
<pre><br />
cd ~<br />
mkdir Source<br />
cd Source<br />
</pre><br />
<br />
=== Download apertium-tools, apertium-kaz and apertium-eng-kaz from Git ===<br />
{{main|Minimal installation from SVN}}<br />
<pre><br />
cd ~/Source<br />
git clone https://github.com/apertium/apertium-tools.git<br />
git clone https://github.com/apertium/apertium-kaz.git<br />
git clone https://github.com/apertium/apertium-eng-kaz.git<br />
</pre><br />
<br />
=== Install Kazakh language ===<br />
<pre><br />
cd ~/Source/apertium-kaz<br />
PKG_CONFIG_PATH=/usr/local/lib/pkgconfig ./autogen.sh<br />
make<br />
</pre><br />
<br />
=== Install English--Kazakh language pair data from staging ===<br />
<pre><br />
cd ~/Source/apertium-eng-kaz<br />
PKG_CONFIG_PATH=/usr/local/lib/pkgconfig ./autogen.sh --with-lang2=$HOME/Source/apertium-kaz<br />
make<br />
</pre><br />
<br />
==Troubleshooting==<br />
<br />
If you get:<br />
<br />
<pre><br />
lt-comp: error while loading shared libraries: liblttoolbox3-3.2.so.0: cannot open shared object file: No such file or directory<br />
</pre><br />
<br />
Then you should do:<br />
<br />
<pre><br />
sudo ldconfig<br />
</pre><br />
<br />
== Browse SVN ==<br />
<br />
Here you can look at changes that have been made:<br />
<br />
http://sourceforge.net/p/apertium/svn/HEAD/tree/staging/apertium-eng-kaz/<br />
<br />
== Contact ==<br />
<br />
===IRC===<br />
<br />
Open up XChat (normally "Programs -> Internet -> XChat IRC") and type:<br />
<br />
<pre><br />
/server irc.oftc.net<br />
/join #apertium<br />
/join #hfst</pre><br />
<br />
To install xchat:<br />
<br />
<pre><br />
sudo apt-get install xchat <br />
</pre><br />
<br />
In Windows:<br />
<br />
http://www.silverex.org/download/<br />
<br />
Chat logs/archives: http://alpha.visl.sdu.dk/~tino/pisg/freenode/logs/<br />
<br />
===Mailing list=== <br />
<br />
Email: apertium-turkic@lists.sourceforge.net<br />
<br />
http://blog.gmane.org/gmane.science.linguistics.turkic.mt<br />
<br />
[[Category:English and Kazakh|*]]<br />
<br />
===November 2013 to-do list===<br />
<br />
* Check the constraint-grammar file for strange rules and also rules that may not be correct. Try to understand the rules we have.<br />
* Write .t1x and .t2x code to deal with case (capitalization)<br />
* Make sure all new rules have the correct superblank management<br />
** That is, where a rule reorders items, all superblanks in the reordered area should go before the reordered area<br />
* The copula, t2x rules: past copula with PP ("I was from Kazakhstan"), negative copula, adverbial adjective phrases in copula "The man is very large"<br />
**added rule for past copula with PP<br />
**rule with negative copula for PP in past and present<br />
**solved by adding rule "preadv + adj" as AdjP<br />
* NPs with adverbs, particularly "very" ("Three very beautiful children")<br />
** solved by adding rule "preadv + adj" and "preadv"<br />
* 1-word chunks to always provide a translation for any English word <br />
** solved partly by adding the prepositions before, behind, below, in, on, after, towards, through, under as adverbs, and rules in constraint grammar.<br />
** even for stranded prepositions, so that we have a translation for "in" similar to "inside" etc., as if they were adverbs<br />
* noun-noun compounds in NPs and PPs: add the most frequent ones to t1x (hard to do in t2x as prepositions are solved in t1x)<br />
**Solved by adding rules to t1x:<br />
***noun1 noun2<br />
***adjec noun1 noun2<br />
***det num noun1 noun2<br />
***num noun1 noun2<br />
***prep num noun1 noun2<br />
***num adjec noun1 noun2<br />
***det num adjec noun1 noun2<br />
***prep num adjec noun1 noun2<br />
***prep det num adjec noun1 noun2<br />
* interrogative sentences: Yes/no (-ba) and informative ("Where is my Kazakh dictionary"?) → probably work for t2x<br />
**special questions were done by adding rules:<br />
***Where/When + NP or PP, but not for WHAT<br />
***Simple questions Do,Did + NP or PP<br />
***how to write a rule for "Are/Were?"; the rule for "Noun is/was" has the same pattern<br />
** 'which' as a determiner → precedes noun ("which house" → "қайсы үй") or genitive construct ("үйдің қайсысы")<br />
**"which" as adv-itg, "which house do you like?"<br />
* relatives (simple relatives: "the book that I wrote", "the book which I wrote" → "Мен жазған кітап"; adverbial relatives "when he came" → "Ол келгенде" [uses locative!])<br />
**added rules for simple relatives(that,which)<br />
**added rule for "when" and "which"<br />
* check some changes made to punctuation regular expressions in the English dictionary to solve mismatches with the Kazakh dictionary<br />
* "-ing" is hard (check the appropriate section in [[Tagging_guidelines_for_English]]. This gives problems in "I like playing football" vs. "I like flying birds" and will be hard. Transitivity could be a clue? What to do in t1x and what in t2x? Also "Flying planes can be dangerous", famous ambiguity). Try to get as much as possible done with CG rules.<br />
* Negative pronouns ("yesh" forms)<br />
** write lexical selection to generate "yesh-" forms from "any-" forms in negatives or "bir" forms in questions, e.g. "do" "not" vblex.inf "anything" (dictionaries should be populated with alternatives)<br />
***lexical selection for "anything" as "yeshnarse" for negative sentences and as "bir narse" for affirmative sentences.<br />
* Choosing auxiliaries for present continuous ("be" → "bol" (default), "zhatyr", "otyr" ,etc.)<br />
**can't be solved by lexical selection<br />
* deciding t1x versus t2x:<br />
** NPs and PPs in t1x as long as possible (hard design choice, tedious work, code repetition, but...)<br />
* Some adjective phrases like "num "years old"".<br />
**added as NP phrase "num years old" - "* жаста"<br />
* Comparative constructs (more ADJ than NP → NP-dat karaganda ADJ-comp)<br />
**is done, by 4 rules in t2x<br />
**added comparative and superlative adj, in the biggest city as PP<br />
* Adverbial phrases: think about how to treat them similarly to PPs in .t2x ("very quickly" is not that different from "in the park" when it comes to t2x reordering)<br />
* (Partly done) Pseudo-modals "finish", "start", "love", "hate", "enjoy", "like", which take -ing and sometimes "to"... Three possibilities to deal with them: (1) a long def-cat, (2) def-list and tests in rules, and (3) Jim's <exception> (dangerous!). Route: change the new-gen-simple-verb macro with a def-list, to generate VP_psmod, translate "-ing" into NPs (as they take case, etc.) and write .t2x rules. Careful: -ing disambiguation is not too good<br />
* Collect parallel kaz-eng corpora!<br />
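The superblank-management item above (superblanks in a reordered area go before the reordered area) can be illustrated with a small Python sketch. This is a toy model, not the actual apertium-transfer engine; units and superblanks are just plain strings here:

```python
def reorder(chunks, order):
    """Reorder chunks per `order`, moving every superblank that was
    inside the reordered area to the front, as the rule above requires.

    `chunks` is a list of (unit, superblank) pairs, where `superblank`
    is the formatting block that followed the unit (may be '')."""
    blanks = ''.join(b for _, b in chunks if b)
    reordered = [chunks[i][0] for i in order]
    return blanks, reordered

# NP VP PP -> NP PP VP; the [<b>] and [</b>] format blanks must not
# end up inside the reordered material.
chunks = [('^NP$', '[<b>]'), ('^VP$', ''), ('^PP$', '[</b>]')]
blanks, units = reorder(chunks, [0, 2, 1])
print(blanks)  # → [<b>][</b>]
print(units)   # → ['^NP$', '^PP$', '^VP$']
```

This keeps pairs of format tags together instead of letting a reordering rule split an opening tag from its close.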
<br />
==== What to take care of when writing rules ====<br />
<br />
What are these systems going to be used for and how does this affect design?<br />
<br />
It is quite unlikely that a system like this will ever be used for postediting the output into publishable text. The output is quite unlikely to be useful, particularly for sentences longer than a few words, as it would be very difficult to get the right word order.<br />
<br />
It might be used, however, for:<br />
<br />
* (1) interactive MT as in http://www.dlsi.ua.es/bbcat/?slang=eng&tlang=kaz ?<br />
* (2) fuzzy-match repair: when a translator using a computer-aided translation system gets a very good fuzzy match from a translation memory, MT output can be intelligently used to find which parts of the target side need to be changed and actually change them (a thesis at the Universitat d'Alacant). This is because short segments may get very good translations.<br />
* (3) assimilation or gisting (understanding what a text is about); the evaluation of this may be tricky but some Apertiumers have had interesting ideas: http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/4867<br />
<br />
Indeed, evaluation may be tricky in general.<br />
<br />
Development should take these applications into account:<br />
* (1) and (2): getting good translations for short segments (2, 3, 4 words) can be very helpful here<br />
* (3): the idea here would be not to pay attention to features that do not impair understanding (e.g., English articles can be deleted; "of Kazakh constituents order acceptable may be", etc.). Good translations for short phrases (linguistically motivated segments) could be the key here<br />
<br />
==== Questions we had, open issues ====<br />
* morphology of reflexives ("Öz") → Mikel has to talk to Jonathan and Ilnar to make it work as in apertium-kir or as bir-bir reciprocals. Make morphology describe the real morphotactics of these forms.<br />
* why do we have gender in Kazakh morphologies when gender is not represented? Make morphology describe the real morphotactics of these forms??? (Fran gave reasons for not doing so, check apertium-turkic)<br />
<br />
=== November 2013 work done ===<br />
<br />
Aida will complete and document this list<br />
<br />
*Regression test is completed with new sentences 426/426<br />
<br />
* Structural transfer<br />
** Reported speech sentences<br />
** Conditionals(First and Second)<br />
** "be" + adjective in present and past : "You are/were beautiful"<br />
** "be" + PP: "I am from Kazakhstan"<br />
** Rule for demonstrative pronouns<br />
** Rule for negative pronouns("nothing,nobody, anything(only for negative sentences)") changing verb to negative<br />
** Def-list of pseudo-modal verbs (I LIKE/ENJOY/LOVE/HATE/START/FINISH playing) in .t1x, and choosing them in the gen-simple-verb macro<br />
** Rule for "-ing" words as NP<subst> in .t1x, for example, I like '''playing'''.<br />
** "ing" + NP in .t2x<br />
** Rule for "would"(.t1x) + NP(.t2x)<br />
<br />
* Lexical selection<br />
** One rule for "residence"<br />
** Rules in CG<br />
<br />
*Dictionary work<br />
** Put some country and city names into apertium-eng-kaz.eng-kaz.dix as NP-TOP<br />
** Added missing pronouns to bilingual dictionary<br />
** Corrected verb transitivity: iv to tv (and tv to iv)<br />
** Changed "would" <inf> to <past> in eng.dix<br />
<br />
== Old stuff scheduled for removal ==<br />
<br />
Some of this information is outdated and needs work, but make sure that everything is there before removing this part.<br />
<br />
=== Postpositions ===<br />
<br />
Apparently Kazakh has 5 kinds of postpositions, according to the case of the NP they follow. Some of those following genitive may be interpreted as "nouns" with a case, such as <br />
<br />
бақшаның астында <br />
<br />
garden-of bottom-in<br />
<br />
garden.gen bottom.loc<br />
<br />
"under the garden"<br />
<br />
where астын is roughly the noun "bottom", much as in Basque "ortu-a-ren azpi-an" "azpi" is a noun.<br />
<br />
==== With nominative (or base form) ====<br />
<br />
Check this list:<br />
<br />
* арқылы through<br />
* туралы about<br />
* секілді similarly to<br />
* жөнінде about<br />
<br />
==== With genitive ====<br />
<br />
* астынан from below<br />
* астында under (bottom-its-in)<br />
* жанынан from beside (side-its-from)<br />
* жанында beside (side-its-in)<br />
<br />
==== With dative ====<br />
<br />
* қарай (towards)<br />
* арналған (intended for)<br />
<br />
==== With ablative ====<br />
<br />
* кейін behind, after<br />
<br />
==== With instrumental ====<br />
<br />
* қатар beside<br />
* бірге together with<br />
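The case-government lists above can be collapsed into a lookup table that a disambiguation or transfer rule could consult. A sketch (the short case labels follow Apertium's usual tag names; the table holds only the postpositions listed above):

```python
# Which case each postposition above governs on the preceding NP.
POSTPOSITION_CASE = {
    'арқылы': 'nom', 'туралы': 'nom', 'секілді': 'nom', 'жөнінде': 'nom',
    'астынан': 'gen', 'астында': 'gen', 'жанынан': 'gen', 'жанында': 'gen',
    'қарай': 'dat', 'арналған': 'dat',
    'кейін': 'abl',
    'қатар': 'ins', 'бірге': 'ins',
}

def case_ok(np_case, postposition):
    """True if the NP's case tag satisfies the postposition."""
    return POSTPOSITION_CASE.get(postposition) == np_case

print(case_ok('gen', 'астында'))  # бақшаның астында → True
print(case_ok('nom', 'кейін'))   # кейін takes ablative → False
```

A check like this could flag analyses where the NP's case tag and the following postposition disagree.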
<br />
= Starting work on Apertium Kazakh to English =<br />
<br />
== General ideas ==<br />
<br />
Try to translate as literally as possible in the first prototypes (do not have too many .t2x rules)<br />
<br />
Make the most of existing CG-based PoS tagging (wait for instructions on how to use the apertium-kaz.kaz.rlx in apertium-kaz)<br />
<br />
=== Detecting NPs and PPs ===<br />
<br />
There is a lot of stuff in apertium-eng-kaz.kaz-eng.t1x already! We have to study, and check the following.<br />
<br />
Main kinds of NPs:<br />
<br />
* accusative and nominative → no preposition<br />
* genitive → two solutions: N's N or N of N (attention genitive chains)<br />
* dative → what should one do? (tricky)<br />
* locative, ablative (make list) → PPs<br />
** what to do with possessives (particularly 1st and 2nd person) to avoid double possessives in sentences with "mening", etc.<br />
***Менің бақшам → my garden of me (!)<br />
<br />
Composition of NPs: n, adj n, num n, num adj n, ...<br />
<br />
Things to take care of:<br />
<br />
* Decide if noun-based postpositions: artynda, ustinde, keyin, etc. will be detected in t1x. A list of lemmas would be necessary, or changes to bilingual dictionaries<br />
* Make sure we generate plurals for numbers<br />
* articles (use third-person possessives as a hint to generate definite articles)<br />
** kalaning baqsha''sy'' → ''the'' garden of city <br />
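The article hint above ("use third-person possessives as a hint to generate definite articles") can be sketched as a simple tag test. The tag name px3sp here is a stand-in for illustration, not necessarily the tag apertium-kaz actually uses:

```python
def english_np(noun, tags):
    """Pick an English article per the hint above: a third-person
    possessive tag (px3sp, a stand-in name) triggers 'the'."""
    article = 'the ' if 'px3sp' in tags else ''
    return article + noun

# kalaning baqsha-sy -> "the garden of the city": the possessive -sy
# is the cue for definiteness.
print(english_np('garden', ['n', 'sg', 'px3sp']))  # → the garden
print(english_np('garden', ['n', 'sg']))           # → garden
```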
<br />
=== Detecting VPs ===<br />
<br />
* simple verbs: decide on reasonable equivalents<br />
** some may be hard to decide, such as generating future simple in English different from present<br />
** whether to generate present or past perfect; only present perfect for now<br />
* compound forms based on zhatyr, otyr, etc.<br />
* generating negatives (have negative VPs detected separately, or use logic (choose) inside t1x rules)<br />
* gender in third-person pronouns (including 'öz' reflexives)<br />
<br />
=== Loose list of problems ===<br />
<br />
* constructions based on infinite verbs (participles, etc.) (the problem of generating tense)<br />
* reinserting the verb to be when the copula is missing<br />
<br />
=== November 2013 work done ===<br />
<br />
Aida will complete this part<br />
<br />
* Regression test added<br />
* Transfer<br />
** Continuous tenses, simple tenses, negatives<br />
** Subject pronouns (gender still an open issue)<br />
** Nouns and adjectives<br />
*** Deleted n.attr from adjective definition<br />
<br />
*Dictionary work<br />
** changed "жатқан жоқ" in kaz.lexc and eng-kaz.dix to vaux-negative to catch negative present continuous<br />
** added some words<br />
= Aida Sundetova's GSoC 2014: Adopting an unreleased English-Kazakh language pair =<br />
<br />
== Workplan ==<br />
<br />
First work plan I prepared for proposal: http://wiki.apertium.org/wiki/User:Aida/Application<br />
<br />
Before coding was started:<br />
<br />
* Total stems in apertium-eng-kaz.eng-kaz.dix: 3660<br />
<br />
* Chunk rules: 118<br />
* Interchunk rules: 99<br />
* Postchunk-and-cleanup rules: 6.<br />
* CG rules: 202<br />
<br />
<br />
== New plan ==<br />
<br />
In the new plan, we focused on adding vocabulary from 4 corpora.<br />
Please see: http://wiki.apertium.org/wiki/English_and_Kazakh/Work_plan_(GSOC_2014)<br />
<br />
== Results ==<br />
=== Vocabulary ===<br />
Coverage of corpora now:<br />
<br />
SETimes: 92,32%<br />
<br />
EuroParl: 96,18%<br />
<br />
NewsCommentary: 93,99%<br />
<br />
Total stems in apertium-eng-kaz.eng-kaz.dix: 11071<br />
<br />
=== Transfer rules ===<br />
<br />
The needed transfer rules were written while translating the texts taken for the coding challenge and the midterm evaluation; some cleanup and single-word rules were also written while cleaning the testvoc.<br />
<br />
*Written rules:<br />
** For single-word: vbmod, subs-ing, be-vblex, num-year, det-which<br />
** Constructions for adv/adjec + verb: adjec to inf-verb, adv-itg to inf-verb, have + adv + been + verb-pp, <br />
** For years, "after 1920", etc. translating as "1920 жылДАН кейін": prep num-years.<br />
** Rules for unknown words: if a word is not in the dix, the ordinary rules cannot match. So for phrases like "the hargle house" → "*hargle үй" there are patterns such as: det unknown noun; unknown (a single unknown word, translated as an NP); unknown noun2; prep det unknown adjec noun; prep det unknown noun; sup-adjec unknown nom.<br />
** Interchunk rules<br />
** Cleaning rules for pronouns, adjectives.<br />
<br />
=== Testvoc ===<br />
<br />
Tue Aug 5 22:02:59 ALMT 2014<br />
<br />
<br />
<br />
{|class=wikitable<br />
!rowspan=1| POS || Total || Clean ||rowspan=1| With<br/>@||rowspan=1|With<br/>#||rowspan=1| Clean<br/>%<br />
|-<br />
| n||31166|| 31166||0||0||100<br />
|-<br />
| vblex||9317|| 9317||0||0||100 <br />
|- <br />
| adj||2269|| 2269||0||0||100<br />
|-<br />
| np||1410|| 1410||0||0||100<br />
|-<br />
| adv||1236|| 1236||0||0||100 <br />
|-<br />
| prn||172|| 172||0||0||100<br />
|-<br />
| pr||107|| 107||0||0||100<br />
|-<br />
| abbr||78|| 78||0||0||100<br />
|-<br />
| num||63|| 63||0||0||100<br />
|-<br />
| det||62|| 62||0||0||100<br />
|-<br />
| vaux||51|| 51||0||0||100<br />
|-<br />
| cnjadv||34|| 34||0||0||100<br />
|-<br />
| vbmod||26|| 26||0||0||100<br />
|-<br />
| vbser||24|| 24||0||0||100<br />
|-<br />
| ij||23|| 23||0||0||100<br />
|-<br />
| cnjcoo||19|| 19||0||0||100<br />
|-<br />
| cnjsub||16|| 16||0||0||100<br />
|-<br />
| vbhaver||12|| 12||0||0||100<br />
|-<br />
| rel||4|| 4||0||0||100<br />
|-<br />
| preadv||2|| 2||0||0||100<br />
|-<br />
| guio||1|| 1||0||0||100<br />
|-<br />
| cm||1|| 1||0||0||100<br />
|}<br />
<br />
= Work done before November =<br />
<br />
<br />
Progress is not so big :)<br />
<br />
== Results ==<br />
=== Vocabulary ===<br />
Coverage of corpora now:<br />
<br />
SETimes: 92,93%<br />
<br />
EuroParl: 96,76%<br />
<br />
NewsCommentary: 96,62%<br />
<br />
Total stems in apertium-eng-kaz.eng-kaz.dix: 13359<br />
<br />
*Some interchunk rules added<br />
*Cleaning # from the EuroParl corpus (not finished)<br />
*Correcting some errors and wrongly generated attributes, like <pp>, etc.<br />
<br />
== Future work ==<br />
<br />
*Cleaning all # from EuroParl<br />
*Solving the problem of "Are/Am" having the same morphological analysis, as in "I AM a doctor": ^vP_q<VPQ><aor>{ }$ ^obj-pron<NP><sg><p2><PXD><CD>{^сіз<prn><pers><p2><2><4><5>$}$ ^nP_ger<NP><PD><ND><ger><PXD><CD>{^ойна<v><tv><4><5><3><2><6>$}$^sent<Q_mark>{^?<sent>$}$^sent<SENT>{^.<sent>$}$<br />
*Something wrong with regression-tests<br />
*Correcting some errors and wrongly generated attributes, like <pp>, etc.<br />
<br />
= November 2014 to do list =<br />
<br />
<br />
<br />
== Example of transfer with apertium-eng-kaz==<br />
<br />
The small children were playing in the park<br />
Det Adj N Vbe Vger Prep Art N<br />
<br />
*Chunker [.t1x] (pattern of lexical form→action)<br />
[NP Det Adj N] [VP Vbe Vger] [PP Prep Art N]<br />
**Output<br />
[NP Adj N] [VP V-"п" Vaux-отыр] [PP N+Postp]<br />
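The chunker step above can be sketched as greedy matching of part-of-speech patterns over the sentence. This is a toy model of the .t1x stage containing only the three patterns from the example, not the real rules:

```python
# Toy chunker for "The small children were playing in the park".
# The patterns are just the three from the example above.
PATTERNS = [
    ('NP', ['Det', 'Adj', 'N']),
    ('VP', ['Vbe', 'Vger']),
    ('PP', ['Prep', 'Art', 'N']),
]

def chunk(tags):
    out, i = [], 0
    while i < len(tags):
        for name, pat in PATTERNS:
            if tags[i:i + len(pat)] == pat:
                out.append((name, tags[i:i + len(pat)]))
                i += len(pat)
                break
        else:
            out.append(('W', [tags[i]]))  # unmatched word: 1-word chunk
            i += 1
    return out

sent = ['Det', 'Adj', 'N', 'Vbe', 'Vger', 'Prep', 'Art', 'N']
print([name for name, _ in chunk(sent)])  # → ['NP', 'VP', 'PP']
```

The 1-word fallback chunk mirrors the to-do item above about always providing a translation for any single English word.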
<br />
===Need to do===<br />
*^detart-adjec-nom<NP><pl><p3><PXD><CD>{ ^кішкентай<adj>$ ^бала<n><2><4><5>$}$<br />
<br />
^pers-verb<VP><ND><PD><ifi><PXD><NXD>'''<CD>'''{ ^ойна<v><tv><prc_perf>$ ^отыр<vaux><6><4><5><3><2><7>$}$ <br />
<br />
→ why 6 and not 7? (question for developers)<br />
→ we need to repair this<br />
<br />
*Interchunk[.t2x]<br />
<br />
NP VP PP → NP PP VP<br />
<PRE><br />
^detart-adjec-nom<NP><pl><p3><PXD><CD>{ ^кішкентай<adj>$ ^бала<n><2><4><5>$}$ <br />
^prep-detart-noun<PP><sg><p3><PXD><loc>{ ^саябақ<n><2><4><5>$}$<br />
^pers-verb<VP><pl><p3><ifi><PXD><NXD>{ ^ойна<v><tv><prc_perf>$ ^отыр<vaux><6><4><5><3><2><7>$}$<br />
</PRE><br />
<br />
*Postchunk[.t3x]<br />
<br />
"instantiate labels + remove syntax"<br />
<PRE><br />
^кішкентай<adj>$ <br />
^бала<n><pl><PXD><CD>$ <br />
^саябақ<n><sg><PXD><loc>$ <br />
^ойна<v><tv><prc_perf>$ <br />
^отыр<vaux><NXD><ifi><PXD><p3><pl>$<br />
^.<sent>$<br />
</PRE><br />
<br />
*Cleanup[.t4x]<br />
<br />
Select default values for PXD, CD, NXD, ...<br />
Remove <sg><br />
<br />
<PRE><br />
^кішкентай<adj>$ <br />
^бала<n><pl><nom>$ <br />
^саябақ<n><loc>$ <br />
^ойна<v><tv><prc_perf>$ <br />
^отыр<vaux><ifi><p3><pl>$<br />
^.<sent>$<br />
</PRE><br />
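The cleanup step can be sketched as tag rewriting over each lexical unit: instantiate default values (here only CD → nom) and drop placeholder tags and <sg>, matching the before/after example above. A toy model of the .t4x stage, not the real code:

```python
DEFAULTS = {'CD': 'nom'}       # case default
DROP = {'PXD', 'NXD', 'sg'}    # empty defaults for PXD/NXD; <sg> removed

def cleanup(unit):
    """t4x-style cleanup of one lexical unit, per the example above:
    instantiate default values and remove <sg>."""
    lemma, *tags = unit.strip('^$').replace('>', '').split('<')
    out = []
    for t in tags:
        if t in DROP:
            continue
        out.append(DEFAULTS.get(t, t))
    return '^' + lemma + ''.join('<%s>' % t for t in out) + '$'

print(cleanup('^бала<n><pl><PXD><CD>$'))     # → ^бала<n><pl><nom>$
print(cleanup('^саябақ<n><sg><PXD><loc>$'))  # → ^саябақ<n><loc>$
```

Dropping <sg> unconditionally only makes sense because this stage runs last, after number has already been resolved, as in the example above.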
<br />
*Number of rules today:<br />
<br />
<PRE><br />
$ grep "</rule>" apertium-eng-kaz.eng-kaz.t[1234]x | wc -l<br />
309<br />
</PRE><br />
<br />
==Work to do generally for English-Kazakh==<br />
<br />
*Lexical selection: lots of work (and cleaning) to do <br />
**DONE for noun, adj, adv<br />
<br />
Getting all nouns in bidix with more than 1 translation (or repeated)<br />
<br />
<PRE><br />
grep ">[a-zA-Z]\+<s n=\"n\"" apertium-eng-kaz.eng-kaz.dix | sed 's/.*[>]\([a-zA-Z]\+[<]s n=\"n\"\).*/\1/g' | <br />
fgrep -v ">" | sort | uniq -c | grep "[2-9] "<br />
</PRE><br />
<br />
<PRE><br />
2× → 249<br />
3× → 50<br />
4× → 15<br />
5×→ 2<br />
6× → 5<br />
</PRE><br />
<br />
respect, <br />
reputation,<br />
possession etc.<br />
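The same check can be done in Python, which makes the logic easier to follow than the grep pipeline. The dix lines below are made-up toy entries (the Kazakh sides are only illustrative), and the sketch skips the pipeline's fgrep -v ">" filter, so it assumes one entry per line:

```python
from collections import Counter
import re

def repeated_nouns(dix_lines):
    """Count English noun lemmas that occur in more than one bidix
    entry — the same check as the grep pipeline above."""
    counts = Counter()
    for line in dix_lines:
        m = re.search(r'>([a-zA-Z]+)<s n="n"', line)
        if m:
            counts[m.group(1)] += 1
    return {lemma: n for lemma, n in counts.items() if n > 1}

lines = [
    '<e><p><l>respect<s n="n"/></l><r>құрмет<s n="n"/></r></p></e>',
    '<e><p><l>respect<s n="n"/></l><r>сый<s n="n"/></r></p></e>',
    '<e><p><l>possession<s n="n"/></l><r>иелік<s n="n"/></r></p></e>',
]
print(repeated_nouns(lines))  # → {'respect': 2}
```

Each lemma in the result is a candidate for a lexical selection rule.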
<br />
===Transfer===<br />
<br />
**We need to treat "Be N verb" questions like "Am I a doctor?" by deferring copula generation to .t2x or later<br />
**We need a rule for "the girl's mother" which is like the rule for "girl's mother" but with an additional determiner (typical example of rule writing by cutting-and-pasting).<br />
***DONE: det n1's n2; prep n1's n2; prep det n1's n2<br />
**The problem of indirect objects without a preposition (He told Mrs. Doyle).<br />
***maybe by looking at verbs that can do this<br />
***look at 1980's Oxford Advanced Learner's Dictionary and see if A.S. Hornby's verb patterns are of any help (VPx)<br />
**It is a good idea to have NP chunks that are given-name + family-name and similar constructs<br />
***DONE for np-ant + np-cog.<br />
***Have to think about constructions: '''the Head of State''' Nursultan Nazarbayev<br />
<br />
===Miscellaneous===<br />
**"On the way to the hospital" needs to be translated with an adverbial construction with "go". (емханаға бара жатқанда)<br />
**What happens if one wants to change case at .t2x level? Maybe leave it for .t3x<br />
**What to do with proper nouns? Recognize (!), tag, and transliterate?<br />
***What happens if they go through in Latin (possible .twol rules for Latin vowels: e.g. Kilkenny-де but Carlow-да).<br />
***Another possibility (Aida): detect unknown capitalized words (possible?). We tried with regular expressions but they do not seem to work in the apertium-eng-kaz.eng.dix unless they are added to the bilingual dictionary and the intersection code notices them (careful: cyclic!!!). It does not seem to be related to the -w switch of lt-proc. There is some rubbish in the dictionary now for testing.<br />
<br />
===To compare===<br />
<br />
*http://www.sanasoft.kz/online/translater/<br />
<br />
*http://itranslate4.eu (uses Trident)</div>Tino Didriksenhttps://wiki.apertium.org/w/index.php?title=Apertium-view&diff=73448Apertium-view2021-05-27T06:26:12Z<p>Tino Didriksen: Text replacement - "(chat|irc)\.freenode\.net" to "irc.oftc.net"</p>
<hr />
<div>{{Github-unmigrated-tool}}<br />
{{TOCD}}<br />
Apertium-view is a little program which can be used to view and edit the output of the various stages of an apertium translation.<br />
<br />
The various stages update ''while you type'', and a change made in any one pane updates the subsequent stages.<br />
<br />
[[Image:Apertium-view-screenshot-1.png|thumb|300px|right|My hovercraft is full of eels.<br/>''Daar is palings in my skeertuig'']]<br />
<br />
Currently, the program is in its early stages and it will take some time before it becomes fully usable. But if you are a developer with some knowledge of Python and PyGTK, you can already dive in.<br />
<br />
== What you need ==<br />
<br />
* The [[D-Bus_service_for_Apertium|Apertium D-Bus Service]]<br />
* A working Apertium 3.0 installation (note: this must be installed)<br />
* The Python bindings for the GTK SourceView 2.0 module, if you want syntax highlighting<br />
<br />
== Getting apertium-view ==<br />
<br />
'''Note:''' After Apertium's migration to GitHub, this tool is '''read-only''' on the SourceForge repository and does not exist on GitHub. If you are interested in migrating this tool to GitHub, see [[Migrating tools to GitHub]].<br />
<br />
Check out the <code>apertium-tools/apertium-view</code> from the subversion repository.<br />
<br />
== Running apertium-view == <br />
<br />
If you installed the D-Bus service for Apertium correctly, Apertium-view should just work (you should only have to run <code>python apertium-view.py</code>). If it fails to start up, then there might be a problem with your D-Bus setup. Have a look at the [[D-Bus_service_for_Apertium|D-Bus page]] for possible solutions, or come and ask for help in #apertium on irc.oftc.net.<br />
<br />
=== Testing unreleased language pairs from source ===<br />
<br />
You need to install your language pair in /usr: <code>./configure --prefix=/usr; make; make install</code> (the last as root).<br />
If <code>make install</code> fails with <code>stat() './en-eo.mode': No such file or directory</code>, then try first <code>ln -s modes/* .</code><br />
<br />
==Feature requests==<br />
<br />
* <s>Removing scrollbars when not wanted.</s><br />
* <s>Allow users to set 'mark unknown words' or not.</s><br />
* Automatically resizing panes to fill the screen (when others are minimised)<br />
* Syntax highlighting<br />
::tags are #aaaaaa; ^ and $ are #009900; { } are #999900; @ * # are #990000; and [] are #aaaaff<br />
* Moving chunks as units rather than text.<br />
* Configuration / choosing language pair in the GUI.<br />
* Option to be able to click on an analysis to remove it. (basically, when you click in between / /, it removes the part in between. Would need CTRL+Z to be able to restore analyses (see below).<br />
* Some kind of undo on a per text-pane/stage basis.<br />
* <s>Ability to detach windows (particularly input and output windows).</s><br />
* Scroll panes down when they fill with more information.<br />
* Remember settings when closed.<br />
<br />
==Related software==<br />
<br />
<br />
* [[Apertium-viewer]] is an improved version, which does not require dbus and which is written in Java<br />
<br />
* [[Apertium-tolk]] is similar to, but much simpler than Apertium-view. It only has an input window and an output window. Where Apertium-view is aimed at developers, Apertium-tolk is intended to be as user friendly as possible.<br />
<br />
[[Category:Tools]]<br />
[[Category:User interfaces]]</div>Tino Didriksenhttps://wiki.apertium.org/w/index.php?title=Travis_settings_for_Apertium&diff=73447Travis settings for Apertium2021-05-27T06:26:05Z<p>Tino Didriksen: Text replacement - "(chat|irc)\.freenode\.net" to "irc.oftc.net"</p>
<hr />
<div>This line is ignored by IRC bots.<br />
<br />
[https://travis-ci.org Travis CI] is a "continuous integration" tool that works, for example, with GitHub. What this actually means is that you can set up an Apertium language or language pair on GitHub to automatically build and test on each commit. You only need a script that installs the dependencies on Travis's build image (e.g. Ubuntu) and then runs the autotools and make check. The configuration file is written in YAML.<br />
<br />
This is an example for monolingual data using hfst (from [[apertium-fin]]):<br />
<br />
<pre><br />
dist: trusty<br />
before_install:<br />
- curl -sS http://apertium.projectjj.com/apt/install-nightly.sh | sudo bash<br />
- sudo apt-get install hfst apertium lttoolbox apertium-dev lttoolbox-dev libhfst48-dev cg3<br />
script:<br />
- ./autogen.sh<br />
- ./configure<br />
- make<br />
- make check<br />
notifications:<br />
irc:<br />
channels:<br />
- "irc.oftc.net#apertium"<br />
on_failure: change<br />
on_success: never<br />
</pre><br />
<br />
This is an example for bilingual data, with non-released languages built into the test process (from [[apertium-fin-deu]]):<br />
<br />
<pre><br />
dist: trusty<br />
before_install:<br />
- curl -sS http://apertium.projectjj.com/apt/install-nightly.sh | sudo bash<br />
- sudo apt-get install hfst apertium lttoolbox apertium-dev apertium-deu lttoolbox-dev libhfst48-dev cg3 apertium-lex-tools<br />
- curl -sS https://github.com/flammie/apertium-fin/archive/master.zip -o master.zip<br />
- unzip master.zip<br />
- pushd apertium-fin-master && ./autogen.sh && ./configure && make && sudo make install && popd<br />
script:<br />
- ./autogen.sh<br />
- ./configure<br />
- make<br />
- make check<br />
notifications:<br />
irc:<br />
channels:<br />
- "irc.oftc.net#apertium"<br />
on_failure: always<br />
on_success: never<br />
</pre></div>Tino Didriksenhttps://wiki.apertium.org/w/index.php?title=Hfst&diff=73445Hfst2021-05-27T06:25:41Z<p>Tino Didriksen: Text replacement - "(chat|irc)\.freenode\.net" to "irc.oftc.net"</p>
<hr />
<div>{{TOCD}}<br />
'''hfst''' is the Helsinki finite-state toolkit. It is formalism-compatible with both lexc and twolc, somewhat like [[foma]] is with xfst. It is currently used in [[apertium-sme-nob]], [[apertium-fin-sme]], [[apertium-kaz-tat]] and in a few other pairs which involve Turkic languages.<br />
<br />
The IRC channel is <code>#hfst</code> at <code>irc.oftc.net</code> (you may try [irc://irc.oftc.net/#hfst irc://irc.oftc.net/#hfst] if your browser supports it, or enter #hfst into https://webchat.oftc.net/ if you want a web client). The [https://kitwiki.csc.fi/twiki/bin/view/KitWiki/HfstHome HFST Wiki] has some very good documentation (see especially the page [https://kitwiki.csc.fi/twiki/bin/view/KitWiki/HfstReadme HfstReadme] when you run into compilation problems).<br />
<br />
HFST is actually created as a set of wrappers over several possible ''back-ends'', [[Foma]], [[OpenFST]], [[SFST]], …. The latest versions of HFST include the back-ends you need, so there's no reason to install any of these backends separately.<br />
<br />
{{Github-migration-check}}<br />
==Building and installing HFST==<br />
<br />
<span style="color: #f00;">See [[Installation]], for most real operating systems you can now get pre-built packages of HFST (as well as other core tools) through your regular package manager.</span><br />
<br />
<br />
If you wish to hack on the HFST C++ code itself (or you are on some system that doesn't have packages yet), you can follow this procedure:<br />
<br />
===Install prerequisites===<br />
<br />
You will need the regular build dependencies:<br />
* <code>automake, autoconf, libtool, flex, bison, g++, libreadline-dev</code><br />
<br />
If you've already installed apertium/lttoolbox these should be installed already; if not, they should be easily installable with your package manager, e.g. <br />
* Ubuntu: <code>sudo apt-get install automake autoconf libtool flex bison g++ libreadline-dev</code><br />
* Arch Linux: <code>sudo pacman -S base-devel</code><br />
* MacOS X users should install the general [[Prerequisites_for_Mac_OS_X]] first, then <code>sudo port install bison readline</code><br />
<br />
===Download HFST===<br />
<br />
Either use the latest release (recommended for users), or go with the bleeding-edge Git version (recommended for developers).<br />
<br />
====From Git repository====<br />
<br />
<pre><br />
$ git clone https://github.com/hfst/hfst.git<br />
$ cd hfst/<br />
$ ./autogen.sh<br />
$ ./configure<br />
$ make<br />
</pre><br />
<br />
(The autogen step is only needed when using Git, not with the tarball.)<br />
<br />
====Released tarball====<br />
<br />
Download the latest release, named something like hfst-X.Y.Z.tar.gz, from https://github.com/hfst/hfst/releases, then<br />
<pre><br />
$ tar -xzf hfst-X.Y.Z.tgz<br />
$ cd hfst-X.Y.Z/<br />
</pre><br />
(replacing X.Y.Z for the version you downloaded)<br />
<br />
===Configure===<br />
<br />
In the configure step, you can turn features and backends on or off. <small>The [[OpenFST]] backend is included in the HFST distribution; [[foma]] and [[SFST]] are not included and are not recommended, since they typically lead to more trouble than they're worth.</small><br />
<br />
For most users, this should work:<br />
<pre><br />
$ ./configure --enable-proc --without-foma --enable-lexc --enable-all-tools<br />
</pre><br />
<br />
The above command will configure it to be installed to /usr/local in the <code>make install</code> step (below). <br />
<br />
If you want hfst and back-ends installed somewhere else, you can do<br />
<pre><br />
$ ./configure --enable-proc --without-foma --enable-lexc --enable-all-tools --prefix=/home/USERNAME/local/<br />
</pre><br />
<br />
'''Note: replace USERNAME with your own username. If you don't know what it is, run <code>whoami</code>.'''<br />
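If you prefer not to type the username by hand, the prefix can be built with command substitution. A minimal sketch; the <code>local</code> directory name is just an example, not something HFST requires:

```shell
# Build an install prefix under your home directory; "local" is an
# arbitrary example directory name.
PREFIX="/home/$(whoami)/local"
echo "Would configure with: --prefix=$PREFIX"
# ./configure --enable-proc --without-foma --enable-lexc --enable-all-tools --prefix="$PREFIX"
```

On most systems <code>$HOME</code> gives the same directory and also works when home directories are not under /home.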
<br />
<br />
You can also add <code>--with-unicode-handler=glib</code> (or <code>--with-unicode-handler=ICU</code>) to the ./configure step if you have glib (or ICU) installed and want better Unicode [https://en.wikipedia.org/wiki/Case_folding case folding].<br />
<br />
===Compile and install===<br />
If your autotools version is older than 1.14 (check with <code>automake --version</code>), first do:<br />
<pre>$ scripts/generate-cc-files.sh</pre><br />
<br />
Build by running<br />
<pre>$ make</pre><br />
<br />
<br />
Then install. (Note: use <code>sudo make install</code> if you are installing to /usr/local, i.e. you did not give a <code>--prefix</code> in the configure step; otherwise no sudo is needed.)<br />
<pre><br />
$ make install<br />
</pre><br />
<br />
And finally, on Linux (not needed on a Mac), you may need to run:<br />
<pre><br />
$ sudo ldconfig<br />
</pre><br />
<br />
==Troubleshooting==<br />
If <code>make</code> with old autotools (pre-1.14) fails with<br />
<pre>make[5]: *** No rule to make target `xre_parse.hh', needed by `xre_lex.ll'. Stop.</pre><br />
Run <code>scripts/generate-cc-files.sh</code> and then make again.<br />
<br />
<br />
If, during the ./configure step, you see<pre>checking for GNU libc compatible malloc... no<br />
[…]<br />
checking for GNU libc compatible realloc... no</pre> and then during make a bunch of errors like: <pre>/usr/local/include/sfst/mem.h:37:57: error: 'malloc' was not declared in this scope</pre>, try the following:<br />
<br />
<pre>sudo ldconfig<br />
export LD_LIBRARY_PATH=/usr/local/lib<br />
export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig</pre><br />
<br />
and then re-run ./configure and make.<br />
<br />
<br />
If, during make, you see errors like<br />
<pre>xre_parse.cc:2293:24: error: invalid conversion from 'const char*' to 'char*' [-fpermissive]</pre><br />
try instead<br />
<pre><br />
make CXXFLAGS=-fpermissive<br />
</pre><br />
<br />
<br />
If, when compiling a dictionary, you end up at an interactive "foma" prompt, you should remove anything related to foma or "hfst-xfst" from your system and rebuild HFST as described above.<br />
<br />
<br />
For more advice on installation problems, have a look at [https://kitwiki.csc.fi/twiki/bin/view/KitWiki/HfstReadme the HFST readme page].<br />
<br />
See also [[Foma]], [[OpenFST]] and [[SFST]] for problems regarding the back-ends.<br />
<br />
==Using==<br />
<br />
<pre><br />
$ svn co https://victorio.uit.no/langtech/trunk/langs/fao<br />
$ cd fao/src<br />
$ make -f Makefile.hfst<br />
<br />
$ echo "orð" | hfst-lookup ../bin/fao-morph.hfst<br />
lookup> <br />
orð orð+N+Neu+Sg+Nom+Indef<br />
orð orð+N+Neu+Sg+Acc+Indef<br />
orð orð+N+Neu+Pl+Nom+Indef<br />
orð orð+N+Neu+Pl+Acc+Indef<br />
<br />
lookup><br />
$<br />
<br />
</pre><br />
<br />
To compile <code>lexc</code> code, first concatenate all the lexc files:<br />
<br />
<pre><br />
$ cat fao-lex.txt noun-fao-lex.txt noun-fao-morph.txt adj-fao-lex.txt \<br />
adj-fao-morph.txt verb-fao-lex.txt verb-fao-morph.txt adv-fao-lex.txt \<br />
abbr-fao-lex.txt acro-fao-lex.txt pron-fao-lex.txt punct-fao-lex.txt \<br />
numeral-fao-lex.txt pp-fao-lex.txt cc-fao-lex.txt cs-fao-lex.txt \<br />
interj-fao-lex.txt det-fao-lex.txt > ../tmp/lexc-all.txt<br />
</pre><br />
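Note that <code>cat</code> simply joins the files in the order given; the file declaring the <code>Multichar_Symbols</code> and the <code>Root</code> lexicon (presumably fao-lex.txt here) should come first, since lexc expects those declarations before the other lexicons. A tiny self-contained demonstration of the concatenation step itself, using made-up dummy files rather than the real Faroese sources:

```shell
# Demonstrate the concatenation step with dummy files in a temporary
# directory (hypothetical contents, not the real Faroese data).
tmp=$(mktemp -d)
printf 'LEXICON Root\nNouns ;\n' > "$tmp/fao-lex.txt"
printf 'LEXICON Nouns\norð:orð # ;\n' > "$tmp/noun-fao-lex.txt"
cat "$tmp/fao-lex.txt" "$tmp/noun-fao-lex.txt" > "$tmp/lexc-all.txt"
head -n 1 "$tmp/lexc-all.txt"   # the Root lexicon header ends up first
```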
<br />
To compile this, just use the <code>hfst-lexc</code> program,<br />
<br />
<pre><br />
hfst-lexc < ../tmp/lexc-all.txt > ../bin/lexc-fao.bin<br />
</pre><br />
<br />
To compile the <code>twol</code> rules, just use the <code>hfst-twolc</code> program,<br />
<br />
<pre><br />
$ hfst-twolc twol-fao.txt > twol-fao.bin<br />
</pre><br />
<br />
And then to compose the lexicon and rule file, use <code>hfst-compose-intersect</code>:<br />
<br />
<pre><br />
$ hfst-compose-intersect -l lexc-fao.bin twol-fao.bin -o fao-gen.hfst<br />
</pre><br />
<br />
This creates a generator; if you want an analyser, invert the generator with <code>hfst-invert</code>:<br />
<br />
<pre><br />
$ hfst-invert fao-gen.hfst -o fao-morph.hfst<br />
</pre><br />
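The four steps above can be collected into one small script. Since the commands only work with the HFST tools and the Faroese sources in place, the sketch below just assembles and prints each command (a dry run) so the sequence can be reviewed first; swap the <code>echo</code> for <code>eval "$1"</code> to actually execute them:

```shell
# Dry run of the full build pipeline: print each command instead of
# executing it (replace 'echo ...' with 'eval "$1"' to really run them).
run() { echo "WOULD RUN: $1"; }

run 'hfst-lexc < ../tmp/lexc-all.txt > ../bin/lexc-fao.bin'
run 'hfst-twolc twol-fao.txt > twol-fao.bin'
run 'hfst-compose-intersect -l lexc-fao.bin twol-fao.bin -o fao-gen.hfst'
run 'hfst-invert fao-gen.hfst -o fao-morph.hfst'
```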
<br />
==See also==<br />
<br />
* [[Starting a new language with HFST]]<br />
<br />
==External links==<br />
<br />
* [http://www.ling.helsinki.fi/kieliteknologia/tutkimus/hfst/ Helsinki Finite-State Transducer Technology (HFST)]<br />
<br />
[[Category:Morphological analysers]]<br />
[[Category:HFST]]</div>Tino Didriksenhttps://wiki.apertium.org/w/index.php?title=Flyer&diff=73446Flyer2021-05-27T06:25:41Z<p>Tino Didriksen: Text replacement - "(chat|irc)\.freenode\.net" to "irc.oftc.net"</p>
<hr />
<div>{{TOCD}}<br />
=English=<br />
Apertium ([http://www.apertium.org http://www.apertium.org]) is a free software (GPL) machine translation platform; it was initially designed to translate<br />
between the Romance languages of the Iberian peninsula, but is now being used to translate between more distant language pairs.<br />
<br />
==Who is developing it?==<br />
<br />
The Apertium engine is being developed in the Transducens research group ([http://transducens.dlsi.ua.es http://transducens.dlsi.ua.es]) at the Departament de Llenguatges i Sistemes<br />
Informàtics ([http://www.dlsi.ua.es/ http://www.dlsi.ua.es/]) of the Universitat d'Alacant, and also by the company Prompsit Language Engineering ([http://www.prompsit.com http://www.prompsit.com]). Linguistic data are being developed by Transducens, the Seminario<br />
de Lingüística Informática of the Universidade de Vigo, and the Institut Universitari de Lingüística Aplicada at the<br />
Universitat Pompeu Fabra in Barcelona, along with a number of companies, including Prompsit Language Engineering, Imaxin|software and Eleka Ingenieritza Linguistikoa, as well as independent free-software developers<br />
both in Spain and abroad.<br />
<br />
==Funding==<br />
<br />
The Spanish Ministry of Industry, Tourism and Commerce funded the development of the engine and the three initial<br />
language pairs: Spanish&ndash;Catalan, Spanish&ndash;Galician and Spanish&ndash;Portuguese. The project has also received funding<br />
from the Universitat d'Alacant; from the Generalitat de Catalunya (Government of Catalonia), to improve the engine for distant pairs and to develop language pairs such as English&ndash;Catalan, Occitan&ndash;Catalan and Occitan&ndash;Spanish; and from the Romanian Ministry of Foreign Affairs, to develop Spanish&ndash;Romanian and Catalan&ndash;Romanian translators.<br />
<br />
==Currently supported languages==<br />
<br />
There are currently several supported translation pairs published using the Apertium platform. These are:<br />
<br />
*Basque&rarr;Spanish<br />
*Catalan&rarr;Romanian<br />
*Catalan&rarr;Esperanto<br />
*Breton&rarr;French<br />
*English&harr;Catalan<br />
*English&rarr;Esperanto<br />
*English&harr;Spanish<br />
*English&harr;Galician<br />
*French&harr;Catalan<br />
*French&harr;Spanish<br />
*Norwegian Bokmål&rarr;Norwegian Nynorsk<br />
*Occitan&harr;Catalan (both Aranese and ''Occitan Larg'')<br />
*Occitan&harr;Spanish (both Aranese and ''Occitan Larg'')<br />
*Portuguese&harr;Catalan<br />
*Spanish&harr;Catalan<br />
*Spanish&rarr;Esperanto<br />
*Spanish&harr;Portuguese (both European and Brazilian)<br />
*Spanish&harr;Galician<br />
*Romanian&rarr;Spanish<br />
*Welsh&rarr;English<br />
<br />
Other pairs currently under active development, but without a stable release include: English-Afrikaans, Catalan-Romanian, Danish-Swedish, and English-Polish. Stable pairs can be tested through our web interface at [http://www.apertium.org http://www.apertium.org]. Unstable pairs can be tested (at your own risk) at [http://www.apertium.org/testing/ http://www.apertium.org/testing/].<br />
<br />
==How good is it?==<br />
<br />
The quality of the final translations depends greatly on the amount of time spent in development<br />
and on the closeness of the languages. For example, Spanish-Catalan has approximately 95% accuracy, while Spanish-Portuguese<br />
has around 90%. With accuracies around 90%, the raw translation can be used as a draft to be ''post-edited'' for publication (''dissemination''). For less related and unreleased pairs such as English-Afrikaans, the accuracy, excluding unknown<br />
words, is somewhere around 70%, and even lower for some other pairs, but the resulting translations can still be used to understand a good part of a text written in another language (''assimilation''). The Breton&rarr;French, Welsh&rarr;English and Basque&rarr;Spanish pairs may be used for that.<br />
<br />
==Downloading==<br />
<br />
Current versions of the engine, linguistic data and documentation can be found on our SourceForge project page ([http://www.sf.net/projects/apertium/ http://www.sf.net/projects/apertium/]). Further<br />
documentation and discussion can be found both on our wiki ([http://xixona.dlsi.ua.es/wiki/ http://xixona.dlsi.ua.es/wiki/]) and mailing list ([mailto:apertium-stuff@lists.sf.net apertium-stuff@lists.sf.net]). We also meet at the IRC channel #apertium at irc.oftc.net (you can use [https://webchat.oftc.net https://webchat.oftc.net] if you don't have an IRC client).<br />
<br />
==Development==<br />
<br />
The project is always looking for developers who are interested in improving the engine and existing data, working on new language pairs (especially those involving less-used or under-resourced languages), creating interfaces,<br />
or adapting the software to fit their needs. Contributions of existing free (GPL) data and corpora that can easily be reused to feed Apertium's dictionaries are also welcome.<br />
<br />
==Applications==<br />
<br />
*Multilingual management of web content such as media<br />
*Rapid localisation of free software<br />
*Translation of documentation between a more resourced language and a less resourced language<br />
*Understanding text in a different language<br />
<br />
=French=<br />
Apertium (http://www.apertium.org) is an open-source (GPL) machine translation platform initially designed for the Romance languages of the Iberian Peninsula, but which has increasingly been developed to handle more divergent language pairs.<br />
<br />
==Who develops it?==<br />
<br />
The Apertium engine is developed both by the Transducens research group of the Departament de Llenguatges i Sistemes Informàtics at the Universitat d'Alacant and by the spin-off Prompsit Language Engineering. Transducens and Prompsit also handle the linguistic development, together with the Seminario de Lingüística Informática of the Universidade de Vigo, the Institut Universitari de Lingüística Aplicada of the Universitat Pompeu Fabra in Barcelona, and other companies such as imaxin|software and Eleka Ingeniaritza Linguistikoa. External volunteer developers, both in Spain and abroad, also contribute.<br />
<br />
==Funding==<br />
<br />
The Spanish Ministry of Industry, Tourism and Commerce partially funded the development of the engine and of two of the initial language pairs: Spanish-Catalan and Spanish-Galician. The project has also been funded by: the Universitat d'Alacant (the Spanish-Portuguese pair and others), the Generalitat de Catalunya (the English-Catalan, Occitan-Catalan, French-Catalan and Occitan-Spanish pairs, and improvements to the engine for handling distant languages), the Romanian Ministry of Foreign Affairs (the Spanish-Romanian and Catalan-Romanian pairs), etc.<br />
<br />
==Available language pairs==<br />
<br />
There are currently seventeen language pairs on the Apertium platform:<br />
<br />
*Spanish-Catalan<br />
*Spanish-Portuguese<br />
*Spanish-Galician<br />
*Catalan-French<br />
*Catalan-Occitan<br />
*Spanish-Romanian<br />
*English-Catalan<br />
*English-Spanish<br />
*Spanish-Galician<br />
*French-Spanish<br />
*Esperanto-Spanish<br />
*Welsh-English<br />
*Esperanto-Catalan<br />
*Portuguese-Catalan<br />
*Portuguese-Galician<br />
*Basque-Spanish<br />
<br />
These pairs can be tested on our site http://xixona.dlsi.ua.es/apertium/.<br />
<br />
==How good is it?==<br />
<br />
The quality of the final translations depends to a large extent on the time invested in developing a given pair and on the closeness of the languages. For example, between Spanish and Catalan a success rate of 95% is achieved; between Spanish and Portuguese, 90%. For more distant languages the percentage is lower.<br />
<br />
==Downloads==<br />
<br />
The most recent versions of the engine, the linguistic data, the documentation and other tools can be downloaded from the project page on SourceForge ([http://www.sf.net/projects/apertium/ http://www.sf.net/projects/apertium/]). Documentation and further information can be found both on our wiki ([http://xixona.dlsi.ua.es/wiki/ http://xixona.dlsi.ua.es/wiki/]) and on our mailing list ([mailto:apertium-stuff@lists.sf.net apertium-stuff@lists.sf.net]).<br />
<br />
==Development==<br />
<br />
The project is always looking for developers interested in improving the engine and the existing data, working on new language pairs (especially those involving minority or under-resourced languages), creating interfaces, or adapting the software to particular needs. Making free, reusable data and corpora available to improve Apertium's dictionaries is also appreciated.<br />
<br />
==Applications==<br />
<br />
*Management of websites with multilingual content, used for example by the media<br />
*Rapid localisation of open-source software<br />
*Translation of documentation between well-resourced and under-resourced languages<br />
<br />
=Macedonian=<br />
Apertium ([http://www.apertium.org http://www.apertium.org]) is a free machine translation platform; it was initially designed to translate between the Romance languages of the Iberian Peninsula, but is now used for ever more distant languages.<br />
<br />
==Who is developing it?==<br />
<br />
The Apertium engine is developed by the Transducens research group of the Departament de Llenguatges i Sistemes Informàtics at the Universitat d'Alacant, and also by the company Prompsit Language Engineering. The linguistic data are developed by Transducens, the Seminario de Lingüística Informática of the Universidade de Vigo, and the Institut Universitari de Lingüística Aplicada of the Universitat Pompeu Fabra in Barcelona, together with a number of companies including Prompsit Language Engineering, Imaxin|software and Eleka Ingenieritza Linguistikoa, as well as independent free-software developers both in Spain and abroad.<br />
<br />
==Funding==<br />
<br />
The Spanish Ministry of Industry, Tourism and Commerce funded the development of the engine and three initial language pairs: Spanish-Catalan, Spanish-Galician and Spanish-Portuguese. The project has also received funding from: the Universitat d'Alacant, the Generalitat de Catalunya (Government of Catalonia) to improve the engine for distant pairs and to develop pairs such as English-Catalan, Occitan-Catalan and Occitan-Spanish, and the Romanian Ministry of Foreign Affairs to develop Spanish-Romanian and Catalan-Romanian.<br />
<br />
==Currently supported languages==<br />
<br />
Seven language pairs are currently available for translation on the Apertium platform. They are:<br />
<br />
*Spanish-Catalan<br />
*Spanish-Portuguese<br />
*Spanish-Galician<br />
*Catalan-French<br />
*Catalan-Occitan<br />
*Spanish-Romanian<br />
*English-Catalan<br />
<br />
Other pairs currently in development are: French-Spanish, English-Afrikaans, English-Welsh, Catalan-Romanian, Spanish-Basque and English-Polish. Stable pairs (as well as those in development, at your own risk) can be tested through our web application at http://xixona.dlsi.ua.es/apertium/.<br />
<br />
==How good is it?==<br />
<br />
The quality of the final translation depends greatly on the time spent in development and the closeness of the languages. For example, Spanish-Catalan translates with approximately 95% accuracy, but Spanish-Portuguese with around 90%. For less related languages such as English-Afrikaans, accuracy is around 70% (excluding unknown words).<br />
<br />
==Downloading==<br />
<br />
Current versions of the engine, the linguistic data and the documentation are available through our project's SourceForge page ([http://www.sf.net/projects/apertium/ http://www.sf.net/projects/apertium/]). Further documentation and discussion can be found on our wiki ([http://wiki.apertium.org http://wiki.apertium.org]) and on the mailing list ([mailto:apertium-stuff@lists.sf.net apertium-stuff@lists.sf.net]).<br />
<br />
==Development==<br />
<br />
The project always needs programmers interested in improving the engine and the existing data, working on new language pairs (especially those that are less used or under-resourced), creating interface programs, or adapting the software to your needs. Existing free (GPL) data and corpora that can easily be fed into Apertium's dictionaries are also welcome.<br />
<br />
==Uses==<br />
<br />
*Multilingual management of web content<br />
*Rapid localisation of free software<br />
*Translation of documentation between better- and less-resourced languages<br />
<br />
=Spanish=<br />
Apertium (http://www.apertium.org) is an open-source (GPL) machine translation platform initially designed for the Romance languages of the Iberian Peninsula, but recently extended to handle more divergent language pairs.<br />
<br />
==Who develops it?==<br />
<br />
The Apertium engine is developed both within the Transducens research group of the Departament de Llenguatges i Sistemes Informàtics at the Universitat d'Alacant and at the spin-off Prompsit Language Engineering. Transducens and Prompsit also handle the linguistic development, together with the Seminario de Lingüística Informática of the Universidade de Vigo, the Institut Universitari de Lingüística Aplicada of the Universitat Pompeu Fabra in Barcelona, and other companies such as imaxin|software and Eleka Ingeniaritza Linguistikoa. It also receives contributions from external volunteer developers from both inside and outside Spain.<br />
<br />
==Funding==<br />
<br />
The Spanish Ministry of Industry, Tourism and Commerce partially funded the development of the engine and of two of the initial language pairs: Spanish-Catalan and Spanish-Galician. The project has also been funded by: the Universitat d'Alacant (the Spanish-Portuguese pair and others), the Generalitat de Catalunya (improvements to the engine for handling distant languages, and the English-Catalan, Occitan-Catalan, French-Catalan and Occitan-Spanish pairs), the Romanian Ministry of Foreign Affairs (the Spanish-Romanian and Catalan-Romanian pairs), etc.<br />
<br />
==Available language pairs==<br />
<br />
There are currently seven language pairs available that use the Apertium platform:<br />
<br />
*Spanish-Catalan<br />
*Spanish-Portuguese<br />
*Spanish-Galician<br />
*Catalan-French<br />
*Catalan-Occitan<br />
*Spanish-Romanian<br />
*English-Catalan<br />
<br />
<br />
Other language pairs being actively developed but without a stable release yet are: French-Spanish, English-Afrikaans, English-Welsh, Catalan-Romanian, Spanish-Basque and English-Polish. Stable pairs (and unstable ones, without guarantees) can be tested through our website at http://xixona.dlsi.ua.es/apertium/.<br />
<br />
==What quality do they offer?==<br />
<br />
The quality of the final translations depends to a large extent on the time invested in developing a given pair and on the closeness of the languages. For example, between Spanish and Catalan a success rate of 95% is achieved; between Spanish and Portuguese, 90%. For more distant languages without a stable version, such as English-Afrikaans, this percentage, excluding unknown words, is around 70%.<br />
<br />
==Downloads==<br />
<br />
The most recent versions of the engine, linguistic data, documentation and other tools can be downloaded from the project page on SourceForge ([http://www.sf.net/projects/apertium/ http://www.sf.net/projects/apertium/]). Documentation and additional information can be found both on our wiki ([http://xixona.dlsi.ua.es/wiki/ http://xixona.dlsi.ua.es/wiki/]) and on our mailing list ([mailto:apertium-stuff@lists.sf.net apertium-stuff@lists.sf.net]).<br />
<br />
==Development==<br />
<br />
The project is continually looking for developers interested in improving the engine and the existing data, working on new language pairs (especially those involving minority or under-resourced languages), creating interfaces, or adapting the software to particular needs. Making free (GPL), reusable data and corpora available to improve Apertium's dictionaries is also appreciated.<br />
<br />
==Applications==<br />
<br />
*Management of websites with multilingual content, used for example by the media<br />
*Rapid localisation of free software<br />
*Translation of documentation between well-resourced and under-resourced languages<br />
<br />
=Portuguese=<br />
<br />
Apertium ([http://www.apertium.org http://www.apertium.org]) is an open-source (GPL) machine translation platform that was initially designed to translate between the Romance languages of the Iberian Peninsula, but whose use has now expanded to more distant language pairs.<br />
<br />
==Who develops it?==<br />
The Apertium engine is being developed by the Transducens group, formed by researchers of the department of languages and computer systems of the University of Alicante, in association with the company Prompsit Language Engineering, a spin-off of the same university. The Transducens group and Prompsit are also responsible for the development of the linguistic data, together with the Seminario de Lingüística Informática of the Universidade de Vigo, the Institut Universitari de Lingüística Aplicada of the Universitat Pompeu Fabra in Barcelona, and companies such as Imaxin|software and Eleka Ingenieritza Linguistikoa. There is also a considerable volunteer contribution from free-software developers, both Spanish and foreign.<br />
<br />
==Funding==<br />
<br />
The Spanish Ministry of Industry, Tourism and Commerce partially funded the initial development of the engine and of the linguistic data of the first language pairs: Spanish-Catalan and Spanish-Galician. The project has also obtained other funding: from the University of Alicante (for the Spanish-Portuguese pair and others), from the Generalitat de Catalunya (improvements to the engine for handling distant languages, and the English-Catalan, Occitan-Catalan, French-Catalan and Occitan-Spanish pairs), from the Romanian Ministry of Foreign Affairs (the Spanish-Romanian and Catalan-Romanian pairs), etc.<br />
<br />
==Available language pairs==<br />
<br />
Seven language pairs using the Apertium platform are currently offered:<br />
*Spanish-Catalan<br />
*Spanish-Portuguese<br />
*Spanish-Galician<br />
*Catalan-French<br />
*Catalan-Romanian<br />
*Spanish-Romanian<br />
*English-Catalan<br />
<br />
Other language pairs are in development but do not yet have a stable version: French-Spanish, English-Afrikaans, English-Welsh, Catalan-Romanian, Spanish-Basque and English-Polish. The stable versions (and the unstable ones, without guarantees) can be tested on our website at http://xixona.dlsi.ua.es/apertium/.<br />
<br />
==What quality does it offer?==<br />
<br />
The quality of the translations depends basically on the time invested in developing a given language pair and on the closeness between the languages. For example, Spanish-Catalan has an accuracy of about 95%, while Spanish-Portuguese has approximately 90%. However, for unrelated languages in unstable versions such as English-Afrikaans, the accuracy, excluding unknown words, is approximately 70%.<br />
<br />
==Downloads==<br />
<br />
The current version of the Apertium platform, including the engine, the linguistic data and the documentation, can be found on the project page on SourceForge ([http://www.sf.net/projects/apertium/ http://www.sf.net/projects/apertium/]). For more information and discussion, visit our wiki ([http://xixona.dlsi.ua.es/wiki/ http://xixona.dlsi.ua.es/wiki/]) or our mailing list ([mailto:apertium-stuff@lists.sf.net mailto:apertium-stuff@lists.sf.net]).<br />
<br />
==Development==<br />
<br />
The project is constantly looking for developers interested in improving the engine and the existing data, working on new language pairs (especially those involving less-used or under-resourced languages), creating interfaces, or adapting the software to their own needs. Contributions of free (GPL) corpora and data that can be reused to improve Apertium's dictionaries are also welcome.<br />
<br />
==Applications==<br />
<br />
*Management of websites with multilingual content, used for example by the media<br />
*Rapid localisation of free software<br />
*Translation of documents between well- and under-resourced languages.<br />
<br />
=Catalan=<br />
<br />
=Afrikaans=<br />
Apertium ([http://www.apertium.org http://www.apertium.org]) is free software (GPL) for machine translation. Although it was originally developed to translate between Romance languages of the Iberian Peninsula, it is now being applied to more distantly related languages.<br />
<br />
==Who develops it?==<br />
<br />
The Apertium engine is currently developed by the Transducens research group of the Department of Software and Computing Systems at the University of Alicante, as well as at the company Prompsit Language Engineering, which grew out of it. The linguistic data are developed by Transducens, the computational-linguistics group (SLI) of the University of Vigo, the University Institute of Applied Linguistics of the Pompeu Fabra University in Barcelona, a number of companies, including Prompsit Language Engineering, Imaxin|software and Eleka Ingenieritza Linguistikoa, and independent free-software developers in Spain and abroad.<br />
<br />
==Funding==<br />
<br />
The development of the translation engine and the three initial language pairs, Spanish-Catalan, Spanish-Galician and Spanish-Portuguese, was funded by the Spanish Ministry of Industry, Tourism and Commerce. The project has also received funds from the University of Alicante and the government of Catalonia, to improve the translation engine for other language pairs and to help develop pairs such as English-Catalan, Occitan-Catalan and Occitan-Spanish, as well as from the Romanian Ministry of Foreign Affairs, to help develop translators for Spanish-Romanian and Catalan-Romanian.<br />
<br />
==Currently supported languages==<br />
<br />
The Apertium platform currently works with seven language pairs. They are:<br />
<br />
* Spanish-Catalan<br />
* Spanish-Portuguese<br />
* Spanish-Galician<br />
* Catalan-French<br />
* Catalan-Occitan<br />
* Spanish-Romanian<br />
* English-Catalan<br />
<br />
Other language pairs that are actively being developed but have not yet been officially released are French-Spanish, English-Afrikaans, English-Welsh, Catalan-Romanian, Spanish-Basque and English-Polish. Language pairs with working versions (as well as those with half-working versions) can be tried at http://xixona.dlsi.ua.es/apertium/.<br />
<br />
==How good is the translation?==<br />
<br />
The quality of the final translations depends to a large extent on how much development has already been done, and on how closely the languages are related. The engine for Spanish-Catalan, for example, is 95% accurate, and Spanish-Portuguese is about 90% accurate. For languages that are not so closely related, for example English-Afrikaans, accuracy is around 70% (provided all words in the text are known).<br />
<br />
==Where to download==<br />
<br />
Current versions of the engine, linguistic data and documentation can be downloaded from our SourceForge project page ([http://www.sf.net/projects/apertium/ http://www.sf.net/projects/apertium/]). Other documentation and past discussions can be found on our wiki ([http://xixona.dlsi.ua.es/wiki/ http://xixona.dlsi.ua.es/wiki/]) and mailing list ([mailto:apertium-stuff@lists.sf.net apertium-stuff@lists.sf.net]).<br />
<br />
==Development==<br />
<br />
The project welcomes developers who would like to help improve the engine and existing data, start working on new language pairs (especially languages that are less common or for which few resources exist), write interfaces, or adapt the software to their own needs. Existing free (GPL) data and corpora that can easily be adapted for Apertium's dictionaries are also welcome.<br />
<br />
==Applications==<br />
<br />
* Multilingual handling of web content such as media<br />
* Rapid localisation of free software<br />
* Translation of documentation between more-used and less-used languages<br />
<br />
[[Category:Promotion HQ]]<br />
[[Category:Documentation]]<br />
[[Category:Documentation in English]]<br />
[[Category:Documentation en français]]<br />
[[Category:Documentación en castellano]]<br />
[[Category:Dokumentado en Esperanto]]</div>Tino Didriksenhttps://wiki.apertium.org/w/index.php?title=Dansk_introduktion&diff=73444Dansk introduktion2021-05-27T06:25:36Z<p>Tino Didriksen: Text replacement - "(chat|irc)\.freenode\.net" to "irc.oftc.net"</p>
<hr />
<div>= Open source machine translation with Apertium =<br />
<br />
=== Introduction ===<br />
Many of us know the problem: you have a text in Danish and need it in English. Or the other way around. Or you have something in Russian and would really like to check whether the answer you are looking for happens to be in the text, but you don't know a word of Russian.<br />
<br />
Luckily we have Google Translate: pop the text into [http://translate.google.com/ http://translate.google.com/] and out comes a more or less comprehensible translation of the text.<br />
<br />
But... look what happens if we take a random page (from the Swedish Wikipedia) and translate it into Danish:<br />
<br />
Trakterna kring Fredriksberg räknas som bebodda sedan 1600-talet.<br />
<br />
Google Translate gives:<br />
<br />
Områderne omkring Fredriksberg tælles som har været besat siden 1600-tallet. (roughly: "The areas around Fredriksberg are counted as has been occupied since the 17th century.")<br />
<br />
Hmm... is Google really any help?<br />
<br />
The correct translation is:<br />
<br />
Områderne omkring Fredriksberg regnes som beboede siden 1600-tallet. ("The areas around Fredriksberg are considered inhabited since the 17th century.")<br />
<br />
In general, two use cases are distinguished:<br />
<br />
* '''Dissemination''' - the machine translation provides a draft for post-editing. The goal is to get the ''edit distance'' (that is, the number of corrections needed to turn the machine translation into an acceptable real translation) as low as possible.<br />
* '''Assimilation''' - you cannot understand the source language (e.g. Russian) and therefore use a machine to produce a translation you can understand.<br />
<br />
Google Translate has problems on both fronts: the edit distance is not much smaller than that of simply correcting the Swedish text, and comprehension is lost (inhabited -> occupied).<br />
<br />
=== Apertium svensk-dansk ===<br />
Apertium er et open source maskinoversættelsessystem med p.t. over 20 sprogpar og mange flere undervejs.<br />
<br />
Apertium er unikt, fordi:<br />
<br />
* ikke blot maskinoversættelsesmotoren, men også alle sprogdata og alle supplerende værktøjer er frigivet under GPL<br />
* det har et meget aktivt og hjælpsomt miljø omkring sig<br />
* det kan udvikles uden at have en universitetsgrad i hverken lingvistik eller IT<br />
* det er hurtigt voksende og omfattende. Således er der blot de sidste 2 måneder kommet engelsk-esperanto, nynorsk-bokmål og svensk-dansk til<br />
<br />
Alle sprogpar kan bruges fra http://apertium.org/, men kan også nemt hentes og installeres på UNIX-baserede systemer. Dertil kommer at det findes som en del af standarddistributionen af Ubuntu, desværre i en gammel udgave, så vi installerer det fra bunden, hvilket heldigvis ikke er så svært. <br />
<br />
Først de nødvendige forudsætninger:<br />
<br />
<code><br />
sudo apt-get install git build-essential g++ pkg-config libxml2 libxml2-dev libxml2-utils xsltproc flex automake autoconf libtool libpcre3-dev<br />
</code><br />
<br />
Dernæst lttoolbox (en 'lingvistisk værktøjskasse' som Apertium afhænger af), og så Apertium selv (man kan kopiere nedenstående instrukser fra [http://bit.ly/5K45lb http://bit.ly/5K45lb] eller [http://wiki.apertium.org/wiki/Dansk_introduktion http://wiki.apertium.org/wiki/Dansk_introduktion]):<br />
<br />
<pre><br />
mkdir apertium<br />
cd apertium<br />
<br />
git clone https://github.com/apertium/lttoolbox<br />
cd lttoolbox<br />
sh autogen.sh<br />
make<br />
sudo make install<br />
sudo ldconfig<br />
cd ..<br />
<br />
git clone https://github.com/apertium/apertium<br />
cd apertium<br />
sh autogen.sh<br />
make<br />
sudo make install<br />
sudo ldconfig<br />
cd ..<br />
<br />
git clone https://github.com/apertium/apertium-swe-dan<br />
cd apertium-swe-dan<br />
sh autogen.sh<br />
make<br />
sudo make install<br />
cd ..<br />
</pre><br />
<br />
<br />
Par skal installeres samme sted som Apertium, i /usr/local/ (brugte du standardpakkerne fra Debian ovenfor, skal du skrive: sh autogen.sh --prefix=/usr, ellers kan Apertium ikke finde dit sprogpar).<br />
<br />
Nu er vi klar til at prøve en hel oversættelse. Apertium er lavet helt i UNIX-ånden og virker naturligvis med en pipe:<br />
<br />
echo "Trakterna kring Fredriksberg räknas som bebodda sedan 1600-talet" | apertium sv-da<br />
<br />
<br />
Resultatet kommer prompte:<br />
<br />
<nowiki>*Trakterna omkring *Fredriksberg regnes som *bebodda siden 1600-talen</nowiki><br />
<br />
<br />
Stjerner er ord som Apertium sv-da p.t. ikke kender (men det har jo åben kildekode, så det kan vi jo lave om på - mere herom i en senere artikel). Den slags advarsler kan fjernes med parameteren -u hvis man helst er fri for dem.<br />
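Stjerne-markeringerne er i øvrigt almindelig tekst, så vil man selv efterbehandle output, kan de også fjernes bagefter med standardværktøjer. En lille skitse (sed-kaldet er ren tekstbehandling og svarer kun groft til hvad parameteren -u gør internt):<br />
<br />
```shell
# Fjern stjerne-markeringerne for ukendte ord fra en oversættelse:
echo '*Trakterna omkring *Fredriksberg regnes som *bebodda siden 1600-talen' \
  | sed 's/\*//g'
# udskriver: Trakterna omkring Fredriksberg regnes som bebodda siden 1600-talen
```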
<br />
Man kan også oversætte filer, i de mest almindelige formater (txt, html, rtf, odt, docx og et par andre) med parameteren -f. <br />
<br />
Her prøver jeg at oversætte artiklen her til svensk. <br />
<br />
apertium -u -f odt da-sv apertium-dkuug-artikel.odt artikel-svensk.odt<br />
<br />
<br />
Første afsnit af artiklen her bliver oversat af Apertium med:<br />
<br />
Vi är nog många dit känner problemet: Man har en text på danska och skal brukar den på engelska. Eller omvende. Eller man har något på ryska, och man vill mycket gerne lika se om just det svara man søg kanske står i texten, mena kan icke et ord ryska.<br />
<br />
Retningen fra dansk til svensk er ikke klar endnu (her bliver det rart med -u), så resultatet bliver blandet:<br />
<br />
$ echo "Retningen fra dansk til svensk er ikke klar endnu (her bliver det rart med -u), så resultatet bliver blandet:" | apertium da-sv -u <br />
<br />
Riktningen från danska til svenska är icke klara ännu (hit blir det behagfull med -u), så resultatet blir bland: <br />
<br />
<br />
Dog er det nok stadig hurtigere at efterredigere denne tekst end at skulle oversætte den danske tekst fra bunden af.<br />
<br />
=== Hvordan virker det? - forklaring for de viderekomne ===<br />
Præcist hvordan oversættelsen skal foregå, dvs. hvilke led der skal gås igennem, styres ved gode gammeldags UNIX-pipes. For svensk → dansk står det i filen modes/sv-da.mode (i /usr/share/apertium/ eller hvor det nu ligger):<br />
<br />
lt-proc sv-da.automorf.bin |<br />
<br />
apertium-tagger -g sv-da.prob | <br />
<br />
apertium-pretransfer | <br />
<br />
apertium-transfer apertium-sv-da.sv-da.t1x sv-da.t1x.bin sv-da.autobil.bin |<br />
<br />
lt-proc -g sv-da.autogen.bin<br />
<br />
<br />
Her kommer en kort forklaring på hvert trin i pipen (normalt er det kun udviklere der har brug for at forstå principperne jeg her forklarer, så fortvivl ikke hvis du syns det er lidt svært at overskue).<br />
<br />
For nemheds skyld starter jeg i den modsatte retning, med dansk → svensk, så jeg kan forklare de første trin med danske eksempler.<br />
<br />
'''Det første trin'''<nowiki> (lt-proc) laver en morfologisk analyse hvor overfladeformen af hvert ord bliver oversat til leksikalsk form. F.eks bliver ordet 'husenes' oversat til hus<n><nt><pl><def><gen> hvilket betyder 'hus' navneord, intetkøn, flertal, bestemt form, genitiv:</nowiki><br />
<br />
$ echo "Husene" | lt-proc da-sv.automorf.bin <br />
<br />
<nowiki>^Husene/Hus<n><nt><pl><def><nom>$ </nowiki><br />
<br />
<br />
$ echo "Hunden ser katten slås" | lt-proc da-sv.automorf.bin <br />
<br />
<nowiki>^Hunden/Hund<n><ut><sg><def><nom>$ ^ser/se<vblex><pres><actv>$ ^katten/kat<n><ut><sg><def><nom>$ ^slås/slå<n><ut><sg><ind><gen>/slå<vblex><pres><pasv>/slå<vblex><inf><pasv>$</nowiki><br />
<br />
<br />
<nowiki>Så 'ser' er verbet 'se' i nutid (præsens - <pres>) aktiv. 'katten' er navneordet 'kat' der er fælleskøn (utrum - <ut>), ental (singularis - <sg>). </nowiki><br />
<br />
Men hvad sker der med 'slås'? Jo, det er flertydigt: det kan både være en slå, dvs. en krog til at lukke en dør med ('bag lås og slå'), i ejeform ('en slås form'), og det kan være udsagnsordet 'slå' i infinitiv passiv ('der slås græs'). Her skal der vælges, og de mulige alternativer står med skråstreger / imellem.<br />
<br />
'''Det næste trin''', taggeren, vælger hvad der mon menes:<br />
<br />
$ echo "Hunden ser katten slås" | lt-proc da-sv.automorf.bin | apertium-tagger -g da-sv.prob <br />
<br />
<nowiki>^Hund<n><ut><sg><def><nom>$ ^se<vblex><pres><actv>$ ^kat<n><ut><sg><def><nom>$ ^slå<n><ut><sg><ind><gen>$</nowiki><br />
<br />
<br />
Overfladeformerne er væk fra datastrømmen og der er valgt præcis ét alternativ (så alle skråstreger er væk).<br />
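At skråstregerne faktisk er væk, kan man fx kontrollere med almindelige UNIX-værktøjer. En lille skitse (ren tekstbehandling på strømformatet, ikke en del af Apertium selv):<br />
<br />
```shell
# Tæl leksikalske enheder (^...$) der stadig har mere end én læsning,
# dvs. indeholder en skråstreg:
foer='^kat<n><ut><sg><def><nom>$ ^slå<n><ut><sg><ind><gen>/slå<vblex><pres><pasv>$'
efter='^kat<n><ut><sg><def><nom>$ ^slå<n><ut><sg><ind><gen>$'
echo "$foer"  | grep -o '\^[^$]*\$' | grep -c '/'            # udskriver 1
echo "$efter" | grep -o '\^[^$]*\$' | grep -c '/' || true    # udskriver 0
# (grep -c returnerer exit-status 1 når der er 0 fund, deraf '|| true')
```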
<br />
Her vælges desværre det forkerte 'slå' (navneordet), og det illustrerer et af de mange klassiske problemer som maskinoversættelsessystemer (inkl. Apertium) slås med. Her skyldes det dog at taggeren ikke er korrekt trænet for dansk (det er indtil videre kun svensk → dansk der er offentliggjort).<br />
<br />
'''Det tredje trin'''<nowiki> apertium-pretransfer klarer en detalje omkring multiord (f.eks 'vågne op', 'byde velkommen', 'finde på' er bedre at behandle som ét ord) hvor de grammatiske mærker skal flyttes til sidst (^vågne<vblex><pres><actv># op$ bliver til ^vågne# op<vblex><pres><actv>$)</nowiki><br />
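Selve flytningen af mærkerne kan efterlignes med et regulært udtryk. En grov skitse af ideen bag apertium-pretransfer (ikke den rigtige implementation, kun til illustration af strømformatet):<br />
<br />
```shell
# Flyt de grammatiske mærker om bag multiordets sidste del:
#   ^vågne<vblex><pres><actv># op$  ->  ^vågne# op<vblex><pres><actv>$
echo '^vågne<vblex><pres><actv># op$' \
  | sed -E 's/\^([^<]*)(<[^#]*)# ([^$]*)\$/^\1# \3\2$/'
```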
<br />
I '''det fjerde trin''', apertium-transfer, bliver ordene fra kildesproget erstattet med ord på målsproget, og samtidig bliver et sæt regler for, hvordan oversættelsen mellem de to sprog skal ske, anvendt.<br />
<br />
Lad os nu gå over til eksempler fra svensk → dansk igen. Sætningen 'Den stora hunden vaknar.' ser, lige før apertium-transfer, sådan her ud:<br />
<br />
<nowiki>^Den<det><def><ut><sg>$ ^stor<adj><pst><un><pl><ind>$ ^hund<n><ut><sg><def><nom>$ ^vakna<vblex><pres><actv>$^.<sent>$</nowiki><br />
<br />
<nowiki>På dansk bruger vi ikke dobbelt bestemt form, så 'Den<def> store hund</nowiki>'''en'''<nowiki><def>'</nowiki><nowiki> skal laves om til 'Den<def> store hund<ind>' (<def>=definit/bestemt form, <ind>=indefinit/ubestemt form). Samtidig skal </nowiki>'vakna' erstattes med 'vågne op' (de andre ord bliver også erstattet, men de hedder det samme på dansk og svensk)<br />
<br />
<nowiki>^Den<det><def><ut><sg>$ ^stor<adj><pst><un><pl><ind>$ ^hund<n><ut><sg><ind><nom>$ ^vågne<vblex><pres><actv># op$^.<sent>$</nowiki><br />
<br />
<br />
'''Det sidste trin''' laver det modsatte af det første trin: morfologisk generering, dvs. det sørger for at oversætte de leksikalske former til overfladeformer: <br />
<br />
<nowiki>echo "^Den<det><def><ut><sg>$ ^stor<adj><pst><un><pl><ind>$ ^hund<n><ut><sg><ind><nom>$ ^vågne<vblex><pres><actv># op$^.<sent>$" | lt-proc -g sv-da.autogen.bin</nowiki><br />
<br />
Den store hund vågner op.<br />
<br />
<br />
Uden om disse trin lægger Apertium et de-formatterings- og et re-formatteringsfilter, så man kan oversætte f.eks. HTML- eller ODF-filer. <br />
<br />
Her ses de forskellige trin på skematisk form:<br />
<br />
<br />
<br />
Som sagt sker de ovenstående trin uafhængigt af hinanden, med pipes, og derfor kan ethvert element udskiftes eller udbygges efter behov. Det giver en meget stor fleksibilitet (som også er nødvendig da vi har med sprog at gøre). <br />
<br />
For eksempel får taggeren i trin to, der vælger i tilfælde af tvetydigheder baseret på statistiske metoder (skjulte markovkæder), ofte en 'forbrænder' på som på forhånd udelukker nogle muligheder ud fra nogle regler (GramTrans' Constraint Grammar - også under GPL).<br />
<br />
Der er også mange sprogpar som bruger tre eller flere undertrin i det fjerde trin (transfer), for at fange dybere sproglige strukturer end det er nødvendigt at fange mellem dansk og svensk (for f.eks. engelsk-esperanto er der 5 undertrin).<br />
<br />
<br />
= Boks 1 =<br />
Svensk-dansk blev hovedsageligt lavet af Michael Kristensen, der fik et af Googles sommerstipendier (Google Summer of Code, GSoC) til arbejdet over sommeren 2009.<br />
<br />
I 2010 er der igen stipendier fra Google på ca. 27000 kr til studerende der vil arbejde med Open Source. Interesserede bør kontakte Jacob allerede i marts måned. Der er også foredrag om GSoC i Ballerup onsdag den 3. marts kl 15.15 i lokale X1.80 på Ingeniørhøjskolen i København, se [http://dejo.dk/gsoc http://dejo.dk/gsoc] .<br />
<br />
<br />
= Boks 2 =<br />
Jacob Nordfalk er ekstern lektor på Ingeniørhøjskolen i København og medudvikler på Apertium, hvor han har lavet esperanto-engelsk. I foråret 2009 var han mentor på udviklingen af det svensk-danske sprogpar.<br />
<br />
Jacob er interesseret i at finde folk der vil samarbejde omkring Apertium og holder gerne foredrag om det:<br />
<br />
* I november holdt han et 2-timers foredrag i DKUUG ([http://www.dkuug.dk/content/view/259/ http://www.dkuug.dk/content/view/259/]) som kan ses på [http://video.dkuug.dk/ http://video.dkuug.dk/]. <br />
* Onsdag den 3. marts kl 16 holder han foredrag i Ballerup på Ingeniørhøjskolen i København - se [http://dejo.dk/apertium http://dejo.dk/apertium].<br />
* Lørdag den 6. marts kl 9 holder han et lynforedrag (på engelsk) og en workshop på [http://opensourcedays.org/ http://opensourcedays.org/].<br />
* Mandag den 29. marts kl 19.30 holder han foredrag (på esperanto) i Kopenhaga Esperanto-Klubo - se [http://dejo.dk/apertium#KEK http://dejo.dk/apertium#KEK] .<br />
<br />
= Boks 3 =<br />
Se også:<br />
<br />
- [http://apertium.org/ http://apertium.org] - Apertiums hjemmeside<br />
<br />
- [http://wiki.apertium.org/ http://wiki.apertium.org] - wiki med masser af hjælp<br />
<br />
- IRC-kanalen #apertium på irc.oftc.net - hjælp til alt om Apertium, ca. 20 timer i døgnet.<br />
<br />
- [http://gramtrans.com/ http://gramtrans.com/] - Den bedste maskinoversætter mellem svensk og dansk (desværre ikke med åben kildekode).<br />
<br />
- [http://hdl.handle.net/10045/12024 http://hdl.handle.net/10045/12024] artikel om det svensk-danske sprogpar</div>Tino Didriksenhttps://wiki.apertium.org/w/index.php?title=Template:Dragons&diff=73443Template:Dragons2021-05-27T06:25:34Z<p>Tino Didriksen: Text replacement - "(chat|irc)\.freenode\.net" to "irc.oftc.net"</p>
<hr />
<div><center><br />
<div style="background-color: #fbfbfb; width: 600px; height: 65px; padding: 3px; border: 1px solid #aaa; border-left: 10px solid #f28500; text-align: justify; "><br />
[[Image:Dragon.jpg|65px|left]] <br />
<br />
<span style="font-size: 130%">'''{{sc|hic·sunt·dracones}}'''</span><br/><br />
<span style="font-size: 90%">Here be dragons! To make improvements, please [{{fullurl:{{FULLPAGENAME}}|action=edit}} dive right in]. It is highly recommended that you<br />
hang out on the IRC channel (<code>irc.oftc.net #apertium</code>) and join the [https://lists.sourceforge.net/lists/listinfo/apertium-stuff mailing list].</span><br />
</div><br />
</center></div>Tino Didriksenhttps://wiki.apertium.org/w/index.php?title=User:Yamaha5&diff=73442User:Yamaha52021-05-27T06:22:43Z<p>Tino Didriksen: Text replacement - "http://webchat.freenode.net" to "https://webchat.oftc.net"</p>
<hr />
<div>I am active at fa.wikipedia.org.<br />
<br />
==Farsi==<br />
*[https://svn.code.sf.net/p/apertium/svn/incubator/apertium-pes/ farsi sources]<br />
*[[Farsi]]<br />
*[[Iranian languages]]<br />
<br />
===paired===<br />
*[[Apertium-tgk-pes]]<br />
*[[Apertium-pes-glk]]<br />
<br />
==links==<br />
*[[Apertium New Language Pair HOWTO]]<br />
*[[Minimal installation from SVN]]<br />
*[https://webchat.oftc.net/?channels=apertium IRC]<br />
<pre><br />
Note: if you want to quickly create the "skeleton" of a language pair (empty dictionaries, makefiles etc.), do:<br />
wget https://raw.githubusercontent.com/apertium/bootstrap/master/apertium-init.py<br />
python3 apertium-init.py xxx<br />
python3 apertium-init.py yyy<br />
python3 apertium-init.py xxx-yyy<br />
You'll get three directories apertium-xxx, apertium-yyy, apertium-xxx-yyy which you should be able to compile like<br />
cd apertium-xxx<br />
./autogen.sh<br />
make<br />
cd -<br />
cd apertium-yyy<br />
./autogen.sh<br />
make<br />
cd -<br />
cd apertium-xxx-yyy<br />
./autogen.sh --with-lang1=../apertium-xxx --with-lang2=../apertium-yyy<br />
make<br />
</pre></div>Tino Didriksenhttps://wiki.apertium.org/w/index.php?title=Template:Main_page_header_fr&diff=73441Template:Main page header fr2021-05-27T06:22:42Z<p>Tino Didriksen: Text replacement - "http://webchat.freenode.net" to "https://webchat.oftc.net"</p>
<hr />
<div>{|style="width:100%; text-align:center;"<br />
|style="width:66%; text-align: left; font-size: 110%;float:left;"|<br />
'''[[Installation (français)|Installation]] • [[Ressources (français)|Ressources]] • [[Contact (français)|Contact]] • [[Documentation (français)|Documentation]] • [[Développement (français)|Développement]] • [[Outils]]'''<br />
|style="width: 33%; text-align: right; font-size: 95%; float:right;"|<br />
[[Image:Gnome-home.png|15px]] [http://www.apertium.org Page principale] • [[Image:Bugs.png|15px]] [http://bugs.apertium.org/ Bugs] • [[Image:Internet.png|15px]] [[Main Page|Wiki]] • [[Image:Gaim.png|15px]] [https://webchat.oftc.net/?channels=apertium Chat]<br />
|}<br />
----</div>Tino Didriksenhttps://wiki.apertium.org/w/index.php?title=Hfst&diff=73440Hfst2021-05-27T06:22:41Z<p>Tino Didriksen: Text replacement - "http://webchat.freenode.net" to "https://webchat.oftc.net"</p>
<hr />
<div>{{TOCD}}<br />
'''hfst''' is the Helsinki finite-state toolkit. It is formalism-compatible with both lexc and twolc, somewhat as [[foma]] is with xfst. It is currently being used in [[apertium-sme-nob]], [[apertium-fin-sme]], [[apertium-kaz-tat]] and in a few other pairs which involve Turkic languages.<br />
<br />
The IRC channel is <code>#hfst</code> at <code>irc.oftc.net</code> (you may try [irc://irc.oftc.net/#hfst irc://irc.oftc.net/#hfst] if your browser supports it, or enter #hfst into https://webchat.oftc.net/ if you want a web client). The [https://kitwiki.csc.fi/twiki/bin/view/KitWiki/HfstHome HFST Wiki] has some very good documentation (see especially the page [https://kitwiki.csc.fi/twiki/bin/view/KitWiki/HfstReadme HfstReadme] when you run into compilation problems).<br />
<br />
HFST is actually created as a set of wrappers over several possible ''back-ends'': [[Foma]], [[OpenFST]], [[SFST]], … The latest versions of HFST include the back-ends you need, so there's no reason to install any of these backends separately.<br />
<br />
{{Github-migration-check}}<br />
==Building and installing HFST==<br />
<br />
<span style="color: #f00;">See [[Installation]], for most real operating systems you can now get pre-built packages of HFST (as well as other core tools) through your regular package manager.</span><br />
<br />
<br />
If you wish to hack on the HFST C++ code itself (or you are on some system that doesn't have packages yet), you can follow this procedure:<br />
<br />
===Install prerequisites===<br />
<br />
You will need the regular build dependencies:<br />
* <code>automake, autoconf, libtool, flex, bison, g++, libreadline-dev</code><br />
<br />
If you've already installed apertium/lttoolbox these should be installed already; if not, they should be easily installable with your package manager, e.g. <br />
* Ubuntu: <code>sudo apt-get install automake autoconf libtool flex bison g++ libreadline-dev</code><br />
* Arch Linux: <code>sudo pacman -S base-devel</code><br />
* MacOS X users should install the general [[Prerequisites_for_Mac_OS_X]] first, then <code>sudo port install bison readline</code><br />
<br />
===Download HFST===<br />
<br />
Either use the latest release (recommended for users), or go with the bleeding-edge Git version (recommended for developers).<br />
<br />
====From Git repository====<br />
<br />
<pre><br />
$ git clone https://github.com/hfst/hfst.git<br />
$ cd hfst/<br />
$ ./autogen.sh<br />
$ ./configure<br />
$ make<br />
</pre><br />
<br />
(The autogen step is only needed when using Git, not with the tarball.)<br />
<br />
====Released tarball====<br />
<br />
Download the latest release, named something like hfst-X.Y.Z.tar.gz, from https://github.com/hfst/hfst/releases, then<br />
<pre><br />
$ tar -xzf hfst-X.Y.Z.tar.gz<br />
$ cd hfst-X.Y.Z/<br />
</pre><br />
(replacing X.Y.Z for the version you downloaded)<br />
<br />
===Configure===<br />
<br />
In the configure step, you can turn on/off features and backends and such. <small>The [[OpenFST]] backend is included in the HFST distribution, while [[foma]] and [[SFST]] are not; they are also not recommended, since they typically lead to more trouble than they're worth.</small><br />
<br />
For most users, this should work:<br />
<pre><br />
$ ./configure --enable-proc --without-foma --enable-lexc --enable-all-tools<br />
</pre><br />
<br />
The above command will configure it to be installed to /usr/local in the <code>make install</code> step (below). <br />
<br />
If you want hfst and back-ends installed somewhere else, you can do<br />
<pre><br />
$ ./configure --enable-proc --without-foma --enable-lexc --enable-all-tools --prefix=/home/USERNAME/local/<br />
</pre><br />
<br />
'''Note: When we say USERNAME we mean your username, you need to replace it with your username, if you don't know what it is, you can find out by typing <code>whoami</code>'''<br />
<br />
<br />
You can also add <code>--with-unicode-handler=glib</code> (or <code>--with-unicode-handler=ICU</code>) to the ./configure step if you have glib (or ICU) installed and want better Unicode [https://en.wikipedia.org/wiki/Case_folding#Case_folding Case_folding].<br />
<br />
===Compile and install===<br />
If your autotools version is older than 1.14 (check with <code>automake --version</code>), first do:<br />
<pre>$ scripts/generate-cc-files.sh</pre><br />
<br />
Build by running<br />
<pre>$ make</pre><br />
<br />
<br />
Then you need to install (Note: you need to use <code>sudo make install</code> if you installed it in /usr/local (or did not give a --prefix in the configure step); otherwise, no sudo!)<br />
<pre><br />
$ make install<br />
</pre><br />
<br />
And finally, unless you have a Mac, you may need to do:<br />
<pre><br />
$ sudo ldconfig<br />
</pre><br />
<br />
==Troubleshooting==<br />
When running "make" with old autotools (pre 1.14?), you may see:<br />
<pre>make[5]: *** No rule to make target `xre_parse.hh', needed by `xre_lex.ll'. Stop.</pre><br />
Run <code>scripts/generate-cc-files.sh</code> and then make again.<br />
<br />
<br />
If, during the ./configure step, you see<pre>checking for GNU libc compatible malloc... no<br />
[…]<br />
checking for GNU libc compatible realloc... no</pre> and then during make a bunch of errors like: <pre>/usr/local/include/sfst/mem.h:37:57: error: 'malloc' was not declared in this scope</pre>, try the following:<br />
<br />
<pre>sudo ldconfig<br />
export LD_LIBRARY_PATH=/usr/local/lib<br />
export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig</pre><br />
<br />
and then ./configure and make.<br />
<br />
<br />
If, during make, you see errors like<br />
<pre>xre_parse.cc:2293:24: error: invalid conversion from 'const char*' to 'char*' [-fpermissive]</pre><br />
try instead<br />
<pre><br />
make CXXFLAGS=-fpermissive<br />
</pre><br />
<br />
<br />
If, when compiling a dictionary, you end up in a "foma" prompt where you can type stuff, you should remove anything related to foma or "hfst-xfst" from your system, and build HFST anew as described above. <br />
<br />
<br />
For more advice on installation problems, have a look at [https://kitwiki.csc.fi/twiki/bin/view/KitWiki/HfstReadme the Hfst Readme page].<br />
<br />
See also [[Foma]], [[OpenFST]] and [[SFST]] for problems regarding the back-ends.<br />
<br />
==Using==<br />
<br />
<pre><br />
$ svn co https://victorio.uit.no/langtech/trunk/langs/fao<br />
$ cd fao/src<br />
$ make -f Makefile.hfst<br />
<br />
$ echo "orð" | hfst-lookup ../bin/fao-morph.hfst<br />
lookup> <br />
orð orð+N+Neu+Sg+Nom+Indef<br />
orð orð+N+Neu+Sg+Acc+Indef<br />
orð orð+N+Neu+Pl+Nom+Indef<br />
orð orð+N+Neu+Pl+Acc+Indef<br />
<br />
lookup><br />
$<br />
<br />
</pre><br />
<br />
To compile <code>lexc</code> code, first concatenate all the lexc files:<br />
<br />
<pre><br />
$ cat fao-lex.txt noun-fao-lex.txt noun-fao-morph.txt adj-fao-lex.txt \<br />
adj-fao-morph.txt verb-fao-lex.txt verb-fao-morph.txt adv-fao-lex.txt \<br />
abbr-fao-lex.txt acro-fao-lex.txt pron-fao-lex.txt punct-fao-lex.txt \<br />
numeral-fao-lex.txt pp-fao-lex.txt cc-fao-lex.txt cs-fao-lex.txt \<br />
interj-fao-lex.txt det-fao-lex.txt > ../tmp/lexc-all.txt<br />
</pre><br />
<br />
To compile this, just use the <code>hfst-lexc</code> program,<br />
<br />
<pre><br />
hfst-lexc < ../tmp/lexc-all.txt > ../bin/lexc-fao.bin<br />
</pre><br />
<br />
To compile the <code>twol</code> rules, just use the <code>hfst-twolc</code> program,<br />
<br />
<pre><br />
$ hfst-twolc twol-fao.txt > twol-fao.bin<br />
</pre><br />
<br />
And then to compose the lexicon and rule file, use <code>hfst-compose-intersect</code>:<br />
<br />
<pre><br />
$ hfst-compose-intersect -l lexc-fao.bin twol-fao.bin -o fao-gen.hfst<br />
</pre><br />
<br />
This will create a generator, if you want an analyser, you just need to invert the generator with <code>hfst-invert</code>:<br />
<br />
<pre><br />
$ hfst-invert fao-gen.hfst -o fao-morph.hfst<br />
</pre><br />
<br />
==See also==<br />
<br />
* [[Starting a new language with HFST]]<br />
<br />
==External links==<br />
<br />
* [http://www.ling.helsinki.fi/kieliteknologia/tutkimus/hfst/ Helsinki Finite-State Transducer Technology (HFST)]<br />
<br />
[[Category:Morphological analysers]]<br />
[[Category:HFST]]</div>Tino Didriksenhttps://wiki.apertium.org/w/index.php?title=Flyer&diff=73439Flyer2021-05-27T06:22:37Z<p>Tino Didriksen: Text replacement - "http://webchat.freenode.net" to "https://webchat.oftc.net"</p>
<hr />
<div>{{TOCD}}<br />
=English=<br />
Apertium ([http://www.apertium.org http://www.apertium.org]) is a free software (GPL) machine translation platform; it was initially designed to translate<br />
between the Romance languages of the Iberian peninsula, but is now being used to translate between more distant language pairs.<br />
<br />
==Who is developing it ?==<br />
<br />
The Apertium engine is being developed in the Transducens research group ([http://transducens.dlsi.ua.es http://transducens.dlsi.ua.es]) at the Departament de Llenguatges i Sistemes<br />
Informàtics ([http://www.dlsi.ua.es/ http://www.dlsi.ua.es/]) within the Universitat d'Alacant and also by the company Prompsit Language Engineering ([http://www.prompsit.com http://www.prompsit.com]). Linguistic data are being developed by Transducens, the Seminario<br />
de Lingüística Informàtica of the Universidade de Vigo, the Institut Universitari de Lingüística Aplicada at the<br />
Universitat Pompeu Fabra in Barcelona, along with a number of companies including Prompsit Language Engineering, Imaxin|software and Eleka Ingenieritza Linguistikoa, as well as by independent free software developers<br />
both in Spain and abroad.<br />
<br />
==Funding==<br />
<br />
The Spanish Ministry of Industry, Tourism and Commerce funded the development of the engine and three initial<br />
language pairs: Spanish&ndash;Catalan, Spanish&ndash;Galician and Spanish&ndash;Portuguese. The project has also received funding<br />
from the Universitat d'Alacant; from the Generalitat de Catalunya (Government of Catalonia), to improve the engine for distant pairs and to develop language pairs such as English&ndash;Catalan, Occitan&ndash;Catalan and Occitan&ndash;Spanish; and from the Romanian Ministry of Foreign Affairs, to develop Spanish&ndash;Romanian and Catalan&ndash;Romanian translators.<br />
<br />
==Currently supported languages==<br />
<br />
There are currently several supported translation pairs published using the Apertium platform. These are:<br />
<br />
*Basque&rarr;Spanish<br />
*Catalan&rarr;Romanian<br />
*Catalan&rarr;Esperanto<br />
*Breton&rarr;French<br />
*English&harr;Catalan<br />
*English&rarr;Esperanto<br />
*English&harr;Spanish<br />
*English&harr;Galician<br />
*French&harr;Catalan<br />
*French&harr;Spanish<br />
*Norwegian Bokmål&rarr;Norwegian Nynorsk<br />
*Occitan&harr;Catalan (both Aranese and ''Occitan Larg'')<br />
*Occitan&harr;Spanish (both Aranese and ''Occitan Larg'')<br />
*Portuguese&harr;Catalan<br />
*Spanish&harr;Catalan<br />
*Spanish&rarr;Esperanto<br />
*Spanish&harr;Portuguese (both European and Brazilian)<br />
*Spanish&harr;Galician<br />
*Romanian&rarr;Spanish<br />
*Welsh&rarr;English<br />
<br />
Other pairs currently under active development but without a stable release include English-Afrikaans, Catalan-Romanian, Danish-Swedish, and English-Polish. Stable pairs can be tested through our web interface at [http://www.apertium.org http://www.apertium.org]. Unstable pairs can be tested (at your own risk) at [http://www.apertium.org/testing/ http://www.apertium.org/testing/].<br />
<br />
==How good is it?==<br />
<br />
The quality of the final translations depends greatly on the amount of time spent in development<br />
and on the closeness of the languages. For example, Spanish-Catalan reaches approximately 95% accuracy, while Spanish-Portuguese<br />
reaches around 90%. With accuracies around 90% one can use the raw translation as a draft that can be ''post-edited'' for publication (''dissemination''). For less related and unreleased pairs such as English-Afrikaans, the accuracy, excluding unknown<br />
words, is somewhere around 70%, and even lower for some other pairs, but the resulting translations can still be used to understand a good part of a text written in another language (''assimilation''). The Breton&rarr;French, Welsh&rarr;English, and Basque&rarr;Spanish pairs may be used for that.<br />
<br />
==Downloading==<br />
<br />
Current versions of the engine, linguistic data and documentation can be found on our SourceForge project page ([http://www.sf.net/projects/apertium/ http://www.sf.net/projects/apertium/]). Further<br />
documentation and discussion can be found both on our wiki ([http://xixona.dlsi.ua.es/wiki/ http://xixona.dlsi.ua.es/wiki/]) and mailing list ([mailto:apertium-stuff@lists.sf.net apertium-stuff@lists.sf.net]). We also meet at the IRC channel #apertium at irc.oftc.net (you can use [https://webchat.oftc.net https://webchat.oftc.net] if you don't have an IRC client).<br />
<br />
==Development==<br />
<br />
The project is always looking for developers who are interested in improving the engine and existing data, working on new language pairs (especially those involving less-used or under-resourced languages), creating interfaces,<br />
or adapting the software to fit their needs. Contributions of existing free (GPL) data and corpora that can easily be reused to feed Apertium's dictionaries are also welcome.<br />
<br />
==Applications==<br />
<br />
*Multilingual management of web content such as media<br />
*Rapid localisation of free software<br />
*Translation of documentation between a more resourced language and a less resourced language<br />
*Understanding text in a different language<br />
<br />
=Français=<br />
Apertium (http://www.apertium.org) est une plate-forme de traduction automatique de code ouvert (GPL) initialement conçue pour les langues romanes de la Péninsule Ibérique, mais qui a été de plus en plus développée pour pouvoir traiter des paires de langues plus divergentes.<br />
<br />
==Qui le développe?==<br />
<br />
Le moteur d'Apertium est développé aussi bien dans le groupe de recherche Transducens du Departament de Llenguatges i Sistemes Informàtics de la Universitat d'Alacant que dans la spin-off Prompsit Language Engineering. Transducens et Prompsit prennent aussi en charge le développement linguistique avec le Seminario de Lingüística Informática de la Universidade de Vigo, l'Institut Universitari de Lingüística Aplicada de la Universitat Pompeu Fabra de Barcelona et d'autres entreprises comme imaxin|software et Eleka Ingeniaritza Linguistikoa. Des développeurs volontaires externes tant en Espagne qu'à l'étranger y collaborent.<br />
<br />
==Financement==<br />
<br />
Le Ministère espagnol de l'Industrie, du Tourisme et du Commerce a partiellement financé le développement du moteur et de deux des paires de langues initiales : espagnol-catalan et espagnol-galicien. Le projet a aussi été financé par : la Universitat d'Alacant (paire espagnol-portugais et d'autres), la Generalitat de Catalunya (paires anglais-catalan, occitan-catalan, français-catalan et occitan-espagnol, amélioration du moteur pour le traitement de langues éloignées), le Ministère des Affaires Étrangères de la Roumanie (paires espagnol-roumain et catalan-roumain), etc.<br />
<br />
==Paires de langues disponibles==<br />
<br />
Actuellement, il y a dix-sept paires de langues dans la plate-forme Apertium :<br />
<br />
*Espagnol-Catalan<br />
*Espagnol-Portugais<br />
*Espagnol-Galicien<br />
*Catalan-Français<br />
*Catalan-Occitan<br />
*Espagnol-Roumain<br />
*Anglais-Catalan<br />
*Anglais-Espagnol<br />
*Espagnol-Galicien<br />
*Français-Espagnol<br />
*Esperanto-Espagnol<br />
*Gallois-Anglais<br />
*Esperanto-Catalan<br />
*Portugais-Catalan<br />
*Portugais-Galicien<br />
*Basque-Espagnol<br />
<br />
Ces paires peuvent être testées sur notre site http://xixona.dlsi.ua.es/apertium/.<br />
<br />
==Quelle est la qualité?==<br />
<br />
La qualité des traductions finales dépend, dans une grande mesure, du temps mis dans le développement d'une paire déterminée et de la proximité des langues. Par exemple, entre l'espagnol et le catalan on atteint un pourcentage de succès de 95%; entre l'espagnol et le portugais de 90%. Pour les langues plus éloignées ce pourcentage est plus bas.<br />
<br />
==Téléchargements==<br />
<br />
Les versions les plus récentes du moteur, les données linguistiques, la documentation et d'autres outils peuvent être téléchargés sur le site du projet à SourceForge ([http://www.sf.net/projects/apertium/ http://www.sf.net/projects/apertium/]). Vous pouvez trouver de la documentation et plus d'informations tant sur notre wiki ([http://xixona.dlsi.ua.es/wiki/ http://xixona.dlsi.ua.es/wiki/]) que sur notre liste de distribution ([mailto:apertium-stuff@lists.sf.net apertium-stuff@lists.sf.net]).<br />
<br />
==Développement==<br />
<br />
Le projet cherche toujours des développeurs intéressés à améliorer le moteur et les données existantes, à travailler sur de nouvelles paires de langues (notamment celles impliquant des langues minoritaires ou peu dotées en ressources), à créer des interfaces ou à adapter le logiciel à des besoins particuliers. La mise à disposition de données et de corpus libres et réutilisables permettant d'améliorer les dictionnaires d'Apertium est également la bienvenue.<br />
<br />
==Applications==<br />
<br />
*Gestion de sites web à contenu multilingue utilisés, par exemple, par les médias<br />
*Localisation rapide de logiciel de code ouvert<br />
*Traduction de documentation entre les langues ayant beaucoup de ressources et les langues ayant peu de ressources<br />
<br />
=Македонски=<br />
Apertium ([http://www.apertium.org http://www.apertium.org]) е слободна платформа за машински превод; првично е дизајнирана да преведува помеѓу романските јазици од Иберискиот Полуостров, но сега се користи и за подалечни јазици.<br />
<br />
==Кој го развива?==<br />
<br />
Apertium погонот е развиван од Transducens истражувачката група од Department de Llenguatges i Sistemes Informàtics во склоп на Universitat d'Alacant и исто така од компанијата Prompsit Language Engineering. Лингвистичките податоци се развиваат од Transducens, the Seminario de Lingüística Informàtica од Universidade de Vigo, на институтот Universitari de Lingüística Aplicada од Universitat Pompeu Fabra во Barcelona, заедно со поголем број на компании вклучувајќи ги и Prompsit Language Engineering, Imaxin|software и Eleka Ingenieritza Linguistikoa, како и независни развивачи на слободен софтвер - како од Шпанија така и од странство.<br />
<br />
==Финансирање==<br />
<br />
Шпанското министерство за индустрија, туризам и комерција го финансираше развојот на погонот и три иницијални јазични парови: Шпанско-Каталонски, Шпанско-Галициски и Шпанско-Португалски. Проектот исто така има добиено средства од: Universitat d'Alacant, Generalitat de Catalunya (Владата на Каталонија) за подобрување на погонот за подалечни парови и за развивање на јазични парови како што се Англиско-Каталонски, Окситански-Каталонски и Окситански-Шпански, Романското министерство за надворешни работи за развивање на Шпанско-Романски и Каталонско-Романски јазик.<br />
<br />
==Подржани јазици во моментов==<br />
<br />
Во моментов достапни се седум јазични парови, кои можат да бидат преведувани преку Apertium платформата. Тоа се:<br />
<br />
*Шпанско-Каталонски<br />
*Шпанско-Португалски<br />
*Шпанско-Галициски<br />
*Каталонско-Француски<br />
*Каталонско-Окситански<br />
*Шпанско-Романски<br />
*Англиско-Каталонски<br />
<br />
Други парови кои во моментов се во развојна фаза се: Француско-Шпански, Англиско-Африкански, Англиско-Велшки, Каталонско-Романски, Шпанско-Баскиски и Англиско-Полски. Стабилните парови (како и оние во развој, под сопствен ризик) може да бидат тестирани преку нашата веб апликација на http://xixona.dlsi.ua.es/apertium/.<br />
<br />
==Колку е добар?==<br />
<br />
Квалитетот на крајниот превод зависи во голема мера од времето поминато во развој и близината на јазиците. На пример Шпанско-Каталонскиот е преведуван приближно со 95% точност, но Шпанско-Португалскиот со околу 90%. За помалку поврзани јазици како што е Англиско-Африкански, точноста е околу 70%(исклучувајќи ги непознатите зборови).<br />
<br />
==Преземање==<br />
<br />
Актуелните верзии на погонот, лингивистичките податоци и документацијата се достапни преку SourceForge страната на нашиот проект ([http://www.sf.net/projects/apertium/ http://www.sf.net/projects/apertium/]). Понатаму, документација и дискусии може да бидат најдени на нашето вики ([http://wiki.apertium.org http://wiki.apertium.org]) и преку мејлинг листата ([mailto:apertium-stuff@lists.sf.net apertium-stuff@lists.sf.net]).<br />
<br />
==Развој==<br />
<br />
На проектот секогаш му се потребни програмери кои се заинтересирани да го подобрат погонот и постоечките податоци, да работат на нови јазични парови (посебно на оние кои не се користат често или немаат доволно ресурси), да креираат интерфејс програми или да го адаптираат софтверот на своите потреби. Постоечки слободни (GPL) податоци и корпуси, кои лесно можат да се вметнат во речниците на Apertium, се исто така добредојдени.<br />
<br />
==Употреба==<br />
<br />
*Повеќејазичен менаџмент на веб содржина<br />
*Брза локализација на слободен софтвер<br />
*Превод на документација помеѓу повеќе застапени и помалку застапени јазици<br />
<br />
=Castellano=<br />
Apertium (http://www.apertium.org) es una plataforma de traducción automática de código abierto (GPL) inicialmente diseñada para las lenguas romances de la Península Ibérica, pero que ha sido recientemente ampliada para poder tratar pares de lenguas más divergentes.<br />
<br />
==¿Quién lo desarrolla?==<br />
<br />
El motor de Apertium se desarrolla tanto dentro del grupo de investigación Transducens del Departament de Llenguatges i Sistemes Informàtics de la Universitat d'Alacant como de la spin-off Prompsit Language Engineering. Transducens y Prompsit se encargan también del desarrollo lingüístico junto con el Seminario de Lingüística Informática de la Universidade de Vigo, el Institut Universitari de Lingüística Aplicada de la Universitat Pompeu Fabra de Barcelona y otras empresas como imaxin|software y Eleka Ingeniaritza Linguistikoa. También recibe las colaboraciones de desarrolladores externos voluntarios tanto de dentro como de fuera de España.<br />
<br />
==Financiación==<br />
<br />
El Ministerio de Industria, Turismo y Comercio financió parcialmente el desarrollo del motor y de dos de los pares de lenguas iniciales: español-catalán y español-gallego. El proyecto también ha sido financiado por: la Universidad de Alicante (par español-portugués y otros), la Generalitat de Catalunya (mejora del motor para el tratamiento de lenguas distantes y pares inglés-catalán, occitano-catalán, francés-catalán y occitano-español), el Ministerio de Asuntos Exteriores de Rumanía (pares español-rumano y catalán-rumano), etc.<br />
<br />
==Pares de lenguas disponibles==<br />
<br />
Actualmente hay siete pares de lenguas disponibles que usan la plataforma Apertium:<br />
<br />
*Español-Catalán<br />
*Español-Portugués<br />
*Español-Gallego<br />
*Catalán-Francés<br />
*Catalán-Occitano<br />
*Español-Rumano<br />
*Inglés-Catalán<br />
<br />
<br />
Otros pares de lenguas que están siendo activamente desarrollados pero no poseen aún una versión estable son: francés-español, inglés-afrikáans, inglés-galés, catalán-rumano, español-euskera e inglés-polaco. Los pares estables (e inestables aunque sin garantías) se pueden probar a través de nuestra web en http://xixona.dlsi.ua.es/apertium/.<br />
<br />
==¿Qué calidad ofrecen?==<br />
<br />
La calidad de las traducciones finales depende, en gran medida, del tiempo invertido en el desarrollo de un par determinado y la cercanía de las lenguas. Por ejemplo, entre español y catalán se consigue un porcentaje de éxito del 95%; entre español y portugués del 90%. Para lenguas más alejadas y sin versión estable como inglés-afrikáans este porcentaje, sin contar las palabras desconocidas, está alrededor del 70%.<br />
<br />
==Descargas==<br />
<br />
Las versiones más recientes del motor, datos lingüísticos, documentación y otras herramientas se pueden descargar de la página del proyecto en SourceForge ([http://www.sf.net/projects/apertium/ http://www.sf.net/projects/apertium/]). Se puede encontrar documentación e información adicional tanto en nuestro wiki ([http://xixona.dlsi.ua.es/wiki/ http://xixona.dlsi.ua.es/wiki/]) como en nuestra lista de distribución ([mailto:apertium-stuff@lists.sf.net apertium-stuff@lists.sf.net]).<br />
<br />
==Desarrollo==<br />
<br />
El proyecto busca continuamente desarrolladores interesados en mejorar el motor y los datos existentes, en trabajar en nuevos pares de lenguas (especialmente aquellos que incluyen lenguas minoritarias o con pocos recursos), en crear interfaces o adaptar el software a necesidades particulares. También se agradece la aportación de datos y corpus libres (GPL) que sean reutilizables para mejorar los diccionarios de Apertium.<br />
<br />
==Aplicaciones==<br />
<br />
*Gestión de webs con contenidos multilingües usadas, por ejemplo, por medios de comunicación<br />
*Localización rápida de software libre<br />
*Traducción de documentación entre lenguas con muchos recursos y lenguas con pocos recursos<br />
<br />
=Português=<br />
<br />
Apertium ([http://www.apertium.org http://www.apertium.org]) é uma plataforma de tradução automática de código aberto (GPL) que foi projetada inicialmente para traduzir entre línguas românicas da península Ibérica, no entanto atualmente seu uso se expandiu para pares de línguas mais distantes.<br />
<br />
==Quem o desenvolve?==<br />
A máquina de Apertium está sendo desenvolvida pelo grupo Transducens formado por pesquisadores do departamento de linguagens e sistemas informáticos da Universidade de Alicante em uma associação com a empresa Prompsit Language Engineering, uma spin-off desta mesma Universidade. O grupo Transducens e Prompsit também são responsáveis pelo desenvolvimento dos dados lingüísticos junto com o Seminario de Lingüística Informàtica da Universidade de Vigo, o Institut Universitari de Lingüística Aplicada da Universitat Pompeu Fabra de Barcelona e empresas como Imaxin|software e Eleka Ingenieritza Linguistikoa. Também existe um considerável aporte voluntário de desenvolvedores de software livre tanto espanhóis como estrangeiros.<br />
<br />
==Financiamento==<br />
<br />
O Ministério Espanhol de Indústria, Turismo e Comércio financiou parcialmente o desenvolvimento inicial do motor e dos dados lingüísticos dos primeiros pares de línguas: espanhol-catalão e espanhol-galego. O projeto também obteve outras financiações: da Universidade de Alicante (para o par espanhol-português e outros), da Generalitat de Catalunya (melhora do motor para o tratamento de línguas distantes e pares inglês- catalão, occitano-catalão, francês-catalão e occitano-espanhol), do Ministério de Assuntos Exteriores da Romênia (pares espanhol-romeno e catalão-romeno), etc.<br />
<br />
==Pares de línguas disponíveis==<br />
<br />
Atualmente são oferecidos sete pares de línguas que fazem uso da plataforma Apertium:<br />
*Espanhol-Catalão<br />
*Espanhol-Português<br />
*Espanhol-Galego<br />
*Catalão-Francês<br />
*Catalão-Occitano<br />
*Espanhol-Romeno<br />
*Inglês-Catalão<br />
<br />
Outros pares de línguas estão em fase de desenvolvimento, porém ainda não apresentam uma versão estável. São eles: Francês-Espanhol, Inglês-Africâner, Inglês-Galês, Catalão-Romeno, Espanhol-Basco e Inglês-Polonês. As versões estáveis (e as instáveis, embora sem garantias) podem ser testadas através de nossa web no endereço http://xixona.dlsi.ua.es/apertium/.<br />
<br />
==Que qualidade oferece?==<br />
<br />
A qualidade das traduções depende basicamente do tempo investido no desenvolvimento de um determinado par de línguas e da proximidade existente entre elas. Por exemplo, Espanhol-Catalão tem um percentual de acerto de 95%, enquanto Espanhol-Português tem aproximadamente 90%. No entanto, para línguas não aparentadas em versões instáveis, como o par Inglês-Africâner, o grau de acerto, excluindo palavras desconhecidas, é de aproximadamente 70%.<br />
<br />
==Downloads==<br />
<br />
A plataforma Apertium em sua versão atualizada, tanto do motor quanto dos dados lingüísticos e da documentação, pode ser encontrada na página do projeto em SourceForge ([http://www.sf.net/projects/apertium/ http://www.sf.net/projects/apertium/]). Para mais informações e debates entre no nosso wiki ([http://xixona.dlsi.ua.es/wiki/ http://xixona.dlsi.ua.es/wiki/]) ou na nossa lista de distribuição ([mailto:apertium-stuff@lists.sf.net apertium-stuff@lists.sf.net]).<br />
<br />
==Desenvolvimento==<br />
<br />
O projeto está constantemente em busca de desenvolvedores que estejam interessados em melhorar a máquina e os dados existentes, trabalhando em um par de línguas novo (principalmente aqueles que envolvem línguas menos usadas ou com menos recursos), criando interfaces ou adaptando o software para suas próprias necessidades. Também são bem-vindas contribuições de corpus e dados livres (GPL) que possam ser reutilizados a fim de aprimorar os dicionários do Apertium.<br />
<br />
==Aplicações==<br />
<br />
*Administração de sites web com conteúdos multilíngües usados, por exemplo, por meios de comunicação<br />
*Localização rápida de software livre<br />
*Tradução de documentos entre línguas com muitos e com poucos recursos.<br />
<br />
=Català=<br />
<br />
=Afrikaans=<br />
Apertium ([http://www.apertium.org http://www.apertium.org]) is vrye sagteware (GPL) vir masjienvertaling. Hoewel dit oorspronklik ontwikkel is om tussen Romaanse tale van die Iberiese Skiereiland te vertaal, word dit tans aangewend vir tale wat verder weg geleë is.<br />
<br />
==Wie ontwikkel dit?==<br />
<br />
Die Apertium-enjin word tans ontwikkel deur die Transducens-navorsingsgroep van die Departement van Sagteware en Rekenaarstelsels aan die Universiteit van Alicante, asook by die maatskappy Prompsit Language Engineering wat daaruit ontstaan het. Die linguistiese data word ontwikkel deur Transducens, die Rekenaarlinguistiek-groep (SLI) van die Universiteit van Vigo, die Universitêre Instituut vir Toegepaste Linguistiek van die Pompeu Fabra Universiteit in Barcelona, 'n aantal maatskappye, waaronder Prompsit Language Engineering, Imaxin|software en Eleka Ingenieritza Linguistikoa, en onafhanklike ontwikkelaars van vrye sagteware in Spanje en oorsee.<br />
<br />
==Befondsing==<br />
<br />
Die ontwikkeling van die vertaalenjin en die drie aanvanklike taalpare, Spaans-Katalaans, Spaans-Galisies en Spaans-Portugees, is deur die Spaanse Ministerie van Nywerheid, Toerisme en Handel befonds. Die projek het ook fondse ontvang van die Universiteit van Alicante en die regering van Katalonië, om die vertaalenjin vir ander taalpare te verbeter en om taalpare soos Engels-Katalaans, Oksitaans-Katalaans en Oksitaans-Spaans te help ontwikkel, asook van die Roemeense Ministerie van Buitelandse Sake, om vertaalprogramme vir Spaans-Roemeens en Katalaans-Roemeens te help ontwikkel.<br />
<br />
==Tale waarmee dit tans werk==<br />
<br />
Die Apertium-platform werk tans met sewe taalpare. Hulle is:<br />
<br />
* Spaans-Katalaans<br />
* Spaans-Portugees<br />
* Spaans-Galisies<br />
* Katalaans-Frans<br />
* Katalaans-Oksitaans<br />
* Spaans-Roemeens<br />
* Engels-Katalaans<br />
<br />
Ander taalpare wat tans aktief ontwikkel word, maar wat nog nie amptelik beskikbaar gestel is nie, is Frans-Spaans, Engels-Afrikaans, Engels-Wallies, Katalaans-Roemeens, Spaans-Baskies en Engels-Pools. Taalpare met werkende weergawes (asook tale met half-werkende weergawes) kan beproef word by http://xixona.dlsi.ua.es/apertium/.<br />
<br />
==Hoe goed is die vertaling?==<br />
<br />
Die gehalte van die eindvertalings hang in 'n groot mate af van die hoeveelheid ontwikkeling wat reeds gedoen is, asook hoe naby die tale aan mekaar verwant is. Die enjin vir Spaans-Katalaans is byvoorbeeld 95% akkuraat, en Spaans-Portugees is omtrent 90% akkuraat. Tale wat nie so ná aan mekaar verwant is nie, byvoorbeeld Engels-Afrikaans, se akkuraatheid is ongeveer 70% (mits alle woorde in die teks bekend is). <br />
<br />
==Waar om af te laai==<br />
<br />
Huidige weergawes van die enjin, linguistiese data en dokumentasie kan afgelaai word op ons SourceForge-projekbladsy ([http://www.sf.net/projects/apertium/ http://www.sf.net/projects/apertium/]). Ander dokumentasie en vorige gesprekke kan op ons wiki ([http://xixona.dlsi.ua.es/wiki/ http://xixona.dlsi.ua.es/wiki/]) en poslys ([mailto:apertium-stuff@lists.sf.net apertium-stuff@lists.sf.net]) verkry word.<br />
<br />
==Ontwikkeling==<br />
<br />
Die projek verwelkom ontwikkelaars wat graag die enjin en bestaande data wil help verbeter, aan nuwe taalpare wil begin werk (veral tale wat minder algemeen is of waarvoor daar min hulpbronne bestaan), koppelvlakke wil skryf, of die sagteware vir hul eie behoeftes wil aanpas. Bestaande vrye (GPL) data en korpusse wat maklik vir Apertium se woordeboeke aangepas kan word, is ook welkom.<br />
<br />
==Toepassings==<br />
<br />
* Veeltalige hantering van webinhoud soos media<br />
* Vinnige lokalisasie van vrye sagteware<br />
* Vertaling van dokumentasie tussen meer gebruikte en minder gebruikte tale<br />
<br />
[[Category:Promotion HQ]]<br />
[[Category:Documentation]]<br />
[[Category:Documentation in English]]<br />
[[Category:Documentation en français]]<br />
[[Category:Documentación en castellano]]<br />
[[Category:Dokumentado en Esperanto]]</div>Tino Didriksenhttps://wiki.apertium.org/w/index.php?title=Project_Management_Committee&diff=73360Project Management Committee2021-04-14T09:18:38Z<p>Tino Didriksen: </p>
<hr />
<div>The '''Project Management Committee''' is a group of seven Apertium committers, elected to be in charge of things like granting commit rights, signing off on releases, managing repositories and web sites, distributing funds and so on. See the [[Bylaws#Project Management Committee|Bylaws]] for details.<br />
<br />
After the elections in [[PMC_election|March 2020]], the committee is composed of the following members:<br />
<br />
{| class="wikitable"<br />
|-<br />
! Name !! status<br />
|-<br />
| Francis M. Tyers || president<br />
|-<br />
| Mikel L. Forcada || elected<br />
|-<br />
| Tino Didriksen || elected<br />
|-<br />
| Xavier Ivars || elected<br />
|-<br />
| Jonathan North Washington || elected<br />
|-<br />
| Sushain K. Cherivirala || elected<br />
|-<br />
| Tanmai Khanna || elected<br />
|}<br />
<br />
Due to participation in GSoC, Tanmai Khanna's appointment was delayed until there was no conflict of interest.<br />
<br />
[[Category:Project Management Committee]][[Category:Governance]]</div>Tino Didriksenhttps://wiki.apertium.org/w/index.php?title=Ideas_for_Google_Summer_of_Code&diff=73327Ideas for Google Summer of Code2021-04-07T13:51:18Z<p>Tino Didriksen: </p>
<hr />
<div>{{TOCD}}<br />
This is the ideas page for [[Google Summer of Code]], where you can find ideas for interesting projects that would make Apertium more useful for people and improve or expand our functionality. If you have an idea, please add it below; if you think you could mentor someone in a particular area, add your name to "Interested mentors" using <nowiki>~~~</nowiki>.<br />
<br />
The page is intended as an overview of the kind of projects we have in mind. If one of them particularly piques your interest, please come and discuss with us on <code>#apertium</code> on <code>irc.freenode.net</code>, mail the [[Contact|mailing list]], or draw attention to yourself in some other way. <br />
<br />
Note that if you have an idea that isn't mentioned here, we would be very interested to hear about it.<br />
<br />
Here are some more things you could look at:<br />
<br />
* [[Top tips for GSOC applications]] <br />
* Get in contact with one of our long-serving [[List of Apertium mentors|mentors]] &mdash; they are nice, honest!<br />
* Pages in the [[:Category:Development|development category]]<br />
* Resources that could be converted or expanded in the [[incubator]]. Consider doing or improving a language pair (see [[incubator]], [[nursery]] and [[staging]] for pairs that need work)<br />
* Unhammer's [[User:Unhammer/wishlist|wishlist]]<br />
* The open issues [https://github.com/search?q=org%3Aapertium&state=open&type=Issues on Github] - especially the [https://github.com/search?q=org%3Aapertium+label%3A%22good+first+issue%22&state=open&type=Issues Good First Issues].<br />
<br />
__TOC__<br />
<br />
If you're a student trying to propose a topic, the recommended way is to request a wiki account and then go to <pre>http://wiki.apertium.org/wiki/User:[[your username]]/GSoC2021Proposal</pre> and click the "create" button near the top of the page. It's also nice to include <code><nowiki>[[Category:GSoC_2021_student_proposals]]</nowiki></code> to help organize submitted proposals.<br />
<br />
== Ideas ==<br />
<br />
{{IdeaSummary<br />
| name = Python API for Apertium<br />
| difficulty = medium<br />
| skills = C++, Python<br />
| description = Update the Python API for Apertium to expose all Apertium modes and test with all major OSes<br />
| rationale = The current Python API misses out on a lot of functionality, like phonemicisation, segmentation, and transliteration, and doesn't work for some OSes <s>like Debian</s>.<br />
| mentors = [[User:Francis Tyers|Francis Tyers]]<br />
| more = /Python API<br />
}}<br />
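<br />
As a rough illustration of the kind of API surface this project could produce, the sketch below builds a pipeline invocation generically per mode, so phonemicisation, segmentation and transliteration get the same treatment as translation. The mode-naming scheme and helper names are assumptions for illustration, not the current apertium-python API.<br />

```python
# Hypothetical sketch only: the mode-naming scheme and helpers below are
# assumptions, not the existing apertium-python API.
def mode_name(src, trg=None, task='translate'):
    """Build an Apertium mode name; non-translation tasks get a suffix (assumed)."""
    base = src if trg is None else f'{src}-{trg}'
    return base if task == 'translate' else f'{base}-{task}'

def build_command(src, trg=None, task='translate'):
    """Command line a generic wrapper could hand to subprocess.run()."""
    return ['apertium', mode_name(src, trg, task)]

print(build_command('eng', 'spa'))  # ['apertium', 'eng-spa']
```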
<br />
{{IdeaSummary<br />
| name = OmniLingo and Apertium<br />
| difficulty = medium<br />
| skills = JS, Python<br />
| description = OmniLingo is a language learning system for practising listening comprehension using Apertium data. There is a lot of text processing involved (for example tokenisation) that could be aided by Apertium tools. <br />
| rationale = <br />
| mentors = [[User:Francis Tyers|Francis Tyers]]<br />
| more = /OmniLingo<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Web API extensions<br />
| difficulty = medium<br />
| skills = Python<br />
| description = Update the web API for Apertium to expose all Apertium modes <br />
| rationale = The current Web API misses out on a lot of functionality, like phonemicisation, segmentation, and transliteration<br />
| mentors = [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Jonathan Washington]], [[User:Xavivars|Xavi Ivars]]<br />
| more = /Apertium APY<br />
}}<br />
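<br />
For orientation, APy already serves translation via query parameters, and new modes would presumably be exposed the same way. A minimal sketch of building such a request URL — the /translate endpoint with langpair and q, and the default port 2737, are APy's; the helper itself is illustrative:<br />

```python
from urllib.parse import urlencode

def apy_translate_url(base, pair, text):
    # APy serves translation at /translate?langpair=src|trg&q=...
    return f"{base}/translate?" + urlencode({'langpair': '|'.join(pair), 'q': text})

print(apy_translate_url('http://localhost:2737', ('eng', 'spa'), 'hello'))
# http://localhost:2737/translate?langpair=eng%7Cspa&q=hello
```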
<br />
{{IdeaSummary<br />
| name = Develop a morphological analyser<br />
| difficulty = easy<br />
| skills = XML or HFST or lexd<br />
| description = Write a morphological analyser and generator for a language that does not yet have one<br />
| rationale = A key part of an Apertium machine translation system is a morphological analyser and generator. The objective of this task is to create an analyser for a language that does not yet have one.<br />
| mentors = [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Jonathan Washington]], [[User: Sevilay Bayatlı|Sevilay Bayatlı]], Hossep, nlhowell<br />
| more = /Morphological analyser<br />
}}<br />
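<br />
To make the task concrete: a morphological generator maps an analysis (lemma plus tags) to a surface form, and an analyser does the reverse. A toy Python sketch of the generation direction — real Apertium analysers are written in lttoolbox XML, HFST or lexd, and the entries here are invented:<br />

```python
# Toy illustration, not an Apertium formalism: a paradigm maps an
# inflection-tag sequence to a suffix, and the lexicon assigns paradigms.
paradigms = {'n__s': {('sg',): '', ('pl',): 's'}}
lexicon = {'cat': 'n__s', 'dog': 'n__s'}

def generate(lemma, tags):
    """Surface form for lemma + inflection tags, e.g. ('cat', ['pl']) -> 'cats'."""
    return lemma + paradigms[lexicon[lemma]][tuple(tags)]

print(generate('cat', ['pl']))  # cats
```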
<br />
{{IdeaSummary<br />
| name = Support for Enhanced Dependencies in UD Annotatrix<br />
| difficulty = medium<br />
| skills = NodeJS<br />
| description = UD Annotatrix is an annotation interface for Universal Dependencies, but does not yet support all functionality<br />
| rationale = <br />
| mentors = [[User:Francis Tyers|Francis Tyers]]<br />
| more = /Morphological analyser<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = User-friendly lexical selection training<br />
| difficulty = Medium<br />
| skills = Python, C++, shell scripting<br />
| description = Make it so that training/inference of lexical selection rules is a more user-friendly process<br />
| rationale = Our lexical selection module allows for inferring rules from corpora and word alignments, but the procedure is currently a bit messy, with various scripts involved that require lots of manual tweaking, and many third party tools to be installed. The goal of this task is to make the procedure as user-friendly as possible, so that ideally only a simple config file would be needed, and a driver script would take care of the rest.<br />
| mentors = [[User:Unhammer|Unhammer]], [[User:Mlforcada|Mikel Forcada]]<br />
| more = /User-friendly lexical selection training<br />
}}<br />
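<br />
For instance, the envisioned single config might look like the fragment below; every key name is hypothetical, meant only to show the level of simplicity being aimed for.<br />

```python
# Hypothetical driver-script config: all keys are invented for illustration.
config = {
    'pair': 'eng-spa',                 # language pair to train rules for
    'corpus': 'parallel-corpus.txt',   # path to a sentence-aligned corpus
    'aligner': 'fast_align',           # which third-party word aligner to call
    'iterations': 3,                   # training rounds
}
print(config['pair'])
```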
<br />
{{IdeaSummary<br />
| name = Robust tokenisation in lttoolbox<br />
| difficulty = Medium<br />
| skills = C++, XML, Python<br />
| description = Improve the longest-match left-to-right tokenisation strategy in [[lttoolbox]] to be fully Unicode compliant.<br />
| rationale = One of the most frustrating things about working with Apertium on texts "in the wild" is the way that the tokenisation works. If a letter is not specified in the alphabet, it is dealt with as whitespace, so e.g. you get unknown words split in two so you can end up with stuff like ^G$ö^k$ı^rmak$ which is terrible for further processing. <br />
| mentors = [[User:Francis Tyers|Francis Tyers]], [[User:TommiPirinen|Flammie]]<br />
| more = /Robust tokenisation<br />
}}<br />
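<br />
The failure mode described above is easy to reproduce. In the sketch below (illustrative, not the lttoolbox implementation), any character missing from the alphabet acts as a delimiter, so an ASCII-only alphabet shatters "Gökırmak" into pieces, while an alphabet covering the word's letters keeps it whole.<br />

```python
import string

def tokenise(text, alphabet):
    """Group consecutive alphabet characters; anything else splits tokens."""
    tokens, current = [], ''
    for ch in text:
        if ch in alphabet:
            current += ch
        elif current:
            tokens.append(current)
            current = ''
    if current:
        tokens.append(current)
    return tokens

ascii_letters = set(string.ascii_letters)
print(tokenise('Gökırmak', ascii_letters))            # ['G', 'k', 'rmak'] — broken
print(tokenise('Gökırmak', set('Gökırmak')))          # ['Gökırmak'] — intact
```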
<br />
{{IdeaSummary<br />
| name = apertium-separable language-pair integration<br />
| difficulty = Medium<br />
| skills = XML, a scripting language (Python, Perl), some knowledge of linguistics and/or at least one relevant natural language<br />
| description = Choose a language you can identify as having a good number of "multiwords" in the lexicon. Modify all language pairs in Apertium to use the [[Apertium-separable]] module to process the multiwords, and clean up the dictionaries accordingly.<br />
| rationale = Apertium-separable is a newly developed module to process lexical items with discontiguous dependencies, an area where Apertium has traditionally fallen short. Despite all the module has to offer, it has only been put to use in small test cases, and hasn't been integrated into any translation pair's development cycle.<br />
| mentors = [[User:Firespeaker|Jonathan Washington]]<br />
| more = /Apertium separable<br />
}}<br />
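<br />
Some background on the stream involved: Apertium modules pass text as lexical units wrapped in ^…$, with the lemma followed by tags in angle brackets (multiwords can carry an invariable queue after a #). A hedged sketch of pulling units apart — the sample units are invented, and this is not the apertium-separable parser itself:<br />

```python
import re

LU = re.compile(r'\^(.*?)\$')       # one lexical unit between ^ and $
TAGS = re.compile(r'<([^>]+)>')     # tags such as vblex, past

def parse_unit(unit):
    lemma = unit.split('<', 1)[0]
    return lemma, TAGS.findall(unit)

stream = '^take<vblex><past># out$ ^it<prn><obj>$'  # invented example units
print([parse_unit(u) for u in LU.findall(stream)])
# [('take', ['vblex', 'past']), ('it', ['prn', 'obj'])]
```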
<br />
{{IdeaSummary<br />
| name = UD and Apertium integration<br />
| difficulty = Entry level<br />
| skills = python, javascript, HTML, (C++)<br />
| description = Create a range of tools for making Apertium compatible with Universal Dependencies<br />
| rationale = Universal dependencies is a fast growing project aimed at creating a unified annotation scheme for treebanks. This includes both part-of-speech and morphological features. Their annotated corpora could be extremely useful for Apertium for training models for translation. In addition, Apertium's rule-based morphological descriptions could be useful for software that relies on Universal dependencies.<br />
| mentors = [[User:Francis Tyers]] [[User:Firespeaker| Jonathan Washington]]<br />
| more = /UD and Apertium integration <br />
}}<br />
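<br />
One small building block such tools would need is a mapping from Apertium tags to UD part-of-speech labels. A partial, illustrative sketch — the table below covers only a few tags and is not a complete or official correspondence:<br />

```python
# Partial, illustrative tag map; a real converter would cover far more tags.
APERTIUM_TO_UPOS = {'n': 'NOUN', 'vblex': 'VERB', 'adj': 'ADJ',
                    'prn': 'PRON', 'det': 'DET', 'adv': 'ADV'}

def upos(tags):
    """UD POS for an Apertium tag list; 'X' when no tag is recognised."""
    return next((APERTIUM_TO_UPOS[t] for t in tags if t in APERTIUM_TO_UPOS), 'X')

print(upos(['vblex', 'past']))  # VERB
```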
<br />
{{IdeaSummary<br />
| name = rule visualization tools<br />
| difficulty = Medium<br />
| skills = python? javascript? XML<br />
| description = make tools to help visualize the effect of various rules<br />
| rationale = TODO see https://github.com/Jakespringer/dapertium for an example<br />
| mentors = [[User:Firespeaker|Jonathan Washington]], [[User:Sevilay Bayatlı|Sevilay Bayatlı]]<br />
| more = /Visualization tools<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = dictionary induction from wikis<br />
| difficulty = Medium<br />
| skills = MySQL, mediawiki syntax, perl, maybe C++ or Java; Java, Scala, RDF, and DBpedia to use DBpedia extraction<br />
| description = Extract dictionaries from linguistic wikis<br />
| rationale = Wiki dictionaries and encyclopedias (e.g. omegawiki, wiktionary, wikipedia, dbpedia) contain information (e.g. bilingual equivalences, morphological features, conjugations) that could be exploited to speed up the development of dictionaries for Apertium. This task aims at automatically building dictionaries by extracting different pieces of information from wiki structures such as interlingual links, infoboxes and/or from dbpedia RDF datasets.<br />
| mentors = [[User:Firespeaker|Jonathan Washington]]<br />
| more = /Dictionary induction from wikis<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = unit testing framework<br />
| difficulty = Medium<br />
| skills = perl<br />
| description = adapt https://github.com/TinoDidriksen/regtest for general Apertium use. [https://github.com/TinoDidriksen/regtest/wiki Screenshots of regtest action]<br />
| rationale = We are gradually improving our quality control, with (semi-)automated tests, but these are done on the Wiki on an ad-hoc basis. Having a unified testing framework would allow us to be able to more easily track quality improvements over all language pairs, and more easily deal with regressions.<br />
| mentors = [[User:Xavivars|Xavi Ivars]]<br />
| more = /Unit testing<br />
}}<br />
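<br />
The core of such a framework is comparing current pipeline output against stored gold output and reporting regressions. A minimal Python sketch of that comparison step — regtest itself works differently; this only illustrates the idea:<br />

```python
def regression_report(gold, current):
    """Inputs whose current output differs from the stored gold output."""
    return {src: (want, current.get(src))
            for src, want in gold.items() if current.get(src) != want}

gold = {'gato': 'cat', 'perro': 'dog'}
current = {'gato': 'cat', 'perro': 'hound'}
print(regression_report(gold, current))  # {'perro': ('dog', 'hound')}
```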
<br />
{{IdeaSummary<br />
| name = Bring an unreleased translation pair to releasable quality<br />
| difficulty = Medium<br />
| skills = shell scripting<br />
| description = Take an unstable language pair and improve its quality, focusing on testvoc<br />
| rationale = Many Apertium language pairs have large dictionaries and have otherwise seen much development, but are not of releasable quality. The point of this project would be bring one translation pair to releasable quality. This would entail obtaining good naïve coverage and a clean [[testvoc]].<br />
| mentors = [[User:Firespeaker|Jonathan Washington]], [[User:Sevilay Bayatlı|Sevilay Bayatlı]], [[User:Hectoralos|Hèctor Alòs i Font]]<br />
| more = /Make a language pair state-of-the-art<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Develop a prototype MT system for a strategic language pair<br />
| difficulty = Medium<br />
| skills = XML, some knowledge of linguistics and of one relevant natural language <br />
| description = Create a translation pair based on two existing language modules, focusing on the dictionary and structural transfer<br />
| rationale = Choose a strategic set of languages to develop an MT system for, such that you know the target language well and morphological transducers for each language are part of Apertium. Develop an Apertium MT system by focusing on writing a bilingual dictionary and structural transfer rules. Expanding the transducers and disambiguation, and writing lexical selection rules and multiword sequences may also be part of the work. The pair may be an existing prototype, but if it's a heavily developed but unreleased pair, consider applying for "Bring an unreleased translation pair to releasable quality" instead.<br />
| mentors = [[User:Firespeaker|Jonathan Washington]], [[User:Sevilay Bayatlı| Sevilay Bayatlı]], [[User:Hectoralos|Hèctor Alòs i Font]]<br />
| more = /Adopt a language pair<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Website Improvements: Misc<br />
| difficulty = Medium<br />
| skills = html, css, js, python<br />
| description = Improve elements of Apertium's web infrastructure<br />
| rationale = Apertium's website infrastructure [[Apertium-html-tools]] and its supporting API [[APy|Apertium APy]] have numerous open issues. This project would entail choosing a subset of open issues and features that could realistically be completed in the summer. You're encouraged to speak with the Apertium community to see which features and issues are the most pressing.<br />
| mentors = [[User:Firespeaker|Jonathan Washington]], [[User:Xavivars|Xavi Ivars]]<br />
| more = /Website improvements<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Website Improvements: Dictionary Lookup<br />
| difficulty = Medium<br />
| skills = html, css, js, python<br />
| description = Finish implementing dictionary lookup mode in Apertium's web infrastructure<br />
| rationale = Apertium's website infrastructure [[Apertium-html-tools]] and its supporting API [[APy|Apertium APy]] have numerous open issues, including half-completed features like dictionary lookup. This project would entail completing the dictionary lookup feature. Some additional features which would be good to work would include automatic reverse lookups (so that a user has a better understanding of the results), grammatical information (such as the gender of nouns or the conjugation paradigms of verbs), and information about MWEs. See [https://github.com/apertium/apertium-html-tools/issues/105 the open issue on GitHub].<br />
| mentors = [[User:Firespeaker|Jonathan Washington]], [[User:Xavivars|Xavi Ivars]]<br />
| more = /Website improvements<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Website Improvements: Spell checking<br />
| difficulty = Medium<br />
| skills = html, js, css, python<br />
| description = Add a spell-checking interface to Apertium's web tools<br />
| rationale = [[Apertium-html-tools]] has seen some prototypes for spell-checking interfaces (all in stale PRs and branches on GitHub), but none have ended up being quite ready to integrate into the tools. This project would entail polishing up or recreating an interface, and making sure [[APy]] has a mode that allows access to Apertium voikkospell modules. The end result should be a slick, easy-to-use interface for proofing text, with intuitive underlining of text deemed to be misspelled and intuitive presentation and selection of alternatives. See [https://github.com/apertium/apertium-html-tools/issues/390 the open issue on GitHub].<br />
| mentors = [[User:Firespeaker|Jonathan Washington]], [[User:Xavivars|Xavi Ivars]]<br />
| more = /Spell checker web interface<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Website Improvements: Suggestions<br />
| difficulty = Medium<br />
| skills = html, css, js, python<br />
| description = Finish implementing a suggestions interface for Apertium's web infrastructure<br />
| rationale = Some work has been done to add a "suggestions" interface to Apertium's website infrastructure [[Apertium-html-tools]] and its supporting API [[APy|Apertium APy]], whereby users can suggest corrected translations. This project would entail finishing that feature. There are some related [https://github.com/apertium/apertium-html-tools/issues/55 issues] and [https://github.com/apertium/apertium-html-tools/pull/252 PRs] on GitHub.<br />
| mentors = [[User:Firespeaker|Jonathan Washington]], [[User:Xavivars|Xavi Ivars]]<br />
| more = /Website improvements<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Website Improvements: Orthography conversion interface<br />
| difficulty = Medium<br />
| skills = html, js, css, python<br />
| description = Add an orthography conversion interface to Apertium's web tools<br />
| rationale = Several Apertium language modules (like Kazakh, Kyrgyz, Crimean Tatar, and Hñähñu) have orthography conversion modes in their mode definition files. This project would be to expose those modes through [[APy|Apertium APy]] and provide a simple interface in [[Apertium-html-tools]] to use them.<br />
| mentors = [[User:Firespeaker|Jonathan Washington]], [[User:Xavivars|Xavi Ivars]]<br />
| more = /Website improvements<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Apertium Browser Plugin<br />
| difficulty = Medium<br />
| skills = html, css, js, python<br />
| description = Expand functionality of Geriaoueg vocabulary assistant<br />
| rationale = [[Geriaoueg]] is a vocabulary assistant with Firefox/Chrom[e/ium] plugins. These plugins interface with Apertium's web API, [[APy|Apertium APy]], and allow a user to look up (in Apertium's dictionaries) word forms from a web page they're viewing. A Firefox/Chrom[e/ium] plugin should also be able to provide in-browser website translation. This project is to clean up the dictionary lookup functionality and add translation support to the plugins. Some APy features may need to be tweaked, but most of the work in this project will be solely in the plugins.<br />
| mentors = [[User:Firespeaker|Jonathan Washington]], [[User:Xavivars|Xavi Ivars]], [[User:Tino_Didriksen|Tino Didriksen]]<br />
| more = /Geriaoueg browser plugin<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Extend Weighted transfer rules<br />
| difficulty = Medium<br />
| skills = C++, python<br />
| description = The weighted transfer module is currently applied only to the chunker transfer rules; the idea here is to extend it so that it also applies to the interchunk and postchunk transfer rules.<br />
| rationale = See https://github.com/aboelhamd/Weighted-transfer-rules-module as a resource.<br />
| mentors = [[User: Sevilay Bayatlı|Sevilay Bayatlı]]<br />
| more = /Make a module <br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Automatic Error-Finder / Backpropagation<br />
| difficulty = Medium<br />
| skills = python?<br />
| description = Develop a tool to locate the approximate source of translation errors in the pipeline.<br />
| rationale = Being able to generate a list of probable error sources automatically makes it possible to prioritize issues by frequency, frees up developer time, and is a first step towards automated generation of better rules.<br />
| mentors = ???<br />
| more = /Backpropagation<br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Add support for NMT to web API<br />
| difficulty = Medium<br />
| skills = python, NMT<br />
| description = Add support for a popular NMT engine to Apertium's web API<br />
| rationale = Currently, Apertium's web API, [[APy|Apertium APy]], supports only Apertium language modules. But the front end could just as easily interface with an API that supports trained NMT models. The point of the project is to add support for one popular NMT package (e.g., OpenNMT or JoeyNMT) to APy.<br />
| mentors = [[User:Firespeaker|Jonathan Washington]]<br />
| more = <br />
}}<br />
<br />
{{IdeaSummary<br />
| name = Localization (l10n/i18n) of Apertium tools<br />
| difficulty = Medium<br />
| skills = C++<br />
| description = All our command-line tools currently have their messages hardcoded in English, and it would be good if they were localizable. [https://github.com/apertium/organisation/issues/28#issuecomment-803474833 Coding Challenge]<br />
| rationale = ...<br />
| mentors = [[User:Tino_Didriksen|Tino Didriksen]]<br />
| more = [https://github.com/apertium/organisation/issues/28 GitHub issue]<br />
}}<br />
<br />
[[Category:Development]]<br />
[[Category:Google Summer of Code]]</div>Tino Didriksenhttps://wiki.apertium.org/w/index.php?title=Geriaoueg&diff=73296Geriaoueg2021-04-04T09:27:26Z<p>Tino Didriksen: </p>
<hr />
<div>{{Github-unmigrated-tool}}<br />
[[Image:Pantallazo-Geriaoueg.png|thumb|right|Browsing a Breton website with Geriaoueg pop-up vocabulary hints in French.]]<br />
'''Geriaoueg''' is a set of scripts which use Apertium morphological analysers (see [[list of dictionaries]]) and a bilingual wordlist to provide vocabulary help when browsing the web. It was inspired by the BBC's "[http://www.bbc.co.uk/cymru/vocab/ Vocab]" tool. The software can be found in [https://svn.code.sf.net/p/apertium/svn/trunk/apertium-tools/geriaoueg/ <code>apertium-tools/geriaoueg</code>]; it currently works with most web browsers, with the exception of Internet Explorer.<br />
<br />
In order to set up the software for a new pair of languages you will need a morphological analyser which outputs Apertium format analyses (e.g. with [[lttoolbox]] or [[HFST]]) and a bilingual tab-separated wordlist. It is hoped that this will make Apertium format resources more useful to more people, and be a step toward building a full machine translation system.<br />
<br />
There has also been work on browser plugins with similar functionality.<br />
<br />
==Todo==<br />
<br />
* Make it parse HTML better &mdash; there will be a list of websites it is expected to parse, e.g. BBC News, VilaWeb, VOA, Wikipedia<br />
* Make it deal with different character encodings (at least ISO-8859-x and UTF-8)<br />
* Make it work with Internet Explorer<br />
* Extend it to support more languages (at least all of the trunk/ languages &mdash; approx. 20)<br />
* Internationalisation of the interface (.po-ise all strings)<br />
* Make it prettier (see for example BBC Vocab and Lingro)<br />
* An option to only highlight words above/under a certain frequency threshold, like a sliding bar. Beginners can have all the words highlighted, while more advanced readers can have only the most infrequent words highlighted.<br />
* A way to make Basque, Kyrgyz and Sámi work, as well as compound words<br />
* Make it optionally read in dictionaries in [[lttoolbox]] format.<br />
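The frequency-threshold item above could be sketched as follows. This is only an illustration, not part of Geriaoueg: <code>freq</code> is a hypothetical word-frequency table, and the function name is made up.<br />

```python
def words_to_highlight(words, freq, threshold):
    """Keep only the words at or below the frequency threshold.

    Beginners set a high threshold (most words get hints), while
    advanced readers set a low one (only rare words get hints).
    Words missing from the table count as frequency 0, i.e. they
    are always highlighted.
    """
    return [w for w in words if freq.get(w, 0) <= threshold]

# Toy frequency table: common words are skipped at threshold 100.
freq = {"ti": 9000, "gwin": 40, "gurzhuilhenn": 2}
print(words_to_highlight(["ti", "gwin", "gurzhuilhenn"], freq, 100))
```

A sliding bar in the interface would then just adjust <code>threshold</code> and re-run the filter.<br />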
<br />
==External links==<br />
<br />
* [http://elx.dlsi.ua.es/geriaoueg/ Geriaoueg]<br />
* https://github.com/vigneshv59/geriaoueg-firefox (gci 2014 project)<br />
* https://github.com/vigneshv59/geriaoueg-chrome (gci 2014 project)<br />
* https://github.com/GrammarSoft/proofing-webext may also have useful code to parse HTML, specifically [https://github.com/GrammarSoft/proofing-webext/blob/master/js/shared.js#L159 skipNonText] and rest of that file.<br />
<br />
[[Category:Tools]]</div>Tino Didriksenhttps://wiki.apertium.org/w/index.php?title=Translate_without_disambiguation&diff=73270Translate without disambiguation2021-04-02T21:05:42Z<p>Tino Didriksen: wget -> curl</p>
<hr />
<div>Prerequisites:<br />
<br />
<pre><br />
curl -sS https://apertium.projectjj.com/apt/install-nightly.sh | sudo bash<br />
sudo apt install python3-streamparser apertium-eng-ita<br />
</pre><br />
<br />
<pre><br />
$ echo "fruit flies like a banana" | lt-proc -w '/usr/share/apertium/apertium-eng-ita/eng-ita.automorf.bin' | python3 -c $'import streamparser,sys\nfor (b, lu) in streamparser.parse_file(sys.stdin,with_text=True):\n print(b+"[/]".join(["^"+streamparser.reading_to_string(r)+"$" for r in lu.readings]),end="")' | apertium-pretransfer| lt-proc -b '/usr/share/apertium/apertium-eng-ita/eng-ita.autobil.bin' | python3 -c $'import streamparser,sys\nfor (b, lu) in streamparser.parse_file(sys.stdin, with_text=True):\n print(b + "[/]".join(["^"+lu.wordform+"/"+streamparser.reading_to_string(r)+"$" for r in lu.readings]), end="")' | apertium-transfer -b '/usr/share/apertium/apertium-eng-ita/apertium-eng-ita.eng-ita.t1x' '/usr/share/apertium/apertium-eng-ita/eng-ita.t1x.bin' | apertium-interchunk '/usr/share/apertium/apertium-eng-ita/apertium-eng-ita.eng-ita.t2x' '/usr/share/apertium/apertium-eng-ita/eng-ita.t2x.bin' | apertium-postchunk '/usr/share/apertium/apertium-eng-ita/apertium-eng-ita.eng-ita.t3x' '/usr/share/apertium/apertium-eng-ita/eng-ita.t3x.bin' | lt-proc -g '/usr/share/apertium/apertium-eng-ita/eng-ita.autogen.bin' | lt-proc -p '/usr/share/apertium/apertium-eng-ita/eng-ita.autopgen.bin'<br />
<br />
</pre><br />
gives<br />
<pre><br />
Mosche di frutta[/]#volare come piacere[/][/]piace[/]#piacere una banana[/]di banano <br />
</pre><br />
<br />
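The two inline <code>python3 -c</code> snippets in the pipeline do essentially one thing: re-emit every lexical unit with all of its readings kept, joined by <code>[/]</code>. Here is a simplified standalone sketch of that idea; unlike the real <code>streamparser</code>-based one-liners, it ignores escaped characters and superblanks, so treat it as an illustration only:<br />

```python
import re

def keep_all_readings(stream):
    """Rewrite each ^surface/reading1/reading2$ lexical unit so that
    every reading survives as its own ^...$ unit, joined by "[/]".
    Simplified: does not handle escaped ^, $ or / inside units."""
    def repl(match):
        parts = match.group(1).split("/")
        # parts[0] is the surface form; the rest are the readings
        return "[/]".join("^" + r + "$" for r in parts[1:])
    return re.sub(r"\^([^^$]*)\$", repl, stream)
```

For example, <code>keep_all_readings("a ^b/c$ d")</code> keeps the surrounding blanks and emits the single reading as <code>a ^c$ d</code>.<br />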
That was one long pipeline. Put this bit in a file:<br />
<pre><br />
lt-proc -w '/usr/share/apertium/apertium-eng-ita/eng-ita.automorf.bin' | python3 -c $'import streamparser,sys\nfor (b, lu) in streamparser.parse_file(sys.stdin,with_text=True):\n print(b+"[/]".join(["^"+streamparser.reading_to_string(r)+"$" for r in lu.readings]),end="")' | apertium-pretransfer| lt-proc -b '/usr/share/apertium/apertium-eng-ita/eng-ita.autobil.bin' | python3 -c $'import streamparser,sys\nfor (b, lu) in streamparser.parse_file(sys.stdin, with_text=True):\n print(b + "[/]".join(["^"+lu.wordform+"/"+streamparser.reading_to_string(r)+"$" for r in lu.readings]), end="")' | apertium-transfer -b '/usr/share/apertium/apertium-eng-ita/apertium-eng-ita.eng-ita.t1x' '/usr/share/apertium/apertium-eng-ita/eng-ita.t1x.bin' | apertium-interchunk '/usr/share/apertium/apertium-eng-ita/apertium-eng-ita.eng-ita.t2x' '/usr/share/apertium/apertium-eng-ita/eng-ita.t2x.bin' | apertium-postchunk '/usr/share/apertium/apertium-eng-ita/apertium-eng-ita.eng-ita.t3x' '/usr/share/apertium/apertium-eng-ita/eng-ita.t3x.bin' | lt-proc -g '/usr/share/apertium/apertium-eng-ita/eng-ita.autogen.bin' | lt-proc -p '/usr/share/apertium/apertium-eng-ita/eng-ita.autopgen.bin'<br />
</pre><br />
<br />
Call it "eng-ita_ambig.mode" and <code>chmod +x</code> it. Now you can:<br />
<pre><br />
$ echo "fruit flies like a banana" | ./eng-ita_ambig.mode<br />
</pre></div>Tino Didriksenhttps://wiki.apertium.org/w/index.php?title=Prerequisites_for_Debian&diff=73269Prerequisites for Debian2021-04-02T21:05:23Z<p>Tino Didriksen: wget -> curl</p>
<hr />
<div>[[Debian için Gereksinimler|Türkçe]]<br />
<br />
This page shows how to install the standard dependencies of apertium (and related packages) on Debian / Ubuntu / Mint / other Debian-based operating systems.<br />
<br />
<br />
If you don't plan on working on the core C++ packages (but only want to work on / use language pairs), you can install all prerequisites with apt-get, using [[User:Tino Didriksen]]'s repository. The first line here adds this repository to apt; then we can just install the usual way:<br />
<pre><br />
<br />
curl -sS https://apertium.projectjj.com/apt/install-nightly.sh | sudo bash<br />
<br />
# to get the minimal dependencies for building apertium packages:<br />
sudo apt-get -f install apertium-all-dev<br />
<br />
# or, to get all dependencies for building a language from git:<br />
sudo apt-get -f install locales build-essential automake subversion git pkg-config \<br />
gawk libtool apertium-all-dev<br />
</pre><br />
<br />
(Note that you have to run that first line; you should '''not''' install the apertium-related packages that are in the standard Debian/Ubuntu repos if you want to do development, as these are massively out-of-date.)<br />
<br />
If you just want to ''use'' a language pair, you can also install that with e.g. <code>sudo apt-get install apertium-kaz-tat</code>.<br />
<br />
If you want to ''work on'' a language pair, you'll have to [[Minimal installation from SVN|check out the language data from SVN]] and compile it (but you can still skip the stuff about installing apertium/lttoolbox/apertium-lex-tools).<br />
<br />
<br />
Otherwise, e.g. if you want to work on the core C++ packages, install their dependencies with apt-get like this:<br />
<br />
<pre><br />
sudo apt-get -f install subversion build-essential pkg-config gawk libxml2 \<br />
libxml2-dev libxml2-utils xsltproc flex automake libtool libpcre3-dev zlib1g-dev<br />
</pre><br />
<br />
If you need [[Vislcg3#Installing_VISL_CG3|vislcg3/cg-proc/cg-comp]] (Constraint Grammar), you should also do:<br />
<pre><br />
sudo apt-get -f install libboost-dev libgoogle-perftools-dev libicu-dev cmake<br />
</pre><br />
<br />
Once you've installed these packages, continue to [[Minimal installation from SVN]].<br />
<br />
<br />
<br />
<br />
[[Category:Installation]]<br />
[[Category:Documentation in English]]</div>Tino Didriksenhttps://wiki.apertium.org/w/index.php?title=Using_Giellatekno_Divvun_spellers_with_LibreOffice-Voikko_on_Debian&diff=73268Using Giellatekno Divvun spellers with LibreOffice-Voikko on Debian2021-04-02T21:02:55Z<p>Tino Didriksen: wget -> curl</p>
<hr />
<div>This shows how to get Northern Saami spell checking under LibreOffice (and on the command line) on Ubuntu/Debian, using the Voikko plugins and Giellatekno/Divvun language data.<br />
<br />
The guide should also be applicable to other languages supported by Giellatekno/Divvun or Apertium that are set up with spelling.<br />
<br />
==Install prerequisites==<br />
<br />
curl -sS https://apertium.projectjj.com/apt/install-nightly.sh | sudo bash<br />
sudo apt-get install libreoffice-voikko<br />
<br />
<!-- Optionally: --><br />
<br />
<!-- hfst-ospell hfst-ospell-dev --><br />
<br />
Now we would like to do <code>sudo apt-get install libvoikko-dev</code>, but unfortunately that package is currently outdated (gives <code>E: Initialižation of Voikko failed: No valid dictionaries were found</code>), so we have to compile it manually:<br />
<br />
sudo apt-get install hfst-ospell-dev \<br />
locales build-essential automake git pkg-config libtool<br />
git clone https://github.com/voikko/corevoikko/<br />
cd corevoikko/libvoikko<br />
./autogen.sh<br />
./configure --with-dictionary-path=/usr/share/voikko:/usr/lib/voikko --enable-hfst<br />
make -j4<br />
sudo make install<br />
echo 'export LD_LIBRARY_PATH=/usr/local/lib:"${LD_LIBRARY_PATH}"' >> ~/.bash_profile<br />
echo 'export PATH=/usr/local/bin:"${PATH}"' >> ~/.bash_profile<br />
<br />
Now close your terminal and open a new one.<br />
<br />
==Install the language data==<br />
<br />
===Installing Giellatekno/Divvun language data===<br />
You can see available Giellatekno/Divvun language packages by doing<br />
<br />
apt-cache search giella single<br />
<br />
We want Northern Saami, so we do:<br />
<br />
sudo apt-get install giella-sme<br />
<br />
But Giellatekno/Divvun packages currently install the actual speller data into a non-standard directory; we'll put it in a special folder inside our home folder which is always searched.<br />
<br />
First we check where the speller data file is:<br />
<br />
dpkg -L giella-sme | grep zhfst$<br />
<br />
This should show something like <code>/usr/lib/x86_64-linux-gnu/voikko/3/se.zhfst</code>.<br />
<br />
Then we make our target folder and copy the file there:<br />
<br />
mkdir -p ~/.voikko/3<br />
cp /usr/lib/x86_64-linux-gnu/voikko/3/se.zhfst ~/.voikko/3/<br />
<br />
<br />
===Installing Apertium language data===<br />
You can also search for <code>apertium</code> in case you want a language covered there:<br />
<br />
apt-cache search apertium single<br />
<br />
Say we want Kazakh:<br />
<br />
apt-cache search apertium-kaz<br />
<br />
Not all Apertium packages have speller-data, so to avoid disappointment we first check that there's speller data in there:<br />
<br />
dpkg -L apertium-kaz | grep zhfst$<br />
<br />
That gives us something like <code>/usr/share/apertium/apertium-kaz/kaz.zhfst</code>, which is what we want. But again, it's in a non-standard directory, so we copy it into ~/.voikko/3, and also give it a name that LibreOffice will understand:<br />
<br />
mkdir -p ~/.voikko/3<br />
cp /usr/share/apertium/apertium-kaz/kaz.zhfst ~/.voikko/3/kk.zhfst<br />
<br />
<br />
== Test the speller from the command line ==<br />
<br />
echo gafe | voikkospell -s -d se<br />
<br />
This should give something like:<br />
<br />
W: gafe<br />
S: gáfe<br />
S: gábe<br />
S: gáfen<br />
S: gáfes<br />
S: gáffe<br />
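If you want to use these suggestions from a script, the <code>voikkospell -s</code> output shown above is easy to parse. The following is a sketch based only on the <code>W:</code>/<code>S:</code> lines shown here, not a complete parser of voikkospell's output:<br />

```python
def parse_suggestions(output):
    """Map each misspelled word (a "W: " line) to the list of
    suggestions (the "S: " lines that follow it)."""
    suggestions = {}
    current = None
    for line in output.splitlines():
        if line.startswith("W: "):
            current = line[3:]
            suggestions[current] = []
        elif line.startswith("S: ") and current is not None:
            suggestions[current].append(line[3:])
    return suggestions
```

For the example above, this yields a dict mapping <code>gafe</code> to the five suggested corrections.<br />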
<br />
==Test in LibreOffice==<br />
<br />
[[Image:Language-LibreOffice.png|thumb|right|300px|This is where you change languages for your text]]<br />
<br />
* In the status line at the bottom of your document, click on the fourth tab to change language (you may have to select all your text first, if you've already written something)<br />
* It probably won't be listed, so click "More …" and select "Saami, Northern" (it may also be listed as e.g. "Nordsamisk (Noreg)").<br />
<br />
<br />
[[Image:Libre-Office-Voikko.png|thumb|right|300px|Example underlines for Kazakh]]<br />
<br />
Now you should get red lines under words :-)<br />
<br />
<br />
For some languages, you can select them as Document language under Tools→Options→Language Settings→Languages→Default Languages for Documents. Unfortunately, this does not (yet) go for Northern Saami, but it works for e.g. Kazakh.<br />
<br />
== Problems ==<br />
If you have any problems, [http://wiki.apertium.org/wiki/Contact get in touch] and we'll try to help.<br />
<br />
[[Category:Documentation]]<br />
[[Category:Documentation in English]]<br />
[[Category:Spell checking]]</div>Tino Didriksenhttps://wiki.apertium.org/w/index.php?title=Debian_i%C3%A7in_Gereksinimler&diff=73267Debian için Gereksinimler2021-04-02T21:02:34Z<p>Tino Didriksen: wget -> curl</p>
<hr />
<div>[[Prerequisites for Debian|In English]]<br />
<br />
This page shows how to install the standard prerequisites for Apertium on Debian / Ubuntu / Mint / and other Debian-based operating systems.<br />
<br />
If you don't plan on working on the core C++ packages (but only want to use language pairs), you can install everything you need with apt-get, using [[User:Tino Didriksen]]'s repository. The first line adds the repository to apt; the rest can be done the usual way:<br />
<pre><br />
curl -sS https://apertium.projectjj.com/apt/install-nightly.sh | sudo bash<br />
sudo apt-get -f install apertium-all-dev<br />
<br />
# or, to get everything needed to build a language from SVN:<br />
sudo apt-get -f install locales build-essential automake subversion pkg-config \<br />
gawk libtool apertium-all-dev<br />
</pre><br />
<br />
(Important: if you want to do development, you have to run that first line; you should '''not''' install the apertium packages from the standard Debian/Ubuntu repositories, since most of them are outdated.)<br />
<br />
If you just want to ''use'' a language pair, you can install it as in this example: <code>sudo apt-get install apertium-kaz-tat</code>.<br />
<br />
If you want to ''work on'' a language pair, you'll have to [[Minimal installation from SVN|check out the language data from SVN]] and compile it (you still don't need to install the apertium/lttoolbox/apertium-lex-tools packages; the packages above will do the job).<br />
<br />
<br />
Otherwise, e.g. if you want to work on the core C++ packages, you can install their dependencies with apt-get like this:<br />
<br />
<pre><br />
sudo apt-get -f install subversion build-essential pkg-config gawk libxml2 \<br />
libxml2-dev libxml2-utils xsltproc flex automake libtool libpcre3-dev zlib1g-dev<br />
</pre><br />
<br />
If you need [[Vislcg3#Installing_VISL_CG3|vislcg3/cg-proc/cg-comp]] (Constraint Grammar), you should also do:<br />
<pre><br />
sudo apt-get -f install libboost-dev libgoogle-perftools-dev libicu-dev cmake<br />
</pre><br />
Once you've installed these packages, continue with [[Minimal installation from SVN]].<br />
<br />
<br />
<br />
==See also==<br />
* [[Apertium on Ubuntu]] – a more detailed page with Ubuntu/Debian-specific installation instructions and troubleshooting.<br />
<br />
<br />
[[Category:Installation]]<br />
[[Category:Documentation in Turkish]]</div>Tino Didriksen