https://wiki.apertium.org/w/api.php?action=feedcontributions&user=Nemo+bis&feedformat=atomApertium - User contributions [en]2024-03-29T01:58:54ZUser contributionsMediaWiki 1.34.1https://wiki.apertium.org/w/index.php?title=Talk:Czech_and_Slovak&diff=59462Talk:Czech and Slovak2016-08-09T22:08:20Z<p>Nemo bis: Created page with "According to [https://www.academia.edu/4080349/Mutual_Intelligibility_of_Languages_in_the_Slavic_Family Lindsay], there are formal intelligibility studies that found written C..."</p>
<hr />
<div>According to [https://www.academia.edu/4080349/Mutual_Intelligibility_of_Languages_in_the_Slavic_Family Lindsay], there are formal intelligibility studies that found written Czech and Slovak to be 98 % intelligible in both directions. The source might be Nabelkova, M. 2007. Closely related languages in contact: Czech, Slovak, "Czechoslovak".<br />
International Journal of the Sociology of Language 183: 53-73. [[User:Nemo bis|Nemo]] ([[User talk:Nemo bis|talk]]) 00:08, 10 August 2016 (CEST)</div>Nemo bishttps://wiki.apertium.org/w/index.php?title=Governance&diff=54170Governance2015-09-25T06:19:32Z<p>Nemo bis: /* Proccess to write a constitution */ typo</p>
<hr />
<div>{{TOCD}}<br />
[http://www.apertium.org Apertium] is turning into a rather large and complex free/open-source project and is ready for a more formal governance scheme. Important decisions, such as<br />
<br />
* settling a higher-level dictionary format in the spirit of [[metadix]] <br />
* deciding which standard of a language will be adopted, as was done with Occitan<br />
* deciding whether Apertium will move away from [http://www.sf.net SourceForge]<br />
* etc.<br />
<br />
will have to be taken in the near future in a way that is accepted by a large majority of the Apertium community.<br />
<br />
As of January 19, 2009, the project has 6 administrators (ftyers, g-ramirez, mlforcada, sanmarf, sortiz and xgg) and 79 developers, all of whom were appointed by co-option.<br />
<br />
Perhaps we need to draft a brief constitution, taking the following points into consideration:<br />
* How is it decided whether a developer is appointed or dismissed? (So far, administrators have freely appointed developers, and no developer has been dismissed.)<br />
* Should decisions affecting the engine and compilers be handled differently from those affecting a language pair or other components?<br />
* How is it decided whether a developer becomes an administrator? (So far, this has been decided in the same way as appointing developers.)<br />
* Defining voting rights to take decisions:<br />
** do all decisions need the same degree of consensus?<br />
** should developers have the right to vote?<br />
** should the vote of everyone have the same weight? For instance, should it depend on the number of commits?<br />
** will there be vetoing rights?<br />
** how large a majority will be needed to adopt a decision?<br />
<br />
==Process to write a constitution==<br />
<br />
* 1st. Every developer sends to <code>apertium-stuff</code> a list of 6 developers who, in his or her opinion, should be part of the group of 9 people that will write a proposal on how decisions are to be taken and similar matters.<br />
<br />
* 2nd. The 9 people with the most votes draft a proposal.<br />
<br />
* 3rd. The rest of the developers may propose amendments to the original proposal.<br />
<br />
* 4th. The final proposal is approved or rejected in a referendum in which all developers can vote.<br />
<br />
NOTE: To participate in the process, one must have become a developer before November 11.<br />
===First stage: Voting===<br />
<br />
Votes should be sent to the <code>apertium-stuff</code> mailing list before the deadline of December 11.<br />
<br />
Table of votes registered to date:<br />
<br />
{|class="wikitable sortable"<br />
! Person !! Votes<br />
|-<br />
|Fran || 10<br />
|-<br />
|Mikel || 10<br />
|-<br />
|Felipe || 9<br />
|-<br />
|Gema || 9<br />
|-<br />
|Jimmy || 9<br />
|-<br />
|Sergio || 8<br />
|-<br />
|Juan Antonio || 5<br />
|-<br />
|Jacob || 4<br />
|-<br />
|Mireia || 4<br />
|-<br />
|Miquel || 1<br />
|-<br />
|Unhammer || 1<br />
|-<br />
|Víctor || 1<br />
|-<br />
|Xavi || 1<br />
|}<br />
<br />
{|class="wikitable sortable"<br />
! People who voted<br />
|-<br />
|Jimmy<br />
|-<br />
|Felipe<br />
|-<br />
|Sergio<br />
|-<br />
|Mikel<br />
|-<br />
|Fran<br />
|-<br />
|Gema<br />
|-<br />
|Juan Antonio<br />
|-<br />
|Jacob<br />
|-<br />
|Víctor<br />
|-<br />
|Unhammer<br />
|-<br />
|Hèctor<br />
|-<br />
|Mireia<br />
|}<br />
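<br />
The counting rule in steps 1 and 2 of the process can be sketched in Python as below. This is a hypothetical illustration, not a tool the project used: the names `tally` and `elect` are made up, a name is assumed to count at most once per ballot, and since the process does not say how a tie for the ninth seat would be broken, the sketch refuses to guess. The totals are copied from the table above; as a sanity check, 12 voters times 6 names gives the 72 votes the table adds up to.<br />
```python
from collections import Counter

NAMES_PER_BALLOT = 6  # step 1: each developer lists 6 names
SEATS = 9             # step 2: the 9 most voted people draft the proposal

# Totals copied from the "Table of votes registered to date".
TOTALS = {
    "Fran": 10, "Mikel": 10, "Felipe": 9, "Gema": 9, "Jimmy": 9,
    "Sergio": 8, "Juan Antonio": 5, "Jacob": 4, "Mireia": 4,
    "Miquel": 1, "Unhammer": 1, "Víctor": 1, "Xavi": 1,
}
VOTERS = 12  # rows in the "People who voted" table


def tally(ballots):
    """Count votes; each ballot must name exactly NAMES_PER_BALLOT distinct people."""
    counts = Counter()
    for ballot in ballots:
        names = set(ballot)
        if len(names) != NAMES_PER_BALLOT:
            raise ValueError(f"invalid ballot: {ballot!r}")
        counts.update(names)
    return counts


def elect(totals, seats=SEATS):
    """Return the `seats` most voted names, refusing to guess on a tie at the cutoff."""
    # sorted() is stable, so equal totals keep their original order.
    ranked = sorted(totals.items(), key=lambda item: -item[1])
    if len(ranked) > seats and ranked[seats - 1][1] == ranked[seats][1]:
        raise ValueError("tie for the last seat; the process does not say how to break it")
    return [name for name, _ in ranked[:seats]]


if __name__ == "__main__":
    assert sum(TOTALS.values()) == VOTERS * NAMES_PER_BALLOT  # 72 votes in total
    print(elect(TOTALS))
```
With the table's totals there is no tie at the cutoff (the ninth total is 4 and the tenth is 1), so the nine seats go to Fran, Mikel, Felipe, Gema, Jimmy, Sergio, Juan Antonio, Jacob and Mireia.<br />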
<br />
==Proposals==<br />
<br />
===Things not to forget (Mikel)===<br />
<br />
* Defining the mission of Apertium as a project<br />
* Defining the roles (perhaps not all of them) and how people acquire those roles: user, developer, admin, module maintainer, member of the assembly, member of the executive committee<br />
* Who grants the license for each module of Apertium?<br />
<br />
===Sergio's proposal===<br />
Translations: [http://www.apertium.org/common/browser.php?dir=es-eo&inurl=http%3A%2F%2Fwiki.apertium.org%2Fwiki%2FGovernance Esperanto].<br />
[http://www.apertium.org/common/browser.php?dir=es-en&inurl=http%3A%2F%2Fwiki.apertium.org%2Fwiki%2FGovernance English].<br />
The following text is only a draft. Please edit, add & remove freely.<br />
<br />
It may seem a rather unfriendly proposal, but the aim is to make decision-making as swift as possible.<br />
<br />
* The governing bodies of Apertium are the Assembly and the Executive Committee.<br />
* The Assembly is made up of an initial group of members (to be determined). There is no limit on the number of members of the Apertium Assembly.<br />
* To join the Apertium Assembly, it is enough to obtain the votes of half the members of the Assembly plus one, or the votes of the entire Executive Committee.<br />
* To leave the Apertium Assembly, it is enough not to reply to an email on the matter for one month, or to express the wish to leave the Assembly.<br />
* The Executive Committee is made up of five members of the Assembly, appointed by it to take day-to-day decisions.<br />
* The Executive Committee is led by a President and has four other members, one of whom acts as Secretary.<br />
* The Executive Committee votes on all its decisions. The President has three votes in each ballot, while the other members have only one each.<br />
* Members leave the Executive Committee for the same reasons as they leave the Assembly, or by personal decision of the President of the Executive Committee, who then appoints another member of the Committee from among the members of the Assembly.<br />
* The Assembly decides only on matters raised by the Executive Committee.<br />
* The Executive Committee must submit to the Assembly any decision that affects the work of half or more of the members of the Assembly, and may consult it on other matters if it deems it necessary.<br />
* Once consulted, the Assembly's decisions are binding: the Assembly may not be used in a merely advisory capacity.<br />
* Decisions that the Executive Committee takes without consulting the Assembly are recorded in a log and presented to the Assembly, which approves or rejects them as a block.<br />
* If the Assembly censures the Committee's management, the President leaves office and the Assembly elects a new Executive Committee.<br />
<br />
<br />
===Mikel's comments===<br />
<br />
* I would propose a slightly larger Executive Committee, with 7 or perhaps 9 members, where the chairperson has a casting vote in the event of a tie<br />
<br />
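To compare Sergio's draft (a President with three votes and four members with one vote each) with Mikel's suggestion (seven members with one vote each and a casting vote for the chair), here is a small Python sketch. It assumes, since neither text says so, that a motion passes when the votes in favour outnumber those against among the members who vote, with abstentions not counted; the function `passes` and the member names are hypothetical.<br />
```python
def passes(weights, yes_voters, no_voters, chair=None):
    """Decide a motion by weighted simple majority of the votes cast.

    weights:    member name -> number of votes held
    yes_voters: members voting in favour; no_voters: members voting against
    chair:      member whose side wins a tie (casting vote), if any
    Members in neither set abstain and are not counted.
    """
    yes = sum(weights[m] for m in yes_voters)
    no = sum(weights[m] for m in no_voters)
    if yes != no:
        return yes > no
    # Tie: only a chair holding a casting vote can break it.
    return chair is not None and chair in yes_voters


# Sergio's draft: 3 + 4 x 1 = 7 votes, so no tie is possible when everyone votes.
sergio = {"President": 3, "A": 1, "B": 1, "C": 1, "D": 1}
# Mikel's suggestion: 7 equal members, the chair holding a casting vote.
mikel = {name: 1 for name in ("Chair", "A", "B", "C", "D", "E", "F")}

print(passes(sergio, {"President", "A"}, {"B", "C", "D"}))           # True: 4 votes to 3
print(passes(sergio, {"President"}, {"A", "B", "C", "D"}))           # False: 3 votes to 4
print(passes(mikel, {"Chair", "A", "B"}, {"C", "D", "E"}, "Chair"))  # True: 3-3 tie, F abstains
```
Under this reading, the President in Sergio's draft needs only one ally to carry a decision, while the four other members together can still outvote the President. With seven equal members, the casting vote only matters when someone abstains or is absent.<br />
<br />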
=== English Translation ===<br />
<br />
* The governing bodies of Apertium are the Assembly and the Executive Committee. <br />
* The Assembly consists of an initial group of members (to be decided). The number of members of the Assembly of Apertium has no limit. <br />
* To join the Assembly of Apertium, one needs the votes of half the members of the Assembly plus one, or the votes of the entire Executive Committee.<br />
* To leave the Assembly of Apertium, it shall suffice not to reply to an email on the matter for a month, or to express the desire to leave the Assembly.<br />
* The Executive Committee consists of five members of the Assembly, appointed by it to make day-to-day decisions.<br />
* The Executive Committee is directed by a President and has four other members, one of whom acts as Secretary.<br />
* The Executive Committee votes on all decisions. The President's vote shall count as three votes, while those of the other members count as only one.<br />
* Members leave the Executive Committee for the same reasons as they leave the Assembly, or by personal decision of the President, who appoints a replacement from among the members of the Assembly.<br />
* The Assembly decides only on matters raised by the Executive Committee.<br />
* The Executive Committee is obliged to submit to the Assembly the decisions that affect the work of half or more of the members of the Assembly, and may consult it on other issues if it believes it necessary.<br />
* Once consulted, the Assembly's decisions are binding: the Assembly may not be used in a merely advisory capacity.<br />
* The decisions that the Executive Committee takes without consulting the Assembly are filed in a journal and presented to the Assembly, which approves or disapproves them as a block.<br />
* The result of a vote of no confidence by the Assembly is the removal of the President and the election of a new Executive Committee by the Assembly.<br />
<br />
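The admission rule in Sergio's draft (the votes of half the Assembly's members plus one, or the votes of the whole Executive Committee) can be written as a one-line check. This is a hypothetical sketch: the draft does not say how "half plus one" is rounded when the Assembly has an odd number of members, so the code assumes floor(n/2) + 1, i.e. an absolute majority, and it takes the Committee size to be the five members the draft specifies. The function name `admitted` is made up.<br />
```python
def admitted(assembly_size, assembly_votes, committee_votes, committee_size=5):
    """Return True if a candidate joins the Assembly under Sergio's draft.

    Assumption: "half of the members + 1" is read as assembly_size // 2 + 1.
    """
    assembly_majority = assembly_size // 2 + 1
    unanimous_committee = committee_votes == committee_size
    return assembly_votes >= assembly_majority or unanimous_committee


print(admitted(10, 6, 0))  # True: 6 of 10 Assembly members
print(admitted(10, 5, 4))  # False: 5 < 6 and the Committee is not unanimous
print(admitted(10, 0, 5))  # True: all five Committee members
```
<br />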
==Interesting reading==<br />
<br />
* [http://gump.apache.org/bylaws.html Apache Gump Project Bylaws]<br />
* [http://helma.org/wiki/Helma+Project+Bylaws/ Helma Project Bylaws]<br />
* [http://ieeexplore.ieee.org/iel5/9518/30166/01385626.pdf Lattemann, Stieglitz (2005) Framework for Governance in Open Source Communities]<br />
* [http://www.techforce.com.br/index.php/news/linux_blog/scientific_study_about_debian_governance_and_organiz%61tion Scientific study about Debian Project governance and social organization]<br />
<br />
[[Category:Governance| ]]</div>Nemo bishttps://wiki.apertium.org/w/index.php?title=User:Nemo_bis/English_and_Italian&diff=49932User:Nemo bis/English and Italian2014-08-21T15:20:35Z<p>Nemo bis: typo</p>
<hr />
<div>I'm interested in [[English and Italian]]. This page is structured as a stub of a GSoC application because I thought of applying in 2014, but I won't, because of [http://www.google-melange.com/gsoc/document/show/gsoc_program/google/gsoc2014/help_page#10._I_would_like_to_participate_in FAQ item 10] ([[user:Nikerabbit|Nikerabbit]] prefers me to be the primary/official mentor of a MediaWiki GSoC project, as I had previously promised).<br />
<br />
==Personals==<br />
;Name: Federico Leva<br />
;E-mail address: FirstLastname@tiscali.it<br />
;Other information that may be useful to contact you: Email is a safe bet with me (unless it bounces, of course): if it was clear that I had to reply or act on it, then it's in my queue. I enable email notifications on every wiki where it's possible, so you also have a few hundred talk pages available, depending on the topic.<br />
;Why is it you are interested in machine translation? : Because I'm interested in language and i18n/l10n, and I've been working on it since 2005 or so. I'm [https://translatewiki.net/wiki/Support/Open_requests translatewiki.net's "pokemaster"] (as [[User:Nikerabbit|Nikerabbit]] once called me), I'm active as a [[MediaZilla:]] triager (in the all-time top 10 for some activity metrics), and I do some [http://www.ohloh.net/accounts/nemobis i18n code tweaking]. Plus other Wikimedia stuff you can find by following links from my user page, too much to list. Machine translation has always been a hot topic in Wikimedia.<br />
;Why is it that you are interested in the Apertium project? : I first came across it on translatewiki.net around 2009–2010, I think; my interest was revived when Niklas took a course on it by Francis Tyers and Tommi Pirinen in 2013 (''[http://laxstrom.name/blag/2013/05/22/on-course-to-machine-translation/ On course to machine translation]'').<br />
;Studies: undergraduate, maths at unimi.it<br />
<br />
==Project==<br />
<br />
[[English and Italian]]!<br />
<br />
* The pair is not released yet, so the GSoC project would actually be two in one: [[Ideas for Google Summer of Code/Adopt a language pair|Adopt a language pair]] + [[Ideas for Google Summer of Code/Make a language pair state-of-the-art|Make a language pair state-of-the-art]].<br />
* Why this pair?<br />
** I want to contribute to Apertium partly because I want to contribute to the Translate extension and to the projects using Translate, and I want to do so in a special way, doing something that nobody else is able to do or interested in doing. Providing Apertium, and hence Translate, with a translation pair seems the best way to do that.<br />
** I asked around, wondering who else in Italy could be interested in developing this Apertium pair, but the university professors I reached out to couldn't think of anyone in Italy interested in any kind of work in this area. There are also very few MediaWiki/Wikimedia developers (zero, in fact) and FLOSS developers (including [http://google-opensource.blogspot.it/2013/08/google-summer-of-code-full-of-stats.html GSoC students]) from Italy. I guess the environment is not favourable, and we may never find anyone interested. Despite everything, and until the imminent implosion of our decaying culture, Italian is still an important language, and having such a pair would certainly be something for Apertium and Google to be proud of, if we succeeded.<br />
** I have an interest in multilingual Wikimedia projects, for a number of reasons. [[Wiktionary:Meta:Special:LanguageStats/it]] is not good at all; [[Wiktionary:Meta:Meta:Babylon/Translation stats|Italian translation is not in the top 20 by activity]] despite Italian-language projects being in the top 7 by visits, and this depresses me. However, I already put too much time into Wikimedia, and one of the few rules I manage to respect is that I don't do translations myself unless there is some important text to proofread; otherwise that alone would be a full-time job. Instead, I can provide the existing translators with a tool that makes their work easier. We can't use Google Translate on Wikimedia projects because of licensing and privacy policies, but we could use Apertium. Our translation memory is fantastic, but it only helps to a degree.<br />
** I'm a language geek, but I'm not that good at learning new languages. I know Italian extremely well thanks to extensive literary reading and intense grammar study and discussion, and I'm good enough at English thanks to many years of daily written use and a passion for Edward Morgan Forster and Virginia Woolf in the original... but I don't know any other language well enough to be useful. I can understand Milanese and Venetian for family reasons, but I can't speak them, and they would not be useful for point (1).<br />
* How is it going to work?<br />
** [[User:Francis Tyers]] gives as the first requirement for a successful pair "Not in Google or can get better quality than in Google". Google seems to be at around 20-25 % WER, so the Europarl/Moses baseline is probably around 25-30 %; we would therefore need to get down to a WER of 30-35 % to be useful and have a reasonable starting point. That doesn't seem impossible.<br />
** Most of the current en-it stub in the incubator comes from the work on Spanish. Since [[English and Spanish]] is a released pair, it should be possible to bring Italian to the same status, probably reusing more of the work done on en-es since 2010, when the last changes to en-it were made.<br />
* What are your qualifications? Why do you think you can manage?<br />
** The following skills are requested by the project description:<br />
*** XML: well, this is "my bread", as we say in Italian; I have handled terabytes of XML exporting wikis for [https://archive.org/details/wikiteam wikiteam];<br />
*** a scripting language (Python, Perl): I don't call myself a coder, but I have been the de facto maintainer of [http://code.google.com/p/wikiteam/source/browse/trunk wikiteam scripts] since about 2012, and I've worked with pywikibot since 2006 (especially grammar fixes and other bot replacements; a love story with regex);<br />
*** good knowledge of the language pair adopted: see above; as past experience, I translated phpBB in 2005, when I was a moderator of one of the biggest forums on Italian studies.[http://achyra.org/cruscate/profile.php?mode=viewprofile&u=55]<br />
** Dedication: I rarely give up on a project I take up, even though it sometimes takes much longer than expected. For instance, I started a [[wikt:it:Wikizionario:Importazione dizionari PD|project to import a public domain Italian vocabulary into the Italian Wiktionary]] around 2009. It proved much harder than expected and, in practice, nobody ever helped me, but I never gave up; it keeps progressing slowly, in bursts. Experience with Apertium should also help me when the time comes to import the transcribed vocabulary into Wiktionary, as I probably won't find a bot owner to delegate this task to.<br />
** Coding challenge: ... (''I expect to be able to do a variant of both in less than a day of work; I will surely try at some point, just for fun'')<br />
** Other <br />
<br />
=== Schedule ===<br />
<br />
I'd make one if I actually applied! It would be interesting to know how much relative effort each part would require, though. Is there a case study of how much work went into the various phases of the en-es pair?<br />
<br />
I don't plan to take any full-time job this summer. I hope to attend a summer course in Finnish in July-August, depending on available time and accommodation; the main exam sessions in my department are in June, July and September. If this project takes more time than expected, I can always reduce the huge pool of hours I commit to Wikimedia/MediaWiki stuff. It would be nice to learn once again how to limit that ;) but I only manage it when forced by other commitments.</div>Nemo bishttps://wiki.apertium.org/w/index.php?title=English_and_Italian/Google_Translate&diff=47706English and Italian/Google Translate2014-03-21T16:24:50Z<p>Nemo bis: /* Method */ qualitative</p>
<hr />
<div>A basic evaluation of Google Translate's English-to-Italian translation was carried out on 2014-03-20, following the [[Evaluation]] instructions and using apertium-eval-translator.pl from the latest trunk. We found a '''21.63 % WER'''.<br />
<br />
== Method ==<br />
<br />
As a base we used about 1000 words of an English leaflet (originally translated from German, which accounts for some peculiarities) by Wikimedia and Creative Commons (which accounts for some specialized terminology): [https://meta.wikimedia.org/w/index.php?title=Free_knowledge_based_on_Creative_Commons_licenses%2Fit&diff=7896398&oldid=7879177 Google Translate], [https://meta.wikimedia.org/w/index.php?title=Free_knowledge_based_on_Creative_Commons_licenses/it&diff=7896990&oldid=7896398 manual corrections].<br />
<br />
Considerations:<br />
* only ungrammatical passages and grammatical errors that twisted the meaning were corrected,<br />
* as well as some inconsistencies in translation and major lexical errors that didn't convey the original meaning at all;<br />
* errors that would not be evident without knowing the source were left alone, as were lexical choices that are disputable but not outright wrong,<br />
* and the text wasn't made as fluent as would be required to completely hide its machine-translation origin.<br />
<br />
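The second result was calculated after removing, in the post-edited reference, the whitespace incorrectly added around punctuation (the reference drops from 994 to 915 words, while the test stays at 984). The difference is very significant (WER rises from 21.63 % to 37.70 %), which confirms our choice not to correct such whitespace errors, so as to avoid excess noise in the evaluation.<br />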
<br />
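To make the effect of whitespace around punctuation concrete, here is a minimal Python sketch of the two metrics reported in the logs, with words taken as whitespace-separated tokens. WER is the word-level edit distance divided by the number of words in the reference; for PER the sketch assumes 1 - (C - max(0, N_test - N_ref)) / N_ref, where C is the number of position-independent correct words. That PER formula is inferred rather than taken from apertium-eval-translator.pl, but it reproduces the PER figures in both logs. The function names and the Italian example sentence are made up.<br />
```python
from collections import Counter


def edit_distance(a, b):
    """Levenshtein distance between two token lists (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def rates(distance, ref_words, test_words, pi_correct):
    """WER and PER in percent from the counts the evaluation script prints."""
    wer = distance / ref_words
    per = 1 - (pi_correct - max(0, test_words - ref_words)) / ref_words
    return 100 * wer, 100 * per


def evaluate(test, ref):
    """Whitespace-tokenize both texts and compute (WER, PER) in percent."""
    t, r = test.split(), ref.split()
    # Position-independent correct words: size of the multiset intersection.
    pi_correct = sum((Counter(t) & Counter(r)).values())
    return rates(edit_distance(t, r), len(r), len(t), pi_correct)


# A stray space before a comma turns one reference word into two test tokens.
mt = "la conoscenza libera , per tutti"
postedit = "la conoscenza libera, per tutti"
print(evaluate(mt, postedit))  # (40.0, 40.0): 2 edits over 5 reference words
```
Feeding the counts from the first log into `rates` (215, 994, 984, 862) gives 21.63 % and 13.28 %, and those from the second log (345, 915, 984, 719) give 37.70 % and 28.96 %, matching the script's output.<br />
<br />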
Qualitative evaluation: anecdotally, responsible translators often say that post-editing a Google translation takes more time and effort than translating on your own, because the meaning twists and errors of all sorts are often catastrophic, so you need to check the source text continuously, and the path chosen by the machine is often the steepest one to follow. Mostly, one saves time by not having to open a dictionary dozens or hundreds of times.<br />
<br />
== First revision ==<br />
Common errors found:<br />
*missing agreement in number (singular/plural) and gender (masculine/feminine) between noun and adjective/pronoun;<br />
*articles, especially definite article vs. no article;<br />
*co-ordinated sentences and pronouns (those... who and the like).<br />
<br />
<pre><br />
$ perl apertium-eval-translator.pl -test MT.txt -ref postedit.txt<br />
Test file: 'MT.txt'<br />
Reference file 'postedit.txt'<br />
<br />
Statistics about input files<br />
-------------------------------------------------------<br />
Number of words in reference: 994<br />
Number of words in test: 984<br />
Number of unknown words (marked with a star) in test: <br />
Percentage of unknown words: 0.00 %<br />
<br />
Results when removing unknown-word marks (stars)<br />
-------------------------------------------------------<br />
Edit distance: 215<br />
Word error rate (WER): 21.63 %<br />
Number of position-independent correct words: 862<br />
Position-independent word error rate (PER): 13.28 %<br />
<br />
Results when unknown-word marks (stars) are not removed<br />
-------------------------------------------------------<br />
Edit distance: 215<br />
Word Error Rate (WER): 21.63 %<br />
Number of position-independent correct words: 862<br />
Position-independent word error rate (PER): 13.28 %<br />
<br />
Statistics about the translation of unknown words<br />
-------------------------------------------------------<br />
Number of unknown words which were free rides: 0<br />
Percentage of unknown words that were free rides: 0%<br />
</pre><br />
<br />
== Second revision ==<br />
<br />
<pre><br />
$ perl apertium-eval-translator.pl -test MT.txt -ref postedit.txt<br />
Test file: 'MT.txt'<br />
Reference file 'postedit.txt'<br />
<br />
Statistics about input files<br />
-------------------------------------------------------<br />
Number of words in reference: 915<br />
Number of words in test: 984<br />
Number of unknown words (marked with a star) in test: <br />
Percentage of unknown words: 0.00 %<br />
<br />
Results when removing unknown-word marks (stars)<br />
-------------------------------------------------------<br />
Edit distance: 345<br />
Word error rate (WER): 37.70 %<br />
Number of position-independent correct words: 719<br />
Position-independent word error rate (PER): 28.96 %<br />
<br />
Results when unknown-word marks (stars) are not removed<br />
-------------------------------------------------------<br />
Edit distance: 345<br />
Word Error Rate (WER): 37.70 %<br />
Number of position-independent correct words: 719<br />
Position-independent word error rate (PER): 28.96 %<br />
<br />
Statistics about the translation of unknown words<br />
-------------------------------------------------------<br />
Number of unknown words which were free rides: 0<br />
Percentage of unknown words that were free rides: 0%<br />
</pre></div>Nemo bishttps://wiki.apertium.org/w/index.php?title=User:Nemo_bis/English_and_Italian&diff=47700User:Nemo bis/English and Italian2014-03-21T16:15:11Z<p>Nemo bis: expand</p>
<hr />
Nemo bishttps://wiki.apertium.org/w/index.php?title=User:Nemo_bis/English_and_Italian&diff=47647User:Nemo bis/English and Italian2014-03-21T12:01:39Z<p>Nemo bis: stub</p>
<hr />
Nemo bishttps://wiki.apertium.org/w/index.php?title=Apertium:Copyrights&diff=47643Apertium:Copyrights2014-03-21T11:13:13Z<p>Nemo bis: better than nothing</p>
<hr />
<div>#REDIRECT [[Using linguistic resources]]</div>Nemo bishttps://wiki.apertium.org/w/index.php?title=User_talk:LA2&diff=47642User talk:LA22014-03-21T11:10:27Z<p>Nemo bis: Created page with "Hi! Nice to see you here, I found you from an edit where you added a category. :D Nice graph and stuff there. --~~~~"</p>
<hr />
<div>Hi! Nice to see you here, I found you from an edit where you added a category. :D Nice graph and stuff there. --[[User:Nemo bis|Nemo]] ([[User talk:Nemo bis|talk]]) 11:10, 21 March 2014 (UTC)</div>Nemo bishttps://wiki.apertium.org/w/index.php?title=Indirect_contribution_guide&diff=47641Indirect contribution guide2014-03-21T10:56:15Z<p>Nemo bis: /* Other */ cat</p>
<hr />
<div>{{TOCD}}<br />
<br />
Many people come to us with a question like "I'm not a programmer/linguist/whatever. Is there any way I can contribute?". This document is intended to show how you can make an "indirect" contribution, by documenting language resources, helping us to build bilingual test sets, translating, promoting, etc.<br />
<br />
===About This Tutorial===<br />
<br />
This tutorial will teach you:<br />
<br />
* How to create contrastive analyses.<br />
* How to catalog resources.<br />
* How to convert dictionaries.<br />
* How to translate.<br />
* How to help Apertium in other ways.<br />
<br />
===Basics===<br />
<br />
When in doubt, ask!<br />
<br />
If you are participating as part of a programme such as Google Summer of Code, or Google Code-In, ask your mentor. Otherwise, ask for help on the [[IRC]] channel, or on the [[mailing list]]. Use the talk pages on the wiki -- leave your questions there, so you don't need to remember them later!<br />
<br />
===Create contrastive analyses===<br />
<br />
A 'contrastive analysis' is a set of example sentences which show the differences and similarities between a pair of languages. In a sense, it's a 'feature corpus' which we can use to develop and test rule hypotheses: if we see that the pattern ''noun + adjective'' becomes ''adjective + noun'', then we have a good basis for building a rule. Think of it as 'raw input to a linguist': once we have a good enough idea of what the two languages look like, we use these analyses to build translation rules.<br />
<br />
One thing to note is that when we see that something happens only 9 or 8 times out of 10, we need to expand the analysis of the exceptional cases to get a better idea of what's happening: do they involve a certain class of words, or are they just isolated exceptions?<br />
<br />
====What you must do====<br />
<br />
Your task is to make a set of test sentences in the first language, and translate them to the other. A translation in a third language may be useful in enlisting help, but is not required.<br />
<br />
A sample sentence in wiki markup looks like this:<br />
<br />
<pre>* {{test|First language abbreviation|First language.|Second language.|Other translation.}}</pre><br />
<br />
Examples:<br />
<br />
<nowiki>* {{test|ru|Чашка большая.|Чашата е голяма.|The cup is big.}}</nowiki><br />
<br />
<nowiki>* {{test|el|Τι γίνεσαι;|Как си?|How are you?}}</nowiki><br />
<br />
<nowiki>* {{test|bg|Вера се оглежда в огледалото.|Вера смотрит на себя в зеркало.|Vera is looking at herself in the mirror.}}</nowiki><br />
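If you collect many of these lines on a wiki page, it can be handy to extract them into a machine-readable test set. The Python sketch below shows one way to do that. It is not an Apertium tool, and the names <code>parse_test_line</code> and <code>TestSentence</code> are hypothetical. It also assumes that no field contains a literal <code>|</code>.<br />

```python
import re
from typing import NamedTuple, Optional

# Finds the {{test|...}} template anywhere on a line, e.g. inside <nowiki> tags.
TEST_LINE = re.compile(r"\{\{test\|(.*?)\}\}")


class TestSentence(NamedTuple):
    lang: str             # abbreviation of the first language, e.g. "ru"
    source: str           # sentence in the first language
    target: str           # translation into the second language
    gloss: Optional[str]  # optional translation into a third language


def parse_test_line(line: str) -> Optional[TestSentence]:
    """Return the fields of a {{test|...}} line, or None if the line has none."""
    match = TEST_LINE.search(line)
    if match is None:
        return None
    fields = match.group(1).split("|")
    if len(fields) not in (3, 4):
        raise ValueError(f"expected 3 or 4 fields, got {len(fields)}: {line!r}")
    gloss = fields[3] if len(fields) == 4 else None
    return TestSentence(fields[0], fields[1], fields[2], gloss)


line = "* {{test|bg|Вера се оглежда в огледалото.|Вера смотрит на себя в зеркало.|Vera is looking at herself in the mirror.}}"
print(parse_test_line(line))
```

The third-language field is treated as optional, since a third-language translation is not required.<br />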
<br />
The following is a suggested list of features to cover. All language pairs have slightly different needs, but this list should provide a good general guideline.<br />
<br />
# Simple syntax<br />
#* Copula<br />
#* Reported speech<br />
#* Clitic placement<br />
# Pronouns<br />
#* Personal<br />
#* Demonstrative<br />
#* Relative<br />
#* Possessive<br />
#* Reflexive<br />
#* Interrogative<br />
# Nouns<br />
#* General<br />
#* Indefinite and definite forms<br />
#* Noun phrases<br />
#** Indefinite (a, some)<br />
#** Definite (the)<br />
#** Demonstrative (this, that)<br />
#** Quantified (a few, no, all)<br />
# Numerals<br />
#* Cardinal<br />
#* Ordinal<br />
# Adjectives<br />
#* Comparative<br />
#* Superlative<br />
# Adverbs<br />
# Verbs<br />
#* To be<br />
#* General<br />
#* Indicative mood<br />
#** Present tense<br />
#** Imperfect tense<br />
#** Aorist tense<br />
#** Perfect tense<br />
#** Pluperfect tense<br />
#** Future tense<br />
#* Conditional mood<br />
#* Imperative mood<br />
# Questions<br />
<br />
If you want, you can also add:<br />
# Interjections<br />
# Punctuation marks<br />
<br />
You can find some examples here: [[Bulgarian_and_Russian/Pending_tests|Bulgarian and Russian]], [[Bulgarian_and_Greek/Pending_tests|Bulgarian and Greek]].<br />
<br />
====How to do it?====<br />
<br />
This is quite an easy task if you know both languages. However, if you only know one language well, concentrate on translating '''to''' that language. We can always find help with the other direction later, and it helps to know ''what'' we need to find help with.<br />
<br />
That said, if you know people who can help, ask them to help you!<br />
<br />
Some tips:<br />
<br />
#Ask your friends if they know the language you don't know.<br />
#* If they do, ask them to help you.<br />
#* If you know more than one person who speaks the language, you will finish the job faster and better.<br />
#** If your friend is busy, queue your questions and send them later.<br />
#Get a textbook for the language that you don't know.<br />
#* For some languages this may not be practical:<br />
#** Textbooks may be hard to find.<br />
#** Textbooks may be expensive.<br />
#* Look on the Internet for textbooks.<br />
#** There are many textbooks, and it is a good idea to look at more than one.<br />
<br />
===Catalogue resources===<br />
<br />
The task is to catalogue as many of the available linguistic resources as possible. Dictionaries, grammars, academic papers, etc., are all resources that show us how a language works, and so help us work out how to translate from it.<br />
<br />
;Linguistic resources<br />
<br />
* wordlists.<br />
* grammatical descriptions.<br />
* dictionaries.<br />
* spellcheckers.<br />
* papers.<br />
* corpora.<br />
* and more...<br />
<br />
;How to do it?<br />
* Use Google, of course.<br />
* Other search engines and academic databases, such as:<br />
** ScienceDirect.<br />
** JSTOR.<br />
** DuckDuckGo.<br />
<br />
See also Wikipedia's article about [http://en.wikipedia.org/wiki/Web_search_engine search engines].<br />
<br />
===Translate===<br />
The task is to translate text (articles on this wiki, for example) into another language. You should speak the target language (the language you are translating into) natively or near-natively -- if you don't, it's best to leave the translation to someone who does.<br />
<br />
;Some tips<br />
* If you aren't sure about something, ask a native speaker or, if possible, a specialist.<br />
* You must not change the meaning of what you are translating.<br />
** Follow the meaning, not the words.<br />
* You must pay attention to language nuances.<br />
* You must pay attention to the tenses.<br />
* Follow the original style:<br />
** If the text style is wordy, colloquial, funny... follow it.<br />
** Pay attention to the punctuation.<br />
<br />
; Pay attention!<br />
* Don't translate if you don't know the language!<br />
* Don't take a translation task if you know that you can't do it!<br />
* Don't take a translation task if you don't have the time to do it!<br />
<br />
===Post-edit===<br />
<br />
Related to the above: post-edit an Apertium translation, and provide us with the result. Having a set of corrections to refer to can help us to refine the translator. We are particularly interested in Open Content text (text under Free licences, such as the GNU FDL or CC BY/BY-SA), which we can freely redistribute, and translations that are under a similar licence.<br />
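One simple way to summarise how much a post-editor had to change is a word-level edit distance between the raw Apertium output and the corrected text. Dividing it by the length of the corrected text gives a measure similar in spirit to word error rate. The Python sketch below is only an illustration: the function name is hypothetical, and Apertium's own evaluation scripts may measure this differently.<br />

```python
def word_edit_distance(hypothesis: str, reference: str) -> int:
    """Minimum number of word insertions, deletions and substitutions
    needed to turn the machine translation into the post-edited text."""
    hyp, ref = hypothesis.split(), reference.split()
    # prev[j] is the distance between the hyp words seen so far and the first j ref words
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete the MT word h
                curr[j - 1] + 1,         # insert the post-edited word r
                prev[j - 1] + (h != r),  # substitute (free if the words match)
            )
        prev = curr
    return prev[-1]


mt = "the cup is the big"
post_edited = "the cup is big"
distance = word_edit_distance(mt, post_edited)
print(distance, distance / len(post_edited.split()))
```

For this toy pair the distance is 1 (one extra word to delete), which is 0.25 edits per word of the post-edited text.<br />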
<br />
Even better, if you have used a computer-assisted translation tool with a translation memory (such as [http://www.omegat.org OmegaT]), providing us with the TMX (Translation Memory eXchange) file can help us in a variety of ways.<br />
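TMX is an XML format, so a file exported from such a tool can be read with Python's standard library. This is a minimal sketch with some assumptions:<br />
* It assumes a TMX 1.4-style file in which each <code>tu</code> element holds one <code>tuv</code> per language, marked with <code>xml:lang</code> (or <code>lang</code> in older files).<br />
* The function name <code>read_tmx_pairs</code> is hypothetical.<br />
* Inline markup inside <code>seg</code> is dropped.<br />

```python
import xml.etree.ElementTree as ET

# ElementTree expands the xml: prefix to this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_tmx_pairs(tmx_text: str, src: str, tgt: str) -> list[tuple[str, str]]:
    """Return (source, target) segment pairs for two primary language codes, e.g. "en", "es"."""
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.iter("tuv"):
            lang = tuv.get(XML_LANG) or tuv.get("lang", "")
            seg = tuv.find("seg")
            if seg is None:
                continue
            # Keep the segment's own text and the text after inline elements,
            # but skip inline elements such as <bpt>/<ept>/<ph>, whose content is formatting code.
            text = (seg.text or "") + "".join(child.tail or "" for child in seg)
            segs[lang.lower().split("-")[0]] = text.strip()
        if src in segs and tgt in segs:
            pairs.append((segs[src], segs[tgt]))
    return pairs


sample = """<tmx version="1.4">
  <header srclang="en" datatype="plaintext" segtype="sentence" adminlang="en"
          creationtool="example" creationtoolversion="1" o-tmf="none"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>The cup is big.</seg></tuv>
      <tuv xml:lang="es"><seg>La taza es grande.</seg></tuv>
    </tu>
  </body>
</tmx>"""
print(read_tmx_pairs(sample, "en", "es"))
```
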
<br />
===Convert dictionaries===<br />
<br />
There are many dictionaries available under free licences that we would like to have converted to Apertium's format. However, it's not always as simple as taking the words: Apertium (usually) allows only a single translation option per word, and there are also some tags that need to be present in Apertium's lexicon to assist in grammatical operations.<br />
<br />
;Tagging<br />
If the word list doesn't have part-of-speech information, you will need to add it.<br />
<br />
;Gender<br />
If one or both of the languages have grammatical gender (masculine, feminine, neuter), the lexicon needs gender information wherever the gender differs between the two languages for a set of words. It's not strictly necessary in other cases, but it '''is''' useful to have in the dictionary for other reasons, so we encourage you to always add grammatical gender!<br />
<br />
;Aspect<br />
Similarly, if you are translating verbs in a language with aspect pairs (e.g., Slavic languages), tag the aspect (even though it's not usually strictly necessary between languages with similar concepts of aspect):<br />
*perf = perfective.<br />
*imperf = imperfective.<br />
<br />
;How to do it?<br />
* If you have friends who speak the language, ask for their help.<br />
* Look in a dictionary.<br />
** You can buy a dictionary; they are not very expensive.<br />
** You can search for a free dictionary on the web.<br />
** It's a good idea to do both.<br />
* Double check: don't rely on a single source.<br />
<br />
<!-- A **good** example is required here. Copy and paste one, don't try to invent junk --><br />
<br />
Examples: <br />
<br />
Bulgarian&rarr;Russian<br />
<br />
Say you've found a Bulgarian-Russian dictionary which says that the imperfective verb обяснява translates into the imperfective обяснять, while the perfective напише translates into the perfective написать. The converted version should look like this:<br />
<br />
<pre><br />
<e><p><l>обяснява<s n="vblex"/><s n="imperf"/><s n="tv"/></l><r>обяснять<s n="vblex"/><s n="imperf"/></r></p></e> <br />
<e><p><l>напише<s n="vblex"/><s n="perf"/><s n="tv"/></l><r>написать<s n="vblex"/><s n="perf"/></r></p></e> <br />
</pre><br />
<br />
However, the difficult part is not converting the data into this XML format, but finding each pair of verbs along with the important information for each pair (here: aspect), and making sure that it is <i>consistent</i> and machine-readable.<br />
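Once a word list has been gathered and tagged, the mechanical part can be scripted, along with a simple consistency check. The Python sketch below is hypothetical in several ways:<br />
* The input layout is invented for illustration and is not an Apertium format: one tab-separated row per pair (left lemma, left tags, right lemma, right tags), with space-separated tags.<br />
* The aspect check is just a heuristic that flags verb pairs for a human to review.<br />
* It writes spaces inside multiword lemmas as <code>&lt;b/&gt;</code>, as Apertium dictionaries do.<br />

```python
import csv
import io
from xml.sax.saxutils import escape, quoteattr

ASPECTS = {"perf", "imperf"}


def side(name: str, lemma: str, symbols: list[str]) -> str:
    """Build an <l> or <r> element: the lemma followed by one <s n="..."/> per tag."""
    # Apertium dictionaries write the space inside a multiword lemma as <b/>
    text = escape(lemma).replace(" ", "<b/>")
    tags = "".join(f"<s n={quoteattr(s)}/>" for s in symbols)
    return f"<{name}>{text}{tags}</{name}>"


def convert(tsv_text: str) -> tuple[list[str], list[str]]:
    """Turn tab-separated rows into bidix <e> entries, collecting rows that need a human look."""
    entries: list[str] = []
    problems: list[str] = []
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for lineno, row in enumerate(reader, start=1):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        if len(row) != 4:
            problems.append(f"line {lineno}: expected 4 columns, got {len(row)}")
            continue
        left, left_tags, right, right_tags = row
        lt, rt = left_tags.split(), right_tags.split()
        # Heuristic: a verb should carry exactly one aspect tag, the same on both sides.
        if "vblex" in lt:
            la, ra = ASPECTS & set(lt), ASPECTS & set(rt)
            if len(la) != 1 or la != ra:
                problems.append(f"line {lineno}: check the aspect of {left} / {right}")
        entries.append(f"<e><p>{side('l', left, lt)}{side('r', right, rt)}</p></e>")
    return entries, problems


sample = (
    "обяснява\tvblex imperf tv\tобяснять\tvblex imperf\n"
    "напише\tvblex perf tv\tнаписать\tvblex perf\n"
)
entries, problems = convert(sample)
print("\n".join(entries))
print(problems)
```

With the two verbs from the example above, this reproduces the two entries shown and reports no problems. Pairs flagged in <code>problems</code> still need to be checked against a dictionary or a native speaker.<br />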
<br />
===Other===<br />
<br />
;Other ways to help.<br />
* If you find mistakes in a translation or in the pending tests:<br />
** [[Contact|Let us know!]]<br />
** Correct them.<br />
*Ask on IRC or the mailing list (or, if you have one, your mentor) if there is another way to help.<br />
<br />
[[Category:Documentation]]</div>Nemo bishttps://wiki.apertium.org/w/index.php?title=Indirect_contribution_guide&diff=47640Indirect contribution guide2014-03-21T10:53:13Z<p>Nemo bis: /* Post-edit */ parenthesis</p>
<hr />
<div></div>Nemo bishttps://wiki.apertium.org/w/index.php?title=Top_tips_for_GSOC_applications&diff=47639Top tips for GSOC applications2014-03-21T10:51:24Z<p>Nemo bis: /* Other tips */ resort and make smoother</p>
<hr />
<div>{{TOCD}}<br />
<br />
== Writing your GSOC application ==<br />
Here are our main tips to help you write your GSOC application with Apertium.<br />
<br />
* '''Be realistic'''<br />
** We're more likely to accept ideas which are realistic than ones which are "way out there". But if you have a "way out there" idea, don't panic! We're still interested, but we'll try to find a subset of it which is achievable in the time scale available.<br />
<br />
* '''Be appropriate'''<br />
** Demonstrate you have a knowledge of Apertium, how it works and the problem it has that you'd like to solve.<br />
<br />
* '''Have a plan'''<br />
** Three months may seem like a long time, but it isn't. Show that you have a definite plan with dates and deliverables; splitting it into weeks is probably best. Don't forget to leave time for documentation and for getting familiar with the platform (ideally before or during the community bonding period). Anyone thinking of working on a language pair should make sure that they read about [[testvoc]] and other quality controls, and factor them into the plan.<br />
<br />
* '''Get in contact ASAP!'''<br />
** We get a lot of proposals: some good, most bad. Get in contact with your potential mentor '''as soon as possible''' by sending your proposal to the mailing list, and asking for feedback. Be responsive to feedback. Refine your application based on feedback. If the mentors remember you, your chances of being picked are higher. <br />
<br />
* '''Read the Ideas Page!'''<br />
** If you find yourself asking 'do you have any Java/Python/Fortran/x86 assembler projects...' -- you didn't read [[Ideas for Google Summer of Code|the ideas page]]. Read the ideas page.<br />
<br />
==Other tips==<br />
<br />
We're not saying that following the advice below will automatically get you a mentor, but going through it will give you a pretty good chance!<br />
<br />
* First [[installation|install]] Apertium and a language pair; read through the [[:Category:HOWTO|new language pair HOWTO]]. This might even give you some more ideas!<br />
** '''Install from SVN!''' Don't install the released packages (and don't apt-get install apertium). See [[Installation]].<br />
* Join [[Contact|IRC]]: even if you're idling or don't say anything, you'll discover more about how Apertium works.<br />
* [[Special:CreateAccount|Create a username on this wiki]] so that we can work collaboratively on applications. If you don't have permission to do so, ask on the IRC channel you just joined and an [[Special:ListAdmins|admin]] will create it for you.<br />
* Have your SourceForge username at hand... if you don't have one, create one. (Run, don't walk!)<br />
* When you think of Apertium, think Wikipedia (Be bold!) or think Nike (Just Do It!). Preferably, both.<br />
* Rule 1 here: Ask questions! Keep asking. The more you ask, the better. <br />
* Rule 2: No questions are stupid. We have all been new to Apertium once, we have all needed to ask questions. Asking them is proof to us that you are serious.<br />
* '''Even better''': Write your questions, and a summary of the answers you get, on this wiki. A good summary shows us that you have understood what we told you.<br />
* Browse the wiki again, especially [[Apertium New Language Pair HOWTO]].<br />
* Update the wiki so the next reader won't encounter the same problems as you did.<br />
* Play with some language pairs, perhaps using [[Apertium-viewer]].<br />
* In a language pair of your own choice, try to edit the files, break stuff, and then make it work again &mdash; and then tell us about it.<br />
* If you think you know the problem better than the mentor does, it could be that you have misunderstood it. Read more about Apertium before making assumptions based on your existing experience.<br />
<br />
==Frequently asked questions==<br />
<br />
; Do I have to do the coding challenge first, and only then get selected?<br />
<br />
The way it works is this: first you need to find a mentor, then you need to write a proposal, then you need to submit the proposal to the Google Mélange site. After this, we read, evaluate and rank the proposals. Then Google tells us how many slots we got, and we take the top ''n'' ranked proposals, where ''n'' is the number of slots we got. <br />
<br />
You don't have to do the coding challenge, but it will help you with (a) finding a mentor, and (b) writing your proposal. You are unlikely to be able to write a good proposal without knowing something about Apertium -- which the coding challenge will help you with. And by asking questions and hanging out on IRC, you will get to know the mentors, increasing your chances of finding one who is interested in your proposal.<br />
<br />
==Template==<br />
<br />
<pre><br />
Name:<br />
E-mail address:<br />
Other information that may be useful to contact you:<br />
Why is it you are interested in machine translation? <br />
Why is it that you are interested in the Apertium project? <br />
Which of the published tasks are you interested in? What do you plan to do? <br />
Include a proposal, including <br />
* a title,<br />
* reasons why Google and Apertium should sponsor it,<br />
* a description of who in society it will benefit, and how,<br />
* and a detailed work plan (including, if possible, a brief schedule with milestones and deliverables).<br />
Include time needed to think, to program, to document and to disseminate.<br />
List your skills and give evidence of your qualifications. Tell us your current field of study, <br />
major, etc. Convince us that you can do the work. In particular, we would like to know whether you <br />
have programmed in open-source projects before. <br />
List any non-Summer-of-Code plans you have for the summer, especially employment, whether you are applying for <br />
internships, and class-taking. Be specific about schedules and time commitments. We would like to be sure you have <br />
at least 30 free hours a week to develop for our project. <br />
</pre><br />
<br />
==Work plan==<br />
<br />
* Week 1: <br />
* Week 2:<br />
* Week 3:<br />
* Week 4:<br />
<br />
* '''Deliverable #1'''<br />
<br />
* Week 5: <br />
* Week 6:<br />
* Week 7:<br />
* Week 8:<br />
<br />
* '''Deliverable #2'''<br />
<br />
* Week 9: <br />
* Week 10:<br />
* Week 11:<br />
* Week 12:<br />
<br />
* '''Project completed'''<br />
<br />
[[Category:Google Summer of Code]]</div>Nemo bishttps://wiki.apertium.org/w/index.php?title=User_talk:Francis_Tyers&diff=47637User talk:Francis Tyers2014-03-21T10:13:37Z<p>Nemo bis: /* Wiki configuration settings */ new section</p>
<hr />
<div>{{TOCD}}<br />
Francis, there's an import and export feature in the MediaWiki engine. If you tweak it somehow, we could edit the whole dictionary articles here in the wiki and simply export them to Apertium XML format! This would make the whole process unbelievably simpler. We could also use the template facilities of the wiki.<br />
<br />
:Hi Francis, thanks for the message. Worked on the table for Pivə. Will work more later. Great project. Good luck. --[[User:Mehrdad|Mehrdad]] 19:11, 1 September 2007 (BST)<br />
<br />
:Really glad to see this project, and I hope I can contribute more in the future. I have to admit, though, that although I am a native speaker of Azerbaijani, I had no formal education in this language, so I may be wrong in some cases. Yes, I have an MSN account and will send you the ID via email. --[[User:Mehrdad|Mehrdad]] 11:24, 4 September 2007 (BST)<br />
<br />
Hi. I don't know about 'awesome' yet - I have a bug or two to work out ("Error: Unsupported transducer type for ''."), some tag reordering to do, and a heck of a lot more pardefs to write (Polish morphology is... extensive :)<br />
<br />
Actually, I'm interested in Polish-{English,Irish,Russian} and Irish-English. I have a lot of spare time :) Yes, I would be very interested in help with SVN, thank you.<br />
<br />
IM... hmm... I have Google Chat, Tlen, and ICQ (if it still works). Any preference?<br />
<br />
I have a few Polish-English wordlists that I've built up over the past few years of learning the language; I just need to gather them, sort them, and add morphological information. I'll certainly have a look at it. Thanks again. -- [[User:Jimregan|Jimregan]] 21:16, 6 October 2007 (BST)<br />
<br />
==English==<br />
<br />
Hi Francis, it was I who made the edits to the Apertium HOWTO. One thing: I changed the spelling to "realized" because that is how it is spelled in international English (US, Canada, etc.). Your spelling is a British variation, a French derivative (not German like the "z" spelling). Same goes for many other words. For instance, internationalization, not internationalisation (it is even indicated as a spelling error in this Wiki editor, with red text, and so is "realise").<br />
--[[User:Laseray|Laseray]] 14:33, 17 April 2008 (BST)<br />
<br />
== Breton ==<br />
<br />
* Hi, besides TermOfis, our terminological database, we are building a lexicographical one. I haven't checked it recently, but it should have around 60 000 lemmatized forms in it by now. Do you think it could help you? --[[User:Fulup|Fulup]] 17:31, 9 November 2008 (UTC)<br />
<br />
Yes definitely! Do you know if it includes part-of-speech also? <br />
::No, it does not (I'll show you in December), but OmegaWiki has some.<br />
We have been working on extracting information from Jan Deloof's dictionary of Breton--Dutch, which he let us use under the GPL. Do you know of it? - [[User:Francis Tyers|Francis Tyers]] 17:33, 9 November 2008 (UTC)<br />
::: Yes, I have it here at home, but I haven't used it (I don't speak Dutch). I think it should be OK to use anyway. --[[User:Fulup|Fulup]] 17:40, 9 November 2008 (UTC)<br />
<br />
== Icelandic ==<br />
<br />
Hi Fran. I removed the two "bread" sentences from the regression tests because they have never worked for me. I was going to show how the tests worked, so I wanted them to be on the "right" side of the pending/regression tests... which is why I moved them. I've always done svn up and make before testing (learned that the hard way), so I don't understand why they work for you but not for me. --[[User:Martha|Martha]] 16:41, 5 March 2009 (UTC)<br />
<br />
==Basque==<br />
Is there any difference between the main "How Apertium works" diagrams for Apertium and Matxin? If yes, where? If not, what is the difference between Matxin and Apertium, apart from character encoding (Matxin supports only ISO encodings) and the use of FreeLing (Matxin)? <br />
<br />
My primary aim is <br />
*1. English-Hungarian and German-Hungarian, -- <s>Apertium</s> Matxin (es-eu, en-eu)<br />
*2. English-German and German-English, -- Apertium or Matxin (es-en, en-es Apertium)<br />
*3. Hungarian-English and Hungarian-German. -- <s>Matxin</s> Apertium (eu-en, eu-es)<br />
<br />
So for 1 Apertium is the right tool, for 2 Apertium, and for 3 Matxin, right?<br />
<br />
Muki987 12:37, 9 April 2009 (UTC)<br />
<br />
:I would reverse the order.<br />
<br />
:*1. Matxin -- We have 'deep' analysis for English, so we should use it.<br />
:*2. Matxin or Apertium -- Again, 'deep' analysis is available for English. But there are many other tools which do en-de so I would count this as reasonably low priority.<br />
:*3. Apertium -- We have POS tagging and morphological analysis for Hungarian, so we should take advantage of this. But there is no free parser available.<br />
<br />
:Matxin now supports Unicode; I have updated the page. The main difference between Apertium and Matxin is that the latter uses FreeLing to do chunking and dependency parsing and then does re-ordering based on that. Whereas Apertium is restricted to re-ordering fixed-length patterns, Matxin has some degree of recursion. We are planning to extend Apertium this year to support recursive re-ordering, and any resources made now can be re-used in the future. A brief breakdown of current resources would be good, e.g. English analysis (Apertium or FreeLing), English generation (Apertium), English--Hungarian bilingual lexicon (?), Hungarian analysis (Hunmorph), Hungarian generation (?). - [[User:Francis Tyers|Francis Tyers]] 12:51, 9 April 2009 (UTC)<br />
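<br />
To make the idea of re-ordering fixed-length patterns concrete, here is a minimal Python sketch. It is not Apertium's transfer code (real rules are written in XML); it only assumes a token list of (word, tag) pairs and one hypothetical rule that matches the fixed tag sequence adjective + noun and swaps it, the way English "big book" becomes Spanish noun + adjective order.<br />
```python
from typing import List, Tuple

# A token is (word, part-of-speech tag); the tag names are illustrative only.
Token = Tuple[str, str]

# Hypothetical fixed-length rules: a tag pattern and the output order of its positions.
RULES = [
    (("adj", "n"), (1, 0)),  # adjective + noun -> noun + adjective
]


def reorder_fixed(tokens: List[Token]) -> List[Token]:
    """Scan left to right, applying the first rule whose fixed-length tag pattern matches."""
    out: List[Token] = []
    i = 0
    while i < len(tokens):
        for pattern, order in RULES:
            n = len(pattern)
            window = tokens[i:i + n]
            if tuple(tag for _, tag in window) == pattern:
                out.extend(window[j] for j in order)
                i += n
                break
        else:
            # No rule matched at this position: copy the token unchanged.
            out.append(tokens[i])
            i += 1
    return out


if __name__ == "__main__":
    sentence = [("the", "det"), ("big", "adj"), ("book", "n")]
    print(reorder_fixed(sentence))  # [('the', 'det'), ('book', 'n'), ('big', 'adj')]
```
Because each rule only sees a window of fixed length, a phrase whose size is not known in advance (for example a noun phrase containing another noun phrase) cannot be covered by a single rule; that is the kind of case where recursive re-ordering helps.<br />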
<br />
:: Yes, now I see, Spanish-Basque is in Matxin. I will start with Matxin.<br />
:: I'll check whether the analysis of English in Matxin is good enough for Hungarian.<br />
::I have quite a good English-Hungarian lexicon; I don't think converting it to the Apertium XML format will cause any problems. I also know hunspell and the tools behind it quite well. I think that will help with Hungarian generation, and I still hope to get some support from the Hunmorph group with that. <br />
:: Do you know of any usable de-en or en-de tool that you consider to be of Apertium's quality?<br />
::You forgot my first question, about the main "How Apertium works" diagram for Matxin. Such a diagram is very helpful for a beginner. [[User:Muki987|Muki987]] 13:21, 9 April 2009 (UTC)<br />
<br />
:::If you provide some example sentences in English, I can send you back the results of the FreeLing analysis, in case you don't want to install FreeLing yourself. <br />
:::Regarding the lexicon, if you send it to me I'd be happy to take a look at how difficult it would be to convert.<br />
:::Yes, hunspell is good. <br />
:::Free software tools for English--German, unfortunately not. There are many commercial tools though. <br />
:::Regarding the diagram, it is difficult to express like that, as both Apertium and Matxin are typical "transfer" systems. The easiest way of expressing it is that Matxin works on trees, while Apertium works on chunks. To get an idea of the difference, take a look at these two diagrams: [http://nltk.org/doc/en/tree_images/ch06-tree-1.png chunking] and [http://nltk.org/doc/en/tree_images/ch06-tree-2.png parsing]. Apertium analysis is more similar to the first, while Matxin approaches the second. - [[User:Francis Tyers|Francis Tyers]]<br />
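<br />
The difference between working on chunks and working on trees can be sketched in Python with a recursive function: because it calls itself on every subtree, the re-ordering applies at any depth of nesting. The node labels and the single rule (move adjectives after the noun in every noun phrase) are hypothetical, chosen only to illustrate recursion; they are not Matxin's actual rules or data structures.<br />
```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Leaf:
    tag: str   # part-of-speech tag, e.g. "n" or "adj" (illustrative)
    word: str


@dataclass
class Node:
    label: str  # constituent label, e.g. "np" or "pp" (illustrative)
    children: List["Tree"] = field(default_factory=list)


Tree = Union[Leaf, Node]


def label_of(t: Tree) -> str:
    return t.tag if isinstance(t, Leaf) else t.label


def reorder(t: Tree) -> Tree:
    """Return a new tree with adjectives moved after the noun in every noun phrase."""
    if isinstance(t, Leaf):
        return t
    kids = [reorder(c) for c in t.children]  # recurse first, so nested phrases are handled
    if t.label == "np":
        adjs = [c for c in kids if label_of(c) == "adj"]
        rest = [c for c in kids if label_of(c) != "adj"]
        nouns = [i for i, c in enumerate(rest) if label_of(c) == "n"]
        if adjs and nouns:
            pos = nouns[0] + 1
            kids = rest[:pos] + adjs + rest[pos:]
    return Node(t.label, kids)


def words(t: Tree) -> List[str]:
    """Flatten a tree back into its word sequence."""
    if isinstance(t, Leaf):
        return [t.word]
    return [w for c in t.children for w in words(c)]


if __name__ == "__main__":
    inner = Node("np", [Leaf("det", "the"), Leaf("adj", "old"), Leaf("n", "man")])
    outer = Node("np", [Leaf("det", "the"), Leaf("adj", "big"), Leaf("n", "book"),
                        Node("pp", [Leaf("pr", "of"), inner])])
    print(" ".join(words(reorder(outer))))  # the book big of the man old
```
Applied to "the big book of the old man", the rule fires in both the outer and the embedded noun phrase, giving the Spanish-like order "the book big of the man old".<br />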
<br />
*You can find my word collections at http://tkltrans.sf.net<br />
*I will install everything on my PC, so I'll generate examples myself.<br />
* I checked PROMT, which is at present the best according to the tests; it is in fact miserable (E-G, G-E). <br />
* I think selecting the right word is unsolved, and finding and using expressions like "no space left on" and the like is even more unsolved.<br />
[[User:Muki987|Muki987]] 13:21, 10 April 2009 (UTC)<br />
<br />
==Expressions==<br />
<br />
What about expressions? For example, "look after one's fences" is at present not handled at all:<br />
<br />
* Peter looked after Martha's fences<br />
* Peter miraba después de las vallas de Martha <br />
<br />
The expression is not recognized at all (it means that Peter acted in Martha's interest).<br />
<br />
Is there something planned for this? Are there working examples available? 20-30% of our speech are expressions!!!! Muki987 13:38, 10 April 2009 (UTC)<br />
<br />
:We have two methods of handling expressions. The first is with multiword units in our dictionaries. Please try "He took away the rubbish". The second is with TMX files (you probably know about them), which contain translation segments. The example you have given would be a multiword unit, probably "look after" → "cuidar", but I'll ask the maintainer of es-en when she gets back from holiday. We tend to gear our development towards translating "news text", where these kinds of expressions tend to be less frequent, so you'll have to excuse us if we don't have full coverage :) - [[User:Francis Tyers|Francis Tyers]] 20:20, 10 April 2009 (UTC)<br />
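<br />
The multiword-unit idea can be sketched as a left-to-right, longest-match lookup: at each position the longest dictionary entry that matches wins, so "took away" is consumed as one unit before "took" alone is tried. The Python below is a simplified illustration with a made-up dictionary; Apertium itself compiles its dictionaries into finite-state transducers, and the analysis strings here only loosely imitate its notation.<br />
```python
from typing import Dict, List, Tuple

# Hypothetical mini-dictionary: lower-cased word sequences -> analysis string.
DICTIONARY: Dict[Tuple[str, ...], str] = {
    ("he",): "he<prn>",
    ("took",): "take<vblex><past>",
    ("took", "away"): "take<vblex><past># away",
    ("the",): "the<det>",
    ("rubbish",): "rubbish<n><sg>",
}
MAX_LEN = max(len(key) for key in DICTIONARY)  # longest entry, in words


def longest_match(words: List[str]) -> List[Tuple[str, str]]:
    """Segment words left to right, always preferring the longest dictionary entry."""
    result = []
    i = 0
    while i < len(words):
        for n in range(min(MAX_LEN, len(words) - i), 0, -1):
            key = tuple(w.lower() for w in words[i:i + n])
            if key in DICTIONARY:
                result.append((" ".join(words[i:i + n]), DICTIONARY[key]))
                i += n
                break
        else:
            # No entry of any length matched: mark the word as unknown.
            result.append((words[i], "*unknown"))
            i += 1
    return result


if __name__ == "__main__":
    for surface, analysis in longest_match("He took away the rubbish".split()):
        print(surface, "->", analysis)
```
With this dictionary, "He took away the rubbish" is segmented into four units, with "took away" kept together.<br />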
<br />
*He took away the rubbish -- is this at all an expression??<br />
*Sacó la basura -- word for word the same thing??<br />
<br />
:"Take away" is a phrasal verb which is best translated in Spanish by either "llevar" or "sacar". Of course, it has other meanings, for example "<b>take away</b> meal", "a <b>take away</b>", "two <b>take away</b> three". But the most frequent is probably the one we have. As I mentioned on your talk page, [[lexical selection]] is something we'd like to work on. - [[User:Francis Tyers|Francis Tyers]] 21:30, 10 April 2009 (UTC)<br />
<br />
* He took the minutes -- he wrote the protocol<br />
*Tomó los minutos -- nothing about a protocol; I think it is wrong again<br />
<br />
:It would be "apuntar las actas" or "tomar las actas". - [[User:Francis Tyers|Francis Tyers]] 21:30, 10 April 2009 (UTC)<br />
<br />
*He took air - he breathed<br />
*Tomó aire - took air-bad again, should be breathed<br />
<br />
:I'm not sure I would use this in English. I'd say "take a breath". It doesn't sound very natural. - [[User:Francis Tyers|Francis Tyers]] 21:30, 10 April 2009 (UTC)<br />
<br />
I could not find any working examples yet. If you have one, please also explain the English one; my English is not so good. Thanks, [[User:Muki987|Muki987]] 21:22, 10 April 2009 (UTC)<br />
<br />
::I've extracted a list of the phrasal verbs we have and you can find them here: http://www.nopaste.com/p/afrZuyaJ3 - [[User:Francis Tyers|Francis Tyers]] 21:30, 10 April 2009 (UTC)<br />
<br />
:::*They went Dutch that evening - They each paid their own bill<br />
:::*Pagaron a escote que anochecer <br />
<br />
::::Again, that expression isn't part of ''my'' lexicon.<br />
<br />
:::*They want to go Dutch that evening - They want to each pay their own bill<br />
:::*Quieren pagar a escote aquel anochecer - Google says: Want to pay a neckline that evening <br />
<br />
:::I hope that is correct, and only Google is too stupid for this. [[User:Muki987|Muki987]] 21:45, 10 April 2009 (UTC)<br />
<br />
::::And I'm presumably too stupid for not knowing an obscure expression too? ;) - [[User:Francis Tyers|Francis Tyers]] 21:49, 10 April 2009 (UTC)<br />
<br />
*He takes a backseat in this project - he played a subordinate role<br />
*Toma un backseat en este proyecto - no word about subordinate role- bad<br />
<br />
:This is quite colloquial. As I mention above, we target our development towards translating news text, so if you can't find it on a search of <code>site:news.bbc.co.uk</code> in Google, the chances are we don't have it. This is not to say the system doesn't support it, just for our purposes we don't ''yet'' find the reward sufficient for the effort. - [[User:Francis Tyers|Francis Tyers]] 21:33, 10 April 2009 (UTC)<br />
<br />
==Word with multiple meanings==<br />
* He primed the car's petrol tank -- He filled the gasoline tank<br />
* Él primed la gasolina del coche tanque (probably does not understand the word prime)<br />
<br />
What are you, writing novels? I'd never say this, and Zipf's law would probably agree. The problem with "car's petrol tank" → "la gasolina del coche tanque" is a serious one and we should fix that. In fact, it kind of works if you remove the preceding article.<br />
<br />
::'Prime' as a verb means to prepare a mechanism for work. In reference to a petrol tank, though I have never heard that usage before, I understand it to mean to fill it -- and fill it to the top; in terms of a weapon (a much more common use), it means to arm it. I think it more likely that the phrase was 'primed the engine', which (among other preparations) includes filling it with fuel. 'prime' also means to apply a coat of primer (paint); in either case, 'preparar' is the most acceptable general purpose Spanish translation. -- [[User:Jimregan|Jimregan]] 09:46, 11 April 2009 (UTC)<br />
<br />
<br />
:No, I should like just to be able to get rid of the translation's hard part. [[User:Muki987|Muki987]] 21:51, 10 April 2009 (UTC)<br />
<br />
::The Spanish←→English system is not suitable for post-editing, and probably won't be in the near future... that is, unless you are translating a lot of repetitive text and have a large translation memory. We are an open-source project; we have to focus our limited resources on the achievable. - [[User:Francis Tyers|Francis Tyers]] 21:59, 10 April 2009 (UTC)<br />
<br />
<pre><br />
$ echo "car's petrol tank" | apertium -d . en-es<br />
El tanque de gasolina de coche<br />
</pre><br />
<br />
I'll see if I can fix that now. - [[User:Francis Tyers|Francis Tyers]] 21:44, 10 April 2009 (UTC)<br />
<br />
<pre><br />
$ echo "the car's petrol tank" | apertium -d . en-es<br />
El tanque de gasolina de coche<br />
</pre><br />
<br />
:Done. - [[User:Francis Tyers|Francis Tyers]] 21:50, 10 April 2009 (UTC)<br />
<br />
* He woke up at prime time . he woke up very early<br />
* Él woke arriba en tiempo primo - does not understand word woke (past tense of wake)<br />
<br />
"prime time" does not equate to "early" in English. <br />
<br />
<pre><br />
$ echo "He woke up" | apertium -d . en-es<br />
Despertó<br />
</pre><br />
<br />
Almost right, should be "se despertó". - [[User:Francis Tyers|Francis Tyers]] 21:43, 10 April 2009 (UTC)<br />
<br />
So again, not a single working example. Any ideas? [[User:Muki987|Muki987]] 21:37, 10 April 2009 (UTC)<br />
<br />
:Yes, as I mentioned above, you can give examples of phrases which don't work "until the cows come home", but that isn't what we focus our efforts on. If you want to focus your efforts on that fine... we're primarily interested in dealing with the most ''frequent'' structures first. - [[User:Francis Tyers|Francis Tyers]] 21:43, 10 April 2009 (UTC)<br />
<br />
::You see, I am happy if one example works. The rest is diligence. I understand that finding the <b>right word and the right expression</b> is by far the hardest part. Then comes <b>word order change,</b> which, as far as I can see, is also handled. If one example works, one day all will work. [[User:Muki987|Muki987]] 21:51, 10 April 2009 (UTC)<br />
<br />
:::If you're looking for "set phrases", then [http://www.nopaste.com/p/aHhn7fiLhb here] is a list of some we have collected (personally I think this is a waste of time, but some people like it). If you are looking for phrasal verbs, please see the examples in [http://www.nopaste.com/p/afrZuyaJ3 this list]. If you are looking for "this phrase I heard one time in a film" to work, then possibly you have the wrong project. - [[User:Francis Tyers|Francis Tyers]] 21:57, 10 April 2009 (UTC)<br />
<br />
::::I am not looking for anything extravagant or unusual. All I'd like is to let the machine do the dirty work of translation. Maybe English is not as picturesque a language as Hungarian and German are, but I can say from my own experience that we (Hungarians) use expressions in more than 10% of our speech, that is, groups of words whose meaning differs from that of the individual words placed one after another. When I am testing, I can certainly show you a lot of them. And this is also the case for German. Now I am going to understand, install and test. I hope all that will make sense. [[User:Muki987|Muki987]] 10:04, 11 April 2009 (UTC)<br />
<br />
:::::When we build translators, we pay a lot of attention to frequency. That is, instead of starting with the low-frequency "jewels" of the language, we start with the high-frequency "building blocks" (this terminology thanks to Mikel, and you might enjoy [http://www.dlsi.ua.es/~mlf/docum/forcada06p2.pdf this paper]). Probably more than 10% of spoken English is expressions, but we consider it less important to correctly translate these than to correctly translate, for example, "article noun" and "article adjective noun" type phrases. What you are referring to is non-compositionality (the meaning of a group of words is not the sum of the meanings of the constituent words &mdash; e.g. "compact disc"), and it is one of the main "open issues" in machine translation. The importance of frequency in building MT systems usually takes a while to sink in &mdash; as it is usually completely the opposite of what linguists and translators think of as important, but most people get it eventually (although, if they are like me, they'll waste a good deal of time in the process!). - [[User:Francis Tyers|Francis Tyers]] 10:19, 11 April 2009 (UTC)<br />
<br />
==Expressions==<br />
<br />
1. I found Mikel's paper interesting. However, he does not address the quality of commercial translation systems at all. At present the best one is PROMT, a Russian one, and it achieves 60-80% accuracy, which is far from usable. Why is that?<br />
* Even the vocabulary is not complete for any language pair in the world, because one person, the editor, cannot understand and handle all the words of a major language. I can give you good examples of this.<br />
::I will always sacrifice completeness to frequency. Le mieux est l'ennemi du bien. (The ''better'' is the enemy of the good) See [http://borel.slu.edu/pub/mt.pdf this paper] as well. - [[User:Francis Tyers|Francis Tyers]] 11:21, 11 April 2009 (UTC)<br />
* The coverage of expressions is even worse. Hungarian is in my opinion a very coherent language, with far fewer special words and special pronunciations than English or German, and still, even for Hungarian I do not know of any collection of expressions that is even close to complete, even though the language itself is rich in expressions. I assume English and German are even worse off.<br />
* Statistical approaches, like Google's, look very promising at first glance. At second glance, they show that there is no room for improvement in them, and they will remain at their 60-90% level forever because of their lack of internal understanding and intelligence.<br />
::Personally I consider the way forward in MT to be a combination of rule-based and statistical approaches. For doing lexical selection for example, statistical approaches have many benefits over rule-based ones. - [[User:Francis Tyers|Francis Tyers]] 11:21, 11 April 2009 (UTC)<br />
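<br />
As a rough illustration of statistical lexical selection, the Python sketch below chooses a translation for an ambiguous word by summing how often each candidate has been seen with the surrounding context words. The counts and the fallback table are invented, and the "minutes" example is taken from the discussion above; a real system would estimate such scores from corpora and use a more careful model.<br />
```python
from collections import Counter
from typing import Dict, List, Tuple

# Invented co-occurrence counts: (source word, context word) -> counts per candidate translation.
COUNTS: Dict[Tuple[str, str], Counter] = {
    ("minutes", "took"): Counter({"actas": 9, "minutos": 2}),
    ("minutes", "five"): Counter({"minutos": 20, "actas": 0}),
}
# Hypothetical fallback used when the context gives no evidence.
DEFAULT: Dict[str, str] = {"minutes": "minutos"}


def select(word: str, context: List[str]) -> str:
    """Pick the candidate with the highest summed count over the context words."""
    scores: Counter = Counter()
    for c in context:
        scores.update(COUNTS.get((word, c.lower()), Counter()))
    if not scores or max(scores.values()) == 0:
        return DEFAULT.get(word, word)  # no evidence: default translation, or the word itself
    return scores.most_common(1)[0][0]


if __name__ == "__main__":
    print(select("minutes", "He took the".split()))  # actas
    print(select("minutes", ["five"]))               # minutos
```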
<br />
I tried to interpret your sentence: "The importance of frequency in building MT systems usually takes a while to sink in."<br />
<br />
Does "sink in" mean "decrease"?<br />
<br />
::Sink in here means "to understand fully" ("to assimilate") - [[User:Francis Tyers|Francis Tyers]] 11:11, 11 April 2009 (UTC)<br />
<br />
Do you mean here that an MT system must first be built frequently, and later on less frequently, because the quality gets better and better?<br />
<br />
::No, I mean that when building a machine translation system, it is of vital importance to plan the work according to the frequency in the language. For example, it is more important to be able to translate "the" and "a" well than "communitarianism". It is more important to be able to translate simple structures (e.g. basic noun phrases... article noun 'the book', article adjective noun 'the big book') than complex relative clauses. With 1,000 words you can cover around 50% and with 20,000 words you can cover 90% of any English text. For systems that I build from scratch typically I set the "gauge of quality" for a 0.1 release to be "translates reasonably well sentences of 5--7 words". Starting from scratch, this typically takes around 6 months. - [[User:Francis Tyers|Francis Tyers]] 11:11, 11 April 2009 (UTC)<br />
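<br />
Coverage figures of this kind are usually computed by ranking word types by frequency and measuring what fraction of running tokens the top N types account for. A minimal Python sketch, assuming whitespace tokenisation and lower-casing; the toy sentence is only there to exercise the function and does not reproduce the 50% / 90% figures, which depend on a real corpus.<br />
```python
from collections import Counter
from typing import Iterable


def coverage(tokens: Iterable[str], top_n: int) -> float:
    """Fraction of tokens covered by the top_n most frequent word types."""
    counts = Counter(t.lower() for t in tokens)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    covered = sum(freq for _, freq in counts.most_common(top_n))
    return covered / total


if __name__ == "__main__":
    text = "the cat saw the dog and the dog saw a cat"
    for n in (1, 2, 5):
        print(n, round(coverage(text.split(), n), 2))
```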
<br />
The expression "take a while" is not in your collection; however, the translation seems to be OK:<br />
*it takes a while to listen to you<br />
*Toma un rato para escuchar a ti<br />
<br />
::The 'a while' is well translated, 'toma' should probably be 'cuesta', 'escuchar a ti' might be better 'escucharte'. Although the sentence in English doesn't make much sense, do you mean "It takes a while to understand you" ? If so, in Spanish I'd probably say "Cuesta entenderte" (although I'm not a native speaker, so any of my translations ''into'' Spanish are suspect) - [[User:Francis Tyers|Francis Tyers]] 11:11, 11 April 2009 (UTC)<br />
<br />
[[User:Muki987|Muki987]] 10:54, 11 April 2009 (UTC)<br />
==Ispell-aspell==<br />
The Matxin docs say:<br />
5.8 Morphological dictionary (eu_morph_gen)<br />
<br />
Basque morphology is complex and owing to its agglutinative character, much<br />
of the standard free software for dealing with morphology (such as ispell <br />
or aspell) is not well adapted for it. <br />
-------------------<br />
The above is not quite correct. The complete situation is as follows:<br />
*<b>Ispell</b> was written by Geoff Kuenning, an Englishman; however, the checking algorithm was developed by Dömölki Bálint, a Hungarian. <br />
As it stands, Ispell is quite well suited to agglutinative languages thanks to its suffix/prefix concept, which has been central to it since the beginning.<br />
The only disadvantages are:<br />
**a. the limited number of affixes, and <br />
**b. the lack of two-level suffixes (it has just one level)<br />
However, even with that concept as it is, it is possible to write a very well-working Hungarian spell checker, and I doubt that any other language has more sophisticated agglutination than Hungarian.<br />
<br />
*<b>Myspell</b> was developed by Kevin ..., Canada, starting as a C++ version of Ispell, and had all of Ispell's features from the very beginning. Németh László from Hungary added to it:<br />
** two-level suffixing/prefixing<br />
** faster dictionary loading <br />
** morphological capabilities<br />
** handling of the UTF-8 character set<br />
and the product was renamed <b>hunspell</b>.<br />
<br />
*<b>Aspell</b> is a story of its own. It was originally focused on word correction and was not at all sufficient for agglutinating languages, because it did not support the suffix/prefix concept at all. From version 0.60, however, Kevin Atkinson added Ispell's suffix/prefix concept, which made aspell as good as ispell for agglutinating languages, and from version 0.60.6 (maybe even earlier, not sure) it also supports two-level affixing, exactly as hunspell does. [[User:Muki987|Muki987]] 19:03, 12 April 2009 (UTC)<br />
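<br />
The one-level versus two-level distinction can be shown with a toy Python sketch: a one-level checker may strip at most one suffix before looking the stem up, while a two-level checker may strip a second suffix from what remains. The stem and suffix lists are a hypothetical Hungarian fragment (ház 'house', házak 'houses', házban 'in the house', házakban 'in the houses'), not the contents of any real affix file.<br />
```python
from typing import List, Set

STEMS: Set[str] = {"ház"}             # hypothetical stem list
SUFFIXES: List[str] = ["ak", "ban"]   # hypothetical suffixes: plural, inessive


def accepts(word: str, levels: int) -> bool:
    """Return True if word is a known stem plus at most `levels` stacked suffixes."""
    if word in STEMS:
        return True
    if levels == 0:
        return False
    return any(
        word.endswith(suf) and accepts(word[: -len(suf)], levels - 1)
        for suf in SUFFIXES
    )


if __name__ == "__main__":
    for w in ["ház", "házak", "házban", "házakban"]:
        print(w, "one-level:", accepts(w, 1), "two-level:", accepts(w, 2))
```
Note that this toy accepts the suffixes in any order; real affix files also restrict which suffix may follow which (hunspell does this with continuation classes) and handle stem changes, which the sketch ignores.<br />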
<br />
::I'm just the translator here, please feel free to send this commentary to Aingeru for correction. - [[User:Francis Tyers|Francis Tyers]] 19:07, 12 April 2009 (UTC)<br />
<br />
==Hungarian generator works fine==<br />
<br />
The secret is that the dictionaries have to be taken from http://magyarispell.sf.net; chmorph and analyze are part of hunspell 1.2.8 (the current version). There is an example at the end of my discussion page. [[User:Muki987|Muki987]] 20:04, 15 April 2009 (UTC)<br />
<br />
==Phases==<br />
I'd like to see the phases of translation. Here is what I did:<br />
<pre><br />
/tmp/x: Martha's cat and Peter are sweet<br />
<br />
en@anonymous:~/tmp/download/forditas/apertium-en-es-0.6$ apertium -d . en-es-anmor /tmp/x<br />
^Martha/Martha<np><ant><f><sg>$^'s/'s<gen>$ ^cat/cat<n><sg>$ ^and/and<cnjcoo>$ ^Peter/Peter<np><ant><m><sg>$ ^are/be<vbser><pres>$ ^sweet/sweet<adj><sint>/sweet<n><sg>$^./.<sent>$<br />
<br />
en@anonymous:~/tmp/download/forditas/apertium-en-es-0.6$ apertium -d . en-es-tagger /tmp/x<br />
^Martha<np><ant><f><sg>$^'s<gen>$ ^cat<n><sg>$ ^and<cnjcoo>$ ^Peter<np><ant><m><sg>$ ^be<vbser><pres>$ ^sweet<adj><sint>$^.<sent>$<br />
<br />
en@anonymous:~/tmp/download/forditas/apertium-en-es-0.6$ apertium -d . en-es-pretransfer /tmp/x<br />
^Martha<np><ant><f><sg>$^'s<gen>$ ^cat<n><sg>$ ^and<cnjcoo>$ ^Peter<np><ant><m><sg>$ ^be<vbser><pres>$ ^sweet<adj><sint>$^.<sent>$<br />
<br />
en@anonymous:~/tmp/download/forditas/apertium-en-es-0.6$ apertium -d . en-es-chunker /tmp/x<br />
^nom_genitiu_nom<SN><DET><GD><sg>{^el<det><def><3><4>$ ^gato<n><3><4>$ ^de<pr>$ ^Martha<np><ant><f><sg>$}$ ^cnj<cnjcoo>{^y<cnjcoo>$}$ ^nom<SN><UNDET><m><sg>{^Peter<np><ant><3><4>$}$ ^be<Vcop><vbser><pri><PD><ND>{^ser<vbser><3><4><5>$}$ ^adj<SA><mf><ND>{^dulce<adj><2><3>$}$^punt<sent>{^.<sent>$}$<br />
<br />
en@anonymous:~/tmp/download/forditas/apertium-en-es-0.6$ apertium -d . en-es-interchunk /tmp/x<br />
^nom_genitiu_nom<SN><DET><m><sg>{^el<det><def><3><4>$ ^gato<n><3><4>$ ^de<pr>$ ^Martha<np><ant><f><sg>$}$ ^cnj<cnjcoo>{^y<cnjcoo>$}$ ^nom<SN><PDET><m><sg>{^Peter<np><ant><3><4>$}$ ^be<Vcop><vbser><pri><p3><pl>{^ser<vbser><3><4><5>$}$ ^adj<SA><mf><pl>{^dulce<adj><2><3>$}$^punt<sent>{^.<sent>$}$<br />
<br />
en@anonymous:~/tmp/download/forditas/apertium-en-es-0.6$ apertium -d . en-es-postchunk /tmp/x<br />
^el<det><def><m><sg>$ ^gato<n><m><sg>$ ^de<pr>$ ^Martha<np><ant><f><sg>$ ^y<cnjcoo>$ ^Peter<np><ant><m><sg>$ ^ser<vbser><pri><p3><pl>$ ^dulce<adj><mf><pl>$^.<sent>$<br />
<br />
en@anonymous:~/tmp/download/forditas/apertium-en-es-0.6$ apertium -d . en-es-generador /tmp/x<br />
el gato ~de Martha ~y Peter son dulces<br />
</pre><br />
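<br />
Each stage above reads and writes the Apertium stream format: a lexical unit sits between ^ and $, the surface form comes first, and each analysis after a / is a lemma followed by its angle-bracketed tags. A minimal Python sketch of a reader for the morphological analyser output (the first command above); it assumes no escaped characters, superblanks or chunk braces, so it is an illustration rather than a full parser.<br />
```python
import re
from typing import List, Tuple

Analysis = Tuple[str, List[str]]          # (lemma, [tags])
LexicalUnit = Tuple[str, List[Analysis]]  # (surface form, [analyses])

UNIT_RE = re.compile(r"\^(.*?)\$")  # body of one ^...$ unit (no escape handling)
TAG_RE = re.compile(r"<([^>]*)>")   # contents of each angle-bracketed tag


def parse_analysis(text: str) -> Analysis:
    """Split 'lemma<tag1><tag2>' into ('lemma', ['tag1', 'tag2'])."""
    lemma = text.split("<", 1)[0]
    return lemma, TAG_RE.findall(text)


def parse_stream(text: str) -> List[LexicalUnit]:
    """Parse analyser output of the form ^surface/analysis1/analysis2$."""
    units: List[LexicalUnit] = []
    for body in UNIT_RE.findall(text):
        surface, *analyses = body.split("/")
        units.append((surface, [parse_analysis(a) for a in analyses]))
    return units


if __name__ == "__main__":
    line = "^are/be<vbser><pres>$ ^sweet/sweet<adj><sint>/sweet<n><sg>$^./.<sent>$"
    for surface, analyses in parse_stream(line):
        print(surface, analyses)
```
The ambiguous unit for "sweet" comes back with two analyses (adjective and noun); choosing between them is exactly what the en-es-tagger stage does.<br />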
<br />
Is that the correct order?<br />
<br />
:Yes, exactly correct. One is missing, though: <code>en-es-anmor</code>, which is the first stage. - [[User:Francis Tyers|Francis Tyers]] 09:22, 16 April 2009 (UTC)<br />
<br />
::Thanks, added anmor. [[User:Muki987|Muki987]] 09:28, 16 April 2009 (UTC)<br />
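<br />
The chunker and interchunk lines above also show how agreement reaches individual words: inside a chunk's braces, numeric tags such as <code>&lt;3&gt;</code> are placeholders that the postchunk stage fills in with the chunk's own tags by 1-based position. For example, gato carries <code>&lt;n&gt;&lt;3&gt;&lt;4&gt;</code> inside a chunk tagged SN, DET, m, sg, and comes out of postchunk as <code>&lt;n&gt;&lt;m&gt;&lt;sg&gt;</code>. The Python sketch below reproduces only that substitution, as inferred from this example; real postchunk rules can do considerably more.<br />
```python
import re
from typing import List

NUM_TAG = re.compile(r"<(\d+)>")  # a tag consisting only of digits, e.g. <3>


def fill_chunk_tags(word: str, chunk_tags: List[str]) -> str:
    """Replace each <N> in a word with the chunk's N-th tag, counting from 1."""
    def lookup(match):
        index = int(match.group(1)) - 1
        return "<" + chunk_tags[index] + ">"
    return NUM_TAG.sub(lookup, word)


if __name__ == "__main__":
    # Tags of the interchunk unit ^nom_genitiu_nom<SN><DET><m><sg>{...}$ shown above.
    chunk = ["SN", "DET", "m", "sg"]
    print(fill_chunk_tags("^gato<n><3><4>$", chunk))  # ^gato<n><m><sg>$
```
The same lookup turns ^ser<vbser><3><4><5>$ inside the chunk tagged Vcop, vbser, pri, p3, pl into ^ser<vbser><pri><p3><pl>$, matching the postchunk line above.<br />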
==Comments==<br />
Added some comments to hunmorph speed on my talk page. [[User:Muki987|Muki987]] 20:25, 16 April 2009 (UTC)<br />
Please see the end of my talk page, thanks [[User:Muki987|Muki987]] 21:01, 22 April 2009 (UTC) Please check again. Thanks [[User:Muki987|Muki987]] 22:44, 22 April 2009 (UTC) x [[User:Muki987|Muki987]] 08:12, 23 April 2009 (UTC)x. [[User:Muki987|Muki987]] 08:48, 23 April 2009 (UTC) x [[User:Muki987|Muki987]] 10:42, 25 April 2009 (UTC) x (Dep analysis, at the end)x Thanks, OK, I do not report. [[User:Muki987|Muki987]] 21:17, 20 May 2009 (UTC)<br />
===Wiki===<br />
If you happen to know how to add the editing toolbar images to the upper left part of the editing window, and could add them, that would make working with the wiki easier. [[User:Muki987|Muki987]] 09:03, 26 May 2009 (UTC)<br />
<br />
Matxin page: I added a /generation/ section; I think it was forgotten last time. I translated it with PROMT, please check. The pictures are better centred; as thumbnails they are cumbersome to work with, and they also don't print well. Thanks. [[User:Muki987|Muki987]] 08:45, 27 May 2009 (UTC)<br />
<br />
==sh files==<br />
Hi,<br />
I put my files and the result onto: <br />
[[Talk:Apertium_New_Language_Pair_HOWTO]]<br />
<br />
Could you please check where the error is? Thanks, [[User:Muki987|Muki987]] 22:18, 29 April 2009 (UTC)<br />
<br />
<br />
::<nowiki>:)</nowiki>. Please see the comments on your page Muki. [[User:Francis Tyers|Francis Tyers]] 05:47, 30 April 2009 (UTC)<br />
<br />
:: I translated everything and corrected en.dix; the error behaviour is the same as yesterday. I have now written out 3 outputs FYI; maybe it is better to try to track the problem down piece by piece. [[User:Muki987|Muki987]] 09:17, 30 April 2009 (UTC)<br />
==FreeLing==<br />
Does FreeLing have something like depsvg to display the output?<br />
Does it support the CoNLL format (then depsvg could be used)? <br />
<br />
The problem with the present output is that it scatters the words, so it is hard to find the original sentence in the output. [[User:Muki987|Muki987]] 14:10, 17 June 2009 (UTC)<br />
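<br />
If the parser's output can be reduced to one (form, lemma, tag, head, relation) tuple per word, writing it out in the tab-separated CoNLL-X layout keeps the words in sentence order and gives a dependency viewer (such as depsvg, if it reads CoNLL) something to draw. A minimal Python sketch; the tuple layout, tags and relation names are assumptions, unused columns are filled with "_", and it does not parse FreeLing's own output format.<br />
```python
from typing import List, Tuple

# (form, lemma, pos, head, deprel); head is a 1-based token index, 0 for the root.
Token = Tuple[str, str, str, int, str]


def to_conllx(sentence: List[Token]) -> str:
    """Render one sentence as CoNLL-X: 10 tab-separated columns, one token per line."""
    lines = []
    for i, (form, lemma, pos, head, rel) in enumerate(sentence, start=1):
        # ID FORM LEMMA CPOSTAG POSTAG FEATS HEAD DEPREL PHEAD PDEPREL
        cols = [str(i), form, lemma, pos, pos, "_", str(head), rel, "_", "_"]
        lines.append("\t".join(cols))
    return "\n".join(lines) + "\n\n"  # a blank line ends the sentence


if __name__ == "__main__":
    sent = [
        ("Peter", "Peter", "NP", 2, "subj"),
        ("sleeps", "sleep", "VB", 0, "root"),
        (".", ".", "Fp", 2, "punct"),
    ]
    print(to_conllx(sent), end="")
```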
<br />
== Apertium unstable ==<br />
<br />
Hi Francis, we're getting a lot of outages of Apertium from translatewiki.net. Service appears to have been down for at least the last hour. Siebrand 18:40, 11 June 2010 (UTC)<br />
<br />
== Changes to the main page ==<br />
Hello and welcome to the Apertium Wiki. I would like to ask you: why did you put the "English--Arabic" pair on the main page? Thank you very much, - [[User:Francis Tyers|Francis Tyers]] 17:41, 14 June 2011 (UTC)<br />
:I'm back online after the summer holidays. The English - Arabic pair was added by an IP address on 4 May 2011, and my changes to the main page were made on 3 June. My first change came about because the [[List of language pairs]] page, in its current state (unchanged for months), is remarkably useless: the content of the table is limited to<br />
<pre><br />
name | left | right | left monodix | right monodix | bidix | left→right rules | right→left rules<br />
??-?? | ?? | ?? | 0 | 0 | 0<br />
</pre><br />
:So I had put the link to the maintainers' page, which at least contains intelligible information. The edit that followed consisted of rebalancing the heights of the table's columns so that the right-hand column would not be the longest. At the moment it is the longest again on my screen, probably because my display width is limited (I have not succumbed to the fashion for ultra-wide screens) and the nn-nb and mk-bg pairs are the first to wrap onto 2 lines. Perhaps it would be better to limit it to a 3-column table. [[User:Bech|Bech]] 17:57, 31 August 2011 (UTC)<br />
<br />
== Responsiveness of pair maintainers ==<br />
After translating the pages of my latest website with Apertium, I reported by e-mail on 23 May some words missing from French-to-Spanish translation (proposing translations for them) to mlf@ua.es (I had seen somewhere that, along with you, this was or had been one of the developers), and then in June some words to add for French-Esperanto translation to Hèctor Alòs (contacted through the wiki interface).<br />
In both cases there was no reply, not even an acknowledgement of receipt, and the translators have not been updated. I still have words to translate between Spanish and Portuguese, but in that case is it worth letting anyone know? [[User:Bech|Bech]] 17:57, 31 August 2011 (UTC)<br />
<br />
:You can add the words yourself. I don't know why you didn't get a reply, but perhaps they are very busy. If you want more information, you can contact our mailing list: apertium-stuff@lists.sourceforge.net - [[User:Francis Tyers|Francis Tyers]] 06:19, 2 September 2011 (UTC)<br />
<br />
::As for making my own changes to the Apertium files, I'll wait a little. I'll start by translating pages of this wiki into French, which will be an opportunity to study how Apertium works. <br />
<br />
:::Good <nowiki>:)</nowiki><br />
<br />
::I think we should add an '''English documentation''' category (I can do it), just as there would be a '''Documentation en français''' category, and perhaps others depending on the contributors. <br />
<br />
:::I totally agree, it's a very good idea. I've already started: [[:Category:Documentation in English]], [[:Category:Documentation en français]]. - [[User:Francis Tyers|Francis Tyers]] 10:02, 3 September 2011 (UTC)<br />
<br />
As for updating Apertium files, I think the process is more constraining than for the free software that I alone develop. In any case, I have provisionally given up on installing Apertium from tar files; the installation looks more complicated to me than for more conventional software. And the fact that (perhaps this is a choice imposed by SourceForge) the instructions for language pairs such as eo-fr, fr-es and es-pt are systematically and exclusively in English is (even more than for other software) remarkably stupid, and it really gets on my nerves. <br />
<br />
:Which instructions? The 'INSTALL' files? You're right that it's ridiculous to have the instructions only in English, and if you can explain to us exactly what changes we need to make, I'll try to make them.<br />
<br />
It will certainly be easier with apt-get, but I haven't set up a Debian/Ubuntu partition yet.<br>Otherwise, given your first name and your flawless French, I wondered whether you were French, and I found the answer on your English Wikipedia profile. I think you can add French to the languages you claim to speak (at least a little). Or else Apertium is even better than I thought. [[User:Bech|Bech]] 23:36, 2 September 2011 (UTC)<br />
<br />
:Thank you very much. I don't use Apertium to do the translations, only occasionally to translate a word I'm not quite sure how to spell. I think I could perhaps put French as "A2", but only for the written language. When it comes to speaking, I'm hopeless ;) - [[User:Francis Tyers|Francis Tyers]] 10:02, 3 September 2011 (UTC)<br />
<br />
name ↓ left ↓ right ↓ left monodix ↓ right monodix ↓ bidix ↓ left→right rules ↓ right→left rules ↓<br />
??-?? ?? ?? 0 0 0<br />
:J'avais donc mis le lien sur la page des mainteneurs qui contient au moins des informations intelligibles. L'intervention qui a suivi a consisté à rééquilibrer la hauteur des colonnes du tableau pour éviter que la colonne de droite soit la plus longue. Actuellement, elle l'est de noveau chez moi probablement parcve que ma largeur d'affichage est limité (je n'ai pas sucombé à la mode des écrans hyper panoramiques) et les paires nn-nb et mk-bg sont les premières à s'afficher sur 2 lignes. Peut être vaudrait-il mieux se limiter à un tableau de 3 colonnes. [[User:Bech|Bech]] 17:57, 31 August 2011 (UTC)<br />
<br />
== Réactivité des mainteneurs de paires ==<br />
Après avoir traduit les pages de mon tout dernier site avec apertium, j'ai signalé par mail du 23 mai des mots manquants pour la traduction du français vers l'espagnol (en proposant la traduction) à mlf@ua.es (j'avais vu quelque part qu'avec toi, c'était ou ça avait été l'un des développeurs), puis en juin des mots à rajouter pour la traduction français-esperanto à Hèctor Alòs (contacté part l'interface du wiki).<br />
Dans les 2 cas, aucun retour, pas même un accusé de réception. Et les traducteurs n'ont pas été actualisés. Il me reste des mots à traduire entre l'espagnol et le portugais, mais est il utile dans ce cas de le faire savoir à quelqu'un ? [[User:Bech|Bech]] 17:57, 31 August 2011 (UTC)<br />
<br />
:Vous pouvez ajouter les mots vous-mêmes. Je ne sais pas pourquoi vous n'avez pas reçu un retour, mais peut être ils sont très occupes. Si vous voulez plus d'information, vous pouvez contacter avec notre liste de corriel: apertium-stuff@lists.sourceforge.net - [[User:Francis Tyers|Francis Tyers]] 06:19, 2 September 2011 (UTC)<br />
<br />
::As for making my own changes to the Apertium files, I'll wait a little. I'll start by translating pages of this wiki into French, which will be an opportunity to study how Apertium works. <br />
<br />
:::Good <nowiki>:)</nowiki><br />
<br />
::I think we should add an '''English documentation''' category (I can do it), just as there would be a '''Documentation en français''' category, and perhaps others depending on the contributors. <br />
<br />
:::I completely agree, it's a very good idea. I've already started: [[:Category:Documentation in English]], [[:Category:Documentation en français]]. - [[User:Francis Tyers|Francis Tyers]] 10:02, 3 September 2011 (UTC)<br />
<br />
As for updating Apertium files, I think the procedure is more demanding than for the free software that I develop on my own. In any case, I have given up for now on installing Apertium from tarballs; the installation looks more complicated than for more conventional software. The instructions for language pairs such as eo-fr, fr-es and es-pt are systematically and exclusively in English (perhaps a choice imposed by SourceForge). Even more than with other software, this is remarkably stupid, and it really annoys me. <br />
<br />
:Which instructions? The 'INSTALL' files? You're right that it's ridiculous to have the instructions only in English, and if you can tell us exactly what changes we should make, I'll try to make them.<br />
<br />
It will certainly be easier with apt-get, but I haven't set up a Debian/Ubuntu partition yet.<br>By the way, given your first name and your faultless French, I wondered whether you were French, and I found the answer on your English Wikipedia profile. I think you can add French to the languages you say you speak (at least a little). Or else Apertium is even better than I thought. [[User:Bech|Bech]] 23:36, 2 September 2011 (UTC)<br />
<br />
:Thank you very much, but I don't use Apertium to do the translations, only occasionally to translate a word whose spelling I'm not quite sure of. I think I could perhaps list French as "A2", but only for the written language. When it comes to speaking, I'm hopeless ;) - [[User:Francis Tyers|Francis Tyers]] 10:02, 3 September 2011 (UTC)<br />
<br />
== Administrator ==<br />
<br />
You are now also an administrator, in case you want to delete spam pages. - [[User:Francis Tyers|Francis Tyers]] 22:32, 11 September 2011 (UTC)<br />
:I saw that on Monday morning. There were a lot of nominations. Thank you for your trust. The day I created my account and my user page, I was surprised at how quickly you corrected a gender error in the Spanish text. But seeing what many new users do with their accounts, that explains your responsiveness. It is indeed better to have several people doing this, so that nobody has to spend 365 days a year on it. I don't know whether these problems are as frequent on other encyclopedias. [[User:Bech|Bech]] 23:59, 14 September 2011 (UTC)<br />
<br />
== <nowiki>{{TOCD}}</nowiki> ==<br />
<br />
On some pages, for example [[Apertium on Mac OS X (System)]], it looks unattractive when the display is not wide enough. With old browsers, the table of contents may also overlap lines of preformatted text. For the French translation, I prefer not to use this layout template. [[User:Bech|Bech]] 00:00, 15 September 2011 (UTC)<br />
<br />
== List of pairs and dictionaries ==<br />
<br />
There we go, the list pages are ready, or nearly so:<br />
<br />
* [[List of language pairs]]<br />
* [[Liste des paires de langues]] (partly for the fun of translating it with <code>sed</code>)<br />
* [[List of dictionaries]]<br />
<br />
Still to do:<br />
<br />
* give the corresponding language name for a list of 28 codes, and clarify the meaning of a 29th. I'll send an email to the mailing list about this.<br />
* correct the introductory text of the English pages if necessary (you can make corrections directly).<br />
<br />
Apart from that, I think the language pair lists contain what is needed for the page to remain a document for the general public. In particular, it could serve as a reference on Wikipedia. The technical details are in the list of dictionaries. There, especially for the bilingual dictionaries, I wasn't sure which tags to count, so that part and the column titles may need improving.<br />
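Which tags to count is a convention rather than something fixed. A minimal sketch, assuming entries are counted as the `<e>` elements directly inside `<section>` elements of a `.dix` file; entries inside `<pardefs>` describe inflection rather than words, so they are skipped. The sample dictionary is invented for illustration:

```python
import xml.etree.ElementTree as ET


def count_entries(dix_xml: str) -> int:
    """Count the <e> entries that are direct children of <section> elements.

    Entries inside <pardefs> (paradigm definitions) are not counted. This
    counting convention is an assumption made for illustration.
    """
    root = ET.fromstring(dix_xml)
    return sum(len(section.findall("e")) for section in root.iter("section"))


# Invented French-Spanish bilingual dictionary fragment.
SAMPLE_BIDIX = """<dictionary>
  <alphabet/>
  <sdefs><sdef n="n"/></sdefs>
  <pardefs>
    <pardef n="n"><e><p><l><s n="n"/></l><r><s n="n"/></r></p></e></pardef>
  </pardefs>
  <section id="main" type="standard">
    <e><p><l>maison<s n="n"/></l><r>casa<s n="n"/></r></p></e>
    <e><p><l>chat<s n="n"/></l><r>gato<s n="n"/></r></p></e>
  </section>
</dictionary>"""

print(count_entries(SAMPLE_BIDIX))  # 2
```

For a real file, `ET.parse(path).getroot()` would take the place of `ET.fromstring`.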
<br />
Next, we can discuss using my script to generate the pages automatically. The script could be made much simpler and faster if it ran on a computer with a local copy of all the Apertium language pairs, with file modification dates identical to those in SVN.<br />
<br />
Current processing time without a local copy of the pairs:<br />
* 30 to 60 minutes the first time, or after deleting the files that store the results of previous runs of the script;<br />
* about ten seconds if nothing has changed since the last run (a single call to <code>svn list</code>);<br />
* plus about ten seconds per modified branch and a few seconds per modified pair in that branch.<br />
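The incremental approach described above runs one cheap listing and then reprocesses only what has changed. It can be sketched as follows. This is a hypothetical illustration, not the actual script: the function name, pair names and revision numbers are invented. The listing is modelled as a mapping from pair directory to its last-changed revision, which is roughly the information `svn list --verbose` prints.

```python
from typing import Dict, List


def pairs_to_refresh(listing: Dict[str, int], cache: Dict[str, int]) -> List[str]:
    """Return the pairs whose last-changed revision differs from the cached one.

    `listing` maps a pair directory to the revision in which it last changed,
    as obtained from the current repository listing; `cache` is the same
    mapping saved by the previous run. Pairs absent from the cache are new
    and are therefore refreshed too.
    """
    return sorted(name for name, rev in listing.items() if cache.get(name) != rev)


# Invented revision numbers for illustration.
previous = {"apertium-fr-es": 35000, "apertium-eo-fr": 35010}
current = {"apertium-fr-es": 35000, "apertium-eo-fr": 35120, "apertium-es-pt": 35100}

print(pairs_to_refresh(current, previous))  # ['apertium-eo-fr', 'apertium-es-pt']
```

The same comparison could be done first at branch level (trunk, incubator, nursery and so on), so that unchanged branches are skipped entirely. The mapping would have to be saved between runs.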
<br />
As you are still one of the main contributors to this wiki, I'm asking for your opinion directly.<br />
<br />
Otherwise, most of the pages you wrote are easy to understand (perhaps you took care to phrase your sentences the way they are built in Romance languages). I prefer to put a "?" in my translations rather than contact you every time I run into a difficulty. But on one of the pages, unhammer and I both had the same problem. See [[Talk:Corpus test]] to improve part of a paragraph. [[User:Bech|Bech]] 16:03, 13 January 2012 (UTC)<br />
<br />
==GSOC==<br />
<br />
Hello! Could you translate the GSoC 'brochure' (can "flyer" be translated that way?) into French? The instructions are here: [[Google Summer of Code/Flyer translations]]. We normally try to translate the flyer into as many languages as we can, to show that the Apertium community is big! :) Thanks a lot, - [[User:Francis Tyers|Francis Tyers]] 15:12, 7 February 2012 (UTC)<br />
:OK. It's not a very long text. It could be done this afternoon, or otherwise at the weekend. Had I received your message the day before, I could even have enlisted a Quebecer I was hosting through BeWelcome. On the evening he arrived, he gave me good ideas for translating some words on the [[Chunking]] page (including its title), but these last two evenings we preferred to use my computer for other things.<br><br />
:I have also noted a few pages to update following the first release of 2012.<br><br />
:As for the term "flyer": since the document fits on a single sheet, it is better to say ''plaquette''. A ''brochure'' (feminine in French: un'''e''' ''brochure'') is a document of several bound pages.<br><br />
:I already had a problem interpreting the author's intentions on the [[Using linguistic resources]] page: how best to translate ''open source''. Software that is ''à code ouvert'' (open source) is not necessarily ''libre'' (free), whereas free software is necessarily open source. On the [[Utilisation de ressources linguistiques]] page I had written: ''de logiciels open-source et de logiciels libres''. In fact, I see five possible "translations":<br />
:* open source (left untranslated)<br />
:* à code ouvert<br />
:* libre<br />
:* libre ou open source<br />
:* libre ou à code ouvert<br />
:Do you have a preference, given the spirit of the text and, in particular, what you may know about Google's goals for the "Summer of Code"? And should I translate "Summer of Code"?<br><br />
:Another possibility would be to use ''logiciel libre'' (free software) in the title but rather ''code ouvert'' (open source) in the body text. [[User:Bech|Bech]] 14:36, 10 February 2012 (UTC)<br />
<br />
==List of languages==<br />
<br />
Why isn't the Udmurt-Russian pair there? It is in the nursery. - [[User:Francis Tyers|Francis Tyers]] 10:48, 16 February 2012 (UTC)<br />
<br />
:A typo in a regular expression:<br />
 dirpair="apertium.[a-zl[a-z]?(_[A-Z][A-Z])?-[a-z][a-z][a-z]?(_[A-Z][A-Z])?/"<br />
                       ^<br />
:instead of<br />
 dirpair="apertium.[a-z][a-z][a-z]?(_[A-Z][A-Z])?-[a-z][a-z][a-z]?(_[A-Z][A-Z])?/"<br />
:which prevented pairs whose first language code has three letters from being detected.<br />
:Fixed in the shell script, but I don't have time to run another update tonight (I got back from holiday last night and have other work to catch up on). I have also thought about rewriting the shell script in a more modern way (C shell, plus far fewer files for storing the results of an analysis). Still, it would be good to run it on the computer that hosts the wiki, as a daily cron job. Jimmy O'Regan (I think) had talked about renaming the transfer files of apertium-es-an to .t1x, so I hadn't corrected anything, and at every update the translation direction comes out undefined. [[User:Bech|Bech]] 19:58, 6 March 2012 (UTC)<br />
<br />
== [[Apertium ile yeni bir dil çeviri sistemi yapmak]] ==<br />
<br />
A page with this name had been in the category Documentation in English since 11 December, with exactly the content of [[Apertium New Language Pair HOWTO]]. You created this page. One day later, a first user, [[user talk:Omerfaruk04|Omerfaruk04]], wrote three lines in Turkish in it describing what the page would cover. One week later, [[user talk:Yatezcan|Yatezcan]] simply pasted in the English content of [[Apertium New Language Pair HOWTO]] and never changed anything after that. Today I removed the Documentation in English category from it.<br />
<br />
Lost translators? If so, it would be better to turn this page into a redirect to the English version, or to delete it. [[User:Bech|Bech]] 13:58, 24 March 2012 (UTC)<br />
<br />
== [[Cookbook]] ==<br />
<br />
You deleted this page on 15 April. Was this Apertium page obsolete, or was it deleted by mistake along with other non-Apertium pages? [[User:Bech|Bech]] 21:24, 9 May 2012 (UTC)<br />
<br />
:It was ancient, and not accurate anymore. - [[User:Francis Tyers|Francis Tyers]] 05:52, 10 May 2012 (UTC)<br />
<br />
== Annoying antispam filter ==<br />
<br />
I got these messages when creating two new pages:<br />
<br />
Page automatically protected because of spam<br />
<br />
The page you tried to save was blocked by the spam filter. This is probably caused by a link to a blacklisted external site.<br />
<br />
The string "iz<b></b>ing" triggered the spam detector.<br />
<br />
Return to the page Formation d'un tagger de langue cible.<br />
<br />
<br />
Page automatically protected because of spam<br />
<br />
The page you tried to save was blocked by the spam filter. This is probably caused by a link to a blacklisted external site.<br />
<br />
The string "iz<b></b>ation" triggered the spam detector.<br />
<br />
Return to the page Créer un tagger en mode automatique.<br />
<br />
Adding some HTML such as <pre><b></b></pre> gets around it, but this filter should be disabled for at least some users of the wiki (administrators).<br />
<br />
I forgot to sign. Also, looking at these pages again: since the string is inside <nowiki><pre></nowiki>, what I added shows up in the text. I won't work on this any more for now. [[User:Bech|Bech]] 00:47, 11 November 2012 (UTC)<br />
:The easiest solution is to replace '-iz-' with '-is-'. I only see -iźation/-iźing in the spam pages, and users of the Apertium wiki normally write -isation/-ising. - [[User:Francis Tyers|Francis Tyers]] 00:50, 11 November 2012 (UTC)<br />
::I see I'm not the only one at the computer at this hour. :-)) It's just that the English pages of the wiki used ''iz''. Isn't that the text generated by the tool? [[User:Bech|Bech]] 00:58, 11 November 2012 (UTC)<br />
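The `<b></b>` trick works because the filter matches a plain string in the saved text, so an empty tag inside the word breaks the match while the page renders the same. A minimal sketch, assuming a simple text-matching regular expression similar in spirit to MediaWiki's `$wgSpamRegex` setting. The actual filter configuration of this wiki is not known, so the pattern below is hypothetical:

```python
import re

# Hypothetical pattern: the real filter configuration of this wiki is unknown.
SPAM_PATTERN = re.compile(r"iz(?:ing|ation)")


def blocked(text: str) -> bool:
    """Return True if the saved text would trip this (hypothetical) filter."""
    return SPAM_PATTERN.search(text) is not None


print(blocked("tokenizing"))         # True
print(blocked("tokeniz<b></b>ing"))  # False: the empty tag splits the string
print(blocked("tokenising"))         # False: the -is- spelling avoids it
```

The drawback is also visible: legitimate words such as "tokenization" are blocked, which is why the -is- spelling was suggested.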
<br />
==Terminology==<br />
<br />
Perhaps better: <br />
<br />
* tagueur -> étiqueteur (ca. etiquetador)<br />
* formation -> entraînement (ca. entrenament)<br />
<br />
- [[User:Francis Tyers|Francis Tyers]] 14:31, 11 November 2012 (UTC)<br />
<br />
Why not! <br />
<br />
Some page titles were created when I had a less complete overview of Apertium than I do today.<br />
<br />
As for "entraînement": I have got used to reading "training/train a tagger" on the mailing list, whereas six months or more ago this expression seemed strange to me.<br />
<br />
"étiqueteur" is fine too, but it will mean revising quite a few pages where I use "balises" and "baliser" for "tag" and "tagging".<br />
<br />
[[User:Bech|Bech]] 15:14, 11 November 2012 (UTC)<br />
<br />
== Exemples de règles de transfert ==<br />
<br />
You don't seem to have started on the translation. I have time to work on it over the next two days. [[User:Bech|Bech]] 09:09, 26 March 2013 (UTC)<br />
<br />
:It's ok, I will translate it, but so far the people I have sent it to speak French :) - [[User:Francis Tyers|Francis Tyers]] 09:10, 26 March 2013 (UTC)<br />
<br />
::I wrote that because I have nothing else to do before 5 PM today, so I started the translation, as I had not seen a new English page started. [[User:Bech|Bech]] 10:01, 26 March 2013 (UTC)<br />
<br />
== Wiki configuration settings ==<br />
<br />
Hello! As you surely know, this wiki isn't particularly fast. It's our (MediaWiki's) fault: the configuration documentation is horrible, and I'm working on it at [[mediawikiwiki:Manual:Performance tuning]]. What you may not know is that the wiki appears to be completely unusable if one sets an interface language other than English; at least, I waited several minutes for a page to load, without success. I suspect (though I'm not a dev) that this is a strong hint that localisation (l10n) messages are being fetched from the database. I propose two things:<br />
* first, set [[mediawikiwiki:Manual:$wgCacheDirectory|$wgCacheDirectory]], for instance <code>$wgCacheDirectory = "$IP/cache";</code>, and then run <code>php maintenance/rebuildLocalisationCache.php</code>;<br />
* second, if you would like me to try to prepare a quick patch that you can easily apply, email me your LocalSettings.php (without DB passwords and the like, of course :) ). As information about the server, also send at least the output of <code>php -r 'phpinfo();' | grep apc</code>.<br />
--[[User:Nemo bis|Nemo]] ([[User talk:Nemo bis|talk]]) 10:13, 21 March 2014 (UTC)</div>Nemo bishttps://wiki.apertium.org/w/index.php?title=MediaWiki_talk:Sidebar&diff=47630MediaWiki talk:Sidebar2014-03-21T10:02:08Z<p>Nemo bis: Created page with "A sysop should remove the community portal line, unless there is some discussion venue on this wiki which it can be pointed to. I redirected help to Documentation. --~~~~"</p>
<hr />
<div>A sysop should remove the community portal line, unless there is some discussion venue on this wiki which it can be pointed to. I redirected help to [[Documentation]]. --[[User:Nemo bis|Nemo bis]] ([[User talk:Nemo bis|talk]]) 10:02, 21 March 2014 (UTC)</div>Nemo bishttps://wiki.apertium.org/w/index.php?title=Help:Contents&diff=47628Help:Contents2014-03-21T09:58:32Z<p>Nemo bis: Redirected page to Documentation</p>
<hr />
<div>#REDIRECT [[Documentation]]</div>Nemo bishttps://wiki.apertium.org/w/index.php?title=Apertium:Current_events&diff=47627Apertium:Current events2014-03-21T09:58:09Z<p>Nemo bis: Category:Promotion HQ</p>
<hr />
<div>* 26 November 2012: Google Code-in has started, with Apertium taking part. The contest ends on 14 January 2013.<br />
<br />
* 19 May 2010: [[User:Jacob Nordfalk|Jacob]] will again give the Danish talk 'Open Source regelbaseret maskinoversættelse med Apertium' (Open-source rule-based machine translation with Apertium) at [http://www.cbs.dk/ Copenhagen Business School], [http://www.cbs.dk/forskning_viden/institutter_centre/institutter/inf Institut for Informatik]; see http://www.communitybuilder.dk/opensourcelab/ <br />
<br />
* 29 March 2010: [[User:Jacob Nordfalk|Jacob]] will give a talk in Esperanto, 'Maŝintraduko - kiel ĝi funkcias, kion ĝi kapablas. Kun speciala trakto de la sistemo Apertium' (Machine translation: how it works, what it can do. With special treatment of the Apertium system); see http://dejo.dk/apertium/masintraduko/view?set_language=eo<br />
<br />
* 26 March 2010: [[User:Jacob Nordfalk|Jacob]] will give a talk in Danish, 'Open Source regelbaseret maskinoversættelse med Apertium', at the Department of Computer Science at the University of Copenhagen [http://diku.dk/ DIKU]. Time: 14:15 at DIKU, rooms A+B on the mezzanine (by the main entrance, which is marked by the Adam and Eve statues).<br />
<br />
* 25 March 2010: [[User:Gramirez|Gema]] will hold a talk at the Universitat Politècnica de València (Jornada Connecta'l al Valencià) at 12:30 called "The Apertium free/open-source MT platform: resources for Valencian".<br />
<br />
* 12 May 2009 : Apertium-br-fr has been released.<br />
* 1 August 2008 : Apertium-cy-en has been released.<br />
* 17 July 2007: Apertium-es-it has been created. Italian appears in Apertium for the first time, with 30,000 lemmas.<br />
* 09 July 2007: <code>apertium-es-gl</code> and <code>apertium-es-ca</code> are now GPL!<br />
* 29 June 2007: The Apertium server at xixona.dlsi.ua.es is being reinstalled<br />
* 08 June 2007: <code>apertium-unicode</code> module is now in SVN!<br />
* 19 May 2007: Apertium Wiki is opened.<br />
<br />
[[Category:Promotion HQ]]</div>Nemo bishttps://wiki.apertium.org/w/index.php?title=User:Nemo_bis&diff=47626User:Nemo bis2014-03-21T09:56:52Z<p>Nemo bis: old</p>
<hr />
<div>[[mediawikiwiki:User:Nemo_bis|Nemo]].<br />
<br />
[[/English and Italian/]].</div>Nemo bishttps://wiki.apertium.org/w/index.php?title=User:Nemo_bis&diff=47625User:Nemo bis2014-03-21T09:47:28Z<p>Nemo bis: actually, more fun</p>
<hr />
<div>[[mw:User:Nemo_bis|Nemo]].<br />
<br />
[[/English and Italian/]].</div>Nemo bishttps://wiki.apertium.org/w/index.php?title=Ideas_for_Google_Summer_of_Code/Make_a_language_pair_state-of-the-art&diff=47471Ideas for Google Summer of Code/Make a language pair state-of-the-art2014-03-20T12:13:17Z<p>Nemo bis: +FAQ from a chat with spectie yesterday about en-it</p>
<hr />
<div>{{TOCD}}<br />
<br />
Take a released language pair and drastically improve its performance, both in terms of coverage and in terms of translation quality. This will involve working with dictionaries, transfer rules, scripting and corpora. The objective is to make an Apertium language pair state-of-the-art, or close to state-of-the-art, in terms of translation quality. Concretely, this means improving coverage to 95-98% on a range of corpora and decreasing the word error rate by 30-50% (relative). For example, if the current word error rate is 30%, it should be reduced to 15-20%. <br />
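The target follows from a relative reduction of the current WER. A small sketch of the arithmetic, with an invented function name: a 30-50 % relative cut from 30 % gives 15-21 %, which the paragraph above rounds to 15-20 %.

```python
from typing import Tuple


def target_wer_range(current_wer: float, min_cut: int = 30, max_cut: int = 50) -> Tuple[float, float]:
    """Return (best, worst) target WER in % after a relative cut of min_cut..max_cut %."""
    best = current_wer * (100 - max_cut) / 100
    worst = current_wer * (100 - min_cut) / 100
    return best, worst


print(target_wer_range(30.0))  # (15.0, 21.0)
```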
<br />
==Coding challenge==<br />
<br />
* Find a language pair of your choice.<br />
* Translate 2,000 words of text (e.g. four articles of 500 words).<br />
* Postedit the text to make a reference translation.<br />
* Use two articles to improve the translator. <br />
** Add all the words, and cover all the structures with transfer rules.<br />
* [[Evaluation]]: calculate the improvement that you were able to make on these two articles, and on your two held-out articles.<br />
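A minimal sketch of the measurement step. It assumes WER is the word-level edit distance between the MT output and the post-edited reference, divided by the number of reference words. The figures reported on the [[English and Italian/Google Translate]] page fit this definition (e.g. 215 / 994 = 21.63 %). The code is a re-implementation for illustration, not apertium-eval-translator.pl itself, and the sentences are invented:

```python
from typing import List


def edit_distance(hyp: List[str], ref: List[str]) -> int:
    """Word-level Levenshtein distance; insertions, deletions and substitutions cost 1."""
    prev = list(range(len(ref) + 1))  # distances from an empty hypothesis
    for i, h in enumerate(hyp, start=1):
        cur = [i]
        for j, r in enumerate(ref, start=1):
            cur.append(min(prev[j] + 1,              # drop hypothesis word h
                           cur[j - 1] + 1,           # insert reference word r
                           prev[j - 1] + (h != r)))  # substitution, or match at cost 0
        prev = cur
    return prev[-1]


def wer(mt_output: str, postedit: str) -> float:
    """Word error rate in %: edit distance divided by the number of reference words."""
    hyp, ref = mt_output.split(), postedit.split()
    return 100.0 * edit_distance(hyp, ref) / len(ref)


reference = "the cat sits on the mat"
before = wer("the cat sit on mat", reference)
after = wer("the cat sits on mat", reference)
print(f"{before:.2f} % -> {after:.2f} %")  # 33.33 % -> 16.67 %
```

Run it on the two articles used for development and, separately, on the two held-out articles. The held-out figures show whether the improvements generalise.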
<br />
==Frequently asked questions==<br />
<br />
; What if my pair is composed of two popular languages, for instance two official languages of EU? : Then this task will be hard. Pairs which have huge corpora of parallel texts, like the 24 official EU languages or the 3 EU working languages, tend to have very good statistical machine translation software already. Your work will be valuable, and accepted, only if you manage to reach a comparable quality. For instance, if europarl+moses gets a [[WER]] of 15-20%, we'd be happy with 25%.<br />
<br />
; What happens if I don't reach the expected results? : It's no big deal! GSoC is a scholarship, not a service contract. If you don't deliver what was agreed/expected, you'll fail the [http://www.google-melange.com/gsoc/document/show/gsoc_program/google/gsoc2014/help_page#9._How_do_evaluations_work final or midterm evaluation] and [http://www.google-melange.com/gsoc/document/show/gsoc_program/google/gsoc2014/help_page#1._How_do_payments_work lose the corresponding stipend installment(s)]. At least you tried! And, hopefully, you will have learnt a lot in the process.<br />
<br />
More: ''[[contact|ask us]] something!'' :)<br />
<br />
==See also==<br />
<br />
* An example work plan for a language pair: [[Maltese_and_Arabic/Work_plan]]<br />
<br />
[[Category:Ideas for Google Summer of Code|Make a language pair state-of-the-art]]</div>Nemo bishttps://wiki.apertium.org/w/index.php?title=English_and_Italian/Google_Translate&diff=47466English and Italian/Google Translate2014-03-20T11:00:21Z<p>Nemo bis: /* First revision */ common errors before I forget</p>
<hr />
<div>A basic evaluation of the Google Translate translation from English to Italian was made on 2014-03-20 according to [[Evaluation]] instructions and apertium-eval-translator.pl from latest trunk. We found a '''21.63 % WER'''.<br />
<br />
== Method ==<br />
<br />
As a base we used about 1000 words of an English leaflet (originally translated from German, which accounts for some peculiarities) by Wikimedia and Creative Commons (which accounts for some specialized terminology): [https://meta.wikimedia.org/w/index.php?title=Free_knowledge_based_on_Creative_Commons_licenses%2Fit&diff=7896398&oldid=7879177 Google Translate], [https://meta.wikimedia.org/w/index.php?title=Free_knowledge_based_on_Creative_Commons_licenses/it&diff=7896990&oldid=7896398 manual corrections].<br />
<br />
Considerations:<br />
* only ungrammatical passages and grammatical errors that changed the meaning were corrected,<br />
* as well as some inconsistencies in translation and major lexical errors which didn't convey the original meaning at all;<br />
* but errors which would not be evident without knowing the source were left alone, as well as lexical choices which are disputable but not outright wrong,<br />
* and the text was not made as fluent as it would need to be to completely hide its machine-translation origin.<br />
<br />
The second result was calculated after removing the whitespace incorrectly added around punctuation; the difference is very significant, confirming our choice not to correct such whitespace errors to avoid excess noise in the evaluation.<br />
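The exact normalisation used is not stated. The sketch below assumes it means deleting spaces before closing punctuation and after opening brackets. It also shows why this matters for WER: each detached punctuation mark is counted as a separate word. The Italian sentence is invented:

```python
import re

# Spaces wrongly placed before closing punctuation: "libere ," -> "libere,".
_SPACE_BEFORE = re.compile(r"\s+([,.;:!?)\]])")
# Spaces wrongly placed after opening brackets: "( CC" -> "(CC".
_SPACE_AFTER = re.compile(r"([(\[])\s+")


def fix_punctuation_spacing(text: str) -> str:
    """Remove whitespace before closing punctuation and after opening brackets."""
    text = _SPACE_BEFORE.sub(r"\1", text)
    return _SPACE_AFTER.sub(r"\1", text)


mt = "Le licenze Creative Commons ( CC ) sono libere , ma non tutte ."
fixed = fix_punctuation_spacing(mt)
print(fixed)  # Le licenze Creative Commons (CC) sono libere, ma non tutte.
print(len(mt.split()), "->", len(fixed.split()))  # 14 -> 10
```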
<br />
== First revision ==<br />
Common errors found:<br />
*missing agreement in number (singular/plural) and gender (masculine/feminine) between noun and adjective/pronoun;<br />
*articles, especially definite article vs. no article;<br />
*coordinated clauses and pronouns (''those... who'' and the like).<br />
<br />
<pre><br />
$ perl apertium-eval-translator.pl -test MT.txt -ref postedit.txt<br />
Test file: 'MT.txt'<br />
Reference file 'postedit.txt'<br />
<br />
Statistics about input files<br />
-------------------------------------------------------<br />
Number of words in reference: 994<br />
Number of words in test: 984<br />
Number of unknown words (marked with a star) in test: <br />
Percentage of unknown words: 0.00 %<br />
<br />
Results when removing unknown-word marks (stars)<br />
-------------------------------------------------------<br />
Edit distance: 215<br />
Word error rate (WER): 21.63 %<br />
Number of position-independent correct words: 862<br />
Position-independent word error rate (PER): 13.28 %<br />
<br />
Results when unknown-word marks (stars) are not removed<br />
-------------------------------------------------------<br />
Edit distance: 215<br />
Word Error Rate (WER): 21.63 %<br />
Number of position-independent correct words: 862<br />
Position-independent word error rate (PER): 13.28 %<br />
<br />
Statistics about the translation of unknown words<br />
-------------------------------------------------------<br />
Number of unknown words which were free rides: 0<br />
Percentage of unknown words that were free rides: 0%<br />
</pre><br />
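The reported figures are consistent with the definitions below. These were inferred from the numbers rather than taken from the script's source:
* WER = edit distance / reference words;
* PER = (max(test words, reference words) - position-independent correct words) / reference words.

Counting position-independent correct words as a multiset intersection is likewise an assumption. Both runs on this page reproduce exactly:

```python
from collections import Counter
from typing import List


def wer_from_counts(edit_distance: int, ref_words: int) -> float:
    """WER in %: edit distance divided by the number of reference words."""
    return 100.0 * edit_distance / ref_words


def per_from_counts(correct: int, test_words: int, ref_words: int) -> float:
    """PER in %: (max(test, reference) - position-independent correct) / reference."""
    return 100.0 * (max(test_words, ref_words) - correct) / ref_words


def position_independent_correct(hyp: List[str], ref: List[str]) -> int:
    """Words shared by test and reference regardless of order (multiset intersection)."""
    return sum((Counter(hyp) & Counter(ref)).values())


# Counts copied from the two runs reported on this page.
first = (wer_from_counts(215, 994), per_from_counts(862, 984, 994))
second = (wer_from_counts(345, 915), per_from_counts(719, 984, 915))
print([f"{x:.2f}" for x in first])   # ['21.63', '13.28']
print([f"{x:.2f}" for x in second])  # ['37.70', '28.96']
print(position_independent_correct("b a c".split(), "a b d".split()))  # 2
```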
<br />
== Second revision ==<br />
<br />
<pre><br />
$ perl apertium-eval-translator.pl -test MT.txt -ref postedit.txt<br />
Test file: 'MT.txt'<br />
Reference file 'postedit.txt'<br />
<br />
Statistics about input files<br />
-------------------------------------------------------<br />
Number of words in reference: 915<br />
Number of words in test: 984<br />
Number of unknown words (marked with a star) in test: <br />
Percentage of unknown words: 0.00 %<br />
<br />
Results when removing unknown-word marks (stars)<br />
-------------------------------------------------------<br />
Edit distance: 345<br />
Word error rate (WER): 37.70 %<br />
Number of position-independent correct words: 719<br />
Position-independent word error rate (PER): 28.96 %<br />
<br />
Results when unknown-word marks (stars) are not removed<br />
-------------------------------------------------------<br />
Edit distance: 345<br />
Word Error Rate (WER): 37.70 %<br />
Number of position-independent correct words: 719<br />
Position-independent word error rate (PER): 28.96 %<br />
<br />
Statistics about the translation of unknown words<br />
-------------------------------------------------------<br />
Number of unknown words which were free rides: 0<br />
Percentage of unknown words that were free rides: 0%<br />
</pre></div>Nemo bishttps://wiki.apertium.org/w/index.php?title=English_and_Italian/Google_Translate&diff=47405English and Italian/Google Translate2014-03-20T00:24:38Z<p>Nemo bis: new evaluation</p>
<hr />
<div>A basic evaluation of the Google Translate translation from English to Italian was made on 2014-03-20 according to [[Evaluation]] instructions and apertium-eval-translator.pl from latest trunk. We found a '''21.63 % WER'''.<br />
<br />
== Method ==<br />
<br />
As a base we used about 1000 words of an English leaflet (originally translated from German, which accounts for some peculiarities) by Wikimedia and Creative Commons (which accounts for some specialized terminology): [https://meta.wikimedia.org/w/index.php?title=Free_knowledge_based_on_Creative_Commons_licenses%2Fit&diff=7896398&oldid=7879177 Google Translate], [https://meta.wikimedia.org/w/index.php?title=Free_knowledge_based_on_Creative_Commons_licenses/it&diff=7896990&oldid=7896398 manual corrections].<br />
<br />
Considerations:<br />
* only ungrammatical passages and grammatical errors that changed the meaning were corrected,<br />
* as well as some inconsistencies in translation and major lexical errors which didn't convey the original meaning at all;<br />
* but errors which would not be evident without knowing the source were left alone, as well as lexical choices which are disputable but not outright wrong,<br />
* and the text was not made as fluent as it would need to be to completely hide its machine-translation origin.<br />
<br />
The second result was calculated after removing the whitespace incorrectly added around punctuation; the difference is very significant, confirming our choice not to correct such whitespace errors to avoid excess noise in the evaluation.<br />
<br />
== First revision ==<br />
<br />
<pre><br />
$ perl apertium-eval-translator.pl -test MT.txt -ref postedit.txt<br />
Test file: 'MT.txt'<br />
Reference file 'postedit.txt'<br />
<br />
Statistics about input files<br />
-------------------------------------------------------<br />
Number of words in reference: 994<br />
Number of words in test: 984<br />
Number of unknown words (marked with a star) in test: <br />
Percentage of unknown words: 0.00 %<br />
<br />
Results when removing unknown-word marks (stars)<br />
-------------------------------------------------------<br />
Edit distance: 215<br />
Word error rate (WER): 21.63 %<br />
Number of position-independent correct words: 862<br />
Position-independent word error rate (PER): 13.28 %<br />
<br />
Results when unknown-word marks (stars) are not removed<br />
-------------------------------------------------------<br />
Edit distance: 215<br />
Word Error Rate (WER): 21.63 %<br />
Number of position-independent correct words: 862<br />
Position-independent word error rate (PER): 13.28 %<br />
<br />
Statistics about the translation of unknown words<br />
-------------------------------------------------------<br />
Number of unknown words which were free rides: 0<br />
Percentage of unknown words that were free rides: 0%<br />
</pre><br />
<br />
== Second revision ==<br />
<br />
<pre><br />
$ perl apertium-eval-translator.pl -test MT.txt -ref postedit.txt<br />
Test file: 'MT.txt'<br />
Reference file 'postedit.txt'<br />
<br />
Statistics about input files<br />
-------------------------------------------------------<br />
Number of words in reference: 915<br />
Number of words in test: 984<br />
Number of unknown words (marked with a star) in test: <br />
Percentage of unknown words: 0.00 %<br />
<br />
Results when removing unknown-word marks (stars)<br />
-------------------------------------------------------<br />
Edit distance: 345<br />
Word error rate (WER): 37.70 %<br />
Number of position-independent correct words: 719<br />
Position-independent word error rate (PER): 28.96 %<br />
<br />
Results when unknown-word marks (stars) are not removed<br />
-------------------------------------------------------<br />
Edit distance: 345<br />
Word Error Rate (WER): 37.70 %<br />
Number of position-independent correct words: 719<br />
Position-independent word error rate (PER): 28.96 %<br />
<br />
Statistics about the translation of unknown words<br />
-------------------------------------------------------<br />
Number of unknown words which were free rides: 0<br />
Percentage of unknown words that were free rides: 0%<br />
</pre></div>Nemo bishttps://wiki.apertium.org/w/index.php?title=User_talk:Nikerabbit&diff=47404User talk:Nikerabbit2014-03-20T00:10:58Z<p>Nemo bis: Created page with "==Finnish and Italian== <nowiki>:D</nowiki> ~~~~"</p>
<hr />
<div>==[[Finnish and Italian]]==<br />
<nowiki>:D</nowiki><br />
[[User:Nemo bis|Nemo bis]] ([[User talk:Nemo bis|talk]]) 00:10, 20 March 2014 (UTC)</div>Nemo bishttps://wiki.apertium.org/w/index.php?title=Finnish_and_Italian&diff=47403Finnish and Italian2014-03-20T00:10:08Z<p>Nemo bis: Created page with "This pair is only a dream as of 2014. If you share this dream and/or are willing to work on it, let User:Nikerabbit| and Nemo know! [[Category:Finnish_a..."</p>
<hr />
<div>This pair is only a dream as of 2014. If you share this dream and/or are willing to work on it, let [[User:Nikerabbit|Nikerabbit]]<br />
and [[User:Nemo_bis|Nemo]] know!<br />
<br />
[[Category:Finnish_and_Italian]]</div>Nemo bishttps://wiki.apertium.org/w/index.php?title=User:Nikerabbit&diff=47402User:Nikerabbit2014-03-20T00:09:59Z<p>Nemo bis: link</p>
<hr />
<div>[[mediawikiwiki:User:Nikerabbit|Nikerabbit]].</div>Nemo bishttps://wiki.apertium.org/w/index.php?title=Category:Finnish_and_Italian&diff=47401Category:Finnish and Italian2014-03-20T00:08:03Z<p>Nemo bis: Created page with "Category:Language pairs"</p>
<hr />
<div>[[Category:Language pairs]]</div>Nemo bishttps://wiki.apertium.org/w/index.php?title=Category:English_and_Italian&diff=47400Category:English and Italian2014-03-20T00:07:13Z<p>Nemo bis: Created page with "Category:Language pairs"</p>
<hr />
<div>[[Category:Language pairs]]</div>Nemo bishttps://wiki.apertium.org/w/index.php?title=English_and_Italian&diff=47399English and Italian2014-03-20T00:06:37Z<p>Nemo bis: information from svn log and spectie</p>
<hr />
<div>This pair doesn't exist yet, but it has been in the incubator since 2009: https://svn.code.sf.net/p/apertium/svn/incubator/apertium-en-it/<br />
<br />
It's considered a hard pair: not only are the two languages not closely related, but the SMT (Moses + Europarl) baseline is also expected to be rather good and difficult to match in quality.<br />
<br />
== Subpages ==<br />
<br />
{{Special:PrefixIndex/English and Italian/}}<br />
<br />
[[Category:English and Italian]]</div>Nemo bishttps://wiki.apertium.org/w/index.php?title=Apertium-en-it&diff=47398Apertium-en-it2014-03-20T00:01:59Z<p>Nemo bis: Redirected page to English and Italian</p>
<hr />
<div>#REDIRECT[[English and Italian]]</div>Nemo bishttps://wiki.apertium.org/w/index.php?title=Evaluating_with_Wikipedia&diff=47397Evaluating with Wikipedia2014-03-19T23:54:32Z<p>Nemo bis: MetaWikipedia:Machine translation, where this document is reproduced</p>
<hr />
<div>One way to improve your MT system, while at the same time improving and adding content to Wikipedias, is to use Wikipedias as a test bed. You can translate text from one Wikipedia to another, then post-edit it yourself, or wait for (or ask) other people to post-edit it. A nice feature of MediaWiki (the software Wikipedia is based on) is that it lets you view diffs between versions of a page (see the 'history' tab).<br />
<br />
This strategy is beneficial both to Wikipedia and to Apertium. Wikipedia gets new articles in languages which might not otherwise have them, and Apertium gets information on how the software can be improved. It is important to remember that Wikipedia is a community effort, and that people can rightly be concerned about machine translation. To understand why, put yourself in the place of people who have to fix a lot of "hit and run" SYSTRAN translations, with little time and not much patience.<br />
<br />
==Guidelines==<br />
<br />
*Don't just start translating texts and waiting for people to fix them. The first thing you should do is create an account on the Wikipedia and then find the "Community notice board". Ask there how regular contributors would feel about you using the Wikipedia for tests. The community notice board should be linked from the front page; it might be called something like "La tavèrna" in Occitan, or "Geselshoekie" in Afrikaans. When you ask them, make the following clear:<br />
<br />
:* This is free software / open source machine translation.<br />
:* You would like to help the community and are doing these translations both to help their Wikipedia expand the range of articles, and to improve the translation software.<br />
:* The translations will be added only with the consent of the community; you do not intend to flood them with poorly translated articles.<br />
:* The translations will be added by a '''human''' not by a bot.<br />
:* Ask them whether there are any subjects they would prefer you to cover; perhaps they have a page of "requested translations".<br />
:* One way of presenting it might be as a non-native speaker trying to learn the language. Point out that the initial translation will be done by machine, that you will then try to fix it, and that you would be grateful for other people to fix anything you don't.<br />
<br />
An example of the kind of conversation you might have is found [http://af.wikipedia.org/wiki/Wikipedia:Geselshoekie/MT here].<br />
<br />
==How to translate==<br />
<br />
To make the test as useful as possible, when you create the page, first paste in the unedited machine translation output. Save the page with an ''edit summary'' saying that you're still working on it. Then post-edit the output, and save the page again when you have finished. If you go to the history tab at the top of the page and click "Compare selected versions", you will see the differences (diff) between the machine translation and the post-edited output. This gives a good indication of how good the original Apertium output was.<br />
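If you want a rough number to go with the diff, you can copy the two versions (raw MT output and post-edited text) out of the page history and compare them locally. The following is a minimal sketch using Python's standard <code>difflib</code> module; the function name <code>postedit_stats</code> is hypothetical, and word-level similarity is only a crude proxy for translation quality, not an official Apertium or MediaWiki metric.<br />

```python
import difflib


def postedit_stats(mt_output: str, post_edited: str) -> dict:
    """Compare raw MT output with its post-edited version, word by word.

    Hypothetical helper: a similarity of 1.0 means the post-editor
    changed nothing; lower values mean more words were edited.
    """
    mt_words = mt_output.split()
    pe_words = post_edited.split()
    matcher = difflib.SequenceMatcher(a=mt_words, b=pe_words, autojunk=False)

    # Words that survived post-editing unchanged (the final block has size 0).
    kept = sum(block.size for block in matcher.get_matching_blocks())

    # Each non-equal opcode is one edit: (MT words, post-edited words).
    edits = [
        (" ".join(mt_words[i1:i2]), " ".join(pe_words[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
    return {
        "similarity": matcher.ratio(),
        "words_kept": kept,
        "mt_words": len(mt_words),
        "post_edited_words": len(pe_words),
        "edits": edits,
    }


if __name__ == "__main__":
    stats = postedit_stats("The cat are on the table", "The cat is on the table")
    print(f"{stats['words_kept']}/{stats['mt_words']} words kept, "
          f"similarity {stats['similarity']:.2f}")
    for before, after in stats["edits"]:
        print(f"  {before!r} -> {after!r}")
```

Run on the sample sentences, this reports that 5 of 6 words were kept and that "are" was changed to "is". The history diff on the wiki remains the primary record; the script only summarises it.<br />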
<br />
==Current collaborations==<br />
<br />
If you'd like to know more about contributing to Wikipedia with Apertium, you can ask the people below:<br />
<br />
* [[User:Francis Tyers]] is working with the [http://af.wikipedia.org Afrikaans Wikipedia]<br />
* [[User:Carmentano]] is working with the [http://oc.wikipedia.org Occitan Wikipedia]<br />
* [[User:Gnuphilly]] is working with the [http://fr.wikipedia.org French Wikipedia]<br />
* [[User:Trondtr]] and [[User:Unhammer]] are working with the [http://nn.wikipedia.org Nynorsk Wikipedia]<br />
<br />
== See also ==<br />
<br />
* [[MetaWikipedia:Machine translation]], where this document is reproduced<br />
<br />
[[Category:Evaluation]]<br />
[[Category:Documentation in English]]</div>Nemo bishttps://wiki.apertium.org/w/index.php?title=User:Nemo_bis&diff=47396User:Nemo bis2014-03-19T23:50:49Z<p>Nemo bis: Created page with "Nemo."</p>
<hr />
<div>[[MetaWikipedia:User:Nemo_bis|Nemo]].</div>Nemo bis