Search results

Page title matches

Task ideas for Google Code-in/Syntax tree printing in bison
...of a sentence using GNU Bison. The output could be text, or a <code>dot</code> file using GraphViz. * [https://svn.code.sf.net/p/apertium/svn/branches/transfer4 An example grammar]

375 bytes (59 words) - 16:11, 14 November 2013
Working with Apertium in the Google Code-in
Here are some top-tips for working with Apertium in the [[Google Code-in]]: ...GitHub] account. We use [[git]] and GitHub for collaboratively developing code.

518 bytes (88 words) - 08:11, 16 September 2018
Task ideas for Google Code-in/Language detection in simple-html and apertium-apy
The language detection in the [[simple-html]] interface currently uses a 2.9M javascript file. The ob ===Implement language detection in apertium-apy===

1 KB (205 words) - 20:49, 13 November 2013
Ideas for Google Summer of Code/Regular expressions in lt-tmxproc
...numbers, by inserting the special symbol <n> in place of the number in the transducer; at runtime, when this symbol is encountered, numbers are co ...project is to extend lt-tmxproc to include the regular expressions support in lttoolbox.

2 KB (332 words) - 19:55, 24 March 2020
Google Code-in/Application 2018
;Why does your organisation want to participate in Google Code-in 2018? ...that focuses a lot on marginalised languages. GCI gives us a chance to get in touch with the next generation of speakers, and to show them how they can h

3 KB (443 words) - 11:20, 11 September 2018
Task ideas for Google Code-in/Add words
...c.) detect the 50 most frequent unknown words (source words which are not in the dictionaries of the language pair). [[Category:Tasks for Google Code-in|Add words]]

2 KB (271 words) - 05:34, 17 December 2015
Ideas for Google Summer of Code/Use preferences in pair
...e variation is possible and useful, systematising it, and then enabling it in the language pair by turning hard restrictions into ambiguity and selectors * initial documentation of possible preferences in a pair of your choice (which doesn't already have preferences enabled)

2 KB (239 words) - 09:33, 4 March 2024
Task ideas for Google Code-in (2013)
...s page for Google Code-in 2013 (http://www.google-melange.com/gci/homepage/google/gci2013), here you can find ideas on interesting tasks that will improve yo '''For current GCI task ideas, see [[Task ideas for Google Code-in]]'''

68 KB (10,323 words) - 15:37, 25 October 2014
Ideas for Google Summer of Code/Improvements in lexical-selection module
...ptimisations to the lexical selection module. The lexical selection module in Apertium is currently a prototype. There are many optimisations that could * Do proper processing of tags in all scripts.

1 KB (186 words) - 18:06, 22 March 2013
Task ideas for Google Code-in/Mentor guidelines
...o are able to mentor Apertium-related tasks are eligible to be Google Code-In mentors for Apertium. This can include: ...ive community, but if you don't have any experience with Apertium tools or code, please don't bother asking to be a mentor after it's already been announce

2 KB (345 words) - 13:19, 14 November 2019
Task ideas for Google Code-in/Categorise words from frequency list
! Part-of-speech !! Code | Noun || <code>n</code>

3 KB (286 words) - 22:00, 8 December 2019
Ideas for Google Summer of Code/Cyclical paths in .dix format
At the moment it is not possible to define cyclical paths in [[lttoolbox]]'s XML-based transducer format. The idea of this project is to .../code> element]] in any analysis, meaning there can be no <code>#</code>'s in the actual cycles.

1 KB (158 words) - 07:23, 2 September 2014
Task ideas for Google Code-in/Morphologically disambiguating text
In this page we describe how to morphologically disambiguate (tag) text so tha Example of a tagger error in the English tagger for the sentence "Where do you come from?":

2 KB (228 words) - 12:16, 26 September 2016
Task ideas for Google Code-in/Add words to monolingual dictionary
...most frequent unknown words''' (words in the source document which are not in the dictionary). See below for information about how to do this. Note: th ...nolingual dictionary''' (the appropriate <code>.dix</code> or <code>.lexc</code> file) so that they are not unknown anymore. Make sure to categorise stems

2 KB (299 words) - 19:44, 30 December 2019
Task ideas for Google Code-in/Lemmatise words from frequency list
...requency. The lemma of a word is its "base form" (the form you might find in a dictionary) ...t. Work from top to bottom. After each asterisk '<code><nowiki>*</nowiki></code>' you should replace the surface form with the lemma.

2 KB (207 words) - 16:21, 14 November 2013
Ideas for Google Summer of Code/Flag diacritics in lttoolbox
Flag diacritics are a method used in the [[HFST]] tools to allow the writer of a transducer to exclude impossibl Some work on [[Flag diacritics]] has already been made in [[lttoolbox-java]].

1 KB (176 words) - 06:40, 20 October 2014
Task ideas for Google Code-in/Unigram tagger
[[Category:Tasks for Google Code-in|Unigram tagger]]

255 bytes (43 words) - 14:56, 26 October 2014
Ideas for Google Code-In (2011)
...as page for Google Code-in 2011 (http://www.google-melange.com/gci/homepage/google/gci2011); here you can find ideas on interesting tasks that will improve yo <b>For current GCI ideas, see [[Ideas for Google Code-in]]</b>

187 KB (21,006 words) - 22:14, 12 November 2012
Completing tasks for Google Code-in
This article will explain the basic process for completing a [[Google Code-in]] task. First, go to the [https://codein.withgoogle.com/tasks/ Google Code-in tasks page]. Next, open the "Organizations" drop-down and click the box nex

444 bytes (73 words) - 02:31, 18 December 2019
Task ideas for Google Code-in/Add words from frequency list
...tion task]]); and its part-of-speech (see the [[Task ideas for Google Code-in/Categorise words from frequency list|categorisation task]]). The next step ...rent depending on the dictionary format and the language in question. When in doubt, ask your mentor for help.

3 KB (519 words) - 19:05, 7 November 2016
Ideas for Google Summer of Code/Improve integration of lttoolbox in libvoikko
* Writing a method for <code>liblttoolbox</code> which would allow analysis of a string as opposed to a file stream. [[Category:Ideas for Google Summer of Code|Improve integration of lttoolbox in libvoikko]]

932 bytes (130 words) - 14:22, 29 February 2012
Task ideas for Google Code-in/Apy pipedebug
This task is almost done, see -r59428 and -r57945 in apy SVN. * _e in mode names turns into underlined e in the dropdown, should just be _e

4 KB (652 words) - 12:52, 26 March 2015
Google Code-in/Application 2013
(Second draft in by [[User:Francis Tyers|Francis Tyers]] 15:34, 28 October 2013 (UTC)) ...ation engine and auxiliary tools is being developed around the world, both in universities and companies (e.g. Prompsit Language Engineering) and by a gr

6 KB (1,057 words) - 15:34, 28 October 2013
Task ideas for Google Code-in/Getting started
...you can take to get involved with the Apertium project in the Google Code-in. First of all, thanks for reading! We're very enthusiastic about getting ne ...o hang out on IRC, even if no-one is talking when you enter. People can be in different time zones, and channel activity peaks depending on the time.

7 KB (1,091 words) - 19:54, 12 April 2021
Task ideas for Google Code-in/Intersection of ATT format transducers
The objective of these tasks is to write code to intersect two finite-state transducers. One transducer is a [[morphologi ...the set of strings in the morphological dictionary which have translations in the bilingual dictionary.

5 KB (798 words) - 14:01, 17 March 2020
Google Code-in
See our '''[[Task ideas for Google Code-in/Getting started|Getting started guide]]''' if you're a current GCI student! See our '''[[Task ideas for Google Code-in/Mentor guidelines|Mentor guidelines]]''' if you're an Apertium GCI mentor o

3 KB (412 words) - 22:18, 24 December 2019
Google Code-in/Application 2016
;Why does your organisation want to participate in Google Code-in 2016? ...that focuses a lot on marginalised languages. GCI gives us a chance to get in touch with the next generation of speakers, and to show them how they can h

3 KB (516 words) - 21:03, 29 October 2016
Task ideas for Google Code-in (2012)
...s page for Google Code-In 2012 (http://www.google-melange.com/gci/homepage/google/gci2012), here you can find ideas on interesting tasks that will improve yo '''For current GCI task ideas, see [[Task ideas for Google Code-in]]'''

14 KB (2,007 words) - 03:06, 27 October 2013
Task ideas for Google Code-in/Documentation of resources
* Sites with scholarly articles: Google Scholar, jstor, academia.edu, etc. [[Category:Tasks_for_Google_Code-in|Documentation of resources]]

1 KB (202 words) - 19:55, 12 April 2021
Task ideas for Google Code-in/Add nouns from frequency list
#REDIRECT [[Task ideas for Google Code-in/Add words from frequency list]]

73 bytes (11 words) - 20:20, 13 November 2013
Task ideas for Google Code-in/Manually disambiguate text
Words can have more than one possible interpretation, for example, "tie" in English can be a noun denoting an item of clothing "she put on her tie" or ...That is, for each ambiguous word you choose the appropriate interpretation in context.

3 KB (574 words) - 16:30, 11 January 2020
Google Code-in/Application 2015
...North Sámi–Norwegian Bokmål and Kazakh–Tatar among others), and many more in development. ;Why would your organisation like to participate in Google Code-in 2015?*

7 KB (1,111 words) - 10:10, 15 November 2015
Task ideas for Google Code-in/Setup constraint grammar for a pair
...errors (the translation is not adequate because the part-of-speech tagger in Apertium has selected the wrong morphological analysis for a word that had ...nd write 5 constraint grammar rules that select the desired part of speech in the relevant context(s);

1 KB (193 words) - 14:27, 29 October 2013
Task ideas for Google Code-in/Evaluation of translation of an existing pair
You must evaluate each sentence in three ways: # Fluency (0-5): How well-formed is the target sentence in the target language.

478 bytes (73 words) - 19:56, 12 April 2021
Task ideas for Google Code-in/Extracting paradigm sketches from dictionaries
The objective of this task is to take (or make) a dictionary in text format and extract the ''paradigm sketches'' from it. By this we mean ...them into word categories. The suffixes and categories should be described in the dictionary.

2 KB (151 words) - 23:20, 14 November 2013
Task ideas for Google Code-in/Grow bilingual
...m but no reasonable bilingual dictionary (these language pairs are usually in the incubator), for instance apertium-spa-pol ...most frequent unknown words''' (words in the source document which are not in the bilingual dictionaries of the language pair). See below for informatio

2 KB (320 words) - 15:01, 19 January 2020
Task ideas for Google Code-in/Comment XML
And a list in a file (filename given as the first argument to the script) like ...ement. Also, note how :yaa<n> does not comment out the line that has "yaa" in its <l> element.

3 KB (576 words) - 12:57, 2 January 2016
Task ideas for Google Code-in/Add constraint-grammar rules
...rors''' (the translation is not adequate because the part-of-speech tagger in Apertium has selected the wrong morphological analysis for a word that had ...rite 10 constraint grammar rules''' that select the desired part of speech in the relevant context(s);

1 KB (156 words) - 02:19, 21 October 2018
Task ideas for Google Code-in/Print transducer with n cycles
<spectre> URL: https://svn.code.sf.net/p/apertium/svn/trunk/lttoolbox/lttoolbox/lt_print.cc <spectre> URL: https://svn.code.sf.net/p/apertium/svn/trunk/lttoolbox/lttoolbox/transducer.cc

2 KB (247 words) - 19:56, 12 April 2021
Task ideas for Google Code-in/Hand-correct spelling errors
[[Category:Tasks for Google Code-in|Hand-correct spelling errors]]

95 bytes (10 words) - 19:30, 16 November 2013
Task ideas for Google Code-in/Fix using LanguageTool
...you know the target language, and the target language has good support in LanguageTool (Catalan is one that has support from both Apertium and Langua ...words, you might need to add a multiword so that they translate correctly in that context

1 KB (186 words) - 09:43, 4 November 2014
Task ideas for Google Code-in
...ideas page for [https://developers.google.com/open-source/gci/ Google Code-in], here you can find ideas on interesting tasks that will improve your knowl The people column lists people who you should get in contact with to request further information. All tasks are 2 hours maximum

32 KB (4,862 words) - 06:23, 5 December 2019
Task ideas for Google Code-in/Setup and add lexical selection
...d, and write 5 lexical selection rules that select the correct translation in the relevant context. [[Category:Tasks for Google Code-in|Setup and add lexical selection]]

1 KB (165 words) - 14:19, 29 October 2013
Task ideas for Google Code-in/Russian
http://google-melange.appspot.com/gci/homepage/google/gci2011 ==What is Google Code-In?==

56 KB (2,087 words) - 19:57, 12 April 2021
Task ideas for Google Code-in/Check output of word aligner
[[Category:Tasks for Google Code-in|Check output of word aligner]]

96 bytes (12 words) - 20:40, 17 November 2013
Google Code-in/Application 2010
...on engine and auxiliary tools is being developed around the world, largely in universities and companies (e.g. Prompsit Language Engineering), but also i ...Occitan, Breton—French, and Basque—Spanish among others), and several more in development.

3 KB (424 words) - 19:24, 29 October 2010
Google Code-in/Application 2017
; Why does your organisation want to participate in Google Code-in 2017? ...that focuses a lot on marginalised languages. GCI gives us a chance to get in touch with the next generation of speakers, and to show them how they can h

2 KB (421 words) - 15:37, 10 October 2017
Task ideas for Google Code-in/Add lexical-select rules
...orrect translation in the relevant context. You'll want to write 10 rules in all. [[Category:Tasks for Google Code-in|Add lexical-selection rules]]

1 KB (199 words) - 21:39, 15 December 2019
Task ideas for Google Code-in/Scrape inflection information from Wiktionary
The equivalent in [[speling format]] would be: Where <code>n.f</code> means "noun, feminine" (this information will also typically be on the Wik

2 KB (214 words) - 12:10, 26 May 2023
Ideas for Google Summer of Code/Apertium in chat clients
* Xchat: write a plugin that reads in text from the highlighted channel and, if it contains three asterisks (***) [[Category:Ideas for Google Summer of Code|Apertium in chat clients]]

457 bytes (65 words) - 21:45, 10 March 2014
Ideas for Google Code-in
#REDIRECT [[Task ideas for Google Code-in]]

43 bytes (6 words) - 16:35, 19 October 2010
Retrospective: Google Code-In 2017
Google Code-In 2017 was certainly an overall success for Apertium. Students completed upwa ...an be improved so that mentors and students have an even better experience in the future.

10 KB (1,668 words) - 02:46, 10 February 2018
Task ideas for Google Code-in/Add transfer rule
...ing (local agreement, gender, number, etc. is inadequate, local word order in a phrase is inadequate, there is a word too much or a word missing, etc.). ...text in L₂ through the pair and find a consistent error in the output text in L₁ that isn't grammatical.

1 KB (208 words) - 21:39, 15 December 2019
Task ideas for Google Code-in/Tokenisation for spaceless orthographies
...tokenise sentences in South and East Asian languages into words. Sentences in these languages are usually not written with spaces to show word boundaries ...l the possible ways of splitting up the sentence into words that are found in the dictionary:

3 KB (394 words) - 01:37, 17 June 2023
Google Code-in/Application 2014
...North Sámi--Norwegian Bokmål and Kazakh-Tatar among others), and many more in development. ;Why would your organisation like to participate in Google Code-in 2014?*

6 KB (987 words) - 10:21, 7 November 2014

Page text matches

Ideas for Google Code-In (2011)
...as page for Google Code-in 2011 (http://www.google-melange.com/gci/homepage/google/gci2011); here you can find ideas on interesting tasks that will improve yo <b>For current GCI ideas, see [[Ideas for Google Code-in]]</b>

187 KB (21,006 words) - 22:14, 12 November 2012
Task ideas for Google Code-in (2013)
...s page for Google Code-in 2013 (http://www.google-melange.com/gci/homepage/google/gci2013), here you can find ideas on interesting tasks that will improve yo '''For current GCI task ideas, see [[Task ideas for Google Code-in]]'''

68 KB (10,323 words) - 15:37, 25 October 2014
Task ideas for Google Code-in
...ideas page for [https://developers.google.com/open-source/gci/ Google Code-in], here you can find ideas on interesting tasks that will improve your knowl The people column lists people who you should get in contact with to request further information. All tasks are 2 hours maximum

32 KB (4,862 words) - 06:23, 5 December 2019
Task ideas for Google Code-in (2012)
...s page for Google Code-In 2012 (http://www.google-melange.com/gci/homepage/google/gci2012), here you can find ideas on interesting tasks that will improve yo '''For current GCI task ideas, see [[Task ideas for Google Code-in]]'''

14 KB (2,007 words) - 03:06, 27 October 2013
Google Summer of Code/Application 2019
...system. Finally, most of them are not available for most of the languages in the world, as they rely heavily on resources that are not available for the == Why does your org want to participate in Google Summer of Code? ==

8 KB (1,230 words) - 06:02, 5 February 2019
Google Summer of Code/Application 2020
...system. Finally, most of them are not available for most of the languages in the world, as they rely heavily on resources that are available for only a == Why does your org want to participate in Google Summer of Code? ==

8 KB (1,248 words) - 15:51, 17 February 2021
Google Code-in/Application 2015
...North Sámi–Norwegian Bokmål and Kazakh–Tatar among others), and many more in development. ;Why would your organisation like to participate in Google Code-in 2015?*

7 KB (1,111 words) - 10:10, 15 November 2015
Google Code-in/Application 2013
(Second draft in by [[User:Francis Tyers|Francis Tyers]] 15:34, 28 October 2013 (UTC)) ...ation engine and auxiliary tools is being developed around the world, both in universities and companies (e.g. Prompsit Language Engineering) and by a gr

6 KB (1,057 words) - 15:34, 28 October 2013
Google Code-in/Application 2014
...North Sámi--Norwegian Bokmål and Kazakh-Tatar among others), and many more in development. ;Why would your organisation like to participate in Google Code-in 2014?*

6 KB (987 words) - 10:21, 7 November 2014
Begiak's git plugin
plugin are in [https://github.com/goavki/phenny/blob/master/modules/git.py modules/git.py receiving updates about commits from sites like GitHub and Bitbucket. In

8 KB (1,370 words) - 21:21, 22 November 2018
Hectoralos/GSOC 2019 proposal: Catalan-Italian and Catalan-Portuguese
== Why is it you are interested in machine translation? == ...ciolinguist working on language maintenance and shift. I'm very interested in creating resources for minoritised languages.

16 KB (2,285 words) - 06:46, 12 April 2019
Google Summer of Code/Application 2016
..., we might still be interested if we can turn it into something achievable in 3 months. ...ourself familiar with testvoc and other quality controls, and factor those in. If you know of any breaks or absences beforehand, mention them and plan ar

10 KB (1,500 words) - 16:23, 18 February 2016
Google Summer of Code/Application 2015
;Google+ URL ...the dropdown above, please summarise your involvement in Google Summer of Code and the successes and challenges of your participation. Please also list yo

8 KB (1,240 words) - 12:03, 20 February 2015
Google Summer of Code/Application 2021
...system. Finally, most of them are not available for most of the languages in the world, as they rely heavily on resources that are available for only a === Why does your org want to participate in Google Summer of Code? ===

10 KB (1,480 words) - 07:00, 23 February 2021
Google Summer of Code/Application 2009
...e Summer of Code. The ideas page can be found [[Ideas for Google Summer of Code|here]]. ...g language data, translation engine and auxiliary tools is being developed in several universities and companies around the world, with the principal par

10 KB (1,543 words) - 19:50, 12 April 2021
Google Code-in/Application 2010
...on engine and auxiliary tools is being developed around the world, largely in universities and companies (e.g. Prompsit Language Engineering), but also i ...Occitan, Breton—French, and Basque—Spanish among others), and several more in development.

3 KB (424 words) - 19:24, 29 October 2010
Google Summer of Code/Application 2012
...on engine and auxiliary tools is being developed around the world, largely in universities and companies (e.g. Prompsit Language Engineering), but indepe ...ton—French, and Basque—Spanish among others), and several more in development.

11 KB (1,680 words) - 12:22, 20 June 2019
Press
...st-google-summer.html The Apertium Project's First Google Summer of Code], Google Open Source Blog ...urce A website that hopes to speak the language of freely available data], in The Guardian.

13 KB (1,689 words) - 21:42, 28 February 2021
Google Summer of Code/Application 2014
;Why is your organisation applying to participate in Google Summer of Code 2014? What do you hope to gain by participating?* * Apertium likes Google Summer of Code: it is a programme that supports open-source as much as we do!

7 KB (1,212 words) - 20:10, 4 February 2014
Google Summer of Code/Application 2013
...on engine and auxiliary tools is being developed around the world, largely in universities and companies (e.g. Prompsit Language Engineering), but indepe ...Occitan, Breton—French, and Basque—Spanish among others), and several more in development.

9 KB (1,376 words) - 15:24, 22 March 2013
Grfro3d/proposal apertium cat-srd and ita-srd
== Why is it you are interested in machine translation? == ...n" in which the MT is an intermediate step in the production of a document in the TL, which will be published. To facilitate this process, it is usual to

21 KB (3,171 words) - 14:34, 3 April 2017
Google Summer of Code/Report 2010
...nality mirroring that of Apertium's "lt-proc", but which loads transducers in the formats supported by HFST. The lookup tool tokenises the input text on ...ictionary entries, and some rules. Some rules have been worked on, but are in an incomplete state.

16 KB (2,571 words) - 12:21, 20 June 2019
Apertium for Dummies
...n computer code, or takes input and output in forms very close to computer code. ...uter coding. How much you need to know depends on your interests. Building in Apertium needs an interest and an enthusiasm for languages, and a will to e

17 KB (2,835 words) - 16:16, 24 January 2017
Task ideas for Google Code-in/Russian
http://google-melange.appspot.com/gci/homepage/google/gci2011 ==What is Google Code-In?==

56 KB (2,087 words) - 19:57, 12 April 2021
Google Code-in/Application 2018
;Why does your organisation want to participate in Google Code-in 2018? ...that focuses a lot on marginalised languages. GCI gives us a chance to get in touch with the next generation of speakers, and to show them how they can h

3 KB (443 words) - 11:20, 11 September 2018
Top tips for GSOC applications
Here are the main tips to help you in writing your GSOC application with Apertium. ...still interested, but we'll try to find a subset of it which is achievable in the time scale available.

9 KB (1,509 words) - 23:51, 27 February 2023
Google Code-in/Application 2016
;Why does your organisation want to participate in Google Code-in 2016? ...that focuses a lot on marginalised languages. GCI gives us a chance to get in touch with the next generation of speakers, and to show them how they can h

3 KB (516 words) - 21:03, 29 October 2016
Google Summer of Code/Application 2008
...iteria selection criteria], and [http://code.google.com/p/google-summer-of-code/wiki/AdviceforMentors advice for mentors] Fill out the application form [http://code.google.com/soc/2008/org_signup.html here].

8 KB (1,255 words) - 19:50, 12 April 2021
Task ideas for Google Code-in/Getting started
...you can take to get involved with the Apertium project in the Google Code-in. First of all, thanks for reading! We're very enthusiastic about getting ne ...o hang out on IRC, even if no-one is talking when you enter. People can be in different time zones, and channel activity peaks depending on the time.

7 KB (1,091 words) - 19:54, 12 April 2021
Google Summer of Code/Application 2011
...on engine and auxiliary tools is being developed around the world, largely in universities and companies (e.g. Prompsit Language Engineering), but also i ...ton—French, and Basque—Spanish among others), and several more in development.

13 KB (2,013 words) - 12:21, 20 June 2019
Google Summer of Code/Application 2018
; Why does your org want to participate in Google Summer of Code? ...ution of existing developers through mentoring and to improve the platform in many ways: improving the engine, generating new tools and user interfaces,

5 KB (833 words) - 15:49, 11 January 2018
Google Summer of Code/Application 2010
...on engine and auxiliary tools is being developed around the world, largely in universities and companies (e.g. Prompsit Language Engineering), but also i ...ton—French, and Basque—Spanish among others), and several more in development.

11 KB (1,802 words) - 19:51, 12 April 2021
Google Code-in
See our '''[[Task ideas for Google Code-in/Getting started|Getting started guide]]''' if you're a current GCI student! See our '''[[Task ideas for Google Code-in/Mentor guidelines|Mentor guidelines]]''' if you're an Apertium GCI mentor o

3 KB (412 words) - 22:18, 24 December 2019
Google Code-in/Application 2017
; Why does your organisation want to participate in Google Code-in 2017? ...that focuses a lot on marginalised languages. GCI gives us a chance to get in touch with the next generation of speakers, and to show them how they can h

2 KB (421 words) - 15:37, 10 October 2017
Google Summer of Code/Report 2009
apertium-nn-nb is now in a fairly usable state for translating both ending in -lig in nb typically end in -leg in nn) and checking whether

12 KB (1,886 words) - 12:20, 20 June 2019
Danish and Norwegian
2. svn co https://svn.code.sf.net/p/apertium/svn/trunk/apertium-dan-nor The language pair was developed as part of Google Summer of Code 2013 by [http://www.linkedin.com/pub/jonas-fromseier-mortensen/45/69/839 Jo

21 KB (3,367 words) - 15:17, 27 October 2013
Google Summer of Code
...tails about Apertium in the [https://code.google.com/soc/ Google Summer of Code] (GSOC). ...putational linguistics or any combination of the above, then [[contact|get in touch]].

6 KB (674 words) - 14:52, 19 January 2023
Ankush/Application
ankushgupta@students.iiit.ac.in<br /> == Interest in Machine Translation ==

6 KB (923 words) - 17:57, 3 April 2010
PMC proposals/Allow some code under github.com/apertium
** especially relevant for Google Code-in * committing without a net connection / on an airplane / in a boat

4 KB (547 words) - 08:06, 30 January 2015
Narimann/GSOC 2019 proposal: Kazakh-Turkish and Turkish-Kazakh
Turkish - intermediate(5 years of studying in Kazakh-Turkish school) == Why is it that you are interested in Apertium? ==

8 KB (1,094 words) - 13:10, 14 April 2019
Google Summer of Code/Application 2017
; Why does your org want to participate in Google Summer of Code? ...ution of existing developers through mentoring and to improve the platform in many ways: improving the engine, generating new tools and user interfaces,

5 KB (841 words) - 13:52, 23 January 2017
Google Summer of Code/Review process
How the review process works in Apertium: * The ranking period / review process closes some time before Google slot requests are due

1 KB (182 words) - 05:02, 7 April 2019
Google Summer of Code/Application 2023
=== Years previously participated in GSoC === === Link to source code ===

7 KB (1,010 words) - 23:21, 28 January 2023
Google Summer of Code/Application 2022
=== Years previously participated in GSoC === === Link to source code ===

7 KB (1,023 words) - 15:31, 21 February 2022
Swedish and Danish
The development is sponsored by Google Summer of Code (GSOC) and carried out http://socghop.appspot.com/org/home/google/gsoc2009/apertium.

9 KB (1,406 words) - 20:34, 29 October 2010
Chebrolutejasvi/GSOC 2020 proposal: Hindi-Telugu
E-Mail: tejasvi.chebrolu@research.iiit.ac.in == '''Why is it that I am interested in Apertium?'''==

9 KB (1,391 words) - 16:41, 31 March 2020
Task ideas for Google Code-in/Add words from frequency list
...tion task]]); and its part-of-speech (see the [[Task ideas for Google Code-in/Categorise words from frequency list|categorisation task]]). The next step ...rent depending on the dictionary format and the language in question. When in doubt, ask your mentor for help.

3 KB (519 words) - 19:05, 7 November 2016
GCI-2011 куоҥкуруска кыттыы туһунан
...on the Google Code-in website: [http://www.google-melange.com/gci/age_check/google/gci2011 Register as student]. ...can be viewed on the ...-in website: [http://www.google-melange.com/gci/tasks/google/gci2011 Search for tasks].

7 KB (549 words) - 06:16, 12 January 2012
Task ideas for Google Code-in/Mentor guidelines
...o are able to mentor Apertium-related tasks are eligible to be Google Code-In mentors for Apertium. This can include: ...ive community, but if you don't have any experience with Apertium tools or code, please don't bother asking to be a mentor after it's already been announce

2 KB (345 words) - 13:19, 14 November 2019
Installation
* http://aplica.prompsit.com/ – Prompsit is a company heavily involved in development of the Apertium platform, and also offers a simple web interfac ...versity of Tromsø works on Saami language pairs; this site runs the latest in-development version of Northern Saami→Norwegian Bokmål

6 KB (848 words) - 12:51, 1 April 2024
Sardu abbarra bivu!
'''Why is it you are interested in Machine Translation?''' ...in fact, it is not necessary to include corpora with millions of words as in the statistical methods: it takes only two smaller corpora and a dictionary

15 KB (2,339 words) - 00:41, 4 June 2018
English and Catalan/GSOC 2017
...y of all the work done in the English-Catalan pair during Google Summer of Code 2017. For a more detailed workplan of the project, please check [[English_a ...thanks to supporting work done on automation, which has been very helpful. In addition to new entries (added from frequency lists and crossdics), entries

5 KB (887 words) - 22:24, 31 August 2017
Task ideas for Google Code-in/Tokenisation for spaceless orthographies
...tokenise sentences in South and East Asian languages into words. Sentences in these languages are usually not written with spaces to show word boundaries ...l the possible ways of splitting up the sentence into words that are found in the dictionary:

3 KB (394 words) - 01:37, 17 June 2023
Sardinian and Italian/Final Report
...of the project, following the timing and deadlines of the Google Summer of Code program. ...arcelona and Prompsit, funded by Google via the program ''Google Summer of Code''.

7 KB (1,110 words) - 11:34, 23 August 2016
Sardo e italiano/Rapporto finale
...ject, following the timing and deadlines of the Google Summer of Code program. ...unding from Google through the ''Google Summer of Code'' program.

13 KB (1,910 words) - 11:34, 23 August 2016
Weighted transfer rules at GSoC 2016
...onducted by [[User:Nikita Medyankin|Nikita Medyankin]] at Google Summer of Code 2016. ...ral input pattern, as opposed to the present situation when the first rule in xml transfer file takes exclusive precedence and blocks out all its ambiguo

9 KB (1,387 words) - 13:37, 23 August 2016
Indirect contribution guide
* How to help "Apertium" in other ways. When in doubt, ask!

9 KB (1,494 words) - 05:58, 18 March 2015
Apertium going SOA
...tributed infrastructure (for example, to collaborate more easily with engineers in an offshore country), and so on).]] ...d through a Web Service interface) to implement real-time translation (both in input and output) of instant messages.]]

24 KB (3,572 words) - 07:37, 8 March 2018
Ideas for Google Summer of Code/Morphology with HFST
...line, whereas lt-proc finds word boundaries based on the <code><alphabet></code> section of the dictionary (non-alphabet characters always separate words). ...s/src/hfst-lookup.cc however, uses line_to_keyvector (calling hfst_getline in hfst-commandline.cc), going line by line with getline. It seems like a good

5 KB (680 words) - 16:10, 13 May 2010
Apertium cat-srd and ita-srd/GSoC 2017
== Google Summer of Code 2017 Gianfranco Fronteddu Final report== You can see my work including the code and a full list of commits here: https://apertium.projectjj.com/gsoc2017/gf

9 KB (1,306 words) - 15:56, 2 September 2017
Курсы машинного перевода для языков России/Session 8
...od of around 12 years. The original <code>interNOSTRUM</code> was released in early 2000 and took around 72 person-months (four people, 18 months) to dev ...stry of Science, Industry and Commerce of the Spanish State to rewrite the code as open-source, and to convert the linguistic data. After one person year,

12 KB (1,679 words) - 12:00, 31 January 2012
Helsinki Apertium Workshop/Session 8
...od of around 12 years. The original <code>interNOSTRUM</code> was released in early 2000 and took around 72 person-months (four people, 18 months) to dev ...stry of Science, Industry and Commerce of the Spanish State to rewrite the code as open-source, and to convert the linguistic data. After one person year,

12 KB (1,683 words) - 08:42, 10 May 2013
Tartu Apertium Course/Session 8
...od of around 12 years. The original <code>interNOSTRUM</code> was released in early 2000 and took around 72 person-months (four people, 18 months) to dev ...stry of Science, Industry and Commerce of the Spanish State to rewrite the code as open-source, and to convert the linguistic data. After one person year,

12 KB (1,683 words) - 11:00, 30 October 2015
Plugins
...e ([http://apertium.svn.sourceforge.net/viewvc/apertium/trunk/oooapertium/ code]) ...s the API of Google Translate: implemented as part of the Google Summer of Code 2009 projects, see [[Apertium services]]

3 KB (451 words) - 19:20, 18 August 2014
Begiak
'''begiak''' is the IRC bot in the #apertium [[IRC]] channel. It serves several purposes, including to sho ...|sushain]], Qasim, and a number of other GCI students since then. The core code base is the [https://github.com/mutantmonkey/phenny mutantmonkey port] to P

8 KB (1,234 words) - 17:01, 3 December 2020
Ideas for Google Summer of Code/Plain-text formats for Apertium data
...from current XML code. Morphtrans can of course be redesigned a bit, and, in fact, it should. ...onverts a <code>.mode</code> shell-script fragment into a <code>modes.xml</code> file.

2 KB (324 words) - 11:37, 16 February 2016
Google Season of Docs 2022/Organize and Update Apertium User Documentation
| Fill in gaps in formal docs ...to go from a word-order or agreement difference to a working transfer rule in either formalism

6 KB (849 words) - 17:59, 25 March 2022
Bilingual dictionary
...s translation between two languages. It is one of the five main data files in any language pair (see also: [[Apertium New Language Pair HOWTO]]). ...ub]] (https://github.com/apertium). The bilingual dictionary file names are of the form ''apertium-A-B.A-B.dix'' where ''apertium-A-B'' is the name of the

7 KB (1,244 words) - 16:41, 17 March 2018
Ideas for Google Summer of Code
This is the ideas page for [[Google Summer of Code]], here you can find ideas on interesting projects that would make Apertium ...a, add your name to "Interested mentors" using <code><nowiki>~~~</nowiki></code>.

23 KB (3,198 words) - 09:15, 4 March 2024
Kira's project schedule
* Capability in APY to bypass captcha code (for testing) * Finished suggestions feature in APY and html-tools

3 KB (369 words) - 17:06, 1 July 2016
Romanian and Catalan/GSOC 2018
...of all the work done in the Romanian-Catalan pair during Google Summer of Code 2018. It also includes information on the upgrade of four language pairs wh ...monolingual package system and develop it to bring it to release quality. In addition, four other language pairs have been upgraded to the monolingual p

7 KB (1,071 words) - 10:48, 14 August 2018
Frequently Asked Questions
...s or phrases you find that are incorrectly translated, to getting involved in creating a new language pair or programming on tools or user interfaces. He ...en.wikipedia.org/wiki/C++ C++]. The various development helpers are mostly in [https://python.org/ Python].

7 KB (1,139 words) - 06:27, 27 May 2021
Tatar and Russian
...r translating from [[Tatar]] to [[Russian]]. The pair is currently located in [https://github.com/apertium/apertium-tat-rus GitHub]. ! Stems in the bilingual dictionary

8 KB (1,006 words) - 12:48, 9 March 2018
Contributing
...s or phrases you find that are incorrectly translated, to getting involved in creating a new language pair or programming on tools or user interfaces. ...e on the [[IRC|IRC channel]] <code>#apertium</code> on <code>irc.oftc.net</code>.

3 KB (549 words) - 09:17, 26 May 2021
Turkic-Turkic translator
...h]] — [[Tatar]] || [[trunk]] || <code>kaz-tat</code>, <code>tat-kaz</code> || — || ~{{:Apertium-kaz-tat/stats/kaz-average}}%{{slc|kaz}}, ~{{:Aperti ...]] || [[Turkish]] — [[Crimean Tatar]] || [[trunk]] || <code>crh-tur</code> || - || - || {{#lst:Apertium-crh-tur/stats|crh-tur_stems}} || - || -

6 KB (591 words) - 22:50, 30 October 2017
GSOC'16 Kira's results. Apertium website improvements: Docs diff
* <code>translation lookup</code> turns on dictionary lookup mode. <code><pre>

5 KB (712 words) - 21:27, 16 August 2016
Tokenisation for spaceless orthographies
.../Tokenisation for spaceless orthographies]] being worked on by [[User:Eiji]] in 2023 https://docs.google.com/document/d/1aTTGoLLCpr2gncq2FJIWG0InUH3tJ6epHxioKEDhNPs/edit?usp=sharin

605 bytes (84 words) - 03:31, 14 July 2023
Completing tasks for Google Code-in
This article will explain the basic process for completing a [[Google Code-in]] task. First, go to the [https://codein.withgoogle.com/tasks/ Google Code-in tasks page]. Next, open the "Organizations" drop-down and click the box nex

444 bytes (73 words) - 02:31, 18 December 2019
Lttoolbox-java (français)
[[Lttoolbox-java|In English]] * During the 2009 [[Google Summer of Code]], [[User:Rah|Raphaël]] and [[User:Sortiz|Sergio]] worked

10 KB (1,597 words) - 13:07, 7 October 2014
Ideas for Google Summer of Code/Automatic blank handling
'''In progress''' A '''superblank''' is something that we don't want to translate, but keep in the output, often things like formatting tags.

8 KB (1,364 words) - 12:15, 14 May 2017
Ideas for Google Summer of Code/lint for Apertium
...make unrecommended changes. A lint tester would help people write standard code for dictionaries and transfer files. * Write a program which parses a <code>.dix</code> file and for each (surface form, lexical form) pair, lists entries/paradig

5 KB (789 words) - 10:36, 31 May 2016
Pairviewer
[[File:pairviewer.png|right|thumb|350px|The Pairviewer in action.]] ...ally developed sometime before the [[GCI|Google Code-In]] 2013. Its source code can be [https://github.com/apertium/pairviewer found on GitHub] and an onli

5 KB (702 words) - 01:34, 9 December 2018
Ideas for Google Summer of Code/UD Annotatrix
...ting Universal Dependencies. The objective of this project is to extend it in useful ways: * Server: The code that runs the server (Python)

3 KB (302 words) - 19:03, 17 July 2018
Ideas for Google Summer of Code/Apertium website improvements
===Work in progress=== # Send us a pull request with your code.

2 KB (305 words) - 01:43, 8 March 2018
Ideas for Google Summer of Code/Robust recursive transfer
# Write a number of transfer rules in this formalism for translating between a language pair. # Reimplement an existing language pair in trunk using your new formalism. This will involve rewriting the existing ru

2 KB (307 words) - 19:16, 28 February 2019
Bengali and English
...English Pair is also a candidate for adoption as part of Google Summer of Code 2013 ideas for Apertium. Some ideas that you can base your proposal on: * Adding more words in the bdix and monodix

1 KB (137 words) - 06:41, 27 April 2013
Ideas for Google Summer of Code/Optimise the VM for transfer
...L tree-walking implementation. The job of this task is to optimise the C++ code to make it faster than XML tree-walking. ...nale behind this is that XML tree-walking is quite slow and CPU intensive. In modern (3 or more stage) pairs, transfer takes up most of the CPU. There ar

1 KB (181 words) - 13:41, 21 March 2013
Task ideas for Google Code-in/Categorise words from frequency list
! Part-of-speech !! Code | Noun || <code>n</code>

3 KB (286 words) - 22:00, 8 December 2019
Dutch
Daniel Huang, Google Code In 2012 * C.B. van Haeringen, Netherlandic language research. Men and works in the study of Dutch, 2nd edition, Leiden: Brill 1960

11 KB (1,584 words) - 15:59, 15 December 2012
Maltese and Arabic/Work plan
...nt efforts for the [[Maltese and Arabic]] translator in [[Google Summer of Code]] 2012. ...gual dictionary of the pair, that is, only containing stems which are also in the bilingual dictionary (but omissions leading to generation errors do not

4 KB (404 words) - 15:28, 8 March 2013
Task ideas for Google Code-in/Intersection of ATT format transducers
The objective of these tasks is to write code to intersect two finite-state transducers. One transducer is a [[morphologi ...the set of strings in the morphological dictionary which have translations in the bilingual dictionary.

5 KB (798 words) - 14:01, 17 March 2020
Retrospective: Google Code-In 2017
Google Code-In 2017 was certainly an overall success for Apertium. Students completed upwa ...an be improved so that mentors and students have an even better experience in the future.

10 KB (1,668 words) - 02:46, 10 February 2018
Task ideas for Google Code-in/Apy pipedebug
This task is almost done, see -r59428 and -r57945 in apy SVN. * _e in mode names turns into underlined e in the dropdown, should just be _e

4 KB (652 words) - 12:52, 26 March 2015
Shallow syntactic function labeller
...[http://wiki.apertium.org/wiki/User:Deltamachine/proposal Google Summer of Code 2017 project] ...was built. It works with fastText embeddings for every tag which was seen in the corpus: an embedding for a word is just a sum of all word's tags embedd

5 KB (764 words) - 01:40, 8 March 2018
Task ideas for Google Code-in/Manually disambiguate text
Words can have more than one possible interpretation, for example, "tie" in English can be a noun denoting an item of clothing "she put on her tie" or ...That is, for each ambiguous word you choose the appropriate interpretation in context.

3 KB (574 words) - 16:30, 11 January 2020
Kurmanji and English/Final report
This is the report for my 2016 Google Summer of Code project, Kurmanji-English Machine Translation. ...release quality. I have worked on adding vocabulary, disambiguation rules in CG, transfer rules and lexical selection.

2 KB (335 words) - 10:29, 23 August 2016
Wikipedia Extractor
...odified by a number of people, including by BenStobaugh during Google Code-In 2013, and can be cloned from GitHub at [https://github.com/apertium/WikiExt ...outputs the text to one file. To use it, simply use the following command in your terminal, where dump.xml is the Wikipedia dump

2 KB (360 words) - 18:55, 30 January 2023
Ideas for Google Summer of Code/Weighted transfer rules
* Implement in C++ and integrate into Apertium. * Set up a pair and train the existing weighted transfer rule code.

5 KB (804 words) - 11:54, 9 March 2017
Ideas for Google Summer of Code/Accent and diacritic restoration
...in some places, for example instant messaging, IRC and web searches, these are often not used or left untyped. This causes problems as f * Train models for all languages in Apertium.

1 KB (161 words) - 13:39, 21 March 2013
Evaluation
'''Evaluation''' can give you some idea as to how well a language pair works in practice. There are many ways to evaluate, and the test chosen should depen ....aclweb.org/anthology/W/W15/W15-30.pdf#page=412 Character N-gram F-score] (code at https://github.com/Waino/chrF)

6 KB (981 words) - 09:13, 21 November 2021
Omorfi
....sh does not work, do report a bug (autoreconf -i should work just as well in the meantime). To prepare source code for new apertium language pair, use src/scripts/omor2apertium.sh... or just

1 KB (189 words) - 14:53, 2 June 2016
Ideas for Google Summer of Code/More robust recursive transfer
...y don't get too deep or ambiguous, and that they cover full sentences. See in particular issues [https://github.com/apertium/apertium-recursive/issues/97 ...lement solutions to the parse performance issues in the apertium-recursive code-base

1 KB (211 words) - 09:22, 6 February 2024
Anaphora resolution module
...ion for the Anaphora Resolution module created during the Google Summer of Code 2019. ([http://wiki.apertium.org/wiki/User:Khannatanmai Proposal]) Anaphora Resolution is the process of resolving references to earlier items in discourse.

20 KB (3,107 words) - 21:13, 24 June 2022
Ideas for Google Summer of Code/Desktop GUI
...its. APY is not used on Windows/Mac - but you can just reuse the Simpleton code for those two. ** possibly other formats that aren't in the standard [[Format handling]] list?

2 KB (413 words) - 19:21, 17 March 2016
Ideas for Google Summer of Code/Improvements in lexical-selection module
...ptimisations to the lexical selection module. The lexical selection module in Apertium is currently a prototype. There are many optimisations that could * Do proper processing of tags in all scripts.

1 KB (186 words) - 18:06, 22 March 2013
Ideas for Google Summer of Code/Anaphora resolution
* Write a program to find antecedents for anaphora in a stream * Update the transfer code to accept the new format, but be backwards compatible too

485 bytes (76 words) - 12:49, 29 January 2018
Tagging guidelines for English
...er hundreds, of thousands of words) to 'train' the automatic taggers found in some Apertium language pairs. Getting the right tag for a word is important This is why we have many hand-tagging tasks in the Google Code-In.

13 KB (2,076 words) - 12:13, 26 September 2016
Norwegian Nynorsk and Norwegian Bokmål
* Google Summer of Code (2009, first release, v0.6.0) ...a (<code><e></code>) can be marked with a restriction, <code><e r="LR"></code>, which shows that the entry will be analysed but not generated (this can

23 KB (3,704 words) - 11:56, 16 December 2020
Ideas for Google Summer of Code/Extend lttoolbox to have the power of HFST
...phologies for languages that have features such as the vowel harmony found in [[Turkic languages]] is very hard with the current format supported by ltto Following features in HFST are absent from apertium's lt-toolbox or unsatisfactorily implemented:

4 KB (627 words) - 11:45, 27 February 2019
PMC proposals/Debian package mantainer
Recruiting someone to make sure that Apertium packages in Debian are reasonably up-to-date. If we continue in GSOC in the same way, this would represent 1/6 of the

5 KB (775 words) - 22:26, 3 August 2013
Crimean Tatar and Turkish/GSoC Report
The following is the submission report for the Google Summer of Code 2017 project, RBMT for Crimean Tatar and Turkish. .../gsoc2017/memduhg.html] The pair is in the trunk folder in SVN[https://svn.code.sf.net/p/apertium/svn/trunk/apertium-crh-tur/], and can be checked out with

4 KB (551 words) - 23:52, 28 August 2017
Google Summer of Code/Midterm report 2011
Number of words in reference: 356 Number of words in test: 364

4 KB (404 words) - 00:19, 3 July 2012
Ideas for Google Summer of Code/Template-based bilingual dictionary
** The order of the constituents of gustar can be in various orders ** The verb gustar needs to agree with its subject, which in turn is the object of like

3 KB (456 words) - 18:57, 29 January 2014
Ideas for Google Summer of Code/Adopt a language pair
...ta, including morphological rules and transfer rules — which are specified in a declarative language. A good intro would be to look through [[Apertium Ne ...he URL to any work you do for the coding challenge work should be included in your application.

6 KB (1,024 words) - 15:22, 20 April 2021
Lttoolbox-java
* Read .mode files and execute the steps included in them ...embedding Apertium in a desktop application. Currently Apertium is usable in a local subdir but installation isn't trivial for an end user.

9 KB (1,370 words) - 09:49, 7 April 2020
Kazakh and Tatar/Work plan
...elopment efforts for [[Kazakh and Tatar]] translator in [[Google Summer of Code]] 2012. ...gual dictionary of the pair, that is, only containing stems which are also in the bilingual dictionary.

6 KB (728 words) - 19:47, 8 May 2014
Ideas for Google Summer of Code/Discontiguous multiwords
...support. For example 'liggja ekki fyrir' in Icelandic should be translated in English as 'to be not clear', but we cannot have 'liggja fyrir' as a tradit Another example: in Norwegian, "bryta seg inn" means "break in", while "bryta saman" means "collapse". Both these can have an NP between t

4 KB (632 words) - 12:33, 4 March 2016
PMC proposals/Apertium membership in EAMT
=2010/03/20: Apertium membership in EAMT= ...m would have a profile in that web, which is mainly visited by researchers in EAMT, but not only. Currently the EAMT has about 8 corporate members, but o

2 KB (358 words) - 22:25, 3 August 2013
Missing chemical elements
...). The second goal is to ease the work of integrating the missing entries in Apertium morphological and bilingual dictionaries. # en-ca: http://www.google-melange.com/gci/task/view/google/gci2012/7968236

76 KB (7,700 words) - 17:38, 2 December 2012
PMC proposals/Stable version of apertium-sh-sl
The language pair seems to work OK in the sh→sl sense but not so well in sl→sh (apparently it has not been ''testvocked''). ...is no available workforce. Also, we cannot schedule it as a GSoC task, or, in any case, it should be scheduled as a very early part of a GSoC task that c

2 KB (380 words) - 22:26, 3 August 2013
Ideas for Google Summer of Code/Corpus-based lexicalised feature transfer
Make a module that sits somewhere in the Apertium pipeline (somewhere after the lexical selection and before mor ...like definiteness, aspect, evidentiality, impersonal/reflexive pronoun use in Romance languages etc.

2 KB (262 words) - 11:19, 9 February 2015
Google Summer of Code/Report 2013
===Improvements in lexical-selection module=== ===A Sliding-Window Drop-in Replacement for the HMM Part-of-Speech Tagger in Apertium===

2 KB (200 words) - 08:21, 13 January 2015
Tagging guidelines for Spanish
...er hundreds, of thousands of words) to 'train' the automatic taggers found in some Apertium language pairs. Getting the right tag for a word is important * [verb] Los domingos ''canto'' en el coro → On Sundays I ''sing'' in the choir.

4 KB (763 words) - 12:14, 26 September 2016
Ideas for Google Summer of Code/Improved bilingual dictionary induction
Let's suppose we find in the alignments: ...p__n</code> and forms in <code>mois__n</code> to forms in <code>vrem/e__n</code> . For example:

4 KB (686 words) - 11:05, 27 September 2016
Ideas for Google Summer of Code/Interface for creating tagged corpora
...released pairs and the ones to come: better part-of-speech (POS) taggers. In my experience, training supervised taggers has never been a waste of time b ...nsupervised tagger training|unsupervised manner]] for one of the languages in your pair.

2 KB (269 words) - 21:26, 5 April 2013
Flag diacritics
...ed at run-time instead of at compile-time. This lets you have fewer states in your FST (at the cost of some run-time overhead). ...t, but has been reimplemented in [[HFST]] (as well as an experimental mode in [[lttoolbox-java/Flag diacritics]]).

768 bytes (114 words) - 06:49, 20 October 2014
Task ideas for Google Code-in/Grow bilingual
...m but no reasonable bilingual dictionary (these language pairs are usually in the incubator), for instance apertium-spa-pol ...most frequent unknown words''' (words in the source document which are not in the bilingual dictionaries of the language pair). See below for informatio

2 KB (320 words) - 15:01, 19 January 2020
Task ideas for Google Code-in/Add transfer rule
...ing (local agreement, gender, number, etc. is inadequate, local word order in a phrase is inadequate, there is one word too many or a word missing, etc.). ...text in L₂ through the pair and find a consistent error in the output text in L₁ that isn't grammatical.

1 KB (208 words) - 21:39, 15 December 2019
Raveesh/Application
Why is it you are interested in machine translation? Why is it that they are interested in the Apertium project?

1 KB (192 words) - 00:56, 16 March 2014
Ideas for Google Summer of Code/User-friendly lexical selection training
...ual dictionaries allow for ambiguous translations, selecting the right one in a context is handled by our [[Lexical selection]] module '''apertium-lex-to ...e should also be regression tests on the driver script, to ensure it works in the face of updates to third-party tools.

4 KB (541 words) - 13:46, 29 March 2021
Ideas for Google Summer of Code/UD and Apertium integration
...-ud` should produce conllu file with LEMMA, POS, FEATs, MISC fields filled in * set up APERTIUM EMBEDDINGS in UDPipe

1 KB (203 words) - 12:56, 29 January 2018
Tagging guidelines for Catalan
...er hundreds, of thousands of words) to 'train' the automatic taggers found in some Apertium language pairs. Getting the right tag for a word is important This is why we have many hand-tagging tasks in the Google Code-In.

2 KB (294 words) - 09:47, 12 November 2017
Ideas for Google Summer of Code/Make a language pair state-of-the-art
...e an Apertium language pair state-of-the-art, or close to state-of-the-art in terms of translation quality. This will involve improving coverage to 95-98 * Find a language pair of your choice in Apertium and install it. (see [[Install language data by compiling]])

2 KB (383 words) - 19:46, 2 March 2023
Ideas for Google Summer of Code/Flag diacritics in lttoolbox
Flag diacritics are a method used in the [[HFST]] tools to allow the writer of a transducer to exclude impossibl Some work on [[Flag diacritics]] has already been done in [[lttoolbox-java]].

1 KB (176 words) - 06:40, 20 October 2014
Ideas for Google Summer of Code/Sliding-window part-of-speech tagger
...pport for unknown words, and also for "forbid" descriptions (not described in the paper). The tagger has a very intuitive interpretation (believe me, eve * Implement the tagger described in the paper.

2 KB (251 words) - 00:37, 6 April 2013
Qashqai
Daniel Huang, Google Code In 2012 <br /> * [http://www.ethnologue.com/show_language.asp?code=qxq Ethnologue report]

2 KB (205 words) - 22:55, 2 December 2012
Ideas for Google Summer of Code/Add weights to lttoolbox
Weights in this page are intuitively defined as larger is worse and two weights can be == Syntaxes for weights in dixes and t?xes and all ==

5 KB (816 words) - 02:32, 13 February 2018
Apertium on SliTaz
Guide prepared by Google Code-In 2014 student Nikita Tsarev. <code>su</code>

2 KB (281 words) - 02:58, 9 March 2018
Task ideas for Google Code-in/Comment XML
And a list in a file (filename given as the first argument to the script) like ...ement. Also, note how :yaa<n> does not comment out the line that has "yaa" in its <l> element.

3 KB (576 words) - 12:57, 2 January 2016
Azerbaijani
Daniel Huang, Google Code In 2012 <br /> ...w.azeri.org/Azeri/az_learn/az_socio/socio_index.html Learning Azerbaijani in Social Context]

5 KB (601 words) - 11:58, 25 October 2019
Ideas for Google Summer of Code/Apertium assimilation evaluation toolkit
...ing) purposes. The evaluation described would measure how helpful they are in the task. Starting from files containing sentences in the source language and reference translations, generate tests for human ev

1 KB (207 words) - 09:03, 23 April 2015
Ideas for Google Summer of Code/Cyclical paths in .dix format
At the moment it is not possible to define cyclical paths in [[lttoolbox]]'s XML-based transducer format. The idea of this project is to .../code> element]] in any analysis, meaning there can be no <code>#</code>'s in the actual cycles.

1 KB (158 words) - 07:23, 2 September 2014
Ideas for Google Summer of Code/Spell checking
...roviding suggestions. Our [[lttoolbox]] based transducers should be usable in the same way. Additionally, we have the beginnings of a spell checking inte * create clean Makefile rules for speller compilation that are usable in our monolingual modules

2 KB (282 words) - 13:47, 17 March 2016
Prerequisites for Slackware
This tutorial was prepared by Jatin Luthra during Google Code-In. If you don't already have slapt-get installed, do the following to install it:

1 KB (200 words) - 02:56, 9 March 2018
Task ideas for Google Code-in/Add words to monolingual dictionary
...most frequent unknown words''' (words in the source document which are not in the dictionary). See below for information about how to do this. Note: th ...nolingual dictionary''' (the appropriate <code>.dix</code> or <code>.lexc</code> file) so that they are not unknown anymore. Make sure to categorise stems

2 KB (299 words) - 19:44, 30 December 2019
Ideas for Google Summer of Code/Complex multiwords
...ord units, for example ''dirección general'' and ''zračna luka''. Although in the Romance languages it is not a big problem, as soon as you start to get Default translation of "ambulantni" would be "ambulatory", but in this case we want to translate "ambulantnih pacijenata" as "outpatients"

2 KB (256 words) - 23:56, 5 April 2013
Ideas for Google Summer of Code/Geriaoueg vocabulary assistant
...panish--Breton. This task would be to develop it to work with any language in our SVN and fix problems with processing and displaying non-standard HTML. * Make it optionally read in dictionaries in [[lttoolbox]] format.

2 KB (248 words) - 19:31, 24 February 2014
Task ideas for Google Code-in/Setup constraint grammar for a pair
...errors (the translation is not adequate because the part-of-speech tagger in Apertium has selected the wrong morphological analysis for a word that had ...nd write 5 constraint grammar rules that select the desired part of speech in the relevant context(s);

1 KB (193 words) - 14:27, 29 October 2013
Task ideas for Google Code-in/Add constraint-grammar rules
...rors''' (the translation is not adequate because the part-of-speech tagger in Apertium has selected the wrong morphological analysis for a word that had ...rite 10 constraint grammar rules''' that select the desired part of speech in the relevant context(s);

1 KB (156 words) - 02:19, 21 October 2018
Task ideas for Google Code-in/Add lexical-select rules
...orrect translation in the relevant context. You'll want to write 10 rules in all. [[Category:Tasks for Google Code-in|Add lexical-selection rules]]

1 KB (199 words) - 21:39, 15 December 2019
Task ideas for Google Code-in/Extracting paradigm sketches from dictionaries
The objective of this task is to take (or make) a dictionary in text format and extract the ''paradigm sketches'' from it. By this we mean ...them into word categories. The suffixes and categories should be described in the dictionary.

2 KB (151 words) - 23:20, 14 November 2013
Ideas for Google Summer of Code/Integration and debugging tools for Grammatical Framework
...t). e.g. 1-best would have only the morphological analyses which are found in the 1-best parse tree. ...the morphological analysis of a word from the PGF library and print it out in Apertium format. (Build the library from source; the one on hackage has a f

2 KB (243 words) - 11:24, 13 March 2015
Ideas for Google Summer of Code/Apertium separable
** Put in template .lsx files * Add about 6 entries in each direction

343 bytes (51 words) - 18:40, 29 January 2018
Ideas for Google Summer of Code/Python library
...ub.com/apertium/apertium-python apertium] via a <code>setup.py</code> file in a Windows environment. *# Make <tt>show()</tt> return a list of tuples (in, out, in, out)

1 KB (191 words) - 21:12, 19 March 2019
Task ideas for Google Code-in/Fix using LanguageTool
...you know the target language, and the target language has good support in LanguageTool (Catalan is one that has support from both Apertium and Langua ...words, you might need to add a multiword so that they translate correctly in that context

1 KB (186 words) - 09:43, 4 November 2014
Task ideas for Google Code-in/Scrape inflection information from Wiktionary
The equivalent in [[speling format]] would be: Where <code>n.f</code> means "noun, feminine" (this information will also typically be on the Wik

2 KB (214 words) - 12:10, 26 May 2023
Roadmap
This page gives a roadmap of features which we hope will be available in future Apertium releases. ** Parser (e.g. [[Ideas for Google Summer of Code/Robust recursive transfer]])

1 KB (160 words) - 10:44, 23 September 2016
Ideas for Google Summer of Code/Visual interface to write structural transfer rules
Apertium structural transfer rules are currently encoded in XML-based formats. These are very overt and clear, but clumsy and may be ha ...raphical user interface to write structural transfer rules (one that reads in (a subset of) the current XML-based language, allows for a graphical, intui

1 KB (162 words) - 01:11, 18 August 2015
Task ideas for Google Code-in/Setup and add lexical selection
...d, and write 5 lexical selection rules that select the correct translation in the relevant context. [[Category:Tasks for Google Code-in|Setup and add lexical selection]]

1 KB (165 words) - 14:19, 29 October 2013
Task ideas for Google Code-in/Lemmatise words from frequency list
...requency. The lemma of a word is its "base form" (the form you might find in a dictionary) ...t. Work from top to bottom. After each asterisk '<code><nowiki>*</nowiki></code>' you should replace the surface form with the lemma.

2 KB (207 words) - 16:21, 14 November 2013
Ideas for Google Summer of Code/Apertium in chat clients
* Xchat: write a plugin that reads in text from the highlighted channel and, if it contains three asterisks (***) [[Category:Ideas for Google Summer of Code|Apertium in chat clients]]

457 bytes (65 words) - 21:45, 10 March 2014
Tatar and Bashkir/GSOC 2018
...://summerofcode.withgoogle.com/projects/#5878649350258688 Google Summer of Code 2018 project] — Tatar-Bashkir machine translation. Lexicons in bak.lexc were changed to correspond to the ones in tat.lexc, missing lexicons and tags were added to bak.lexc and new rules we

2 KB (262 words) - 13:17, 14 August 2018
Ideas for Google Summer of Code/Spell checker web interface
...m-html-tools]] has seen some prototypes for spell-checking interfaces (all in stale PRs and branches on GitHub), but none have ended up being quite ready ** Should automatically detect available voikko modes in language modules (might need to standardise how these are done, but check [

1 KB (166 words) - 22:19, 18 January 2021
Ideas for Google Summer of Code/Add a new variety to an existing language
* Find a language pair of your choice in Apertium and install it. (see [[Minimal installation from SVN]]) ...stipend installment(s)]. At least you tried! And, hopefully, learnt a lot in the process.

2 KB (377 words) - 19:18, 25 January 2023
Bilingual dictionary discovery
You could make a graph out of these dictionaries where each node is a word in a language, and each arc is a language pair. For example like: http://i.img * [[Ideas for Google Summer of Code/Improved bilingual dictionary induction]]

3 KB (487 words) - 00:02, 22 March 2018
Indonesian and Malaysian/Work plan
...orts for the [[Indonesian and Malaysian]] translator in [[Google Summer of Code]] 2012. ...| <code><ij> <cnjcoo> <cnjsub> <cnjadv> <det> <pr> <num> <prn> <np> <adv></code>|| ||

3 KB (402 words) - 17:05, 22 August 2012
Task ideas for Google Code-in/Unigram tagger
[[Category:Tasks for Google Code-in|Unigram tagger]]

255 bytes (43 words) - 14:56, 26 October 2014
Gci task ideas
#REDIRECT [[Task ideas for Google Code-in]]

43 bytes (6 words) - 07:49, 27 November 2015
Google Summer of Code/Report 2014
* [[Plugins#In progress]] [[Category:Google Summer of Code|Report 2014]]

3 KB (305 words) - 08:31, 13 January 2015
Uzbek
Daniel Huang, Google Code-in 2012 ...archive.org/web/20071101095455/http://www.ethnologue.com/show_language.asp?code=uzn Northern dialect]

5 KB (578 words) - 22:55, 27 November 2012
Apertium on Alpine Linux
This page, written by Jatin Luthra, a 2016 Google Code-in student, explains how to install Apertium on Alpine Linux

706 bytes (102 words) - 19:50, 15 December 2016
Afshar
Daniel Huang, Google Code-in 2012

889 bytes (121 words) - 23:02, 14 March 2013
Apertium on Mageia
This page, written by Jatin Luthra, a 2015 Google Code-in student, explains how to install Apertium on Mageia

872 bytes (125 words) - 02:58, 9 March 2018
Task ideas for Google Code-in/Add nouns from frequency list
#REDIRECT [[Task ideas for Google Code-in/Add words from frequency list]]

73 bytes (11 words) - 20:20, 13 November 2013
Wordbound blanks
...pertium MT system: Wordbound blanks, developed during the Google Summer of Code 2020. = How to use wordbound blanks in the pipeline =

313 bytes (42 words) - 18:58, 21 August 2020
Task ideas for Google Code-in/Hand-correct spelling errors
[[Category:Tasks for Google Code-in|Hand-correct spelling errors]]

95 bytes (10 words) - 19:30, 16 November 2013
Ideas for Google Summer of Code/Morphological analyser
Present your coding challenge in IRC or on the mailing list and ask [[Category:Ideas for Google Summer of Code]]

681 bytes (107 words) - 15:27, 5 April 2021
Task ideas for Google Code-in/Check output of word aligner
[[Category:Tasks for Google Code-in|Check output of word aligner]]

96 bytes (12 words) - 20:40, 17 November 2013
Kazakh and Sakha/GSoC2018 report
...all the work done in the [[Kazakh and Sakha]] pair during Google Summer of Code 2018. The project consisted mainly of building a bilingual bidix and enrich

2 KB (233 words) - 07:41, 14 August 2018
Prerequisites for FreeBSD
This page, written by Jatin Luthra, a 2016 Google Code-in student, explains how to install Apertium on FreeBSD

595 bytes (84 words) - 02:55, 9 March 2018
Ideas for Google Summer of Code/Improve integration of lttoolbox in libvoikko
* Writing a method for <code>liblttoolbox</code> which would allow analysis of a string as opposed to a file stream. [[Category:Ideas for Google Summer of Code|Improve integration of lttoolbox in libvoikko]]

932 bytes (130 words) - 14:22, 29 February 2012
Ideas for Google Summer of Code/Improving support for non-standard text input
* Propose ways in which they might be solved. [[Category:Ideas for Google Summer of Code|Improving support for non-standard text input]]

902 bytes (121 words) - 12:50, 10 March 2014
Ideas for Google Summer of Code/Closer integration with HFST
* Fix [https://sourceforge.net/p/hfst/bugs/153/ this bug] in <code>hfst-proc</code> tokenisation. * Make <code>hfst-expand</code> obey flag diacritics.

1 KB (170 words) - 23:58, 5 April 2013
Ideas for Google Summer of Code/Unify the metadix formats
* Write an XSLT sheet that transforms as many entries <e> as possible in the standard section of the dictionary as follows: [[Category:Ideas for Google Summer of Code|Unify the metadix formats]]

1 KB (243 words) - 11:47, 14 February 2014
Ideas for Google Summer of Code/Rule-based finite-state disambiguation
...e transducer. It might be a good idea to express this as constraint rules, in a novel XML-based file format. ...rocessor (see [[Apertium stream format]]) for the output of <code>lt-proc</code> that parses character by character, respecting [[superblanks]].

2 KB (237 words) - 00:53, 24 March 2013

Retrieved from "https://wiki.apertium.org/wiki/Special:Search"
