https://wiki.apertium.org/w/index.php?title=Special:NewPages&feed=atom&limit=50&offset=&namespace=0&username=&tagfilter=&size-mode=max&size=0
Apertium - New pages [en]
2024-03-29T15:15:23Z
From Apertium
MediaWiki 1.34.1

https://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code/Dictionary_induction_from_parallel_corpora
Ideas for Google Summer of Code/Dictionary induction from parallel corpora
2024-03-08T16:05:57Z
<p>Popcorndude: /* Coding Challenge */</p>
<hr />
<div>== Coding Challenge ==<br />
<br />
Write a script that reads the two sides of a parallel corpus, applies the appropriate monolingual tagger to each side, runs a word aligner ([https://github.com/robertostling/eflomal eflomal] is a straightforward choice if you don't know where to begin), and then prints a list of paired words.<br />
<br />
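The final pairing step of such a script might look like the following minimal Python sketch. It assumes that tagging has already produced one analysis string per token and that the aligner emits Pharaoh-style <code>i-j</code> links (the format fast_align uses; check your aligner's documentation for its actual output). The function names <code>parse_alignment</code> and <code>pair_words</code> are hypothetical and not part of any Apertium tool.

```python
def parse_alignment(line):
    """Parse one Pharaoh-format line such as '0-0 1-1' into index pairs."""
    pairs = []
    for item in line.split():
        src, trg = item.split("-")
        pairs.append((int(src), int(trg)))
    return pairs


def pair_words(src_tokens, trg_tokens, alignment):
    """Return 'source - target' strings for each aligned pair of tagged tokens."""
    out = []
    for i, j in alignment:
        # Skip links that point outside either sentence.
        if 0 <= i < len(src_tokens) and 0 <= j < len(trg_tokens):
            out.append(f"{src_tokens[i]} - {trg_tokens[j]}")
    return out


# Assumed tagger output for "The cat" / "El gato", one analysis per token.
eng = ["the<det><def><mf><sp>", "cat<n><sg>"]
spa = ["el<det><def><m><sg>", "gato<n><m><sg>"]

for line in pair_words(eng, spa, parse_alignment("0-0 1-1")):
    print(line)
```

Running the taggers and the aligner is deliberately left out here, since how they are invoked depends on the language data and the aligner chosen.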
$ cat eng.txt<br />
The cat ate the fish.<br />
$ cat spa.txt<br />
El gato comió el pez.<br />
$ alignment-script apertium-eng/ eng.txt apertium-spa/ spa.txt<br />
the<det><def><mf><sp> - el<det><def><m><sg><br />
cat<n><sg> - gato<n><m><sg><br />
...</div>
Popcorndude

https://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code/Use_preferences_in_pair
Ideas for Google Summer of Code/Use preferences in pair
2024-03-04T09:31:46Z
<p>Unhammer: </p>
<hr />
<div>Language pairs can now have any number of linguistic or stylistic preferences that the user can choose between. Before, we had only fixed sets, e.g. British vs American English, which required compiling a new pipeline for each set of preferences. These days it is possible to turn individual spelling choices on or off, like "-ize" vs "-ise", or even word-specific ones like "encyclopaedia", "manoeuvre" vs "encyclopedia", "maneuver", within just one pipeline – as long as the pipeline is set up to allow this. This GSoC task involves setting up an existing pipeline to allow this kind of variation.<br />
<br />
The new preference system is used in nob→nno and cat→spa, but there are other language pairs that could have preferences enabled as well. This requires first of all figuring out what preference variation is possible and useful, systematising it, and then enabling it in the language pair by turning hard restrictions into ambiguity and selectors. In practice, this means removing [[LR]]/RL restrictions, merging [[Morphological_dictionary|paradigms]], and adding simple Constraint Grammar (CG) rules that pick the form the user requested.<br />
<br />
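The idea of "ambiguity plus selectors" can be illustrated with a toy Python model. This is only a conceptual sketch: in Apertium the selection is done by CG rules inside the pipeline, not by Python code, and the function name <code>choose_variant</code> and the label <code>ae_spelling</code> are made up for illustration. All variants stay available as candidates, and the user's preferences decide which one is emitted.

```python
def choose_variant(candidates, preferences):
    """Pick one surface form from variants that are all kept in the pipeline.

    candidates: list of (surface_form, variant_label) tuples; a label of
    None marks the default form.
    preferences: set of variant labels the user has switched on.
    """
    for form, label in candidates:
        if label in preferences:
            return form
    # No requested variant matched: fall back to the default form, if any.
    for form, label in candidates:
        if label is None:
            return form
    return candidates[0][0]


# Hypothetical label; real language pairs define their own preference names.
candidates = [("encyclopedia", None), ("encyclopaedia", "ae_spelling")]
print(choose_variant(candidates, {"ae_spelling"}))
print(choose_variant(candidates, set()))
```

The point of the sketch is that no variant is removed ahead of time, which is what allows a single compiled pipeline to serve every combination of preferences.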
<br />
== Documentation about preference variation ==<br />
<br />
* https://wiki.apertium.org/wiki/Dialectal_or_standard_variation#Overlapping_variants<br />
<br />
* https://github.com/apertium/apertium/issues/118<br />
<br />
<br />
== Coding challenge ==<br />
<br />
* initial documentation of possible preferences in a pair of your choice (which doesn't already have preferences enabled)<br />
* enable a single bidix preference<br />
* enable a single generator preference<br />
<br />
[[Category:Ideas_for_Google_Summer_of_Code]]</div>
Unhammer

https://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code/More_robust_recursive_transfer
Ideas for Google Summer of Code/More robust recursive transfer
2024-02-06T09:14:13Z
<p>Unhammer: </p>
<hr />
<div>{{TOCD}}<br />
<br />
Currently, one has to be very careful in writing recursive transfer rules to ensure they don't get too deep or ambiguous, and that they cover full sentences. See in particular issues [https://github.com/apertium/apertium-recursive/issues/97 97] and [https://github.com/apertium/apertium-recursive/issues/80 80]. We would like linguists to be able to fearlessly write recursive (rtx) rules based on what makes linguistic sense, and have rtx-proc/rtx-comp deal with the computational/performance side.<br />
<br />
<br />
==Tasks==<br />
<br />
* There has been lots of research on parsing. Document relevant methods that might help solve the issues.<br />
* Implement solutions to the parse performance issues in the apertium-recursive code-base<br />
* If there is time, solve other [https://github.com/apertium/apertium-recursive/issues/ open issues]<br />
<br />
==Coding challenge==<br />
<br />
* Install Apertium (see [[Minimal installation from SVN]])<br />
* Compile apertium-recursive from source<br />
<br />
then<br />
<br />
* write a short grammar for a language you know that doesn't have one yet, to get to know the formalism<br />
<br />
or<br />
<br />
* Add a compiler warning, e.g. for [https://github.com/apertium/apertium-recursive/issues/89 issue 89] or [https://github.com/apertium/apertium-recursive/issues/78 issue 78]<br />
<br />
==Frequently asked questions==<br />
* none yet, ''[[contact|ask us]] something!'' :)<br />
<br />
==See also==<br />
<br />
* [[Ideas_for_Google_Summer_of_Code/Robust_recursive_transfer]]: the first GSoC recursive transfer project<br />
<br />
<br />
==Further reading==<br />
<br />
<br />
<br />
[[Category:Ideas for Google Summer of Code|Prototype recursive transfer implementations]]</div>
Unhammer

https://wiki.apertium.org/wiki/Apertium-kat/stats
Apertium-kat/stats
2023-12-30T03:20:23Z
<p>Firespeaker: /* Corpora */ 7c6f3b</p>
<hr />
<div>== Corpora ==<br />
wp2023<br />
* words: <section begin=wp2023-words />33.0M<section end=wp2023-words /><br />
* coverage: ~<section begin=wp2023-coverage />41.53<section end=wp2023-coverage />%<br />
* as of: 7c6f3b</div>
Firespeaker