= ReTraTos =

                        "*": "[[L'outil ReTraTos|En fran\u00e7ais]]\n\n{{TOCD}}\n'''ReTraTos''' is a toolbox to build linguistic resources useful for machine translation (MT): bilingual dictionaries and transfer rules. The induction systems and open linguistic data can be used with the [[Apertium]] toolbox to build open-source MT systems.\n\n==Bilingual dictionaries==\n\nThis section describes how to use ReTraTos to create a bilingual dictionary for your Apertium language pair. You will need:\n\n* An aligned [[corpus]] of the two languages. For any pair of european languages, the JRC-Acquis corpus is recommended.\n* apertium\n* [[lttoolbox]]\n* ReTraTos\n* [[GIZA++]]\n* a lot of patience\n\n===Preparing the corpus===\n\nThe corpus should be in two files, each with one sentence per line. For example,\n\n;es.txt\n<pre>\nReconociendo que , en particular , ser\u00eda mutuamente beneficioso cooperar mediante el establecimiento de un programa com\u00fan de investigaciones y de \ndesarrollo ;\nConsiderando que un acuerdo que establezca una cooperaci\u00f3n en el \u00e1mbito de las utilizaciones pac\u00edficas de la energ\u00eda at\u00f3mica iniciar\u00eda fruct\u00edferos \nintercambios de experiencia \n...\n</pre>\n;it.txt\n<pre>\nRiconoscendo in particolare che sarebbe loro reciproco vantaggio cooperare con lo stabilire un programma comune di ricerche e di \nsviluppo ;\nConsiderando che un accordo inteso a stabilire una cooperazione nel campo degli usi pacifici dell'energia atomica darebbe inizio ad un proficuo \nscambio di esperienze\n...\n</pre>\n\nDepending on if you want to create entries for proper names, you could lower-case the whole corpus. Make sure that there are spaces in between any punctuation characters, otherwise the punctuation will be counted as part of the word. \n\n===Tagging the corpus===\n\nTag both of the files using the apertium-tagger. So, for example for Spanish:\n\n<pre>\n$ cat es.txt | apertium-destxt | lt-proc es-it.automorf.bin | apertium-tagger -g es-it.prob | apertium-retxt > es.tagged.txt &  \n</pre>\n\nAfter this, strip out the '^' and the '$' symbols from the es-tagged.txt file. This will result in lines that look something like:\n\n<pre>\nReconocer<vblex><ger> que<cnjsub> ,<cm> en<pr> particular<adj><mf><sg> ,<cm> ser<vbser><cni><p3><sg> mutuamente<adv> \nbeneficioso<adj><m><sg> *cooperar mediante<pr> el<det><def><m><sg> establecimiento<n><m><sg> de<pr> \nuno<det><ind><m><sg> programa<n><m><sg> com\u00fan<adj><mf><sg> de<pr> investigaci\u00f3n<n><f><pl> y<cnjcoo> \nde<pr> desarrollo<n><m><sg> ;<sent>\n</pre>\n\n\nReTraTos is very picky about formatting, so before removing '^' and '$' you might want to just remove anything that's ''not'' contained within '^' and '$', ie. something like\n<pre>\n# ^foo$-^bar$ => ^foo$ ^bar$     -^foo$ => ^foo$           ^bar$- => ^bar$\nsed 's%\\$[^^]*\\^%\\$ \\^%g'    | sed  's%^[^^]*\\^%\\^%g' | sed 's%\\$[^^]*$%\\$%g' |\\\n# remove '^' and '$'\nsed 's%[$^]% %g'\n</pre>\n\nThis'll save you lots of time later on... oh, and you should also just remove any lines that don't have at least one <code>^foo<bar>$</code> sequence in them (eg. a single alignment of '%' to '%', or something).\n\n===Aligning the corpus===\n{{see-also|Using GIZA++|Getting started with induction tools}}\n\nUse GIZA++ to align both of the files, the instructions can be found in the page [[using GIZA++]]. Once the alignment has been made, you will end up with a file that ends in <code>.A3.final</code>, this is your alignment file. 
Next you will need to convert the alignment to the LIHLA format that ReTraTos uses. The script on the [[Talk:ReTraTos|talk page]] serves this purpose. For Spanish--Italian, it would be called as follows:

<pre>
$ perl giza_to_lihla.pl es_it.aligned.A3.final ./es/ ./it/
</pre>

This will put two files into the directories <code>./es/</code> and <code>./it/</code>, which correspond to the lines in Spanish and Italian respectively. These LIHLA alignment files will end in <code>.al</code> and will look like the following:

<pre>
<s snum=6>Reconocer<vblex><ger>:1 que<cnjsub>:0 ,<cm>:0 en<pr>:2 particular<adj><mf><sg>:3 ,<cm>:0 ser<vbser><cni><p3><sg>:5 mutuamente<adv>:7 
beneficioso<adj><m><sg>:8 *cooperar:9 mediante<pr>:10 el<det><def><m><sg>:11 establecimiento<n><m><sg>:12 de<pr>:0 uno<det><ind><m><sg>:13 
programa<n><m><sg>:14 común<adj><mf><sg>:15 de<pr>:16 investigación<n><f><pl>:17 y<cnjcoo>:18 de<pr>:19 desarrollo<n><m><sg>:20 ;<sent>:0</s>
</pre>

In order to use ReTraTos, the surface forms of each lexical unit need to be re-added, so that the above example looks like the following:

<pre>
<s snum=6>Reconociendo/Reconocer<vblex><ger>:1 que/que<cnjsub>:0 ,/,<cm>:0 en/en<pr>:2 particular/particular<adj><mf><sg>:3 ,/,<cm>:0 
sería/ser<vbser><cni><p3><sg>:5 mutuamente/mutuamente<adv>:7 beneficioso/beneficioso<adj><m><sg>:8 *cooperar/*cooperar:9 mediante/mediante<pr>:10 
el/el<det><def><m><sg>:11 establecimiento/establecimiento<n><m><sg>:12 de/de<pr>:0 un/uno<det><ind><m><sg>:13 programa/programa<n><m><sg>:14
común/común<adj><mf><sg>:15 de/de<pr>:16 investigaciones/investigación<n><f><pl>:17 y/y<cnjcoo>:18 de/de<pr>:19 desarrollo/desarrollo<n><m><sg>:20
;<sent>:0</s>
</pre>

===Running ReTraTos_lex===

You will need the header and footer of a bilingual dictionary in two separate files, for example <code>dic_header.txt</code> and <code>dic_footer.txt</code> (see the examples in the package). Then the program for generating the dictionary (<code>ReTraTos_lex</code>) can be called like this:

<pre>
$ ReTraTos_lex -s ./es/es_it.aligned.A3.final.al -t ./it/es_it.aligned.A3.final.al -b dic_header.txt -e dic_footer.txt 

PRE-PROCESSAMENTO

        Reading the examples ...  100000 examples read
        Reading the examples ...  100000 examples read

GERANDO LEXICO

        Generating source-target dictionary ... OK
        Generating target-source dictionary ... OK
        Processing bilingual dictionary ... OK
        Generalising bilingual dictionary ... OK
        Cleaning equal attributes ... OK

IMPRIMINDO LEXICO

        Printing bilingual dictionary ... OK
</pre>

The output will be the <code>.dix</code> file.

===Troubleshooting===

;Wrongly tagged

If you get messages like the following, even just one, it means that there is an error in one of your alignment files.

<pre>
WARNING: (Entrada::le_sentenca): String O<cnjcoo>.<sent>V<num><mf><pl> is wrongly tagged
WARNING: (Entrada::le_sentenca): String )<rpar>.<sent> is wrongly tagged
WARNING: (Entrada::le_sentenca): String a<pr>)<rpar> is wrongly tagged
WARNING: (Entrada::le_sentenca): String 1991<num>.<sent> is wrongly tagged

...
</pre>

The alignments in the files must be in the format <code>lemma<tags>:<alignment></code>, e.g. <code>1991<num>:1</code> or <code>pero<cnjcoo>:2</code>. Alignments with stray strings between the tags are not allowed; for example, <code>O<cnjcoo>.<sent>V<num><mf><pl></code> is a set of three lemma/tag pairs sharing a single alignment. Most likely the punctuation has not been separated from the text with spaces.
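Each warning prints the offending string, so you can locate the corresponding line with a plain grep; for example, to find the first string reported above in whichever alignment file contains it:

<pre>
# find the line containing the string reported in the warning
$ grep -n 'O<cnjcoo>.<sent>V<num><mf><pl>' ./es/*.al ./it/*.al
</pre>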
It is possible to fix this manually, by removing the punctuation or by inserting spaces and null alignments, but if there are many warning messages it is usually better to re-process the corpus.
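If you do re-process, the punctuation can be separated from the words before tagging with something along these lines (a rough sketch only; extend the character class to whatever punctuation actually occurs in your corpus):

<pre>
# put spaces around common punctuation so it is tokenised separately,
# then squeeze runs of spaces back down to one
$ sed 's/\([.,;:()!?]\)/ \1 /g' es.txt | tr -s ' ' > es.spaced.txt
</pre>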
==See also==

* [[Using GIZA++]]
* [[Getting started with induction tools]]

==External links==
* [http://www.nilc.icmc.usp.br/nilc/projects/retratos.htm ReTraTos: Homepage]
* [http://retratos.svn.sourceforge.net/viewvc/retratos/ ReTraTos: SVN]

==Further reading==

* Helena M. Caseli, Maria das Graças V. Nunes, Mikel L. Forcada. (2008) "[http://www.dlsi.ua.es/~mlf/docum/caseli08p.pdf From free shallow monolingual resources to machine translation systems: easing the task]", in ''Mixing Approaches To Machine Translation'', MATMT2008, proceedings (Donostia, Spain, Feb. 14, 2008), pp. 41--48
* Helena M. Caseli, Maria das Graças V. Nunes, Mikel L. Forcada. (2008) "Automatic induction of bilingual resources from aligned parallel corpora: application to shallow-transfer machine translation". ''Machine Translation'' (to appear)

[[Category:Tools]]
[[Category:Documentation in English]]

= Recursive transfer =

                        "*": "{{TOCD}}\n\n==Todo==\n\n* <s>Make the parser output optionally original parse tree (SL syntax) and target parse tree (TL syntax).</s>\n* Attribute structures. These are defined in typical .t1x format with <code>def-attrs</code>\n* Make the parser robust &mdash; we should never get parse errors, though our trees may be mangled.\n\n==Process==\n\nThe parser has two trees, both are built simultaneously:\n\n* The '''source''' tree is parser-internal \n* The '''target''' tree is the \"abstract syntax tree\".\n\nWhen a sentence terminal (<code>S</code>) is reached, the target tree is traversed and printed out.\n\n==Questions==\n\n* What to do with a parse-fail.\n** Implicit glue rules\n*** How do we make sure that we never get <code>Syntax error</code> (e.g. really robust glue rules).\n** the glue rules would not compute anything, just allow for partial parses\n* How about unknown words...\n** they would be some non-terminal UNK that would be glued \u00a0by the all-encompassing glue rule from above.\n* Ambiguous grammars -> can be automatically disambiguated ?\n** Learn shift/reduce using target-language information ?\n* Converting right-recursive to left-recursive grammars.\n* How to apply macros in rules which have >1 non-terminal.\n* What on earth to do with blanks / formatting...\n* Do we try and find syntactic relations in the transfer, or do we pre-annotate (e.g. with CG) then use the tags from CG to constraint the parser?\n* Can/should we do unification in the grammar (e.g. to avoid rules like SN -> adj n matching when adj.G and n.G are not the same)?\n*: If a language uses CG, the rule SN -> @A\u2192 @N would only match where CG mapped @A\u2192 (and CG can do unification with less trouble, not mapping @A\u2192 where gender differs)\n** However, if we are to propagate attributes up the tree as well, it makes sense to have unification as well, so we can say <code>NP[gen=X] -&gt; D[gen=X] N[gen=X]</code>\n* Should the transfer allow for >1 possible TL translation ? 
==Algorithms==

* [http://en.wikipedia.org/wiki/CYK_algorithm CKY] (bottom-up)
* [http://en.wikipedia.org/wiki/LALR_parser LALR(1)] (bottom-up)
* [http://en.wikipedia.org/wiki/GLR_parser GLR] (bottom-up)
* [http://en.wikipedia.org/wiki/Earley_parser Earley] (top-down)

==Usage==

<pre>
$ svn co https://svn.code.sf.net/p/apertium/svn/branches/transfer4

$ cd transfer4

$ cd eng-kaz

$ make
</pre>

;Files

* <code>eng-kaz.grammar</code>: Transfer grammar file for English→Kazakh
* <code>eng-kaz.t1x</code>: Categories (terminals) and attributes for English→Kazakh

;Apply the transfer grammar

<pre>
$ cat input/input.01.txt | ./eng-kaz.parser 
^you<prn><subj><p2><mf><sp>/сен<prn><pers><subj><p2><mf><sp>$ ^Kazakhstan<np><top><sg>/Қазақстан<np><top><nom>$ ^to<pr>/$ 
^go<vblex><past>/бар<v><iv><past>$ ^that<cnjsub>/$ ^I<prn><subj><p1><mf><sg>/Мен<prn><pers><subj><p1><mf><sg>$ 
^know<vblex><pres>/біл<v><tv><pres>$ ^.<sent>/.<sent>$ 
</pre>

;Print out the source tree

<pre>
$ cat input/input.01.txt | ./eng-kaz.parser -s -p >/dev/null
(S (S1 (PRNS (subj_pron (^I<prn><subj><p1><mf><sg>/Мен<prn><pers><subj><p1><mf><sg>$))) 
(SV (V (pers_verb (^know<vblex><pres>/біл<v><tv><pres>$))))) (Ssub (cnjsub (^that<cnjsub>/$)) 
(S1 (PRNS (subj_pron (^you<prn><subj><p2><mf><sp>/сен<prn><pers><subj><p2><mf><sp>$))) 
(SV (V (pers_verb (^go<vblex><past>/бар<v><iv><past>$))) (SP (prep (^to<pr>/$)) 
(SN1 (SN (N (nom (^Kazakhstan<np><top><sg>/Қазақстан<np><top><nom>$))))))))) (X (sent (^.<sent>/.<sent>$))))
</pre>

;Print out the target tree

<pre>
$ cat input/input.01.txt | ./eng-kaz.parser -p >/dev/null
(S (Ssub (S1 (PRNS (subj_pron (^you<prn><subj><p2><mf><sp>/сен<prn><pers><subj><p2><mf><sp>$))) 
(SV (SP (SN1 (SN (N (nom (^Kazakhstan<np><top><sg>/Қазақстан<np><top><nom>$))))) (prep (^to<pr>/$))) 
(V (pers_verb (^go<vblex><past>/бар<v><iv><past>$))))) (cnjsub (^that<cnjsub>/$))) 
(S1 (PRNS (subj_pron (^I<prn><subj><p1><mf><sg>/Мен<prn><pers><subj><p1><mf><sg>$))) 
(SV (V (pers_verb (^know<vblex><pres>/біл<v><tv><pres>$))))) (X (sent (^.<sent>/.<sent>$))))
</pre>
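The examples above suggest that the tree is printed on the diagnostic stream while the translated stream goes to stdout, so the two orderings for the same sentence can be captured and compared side by side (a small convenience sketch, assuming the behaviour shown above):

<pre>
# capture the source-order and target-order trees, then view them next to each other
$ cat input/input.01.txt | ./eng-kaz.parser -s -p > /dev/null 2> source.tree
$ cat input/input.01.txt | ./eng-kaz.parser -p    > /dev/null 2> target.tree
$ diff -y source.tree target.tree | less -S
</pre>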
==References==

* Prószéky & Tihanyi (2002) "MetaMorpho: A Pattern-Based Machine Translation System"
* White (1985) "Characteristics of the METAL machine translation system at Production Stage" (§6)
* Slocum (1982) "The LRC Machine translation system: An application of State-of-the-Art ..." (p. 18)

==Further reading==

* [[User:Mlforcada/Robust LR for Transfer]]
* Muhua Zhu, Jingbo Zhu and Huizhen Wang (2013) "Improving shift-reduce constituency parsing with large-scale unlabeled data". ''Natural Language Engineering'', October 2013, pp. 1--26
* http://www.cs.cmu.edu/~./alavie/papers/thesis.pdf
* http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-743.pdf

==See also==

* https://svn.code.sf.net/p/apertium/svn/branches/transfer4

==External links==

* [http://smlweb.cpsc.ucalgary.ca/start.html CFG tool]
* [http://erg.delph-in.net/logon LOGON: Parse with the ERG]

[[Category:Development]]
[[Category:Transfer]]
[[Category:Documentation in English]]