Difference between revisions of "User talk:Popcorndude/Recursive Transfer"
Popcorndude (talk | contribs) (→Implementation Ideas: new section)
Revision as of 17:03, 8 March 2019
General comments and things to look at
- GLR
- PCFGs
Reading list
- http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.625.910&rep=rep1&type=pdf
- https://www.aclweb.org/anthology/W/W08/W08-0311.pdf
- http://www.cis.upenn.edu/~xtag/koreantag/nasr-et-al-1997.ps
Implementation Ideas
Using Bison or something like it might be faster than writing a custom parser, and having that component already exist would mean one less source of error. On the other hand, it would be really nice to allow rules to handle situations like
        S
       / \
      /   VP
     /    /\
    /    V  NP
  (N)       /\
   |       /  \
   |      /    \
   |    Adj  ^  N
   |_________|
Here the subject ends up in the middle of another NP, and I'm really not sure how to deal with that in Yacc (except maybe by manually reinserting the subject into the input stream when the object is parsed, but that seems like a bad idea). With a custom parser, we could let the Reduce operation produce more than one node as output, so a rule for the above could be something like
NP.nom NP.acc -> adj.acc n.nom n.acc {2} {3 1};
or something more general like
NP.$case * -> adj.$case * n.$case {3 1(gender=3.gender)} {2};
# match an adjective and noun with the same case marking, separated by another word
# copy the gender marking from the noun to the adjective and output in N-Adj order
# then deal with the other word
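To make the multi-output Reduce idea a bit more concrete, here is a rough Python sketch of how the general rule above might be applied to the top of a parse stack. Everything in it is an assumption made up for illustration: the `Node` class, the `match` and `reduce_multi` helpers, and the tuple encoding of the pattern and output groups are not an existing format or API. It only shows one reduction replacing three nodes with two.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                      # e.g. "adj", "n", "NP"
    tags: dict                      # e.g. {"case": "acc", "gender": "f"}
    children: list = field(default_factory=list)


def match(pattern, nodes):
    """Return the variable bindings if nodes fit the pattern, else None."""
    bindings = {}
    for (label, tags), node in zip(pattern, nodes):
        if label != "*" and node.label != label:
            return None
        for key, want in tags.items():
            have = node.tags.get(key)
            if have is None:
                return None
            if want.startswith("$"):
                # A variable must take the same value everywhere it appears.
                if bindings.setdefault(want, have) != have:
                    return None
            elif want != have:
                return None
    return bindings


def reduce_multi(stack, pattern, outputs):
    """Replace the top len(pattern) nodes with one result per output group."""
    n = len(pattern)
    if len(stack) < n:
        return False
    window = stack[-n:]
    bindings = match(pattern, window)
    if bindings is None:
        return False
    produced = []
    for label, tags, items in outputs:
        children = []
        for index, copies in items:
            src = window[index - 1]          # rule positions are 1-based
            new_tags = dict(src.tags)
            for key, from_index in copies.items():
                new_tags[key] = window[from_index - 1].tags[key]
            children.append(Node(src.label, new_tags, src.children))
        if label == "*":
            produced.extend(children)        # wildcard output: pass through
        else:
            resolved = {k: bindings.get(v, v) for k, v in tags.items()}
            produced.append(Node(label, resolved, children))
    stack[-n:] = produced
    return True


# Hypothetical encoding of:
#   NP.$case * -> adj.$case * n.$case {3 1(gender=3.gender)} {2};
PATTERN = [("adj", {"case": "$case"}), ("*", {}), ("n", {"case": "$case"})]
OUTPUTS = [
    ("NP", {"case": "$case"}, [(3, {}), (1, {"gender": 3})]),
    ("*", {}, [(2, {})]),
]

stack = [
    Node("adj", {"case": "acc"}),
    Node("n", {"case": "nom", "gender": "m"}),
    Node("n", {"case": "acc", "gender": "f"}),
]
reduce_multi(stack, PATTERN, OUTPUTS)
```

After the call, the stack holds an `NP` node (noun then adjective, with the adjective carrying the noun's gender) followed by the untouched nominative noun, which a later rule could then reduce on its own.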
Questions:
- Should the parser generate a C file and compile like Bison does or should it just generate a rule table and load that from a file?
- To what extent is it possible and desirable to put parts of this data in the monolingual repositories?
  - If this were possible to the fullest extent, it would substantially decrease the total number of rules that need to be written, since the Catalan rules could then be reused in every pair that includes Catalan.
  - This would probably require every language to parse to more or less the same abstract syntax tree.
  - In any event, there are probably lexical things that affect syntax and would have to be pair-specific.
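On the first question, the "rule table loaded from a file" option could look roughly like the sketch below. This is only an illustration under assumptions: JSON as the table format, the `compile_rules` and `load_rules` names, and the dictionary layout of a rule are all hypothetical, and an in-memory buffer stands in for the file.

```python
import io
import json


def compile_rules(rules):
    """Validate the rules and return the JSON text of the rule table."""
    for rule in rules:
        size = len(rule["pattern"])
        for output in rule["outputs"]:
            for index in output["items"]:
                if not 1 <= index <= size:
                    raise ValueError(
                        f"output index {index} outside pattern of size {size}"
                    )
    return json.dumps(rules)


def load_rules(fp):
    """Read a compiled table and group its rules by pattern length."""
    table = {}
    for rule in json.load(fp):
        table.setdefault(len(rule["pattern"]), []).append(rule)
    return table


# Hypothetical encoding of:
#   NP.nom NP.acc -> adj.acc n.nom n.acc {2} {3 1};
RULES = [
    {
        "pattern": ["adj.acc", "n.nom", "n.acc"],
        "outputs": [
            {"label": "NP.nom", "items": [2]},
            {"label": "NP.acc", "items": [3, 1]},
        ],
    },
]

# io.StringIO stands in for the table file written at compile time.
table = load_rules(io.StringIO(compile_rules(RULES)))
```

The trade-off this is meant to show: a loaded table needs no C compiler when rules change, while the Bison-style approach moves that work into a compile step.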
Popcorndude (talk) 18:03, 8 March 2019 (CET)