User:Khannatanmai/Secondary tags features
Revision as of 20:53, 17 May 2020 by Khannatanmai (talk | contribs)
This page lists the features being added to the Apertium pipeline to handle secondary tags. To follow development updates, see User:Khannatanmai/New_Apertium_stream_format. This work was done as part of Google Summer of Code 2020; see also User:Khannatanmai/GSoC2020Proposal_Trimming and User:Khannatanmai/GSoC2020Progress.
For examples and tests, see the talk page.
Module-specific features
Chunker (t1x): Pull Request
- Secondary tags (sectags) are ignored while pattern matching for rules.
- Attribute "tags" (in t1x) gets only primary and not secondary tags. (Ensures no regression)
- "whole" gets the whole LU including secondary tags.
- New attribute "sectags" gets all secondary tags (can be used in clip).
- Secondary tags are added in the output LU from the LU that the lem/lemh is clipped from.
- If the lem/lemh comes from a variable in the output, then the secondary tags come from the LU that the lemma comes from, found by tracing its variable assignment in <let>.
- No regression: streams without secondary tags work as-is.
- Works with MLUs.
- If there is a lemq in the LU, sectags appear before the lemq, even if the lemq comes from a variable.
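As a rough illustration of the primary/secondary split described in the list above, here is a minimal Python sketch. It is not the chunker's actual code. It assumes, purely for illustration, that a secondary tag is written as <prefix:value> and that a primary tag contains no colon; the function name split_tags is hypothetical, and escaped characters, lemq and MLUs are deliberately not handled.

```python
import re

# Assumption (not confirmed by this page): secondary tags look like <prefix:value>,
# while primary tags contain no colon.
TAG = re.compile(r"<([^<>]*)>")


def split_tags(lu):
    """Split an LU body such as 'dog<n><pl><sf:dogs>' into
    (lemma, primary tags, secondary tags). Hypothetical helper."""
    first = lu.find("<")
    if first == -1:
        # No tags at all: the whole string is the lemma.
        return lu, [], []
    lemma = lu[:first]
    tags = TAG.findall(lu[first:])
    primary = [t for t in tags if ":" not in t]
    secondary = [t for t in tags if ":" in t]
    return lemma, primary, secondary


if __name__ == "__main__":
    print(split_tags("dog<n><pl><sf:dogs>"))
```

Under these assumptions, the first list corresponds to what "tags" would see and the second to what "sectags" would see; pattern matching would look only at the first.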
Generator: Pull Request
- Removes all trailing secondary tags from the input before giving it to FST matching.
- For input without secondary tags it works as earlier with no regression.
- All escaped characters are ignored inside secondary tags, as well as unescaped special characters ($, #, etc.). This applies to the tag prefix as well. This is needed for generation.
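The trailing-tag removal described above can be sketched in Python as follows. This is only an illustration, not the generator's implementation. It assumes a hypothetical <prefix:value> syntax for secondary tags, the name strip_trailing_sectags is made up for this example, and a tag whose content contains an escaped "<" or ">" is not handled.

```python
import re

# Assumption (not confirmed by this page): a secondary tag is <prefix:value>.
# The pattern matches one or more such tags anchored at the end of the string.
# Characters such as $ or # inside a tag are accepted without special meaning.
TRAILING_SECTAGS = re.compile(r"(?:<[^<>:]*:[^<>]*>)+$")


def strip_trailing_sectags(lu):
    """Return the LU body without its trailing secondary tags. Hypothetical helper."""
    return TRAILING_SECTAGS.sub("", lu)


if __name__ == "__main__":
    print(strip_trailing_sectags("dog<n><pl><sf:dogs><x:1>"))
```

Input with no secondary tags passes through unchanged, which mirrors the no-regression requirement in the list.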