User-based chunking

Sometimes when translating it would be useful to mark particular segments (chunks) as not to be translated, while still giving them an analysis for the purposes of translation.

Some use-cases:

Product names, long named entities

- Microsoft 365 for business is the right choice for your company
- A very popular film adaptation is "Gone with the Wind".

The use of quoted chunks as modifiers

Quotations in a second (or third) language

- And then he came back and said "Bu ne lan?"

Solutions

User-based, online

1. A user could mark in an interface that they don't want a particular span translated. Each word in the span would then be prefixed with a symbol. Sequences of these words would be merged by -separable or something like it, and the whole chunk would be given either a default tag, a tag determined by a classifier, or a tag determined by rules.

Microsoft 365 for business is the right choice for your company

+Microsoft +365 +for +business is the right choice for your company

^+Microsoft/Microsoft<np><al>$ ^+365/365<num>$ ^+for/for<pr>$ ^+business/business<n><sg>$ ^is/be<vbser><pres><p3><sg>$ the right choice for your company

^+Microsoft 365 for business/Microsoft 365 for business<np><al>$ ^is/be<vbser><pres><p3><sg>$ the right choice for your company.

...
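The marking and merging steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names mark_span and merge_marked are hypothetical, the per-word analysis step is skipped (the sketch works on surface tokens), unmarked words are left unanalysed, and every merged chunk gets the default tag <np><al> rather than one chosen by a classifier or by rules.

```python
MARK = "+"  # prefix symbol, as in the example above


def mark_span(tokens, start, end, mark=MARK):
    """Prefix every token in tokens[start:end] with the marker symbol."""
    return [mark + t if start <= i < end else t for i, t in enumerate(tokens)]


def merge_marked(tokens, default_tags="<np><al>", mark=MARK):
    """Merge each run of marked tokens into a single unit with a default tag.

    Assumption: any token starting with the marker was marked by the user,
    so a word that naturally begins with "+" would be misread.
    """
    out, run = [], []
    for tok in tokens + [None]:  # the None sentinel flushes a trailing run
        if tok is not None and tok.startswith(mark):
            run.append(tok[len(mark):])
            continue
        if run:
            chunk = " ".join(run)
            out.append(f"^{mark}{chunk}/{chunk}{default_tags}$")
            run = []
        if tok is not None:
            out.append(tok)
    return out


tokens = "Microsoft 365 for business is the right choice for your company".split()
marked = mark_span(tokens, 0, 4)
merged = merge_marked(marked)
print(" ".join(marked))
print(" ".join(merged))
```

In a real pipeline the merging would more likely happen on the analysed stream, as in the example above, and the tag would come from whichever of the three strategies (default, classifier, rules) is chosen.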

Engine-based, offline

Discovering these kinds of things should be possible using only monolingual corpora, or resources such as Wikipedia.
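One possible way to approach this, sketched here purely as an assumption, is to collect a list of known multiword names (for example, titles taken from a resource such as Wikipedia) and mark any matching span automatically with the same prefix symbol a user would have added. The function name mark_known_chunks and the chunk list are hypothetical, matching is exact and case-sensitive, and how the list would actually be extracted is left open.

```python
def mark_known_chunks(tokens, known_chunks, mark="+"):
    """Greedy longest-match marking of tokens that form a known chunk."""
    known = {tuple(c.split()) for c in known_chunks}
    max_len = max((len(k) for k in known), default=0)
    out, i = [], 0
    while i < len(tokens):
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(tokens[i:i + n]) in known:
                out.extend(mark + t for t in tokens[i:i + n])
                i += n
                break
        else:
            # No known chunk starts here: keep the token unmarked.
            out.append(tokens[i])
            i += 1
    return out


# Hypothetical chunk list, e.g. drawn from article titles.
known_chunks = ["Gone with the Wind", "Microsoft 365 for business"]
tokens = "A very popular film adaptation is Gone with the Wind".split()
auto_marked = mark_known_chunks(tokens, known_chunks)
print(" ".join(auto_marked))
```

The output uses the same marking as the user-based approach, so the later merging and tagging steps could stay the same for both.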
