Semantic tagging

Uses

Approaches

Giellatekno
Grammatical Framework

Data sources

Wikidata?
- https://www.wikidata.org/wiki/Q169
- https://www.wikidata.org/wiki/Q200266

Embeddings?

Implementation

Bilingual or multilingual

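A word can often be disambiguated using its translations in other languages. For example, Catalan estació can mean either "station" or "season", but the triple (estació, gare, station) pins down the building sense, because neither French gare nor English station can mean "season".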
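The idea can be sketched as a set intersection over a sense inventory. This is only an illustration: the SENSES table, its language codes and its sense labels are hypothetical and are not taken from any existing resource.

```python
# Hypothetical sense inventory: (language, word) -> set of sense labels.
# The labels are illustrative assumptions, not data from a real resource.
SENSES = {
    ("cat", "estació"): {"building", "season"},
    ("fra", "gare"): {"building"},
    ("eng", "station"): {"building", "rank", "broadcaster"},
}


def shared_senses(translations):
    """Return the senses common to all words in a translation tuple."""
    sense_sets = [SENSES[pair] for pair in translations]
    return set.intersection(*sense_sets)


triple = [("cat", "estació"), ("fra", "gare"), ("eng", "station")]
print(shared_senses(triple))  # {'building'}
```

Each additional language can only shrink the intersection, so even a coarse inventory may be enough to separate senses that are ambiguous in one language.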

Existing examples

SET MangoFruitWords = ("aguacate"i) OR ("albahaca"i) OR ("alimentario"i) OR ("alimenticio"i) OR ("aloe"i)
    OR ("anacardo"i) OR ("ananás"i) OR ("anchoa"i) OR ("arroz"i) OR ("atún"i) OR ("azúcar"i)
    OR ("banana"i) OR ("banano"i) OR ("batido"i) OR ("boniato"i) OR ("brocheta"i) OR ("cacahuete"i)
    OR ("cacao"i) OR ("caramelizar"i) OR ("caramelo"i) OR ("carpaccio"i) OR ("caviar"i) OR ("cereal"i)
    OR ("chirimoya"i) OR ("chocolate"i) OR ("<chutney>"i) OR ("clima"i) OR ("coco"i) OR ("cocotero"i)
    OR ("codorniz"i) OR ("comer"i) OR ("comercial"i) OR ("comida"i) OR ("cosecha"i) OR ("crema"i)
    OR ("cultivar"i) OR ("cultivo"i) OR ("cítrico"i) OR ("dátil"i) OR ("deshidratar"i) OR ("ensalada"i)
    OR ("exportación"i) OR ("foie"i) OR ("fragancia"i) OR ("fresa"i) OR ("fresco"i) OR ("fruta"i)
    OR ("fruto"i) OR ("gamba"i) OR ("gazpacho"i) OR ("guayaba"i) OR ("gustar"i) OR ("helado"i)
    OR ("hortaliza"i) OR ("ingrediente"i) OR ("jamón"i) OR ("jarabe"i) OR ("jardín"i) OR ("jengibre"i)
    OR ("judía"i) OR ("langosta"i) OR ("langostino"i) OR ("lechuga"i) OR ("legumbre"i) OR ("maduro"i)
    OR ("mandarina"i) OR ("mandioca"i) OR ("manzana"i) OR ("maní"i) OR ("maracuyá"i) OR ("maíz"i)
    OR ("melocotón"i) OR ("melón"i) OR ("mono"i) OR ("naranja"i) OR ("naranjo"i) OR ("orquídea"i)
    OR ("orégano"i) OR ("palma"i) OR ("palmera"i) OR ("papaya"i) OR ("parmesano"i) OR ("patata"i)
    OR ("piscina"i) OR ("piña"i) OR ("plantación"i) OR ("plátano"i) OR ("pollo"i) OR ("probar"i)
    OR ("puré"i) OR ("rodaja"i) OR ("ron"i) OR ("salsa"i) OR ("sorbete"i) OR ("sorgo"i)
    OR ("subsistencia"i) OR ("sésamo"i) OR ("tabaco"i) OR ("tempura"i) OR ("tomate"i) OR ("trigo"i)
    OR ("triturar"i) OR ("tropical"i) OR ("tubérculo"i) OR ("vainilla"i) OR ("vinagre"i) OR ("yogur"i)
    OR ("zumo"i) OR ("África");

SET MangoNotFruitWords = ("acero"i) OR ("azada"i) OR ("levantar"i) OR ("alzar"i) OR ("plata"i) OR ("arpón"i)
    OR ("azote"i) OR ("cuerno"i) OR ("bastón"i) OR ("bolsa"i) OR ("brazo"i) OR ("silla"i)
    OR ("centímetro"i) OR ("cinturón"i) OR ("llave"i) OR ("clavar"i) OR ("cubierto"i) OR ("golpear"i)
    OR ("cuchillo"i) OR ("corazón"i) OR ("cuerda"i) OR ("cuerpo"i) OR ("cocina"i) OR ("cuero"i)
    OR ("cuchara"i) OR ("corto"i) OR ("hacha"i) OR ("herramienta"i) OR ("emplear"i) OR ("empuñar"i)
    OR ("escoba"i) OR ("espada"i) OR ("estirar"i) OR ("apretar"i) OR ("extremo"i) OR ("trabajo"i)
    OR ("meter"i) OR ("fuego"i) OR ("forma"i) OR ("látigo"i) OR ("hoja"i) OR ("madera"i)
    OR ("girar"i) OR ("grabar"i) OR ("grueso"i) OR ("instrumento"i) OR ("marfil"i) OR ("lanza"i)
    OR ("lanzar"i) OR ("largo"i) OR ("maza"i) OR ("martillo"i) OR ("metal"i) OR ("mover"i)
    OR ("movimiento"i) OR ("navaja"i) OR ("limpiar"i) OR ("sartén"i) OR ("paella"i) OR ("palo"i)
    OR ("pala"i) OR ("papel"i) OR ("pieza"i) OR ("piedra"i) OR ("pequeño"i) OR ("picar"i)
    OR ("pistola"i) OR ("plástico"i) OR ("pluma"i) OR ("puerta"i) OR ("precioso"i) OR ("punta"i)
    OR ("puñal"i) OR ("cepillo"i) OR ("redondo"i) OR ("ropa"i) OR ("rueda"i) OR ("sujetar"i)
    OR ("mesa"i) OR ("atravesar"i) OR ("utilizar"i) OR ("alrededor"i);

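# Select the fruit reading when a fruit-context word occurs in the window
# and no word from the non-fruit list does.
SELECT:mango_fruta ("mango_fruta"i) IF (0 ("mango_fruta"i)) (0*/* MangoFruitWords) (NOT 0* MangoNotFruitWords) ;
# Otherwise, if both readings are still present, drop the fruit reading.
REMOVE:mango_0 ("mango_fruta"i) IF (0 ("mango"i)) (0 ("mango_fruta"i)) ;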
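A rough Python paraphrase of these two rules is given here, assuming they are read as "pick the fruit sense only when the context contains a fruit word and no non-fruit word". The function name and the shortened word lists are hypothetical; the full lists are the two sets defined earlier in this section.

```python
# Short excerpts of the two CG sets above, used only for illustration.
FRUIT_WORDS = {"fruta", "zumo", "maduro", "tropical", "ensalada"}
NOT_FRUIT_WORDS = {"cuchillo", "martillo", "empuñar", "madera", "hacha"}


def choose_mango_reading(lemmas):
    """Return 'mango_fruta' if the context supports the fruit sense, else 'mango'.

    `lemmas` is the bag of lemmas in the window around "mango".
    Matching is case-insensitive, like the "..."i tags in the CG sets.
    """
    context = {lemma.lower() for lemma in lemmas}
    has_fruit = bool(context & FRUIT_WORDS)
    has_not_fruit = bool(context & NOT_FRUIT_WORDS)
    if has_fruit and not has_not_fruit:
        return "mango_fruta"
    # Fallback: mirrors the REMOVE rule, which discards the fruit reading.
    return "mango"


print(choose_mango_reading(["beber", "zumo", "de", "mango"]))
print(choose_mango_reading(["el", "mango", "de", "el", "cuchillo"]))
```

The non-fruit list acts as a veto here: a single word from it blocks the fruit reading even when fruit words are also present.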

Ideas and notes


Thanks Xavi for the ideas...

What I've been thinking about is a module that would go after
biltrans and before lexical selection. It would essentially reweight
the possible translations based on a bag of words over a fixed
window of words or "sentences" (delimited with '.').

You could have source and target components. For example, you might
say that "fruit" is a semantic field or domain which includes

"mango", "manzana", "plátano", "naranja", ...

in Spanish and

"mango", "taronja", "poma"

in Catalan. These would be in the monolingual pairs. The
module would take both lists and the input

^querer<vblex><pri><p3><sg>/voler<vblex><pri><p3><sg>$
^mango<n><m><pl>/mànec<n><m><pl>/mango<n><m><pl>$
^y<cnjcoo>/i<cnjcoo>$
^manzana<n><f><pl>/poma<n><f><pl>$

and try to maximise semantic coherence. It could then reweight the
translations, e.g.

^querer<vblex><pri><p3><sg>/voler<vblex><pri><p3><sg>$
^mango<n><m><pl>/mango<n><m><pl><2.0>/mànec<n><m><pl><0.0>$
^y<cnjcoo>/i<cnjcoo>$
^manzana<n><f><pl>/poma<n><f><pl>$

The result would then be passed to the lexical selection module, which
would choose the translation with the highest weight.

This would mean a new module, but it would require only minor
changes to the bilingual dictionary and lexical selection, and
wouldn't have any effect on transfer.
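A minimal sketch of such a reweighting step, in Python, might look as follows. Everything here is an assumption made for illustration: the FIELDS structure, the function names, and the scoring rule (one point for each neighbouring word whose source lemma is in the field, plus one for each neighbour with a translation in the field) are hypothetical and not part of any existing Apertium module. The scoring was chosen because it reproduces the 2.0 and 0.0 weights in the example above. The parser splits on "/" and does not handle escaped characters.

```python
# Hypothetical semantic fields with source (Spanish) and target (Catalan) lists.
FIELDS = {
    "fruit": {
        "source": {"mango", "manzana", "plátano", "naranja"},
        "target": {"mango", "taronja", "poma"},
    },
}


def lemma_of(reading):
    """Return the lemma of a reading such as 'poma<n><f><pl>'."""
    return reading.split("<", 1)[0]


def parse_unit(unit):
    """Split '^source/target1/target2$' into (source, [targets])."""
    source, *targets = unit.strip()[1:-1].split("/")
    return source, targets


def support(field, neighbours):
    """Count neighbours that back a field on the source and on the target side."""
    score = 0.0
    for source, targets in neighbours:
        if lemma_of(source) in field["source"]:
            score += 1.0
        if any(lemma_of(t) in field["target"] for t in targets):
            score += 1.0
    return score


def reweight(units, fields=FIELDS):
    """Attach weights to ambiguous translations, best candidate first."""
    parsed = [parse_unit(u) for u in units]
    output = []
    for i, (source, targets) in enumerate(parsed):
        if len(targets) < 2:  # unambiguous: pass through unchanged
            output.append(units[i].strip())
            continue
        neighbours = parsed[:i] + parsed[i + 1:]
        scored = []
        for target in targets:
            score = sum(
                support(field, neighbours)
                for field in fields.values()
                if lemma_of(target) in field["target"]
            )
            scored.append((score, target))
        scored.sort(key=lambda pair: pair[0], reverse=True)  # stable sort
        weighted = "/".join(f"{t}<{s:.1f}>" for s, t in scored)
        output.append(f"^{source}/{weighted}$")
    return output


units = [
    "^querer<vblex><pri><p3><sg>/voler<vblex><pri><p3><sg>$",
    "^mango<n><m><pl>/mànec<n><m><pl>/mango<n><m><pl>$",
    "^y<cnjcoo>/i<cnjcoo>$",
    "^manzana<n><f><pl>/poma<n><f><pl>$",
]
for line in reweight(units):
    print(line)
```

Here the whole list of units is treated as one window. A real module would have to decide where windows begin and end, for example at sentence boundaries, before scoring.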

References
