Linguistic Resources Document

A Linguistic Resources Document (LRD) is an XML document that describes a set of linguistic resources (dictionaries, cross models, corpora, links to other LRDs, etc.).

This document can be used with apertium-crossdics to indicate which resources (dictionaries and cross models) can be crossed.

Structure

<?xml version="1.0" encoding="UTF-8"?>

<ling-resources>
   <name>...</name>
   <description>...</description>
   
   <resource>
      <property name="..." value="..."/>
      <property name="..." value="..."/>
      <property name="..." value="..."/>
      ...
   </resource>

   <resource-set>
      <name>...</name>
      <description>...</description>

      <resource>
         <property name="..." value="..."/>
         ...
      </resource>
      <resource>
         <property name="..." value="..."/>
         ...
      </resource>
      ...
   </resource-set>  

   <resource>
      <property name="..." value="..."/>
      <property name="..." value="..."/>
      <property name="..." value="..."/>
      ...
   </resource>
   ...  
</ling-resources>
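
As a rough illustration of how the skeleton above can be traversed, the following Python sketch uses the standard-library xml.etree.ElementTree module. The sample document, its placeholder values and the helper name summarise_lrd are assumptions made for this example; they are not part of apertium-crossdics.

```python
import xml.etree.ElementTree as ET

# Tiny LRD that follows the skeleton above; all names and values are placeholders.
LRD_SKELETON = """<?xml version="1.0" encoding="UTF-8"?>
<ling-resources>
   <name>demo</name>
   <description>demo document</description>
   <resource>
      <property name="name" value="top-level-resource"/>
   </resource>
   <resource-set>
      <name>demo set</name>
      <description>demo set description</description>
      <resource>
         <property name="name" value="grouped-resource"/>
      </resource>
   </resource-set>
</ling-resources>
"""


def summarise_lrd(xml_text):
    """Return (document name, top-level resource count, {set name: resource count})."""
    # Parse from bytes so the encoding declaration in the prolog is honoured.
    root = ET.fromstring(xml_text.encode("utf-8"))
    # findall() with a plain tag name matches direct children only, so
    # resources nested inside a resource-set are not counted here.
    top_level = root.findall("resource")
    sets = {
        rs.findtext("name"): len(rs.findall("resource"))
        for rs in root.findall("resource-set")
    }
    return root.findtext("name"), len(top_level), sets


summary = summarise_lrd(LRD_SKELETON)
print(summary)
```

The sketch keeps top-level resources and grouped resources apart, mirroring the two places where a resource element may appear in the skeleton.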

Resource

A resource is defined by a set of properties, each given as a name/value pair.

<resource>
   <property name="name" value="apertium-es"/>
   <property name="type" value="mon"/>
   <property name="sl" value="es"/>
   <property name="for-crossing" value="yes"/>
   <property name="src" value="apertium-es-ca.es.dix"/>
   <property name="version" value="stable"/>
</resource>
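
Because every property is a name/value pair, a resource maps naturally onto a dictionary. The sketch below shows one possible way to read the resource above in Python; the helper name resource_properties is hypothetical and chosen only for this example.

```python
import xml.etree.ElementTree as ET

# The resource shown above, copied verbatim.
RESOURCE_XML = """<resource>
   <property name="name" value="apertium-es"/>
   <property name="type" value="mon"/>
   <property name="sl" value="es"/>
   <property name="for-crossing" value="yes"/>
   <property name="src" value="apertium-es-ca.es.dix"/>
   <property name="version" value="stable"/>
</resource>"""


def resource_properties(resource):
    """Map each <property> name to its value for one <resource> element."""
    return {p.get("name"): p.get("value") for p in resource.findall("property")}


props = resource_properties(ET.fromstring(RESOURCE_XML))
print(props["type"], props["sl"], props["src"])
```

If a property name were repeated inside one resource, the last value would win in this sketch; the page does not say whether repeated names are allowed.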

Resource properties

Possible properties for a resource are:

  • name: the name of the resource.
  • type: the type of resource. Possible values are:
    • mon: morphological dictionary.
    • bil: bilingual dictionary.
    • crp: corpus.
    • lrd: link to Linguistic Resource Document.
    • cross-model: cross model document.
  • sl: source language (for example, in morphological and bilingual dictionaries)
  • tl: target language (for example, in bilingual dictionaries)
  • src: source (URL or file path)
  • version: version of the resource (for example, for dictionaries: stable, unstable, pre-alpha, etc.).
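  • for-crossing: whether the resource can be used for crossing (yes in the examples on this page).
  • Further properties may be added; this list is not final.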
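
The property list above can be turned into a simple consistency check. This is only a sketch: the page does not state which properties are mandatory, so treating a missing type as a problem is an assumption, and for-crossing is accepted because it appears in the examples on this page. The function name check_resource is hypothetical.

```python
# Values of "type" listed above.
KNOWN_TYPES = {"mon", "bil", "crp", "lrd", "cross-model"}

# Property names listed above, plus "for-crossing", which the examples use.
KNOWN_PROPERTIES = {"name", "type", "sl", "tl", "src", "version", "for-crossing"}


def check_resource(props):
    """Return a list of human-readable problems for one resource's properties."""
    problems = []
    rtype = props.get("type")
    if rtype is None:
        # Assumption: every resource should declare a type.
        problems.append("missing 'type' property")
    elif rtype not in KNOWN_TYPES:
        problems.append(f"unknown type: {rtype}")
    # Report property names that the list above does not mention.
    for name in sorted(set(props) - KNOWN_PROPERTIES):
        problems.append(f"unlisted property: {name}")
    return problems


ok = check_resource({"name": "apertium-es", "type": "mon", "sl": "es"})
bad = check_resource({"type": "dictionary", "licence": "GPL"})
print(ok)
print(bad)
```

A check like this reports problems instead of raising, so a whole document can be reviewed in one pass.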

Set of resources

Resources can be grouped with the resource-set element, as follows:

<resource-set>
   <name></name>
   <description></description>
   <resource>
      <property name="" value=""/>
      ...
   </resource>
   <resource>
      <property name="" value=""/>
      ...
   </resource>
   ...
</resource-set>

This grouping is useful for keeping together the linguistic data of a particular language pair.
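
To illustrate grouping by language pair, the sketch below collects the languages declared inside one resource-set. The sample set, its name and the helper set_languages are placeholders invented for this example.

```python
import xml.etree.ElementTree as ET

# Placeholder resource-set holding data for a single language pair.
RESOURCE_SET_XML = """<resource-set>
   <name>en-ca data</name>
   <description>placeholder set</description>
   <resource>
      <property name="name" value="apertium-en"/>
      <property name="type" value="mon"/>
      <property name="sl" value="en"/>
   </resource>
   <resource>
      <property name="name" value="apertium-en-ca"/>
      <property name="type" value="bil"/>
      <property name="sl" value="en"/>
      <property name="tl" value="ca"/>
   </resource>
</resource-set>"""


def set_languages(resource_set):
    """Collect every sl/tl value declared by the resources of one <resource-set>."""
    languages = set()
    for prop in resource_set.iter("property"):
        if prop.get("name") in ("sl", "tl"):
            languages.add(prop.get("value"))
    return languages


resource_set = ET.fromstring(RESOURCE_SET_XML)
print(resource_set.findtext("name"), sorted(set_languages(resource_set)))
```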

Example of LRD

<?xml version="1.0" encoding="UTF-8"?>

<!-- Linguistic resources-->
<ling-resources>
   <name>My linguistic resources</name>
   <description>My linguistic resources: morphological and bilingual dictionaries, cross models, corpora, etc.</description>
   
   <resource-set>
      <name>My linguistic resources for building the English-Spanish language pair.</name>
      <description>A description of this resource set</description>

      <!-- cross model en-ca-es -->
      <resource>
         <property name="name" value="cross-model-en-ca-es"/>
         <property name="type" value="cross-model"/>
         <property name="sl" value="en"/>
         <property name="tl" value="es"/>      
         <property name="for-crossing" value="yes"/>
         <property name="src" value="cross-model-es-ca-en.xml"/>
         <property name="version" value="stable"/>
      </resource>
      
      <!-- cross model es-ca-en -->
      <resource>
         <property name="name" value="cross-model-es-ca-en"/>
         <property name="type" value="cross-model"/>
         <property name="sl" value="es"/>
         <property name="tl" value="en"/>      
         <property name="for-crossing" value="yes"/>
         <property name="src" value="cross-model-es-ca-en.xml"/>
         <property name="version" value="stable"/>
      </resource>
      
      <!-- 'es' morphological dictionary -->
      <resource>
         <property name="name" value="apertium-es"/>
         <property name="type" value="mon"/>
         <property name="sl" value="es"/>
         <property name="for-crossing" value="yes"/>
         <property name="src" value="apertium-es-ca.es.dix"/>
         <property name="version" value="stable"/>
      </resource>
      
      <!-- 'en' morphological dictionary -->
      <resource>
         <property name="name" value="apertium-en"/>
         <property name="type" value="mon"/>
         <property name="sl" value="en"/>
         <property name="for-crossing" value="yes"/>
         <property name="src" value="apertium-en-ca.en.metadix"/>
         <property name="version" value="stable"/>
      </resource>
      
      <!-- 'en-ca' bilingual dictionary -->   
      <resource>
         <property name="name" value="apertium-en-ca"/>
         <property name="type" value="bil"/>
         <property name="sl" value="en"/>
         <property name="tl" value="ca"/>
         <property name="for-crossing" value="yes"/>
         <property name="src" value="apertium-en-ca.en-ca.dix"/>
         <property name="version" value="stable"/>
      </resource>
      
      <!-- 'es-ca' bilingual dictionary -->
      <resource>
         <property name="name" value="apertium-es-ca"/>
         <property name="type" value="bil"/>
         <property name="sl" value="es"/>
         <property name="tl" value="ca"/>
         <property name="for-crossing" value="yes"/>
         <property name="src" value="apertium-es-ca.es-ca.dix"/>
         <property name="version" value="stable"/>
      </resource>
   </resource-set>
   
   <!-- Single corpus file -->
   <resource>
      <property name="name" value="corpus-es"/>
      <property name="type" value="crp"/>
      <property name="sl" value="es"/>
      <property name="src" value="corpus-es.crp"/>        
   </resource>
   
   <!-- Link to another LRD (a file like this one) -->
   <resource>
      <property name="name" value="other-resources-1"/>
      <property name="type" value="lrd"/>
      <property name="src" value="other-ling-resources-file.xml"/>        
   </resource>
   
</ling-resources>
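
The example can be queried for candidate dictionaries to cross. The sketch below is not how apertium-crossdics works internally; it assumes that two bilingual dictionaries are candidates when both are marked for-crossing="yes" and share exactly one language, as en-ca and es-ca share ca. The trimmed document and the helper crossable_pairs are made up for this illustration.

```python
import itertools
import xml.etree.ElementTree as ET

# Trimmed copy of the example above: two bilingual dictionaries and one
# morphological dictionary, all marked for crossing.
LRD_XML = """<ling-resources>
   <name>My linguistic resources</name>
   <resource-set>
      <name>en-es via ca</name>
      <resource>
         <property name="name" value="apertium-es"/>
         <property name="type" value="mon"/>
         <property name="sl" value="es"/>
         <property name="for-crossing" value="yes"/>
      </resource>
      <resource>
         <property name="name" value="apertium-en-ca"/>
         <property name="type" value="bil"/>
         <property name="sl" value="en"/>
         <property name="tl" value="ca"/>
         <property name="for-crossing" value="yes"/>
      </resource>
      <resource>
         <property name="name" value="apertium-es-ca"/>
         <property name="type" value="bil"/>
         <property name="sl" value="es"/>
         <property name="tl" value="ca"/>
         <property name="for-crossing" value="yes"/>
      </resource>
   </resource-set>
</ling-resources>"""


def crossable_pairs(xml_text):
    """Return (dictionary A, dictionary B, shared language) candidate triples."""
    root = ET.fromstring(xml_text)
    bilingual = []
    # iter() visits resources at any depth, in document order.
    for res in root.iter("resource"):
        props = {p.get("name"): p.get("value") for p in res.findall("property")}
        if props.get("type") == "bil" and props.get("for-crossing") == "yes":
            bilingual.append(props)
    pairs = []
    for a, b in itertools.combinations(bilingual, 2):
        shared = {a["sl"], a["tl"]} & {b["sl"], b["tl"]}
        if len(shared) == 1:  # exactly one pivot language in common
            pairs.append((a["name"], b["name"], shared.pop()))
    return pairs


pairs = crossable_pairs(LRD_XML)
print(pairs)
```

Morphological dictionaries and cross models are ignored here for brevity; a fuller check would also look for the matching cross model of each candidate pair.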

See also