Listing Apertium element using command-line

Latest revision as of 21:34, 17 February 2019


Changes in Apertium implementation mode

Originally, and for at least ten years, the Apertium project was hosted on SourceForge using Subversion.

The project was then organised as a tree:

  • The repository's apertium directory only contained subdirectories called branches.
  • Each branch included several subdirectories, each of which contained one of the following three elements:
    • Apertium project software,
    • a language pair,
    • the reference files for a language.

Language pairs could be placed in one of four distinct subdirectories (incubator, nursery, staging and trunk) according to their progress.

In the current Apertium implementation, every project element is located directly in a first-level subdirectory of "apertium" (https://github.com/apertium/), and the branches, which correspond to the following first-level subdirectories:

  • apertium-incubator
  • apertium-nursery
  • apertium-staging
  • apertium-trunk
  • apertium-languages
  • apertium-tools

only contain a list of project elements.

Listing a project branch

To get the list of elements in a branch, you just need to download the web page:

https://github.com/apertium/apertium-<branche_name>/blob/master/.gitmodules

and extract the lines containing "submodule", or the lines containing '<span class="pl-e">' (they appear to be the same lines).

Then keep only the project element names from the extracted lines.

Code example to get the data from the lines containing "submodule":

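#!/bin/sh
# Usage: branchlist <branche_name>   (for example: branchlist trunk)
motifsed='<span class="pl-e">'
# Download the HTML page that displays the branch's .gitmodules file
wget -q https://github.com/apertium/apertium-$1/blob/master/.gitmodules
# Keep the "submodule" lines, then strip everything around the element name
grep -F "submodule" .gitmodules |
sed -e "s/.*$motifsed//" -e "s/<\/span>.*//" | sort
rm .gitmodules*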

Call example (assuming the script is saved as branchlist):

branchlist trunk
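
Git itself records the location of each submodule on a "path =" line of the .gitmodules file. As a minimal sketch, assuming the plain-text .gitmodules file (not the HTML page) is available on standard input, for instance from a local clone of the branch repository, the element names could be extracted as follows. The function name list_elements is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the submodule paths found in a plain-text
# .gitmodules file read from standard input, one per line, sorted.
list_elements() {
    # -n with the p flag prints only the lines on which the substitution succeeded
    sed -n 's/^[[:space:]]*path[[:space:]]*=[[:space:]]*//p' | sort
}
```

It would be called as list_elements < .gitmodules. Unlike the HTML-based script, this does not depend on the markup GitHub uses to display the file.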

Date of last change for a project element (trivial method)

This method also retrieves information from a web page, namely:

https://api.github.com/repos/apertium/<element_name>

You then need to retrieve the date and time from the line containing "pushed_at".

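Code example:

#!/bin/sh
# Save the repository description (JSON) in a file named after the element
wget -q https://api.github.com/repos/apertium/$1
# Drop the key, turn the "T" separator into a space, drop the trailing "Z"
grep -F "pushed_at" $1 |
sed -e "s/.*: \"//" -e "s/T/ /" -e "s/Z.*//"
rm $1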
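
The extraction step can also be written as a filter, which avoids the temporary file. This is only a sketch: the function name pushed_date is hypothetical, and it assumes the JSON arrives with one key per line, as in the page retrieved above.

```shell
#!/bin/sh
# Hypothetical helper: read the JSON description of a repository on standard
# input and print the "pushed_at" timestamp as "YYYY-MM-DD HH:MM:SS".
pushed_date() {
    grep -F '"pushed_at"' |
    sed -e 's/.*: *"//' -e 's/T/ /' -e 's/Z.*//'
}
```

For example: wget -q -O - https://api.github.com/repos/apertium/apertium-fra | pushed_date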

This method works well for recently changed Apertium project elements. For the other elements, however, you will never get a date earlier than March 2018, because the Apertium project was transferred from SourceForge to GitHub in March 2018.

For elements that have not been updated for a long time, you can use the svn list command and remove the references to git repository files (their names start with a dot) before searching for the most recently changed file.

Listing files and subdirectories of a project element

To list the files and subdirectories of a project element, you can run the command svn list -v on the trunk subdirectory (!!!) of this element.

For example, for the apertium-fra language:

svn list -v http://github.com/apertium/apertium-fra/trunk

The one-line message at the beginning can be removed, as can the git files and the reference to the current directory, whose names start with a dot (.).

If you call the svn list command on an element that does not exist, Subversion asks for a password. This can be avoided by redirecting standard input from /dev/null.

Code example:

#!/bin/sh
svn list -v http://github.com/apertium/$1/trunk < /dev/null | fgrep -v " ." | tail -n +2
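
To see what the two filters do without contacting the server, they can be isolated in a function and fed a saved listing. The function name clean_listing is hypothetical; it applies the same two commands in the same order as the script above.

```shell
#!/bin/sh
# Hypothetical helper: clean up "svn list -v" output read from standard input.
clean_listing() {
    # Drop entries whose name starts with a dot, then drop the first line
    grep -F -v ' .' | tail -n +2
}
```

For example, clean_listing < saved_listing.txt, where saved_listing.txt is an assumed file holding the output of an earlier svn list -v call. Note that any line containing a space followed by a dot is dropped, not only those where the dot starts the name.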

Date of the last change for a project element (using svn)

Solution using svn list -v

To find the date of the last change for a project element, you just need to look in the output of an svn list command for the file that was changed most recently. For the Apertium project, the important changes are in the root directory of the project element, so the contents of subdirectories are not examined.

However, the way the svn list -v command displays dates raises two issues:

  • An abbreviated month name is displayed; a month number would be easier to handle.
  • For files changed during the last 6 months, the time of the change is shown instead of the year. The change may have occurred this year or at the end of the previous year.

The displayed dates must therefore be transformed. This can be done with 2 sed commands and two sedfiles (files of sed commands).

The first sedfile depends on the language used to display month names. If the display is in English (which can be forced by setting LANG=en_US.UTF-8), the following numbering_month file can be used:

s/ Jan / m01 /
s/ Feb / m02 /
s/ Mar / m03 /
s/ Apr / m04 /
s/ May / m05 /
s/ Jun / m06 /
s/ Jul / m07 /
s/ Aug / m08 /
s/ Sep / m09 /
s/ Oct / m10 /
s/ Nov / m11 /
s/ Dec / m12 /
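
Instead of typing the twelve rules by hand, they could be generated. The following is a sketch only; the function name numbering_rules is hypothetical, and the English abbreviations are the ones listed above.

```shell
#!/bin/sh
# Hypothetical helper: print the twelve sed rules of the numbering_month file.
numbering_rules() {
    n=1
    for name in Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
    do
        # %02d pads the month number to two digits
        printf 's/ %s / m%02d /\n' "$name" "$n"
        n=$((n + 1))
    done
}
```

The file is then created with numbering_rules > numbering_month.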

A second sedfile puts a year number in front of the month number. Months up to and including the current one get the current year, and later months get the previous year, so the contents of this file change every month. A first script is used to generate it:

#!/bin/sh
year=`date +%Y`
changemonth=`date +%m`
> year_month
month=01
while [ $month -le $changemonth ]; do
    echo "s/ m$month /$year $month /" >> year_month
    month=`expr $month + 101 | cut -c2-`
done
year=`expr $year - 1`
while [ $month -le 12 ]; do
    echo "s/ m$month /$year $month /" >> year_month
    month=`expr $month + 101 | cut -c2-`
done
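
To check the generated rules for a given month without waiting for that month, the same logic can take the reference year and month as arguments. This is a sketch under that assumption; the function name year_month_rules is hypothetical and the rules are written to standard output instead of the year_month file.

```shell
#!/bin/sh
# Hypothetical variant of the generator: the reference year ($1) and
# two-digit month ($2) are arguments, and the rules go to standard output.
year_month_rules() {
    year=$1
    current=${2#0}                   # strip a leading zero: "02" -> "2"
    m=1
    while [ "$m" -le 12 ]
    do
        # Months after the reference month belong to the previous year
        if [ "$m" -le "$current" ]; then y=$year; else y=$((year - 1)); fi
        printf 's/ m%02d /%d %02d /\n' "$m" "$y" "$m"
        m=$((m + 1))
    done
}
```

With year_month_rules 2019 02, the rules for m01 and m02 insert 2019 and the rules for m03 to m12 insert 2018. The real file would be produced with year_month_rules "$(date +%Y)" "$(date +%m)" > year_month.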

The main script, which fetches the last modification date of the project element, proceeds as follows:

  • run svn list -v on the trunk subdirectory of the element
  • remove the files and directories whose names start with a dot (.)
  • reorder the columns to display year, month, day and file name (warning: for subdirectories, identifiable by a / after the name, the file_size field is empty)
  • replace the abbreviated month names with m followed by a two-digit month number
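
The column reordering step has to cope with the empty file_size field of subdirectories. One possible sketch, which is not the approach used in the script further down, counts the fields from the end of the line so that the missing field does not matter. The function name reorder_columns is hypothetical, and names containing spaces are assumed not to occur.

```shell
#!/bin/sh
# Hypothetical helper: print "year_or_time month day name" for each line of
# "svn list -v" output read from standard input.
reorder_columns() {
    # NF is the number of fields: the name is last, preceded by the
    # year (or time), the day and the month
    awk '{ print $(NF - 1), $(NF - 3), $(NF - 2), $NF }'
}
```

A file line ending in "Mar 05  2017 README" becomes "2017 Mar 05 README", and a directory line ending in "Feb 17 21:34 tests/" becomes "21:34 Feb 17 tests/".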

Then there are two possibilities:

1) All the files are at least 6 months old: every line starts with the year.

In this case, the lines are sorted alphabetically. The most recent date is that of the file (or subdirectory) which appears on the last line.

2) Some files were changed during the last 6 months. Their lines start with the time of the change, which can be recognised by the ":" symbol.

In this case, it is enough to look at these files only. The time at the beginning of the line is removed and replaced by the year. Then, after an alphabetical sort, the date on the last line is selected.
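
The two cases can be illustrated on already normalised lines of the form "year_or_time mNN day". The sketch below uses awk rather than the sedfiles, and takes the reference year and month as arguments so that the result is reproducible; the function name latest_change and this interface are assumptions made for the illustration.

```shell
#!/bin/sh
# Hypothetical helper: read lines such as "2017 m03 05" or "12:34 m01 20" on
# standard input and print the latest date as "day month year".
# $1 = reference year, $2 = reference month (two digits).
latest_change() {
    awk -v year="$1" -v month="$2" '
        {
            mon = substr($2, 2)                  # "m03" -> "03"
            if ($1 ~ /:/) {                      # case 2: a time replaces the year
                y = (mon + 0 <= month + 0) ? year : year - 1
                key = y " " mon " " $3
                if (key > recent) recent = key
            } else {                             # case 1: the year is present
                key = $1 " " mon " " $3
                if (key > old) old = key
            }
        }
        END {
            # Recently changed files, if any, take precedence
            best = (recent != "") ? recent : old
            split(best, f, " ")
            print f[3] " " f[2] " " f[1]
        }
    '
}
```

With the reference date set to February 2019, the lines "2017 m03 05", "12:34 m01 20" and "09:10 m11 02" give 20 01 2019, because the November change is attributed to 2018.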

The following script performs this process:

#!/bin/sh
list1=/tmp/listpair1_$$
list2=/tmp/listpair2_$$
svn list -v http://github.com/apertium/$1/trunk < /dev/null | fgrep -v " ." | tail -n +2 > $list1
grep -v "/$" $list1 | awk '{print $6 " " $4 " " $5}' > $list2
grep "/$" $list1 | awk '{print $5 " " $3 " " $4}' >> $list2
sed -f numbering_month $list2 | sort > $list1
fgrep : $list1 > $list2
if test -s $list2; then
    cut -c6- $list2 | sed -f year_month | sort | tail -1 |
    awk '{print $3 " " $2 " " $1}'
else
    tail -1 $list1 | sed "s/m//" |
    awk '{print $3 " " $2 " " $1}'
fi
rm $list1 $list2

Solution using svn list --xml

With this option of the svn command, the file name and the last change date appear on separate lines and are surrounded by XML tags.

These lines have to be selected and joined together.

As before, the lines for files whose names start with a dot are removed, only the date is kept, and the result is sorted alphabetically. The date on the last line is the one to display.

Here is an example of a script that performs this process:

#!/bin/sh
svn list --xml http://github.com/apertium/$1/trunk < /dev/null | egrep "<(nam|dat)e>" | paste - - | fgrep -v ">." | sed "s/.*<date>//" | sort | tail -1 | cut -c1-10

In this example, the date is displayed in year-month-day format.

It would be easy to change the display to day/month/year, and also to display the time of the update, which is only hidden because the output is truncated to its first 10 characters.
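
As a sketch of that change, assuming the timestamps look like 2019-02-17T21:34:05.000000Z, a small filter can rearrange the date and keep the time. The function name format_date is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: turn timestamps such as 2019-02-17T21:34:05.000000Z,
# read one per line from standard input, into "17/02/2019 21:34:05".
format_date() {
    # \1 = year, \2 = month, \3 = day, \4 = time; the rest of the line is dropped
    sed -e 's|^\([0-9]*\)-\([0-9]*\)-\([0-9]*\)T\([0-9:]*\).*|\3/\2/\1 \4|'
}
```

In the script above, format_date would take the place of the final cut -c1-10.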