Difference between revisions of "Talk:Corpus test"

From Apertium

Latest revision as of 00:18, 15 January 2012

Creation of a corpus

These two lines are not very clear to a non-native English speaker:

  • Grep out all lines with # and @ - this will help you find problems in bidix (@) and target language monodix (#).
  • Pipe through nl -s '. ' to get the right line numbers.

An example would be better. Also, on my computer nl -s does not work, but the -n option of grep (fgrep, egrep) does.

Why not something like:

  • fgrep -n "#" monodix
  • fgrep -n "@" bidix
I'm pretty sure that's not the intention; I think "nl" is used here to find the corpus line numbers, not the dix line numbers --unhammer 11:21, 5 January 2012 (UTC)

I completely agree with you. Let's see what I put when I translated the page into French: Test_de_corpus#Cr.C3.A9ation_d.27un_corpus. Maybe we should ask Francis what he means. I don't do that every time I find something difficult in an English text; I usually just put a (?), and generally Francis's texts are easier to follow than other English texts. nl -s does not work on my computers either. Bech 11:42, 5 January 2012 (UTC)

It seems that nl numbers lines in a file. The command is in the Debian (and Ubuntu?) package coreutils.

$ man nl

NAME
       nl - number lines of files

- Francis Tyers 00:18, 15 January 2012 (UTC)
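
For illustration, here is a minimal sketch of one possible reading of the two steps discussed above: number the lines of the translated corpus with nl first, then keep only the lines that contain # or @, so that each number still points at a line of the corpus rather than at a line of a dictionary. The sample sentences and the function names sample_output and number_then_filter are made up for this example; they are not taken from the Corpus test page, and the order of the two steps is an assumption.

```shell
# Hypothetical sample of translator output (invented for this sketch):
# '#' marks a problem in the target language monodix, '@' a problem in the bidix.
sample_output() {
    printf '%s\n' 'The house is big.' 'The #dog sleeps.' 'A @gat runs.' 'All is fine.'
}

# Number every input line with nl, using '. ' as the separator between
# the number and the text, then keep only lines containing # or @.
# Because numbering happens before filtering, the numbers refer to the
# position of each line in the corpus.
number_then_filter() {
    nl -s '. ' | grep -e '#' -e '@'
}

sample_output | number_then_filter
```

With this sample, only the second and third lines are printed, each prefixed with its corpus line number (right-aligned by nl) and the '. ' separator. Note that nl by default numbers only non-empty lines; if the corpus contains blank lines, adding -b a makes nl number them as well.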