Difference between revisions of "One-liners"

From Apertium
==Useful (mostly) bash one-liners==


* Perl regular expression for removing all tags after the first one in each lexical unit:

<pre>
perl -pe 's/(\^[^<]+<[^>]+>)(<\w+>)*\$/\1\$/g;'


^Lemma<V><Pres><Sg>$ -> ^Lemma<V>$
</pre>

* Get unknown words from chunked text and sort by frequency:

<pre>
sed 's/\$\W*\^/$\n^/g' | grep '@' | sed 's/><.*/>$/g' | sort -f | uniq -ci | sort -gr
</pre>
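
The pipeline can be checked on a small sample before running it on a real corpus. The sketch below assumes GNU sed (for <code>\W</code> and <code>\n</code> in the substitution); the function name <code>unknown_word_freq</code> and the input line are invented for illustration.

```shell
# Hypothetical wrapper around the pipeline above (the name is illustrative).
# Assumes GNU sed, which accepts \W in the pattern and \n in the replacement.
unknown_word_freq() {
  sed 's/\$\W*\^/$\n^/g' | grep '@' | sed 's/><.*/>$/g' | sort -f | uniq -ci | sort -gr
}

# Made-up sample: three words marked with @, two of which differ only in case.
printf '%s\n' '^@foo<n><sg>$ ^bar<n><pl>$ ^@Foo<n><pl>$ ^@baz<vblex><inf>$' | unknown_word_freq
```

The first <code>sed</code> puts each lexical unit on its own line, <code>grep</code> keeps the ones containing <code>@</code>, and the second <code>sed</code> drops every tag after the first. Since <code>sort -f</code> and <code>uniq -ci</code> ignore case, the two spellings of "foo" are counted together: the first output line has a count of 2 and the second (for "baz") a count of 1.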

* Strip newlines (join all input lines into a single line):


<pre>
sed ':a;N;$!ba;s/\n//g'
</pre>
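
The <code>sed</code> loop above reads the whole input into the pattern space before deleting the newlines. For comparison, the sketch below puts it next to <code>tr -d '\n'</code>, a different tool that gives the same joined text but also drops the final newline. Both function names are made up for illustration, and the <code>sed</code> form assumes GNU sed.

```shell
# Hypothetical helpers (names are illustrative).
join_lines_sed() {
  # The one-liner above: append every line to the pattern space, then delete the newlines.
  sed ':a;N;$!ba;s/\n//g'
}

join_lines_tr() {
  # Alternative, not from the original page: delete every newline byte, including the last.
  tr -d '\n'
}

printf 'a\nb\nc\n' | join_lines_sed   # prints abc followed by a newline
printf 'a\nb\nc\n' | join_lines_tr    # prints abc with no trailing newline
```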


[[Category:Tools]]

Revision as of 19:50, 13 June 2010
