Text processing

POSIX-Extended Regular Expressions metacharacters

A sed micro-primer

Tasks

Task 1. ROT13 (“rotate by 13 places”, sometimes hyphenated ROT-13) is a simple letter substitution cipher that replaces a letter with the 13th letter after it in the alphabet. ROT13 is a special case of the Caesar cipher, which was developed in ancient Rome. The algorithm provides virtually no cryptographic security, and is often cited as a canonical example of weak encryption. (Wikipedia)

Using the manual, try to write an invocation of tr that implements the ROT13 algorithm.
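
For comparison with your own answer, here is one possible sketch. The function name rot13 is introduced only for illustration; the idea is to map each half of the alphabet onto the other half, separately for upper and lower case, and to leave every other character untouched.

```shell
# rot13: read text on stdin, write the ROT13 version to stdout.
# The first set lists A-Z then a-z; the second set lists the same
# letters shifted by 13 positions (N-Z, A-M, then n-z, a-m).
rot13() {
    tr 'A-Za-z' 'N-ZA-Mn-za-m'
}

printf 'Hello, World!\n' | rot13          # prints: Uryyb, Jbeyq!
printf 'Hello, World!\n' | rot13 | rot13  # prints: Hello, World!
```

Because the alphabet has 26 letters, applying the same command twice restores the original text, so one invocation serves for both encoding and decoding.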

Task 2.

Print all the numbers less than 2000 whose digits are only 0 or 1. Write a list of those numbers to the file numbers.txt.

Hint: Commands that might be useful: seq, grep.
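
A minimal sketch of one approach, assuming that "numbers" means non-negative integers starting at 0 (the task does not say so explicitly). The function name binary_digit_numbers is hypothetical.

```shell
# Print every integer from 0 up to and including $1 whose decimal
# digits are all 0 or 1. The anchors ^ and $ force the whole line to
# consist of such digits, not just a part of it.
binary_digit_numbers() {
    seq 0 "$1" | grep -E '^[01]+$'
}

# "Less than 2000" means the largest candidate is 1999.
binary_digit_numbers 1999 > numbers.txt
```

Under this assumption the file should contain 16 lines, from 0 up to 1111.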

Task 3.

Write a pipeline that calculates the sum of numbers that are written one per line.

Hint: Commands that might be useful: (paste), (tr, rev, cut), bc.
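
One way to read the first hint, given as a sketch rather than the intended solution: join all lines into a single expression with + between them and let bc evaluate it. The function name sum_lines is made up for this example.

```shell
# sum_lines: read one number per line on stdin and print their sum.
# paste -s joins all input lines into one, -d+ uses "+" as the
# separator, and the trailing "-" tells paste to read stdin.
sum_lines() {
    paste -sd+ - | bc
}

printf '1\n2\n3\n' | sum_lines   # prints: 6
```

The same function works on a file, for example `sum_lines < numbers.txt`.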

Task 4. Recursively list all the files in the /usr directory. Find the different permission sets that are used for files and directories in this directory.

Hint: Commands that might be useful: ls, cut, sort, uniq.
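
A possible sketch using the hinted commands plus grep, which is added here to drop the header, "total" and blank lines that ls -lR prints between directories. The function name permission_sets is hypothetical, and the sketch assumes that the mode string occupies the first 10 characters of each long-format line.

```shell
# permission_sets DIR: print each distinct mode string found under DIR,
# preceded by the number of entries that use it.
permission_sets() {
    ls -lR "$1" 2>/dev/null |
        grep -E '^[-dlbcps]' |   # keep only lines that describe an entry
        cut -c1-10 |             # file type character + 9 permission bits
        sort |
        uniq -c
}
```

Calling `permission_sets /usr` then prints lines such as a count followed by `-rw-r--r--` or `drwxr-xr-x`; the exact list depends on the system.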

Task 5. In the files 03-Proust-1.txt.xz, 03-Proust-2.txt.xz, 03-Proust-3.txt.xz, 03-Proust-4.txt.xz, 03-Proust-5.txt.xz, 03-Proust-6.txt.xz, 03-Proust-7.txt.xz you can find the compressed text of one of the longest novels ever written, In Search of Lost Time by Marcel Proust (or rather its English translation). The novel consists of seven volumes.

Make sure to account for commas, colons, periods, etc.

Hint: Commands that might be useful: xzcat, tr, sort, uniq, head, tail, grep.

Task 6. Find the longest word in the novel In Search of Lost Time. Depending on the solution you might need to write a very simple program in C.

Hint: Commands that might be useful (non-obvious solution with ready-made tools): wc, tr, grep.
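
One possible reading of the hint, as a sketch under two assumptions: a "word" is a maximal run of ASCII letters (so accented letters and hyphens act as separators), and wc supports the -L option, which is a GNU extension and not part of POSIX. The function name longest_words is invented for this example.

```shell
# longest_words: read text on stdin and print the longest word(s).
longest_words() {
    # One word per line: -c complements the set, -s squeezes repeats,
    # so every run of non-letters becomes a single newline.
    words=$(tr -cs 'A-Za-z' '\n')
    # Length of the longest line, i.e. of the longest word.
    max=$(printf '%s\n' "$words" | wc -L | tr -d ' ')
    # Keep the words of exactly that length, without duplicates.
    printf '%s\n' "$words" | grep -E "^.{$max}\$" | sort -u
}

printf 'a quick, extraordinary fox\n' | longest_words   # prints: extraordinary
```

For the novel itself, the input would come from something like `xzcat 03-Proust-*.txt.xz | longest_words`.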