the empty quarter

Archive for October, 2011

Column-wise text manipulation

GNU/Linux includes many utilities for working with text files through the shell. In this post we take a quick look at accessing and manipulating text files in a “column-wise” mode. Suppose you have the following two files, each with two columns separated by the TAB character.

$ cat file1
Alice	Paris
Bob	Tokyo
Mary	London
John	New York
$ cat file2 […]
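The rest of the post is not shown in this excerpt, so the commands it uses are unknown. As one possible illustration of column-wise work, the sketch below uses the coreutils tools cut and paste on file1 only, because the contents of file2 are elided. The intermediate file names names and cities are hypothetical. Both tools use TAB as their default delimiter, which matches the layout described above.

```shell
#!/bin/sh
# Recreate file1 from the excerpt; the two columns are separated by a TAB.
printf 'Alice\tParis\nBob\tTokyo\nMary\tLondon\nJohn\tNew York\n' > file1

# Extract single columns (cut splits on TAB by default).
# "names" and "cities" are hypothetical helper files for this sketch.
cut -f1 file1 > names    # Alice, Bob, Mary, John
cut -f2 file1 > cities   # Paris, Tokyo, London, New York

# Glue the columns back together side by side, in swapped order
# (paste also joins with a TAB by default).
paste cities names
```

The last command prints the cities first, for example "Paris", a TAB, then "Alice" on the first line. Because cut reads fields by position, "New York" stays intact even though it contains a space.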

Monday, October 17th, 2011 at 17:02 | 0 comments

Categories: Linux

Tags: commandline, sed

Remove HTML tags with sed

Sed can be used to strip out all HTML or XML tags from a file and get the plain-text version. Suppose you have a file gnulinux.html with the following contents:

<p>The combination of <a href="/gnu/linux-and-gnu.html">GNU and Linux</a> is the <strong>GNU/Linux operating system</strong>, now used by millions and sometimes incorrectly called simply “Linux”.</p>

Tempting but incorrect […]
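The post's own commands are cut off in this excerpt. A commonly used sed substitution for this task is sketched below. It is not necessarily the command the post arrives at. The bracket expression [^>]* matches any run of characters other than '>', so each match stops at the first closing bracket and every tag is removed separately. This sketch assumes every tag starts and ends on the same line and that no attribute value contains a literal '>'. The sample file is recreated with plain ASCII quotes.

```shell
#!/bin/sh
# Recreate the sample file (ASCII quotes used throughout for portability).
cat > gnulinux.html <<'EOF'
<p>The combination of <a href="/gnu/linux-and-gnu.html">GNU and Linux</a> is the <strong>GNU/Linux operating system</strong>, now used by millions and sometimes incorrectly called simply "Linux".</p>
EOF

# Delete every '<' followed by non-'>' characters up to the next '>'.
# The g flag repeats the substitution for all tags on the line.
sed -e 's/<[^>]*>//g' gnulinux.html
```

Run on this file, the command should leave only the sentence text: The combination of GNU and Linux is the GNU/Linux operating system, now used by millions and sometimes incorrectly called simply "Linux".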

Monday, October 17th, 2011 at 12:16 | 0 comments

Categories: Linux

Tags: commandline, sed

