Standard I/O and Pipes

The coolest thing about the Unix shell is its ability to chain commands together into pipelines. Here's an example:

(~) 65% grep gatttgc big_file.fasta | wc -l
22

There are two commands here. grep searches a file or standard input for lines containing a particular string (more generally, a regular expression) and prints the matching lines to standard output. wc is the familiar word count program, which counts words, lines and characters in a file or standard input; the -l command-line option instructs it to print just the line count. The | character, known as the "pipe" character, connects the two commands so that the standard output of grep becomes the standard input of wc.

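What does this pipeline do? It prints the number of lines in the file big_file.fasta that contain the string "gatttgc". Because grep works line by line, a line that contains the string more than once is still counted only once.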
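To see what the pipe is doing, here is a small self-contained shell sketch. It builds a made-up FASTA-style file (the name sample.fasta and its contents are hypothetical) in a temporary directory, then counts the matching lines three ways: with the pipeline, with an explicit intermediate file in place of the pipe, and with grep's own -c option. All three should agree.

```shell
# Make a scratch directory and a tiny, hypothetical FASTA-style file.
dir=$(mktemp -d)
printf '>seq1\nacgtgatttgcaa\nttttttttt\n>seq2\ngatttgcgatttgc\n' > "$dir/sample.fasta"

# 1. The pipeline: grep's standard output becomes wc's standard input.
piped=$(grep gatttgc "$dir/sample.fasta" | wc -l)

# 2. The same computation with a temporary file instead of a pipe.
grep gatttgc "$dir/sample.fasta" > "$dir/matches.txt"
via_file=$(wc -l < "$dir/matches.txt")

# 3. grep -c counts matching lines by itself.
counted=$(grep -c gatttgc "$dir/sample.fasta")

# $((...)) strips any padding some wc versions add around the number.
echo "pipe: $((piped))  temp file: $((via_file))  grep -c: $((counted))"
rm -r "$dir"
```

The last sequence line contains gatttgc twice but is counted once. The pipe version gets the same answer without ever writing the intermediate file to disk.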

More Pipe Idioms

Pipes are very powerful. Here are some common command-line idioms.

Count the Number of Lines in Which a Pattern Does NOT Appear

The example at the top of this section showed you how to count the number of lines in which a particular string pattern appears in a file. What if you want to count the number of lines in which a pattern does not appear?

Simple. Reverse the test with the grep -v switch:

(~) 65% grep -v gatttgc big_file.fasta | wc -l
2921
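Since every line either contains the pattern or does not, the matching and non-matching counts should add up to the file's total line count. A minimal sketch that checks this on a small hypothetical file (not the big_file.fasta from the example above):

```shell
# Scratch directory with a tiny, hypothetical FASTA-style file.
dir=$(mktemp -d)
printf '>seq1\nacgtgatttgcaa\nttttttttt\n>seq2\ngatttgcgatttgc\n' > "$dir/sample.fasta"

# Lines that contain the pattern.
match=$(grep gatttgc "$dir/sample.fasta" | wc -l)
# Lines that do NOT contain it: -v inverts the match.
nomatch=$(grep -v gatttgc "$dir/sample.fasta" | wc -l)
# Every line in the file.
total=$(wc -l < "$dir/sample.fasta")

echo "match=$((match)) nomatch=$((nomatch)) total=$((total))"
rm -r "$dir"
```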

Uniquify Lines in a File

If you have a long list of names in a text file, and you are concerned that there might be some duplicates, this will weed out the duplicates:

(~) 66% sort long_file.txt | uniq > unique.out

This works by sorting all the lines alphabetically and piping the result to the uniq program, which removes duplicate lines only when they are adjacent to each other; sorting first is what brings the duplicates together. The output is placed in a file named unique.out.
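The sort step matters because uniq compares each line only with the line immediately before it. This sketch, using a made-up name list, shows uniq alone missing non-adjacent duplicates, sort | uniq catching them, and sort -u (a standard sort option that outputs only unique lines) doing both steps at once.

```shell
dir=$(mktemp -d)
# Hypothetical list: alice and bob repeat, but never on adjacent lines.
printf 'alice\nbob\nalice\ncarol\nbob\n' > "$dir/long_file.txt"

# uniq alone only collapses adjacent duplicates, so nothing is removed here.
unsorted=$(uniq "$dir/long_file.txt" | wc -l)

# Sorting first brings the duplicates together, so uniq can remove them.
sort "$dir/long_file.txt" | uniq > "$dir/unique.out"
sorted=$(wc -l < "$dir/unique.out")

# sort -u sorts and drops duplicates in a single command.
sort_u=$(sort -u "$dir/long_file.txt" | wc -l)

echo "uniq alone: $((unsorted))  sort|uniq: $((sorted))  sort -u: $((sort_u))"
rm -r "$dir"
```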

Concatenate Several Lists and Remove Duplicates

If you have several lists that might contain repeated entries among them, you can combine them into a single unique list by concatenating them with cat and then uniquifying the result as before:

(~) 67% cat file1 file2 file3 file4 | sort | uniq
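A small sketch with four hypothetical list files. Because sort itself accepts several file names, sort file1 file2 file3 file4 | uniq should give the same result without cat; the sketch computes both forms and compares them.

```shell
dir=$(mktemp -d)
# Four short, overlapping lists (hypothetical contents).
printf 'alice\nbob\n'  > "$dir/file1"
printf 'bob\ncarol\n'  > "$dir/file2"
printf 'carol\ndave\n' > "$dir/file3"
printf 'alice\n'       > "$dir/file4"

# Concatenate, sort, and drop duplicates, as in the text.
merged=$(cat "$dir/file1" "$dir/file2" "$dir/file3" "$dir/file4" | sort | uniq)

# Equivalent: let sort read all four files directly.
direct=$(sort "$dir/file1" "$dir/file2" "$dir/file3" "$dir/file4" | uniq)

printf '%s\n' "$merged"
[ "$merged" = "$direct" ] && echo "same result"
rm -r "$dir"
```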

Count Unique Lines in a File

If you just want to know how many unique lines there are in the file, add wc -l to the end of the pipeline:

(~) 68% sort long_file.txt | uniq | wc -l
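Comparing the total line count with the unique line count tells you how many duplicate lines the file holds. A sketch with a hypothetical name list:

```shell
dir=$(mktemp -d)
# Hypothetical list: alice appears three times, bob twice, carol once.
printf 'alice\nbob\nalice\ncarol\nbob\nalice\n' > "$dir/long_file.txt"

total=$(wc -l < "$dir/long_file.txt")
unique=$(sort "$dir/long_file.txt" | uniq | wc -l)

echo "total=$((total)) unique=$((unique)) duplicates removed=$((total - unique))"
rm -r "$dir"
```

If you also want to see how many times each line occurs, uniq -c prefixes every output line with its count.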

Page Through a Really Long Directory Listing

Pipe the output of ls to the more program, which shows one page at a time. If you have it, the less program is even better, since it also lets you page backward:

(~) 69% ls -l | more

Monitor a Rapidly Growing File for a Pattern

Pipe the output of tail -f (which keeps watching a growing file and prints new lines as they are appended) to grep. For example, this will monitor the /var/log/syslog file for lines mentioning mzhang, such as log entries for e-mail addressed to that user. Press Ctrl-C to stop monitoring:

(~) 70% tail -f /var/log/syslog | grep mzhang

Lincoln D. Stein, lstein@cshl.org
Cold Spring Harbor Laboratory
Last modified: Thu Sep 16 15:54:39 EDT 1999