Using the Unix Operating System 2

Input/Output Redirection, Command Pipes, Job and Process Control

Simon Prochnik

Genome Informatics

Lecture Notes

Saving Command Output in a File

$ grep ZK105 /net/share/unix1/cosmids?.txt > lines_with_ZK105
We can also append (add more lines) to an existing file:
$ grep ZK105 /net/share/unix1/cosmids?.txt > multiple_lines
$ grep L2 /net/share/unix1/cosmids?.txt >> multiple_lines
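
The difference between > and >> can be checked on a small scratch file. The sketch below assumes an sh-compatible shell such as bash; the names cosmids_demo.txt and matches are made up for illustration and are not part of the course data.

```shell
# Work in a scratch directory so nothing of yours is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical stand-in for the course's cosmid files.
printf 'ZK105 alpha\nL2 beta\nZK105 gamma\n' > cosmids_demo.txt

grep ZK105 cosmids_demo.txt > matches    # creates matches with 2 lines
grep ZK105 cosmids_demo.txt > matches    # > overwrites: still 2 lines
overwrite_count=$(wc -l < matches)

grep L2 cosmids_demo.txt >> matches      # >> appends: now 3 lines
append_count=$(wc -l < matches)

echo "after >: $overwrite_count lines; after >>: $append_count lines"
```

Running the same > command twice leaves only the second run's output, while each >> adds to whatever is already in the file.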

Sending Output of One Command Directly to Another

$ grep ZK105 /net/share/unix1/cosmids?.txt > lines_with_ZK105
$ sort -r lines_with_ZK105
versus
$ grep ZK105 /net/share/unix1/cosmids?.txt | sort -r
We can also do
$ grep ZK105 /net/share/unix1/cosmids?.txt | sort -r > reverse_sorted_lines_with_ZK105
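
To see that the two-step version and the pipe give the same result, here is a minimal sketch for an sh-compatible shell. The file cosmids_demo.txt and the variable names are hypothetical; only grep and sort -r come from the examples above.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical stand-in for the course's cosmid files.
printf 'ZK105 b\nL2 x\nZK105 c\nZK105 a\n' > cosmids_demo.txt

# Two steps, with an intermediate file...
grep ZK105 cosmids_demo.txt > lines_with_ZK105
via_file=$(sort -r lines_with_ZK105)

# ...and one step, with a pipe and no intermediate file.
via_pipe=$(grep ZK105 cosmids_demo.txt | sort -r)

[ "$via_file" = "$via_pipe" ] && echo "same result"
```

The pipe saves us from creating, naming, and later deleting the intermediate file.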

Standard Input, Standard Output

Each Unix command can read from its standard input and write to its standard output. By default, standard input comes from the terminal (that is, lines that we type in), but we can use < to feed a command the contents of a file as its standard input. By default, standard output goes to the terminal (that is, the screen), but we can use > to redirect a command's standard output to a file. We can use | to pipe the standard output of the command on the left directly to the standard input of the command on the right.

For many commands

$ grep abc some_file
is equivalent to
$ grep abc < some_file
This is a convenient convention: if a file is specified on the command line, the command takes its input from the file; if not, the command takes its input from standard input.

Sometimes this convention can lead to unexpected behavior. For example, if you type

$ grep abc
grep will read its standard input from your terminal. That is, each line you type will go to grep, until you type control-D, which indicates that you are done. grep is only an example. cat works the same way, as will many perl programs that you will write.
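
The different ways of giving grep its input can be compared side by side. This is a sketch for an sh-compatible shell; some_file is filled with made-up lines, and a "here document" (<<) stands in for lines typed at the terminal so that the example can run unattended.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'abcdef\nxyz\n123abc\n' > some_file

from_arg=$(grep abc some_file)          # file named on the command line
from_stdin=$(grep abc < some_file)      # file supplied as standard input
from_pipe=$(cat some_file | grep abc)   # standard input supplied by a pipe

# The here document plays the role of typing two lines and then Control-D.
from_heredoc=$(grep abc <<'EOF'
no match here
an abc line
EOF
)

echo "$from_heredoc"
```

All of the first three commands find the same two lines; the last one prints only the line containing abc.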

Standard Error

Unix commands can also write to their standard error. Standard error is where, by convention, many commands write messages if they encounter errors. For example, when you run
$ ls non_existent_file* > file_list
the error message
ls: non_existent_file*: No such file or directory
still goes to your screen and not into file_list. Why? Because ls writes the error message to standard error, and > grabs only the standard output from ls, while leaving standard error alone.

How can you make all of the lines that ls generates go to the file file_list? That depends on which shell you are running. To find that out, type

$ echo $SHELL
If you are running csh or tcsh then
$ ls non_existent_file* >& file_list
will do the trick. If you are running sh or bash then
$ ls non_existent_file* > file_list 2>&1
is the way to go.
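
For sh and bash, the following sketch shows where each stream ends up. The file names error_log and all_output are made up for illustration; it assumes no file in the scratch directory matches non_existent_file*.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Standard output goes to file_list; standard error goes to a separate file.
ls non_existent_file* > file_list 2> error_log

# Both streams go to the same file. Order matters: "2>&1" means "send
# standard error wherever standard output points right now", so it must
# come after "> all_output".
ls non_existent_file* > all_output 2>&1

echo "file_list:  $(wc -c < file_list) bytes"
echo "error_log:  $(wc -c < error_log) bytes"
echo "all_output: $(wc -c < all_output) bytes"
```

file_list is empty because ls had nothing to report on standard output; the error message is in error_log and in all_output.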

Why do we care about standard error? When we run commands in the background we often need to capture their error messages (standard error) as well as their standard output. In addition, when we start running programs from perl (for example running blast from perl and then taking its output, massaging it, and then feeding it to clustalw) we often want our perl scripts to capture standard error from the programs they are running.

Putting It in the Background

Unix makes it easy to have many programs running at once. For example, if we have a long-running blast job, we can "put it in the background":
$ blastall -p blastn -i query1 -d nr >& query1_results &
Even if we close our terminal and log out, blastall will continue to put all its output in query1_results (as long as we do not reboot the computer).

When we put commands in the background, we have to solve three problems:

  1. Feeding them input.

  2. Keeping their output (both standard output and standard error) somewhere where you can find it later (using output redirection).

  3. Keeping track of the commands we put in the background (and, sometimes, especially with programs we write ourselves, stopping them without rebooting the computer).
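
The three points above can be sketched in sh or bash as follows. Here sort is only a stand-in for a long-running program such as blastall, and query_demo, results, and the variable names are hypothetical. Note that sh and bash use > file 2>&1 where csh and tcsh use >& file.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'b\na\nc\n' > query_demo

# 1. Input comes from a file, not the terminal.
# 2. Standard output and standard error both go to results.
# 3. $! holds the pid of the most recent background command.
sort < query_demo > results 2>&1 &
bg_pid=$!
echo "started background process $bg_pid"

wait "$bg_pid"        # block until it finishes
bg_status=$?          # exit status of the background command
echo "finished with status $bg_status"
```

Saving the pid right after the & is what lets us wait for, inspect, or stop the command later.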

Putting It in the Background Later

$ xemacs
(xemacs is running in the foreground.) Type Control-Z
Suspended
$ bg
[1]    xemacs &
(xemacs is running in the background.)

Putting a command in the background after we started it makes sense only if it does not produce any output we need to capture.

Jobs

When we put commands in the background, we can deal with them as jobs.
$ xemacs &
[1] 11682
$ gimp &
[2] 11683
$ jobs
[1]  + Running                       xemacs
[2]    Done                          gimp
Using this information you can bring jobs back into the foreground:
$ fg %2
gimp
brings gimp into the foreground.

Processes

The jobs command only shows us jobs that we started directly from a particular shell. Once we exit that shell, we can no longer use their job numbers, nor can we manipulate them with the jobs, bg, or fg commands. Instead, we have to deal with them as processes.

We can see information about some of our processes with the ps command.

$ ps -l
  F S   UID   PID  PPID  C PRI  NI ADDR    SZ WCHAN  TTY          TIME CMD
000 S   508 11436  9181  0  70   0    -   542 rt_sig ttyp2    00:00:00 tcsh
000 S   508 11520 11436  0  60   0    -  2386 do_sel ttyp2    00:00:00 xemacs
000 S   508 11620 11436  0  60   0    -  1061 do_sel ttyp2    00:00:00 xterm
000 R   508 11624 11436  0  73   0    -   637 -      ttyp2    00:00:00 ps
The column PID contains process ids, a.k.a. pids.

You can see more of your processes with

$ ps  -lcf -U srozen
  F S UID        PID  PPID   CLS PRI ADDR    SZ WCHAN  STIME TTY          TIME CMD
000 S srozen    9181     1     -  33    -  3034 do_sel 13:54 ?        00:02:12 xemacs index.html
000 S srozen    9402  9181     -  39    -   542 rt_sig 14:41 ttyp0    00:00:00 -bin/tcsh -i
000 S srozen    9547     1     -  25    -  1530 do_pol 15:09 ?        00:24:19 gtop
000 S srozen    9594  9181     -  39    -   600 pipe_r 15:16 ?        00:00:00 /usr/bin/ispell -a -m -B
000 S srozen   10725  9402     -  39    -   420 wait4  16:33 ttyp0    00:00:00 bash
000 S srozen   10736 10725     -  39    -   418 read_c 16:34 ttyp0    00:00:00 sh
100 S srozen   11360 11357     -  39    -   564 read_c 18:47 pts/5    00:00:00 -tcsh
000 S srozen   11436  9181     -  29    -   543 rt_sig 18:49 ttyp2    00:00:00 -bin/tcsh -i
000 S srozen   11520 11436     -  39    -  2386 do_sel 18:57 ttyp2    00:00:00 xemacs
000 S srozen   11620 11436     -  39    -  1061 do_sel 19:03 ttyp2    00:00:00 xterm
000 S srozen   11622 11620     -  39    -   512 read_c 19:03 pts/7    00:00:00 -csh
000 R srozen   11678 11436     -  23    -   637 -      19:11 ttyp2    00:00:00 ps -lcf -U srozen
Replace -U srozen with -e to see everyone's processes.

The commands top and gtop show all processes (and other information about the system).
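
When we only care about one process, ps can be asked for just that pid and just the columns we want. This is a sketch, assuming a ps that accepts the POSIX -o and -p options (the Linux procps ps and the BSD/macOS ps both do); sleep is a stand-in for a real program and the variable names are made up.

```shell
sleep 30 &
child_pid=$!

# -o selects columns; -p restricts the listing to one pid.
ps -o pid,ppid,comm -p "$child_pid"

# Writing "ppid=" gives the column an empty header, so only the value prints.
parent_of_child=$(ps -o ppid= -p "$child_pid" | tr -d ' ')
echo "this shell is $$; the parent of $child_pid is $parent_of_child"

# Tidy up the stand-in process.
kill "$child_pid"
wait "$child_pid" 2>/dev/null
```

When this is run as a script, the PPID reported for the sleep process should match the pid of the shell that started it, just as the PPID column links xemacs to tcsh in the listings above.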

Kill

Sometimes we need to kill jobs
$ xeyes &
[1] 11860
$ kill %1
or processes
$ ps -l
  F S   UID   PID  PPID  C PRI  NI ADDR    SZ WCHAN  TTY          TIME CMD
000 S   508 11436  9181  0  70   0    -   544 rt_sig ttyp2    00:00:00 tcsh
000 S   508 11861 11436  1  62   0    -   628 do_sel ttyp2    00:00:00 xeyes
000 R   508 11863 11436  0  75   0    -   637 -      ttyp2    00:00:00 ps
$ kill 11861
$ ps -l
  F S   UID   PID  PPID  C PRI  NI ADDR    SZ WCHAN  TTY          TIME CMD
000 S   508 11436  9181  0  70   0    -   545 rt_sig ttyp2    00:00:00 tcsh
000 R   508 11864 11436  0  74   0    -   636 -      ttyp2    00:00:00 ps
See problem #2 for more on kill.
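
Inside a script we usually kill by pid, because $! gives us the pid directly. The sketch below assumes sh or bash; sleep stands in for a program we want to stop and the variable names are made up.

```shell
sleep 300 &
pid=$!

# Signal 0 sends nothing; it only tests whether the process exists.
kill -0 "$pid" && echo "process $pid is running"

kill "$pid"           # plain kill sends SIGTERM
wait "$pid"           # collect the exit status of the killed process
status=$?
echo "exit status after kill: $status"

kill -0 "$pid" 2>/dev/null || echo "process $pid is gone"
```

In sh and bash, the exit status of a process ended by a signal is greater than 128, which is one way a script can tell "killed" from "finished normally".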

Workshop Problem Set

Problem #1

  1. Try the
          $ grep abc
          
    example above. Make sure you type in a line containing abc before you end grep's input with Control-D.

  2. You want to keep track of disk space usage from week to week. Try the du -s * command. (Look at the du man page to see what du does.) Run du again, and use > to put its output in a file called du_week0. Don't forget to run du again in a week to see how your disk usage has grown.

  3. Get a listing of the contents of /usr/local/bin using the ls command. Redirect the output from the ls command to a file, ~/ls_output. (Where is that file located?)
    Use the wc command to count the number of lines in ls_output. (By now you know how to find out what wc does, right?)

  4. Re-do the previous exercise by omitting the redirection to ls_output and use | (a "pipe") to connect the output of ls to the input of wc.

  5. What is the difference between
          $ wc > ls_output
          $ wc > ls_output
          
    and
          $ wc >> ls_output
          $ wc >> ls_output
          
    ?

  6. What is the difference between
          $ wc ls_output
          
    and
          $ wc < ls_output
          
    ?

  7. Try the
          $ ls non_existent_file* > file_list
          
    example above, and then change the command so that the error message goes into file_list.

  8. Figure out how to connect ls -l and grep with a pipe in order to show only files that were last modified on 'Oct 11'.

Problem #2

  1. Start an xterm at the command line (in the foreground). Suspend it. Use the jobs command to find out what jobs you have. Use ps -lcf -U username to find out what processes you have. Can you find the process that corresponds to the suspended xterm job?

  2. Use the bg command to reactivate the suspended xterm job in the background. Re-run the jobs and ps -lcf commands to see what has changed.

  3. Kill the xterm process using kill and the pid.

  4. Start xterm in the background again, and this time kill it using the job number (i.e. using the % sign followed by the job number).

  5. What are the main differences between a job and a process?

  6. Start xterm in the background again, and this time run kill -STOP %job. Is the process still running? If so, get rid of it.

  7. Look at the kill man page. What is the difference between kill and kill -KILL?

  8. Start an xterm in the background the easy way, by putting & at the end of the command line. Then start gtop in the background, and then run ps -lcf -U... to find the pids of the xterm and gtop processes. Can you find those pids in the gtop display?

  9. Run the du command from the first problem in the background, first with redirected output and then without. What happens when output is not redirected?

Steve Rozen, rozen@gaiberg.wi.mit.edu
Whitehead Institute for Biomedical Research
Last modified: Wed Oct 20 10:43:50 EDT 2004