Globals and Functions that Affect I/O

Several built-in globals affect input and output:

$/
The input record separator. The <FILE> operator (and <>) uses this value to decide where each record, normally a line, ends. Normally "\n".

$\
The output record separator. Whatever this is set to is printed after the last item of every print statement. Normally empty.

$,
The output field separator. Printed between the items in the argument list of each print statement. Normally empty.

$"
The list separator. Inserted between the elements of an array when the array is interpolated into a double-quoted string. Normally a space.

$.
The current input line number of the last filehandle read. When reading from <>, the count continues from one file to the next, so it gives the line number within the "virtual file" formed by concatenating all of the input files.
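
Here is a small sketch showing these globals in action. To avoid needing real files, it reads from and prints to in-memory filehandles (a filehandle opened on a reference to a scalar). The subroutine names are made up for this illustration, and each change to a global is made with local so that it is undone when the enclosing block or subroutine exits.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

my @bases = ('A', 'C', 'G', 'T');

# Hypothetical helper: capture what print produces with the given $, and $\.
sub print_with_separators {
    my ($field_sep, $record_sep, @items) = @_;
    my $buffer = '';
    open my $out, '>', \$buffer or die "Can't open in-memory file: $!";
    {
        local $, = $field_sep;    # output field separator: between print's arguments
        local $\ = $record_sep;   # output record separator: after each print
        print $out @items;
    }
    close $out;
    return $buffer;
}

# Hypothetical helper: interpolate an array using the given $".
sub interpolate_with {
    my ($list_sep, @items) = @_;
    local $" = $list_sep;         # list separator for "@array" interpolation
    return "@items";
}

# Hypothetical helper: tag each line with $., the line number of the last handle read.
sub numbered_lines {
    my ($text) = @_;
    my @numbered;
    open my $in, '<', \$text or die "Can't open in-memory file: $!";
    while (my $line = <$in>) {
        chomp $line;
        push @numbered, "$.: $line";
    }
    close $in;
    return @numbered;
}

print print_with_separators('-', "\n", @bases);              # A-C-G-T
print interpolate_with(':', @bases), "\n";                   # A:C:G:T
print join("\n", numbered_lines("first\nsecond\n")), "\n";   # 1: first, then 2: second
```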

Example use of Input Record Separator

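Say you have a text file containing records in the following format (this is the FASTA format used for DNA and protein sequences). Each record starts with a header line beginning with ">", followed by one or more lines of sequence: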

>gi|5340860|gb|AI793144.1|AI793144 on36f02.y5 NCI_CGAP_Lu5 Homo sapiens cDNA clone
CAAACAGCCCCCGATAACGCTACGTGAGCTGGGCCCTGGGCCTGAGGCAGAAAACGGACGGAAGAAAAGG
TCTGGCCGGAGATGGGTCTCACTCTGTCACCCAGACTGGAGTGCAGTGAGTGGTGCGATCATAGCTTACT
GCAGCCTGAAACTCCTGGGCTCAAGTGATCTTCTCGCCTCAGCCTCCTGAGTAGCTGGAGCTACAGGAAT
GAGCATAGATGAACAATGTTGCATCACGCTTGACATCACCGGNGCTTCTTTCCAGTGTGGATTTGCTCAT
GTAAAATGAGGTGTGAGCTCTGCCTGAAAGCTTTTCCATATGCATCACATTTGCAGGGCTTTTCTCCAGT
GTGGGTTCTTTGGTGTCTCAAAAGATGTGAGCTGTTACTGAAAGCTTTCCCACACACATCACACTCATAG
GGCTTCTCTCTACCGTGGATTCGCTGGTGTCCAACAAGAGCTGAACTGTATCTGAAGGCCTTTCCACGCT
TGTCACATTCATATAGTTTCTTTCCACTGTGGATTNTCTGGTGACAGAAGAGGCCCAAGCACTAGCTAAA
GCTNTTCCCTCACTCACTACACTGCTATGGCTTCTCTTCAGTATGAACTCTGATGTTGTCTCAGATATGA
ACTCAGAGAGGATNTCCCACAATCATTACACTGGTATGGTTCCTTTTCGTGTGAGTTCTCTGGTGTCNAA
ATACATCTGAGCTGTGATGAAAGAACTTNCCACACTCACTACATTGGGAAGG
>gi|4306680|gb|AI451833.1|AI451833 mx13e08.y1 Soares mouse NML Mus musculus cDNA clone
TGAATGTATGCAGTGCGGAAAGACATTCACTTCTGGCCACTGTGCCAGAAGACATTTAGGGACTCACAGT
GGAGCCTGGCCTTACAAATGTGAAGTGTGTGGGAAAGCTTATCCCTACGTCTATTCCCTTCGAAACCACA
AAAAAAGTCACAACGAAGAAAAACTTTATGAATGTAAACAATGTGGGAAAGCCTTTAAATACATTTCTTC
CTTACGCAACCACGAGACTACTCACACTGGAGAGAAGCCCTATGAATGTAAGGAATGTGGGAAAGCCTTT
AGTTGTTCCAGTTACATTCAAAATCACATGAGAACACACAAAAGGCAGTCCTATGAATGTAAGGAGTGTG
GTAAGGTGTTCTCATATTCCAAAAGTCTTCGGAGACACATGACTACACATAGTTAATTAGAGAGGGATAG
TTNTAAGTATAATTTAAATATATAAAAGAGCTCTACACATTCTAGCTCCTCATTAAGAAACAAAAAATTT
CACACTGGAAAACGAGCCTATGAATGCAGTATGTGTGCCAAAGTCTCAGTACATGCCACAGT
>gi|3400733|gb|AI074089.1|AI074089 oq97c08.x1 NCI_CGAP_Co12 Homo sapiens cDNA clone
GAATCTTCTGGGTCCTCTTTATTAAGAGCCCTCTGCCTTCCCAGGGGAGGGAAGCAAATCCTTCAGGGCC
CCCAGAGTTCCTGCACCCCATATCATGGGTGAGTCCTACCAGCCACAGAGCCACCCGTCACCGTGGAGAG
GCTTAAGCTGCACTCAGAGCTCCCCCCGGGCATGCCGAATGTAGTGTTGATGCAGCCCTGCTTCCTGAGC
AAAGTCCTGACCGCACTCTGTGCAGGCGAAGGTGCCAGGAGGGGCACGGACCTCATGCATCTGGCGGTGC
CGCCTCAGAGAAACAGCCTGCCCAAAGGTCTTGCCACAGTCAGGACAAGGGAAGGTGGGCTGGGCAGTAG
TGGTTGCAACCGGCAGGGTGGGCTTGGCGGCTGGACCGTGGCTGCGCTGGTGGGTGATTAGGGCTTTGGA
...

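If you read this file with the standard <>, you get one line at a time and have to work out for yourself where one record ends and the next begins. However, if you set the input record separator to ">", then each time you read a "line", you read all the way up to and including the next ">" symbol, which gives you one complete record. Because the file begins with ">", the first record is empty apart from that symbol: throw it away and keep the others. Calling chomp on each record removes the trailing ">", because chomp strips whatever $/ is currently set to.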

  #!/usr/local/bin/perl
  # file: get_fasta_records.pl

  $/ = '>';

  <>;  # throw away the first record (just the leading ">")

  while (<>) {
    chomp;  # removes the trailing ">" (chomp strips the value of $/)
    # split up lines of the record.  The first line
    # is the sequence ID.  The second and subsequent lines
    # are the sequence
    my ($id,@sequence) = split "\n";
    my $sequence = join '',@sequence; # reassemble the sequence
    print "$id\t", length($sequence), "\n";  # e.g. report each ID and its length
  }
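
The script above reads from the files named on the command line. For experimenting, here is a self-contained sketch of the same technique that parses FASTA text held in a string and returns a hash of sequences keyed by ID. The subroutine name parse_fasta, the use of an in-memory filehandle, and the choice to key each sequence by the first word of its header line are assumptions made for this example, not part of the original script.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Hypothetical helper: parse FASTA-format text into a hash of id => sequence.
sub parse_fasta {
    my ($text) = @_;
    my %sequences;
    open my $in, '<', \$text or die "Can't open in-memory file: $!";
    local $/ = '>';          # each read returns everything up to the next '>'
    <$in>;                   # discard the first record (just the leading '>')
    while (my $record = <$in>) {
        chomp $record;       # removes the trailing '>' (the current value of $/)
        my ($header, @lines) = split /\n/, $record;
        my ($id) = split ' ', $header;   # first word of the header line
        $sequences{$id} = join '', @lines;
    }
    close $in;
    return \%sequences;
}

my $fasta = ">seq1 first test\nACGT\nGGCC\n>seq2 second test\nTTAA\n";
my $seqs  = parse_fasta($fasta);
for my $id (sort keys %$seqs) {
    print "$id\t$seqs->{$id}\n";   # seq1<TAB>ACGTGGCC, then seq2<TAB>TTAA
}
```

Because $/ is localized inside parse_fasta, the print statements after the call see the normal record separator again.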

Special Uses of the Input Record Separator

The input record separator has two special cases.

Paragraph Mode

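If the input record separator ($/) is set to the empty string (""), Perl goes into paragraph mode. Each read with <> returns everything up to the next blank line, and a run of several blank lines is treated as a single separator. This is handy for reading text that is divided into paragraphs. Only truly empty lines count: a line containing spaces or tabs does not end a paragraph.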
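
A short sketch of paragraph mode, reading from a string through an in-memory filehandle so it needs no input file (the subroutine name read_paragraphs is hypothetical). Note that chomp behaves specially here: in paragraph mode it removes all trailing newlines, not just one.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Hypothetical helper: split text into paragraphs using paragraph mode.
sub read_paragraphs {
    my ($text) = @_;
    my @paragraphs;
    open my $in, '<', \$text or die "Can't open in-memory file: $!";
    local $/ = '';            # paragraph mode
    while (my $para = <$in>) {
        chomp $para;          # in paragraph mode, chomp removes all trailing newlines
        push @paragraphs, $para;
    }
    close $in;
    return @paragraphs;
}

my $text  = "First paragraph,\nline two.\n\n\n\nSecond paragraph.\n\nThird.\n";
my @paras = read_paragraphs($text);
print scalar(@paras), " paragraphs\n";   # 3 paragraphs
print "[$_]\n" for @paras;
```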

Slurp Mode

If the input record separator is set to the undefined value (undef), Perl goes into slurp mode: each read returns the entire rest of the file as a single scalar. When <> is reading several files, each read returns one whole file.

Here's how to read the entire file cosmids.fasta into a scalar variable:

  open IN,"cosmids.fasta" or die "Can't open cosmids.fasta: $!\n";
  $/ = undef;    # slurp mode (stays in effect for all later reads, too)

  $data = <IN>;  # data slurp: the whole file in one scalar
  close IN;
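
Because $/ is a global, setting it to undef as above also changes every later read in the program. A common Perl idiom confines the change with local inside a do block. The sketch below wraps that idiom in a hypothetical slurp_file subroutine and uses a lexical filehandle with three-argument open; both are choices made for this example rather than requirements, and the script name slurp.pl is likewise made up.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the whole contents of a file as one string.
sub slurp_file {
    my ($path) = @_;
    open my $in, '<', $path or die "Can't open $path: $!\n";
    my $data = do { local $/; <$in> };   # $/ is undef only inside this do block
    close $in;
    return $data;
}

# Usage: perl slurp.pl cosmids.fasta
if (@ARGV) {
    my $data = slurp_file($ARGV[0]);
    print length($data), " characters read from $ARGV[0]\n";
}
```

Once the do block exits, $/ is back to "\n", so ordinary line-by-line reads elsewhere in the program are unaffected.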



Lincoln D. Stein, lstein@cshl.org
Cold Spring Harbor Laboratory
Last modified: Tue Oct 12 14:15:26 EDT 1999