Rendering Sequence Features

Bio::Graphics can render any object that implements the BioPerl Bio::SeqFeatureI interface. This includes the features attached to the sequence objects returned by the Bio::SeqIO modules.

Rendering an EMBL/GenBank File

This script will render the contents of any valid EMBL or GenBank flat file.

  #!/usr/bin/perl
  # file: features1.pl

  use strict;
  use Bio::Graphics;
  use Bio::SeqIO;
  use Bio::SeqFeature::Generic;   # used below to build $whole_seq

  my $file = shift                       or die "provide a sequence file as the argument";
  my $io = Bio::SeqIO->new(-file=>$file) or die "couldn't create Bio::SeqIO";
  my $seq = $io->next_seq                or die "couldn't find a sequence in the file";

  my @features = $seq->all_SeqFeatures;

  # group features by their primary tags
  my %sorted_features;
  for my $f (@features) {
    my $tag = $f->primary_tag;
    push @{$sorted_features{$tag}},$f;
  }

  my $whole_seq = Bio::SeqFeature::Generic->new(-start=>1,-end=>$seq->length);
  my $panel = Bio::Graphics::Panel->new(
                                        -segment   => $whole_seq,
                                        -key_style => 'between',
                                        -width     => 800,
                                        -pad_left  => 10,
                                        -pad_right => 10,
                                        );
  $panel->add_track($whole_seq,
                    -glyph => 'arrow',
                    -bump => 0,
                    -double=>1,
                    -tick => 2);

  $panel->add_track($whole_seq,
                    -glyph  => 'generic',
                    -bgcolor => 'blue',
                    -label  => 1,
                   );

  # general case
  my @colors = qw(cyan orange blue purple green chartreuse magenta yellow aqua);
  my $idx    = 0;
  for my $tag (sort keys %sorted_features) {
    my $features = $sorted_features{$tag};
    $panel->add_track($features,
                      -glyph    =>  'generic',
                      -bgcolor  =>  $colors[$idx++ % @colors],
                      -fgcolor  => 'black',
                      -font2color => 'red',
                      -key      => "${tag}s",
                      -bump     => +1,
                      -height   => 8,
                      -label    => 1,
                      -description => 1,
                     );
  }

  print $panel->png;
  exit 0;

How This Script Works

  1. Open the sequence file with Bio::SeqIO, and read the first entry into $seq.

  2. Get the list of features in the features table using all_SeqFeatures().

  3. Group the features by their primary tag (keywords such as "CDS" and "intron"). The grouped features are stored in a hash named %sorted_features, in which each key is a tag name and each value is an array reference holding all the features of that type.

  4. Create the panel, and add two tracks. One track will contain the scale, and the other will contain a single blue box that shows the full extent of the sequence.

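  5. For each tag type, taken in alphabetical order, create a new track that contains all the features of that type. Each track takes the next color from the @colors list, wrapping around to the start when the list is exhausted.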
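
Steps 3 and 5 can be tried in isolation, without BioPerl installed. The following is only a minimal sketch: the features are hypothetical stand-ins (plain array refs of tag, start and end) rather than real Bio::SeqFeatureI objects, the palette is shortened to two colors so that the wrap-around is visible, and the %color_for hash is an illustrative name that does not appear in features1.pl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-ins for features: [primary tag, start, end].
# A real script would get objects from $seq->all_SeqFeatures instead.
my @features = (
  [ 'CDS',    100, 400 ],
  [ 'exon',   100, 250 ],
  [ 'intron', 251, 300 ],
  [ 'exon',   301, 400 ],
);

# Step 3: group the features by tag; each hash value is an array ref.
my %sorted_features;
for my $f (@features) {
  my $tag = $f->[0];
  push @{ $sorted_features{$tag} }, $f;
}

# Step 5: visit the tags in sorted order and hand out colors in list
# order, wrapping around when the palette runs out.
my @colors = qw(cyan orange);   # shortened palette (assumption)
my $idx    = 0;
my %color_for;                  # illustrative only, not in features1.pl
for my $tag (sort keys %sorted_features) {
  $color_for{$tag} = $colors[ $idx++ % @colors ];
  printf "%-6s %d feature(s) -> %s\n",
    $tag, scalar @{ $sorted_features{$tag} }, $color_for{$tag};
}
```

With this input the tags are visited in the order CDS, exon, intron, so CDS and intron both receive cyan while exon receives orange. The same tag always maps to the same color for a given file, which would not be the case if the colors were chosen at random.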

Result

Running the script on the sample data in factor7.embl and piping the PNG output to display shows the resulting image:

  % features1.pl factor7.embl | display -



Lincoln D. Stein, lstein@cshl.org
Cold Spring Harbor Laboratory
Last modified: Wed Oct 22 22:38:19 EDT 2003