Rendering Sequence Features Better

Not all features are created equal. For example, we would like to represent the CDS features with connected lines in order to make them look like transcripts. We'd also like the labels and descriptions to be more informative than the defaults. We can do this using callbacks.

Improved EMBL/GenBank File Renderer

This script is an improved version of the previous one.

  #!/net/bin/perl
  # file: features2.pl

  use strict;
  use Bio::Graphics;
  use Bio::SeqIO;

  my $file = shift                       or die;
  my $io = Bio::SeqIO->new(-file=>$file) or die;
  my $seq = $io->next_seq                or die;

  my @features = $seq->all_SeqFeatures;

  # sort features by their primary tags
  my %sorted_features;
  for my $f (@features) {
    my $tag = $f->primary_tag;
    push @{$sorted_features{$tag}},$f;
  }

  my $whole_seq = Bio::SeqFeature::Generic->new(-start=>1,-end=>$seq->length);
  my $panel = Bio::Graphics::Panel->new(
                                        -segment   => $whole_seq,
                                        -key_style => 'between',
                                        -width     => 800,
                                        -pad_left  => 10,
                                        -pad_right => 10,
                                        );
  $panel->add_track($whole_seq,
                    -glyph => 'arrow',
                    -bump => 0,
                    -double=>1,
                    -tick => 2);

  $panel->add_track($whole_seq,
                    -glyph  => 'generic',
                    -bgcolor => 'blue',
                    -label  => 1,
                   );

  # special cases
  if ($sorted_features{CDS}) {
    $panel->add_track($sorted_features{CDS},
                      -glyph      => 'transcript2',
                      -bgcolor    => 'orange',
                      -fgcolor    => 'black',
                      -font2color => 'red',
                      -key        => 'CDS',
                      -bump       =>  +1,
                      -height     =>  12,
                      -label      => \&gene_label,
                      -description=> \&gene_description,
                     );
    delete $sorted_features{'CDS'};
  }

  if ($sorted_features{tRNA}) {
    $panel->add_track($sorted_features{tRNA},
                      -glyph     =>  'transcript2',
                      -bgcolor   =>  'red',
                      -fgcolor   =>  'black',
                      -font2color => 'red',
                      -key       => 'tRNAs',
                      -bump      =>  +1,
                      -height    =>  12,
                      -label     => \&gene_label,
                     );
    delete $sorted_features{tRNA};
  }

  # general case
  my @colors = qw(cyan orange blue purple green chartreuse magenta yellow aqua);
  my $idx    = 0;
  for my $tag (sort keys %sorted_features) {
    my $features = $sorted_features{$tag};
    $panel->add_track($features,
                      -glyph    =>  'generic',
                      -bgcolor  =>  $colors[$idx++ % @colors],
                      -fgcolor  => 'black',
                      -font2color => 'red',
                      -key      => "${tag}s",
                      -bump     => +1,
                      -height   => 8,
                      -description => \&generic_description
                     );
  }

  print $panel->png;
  exit 0;

  sub gene_label {
    my $feature = shift;
    my @notes;
    foreach (qw(product gene)) {
      next unless $feature->has_tag($_);
      @notes = $feature->each_tag_value($_);
      last;
    }
    $notes[0];
  }

  sub gene_description {
    my $feature = shift;
    my @notes;
    foreach (qw(note)) {
      next unless $feature->has_tag($_);
      @notes = $feature->each_tag_value($_);
      last;
    }
    return unless @notes;
    substr($notes[0],30) = '...' if length $notes[0] > 30;
    $notes[0];
  }

  sub generic_description {
    my $feature = shift;
    my $description;
    foreach ($feature->all_tags) {
      my @values = $feature->each_tag_value($_);
      $description .= $_ eq 'note' ? "@values" : "$_=@values; ";
    }
    $description =~ s/; $//; # get rid of last
    $description;
  }

Changes

  1. The CDS and tRNA features both get special treatment. They are given the "transcript2" glyph rather than the "generic" glyph.

  2. We define custom functions named gene_label(), gene_description() and generic_description(), and pass them to the -label and -description options when we create new tracks. These functions are called by Bio::Graphics to choose the appropriate label and description for each feature.

Result

Here's the new display.
(~) 51% features2.pl factor7.embl | display -


<< Previous
Contents >> Next >>

Lincoln D. Stein, lstein@cshl.org
Cold Spring Harbor Laboratory
Last modified: Wed Oct 22 22:41:12 EDT 2003