
Boulder Data Interchange Format

Version 1.30: 12/13/2002

The Boulder data interchange format is an easily parseable hierarchical tag/value format suitable for applications that need to pipe the output of one program into the input of another. It was originally developed for use in the human genome project at the Whitehead Institute/MIT Center for Genome Research, but has since found use in many other areas including system administration and Web software development.
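To make the "hierarchical tag/value" idea concrete, here is a minimal Python sketch of a reader for a Boulder-style stream. It is not the Boulder library and does not use its API. The syntax it accepts is an assumption based on the conventions commonly described for the format: one tag=value pair per line, a line containing only "=" to end a record, and "Tag={" ... "}" to nest a sub-record. The function name read_records and the sample data are invented for illustration.

```python
import io


def read_records(stream):
    """Yield one dict per record from a Boulder-style text stream.

    Each dict maps a tag to a list, because a tag may repeat.
    List items are strings, or dicts for nested sub-records.
    """
    root = {}        # record currently being filled
    stack = [root]   # open records, innermost last
    for raw in stream:
        line = raw.strip()
        if not line:
            continue
        if line == "=":                      # assumed end-of-record marker
            if len(stack) != 1:
                raise ValueError("record ended inside an unclosed '{'")
            yield root
            root = {}
            stack = [root]
        elif line == "}":                    # assumed end of a sub-record
            if len(stack) == 1:
                raise ValueError("unmatched '}'")
            stack.pop()
        else:
            tag, sep, value = line.partition("=")
            if not sep:
                raise ValueError(f"expected tag=value, got {line!r}")
            if value == "{":                 # assumed start of a sub-record
                child = {}
                stack[-1].setdefault(tag, []).append(child)
                stack.append(child)
            else:
                stack[-1].setdefault(tag, []).append(value)
    if len(stack) != 1:
        raise ValueError("stream ended inside an unclosed '{'")
    if root:                                 # final record without a terminator
        yield root


# Invented sample data: two records, the first with a nested sub-record.
SAMPLE = """\
Name=Example User
Alias=euser
Alias=example
Privileges={
  Browse=yes
  Upload=no
}
=
Name=Another User
Alias=auser
=
"""

records = list(read_records(io.StringIO(SAMPLE)))
for record in records:
    print(record["Name"][0], record["Alias"])
```

Because the reader consumes any iterable of lines, passing sys.stdin instead of the StringIO object would let the same function sit in the middle of a pipeline, which is the use case the format was designed for.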

In addition to its use as a data interchange format, Boulder comes complete with a small database based on the Perl DB_File modules. This database allows you to store arbitrarily complex objects, index them, and later retrieve them using a simple query mechanism.
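The store, index, and query cycle can be pictured with a small conceptual sketch in Python. This is not Boulder::Store and does not reflect its API or its DB_File-backed storage: the class TinyStore, its methods, and the sample records are all hypothetical, and everything is kept in memory. Records are assumed to be dicts that map a tag to a list of values.

```python
from collections import defaultdict


class TinyStore:
    """Hypothetical in-memory stand-in; not the Boulder::Store API."""

    def __init__(self, indexed_tags):
        self.records = {}                      # record id -> record dict
        self.indexed_tags = set(indexed_tags)  # tags that get an index
        self.index = defaultdict(set)          # (tag, value) -> record ids
        self.next_id = 0

    def put(self, record):
        """Store a record and update the index for each indexed tag."""
        rid = self.next_id
        self.next_id += 1
        self.records[rid] = record
        for tag in self.indexed_tags:
            for value in record.get(tag, []):
                if isinstance(value, str):     # only plain values are indexed
                    self.index[(tag, value)].add(rid)
        return rid

    def query(self, tag, value):
        """Return, in insertion order, the records whose tag holds value."""
        if tag in self.indexed_tags:
            ids = self.index.get((tag, value), set())
            return [self.records[i] for i in sorted(ids)]
        # Unindexed tag: fall back to scanning every record.
        return [r for _, r in sorted(self.records.items())
                if value in r.get(tag, [])]


# Invented sample records.
store = TinyStore(indexed_tags=["Name"])
store.put({"Name": ["alice"], "Group": ["staff"]})
store.put({"Name": ["bob"], "Group": ["staff", "admin"]})

by_name = store.query("Name", "alice")    # answered from the index
admins = store.query("Group", "admin")    # answered by a full scan
print(by_name, admins)
```

The sketch shows why indexing matters: a query on an indexed tag is a direct lookup, while a query on any other tag has to examine every stored record.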

Boulder is available as Perl or Java libraries. To find out more about Boulder, read its manual page, or download the distribution, which is available free of charge under the same terms as Perl.

Documentation

Boulder documentation derived from the POD pages is available online.

Main Classes

Documentation is available for the main classes: Boulder Introduction, Stone, Boulder::Stream, Boulder::XML, and Boulder::Store.

Biological Subclasses of Boulder

The Boulder library has been subclassed to provide access to specialized data types. There are currently six specialized classes in the distribution:

  1. Boulder::Blast, specialized for processing and parsing BLAST reports.
  2. Boulder::Genbank, specialized for retrieving, parsing, and manipulating NCBI Genbank entries.
  3. Boulder::Medline, specialized for retrieving, parsing, and manipulating NCBI Medline entries.
  4. Boulder::Swissprot, specialized for retrieving, parsing, and manipulating Swissprot/TREMBL entries.
  5. Boulder::Unigene, specialized for retrieving, parsing, and manipulating NCBI Unigene files.
  6. Boulder::Omim, specialized for retrieving, parsing, and manipulating OMIM (Online Mendelian Inheritance in Man) files.

If installed, Boulder::Genbank takes advantage of the Yank program, developed by William Fitzhugh of the Whitehead Institute.

Applications Built on Boulder

Several standalone applications have been built on top of the Boulder library. One is the Primer 3 PCR primer-picking program, which is available for download at the Whitehead/MIT Center for Genome Research. Others are currently part of the Boulder distribution and can be found in the eg/ subdirectory once the distribution is unpacked.

ChangeLog

Version 1.27
Boulder::Genbank should be working now.
Fixes from Michael Peterson to make BLAST parsing more robust.

Version 1.26
Partial fixes for Boulder::Genbank to work with the changed NCBI batch entrez, but Entry queries don't seem to be working due to problems at NCBI's end.

Version 1.24
Fixes from Lester Hui to correct occasional parsing problems in the feature table of Boulder::Genbank.
Boulder::Genbank now handles aberrant genomic sequence entries that indicate gaps using whitespace.

Version 1.21
Fixed a problem in Boulder::Genbank that caused get() not to return undef at end of stream.

Version 1.20
Improved speed.
Boulder::Medline should work now.
Boulder::Store queries should work with heterogeneous Stone types.

Version 1.15
Fixed parameter bug in File accessor for Boulder::Genbank.
Documented problems with flock() across NFS filesystems.
Boulder::Genbank no longer "eats" the list of accession numbers passed to it.

Lincoln D. Stein, lstein@cshl.org
Cold Spring Harbor Laboratory

Last modified: Mon Apr 23 10:49:04 EDT 2001