Boulder Data Interchange Format
Version 1.30: 12/13/2002
The Boulder data interchange format is an easily parseable
hierarchical tag/value format suitable for applications that need to
pipe the output of one program into the input of another. It was
originally developed for use in the Human Genome Project at the Whitehead Institute/MIT Center
for Genome Research, but has since found use in many other areas
including system administration and Web software development.
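To make "hierarchical tag/value format" concrete, here is a minimal sketch of a reader for a simplified Boulder-style stream. It is written in Python only for illustration and is not the Boulder library's parser. The conventions it assumes are not spelled out on this page and should be checked against the Boulder::Stream documentation: one TAG=VALUE pair per line, a line containing only "=" ends a record, "TAG={" opens a nested record, and "}" closes it. The function name read_records and the sample data are invented for this example.

```python
import io


def read_records(lines):
    """Yield one dict per record from an iterable of text lines.

    Assumed, simplified conventions (not taken from this page):
      TAG=VALUE   one pair per line
      =           a line holding only "=" ends the current record
      TAG={       opens a nested record stored under TAG
      }           closes the innermost nested record
    Every tag maps to a list, so repeated tags keep all their values.
    """
    stack = [{}]  # stack[-1] is the record currently being filled
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line == "=":
            yield stack[0]  # top-level record is complete
            stack = [{}]
        elif line == "}":
            if len(stack) == 1:
                raise ValueError("unbalanced '}' in input")
            stack.pop()  # return to the enclosing record
        else:
            # A line without "=" is kept as a tag with an empty value.
            tag, _, value = line.partition("=")
            if value == "{":
                child = {}
                stack[-1].setdefault(tag, []).append(child)
                stack.append(child)
            else:
                stack[-1].setdefault(tag, []).append(value)
    if stack[0]:
        yield stack[0]  # tolerate a final record with no "=" terminator


# Invented sample data: two records, the first with a nested sub-record.
SAMPLE = """\
Name=sts_001
Alias=marker_a
Alias=marker_b
Primers={
  Left=GATTACA
  Right=TGTAATC
}
=
Name=sts_002
=
"""

records = list(read_records(io.StringIO(SAMPLE)))
```

Because each record is complete as soon as its "=" line arrives, a reader like this can hand records on one at a time, which is what makes the format convenient for piping one program into another.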
In addition to its use as a data interchange format, Boulder comes
complete with a small database based on the Perl DB_File
modules. This database allows you to store arbitrarily complex
objects, index them, and later retrieve them using a simple query
mechanism.
Boulder is available as Perl or Java libraries. To find out more
about Boulder, you can read its manual page, or you can download the
distribution, which is available free of charge under the same terms
as Perl.
Documentation
Boulder documentation derived from the POD pages is available online.
Main Classes
Documentation is available for the main classes: Boulder Introduction, Stone, Boulder::Stream, Boulder::XML, and Boulder::Store.
Biological Subclasses of Boulder
The Boulder library has been subclassed to provide access to
specialized data types. There are currently six specialized classes
in the distribution:
- Boulder::Blast, specialized
for processing and parsing BLAST reports.
- Boulder::Genbank,
specialized for retrieving, parsing, and manipulating NCBI
Genbank entries.
- Boulder::Medline,
specialized for retrieving, parsing, and manipulating NCBI
Medline entries.
- Boulder::Swissprot,
specialized for retrieving, parsing, and manipulating Swissprot/TREMBL
entries.
- Boulder::Unigene,
specialized for retrieving, parsing, and manipulating NCBI
Unigene files.
- Boulder::Omim,
specialized for retrieving, parsing, and manipulating OMIM
(Online Mendelian Inheritance in Man) files.
Boulder::Genbank takes advantage of the Yank program, developed by William Fitzhugh of the
Whitehead Institute, if it is installed.
Applications Built on Boulder
Several standalone applications have been built on top of the Boulder
library. One is the Primer 3 PCR primer-picking program, which is
available for download at the Whitehead/MIT Center for Genome Research.
Others are currently part of the Boulder distribution and can be found
in the eg/ subdirectory once the distribution is unpacked:
- quickblast, a script for rapidly
BLASTing one FASTA file against another in true M X N fashion.
- gb_search, a scriptable
command-line interface to NCBI's Entrez.
- gb_get, a utility for fetching,
parsing and processing Genbank/EMBL entries from local and remote
databases.
ChangeLog
- Version 1.27
- Boulder::Genbank should be working now.
- Fixes from Michael Peterson to robustify BLAST parsing.
- Version 1.26
- Partial fixes for Boulder::Genbank to work with the changed NCBI batch entrez,
but Entry queries don't seem to be working due to problems at NCBI's end.
- Version 1.24
- Fixes from Lester Hui to correct occasional parsing problems in feature
table of Boulder::Genbank.
- Boulder::Genbank now handles aberrant genomic sequence entries that
indicate gaps using whitespace.
- Version 1.21
- Fixed problem in Boulder::Genbank which caused get() not to return
undef at end of stream.
- Version 1.20
- Improved speed.
- Boulder::Medline should work now.
- Boulder::Store queries should work with heterogeneous Stone types.
- Version 1.15
- Fixed parameter bug in File accessor for Boulder::Genbank.
- Documented problems with flock() across NFS filesystems.
- Boulder::Genbank no longer "eats" the list of accession numbers
passed to it.
Lincoln D. Stein, lstein@cshl.org
Cold Spring Harbor
Laboratory
Last modified: Mon Apr 23 10:49:04 EDT 2001