This document provides information on configuring the Generic Genome Browser (GBrowse), part of the Generic Model Organism Systems Database Project (GMOD; http://www.gmod.org).

* Table of Contents

  A. CREATING NEW DATABASES FROM SCRATCH
     A1. The GFF file format
     A2. Creating a GFF table
     A3. Identifying the reference sequence
     A4. Sequence alignments
     A5. Aggregators
  B. ADDING A NEW DATABASE TO THE BROWSER
     B1. The [GENERAL] Section
     B2. Track Sections
     B3. Glyphs and Glyph options
     B4. Adding features to the overview
     B5. Semantic Zooming
     B6. Computed Options
     B7. Declaring New Aggregators
     B8. Grouping related features
  C. ADDING HISTOGRAMS
  D. INTERNATIONALIZATION
  E. DISPLAYING GENETIC AND RH MAPS
  F. CHANGING THE LOCATION OF THE CONFIGURATION FILES
  G. FURTHER INFORMATION

A. CREATING NEW DATABASES FROM SCRATCH

This section describes how to create new annotation databases from scratch.

A1. The GFF file format

GBrowse is based around the GFF file format, which stands for "Gene Finding Format" and was invented at the Sanger Centre. The GFF format is a flat tab-delimited file, each line of which corresponds to an annotation, or feature. Each line has nine columns and looks like this:

    Chr1  curated  CDS  365647  365963  .  +  1  Transcript "R119.7"

The 9 columns are as follows:

 1. reference sequence

    This is the ID of the sequence that is used to establish the coordinate system of the annotation. In the example above, the reference sequence is "Chr1".

 2. source

    The source of the annotation. This field describes how the annotation was derived. In the example above, the source is "curated" to indicate that the feature is the result of human curation. The names and versions of software programs are often used for the source field, as in "tRNAScan-SE/1.2".

 3. method

    The annotation method. This field describes the type of the annotation, such as "CDS". Together the method and source describe the annotation type.

 4. start position

    The start of the annotation relative to the reference sequence.

 5. stop position

    The stop of the annotation relative to the reference sequence. Start is always less than or equal to stop.

 6. score

    For annotations that are associated with a numeric score (for example, a sequence similarity), this field describes the score. The score units are completely unspecified, but for sequence similarities, it is typically percent identity. Annotations that don't have a score can use "."

 7. strand

    For those annotations which are strand-specific, this field is the strand on which the annotation resides. It is "+" for the forward strand, "-" for the reverse strand, or "." for annotations that are not stranded.

 8. phase

    For annotations that are linked to proteins, this field describes the phase of the annotation on the codons. It is a number from 0 to 2, or "." for features that have no phase.

 9. group

    GFF provides a simple way of generating annotation hierarchies ("is composed of" relationships) by providing a group field. The group field contains the class and ID of an annotation which is the logical parent of the current one. In the example given above, the group is the Transcript named "R119.7".

    The group field is also used to store information about the target of sequence similarity hits, and miscellaneous notes. See the next section for a description of how to describe similarity targets.

The sequences used to establish the coordinate system for annotations can correspond to sequenced clones, clone fragments, contigs or super-contigs.

In addition to a group ID, the GFF format allows annotations to have a group class. This makes sure that all groups are unique even if they happen to share the same name. For example, you can have a GenBank accession named AP001234 and a clone named AP001234 and distinguish between them by giving the first one a class of Accession and the second a class of Clone.

You should use double-quotes around the group name or class if it contains white space.

----

A2. Creating a GFF table

The first 8 fields of the GFF format are easy to understand. The group field is a challenge. It is used in three distinct ways:

  a. to group together a single sequence feature that spans a discontinuous range, such as a gapped alignment.

  b. to name a feature, allowing it to be retrieved by name.

  c. to add one or more notes to the annotation.

* Using the Group field for simple features

For a simple feature that spans a single continuous range, choose a name and class for the object and give it a line in the GFF file that refers to its start and stop positions.

    Chr3  giemsa  heterochromatin  4500000  6000000  .  .  .  Band 3q12.1

* Using the Group field to group features that belong together

For a group of features that belong together, such as the exons in a transcript, choose a name and class for the object. Give each segment a separate line in the GFF file but use the same name for each line. For example:

    IV  curated  exon  5506900  5506996  .  +  .  Transcript B0273.1
    IV  curated  exon  5506026  5506382  .  +  .  Transcript B0273.1
    IV  curated  exon  5506558  5506660  .  +  .  Transcript B0273.1
    IV  curated  exon  5506738  5506852  .  +  .  Transcript B0273.1

These four lines refer to a biological object of class "Transcript" and name B0273.1. Each of its parts uses the method "exon", source "curated". Once loaded, the user will be able to search the genome for this object by asking the browser to retrieve "Transcript:B0273.1". The browser can also be configured to allow the Transcript: prefix to be omitted.

You can extend the idiom for objects that have heterogeneous parts, such as a transcript that has 5' and 3' UTRs:

    IV  curated  mRNA   5506800  5508917  .  +  .  Transcript B0273.1; Note "Zn-Finger"
    IV  curated  5'UTR  5506800  5508999  .  +  .  Transcript B0273.1
    IV  curated  exon   5506900  5506996  .  +  .  Transcript B0273.1
    IV  curated  exon   5506026  5506382  .  +  .  Transcript B0273.1
    IV  curated  exon   5506558  5506660  .  +  .  Transcript B0273.1
    IV  curated  exon   5506738  5506852  .  +  .  Transcript B0273.1
    IV  curated  3'UTR  5506852  5508917  .  +  .  Transcript B0273.1

In this example, there is a single feature with method "mRNA" that spans the entire range. It is grouped with subparts of type 5'UTR, 3'UTR and exon. They are all grouped together into a Transcript named B0273.1. Furthermore the mRNA feature has a note attached to it.

*NOTE* The subparts of a feature are in absolute (chromosomal or contig) coordinates. It is not currently possible to define a feature in absolute coordinates and then to load its subparts using coordinates that are relative to the start of the feature.

Some annotations do not need to be individually named. For example, it is probably not useful to assign a unique name to each ALU repeat in a vertebrate genome. For these, just leave the Group field empty.

* Using the Group field to add a note

The group field can be used to add one or more notes to an annotation. To do this, place a semicolon after the group name and add a Note field:

    Chr3  giemsa  heterochromatin  4500000  6000000  .  .  .  Band 3q12.1 ; Note "Marfan's syndrome"

You can add multiple Notes. Just separate them by semicolons:

    Band 3q12.1 ; Note "Marfan's syndrome" ; Note "dystrophic dysplasia"

The Note should come AFTER the group type and name.

----

A3. Identifying the reference sequence

Each reference sequence in the GFF table must itself have an entry. This is necessary so that the length of the reference sequence is known. For example, if "Chr1" is used as a reference sequence, then the GFF file should have an entry for it similar to this one:

    Chr1  assembly  chromosome  1  14972282  .  +  .  Sequence Chr1

This indicates that the reference sequence named "Chr1" has length 14972282 bp, method "chromosome" and source "assembly". In addition, as indicated by the group field, Chr1 has class "Sequence" and name "Chr1".
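
The nine-column layout, the group field and the reference sequence rule described in sections A1 to A3 above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the loader that GBrowse or Bio::DB::GFF uses: the names Feature, parse_group, parse_gff_line and missing_references are hypothetical, only the "Class Name" and Note forms of the group field are handled, and a reference sequence is counted as declared when some line on it names it in the group field.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Tokens of the group column: quoted strings, semicolons, or other words.
_TOKEN = re.compile(r'"[^"]*"|;|[^\s;]+')


@dataclass
class Feature:
    """One GFF line (hypothetical container, for illustration only)."""
    ref: str
    source: str
    method: str
    start: int
    stop: int
    score: Optional[float]
    strand: Optional[str]
    phase: Optional[int]
    group_class: Optional[str] = None
    group_name: Optional[str] = None
    notes: List[str] = field(default_factory=list)


def parse_group(group: str) -> Tuple[Optional[str], Optional[str], List[str]]:
    """Split the ninth column into (class, name, notes)."""
    group_class = group_name = None
    notes: List[str] = []
    clause: List[str] = []
    # A trailing ";" is appended so the last clause is flushed too.
    for tok in _TOKEN.findall(group) + [";"]:
        if tok != ";":
            clause.append(tok.strip('"'))
            continue
        if len(clause) >= 2:
            tag, value = clause[0], " ".join(clause[1:])
            if tag == "Note":
                notes.append(value)
            elif group_class is None:
                group_class, group_name = tag, value
        clause = []
    return group_class, group_name, notes


def parse_gff_line(line: str) -> Feature:
    """Parse one tab-delimited GFF line into a Feature."""
    cols = line.rstrip("\r\n").split("\t")
    if len(cols) == 8:  # the group column may be left empty
        cols.append("")
    if len(cols) != 9:
        raise ValueError(f"expected 9 columns, got {len(cols)}")
    ref, source, method, start, stop, score, strand, phase, group = cols
    if int(start) > int(stop):
        raise ValueError("start must be less than or equal to stop")
    group_class, group_name, notes = parse_group(group)
    return Feature(
        ref, source, method, int(start), int(stop),
        None if score == "." else float(score),
        None if strand == "." else strand,
        None if phase == "." else int(phase),
        group_class, group_name, notes,
    )


def missing_references(features: List[Feature]) -> List[str]:
    """Reference sequences (column 1) that have no entry of their own."""
    declared = {f.group_name for f in features if f.ref == f.group_name}
    return sorted({f.ref for f in features} - declared)


if __name__ == "__main__":
    lines = [
        "Chr1\tassembly\tchromosome\t1\t14972282\t.\t+\t.\tSequence Chr1",
        'Chr1\tcurated\tCDS\t365647\t365963\t.\t+\t1\tTranscript "R119.7"',
        "Chr3\tgiemsa\theterochromatin\t4500000\t6000000\t.\t.\t.\tBand 3q12.1",
    ]
    feats = [parse_gff_line(l) for l in lines]
    print(missing_references(feats))  # Chr3 has no entry of its own
```

A check like missing_references can be run over a GFF file before loading it, to catch reference sequences whose length would otherwise be unknown.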
It is suggested that you use "Sequence" as the class name for all reference sequences, since this is the default class used by the Bio::DB::GFF module when no more specific class is requested.

----

A4. Sequence alignments

There are several cases in which an annotation indicates the relationship between two sequences. One common one is a similarity hit, where the annotation indicates an alignment. A second common case is a map assembly, in which the annotation indicates that a portion of a larger sequence is built up from one or more smaller ones.

Both cases are indicated by using the Target tag in the group field. For example, a typical similarity hit will look like this:

    Chr1  BLASTX  similarity  76953  77108  132  +  0  Target Protein:SW:ABL_DROME 493 544

Here, the group field contains the Target tag, followed by an identifier for the biological object. The GFF format uses the notation Class:Name for the biological object, and even though this is stylistically inconsistent, that's the way it's done. The object identifier is followed by two integers indicating the start and stop of the alignment on the target sequence.

Unlike the main start and stop columns, it is possible for the target start to be greater than the target end. The previous example indicates that the section of Chr1 from 76,953 to 77,108 aligns to the protein SW:ABL_DROME starting at position 493 and extending to position 544.

A similar notation is used for sequence assembly information as shown in this example:

    Chr1         assembly  Link    10922906  11177731  .  .  .  Target Sequence:LINK_H06O01 1 254826
    LINK_H06O01  assembly  Cosmid  32386     64122     .  .  .  Target Sequence:F49B2 6 31742

This indicates that the region between bases 10922906 and 11177731 of Chr1 is composed of LINK_H06O01 from bp 1 to bp 254826. The region of LINK_H06O01 between 32386 and 64122 is, in turn, composed of bases 6 to 31742 of cosmid F49B2.

----

A6. Loading the GFF file into the database

Use the BioPerl script utilities bulk_load_gff.pl, load_gff.pl or (if you are brave) fast_load_gff.pl to load the GFF file into the database. For example, if your database is a MySQL database on the local host named "dicty", you can load it into an empty database using bulk_load_gff.pl like this:

    bulk_load_gff.pl -c -d dicty my_data.gff

To update existing databases, use either load_gff.pl or fast_load_gff.pl. The latter is somewhat experimental, so use with care.

----

A5. Aggregators

The Bio::DB::GFF module has a feature known as "aggregators". These are small software packages that recognize certain common feature types and convert them into complex biological objects. These aggregators make it possible to develop intelligent graphical representations of annotations, such as a gene that draws confirmed exons differently from predicted ones.

An aggregator typically creates a new composite feature with a different method than any of its components. For example, the standard "alignment" aggregator takes multiple alignments of method "similarity", groups them by their name, and returns a single feature of method "alignment".

The various aggregators are described in detail in the Bio::DB::GFF manual page. It is easy to write new aggregators, and also possible to define aggregators on the fly in the gbrowse configuration file. It is suggested that you use the sample GFF files from the yeast, drosophila and C. elegans projects to see what methods to use to achieve the desired results.

----------------------------------------------------------------------

B. ADDING A NEW DATABASE TO THE BROWSER

Each data source has a corresponding configuration file in the directory gbrowse.conf. Once you've created and loaded a new database, you should make a copy of one of the existing configuration files and modify it to meet your needs.
The name of the new configuration file must follow the form:

    sourcename.conf

where "sourcename" is a short word that describes the data source. You can use this name to select the data source when linking to the browser. Just provide a source= CGI argument, as in:

    http://your.site.org/cgi-bin/gbrowse?source=sourcename

It is suggested that you use the same name as the database, although this isn't a requirement. (If no "source=" argument is given, gbrowse picks the first configuration file that occurs alphabetically; you can control this by placing numbers in front of the configuration file, as in "01.yeast.conf".)

The configuration file is divided into a number of sections, each one introduced by a [SECTION TITLE]. The [GENERAL] section contains settings that are applicable to the entire application. Other sections define tracks to display.

I suggest that you begin with one of the example configuration files provided with the distribution and modify it to suit your needs.

---

B1. The [GENERAL] Section

The [GENERAL] section consists of a series of name=value options. For example, the beginning of the yeast.conf sample configuration file looks like this:

    [GENERAL]
    description = S. cerevisiae (via SGD Nov 2001)
    db_adaptor  = Bio::DB::GFF
    db_args     = -dsn dbi:mysql:database=yeast;host=localhost
    aggregators = transcript alignment
    user        =
    passwd      =

Each option is a single word or phrase, usually in lower case. This is followed by an equals sign and the value of the option. You can add whitespace around the equals sign in order to increase readability. If a value is very long, you can continue it on additional lines provided that you put a tab or other whitespace on the continuation lines. For example:

    description = S. cerevisiae annotations via SGD Nov 2001, and
                  converted using the process_sgd.pl script

Any lines that begin with a pound sign (#) are considered comments and ignored.
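
The file layout rules just described (bracketed section titles, name=value options, whitespace-led continuation lines, and # comments) can be sketched in Python as follows. This is a minimal illustration, not GBrowse's own parser, and the function name parse_conf is hypothetical. It assumes that a value is split from its name at the first equals sign, so that values such as a -dsn string may themselves contain "=", and that continuation lines are joined to the value with a single space.

```python
from typing import Dict, Optional


def parse_conf(text: str) -> Dict[str, Dict[str, str]]:
    """Parse configuration text into {section: {option: value}}."""
    sections: Dict[str, Dict[str, str]] = {}
    section: Optional[str] = None  # section currently being filled
    option: Optional[str] = None   # last option seen, for continuations
    for raw in text.splitlines():
        # Comment lines begin with a pound sign; blank lines are skipped.
        if raw.startswith("#") or not raw.strip():
            continue
        # A line that begins with whitespace continues the previous value.
        if raw[0].isspace():
            if section is None or option is None:
                raise ValueError(f"continuation without an option: {raw!r}")
            joined = sections[section][option] + " " + raw.strip()
            sections[section][option] = joined.strip()
            continue
        line = raw.strip()
        # [SECTION TITLE] introduces a new section.
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip()
            sections.setdefault(section, {})
            option = None
            continue
        if section is None or "=" not in line:
            raise ValueError(f"cannot parse line: {raw!r}")
        # Split at the first "=" only; option names may contain spaces.
        name, value = line.split("=", 1)
        option = name.strip()
        sections[section][option] = value.strip()
    return sections


if __name__ == "__main__":
    sample = (
        "[GENERAL]\n"
        "description = S. cerevisiae annotations via SGD Nov 2001, and\n"
        "    converted using the process_sgd.pl script\n"
        "db_args = -dsn dbi:mysql:database=yeast;host=localhost\n"
        "# this line is ignored\n"
        "\n"
        "[Genes]\n"
        "feature = gene:sgd\n"
    )
    for name, options in parse_conf(sample).items():
        print(name, options)
```

Splitting at the first equals sign matters for options such as db_args, whose value in the sample above contains two further equals signs.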
During this discussion, you might want to follow along with one of the example configuration files. The following [GENERAL] options are recognized:

* description

  The description of the database. This will appear in the popup menu that allows users to select the data source and in the header of the page. Don't make it as long as the previous example! (You will want to change this.)

* db_adaptor

  Tells GBrowse what database adaptor to use. By using different adaptors you can attach gbrowse to a variety of different databases. Currently the only stable adaptor you can use is Bio::DB::GFF, which is a standard set of adaptors contained in Bioperl.

* db_args

  Arguments to pass to the adaptor for it to use when making a database connection. The exact format will depend on the adaptor you're using. For Bio::DB::GFF running on top of a MySQL database use a db_args like the following:

      db_args = -dsn dbi:mysql:database=<db>;host=<host>

  replacing <db> and <host> with the database and database host of your choice. If the database requires you to log in with a user name and password, use the following db_args:

      db_args = -dsn dbi:mysql:database=<db>;host=<host> -adaptor dbi::mysql -user <username> -pass <password>

  replacing <username> and <password> with the appropriate values. In the example configuration files, we use a username of "nobody" and an empty password. This is appropriate if the database is configured to allow "nobody" to log in from the local machine without using a password.

  To use the Oracle version of Bio::DB::GFF, use these arguments:

      db_args = -adaptor oracle -dsn dbi:oracle:database=db_service

  where db_service should be replaced with the name of the desired database service definition. See the documentation for the Perl DBD::Oracle database driver for more information about the -dsn format.
  To use the in-memory version of Bio::DB::GFF, use these arguments:

      db_args = -adaptor memory -file /path/to/gff_file.gff -fasta /path/to/fasta_file.fa

  where gff_file.gff and fasta_file.fa correspond to the locations of GFF and FASTA files containing the features and DNA, respectively.

* aggregators

  This option is only valid when used with Bio::DB::GFF adaptors, and lists one or more aggregators to use for complex features. It is possible to declare your own aggregator here using a special syntax described in "B7. Declaring New Aggregators."

  To disable the default aggregators, leave this setting blank, as in:

      aggregators=

  To activate the default aggregators of "transcript," "clone," and "alignment," comment this setting out entirely:

      # aggregators =

* user

  The user name for the gbrowse script to log in under if you are not using "nobody". This is exactly the same as providing the -user option to db_args.

* pass

  The password to use if the database is password protected. This is the same as providing the -pass option to db_args.

* stylesheet

  Location of the stylesheet used to create the GBrowse look and feel. (You probably will not need to change this.)

* plugins

  This is a list of plugins that you want to be available from gbrowse. Plugins are a way for third-party developers to add functionality to gbrowse without changing its core source code. Plugins are stored in the gbrowse configuration directory under a subdirectory named "plugins." A good standard list of plugins is:

      plugins = SequenceDumper FastaDumper RestrictionAnnotator

  See the contents of conf/plugins and contrib/plugins for more plugins that you can install.

* plugin_path

  By default gbrowse searches for plugins in its standard location of conf/plugins. You can store plugins in a non-standard location by providing this option with a space-delimited list of additional directories to search in.

* buttons

  URL in which the various graphical buttons used by GBrowse are located.
  (You will probably not need to change this.)

* tmpimages

  URL of a writable directory in which GBrowse can write its temporary images. (You will probably not need to change this.)

* glyph, height, bgcolor, fgcolor, strand_arrow

  These options control the default graphical settings for any annotation types that are not explicitly specified. See the section below on controlling the settings. Likewise, any other graphical options found in the [GENERAL] section are treated as defaults.

* label density

  When there are too many annotations on the screen GBrowse automatically disables the printing of identifying labels next to the feature. "label density" controls where the cutoff occurs. The value in the example files is 25, meaning that labels will be turned off when there are more than 25 annotations of a particular type on display at once.

* bump density

  When there are too many annotations on the screen GBrowse automatically disables collision control. The "bump density" option controls where the cutoff occurs. The value in the example files is 100, meaning that when there are more than 100 annotations of the same type on the display, the browser will stop shifting them vertically to prevent them from colliding, but will instead allow them to overlap.

* link

  The link option creates a default rule for creating outgoing links from the GBrowse display. When the user clicks on a feature of interest, they will be taken to the corresponding URL.

  The link option's value should be a URL containing one or more variables. Variables begin with a dollar sign ($), and are replaced at run time with the information relating to the selected annotation.
  Recognized variables include:

      $name      The feature's name (group name)
      $class     The feature's class (group class)
      $method    The feature's method
      $source    The feature's source
      $ref       The name of the sequence segment (chromosome, contig) on which this feature is located
      $start     The start position of this feature, relative to $ref
      $end       The end position of this feature, relative to $ref
      $segstart  The left end of $ref displayed in the detailed view
      $segend    The right end of $ref displayed in the detailed view

  For example, the wormbase.conf file uses this link rule:

      link = http://www.wormbase.org/db/get?name=$name;class=$class

  At run time, if the user clicks on an EST named yk1234.5, this will generate the URL

      http://www.wormbase.org/db/get?name=yk1234.5;class=EST

  It is possible to override the global link rule on a feature-by-feature basis. See the next section for details on this. It is also possible to declare a subroutine to compute the proper URL dynamically. See COMPUTED OPTIONS for details.

* link_target

  By default links will replace the contents of the current window. If you wish, you can specify a new window to pop up when the user clicks on a feature, or designate a named window or frame to receive the contents of the link. To do this, add the "link_target" option to the general section or to a track stanza. The format is this:

      link_target = _blank

  The value uses the HTML targeting rules to name/create the window to receive the value of the link. The first time the link is accessed, a window with the specified name is created. The next time the user clicks on a link with the same target, that window will receive the content of the link if it is still present, or it will be created again if it has been closed. A target named "_blank" is special and will always create a new window.

  The "link_target" option can also be computed dynamically. See COMPUTED OPTIONS for details.
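
The variable substitution used by the link rule can be sketched in Python as follows. This is an illustration only and not GBrowse's implementation: the function expand_link and the dictionary it takes are hypothetical, variables that are not in the recognized list are left untouched, and no URL escaping is applied (whether GBrowse escapes the substituted values is not stated here).

```python
import re
from typing import Mapping

# Variables named in the list of recognized link variables.
LINK_VARIABLES = (
    "name", "class", "method", "source", "ref",
    "start", "end", "segstart", "segend",
)

# A dollar sign followed by a run of word characters, e.g. $name or $segstart.
_VARIABLE = re.compile(r"\$(\w+)")


def expand_link(rule: str, feature: Mapping[str, object]) -> str:
    """Replace $variables in a link rule with values from `feature`."""
    def substitute(match: "re.Match[str]") -> str:
        var = match.group(1)
        if var in LINK_VARIABLES and var in feature:
            return str(feature[var])
        return match.group(0)  # unknown or missing: leave the text as is
    return _VARIABLE.sub(substitute, rule)


if __name__ == "__main__":
    rule = "http://www.wormbase.org/db/get?name=$name;class=$class"
    print(expand_link(rule, {"name": "yk1234.5", "class": "EST"}))
```

Because the pattern consumes the whole run of word characters after the dollar sign, $segstart is looked up as "segstart" rather than being mistaken for a shorter variable name.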
* title

  The title option controls the "tooltips" text that pops up when the mouse hovers over a glyph in certain browsers. The rules for generating titles are the same as the "link" option discussed above. The "title" option can also be computed dynamically. See COMPUTED OPTIONS for details.

* image widths

  The image widths option controls the set of image sizes to offer the user. Its value is a space-delimited list of pixel widths. The default is probably fine. Note that the height of the image depends on the number of tracks and features, and cannot be controlled.

* default width

  The default width is the image width to start off with when the user invokes the browser for the first time. The default is 800.

* default features

  The default features option is a space-delimited list of tracks to turn on by default. You will probably need to change this. For example:

      default features = Genes ORFs tRNAs

* reference class

  gbrowse needs to know the class of the reference sequences that other features are placed on. The default is Sequence. If you want to use another class, such as Contig, please indicate the class here (if you don't, certain features such as the keyword search will fail):

      reference class = Contig

* max segment

  The max segment option sets an upper bound on the maximum size DNA segment that will be displayed on the detailed view. Its value is in base pairs. Above this limit, the user will be prompted to select a smaller region on the birds-eye view. You will probably want to adjust this.

* default segment

  The default segment option sets the width of the segment (bp) that will be displayed when the user clicks on the birds-eye view without previously having set a desired magnification. You may want to adjust this value.

* zoom levels

  GBrowse allows unlimited zoom levels. This option selects the width of each level, in bp.
  For example:

      zoom levels = 1000 2000 5000 10000 20000 40000 100000 200000

* keyword search max

  By default, gbrowse will limit the number of keyword search results to 1,000. The order in which the 1,000 hits are returned depends on how the database was loaded, and so you may see odd patterns, such as only hits on a particular chromosome being displayed. To raise the limit on keyword search results, set "keyword search max" to the desired maximum value.

* overview units

  This option controls the units that will be used on the scale for the birds-eye view display. Possible values are "bp" (base pairs), "k" (kilobases), "M" (megabases), and "G" (gigabases). If this option is omitted, the browser will guess the most appropriate unit.

* overview bgcolor

  This is the color for the background of the birds-eye view.

* detailed bgcolor

  This is the color for the background of the detailed view.

* header

  This is a header to print at the top of the browser page. It is any valid HTML, and can span multiple lines provided that the continuation lines begin with white space.

  It is also possible to place an anonymous Perl subroutine here. The code will be invoked during preparation of the page and must return a string value to use as the header. See COMPUTED OPTIONS for details.

  Example:

      header =
          Welcome to the Volvox Sequence Page
* footer

  This is a footer to print at the bottom of the browser page. It is any valid HTML, and can span multiple lines provided that the continuation lines begin with white space.

  It is also possible to place an anonymous Perl subroutine here. The code will be invoked during preparation of the page and must return a string value to use as the footer. See COMPUTED OPTIONS for details.

  Example:

      footer =
          For the source code for this browser, see the Generic Model Organism Database Project. For other questions, send mail to lstein@cshl.org.
* examples

  You can provide GBrowse with some canned examples of "interesting regions" for the user to click on. The examples option, if present, provides a space-delimited list of interesting regions. For example:

      examples = II NPY1 NAB2 Orf:YGL123W

* automatic classes

  When the user types in a search string that is not qualified by a class (as in EST:yk1234.5), GBrowse will automatically search for a matching feature of class "Sequence". You can have it search for the name in other classes as well by defining the "automatic classes" option. Example:

      automatic classes = Symbol Gene Clone

  When the user types in "hb3", the browser will search first for a feature of class Sequence named hb3, followed in turn by matching features in Symbol, Gene and Clone. The search stops when the first match is found. If no match is found in any of these classes, the browser will proceed to a full text search of all the comment fields.

* remote sources

  This option allows you to add remote annotation sources to the menu of such sources at the bottom of the main window. The format is:

      remote sources = "Menu Label 1" http://url1.host.com/etc/etc
                       "Menu Label 2" http://url2.host.com/etc/etc

* instructions, search_instructions, navigation_instructions

  You may override the default instructions (as defined in the language-specific configuration files in conf/lang) by setting these options. For example:

      instructions = "Type in the name of a contig or clone."

* html1, html2, html3, html4, html5, html6

  These options allow you to insert HTML into the GBrowse page at strategic places. Eventually this will be replaced with an HTML template system, but for now, this is the best we have.
      Option   Where it goes
      ------   -------------
      header   between the top and the instructions
      html1    between the instructions and the navigation bar
      html2    between the navigation bar and the overview
      html3    between the overview and the detail view
      html4    between the detail view and the data source panel
      html5    between the data source panel and the track list
      html6    between the track list and the annotation upload
      footer   between the annotation upload and the bottom

  These can be code references. One useful thing to do is to use the language translator to insert language-specific HTML. Here's an example provided by Marc Logghe:

      html2 = sub { my $go = $main::LANG->tr('Go');
                    return qq(
                      Dump:
                    );
                  }

* keystyle, empty_tracks

  These two general options control the appearance of the keys printed on the detailed view. keystyle takes one of two values, "between" or "beneath".

      keystyle = between      Print the track labels between the tracks themselves.
      keystyle = beneath      Print the track labels at the bottom of the detailed view.

  The "empty_tracks" option controls what to do when a track has no features in it. Possible values are:

      empty_tracks = key       Print just the key (the track label).
      empty_tracks = suppress  Suppress the track completely.
      empty_tracks = line      Draw a solid line across the track.
      empty_tracks = dashed    Draw a dashed line across the track.

  The default value is "key."

B2. Track Sections

Any other [Section] in the configuration file is treated as a declaration of a track. The order of track sections will become the default order of tracks on the display (the user can change this later). Here is a typical track declaration from yeast.conf:

    [Genes]
    feature      = gene:sgd
    glyph        = generic
    bgcolor      = yellow
    forwardcolor = yellow
    reversecolor = turquoise
    strand_arrow = 1
    height       = 6
    description  = 1
    key          = Named gene

The track is named "Genes". You may use a short mnemonic if you prefer. As in the general configuration section, the track declaration contains multiple name=value option pairs. Valid options are as follows:

 a) feature

    This relates the track to one or more feature types as they appear in the database. Recall that each feature has a method and source. This is represented in the form method:source. So, for example, a feature of type "gene:sgd" has the method "gene" and the source "sgd".

    It is possible to omit the source. A feature of type "gene" will include all features whose methods are "gene", regardless of the source field. It is not possible to omit the method.

    It is possible to have several feature types displayed on a single track. Simply provide the feature option with a space-delimited list of the features you want to include.
    For example:

        feature = gene:sgd stRNA:sgd

    This will include features of type "gene:sgd" and "stRNA:sgd" in the same track and display them in a similar fashion.

 b) glyph

    This controls the glyph (graphical icon) that is used to represent the feature. The list of glyphs and glyph-specific options are listed in the section GLYPHS AND GLYPH OPTIONS. The "generic" glyph is the default.

 c) bgcolor

    This controls the background color of the glyph. The format of colors is explained in GLYPHS AND GLYPH OPTIONS.

 d) fgcolor

    This controls the foreground color (outline color) of the glyph. The format of colors is explained in GLYPHS AND GLYPH OPTIONS.

 e) fontcolor

    This controls the color of the primary font of text drawn in the glyph. This is the font used for the feature labels drawn at the top of the glyph.

 f) font2color

    This controls the color of the secondary font of text drawn in the glyph. This is the font used for the longish feature descriptions drawn at the bottom of the glyphs.

 g) height

    This option sets the height of the glyph. It is expressed in pixels.

 h) strand_arrow

    This is a true or false value, where true is 1 and false is 0. If this option is set to true, then the glyph will indicate the strandedness of the feature, usually by drawing an arrow of some sort. Some glyphs are inherently stranded, or inherently non-stranded and simply ignore this option.

 i) label

    This is a true or false value, where true is 1 and false is 0. If the option is set to true, then the name of the feature (i.e. its group name) is printed above the feature, space allowing.

 j) description

    This is a true or false value, where true is 1 and false is 0. If the option is set to true, then the description of the feature (any Note fields) is printed below the feature, space allowing.

 k) key

    This option controls the descriptive key that is drawn in the key area at the bottom of the image. It also appears in the checkboxes that the end user uses to switch tracks on and off.
    If not specified, it defaults to the track name.

 l) citation

    If present, this option creates a human-readable descriptive paragraph describing the feature and how it was derived. This is the text information that is displayed when the user clicks on the track name in the checkbox group. The value can either be a URL, in which case clicking on the track name invokes the corresponding URL, or a text paragraph, in which case clicking on the track name generates a page containing the text description. Long paragraphs can be continued across multiple lines, provided that continuation lines begin with whitespace.

 m) link, title, link_target

    These options are identical to the similarly-named options in the [GENERAL] section, but change the rules on a track-by-track basis. They can be used to override the global rules. To force a track not to contain any links, use a blank value.

 n) feature_low

    If this option is present, GBrowse will use the list of feature types listed here at low resolution views. (This is one of the ways that semantic zooming is implemented.) This allows you, for example, to switch off detailed exon, UTR, promoters and other within-the-gene features, and just show the start and stop of the transcription unit.

 o) global feature

    If this option is present and set to a true value (e.g. "1"), GBrowse will automatically generate a pseudo-feature that starts at the beginning of the currently displayed region and extends to the end. This is often used in conjunction with the "translation" and "dna" glyphs in order to display global characteristics of the sequence. If this option is set, then you do not need to specify a "feature" option.

 p) group pattern

    This option lets you connect related features by dotted lines based on a pattern match in the features' names. A typical example is connecting the 5' and 3' read pairs from ESTs or plasmids. See GROUPING FEATURES for details.

A large number of glyph-specific options are also recognized.
These are described in the next section.

---

B3. Glyphs and Glyph Options

A large variety of glyphs are available, and more are being added as
the Bio::Graphics module grows. A list of the common glyphs and their
options is provided by GBrowse itself. Click on the "[Help]" link in
the section labeled "Upload your own annotations". This page also
lists the valid foreground and background colors.

The most popular glyph types are:

   Glyph                   Description
   -----                   -----------
   generic                 a rectangle
   arrow                   an arrow
   cds                     shows the reading frame of spliced
                           transcripts; used in conjunction with the
                           "coding" aggregator.
   diamond                 a point-like feature represented as a
                           diamond
   dna                     DNA and GC content
   heterogeneous_segments  a multi-segmented feature in which each
                           segment can have a distinctive color. For
                           Jim Kent's WABA features, this works with
                           the waba_alignment aggregator.
   segments                a multi-segmented feature such as an
                           alignment
   triangle                a point-like feature represented as a
                           triangle
   transcript              a gene model
   transcript2             a slightly different representation of a
                           gene model
   translation             1-, 3- and 6-frame translations
   wormbase_transcript     yet another gene model that can show UTR
                           segments (for features that conform to the
                           WormBase gene schema). Used in conjunction
                           with the "wormbase_gene" aggregator.

A more definitive list of glyph options can be found in the
Bio::Graphics manual pages.
Consult the manual pages for the following modules:

   Glyph                   Manual Page
   -----                   -----------
   arrow                   Bio::Graphics::Glyph::arrow
   cds                     Bio::Graphics::Glyph::cds
   crossbox                Bio::Graphics::Glyph::crossbox
   diamond                 Bio::Graphics::Glyph::diamond
   dna                     Bio::Graphics::Glyph::dna
   dot                     Bio::Graphics::Glyph::dot
   ellipse                 Bio::Graphics::Glyph::ellipse
   extending_arrow         Bio::Graphics::Glyph::extending_arrow
   generic                 Bio::Graphics::Glyph
   graded_segments         Bio::Graphics::Glyph::graded_segments
   heterogeneous_segments  Bio::Graphics::Glyph::heterogeneous_segments
   line                    Bio::Graphics::Glyph::line
   primers                 Bio::Graphics::Glyph::primers
   rndrect                 Bio::Graphics::Glyph::rndrect
   ruler_arrow             Bio::Graphics::Glyph::ruler_arrow
   segments                Bio::Graphics::Glyph::segments
   toomany                 Bio::Graphics::Glyph::toomany
   transcript              Bio::Graphics::Glyph::transcript
   transcript2             Bio::Graphics::Glyph::transcript2
   translation             Bio::Graphics::Glyph::translation
   triangle                Bio::Graphics::Glyph::triangle
   wormbase_transcript     Bio::Graphics::Glyph::wormbase_transcript

The "perldoc" command is handy for reading the documentation from the
Unix command line. For example:

    perldoc Bio::Graphics::Glyph::primers

This will provide you with a summary of the options that apply to the
"primers" glyph.

In the manual pages, the glyph options are presented the way they are
called from Perl, with a leading hyphen. For example, the
documentation will tell you to use the -connect_color option to set
the color of the line that connects the two inward-pointing arrows in
the primer pair glyph. In the configuration file, the same option is
written without the hyphen, as "connect_color". For example:

    [PCR Products]
    glyph         = primers
    connect_color = blue

When referring to colors, you can use a variety of color names such
as "blue" and "green".
To get the full list, cut and paste the following magic incantation
into the command line:

    perl -MBio::Graphics::Panel -e 'print join "\n",Bio::Graphics::Panel->color_names'

or see this URL:

    http://www.wormbase.org/db/seq/gbrowse?help=annotation

Alternatively, you can use the #RRGGBB notation to specify the red,
green and blue components of the color. Refer to any book on HTML for
the details on using the notation.

---

B4. Adding features to the overview

You can make any set of tracks appear in the overview by creating a
stanza with a title of the format [