Generic Genome Browser Version 2: A Tutorial for Administrators
Author: Lincoln Stein, 20 January 2010
Modified for Workshop on Comparative Genomics, North America by Sheldon McKay, July 2011
table of contents
0. Expected Learning Outcome
The objective of this learning activity to establish familiarity with, and provide guidance for, the primary steps in configuring GBrowse as a server.
Important: this activity is designed for GBrowse 2.00 and will not work with earlier versions of the software.
1. The Basics
As a starting point it is assumed that you have successfully set up Perl, GD, BioPerl and the other GBrowse dependencies. If you have not, please see the GBrowse HOWTO For this tutorial, we will be using the "in-memory" GBrowse database (no relational database required).
- data_files/: DNA and features files to load into the local database.
- conf_files/: GBrowse configuration files for you to take and modify.
On the Ubuntu filesystem, the locations of the above files are:
/var/www/gbrowse2/tutorial/data_files /var/www/gbrowse2/tutorial/conf_files
The location of the "live" configuration files on the Ubuntu file system is:
/etc/gbrowse2
The location of the "live" data files is:
/var/www/gbrowse2/databases/volvox
We will be working with simulated Volvox genome annotation data. The database will be named "volvox" and GBrowse will be invoked with this URL:
http://localhost/cgi-bin/gb2/gbrowse/volvox
You'll now edit the main GBrowse.conf configuration file to tell it about the new data source. Open /etc/gbrowse2/GBrowse.conf with a text editor.
$ cd /etc/gbrowse2 $ sudo gedit GBrowse.conf
Now paste in the stanza:
[volvox] description = Tutorial database path = volvox.conf
Be sure to leave a blank line between the bottom of the previous stanza and the top of the new one (i.e., there should be a blank line above "[volvox]").
You should now be able to view the data set. Point your web browser at http://localhost/cgi-bin/gb2/gbrowse/volvox and type in "ctgA" in the search box. The result is shown in Figure 1.

Figure 1: volvox_remarks.gff3 data with volvox.conf config file.
If You are Having Problems...
If for some reason you get a blank page or an "Internal server error," there are a couple of things to check. First, open the file volvox.conf with a text editor ("Notepad" on Windows systems, emacs, pico or vi on UNIX systems) and confirm that the path to the volvox database directory in this section is correct:
[GENERAL] db_adaptor = Bio::DB::SeqFeature::Store db_args = -adaptor memory -dir '/var/www/gbrowse2/databases/volvox'
If there is a space in /var/www/gbrowse2 then you must be certain to put single quotes around the path as shown in the example above.
Next check that the volvox_remarks.gff3 file does exist inside the volvox database directory and that it is readable by all users on your system. Similarly, check that the volvox.conf configuration file is in the same directory as yeast.conf, and that it is readable by all users on your system.
Microsoft Windows has an unpleasant tendency to add a .txt extension to files without warning. If something seems to be wrong with the config or GFF file and you can't figure out what, check that the file extension hasn't been modified. To avoid this phenomenon, I suggest that you select All File Types from the popup menu in the File -> Save dialog. You might also want to configure your Folder display to show known file extensions.
If you're still having no luck, check the bottom of the Apache server error log for error messages. This file is located in various places depending on how Apache is installed. Look for the file error_log, typically located in /usr/local/apache/logs, C:\Program Files\Apache Group\Apache2\logs, /var/log/www, or /var/log/httpd. The error message will usually point you in the right direction.
If this doesn't fix the problem, please ask an instructor for help or send an e-mail to GBrowse support at gmod-gbrowse@lists.sourceforge.net. Someone will be happy to assist you.
1.1 The Data File
Let's look at the data file we loaded in detail, now. If you open the volvox_remarks.gff3 file in a text editor, you will see that it contains a series of 15 genome "features" that look like this:
ctgA example contig 1 50000 . . . Name=ctgA ctgA example remark 1659 1984 . + . Name=f07;Note=This is an example ctgA example remark 3014 6130 . + . Name=f06;Note=This is another example ctgA example remark 4715 5968 . - . Name=f05;Note=Ok! Ok! I get the message. ctgA example remark 13280 16394 . + . Name=f08 ...
Each feature has a "source" of "example", a type of "remark", and occupies a short range (roughly 1.5k) on a contig named "ctgA." In addition to the features themselves, there is an entry for the contig itself (type "contig"). This entry is needed to tell GBrowse what the length of ctgA is.
The load file uses a standard known as GFF3 (General Feature Format version 3). Each line of the file corresponds to a feature on the genome, and the nine columns are separated by tabs.
The 9 columns are as follows:
- reference sequence
This is the name of the feature that will be used to establish the coordinate system for the annotation. This is usually the name of a chromosome, a clone, or a contig. In our example, the reference sequence is "ctgA". A single GFF file can refer to multiple reference sequences. - source
The source of the annotation. This field describes how the feature was derived. In the example, the source is "example" for want of a better description. Many people find the source as a way of distinguishing between similar features that were derived by different methods, for example, gene calls derived from different prediction software. You can leave this column blank by replacing the source with a single dot ("."). - type
This column describes the feature type. Although you can choose anything you like to describe the feature type, you are strongly encouraged to use well-recognized sequence ontology (SO) terms such as "gene", "repeat_region", "exon", and "CDS." You can find a list of the recognized SO terms at the Sequence Ontology Project web site. For lack of a better name, the features in the volvox example are of type "remark." - start position
The position that the feature starts at, relative to the reference sequence. The first base of the reference sequence is position 1. - end position
The end of the feature, again relative to the reference sequence. End is always greater than or equal to start. - score
For features that have a numeric score, such as sequence similarities, this field holds the score. Score units are arbitrary, but most people use the expectation value for similarity features. You can leave it blank by replacing the column with a dot. - strand
For features that are strand-specific, this field is the strand on which the annotation resides. It is "+" for the forward strand, "-" for the reverse strand, or "." for annotations that are not stranded. If you are unsure of whether a feature is stranded, it won't hurt to use a "+" here. - phase
For CDS features that encode proteins, this field describes where the next codon starts. The phase is one of the integers 0, 1, or 2, indicating the number of bases that should be removed from the beginning of this feature in order to reach the first base of the next codon. In other words, a phase of "0" indicates that the next codon begins at the first base of the region described by the current line, a phase of "1" indicates that the next codon begins at the second base of this region, and a phase of "2" indicates that the next codon begins at the third base of this region. This information is used by the "cds" glyph to show how the reading frame changes across splice sites. For all other feature types, use a dot here. - attributes
A list of feature attributes in the format tag=value. Multiple tag=value pairs are separated by semicolons. URL escaping rules are used for tags or values containing the following characters: ",=;". Spaces are allowed in this field, but tabs must be replaced with the %09 URL escape.These tags have predefined meanings:
- ID: gives the feature a unique identifier. Useful when grouping features together (such as all the exons in a transcript).
- Name: display name for the feature. This is the name to be displayed to the user.
- Alias: A secondary name for the feature. It is suggested that this tag be used whenever a secondary identifier for the feature is needed, such as locus names and accession numbers.
- Note: A descriptive note to be attached to the feature. This will be displayed as the feature's description.
Alias and Note fields can have multiple values separated by commas. For example:
Alias=M19211,gna-12,GAMMA-GLOBULIN
Other good stuff can go into the attributes field, as we shall see later.
It is very important to have a full-length entry (such as the one for ctgA) for each reference sequence mentioned in the first column of the GFF3 file. However, the reference sequence can have any source and type you choose. Commonly used types are "clone", "chromosome" and "contig."
1.2. Defining Tracks
Now we'll look at the configuration file in more detail. Using a text editor, open the volvox.conf file from its location in the gbrowse.conf configuraton directory. (If you mess up, you can always copy a fresh version from volvox.conf in the tutorial directory.)
At the top is a [GENERAL] section which defines basic things such as the database backend to use, the path to the database files, which plugins to activate, and which tracks to show by default. In the volvox.conf example:
[GENERAL] db_adaptor = Bio::DB::SeqFeature::Store db_args = -adaptor memory -dir '/var/www/gbrowse2/databases/volvox' # just the basic track dumper plugin plugins = TrackDumper # list of tracks to turn on by default default features = ExampleFeatures # size of the region region segment = 10000 # examples to show in the introduction examples = ctgA # feature to show on startup initial landmark = ctgA:5000..10000
We'll discuss how to customize these options later. For now, focus on the section at the bottom of the file, which starts with the line:
### TRACK CONFIGURATION ###:
[ExampleFeatures] feature = remark glyph = generic stranded = 1 bgcolor = blue height = 10 key = Example Features
This is a "stanza" that describes one of the tracks displayed by GBrowse. The track has an internal name of "ExampleFeatures" which you can use in the URL to turn the track on. The internal name is enclosed by square brackets. Names can be any combination of printable characters and whitespace, but can not contain the hyphen ("-")
character (you can use the underscore "_" character instead).
Following the track name are a series of options that configure the track. The "feature" option indicates what feature type(s) to display inside the track. It's currently set to display the "remark" feature type. The "glyph" option specifies the shape of the rendered feature. The default is "generic", which is a simple filled box, but there are dozens of glyphs to choose from. The "stranded" option tells the generic glyph to try to display the strandedness of the feature -- this is what creates the little arrow at the end of the box. "bgcolor" and "height" control the background color and height of the glyph respectively, and "key" assigns the track a human-readable label.
Let's experiment with changing the track definition. First, let's change the color of the glyph. With your text editor, change the bgcolor option from blue to "orange", save it, and reload the page. The features should change immediately, as shown in Figure 2:

Figure 2: A Feature of a Different Color
Note: Many of the screenshots in this tutorial are from earlier versions of GBrowse and may not look exactly the same as the current version.
Please experiment with other changes! Try changing the height to 5, the key to "Skinny features" and the stranded option to 0 (which means "false"). Just by changing a few options, you can create a very distinctive track.
Now let's try changing the glyph. One of the standard glyphs was designed to show PCR primer pairs and is called "primers". Change "glyph = generic" to "glyph = primers" and reload the page. Depending on other changes that you might have made earlier, the result will look something like Figure 3.

Figure 3: Using the primers Glyph
We'll see other examples of glyphs later on. To get a list of the most popular glyphs and the options that are available for them, see the file CONFIGURE_HOWTO.txt, located in the docs/ subdirectory of the GBrowse distribution. Or for the gory and bleeding edge details, run the command:
$ glyph_help.pl -l
This will give you a list of all the glyphs. Running the command with the name of the glyph will give you copious documentation on all the options the glyph recognizes.
1.3. Adding Descriptions to a Feature
By default, GBrowse will display the name of the feature above its glyph provided that there is sufficient space to do this. Optionally, you can also attach some descriptive text to the feature. This text will be displayed below the feature, and can also be searched.
You can place descriptions, notes and other comments into the ninth column of the GFF load file. The example file volvox_domains.gff3 shows how this is done. An excerpt from the top of the file looks like this:
ctgA example polypeptide_domain 11911 15561 . + . Name=m11;Note=kinase ctgA example polypeptide_domain 13801 14007 . - . Name=m05;Note=helix loop helix ctgA example polypeptide_domain 14731 17239 . - . Name=m14;Note=kinase ctgA example polypeptide_domain 15396 16159 . + . Name=m03;Note=zinc finger
This defines several new features of type "polypeptide_domain". The ninth column, in addition to giving each of the motifs names, adds a "Note" attribute to each feature. As described earlier, each attribute is a name=value pair separated by semicolons.
The attribute named Note is automatically displayed and made searchable. To see this work, add volvox_domains.gff3 to the volvox database directory. You can do this just by copying the file into /var/www/gbrowse2/databases/volvox so that the directory contains both the original volvox_remarks.gff3 and the new volvox_domains.gff3 files.
To display this newly-loaded data set, open up volvox.conf and add the following new stanza to the config file:
[Motifs] feature = polypeptide_domain glyph = span height = 5 description = 1 key = Example motifs
This defines a new track whose internal name is "Motifs." The corresponding feature type is "motif" and it uses the "span" glyph, a graphic that displays a horizontal line capped by vertical endpoints. The height is set to five pixels, and the human-readable key is set to "Example motifs." A new option, "description", is a flag that tells GBrowse to display the Note attribute, if any. Any non-zero value means true.
After updating the configuration file, you will need to reload the browser page and turn on the "Example motifs" checkbox below the main image. The result is shown in Figure 4.

Figure 4: Showing the Notes attribute
A copy of this config file is also available for you to use in volvox2.conf.
To show that GBrowse will search the notes for keyword matches, try typing in "kinase." You will be presented with a list of all the motifs whose Note attribute contains the word "kinase."
1.4. Adjusting GBrowse Name Searches
GBrowse has a very flexible search feature. You can type in the name of a reference sequence, such as "ctgA", and it will display the entire thing, or you can type in a range in the format "ctgA:start..stop". Try "ctgA:5000..8000" to see this at work.
In addition, GBrowse can search for features by name. Anything that has a Name or Alias attribute in the GFF3 file can be searched for by name. For example, try searching for "f10" or even "f1*".
The only drawback to this is that you may have name collisions. For example, some research communities distinguish genes from their products using differences in capitalization, for example hga and HGA. However, GBrowse's searches are case insensitive. To avoid name collisions, you can give each type of feature a distinctive naming prefix, for example "Gene:hga" and "Protein:HGB".
To illustrate how this works, have a look at volvox_geneproducts.gff3:
ctgA example remark 1000 2000 . . . Name=hga ctgA example protein_coding_primary_transcript 1100 2000 . + . Name=Gene:hga ctgA example polypeptide 1200 1900 . + . Name=Protein:HGA ctgA example protein_coding_primary_transcript 1600 3000 . - . Name=Gene:hgb ctgA example polypeptide 1800 2900 . - . Name=Protein:HGB
Copy volvox_geneproducts.gff3 into the databases/volvox folder. Now add the following configuration stanza to volvox.conf to create a track that displays both protein_coding_primary_transcript and polypeptide features:
[NameTest] feature = protein_coding_primary_transcript polypeptide glyph = generic stranded = 1 bgcolor = green height = 10 key = Name test track
This stanza creates a new track named "Name test track" and displays features of type "protein_coding_primary_transcript" and "polypeptide" using green generic glyphs that are 10 pixels high. When you look at the data file, you'll see that there are three things potentially named "HGA", a remark which uses the unqualified name, a gene which uses the qualified name "Gene:hga", and a polypeptide region which uses the qualified name "Protein:HGA." There is also a protein_coding_primary_transcript named "Gene:hgb" and a protein named "Protein:HGB." (Note, in this track we are using slightly awkward sequence ontology terms, like "protein_coding_primary_transcript," rather than more natural terms like "gene" in order to avoid these example features from appearing in the real "gene" track that we create later on in this tutorial.)
To see how GBrowse searches for names, type "HGA" (either upper or lowercase) in the search textbox and press "Search." Because the search term matches three remark whose unqualified name is HGA, GBrowse will bring up the region between 1000..2000 and highlight the HGA remark.
Now search for "Protein:HGA." Because you searched with the qualified name, GBrowse will find and highlight the protein feature.
Now try to search for "HGB." This search fails because HGB only exists in qualified form in the database. You can still, however, search for "Gene:HGB" or "Protein:Hgb" (capitalization doesn't matter). This may or may not be the behavior that you desire. If you would like GBrowse to search through qualified names when the user types the unqualified version, you can configure this easily by adding the following line to volvox.conf under the [General] section:
automatic classes = Gene Protein
This option directs GBrowse to search for the unqualified name first, followed by names prefixed with "Gene:" and then names prefixed with "Protein:". Whichever is found first will be displayed. Now searching for "HGB" will find "Gene:hgb". Swapping the order of Gene and Protein on this line will cause the "Protein:HGB" to be found.
Another way to approach this is to make liberal use of the Alias attribute. For example:
ctgA example remark 1000 2000 . . . Name=Remark:HGA;Alias=hga ctgA example protein_coding_primary_transcript 1100 2000 . + . Name=Gene:hga;Alias=hga ctgA example polypeptide 1200 1900 . + . Name=Protein:HGA;Alias=hga ctgA example protein_coding_primary_transcript 1600 3000 . - . Name=Gene:hgb;Alias=hgb ctgA example polypeptide 1800 2900 . - . Name=Protein:HGB;Alias=hgb
This assigns the alias of "hga" to each of the three HGA features, and an alias of "hgb" to each of the two HGB features. This keeps the identities of these features distinct so that you can find particular ones by typing in the fully qualified name ("Gene:hga"), but find all candidates when you type in the unqualified name. For instance, when you search with "hga", GBrowse will now offer you three matches:

Figure 5: Searching for aliases
To try this out, simply open the installed volvox2b.gff3 file with a text editor and edit it to match the example above.
Note: GBrowse caches track images for performance reasons. If you make some changes to the data or config files and don't get the expected behavior, try turning off caching by going to the Preferences link, and turning off the checkbox marked "Cache tracks."
1.5. Linking
The next topic we'll cover in this tutorial is configuring GBrowse's outgoing links. When the user clicks on a glyph in the details image, they will be taken to another page by following a URL. The URL to follow is generated from the link option. The default link option is located in the [TRACK DEFAULTS] section of the config file; you can specify track-specific links by placing a link option in one or more of the individual track stanzas.
The volvox.conf track defaults looks like this:
[TRACK DEFAULTS] glyph = generic height = 10 bgcolor = lightgrey fgcolor = black font2color = blue label density = 25 bump density = 100 # where to link to when user clicks in detailed view link = AUTO
In this case, we've been using a special link URL of "AUTO." This generates an automatic link to a helper script named "gbrowse_details." If you click on some of the features in the current volvox page, you'll get an idea of what this script displays.
We're going to override the default link rule for the motif track. There's nothing sensible to link to, so we'll link to Google using first the motif's name, and then the motif's description.
Go to the [Motifs] stanza in the volvox.conf config file and modify it so that it looks like this:
[Motifs] feature = polypeptide_domain glyph = span height = 5 description = 1 link = http://www.google.com/search?q=$name key = Example motifs
The only change we've made is to add a "link" option to the stanza, where the value is a Google search URL. "$name" is a Perl variable. GBrowse will fill in this variable with the name of the motif. Reload the page and click on a motif to see that this works as advertised ("m01," "m02" and the other example motifs are similar to the names for galactic clusters, so be prepared for some astronomy hits).
It would be more sensible to link to the description of the motif, for example "helix loop helix." Fortunately we can do that too. Just change the link option to:
link = http://www.google.com/search?q=$description
There are a large number of possible variables that you can use inside link rules. See the GBrowse_2.0_HOWTO page on the GMOD wiki for the full list. You can also construct links using Perl callbacks as described in the section on displaying ESTs. This gives you the ability to generate any arbitrary URL.
If you want nothing to happen when the user clicks on a feature, just set link to empty ("link = ").
The last thing we'll do is to change the behavior of the [Motif] track so that:
- a new window pops up with the google search rather than replacing the contents of the current window
- when the user mouses over a motif, a hints box will appear telling them that clicking there will initiate a google search
These changes are easy:
[Motifs] feature = polypeptide_domain glyph = span height = 5 description = 1 link = http://www.google.com/search?q=$description link_target = _blank title = Search Google for $description. key = Example motifs
There's now a link_target option. This contains the name of a browser window in which to load the content when the user clicks on the feature. If there's no window of that name, the browser will create a new window and give it the desired name. Choose an ordinary name like "Google" if you want the Google content to be loaded into the same window each time, or choose "_blank" as we've done here in order to pop up a new fresh window each time the user clicks.
The title option contains a bit of text that will be displayed whenever the user hovers the mouse over the feature for a second or two. The same variable substitution rules apply, so when the user mouses over feature "m06", a balloon will pop up that says "Search Google for SUSHI repeat." Give it a try!
1.6. Adding Popup Balloons to Tracks
The title option is a simple way to add popup balloons to tracks. In addition to this feature, GBrowse can display popup balloons when the user hovers over or clicks on a feature. The balloons can display arbitrary HTML, either provided in the config file, or fetched remotely via a URL. You can use this feature to create multiple choice menus when the user clicks on the feature, to pop up images on mouse hovers, or even to create little embedded query forms. See http://mckay.cshl.edu/balloons.html for examples.
To activate custom balloons, add "balloon hover" and/or ""balloon click" options to the track stanzas that you wish to add buttons to. You can also place these options in [TRACK DEFAULTS] to create a default balloon.
"balloon hover" specifies HTML or a URL that will be displayed when the user hovers over a feature. "balloon click" specifies HTML or a URL that will appear when the user clicks on a feature. The HTML can contain images, formatted text, and even controls. Examples:
balloon hover = <h2>Gene $name</h2>
balloon click = <h2>Gene $name</h2>
<a href='http://www.google.com/search?q=$name'>Search Google</a><br>
<a href='http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=pubmed&term=$name'>Search NCBI</a><br>
For example, to add a balloon to the motifs track of the Volvox browser, add "balloon tips = 1" near the top of the volvox.conf file, and then add balloon hover and balloon click options like this:
[Motifs]
feature = polypeptide_domain
glyph = span
height = 5
description = 1
balloon hover = <h2>Gene $name</h2>
balloon click = <h2>Gene $name</h2>
<a href='http://www.google.com/search?q=$name'>Search Google</a><br>
<a href='http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=pubmed&term=$name'>Search NCBI</a><br>
key = Example motifs
You can also populate the balloon contents dynamically using data from a local or remote web server. This facility, as well as options for customizing balloon appearance, is described in the GBrowse2 HOWTO.
2. Displaying Common Types of Features
Now that you've seen the basics, we'll discuss techniques to display multi-part features, genes, alignments, quantitative data and other special feature types.
2.1. Multi-segmented features
Many features are discontinuous. Examples include spliced transcripts, and gapped sequence similarity alignments, such as the alignment of cDNAs to the genome. GBrowse can deal with such features easily provided that you take a little care in setting them up.
The data file volvox_matches.gff3 contains a simulated data set of a series of gapped nucleotide alignments. An excerpt from the file is here:
ctgA example match 32329 32359 . + . ID=match-seg01;Name=seg01 ctgA example match 26122 26126 . + . ID=match-seg02;Name=seg02 ctgA example match 26497 26869 . + . ID=match-seg02;Name=seg02 ctgA example match 27201 27325 . + . ID=match-seg02;Name=seg02 ctgA example match 27372 27433 . + . ID=match-seg02;Name=seg02 ctgA example match 27565 27565 . + . ID=match-seg02;Name=seg02 ctgA example match 27813 28091 . + . ID=match-seg02;Name=seg02 ctgA example match 28093 28201 . + . ID=match-seg02;Name=seg02 ctgA example match 28329 28377 . + . ID=match-seg02;Name=seg02 ctgA example match 28829 29194 . + . ID=match-seg02;Name=seg02 ctgA example match 6885 7241 . - . ID=match-seg03;Name=seg03 ctgA example match 7410 7737 . - . ID=match-seg03;Name=seg03 ctgA example match 8055 8080 . - . ID=match-seg03;Name=seg03 ctgA example match 8306 8999 . - . ID=match-seg03;Name=seg03
This file uses a new GFF3 attribute, "ID". The ID attribute is used to group features together and to indicate when a single feature occupies multiple discontinuous locations. In the case of a gapped alignment, each ungapped segment is represented by a single GFF3 line. The segments of a single alignment are then grouped together by using the
same ID. For example "match-seg03" starts at position 6885 and ends at 8999. It has four subsegments, one from 6885..7241, another from 7410..7737, and so forth.
The ID attribute is not the same as the Name attribute. If you give three lines the same ID, they will be grouped together into a single displayed feature. If you give three lines the same Name you will end up with three distinct features that all happen to share the same name. Also note that except for the coordinates and the score (which we'll discuss later) all columns for each of the parts of a multisegmented feature should be the same. For example, you can't have one part of a feature on the (+) strand and another part on the (-) strand.
Copy volvox_matches.gff3 into the volvox database directory. Then edit volvox.conf to add the following track definition:
[Alignments] feature = match glyph = segments key = Example alignments
This is declaring a new track named "Alignments" which displays features of type "match" using a glyph named "segments". The segments glyph is specialized for displaying objects that have multiple similar subparts. Reload the page and activate the "Example alignments" track. You should see a track similar to Figure 6.

Figure 6: Use the "segments" glyph to display discontinuous multipart features.
2.2. Protein-Coding Genes
GBrowse can display protein-coding genes in various shapes and styles. The easiest way to set this up is to use the
sequence ontology's canonical description of a gene along with the "gene" glyph. Take a look at the file volvox_genes.gff3, which defines a gene named EDEN, and its three spliced forms named EDEN.1, EDEN.2 and EDEN.3. Here is the contents of the file:
ctgA example gene 1050 9000 . + . ID=EDEN;Name=EDEN;Note=protein kinase ctgA example mRNA 1050 9000 . + . ID=EDEN.1;Parent=EDEN;Name=EDEN.1;Index=1 ctgA example five_prime_UTR 1050 1200 . + . Parent=EDEN.1 ctgA example CDS 1201 1500 . + 0 Parent=EDEN.1 ctgA example CDS 3000 3902 . + 0 Parent=EDEN.1 ctgA example CDS 5000 5500 . + 0 Parent=EDEN.1 ctgA example CDS 7000 7608 . + 0 Parent=EDEN.1 ctgA example three_prime_UTR 7609 9000 . + . Parent=EDEN.1 ctgA example mRNA 1050 9000 . + . ID=EDEN.2;Parent=EDEN;Name=EDEN.2;Index=1 ctgA example five_prime_UTR 1050 1200 . + . Parent=EDEN.2 ctgA example CDS 1201 1500 . + 0 Parent=EDEN.2 ctgA example CDS 5000 5500 . + 0 Parent=EDEN.2 ctgA example CDS 7000 7608 . + 0 Parent=EDEN.2 ctgA example three_prime_UTR 7609 9000 . + . Parent=EDEN.2 ctgA example mRNA 1300 9000 . + . ID=EDEN.3;Parent=EDEN;Name=EDEN.3;Index=1 ctgA example five_prime_UTR 1300 1500 . + . Parent=EDEN.3 ctgA example five_prime_UTR 3000 3300 . + . Parent=EDEN.3 ctgA example CDS 3301 3902 . + 0 Parent=EDEN.3 ctgA example CDS 5000 5500 . + 1 Parent=EDEN.3 ctgA example CDS 7000 7600 . + 1 Parent=EDEN.3 ctgA example three_prime_UTR 7601 9000 . + . Parent=EDEN.3
GFF3 uses a three-tiered structure to represent the gene, descending from gene to mRNA to CDS and UTR features. A gene has potentially many mRNAs, and each mRNA has potentially several CDS and UTR features. To describe how the parts fit together, we use ID and Parent features.
We start with a feature of type "gene" with the ID "EDEN". This has three alternative splice forms named EDEN.1, EDEN.2 and EDEN.3. To tell GBrowse that each of these splice forms are part of the same gene, we give each one a Parent attribute of "EDEN" corresponding to the ID of the parent gene. Now consider mRNA EDEN.1. It has a five_prime_UTR feature, a three_prime_UTR feature, and four CDS features. To indicate that the CDS and UTR features belong to the
mRNA, we give the mRNA a unique ID of "EDEN.1" and give each of the subfeatures a corresponding parent. This pattern repeats for each of the other two splice forms. Note how the five_prime_UTR of EDEN.3 is split in two parts.
As before, we use "Name" to give the gene and its alternative splice forms a human-readable name, and use Note to provide a description for the gene as a whole (you can add notes to the individual mRNAs but they won't display by default). The Index=1 attribute is a hint to the database to make the mRNAs searchable by name. This lets users find
the gene by searching for the mRNA names ("EDEN.1") as well as by the gene name ("EDEN"). However, it is usually unnecessary to do this. Also notice that we are using the Phase column for the CDS features to describe how the CDS is translated into protein. See the description of phase in the data file section.
This is the full way to describe genes. Simpler ways are described later in this section.
Go ahead and add volvox_genes.gff3 to the database. Then add the following new stanza to the bottom of the file:
[Genes] feature = gene glyph = gene bgcolor = peachpuff label_transcripts = 1 draw_translation = 1 category = Genes label_transcripts = 1 key = Protein-coding genes
The new Genes track associates "gene" features with the "gene" glyph, sets its background color to peachpuff (yes, there really is a color by this name!), turns on the description lines, and sets the human readable track name to "Protein-coding genes." Also, since our track table is starting to get a little crowded, this stanza uses the "category" option to start a separate section in the track table for tracks having to do with genes.
Upon reloading the page, turning on the new "Protein-coding genes" track, and viewing the region around 1..10K, you'll see this:

Figure 7: The canonical gene
The gene glyph has a number of options that you can use to customize its appearance:
| Option Name | Possible values | Description |
|---|---|---|
| thin_utr | 0 (false), 1 (true) | If true, makes UTRs half-height. |
| utr_color | a color name ("gray" by default) | Changes the UTR color. |
| decorate_introns | 0 (false), 1 (true) | If true, puts little arrowheads on the introns to indicate direction of transcription. |
Using these options, we can make the track look like the UCSC Genome Browser (Figure 8).
[Genes] feature = gene glyph = gene height = 9 bgcolor = black utr_color = black thin_utr = 1 decorate_introns = 1 description = 1 label_transcripts= 1 category = Genes key = Protein-coding genes

Figure 8: A UCSC Genome Browser lookalike
2.3. Reading Frames
Continuing with the example from section 2.2, the third exon of EDEN.1 is shared with EDEN.3. But is the reading frame preserved? The "cds" glyph will create a display that will visualize each CDS's reading frame.
To see this work, add the following stanza to the bottom of the configuration file:
[ReadingFrame] feature = mRNA glyph = cds ignore_empty_phase = 1 category = Genes key = Frame usage
When you reload the page and turn this track on, you'll see a "musical staff" representation of the frame usage (Figure 10). From this we can see that the alternative splicing in fact changes the reading frame of the second exon.
The "feature" option tells the glyph to take its data from the mRNA subfeatures of the main gene features. Note that depending on which data adaptor you use, you may need to specify the attribute "Index=1" for each of the mRNA subfeatures in order for the glyph to find them inside the gene object. However, this is usually unnecessary.

Figure 10: The "cds" glyph shows the reading frame using a musical staff notation
2.4. Grouped Features
In some circumstances you may wish to group features together to create a multipart feature. The gene object is actually just a special case of this. To show you the general case, we'll creature a feature of type "BAC", whose subparts are of type "clone_start" and "clone_end" (possibly corresponding to a BAC clone mapping experiment). Here is the GFF3 representation of this:
ctgA example BAC 1000 20000 . . . ID=b101.2;Name=b101.2;Note=Fingerprinted BAC with end reads ctgA example clone_start 1000 1500 . + . Parent=b101.2 ctgA example clone_end 19500 20000 . - . Parent=b101.2
As you can see, we've created a top-level feature of type "BAC" with two children of type "clone_start" and "clone_end" respectively. The start and end have opposite strands, indicating that they were sequenced off different strands of the BAC. The three features are tied together using the ID and Parent attributes that should be familiar to you from the gene examples.
This data lives in volvox_bacs.gff3. To visualize this, add the appropriate stanza to the bottom of volvox.conf:
[Clones] feature = BAC glyph = segments bgcolor = yellow connector = dashed strand_arrow = 1 description = 1 category = Clones key = Fingerprinted BACs
With this new track turned on, look at ctgA:1..24200. It will show that GBrowse has correctly picked up and rendered the relationship between the whole BAC and its two end reads (Figure 11). We have seen all these display options before with the exception of the "connector" option. This controls the appearance of the connecting line between subparts of a feature and can be one of "none", "solid", "dashed", "hat" or "quill". Try them and see what happens! (Note, you will have
to change the strandedness of the BAC parent feature from "." to "+" in order to see anything special happen with the quill connector.)

Figure 11: Displaying a simple multipart feature
2.5 Showing Quantitative Data (basic)
GBrowse can plot quantitative data such as alignment scores, confidence scores from gene prediction programs, and microarray intensity data. The data can be displayed either with glyphs that change color to indicate score levels (see the
"heterogeneous_segments", "graded_segments" and "redgreen_box" glyphs), or using a general-purpose XY-plot glyph.
Congratulations, Affymetrix has built a tiling array for the volvox genome! There's now a transcriptional profile for volvox, with an intensity reading every 100 bp across all of ctgA. The simulated data for this is in the file volvox_array.gff3, an excerpt of which is shown here:
ctgA affy microarray_oligo 1 100 281 . . Name=Expt1 ctgA affy microarray_oligo 101 200 183 . . Name=Expt1 ctgA affy microarray_oligo 201 300 213 . . Name=Expt1 ctgA affy microarray_oligo 301 400 191 . . Name=Expt1 ctgA affy microarray_oligo 401 500 288 . . Name=Expt1 ctgA affy microarray_oligo 501 600 184 . . Name=Expt1 ...
The file contains 500 features, each of which is exactly 100 bp long. The features are of type "microarray_oligo" and of
source "affy." Each one has a score (column 6) between 0 and 1000, where higher scores means more transcriptional activity. This is the first time we've used the score column.
All of the 500 features share the same Name (column 9) of "Expt1". Sharing the same name will allow us to group them together into a single transcriptional profiling experiment. However, we do not give them the same ID for reasons that are explained later. If we had multiple experiments to show, they would be named Expt1, Expt2 and so on.
We would like to generate a line graph that shows the transcriptional profile level across the current region. To do this, we need to group all members of the same experiment together into a single graph, and then to assign the "xyplot" glyph to the data. The following configuration stanza will do this:
[TransChip] feature = microarray_oligo glyph = xyplot graph_type = line fgcolor = black bgcolor = black height = 50 min_score = 0 max_score = 1000 scale = right group_on = display_name category = Quantitative Data key = Transcriptional Profile
The options shown here create a track named TransChip to display the tprofile feature with the xyplot glyph. The "graph_type", "height", "scale", "min_score", and "max_score" options all configure various aspects of the xyplot glyph's appearance.
(You can read all about xyplot's options using perldoc Bio::Graphics::Glyph::xyplot.)
When you reload the page and turn on the Transcriptional Profile track, you should see something like that shown in Figure 12.

Figure 12: A transcriptional profile rendered with the xyplot glyph
Using the info that perldoc provides, play around with the xyplot options a bit. For example, see what happens when you change graph_type to "boxes."
2.6 Quantitative Data (advanced)
The recipe in 2.5 works well for several thousand data points, but if you have very dense data, such as that produced by genomic tiling arrays, then you will want to use a specialized binary representation known as "wiggle" format. A wiggle track consists of a file that contains all the quantitative data, and a single feature in the database proper that points at that data file. Loading this data is a three-step process:
- Create a WIG file.
- Convert the WIG file into the wiggle binary file and a gff3 line.
- Load the gff3 line.
WIG format is a specialized format for describing quantitative data. It was created by Jim Kent for use in the UCSC genome browser. Details on creating WIG files are described at http://genome.ucsc.edu/goldenPath/help/wiggle.html.
WIG files are plain text files. They always begin with a "track" header, which, at a minimum, looks like this:
track type=wiggle_0 name="ArrayExpt1" description="20 degrees, 2 hr"
The "type" attribute is required, and must have a value of "wiggle_0". "name" and "description" are optional, but suggested, and indicate the name and description of the data series -- these will become the "Name" and "Note" fields of the generated GFF3 feature.
Following the track line comes the data for one or more chromosomal regions. As described in the UCSC documentation, there are three ways of formatting the data: (1)"Bed Format", (2) "variableStep", and (3) "fixedStep" format. The first format is essentially the same as GFF3 and does not give you any performance advantages over using straight GFF3. variableStep format describes intervals of the genome that have a fixed width, but begin at arbitrary locations, while fixedStep format describes features of the genome that are evenly spaced and have a fixed width (e.g. tiling array features).
For variableStep data, the format is:
variableStep chrom=chr19 span=150 59304701 10.0 59304901 12.5 59305401 15.0 59305601 17.5 59305901 20.0 59306081 17.5 59306301 15.0 59306691 12.5 59307871 10.0
The data is introduced by a line beginning with the keyword "variableStep", and the arguments "chrom" and "span", which indicate the chromosome on which the features are located, and the width of each feature, in base pairs. This is followed by a series of two-element lines indicating the start position of each feature, and its quantitative value. Values can be any sort of numeric data, including integers, negative numbers and floating point.
For fixedStep data, the format is:
fixedStep chrom=chr19 start=59307401 step=300 span=200 1000 900 800 700 600 500 400 300 200 100
The data is introduced by a line beginning with the keyword "fixedStep", and the arguments "chrom", "span", "start" and "step". The first two arguments are the same as before, while "start" and "step" indicate the starting position of the first feature, and the spacing between each feature. This is followed by a numeric value for each step. In this case, we have described 10 features beginning at position 59307401. Each feature begins 300 bp from the next and is 200 bp wide. In practice, this means that the first 200 bp of each interval is filled with known data, while information on the last 100 bp is "missing."
To see how this works in practice, let us reformat our example microarray data using the fixedStep version of WIG format. The complete data for this is in the file volvox_microarray.wig. It begins like this:
track type=wiggle_0 name="example" description="20 degrees, 2 hr" fixedStep chrom=ctgA start=1 step=100 span=100 281 183 213 191 288 ...
Compare this to the microarray data in Showing Quantitative Data (basic), and you will see that the five entries in the WIG file correspond to the first five features in the GFF3 files.
We'll now create the binary file for the data using the wiggle2gff3.pl script. First, copy volvox_microarray.wig into
the volvox database directory and then cd into that directory. Also, delete the volvox_array.gff3 file so we don't see the same set of data twice.
We want it to live in the volvox database directory, so we have to specify this path when creating it:
wiggle2gff3.pl --path=/var/www/gbrowse2/databases/volvox volvox_microarray.wig \
> volvox_microarray.gff3
After this script runs, it will write out a line of GFF3 data, which we save to volvox_microarray.gff3. This file will look like this:
##gff-version 3 ctgA . microarray_oligo 1 50000 . . . Name=example;Note=20%20degrees%2C%202%20hr;wigfile=/var/www/gbrowse2/databases/volvox/track001.ctgA.1200440492.wig
This file contains a single feature that spans the region indicated by the WIG file. The feature has the indicated name and description, and has a new attribute "wigfile" that points to the place where the quantitative data within the region can be found. You are free to edit this file to change the source or type. You can also set the source and type in wiggle2gff3.pl by passing it --source and --type options on the command line. If you move the binary wiggle file, please change the value of the "wigfile" attribute to indicate its new location.
One last step is needed to make the data display properly, however. You must set the glyph type to either "wiggle_xyplot" or "wiggle_density." These are the only glyphs that recognize and properly format wiggle-style data. You can also remove the min and max options, since the wiggle binary files store this information internally and it is no longer needed.
In the config file, change the [TransChip] stanza to look like this:
[TransChip] feature = microarray_oligo glyph = wiggle_xyplot graph_type = boxes height = 50 scale = right description = 1 category = Quantitative Data key = Transcriptional Profile
When you reload the page, the quantitative data should display correctly. You might notice a speed improvement; this becomes much more noticeable on large data sets.
Now, for some fun, change the [TransChip] section to use the "wiggle_density" glyph. Also set the bgcolor to "blue" and delete the unneeded graph_type and scale options.
[TransChip] feature = microarray_oligo glyph = wiggle_density height = 30 bgcolor = blue description = 1 category = Quantitative Data key = Transcriptional Profile
This is what the modified track will look like:

Figure 13: A transcriptional profile rendered with the wiggle_density glyph
2.7. DNA and 3-frame translations
GBrowse can take advantage of DNA sequence data in several ways:
- It can display a GC content graph of the reference sequence at low magnifications and the DNA sequence itself at higher magnifications.
- It can display three and six-frame translations of the reference sequence DNA.
- It can display the protein translation of coding regions.
- It can display aligned nucleotide sequences, creating a poor man's multiple alignment.
- It can directly display next generation sequencing data that has been stored in SAM or BAM format (see the GBrowse NGS Tutorial).
We've been working with feature coordinates, but no actual DNA sequence has been loaded into the volvox database. We will again rebuild the database, this time loading a simulated DNA file in fasta format. Download the file volvox.fa, and copy it into the volvox database directory. At this point in the tutorial, when you do a directory listing of the volvox database directory (with ls on UNIX systems, or dir/w on Windows systems), it should look like this:
% ls /var/www/gbrowse2/databases/volvox/ track001.ctgA.1202327456.wig volvox_domains.gff3 volvox_genes.gff3 volvox.fa volvox_remarks.gff3 volvox_matches.gff3 volvox_bacs.gff3 volvox2b.gff3 volvox_genes_simple.gff3 volvox_est.gff3 volvox_microarray.gff3
If you haven't done so already, please be sure that you have made the database directory writeable by the web server user, either by making it world writeable or by changing the directory's group ownership to match the Apache web server's group account (it varies from system to system, but "nobody", "www", "apache" and "www-data" are the most common possibilities).
This is all you need to do to load the DNA. To see that the DNA is indeed being loaded, add two new stanzas to the volvox.conf configuration file:
[DNA] glyph = dna global feature = 1 height = 40 do_gc = 1 gc_window = auto fgcolor = red axis_color = blue strand = both category = DNA key = DNA/GC Content [Translation] glyph = translation global feature = 1 height = 40 fgcolor = purple start_codons = 0 stop_codons = 1 translation = 6frame category = DNA key = 6-frame translation
The "DNA" track uses a specialized glyph called "dna". At low magnifications (zoomed way out), this glyph draws a GC content plot. At high magnifications (zoomed way in), this glyph draws the dna. Of the various options given in the example stanza, the most important one is "global feature", which is set to a true value (1). This tells GBrowse that the stanza doesn't correspond to a specific feature type, but should be displayed globally. Other options control whether to draw one or both strands, whether to draw the GC content histogram, the window size to use when smoothing the histogram, and what colors to use.
Similarly, the "Translation" track uses a glyph called "translation", which draws three or six-frame conceptual translations. At low magnifications (zoomed way out), this glyph draws little symbols indicating where start and stop codons are. At high magnifications, the actual amino acid sequence comes into view. Again, the most important option is "global feature", which is set to a true value to tell GBrowse that the track isn't attached to a particular feature type, but is to be generated automatically. Other options control the height of the glyph, whether to draw start and/or stop codon symbols, and whether to generate a 3frame or 6frame translation.
Figures 14a and 14b show the browser at low and high magnification, with both tracks activated. Notice that the coding track (the "cds" glyph) detects that the DNA is available and generates the transcripts' protein translations automatically.
(14A)

(14B)

Figure 14: Viewing DNA/GC content and 6-frame translation. (a) low magnification; (b) high magnification
If you happen to do a listing of the volvox database directory after adding the DNA file, you might notice that a new file named directory.index has appeared. This index directory is created automatically by GBrowse in order to speed up access to the .fa file and to reduce memory requirements. If the database directory is
not writable by all users, GBrowse will not be able to create this directory, and the display will be somewhat slower whenever a DNA track is turned on.
2.8 ESTs and Other Alignments
This section will lead you through creating a plausible EST track, and show you how grouping of 5' and 3' EST reads works.
We'll start with a simple data set containing information on three pairs of EST reads. You'll find this data set in volvox_est.gff3. Here is the first pair described in the data file:
ctgA est EST_match 1050 1500 . + . ID=Match1;Name=agt830.5 ctgA est EST_match 3000 3202 . + . ID=Match1;Name=agt830.5 ctgA est EST_match 5410 5500 . - . ID=Match2;Name=agt830.3 ctgA est EST_match 7000 7503 . - . ID=Match2;Name=agt830.3 ctgA est EST_match 1050 1500 . + . ID=Match3;Name=agt221.5 ctgA est EST_match 5000 5500 . + . ID=Match3;Name=agt221.5 ctgA est EST_match 7000 7300 . + . ID=Match3;Name=agt221.5 ...
What's going on here is the same as the alignments shown in volvox_matches.gff3. There are two EST reads named agt830.5 (the 5' read) and agt830.3 (the 3' read). Each of them matches the ctgA genome in two discontinuous regions because, presumably, they cross a splice site. As in the earlier example, we represent each EST as a single "EST_match" feature that spans several lines. The lines are linked together by sharing the same ID attribute.
There are two other things to notice. One is that the source field (column 2) is "est" and the type (column 3) is "EST_match." Either of these fields can be used to distinguish the EST matches in this file from the generic "match" matches used in the earlier example. The second item of interest is that the strand field (column 7) is + for the 5' EST and - for the 3' EST, indicating that the 3' EST aligned to the reverse complement of ctgA.
Add this file to the volvox database directory, and add the following to the configuration file:
[EST] feature = EST_match:est height = 6 glyph = segments bgcolor = orange category = Genes key = ESTs
This will give a display similar to that shown in Figure 15.

Figure 15: A simple representation of EST matches.
For reasons described earlier, the feature option reads "EST_match:est" rather than simply "match" in order to distinguish the EST matches from the example matches that we loaded previously.
This display is OK, but it could be better. One problem is that the relationship between the 5' and 3' EST read pairs is not shown. We'd like to place the two members of the pair together on the same line, and connect them with a dotted line to show that they are the two ends of the same cDNA clone. An easy way to do this is to add a "group_pattern" option to the [EST] stanza:
[EST] feature = EST_match:est glyph = segments height = 6 bgcolor = orange group_pattern = /\.[53]$/ category = Genes key = ESTs
The new group_pattern option tells GBrowse to use a Perl regular expression pattern matching operation to find and group related EST matches based on their names. It helps to understand how Perl regular expressions work, but basically the pattern match breaks down this way:
/ begin the pattern match \. match a dot [53] match either the numbers 5 or 3 $ match the end of the string / end the pattern match
What this is saying is to look for pairs of EST names that are similar except for the terminal .5 or .3, and pair them. When we reload the page, we get Figure 16.

Figure 16: The group_pattern option allows EST pairs to be grouped
Here are regular expressions that will work for other common EST pairing schemes:
| 5' EST | 3' EST | group_pattern |
|---|---|---|
| agt123f | agt123r | /[fr]$/ |
| agt123p | agt123q | /[pq]$/ |
| f.agt123 | r.agt123 | /^[fr]\./ |
| 5.agt123 | 3.agt123 | /^[53]\./ |
| agt123.for | agt123.rev | /\.(for|rev)$/ |
Another nice enhancement would be to give the 5' and 3' ESTs different colors so as to distinguish one from another. This can be accomplished using a Perl callback. Open up volvox.conf once more, and find the bgcolor option in the [EST] track. Replace it with this (you may want to cut and paste from here in order to avoid introducing any typos):
bgcolor = sub {
my $feature = shift;
my $name = $feature->display_name;
if ($name =~ /\.5$/) {
return 'red';
} else {
return 'orange';
}
}
You'll need to know the basics of the Perl programming language in order to do this type of thing yourself. Suffice to say that instead of hard-coding the color "orange" into the bgcolor option, we are asking GBrowse to run a Perl subroutine each time it needs to render an EST. The subroutine is passed the feature that is about to be drawn. It asks the feature for its human-readable name (display_name) and assigns that name to a variable named $name. It then performs a pattern match on the name to see if it ends in a "5". If the name matches, the subroutine returns the color "red" to GBrowse. Otherwise it returns the color "orange."
The effect is shown in Figure 17.

Figure 17: Using a callback to distinguish 5' and 3' ESTs
For your convenience, a configuration file with all the stanzas defined up to this point can be found in volvox_quarter.conf.
2.8.1. Adding DNA to Alignments
The last thing we'll do with the EST data set is to add DNA to the ESTs so that at high magnification GBrowse will show the multiple alignment. This information is also used by the "dump alignments" plugin to generate a text-based multiple alignment.
NOTE: Currently only nucleotide to nucleotide alignments can be displayed at the level of individual nucleotides (e.g. BLASTN, BLAT, Exonerate). Protein to nucleotide alignments, such as those produced by Genewise or BLASTX, are not supported at the residue level.
To make this work, we need to add two additional pieces of information to the EST alignment data:
- The DNA sequences of the volvox ESTs.
- The alignment positions in EST coordinates.
In case the need for item (2) isn't immediately clear, consider this blow-up of an alignment:
ctgA 1050 gattgccattgaccttggccattggccaagctgaa 1086
|||||||||| ||||||| ||||||||||||||||
agt830.5 1 gattgccattcaccttgggcattggccaagctgaa 135
What we currently have in the GFF file are the source genomic positions of the alignments (in ctgA-relative coordinates). We need to add the target positions in agt830.5-relative coordinates in order for GBrowse to fetch and display the appropriate segments of the EST DNA.
The fasta file ests.fa provides the DNA sequences for the six EST reads. The GFF load file volvox_est_targets.gff3 contains the revised coordinates. If you look at this file you'll see that it is dissimilar to previous load files:
ctgA est EST_match 1050 1500 . + . ID=Match1;Name=agt830.5;Target=agt830.5 1 451 ctgA est EST_match 3000 3202 . + . ID=Match1;Name=agt830.5;Target=agt830.5 452 654 ctgA est EST_match 5410 5500 . - . ID=Match2;Name=agt830.3;Target=agt830.3 505 595 ctgA est EST_match 7000 7503 . - . ID=Match2;Name=agt830.3;Target=agt830.3 1 504 ctgA est EST_match 1050 1500 . + . ID=Match3;Name=agt221.5;Target=agt221.5 1 451 ctgA est EST_match 5000 5500 . + . ID=Match3;Name=agt221.5;Target=agt221.5 452 952 ctgA est EST_match 7000 7300 . + . ID=Match3;Name=agt221.5;Target=agt221.5 953 1253 ...
The first eight columns are identical to what we've been using before, but the ninth column follows a new convention used for nucleotide to nucleotide and protein to nucleotide alignments. There is now a special attribute, "Target", that tells GBrowse specifies the name of the EST sequence (found in a FASTA file), the start position of the alignment in EST coordinates, and the end position of the alignment in EST coordinates. The combination of a target sequence and its coordinates. For example, the first segment of the first alignment, agt830.5, spans positions 1050 to 1500 in genome coordinates, and positions 1-451 in EST sequence coordinates.
There is a subtlety here. Notice that for minus strand ESTs, the target coordinates are not reversed; the start position is always less than the end position. For example, for the first agt830.3 HSP, we are told that genomic region 5410..5500 aligns to EST region 505..596. The strand field is used to determine the direction of the alignment.
Since this data file contains a revised version of volvox7.gff, remove volvox_est.gff3 from the database directory and replace it with volvox_est_targets.gff3. Also copy ests.fa into the database directory. If you perform a directory listing, it should look like this:
directory.index volvox_remarks.gff3 volvox_domains.gff3 volvox_genes_simple.gff3 volvox_bacs.gff3 volvox_est_targets.gff3 ests.fa volvox_matches.gff3 volvox_genes.gff3 volvox_array.gff3 volvox.fa
NOTE: If you see doubled EST features after this point, make sure that you have removed volvox7.gff. Another thing to watch out for is that some sort of bug in the BioPerl layer (up through at least version 1.4) causes the EST DNA display to get messed up at this point on Windows systems. To fix the latter problem, go to the volvox database directory and remove the files directory.dir and directory.pag. These are automatically-generated DNA file indexes that GBrowse develops, and will be regenerated for you the next time you access a page.
We're not done with making configuration file changes, but volvox_halfway.conf contains all configuration file enhancements up to this point. If you like, you can copy it over the live volvox.conf. It contains the following version of the [EST] track:
[EST]
feature = EST_match:est
glyph = segments
height = 6
draw_target = 1
show_mismatch = 1
canonical_strand = 1
label_position = left
bgcolor = sub {
my $feature = shift;
my $name = $feature->display_name;
if ($name =~ /\.5$/) {
return 'red';
} else {
return 'orange';
}
}
group_pattern = /\.[53]$/
key = ESTs
The key addition to this track configuration is the "draw_target", "show_mismatch" and "canonical_strand" options. All options are true/false flags, where 0 means false and 1 means true. draw_target tells the segments glyph to draw the DNA sequence of the target ESTs when the magnification allows. show_mismatch instructs the glyph to highlight mismatches between the genome and the EST in pink. canonical_strand instructs the glyph to display the plus strand
sequence even when the EST matches the minus strand. We've also added a "label_position" option that tells GBrowse to draw each EST's label to its left. This allows the multiple alignments to pile up nicely.
To see this work, reload the page, turn on the EST track and search for region "ctgA:1065..1165". This will show the aligned 5' ends of agt221.5, agt830.5 and agt767.5 (Figure 18). Notice that one of the T's towards the beginning of agt830.5 is highlighted to show that it doesn't match the corresponding genomic base.

Figure 18: Multiple alignments at the DNA level
If you don't see the EST sequence appearing, make sure that ests.fa is in the volvox database directory and is world readable. If it still isn't working, you may need to "touch" the file in order to update its modification date. This tells GBrowse that it is new and needs to be reindexed. In UNIX:
% touch /var/www/gbrowse2/databases/volvox/ests.fa
If you are still having problems, remove the directory.index file completely in order to force reindexing.
2.9. Trace Data
If you have sequence trace information (in SCF format) associated with the reference sequence, this can be displayed in gbrowse using the trace glyph. To use this glyph, you must have installed:
- The Staden io-lib package (staden.sourceforge.net)
- zlib (www.zlib.net)
- The Bio::SCF perl module (Available from CPAN)
- zlib (www.zlib.net)
Note that at this time, it is not possible to use the trace glyph with Windows servers, since we do not know of a version of the Staden io-lib package that has been compiled for Windows.
The data file volvox_trace.gff3 contains an example trace entry.
ctgA example read 44401 45925 . + . Name=trace;trace=volvox_trace.scf
This aligns the full trace sequence to the reference sequence. The trace file in this case is named volvox_trace.scf, and it is located in /var/www/gbrowse2/tutorial/data_files/volvox_trace.scf.
Due to sequence quality, the first few bases of a trace file usually don't align. Even so, these bases need to be included in the gff file. For instance, if the bases 10-700 of the trace file aligns to the bases 100-800 of the reference sequence, the feature would be 90-800 to account for the first 10 bases (starting at base 0).
NOTE: The trace glyph currently doesn't deal with insertions or deletions. If an indel occurs, the alignment after the indel will be off.
Copy this file into the volvox database directory. Then, to display the trace, copy the following into volvox.conf:
[Traces] feature = read glyph = trace fgcolor = black bgcolor = orange strand_arrow = 1 height = 6 description = 1 a_color = green c_color = blue g_color = black t_color = red show_border = 1 trace_height = 80 trace_prefix = http://localhost/gbrowse2/tutorial/data_files/ key = Traces
The fgcolor, bgcolor, strand_arrow and height control the bar that shows the location and directionality of the trace.
The trace_prefix option is important because it gives the path to the trace files. This is prepended to the trace file name defined in the gff file. It can be a direct path to the directory (e.g., /usr/local/trace_files/) or a web address (as above).
The a/c/g/t_color options allow configuration of the base colors. The trace_height refers to the height of the trace itself. Play around with it to find a height that you like.
If show_border is set to 1, a black box will be drawn around the trace.
After configuring the trace glyph, reload the browser page and enable traces. Zoomed out you will see:

Figure 19: The trace glyph zoomed out.
Zooming in will show you the trace diagram:

Figure 19: The trace glyph zoomed in.
2.10. Next Generation Sequencing Data
GBrowse allows you to load up next generation sequencing data such as produced by the Illumina and SOLiD platforms. See the SAM or BAM format (see the GBrowse NGS Tutorial for instructions on how to do this).
3. GBrowse Enhancements
3.3. Semantic Zooming
One of the cooler features of GBrowse is its ability to support semantic zooming. Semantic zooming is a feature in which objects show different levels of detail depending on the level of magnification. We've already seen this behavior in the "dna" and "segments" glyphs, which show the DNA sequence only when there's sufficient room to display it.
GBrowse has several types of semantic zooming:
- glyph-based, automatic: the dna and segments glyphs, and others that support semantic zooming out of the box. This happens automatically and can't be modified.
- semantic labeling: when there's sufficient room, GBrowse will print the label and descriptions next to the glyphs. The threshold at which this happens is under your control.
- semantic bumping: when there's sufficient room, GBrowse will "bump" features to prevent them from colliding on the screen. When this would cause the display to become to high, bumping is suppressed. This threshold is also under your control.
- semantic options: you can set track configuration sections up so that when a preset size threshold is exceeded, one configuration replaces another.
The thresholds for labeling and bumping are set by configuration options named "label density" and "bump density" respectively. The standard values can be found in the defaults track named [TRACK DEFAULTS]. They are originally set so that labels are suppressed when there are more than 25 features per track, and bumping is suppressed when there are more than 100 features per track. You can set these values globally by editing their values in [TRACK DEFAULTS], or you can add "label density" and/or "bump density" options to individual track configuration sections in order to override the settings for specific tracks.
The process of setting up semantic options is a bit more interesting. To illustrate, we will create semantic zooming for the [Alignments] track ("Example Alignments"). We would like the track to shift from showing the individual segments to showing solid rectangles when the user is zoomed out to 30K and beyond, and turn bumping off when the user is zoomed out to 45K and beyond. The process is simple. Beneath the [Alignments] stanza, we add a stanza qualified for zoomlevels of >= 30,000 and another stanza qualified for zoomlevels of >= 45,000:
[Alignments] feature = match glyph = segments key = Example alignments [Alignments:30000] glyph = box label = 0 [Alignments:45000] glyph = box bump = 0 label = 0
The format for semantic options is [Trackname:distance], where Trackname must be the same as the non-qualified track, and distance is the length of the region at which the semantic options will kick in. Only options that are different from the non-qualified track need to be listed. According to the configuration given above, when the user is looking at a region 30,000 bp or longer, the glyph option will change to "box," which is a solid rectangle that doesn't show any internal details. All other options, such as feature and key, will be inherited from the [Alignments] track.
At 45,000 bp, the glyph is again set to box, and in addition the "bump" option is set to zero, turning off collision control. Notice that options are inherited from the unqualified track stanza, and not from the previous semantic zoom level. If we had neglected to specify the glyph option in [Alignments:45000], the glyph would have reverted to "segments."
Make these changes to volvox.conf, turn on the "Example Alignments" track and view the contig at 20K, 40K and 50K. At 40K, you'll see the alignments lose their internal structure and be replaced by solid
boxes (Figure 21). At 50K they'll begin to overlap and the feature labels will be suppressed.

Figure 21: Semantically zoomed alignments at 40K
Conclusion
This is just a short introduction to the many things that you can do with GBrowse.
For more comprehensive information, see either: