table of contents
- expected learning outcomes
- getting started
- exercise 1: species differentiation in dolphins
- exercise 2: species differentiation in dolphins, integrating across loci
- exercise 3: population structure of parasitic lice
expected learning outcomes
The objective of this activity is to promote understanding of the genealogical sorting index, gsi, and its application to problems of lineage divergence (Cummings, Neel and Shaw 2008). The gsi is a novel and objective way to quantify genealogical structure by assessing the amount of exclusive ancestry of a group on a rooted tree and determining the probability of observing that amount of exclusive ancestry at random in a single lineage independent of any specific topology or distribution of coalescent times. The gsi thereby transcends the categorical view of monophyly and nonmonophyly characteristic of phylogenetic systematics to enable novel insight into the evolutionary process. Likewise, the gsi captures historical information about diverging populations from its quantification of exclusive relationship, independent of a reliance on estimates of coalescent times characteristic of historical population genetics. Although most often described in the context of evolutionary divergence, the gsi statistic is more broadly applicable to quantifying and assessing the significance of clustering of observations in labeled groups on any tree.
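To make the definition concrete, here is a minimal Python sketch of the gsi calculation as we read it in Cummings, Neel and Shaw (2008). It is not the implementation behind genealogicalsorting.org. It assumes gs = n / sum(d_u - 2), where n is the group size minus one and the sum runs over the nodes uniting the group through its most recent common ancestor, and gsi = (gs - min gs) / (1 - min gs), where min gs uses every internal node of the tree. The term d_u - 2 is counted here as the number of children minus one. The nested-tuple tree representation and the tip names are hypothetical.

```python
from typing import Dict, List, Tuple, Union

# Assumption: a rooted tree is a tip name (str) or a tuple of child subtrees.
Tree = Union[str, tuple]


def gsi(tree: Tree, groups: Dict[str, str], group: str) -> float:
    """Sketch of the genealogical sorting index for one group on a rooted tree."""

    def count_tips(node: Tree) -> int:
        # Number of tips below `node` that carry the label `group`.
        if isinstance(node, str):
            return 1 if groups[node] == group else 0
        return sum(count_tips(child) for child in node)

    group_size = count_tips(tree)
    if group_size < 2:
        raise ValueError("the group needs at least two tips on the tree")

    # One record per internal node:
    # (group tips below, children - 1, node lies strictly above the group MRCA)
    records: List[Tuple[int, int, bool]] = []

    def walk(node: Tree) -> int:
        if isinstance(node, str):
            return 1 if groups[node] == group else 0
        child_counts = [walk(child) for child in node]
        # If one child already holds every group tip, the MRCA is below this node.
        above_mrca = max(child_counts) == group_size
        records.append((sum(child_counts), len(node) - 1, above_mrca))
        return sum(child_counts)

    walk(tree)
    all_events = sum(events for _, events, _ in records)
    # Nodes uniting the group: at or below the MRCA, with a group tip beneath.
    uniting_events = sum(
        events for tips, events, above in records if tips >= 1 and not above
    )
    n = group_size - 1
    if all_events == n:
        raise ValueError("gsi is undefined when the group contains every tip")
    gs = n / uniting_events
    gs_min = n / all_events
    return (gs - gs_min) / (1 - gs_min)


# Hypothetical six-tip example with two groups of three.
groups = {"a1": "A", "a2": "A", "a3": "A", "b1": "B", "b2": "B", "b3": "B"}
sorted_tree = ((("a1", "a2"), "a3"), (("b1", "b2"), "b3"))
mixed_tree = ((("a1", "b1"), "a2"), (("b2", "b3"), "a3"))

print(gsi(sorted_tree, groups, "A"))  # group A is monophyletic here
print(gsi(mixed_tree, groups, "A"))   # group A is scattered across the tree
```

On the first tree group A is monophyletic and the sketch returns 1. On the second tree the same three tips are scattered and the value drops to about 0.17. Branch lengths play no part, in keeping with the statement above that gsi does not depend on coalescent times.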
getting started
This learning activity can be done by a person working alone or, preferably, by two people working together. There are opportunities to divide the work between partners, and subsequently to share and compare results from independently obtained analyses. The final analyses for the exercises in this activity will be done using the web interface at genealogicalsorting.org.
This activity will also provide an opportunity to apply your knowledge of the theory and practice of phylogenetics to generate gene genealogies for subsequent analysis. Note that the trees have to be rooted, and should only include operational taxonomic units that under the null hypothesis might be considered equivalent. Therefore, although gene genealogies/trees may be rooted with an outgroup, the outgroup should be removed before determining the probability of the observed gsi values for the groups of interest.
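The outgroup removal described above can be illustrated with a short Python sketch. It assumes a rooted tree held as nested tuples of tip names, which is a hypothetical representation chosen for brevity. The tip names are made up, and branch lengths are ignored. It shows the idea only and does not replace pruning in your phylogenetics program.

```python
from typing import Optional, Set, Union

# Assumption: a rooted tree is a tip name (str) or a tuple of child subtrees.
Tree = Union[str, tuple]


def prune(tree: Tree, drop: Set[str]) -> Optional[Tree]:
    """Remove the tips named in `drop` and collapse nodes left with one child."""
    if isinstance(tree, str):
        return None if tree in drop else tree
    kept = [sub for sub in (prune(child, drop) for child in tree) if sub is not None]
    if not kept:
        return None      # every tip below this node was removed
    if len(kept) == 1:
        return kept[0]   # a node with a single child is no longer needed
    return tuple(kept)


def to_newick(tree: Tree) -> str:
    """Write the topology as a Newick string without branch lengths."""
    def fmt(node: Tree) -> str:
        if isinstance(node, str):
            return node
        return "(" + ",".join(fmt(child) for child in node) + ")"
    return fmt(tree) + ";"


# Hypothetical tree rooted with one outgroup sequence, LAC1.
rooted = ("LAC1", (("LOB1", "LOB2"), ("LOS1", "LOS2")))
ingroup = prune(rooted, {"LAC1"})
print(to_newick(ingroup))  # prints ((LOB1,LOB2),(LOS1,LOS2));
```

The root of the pruned tree is the node where the outgroup attached, so the ingroup keeps the rooting that the outgroup provided.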
exercise 1: species differentiation in dolphins
This example comes from a genetic study of the demography of speciation in two very closely related (sister taxa) allopatric dolphin species with anti-tropical distribution, Pacific white-sided dolphin (Lagenorhynchus obliquidens, LOB) in the North Pacific, and dusky dolphin (L. obscurus, LOS) in the South Pacific (Fig. 1). Atlantic white-sided dolphin (L. acutus, LAC) is used as the outgroup (Hare et al. 2002). The data are sequences of four loci, nuclear protein coding gene introns, sampled from multiple individuals of each species.

Figure 1. Pacific white-sided dolphin, Lagenorhynchus obliquidens (left), and dusky dolphin, L. obscurus (right).
- Retrieve and save the data files for the four nuclear protein coding gene introns: ACT, actin; BTM, Butyrophilin; CAMK, calcium calmodulin-dependent kinase; and HEXB, lysosomal beta-hexosaminidase.
- Using whatever program, model, and analysis options you wish, generate a tree for each gene, saving the tree file.
- You will need to eliminate the outgroup, which was used to root the tree, before subsequent analysis. Furthermore, the gsi web interface currently only accepts Newick tree files as input, so if the program you used generated a NEXUS tree file, you must convert it to Newick format. Below are some relevant commands for PAUP*, continuing from after execution of a data file and tree file.
  command line
    use outgroup to root tree -
      outgroup name name;
      roottrees rootmethod=outgroup outroot=monophyl;
    prune outgroup taxa from tree -
      delete taxonname taxonname / prune=yes;
    save tree as a newick formatted file -
      savetrees file=treenameornumber format=phylip;
  graphical user interface
    use outgroup to root tree -
      Trees > Root Trees
      Rooting Options...
      Make outgroup a monophyletic sister group to ingroup
      Define Outgroup...
      choose outgroup taxa from list on left and add to outgroup
      OK
      Root
    prune outgroup taxa from tree -
      Data > Delete/Restore Taxa...
      choose outgroup taxa from list on left and add to delete
      OK
      Prune deleted taxa from trees
      Proceed With Deletion
    save tree as a newick formatted file -
      Trees > Save Trees to File...
      Format: PHYLIP 3.x
      Save
- Retrieve and save the group assignment file, which maps each operational taxonomic unit in the trees to one of the two species, L. obliquidens and L. obscurus (denoted by the specific epithet).
- Upload a tree file and the assignment file.
- Examine the tree for each locus, which should be presented with group labels. This will require you to reload the assignment file.
- Based on examination of the tree before further analysis, make note of your impressions regarding the divergence between L. obliquidens and L. obscurus.
- Make note of your impressions regarding the relative degree of exclusive ancestry for each species.
- Visually compare trees from the different loci and make note of the relative differentiation between L. obliquidens and L. obscurus.
- You may recall that different programs and program options affect the way polytomies are handled, and as a consequence your previous decisions in the phylogenetic analysis steps may affect your results. Do any of your trees have polytomies, and if so how might these affect the gsi values for the groups involved?
- Choose the analysis parameters and launch an analysis.
- Retrieve your results when the analysis is completed and examine the gsi values.
- Are the analytical results consistent with your initial impressions from examining the trees?
- In those cases where your initial impressions and the gsi values seem in conflict, reexamine the tree and the gsi values, and reevaluate your impressions.
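The probability attached to a gsi value can be thought of in terms of shuffling group labels among the tips of a fixed tree. The Python sketch below shows that idea in generic form. It rests on several assumptions: that labels are permuted while the tree is held constant, that the test is one-sided, and that the observed arrangement is counted as one permutation. The function and parameter names are hypothetical, and the web interface may differ in its details.

```python
import random
from typing import Callable, Dict


def permutation_p_value(
    statistic: Callable[[Dict[str, str]], float],
    groups: Dict[str, str],
    n_permutations: int = 1000,
    seed: int = 0,
) -> float:
    """Estimate how often shuffled labels give a statistic >= the observed one."""
    rng = random.Random(seed)
    tips = list(groups)
    labels = [groups[tip] for tip in tips]
    observed = statistic(groups)
    as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(labels)  # reassign the same labels to tips at random
        if statistic(dict(zip(tips, labels))) >= observed:
            as_extreme += 1
    # Counting the observed arrangement as one permutation keeps p above zero.
    return (as_extreme + 1) / (n_permutations + 1)


# Toy usage with a stand-in statistic (NOT gsi): the number of sister pairs
# on a hypothetical four-tip tree whose two tips share a label.
sister_pairs = [("a1", "a2"), ("b1", "b2")]
toy_groups = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}


def shared_pairs(assignment: Dict[str, str]) -> int:
    return sum(assignment[x] == assignment[y] for x, y in sister_pairs)


p = permutation_p_value(shared_pairs, toy_groups, n_permutations=999, seed=1)
print(p)
```

Passing a function that computes gsi for one group on a fixed tree, in place of the toy statistic, would give a permutation estimate for that group. The smallest p-value this sketch can return is 1 / (n_permutations + 1), so the number of permutations limits how small it can be.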
exercise 2: species differentiation in dolphins, integrating across loci
The dolphin data set also provides an opportunity to learn how to use information from multiple loci to quantify lineage divergence. You may recall that the variance in gene genealogies can be quite high, as demonstrated both by coalescent theory and by empirical observations. In much the same way that data from multiple unlinked loci provide more precise estimates of θ (theta) in population genetics, integrating over multiple independent gene genealogies can provide more precise estimates of lineage divergence. The statistic for an ensemble of trees or gene genealogies is gsiT (Cummings, Neel and Shaw 2008).
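As a rough illustration of the ensemble statistic, the Python sketch below treats gsiT as a probability-weighted sum of the per-tree gsi values, which is our reading of Cummings, Neel and Shaw (2008). With no weights supplied it assumes every tree is equally probable, so the result is the simple mean. The function name and the input numbers are hypothetical placeholders, not results from the dolphin data.

```python
from typing import Optional, Sequence


def gsi_t(
    gsi_values: Sequence[float],
    tree_probs: Optional[Sequence[float]] = None,
) -> float:
    """Weighted sum of per-tree gsi values for one group (sketch)."""
    if not gsi_values:
        raise ValueError("at least one gsi value is required")
    if tree_probs is None:
        # Assumption: each tree in the ensemble is equally probable.
        tree_probs = [1.0 / len(gsi_values)] * len(gsi_values)
    if len(tree_probs) != len(gsi_values):
        raise ValueError("one probability is needed per tree")
    if abs(sum(tree_probs) - 1.0) > 1e-9:
        raise ValueError("tree probabilities must sum to 1")
    return sum(g * p for g, p in zip(gsi_values, tree_probs))


# Placeholder gsi values for one group on four gene trees.
print(gsi_t([0.2, 0.4, 0.6, 0.8]))
```

Because the ensemble value is an average of this kind, it lies between the smallest and largest single-locus values for the same group.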
- Combine the Newick tree files from the individual loci into a single multi-tree file, making note of the order of the trees. Note that there are several ways that this file might be created (e.g., using PAUP*, UNIX command line, text editor, or a combination of these).
- Upload the multi-tree file and the assignment file again.
- Choose the analysis parameters and launch an analysis.
- Retrieve your results when the analysis is completed and examine the gsi values.
- How does the value of gsiT, the ensemble statistic generated by integrating across gene genealogies, compare to the value of gsi for the individual trees?
- How does the p-value associated with gsiT compare to those associated with the individual loci?
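Combining the per-locus Newick files can also be scripted. The Python sketch below is one possible way to do it. It assumes each input file holds one or more trees terminated by semicolons, that taxon labels contain no spaces or semicolons, and that one tree per line is an acceptable multi-tree layout. The file names in the usage comment are hypothetical.

```python
from pathlib import Path
from typing import Iterable, Union

PathLike = Union[str, Path]


def combine_newick(tree_files: Iterable[PathLike], out_file: PathLike) -> int:
    """Write every tree from `tree_files`, in order, one per line, to `out_file`."""
    trees = []
    for path in tree_files:
        # Assumption: a semicolon only ever marks the end of a tree.
        for chunk in Path(path).read_text().split(";"):
            newick = "".join(chunk.split())  # drop line breaks and spaces
            if newick:
                trees.append(newick + ";")
    Path(out_file).write_text("\n".join(trees) + "\n")
    return len(trees)


# Example call with hypothetical file names, listed in the order to record:
# combine_newick(["ACT.tre", "BTM.tre", "CAMK.tre", "HEXB.tre"], "dolphin_all.tre")
```

The trees are written in the order the files are listed, which makes it easy to note which line of the combined file corresponds to which locus.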
exercise 3: population structure of parasitic lice
This example comes from a genetic study of a parasitic louse species, Polyplax serrata (Fig. 2), and some of its mouse hosts in Europe (Stefka and Hypsa 2007). The mouse hosts here are the striped field mouse, Apodemus agrarius, the yellow-necked mouse, A. flavicollis, and the wood mouse, A. sylvaticus (Fig. 3). The data are sequences of the mitochondrial gene for cytochrome c oxidase subunit I (COI) from 94 individuals of P. serrata, which were sampled from the three host Apodemus spp. in several areas of Europe. The data for this learning activity comprise a subset of the data from the original study.

Figure 2. Polyplax serrata.

Figure 3. Striped field mouse, Apodemus agrarius (left), yellow-necked mouse, A. flavicollis (center), and wood mouse, A. sylvaticus (right).
- The phylogenetic analysis has already been completed, and the tree file is available.
- Retrieve and save the group assignment files, which map each operational taxonomic unit in the tree to a group. For this problem there are two assignment files: one based on host species (denoted by the specific epithet), and one based on geographical location as broad areas within Europe (central, central-eastern, eastern, southwestern, and western).
- Upload the tree file and an assignment file.
- Examine the tree, which should be presented with group labels.
- Make note of your impressions regarding the relative degree of exclusive ancestry for each group based on the assignment.
- Visually compare the tree with the different group assignments and make note of the relative differentiation based on host species and geography.
- Choose the analysis parameters and launch an analysis.
- Retrieve your results when the analysis is completed and examine the gsi values.
- Are the analytical results consistent with your initial impressions from examining the trees?
- In those cases where your initial impressions and the gsi values seem in conflict, reexamine the tree and the gsi values, and reevaluate your impressions.
- Does exclusive ancestry seem higher when sequences from P. serrata are grouped based on host species, or on geographical region?
- What host-parasite evolutionary scenarios are more consistent with the results, and which are less consistent with the results?