Wednesday, July 1, 2015

Networks of admixture or introgression

There are several processes that create reticulate phylogenetic topologies, including hybridization, introgression (or admixture) and horizontal gene transfer (HGT). Biologically, introgression operates via the same mechanism as hybridization (i.e. during sexual reproduction), but it results in only a small amount of genetic material entering the recipient genome, producing an admixed genome that is similar to the end result of HGT.

In practice, constructing phylogenetic networks in situations where introgression or HGT has occurred has been approached somewhat differently from constructing them for hybridization. Hybridization has usually been tackled by merging incongruent tree topologies, based on the idea that the different topologies represent the phylogenetic history of the different genomes of the hybrid taxon. Introgression and HGT have usually been tackled by adding reticulation edges to a phylogenetic tree, on the basis that the tree represents the phylogenetic history of the main part of the genome.

So, the study of introgression (and HGT) involves (a) constructing a phylogenetic tree from some genomic sample, and (b) detecting the introgressed (or HGT) parts of the genome. This is potentially a problematic procedure, because how do we construct a phylogenetic tree from data that already contain non-tree components? Apparently, the expectation is that a single tree will be supported by the majority of the data, and the remainder will represent the introgressed (or HGT) pathway(s), plus whatever other components have created the observed genomic variability (such as incomplete lineage sorting, gene duplication-loss, and stochastic mutations).

Recently, there have been quite a few studies published that have adopted a specific protocol for this procedure, usually under the rubric of admixture. Most of these have involved the study of ancient human DNA, but there have also been studies of contemporary humans, as well as ancient non-humans. An example of the latter is shown in the next two figures, which represent parts (a) and (b), respectively. They are taken from this study of the relatives of horses: Hákon Jónsson, et alia (2014) Speciation with gene flow in equids despite extensive chromosomal plasticity. Proceedings of the National Academy of Sciences of the USA 111: 18655-18660.



The phylogenetic tree (step a) was constructed using "maximum likelihood inference and 20,374 protein-coding genes ... based on a relaxed molecular clock." So, only stochastic mutations were accounted for when constructing the tree, and not incomplete lineage sorting or gene duplication-loss.

The detection of introgression (step b) used "the D statistics approach, which tests for an excess of shared polymorphisms between one of two closely related lineages (E1 or E2) and a third lineage (E3)". The reticulations representing the detected gene flow were then added to the tree manually.

The D-statistic is also known as the ABBA-BABA test (see: Patterson NJ et alia (2012) Ancient admixture in human history. Genetics 192: 1065-1093). It operates as follows for sets of four taxa, applied to character data.

Let the species tree be this, (((E1,E2),E3),O) in parenthetical notation, where E1–E3 are the three taxa being compared, and O is the outgroup:


There are three possible allele trees for each binary character (i.e. single nucleotide polymorphism) in which states are shared pairwise:


In the first tree (pattern BBAA, reading the states in the order E1, E2, E3, O, with A being the ancestral state), E3 shares the ancestral character state with the outgroup, which is expected to be the most common pattern in the absence of gene flow. E1 and E2 share the ancestral state with the outgroup in the second (ABBA) and third (BABA) trees, respectively.

The admixture test compares the ABBA tree to the BABA tree. The expectation is that if there has been no introgression then the data support for these two trees should be equal. That is, under the null hypothesis that there is no gene flow between the species (and the underlying species tree is correct), the difference in the expected number of occurrences of the ABBA and BABA patterns should be zero. Deviation from this expectation is statistically evaluated using a jackknife procedure.
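
As a minimal sketch of the calculation just described, the following Python code classifies each site as ABBA or BABA (states read in the order E1, E2, E3, O, with the outgroup state taken as ancestral), computes D = (nABBA - nBABA) / (nABBA + nBABA), and estimates a standard error with a leave-one-block-out jackknife. The function names, the equal-sized consecutive blocks and the toy data are assumptions made for illustration; this is not the code used in the studies cited here, and published implementations may differ in their details.

```python
import math

def site_pattern(e1, e2, e3, o):
    """Classify one site as 'BBAA', 'ABBA', 'BABA', or None.

    The outgroup state is treated as ancestral (A); any other state is
    derived (B). Sites that are not biallelic are ignored.
    """
    states = (e1, e2, e3, o)
    if len(set(states)) != 2:
        return None
    pattern = "".join("A" if s == o else "B" for s in states)
    return pattern if pattern in ("BBAA", "ABBA", "BABA") else None

def d_statistic(n_abba, n_baba):
    """D = (nABBA - nBABA) / (nABBA + nBABA); 0.0 if neither pattern occurs."""
    total = n_abba + n_baba
    return (n_abba - n_baba) / total if total else 0.0

def block_counts(sites, n_blocks):
    """Count ABBA and BABA sites within consecutive, equal-sized blocks."""
    counts = [[0, 0] for _ in range(n_blocks)]
    block_size = max(1, math.ceil(len(sites) / n_blocks))
    for i, site in enumerate(sites):
        pattern = site_pattern(*site)
        if pattern == "ABBA":
            counts[i // block_size][0] += 1
        elif pattern == "BABA":
            counts[i // block_size][1] += 1
    return counts

def jackknife_d(counts):
    """Return (D, standard error, Z) using a delete-one-block jackknife."""
    n = len(counts)
    total_abba = sum(a for a, _ in counts)
    total_baba = sum(b for _, b in counts)
    d_full = d_statistic(total_abba, total_baba)
    # D recomputed with each block left out in turn
    leave_one_out = [d_statistic(total_abba - a, total_baba - b)
                     for a, b in counts]
    mean = sum(leave_one_out) / n
    variance = (n - 1) / n * sum((d - mean) ** 2 for d in leave_one_out)
    se = math.sqrt(variance)
    z = d_full / se if se > 0 else float("nan")
    return d_full, se, z

# Toy data (made up): each tuple is the state at one site for (E1, E2, E3, O).
toy_sites = [("A", "G", "G", "A"), ("G", "A", "G", "A"), ("A", "G", "G", "A"),
             ("G", "G", "A", "A"), ("A", "G", "G", "A"), ("G", "A", "G", "A")]
d, se, z = jackknife_d(block_counts(toy_sites, n_blocks=3))
print(d, se, z)
```

Under the null hypothesis D is expected to be zero, so a large absolute value of Z is taken as evidence of gene flow; a positive D here indicates an excess of sharing between E2 and E3, and a negative D an excess between E1 and E3.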

When there are more than three ingroup taxa, they are tested in groups of three (plus the outgroup). No correction for multiple hypothesis testing seems ever to be applied.
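
To give a sense of how many tests this involves, and of what a correction might look like, here is a hedged Python sketch. It enumerates the ingroup triples, converts each Z score to a two-sided P-value under a normal approximation, and applies a Bonferroni threshold. The Bonferroni correction is simply one conservative choice used for illustration, not something taken from the studies above; the taxon labels and Z scores are hypothetical.

```python
import math
from itertools import combinations

def two_sided_p(z):
    """Two-sided P-value for a Z score under a standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))

def bonferroni_flags(z_scores, alpha=0.05):
    """Flag tests whose P-value falls below alpha divided by the number of tests."""
    threshold = alpha / len(z_scores)
    return {name: two_sided_p(z) < threshold for name, z in z_scores.items()}

# Hypothetical ingroup labels; six taxa give 20 distinct triples.
taxa = ["T1", "T2", "T3", "T4", "T5", "T6"]
triples = list(combinations(taxa, 3))
print(len(triples))

# Hypothetical Z scores for two of those tests.
print(bonferroni_flags({"T1,T2,T3": 5.0, "T1,T2,T4": 2.0}))
```

Note that the triples overlap in the taxa they contain, so the tests are not independent; Bonferroni remains valid in that situation, but it may be more conservative than necessary.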

Note that this test assumes that:
  • the "excess of shared polymorphisms" arises solely from gene flow, rather than from any other tree-like processes such as incomplete lineage sorting or gene duplication-loss
  • there are no other sources of co-ordinated polymorphisms, such as character-state reversals due to adaptation / selection
  • any gene flow that does exist is due to introgression, rather than to hybridization or HGT.
How realistic these assumptions are is not immediately obvious.

Monday, June 29, 2015

Wigwag, and the Family Tree


I have noted before that common usage of expressions like "family tree" often extends far beyond actual pedigrees. This particular expression is often used to describe any sort of historical relationship, not just genealogical ones. It is also sometimes used simply to describe any sort of personal inter-connection. All of these usages occurred in a short-lived magazine from 25 years ago called Wigwag.


Wigwag magazine formally debuted in October 1989 (after a test issue in 1988), and published its last issue in February 1991, for a total of 15 issues. It was a sort of cozy version of the New Yorker magazine and, like that magazine, it had a number of regular features, such as the Road Trip, the Map, and Letters From Home. The one that is of interest to us was called The Family Tree.

This feature mapped cultural relationships, having been described as "a field guide to the genealogy of influence in American life". It included human relationships, but it also included things like cars (the tree of which is reproduced in the book by Nobuhiro Minaka & Kunihiko Sugiyama. 2012. Phylogeny Mandala: Chain, Tree, and Network) and comic-book superheroes.

I have been unable to locate any decent copies, but four of the "trees" are included below.

As you can see, sometimes The Family Tree was actually a genealogical tree, but just as often it was simply a network of pairwise cultural connections. The latter, of course, usually formed a complex network that did not really map historical relationships.





This last Family Tree is from the original trial issue, and shows the inter-relationships of the writers and producers of American TV sitcoms.

You can read a bit more about the magazine, and its history, here: