Wednesday, October 28, 2015

Arguments against the use of networks?


The usual argument in favour of using phylogenetic networks is the obvious one that they can account for gene flow during phylogenetic history, as well as vertical inheritance. The usual argument against their use, if there is one, is that vertical inheritance is of primary importance, and thus a tree is "adequate" under many circumstances; or the use of a tree is simply an unquestioned assumption (i.e. phylogenetics = trees).

However, Walter Salzburger, Greg B. Ewing and Arndt von Haeseler (2011. The performance of phylogenetic algorithms in estimating haplotype genealogies with migration. Molecular Ecology 20: 1952-1963) have presented a different argument. They point out that a collection of trees can contain more information than can a single network that combines them. This occurs when reticulations represent ambiguity rather than gene flow, as they will in a population or haplotype network (see How do we interpret a rooted haplotype network?).


Their argument is this:
We note that out of a set of different haplotype genealogies, no single genealogy offers a better description of the ‘truth’ than any other one does without considering external data such as the underlying DNA sequences (this is the same when dealing with a set of different MP trees with the same score). The question raised is how are we better off with a group of haplotype genealogies vs. a network that may not be tree-like. The existence of many haplotype genealogies is simply another way of representing ambiguity in the data.
However, the important difference between a network and a set of trees is the lack of independence of Fitch length labellings [i.e. the Hamming distance between nodes]. We illustrate this in Fig. 2. We have the same initial tree with the same tip sequences, but the Fitch branch lengths and internal sequences are different. In the top figure we see that haplotype E connects to D, while haplotype A and B form a cherry also connecting to D. But an alternative is that haplotype E connects to C. This has the effect of changing the topology throughout the tree. So by making some choice in one part of the Fitch tree, it can have topological consequences elsewhere in the tree. In the network case, each ambiguity is represented independently of each other.
It is difficult to represent the same information in a [single] graph compared to a set of trees.
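
To make the point about Fitch labellings concrete, here is a minimal sketch in Python. It is not the authors' code and not the Fitch algorithm itself: it brute-forces every internal labelling for a single site on a hypothetical four-tip tree, keeps the labellings with the minimum number of changes, and reports the resulting branch lengths. The tree, the tip states and the node names (x, y, r) are all assumptions made up for illustration.

```python
from itertools import product

# Hypothetical toy data: one site on the rooted tree ((A,B)x,(C,D)y)r.
tips = {"A": "G", "B": "T", "C": "G", "D": "T"}
internal = ["x", "y", "r"]
edges = [("x", "A"), ("x", "B"), ("y", "C"), ("y", "D"), ("r", "x"), ("r", "y")]


def branch_lengths(labels):
    """Length of each branch = number of differences between its two ends
    (0 or 1 here, because we look at a single site)."""
    return {(parent, child): int(labels[parent] != labels[child])
            for parent, child in edges}


def optimal_labellings():
    """Enumerate all internal labellings and keep the most parsimonious ones."""
    scored = []
    for states in product("ACGT", repeat=len(internal)):
        labels = {**tips, **dict(zip(internal, states))}
        lengths = branch_lengths(labels)
        scored.append((sum(lengths.values()), states, lengths))
    best = min(score for score, _, _ in scored)
    return best, [(states, lengths) for score, states, lengths in scored
                  if score == best]


best, optima = optimal_labellings()
print("minimum number of changes:", best)
for states, lengths in optima:
    changed = [edge for edge, length in lengths.items() if length]
    print(dict(zip(internal, states)), "changes on", changed)
```

For this toy site there are two equally parsimonious labellings (all internal nodes G, or all T), and they place the two changes on different branches. This only shows the labelling ambiguity; it does not reproduce the knock-on topological effect that the authors illustrate in their Fig. 2.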

Using this argument, the authors focussed entirely on trees in their simulation study comparing phylogenetic methods: "Here, we are considering the case where the true signal is tree-like and that reticulations represent reconstruction ambiguity." They then confirmed the consequent, by demonstrating that under these circumstances network methods produce false-positive reticulations. Tree-based methods cannot produce reticulations, and so there can be no false positives.

Apart from the impracticality of dealing with potentially large numbers of trees, the main downside of a collection of trees is that we cannot easily compare those trees, which we can instantly do when they are represented by a single network (i.e. the trees differ where there are reticulations in the network). Salzburger et al. indirectly refer to this when they note that a "problem is the evaluation of the reliability of connections in haplotype genealogies." They suggest mapping the consistency index for each mutation responsible for each connection in each tree, which seems to be a rather cumbersome alternative to the use of network reticulations to represent unreliability. (NB. A consensus tree is the third way to represent a set of trees, and this seems rarely to be used for haplotypes.)
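
The idea that the trees differ exactly where the network has reticulations can be sketched in a few lines of Python. The two haplotype genealogies below are hypothetical (loosely echoing the "E connects to D" versus "E connects to C" alternative quoted above, but not taken from the paper), and the function name is my own. Each genealogy is stored as a set of undirected edges between haplotypes; the union of the edge sets is the combined network, and the edges that are not shared by every tree are the ones that create the reticulation.

```python
def edge_set(pairs):
    """Store a genealogy as a set of undirected edges between haplotypes."""
    return {frozenset(pair) for pair in pairs}


# Two hypothetical haplotype genealogies on the same five haplotypes.
tree1 = edge_set([("A", "D"), ("B", "D"), ("C", "D"), ("D", "E")])
tree2 = edge_set([("A", "D"), ("B", "D"), ("C", "D"), ("C", "E")])


def compare(trees):
    """Return (network edges, edges shared by all trees, ambiguous edges)."""
    network = set().union(*trees)
    shared = set.intersection(*trees)
    return network, shared, network - shared


network, shared, ambiguous = compare([tree1, tree2])
print("network edges:  ", sorted("-".join(sorted(e)) for e in network))
print("shared edges:   ", sorted("-".join(sorted(e)) for e in shared))
print("ambiguous edges:", sorted("-".join(sorted(e)) for e in ambiguous))
```

Here the combined network has five edges on five haplotypes, so it contains one cycle (C, D, E), and the two ambiguous edges (D-E and C-E) are precisely where the trees disagree. Going the other way, from the network back to the set of trees, is what loses information when choices in different parts of the network are not independent.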

Interestingly, the authors' results showed that the Phylip program DNAPARS consistently did better than the program PAUP* at recovering the simulated trees. The main difference between these two programs is that PAUP* does a better job of finding the set of maximum-parsimony (MP) trees. The results therefore suggest that the authors' trees were usually not MP trees, so that PAUP* was simply wasting its time looking harder for them.
