Introduction
We describe new methods to partition phylogenetic data sets of discrete characters based on pairwise compatibility. The partitioning methods make no assumptions regarding the phylogeny, model of evolution, or characteristics of the data. The methods first require building a compatibility graph, in which each node represents a character in the data set. Edges in the compatibility graph may represent strict compatibility of characters or they may be weighted based on a similarity scoring procedure that measures how close the characters are to being compatible. Given the desired number of partitions, the partitioning methods then seek to cluster together the characters with the highest average pairwise compatibility, so that characters in each partition are more compatible with each other than they are with characters in the other partition(s).
Methods
Our partitioning methods use compatibility to cluster phylogenetic characters. A set of characters is said to be compatible if it admits a perfect phylogeny, or in other words, if can evolve on a single topology without homoplasy. If is compatible, all characters in must be pairwise compatible; however, if there are more than two character states and/or missing data, sets of pairwise compatible characters are not necessarily compatible. Our goal is to partition a set of discrete characters into two or more clusters so that characters in the same cluster are more compatible with each other than they are to characters in different clusters. We use strict compatibility, as well as a fractional compatibility scheme that measures the degree of compatibility between pairs of characters. The latter is useful in situations where strict pairwise compatibility is unlikely, such as when characters originate from large sets of taxa or the characters have fast rates of evolution. We cluster only parsimony-informative characters, since characters that are not parsimony informative are compatible with all other characters and will fit equally well into any cluster. The implementation of our partitioning methods removes the uninformative sites prior to partitioning and places them in a separate cluster.