An assembly and alignment-free method is robust for phylogeny reconstruction
Understanding the phylogenetic relationships among organisms is essential to many ecological, biogeographical, and evolutionary questions. Multiple-sequence alignment is a central step in phylogenetic reconstruction. However, reconstructing phylogenies from genomic data remains difficult because de novo assembly of non-model genomes and multi-genome alignment are both challenging.
Dr. FAN Huan of Xishuangbanna Tropical Botanical Garden (XTBG), together with her teachers, proposed a new method that reconstructs a phylogeny directly from whole-genome short read sequence (SRS) data. By removing the need to assemble sequencing reads, they extended alignment-free methods into an Assembly and Alignment-Free (AAF) method. They developed, explained, and validated the AAF method using a combination of sequence evolution models, mathematical calculations, and simulated SRS data from published genomes of 11 primates.
The AAF approach first calculates the genetic distance between each pair of samples from the number of evolutionary changes between their genomes, represented by the number of k-mers that differ between them. The phylogenetic relationships among the genomes are then reconstructed from the pairwise distance matrix. Using simulated SRS data (with sequencing error and incomplete coverage) from published, fully assembled genomes, the AAF method recovered the same phylogeny for 11 primate species as previously published with traditional methods, even though it used no assembly or alignment information. The method was also efficient, requiring only a few days on a standard workstation. AAF thus proved to be an accurate and efficient way of estimating phylogenetic relationships from raw whole-genome sequence data.
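To make the distance step concrete, the following is a minimal Python sketch of a k-mer-based pairwise distance matrix. It is an illustration under stated assumptions, not the AAF software itself: the function names (`kmer_set`, `kmer_distance`, `distance_matrix`) are hypothetical, the log-transformed shared-k-mer fraction is one common choice of distance and is assumed here rather than taken from this article, and the input is treated as plain sequences, ignoring reverse complements, read coverage, and sequencing error.

```python
import math
from itertools import combinations


def kmer_set(seq, k):
    """Return the set of distinct k-mers (substrings of length k) in seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_distance(kmers_a, kmers_b, k):
    """Distance from the fraction of shared k-mers (assumed formula).

    Uses -ln(shared / total) / k, where total is the smaller of the two
    k-mer set sizes. Returns infinity when no k-mers are shared.
    """
    shared = len(kmers_a & kmers_b)
    if shared == 0:
        return math.inf
    total = min(len(kmers_a), len(kmers_b))
    return -math.log(shared / total) / k


def distance_matrix(samples, k):
    """Build a symmetric pairwise distance matrix.

    samples maps a sample name to its sequence; the result is a sorted
    list of names and a nested dict dist[name_x][name_y].
    """
    names = sorted(samples)
    sets = {name: kmer_set(samples[name], k) for name in names}
    dist = {x: {y: 0.0 for y in names} for x in names}
    for x, y in combinations(names, 2):
        d = kmer_distance(sets[x], sets[y], k)
        dist[x][y] = dist[y][x] = d
    return names, dist
```

A matrix of this kind could then be passed to any distance-based tree-building method. With real short-read data, the k-mer sets would be built from the reads of each sample rather than from an assembled sequence.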
The researchers developed the theoretical basis for optimizing k-mer length selection, filtering, correcting tip branch lengths, and bootstrapping, directly addressing the problems of homoplasy, sequencing error, and incomplete coverage. AAF thus provides a robust tool for phylogeny reconstruction, especially when only low-coverage and heterogeneous genome data are available, data that would challenge traditional assembly- and alignment-based methods.
Key Words: k-mers; Phylogenomics; Homoplasy; Alignment-free; Assembly-free
Contact: FAN Huan
Key Laboratory of Tropical Forest Ecology, Xishuangbanna Tropical Botanical Garden, Chinese Academy of Sciences, Mengla, Yunnan 666303, China
E-mail: hfan22@wisc.edu