The plastid genome (plastome; plastids include the chloroplast and other plastid forms) and the mitochondrial genome (mitogenome or chondriome) are the parts of the endosymbiotic organelle inheritance in eukaryotes that have remained in the organelles rather than being transferred to the nucleus or lost.
DNA sequences from organelle genomes have been widely used in phylogenetic and evolutionary analyses and in DNA barcoding. Because each cell carries many copies of its organelle genomes, even low-coverage whole genome sequencing (WGS) data can provide enough organelle reads to assemble complete organelle genomes.
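As a rough illustration of why copy number matters (a simplified model, not taken from the study): if reads are sampled uniformly from all DNA in a sample, per-base depth on each genome scales with how many copies of it a cell contains. The function name, the diploid default, and the 1,000-copy figure below are hypothetical, chosen only to show the arithmetic.

```python
def expected_organelle_depth(nuclear_depth: float,
                             organelle_copies_per_cell: int,
                             nuclear_ploidy: int = 2) -> float:
    """Expected per-base read depth on an organelle genome (toy model).

    Assumes reads are sampled uniformly from all DNA in the extract, so
    per-base depth is proportional to each genome's copy number per cell.
    `nuclear_depth` is the usual genome-wide coverage (sequenced bases
    divided by the haploid nuclear genome size), which pools all
    `nuclear_ploidy` homologous nuclear copies.
    """
    # Depth contributed by a single copy of the nuclear genome.
    depth_per_copy = nuclear_depth / nuclear_ploidy
    # Each organelle genome copy contributes the same per-copy depth.
    return depth_per_copy * organelle_copies_per_cell


# Hypothetical example: 1x nuclear coverage, 1,000 organelle copies per cell.
print(expected_organelle_depth(1.0, 1000))  # 500.0
```

Under these assumptions, a sequencing run that is far too shallow for a nuclear assembly can still cover an organelle genome hundreds of times over.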
With rapid advances in high-throughput sequencing technologies, huge amounts of WGS data are now produced at low cost, creating a strong demand for accurate, high-throughput assembly of organelle genomes. Although many toolkits and pipelines for assembling organelle genomes have been developed, their assembly quality and efficiency generally fall short of expectations.
Research teams led by Prof. LI Dezhu and Prof. YI Tingshuang at the Kunming Institute of Botany of the Chinese Academy of Sciences (KIB/CAS) have worked for years on plastid phylogenomics, comparative genomics, and DNA barcoding. Together, they have established a research system that uses plastome data for phylogenetic and evolutionary studies, and it has produced fruitful results.
The teams have also developed new tools for plastome analysis, including PGA, a popular plastome annotation toolkit.
To achieve accurate and efficient organelle genome assembly, an international team led by Prof. LI and Prof. YI, with collaborators from KIB, the Xishuangbanna Tropical Botanical Garden of CAS, and Pennsylvania State University, has recently developed GetOrganelle, an advanced toolkit for accurate de novo assembly of organelle genomes.
GetOrganelle introduces two innovations: a pre-grouping strategy that speeds up the recruitment of target-associated reads, and a contig multiplicity estimation algorithm for better repeat resolution. The new multiplicity estimation algorithm combines information from both the characteristics of the assembly graph and contig coverage.
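The idea of combining coverage with graph structure can be illustrated with a toy sketch. This is not GetOrganelle's code or its actual algorithm; every function name, contig name, coverage value, and link below is hypothetical. The sketch assumes that most of the graph's length is single-copy, takes the length-weighted median coverage as the single-copy baseline, and rounds each contig's coverage ratio to an integer multiplicity. It then applies one simple graph constraint: each traversal of a contig end must continue into a neighbouring end, so a contig's multiplicity cannot exceed the summed multiplicities of the neighbours attached to that end. The example is a collapsed plastome graph in which one inverted-repeat contig (IR) joins a large (LSC) and a small (SSC) single-copy contig.

```python
from typing import Dict, List, Tuple

End = Tuple[str, str]  # (contig name, "start" or "end"); hypothetical encoding


def base_coverage(coverage: Dict[str, float], length: Dict[str, int]) -> float:
    """Length-weighted median coverage, used as the single-copy baseline."""
    half = sum(length.values()) / 2
    running = 0
    for contig in sorted(coverage, key=coverage.get):
        running += length[contig]
        if running >= half:
            return coverage[contig]
    raise ValueError("no contigs supplied")


def estimate_multiplicity(coverage: Dict[str, float],
                          length: Dict[str, int]) -> Dict[str, int]:
    """Coverage-only estimate: round each coverage ratio to the baseline."""
    base = base_coverage(coverage, length)
    return {c: max(1, round(cov / base)) for c, cov in coverage.items()}


def inconsistent_ends(mult: Dict[str, int],
                      links: Dict[End, List[End]]) -> List[End]:
    """Contig ends (listed in `links`) that violate a necessary graph condition.

    Every traversal of an end passes into some neighbouring end, so
    mult[contig] <= sum of neighbour multiplicities must hold.
    """
    return [(contig, side)
            for (contig, side), neighbours in links.items()
            if mult[contig] > sum(mult[n] for n, _ in neighbours)]


# Hypothetical collapsed plastome graph: IR's "start" touches both LSC ends,
# IR's "end" touches both SSC ends.
coverage = {"LSC": 101.0, "SSC": 98.0, "IR": 203.0}
length = {"LSC": 85000, "SSC": 18000, "IR": 26000}
links = {
    ("LSC", "start"): [("IR", "start")],
    ("LSC", "end"): [("IR", "start")],
    ("SSC", "start"): [("IR", "end")],
    ("SSC", "end"): [("IR", "end")],
    ("IR", "start"): [("LSC", "start"), ("LSC", "end")],
    ("IR", "end"): [("SSC", "start"), ("SSC", "end")],
}

mult = estimate_multiplicity(coverage, length)
print(mult)                            # {'LSC': 1, 'SSC': 1, 'IR': 2}
print(inconsistent_ends(mult, links))  # []
```

Coverage alone can be noisy; if it suggested IR = 3, the graph check would flag both IR ends, because only two single-copy neighbours attach to each. That is the kind of disagreement that motivates using graph information alongside coverage, although a real assembler would resolve it with a far more elaborate procedure.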
To evaluate the accuracy of GetOrganelle assemblies against those of NOVOPlasty, another popular assembler, the GetOrganelle team tested 156 public datasets from plants, animals, and fungi. On 50 public plant WGS datasets, GetOrganelle with default settings achieved a completeness rate of 78%, significantly better than the 16% achieved by NOVOPlasty, currently the most popular tool, even after fine-tuning, although NOVOPlasty used slightly fewer computational resources.
GetOrganelle still significantly outperformed NOVOPlasty in completeness rate even when using comparable or fewer computational resources. Furthermore, 20% to 25% of the complete plastomes that NOVOPlasty generated in its K=23 and K=31 runs (k-mer sizes of 23 and 31) were wrong or falsely complete.
In the same test, GetOrganelle assemblies were also more consistent across different parameter settings than NOVOPlasty assemblies.
Comparison of four sets of GetOrganelle runs and four sets of NOVOPlasty runs on 50 public plant datasets. (Image by KIB)
In a read-mapping evaluation, GetOrganelle plastomes were more accurate than both NOVOPlasty plastomes and previously published plastomes assembled from the same reads.
This evaluation also revealed many errors in the published plastomes. On 56 animal datasets and 50 fungal datasets, GetOrganelle generally performed better than NOVOPlasty at recovering mitogenome contigs and genes.
Notably, in 2020, Freudenthal et al. presented a benchmark comparison of several chloroplast assembly pipelines and toolkits (chloroExtractor, Fast-Plast, GetOrganelle, IOGA, NOVOPlasty, and org.ASM) and found significant differences among them.
In their tests, GetOrganelle significantly outperformed all the other assemblers in accuracy and success rate, and the authors recommended it as the default assembler.
This study, entitled "GetOrganelle: a fast and versatile toolkit for accurate de novo assembly of organelle genomes", was published online in the journal Genome Biology on September 10, 2020.
This study was supported by grants from the Strategic Priority Research Program of CAS, the National Natural Science Foundation of China, the CAS Large-scale Scientific Facilities, the open research project of the "Cross-Cooperative Team" of the Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, and the CAS 135 Program.