A Near-Complete Genome Assembly of Cinchona calisaya – Scientific Data
The Rubiaceae family is a globally distributed group of plants of great economic and medicinal significance. It ranges from coffee, a major crop, to Cinchona calisaya Wedd., which is known for its rich alkaloid content and its central role in malaria treatment. Despite recent advances, phylogenetic relationships within Rubiaceae remain poorly resolved. In this study, we present a nearly complete diploid genome assembly of C. calisaya, with a genome size of 869.93 Mb and a contig N50 of 44.34 Mb. In total, 99.75% of sequences were anchored to 17 chromosomes, with only 12 gaps. BUSCO assessment indicates that 97.40% of core genes are complete in the assembly. We identified 42,741 protein-coding genes, of which 89.00% were functionally annotated. The continuity and completeness of the C. calisaya genome provide a solid foundation for functional genomics research, varietal improvement, and conservation efforts in medicinal plants.
Rubiaceae is a plant family distributed worldwide, mainly in tropical and subtropical regions. It has considerable economic and medicinal value, comprising well-known crops such as coffee and medicinally important species such as Knoxia roxburghii. Phylogenetic relationships within Rubiaceae, particularly the tribe Cinchoneae, have been studied using the internal transcribed spacer (ITS) of the ribosomal DNA and plastid markers (matK, rbcL, rps16, trnL-F). However, these analyses have left relationships within Rubiaceae, including the genus Cinchona, poorly resolved. Genome-wide phylogenetic trees of the Gentianales place C. pubescens alongside Isertia hypoleuca Benth., but their sampling omits other genera, which hinders biodiversity conservation, crop improvement, and breeding efforts in Rubiaceae.
Cinchona calisaya, or the quinine tree, is a member of the Rubiaceae family. Native to the cool, humid rainforests of South America, this tree typically reaches heights of 3-6 meters. Cinchona is valued for its medicinal properties and is best known for quinine, which is extracted from its bark alongside other alkaloids such as quinidine, cinchonine, and cinchonidine. Historically, as recorded by Xue-Min Zhao in the Qing Dynasty, cinchona was used to treat malaria and immune system disorders. Despite this long-standing importance, genomic resources for Cinchona remain sparse, with only one draft nuclear and chloroplast genome reported for the genus. Although the nuclear genome of C. pubescens was published in 2022 with a genome size of 904 Mb, it contains numerous gaps. A high-quality Cinchona reference genome is therefore needed to support phylogenetic and medicinal genetics studies within Rubiaceae.
In this study, we assembled a near-complete C. calisaya genome of 869.93 Mb and performed phylogenetic analyses to resolve evolutionary relationships within Rubiaceae. Our findings suggest that a distinct whole-genome duplication event is associated with the Cinchonoideae subfamily. We also traced karyotype evolution and major chromosomal rearrangements within Rubiaceae, and identified a local tandem-duplicated gene cluster of tropinone reductase on chromosome 11 of C. calisaya. This reference genome offers insight into Rubiaceae evolution and supports future genetic and medicinal studies.
For the assembly, we sequenced 51.53 Gb (59X) of PacBio HiFi reads, 124.86 Gb (143X) of Hi-C reads, and 166.38 Gb (191X) of Illumina reads. The genome size was estimated at approximately 822.51 Mb. Assembly of the PacBio HiFi reads produced 85 contigs, which were then anchored to 17 pseudochromosomes using the Hi-C reads, with 99.75% of sequences anchored. The final assembly is 869.93 Mb in size; nine chromosomes are free of gaps, and the remainder contain few gaps. We identified all centromeres in the C. calisaya genome, with lengths ranging from 1.00 Mb to 4.60 Mb. In addition, the telomeric repeat TTTAGGG was used to identify telomeres at 15 chromosomal termini, whereas none were detected on chr13 or chr17.
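The sequencing depths quoted above follow from dividing each data yield by a genome size. The sketch below is illustrative only: it assumes depth was computed against the 869.93 Mb assembly rather than the 822.51 Mb estimate, and that values were truncated to whole numbers, because that combination is consistent with the reported 59X, 143X, and 191X. The function name `depth` is hypothetical.

```python
# Sequencing depth (X) = bases sequenced / genome size.
# Assumption: depth is taken against the final assembly size (869.93 Mb).
ASSEMBLY_MB = 869.93

# Data yields in Gb, as reported in the text.
YIELD_GB = {"PacBio HiFi": 51.53, "Hi-C": 124.86, "Illumina": 166.38}


def depth(yield_gb: float, genome_mb: float) -> float:
    """Return sequencing depth for a yield in Gb and a genome size in Mb."""
    return yield_gb * 1000.0 / genome_mb


depths = {name: depth(gb, ASSEMBLY_MB) for name, gb in YIELD_GB.items()}

for name, d in depths.items():
    # int() truncates toward zero, which matches the reported whole-number depths.
    print(f"{name}: {d:.2f}X (reported as {int(d)}X)")
```

Dividing by the 822.51 Mb estimate instead would give a HiFi depth of roughly 62X, which does not match the reported figure.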
We evaluated assembly quality by aligning the different read types to the C. calisaya assembly, obtaining an average mapping rate of 99.68%. BUSCO analysis indicated that 98.30% of single-copy orthologs are complete in the assembly. The consensus quality value (QV), estimated from short-read sequencing data, was 47.93. Together, these results indicate a highly contiguous, nearly complete, reference-level C. calisaya genome, which opens new avenues for studying Rubiaceae phylogenetics and medicinal properties.
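To put the QV into more intuitive terms, it can be converted to a per-base error probability. This minimal sketch assumes the QV is Phred-scaled (QV = -10 log10 P), which is the usual convention for consensus quality values; the function name is hypothetical.

```python
def qv_to_error_rate(qv: float) -> float:
    """Convert a Phred-scaled quality value to a per-base error probability."""
    return 10 ** (-qv / 10)


QV = 47.93  # consensus QV reported in the text

error_rate = qv_to_error_rate(QV)
bases_per_error = 1 / error_rate

print(f"error rate: {error_rate:.2e} per base")
print(f"about one error every {bases_per_error:,.0f} bases")
```

Under that assumption, a QV of 47.93 corresponds to roughly one consensus error per 62,000 bases.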
Genome annotation of C. calisaya identified 42,741 genes, with a mean gene length of 3,738 bp, a mean exon length of 248 bp, and a mean intron length of 780 bp. Predicted protein domains and functional sites were detected by InterProScan in 87.8% of protein-coding genes. In addition, 65.86% of genes were assigned Gene Ontology (GO) terms, and 30.16% were mapped to KEGG pathways. Genes are unevenly distributed along the pseudochromosomes and are concentrated in non-centromeric regions. Annotation also identified 575.17 Mb of transposable elements (TEs) in the genome, including 19.46% Copia elements and 17.27% Gypsy elements, alongside LINE-type elements and DNA transposons.
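The annotation percentages can be translated into approximate gene counts, and the TE content into a fraction of the assembly. This is a rough sketch using only the figures reported here; because the published percentages are rounded, the counts are approximate, and the variable and function names are hypothetical.

```python
TOTAL_GENES = 42_741   # protein-coding genes reported in the text
ASSEMBLY_MB = 869.93   # final assembly size
TE_MB = 575.17         # total transposable element content

# Share of genes with each annotation type, in percent.
ANNOTATED_PCT = {"InterProScan": 87.8, "GO": 65.86, "KEGG": 30.16}


def genes_from_pct(pct: float, total: int = TOTAL_GENES) -> int:
    """Approximate gene count implied by a rounded percentage."""
    return round(total * pct / 100)


counts = {name: genes_from_pct(pct) for name, pct in ANNOTATED_PCT.items()}
te_fraction = TE_MB / ASSEMBLY_MB * 100

for name, n in counts.items():
    print(f"{name}: ~{n:,} genes")
print(f"TEs: {te_fraction:.1f}% of the assembly")
```

By this arithmetic, TEs make up about two thirds of the assembled genome.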
The Supplementary Note accompanying this report describes the analytical methods and results for ‘Phylogenetic analysis and visualization,’ ‘Gene family evolution,’ ‘Whole-Genome Duplication Events,’ ‘Synteny analysis,’ and ‘Karyotype evolution in Rubiaceae species,’ providing further detail on the C. calisaya genome.