The Chromosome-Level Genome Assembly of Prunus cerasifera ‘Atropurpurea’ – Scientific Data
In a study conducted at the Fruit Tree Germplasm Repository of Beijing University of Agriculture, researchers sequenced the genome of the Purpleleaf Plum (Prunus cerasifera ‘Atropurpurea’). The project combined PacBio HiFi sequencing and RNA-seq to produce a chromosome-level genome assembly of this cultivar.
PacBio HiFi Sequencing
Fresh young leaves from a 3-year-old Purpleleaf Plum were collected for genomic analysis. Genomic DNA (gDNA) was extracted using the cetyltrimethylammonium bromide (CTAB) method, yielding DNA of sufficient purity and concentration for library construction. The gDNA was sheared into fragments of about 15 kb using a Megaruptor (Diagenode) and then purified to remove smaller fragments.
The fragments were size-selected on the BluePippin system, and sequencing yielded approximately 16.22 Gb of HiFi data for downstream genome assembly.
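As a rough check on what this yield means, average sequencing depth can be estimated by dividing the total bases by the genome size. The sketch below uses the 16.22 Gb HiFi yield together with the two size figures reported in the assembly section (the roughly 242 Mb k-mer estimate and the 252.20 Mb assembly). The function name is illustrative and is not taken from the study.

```python
def sequencing_depth(total_bases: float, genome_size: float) -> float:
    """Average fold-coverage: total sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


# Figures quoted in the article, converted to base pairs.
hifi_bases = 16.22e9        # ~16.22 Gb of HiFi data
kmer_genome_size = 242e6    # ~242 Mb, k-mer-based estimate
assembly_size = 252.20e6    # 252.20 Mb, total contig length

depth_kmer = sequencing_depth(hifi_bases, kmer_genome_size)
depth_asm = sequencing_depth(hifi_bases, assembly_size)

print(f"Depth vs. k-mer estimate: {depth_kmer:.1f}x")
print(f"Depth vs. assembly size:  {depth_asm:.1f}x")
```

Under these figures the HiFi data corresponds to about 67x depth against the k-mer estimate and about 64x against the assembled length.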
RNA-Seq and Transcriptome Analysis
RNA-seq was performed on leaf samples representing three distinct phenotypes: red, green, and purple leaves. A total of 58.36 Gb of paired-end reads was generated on the Illumina platform. This data set supports comparison of gene expression across the three leaf types and helps explain the plant’s phenotypic diversity.
Genome Assembly and Annotation
Genome complexity was assessed using a k-mer-based approach, which estimated a genome size of approximately 242 Mb with a substantial fraction of repetitive sequence. The HiFi reads were assembled into primary contigs, and duplicated contigs and plastid-derived contamination were then removed. The result was a curated set of 43 contigs spanning a total length of 252.20 Mb.
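The idea behind a k-mer-based size estimate can be sketched in a few lines: count how often each k-mer occurs in the reads, find the main peak of the resulting depth histogram, and divide the total number of k-mer occurrences by that peak depth. The code below is a toy illustration of that principle only. It is not the authors' pipeline, the function names and the `min_depth` cutoff are assumptions made for this example, and real analyses rely on dedicated k-mer counters and model fitting rather than a simple peak pick.

```python
from collections import Counter


def kmer_histogram(reads, k):
    """Map k-mer depth -> number of distinct k-mers observed at that depth."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return Counter(counts.values())


def estimate_genome_size(hist, min_depth=2):
    """Estimate genome size as (total k-mer occurrences) / (peak depth).

    Depths below `min_depth` are ignored because low-depth k-mers are
    mostly sequencing errors (an assumption of this toy example).
    """
    usable = {d: n for d, n in hist.items() if d >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above min_depth")
    peak_depth = max(usable, key=lambda d: usable[d])
    total_kmers = sum(d * n for d, n in usable.items())
    return total_kmers / peak_depth, peak_depth


# Synthetic histogram: an error peak at depth 1, a main peak at depth 10,
# and a few repeat-derived k-mers at depth 20.
toy_hist = {1: 1000, 9: 50, 10: 200, 11: 50, 20: 10}
size, peak = estimate_genome_size(toy_hist)
print(f"peak depth = {peak}, estimated size = {size:.0f} k-mers")
```

The repeat-derived k-mers at twice the peak depth contribute twice as much to the total, which is how repetitive sequence is accounted for in the estimate.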
Repeats and telomeric regions, which are key indicators of chromosome completeness, were then identified. Reference-guided scaffolding against the genomes of P. salicina ‘Sanyueli’ and P. armeniaca anchored the contigs to chromosomes and confirmed the presence of complete telomere-to-telomere (T2T) chromosomes.
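One simple way to check whether a sequence ends in telomeric repeats is to count copies of the telomere motif in a window at each end. The sketch below assumes the plant-type repeat TTTAGGG (which appears as CCCTAAA on the opposite strand). The article does not state which motif, window size, or threshold the authors used, so `motif`, `window`, and `min_repeats` are illustrative defaults, and the function names are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def telomere_ends(seq, motif="TTTAGGG", window=1000, min_repeats=10):
    """Return (start_has_telomere, end_has_telomere) for one sequence.

    An end is called telomeric if the motif, or its reverse complement,
    occurs at least `min_repeats` times within `window` bases of that end.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    rc = reverse_complement(motif)
    head = seq[:window].upper()
    tail = seq[-window:].upper()

    def has_repeats(region):
        # str.count counts non-overlapping occurrences.
        return max(region.count(motif), region.count(rc)) >= min_repeats

    return has_repeats(head), has_repeats(tail)


# Synthetic sequences: one with repeats at both ends, one at the 3' end only.
body = "ACGT" * 500
t2t_like = "CCCTAAA" * 20 + body + "TTTAGGG" * 20
one_sided = body + "TTTAGGG" * 20

print(telomere_ends(t2t_like, window=200))
print(telomere_ends(one_sided, window=200))
```

A chromosome for which both values are true has telomeric repeats at both ends, which is the property a T2T designation is meant to capture.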
Repetitive Sequences and Gene Prediction
Repetitive elements were annotated de novo with tools such as RepeatModeler and EDTA. The identified sequences were consolidated into a non-redundant library, which showed that long terminal repeat (LTR) elements and DNA transposons make up substantial portions of the genome. Protein-coding genes were then predicted by combining transcript-based, homology-based, and ab initio methods.
The final gene annotation identified a total of 28,231 genes. Functional annotation against databases such as NR, Swiss-Prot, and KEGG provided insights into biochemical pathways and gene regulation networks. Together, these annotations describe a broad set of functional and regulatory protein-coding genes in the Purpleleaf Plum genome.
Synteny Analysis
Synteny analysis with jcvi compared the Purpleleaf Plum genome with those of two reference relatives, P. armeniaca and P. salicina ‘Sanyueli’. The comparison showed significant structural conservation across the three genomes, with numerous syntenic blocks and homologous gene pairs. These findings reflect the close evolutionary relationships among these species.
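To make the notion of a syntenic block concrete, the toy sketch below chains homologous gene pairs into collinear runs. Each pair is given as gene-order positions on one chromosome in each of two genomes. This is not jcvi's algorithm or API: it handles forward-oriented blocks only, and the function name and the `max_gap` and `min_size` parameters are assumptions made for illustration.

```python
def collinear_blocks(anchors, max_gap=5, min_size=3):
    """Group homologous gene pairs into forward collinear blocks.

    anchors: iterable of (pos_a, pos_b) gene-order indices on one
             chromosome pair, one tuple per homologous gene pair.
    A pair extends the current block if both positions increase by at
    most `max_gap` genes; blocks smaller than `min_size` are dropped.
    """
    blocks, current = [], []
    for a, b in sorted(anchors):
        if current:
            prev_a, prev_b = current[-1]
            extends = 0 < a - prev_a <= max_gap and 0 < b - prev_b <= max_gap
            if not extends:
                if len(current) >= min_size:
                    blocks.append(current)
                current = []
        current.append((a, b))
    if len(current) >= min_size:
        blocks.append(current)
    return blocks


# Two collinear runs plus one isolated pair that should be discarded.
pairs = [(1, 1), (2, 2), (3, 4),
         (10, 20), (11, 21), (12, 22), (13, 23),
         (30, 5)]
for block in collinear_blocks(pairs):
    print(f"block of {len(block)} pairs: {block[0]} .. {block[-1]}")
```

Dedicated synteny tools additionally handle inverted blocks, score alignments, and filter by sequence similarity, none of which is attempted here.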
This research not only enhances our understanding of Prunus cerasifera ‘Atropurpurea’ but also provides genomic resources that may benefit future breeding, conservation, and functional studies in Prunus species.