Chromosome-level genome assembly of cultivated strawberry ‘Seolhyang’ (Fragaria × ananassa) – Scientific Data
The cultivated strawberry, Fragaria × ananassa, belongs to the Rosaceae family and is an allo-octoploid species carrying eight sets of chromosomes (2n = 8× = 56). This complexity, compounded by high heterozygosity, makes it a challenging subject for genetic research and breeding. Strawberry is a major global crop, with a reported worldwide production of 9.57 million tons in 2022, according to the Food and Agriculture Organization of the United Nations (FAO). South Korea produces 158,807 tons annually on 5,745 hectares, contributing around USD 932 million to the country’s agricultural economy.
In South Korea, the ‘Seolhyang’ cultivar, derived from a cross between ‘Akihime’ and ‘Red Pearl’, dominates the strawberry industry. As of 2022, ‘Seolhyang’ accounts for 82.1% of strawberry cultivation in the country. This dominance is attributed to its favorable agronomic characteristics: ease of cultivation, large berry size, high yields, and resistance to widespread diseases such as angular leaf spot, anthracnose, and powdery mildew. ‘Seolhyang’ is particularly noted for its high concentration of volatile organic compounds (VOCs), which give it a distinctive aroma and flavor profile and make it an elite cultivar in various breeding programs.
Nonetheless, progress in precision breeding for this cultivar has been slow because comprehensive genomic studies are lacking. Reference genomes are instrumental in agricultural research: they reveal the genetic basis of phenotypic traits and the evolutionary consequences of artificial selection. They also improve understanding of plant-environment interactions, which is crucial for tackling the challenges posed by pests and pathogens.
Recent advances in genome assembly, driven by third-generation sequencing technologies, have greatly improved the accuracy and completeness of plant reference genomes. Although short-read next-generation sequencing (NGS) produces large volumes of data, short reads are difficult to assemble into contiguous contigs and scaffolds. Long-read and long-range technologies, including PacBio and Nanopore sequencing and BioNano optical mapping, address this limitation. PacBio’s High-Fidelity (HiFi) sequencing, with its long average read length (10-25 kb) and low error rate (below 0.5%), is particularly well suited to generating high-quality genome assemblies.
In this study, the ‘Seolhyang’ genome was assembled from around 100 Gb of HiFi data generated on the PacBio Revio platform. Unlike earlier octoploid strawberry assemblies that required supplementary sequencing data, this approach yielded a high-quality reference genome comparable to those of ‘Royal Royce’ and ‘Florida Brilliance’. We obtained a complete telomere-to-telomere genome assembly of 797 Mb with a contig N50 of 27.04 Mb. BUSCO analysis, which detected 99.1% of conserved genes, supports the assembly’s integrity.
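To make the contig N50 statistic concrete, the following is a minimal sketch of how N50 is conventionally computed from a list of contig lengths. The contig lengths below are hypothetical toy values, not the ‘Seolhyang’ contigs, and the function name is chosen for illustration only. The last line estimates the read depth implied by the figures quoted above (about 100 Gb of reads over a 797 Mb assembly).

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy contig lengths in Mb (NOT the 'Seolhyang' contigs).
toy_contigs_mb = [30, 27, 20, 10, 8, 5]
toy_n50 = n50(toy_contigs_mb)

# Approximate read depth implied by the text: ~100 Gb (100,000 Mb)
# of HiFi data over a 797 Mb assembly.
approx_depth = 100_000 / 797
```

In the toy set, the two longest contigs (30 Mb and 27 Mb) already cover more than half of the 100 Mb total, so the N50 is 27 Mb. The depth estimate works out to roughly 125-fold relative to the 797 Mb assembly.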
The assembly’s quality is further supported by an LTR Assembly Index (LAI) of 17.28, calculated with the Extensive de novo TE Annotator (EDTA) together with LTR_retriever, indicating high genome continuity. Furthermore, we identified 50 of the 56 possible telomeres across the 28 chromosomes. For annotation, we used RNA-Seq data from various F. × ananassa tissues archived at NCBI, which yielded a set of 129,184 genes.
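The telomere tally follows from each of the 28 chromosomes having two ends, so 50 of 56 corresponds to roughly 89% of chromosome ends. The text does not describe how telomeres were detected. As a rough sketch only, one common approach is to scan both ends of each chromosome for tandem copies of the Arabidopsis-type plant telomeric repeat (TTTAGGG). The window size, the minimum repeat count, the function names, and the toy sequences below are all assumptions made for illustration.

```python
import re

# Arabidopsis-type plant telomeric repeat and its reverse complement.
FORWARD = "TTTAGGG"
REVERSE = "CCCTAAA"


def has_telomere(window, min_repeats=10):
    """True if the window holds >= min_repeats tandem telomeric units."""
    pattern = "(?:%s){%d,}|(?:%s){%d,}" % (
        FORWARD, min_repeats, REVERSE, min_repeats)
    return re.search(pattern, window.upper()) is not None


def count_telomeres(chromosomes, window=5000, min_repeats=10):
    """Count chromosome ends that carry a telomeric repeat array.

    `window` and `min_repeats` are hypothetical thresholds.
    """
    found = 0
    for seq in chromosomes.values():
        # Cap the window at half the sequence so the two ends never overlap.
        w = min(window, len(seq) // 2)
        found += has_telomere(seq[:w], min_repeats)
        found += has_telomere(seq[len(seq) - w:], min_repeats)
    return found


# Hypothetical toy sequences: chrA has telomeres at both ends, chrB at one.
toy = {
    "chrA": "CCCTAAA" * 12 + "ACGT" * 50 + "TTTAGGG" * 12,
    "chrB": "CCCTAAA" * 12 + "ACGT" * 50,
}
toy_count = count_telomeres(toy)

# Fraction of chromosome ends with telomeres reported in the text.
fraction_reported = 50 / (28 * 2)
```

Dedicated tools apply more tolerant matching (for example, allowing degenerate repeat units), so this exact-match scan should be read as a lower bound on what such tools would report.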
The study’s contributions extend beyond genome assembly, offering insights into disease resistance mechanisms in ‘Seolhyang’. Because the cultivar is known for its resistance to powdery mildew, a prevalent problem in controlled cultivation settings such as greenhouses, the study sought to decipher the genetic basis of this resistance by examining the MLO (Mildew Locus O) gene family, which is implicated in the plant response to powdery mildew. ‘Seolhyang’ harbors 55 MLO genes, which were systematically compared against 20 known MLO genes in diploid strawberries and 69 in the octoploid cultivar ‘Camarosa’. Understanding these gene repertoires can guide targeted breeding aimed at strengthening disease resistance.
In conclusion, the comprehensive genome assembly of the ‘Seolhyang’ cultivar not only provides a crucial genetic resource for resolving agricultural and breeding challenges but also underscores the transformative potential of advanced sequencing technologies in agricultural genomics. This assembly offers valuable insights into the genome’s complexity, facilitating future research into genes associated with disease resistance and other desirable agricultural traits, thereby supporting the development of improved cultivars.