The benefit of a complete reference genome for cancer structural variant analysis.

Title: The benefit of a complete reference genome for cancer structural variant analysis.
Publication Type: Journal Article
Year of Publication: 2024
Authors: Paulin, LF, Fan, J, O'Neill, K, Pleasance, E, Porter, VL, Jones, SJM, Sedlazeck, FJ
Date Published: 2024 Mar 18

The complexities of cancer genomes are becoming easier to interpret thanks to advances in sequencing technologies and improved bioinformatic analysis. Structural variants (SVs) represent an important subset of somatic events in tumors. While detection of SVs has been markedly improved by the development of long-read sequencing, somatic variant identification and annotation remain challenging. We hypothesized that use of a completed human reference genome (CHM13-T2T) would improve somatic SV calling. Our findings in a tumour/normal matched benchmark sample and two patient samples show that CHM13-T2T improves SV detection and prioritization accuracy compared to GRCh38, with a notable reduction in false positive calls. We also overcame the lack of annotation resources for CHM13-T2T by lifting over CHM13-T2T-aligned reads to the GRCh38 genome, thereby combining improved alignment with advanced annotations. In this process, we assessed the current SV benchmark set for COLO829/COLO829BL across four replicates sequenced at different centers with different long-read technologies. We discovered instability of this cell line across these replicates; 346 SVs (1.13%) were only discoverable in a single replicate. We identified 49 somatic SVs that appear to be stable, as they are consistently present across all four replicates. As such, we propose this consensus set as an updated benchmark for somatic SV calling and include both GRCh38 and CHM13-T2T coordinates in our benchmark. The benchmark is available at: 10.5281/zenodo.10819636. Our work demonstrates new approaches to optimize somatic SV prioritization in cancer, with potential improvements in other genetic diseases.

Alternate Journal: medRxiv
PubMed ID: 38562786
PubMed Central ID: PMC10984048
Grant List: U01 HG011758 / HG / NHGRI NIH HHS / United States
UG3 NS132105 / NS / NINDS NIH HHS / United States
UM1 DA058229 / DA / NIDA NIH HHS / United States