Nearly 30 million people in the United States are affected by rare diseases, yet many families lack a molecular diagnosis. Since its inception in 2021, the GREGoR Consortium (Genomics Research to Elucidate the Genetics of Rare diseases) has continued to optimize benchside methods, utilize diverse sequencing approaches, and develop novel analytical tools toward the aim of ‘solving the unsolved.’ As of October 2023, BCM-GREGoR has enrolled more than 1,400 individuals (over 340 families), and the Human Genome Sequencing Center has generated sequence data for 1,065 study participants and reanalyzed extant data for over 200 samples, using a stepwise approach that applies exome sequencing, short- and long-read genome sequencing, and RNA sequencing. Additionally, we are investigating the diagnostic utility of short- and long-read RNA sequencing in eight families. For a cost-effective long-read approach, we developed, with TWIST and Pacific Biosciences, a custom panel and sequencing protocol; the panel covers 389 genes inaccessible with short reads alone and yields, on average, 94% of targeted bases at 8x coverage or greater. Application of this panel to several families is in progress.
Novel analytical approaches leverage this rare disease dataset, and we are currently investigating the role of reference genomes and pangenomes in identifying molecular diagnoses. One study utilizes a modified version of GRCh38 that accounts for reference errors involving 33 protein-coding genes, of which twelve are medically relevant, and assesses the difference in diagnostic yield between short-read and long-read data with this improved reference. Analyses of these approaches are challenging but are aided by tools such as VizCNV, an analytical tool developed in-house for the detection, visualization, and interpretation of copy number variations and other products of structural variation mutagenesis.
Major phenotypes studied include neurodevelopmental disorders, structural brain abnormalities, and intellectual disability. A unique cohort from the Middle East and North Africa makes up thirty-five percent of this dataset and has a molecular diagnostic rate of at least 41%, involving both novel and known genes. Cumulative analysis of the BCM-GREGoR dataset has yielded 22 manuscripts detailing discoveries in 230 genes, of which 38% are novel. Data and methods are rapidly disseminated through AnVIL, GeneMatcher, VariantMatcher, ClinVar, and GitHub. Each discovery provides insight into the genomics of rare diseases and into human disease biology, bringing us a step closer to equitable personalized medicine.