Everything is bigger in Texas: Pan-Structural Variation hackathon in the Cloud!

In October 2020, the “Structural Variant Crying Club” held the previous round of the Structural Variants in the Cloud Hackathon, teaming up with pangenome and COVID research. For the latter, we worked together with COV-IRT. The hackathon was held remotely on Oct. 11-14, 2020.

Topics

  • COVID-19 diversity
  • Precision medicine
  • Incorporation of population data to a reference graph
  • Mapping structural variants to public databases
  • Building genome graphs representing population SVs
  • Calculating the heritability of different types of structural variants
  • CNV effect on isoform expression
  • Incorporation of public annotation databases with graphs
  • Assembly accuracy for metagenomics
  • Develop a pipeline for submission of non-graph associated read assemblies to a public sequence database
  • Quality assessment in large cohorts
  • Develop efficient query mechanisms from graph genomes
  • Assessing the benefits of graph genomes in clinical analysis

Organization

Participants were organized into five to eight teams of eight to ten individuals each. These teams built pipelines to analyze large datasets within a cloud infrastructure. The projects were unveiled before the hackathon started, and built on previous NCBI hackathons and community projects.

After a brief organizational session, teams spent four days addressing a challenging set of scientific problems related to a group of datasets. Participants analyzed and combined datasets in order to work on these problems. Throughout the four days, we came together to discuss progress on each of the topics, as well as bioinformatics best practices, coding styles, and similar matters.

Datasets

Datasets came from public repositories. The focus was on a number of trios sequenced with long reads, used as a base graph, and on short-read datasets in the Sequence Read Archive that have been ported to cloud infrastructure, as well as contigs derived from both.

Products

All pipelines and other scripts, software, and programs generated in this hackathon were added to a public GitHub repository designed for that purpose (github.com/NCBI-Hackathons).

Manuscripts describing the design and usage of the software tools constructed by each team may be submitted to an appropriate journal, such as the F1000Research hackathons channel, BMC Bioinformatics, GigaScience, Genome Research, or PLOS Computational Biology.