Our next round of the Structural Variants in the Cloud Hackathon is planned tentatively for October 10-13, 2021. Check back soon for more details and registration information.
We'll be looking for folks who have experience working with structural variants, complex disease, precision medicine, and similar genomic analysis. If this describes you, stay tuned to our BCM-HGSC Twitter account for an announcement. This event will be for researchers, including students and postdocs, who are already engaged in the use of bioinformatics data or in the development of pipelines for large-scale genomic analyses from high-throughput experiments. The event will be open to anyone selected for the hackathon.
› Read more about our 2020 Hackathon
› Read more about the 2019 Hackathon in the DNAnexus blog
In October 2020, the “Structural Variant Crying Club” held the previous round of the Structural Variants in the Cloud Hackathon, which also covered pangenomes and COVID research. For the COVID work, we collaborated with COV-IRT. The hackathon was held remotely on October 11-14, 2020.
- COVID-19 diversity
- Precision medicine
- Incorporation of population data to a reference graph
- Mapping structural variants to public databases
- Building genome graphs representing population SVs
- Calculating the heritability of different types of structural variants
- CNV effect on isoform expression
- Incorporation of Public Annotation Databases with Graphs
- Assembly accuracy for metagenomics
- Develop a pipeline for submission of non-graph associated read assemblies to a public sequence database
- Quality assessment in large cohorts
- Develop efficient query mechanisms from graph genomes
- Assessing the benefits of graph genomes in clinical analysis
Participants were organized into five to eight teams of eight to ten individuals each. These teams built pipelines to analyze large datasets within a cloud infrastructure. The projects were announced before the hackathon started and built on previous NCBI hackathons and community projects.
After a brief organizational session, teams spent four days addressing a challenging set of scientific problems related to a group of datasets. Participants analyzed and combined datasets to work on these problems. Throughout the four days, we came together to discuss progress on each of the topics, bioinformatics best practices, coding styles, and related matters.
Datasets came from public repositories. The focus was on a number of trios produced by long-read sequencing, used as a base graph, and on short-read datasets in the Sequence Read Archive that have been ported to cloud infrastructure, as well as contigs derived from both.
All pipelines and other scripts, software, and programs generated in this hackathon were added to a public GitHub repository designed for that purpose (github.com/NCBI-Hackathons).
Manuscripts describing the design and usage of the software tools constructed by each team may be submitted to an appropriate journal, such as the F1000Research hackathons channel, BMC Bioinformatics, GigaScience, Genome Research, or PLOS Computational Biology.