Everything is bigger in Texas: Pan-Structural Variation hackathon in the Cloud!

Pan Structural Variation Hackathon in the Cloud - Aug. 28-31


The “Structural Variant Crying Club” is pleased to announce the 6th round of Structural Variants in the Cloud Hackathon!

From August 28 to 31, DNAnexus will help run a bioinformatics hackathon in Houston, Texas, hosted by the Baylor College of Medicine. This hybrid event will include opportunities to participate in person or online over Zoom. Potential topics include:

  • Mapping structural variants to public databases
  • Mendelian disease discovery 
  • Identification of somatic and mosaic variants (tumor vs normal, within tissue)
  • Assembly accuracy for metagenomics
  • Analysis of long-read RNA data and comparison of variant calling
  • Respiratory virus variations
  • Structural variants affecting agricultural production
  • Integrating genome graphs with phenotype networks

We're specifically looking for folks who have experience in working with structural variants, complex disease, precision medicine, and similar genomic analysis. If this describes you, please apply! This event is for researchers, including students and postdocs, who are already engaged in the use of bioinformatics data or in the development of pipelines for large scale genomic analyses from high-throughput experiments. The event is open to anyone selected for the hackathon (see below).


Participants will be organized into five to eight teams of five to six individuals each. These teams will build pipelines to analyze large datasets within a cloud infrastructure. The projects will be unveiled before the hackathon starts and will build on previous NCBI-style hackathons and community projects.


After a brief organizational session, teams will spend four days addressing a challenging set of scientific problems related to a group of datasets. Participants will analyze and combine datasets in order to work on these problems. Throughout the four days, we will come together to discuss progress on each of the topics, bioinformatics best practices, coding styles, etc.

The hackathon ends with final presentations by all groups on the last day.


Datasets will come from public repositories. The focus will be on a number of trios produced by long-read sequencing, which will serve as a base graph, and on short-read datasets in the Sequence Read Archive that have been ported to cloud infrastructure, as well as contigs derived from both.


All pipelines and other scripts, software, and programs generated in this hackathon will be added to a public GitHub repository designed for that purpose (github.com/collaborativebioinformatics).

Manuscripts describing the design and usage of the software tools constructed by each team may be submitted to an appropriate journal such as the F1000Research hackathons channel, BMC Bioinformatics, GigaScience, Genome Research or PLoS Computational Biology.

The outcomes of the past Hackathons have been published here:


Initial applications are due Aug. 21, 2024 by 3 p.m. CDT. Participants will be selected based on the experience and motivation they describe on the form.

If you confirm, please make sure it is highly likely you can attend, as confirming and not attending prevents other data scientists from attending this event. Please include a monitored email address, in case there are follow-up questions.

Note: Participants will need to bring their own laptop to this program. A working knowledge of scripting (e.g., Shell, Python, R) is useful but not necessary to be successful in this event. Experience with higher-level scripting or programming languages may also be useful. Participants will also have access to cloud computing infrastructure.

Applicants must be willing to commit to all four days of the event.

There will be no registration fee or cost associated with attending this event.

For more information, or with any questions, please contact Ben Busby / Fritz Sedlazeck.

Application Form

Workshop participants will be selected based on registration responses.