This opportunity with the Next-Generation Sequencing Informatics (NGSI) group requires a high-performing data scientist with broad informatics experience to generate, troubleshoot and deliver large-scale production deliverables. The ideal candidate will be fluent in at-scale sequencing and analysis methods and possess exceptional organizational and communication skills. As the HGSC Bioinformatics Core, NGSI manages the production, maintenance and primary analysis of all HGSC genome sequence data from NovaSeq, PromethION, and PacBio sequencing platforms. NGSI also contributes to multiple clinical, Mendelian, and large cohort sequencing studies, specifically in the areas of structural variation and at-scale genomic data science. Under the direction of the NGSI leads, a qualified candidate will execute and manage data generation, delivery, QC, and analysis of large data sets, requiring use of local and cloud-based compute resources. These responsibilities include direct interaction with collaborators and communicating relevant results, with opportunities to present work at meetings and conferences.
The HGSC was founded in 1996 under the leadership of Dr. Richard Gibbs and is a world leader in genomics. The fundamental interests of the HGSC are in advancing biology and genetics through improved genome technologies. As one of the three large-scale sequencing centers funded by the National Institutes of Health, the HGSC provides a unique opportunity to work on the cutting edge of genomic science in a state-of-the-art institution. Today, the HGSC employs approximately 200 staff and occupies more than 36,000 square feet on the 14th, 15th, and 16th floors of the Margaret M. and Albert B. Alkek Building. The HGSC is located on the southwest edge of downtown Houston, the fourth largest city in the U.S., in the Texas Medical Center, the world's largest medical complex. The major activity of the HGSC is high-throughput DNA sequence generation and the accompanying analysis. The HGSC is also involved in developing the next generation of DNA sequencing and bioinformatics technologies that will allow greater scientific advances in the future.
- Manage the generation, storage and delivery of large-sample genomic datasets
- Develop, test and deploy at-scale analysis protocols
- Deliver QC’ed data to public repositories and collaborators
- Maintain extensive project-specific documentation and best practices
- Support day-to-day NGSI production pipelines
- Participate in calls and meetings with collaborators
- Identify novel ways to improve data quality and analysis
- Provide excellent customer service to other HGSC groups and outside collaborators through
- Bachelor's degree in Genetics, Biology, Bioinformatics, Biostatistics, Computational Biology, Computer Science, or a related field.
- No experience required.
- Master’s degree in a related field.
- At least 1 year of hands-on experience working on Linux or Unix-based systems from the command line.
- At least 1 year of programming experience with Python (preferred) or Java.
- Familiar with running analyses on HPC clusters (Moab, PBS, and Torque preferred).
- Familiar with cloud computing (AWS, Google).
- Demonstrated ability to manage multiple tasks and overlapping deadlines.
- Excellent written and verbal communication skills.
- NGS pipeline development.
- NGS sequence analysis tools (e.g., BWA, Samtools, bedtools, bamUtils, Picard, GATK, vcftools, bcftools).
- Common genomics data formats (e.g., FASTQ, BAM, VCF, BED).
- Database and big data software (e.g., NoSQL, Hadoop, HBase).
- Statistical and visualization software (e.g., R, SAS).
- Demonstrated experience in software development or testing.
- Structural variation detection methods.
Baylor College of Medicine requires employees to be fully vaccinated, subject to approved exemptions, against vaccine-preventable diseases including, but not limited to, COVID-19 and influenza.
Baylor College of Medicine is an Equal Opportunity/Affirmative Action/Equal Access Employer.