Mullai Murugan, M.S.

Director of Software Engineering, Human Genome Sequencing Center, Baylor College of Medicine



I hold an M.S. in Management Information Systems and a B.E. in Civil Engineering and have more than 15 years of experience spearheading and implementing IT initiatives, leading IT teams and managing software applications and infrastructure. At the HGSC, I bring business and technology together to build robust solutions and optimize business processes. I lead the engineering team that develops and maintains information systems such as the Laboratory Information Management System (LIMS) and related applications, which provide the information technology backbone required to manage and support the clinical and research production pipelines at the HGSC.

My purview also includes clinical application development at the HGSC Clinical Laboratory (HGSC-CL): designing and engineering clinical IT systems to support scientific discovery and progress, with projects such as HeartCare and the Clinical Testing and Reporting System (CTRS). These projects include the orchestration and development of HIPAA-compliance-ready applications for managing clinical genetic testing at the HGSC-CL, integration with Electronic Health Record (EHR) systems, and the adoption and development of leading industry interoperability standards such as HL7 FHIR.

Previously completed projects include the eMERGE Dashboard and Analysis Portal (eDAP) for the NHGRI eMERGE Phase III program; an HL7 FHIR specification for eMERGE genetic test results, a pilot project for the eMERGE Phase III program that highlighted the need for a standard specification for representing genetic test results and captured the lessons learned in the process; and Anton, an HGSC Hadoop-based storage and analysis platform for managing and mining EHR and genomic data. I work closely with HGSC faculty, data scientists, and informatics and production leaders to identify opportunities for technology innovation and growth and to provide the platform and tools that facilitate cutting-edge research.


Standardized specification for representing genomic data: The use of genetic testing in healthcare has grown in recent years, owing to our increased understanding of the human genome and the availability of laboratories able to conduct these tests. Typically, the results from genetic testing laboratories are represented as a PDF document, which is simply scanned or uploaded to the EHR. The PDF format, though widely accepted and ubiquitous, is primarily targeted at human readers and is not optimal for supporting research, clinical decision support or further analytics. Among its biggest limitations is the lack of access to relevant data in a standardized, computable format. As part of the eMERGE Network's Phase III program, along with collaborators from the eMERGE Network, I worked on creating a standardized specification for representing genomic test results using HL7 Fast Healthcare Interoperability Resources (FHIR), an emerging interoperable healthcare standard. I led the effort to write the manuscript that details this work.

Murugan M, Babb LJ, Taylor C Overby, Rasmussen LV, Freimuth RR, Venner E, Yan F, Yi V, Granite SJ, Zouk H, Aronson SJ, Power K, Fedotov A, Crosslin DR, Fasel D, Jarvik GP, Hakonarson H, Bangash H, Kullo IJ, Connolly JJ, Nestor JG, Caraballo PJ, Wei WQ, Wiley K, Rehm HL, Gibbs RA. Genomic considerations for FHIR®; eMERGE implementation lessons. J Biomed Inform. 2021;118:103795. doi: 10.1016/j.jbi.2021.103795. Epub 2021 Apr 28.

Tools for data mining and analysis

Anton: Rapid advancements in sequencing throughput have allowed large-scale sequencing centers such as the HGSC to process many tens of thousands of samples across dozens of projects. This throughput produces complex, heterogeneous data sets of clinical and research data across whole genomes, exomes and panels. Storing and analyzing these data is a complex task that requires scalable and secure infrastructure with tiered, controlled access and high availability. I spearheaded the effort to create Anton, a Hadoop-based analytics platform that allows the integration of multi-omics datasets and EHR data in a HIPAA-compatible framework. With EHR data paired with genomic data becoming one of the major components of personalized medicine, and increasingly factored into clinical diagnosis and treatment, Anton provides the big data infrastructure, platform and tools for data mining, predictive analytics and machine learning. This platform was also pivotal for the research effort at the HGSC to identify variants suspected to cause Mendelian disorders. To accelerate discovery in that effort, exome sequencing data from 18,696 individuals referred for suspected Mendelian disease, together with relatives, were loaded into the Anton Hadoop data lake as the Hadoop Architecture Lake of Exomes (HARLEE). Genocentric analysis on HARLEE rapidly identified 154 genes harboring variants suspected to cause Mendelian disorders. I also contributed to the manuscript describing this effort.

Hansen AW, Murugan M, Li H, Khayat MM, Wang L, Rosenfeld J, Andrews BK, Jhangiani SN, Coban Akdemir ZH, Sedlazeck FJ, Ashley-Koch AE, Liu P, Muzny DM, Davis EE, Katsanis N, Sabo A, Posey JE, Yang Y, Wangler MF, Eng CM, Sutton VR, Lupski JR, Boerwinkle E, Gibbs RA. A Genocentric Approach to Discovery of Mendelian Disorders. Am J Hum Genet. 2019;105(5):974-986.

eMERGE Data Access Portal (eDAP): Capitalizing on Anton, I also led the effort to develop the eMERGE Data Access Portal (eDAP), a web-based tool for genotype/phenotype analysis and sample tracking for the Electronic Medical Records and Genomics (eMERGE) Network Phase III program.

ARBoR: Clinical genome sequencing laboratories return reports containing clinical testing results, signed by a board-certified clinical geneticist, to the ordering physician. The report is often a PDF but can also be a paper copy or a structured data file. Reports are frequently modified and re-issued due to changes in variant interpretation or clinical attributes. To precisely track report authenticity, I developed ARBoR (Authenticated Resources in a Hashed Block Registry) together with Eric Venner and a group at the HGSC. ARBoR tracks the authenticity and lineage of versioned clinical reports even when they are distributed as PDF or paper copies. It records clinical reports as cryptographically signed hash blocks in an electronic ledger file, which is then replicated exactly to many clients. I co-authored the manuscript that details this effort.

Venner E, Murugan M, Hale W, Jones JM, Lu S, Yi V, Gibbs RA. ARBoR: an identity and security solution for clinical reporting. J Am Med Inform Assoc. 2019;26(11):1370-1374.
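
The general idea of tracking versioned reports as signed hash blocks in a ledger can be illustrated with a short Python sketch. This is not the ARBoR implementation: the block fields, the function names (`append_block`, `verify_ledger`, `check_report`) and the demo key are all hypothetical, and an HMAC from the standard library stands in for whatever digital signature scheme the real system uses. The sketch also covers only byte-identical electronic files, not paper copies.

```python
import hashlib
import hmac
import json

# Hypothetical demo key; a real system would use a proper signing key pair.
SECRET_KEY = b"demo-signing-key"
GENESIS = "0" * 64  # placeholder "previous hash" for the first block


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def _block_digest(block: dict) -> str:
    """Hash the block's content fields in a stable (sorted-key) encoding."""
    fields = ("report_id", "version", "report_hash", "prev_hash")
    payload = {k: block[k] for k in fields}
    return sha256_hex(json.dumps(payload, sort_keys=True).encode())


def _sign(block_hash: str) -> str:
    """HMAC stand-in for a digital signature (assumption, see lead-in)."""
    return hmac.new(SECRET_KEY, block_hash.encode(), hashlib.sha256).hexdigest()


def append_block(ledger: list, report_id: str, version: int, report: bytes) -> dict:
    """Record one report version as a signed block chained to the previous one."""
    block = {
        "report_id": report_id,
        "version": version,
        "report_hash": sha256_hex(report),
        "prev_hash": ledger[-1]["block_hash"] if ledger else GENESIS,
    }
    block["block_hash"] = _block_digest(block)
    block["signature"] = _sign(block["block_hash"])
    ledger.append(block)
    return block


def verify_ledger(ledger: list) -> bool:
    """Check the chain links, block hashes and signatures of a ledger copy."""
    prev = GENESIS
    for block in ledger:
        if block["prev_hash"] != prev:
            return False
        if block["block_hash"] != _block_digest(block):
            return False
        if not hmac.compare_digest(block["signature"], _sign(block["block_hash"])):
            return False
        prev = block["block_hash"]
    return True


def check_report(ledger: list, report: bytes) -> str:
    """Classify a report file as 'current', 'superseded' or 'unknown'."""
    digest = sha256_hex(report)
    match = next((b for b in ledger if b["report_hash"] == digest), None)
    if match is None:
        return "unknown"
    latest = max(b["version"] for b in ledger if b["report_id"] == match["report_id"])
    return "current" if match["version"] == latest else "superseded"
```

In this sketch, a client holding a replicated copy of the ledger would first call `verify_ledger` and then `check_report` on the bytes of a received report to learn whether it is the latest version, an older version that has since been re-issued, or a file the ledger has never seen.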

For a complete list of published work: