ERIS

About the Software

ERIS is quality control software that assesses possible contamination of Illumina whole-genome and whole-exome sequence data by comparing sequenced reads to SNP array data. Implemented as an automated pipeline at the HGSC, ERIS also validates the identity of every sample, detecting potential swaps that can occur anywhere in the pipeline.

ERIS takes three inputs: the sample sequence data (FASTQ format: https://en.wikipedia.org/wiki/FASTQ_format), the sample SNP array data (Birdseed format), and a chip-specific probelist. The Birdseed file is a tab-delimited text file listing all the SNPs and their associated genotypes for a given sample. The probelist is a tab-delimited text file containing details of all the probes on the chip used to generate the sample SNP arrays.

Omni2.5 chip probelist file example:

Birdseed file example:

For each sample, ERIS first queries the FASTQ for all instances of each probe in the probelist and classifies each instance as either “reference” or “variant.” The frequencies of the two classes are used to determine the sequence-based SNP genotypes for the sample. These genotypes are compared to the corresponding array-based SNP genotypes for the same sample to determine the self-concordance ratio. A concordance ratio below 0.90 generally indicates a contaminated sample. ERIS also compares the sequence-based genotypes to the array data of other samples and calculates best-hit concordance ratios.
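The genotype-calling and concordance steps described above can be sketched as follows. This is a minimal illustration, not the ERIS implementation: the genotype labels, the `min_depth` cutoff, and the `het_band` variant-fraction thresholds are all assumptions made for the example, since the text does not state how class frequencies are turned into genotypes.

```python
from typing import Dict, Optional, Tuple

# Hypothetical genotype labels; the encoding ERIS uses is not described in the text.
HOM_REF, HET, HOM_VAR = "AA", "AB", "BB"


def call_genotype(ref_count: int, var_count: int,
                  min_depth: int = 4,
                  het_band: Tuple[float, float] = (0.2, 0.8)) -> Optional[str]:
    """Call a genotype from reference/variant read counts at one probe.

    min_depth and het_band are illustrative assumptions, not ERIS parameters.
    """
    depth = ref_count + var_count
    if depth < min_depth:
        return None  # too few reads to make a call
    var_fraction = var_count / depth
    if var_fraction < het_band[0]:
        return HOM_REF
    if var_fraction > het_band[1]:
        return HOM_VAR
    return HET


def concordance(seq_calls: Dict[str, Optional[str]],
                array_calls: Dict[str, str]) -> float:
    """Fraction of SNPs called in both data sets whose genotypes agree."""
    shared = [snp for snp, gt in seq_calls.items()
              if gt is not None and snp in array_calls]
    if not shared:
        return 0.0
    matches = sum(1 for snp in shared if seq_calls[snp] == array_calls[snp])
    return matches / len(shared)


# Toy data: (reference count, variant count) per probe.
counts = {"rs1": (20, 0), "rs2": (10, 11), "rs3": (0, 18), "rs4": (1, 1)}
seq_calls = {snp: call_genotype(r, v) for snp, (r, v) in counts.items()}
array_calls = {"rs1": "AA", "rs2": "AB", "rs3": "AB", "rs4": "BB"}
self_concordance = concordance(seq_calls, array_calls)
```

In this toy data, rs4 has too few reads to be called and is excluded, so two of the three remaining sites agree with the array.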

To calculate contamination, we compare only the homozygous variant genotypes from the SNP array data to the FASTQ data. If the sequence-based genotype at such a site is not homozygous variant, the site counts as possible contamination. The contamination value is the number of non-matching genotypes divided by the total number of homozygous variant sites. Samples with contamination greater than 5% are investigated.
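The ratio described above can be written as a short function. This is a sketch under stated assumptions: the `"BB"` label for homozygous variant is hypothetical, and skipping sites with no sequence-based call is a choice made for the example rather than something the text specifies.

```python
from typing import Dict, Optional

HOM_VAR = "BB"  # hypothetical label for a homozygous variant genotype


def contamination(seq_calls: Dict[str, Optional[str]],
                  array_calls: Dict[str, str]) -> Optional[float]:
    """Non-matching genotypes / total homozygous variant array sites.

    Sites without a sequence-based call are skipped (an assumption).
    """
    hom_var_sites = [snp for snp, gt in array_calls.items()
                     if gt == HOM_VAR and seq_calls.get(snp) is not None]
    if not hom_var_sites:
        return None  # nothing to compare
    mismatches = sum(1 for snp in hom_var_sites if seq_calls[snp] != HOM_VAR)
    return mismatches / len(hom_var_sites)


# Toy data: 40 homozygous variant array sites plus one heterozygous site.
array_calls = {f"rs{i}": "BB" for i in range(40)}
array_calls["rs_het"] = "AB"  # ignored: not homozygous variant on the array
seq_calls = {snp: "BB" for snp in array_calls}
seq_calls["rs_het"] = "AB"
for snp in ("rs0", "rs1", "rs2"):
    seq_calls[snp] = "AB"  # three sites show an unexpected reference allele

value = contamination(seq_calls, array_calls)
needs_review = value is not None and value > 0.05  # 5% threshold from the text
```

Here 3 of the 40 homozygous variant sites do not match, giving 0.075, which exceeds the 5% threshold.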

The final ERIS output reports the self-concordance to the sample's own array data, the contamination value, and the top 6 concordance ratios from comparison to other sample arrays, listed in ascending order. A higher concordance to another sample's array than to the sample's own array indicates a possible swap and is investigated further.
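The best-hit report and swap check described above might look like the following sketch. The function names and the sample identifiers are hypothetical, and the ordering reflects one reading of the text: take the six highest concordance ratios, then list them in ascending order.

```python
from typing import Dict, List, Tuple


def best_hits(other_concordance: Dict[str, float],
              top_n: int = 6) -> List[Tuple[str, float]]:
    """Return the top_n highest concordance ratios, sorted ascending."""
    top = sorted(other_concordance.items(),
                 key=lambda kv: kv[1], reverse=True)[:top_n]
    return sorted(top, key=lambda kv: kv[1])


def possible_swap(self_concordance: float,
                  other_concordance: Dict[str, float]) -> bool:
    """True if any other sample's array matches better than the sample's own."""
    return any(c > self_concordance for c in other_concordance.values())


# Toy concordance ratios of one sequenced sample against other samples' arrays.
others = {"S2": 0.41, "S3": 0.97, "S4": 0.39, "S5": 0.40,
          "S6": 0.42, "S7": 0.38, "S8": 0.43}
self_concordance = 0.55

hits = best_hits(others)
swap_suspected = possible_swap(self_concordance, others)
```

In this toy data the sequenced sample matches the array labeled S3 far better than its own, which is the pattern the text describes as a possible swap.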

The ERIS pipeline is integrated with the BCM-HGSC Laboratory Information Management System (LIMS), which tracks every sample from intake to data delivery. All ERIS results are stored in the LIMS database, allowing the BCM-HGSC QC team to monitor center-wide SNP concordance and identify potential systemic errors.

License

Copyright © Baylor College of Medicine Human Genome Sequencing Center. All rights reserved.