Software tools

Enhancer Linking by Methylation Expression Relationships (ELMER)


ELMER is designed to combine DNA methylation and gene expression data from human tissues to infer multi-level cis-regulatory networks. It uses DNA methylation to identify enhancers, and correlates enhancer state with expression of nearby genes to identify one or more transcriptional targets. Transcription factor (TF) binding site analysis of enhancers is coupled with expression analysis of all TFs to infer upstream regulators. The package can be easily applied to publicly available TCGA cancer data sets and to custom DNA methylation and gene expression data sets.
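The enhancer-to-target step can be illustrated with a toy calculation. The following is a minimal sketch, not ELMER's code or API: it assumes a plain Pearson correlation as a stand-in for whatever statistic the package actually uses, and the function names, gene labels, and values are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


def most_anticorrelated_gene(methylation, expression_by_gene):
    """Return (gene, r) for the nearby gene whose expression is most
    negatively correlated with methylation at one enhancer probe."""
    scores = {
        gene: pearson(methylation, expression)
        for gene, expression in expression_by_gene.items()
    }
    return min(scores.items(), key=lambda item: item[1])


# Hypothetical values: one enhancer probe and two nearby genes across five samples.
probe_methylation = [0.9, 0.8, 0.5, 0.2, 0.1]
nearby_expression = {
    "GENE_A": [1.0, 2.0, 5.0, 8.0, 9.0],
    "GENE_B": [5.0, 4.0, 6.0, 5.0, 5.0],
}
gene, r = most_anticorrelated_gene(probe_methylation, nearby_expression)
```

In this toy data the expression of GENE_A rises as the probe loses methylation, so it is the one returned as the candidate target.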



A Package For Predicting The Disruptiveness Of Single Nucleotide Polymorphisms On Transcription Factor Binding Sites

We introduce motifbreakR, which allows the biologist to judge, first, whether the sequence surrounding a polymorphism is a good match to a motif and, second, how much information is gained or lost in one allele of the polymorphism relative to the other. MotifbreakR is more flexible and extensible than previous offerings. Users can choose among three algorithms for interrogating genomes with motifs from public sources: 1) a weighted-sum probability matrix, 2) log-probabilities, and 3) weighting by relative entropy. MotifbreakR can predict effects for novel variants as well as for variants previously described in public databases, making it suitable for tasks beyond the scope of its original design. Lastly, it can be used to interrogate any genome curated within Bioconductor (currently there are 22).
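To make the allele comparison concrete, here is a minimal sketch of the log-probabilities idea. It is not motifbreakR's implementation or interface (motifbreakR is an R package): the matrix values, sequences, and function names are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical 4-position position probability matrix; each column sums to 1.
PPM = [
    {"A": 0.80, "C": 0.10, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    {"A": 0.10, "C": 0.70, "G": 0.10, "T": 0.10},
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
]


def log_prob_score(seq, ppm=PPM):
    """Sum of log2 probabilities of each base of seq under the motif."""
    if len(seq) != len(ppm):
        raise ValueError("sequence length must equal motif length")
    return sum(math.log2(column[base]) for column, base in zip(ppm, seq))


def allele_difference(ref_seq, alt_seq, ppm=PPM):
    """Alternate minus reference score; negative means the alternate
    allele is a weaker match to the motif than the reference allele."""
    return log_prob_score(alt_seq, ppm) - log_prob_score(ref_seq, ppm)


# A G>T change at the second, highly constrained motif position.
delta = allele_difference("AGCT", "ATCT")
```

Because the only change falls on a position where G is strongly preferred, `delta` is negative, which is the kind of signal one would read as a potentially disrupted binding site.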


Rules-based functional segmentation of the genome

Build models for functional annotations from any kind of data, for use in genomics and epigenomics research.
An R/Bioconductor package is forthcoming (TBA).


FunciSNP - Integrating Functional Non-coding Datasets with Genetic Association Studies to Identify Candidate Regulatory SNPs

FunciSNP integrates information from GWAS, the 1000 Genomes Project, and chromatin features to identify functional SNPs in coding or non-coding regions.


Bis-SNP - A bisulfite space genotyper & methylation caller

BisSNP is a package based on the Genome Analysis Toolkit (GATK) map-reduce framework for genotyping and accurate DNA methylation calling in bisulfite-treated massively parallel sequencing data (Bisulfite-seq, NOMe-seq, RRBS, and any other bisulfite-treated sequencing) generated with the Illumina directional library protocol. It has the following key features:

  • Calls and summarizes methylation for any cytosine context provided (CpG, CHH, CHG, GCH);
  • Works with single-end and paired-end data;
  • Accurate variant detection, with base quality recalibration and indel calling in bisulfite sequencing;
  • Based on a Java map-reduce framework, allowing multi-threaded computing, and cross-platform;
  • Supports multiple output formats: detailed VCF files, CpG haplotype read files for mono-allelic methylation analysis, and simplified bedGraph, wig, and bed formats for visualization in the UCSC Genome Browser and IGV.


BisSNP uses Bayesian inference with locus-specific methylation probabilities and the bisulfite conversion rates of different cytosine contexts (not only CpG, CHH, and CHG in Bisulfite-seq, but also GCH in other bisulfite-treated sequencing) to determine genotypes and methylation levels simultaneously. Specificity and sensitivity have been validated with the Illumina 1M SNP array. At the default threshold (Phred-scaled score > 20), it detects 92.21% of heterozygous SNPs with a 0.14% false positive rate (90.88% sensitivity for C/T SNPs with a 0.16% false positive rate, 98.51% sensitivity for non-C/T SNPs with a 0.16% false positive rate). Cytosine calling is not based solely on the reference context, so it can detect non-reference cytosine contexts for use in epigenome-wide association studies.
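For readers unfamiliar with Phred scaling, the threshold above can be read as an error probability. This sketch assumes the score follows the standard Phred definition, Q = -10 log10(p), where p is the probability that the call is wrong; the function names are illustrative and are not part of BisSNP.

```python
import math


def phred_to_error_prob(q):
    """Error probability implied by a Phred-scaled score q."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Phred-scaled score for an error probability p (0 < p <= 1)."""
    if not 0 < p <= 1:
        raise ValueError("p must be in the interval (0, 1]")
    return -10 * math.log10(p)


# Under this definition, a score of 20 corresponds to a 1% chance of a wrong call.
threshold_error = phred_to_error_prob(20)
```

On this reading, requiring a score above 20 keeps only calls with less than a 1% estimated chance of being wrong.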


IGV - Integrative Genomics Viewer. Fast, efficient, scalable visualization tool for genomics data and annotations

The Integrative Genomics Viewer (IGV) is a high-performance visualization tool for interactive exploration of large, integrated genomic datasets. It supports a wide variety of data types, including array-based and next-generation sequence data, and genomic annotations.


Epigenome Center Data Portal (ECDP)

This scalable portal allows researchers to explore and download their datasets securely. From the initial LIMS sample entry (currently using Genologics) through sequencing and downstream analysis on our supercomputing cluster, all characteristics of a sample are parsed and tracked, so that these metrics can be presented in a single integrated interface.