ParaDIME : (Parallel Differential Methylation analysis)


A statistical suite for genome-wide differential DNA methylation analysis.

Sarabjot Pabla (1), Robert Podolsky (2), Hui-Dong Shi (1), Richard McIndoe (1)
(1) Center for Biotechnology and Genomic Medicine, Georgia Health Sciences University, Augusta, GA 30912
(2) School of Medicine, Wayne State University, Detroit, MI 48202

ABSTRACT

Aberrant DNA methylation is known to play an important role in the pathogenesis of several types of cancer. Identifying the location of these methylation changes is imperative to understanding the epigenetic landscape and its subsequent role in disease development and progression. Current technologies, such as reduced representation bisulfite sequencing (RRBS), allow comprehensive interrogation of DNA methylation on a genome-wide scale. However, statistical analysis of genome-wide methylation data is complex and resource intensive. To overcome these challenges we developed ParaDIME (Parallel Differential DNA Methylation), a parallel algorithm for differential methylation analysis of RRBS data. ParaDIME uses a non-parametric Rao-Scott chi-squared test that does not assume a normal distribution of methylation measurements across the genome, allowing for a more appropriate test for differential methylation. Moreover, the parallel architecture significantly increases the speed of analysis. The ParaDIME framework is scalable to accommodate large volumes of RRBS data. Performance of ParaDIME was evaluated on high and low risk subtypes of chronic lymphocytic leukemia (CLL) patients. ParaDIME analyzed 22 million data points for 11 patients in approximately 2 hours, making it significantly faster than current platforms. ParaDIME identified 57,463 differentially methylated CpG sites between the two subtypes of CLL patients. Downstream analyses of these sites revealed significant enrichment of cancer related genes and pathways.
ParaDIME can be easily modified to analyze differentially methylated regions such as CpG islands, promoters, enhancers and other regulatory elements. ParaDIME will significantly improve the quality and speed of genome-wide DNA methylation analysis.

ALGORITHM

Fig 2: Workflow describing the design and implementation of the algorithm.

RESULTS

Fig 6: Cancer neighborhood enrichment of significantly differentially methylated CpG sites in two CLL subtypes.

INTRODUCTION

DNA methylation is a stable and heritable epigenetic alteration that regulates not only gene expression but also genome structure and stability. It is a covalent chemical modification in which a methyl (CH3) moiety is added at the carbon 5 position of the cytosine ring. This reaction is catalyzed by enzymes known as DNA methyltransferases (DNMTs). The importance of this epigenetic modification in the development and sustenance of life is evident from the fact that DNMT knockouts in mice and frogs are embryonic lethal.

Next generation sequencing has allowed us to interrogate the whole genome for methylated CpG sites. Consequently, methylation data from different disease groups allow us to perform differential methylation analyses, which provide insight into the role of global differential methylation in disease. We have devised an algorithm that uses the Rao-Scott chi-squared test to estimate differential methylation. We use a permutation test to adjust the p values of the test. Unlike other available packages, we do not assume the distribution of the test statistic. Instead, we create a null distribution from the provided data to adjust for multiple testing, providing the user with a better estimate of differential methylation across the genome. The parallel architecture allows us to perform a large number of permutations in significantly less time. It also adds the benefit of scalability, giving the user the power to add more compute nodes for extremely large datasets.
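To make the test described above more concrete, here is a minimal sketch of a Rao-Scott style chi-squared statistic for a single CpG site. The poster does not give its exact formula, so this follows the general form of the Rao-Scott design-effect adjustment for clustered binary data, treating each patient as a cluster of reads. The function names, the floor of 1 on the design effect, and the read counts in the usage example are all assumptions made for illustration, not ParaDIME's actual implementation.

```python
from typing import List, Sequence, Tuple

# One sample's coverage at a CpG site: (methylated reads, total reads).
Counts = Tuple[int, int]


def design_effect(samples: Sequence[Counts]) -> float:
    """Estimate how much between-sample variation inflates the variance
    of the pooled methylation proportion relative to a plain binomial."""
    m = len(samples)
    x = sum(xi for xi, _ in samples)
    n = sum(ni for _, ni in samples)
    p = x / n
    if m < 2 or p in (0.0, 1.0):
        return 1.0  # assumption: no adjustment when it cannot be estimated
    # Cluster-level variance estimate of the pooled proportion.
    var = m / (m - 1) * sum((xi - ni * p) ** 2 for xi, ni in samples) / n ** 2
    binomial_var = p * (1 - p) / n
    # Assumption: never let the adjustment increase the effective coverage.
    return max(var / binomial_var, 1.0)


def rao_scott_chi2(group_a: Sequence[Counts], group_b: Sequence[Counts]) -> float:
    """Chi-squared statistic on counts deflated by each group's design effect."""
    adjusted: List[Tuple[float, float]] = []
    for group in (group_a, group_b):
        d = design_effect(group)
        x = sum(xi for xi, _ in group)
        n = sum(ni for _, ni in group)
        adjusted.append((x / d, n / d))
    p = sum(x for x, _ in adjusted) / sum(n for _, n in adjusted)
    if p in (0.0, 1.0):
        return 0.0  # no variation at this site, so no evidence of a difference
    return sum((x - n * p) ** 2 / (n * p * (1 - p)) for x, n in adjusted)


# Hypothetical read counts for one site in two patient groups.
good_prognosis = [(9, 10), (18, 20)]
poor_prognosis = [(1, 10), (2, 20)]
statistic = rao_scott_chi2(good_prognosis, poor_prognosis)
```

Because coverage enters through the per-sample read totals, samples with more reads carry more weight, which matches the poster's point that the test accounts for variation in coverage. In a permutation setting the statistic would be recomputed under shuffled group labels rather than compared against a theoretical chi-squared distribution.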
ParaDIME is a .NET-based web service that reports every site tested across all samples, with p values and adjusted p values, in UCSC Genome Browser (.BED) format. The user can then filter sites by a preferred cutoff and use them for downstream analyses.

PERFORMANCE

Fig 7: UCSC Genome Browser view of differentially methylated CpG sites in the HDAC1 gene, with tracks for transcription factor ChIP, B cell RNA-seq data and DNaseI raw signal for the K562 cancer cell line.

DISCUSSION

We developed a parallel algorithm (ParaDIME) for genome-wide differential DNA methylation analysis. It significantly reduces the time required for genome-wide methylation analyses [Fig 3-4].

ParaDIME uses the Rao-Scott chi-squared test to determine the difference in the proportion of DNA methylation between groups. This is a proper non-parametric test that does not depend on an assumption of normality and takes into account variation in coverage between samples.

It uses a permutation test and false discovery rate (FDR) calculations to estimate p values.

Robust scalability offers an advantage in handling large genome-wide methylation data sets. Adding more nodes reduces the analysis time for increasing data loads [Fig 3].

ParaDIME analysis of CLL patient data reveals a distribution of differentially methylated sites similar to previously published ENCODE project data [Fig 5], with most sites present in intergenic and intronic regions.

Our analyses revealed that differentially methylated CpG sites are highly enriched in cancer neighborhood genes [Fig 6]. Further analysis focused on the most enriched gene, HDAC1 (histone deacetylase 1), which is known to be responsible for the DNA damage response. Seven differentially methylated (DM) CpG sites were found in the CpG island in the regulatory region of the HDAC1 gene. These sites also overlapped with a transcription factor binding site (Txn Factor ChIP) and were preceded by open chromatin, as shown by DNase data from the K562 cancer cell line.

ParaDIME is being further developed to include region-wise analyses.
This will allow estimation of differentially methylated regions in the genome such as CpG islands, promoters, enhancers and distal promoters.

Fig 3: ParaDIME performance results. Execution times of ParaDIME at various data loads and with different numbers of compute nodes.

Fig 4: Performance comparison with existing platforms. A custom R equivalent of the ParaDIME algorithm (not parallel) for differential methylation analysis is compared with ParaDIME on analyses of 11 patients with 2 million sites per patient. Execution times with different numbers of compute nodes are shown.

RESULTS

We used ParaDIME to perform preliminary site-wise analyses to find differentially methylated CpG sites in chronic lymphocytic leukemia (CLL) patient subtypes. We used RRBS data from 11 patients: 7 in the good prognosis group and 4 in the poor prognosis group. Differentially methylated CpG sites were identified by filtering for adjusted p values < 0.05 after the permutation test. About 2 million CpG sites per sample were tested using 10 compute nodes. 57,463 CpG sites were found to be differentially methylated between the two groups.

Fig 1: DNA methylation and its effects in various elements of the genome.

CONCLUSIONS

ParaDIME is a significantly faster algorithm for estimating differentially methylated sites in genomes across multiple samples. Parallelization of the algorithm makes permutation testing practical, which allows more accurate estimation of differentially methylated sites while controlling for multiple testing. Its scalability allows it to handle very large datasets, including population-wide genome data.

FUTURE WORK

ParaDIME is being further developed to include region-wise analyses. This will allow estimation of differentially methylated regions in the genome such as CpG islands, promoters, enhancers and distal promoters.

Fig 5: Annotation of significant CpG sites.

ACKNOWLEDGEMENTS

All the work presented in this poster has been supported by grant DK076169 from the NIDDK.
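The filtering step reported in RESULTS (adjusted p values < 0.05 after a permutation test) can be illustrated with a small sketch. The poster does not say how ParaDIME turns permutations into adjusted p values, so this example swaps in a standard technique, the single-step maxT adjustment of Westfall and Young, and uses a simple difference in pooled methylation proportion as a stand-in statistic. The function names, site labels, read counts, and number of permutations are all hypothetical.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

# One sample's coverage at a CpG site: (methylated reads, total reads).
Counts = Tuple[int, int]


def pooled_diff(a: Sequence[Counts], b: Sequence[Counts]) -> float:
    """Stand-in statistic: absolute difference in pooled methylation proportion."""
    pa = sum(x for x, _ in a) / sum(n for _, n in a)
    pb = sum(x for x, _ in b) / sum(n for _, n in b)
    return abs(pa - pb)


def maxt_adjusted_p(
    sites: Dict[str, List[Counts]],
    labels: Sequence[int],
    stat: Callable[[Sequence[Counts], Sequence[Counts]], float] = pooled_diff,
    n_perm: int = 200,
    seed: int = 0,
) -> Dict[str, float]:
    """Single-step maxT adjusted p values from label permutations.

    sites maps a site name to one Counts entry per sample, in the same
    order as labels (0 = first group, 1 = second group).
    """
    rng = random.Random(seed)

    def stats_for(lab: Sequence[int]) -> Dict[str, float]:
        out = {}
        for site, counts in sites.items():
            a = [c for c, g in zip(counts, lab) if g == 0]
            b = [c for c, g in zip(counts, lab) if g == 1]
            out[site] = stat(a, b)
        return out

    observed = stats_for(labels)
    exceed = {site: 0 for site in sites}
    perm = list(labels)
    for _ in range(n_perm):
        rng.shuffle(perm)  # shuffle group labels to build the null distribution
        max_null = max(stats_for(perm).values())
        for site, obs in observed.items():
            if max_null >= obs:
                exceed[site] += 1
    # The +1 terms count the observed labelling as one of the permutations.
    return {site: (1 + exceed[site]) / (n_perm + 1) for site in sites}


# Hypothetical data: 7 samples in one group, 4 in the other.
labels = [0] * 7 + [1] * 4
sites = {
    "chr1:100": [(18, 20)] * 7 + [(2, 20)] * 4,  # strongly different
    "chr1:200": [(10, 20)] * 11,                 # identical in both groups
}
adjusted = maxt_adjusted_p(sites, labels)
significant = {site: p for site, p in adjusted.items() if p < 0.05}
```

Each permutation is independent of the others, which is why this kind of loop lends itself to being split across compute nodes as the poster describes.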