ParaDIME : (Parallel Differential Methylation analysis)

A statistical suite for genome-wide differential DNA methylation analysis.

Sarabjot Pabla¹, Robert Podolsky², Hui-Dong Shi¹, Richard McIndoe¹
¹Center for Biotechnology and Genomic Medicine, Georgia Health Sciences University, Augusta, GA 30912
²School of Medicine, Wayne State University, Detroit, MI 48202

ABSTRACT
Aberrant DNA methylation is known to play an important role in the pathogenesis of several types of cancer. Identifying the location of these methylation changes is imperative to understanding the epigenetic landscape and its subsequent role in disease development and progression. Current technologies, such as reduced representation bisulfite sequencing (RRBS), allow comprehensive interrogation of DNA methylation on a genome-wide scale. However, statistical analysis of genome-wide methylation data is complex and resource intensive. To overcome these challenges we developed ParaDIME (Parallel Differential DNA Methylation), a parallel algorithm to perform differential methylation analysis of RRBS data. ParaDIME uses a non-parametric Rao-Scott chi-squared test that does not assume normally distributed methylation measurements across the genome, allowing for a more appropriate test for differential methylation. Moreover, the parallel architecture significantly increases the speed of analysis. The ParaDIME framework is scalable to accommodate large volumes of RRBS data. Performance of ParaDIME was evaluated on high and low risk subtypes of chronic lymphocytic leukemia (CLL) patients. ParaDIME analyzed 22 million data points for 11 patients in approximately 2 hours, making it significantly faster than current platforms. ParaDIME identified 57,463 differentially methylated CpG sites in two subtypes of CLL patients. Downstream analyses of these sites revealed significant enrichment of cancer-related genes and pathways.
ParaDIME can be easily modified to analyze differentially methylated regions such as CpG islands, promoters, enhancers and other regulatory elements. ParaDIME will significantly improve the quality and speed of genome-wide DNA methylation analysis.

Fig 2 (ALGORITHM panel): Workflow describing the design and implementation of the algorithm.
Fig 6 (RESULTS panel): Cancer neighborhood enrichment of significantly differentially methylated CpG sites in two CLL subtypes.

INTRODUCTION
DNA methylation is a stable and heritable epigenetic alteration that regulates not only gene expression but also genome structure and stability. It is a covalent chemical modification in which a methyl (CH3) moiety is added at the carbon 5 position of the cytosine ring. This reaction is catalyzed by enzymes known as DNA methyltransferases (DNMTs). The importance of this epigenetic modification in the development and sustenance of life is evident from the fact that DNMT knockouts in mice and frogs are embryonic lethal.

Next generation sequencing has allowed us to interrogate the whole genome for methylated CpG sites. Consequently, methylation data from different disease groups allow us to perform differential methylation analyses, which provide insight into the role of global differential methylation in disease.

We have devised an algorithm that uses the Rao-Scott chi-squared test to estimate differential methylation. We use a permutation test to adjust the p-values of the test. Unlike other available packages, we do not assume the distribution of the test statistic. Instead, we create a null distribution from the provided data to adjust for multiple testing, providing the user with a better estimate of differential methylation across the genome. The parallel architecture allows us to perform a large number of permutations in significantly less time. It also adds the benefit of scalability, giving the user the power to add more compute nodes for extremely large datasets.
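The test described above can be illustrated with a minimal Python sketch for a single CpG site. It assumes the design-effect form of the Rao-Scott adjustment for clustered binary data, where each sample is treated as a cluster of reads, followed by a label-permutation p-value. The poster does not give ParaDIME's exact formulas, so the function names, the flooring of the design effect at 1, and the per-site permutation scheme are illustrative assumptions, not the authors' implementation.

```python
import random


def rao_scott_chi2(groups):
    """Rao-Scott adjusted chi-squared statistic for one CpG site.

    groups: list of groups; each group is a list of
    (methylated_reads, total_reads) tuples, one tuple per sample.
    Each group needs at least two samples with nonzero coverage.
    """
    adj_x, adj_n = [], []
    for samples in groups:
        m = len(samples)
        x = sum(meth for meth, _ in samples)
        n = sum(total for _, total in samples)
        p = x / n  # pooled methylation proportion in this group
        # Between-sample variance of the pooled proportion.
        ss = sum((meth - total * p) ** 2 for meth, total in samples)
        v = m / (m - 1) * ss / n ** 2
        binom_var = p * (1 - p) / n
        # Design effect: observed variance relative to binomial variance.
        d = v / binom_var if binom_var > 0 else 1.0
        # Assumption: floor at 1 so the adjustment never inflates coverage.
        d = max(d, 1.0)
        adj_x.append(x / d)
        adj_n.append(n / d)
    p_pool = sum(adj_x) / sum(adj_n)
    if p_pool in (0.0, 1.0):
        return 0.0  # no variation at this site
    # Pearson chi-squared on the design-effect-adjusted counts.
    return sum(
        (ax - an * p_pool) ** 2 / (an * p_pool * (1 - p_pool))
        for ax, an in zip(adj_x, adj_n)
    )


def permutation_p_value(group_a, group_b, n_perm=999, seed=0):
    """Permute sample labels to build a null distribution from the data."""
    rng = random.Random(seed)
    observed = rao_scott_chi2([group_a, group_b])
    pooled = list(group_a) + list(group_b)
    k = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if rao_scott_chi2([pooled[:k], pooled[k:]]) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)
```

In this sketch every site is tested separately, so the permutation loop for different sites could be distributed across compute nodes without any shared state.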
ParaDIME is a .NET-based web service that reports every site tested across all samples, with p-values and adjusted p-values, in UCSC Genome Browser (.BED) format. The user can then filter sites based on a preferred cutoff and use them for downstream analyses.

Fig 1: DNA methylation and its effects in various elements of the genome.

PERFORMANCE
Fig 3: ParaDIME performance results. Execution times of ParaDIME at various data loads and with different numbers of compute nodes.
Fig 4: Performance comparison with existing platforms. A custom, non-parallel R implementation of the ParaDIME algorithm for differential methylation analysis is compared with ParaDIME on an analysis of 11 patients with 2 million sites per patient. Execution times with different numbers of compute nodes are shown.

RESULTS
We used ParaDIME to perform preliminary site-wise analyses to find differentially methylated CpG sites in chronic lymphocytic leukemia (CLL) patient subtypes. We used RRBS data from 11 patients:
- 7 patients in the good prognosis group
- 4 patients in the poor prognosis group
Differentially methylated CpG sites were identified by filtering for adjusted p-values < 0.05 after the permutation test. Approximately 2 million CpG sites per sample were tested using 10 compute nodes. 57,463 CpG sites were found to be differentially methylated between the two groups.

Fig 5: Annotation of significant CpG sites.
Fig 7: UCSC Genome Browser view of differentially methylated CpG sites in the HDAC1 gene, with tracks for transcription factor ChIP, B cell RNA-seq data and DNaseI raw signal for the K562 cancer cell line.

DISCUSSION
- Developed a parallel algorithm (ParaDIME) for genome-wide differential DNA methylation analysis. It significantly reduces the time for genome-wide methylation analyses [Figs 3-4].
- Uses the Rao-Scott chi-squared test to determine the difference in the proportion of DNA methylation between groups. This is an appropriate non-parametric test that does not depend on an assumption of normality and takes into account variation in coverage between samples.
- Uses a permutation test and False Discovery Rate (FDR) calculations to estimate p-values.
- Robust scalability offers a unique advantage for handling large genome-wide methylation datasets. Adding more nodes reduces the analysis time for increasing data loads [Fig 3].
- ParaDIME analysis of CLL patient data reveals a distribution of differentially methylated sites similar to previously published ENCODE project data [Fig 5], with most sites present in intergenic and intronic regions.
- Our analyses revealed that differentially methylated CpG sites are highly enriched in cancer neighborhood genes [Fig 6].
- We further analyzed the most enriched gene, HDAC1 (histone deacetylase 1), which is known to be responsible for the DNA damage response. Seven differentially methylated CpG sites were found in the CpG island in the regulatory region of the HDAC1 gene. These sites also overlapped with a transcription factor binding site (Txn Factor ChIP) and were preceded by open chromatin, as shown by DNase signal in the K562 cancer cell line.

CONCLUSIONS
ParaDIME is a significantly faster algorithm for estimating differentially methylated sites in the genome across multiple samples. Parallelization of the algorithm allows the use of permutation testing, which helps in more accurate estimation of differentially methylated sites while controlling for multiple testing. Its scalability allows it to handle very large datasets, including population-wide genome data.

FUTURE WORK
ParaDIME is being further developed to include region-wise analyses. This will allow estimation of differentially methylated regions in the genome such as CpG islands, promoters, enhancers and distal promoters.

ACKNOWLEDGEMENTS
All the work presented in this poster has been supported by grant DK from the NIDDK.
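The reporting step described in the poster (adjusted p-values, a cutoff of 0.05, BED output) can be sketched in Python as follows. The poster does not say which FDR procedure ParaDIME uses or how p-values are encoded in the BED file. This sketch assumes the Benjamini-Hochberg step-up procedure and stores both values in the BED name column. The function names, the site dictionary layout, and the example coordinates are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def to_bed_lines(sites, alpha=0.05):
    """Keep sites with adjusted p < alpha and format them as BED lines.

    sites: list of dicts with keys "chrom", "pos" (1-based) and "p".
    BED uses 0-based, half-open coordinates, so a single CpG at 1-based
    position pos becomes start = pos - 1, end = pos.
    """
    adjusted = benjamini_hochberg([s["p"] for s in sites])
    lines = []
    for site, q in zip(sites, adjusted):
        if q < alpha:
            # Assumption: p-values are carried in the BED name column.
            name = f'p={site["p"]:.3g};adj={q:.3g}'
            lines.append(
                f'{site["chrom"]}\t{site["pos"] - 1}\t{site["pos"]}\t{name}'
            )
    return lines


if __name__ == "__main__":
    # Hypothetical example sites, not data from the poster.
    example = [
        {"chrom": "chr1", "pos": 100, "p": 0.01},
        {"chrom": "chr1", "pos": 200, "p": 0.04},
        {"chrom": "chr2", "pos": 50, "p": 0.03},
        {"chrom": "chr2", "pos": 75, "p": 0.5},
    ]
    for line in to_bed_lines(example):
        print(line)
```

Filtering after adjustment, rather than on raw p-values, matches the order of operations the poster describes for the CLL analysis.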

