Differential Methylation Analysis


Differential Methylation Analysis Simon Andrews simon.andrews@babraham.ac.uk @simon_andrews

A basic question…

Factors to consider
  Number of observations
  Magnitude of effect
  Technical considerations
  Biological variability
  Biological common sense

The problem of power…
  Ideally we want to cover every cytosine (CpG)
  We have to correct for the number of tests
  There's no way you'll collect enough data to analyse each C and still have p-values which survive multiple testing correction
  The statistics have to find a way to work round this
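To make the multiple testing burden concrete, here is a minimal sketch of two common corrections. The slides do not say which correction is used, so both Bonferroni and Benjamini-Hochberg are assumptions, and the figure of one million tests is purely illustrative.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value cutoff needed to keep the family-wise error rate at alpha."""
    return alpha / n_tests


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Illustrative only: with a million per-site tests, a raw p-value has to be
# extremely small before it survives a Bonferroni correction.
cutoff = bonferroni_threshold(0.05, 1_000_000)
qvalues = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

With only a handful of reads per cytosine, raw p-values rarely get anywhere near such a cutoff, which is why the following slides look at ways of reducing the number of tests.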

Maximising power
  Options
    Analyse in windows
    Pre-filter
    Hierarchical or adaptive filtering

Window sizes
  Small windows
    Good resolution
    Specific biological effects
    High MTC burden
    Few observations
    High p-values
  Large windows
    Lots of data
    High statistical power
    Low MTC burden
    Low p-values
    Effect averaging
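The trade-off can be sketched by pooling per-cytosine calls into fixed-size windows. The input layout (position, methylated count, unmethylated count) and the window sizes are assumptions chosen for illustration, not values from the slides.

```python
from collections import defaultdict


def pool_into_windows(calls, window_size):
    """Sum methylated/unmethylated counts per fixed window on one chromosome.

    calls: iterable of (position, meth_count, unmeth_count).
    Returns {window_start: (meth, unmeth)} ordered by window start.
    """
    windows = defaultdict(lambda: [0, 0])
    for position, meth, unmeth in calls:
        start = (position // window_size) * window_size
        windows[start][0] += meth
        windows[start][1] += unmeth
    return {start: tuple(counts) for start, counts in sorted(windows.items())}


# Hypothetical calls for five cytosines.
calls = [(120, 3, 1), (450, 0, 4), (980, 2, 2), (1500, 5, 0), (1730, 1, 3)]
small = pool_into_windows(calls, 500)    # more windows, fewer reads in each
large = pool_into_windows(calls, 2000)   # one window, all reads averaged together
```

The small windows give three separate tests with few reads each. The large window gives a single well-covered test, but the hypomethylated site at position 450 is averaged away.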

Simple Statistical Approach
  Is the proportion of methylated calls different between two samples, given the number of observations?
  Table columns: Meth count A, Unmeth count A, Meth count B, Unmeth count B, % change, Significant?
  Table values as extracted: 2 100 No 200 198 5 1.5 50 75 60 11 Probably

Contingency tests
  Chi-square / G-test / Fisher's exact test
    Differ only at low observations
    Significant changes require enough observations that any of these should give the same answer
  Operates on single replicates
  Technical measure of difference
  2x2 table: rows A and B, columns Meth and Unmeth
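A minimal sketch of two of these tests on the 2x2 table, using only the Python standard library. The counts are illustrative rather than taken from the slides, the chi-square version applies no continuity correction, and the two-sided Fisher p-value uses the common "sum of tables no more likely than the observed one" definition.

```python
import math


def chi_square_2x2(meth_a, unmeth_a, meth_b, unmeth_b):
    """Pearson chi-square on a 2x2 table; returns (statistic, p_value)."""
    row_a, row_b = meth_a + unmeth_a, meth_b + unmeth_b
    col_m, col_u = meth_a + meth_b, unmeth_a + unmeth_b
    if 0 in (row_a, row_b, col_m, col_u):
        return 0.0, 1.0  # an empty margin carries no evidence either way
    n = row_a + row_b
    stat = n * (meth_a * unmeth_b - unmeth_a * meth_b) ** 2 / (
        row_a * row_b * col_m * col_u
    )
    # Survival function of a chi-square distribution with 1 degree of freedom.
    return stat, math.erfc(math.sqrt(stat / 2))


def fisher_exact_2x2(meth_a, unmeth_a, meth_b, unmeth_b):
    """Two-sided Fisher's exact test p-value for a 2x2 table."""
    row_a, row_b = meth_a + unmeth_a, meth_b + unmeth_b
    col_m = meth_a + meth_b
    n = row_a + row_b

    def prob(x):
        # Hypergeometric probability of x methylated calls in sample A.
        return math.comb(row_a, x) * math.comb(row_b, col_m - x) / math.comb(n, col_m)

    observed = prob(meth_a)
    lowest, highest = max(0, col_m - row_b), min(row_a, col_m)
    total = sum(
        prob(x) for x in range(lowest, highest + 1) if prob(x) <= observed * (1 + 1e-9)
    )
    return min(1.0, total)


# The same 50% vs 100% contrast at two sequencing depths (illustrative counts).
shallow = (chi_square_2x2(1, 1, 2, 0)[1], fisher_exact_2x2(1, 1, 2, 0))
deep = (chi_square_2x2(100, 100, 200, 0)[1], fisher_exact_2x2(100, 100, 200, 0))
```

At shallow depth neither test calls the difference significant. At high depth both give vanishingly small p-values, so the choice of test matters little once there are enough observations.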

Chi-Square results

Biological considerations
  Minimum relevant effect size?
    Balance power vs change
    What makes biological sense (what would you follow up?)
  Minimum coverage worth testing
    No point testing poorly covered regions
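A minimal sketch of pre-filtering on these two criteria before any statistical test is run. The thresholds (at least 10 observations per sample, at least a 10 percentage point difference) and the record layout are hypothetical; the slides do not give specific values.

```python
def prefilter(regions, min_coverage=10, min_abs_diff=0.10):
    """Keep only regions worth testing.

    regions: list of dicts with keys name, meth_a, unmeth_a, meth_b, unmeth_b.
    A region is dropped if either sample is poorly covered or if the
    difference in methylated fraction is below the minimum relevant effect.
    """
    kept = []
    for region in regions:
        cov_a = region["meth_a"] + region["unmeth_a"]
        cov_b = region["meth_b"] + region["unmeth_b"]
        if min(cov_a, cov_b) < min_coverage:
            continue  # too few observations to be worth a test
        diff = region["meth_b"] / cov_b - region["meth_a"] / cov_a
        if abs(diff) < min_abs_diff:
            continue  # change too small to follow up
        kept.append(region)
    return kept


# Hypothetical regions: only the second passes both filters.
regions = [
    {"name": "low_cov", "meth_a": 2, "unmeth_a": 0, "meth_b": 0, "unmeth_b": 2},
    {"name": "big_change", "meth_a": 20, "unmeth_a": 80, "meth_b": 60, "unmeth_b": 40},
    {"name": "tiny_change", "meth_a": 50, "unmeth_a": 50, "meth_b": 55, "unmeth_b": 45},
]
to_test = prefilter(regions)
```

Every region removed here is one fewer test to correct for, which is where the gain in power comes from.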

Effect of pre-filtering

Distribution of methylation
  The chi-square test relies on a normal approximation, and methylation data are not normally distributed

Beta binomial distribution
  More relevant statistics than chi-square
  Need to fit a custom model to the actual data
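As a rough illustration of why the beta-binomial is a better fit, the sketch below evaluates its probability mass function and variance with the standard library. The shape parameters `alpha` and `beta` are chosen by hand here; in a real analysis they would be fitted to the data, which this sketch does not attempt.

```python
import math


def log_beta(a, b):
    """Natural log of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_pmf(k, n, alpha, beta):
    """P(k methylated calls out of n) when the true level varies as Beta(alpha, beta)."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return math.exp(
        log_choose + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta)
    )


def beta_binomial_variance(n, alpha, beta):
    """Variance of the count; the last factor is the inflation over a plain binomial."""
    p = alpha / (alpha + beta)
    rho = 1.0 / (alpha + beta + 1.0)
    return n * p * (1 - p) * (1 + (n - 1) * rho)


# Hand-picked example: 10 reads, mean methylation 50%, wide spread between samples.
n, alpha, beta = 10, 1.0, 1.0
binomial_var = n * 0.5 * 0.5
overdispersed_var = beta_binomial_variance(n, alpha, beta)
pmf = [beta_binomial_pmf(k, n, alpha, beta) for k in range(n + 1)]
```

With these parameters the count variance is four times the binomial value, which is the extra sample-to-sample variability that a plain contingency test ignores.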

Implications of a beta distribution
  Many summaries assume normality
    Mean
    Standard deviation
    Boxplots
  None of these is strictly appropriate when looking at methylation data

Dealing with replicates
  Simple approach
    Merge data from replicates together
    Single test, high power
    Post-hoc test for consistency
  Explicitly account for batch effects
    Logistic regression
    Measures batch effects and excludes them from the final significance calculation
  Work with methylation values
    Normalise percentage methylation values
    Use conventional statistics (t-tests etc.) for comparing groups
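The simple approach can be sketched as follows: pool the counts for one test, then check afterwards that the individual replicates agree. The slides do not define the post-hoc consistency test, so the rule used here (every replicate of one group lies on the same side of every replicate of the other) is an assumption, and the counts are made up.

```python
def merge_replicates(replicates):
    """Pool a list of (meth, unmeth) replicate counts into one (meth, unmeth) pair."""
    return (sum(m for m, _ in replicates), sum(u for _, u in replicates))


def consistent_direction(group_a, group_b):
    """Assumed post-hoc check: the replicate methylation levels do not overlap."""
    a = [m / (m + u) for m, u in group_a if m + u > 0]
    b = [m / (m + u) for m, u in group_b if m + u > 0]
    if not a or not b:
        return False  # nothing covered, so nothing to confirm
    return min(b) > max(a) or max(b) < min(a)


# Hypothetical region with three replicates per condition.
group_a = [(2, 8), (3, 7), (1, 9)]
group_b = [(7, 3), (6, 4), (9, 1)]
pooled_a = merge_replicates(group_a)
pooled_b = merge_replicates(group_b)
agrees = consistent_direction(group_a, group_b)

# One deeply sequenced replicate can dominate the pooled counts.
outlier_b = [(28, 2), (1, 9), (1, 9)]
pooled_outlier = merge_replicates(outlier_b)
outlier_agrees = consistent_direction(group_a, outlier_b)
```

The second case shows why the check is needed: the pooled counts suggest 60% methylation against 20%, but two of the three replicates show no increase at all.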

Hierarchical testing
  Test larger regions
    Windows / features etc.
  Take significant hits and subdivide
    Smaller windows
    Individual CpGs
    Correct only for these tests
  Assemble hits together to make up DMRs
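The two-stage idea can be sketched as below. This is only an illustration under stated assumptions: a chi-square test stands in for whichever test is actually used, Bonferroni is used at both stages, and the region names and counts are hypothetical. It makes no claim that the combined procedure formally controls the error rate.

```python
import math


def chi_square_p(meth_a, unmeth_a, meth_b, unmeth_b):
    """p-value of a Pearson chi-square test on a 2x2 table (1 degree of freedom)."""
    row_a, row_b = meth_a + unmeth_a, meth_b + unmeth_b
    col_m, col_u = meth_a + meth_b, unmeth_a + unmeth_b
    if 0 in (row_a, row_b, col_m, col_u):
        return 1.0
    n = row_a + row_b
    stat = n * (meth_a * unmeth_b - unmeth_a * meth_b) ** 2 / (
        row_a * row_b * col_m * col_u
    )
    return math.erfc(math.sqrt(stat / 2))


def hierarchical_test(regions, alpha=0.05):
    """regions: {name: [(meth_a, unmeth_a, meth_b, unmeth_b) for each CpG]}."""
    # Stage 1: pool all CpGs in a region and correct for the number of regions.
    region_hits = []
    for name, cpgs in regions.items():
        pooled = [sum(column) for column in zip(*cpgs)]
        if chi_square_p(*pooled) < alpha / len(regions):
            region_hits.append(name)
    # Stage 2: test CpGs inside significant regions only, and correct only
    # for the tests actually performed at this stage.
    n_stage2 = sum(len(regions[name]) for name in region_hits)
    cpg_hits = []
    for name in region_hits:
        for index, counts in enumerate(regions[name]):
            if chi_square_p(*counts) < alpha / n_stage2:
                cpg_hits.append((name, index))
    return region_hits, cpg_hits


# Hypothetical data: two CpG islands, one with two strongly changed CpGs.
regions = {
    "cgi_1": [(18, 2, 3, 17), (17, 3, 2, 18), (10, 10, 11, 9)],
    "cgi_2": [(10, 10, 9, 11), (12, 8, 11, 9)],
}
region_hits, cpg_hits = hierarchical_test(regions)
```

Adjacent significant CpGs within a region, such as the first two in `cgi_1`, would then be assembled into a DMR.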

Hierarchical testing
  [Diagram: Genome and CGI tracks at successive stages, with rejected regions marked X]
  Statistically 'creative' solution to not having enough data

Methylation statistics packages
  swDMR (Perl/R package)
    Sliding window DMR finding (choose between t_test, Kolmogorov, Fisher, ChiSquare, Wilcoxon for n = 2; ANOVA, Kruskal for n > 3)
  methylKit* (R package by A. Akalin et al.)
    Sliding window, Fisher's exact test or logistic regression. Adjusts p-values to q-values using the SLIM method.
  bsseq* (R/Bioconductor by K.D. Hansen)
    Implements the BSmooth smoothing algorithm. Numerous CpG-wise t-tests and a p-value cutoff to define DMRs. Outperforms Fisher's exact test. Requires biological replicates for DMR detection.
  BiSeq* (R/Bioconductor by K. Hebestreit et al.)
    Beta regression model; impractical for very large data other than RRBS or targeted BS-Seq
  RnBeads* (R package by F. Mueller et al.)
    Works for 450K arrays, BS-Seq, MeDIP or MBD-Seq data
  DMAP* (C command line tool by P. Stockwell et al.)
    RRBS fragment or fixed window approach; Fisher's exact test, chi-squared or ANOVA
  RADMeth (C++ command line tool by E. Dolzhenko and A.D. Smith)
    Beta-binomial regression analysis to find DMCs or DMRs; local likelihood; adjusts for neighbouring CpGs
  MOABS* (C++ command line tool by D. Sun et al.)
    Beta-binomial hierarchical model to capture sampling and biological variation; Credible Methylation Difference (CDIF), a single metric that combines biological and statistical significance
  ComMet (Y. Saito et al., 2014)
    Part of the Bisulfighter suite; DMR detection based on hidden Markov models (HMMs) that enable automated adjustment of DMC chaining criteria. Does not require biological replicates.
  DSS (R/Bioconductor by Feng et al., 2014)
    Constructs a genome-wide prior distribution for the beta-binomial dispersion. Bayesian hierarchical model to detect differentially methylated loci.
  More appearing every other week…
  * interface well with

Tool comparison (columns: Tool, Statistical test, Suitable for, Implementation, Notes)
  bsseq
    Statistical test: Sample-wise smoothing, then group differences via CpG-wise t-tests (p-value cutoff to define adjacent CpG sites as DMRs)
    Suitable for: WGBS; not designed for targeted BS-Seq or RRBS
    Implementation: R package / Bioconductor
    Notes: Outperforms Fisher's exact test; intended to compare 2 groups; replicates required
  BiSeq
    Statistical test: Define CpG clusters, smooth methylation data, model and test group effect (fitting a beta regression model to smoothed methylation levels and testing for a group effect using the Wald test), hierarchical testing procedure on CpG clusters, then define DMR boundaries
    Suitable for: RRBS; targeted BS-Seq
    Implementation: R package / Bioconductor
    Notes: Very computationally intensive for WGBS; not limited to 2 groups
  MethylKit
    Statistical test: Models CpG methylation within a logistic regression. Sliding linear model (SLIM) to correct for multiple testing
    Suitable for: (e)RRBS
    Implementation: R package
  * WGBS = whole genome BS-Seq; (e)RRBS = (enhanced) reduced representation BS-Seq

bsseq: for whole genome BS-Seq
  Smooths low coverage BS-Seq data first to get reliable semi-local methylation estimates
    Not suitable for captured or restricted data
  After smoothing it uses biological replicates to estimate biological variation and identify differentially methylated regions (DMRs)
  Smoothing is suitable for even a single sample
  Works for CpG context in humans; will probably not scale to 2x585M Cs in non-CG context
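To give a feel for what smoothing low coverage data does, here is a toy coverage-weighted kernel average. It is not the BSmooth algorithm as implemented in bsseq; the tricube kernel, the 1000 bp bandwidth and the input layout are all assumptions chosen to keep the sketch short.

```python
def smooth_methylation(sites, bandwidth=1000):
    """Estimate a methylation level at each site from its neighbours.

    sites: list of (position, meth_count, unmeth_count) on one chromosome.
    Returns a list of (position, smoothed_fraction), with None where no reads
    fall inside the window. Runs in quadratic time, so for illustration only.
    """
    smoothed = []
    for position, _, _ in sites:
        weighted_meth = 0.0
        weighted_total = 0.0
        for other_position, meth, unmeth in sites:
            distance = abs(other_position - position) / bandwidth
            if distance >= 1:
                continue  # outside the smoothing window
            weight = (1 - distance ** 3) ** 3  # tricube kernel
            weighted_meth += weight * meth
            weighted_total += weight * (meth + unmeth)
        level = weighted_meth / weighted_total if weighted_total else None
        smoothed.append((position, level))
    return smoothed


# Hypothetical low coverage calls: three nearby CpGs and one isolated CpG.
sites = [(100, 1, 0), (150, 3, 1), (200, 0, 1), (5000, 0, 4)]
estimates = smooth_methylation(sites)
```

The single-read sites at positions 100 and 200 have raw levels of 100% and 0%, but after borrowing information from their neighbours both land near the local average. The isolated site has no neighbours to borrow from and keeps its raw value, which hints at why this style of smoothing does not suit captured or restricted data.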

BSmooth algorithm
  [Figure legend: black = 25x coverage (Lister); pink = 4x coverage (Lister)]

BSmooth t-values