Differential Methylation Analysis


Simon Andrews, simon.andrews@babraham.ac.uk, @simon_andrews

A basic question…

Factors to consider:
- Number of observations
- Magnitude of effect
- Technical considerations
- Biological variability
- Biological common sense

The problem of power…
- Ideally you want to cover every cytosine (CpG)
- You have to correct for the number of tests
- There's no way you'll collect enough data to analyse each C individually and still have p-values that survive multiple testing correction
- The statistics have to find a way to work round this
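To put rough numbers on the power problem, here is a minimal stdlib sketch (not from the talk) of the smallest two-sided Fisher's exact p-value a single CpG can ever reach when n reads per sample are fully methylated in one sample and fully unmethylated in the other. The plain Bonferroni correction and the figure of roughly 28 million CpGs in the human genome are assumptions chosen for illustration.

```python
from math import comb

def min_fisher_p(n):
    """Smallest two-sided Fisher's exact p-value for a 2x2 table with n reads
    per sample: the fully separated table [[n, 0], [0, n]] and its mirror
    image each have probability 1 / C(2n, n) under the null."""
    return min(1.0, 2 / comb(2 * n, n))

def min_coverage_needed(n_tests, alpha=0.05):
    """Smallest per-sample coverage at which even a complete methylation
    switch can pass a Bonferroni threshold of alpha / n_tests."""
    threshold = alpha / n_tests
    n = 1
    while min_fisher_p(n) > threshold:
        n += 1
    return n

if __name__ == "__main__":
    # ~28M CpGs is an assumed genome-wide test count, for illustration only
    for tests in (1, 1_000, 28_000_000):
        print(tests, min_coverage_needed(tests))
```

For a single test, 4 fully switched reads per sample are enough; for about 28 million tests it takes 17, and that is the best possible case, a 100% change with no noise at all.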

Maximising power. Options:
- Analyse in windows
- Pre-filter
- Hierarchical or adaptive filtering

Window sizes
Small windows: good resolution; specific biological effects; high MTC (multiple testing correction) burden; few observations per window; high p-values
Large windows: lots of data; high statistical power; low MTC burden; low p-values; effect averaging
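The trade-off above comes from pooling calls. A minimal sketch, assuming per-CpG calls for one chromosome arrive as (position, methylated count, unmethylated count) tuples, sums them into fixed, non-overlapping windows; the input layout and the name bin_calls are hypothetical.

```python
from collections import defaultdict

def bin_calls(calls, window_size):
    """Sum methylated/unmethylated calls into fixed windows.

    calls: iterable of (position, meth, unmeth) for one chromosome.
    Returns {window_start: (meth, unmeth)} ordered by window start.
    """
    bins = defaultdict(lambda: [0, 0])
    for pos, meth, unmeth in calls:
        start = (pos // window_size) * window_size
        bins[start][0] += meth
        bins[start][1] += unmeth
    return {start: tuple(counts) for start, counts in sorted(bins.items())}

calls = [(10, 1, 0), (20, 0, 1), (150, 2, 2), (160, 3, 0)]
small = bin_calls(calls, 10)    # more tests, fewer reads per test
large = bin_calls(calls, 100)   # fewer tests, more reads per test
print(len(small), small)
print(len(large), large)
```

Fewer windows means a lighter correction (for Bonferroni, alpha divided by the number of windows), at the cost of averaging any effect over the whole window.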

Simple statistical approach
Is the proportion of methylated calls different between two samples, given the number of observations?
Meth count A | Unmeth count A | Meth count B | Unmeth count B | % change | Significant?
2 100 No 200 198 5 1.5 50 75 60 11 Probably
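One way to see why the number of observations matters as much as the percentage is to put a confidence interval on each sample's methylation level. This sketch uses the Wilson score interval for a binomial proportion, a technique chosen here for illustration rather than one named on the slide.

```python
import math

def wilson_interval(meth, unmeth, z=1.96):
    """Approximate 95% Wilson score interval for the methylated fraction."""
    n = meth + unmeth
    if n == 0:
        return (0.0, 1.0)  # no reads: no information at all
    p = meth / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))

print(wilson_interval(2, 0))    # 2 reads: interval spans roughly 0.34 to 1
print(wilson_interval(200, 0))  # 200 reads: interval spans roughly 0.98 to 1
```

Both samples read as "100% methylated", but only the well-covered one pins the level down tightly enough for a change to be meaningful.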

Contingency tests
- Chi-square / G-test / Fisher's exact test
- These differ only at low numbers of observations; significant changes require enough observations that any of them should give the same answer
- Operate on single replicates, so they give a technical measure of difference
- Data laid out as a 2x2 table: Meth A | Unmeth A / Meth B | Unmeth B
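The claim that the three contingency tests only disagree at low counts is easy to check. Below is a stdlib-only sketch of a Pearson chi-square test (no continuity correction), a G-test and a two-sided Fisher's exact test on the 2x2 layout from the slide; it assumes no row or column total is zero, and the example counts are made up.

```python
import math

def _expected(table):
    """Expected counts under independence for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    total = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    return [[rows[i] * cols[j] / total for j in range(2)] for i in range(2)]

def chi_square_p(table):
    """Pearson chi-square test (1 df, no continuity correction)."""
    exp = _expected(table)
    x2 = sum((table[i][j] - exp[i][j]) ** 2 / exp[i][j] for i in range(2) for j in range(2))
    return math.erfc(math.sqrt(x2 / 2))  # upper tail of chi-square with 1 df

def g_test_p(table):
    """Likelihood-ratio G-test (1 df); empty cells contribute nothing."""
    exp = _expected(table)
    g = 2 * sum(table[i][j] * math.log(table[i][j] / exp[i][j])
                for i in range(2) for j in range(2) if table[i][j] > 0)
    return math.erfc(math.sqrt(max(g, 0.0) / 2))

def fisher_exact_p(table):
    """Two-sided Fisher's exact test: sum of hypergeometric probabilities
    no larger than that of the observed table."""
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        return math.comb(row1, x) * math.comb(row2, col1 - x) / math.comb(total, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    return min(1.0, sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7)))

low = [[3, 0], [0, 3]]          # rows: [meth A, unmeth A], [meth B, unmeth B]
high = [[150, 50], [100, 100]]
for name, t in (("low", low), ("high", high)):
    print(name, chi_square_p(t), g_test_p(t), fisher_exact_p(t))
```

On the low-count table, chi-square and the G-test call the change significant at 0.05 while Fisher's exact test gives p = 0.1; on the high-count table all three agree on p-values far below 0.001.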

Chi-Square results

Biological considerations
- Minimum relevant effect size? Balance power against the size of change, and ask what makes biological sense (what would you follow up?)
- Minimum coverage worth testing: there is no point testing poorly covered regions
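A minimal pre-filter along these lines might look like the sketch below. The thresholds (at least 10 reads in each sample, at least a 20-point change in percentage methylation) are hypothetical values for illustration, not recommendations from the talk.

```python
def passes_filter(meth_a, unmeth_a, meth_b, unmeth_b, min_cov=10, min_diff=20.0):
    """Keep a site/window only if both samples are adequately covered and the
    change in percentage methylation is at least min_diff points."""
    cov_a = meth_a + unmeth_a
    cov_b = meth_b + unmeth_b
    if cov_a < min_cov or cov_b < min_cov:
        return False  # poorly covered: not worth spending a test on
    diff = 100.0 * meth_a / cov_a - 100.0 * meth_b / cov_b
    return abs(diff) >= min_diff

candidates = [(2, 0, 0, 2), (50, 50, 80, 20), (50, 50, 55, 45)]
kept = [c for c in candidates if passes_filter(*c)]
print(kept)  # only the well-covered, large change survives
```

Only the entries that survive would then be tested and counted in the multiple testing correction.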

Effect of pre-filtering

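Distribution of methylation
The chi-square test relies on a normal approximation to the counts, and methylation data isn't normally distributed: values are bounded between 0% and 100% and, at CpGs, tend to pile up near the two extremes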

Beta binomial distribution
- Gives more relevant statistics than chi-square
- Needs a custom model fitted to the actual data
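A minimal sketch of why the beta-binomial fits methylation counts better than a plain binomial: when the underlying methylation level itself varies (here as a Beta(alpha, beta) distribution), read counts are overdispersed. The parameter values are arbitrary and for illustration only; real tools fit them to the data.

```python
import math

def beta_binomial_pmf(k, n, alpha, beta):
    """P(k methylated reads out of n) when the methylation level ~ Beta(alpha, beta)."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    log_beta_num = math.lgamma(k + alpha) + math.lgamma(n - k + beta) - math.lgamma(n + alpha + beta)
    log_beta_den = math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta)
    return math.exp(log_choose + log_beta_num - log_beta_den)

def mean_and_variance(n, alpha, beta):
    """Moments computed directly from the pmf."""
    probs = [beta_binomial_pmf(k, n, alpha, beta) for k in range(n + 1)]
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum(k * k * p for k, p in enumerate(probs)) - mean ** 2
    return mean, var

n, alpha, beta = 20, 2.0, 2.0
p = alpha / (alpha + beta)
mean, var = mean_and_variance(n, alpha, beta)
print(mean, var)        # ~10 and ~24
print(n * p * (1 - p))  # binomial variance with the same mean: 5.0
```

The beta-binomial variance is n p (1 - p) (alpha + beta + n) / (alpha + beta + 1), here almost five times the binomial value, which is the extra spread a chi-square or binomial test ignores.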

Implications of a beta distribution
Many summaries assume normality:
- Mean
- Standard deviation
- Boxplots
None of these is strictly appropriate when looking at methylation data
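To make the point concrete, this sketch fits a beta distribution by the method of moments to a small, made-up set of bimodal methylation fractions. Both shape parameters come out well below 1 (a U-shaped distribution), and the mean of about 0.5 sits in a region where there are no observations at all.

```python
import statistics

def beta_method_of_moments(values):
    """Fit Beta(alpha, beta) to fractions strictly inside (0, 1) by matching
    mean and population variance; for such data the population variance is
    below m * (1 - m), so both parameters stay positive."""
    m = statistics.fmean(values)
    v = statistics.pvariance(values)
    common = m * (1 - m) / v - 1
    return m * common, (1 - m) * common

fractions = [0.02, 0.03, 0.05, 0.95, 0.97, 0.98]  # invented example data
alpha, beta = beta_method_of_moments(fractions)
mean = statistics.fmean(fractions)
sd = statistics.pstdev(fractions)
print(alpha, beta)  # both well below 1: U-shaped
print(mean, sd)     # ~0.5 +/- ~0.47, describing a "typical" value nobody has
```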

Dealing with replicates
Simple approach
- Merge data from replicates together
- Single test, high power
- Post-hoc test for consistency
Explicitly account for batch effects
- Logistic regression
- Measures batch effects and excludes them from the final significance calculation
Work with methylation values
- Normalise percentage methylation values
- Use conventional statistics (t-tests etc.) to compare groups
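The first and third options on this slide can be sketched in a few lines. Merging replicates is just summing counts. For working with methylation values, this sketch assumes one common normalisation, the log2 ratio of methylated to unmethylated counts with a small offset (often called an M-value), followed by Welch's t statistic. The offset of 1, the choice of Welch over Student's t-test, and the replicate counts are assumptions; the logistic regression option is not shown.

```python
import math
import statistics

def merge_replicates(replicates):
    """Pool (meth, unmeth) counts across replicates into a single pair."""
    return (sum(m for m, _ in replicates), sum(u for _, u in replicates))

def m_value(meth, unmeth, offset=1.0):
    """log2 ratio of methylated to unmethylated calls; offset avoids log(0)."""
    return math.log2((meth + offset) / (unmeth + offset))

def welch_t(x, y):
    """Welch's t statistic for two groups with possibly unequal variances."""
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    return (statistics.fmean(x) - statistics.fmean(y)) / se

group_a = [(30, 10), (25, 15), (35, 5)]   # hypothetical replicate counts
group_b = [(10, 30), (12, 28), (8, 32)]
print(merge_replicates(group_a))  # (90, 30): input for a single high-power test
t = welch_t([m_value(*r) for r in group_a], [m_value(*r) for r in group_b])
print(t)
```

Turning t into a p-value needs the Welch-Satterthwaite degrees of freedom and a t distribution, for example via scipy.stats.ttest_ind with equal_var=False.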

Hierarchical testing
- Test larger regions (windows, features etc.)
- Take significant hits and subdivide them (smaller windows, individual CpGs)
- Correct only for these tests
- Assemble hits together to make up DMRs

Diagram: hierarchical testing from the genome level down to CGIs, with rejected regions marked X.
A statistically 'creative' solution to not having enough data
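The hierarchical scheme can be sketched as a two-level procedure. In this sketch, pvalue and subdivide are caller-supplied functions (the toy versions below are hypothetical stand-ins), and Bonferroni at each level plus merging of touching sub-windows are assumptions about how "correct only for these tests" and "assemble hits" might be implemented.

```python
from typing import Callable, Iterable, List, Tuple

Region = Tuple[int, int]  # half-open [start, end)

def hierarchical_dmrs(
    regions: List[Region],
    subdivide: Callable[[Region], Iterable[Region]],
    pvalue: Callable[[Region], float],
    alpha: float = 0.05,
) -> List[Region]:
    if not regions:
        return []
    # Level 1: test large regions, correcting only for the number of regions.
    significant = [r for r in regions if pvalue(r) <= alpha / len(regions)]
    # Level 2: subdivide hits and correct only for the sub-tests actually run.
    subregions = [s for r in significant for s in subdivide(r)]
    if not subregions:
        return []
    hits = sorted(s for s in subregions if pvalue(s) <= alpha / len(subregions))
    # Assemble touching or overlapping hits into DMRs.
    dmrs: List[Region] = []
    for start, end in hits:
        if dmrs and start <= dmrs[-1][1]:
            dmrs[-1] = (dmrs[-1][0], max(dmrs[-1][1], end))
        else:
            dmrs.append((start, end))
    return dmrs

def split_100(region: Region) -> List[Region]:
    """Hypothetical subdivision into 100 bp sub-windows."""
    start, end = region
    return [(s, min(s + 100, end)) for s in range(start, end, 100)]

def toy_pvalue(region: Region) -> float:
    """Hypothetical stand-in: 'significant' only if the region overlaps 1000-1500."""
    start, end = region
    return 1e-6 if start < 1500 and end > 1000 else 0.5

result = hierarchical_dmrs([(0, 1000), (1000, 2000), (2000, 3000)], split_100, toy_pvalue)
print(result)  # [(1000, 1500)]
```

The second level only pays for 10 sub-tests instead of the 30 it would take to test every 100 bp window across all three regions.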

Methylation statistics packages
- swDMR (Perl/R package): sliding-window DMR finding (choose between t-test, Kolmogorov-Smirnov, Fisher, chi-square and Wilcoxon for n = 2; ANOVA and Kruskal-Wallis for n > 2)
- methylKit* (R package by A. Akalin et al.): sliding window, Fisher's exact test or logistic regression; adjusts p-values to q-values using the SLIM method
- bsseq* (R/Bioconductor by K.D. Hansen): implements the BSmooth smoothing algorithm; numerous CpG-wise t-tests and a p-value cutoff to define DMRs; outperforms Fisher's exact test; requires biological replicates for DMR detection
- BiSeq* (R/Bioconductor by K. Hebestreit et al.): beta regression model; impractical for very large datasets, so best suited to RRBS or targeted BS-Seq
- RnBeads* (R package by F. Mueller et al.): works for 450K arrays, BS-Seq, MeDIP or MBD-Seq data
- DMAP* (C command line tool by P. Stockwell et al.): RRBS fragment or fixed-window approach; Fisher's exact test, chi-squared or ANOVA
- RADMeth (C++ command line tool by E. Dolzhenko and A.D. Smith): beta-binomial regression to find DMCs or DMRs; local likelihood; adjusts for neighbouring CpGs
- MOABS* (C++ command line tool by D. Sun et al.): beta-binomial hierarchical model capturing sampling and biological variation; Credible Methylation Difference (CDIF), a single metric combining biological and statistical significance
- ComMet (Y. Saito et al., 2014): part of the Bisulfighter suite; DMR detection based on hidden Markov models (HMMs) that automatically adjust DMC chaining criteria; does not require biological replicates
- DSS (R/Bioconductor by Feng et al., 2014): constructs a genome-wide prior distribution for the beta-binomial dispersion; Bayesian hierarchical model to detect differentially methylated loci
More appearing every other week…
* interface well with

Tool | Statistical test | Suitable for | Implementation | Notes
bsseq | Sample-wise smoothing, then group differences via CpG-wise t-tests (p-value cutoff to define adjacent CpG sites as DMRs) | WGBS; not designed for targeted BS-Seq or RRBS | R package/Bioconductor | Outperforms Fisher's exact test; intended to compare 2 groups; replicates required
BiSeq | Define CpG clusters, smooth methylation data, model and test the group effect (fit a beta regression model to smoothed methylation levels and test for a group effect with the Wald test), apply a hierarchical testing procedure on CpG clusters, then define DMR boundaries | RRBS; targeted BS-Seq | R package/Bioconductor | Very computationally intensive for WGBS; not limited to 2 groups
MethylKit | Models CpG methylation within a logistic regression; sliding linear model (SLIM) to correct for multiple testing | (e)RRBS | R package |
WGBS = whole genome BS-Seq; (e)RRBS = (enhanced) reduced representation BS-Seq

bsseq: for whole genome BS-Seq
- Smooths low-coverage BS-Seq data first to get reliable semi-local methylation estimates
- Not suitable for captured or restricted data
- After smoothing, uses biological replicates to estimate biological variation and identify differentially methylated regions (DMRs)
- Smoothing is suitable even for a single sample
- Works for CpG context in humans; will probably not scale to the 2x585M Cs in non-CG context
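The idea of borrowing strength from neighbouring CpGs can be illustrated with a much simpler smoother than BSmooth. This sketch is a coverage-weighted Gaussian kernel average, not BSmooth's local-likelihood fit, and the 500 bp bandwidth is a hypothetical choice.

```python
import math

def smooth_methylation(positions, meth, cov, bandwidth=500.0):
    """Coverage-weighted Gaussian-kernel estimate of methylation at each CpG.

    Each CpG contributes its reads with weight exp(-d^2 / (2 * bandwidth^2)),
    so a site with a single read borrows strength from well-covered neighbours.
    """
    smoothed = []
    for x0 in positions:
        num = den = 0.0
        for x, m, c in zip(positions, meth, cov):
            k = math.exp(-0.5 * ((x - x0) / bandwidth) ** 2)
            num += k * m  # kernel-weighted methylated reads
            den += k * c  # kernel-weighted total reads
        smoothed.append(num / den if den > 0 else float("nan"))
    return smoothed

positions = [0, 50, 100]
meth = [0, 1, 0]   # middle CpG: a single methylated read
cov = [30, 1, 30]
print(smooth_methylation(positions, meth, cov))  # middle site pulled far below its raw 100%
```

Weighting by coverage means the single-read CpG barely moves the estimate, while its own noisy 100% call is replaced by something close to its neighbours.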

BSmooth algorithm (figure; black: 25x coverage, Lister data; pink: 4x coverage, Lister data)

BSmooth t-values
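A rough sketch of the last step, turning per-CpG t-statistics into candidate DMRs: runs of neighbouring CpGs whose t-statistic passes a cutoff in the same direction, with no large gap between them. This is a simplified illustration, not bsseq's own code; the cutoff of 4.6, the 300 bp maximum gap and the minimum of 2 CpGs are hypothetical parameters.

```python
def call_dmrs(positions, tstats, cutoff=4.6, max_gap=300, min_cpgs=2):
    """Group consecutive CpGs whose t-statistic passes +/-cutoff in the same
    direction and lie within max_gap bp of the previous CpG in the run.

    Returns (start, end, direction, n_cpgs) tuples; direction is +1 or -1.
    """
    dmrs = []
    current = None  # [start, end, direction, n_cpgs]
    for pos, t in zip(positions, tstats):
        direction = 1 if t >= cutoff else -1 if t <= -cutoff else 0
        if direction and current and direction == current[2] and pos - current[1] <= max_gap:
            current[1] = pos
            current[3] += 1
            continue
        if current:
            dmrs.append(tuple(current))  # close the previous run
        current = [pos, pos, direction, 1] if direction else None
    if current:
        dmrs.append(tuple(current))
    return [d for d in dmrs if d[3] >= min_cpgs]

positions = [100, 150, 200, 250, 900, 950]
tstats = [5.0, 6.1, -1.0, 5.2, -4.9, -6.3]
print(call_dmrs(positions, tstats))  # [(100, 150, 1, 2), (900, 950, -1, 2)]
```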