Analysis of Drug-Gene Interaction Data Florian Ganglberger Sebastian Nijman Lab.

Presentation transcript:

Analysis of Drug-Gene Interaction Data Florian Ganglberger Sebastian Nijman Lab

Nijman Lab

Working on specialised target-oriented cancer therapies Cancer = cell mutation Drug Mutation Drug

Motivation Testing various drugs on various mutated cells 100 drugs vs 100 mutations = 10,000 interactions Analyse the generated data to find new treatments

Overview Background –Biological Background –Technical Procedure –Initial State –Special Aspects –Previous Approach Analysis –Explorative Data Analysis –Drug Noisiness Data generation

Overview Hit detection –Statistical Methods –Filtering Methods –The Algorithm –Evaluation of the result

Biological Background Idea behind cancer treatment –Kill cancer cells while leaving normal cells alive Common chemotherapies –Kill cells with higher division rate –Problem: mouth, throat and bowel mucosa and hair cells also divide quickly –Side effects: feeling sick, losing hair etc.

Biological Background Synthetic lethality approach –Some biochemical processes that are necessary for cell growth are redundant –e.g. DNA repair –Biochemical processes are chained = “protein pathway”

Protein pathways Protein A Protein B Protein C Cell growth Drug Gene

Synthetic lethality Choose a cancer which has a mutation of a gene in one of these pathways Find a drug which inhibits the other pathway

Synthetic lethality Produce cells with mutations which are normally present in cancer Find a drug It is possible that this will work in real cancer –Tumours have more than one mutation → the mutations can influence each other

Technical Procedure Standard dataset consists of 38,400 interactions: 96 drugs x 100 mutations x 4 replicates Testing each interaction separately would be inefficient

Technical Procedure Idea: Testing different cell lines in one well → 384 wells

Before the experiment

After the experiment Copy the barcodes of the cells by a polymerase chain reaction (PCR) → amplifies the signal Attach a vitamin to the barcode so that it can bind to a dye-containing protein Amount of barcode correlates with the amount of remaining cells

After the experiment

Allocation Red and infrared emitted light → barcode → mutation Green reflected light → cell amount –Arbitrary unit which correlates with the cell amount –Called “Reporter” Drug → known from the well used

Initial state Because drugs are dissolved in a solvent, we can use wells that contain the solvent without drugs → use as control

Back to statistics....

Special Aspects Biological and technical factors cause noisy and not directly usable data → Inter- and intraindividual variability

Interindividual Variability Variability between observation units Cells with the same mutation = one observation unit = “one virtual cancer patient” Variation among different mutated cells Reasons –Mutations can be toxic themselves –Characteristics of the technical process

Interindividual Variability Average amount of remaining mutations

Variability of Technical Procedure Limited precision –Precision of drug dosing –Precision of cell amount –Quality of the measurement equipment Decreased sensitivity to a lower signal –Detection limit –Killed cells don’t get a zero signal → background noise with different variability

Variability of Technical Procedure Amplification problems –Copying the barcodes by PCR needs material –If some cell lines are completely killed → more material for other cell lines → higher amplification of surviving cells

Amplification Problems

Previous Approach Visual method, based on scatter plots Identify outliers visually

Previous Approach 1. Calculating the effect a. Median normalization of drugs b. Calculate a relative ratio
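The two steps above can be sketched in a few lines. This is only an illustration under assumptions the slides do not confirm: each well is represented as a dictionary of reporter signals per mutation, "median normalization" is read as dividing every signal in a well by that well's median, and the "relative ratio" is read as the normalized drug signal divided by the normalized control signal. All names and numbers are hypothetical.

```python
from statistics import median

def median_normalize(well):
    """Divide every signal in a well by the well's median (assumed reading)."""
    m = median(well.values())
    return {mutation: value / m for mutation, value in well.items()}

def relative_ratios(drug_well, control_well):
    """Ratio of normalized drug signal to normalized control signal per mutation."""
    drug = median_normalize(drug_well)
    control = median_normalize(control_well)
    return {mutation: drug[mutation] / control[mutation] for mutation in drug}

# Hypothetical reporter signals (arbitrary units) for five cell lines.
control = {"mutA": 1000.0, "mutB": 2000.0, "mutC": 4000.0,
           "mutD": 1500.0, "mutE": 3000.0}
# The drug well is amplified half as strongly overall, and mutA is hit hard.
treated = {"mutA": 100.0, "mutB": 1000.0, "mutC": 2000.0,
           "mutD": 750.0, "mutE": 1500.0}

ratios = relative_ratios(treated, control)
for mutation in sorted(ratios):
    print(mutation, round(ratios[mutation], 3))
```

In this toy example the normalization removes the overall difference in amplification between the two wells, so only the affected cell line ends up with a ratio far from 1.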

Previous Approach Plotting the ratio against the median of a mutation

There are some problems.... If two lines overlap, hits can be obscured No comparable value that estimates the significance of outliers Intraindividual variability between replicates is ignored Human errors → outlier detection is subjective Slow method that cannot be automated

Overview Background –Biological Background –Technical Procedure –Initial State –Special Aspects –Previous Approach Analysis –Explorative Data Analysis –Drug Noisiness

Explorative Data Analysis Necessary for hit detection Analysis of the behaviour of the data Closer look at –Distribution of mutations –Variability of mutations and replicates –Skewness of mutations –Noisiness of Drugs

Distribution of Mutations Choosing the right statistical test Test will be applied on mutations to see which drug works best Effect is point of interest → Matrix of relative ratios

Variability of Mutations Decreased sensitivity to lower signal Maybe a detection limit Spread vs Level plot
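A spread vs level plot needs one (level, spread) pair per mutation. The sketch below computes such pairs, assuming the median as the level and the median absolute deviation (MAD) as the spread; the slides do not say which measures were used, and the signal values are invented.

```python
from statistics import median

def level_and_spread(values):
    """Return (median, median absolute deviation) of one mutation's signals."""
    level = median(values)
    spread = median(abs(v - level) for v in values)
    return level, spread

# Hypothetical reporter signals of three mutations across several drugs.
signals = {
    "mutA": [950, 1000, 1020, 1100, 980],
    "mutB": [400, 420, 380, 450, 410],
    "mutC": [120, 260, 90, 300, 150],   # low signal, relatively large spread
}

points = {mutation: level_and_spread(v) for mutation, v in signals.items()}
for mutation, (level, spread) in sorted(points.items(), key=lambda kv: kv[1][0]):
    print(mutation, level, spread, round(spread / level, 3))
```

Plotting spread against level (or their logarithms) for all mutations would then show whether the relative spread changes at low signal, which is what a detection limit would look like.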

Replicate Variability An important factor is the repeated testing of cells with the same drugs. It is an indicator of the accuracy and reproducibility of the technical procedure.

Skewness of Mutations Another indicator for different behaviour below the threshold Right skewed distributions because of background noise in lower signal
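As an illustration, skewness can be computed per mutation with the standard moment-based coefficient g1 = m3 / m2^1.5. Which skewness estimator the author used is not stated, so this choice is an assumption, and the sample values are made up.

```python
def skewness(values):
    """Moment-based sample skewness g1 = m3 / m2**1.5 (population moments)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    if m2 == 0:
        return 0.0  # all values identical: no asymmetry to measure
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]
# Mostly low values with a few high ones, as background noise might produce.
right_skewed = [1, 1, 1, 2, 2, 3, 10]

print(skewness(symmetric))
print(skewness(right_skewed))
```

A positive value indicates a longer right tail, which is the pattern the slide attributes to background noise in the lower signal range.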

Drug Noisiness Nothing to do with background noise Caused by technical procedure –Overdosing of cells or drugs –Toxicity (“Dosis facit venenum”) Different effect –Strong resistance –Strong sensitivity

Amplification Problems

Strong Noisiness Easy to identify Dedicated outliers High amount of false positive hits Idea: Noisiness causes weak correlation to the control

Weak Noisiness Also numerous differences in sensitivity or resistance Contrast to normal drugs is not well defined Visual methods failed Also a lot of false positive hits

Strong Noisiness vs Weak Noisiness

Overview Hit detection –Statistical Methods –Filtering Methods –The Algorithm –Evaluation of the result

Hit detection Definition of a Hit –Indicate synthetic lethality –Resistance is also interesting from a biological point of view –Not noisy 2 Stages: 1. Finding potential hits 2. Filtering false-positive hits and incomparable data

Statistical Test Mutations are not normally distributed Compare the 4 replicates to their mutation Mann-Whitney U test –Compares two medians –Needs approximately identical distribution shape of random variables X and Y –No symmetry or normal distribution needed

Statistical Test Disadvantages –Rank-sum tests are based on the order, not on the magnitudes –Weakly outlying interactions get the same p-values as strong outliers –P-values are not comparable interindividually, but the significance is an indicator for it. –Strongly noisy drugs are usually extreme outliers → reduce the significance
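The first two disadvantages can be demonstrated with a small hand-written Mann-Whitney U test. This is a sketch, not the author's implementation: it uses the normal approximation without tie correction for the two-sided p-value, and it assumes that the 4 replicates of one interaction are compared with the remaining ratios of the same mutation. All data are invented.

```python
import math

def mann_whitney_u(x, y):
    """Return (U statistic of x, two-sided p-value by normal approximation)."""
    combined = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1   # average rank for tied values
        i = j + 1
    n1, n2 = len(x), len(y)
    rank_sum_x = sum(r for r, (_v, g) in zip(ranks, combined) if g == 0)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)   # no tie correction
    z = (u - mu) / sigma
    return u, math.erfc(abs(z) / math.sqrt(2))

# Hypothetical relative ratios of one mutation under 20 other drugs.
others = [0.9 + 0.01 * i for i in range(20)]
weak_hit = [0.80, 0.82, 0.85, 0.88]     # slightly below all other values
strong_hit = [0.05, 0.06, 0.08, 0.10]   # far below all other values

weak = mann_whitney_u(weak_hit, others)
strong = mann_whitney_u(strong_hit, others)
print(weak, strong)
```

Both interactions rank below every other value, so they receive the same U statistic and the same p-value even though their magnitudes differ greatly.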

Multiple testing Multiple testing of interactions against their mutations Increases the error 100 different interactions =

Multiple testing Bonferroni correction needed How to achieve significant results? –Calculate the median of replicates –Testing just the upper and lower 10% of the data
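A short calculation shows why a correction is needed and what it costs. It assumes independent tests and a significance level of 0.05, neither of which is stated on the slides; the reading that testing only the upper and lower 10% of 100 interactions leaves 20 tests is also an assumption.

```python
def family_wise_error_rate(alpha, n_tests):
    """Probability of at least one false positive among independent tests."""
    return 1 - (1 - alpha) ** n_tests

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error at alpha."""
    return alpha / n_tests

alpha = 0.05
fwer_100 = family_wise_error_rate(alpha, 100)
threshold_100 = bonferroni_threshold(alpha, 100)
# Testing only the upper and lower 10% of 100 interactions leaves 20 tests.
threshold_20 = bonferroni_threshold(alpha, 20)

print(round(fwer_100, 4), threshold_100, threshold_20)
```

Fewer tests mean a less strict per-test threshold, which is one way to read the suggestion to test only the extremes.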

Filtering Drugs Filtering strongly noisy drugs by correlation coefficient Filter before the test to increase the significance Note: Drugs shouldn’t be filtered automatically, just identified. Whether a drug is toxic or not is the decision of a biologist
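A minimal sketch of the correlation idea: compute the Pearson correlation between each drug's signal profile across mutations and the control profile, and flag drugs below a cutoff for review. The cutoff of 0.8, the use of Pearson rather than a rank correlation, and all data are assumptions made for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def flag_noisy_drugs(profiles, control, cutoff=0.8):
    """Return drugs whose profile correlates weakly with the control.

    Drugs are only flagged, not removed; the decision stays with a biologist.
    """
    return sorted(d for d, p in profiles.items() if pearson(p, control) < cutoff)

# Hypothetical signals of five mutations in the control and in two drug wells.
control = [1000, 2000, 4000, 1500, 3000]
profiles = {
    "drugX": [520, 980, 2100, 760, 1450],   # follows the control pattern
    "drugY": [3000, 500, 900, 4000, 700],   # unrelated pattern
}

flagged = flag_noisy_drugs(profiles, control)
print(flagged)
```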

Filtering strong noisy drugs

Filtering weakly noisy drugs Much harder to identify Idea: Weakly noisy drugs produce many false-positive hits with high significance –Calculating p-value –Order by significance –Frequency of drugs in the top hits is an indicator for weak noisiness
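The three steps listed above amount to sorting the hits by p-value and counting how often each drug appears near the top. The sketch below assumes hits are stored as (drug, mutation, p-value) tuples and that the "top hits" are simply the n most significant ones; names and values are hypothetical.

```python
from collections import Counter

def drug_frequency_in_top_hits(hits, top_n):
    """Count how often each drug occurs among the top_n most significant hits."""
    top = sorted(hits, key=lambda hit: hit[2])[:top_n]
    return Counter(drug for drug, _mutation, _p_value in top)

# Hypothetical (drug, mutation, p-value) results.
hits = [
    ("drugA", "mut1", 0.001), ("drugB", "mut2", 0.20),
    ("drugA", "mut3", 0.002), ("drugC", "mut1", 0.004),
    ("drugA", "mut4", 0.003), ("drugB", "mut5", 0.50),
]

frequency = drug_frequency_in_top_hits(hits, top_n=4)
print(frequency.most_common())
```

A drug that dominates the top of the list, like drugA here, would be a candidate for weak noisiness and worth a closer look.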

Top Drugs

Filter Mutations Filter data below a detection limit Ideas Filter by threshold: 30% of the data → just one dataset → no universal validity of the threshold of about 250 Filter by skewness: 17% of the data Filter by coefficient of variation: 12% of the data
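Two of these filter ideas can be sketched as follows. The threshold of 250 comes from the slide, but applying it to the median signal of each mutation is an assumption, as is the use of the sample standard deviation for the coefficient of variation. The slide does not give a cutoff for the coefficient of variation, so the code only computes it. All data are invented.

```python
from statistics import mean, median, stdev

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)

def below_threshold(signals_by_mutation, threshold=250):
    """Mutations whose median signal lies below the threshold (assumed rule)."""
    return sorted(m for m, values in signals_by_mutation.items()
                  if median(values) < threshold)

# Hypothetical reporter signals per mutation.
signals = {
    "mutA": [900, 1000, 1100],
    "mutB": [180, 200, 260],
    "mutC": [240, 300, 310],
}

filtered = below_threshold(signals)
cv = {m: coefficient_of_variation(v) for m, v in signals.items()}
print(filtered, {m: round(c, 3) for m, c in cv.items()})
```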

Threshold Estimation Idea: Modification of skewness filter method Outliers of skewness are below the threshold Last non-outlier above the skewness outliers are normal data Threshold should be approximately in the middle of these points
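One possible reading of this idea, written out as code: mark mutations whose skewness is an outlier by the boxplot rule (above Q3 + 1.5 x IQR), take the highest signal level among those outliers and the lowest level among the non-outliers above it, and place the threshold halfway between the two. The outlier rule, the use of one level value per mutation and the data are assumptions, not the author's confirmed algorithm.

```python
from statistics import quantiles

def estimate_threshold(mutations):
    """Estimate a detection threshold from (level, skewness) pairs.

    Returns None if there are no skewness outliers or no normal data above them.
    """
    skews = [skew for _level, skew in mutations]
    q1, _q2, q3 = quantiles(skews, n=4)
    fence = q3 + 1.5 * (q3 - q1)                 # boxplot rule for high outliers
    outlier_levels = [lvl for lvl, skew in mutations if skew > fence]
    if not outlier_levels:
        return None
    top_outlier = max(outlier_levels)
    normal_above = [lvl for lvl, skew in mutations
                    if skew <= fence and lvl > top_outlier]
    if not normal_above:
        return None
    return (top_outlier + min(normal_above)) / 2

# Hypothetical (signal level, skewness) per mutation; the two lowest levels
# are strongly right-skewed.
mutations = [
    (2000, -0.2), (1800, -0.1), (1500, -0.1), (1200, 0.0), (1000, 0.0),
    (900, 0.1), (700, 0.1), (600, 0.2), (500, 0.2), (400, 0.3),
    (200, 2.5), (100, 3.0),
]

threshold = estimate_threshold(mutations)
print(threshold)
```

Because the threshold is derived from the data of each run, it would not need to be fixed at a single value taken from one dataset.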

The Algorithm R-Demo

Results