Published by Josephine Atkins. Modified over 9 years ago.
1
Canadian Bioinformatics Workshops www.bioinformatics.ca
3
Lecture 5
Multivariate Analyses II: General Models
MBP1010
Dr. Paul C. Boutros
Winter 2015
DEPARTMENT OF MEDICAL BIOPHYSICS
This workshop includes material originally developed by Drs. Raphael Gottardo, Sohrab Shah, Boris Steipe and others
† Aegeus, King of Athens, consulting the Delphic Oracle. High Classical (~430 BCE)
4
Lecture 5: Multivariate Analyses II: General Cases bioinformatics.ca
Course Overview
Lecture 1: What is Statistics? Introduction to R
Lecture 2: Univariate Analyses I: continuous
Lecture 3: Univariate Analyses II: discrete
Lecture 4: Multivariate Analyses I: specialized models
Lecture 5: Multivariate Analyses II: general models
Lecture 6: Machine-Learning
Lecture 7: Sequence Analysis
Lecture 8: Microarray Analysis I: Pre-Processing
Lecture 9: Microarray Analysis II: Multiple-Testing
Final Exam (written)
5
House Rules
Cell phones to silent
No side conversations
Hands up for questions
6
Topics For This Week
Review to date
Examples
Assignments
Attendance
More on Multivariate Models
7
Review From Lecture #2
How can you interpret a QQ plot?
  Compares two samples, or a sample and a distribution. A straight line indicates identity.
What is hypothesis testing?
  Confirmatory data-analysis; test the null hypothesis
What is a p-value?
  Evidence against the null: the probability of seeing a value at least as extreme by chance alone if the null hypothesis is true. Rejecting when p < α holds the false-positive rate at α.
8
Review From Lecture #2
Parametric vs. non-parametric tests
  Parametric tests have distributional assumptions
What is the t-statistic?
  Signal:Noise ratio
Assumptions of the t-test?
  Data sampled from normal distribution; independence of replicates; independence of groups; homoscedasticity
9
Flow-Chart For Two-Sample Tests
Is data sampled from a normally-distributed population?
  Yes: go to the equal-variance check
  No: sufficient n for CLT (>30)?
    Yes: go to the equal-variance check
    No: Wilcoxon U-test
Equal variance (F-test)?
  Yes: homoscedastic t-test
  No: heteroscedastic t-test
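The decision logic of the flow chart can be written as a small function. This is a minimal Python sketch, not part of the original slides: the function name and its three inputs are hypothetical, and it assumes the caller has already judged normality and run the F-test.

```python
def choose_two_sample_test(normal_population: bool,
                           n_per_group: int,
                           equal_variance: bool) -> str:
    """Walk the flow chart and return the name of the suggested test.

    All three arguments are assumptions supplied by the analyst;
    nothing is estimated from data here.
    """
    # Without normality and without enough samples for the
    # central limit theorem (n > 30), fall back to the rank-based test.
    if not normal_population and n_per_group <= 30:
        return "Wilcoxon U-test"
    # Otherwise the equal-variance check picks the t-test flavour.
    if equal_variance:
        return "homoscedastic t-test"
    return "heteroscedastic t-test"


print(choose_two_sample_test(False, 5, True))
print(choose_two_sample_test(True, 5, False))
```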
10
Review From Lecture #3
What is statistical power?
  Probability a test will correctly reject a false null hypothesis
  AKA sensitivity or 1 - false-negative rate
What is a correlation?
  A relationship between two (random) variables
Common correlation metrics?
  Pearson, Spearman, Kendall
11
Lecture #3 Review
Hypergeometric test
  Is a sample randomly selected from a fixed population?
Proportion test
  Are two proportions equivalent?
Fisher's Exact test
  Are two binary classifications associated?
(Pearson's) Chi-Squared Test
  Are paired observations on two variables independent?
12
Example #1
You are conducting a study of osteosarcomas using mouse models. You are using a strain of mice that is naturally susceptible to these tumours at a frequency of ~20%. You are studying two transgenic lines, one of which has a deletion of a putative tumour suppressor (TS), the other of which has an amplification of a putative oncogene (OG). Tumour penetrance in these two lines is 100%.
Your hypothesis: tumours in mice lacking TS will be smaller than those in mice with amplification of OG, as assessed by post-mortem volume measurements of the primary tumour.
Your data:
TS (cm³): 3.9, 7.1, 3.1, 4.4, 5.0
OG (cm³): 5.2, 1.9, 5.0, 6.1, 4.5, 4.8
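One candidate analysis for these volumes is a heteroscedastic (Welch) t-test. The Python sketch below only computes the t statistic and the Welch-Satterthwaite degrees of freedom from the slide's data. It is an illustration, not the slide's answer: it assumes the volumes are roughly normal, and with samples this small the flow chart for two-sample tests may point to a rank-based test instead. Turning t into a p-value needs a t-distribution table or a statistics library, which is left out here.

```python
from math import sqrt
from statistics import mean, variance

# Tumour volumes (cm^3) from the slide.
ts = [3.9, 7.1, 3.1, 4.4, 5.0]
og = [5.2, 1.9, 5.0, 6.1, 4.5, 4.8]


def welch_t(a, b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    # Squared standard error of each group mean (sample variance / n).
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


t_stat, df = welch_t(ts, og)
print(f"t = {t_stat:.3f}, df = {df:.1f}")
```

A positive t here means the TS mean is larger than the OG mean, which is the opposite direction to the stated hypothesis.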
13
Example #2
You are conducting a study of osteosarcomas using mouse models. You are using a strain of mice that is naturally susceptible to these tumours at a frequency of ~20%. You are studying two transgenic lines, one of which has a deletion of a putative tumour suppressor (TS), the other of which has an amplification of a putative oncogene (OG). Tumour penetrance in these two lines is 100%.
Your hypothesis: mice lacking TS will acquire tumours sooner than mice with amplification of OG. You test the mice weekly using ultrasound imaging.
Your data:
TS (week of tumour): 4, 2, 5, 4
OG (week of tumour): 3, 6, 3, 2, 4, 3
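Week-of-onset data are discrete and the groups are tiny, so a rank-based comparison is one reasonable choice. The Python sketch below computes the Mann-Whitney U statistic for the TS group and an exact one-sided permutation p-value by enumerating all 210 ways of relabelling the ten mice. The function names are hypothetical, and scoring ties as 0.5 is one common convention rather than something the slide specifies.

```python
from itertools import combinations

# Week of first detected tumour, from the slide.
ts = [4, 2, 5, 4]
og = [3, 6, 3, 2, 4, 3]


def u_statistic(a, b):
    """Count pairs with a > b, scoring ties as 0.5 (U for group a)."""
    return sum((x > y) + 0.5 * (x == y) for x in a for y in b)


def exact_p_lower(a, b):
    """One-sided permutation p-value: P(U <= observed U)."""
    pooled = a + b
    observed = u_statistic(a, b)
    hits = total = 0
    # Enumerate every way of choosing which pooled values form group a.
    for idx in combinations(range(len(pooled)), len(a)):
        chosen = set(idx)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        hits += u_statistic(group_a, group_b) <= observed
        total += 1
    return hits / total


u_obs = u_statistic(ts, og)
p_lower = exact_p_lower(ts, og)
print(f"U = {u_obs}, one-sided p = {p_lower:.3f}")
```

A small U would support "TS sooner than OG"; U is compared against its null expectation of n1 × n2 / 2 = 12.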
14
Example #3
You are conducting a study of osteosarcomas using mouse models. You are using a strain of mice that is naturally susceptible to these tumours at a frequency of ~20%. You are studying two transgenic lines, one of which has a deletion of a putative tumour suppressor (TS), the other of which has an amplification of a putative oncogene (OG). Tumour penetrance in these two lines is 100%.
Your hypothesis: mice lacking TS are less likely to respond to a novel targeted therapeutic (DX) than those with amplification of OG, as assessed by a trained pathologist.
Your data:
TS (pathological response): Yes, No, Yes, No
OG (pathological response): Yes, No, Yes
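Two binary classifications with counts this small are the setting for Fisher's exact test. The Python sketch below tallies the responses into a 2×2 table and sums the hypergeometric probabilities for the lower tail (TS responding no more often than observed). The helper name is hypothetical, and treating the hypothesis as one-sided is an assumption of this example.

```python
from math import comb

# Pathological responses from the slide.
ts = ["Yes", "No", "Yes", "No"]
og = ["Yes", "No", "Yes"]


def fisher_lower_tail(a, b, c, d):
    """P(top-left cell <= a) with fixed margins, for the table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    lowest = max(0, col1 - (c + d))  # smallest top-left count the margins allow
    numerator = sum(
        comb(row1, k) * comb(n - row1, col1 - k) for k in range(lowest, a + 1)
    )
    return numerator / comb(n, col1)


# Rows: TS, OG. Columns: responders, non-responders.
table = (ts.count("Yes"), ts.count("No"), og.count("Yes"), og.count("No"))
p_one_sided = fisher_lower_tail(*table)
print(table, round(p_one_sided, 4))
```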
15
Example #4
You are conducting a study of osteosarcomas using mouse models. You are using a strain of mice that is naturally susceptible to these tumours at a frequency of ~20%. You are studying two transgenic lines, one of which has a deletion of a putative tumour suppressor (TS), the other of which has an amplification of a putative oncogene (OG).
Based on your previous data, you now hypothesize that mice lacking TS will show a similar molecular response to DX as those with amplification of OG. You use microarrays to study 20,000 genes in each line, and identify the following genes as changed between drug-treated and vehicle-treated:
TS (DX-responsive genes): MYC, KRAS, CD53, CDH1, FBW1, SEPT7, MUC1, MUC3, MUC9, RNF3
OG (DX-responsive genes): MYC, KRAS, CD53, CDH1, MUC1, MARCH1, PTEN, IDH3, ESR2, RHEB, CTCF, STK11, MLL3, KEAP1, NFE2L2, ARID1A
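Whether two gene lists overlap more than chance would allow is a hypergeometric question. The Python sketch below counts the shared genes and computes the upper-tail probability of seeing at least that many by chance. The gene symbols are split out of the run-together text on the slide, which is an assumption of this example, as is treating all 20,000 genes as equally likely to be selected.

```python
from math import comb

ts_genes = {"MYC", "KRAS", "CD53", "CDH1", "FBW1",
            "SEPT7", "MUC1", "MUC3", "MUC9", "RNF3"}
og_genes = {"MYC", "KRAS", "CD53", "CDH1", "MUC1", "MARCH1", "PTEN", "IDH3",
            "ESR2", "RHEB", "CTCF", "STK11", "MLL3", "KEAP1", "NFE2L2", "ARID1A"}
n_genes = 20000  # genes on the array, from the slide


def hypergeom_upper_tail(overlap, list_a, list_b, population):
    """P(X >= overlap) when list_a genes are drawn at random and list_b are 'marked'."""
    top = min(list_a, list_b)
    numerator = sum(
        comb(list_b, k) * comb(population - list_b, list_a - k)
        for k in range(overlap, top + 1)
    )
    return numerator / comb(population, list_a)


overlap = len(ts_genes & og_genes)
p_overlap = hypergeom_upper_tail(overlap, len(ts_genes), len(og_genes), n_genes)
print(overlap, p_overlap)
```

With an expected overlap of 10 × 16 / 20000 = 0.008 genes under random selection, any overlap of several genes gives a very small tail probability.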
16
Review From Lecture #4
Assumptions of linear-modeling
  One variable is a response and one a predictor
  No adjustment is needed for confounding or other between-subject variation
  Linearity
  σ² is constant, independent of x
  Predictors are independent of each other
  For proper statistical inference (CI, p-values), errors are normally distributed
17
Review From Lecture #4
How do we assess the adequacy of a model?
  By considering the size of the residuals (R²)
How can we test the quality of a model?
  Residual plots; QQ plots; prediction accuracy
Compare a one-way ANOVA to a logistic regression
  Linear model where x is factorial vs. one where y is factorial
18
Lots of Analyses Are Linear Regressions
Linear Regression: Y = a_0 + a_1 x_1 (x_1 continuous)
Logistic Regression: Y = a_0 + a_1 x_1 (Y factorial)
1-way ANOVA: Y = a_0 + a_1 x_1 (x_1 factorial)
19
Quick Thoughts on Assignment Code
Tip #1: avoid reserved words (e.g. data)
Tip #2: take advantage of file-handling arguments (shorter code)
Tip #3: consistent indentation (readability)
20
Attendance Break
21
When Do We Use Statistics?
Ubiquitous in modern biology
Every class I will show a use of statistics in a (very, very) recent Nature paper.
Advance Online Publication
22
Cervix Cancer 101
Disease burden increasing (~380k to ~450k in the last 30 years)
By age 50, >80% of women have HPV infection
>75% of sexually active women exposed, only a subset affected
Why is almost entirely unknown!
Tightly associated with poverty
23
HPV Infection Is Associated With Multiple Cancers
Cervix: >99%
Anal: ~85%
Vaginal: ~70%
Vulvar: ~40%
Penile: ~45%
Head & Neck: ~20-30%
Of course not all of these are the HPV subtypes caught by current vaccines, but a majority are. Thus many cancers are preventable.
24
Figure 1 is a Classic Sequencing Figure
Mutation rate vs. histology
25
But Histology Is Associated With Age
26
Age Is Associated With Mutation Rate
R² = 0.08; p = 0.005
Is this meaningful?
4.2/Mbp, 1.6/Mbp
P(Wilcoxon) = 0.0095
27
Perhaps Not in Isolation But...
28
The Solution: Linear Regression
Mutation Rate = a_0 + a_1 x_1 + a_2 x_2
x_1 = histology indicator (adeno = 1; squam = 0)
x_2 = age in years (continuous)
Mutation Rate = 0.259 - 0.145 x_1 + 0.006 x_2
P(a_1 ≠ 0) = 0.045
P(a_2 ≠ 0) = 0.012
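To see how the fitted equation is read, it helps to plug in values. The Python sketch below evaluates the slide's fitted model for a given histology and age. The function name is hypothetical, and the slide does not state the units or scale of the response, so the outputs are only meaningful relative to each other.

```python
def predicted_mutation_rate(is_adeno: bool, age_years: float) -> float:
    """Evaluate Mutation Rate = 0.259 - 0.145*x1 + 0.006*x2 from the slide."""
    a0, a1, a2 = 0.259, -0.145, 0.006
    x1 = 1 if is_adeno else 0  # histology indicator: adeno = 1, squam = 0
    return a0 + a1 * x1 + a2 * age_years


squam_50 = predicted_mutation_rate(False, 50)
adeno_50 = predicted_mutation_rate(True, 50)
print(squam_50, adeno_50, adeno_50 - squam_50)
```

At any fixed age the two histologies differ by a_1 = -0.145, and within either histology each extra year of age adds a_2 = 0.006. That is what "adjusting for age" means in this model.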
29
General Linear Modeling
The underlying mathematical framework for most statistical techniques we are familiar with:
  ANOVAs
  Logistic regression
  Linear regression
  Multiple regression
Y = a_0 + a_1 x_1 + a_2 x_2 + …
NOT the same as a "Generalized Linear Model"!!!
30
General Linear Modeling: Special Cases
Linear Regression: Y = a_0 + a_1 x_1 (x_1 continuous)
Logistic Regression: Y = a_0 + a_1 x_1 (Y factorial)
Multiple Regression: Y = a_0 + a_1 x_1 + a_2 x_2 (x_1, x_2 continuous)
31
ANOVAs
1-way ANOVA: Y = a_0 + a_1 x_1 (x_1 factorial)
2-way ANOVA: Y = a_0 + a_1 x_1 + a_2 x_2 + a_3 x_1 x_2 (x_1, x_2 two-level factors)
32
ANOVA Experimental Designs Are Common
Classic one-way ANOVAs:
  Treat a cell-line with 5 drugs: do any of them make a difference?
  Make 5 different genetic mutations: do any of them alter gene-expression?
H_0: all group means are equal (alternative: the mean of at least one group differs)
Guesses at the assumptions?
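The one-way ANOVA test statistic is the ratio of between-group to within-group variability. The Python sketch below computes F by hand for three made-up groups. The data and group labels are entirely hypothetical and chosen so the arithmetic is easy to follow; converting F to a p-value needs an F-distribution table or a statistics library.

```python
from statistics import mean

# Hypothetical measurements for three treatment groups (not from the slides).
vehicle = [1.0, 2.0, 3.0]
drug_a = [2.0, 3.0, 4.0]
drug_b = [5.0, 6.0, 7.0]


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups."""
    values = [v for g in groups for v in g]
    grand = mean(values)
    # Variability of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # Variability of the observations around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


f_stat, df_b, df_w = one_way_anova_f([vehicle, drug_a, drug_b])
print(f_stat, df_b, df_w)
```

A large F says only that the group means are not all equal. It does not say which group is responsible.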
33
Assumptions Are Similar to T-test
Normal distribution for the dependent variable
Samples are independent
Homoscedasticity
Independent variables are:
  Not correlated
  Random normal variables
34
But This Is Limited
A 1-way ANOVA just says that at least one group differs
Which one? Post hoc tests
It is often hard to know which post hoc test to use, so it is often worth consulting a statistician here
35
Sometimes 1-Way ANOVAs Are Not Worth the Effort
Groups: Wildtype, Mutation 1, Mutation 2
1-way ANOVA + post hoc, or 2 t-tests?
36
Not Always Testing Raw Data
3 drugs with different controls:
  Vehicle 1, Drug 1
  Vehicle 2, Drug 2
  Vehicle 3, Drug 3
1-way ANOVA on the fold-changes
37
Two-Way ANOVAs
Probably even more common than one-way ANOVAs
Very powerful: Synergy? Additivity? Antagonism?
Y = a_0 + a_1 x_1 + a_2 x_2 + a_3 x_1 x_2
Assumptions?
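For two two-level factors coded 0/1, the four coefficients of the interaction model can be read straight off the four cell means, because the model has one parameter per cell. The Python sketch below does this for made-up replicate data. The numbers and the function name are hypothetical, and it estimates the coefficients only, with no significance test.

```python
from statistics import mean

# Hypothetical replicates keyed by (x1, x2); not from the slides.
cells = {
    (0, 0): [1.0, 1.2, 0.8],  # neither treatment
    (1, 0): [2.0, 2.1, 1.9],  # treatment 1 only
    (0, 1): [3.0, 2.9, 3.1],  # treatment 2 only
    (1, 1): [6.0, 6.2, 5.8],  # both treatments
}


def two_by_two_coefficients(data):
    """Return (a0, a1, a2, a3) for Y = a0 + a1*x1 + a2*x2 + a3*x1*x2."""
    m = {key: mean(vals) for key, vals in data.items()}
    a0 = m[(0, 0)]                 # baseline
    a1 = m[(1, 0)] - a0            # effect of treatment 1 alone
    a2 = m[(0, 1)] - a0            # effect of treatment 2 alone
    a3 = m[(1, 1)] - a0 - a1 - a2  # departure from additivity
    return a0, a1, a2, a3


a0, a1, a2, a3 = two_by_two_coefficients(cells)
print(a0, a1, a2, a3)
```

On the measured scale, a_3 near zero corresponds to additivity. When both main effects are positive, as here, a_3 > 0 suggests synergy and a_3 < 0 suggests antagonism. Whether a_3 differs from zero by more than noise is what the interaction test in the two-way ANOVA addresses.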
38
Assumptions Are Similar to 1-Way ANOVA
Normal distribution for the dependent variable
Samples are independent
Homoscedasticity
Independent variables are:
  Not correlated
  Random normal variables
39
Do These Treatments Interact?
Standard approach: ANOVA
Terms: Treatment #1, Treatment #2, Interaction
40
Example: Radiation Toxicity
Some people are prone to late-stage radio-toxicity
Does radiation induce specific patterns of gene-expression in these people?
Factors: Radiation (0 Gy, 3 Gy); Radio-Sensitive
41
Two-Way ANOVAs in R
Standard model-fitting uses the lm() function
For microarray and -omic analyses, the limma package is one very good approach for this (covered over the next few weeks)
42
Course Overview
Lecture 1: What is Statistics? Introduction to R
Lecture 2: Univariate Analyses I: continuous
Lecture 3: Univariate Analyses II: discrete
Lecture 4: Multivariate Analyses I: specialized models
Lecture 5: Multivariate Analyses II: general models
Lecture 6: Machine-Learning
Lecture 7: Microarray Analysis I: Pre-Processing
Lecture 8: Microarray Analysis II: Multiple-Testing
Lecture 9: Sequence Analysis
Final Exam (written)