
Canadian Bioinformatics Workshops


Lecture 2: Univariate Analyses: Continuous Data
MBP1010H, Dr. Paul C. Boutros, Department of Medical Biophysics
This workshop includes material originally developed by Drs. Raphael Gottardo, Sohrab Shah, Boris Steipe and others.
[Title image: Aegeus, King of Athens, consulting the Delphic Oracle. High Classical (~430 BCE)]

Lecture 2: Univariate Analyses I: Continuous Data bioinformatics.ca Course Overview
Lecture 1: What is Statistics? Introduction to R
Lecture 2: Univariate Analyses I: continuous
Lecture 3: Univariate Analyses II: discrete
Lecture 4: Multivariate Analyses I: specialized models
Lecture 5: Multivariate Analyses II: general models
Lecture 6: Data Visualization & Machine-Learning
Lecture 7: Microarray Analysis I: Pre-Processing
Lecture 8: Microarray Analysis II: Multiple-Testing
Lecture 9: Sequence Analysis Basics
Final Exam (written)

Lecture 2: Univariate Analyses I: Continuous Data bioinformatics.ca How Will You Be Graded?
9% Participation: 1% per week
56% Assignments: 4 x 8% + 1 x 24%
35% Final Examination: in-class
For individual assignments you each get unique questions.
Assignments will all be in R, and will be graded largely according to computational correctness only (i.e. does your R script yield the correct result when run).
The Final Exam will include multiple-choice and written answers.

Lecture 2: Univariate Analyses I: Continuous Data bioinformatics.ca Group Assignment
Groups will be pre-assigned. Marking scheme coming shortly (from Brendan).
You will be given a set of (publicly available) datasets of varying data-types and sizes. Your goal is to propose, execute and report an analysis of this dataset; any analysis you want.
Datasets coming in ~2 weeks. One-page proposals due in ~4 weeks. Paper format for the final report.

Lecture 2: Univariate Analyses I: Continuous Data bioinformatics.ca House Rules Cell phones to silent No side conversations Hands up for questions Pay attention – I will randomly call on people during the course of each lecture State your name when asking/answering Qs please! Others?

Lecture 2: Univariate Analyses I: Continuous Data bioinformatics.ca Attendance Thought

Lecture 2: Univariate Analyses I: Continuous Data bioinformatics.ca Review From Last Week
Population vs. Sample: all MBP students = population; MBP students in 1010 = sample.
How do you report statistical information? P-value, variance, effect-size, sample-size, test.
Why don't we use Excel/spreadsheets? Spreadsheet errors, reproducibility, wrong results.

Lecture 2: Univariate Analyses I: Continuous Data bioinformatics.ca Topics For This Week Introduction to continuous data & probability distributions Slightly boring, but necessary! Attendance Common continuous univariate analyses Correlations ceRNAs

Lecture 2: Univariate Analyses I: Continuous Data bioinformatics.ca Continuous vs. Discrete Data Definitions? Examples of discrete data in biological studies? Why does it matter in the first place? Areas of discrete mathematics: Combinatorics Graph Theory Discrete Probability Theory (Dice, Cards) Number Theory

Lecture 2: Univariate Analyses I: Continuous Data bioinformatics.ca Exploring Data
When teaching (or learning new procedures) we usually prefer to work with synthetic data. Synthetic data has the advantage that we know what the outcome of the analysis should be. Typically one would create values according to a function and then add noise.
R has several functions to create sequences of values, or you can write your own...
0:10
seq(0, pi, 5*pi/180)
rep(1:3, each=3, times=2)
for (i in 1:10) { print(i*i) }

Lecture 2: Univariate Analyses I: Continuous Data bioinformatics.ca Synthetic Data
Explore functions and noise. [Plots: function, noise, and noisy function]
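A minimal sketch of building synthetic data as a function plus noise, in the spirit of this slide; the sine function, noise level and sample size are illustrative assumptions, not values from the lecture.
set.seed(42)
x <- seq(0, 2*pi, length.out = 100)              # input values
f <- sin(x)                                      # the "true" function
y <- f + rnorm(length(x), mean = 0, sd = 0.2)    # add Gaussian noise
plot(x, y, pch = 19, col = "grey")               # the noisy synthetic data
lines(x, f, col = "red", lwd = 2)                # overlay the known function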

Lecture 2: Univariate Analyses I: Continuous Data bioinformatics.ca Probability Distributions
Normal distribution N(μ, σ²): μ is the mean and σ² is the variance. Extremely important because of the Central Limit Theorem: if a random variable is the sum of a large number of small random variables, it will be normally distributed.
x <- seq(-4, 4, 0.1)
f <- dnorm(x, mean=0, sd=1)
plot(x, f, xlab="x", ylab="density", lwd=5, type="l")
The shaded area under the curve is the probability of observing a value between 0 and 2.
Task: Explore line parameters.
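As a side note, the shaded area the slide describes (values between 0 and 2) can be computed directly with pnorm(); this is a small illustrative check, not code from the lecture.
pnorm(2, mean = 0, sd = 1) - pnorm(0, mean = 0, sd = 1)   # P(0 < X < 2), roughly 0.477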

Lecture 2: Univariate Analyses I: Continuous Data bioinformatics.ca Probability Distributions
Random sampling: generate 100 observations from a N(0,1).
set.seed(100)
x <- rnorm(100, mean=0, sd=1)
hist(x)
lines(seq(-3,3,0.1), 50*dnorm(seq(-3,3,0.1)), col="red")
Histograms can be used to estimate densities!

Lecture 2: Univariate Analyses I: Continuous Data bioinformatics.ca Quantiles
(Theoretical) Quantiles: the p-quantile has the property that there is a probability p of getting a value less than or equal to it. The 50% quantile is called the median. 90% of the probability (area under the curve) is to the left of the red vertical line.
q90 <- qnorm(0.90, mean = 0, sd = 1)
x <- seq(-4, 4, 0.1)
f <- dnorm(x, mean = 0, sd = 1)
plot(x, f, xlab = "x", ylab = "density", type = "l", lwd = 5)
abline(v = q90, col = 2, lwd = 5)

Lecture 2: Univariate Analyses I: Continuous Data bioinformatics.ca Descriptive Statistics
Empirical Quantiles: the p-quantile has the property that a proportion p of the observations are less than or equal to it. Empirical quantiles can be easily obtained in R.
> set.seed(100)
> x <- rnorm(100, mean = 0, sd = 1)
> quantile(x)
 0%  25%  50%  75% 100%
> quantile(x, probs = c(0.1, 0.2, 0.9))
10% 20% 90%

Lecture 2: Univariate Analyses I: Continuous Data bioinformatics.ca Descriptive Statistics
We often need to quickly 'quantify' a data set, and this can be done using a set of summary statistics (mean, median, variance, standard deviation).
> mean(x)
> median(x)
> IQR(x)
> var(x)
> summary(x)
 Min. 1st Qu. Median  Mean 3rd Qu.  Max.
Exercise: what are the units of variance and standard deviation?

Lecture 2: Univariate Analyses I: Continuous Data bioinformatics.ca Boxplot
Descriptive statistics can be intuitively summarized in a boxplot.
> boxplot(x)
[Annotated boxplot: 25% quantile, median, 75% quantile, whiskers at 1.5 x IQR]
IQR = Inter-Quartile Range = 75% quantile - 25% quantile. Everything above or below 1.5 x IQR from the box is considered an "outlier".
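A small sketch, not from the slides, of computing the 1.5 x IQR outlier fences by hand for the vector x used above:
q <- quantile(x, c(0.25, 0.75))                   # 25% and 75% quantiles
iqr <- q[2] - q[1]                                # same value as IQR(x)
fences <- c(q[1] - 1.5*iqr, q[2] + 1.5*iqr)       # whisker limits
x[x < fences[1] | x > fences[2]]                  # points a boxplot would flag as outliers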

Lecture 2: Univariate Analyses I: Continuous Data bioinformatics.ca Violinplot
Internal structure of a data-vector can be made visible in a violin plot. The principle is the same as for a boxplot, but a width is calculated from a smoothed histogram.
library(ggplot2)
X <- data.frame(x = x)   # the slide's X is assumed to be a data frame wrapping the vector x
p <- ggplot(X, aes(1, x))
p + geom_violin()

Lecture 2: Univariate Analyses I: Continuous Data bioinformatics.ca Plotting Data in R
Task: Explore types of plots.
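A brief sketch of base-R plot types one could explore for this task; the particular selection is an assumption, not a list from the lecture.
par(mfrow = c(2, 2))     # 2 x 2 grid of panels
plot(x)                  # index / scatter plot
hist(x)                  # histogram
boxplot(x)               # boxplot
plot(density(x))         # smoothed density estimate
par(mfrow = c(1, 1))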

Lecture 2: Univariate Analyses I: Continuous Data bioinformatics.ca QQ–plot
One of the first things we may ask about data is whether it deviates from an expectation, e.g. whether it is normally distributed. The quantile-quantile plot provides a way to verify this visually. The QQ-plot shows the theoretical quantiles versus the empirical quantiles. If the assumed (theoretical) distribution is indeed the correct one, we should observe a straight line. R provides qqnorm() and qqplot().

Lecture 2: Univariate Analyses I: Continuous Data bioinformatics.ca QQ–plot: sample vs. Normal Only valid for the normal distribution! qqnorm(x) qqline(x, col=2)

Lecture 2: Univariate Analyses I: Continuous Data bioinformatics.ca QQ–plot: sample vs. Normal Clearly the t distribution with two degrees of freedom is not Normal. set.seed(100) t <- rt(100, df=2) qqnorm(t) qqline(t, col=2)

Lecture 2: Univariate Analyses I: Continuous Data bioinformatics.ca QQ–plot
set.seed(101)
generateVariates <- function(n) {
  Nvar <- 100   # number of uniform variates to sum; the value was lost in the transcript, 100 is an assumed placeholder
  Vout <- c()
  for (i in 1:n) {
    x <- runif(Nvar, -0.01, 0.01)
    Vout <- c(Vout, sum(x))
  }
  return(Vout)
}
x <- generateVariates(1000)
y <- rnorm(1000, mean=0, sd=1)
qqnorm(x)
qqline(x, col=2)   # qqline() takes the sample only; y can instead be compared with qqplot(x, y)
Verify the CLT.

Lecture 2: Univariate Analyses I: Continuous Data bioinformatics.ca QQ–plot: sample vs. sample Comparing two samples: are their distributions the same?... or... compare a sample vs. a synthetic dataset. set.seed(100) x <- rt(100, df=2) y <- rnorm(100, mean=0, sd=1) qqplot(x, y) Exercise: try different values of df for rt() and compare the vectors.

Lecture 2: Univariate Analyses I: Continuous Data bioinformatics.ca Boxplots The boxplot function can be used to display several variables at a time. boxplot(gvhdCD3p) Exercise: Interpret this plot.

Lecture 2: Univariate Analyses I: Continuous Data bioinformatics.ca Hypothesis Testing
Hypothesis testing is confirmatory data analysis, in contrast to exploratory data analysis.
Concepts:
Null and alternative hypothesis
Region of acceptance / rejection and critical value
Error types
p-value
Significance level
Power of a test (1 - β, i.e. one minus the probability of a false negative)

Lecture 2: Univariate Analyses I: Continuous Data bioinformatics.ca Null Hypothesis / Alternative Hypothesis The null hypothesis H 0 states that nothing of consequence is apparent in the data distribution. The data corresponds to our expectation. We learn nothing new. The alternative hypothesis H 1 states that some effect is apparent in the data distribution. The data is different from our expectation. We need to account for something new. Not in all cases will this result in a new model, but a new model always begins with the observation that the old model is inadequate. Don’t think about this too much!

Lecture 2: Univariate Analyses I: Continuous Data bioinformatics.ca Test types
Common types of tests:
A Z-test compares a sample mean with a normal distribution.
A t-test compares a sample mean with a t-distribution and thus relaxes the requirements on normality for the sample.
Chi-squared tests analyze whether samples are drawn from the same distribution.
F-tests analyze the variance of populations (ANOVA).
Nonparametric tests can be applied if we have no reasonable model from which to derive a distribution for the null hypothesis.
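For orientation, a hedged sketch of base-R functions corresponding to these test families; the toy data are invented for illustration (there is no z.test in base R, so a z statistic would be computed directly).
set.seed(1)
a <- rnorm(20, mean = 0); b <- rnorm(20, mean = 0.5)
t.test(a, b)                                      # t-test: compare two sample means
chisq.test(matrix(c(12, 8, 5, 15), nrow = 2))     # chi-squared test on a 2x2 table of counts
var.test(a, b)                                    # F-test comparing two variances
g <- factor(rep(c("A", "B"), each = 20))
anova(lm(c(a, b) ~ g))                            # one-way ANOVA via a linear model
wilcox.test(a, b)                                 # nonparametric alternative to the two-sample t-test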

Lecture 2: Univariate Analyses I: Continuous Data bioinformatics.ca Error Types
Decision vs. Truth:
  Accept H0 when H0 is true: 1 - α (correct decision)
  Reject H0 when H0 is true: α ("false positive", "Type I error")
  Accept H0 when H1 is true: β ("false negative", "Type II error")
  Reject H0 when H1 is true: 1 - β ("power", "sensitivity")

Lecture 2: Univariate Analyses I: Continuous Data bioinformatics.ca Type I vs. Type II Errors

Lecture 2: Univariate Analyses I: Continuous Data bioinformatics.ca What is a p–value?
a) A measure of how much evidence we have against the alternative hypothesis.
b) The probability of making a false-positive.
c) Something that biologists want to be below 0.05.
d) The probability of observing a value as extreme or more extreme by chance alone.
e) All of the above.
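Of the options, (d) matches the standard definition. To make it concrete, a small sketch (not from the slides) computing a two-sided p-value as the probability, under the null, of a statistic at least as extreme as the one observed:
z <- 2.1                     # an observed z statistic (illustrative value)
2 * pnorm(-abs(z))           # P(|Z| >= 2.1) under N(0,1), roughly 0.036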

Lecture 2: Univariate Analyses I: Continuous Data bioinformatics.ca Distributional Assumptions A parametric test makes assumptions about the underlying distribution of the data. A non-parametric test makes no assumptions about the underlying distribution, but may make other assumptions!

Lecture 2: Univariate Analyses I: Continuous Data bioinformatics.ca Most Common Statistical Test: The T-Test
A Z-test compares a sample mean with a normal distribution. A t-test compares a sample mean with a t-distribution and thus relaxes the requirements on normality for the sample. Nonparametric tests can be applied if we have no reasonable model from which to derive a distribution for the null hypothesis.
One-Sample vs. Two-Sample
One-Sided vs. Two-Sided
Heteroscedastic vs. Homoscedastic
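A hedged sketch of how these choices map onto arguments of R's t.test(); the sample data are invented.
set.seed(2)
a <- rnorm(15, mean = 5); b <- rnorm(15, mean = 6)
t.test(a, mu = 5)                        # one-sample, two-sided
t.test(a, b)                             # two-sample; Welch (heteroscedastic) is the R default
t.test(a, b, var.equal = TRUE)           # two-sample, homoscedastic (pooled variance)
t.test(a, b, alternative = "less")       # one-sided: H1 is that mean(a) < mean(b)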

Lecture 2: Univariate Analyses I: Continuous Data bioinformatics.ca Two-Sample t–test
Test if the means of two distributions are the same. The datasets y_i1, ..., y_in are independent and normally distributed with mean μ_i and variance σ², i.e. N(μ_i, σ²), where i = 1, 2. In addition, we assume that the data in the two groups are independent and that the variance is the same.

Lecture 2: Univariate Analyses I: Continuous Data bioinformatics.ca two–sample t–test
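The formula on this slide did not survive the transcript; for reference, under the equal-variance assumptions above the pooled two-sample t statistic takes the standard form
\[
t = \frac{\bar{y}_1 - \bar{y}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}},
\qquad
s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2},
\]
with n_1 + n_2 - 2 degrees of freedom.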

Lecture 2: Univariate Analyses I: Continuous Data bioinformatics.ca t–test assumptions
Normality: the data need to be sampled from a normal distribution. If not, one can use a transformation or a non-parametric test. If the sample size is large enough (n>30), the t-test will work just fine (CLT).
Independence: usually satisfied. If not independent, more complex modeling is required.
Independence between groups: in the two-sample t-test, the groups need to be independent. If not, one can sometimes use a paired t-test instead.
Equal variances: if the variances are not equal in the two groups, use Welch's t-test (the default in R).
How do we test these? One option is sketched below.
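One hedged way to probe these assumptions in R; the choice of Shapiro-Wilk and the F-test here is an illustration, not a prescription from the lecture.
set.seed(3)
a <- rnorm(20, mean = 5); b <- rnorm(20, mean = 6)
shapiro.test(a); shapiro.test(b)       # Shapiro-Wilk tests of normality, one per group
qqnorm(a); qqline(a)                   # visual normality check, often more informative than the test
var.test(a, b)                         # F-test for equal variances (itself sensitive to non-normality)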

Lecture 2: Univariate Analyses I: Continuous Data bioinformatics.ca non–parametric tests
Non-parametric tests constitute a flexible alternative to t-tests if you don't have a model of the distribution. In cases where a parametric test would be appropriate, non-parametric tests have less power. Several non-parametric alternatives exist, e.g. the Wilcoxon signed-rank test and the Mann-Whitney U test (also known as the Wilcoxon rank-sum test).

Lecture 2: Univariate Analyses I: Continuous Data bioinformatics.ca Wilcoxon test principle
Consider two random distributions with 25 samples each and slightly different means.
set.seed(53)
n <- 25
M <- matrix(nrow = n+n, ncol = 2)
for (i in 1:n) {
  M[i,1] <- rnorm(1, 10, 1); M[i,2] <- 1
  M[i+n,1] <- rnorm(1, 11, 1); M[i+n,2] <- 2
}
plot(M[,1], col = M[,2])

Lecture 2: Univariate Analyses I: Continuous Data bioinformatics.ca Wilcoxon test principle
o <- order(M[,1])
plot(M[o,1], col = M[o,2])
For each observation in a, count the number of observations in b that have a smaller rank. The sum of these counts is the test statistic.
wilcox.test(M[1:n,1], M[(1:n)+n,1])
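As a check on that description, a small sketch (not from the slides, reusing M and n from the code above) that computes the count-based statistic by hand and compares it with the W reported by wilcox.test():
a <- M[1:n, 1]; b <- M[(1:n) + n, 1]      # the two groups
U <- sum(outer(a, b, FUN = ">"))          # for each value in a, count the values in b below it
U                                         # with no ties, this equals the W statistic
wilcox.test(a, b)$statistic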

Lecture 2: Univariate Analyses I: Continuous Data bioinformatics.ca Flow-Chart For Two-Sample Tests
Is the data sampled from a normally-distributed population?
  Yes: is the variance equal in the two groups (F-test)?
    Yes: homoscedastic t-test
    No: heteroscedastic t-test
  No: is n sufficient for the CLT (>30)?
    Yes: proceed to the equal-variance question above
    No: Wilcoxon U-test
(One possible implementation is sketched below.)
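A toy sketch of this decision flow as an R helper; the use of Shapiro-Wilk for the normality check and the 0.05 cut-offs are assumptions made for illustration only.
choose_two_sample_test <- function(a, b, alpha = 0.05) {
  normal <- shapiro.test(a)$p.value > alpha && shapiro.test(b)$p.value > alpha
  big_n  <- length(a) > 30 && length(b) > 30
  if (!normal && !big_n) {
    return(wilcox.test(a, b))                  # non-parametric fallback
  }
  equal_var <- var.test(a, b)$p.value > alpha  # F-test for equal variances
  t.test(a, b, var.equal = equal_var)          # pooled (homoscedastic) or Welch (heteroscedastic) t-test
}
choose_two_sample_test(rnorm(20), rnorm(20, mean = 1))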

Lecture 2: Univariate Analyses I: Continuous Data bioinformatics.ca Power, error rates and decision
Power calculation in R:
> power.t.test(n = 5, delta = 1, sd = 2, alternative = "two.sided", type = "one.sample")
     One-sample t test power calculation
              n = 5
          delta = 1
             sd = 2
      sig.level = 0.05
          power =
    alternative = two.sided
Other tests are available – see ??power.

Lecture 2: Univariate Analyses I: Continuous Data bioinformatics.ca Power, error rates and decision
[Diagram: sampling distributions under μ0 and μ1, with shaded areas for Pr(false positive / Type I error) and Pr(false negative / Type II error)]
Let's Try Some Power Analyses in R
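Two hedged examples of power analyses with power.t.test(); the effect sizes are invented for illustration.
power.t.test(n = 20, delta = 0.5, sd = 1, sig.level = 0.05, type = "two.sample")      # power for n = 20 per group
power.t.test(power = 0.8, delta = 0.5, sd = 1, sig.level = 0.05, type = "two.sample") # n per group needed for 80% power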

Lecture 2: Univariate Analyses I: Continuous Data bioinformatics.ca Problem
When we measure more than one variable for each member of a population, a scatter plot may show us that the values are not completely independent: there is a trend for one variable to increase as the other increases. Regression analyzes the dependence.
Examples: height vs. weight; gene dosage vs. expression level; survival analysis: probability of death vs. age.

Lecture 2: Univariate Analyses I: Continuous Data bioinformatics.ca Correlation When one variable depends on the other, the variables are to some degree correlated. NB: correlation ≠ causation In R, the function cov() measures covariance and cor() measures the Pearson coefficient of correlation (a normalized measure of covariance). Pearson's coefficient of correlation values range from -1 to 1, with 0 indicating no correlation.
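A quick illustration, not from the slides, that cor() is simply cov() normalized by the two standard deviations:
set.seed(7)
x <- rnorm(50)
y <- 0.7*x + rnorm(50, sd = 0.5)
cov(x, y) / (sd(x) * sd(y))     # normalized covariance...
cor(x, y)                       # ...equals Pearson's coefficient of correlation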

Lecture 2: Univariate Analyses I: Continuous Data bioinformatics.ca Pearson's Coefficient of Correlation
How to interpret the correlation coefficient: explore varying degrees of randomness...
> x <- rnorm(50)
> r <- 0.99
> y <- (r * x) + ((1-r) * rnorm(50))
> plot(x, y); cor(x, y)

Lecture 2: Univariate Analyses I: Continuous Data bioinformatics.ca Pearson's Coefficient of Correlation
Varying degrees of randomness...
> x <- rnorm(50)
> r <- 0.8
> y <- (r * x) + ((1-r) * rnorm(50))
> plot(x, y); cor(x, y)

Lecture 2: Univariate Analyses I: Continuous Data bioinformatics.ca Pearson's Coefficient of Correlation
Varying degrees of randomness...
> x <- rnorm(50)
> r <- 0.4
> y <- (r * x) + ((1-r) * rnorm(50))
> plot(x, y); cor(x, y)

Lecture 2: Univariate Analyses I: Continuous Data bioinformatics.ca Pearson's Coefficient of Correlation
Varying degrees of randomness...
> x <- rnorm(50)
> r <- 0.01
> y <- (r * x) + ((1-r) * rnorm(50))
> plot(x, y); cor(x, y)

Lecture 2: Univariate Analyses I: Continuous Data bioinformatics.ca Pearson's Coefficient of Correlation
Non-linear relationships...
> x <- runif(50, -1, 1)
> r <- 0.9
> # periodic...
> y <- (r * cos(x*pi)) + ((1-r) * rnorm(50))
> plot(x, y); cor(x, y)

Lecture 2: Univariate Analyses I: Continuous Data bioinformatics.ca Pearson's Coefficient of Correlation
Non-linear relationships...
> x <- runif(50, -1, 1)
> r <- 0.9
> # polynomial...
> y <- (r * x*x) + ((1-r) * rnorm(50))
> plot(x, y); cor(x, y)

Lecture 2: Univariate Analyses I: Continuous Data bioinformatics.ca Pearson's Coefficient of Correlation
Non-linear relationships...
> x <- runif(50, -1, 1)
> r <- 0.9
> # exponential...
> y <- (r * exp(5*x)) + ((1-r) * rnorm(50))
> plot(x, y); cor(x, y)

Lecture 2: Univariate Analyses I: Continuous Data bioinformatics.ca Pearson's Coefficient of Correlation
Non-linear relationships...
> x <- runif(50, -1, 1)
> r <- 0.9
> # circular...
> a <- (r * cos(x*pi)) + ((1-r) * rnorm(50))
> b <- (r * sin(x*pi)) + ((1-r) * rnorm(50))
> plot(a, b); cor(a, b)

Lecture 2: Univariate Analyses I: Continuous Data bioinformatics.ca Correlation coefficient
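The formula on this slide was lost in the transcript; the standard definition of Pearson's coefficient of correlation is
\[
r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\;\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}}
  = \frac{\operatorname{cov}(x, y)}{s_x s_y}.
\]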

Lecture 2: Univariate Analyses I: Continuous Data bioinformatics.ca When Do We Use Statistics? Ubiquitous in modern biology Every class I will show a use of statistics in a recent paper January 9, 2014

Lecture 2: Univariate Analyses I: Continuous Data bioinformatics.ca Non-Small Cell Lung Cancer 101
Lung cancer divides into non-small cell and small cell disease. Non-small cell lung cancer accounts for 80% of lung cancer and has a 15% 5-year survival; its subtypes include adenocarcinomas, squamous cell carcinomas, and large cell (and others).

Lecture 2: Univariate Analyses I: Continuous Data bioinformatics.ca Non-Small Cell Lung Cancer 102
Stage I: local tumour only (IA = small tumour; IB = large tumour)
Stage II: local lymph nodes
Stage III: distal lymph nodes
Stage IV: metastasis

Lecture 2: Univariate Analyses I: Continuous Data bioinformatics.ca General Idea: HMGA2 is a ceRNA What are ceRNAs? Salmena et al. Cell 2011

Lecture 2: Univariate Analyses I: Continuous Data bioinformatics.ca Test Multiple Constructs for Activity

Lecture 2: Univariate Analyses I: Continuous Data bioinformatics.ca What Statistical Analysis Did They Do?
No information given in the main text! The figure legend says: "Values are technical triplicates, have been performed independently three times, and represent mean +/- standard deviation (s.d.) with propagated error." In the supplementary material they say: "Unless otherwise specified, statistical significance was assessed by the Student's t-test."
So, what would you do differently?

Lecture 2: Univariate Analyses I: Continuous Data bioinformatics.ca Course Overview
Lecture 1: What is Statistics? Introduction to R
Lecture 2: Univariate Analyses I: continuous
Lecture 3: Univariate Analyses II: discrete
Lecture 4: Multivariate Analyses I: specialized models
Lecture 5: Multivariate Analyses II: general models
Lecture 6: Data Visualization & Machine-Learning
Lecture 7: Microarray Analysis I: Pre-Processing
Lecture 8: Microarray Analysis II: Multiple-Testing
Lecture 9: Sequence Analysis Basics
Final Exam (written)