University of Ottawa - Bio 4118 – Applied Biostatistics © Antoine Morin and Scott Findlay 08/10/2015 11:23 PM 1 Some basic statistical concepts, statistics.

Presentation transcript:

Slide 1: Some basic statistical concepts, statistics and distributions
Topics: parameters and statistics; parametric versus non-parametric statistics; properties of statistics; some useful statistics; the normal distribution; the Student's t distribution; confidence intervals for sample statistics; statistical power and experimental design.

Slide 2: Concepts map (figure).

Slide 3: Parameters, statistics and estimators
Parameters characterize populations, which in general cannot be completely enumerated. Statistics (estimators) are estimates of population parameters obtained from a finite sample; for example, the sample mean is an estimate of the population mean. The process by which one obtains an estimate of a population parameter from a finite sample is called an estimation procedure. (Figure: a sample drawn from a population.)

Slide 4: Parametric statistical analysis
Model parameters are estimated from a finite sample, and the values of the corresponding population parameters are inferred from these estimates. Parametric analysis therefore requires relatively restrictive assumptions about the relationship between the sample and the population, i.e. about the distributions from which samples are drawn and the nature of the drawing (e.g., normal distributions and random sampling). (Figure: inference from a sample to the population.)

Slide 5: Non-parametric statistical analysis
Model parameters are calculated from a finite sample, but no inference is made to the corresponding population parameters. Non-parametric analysis therefore requires relatively minimal assumptions about the relationship between the sample and the population (e.g., normal distributions of the sampled variables are not required).

Slide 6: Properties of statistics
Accuracy: an accurate statistic is one whose value, averaged over samples from the same population, is "close" to the true population parameter. (Figure: a less accurate and a more accurate statistic compared, sample versus population.)

Slide 7: Properties of statistics
Precision: a precise statistic varies little among samples drawn from the same population. (Figure: a less precise and a more precise statistic compared, sample versus population.)

Slide 8: Properties of statistics
Consistency: the more consistent a statistic is, the faster it approaches the true population value as sample size increases. (Figure: a less consistent and a more consistent statistic plotted against sample size, N.)
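The three properties above (accuracy, precision, consistency) can be illustrated with a small simulation. This is only a sketch under stated assumptions: the population (normal, mean 10, S.D. 2), the sample sizes and the helper name `bias_and_spread` are hypothetical choices, not taken from the slides. The average error of the estimates stands in for accuracy, their spread among samples for precision, and the shrinking of that spread as N grows for consistency.

```python
import random
from statistics import mean, median, stdev

TRUE_MEAN, TRUE_SD = 10.0, 2.0  # hypothetical population parameters

def bias_and_spread(estimator, n, trials=1000, seed=42):
    """Return (average estimate minus true value, S.D. of the estimates)."""
    rng = random.Random(seed)
    estimates = [
        estimator([rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(n)])
        for _ in range(trials)
    ]
    return mean(estimates) - TRUE_MEAN, stdev(estimates)

# Compare two statistics (mean and median) at a small and a larger N.
results = {
    (name, n): bias_and_spread(est, n)
    for name, est in (("mean", mean), ("median", median))
    for n in (10, 200)
}
for (name, n), (bias, spread) in results.items():
    print(f"{name:6s} N={n:3d}  bias={bias:+.3f}  spread={spread:.3f}")
```

For a normal population both statistics should show little bias, the mean should vary less among samples than the median, and both spreads should shrink at the larger N.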

Slide 9: A comparison of some well-known statistics (figure: frequency distribution with the range marked).

Slide 10: Statistics: measures of central tendency
Mean: easy to calculate and has a predictable distribution, but can be strongly influenced by outliers. Median (M): the value of a measured variable that has an equal number of observations both smaller and larger; it is less sensitive to outliers than the mean. (Figure: frequency distribution of X with the median M marked.)
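A minimal sketch of the outlier sensitivity described above, using made-up data (the values are illustrative assumptions, not from the course): adding one extreme observation moves the mean far more than the median.

```python
from statistics import mean, median

clean = [2, 3, 3, 4, 5]          # hypothetical observations
with_outlier = clean + [100]     # the same data plus one extreme value

mean_shift = mean(with_outlier) - mean(clean)
median_shift = median(with_outlier) - median(clean)

print("mean:  ", mean(clean), "->", mean(with_outlier))
print("median:", median(clean), "->", median(with_outlier))
```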

Slide 11: Statistics of dispersion: the range
Range: defined by the largest and smallest values in the sample. It is a simple statistic, but it is biased because it consistently underestimates the population (parametric) range. (Figure: frequency distribution showing the population range and the narrower sample range.)
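The bias of the sample range can be checked by simulation. The sketch below assumes a uniform population on [0, 1), chosen only because its range (1) is known exactly; the function name and sample sizes are hypothetical.

```python
import random
from statistics import mean

def mean_sample_range(n, trials=5000, seed=7):
    """Average sample range over many samples of size n from Uniform[0, 1)."""
    rng = random.Random(seed)
    ranges = []
    for _ in range(trials):
        sample = [rng.random() for _ in range(n)]
        ranges.append(max(sample) - min(sample))
    return mean(ranges)

small = mean_sample_range(5)    # small samples: range badly underestimated
large = mean_sample_range(50)   # larger samples: closer, but still below 1
print(f"N=5: {small:.3f}   N=50: {large:.3f}   population range: 1.000")
```

A sample range can never exceed the population range, so its average is always below it; the gap narrows as N increases.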

Slide 12: Dispersion
Three frequency distributions with identical means and sample sizes but different dispersion patterns (figure).

Slide 13: Dispersion statistics: variance, standard deviation and the coefficient of variation
Variance: average squared deviation from the mean. Standard deviation: square root of the variance. Coefficient of variation: standard deviation divided by the sample mean, × 100.
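As a worked example of these three statistics, here is a short sketch on made-up data. It assumes the usual sample variance with an n − 1 divisor (what `statistics.variance` computes); the slide does not spell out the divisor.

```python
from statistics import mean, stdev, variance

data = [4, 8, 6, 5, 7]                # hypothetical measurements

var = variance(data)                  # sum of squared deviations / (n - 1)
sd = stdev(data)                      # square root of the variance
cv = sd / mean(data) * 100            # coefficient of variation, in percent

print(f"variance={var}  sd={sd:.4f}  CV={cv:.2f}%")
```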

Slide 14: The normal distribution
A symmetric, bell-shaped distribution characterized by 2 parameters: (1) the mean μ and (2) the variance σ². (Figure: probability against X.)

Slide 15: The standard normal distribution
Obtained by scaling the distribution, converting observed values to standard normal deviates (Z-scores). The resulting distribution has μ = 0 and σ² = 1. (Figure: unscaled and scaled (Z-transformed) distributions, probability against Z.)
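The Z-transformation can be sketched as follows. The data are hypothetical, and the sample mean and sample S.D. are used in place of μ and σ, which is an assumption made for illustration.

```python
from statistics import mean, stdev

observations = [12.1, 9.8, 11.4, 10.3, 8.9, 10.9, 11.7, 9.5]  # hypothetical data

m, s = mean(observations), stdev(observations)
z_scores = [(x - m) / s for x in observations]   # Z = (X - mean) / S.D.

# After scaling, the values have mean 0 and S.D. 1 (up to rounding error).
print(f"mean of Z = {mean(z_scores):.6f}, S.D. of Z = {stdev(z_scores):.6f}")
```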

Slide 16: The standard normal distribution
About 68% of the population lies within 1σ of the mean, and about 95.5% within 2σ. (Figure: probability against Z, with μ ± 1σ and μ ± 2σ marked.)
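These proportions can be checked directly from the standard normal cumulative distribution function. The sketch uses `statistics.NormalDist` from the Python standard library; the helper name `within` is hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

def within(k):
    """Proportion of a normal population within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

print(f"within 1 sigma: {within(1):.4f}")
print(f"within 2 sigma: {within(2):.4f}")

# Going the other way: the Z value that encloses exactly 95% of the population.
z_95 = std_normal.inv_cdf(0.975)
print(f"Z for a 95% interval: {z_95:.3f}")
```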

Slide 17: Confidence intervals for observations
The range of values in which X% of the observations from a population are expected to fall, generally centred on the mean. For a normal population the interval is μ ± Zσ; the 95.5% CI is μ ± 2σ. But μ and σ are seldom known.

Slide 18: Confidence intervals for observations: estimation problems
Replacing μ and σ by their sample estimates can lead to serious biases. Simulation: sample a standard normal population and, for each sample, calculate the sample mean and variance. Then calculate the CI based on the sample mean and variance, and see what proportion of the true population falls within the CI. (Figure: proportion (%) of the population outside the 95% CI against the number of trials, with the 5% level marked.)
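The simulation described above might be sketched as follows. The sample sizes, number of trials, seed and function name are assumptions; the slide's own settings are not recoverable from the transcript. For each sample, the interval is built from the sample mean and S.D. with Z = 1.96, and the proportion of the true standard normal population falling outside it is computed exactly from the normal CDF.

```python
import random
from statistics import NormalDist, mean, stdev

def mean_fraction_outside(n, trials=2000, z=1.96, seed=1):
    """Average proportion of a standard normal population outside mean +/- z*s."""
    rng = random.Random(seed)
    population = NormalDist(0.0, 1.0)
    fractions = []
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        m, s = mean(sample), stdev(sample)
        inside = population.cdf(m + z * s) - population.cdf(m - z * s)
        fractions.append(1.0 - inside)
    return mean(fractions)

outside_small = mean_fraction_outside(5)     # small samples
outside_large = mean_fraction_outside(100)   # large samples
print(f"N=5:   {100 * outside_small:.1f}% outside (nominal 5%)")
print(f"N=100: {100 * outside_large:.1f}% outside (nominal 5%)")
```

With small samples the estimated intervals should leave out noticeably more than the nominal 5% of the population; with large samples the proportion should be close to 5%.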

Slide 19: Confidence intervals for observations: estimation problems
When sample size is large, estimated CIs are very close to true CIs. However, when sample size is small, estimated CIs are far too small.

Slide 20: Confidence intervals for observations: estimation problems
Estimated CIs based on Z-scores approach true CIs as sample size increases but, for small N, are highly biased (i.e. they are smaller than they should be).

Slide 21: The Student's t-distribution
The distribution of the difference between the sample mean and the population mean, divided by the standard error of the mean. It converges towards the standard normal distribution when N is large, and is more peaked and has longer tails at small N.
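The definition above can be explored empirically. This sketch (sample sizes, trial count, seed and function names are all assumptions) simulates the statistic (sample mean − population mean) / standard error for samples from a standard normal population and reads off its empirical 97.5th percentile, which can be compared with the normal value of about 1.96.

```python
import math
import random

def t_statistics(n, trials=10000, seed=3):
    """Sorted simulated values of (sample mean - mu) / (s / sqrt(n)), with mu = 0."""
    rng = random.Random(seed)
    values = []
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        m = sum(sample) / n
        s = math.sqrt(sum((x - m) ** 2 for x in sample) / (n - 1))
        values.append(m / (s / math.sqrt(n)))
    return sorted(values)

def upper_quantile(sorted_values, p=0.975):
    """Empirical p-th quantile of an already sorted list."""
    return sorted_values[int(p * len(sorted_values))]

q_small = upper_quantile(t_statistics(5))    # df = 4: long tails
q_large = upper_quantile(t_statistics(50))   # df = 49: close to normal
print(f"97.5th percentile, N=5:  {q_small:.2f}")
print(f"97.5th percentile, N=50: {q_large:.2f}")
```

The small-N percentile should sit well above 1.96, reflecting the longer tails, while the larger-N percentile should be close to it.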

Slide 22: Confidence intervals based on t-scores
When sample size is small, calculate CIs by replacing Z with the critical value of the t distribution. This helps, but CIs are still too small when sample sizes are very small.

Slide 23: Confidence intervals for means
An interval that has a certain probability of including the value of the true mean of the population. It is smaller than the CI for observations. (Figure: probability distributions of sample means and of observations.)
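A minimal sketch contrasting the two intervals on hypothetical data. It assumes the usual forms, mean ± t·s/√n for the mean and mean ± t·s for observations, and hardcodes the two-sided 95% critical value of t with 9 degrees of freedom (2.262, a standard table value) rather than computing it.

```python
import math
from statistics import mean, stdev

T_CRIT_DF9 = 2.262   # two-sided 95% critical value of t, df = 9 (table value)

sample = [98, 102, 95, 110, 104, 99, 101, 97, 105, 99]   # hypothetical, n = 10
n = len(sample)
m, s = mean(sample), stdev(sample)
se = s / math.sqrt(n)                                    # standard error of the mean

ci_mean = (m - T_CRIT_DF9 * se, m + T_CRIT_DF9 * se)     # interval for the mean
ci_obs = (m - T_CRIT_DF9 * s, m + T_CRIT_DF9 * s)        # interval for observations

width_mean = ci_mean[1] - ci_mean[0]
width_obs = ci_obs[1] - ci_obs[0]
print(f"CI for the mean:      {ci_mean[0]:.2f} to {ci_mean[1]:.2f}")
print(f"CI for observations:  {ci_obs[0]:.2f} to {ci_obs[1]:.2f}")
```

The interval for the mean is narrower than the one for observations by a factor of √n.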

Slide 24: Confidence intervals for the median
If the distribution is highly skewed, or the sample size is very small, confidence intervals for the mean based on the t-distribution are very biased (they underestimate the true CI). As an alternative, calculate the CI for the median instead of the mean.

Slide 25: Confidence intervals for the median
Based on the binomial distribution b(x) with p = 0.5. Out of a sample of n = 10, what is the probability of obtaining only x = 1, 2, …, n observations below the median? Because b(x) is discrete, confidence intervals won't be exactly at the 1 − α level. 1 − α CI: within what range of values would we expect the true population median to lie 100(1 − α) percent of the time? The 97.86% CI for the median is given by values 1 and 9; the 89.08% CI for the median is given by values 2 and … (Figure: binomial probabilities.)
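The binomial reasoning can be sketched with exact arithmetic. One assumption to flag: the code takes the interval from the k-th smallest to the k-th largest of the n ordered observations, which misses the median only when fewer than k observations fall on one side of it. The indexing convention behind the values quoted on the slide is not recoverable from the transcript, so the ranks here need not match the slide's labels, although the coverages for k = 2 and k = 3 come out close to the quoted percentages. The function name is hypothetical.

```python
from math import comb

def median_ci_coverage(n, k):
    """Coverage of the interval [k-th smallest, k-th largest] for the median."""
    # P(fewer than k of the n observations fall below the median), with p = 0.5
    tail = sum(comb(n, i) for i in range(k)) / 2 ** n
    return 1 - 2 * tail   # the same tail applies above the median

coverages = {k: median_ci_coverage(10, k) for k in (1, 2, 3)}
for k, cov in coverages.items():
    print(f"ranks {k} and {10 - k + 1}: coverage = {100 * cov:.2f}%")
```

Because only a few coverage levels are attainable, one picks the attainable level closest to the desired 1 − α.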

Slide 26: Confidence intervals for the variance
Sample estimates of the population variance are distributed like chi-square with n − 1 degrees of freedom: for samples from a normal population, (n − 1)s²/σ² is distributed like χ². (Figure: χ² distribution with df = 5, probability on the vertical axis, p = α = 0.05.)
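The chi-square behaviour of sample variances can be checked by simulation. The sketch below assumes samples of n = 6 from a standard normal population (so df = 5, as in the figure); the trial count, seed and function name are hypothetical. A χ² distribution with 5 degrees of freedom has mean 5 and an upper 5% point of about 11.07 (a standard table value), which the simulated values of (n − 1)s²/σ² should approximately reproduce.

```python
import random
from statistics import mean, variance

def scaled_variances(n, sigma=1.0, trials=20000, seed=11):
    """Simulated values of (n - 1) * s^2 / sigma^2 for normal samples of size n."""
    rng = random.Random(seed)
    return [
        (n - 1) * variance([rng.gauss(0.0, sigma) for _ in range(n)]) / sigma ** 2
        for _ in range(trials)
    ]

values = sorted(scaled_variances(6))        # n = 6 gives df = 5
avg = mean(values)                          # chi-square mean equals its df
q95 = values[int(0.95 * len(values))]       # empirical upper 5% point
print(f"mean = {avg:.2f} (expected 5), 95th percentile = {q95:.2f} (table: 11.07)")
```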

Slide 27: Design of experiments
What you want: How do I achieve a desired precision? How many times should I repeat the experiment to get "good" results? How many samples should I take if I want a precision (CV of the mean) of 5%? How do I get a 99% confidence interval that is only n units wide?
What you need: a goal (desired precision), and an estimate of dispersion (s²) from a preliminary experiment, previous experience or a "guesstimate".

Slide 28: Required sample size: an example
A preliminary sample of N = 10 yields mean = 100 and S.D. = 25. You want a CI that is 2 units wide, so that there is a 95% chance that the true parametric mean is within 1 of the sample mean. Answer: n = 2404 by iterative solution. On average, your precision will be about what you want, but about 50% of the time the calculated CI will be less than the true CI because you used s² instead of σ².
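One way to carry out such an iterative solution is sketched below. The slide does not give the formula, so this assumes the common approach of repeating n = (t·s/d)², where d is the desired half-width and t is the two-sided critical value with n − 1 degrees of freedom, until n stops changing. To stay within the standard library, the t quantile is obtained numerically (Simpson's rule for the CDF, bisection for the quantile); all function names are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """CDF of t for x >= 0, integrating the density on [0, x] by Simpson's rule."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_quantile(p, df):
    """Quantile of t for p > 0.5, found by bisection on the CDF."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def required_n(sd, half_width, n_start, conf=0.95):
    """Iterate n = ceil((t * sd / half_width)^2) until n no longer changes."""
    n = n_start
    for _ in range(100):
        t = t_quantile(1 - (1 - conf) / 2, n - 1)
        n_new = math.ceil((t * sd / half_width) ** 2)
        if n_new == n:
            return n
        n = n_new
    raise RuntimeError("sample size iteration did not converge")

n_needed = required_n(sd=25, half_width=1, n_start=10)
print("required sample size:", n_needed)
```

Starting from the preliminary N = 10, the iteration settles after a few rounds because t changes very little once the degrees of freedom are large.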