Sampling and Confidence Interval Kenneth Kwan Ho Chui, PhD, MPH Department of Public Health and Community Medicine 617.636.0853.

Presentation transcript:

Sampling and Confidence Interval Kenneth Kwan Ho Chui, PhD, MPH Department of Public Health and Community Medicine Epidemiology/Biostatistics

Learning objectives in the syllabus: understand how a histogram can be read as a probability distribution; understand the importance of random sampling in statistics; understand how sample means can have distributions; explain the behavior (distribution) of sample means and the Central Limit Theorem; know how to interpret confidence intervals as seen in the medical literature; know how to calculate a confidence interval for a mean.

Population Parameter Sample statistics Sample Types of data How to summarize data Central tendency Variability How to evaluate graphs Distribution of sample means Know how to interpret and calculate a confidence interval for statistical inference

Assumed knowledge for today: mean; variance; standard deviation; the 68-95-99.7 rule

Central tendency: Mean. Consider a variable with data: 1, 2, 3, 3, 4, 4, 4, 5, 5, 6. Mean = sum of the values / number of values = 37 / 10 = 3.7

Variance & Standard deviation: for each observation, subtract the mean from its value and square the difference; sum them up; divide by (sample size – 1) to get the variance; SD = √Variance
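As a minimal sketch of these steps, the snippet below computes the mean, variance and SD for the ten values used on the mean slide. The variable names are illustrative and not part of the original slides.

```python
from math import sqrt

data = [1, 2, 3, 3, 4, 4, 4, 5, 5, 6]  # values from the "Central tendency: Mean" slide

n = len(data)
mean = sum(data) / n

# Squared deviation of each observation from the mean.
squared_deviations = [(x - mean) ** 2 for x in data]

# Sample variance divides by (n - 1), not n.
variance = sum(squared_deviations) / (n - 1)
sd = sqrt(variance)

print(f"mean = {mean:.2f}, variance = {variance:.2f}, SD = {sd:.2f}")
```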

The 68-95-99.7 rule (for normally distributed data): about 68% of observations are within ± 1 SD of the mean; about 95% are within ± 2 SD; about 99.7% are within ± 3 SD. # of SD: -3, -2, -1, 0, +1, +2, +3. Percentile: 0.15th, 2.5th, 16th, 50th, 84th, 97.5th, 99.85th
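The rounded percentages in this rule can be checked against the normal distribution itself. The sketch below uses Python's standard-library `NormalDist`; the helper name `share_within` is made up for this illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, SD 1

def share_within(k):
    """Proportion of a normal distribution lying within ±k SD of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within ±{k} SD: {share_within(k):.1%}")
```

The rule of thumb rounds these proportions, which is why ± 2 SD is quoted as 95% even though the exact multiplier for 95% is 1.96.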

Population parameter vs. sample statistics. Population parameter: the true mean BMI of Boston, Massachusetts (unknown to the researcher). Sample statistic: the mean BMI of a sample from Boston, Massachusetts.

Sample variation. [Figure: Researchers 1 to 4 each draw their own sample from the whole population, so their sample means differ; the mean of the whole population is the unknown (?).]

Central limit theorem

Central limit theorem. The means obtained from many samplings from the same population have the following properties: the distribution of the means is approximately normal if the sample size is big enough (above 120 or so), regardless of the population’s distribution; the mean of the sample means is equal to the population mean; the standard deviation of the sample means, known as the standard error of the mean (SEM), shrinks as the sample size grows (it is inversely proportional to the square root of the sample size), so if we repeat the experiment with a bigger sample size, the resulting histogram will be “slimmer”.

Understanding CLT through simulation. Population size: 10000. Possible values: 0 through 9, 1000 each. True population mean: 4.50

Simulation scheme: a population of 10000 (mean = 4.5) → sample (n = 500) → sample mean → repeat and plot a histogram of the sample means (frequency vs. sample mean)

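Sample size = 500; # of draws = . [Histogram: frequency of sample means, with the central 68%, 95% and 99% marked.] SD of the sample means (the SE) = 0.13. Within ±1 SE: 67.95%; within ±2 SE: 95.04%; within ±3 SE: 99.10%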
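The simulation can be approximated in a few lines. This is a sketch under stated assumptions, not the author's code: the number of draws (`N_DRAWS = 2000`) and the fixed seed are chosen here for speed and repeatability, and each sample is drawn without replacement.

```python
import random
from statistics import pstdev

random.seed(1)  # fixed seed so the run is repeatable (an assumption of this sketch)

# Population from the slide: values 0 through 9, 1000 of each.
population = [value for value in range(10) for _ in range(1000)]

SAMPLE_SIZE = 500
N_DRAWS = 2000  # assumption: not taken from the slide

# Draw many samples and keep the mean of each one.
sample_means = [
    sum(random.sample(population, SAMPLE_SIZE)) / SAMPLE_SIZE
    for _ in range(N_DRAWS)
]

centre = sum(sample_means) / N_DRAWS
se = pstdev(sample_means)  # SD of the sample means = standard error

def share_within(k):
    """Proportion of sample means within ±k SE of their centre."""
    return sum(abs(m - centre) <= k * se for m in sample_means) / N_DRAWS

print(f"mean of sample means = {centre:.2f}, SE = {se:.2f}")
for k in (1, 2, 3):
    print(f"within ±{k} SE: {share_within(k):.2%}")
```

The exact percentages vary from run to run and with the seed, so they will not match the slide digit for digit.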

Characteristics of the distribution of means. In the previous slide, the mean 4.5 is the true population mean, for which we have a Greek name, μ (mu). The SD 0.13, however, is not the population SD, σ (sigma); it is the SD of the sample means, which equals σ/√n. We call this SD of means the “standard error of the mean” (SEM) or “standard error” (SE). SE can be estimated using the sample SD: SE = SD / √n, where n is the sample size.
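A minimal sketch of that estimate, assuming the usual formula SE = sample SD / √n. The function name `standard_error` is hypothetical, and the data are simply the ten values from the mean slide reused for illustration.

```python
from math import sqrt
from statistics import stdev

def standard_error(sample):
    """Estimate the standard error of the mean as sample SD / sqrt(n)."""
    return stdev(sample) / sqrt(len(sample))

sample = [1, 2, 3, 3, 4, 4, 4, 5, 5, 6]  # illustrative data, reused from the mean slide
print(f"SE = {standard_error(sample):.2f}")
```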

Why bigger sample sizes are often better. [Histograms of sample means for three sample sizes.] Sample size = 200: SE = 0.20; sample size = 500: SE = 0.13; sample size = 1000: SE = 0.08
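The shrinking SE can also be worked out from theory. The sketch below assumes the simulated population (values 0 through 9, 1000 of each) and the formula σ/√n; `theoretical_se` is a name invented here. The slide's figures come from simulation, so they can differ slightly from these values.

```python
from math import sqrt
from statistics import pstdev

# Population from the simulation slides: values 0 through 9, 1000 of each.
population = [value for value in range(10) for _ in range(1000)]
sigma = pstdev(population)  # population SD

def theoretical_se(n):
    """Standard error of the mean for samples of size n: sigma / sqrt(n)."""
    return sigma / sqrt(n)

for n in (200, 500, 1000):
    print(f"n = {n}: SE = {theoretical_se(n):.2f}")
```

Because the SE depends on √n, halving it takes about four times as many subjects.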

Confidence interval

I got CLT, so now what? The histogram can be viewed as a “probability distribution”. The sample mean from a researcher can be any pixel under the bell curve. How should we define “acceptably close” to the population mean? [Figure: bell curve with the central 95% marked.]

The confidence interval 95%

If we put a CI on every sample mean, about 95% of them would include the true mean. The two red ones are the “unlucky” samples whose CIs do not include the true mean.
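This coverage claim can be checked by simulation. The sketch below reuses the population from the CLT simulation (values 0 through 9, 1000 of each); the number of repeated samples, the seed and the helper name `ci95` are assumptions made for this example.

```python
import random
from math import sqrt
from statistics import stdev

random.seed(2)  # fixed seed so the run is repeatable

population = [value for value in range(10) for _ in range(1000)]
TRUE_MEAN = 4.5
SAMPLE_SIZE = 500
N_SAMPLES = 1000  # assumption: number of repeated samples

def ci95(sample):
    """95% CI for the mean: sample mean ± 1.96 × (sample SD / sqrt(n))."""
    m = sum(sample) / len(sample)
    se = stdev(sample) / sqrt(len(sample))
    return m - 1.96 * se, m + 1.96 * se

hits = 0
for _ in range(N_SAMPLES):
    low, high = ci95(random.sample(population, SAMPLE_SIZE))
    if low <= TRUE_MEAN <= high:
        hits += 1  # this interval includes the true mean

coverage = hits / N_SAMPLES
print(f"{coverage:.1%} of the intervals include the true mean")
```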

Interpretation of a confidence interval. The mean and 95% confidence interval (CI) of the blood glucose of a sample are: 140 mg/dl (95%CI: 120, 160). We are 95% certain that the true population mean glucose falls between 120 and 160 mg/dl. Our best estimate is 140 mg/dl (i.e. the sample mean). Why only 95% certain? Because the sample mean can be, unfortunately, an extreme one beyond ± 2 SE (the blue zones).

Some common CIs and their z-score multipliers. There are two numbers in a confidence interval: the lower and upper confidence limits. 90%CI: Mean ± 1.65 × SE. 95%CI: Mean ± 1.96 × SE (2.00 is an approximation, 1.96 is recommended); this is the most commonly used criterion. 99%CI: Mean ± 2.58 × SE. The more certain we want to be that the interval includes the true mean, the wider the CI becomes: “I am 100% certain that the true mean is between –∞ and ∞.”
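The multipliers come from the standard normal distribution, which the sketch below shows using the standard library. The function names are hypothetical, and the example numbers (mean 155 cm, SE 1 cm) are borrowed from the height quiz later in the deck.

```python
from statistics import NormalDist

def z_multiplier(confidence):
    """Two-sided z multiplier for a confidence level such as 0.95."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def confidence_interval(mean, se, confidence=0.95):
    """Lower and upper confidence limits: mean ± z × SE."""
    z = z_multiplier(confidence)
    return mean - z * se, mean + z * se

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} CI multiplier: {z_multiplier(level):.3f}")

low, high = confidence_interval(155, 1)  # mean 155 cm, SE 1 cm
print(f"95% CI: {low:.2f} to {high:.2f} cm")
```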

How to narrow down a confidence interval? Lower our certainty by opting for, say, a 90%CI instead of a 95%CI; decrease the sample standard deviation (for instance, by using a more precise measurement device); increase the sample size.
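A small sketch of how each of these three levers changes the width of a CI for a mean. The baseline (SD 10, n = 100) is borrowed from the height quiz; the alternative values and the function name `ci_width` are assumptions chosen only to illustrate the direction of the effect.

```python
from math import sqrt
from statistics import NormalDist

def ci_width(sd, n, confidence=0.95):
    """Full width of a CI for a mean: 2 × z × SD / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return 2 * z * sd / sqrt(n)

baseline = ci_width(sd=10, n=100)                       # 95% CI, SD 10, n = 100
less_certain = ci_width(sd=10, n=100, confidence=0.90)  # accept a 90% CI
smaller_sd = ci_width(sd=5, n=100)                      # halve the SD
bigger_n = ci_width(sd=10, n=400)                       # quadruple the sample size

print(f"baseline {baseline:.2f}, 90% CI {less_certain:.2f}, "
      f"SD halved {smaller_sd:.2f}, n x4 {bigger_n:.2f}")
```

Halving the SD and quadrupling the sample size shrink the interval by the same amount, because the width depends on SD / √n.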

Are confidence intervals always symmetric? Not always. CIs for untransformed continuous variables are symmetric. However, CIs for other statistics such as odds ratios and relative risks are calculated on a logarithmic scale. When back-transformed to the ratio scale, the interval will be asymmetric: “Multivariable analysis revealed a more than 2-fold increase in the risk of total stroke among men with job strain (combination of high job demand and low job control) (hazard ratio, 2.73; 95% confidence interval, )”
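The asymmetry is easy to see with made-up numbers. In the sketch below, the ratio (2.0) and the standard error of its logarithm (0.30) are hypothetical values, not taken from the quoted study, and `ratio_ci` is a name invented for this example.

```python
from math import exp, log

def ratio_ci(ratio, se_log_ratio, z=1.96):
    """CI for a ratio: build it on the log scale, then back-transform."""
    centre = log(ratio)
    return exp(centre - z * se_log_ratio), exp(centre + z * se_log_ratio)

estimate = 2.0                       # hypothetical ratio
low, high = ratio_ci(estimate, 0.30)  # hypothetical SE of log(ratio)

print(f"ratio {estimate:.2f}, 95% CI {low:.2f} to {high:.2f}")
print(f"distance below: {estimate - low:.2f}, distance above: {high - estimate:.2f}")
```

The interval is symmetric around log(ratio) but reaches further above the estimate than below it once transformed back.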

Quiz. A study recruited 100 subjects and examined their height. The mean of their heights is 155 cm. What is the most likely type of data? A) Binary B) Nominal C) Ordinal D) Continuous

Quiz. A study recruited 100 subjects and examined their height. The mean of their heights is 155 cm. If the median of their heights is 140 cm, the height variable is likely to be: A) Normally distributed B) Skewed to the left (negatively skewed) C) Skewed to the right (positively skewed)

Quiz. A study recruited 100 subjects and examined their height. The mean ± SD of their height is 155 ± 10 cm. Assume the height data are normally distributed. Which of the following is false? A) 16% of the subjects are shorter than 145 cm B) The standard error of the mean is 10/√100 = 1 cm C) We are 95% certain that the sample mean is between (155 ± 1.96 × standard error) cm D) We are 95% certain that the population mean is between (155 ± 1.96 × standard error) cm

Another application. Other than estimating the true mean, μ, we can also assume μ to be a certain hypothesized value. Then, we can sample and derive the sample mean and 95%CI. If the 95%CI does not include the assumed μ (the sample mean falls into the blue zones), we can then conclude that our sample is, probability-wise, weird; it is perhaps different from the assumed population. This is the foundation of hypothesis testing (we’ll learn it next week!)
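As a rough sketch of this idea, the function below checks whether a 95%CI excludes a hypothesized μ. The sample mean (155) and SE (1) are borrowed from the height quiz; the hypothesized values 150 and 154 and the function name are invented for illustration.

```python
def ci_excludes(hypothesized_mean, sample_mean, se, z=1.96):
    """Return True if the CI (sample mean ± z × SE) does not include the hypothesized mean."""
    low = sample_mean - z * se
    high = sample_mean + z * se
    return not (low <= hypothesized_mean <= high)

# Sample mean 155 cm and SE 1 cm give a 95% CI of roughly 153.04 to 156.96 cm.
print(ci_excludes(150, sample_mean=155, se=1))  # hypothesized mean outside the CI
print(ci_excludes(154, sample_mean=155, se=1))  # hypothesized mean inside the CI
```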