Descriptive and inferential statistics

Descriptive and inferential statistics Asst. Prof. Georgi Iskrov, PhD Department of Social Medicine

Before we start: http://www.raredis.work/edu/ Lecture slides to be updated!

Outline: Statistics; Sample, population and sampling; Descriptive and inferential statistics; Types of variables and levels of measurement; Measures of central tendency and spread; Normal distribution; Confidence intervals; Sample size calculation; Hypothesis testing; Significance, power and errors; Normality tests.

Why do we need to use statistical methods? To make the strongest possible conclusions from limited amounts of data; To generalize from a particular set of data to a more general conclusion. What do we need to pay attention to? Bias; Probability.

Population vs Sample: a population is described by parameters (μ, σ); a sample is described by statistics (x̄, s).

Population vs Sample. A population includes all objects of interest, whereas a sample is only a portion of the population: Parameters are associated with populations and statistics with samples; Parameters are usually denoted using Greek letters (μ, σ), while statistics are usually denoted using Roman letters (x̄, s). There are several reasons why we do not work with populations: They are usually large, and it is often impossible to get data for every object we are studying; Sampling does not usually occur without cost: the more items surveyed, the larger the cost.

Inferential statistics. Sampling takes us from the population (with its parameters) to the sample (with its statistics); inferential statistics takes us from the sample back to the population.

Descriptive vs Inferential statistics We compute statistics and use them to estimate parameters. The computation is the first part of the statistical analysis (Descriptive Statistics) and the estimation is the second part (Inferential Statistics). Descriptive statistics: The procedure used to organize and summarize masses of data. Inferential statistics: The methods used to find out something about a population, based on a sample.

Sampling Individuals in the population vary from one another with respect to an outcome of interest.

Sampling. When a sample is drawn, there is no certainty that it will be representative of the population: two samples, A and B, drawn from the same population may differ from each other and from the population.

Sampling Random sample: In random sampling, each item or element of the population has an equal chance of being chosen at each draw. While this is the preferred way of sampling, it is often difficult to do. It requires that a complete list of every element in the population be obtained. Computer generated lists are often used with random sampling. Properties of a good sample: Random selection; Representativeness by structure; Representativeness by number of cases.

Sampling Systematic sampling: The list of elements is “counted off”. That is, every k-th element is taken. This is similar to lining everyone up and numbering off “1,2,3,4; 1,2,3,4; etc”. When done numbering, all people numbered 4 would be used. Convenience sampling: In convenience sampling, readily available data is used. That is, the first people the surveyor runs into.

Sampling Cluster sampling: It is accomplished by dividing the population into groups (clusters), usually geographically. The clusters are randomly selected, and each element in the selected clusters is used. Stratified sampling: It also divides the population into groups, called strata; however, this time it is by some characteristic, not geographically. For instance, the population might be separated into males and females. A sample is taken from each of these strata using either random, systematic, or convenience sampling.

Random and systematic errors Random error can be conceptualized as sampling variability. Bias (systematic error) is a difference between an observed value and the true value due to all causes other than sampling variability. A biased sample is one in which the method used to create the sample results in samples that are systematically different from the population. Accuracy is a general term denoting the absence of error of all kinds.

Sample size calculation Law of Large Numbers: As the number of trials of a random process increases, the percentage difference between the expected and actual values goes to zero. Application in biostatistics: Bigger sample size, smaller margin of error. A properly designed study will include a justification for the number of experimental units (people/animals) being examined. Sample size calculations are necessary to design experiments that are large enough to produce useful information and small enough to be practical.
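
A minimal Python sketch of the Law of Large Numbers in action (simulated coin flips; illustrative, not part of the original slides):

```python
import numpy as np

rng = np.random.default_rng(42)

# Fair coin: the expected proportion of heads is 0.5. As the number of
# trials grows, the observed proportion converges to the expected value.
for n in (10, 100, 1_000, 10_000, 100_000):
    flips = rng.integers(0, 2, size=n)
    print(f"n = {n:>6}: observed proportion = {flips.mean():.4f}")
```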

Sample size calculation Generally, the sample size for any study depends on: Acceptable level of confidence; Power of the study; Expected effect size and absolute error of precision; Underlying scatter in the population. High power requires a large sample size, a large effect and little scatter; low power results from a small sample size, a small effect and lots of scatter.

Sample size calculation For quantitative variables: n = (Z × SD / d)², where Z is the standard normal value for the chosen confidence level, SD is the standard deviation, and d is the absolute error of precision.

Sample size calculation For quantitative variables: A researcher is interested in knowing the average systolic blood pressure in the pediatric age group at a 95% level of confidence and a precision of 5 mmHg. The standard deviation, based on previous studies, is 25 mmHg.
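
Plugging these numbers into the formula above (a worked sketch; rounding up to the next whole subject is standard practice):

```python
import math

# Sample size for estimating a mean with absolute precision d:
# n = (Z * SD / d)^2
Z = 1.96    # standard normal value for 95% confidence
SD = 25.0   # mmHg, from previous studies
d = 5.0     # mmHg, absolute error of precision

n = (Z * SD / d) ** 2
print(math.ceil(n))  # 96.04 -> 97 children needed
```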

Sample size calculation For qualitative variables: n = Z² × p(1 − p) / d², where Z is the standard normal value for the chosen confidence level, p is the expected proportion in the population, and d is the absolute error of precision.

Sample size calculation For qualitative variables: A researcher is interested in knowing the proportion of diabetes patients having hypertension. According to a previous study, this proportion is no more than 15%. The researcher wants to estimate it with a 5% absolute precision error at a 95% confidence level.
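
The corresponding calculation for a proportion (a worked sketch under the same conventions):

```python
import math

# Sample size for estimating a proportion with absolute precision d:
# n = Z^2 * p * (1 - p) / d^2
Z = 1.96    # standard normal value for 95% confidence
p = 0.15    # expected proportion of hypertension among diabetes patients
d = 0.05    # absolute error of precision

n = Z**2 * p * (1 - p) / d**2
print(math.ceil(n))  # 195.9 -> 196 patients needed
```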

Variables Different types of data require different kinds of analyses: A frequency distribution can be reported for all four levels of measurement (nominal, ordinal, interval, ratio); The median and percentiles require at least ordinal data; The mean and standard deviation require interval or ratio data.

Levels of measurement There are four levels of measurement: nominal, ordinal, interval and ratio. These go from the lowest level to the highest level, and data are classified according to the highest level into which they fit. Each additional level adds something the previous level did not have: Nominal is the lowest level; only names are meaningful here; Ordinal adds an order to the names; Interval adds meaningful differences; Ratio adds a zero, so that ratios are meaningful.

Levels of measurement Nominal scale – e.g., genotype: you can code it with numbers, but the order is arbitrary and any calculations would be meaningless. Ordinal scale – e.g., pain score from 1 to 10: the order matters, but not the difference between values. Interval scale – e.g., temperature in °C: the difference between two values is meaningful. Ratio scale – e.g., height: it has a clear definition of 0; when the variable equals 0, there is none of that variable. When working with ratio variables, but not interval variables, you can look at the ratio of two measurements.

Central tendency and spread Central tendency: mean, mode and median. Spread: range, interquartile range, standard deviation. Common mistakes: focusing only on the mean and ignoring the variability; confusing the standard deviation with the standard error of the mean; confusing variation with variance. What is best to use in different scenarios? Symmetrical data: mean and standard deviation. Skewed data: median and interquartile range.

Normal (Gaussian) distribution When data are approximately normally distributed: approximately 68% of the data lie within one SD of the mean; approximately 95% of the data lie within two SDs of the mean; approximately 99.7% of the data lie within three SDs of the mean.
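
These percentages follow directly from the standard normal cumulative distribution function; a minimal check in Python:

```python
from scipy.stats import norm

# Probability mass within k standard deviations of the mean of a
# normal distribution: P(-k < Z < k) = Phi(k) - Phi(-k)
for k in (1, 2, 3):
    p = norm.cdf(k) - norm.cdf(-k)
    print(f"within {k} SD: {p:.4f}")  # 0.6827, 0.9545, 0.9973
```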

Normal (Gaussian) distribution Central limit theorem: Create a population with a known distribution that is not normal; Randomly select many samples of equal size from that population; Tabulate the means of these samples and graph the frequency distribution. The central limit theorem states that if your samples are large enough, the distribution of the means will approximate a normal distribution even if the population is not Gaussian. Common pitfalls: confusing "normal" (Gaussian) with "common" or "disease-free"; few biological distributions are exactly normal.
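
The three steps above can be simulated directly; a minimal sketch using a markedly non-normal (exponential) population:

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1: a known, non-normal population (exponential, mean = 1).
population = rng.exponential(scale=1.0, size=1_000_000)

# Steps 2-3: many samples of equal size; tabulate their means.
sample_means = [rng.choice(population, size=50).mean() for _ in range(5_000)]

# The means cluster symmetrically around the population mean (1.0),
# with spread close to sigma / sqrt(n) = 1 / sqrt(50), about 0.14.
print(np.mean(sample_means), np.std(sample_means))
```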

Confidence interval for the population mean Population mean: point estimate vs interval estimate. Standard error of the mean – how close the sample mean is likely to be to the population mean. Assumptions: a random representative sample, independent observations, the population is normally distributed (at least approximately). The confidence interval depends on: sample mean, standard deviation, sample size, and degree of confidence. Common misconceptions: that 95% of the values lie within the 95% CI; that a 95% CI covers the mean ± 2 SD.

Confidence interval for the population mean The duration of time from first exposure to HIV infection to AIDS diagnosis is called the incubation period. The incubation periods (in years) of a random sample of 30 HIV-infected individuals are: 12.0, 10.5, 9.5, 6.3, 13.5, 12.5, 7.2, 12.0, 10.5, 5.2, 9.5, 6.3, 13.1, 13.5, 12.5, 10.7, 7.2, 14.9, 6.5, 8.1, 7.9, 12.0, 6.3, 7.8, 6.3, 12.5, 5.2, 13.1, 10.7, 7.2. Calculate the 95% CI for the population mean incubation period in HIV. x̄ = 9.5 years; SD = 2.8 years; SEM = 0.5 years. 95% level of confidence => Z = 1.96. µ = 9.5 ± (1.96 × 0.5) = 9.5 ± 1 years. 95% CI for µ is (8.5; 10.5) years.
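
This calculation is easy to reproduce; a sketch in Python (note that the slide uses rounded summary statistics, so the interval computed directly from the raw data comes out slightly different, roughly (8.6; 10.7)):

```python
import numpy as np
from scipy.stats import norm

incubation = np.array([12.0, 10.5, 9.5, 6.3, 13.5, 12.5, 7.2, 12.0, 10.5, 5.2,
                       9.5, 6.3, 13.1, 13.5, 12.5, 10.7, 7.2, 14.9, 6.5, 8.1,
                       7.9, 12.0, 6.3, 7.8, 6.3, 12.5, 5.2, 13.1, 10.7, 7.2])

mean = incubation.mean()
sem = incubation.std(ddof=1) / np.sqrt(len(incubation))  # standard error of the mean
z = norm.ppf(0.975)  # 1.96 for 95% confidence

print(f"95% CI: ({mean - z * sem:.1f}; {mean + z * sem:.1f}) years")
```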

Confidence interval for the population mean x̄ = 9.5 years; SD = 2.8 years; SEM = 0.5 years. 95% level of confidence => Z = 1.96; µ = 9.5 ± (1.96 × 0.5) = 9.5 ± 1 years; 95% CI for µ is (8.5; 10.5) years. 99% level of confidence => Z = 2.58; µ = 9.5 ± (2.58 × 0.5) = 9.5 ± 1.3 years; 99% CI for µ is (8.2; 10.8) years.

Is there a difference? Hypothesis testing. Type 2 diabetes study – Experimental group: mean blood sugar level 103 mg/dl; Control group: mean blood sugar level 107 mg/dl. Pancreatic cancer study – Experimental group: 1-year survival rate 23%; Control group: 1-year survival rate 20%. Is there a difference?

Hypothesis testing The general idea of hypothesis testing involves: Making an initial assumption; Collecting evidence (data); Based on the available evidence (data), deciding whether to reject or not reject the initial assumption. Every hypothesis test – regardless of the population parameter involved – requires the above three steps.

Null hypothesis – H0 This is the hypothesis under test, denoted as H0. The null hypothesis is usually stated as the absence of a difference or an effect; The null hypothesis says there is no effect; The null hypothesis is rejected if the significance test shows the data are inconsistent with the null hypothesis.

Alternative hypothesis – H1 This is the alternative to the null hypothesis. It is denoted as H', H1, or HA. It is usually the complement of the null hypothesis; If, for example, the null hypothesis says two population means are equal, the alternative says the means are unequal.

Criminal trial Criminal justice system assumes “the defendant is innocent until proven guilty”. That is, our initial assumption is that the defendant is innocent. In the practice of statistics, we make our initial assumption when we state our two competing hypotheses – the null hypothesis (H0) and the alternative hypothesis (HA). Here, our hypotheses are: H0: Defendant is not guilty (innocent); HA: Defendant is guilty; In statistics, we always assume the null hypothesis is true. That is, the null hypothesis is always our initial assumption.

Criminal trial The prosecution team then collects evidence with the hopes of finding “sufficient evidence” to make the assumption of innocence refutable. In statistics, the data are the evidence. The jury then makes a decision based on the available evidence: If the jury finds sufficient evidence – beyond a reasonable doubt – to make the assumption of innocence refutable, the jury rejects H0 and deems the defendant guilty. We behave as if the defendant is guilty. If there is insufficient evidence, then the jury does not reject H0. We behave as if the defendant is innocent.

Making the decision Recall that it is either likely or unlikely that we would observe the evidence we did given our initial assumption. If it is likely, we do not reject the null hypothesis; If it is unlikely, then we reject the null hypothesis in favor of the alternative hypothesis; Effectively, then, making the decision reduces to determining “likely” or “unlikely”.

Making the decision In statistics, there are two ways to determine whether the evidence is likely or unlikely given the initial assumption: We could take the “critical value approach” (favored in many of the older textbooks). Or, we could take the “p-value approach” (what is used most often in research, journal articles, and statistical software).

Making the decision Suppose we find a difference between two groups in survival: patients on a new drug have a survival of 15 months; patients on the old drug have a survival of 18 months. So, the difference is 3 months. Do we accept or reject the hypothesis of no true difference between the groups (the two drugs)? Is a difference of 3 a lot, statistically speaking – a huge difference that is rarely seen? Or is it not much – the sort of thing that happens all the time?

Making the decision A statistical test tells you how often you would get a difference of 3, simply by chance, if the null hypothesis is correct – no real difference between the two groups. Suppose the test is done and its result is that p = 0.32. This means that you’d get a difference of 3 quite often just by the play of chance – 32 times in 100 – even when there is in reality no true difference between the groups.

Making the decision On the other hand, if we did the statistical analysis and p = 0.0001, then you would only get a difference as big as 3 by the play of chance 1 time in 10 000. That is so rare that we would reject our hypothesis of no difference: there is something different about the new therapy.

Hypothesis testing Somewhere between 0.32 and 0.0001 we may not be sure whether to reject the null hypothesis or not. Mostly we reject the null hypothesis when, if the null hypothesis were true, the result we got would have happened less than 5 times in 100 by chance. This is the "conventional" cutoff of 5%, or p < 0.05. This cutoff is commonly used, but it is arbitrary: there is no particular reason why we use 0.05 rather than 0.06 or 0.048 or whatever.

Hypothesis testing The four possible outcomes of the decision: If the null hypothesis is true, rejecting it is a Type I error, while not rejecting it is the correct decision (no error); If the null hypothesis is false, rejecting it is the correct decision (no error), while not rejecting it is a Type II error.

Type I and II errors A type I error is the incorrect rejection of a true null hypothesis (also known as a "false positive" finding). The probability of a type I error is denoted by the Greek letter α (alpha). A type II error is incorrectly retaining a false null hypothesis (also known as a "false negative" finding). The probability of a type II error is denoted by the Greek letter β (beta).

Level of significance Level of significance (α) – the threshold for declaring whether a result is significant. If the null hypothesis is true, α is the probability of rejecting the null hypothesis. α is decided as part of the research design, while the p-value is computed from the data. α = 0.05 is most commonly used. A small α value reduces the chance of a Type I error but increases the chance of a Type II error; the trade-off is based on the consequences of Type I (false-positive) and Type II (false-negative) errors.

Power Power – the probability of rejecting a false null hypothesis. Statistical power is inversely related to β or the probability of making a Type II error (power is equal to 1 – β). Power depends on the sample size, variability, significance level and hypothetical effect size. You need a larger sample when you are looking for a small effect and when the standard deviation is large.

Choosing a statistical test Choice of a statistical test depends on: Level of measurement for the dependent and independent variables Number of groups or dependent measures Number of units of observation Type of distribution The population parameter of interest (mean, variance, differences between means and/or variances)

Choosing a statistical test Multiple comparisons – two or more data sets to be analyzed: either repeated measurements made on the same individuals, or entirely independent samples. Degrees of freedom – the number of scores, items, or other units in the data set that are free to vary. One- and two-tailed tests – a one-tailed test of significance is used for a directional hypothesis; two-tailed tests are used in all other situations. Sample size – the number of cases on which data have been obtained. Which of the basic characteristics of a distribution are more sensitive to the sample size?

Student's t-test

2-sample t-test Aim: Compare two means Example: Comparing pulse rate in people taking two different drugs Assumption: Both data sets are sampled from Gaussian distributions with the same population standard deviation Effect size: Difference between two means Null hypothesis: The two population means are identical Meaning of P value: If the two population means are identical, what is the chance of observing such a difference (or a bigger one) between means by chance alone?
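
A minimal sketch with SciPy (the pulse-rate values are made up for illustration):

```python
from scipy.stats import ttest_ind

# Pulse rates (beats/min) in two independent drug groups (illustrative data)
drug_a = [72, 75, 70, 78, 74, 69, 71, 76]
drug_b = [80, 77, 83, 79, 81, 78, 84, 76]

# By default ttest_ind assumes equal population standard deviations,
# matching the assumption stated above.
t, p = ttest_ind(drug_a, drug_b)
print(f"t = {t:.2f}, p = {p:.4f}")
```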

Paired t-test Aim: Compare a continuous variable before and after an intervention Example: Comparing pulse rate before and after taking a drug Assumption: The population of paired differences is Gaussian Effect size: Mean of the paired differences Null hypothesis: The population mean of paired differences is zero Meaning of P value: If there is no difference in the population, what is the chance of observing such a difference (or a bigger one) between means by chance alone?
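
The paired version in SciPy (again with made-up data):

```python
from scipy.stats import ttest_rel

# Pulse rate in the same subjects before and after taking a drug (illustrative)
before = [74, 78, 71, 80, 76, 73, 79, 75]
after = [70, 75, 69, 76, 74, 70, 77, 72]

# Tests whether the population mean of the paired differences is zero.
t, p = ttest_rel(before, after)
print(f"t = {t:.2f}, p = {p:.4f}")
```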

One-way ANOVA Aim: Compare three or more means Example: Comparing pulse rate in 3 groups of people, each group taking a different drug Assumption: All data sets are sampled from Gaussian distributions with the same population standard deviation Effect size: Fraction of the total variation explained by variation among group means Null hypothesis: All population means are identical Meaning of P value: If the population means are identical, what is the chance of observing such a difference (or a bigger one) between means by chance alone?
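
And the one-way ANOVA (illustrative data; SciPy reports the F statistic and p-value rather than the explained-variation effect size):

```python
from scipy.stats import f_oneway

# Pulse rates (beats/min) in three independent drug groups (illustrative data)
group1 = [72, 75, 70, 78, 74]
group2 = [80, 77, 83, 79, 81]
group3 = [68, 71, 66, 70, 69]

# Null hypothesis: all three population means are identical.
f, p = f_oneway(group1, group2, group3)
print(f"F = {f:.2f}, p = {p:.4f}")
```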

Parametric and non-parametric tests Parametric test – assumes that the variable we have measured in the sample is normally distributed in the population to which we plan to generalize our findings. Non-parametric test – distribution-free: makes no assumption about the distribution of the variable in the population.

Normality test Normality tests are used to determine whether a data set is well modeled by a normal distribution and to compute how likely it is for a random variable underlying the data set to be normally distributed. In descriptive statistics terms, a normality test measures the goodness of fit of a normal model to the data: if the fit is poor, the data are not well modeled in that respect by a normal distribution, without making a judgment on any underlying variable. In frequentist statistical hypothesis testing, data are tested against the null hypothesis that they are normally distributed.

Normality test Graphical methods An informal approach to testing normality is to compare a histogram of the sample data to a normal probability curve. The empirical distribution of the data (the histogram) should be bell-shaped and resemble the normal distribution. This might be difficult to see if the sample is small.

Normality test Frequentist tests of univariate normality include: D'Agostino's K-squared test; Jarque–Bera test; Anderson–Darling test; Cramér–von Mises criterion; Lilliefors test; Kolmogorov–Smirnov test; Shapiro–Wilk test; etc.
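
Several of these are available in SciPy; for example, the Shapiro–Wilk test (a sketch with simulated data):

```python
import numpy as np
from scipy.stats import shapiro

rng = np.random.default_rng(1)
x = rng.normal(loc=100, scale=15, size=50)  # simulated, genuinely normal data

# Null hypothesis: the data come from a normal distribution.
W, p = shapiro(x)
print(f"W = {W:.3f}, p = {p:.3f}")  # a large p-value: no evidence against normality
```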

Normality test Kolmogorov–Smirnov test K–S test is a nonparametric test of the equality of distributions that can be used to compare a sample with a reference distribution (1-sample K–S test), or to compare two samples (2-sample K–S test). K–S statistic quantifies a distance between the empirical distribution of the sample and the cumulative distribution of the reference distribution, or between the empirical distributions of two samples. The null hypothesis is that the sample is drawn from the reference distribution (in the 1-sample case) or that the samples are drawn from the same distribution (in the 2-sample case).

Normality test Kolmogorov–Smirnov test In the special case of testing for normality of the distribution, samples are standardized and compared with a standard normal distribution. This is equivalent to setting the mean and variance of the reference distribution equal to the sample estimates, and it is known that using these to define the specific reference distribution changes the null distribution of the test statistic.
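
A sketch of this 1-sample K–S normality test in SciPy (simulated data; as noted above, standardizing with the sample mean and SD makes the tabulated p-value only approximate, which is what the Lilliefors test corrects for):

```python
import numpy as np
from scipy.stats import kstest

rng = np.random.default_rng(2)
x = rng.normal(loc=50, scale=10, size=100)

# Standardize with sample estimates, then compare with N(0, 1).
z = (x - x.mean()) / x.std(ddof=1)
D, p = kstest(z, "norm")
print(f"D = {D:.3f}, p = {p:.3f}")  # p is approximate because parameters were estimated
```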