Outline Sampling Measurement Descriptive Statistics:

Presentation transcript:

2017 Statistics Review John Glenn College of Public Affairs Aditi Vaishali Thapar thapar.9@osu.edu

Outline
Sampling: Terms; An Example
Measurement
Descriptive Statistics: Measures of Central Tendency; Measures of Dispersion; The Normal Distribution
Inferential Statistics: Correlation vs. Causation; Hypothesis Testing; P-values; Standard Error; Confidence Intervals and Z-Scores

Sampling: Population vs. Sample
Population: The entire group of people or things about which we want information. It is unlikely that we will be able to collect data for the entire population.
Sample: A representative portion of the population about which data is collected.

Sampling: Statistics vs. Parameters
Parameters summarise data for an entire population.
Statistics summarise data for a sample.
Unit of Analysis: The entity that is being analyzed in a study.
Variable: A characteristic of the unit of analysis.
Image source: https://www.cliffsnotes.com/study-guides/statistics/sampling/populations-samples-parameters-and-statistics

Sampling Terms: Example What is the demographic information for students who attend statistics boot camp? Population: Sample: Unit of Analysis: Variables: Parameter: Statistics:

Sampling Terms: Example
What is the demographic information for students who attend statistics boot camp?
Population: All students who attend statistics boot camp
Sample: 20 randomly selected students at statistics boot camp
Unit of Analysis: The individual (i.e. student)
Variables: Age, gender, income, race, etc.
Parameter: Average age of all students at statistics boot camp, etc.
Statistics: Average age of the 20 randomly selected students at boot camp, etc.
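
To make the parameter/statistic distinction concrete, here is a minimal Python sketch. The population of 200 ages is invented purely for illustration (the slides give no data); only the sample size of 20 comes from the example above.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: ages of all 200 students at boot camp (made-up data).
population_ages = [random.randint(21, 45) for _ in range(200)]

# Parameter: a summary computed from the entire population.
parameter_mean_age = statistics.mean(population_ages)

# Statistic: the same summary computed from a random sample of 20 students.
sample_ages = random.sample(population_ages, 20)
statistic_mean_age = statistics.mean(sample_ages)

print(f"Population mean age (parameter): {parameter_mean_age:.2f}")
print(f"Sample mean age (statistic):     {statistic_mean_age:.2f}")
```

The two numbers will usually be close but not identical, which is why the later slides on standard error and confidence intervals matter.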

Measurement: Nominal and Ordinal
Nominal: Numerical values just "name" the attribute uniquely. No ordering of the cases is implied. Example: Numbers on football/basketball jerseys.
Ordinal: Attributes can be rank-ordered numerically, but distances between attributes do not have any meaning. Example: Coding educational attainment as 0 = less than high school, 1 = high school degree, 2 = college degree, 3 = Masters, PhD, etc.

Measurement: Interval and Ratio
Interval: The distance between attributes does have meaning. Example: When measuring temperature, the distance between 30F and 40F is the same as that between 70F and 80F.
Ratio: There is always an absolute zero that is meaningful, i.e. you can construct a meaningful fraction/ratio.
Source: http://www.socialresearchmethods.net/kb/measlevl.php

Measures of Central Tendency
Measures of central tendency tell us where the centre of the data lies.
Mean: Also known as the average. Add up all the values for your variable, then divide by the total number of values.
Median: The middle score for a set of data that has been arranged in order of magnitude.
Mode: The most frequent value in the dataset.

Which Measure Should We Use?
It depends on both the type of variable and the distribution of the data.
Mode: Typically used when we have categorical data (e.g. gender, race, educational attainment).
Mean: When we want the average value of a variable, UNLESS our data is skewed.
Median: When we have skewed data and/or outliers.
Question: What measure of central tendency would you use to calculate the average salary for a group of 10 people where 9 people earn $1 and 1 person earns $100?
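
The salary question can be checked directly. This is a small sketch using Python's standard `statistics` module; the variable names are illustrative.

```python
import statistics

# Nine people earn $1 and one person earns $100, as in the question above.
salaries = [1] * 9 + [100]

mean_salary = statistics.mean(salaries)      # pulled upward by the single outlier
median_salary = statistics.median(salaries)  # middle of the ordered values
mode_salary = statistics.mode(salaries)      # most frequent value

print(f"Mean:   {mean_salary}")
print(f"Median: {median_salary}")
print(f"Mode:   {mode_salary}")
```

The mean comes out at $10.90, a salary nobody in the group actually earns, while the median stays at $1. That is the sense in which the median is the safer choice for skewed data.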

Measures of Dispersion
Dispersion describes the spread of the data.
Range: | Maximum – Minimum |
Variance: How far, on average, the observations in the sample lie from the mean, measured as the average squared deviation.
Standard Deviation: The square root of the variance. A low standard deviation tells us that data points tend to be close to the mean.

Measures of Dispersion
Question: Given the data below on test scores, what is the sample size (N), mean, median, mode, range, standard deviation and variance?
6, 10, 8, 7, 4, 9, 3, 0, 6, 6

Measures of Dispersion
Answer: Start by ordering the data in order of magnitude: 0, 3, 4, 6, 6, 6, 7, 8, 9, 10
Sample size: 10
Mean: $\frac{0+3+4+6+6+6+7+8+9+10}{10} = 5.9$
Median: 6
Mode: 6
Range: 10 – 0 = 10
Variance: 8.77, calculated using $\frac{(0-5.9)^2 + (3-5.9)^2 + (4-5.9)^2 + 3(6-5.9)^2 + (7-5.9)^2 + (8-5.9)^2 + (9-5.9)^2 + (10-5.9)^2}{10-1}$
Standard deviation: $\sqrt{8.77} = 2.96$
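
The same answers can be reproduced with Python's standard `statistics` module. This is only a sketch for checking the arithmetic; note that `statistics.variance` and `statistics.stdev` use the sample formula with n − 1 in the denominator, matching the slide.

```python
import statistics

# The ten test scores from the answer above.
scores = [0, 3, 4, 6, 6, 6, 7, 8, 9, 10]

n = len(scores)
mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)
data_range = max(scores) - min(scores)
variance = statistics.variance(scores)  # sample variance, divides by n - 1
std_dev = statistics.stdev(scores)      # square root of the sample variance

print(f"N = {n}, mean = {mean}, median = {median}, mode = {mode}")
print(f"range = {data_range}, variance = {variance:.2f}, std dev = {std_dev:.2f}")
```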

The Normal Distribution
The normal distribution is a symmetric, bell-shaped distribution that is completely described by the mean and the standard deviation.
The mean describes the centre of the curve.
The standard deviation determines its spread: the larger the standard deviation, the wider and flatter the curve.

Central Limit Theorem
As the sample size grows larger, the sampling distribution of the sample mean approaches a normal distribution, whatever the shape of the underlying population distribution (provided its variance is finite).
What does this theorem tell us? A sample with more observations gives us a truer picture of the actual population. Making assumptions based on samples that are "too small" may make for an unreliable analysis.
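
A quick simulation can illustrate the theorem. The sketch below assumes an exponential population (chosen only because it is clearly skewed) and compares the distribution of sample means for samples of size 2 and size 50; the sample sizes, repetition count and helper names are all illustrative choices, not part of the slides.

```python
import random
import statistics

random.seed(42)  # fixed seed so repeated runs give the same numbers


def sample_means(sample_size, repetitions=2000):
    """Draw many samples from a skewed population and return each sample's mean."""
    means = []
    for _ in range(repetitions):
        # Exponential population with mean 1: strongly right-skewed, not normal.
        sample = [random.expovariate(1.0) for _ in range(sample_size)]
        means.append(statistics.mean(sample))
    return means


def skewness(values):
    """Average cubed standardized deviation; roughly 0 for a symmetric distribution."""
    m = statistics.mean(values)
    s = statistics.pstdev(values)
    return statistics.mean([((v - m) / s) ** 3 for v in values])


small = sample_means(sample_size=2)
large = sample_means(sample_size=50)

for label, means in (("n = 2", small), ("n = 50", large)):
    print(f"{label:7s} mean of means = {statistics.mean(means):.3f}, "
          f"spread = {statistics.stdev(means):.3f}, "
          f"skewness = {skewness(means):.2f}")
```

With the larger sample size, the sample means cluster more tightly around the population mean and their distribution is far less skewed, which is the behaviour the theorem describes.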

Correlation
Correlation: A single number that describes the degree of relationship between two variables. The value of correlation ranges from -1 to 1.
If the correlation coefficient is positive, the two variables move together. Example: Education and salary (as the level of education increases, so does salary).
If the correlation coefficient is negative, the two variables have an inverse relationship. Example: Education and unemployment rate (as the level of education increases, the unemployment rate decreases).
If the correlation coefficient is zero, the two variables do not have a linear relationship. Example: The weather and salary.
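
As a rough illustration, the Pearson correlation coefficient can be computed by hand in a few lines of Python. The three data series below are invented for this sketch and are not real figures; `pearson_r` is a helper name introduced here.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data, made up for illustration only.
years_of_education = [10, 12, 14, 16, 18, 20]
salary_thousands = [28, 35, 41, 52, 60, 75]
unemployment_rate = [9.0, 7.5, 6.0, 4.5, 3.5, 2.0]

print(f"education vs salary:       r = {pearson_r(years_of_education, salary_thousands):.2f}")
print(f"education vs unemployment: r = {pearson_r(years_of_education, unemployment_rate):.2f}")
```

The first coefficient is close to +1 and the second close to -1, matching the positive and inverse relationships described above. Neither number says anything about causation.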

Causation Causation is a much stronger relationship than just correlation Image source: https://www.dreamstime.com/royalty-free-stock-images-causation-correlation-difference-explained-image37881989; https://xkcd.com/925/

Hypothesis Testing
Hypothesis testing is used to compare our observed statistic to another statistic or to a parameter. But what does that really mean? You're asking how likely it is that results like yours could arise by chance alone.
The null hypothesis (H0) is the hypothesis that we are trying to disprove. Usually, the null hypothesis is a statement of no effect or no difference.
The alternative hypothesis (H1) describes the relationship as we expect it to be.
Tests can be either one-tailed or two-tailed.

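Hypothesis Testing
One-tailed test example: A researcher claims that individuals aged 17 have an average body temperature higher than the commonly accepted average of 98.6F.
H0: Individuals aged 17 have an average body temperature that is not greater than 98.6F (average temp <= 98.6F)
H1: Individuals aged 17 have an average body temperature that is greater than 98.6F (average temp > 98.6F)
Because the claim has a direction ("higher than"), the test is one-tailed. A two-tailed test would instead ask whether the average temperature differs from 98.6F in either direction (H1: average temp ≠ 98.6F).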

Hypothesis Testing
One-tailed test example: A researcher claims that consuming a drug she developed increases student performance on exams. The average student test score is 87.
H0: The drug will have no effect on average student test scores, i.e. they stay constant (average test score = 87)
H1: The drug will increase average student test scores, i.e. they rise (average test score > 87)

P-values
The p-value is the probability of obtaining a result at least as extreme as the one observed, assuming that the null hypothesis is true.
There are several significance levels (1%, 5% and 10%) that we use to test the validity of our claims. The most frequently used level is 5% (0.05).
If the p-value obtained is higher than the 0.05 threshold, we say that our finding is not statistically significant. Therefore, we cannot reject our null hypothesis.
If the p-value obtained is lower than the 0.05 threshold, we say that our finding is statistically significant. Therefore, we can reject our null hypothesis in favour of the alternative hypothesis.
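
To see how a p-value is obtained, here is a sketch of an upper-tailed z-test for the drug example (H0: average test score = 87). The slides give no sample data, so the sample mean of 90, the population standard deviation of 10 and the sample size of 25 are assumptions made up for this illustration, and a z-test is just one of several tests that could be used.

```python
import math
from statistics import NormalDist


def one_tailed_z_test(sample_mean, null_mean, population_sd, n, alpha=0.05):
    """Upper-tailed z-test of H0: mean = null_mean against H1: mean > null_mean."""
    standard_error = population_sd / math.sqrt(n)
    z = (sample_mean - null_mean) / standard_error
    # Probability of a z at least this large if the null hypothesis is true.
    p_value = 1.0 - NormalDist().cdf(z)
    return z, p_value, p_value < alpha


# Hypothetical study: 25 students took the drug and averaged 90 on the exam.
z, p_value, reject_null = one_tailed_z_test(
    sample_mean=90, null_mean=87, population_sd=10, n=25
)

print(f"z = {z:.2f}, p-value = {p_value:.4f}")
print("Reject H0" if reject_null else "Fail to reject H0")
```

With these made-up numbers the p-value is about 0.067, which is above the 0.05 threshold, so the null hypothesis is not rejected even though the sample mean is higher than 87.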

Standard Error
Standard error is how far the sample mean is likely to be from the population mean.
How does this differ from the standard deviation? Standard deviation is the degree to which individuals within the sample differ from the sample mean.
Calculated using: $SE = \frac{s}{\sqrt{n}}$, where $s$ is the sample standard deviation and $n$ is the sample size.
Example: if we only sample 5 universities to examine the impact of ownership on the test score, what is the likelihood that the true average test score is equivalent to that in our sample?
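
A short sketch of the difference, assuming the usual formula for the standard error of the mean (sample standard deviation divided by the square root of the sample size) and reusing the ten test scores from the dispersion example. The helper name `standard_error` is introduced here for illustration.

```python
import math
import statistics


def standard_error(values):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))


# Test scores from the earlier dispersion example.
scores = [0, 3, 4, 6, 6, 6, 7, 8, 9, 10]

sd = statistics.stdev(scores)  # spread of individual scores around the mean
se = standard_error(scores)    # expected spread of the sample mean itself

print(f"Standard deviation: {sd:.2f}")
print(f"Standard error:     {se:.2f}")
```

The standard error is smaller than the standard deviation and shrinks as the sample size grows, because a mean based on more observations is a more stable estimate.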

Confidence Intervals and Z-Scores
A Z-score is a numerical measurement of a value's distance from the mean, expressed in standard deviations. A Z-score of 0 means the score is identical to the mean.
Calculated using: $z = \frac{x - \text{mean}}{\text{standard deviation}}$
At the 95% confidence level, we use a Z-score of 1.96.
A confidence interval is a range of values within which we are confident, at a stated level such as 95%, that the true mean lies.
Calculated using: mean +/- (standard error * Z-score)

Finding the Confidence Interval
Question: You want to investigate the impact of a college degree on income. Therefore, you sample 20 persons who have a college degree (Group A) and 20 persons who do not (Group B). You get the following statistics. What is the 95% confidence interval of each group? How can we interpret the results?
          Mean      Min        Max       SE       Variance
Group A   70,000    20,000     130,000   25,000   200
Group B   68,000    (missing)  200,000   15,000   400
Group A: 70,000 + 25,000 * 1.96 = 119,000 and 70,000 - 25,000 * 1.96 = 21,000
Group B: 68,000 + 15,000 * 1.96 = 97,400 and 68,000 - 15,000 * 1.96 = 38,600
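
The interval arithmetic can be wrapped in a small helper. This sketch takes the means and standard errors from the table above and looks up the critical Z-score from the standard normal distribution rather than hard-coding 1.96, so the results differ from the slide's by less than a dollar; `confidence_interval` is a name introduced here.

```python
from statistics import NormalDist


def confidence_interval(mean, standard_error, level=0.95):
    """Return (lower, upper) bounds: mean +/- (standard error * Z-score)."""
    # Two-sided critical value; about 1.96 for a 95% level.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    margin = z * standard_error
    return mean - margin, mean + margin


group_a = confidence_interval(70_000, 25_000)
group_b = confidence_interval(68_000, 15_000)

# Two intervals overlap if each one starts before the other ends.
overlap = group_a[0] <= group_b[1] and group_b[0] <= group_a[1]

print(f"Group A: {group_a[0]:,.0f} to {group_a[1]:,.0f}")
print(f"Group B: {group_b[0]:,.0f} to {group_b[1]:,.0f}")
print(f"Intervals overlap: {overlap}")
```

One common reading of the result: because the two intervals overlap heavily, these samples alone do not give strong evidence that average income differs between the two groups.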

Let’s move to our worksheets!