
Describing Data September 14, 2016

Updates
This week: lab sections begin
● Wed: 2-4pm (Today!)
● Wed: 4-6pm (Today!)
● Mon: 4-6pm
Next week: Eric Glass, guest speaker from DSSC (part of class)
The following week: another speaker, talking about Zotero

Updates to assignments
● Updated LiPS assignment: still have to do seven write-ups. One must be either Fulong Wu (Monday evening, Nov. 14th) or Malo Hutson (Tuesday evening, Sept. 20th).
● Assignment 2 posted to CourseWorks. Due at the start of your lab in 2 weeks. Hand in a paper copy to your TA and also post it to CourseWorks.

Today: Statistics
● Descriptive: describe and summarize our data to give insights
● Inferential: use statistics to make generalizations about a broader population

Types of Variables
Categorical
● Nominal (not ranked): college major, type of property, color of car
● Ordinal (ordered or ranked): useful for preferences, though no numerical value is assigned
● Dichotomous (two categories, not ranked): yes/no
Numerical
● Discrete (values are counts)
● Continuous (values are measures)

Variables
● Nominal: mutually exclusive categories that are not ordered or ranked
● Ordinal: ranked
● Interval: equally spaced values

Nominal Examples
Think of nominal scales as “labels”: they have no quantitative value.


Color           Count
Blue            10
Black           8
Red             6
blue            5
Purple          3
Green           2
Purple          2
White           2
BLUE            1
Brown           1
Burgundy        1
Gray            1
Pink            1
Red             1
Yellow          1
nav             1
orange          1
purple          1
red             1
seafoam green   1
turquoise       1
white           1
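The table above is a raw tally, so one color can appear under several spellings (“Blue”, “blue”, “BLUE”; “Purple” even appears twice, possibly because of stray spaces in the raw responses). Before treating a nominal variable as a set of categories, it usually pays to normalize the labels. A minimal Python sketch, assuming that case and surrounding whitespace are the only differences worth merging; it deliberately leaves an ambiguous label such as “nav” untouched:

```python
from collections import Counter

# (label, count) pairs copied from the color table above.
RAW_COUNTS = [
    ("Blue", 10), ("Black", 8), ("Red", 6), ("blue", 5), ("Purple", 3),
    ("Green", 2), ("Purple", 2), ("White", 2), ("BLUE", 1), ("Brown", 1),
    ("Burgundy", 1), ("Gray", 1), ("Pink", 1), ("Red", 1), ("Yellow", 1),
    ("nav", 1), ("orange", 1), ("purple", 1), ("red", 1),
    ("seafoam green", 1), ("turquoise", 1), ("white", 1),
]


def normalize(label: str) -> str:
    """Treat labels that differ only in case or surrounding spaces as one category."""
    return label.strip().lower()


def merge_counts(pairs):
    """Sum the counts of all raw labels that normalize to the same category."""
    totals = Counter()
    for label, count in pairs:
        totals[normalize(label)] += count
    return totals


if __name__ == "__main__":
    for color, count in merge_counts(RAW_COUNTS).most_common():
        print(f"{color:<15}{count}")
```

After merging, the total is still 52 responses, but “blue” rises to 16 and “red” to 8, and the 22 raw rows collapse into 15 categories.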

Other nominal examples: gender, hair color, neighborhood
When there are only two categories, we call the variable “dichotomous.” Examples: heads/tails, on/off, rural/urban, in poverty/not in poverty
Q: What about gender? Is that a dichotomous variable?

Ordinal
Ranked in order of values, but the difference between values is not always known.
Example: educational attainment

Ordinal example: educational attainment

Interval
Numerical scales where both the order of values and the differences between them are known.
Examples: money or income, height, weight

Likert items
These allow people to respond according to some scale.

Example:
Question: How frequently do you think you need to come to class to get a high pass?
o Always
o Often
o Occasionally
o Rarely
o Never

Example:
Question: I already know everything there is to know about “Planning Techniques”
o Agree Strongly
o Agree Slightly
o Neutral
o Disagree Slightly
o Disagree Strongly

Example: four-point scale
Question: I read s from Nick Klein
o Most of the time – ALL OF THE TIME
o Some of the time
o Seldom
o Never

Likert Scales
What types of variables are these? How can we interpret them?
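One common way (not the only one) to work with a single Likert item is to treat it as ordinal: code the options by rank and summarize with the median category rather than the mean, since the distance between, say, “Rarely” and “Occasionally” is not known. A small Python sketch using the five options from the class-attendance question; the 1-to-5 coding and the sample answers are hypothetical:

```python
import statistics

# Options for the attendance question, ordered from least to most frequent.
# The 1..5 coding is an illustrative assumption, not part of the survey.
SCALE = ["Never", "Rarely", "Occasionally", "Often", "Always"]
RANK = {label: position + 1 for position, label in enumerate(SCALE)}


def median_category(responses):
    """Return the median response as one of the option labels.

    statistics.median_low picks the lower of the two middle ranks when the
    number of responses is even, so the result is always an actual option.
    """
    ranks = [RANK[response] for response in responses]
    return SCALE[statistics.median_low(ranks) - 1]


if __name__ == "__main__":
    # Hypothetical answers, for illustration only.
    answers = ["Always", "Often", "Often", "Rarely", "Occasionally"]
    print(median_category(answers))
```

Using median_low rather than median is a design choice: averaging two middle ranks would produce an in-between value that corresponds to no option on the scale.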

Descriptive stats

We need some data to describe

Lucky us!

What year were you born? 50 responses: 1993, 1991, 1960, 1993, 1994, 1992, 1989, 1992, 1993, 1993, 1994, 1991, 1990, 1992, 1987, 1989, 1994, 1992, 1989, 1992, 1994, 1985, 1994, 1991, 1991, 1992, 1993, 1993, 1993, 1992, 1991, 1985, 1992, 1992, 1992, 1985, 1994, 1993, 1995, 1991, 1985, 1993, 1990, 1992, 1994, 1994, 1994, 1994, 1992, 1990

Hard to make sense of these 50 responses…

We can use a “frequency table” with columns Year born, Frequency, and Percent
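A frequency table just counts how often each value occurs and divides by the number of responses. A minimal Python sketch using the 50 birth-year responses listed above; the function name frequency_table is illustrative:

```python
from collections import Counter

# The 50 birth-year responses listed above.
YEARS = [
    1993, 1991, 1960, 1993, 1994, 1992, 1989, 1992, 1993, 1993,
    1994, 1991, 1990, 1992, 1987, 1989, 1994, 1992, 1989, 1992,
    1994, 1985, 1994, 1991, 1991, 1992, 1993, 1993, 1993, 1992,
    1991, 1985, 1992, 1992, 1992, 1985, 1994, 1993, 1995, 1991,
    1985, 1993, 1990, 1992, 1994, 1994, 1994, 1994, 1992, 1990,
]


def frequency_table(values):
    """Return (value, frequency, percent) rows sorted by value."""
    counts = Counter(values)
    n = len(values)
    return [(value, freq, 100 * freq / n) for value, freq in sorted(counts.items())]


if __name__ == "__main__":
    print(f"{'Year born':<10}{'Frequency':>10}{'Percent':>9}")
    for year, freq, pct in frequency_table(YEARS):
        print(f"{year:<10}{freq:>10}{pct:>8.1f}%")
```

With these data the most common year is 1992 (12 responses, 24% of the total).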

Let’s represent it another way, graphically

We can use a “dot plot” where each dot represents a response

This is similar to a histogram

But a histogram is more flexible

We can change the number of “bins”

And change the y-axis to a measure of “relative frequency” rather than a count.
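Both ideas, choosing the number of bins and switching to relative frequency, can be sketched in a few lines of Python. This assumes equal-width bins of the form [start, start + width) beginning at the smallest value; plotting libraries offer other binning rules:

```python
# The 50 birth-year responses listed above.
YEARS = [
    1993, 1991, 1960, 1993, 1994, 1992, 1989, 1992, 1993, 1993,
    1994, 1991, 1990, 1992, 1987, 1989, 1994, 1992, 1989, 1992,
    1994, 1985, 1994, 1991, 1991, 1992, 1993, 1993, 1993, 1992,
    1991, 1985, 1992, 1992, 1992, 1985, 1994, 1993, 1995, 1991,
    1985, 1993, 1990, 1992, 1994, 1994, 1994, 1994, 1992, 1990,
]


def histogram(values, width, start=None):
    """Group values into equal-width bins [lo, lo + width).

    Returns (bin_start, count, relative_frequency) for every bin, including
    empty ones, so a different `width` gives a different number of bins.
    """
    if width <= 0:
        raise ValueError("bin width must be positive")
    lo = min(values) if start is None else start
    if lo > min(values):
        raise ValueError("start must not exceed the smallest value")
    n_bins = int((max(values) - lo) // width) + 1
    counts = [0] * n_bins
    for v in values:
        counts[int((v - lo) // width)] += 1
    n = len(values)
    return [(lo + i * width, c, c / n) for i, c in enumerate(counts)]


if __name__ == "__main__":
    for width in (5, 10):
        print(f"bin width {width}:")
        for bin_start, count, rel in histogram(YEARS, width):
            print(f"  {bin_start}-{bin_start + width - 1}: {count:>2}  ({rel:.2f})")
```

With 5-year bins, 40 of the 50 responses (relative frequency 0.80) fall in 1990 to 1994; with 10-year bins the same data collapse into four bars.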

Another approach is a “stem and leaf” plot
195 |
196 |
197 |
198 |
199 |
200 |
The stem consists of the numbers with the last digit omitted. For our years, this means dropping the final digit (the year within the decade) and keeping the decade. So “1975” would become “197”.

Another approach is a “stem and leaf” plot (continued)
Then add the final digits (the leaf or leaves) back onto the corresponding stem.
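The procedure just described (stem = all but the last digit, leaf = last digit) can be sketched in Python for the 50 birth years. This version only lists stems between the smallest and largest observed stems (196 to 199 here), keeping empty stems so gaps stay visible:

```python
from collections import defaultdict

# The 50 birth-year responses listed above.
YEARS = [
    1993, 1991, 1960, 1993, 1994, 1992, 1989, 1992, 1993, 1993,
    1994, 1991, 1990, 1992, 1987, 1989, 1994, 1992, 1989, 1992,
    1994, 1985, 1994, 1991, 1991, 1992, 1993, 1993, 1993, 1992,
    1991, 1985, 1992, 1992, 1992, 1985, 1994, 1993, 1995, 1991,
    1985, 1993, 1990, 1992, 1994, 1994, 1994, 1994, 1992, 1990,
]


def stem_and_leaf(values):
    """Map each stem (value // 10) to a string of its sorted leaves (value % 10)."""
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(str(v % 10))
    first, last = min(leaves), max(leaves)
    # .get() does not create entries, so empty stems map to "".
    return {stem: "".join(leaves.get(stem, [])) for stem in range(first, last + 1)}


if __name__ == "__main__":
    for stem, leaf_digits in stem_and_leaf(YEARS).items():
        print(f"{stem} | {leaf_digits}")
```

The 198 row reads “55557999” (four responses from 1985, one from 1987, three from 1989), and the empty 197 row makes the gap before the lone 1960 response easy to see.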

Summary Statistics

Central Tendency and Spread
Two of the simplest and most important measures

Central Tendency
There are a number of measures of central tendency. The most common are:
● Mean
● Median
● Mode
Let’s focus on the first two.

Mean
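For reference, the usual definition of the arithmetic mean of n observations, worked through for the 50 birth years listed earlier (their values sum to 99,546):

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
\qquad\text{e.g.}\qquad
\bar{x} = \frac{99{,}546}{50} = 1990.92
```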

Median
The median is the middlemost value. We can identify it by placing our data in order. Let’s use the same five values:
Here the mean (1989.2) and median (1992) differ, as they often do. A nice attribute of the median is that it is generally not sensitive to outliers.

Median
If there are two middlemost values (an even number of observations), we take the average of the two middle values. Let’s add our outlier (1960) to our data set and find the median:
The median is now ( ) / 2 =

Mean and Median
Mean
● Easy to understand: it’s the average
● Affected by extreme high or low values (outliers)
● May not best characterize skewed distributions
Median
● Largely unaffected by outliers
● May better characterize skewed distributions
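To see the outlier effect on the full class data, here is a Python sketch that computes the mean and median of the 50 birth years with and without the 1960 response. It uses the standard-library statistics module; statistics.median averages the two middle values when the count is even, matching the rule above:

```python
import statistics

# The 50 birth-year responses listed above.
YEARS = [
    1993, 1991, 1960, 1993, 1994, 1992, 1989, 1992, 1993, 1993,
    1994, 1991, 1990, 1992, 1987, 1989, 1994, 1992, 1989, 1992,
    1994, 1985, 1994, 1991, 1991, 1992, 1993, 1993, 1993, 1992,
    1991, 1985, 1992, 1992, 1992, 1985, 1994, 1993, 1995, 1991,
    1985, 1993, 1990, 1992, 1994, 1994, 1994, 1994, 1992, 1990,
]


def center(values):
    """Return (mean, median) of a list of numbers."""
    return statistics.mean(values), statistics.median(values)


if __name__ == "__main__":
    without_outlier = [y for y in YEARS if y != 1960]
    for label, data in [("all 50 responses", YEARS), ("without 1960", without_outlier)]:
        mean, median = center(data)
        print(f"{label:<18} mean = {mean:.2f}  median = {median}")
```

The single 1960 response pulls the mean from about 1991.55 down to 1990.92, while the median stays at 1992 either way.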

What about mode?
Mode
● The most frequent value
● Less often used in social science

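A Python sketch of the mode for the birth-year data. It uses statistics.multimode (Python 3.8 or later), which returns every value tied for most frequent, since a distribution can have more than one mode:

```python
import statistics
from collections import Counter

# The 50 birth-year responses listed above.
YEARS = [
    1993, 1991, 1960, 1993, 1994, 1992, 1989, 1992, 1993, 1993,
    1994, 1991, 1990, 1992, 1987, 1989, 1994, 1992, 1989, 1992,
    1994, 1985, 1994, 1991, 1991, 1992, 1993, 1993, 1993, 1992,
    1991, 1985, 1992, 1992, 1992, 1985, 1994, 1993, 1995, 1991,
    1985, 1993, 1990, 1992, 1994, 1994, 1994, 1994, 1992, 1990,
]

if __name__ == "__main__":
    # All values tied for the highest frequency.
    print(statistics.multimode(YEARS))
    # The three most frequent values with their counts.
    print(Counter(YEARS).most_common(3))
```

Here the mode is 1992 (12 responses), followed by 1994 (10) and 1993 (9).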

Percentiles
Imagine a chart with all the observable values in a population; it contains 100 percent of the possible values. The pth percentile is the value of a given distribution such that p% of the distribution is less than or equal to that value.
● Quartiles: the 25th, 50th, and 75th percentiles
● Quintiles: the 20th, 40th, 60th, and 80th percentiles
● Deciles: the 10th, 20th, 30th, 40th, 50th, 60th, 70th, 80th, and 90th percentiles
The 50th percentile is the MEDIAN.
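The definition above (p% of the values at or below the percentile) corresponds to the “nearest-rank” method. A minimal Python sketch, assuming that method; interpolating methods such as Python’s statistics.quantiles can give slightly different values on the same data:

```python
import math

# The 50 birth-year responses listed above.
YEARS = [
    1993, 1991, 1960, 1993, 1994, 1992, 1989, 1992, 1993, 1993,
    1994, 1991, 1990, 1992, 1987, 1989, 1994, 1992, 1989, 1992,
    1994, 1985, 1994, 1991, 1991, 1992, 1993, 1993, 1993, 1992,
    1991, 1985, 1992, 1992, 1992, 1985, 1994, 1993, 1995, 1991,
    1985, 1993, 1990, 1992, 1994, 1994, 1994, 1994, 1992, 1990,
]


def percentile(values, p):
    """Nearest-rank percentile: the smallest observed value such that at least
    p% of the observations are less than or equal to it (0 < p <= 100)."""
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based position in sorted data
    return ordered[rank - 1]


if __name__ == "__main__":
    for p in (10, 25, 50, 75, 90):
        print(f"{p}th percentile: {percentile(YEARS, p)}")
```

For the class data this gives 1985, 1991, 1992, 1993, and 1994 for the 10th, 25th, 50th, 75th, and 90th percentiles.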

10th percentile: 10 percent of the area under the curve (shaded red)

25th percentile: 25 percent of the area under the curve (shaded red)

50th percentile: 50 percent of the area under the curve (shaded red)

75th percentile: 75 percent of the area under the curve (shaded red)

90th percentile: 90 percent of the area under the curve (shaded red)

Percentiles from our data

50th percentile (the median value) is ___; ___th percentile is ___; ___th percentile is 1993

Measures of Spread

How do we describe the different distributions?

Measures
● Range
● Interquartile range
● Index of dispersion
● Standard deviation

Interquartile Range (IQR)
The IQR is a simple measure of spread: it is the difference between the 75th and 25th percentile values. The IQR tells us about the spread around the median, since it covers the middle 50% of the data.
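A Python sketch of the range and the IQR for the birth years. It computes the IQR two ways, with nearest-rank percentiles and with statistics.quantiles (default “exclusive” method, which interpolates), to show that the convention for percentiles affects the answer:

```python
import math
import statistics

# The 50 birth-year responses listed above.
YEARS = [
    1993, 1991, 1960, 1993, 1994, 1992, 1989, 1992, 1993, 1993,
    1994, 1991, 1990, 1992, 1987, 1989, 1994, 1992, 1989, 1992,
    1994, 1985, 1994, 1991, 1991, 1992, 1993, 1993, 1993, 1992,
    1991, 1985, 1992, 1992, 1992, 1985, 1994, 1993, 1995, 1991,
    1985, 1993, 1990, 1992, 1994, 1994, 1994, 1994, 1992, 1990,
]


def percentile(values, p):
    """Nearest-rank percentile for 0 < p <= 100."""
    ordered = sorted(values)
    return ordered[math.ceil(p * len(ordered) / 100) - 1]


def iqr_nearest_rank(values):
    return percentile(values, 75) - percentile(values, 25)


def iqr_interpolated(values):
    q1, _, q3 = statistics.quantiles(values, n=4)  # default method="exclusive"
    return q3 - q1


if __name__ == "__main__":
    print("range:", max(YEARS) - min(YEARS))
    print("IQR (nearest rank):", iqr_nearest_rank(YEARS))
    print("IQR (statistics.quantiles):", iqr_interpolated(YEARS))
```

The range (35 years) is driven entirely by the 1960 response, whereas the IQR (2 years by nearest rank, 2.25 by interpolation) reflects only the tightly clustered middle of the class.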

Interquartile Range (IQR)
50th percentile (the median value) is ___; ___th percentile is ___; ___th percentile is 1993

Boxplots
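A boxplot is commonly drawn from the five-number summary (minimum, 25th percentile, median, 75th percentile, maximum), with points more than 1.5 × IQR beyond the quartiles plotted individually. That 1.5 × IQR rule is Tukey’s convention; plotting software varies in the details. A Python sketch, assuming nearest-rank quartiles:

```python
import math

# The 50 birth-year responses listed above.
YEARS = [
    1993, 1991, 1960, 1993, 1994, 1992, 1989, 1992, 1993, 1993,
    1994, 1991, 1990, 1992, 1987, 1989, 1994, 1992, 1989, 1992,
    1994, 1985, 1994, 1991, 1991, 1992, 1993, 1993, 1993, 1992,
    1991, 1985, 1992, 1992, 1992, 1985, 1994, 1993, 1995, 1991,
    1985, 1993, 1990, 1992, 1994, 1994, 1994, 1994, 1992, 1990,
]


def percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of data at or below it."""
    ordered = sorted(values)
    return ordered[math.ceil(p * len(ordered) / 100) - 1]


def five_number_summary(values):
    ordered = sorted(values)
    return (ordered[0], percentile(ordered, 25), percentile(ordered, 50),
            percentile(ordered, 75), ordered[-1])


def outliers(values, k=1.5):
    """Values more than k * IQR below the 25th or above the 75th percentile."""
    _, q1, _, q3, _ = five_number_summary(values)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in sorted(values) if v < low or v > high]


if __name__ == "__main__":
    print("five-number summary:", five_number_summary(YEARS))
    print("plotted individually:", outliers(YEARS))
```

Because the middle half of the class is so tightly clustered (IQR of 2 years), every response before 1989 falls outside the whiskers, not just 1960.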

Standard Deviation
Often, we will use and talk about the standard deviation (st. dev.), represented by sigma: σ. The st. dev. tells us about the spread around the mean. (The IQR tells us about the spread around the median.)

Standard Deviation
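For reference, the standard formulas: σ is the population standard deviation, computed around the population mean μ over all N values; when working from a sample of n values, the usual estimate s uses the sample mean and divides by n - 1 instead.

```latex
\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^2}
\qquad\qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}
```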

But the st. dev. is really useful: if our data are normally distributed, we can expect about 68% of observations to fall within 1 st. dev. of the mean and about 95% within 2.
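One way to check the 68% and 95% figures is to simulate normally distributed data and count. This Python sketch uses the standard-library random.gauss; the seed and sample size are arbitrary choices:

```python
import math
import random
import statistics


def share_within(values, k):
    """Fraction of values lying within k population standard deviations of the mean."""
    mu = statistics.fmean(values)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))
    return sum(abs(v - mu) <= k * sigma for v in values) / len(values)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the run is reproducible
    sample = [rng.gauss(0, 1) for _ in range(100_000)]
    for k in (1, 2):
        print(f"within {k} st. dev.: {share_within(sample, k):.3f}")
```

The rule depends on normality: for the skewed birth-year data, 45 of the 50 responses (90%) fall within one st. dev. of the mean, not 68%.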

Other ways to describe spread

Skewness and Symmetry

Why might data be skewed? Why might data be bimodal?

Skewed data example: Family Income

Q: Guess the mean
$71,840
Q: Guess the median
$55,000
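Why is the mean so much higher than the median? In a right-skewed distribution a long tail of high values pulls the mean up while the median stays put. The Python sketch below uses simulated incomes, not the data behind the slide: it assumes a log-normal shape with a median of $55,000 and a spread parameter sigma = 0.7, both illustrative choices.

```python
import math
import random
import statistics


def simulate_incomes(n, median_income, sigma, seed=0):
    """Draw n right-skewed incomes from a log-normal distribution.

    For a log-normal, exp(mu) is the median, so mu = log(median_income).
    """
    rng = random.Random(seed)
    mu = math.log(median_income)
    return [rng.lognormvariate(mu, sigma) for _ in range(n)]


if __name__ == "__main__":
    incomes = simulate_incomes(10_000, median_income=55_000, sigma=0.7)
    print(f"mean   = {statistics.fmean(incomes):,.0f}")
    print(f"median = {statistics.median(incomes):,.0f}")
```

For this distribution the theoretical mean is exp(mu + sigma²/2), roughly 1.28 times the median, so the simulated mean lands near $70,000 even though half of the draws are below about $55,000.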

Interpreting Tables

Elements of a Table
● Title describes content
● Sample size presented
● Actual and percentage shares presented
● Assumptions stated
● Source of calculations stated

Interpreting Tables
From Manski (2014): the death penalty moratorium was lifted in the U.S. in 1976. There are three ways to interpret the data presented.

Interpreting Tables
1) “Before and after”: the average effect of the death penalty is -0.6 (calculated as 9.7 - 10.3)

Interpreting Tables
2) Compare treated and untreated states: assumes all else is equal, e.g., the propensity to kill is the same everywhere. The average effect in 1977 is 2.8 (= 9.7 - 6.9)

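Interpreting Tables
3) Difference in differences: compare the change over time in treated states with the change in untreated states, so that trends common to both are netted out of the policy effect
● Treated states declined from 10.3 to 9.7 = -0.6
● Untreated states declined from 8.0 to 6.9 = -1.1
● Effect = 0.5 = [(9.7 - 10.3) - (6.9 - 8.0)]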

Interpreting Tables
Before and after shows reduced homicide rates (-0.6). Comparison of treated and untreated states shows a rate 2.8 higher in treated states. Difference in differences shows an increase of 0.5 per 100,000. Explanations?
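The three estimates differ only in which numbers are subtracted. A Python sketch using the homicide rates (per 100,000) from the Manski (2014) example; the constant and function names are illustrative, and the “after” year is 1977 as on the slides:

```python
# Homicide rates per 100,000 from the Manski (2014) example.
TREATED_BEFORE, TREATED_AFTER = 10.3, 9.7      # states with the death penalty
UNTREATED_BEFORE, UNTREATED_AFTER = 8.0, 6.9   # states without it


def before_after(before, after):
    """Change over time in the treated states only."""
    return after - before


def treated_vs_untreated(treated_after, untreated_after):
    """Cross-sectional gap in the after period (1977)."""
    return treated_after - untreated_after


def diff_in_diff(t_before, t_after, u_before, u_after):
    """Treated change minus untreated change."""
    return (t_after - t_before) - (u_after - u_before)


if __name__ == "__main__":
    print(round(before_after(TREATED_BEFORE, TREATED_AFTER), 1))
    print(round(treated_vs_untreated(TREATED_AFTER, UNTREATED_AFTER), 1))
    print(round(diff_in_diff(TREATED_BEFORE, TREATED_AFTER,
                             UNTREATED_BEFORE, UNTREATED_AFTER), 1))
```

Rounding matters here: floating-point subtraction of these decimals leaves tiny errors (for example, 9.7 - 10.3 is not exactly -0.6 in binary), so results are rounded to one decimal place as in the table.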

Presenting Data
● Tables
● Charts
● Graphs

Problems with Pie Charts
● No sample size
● Similarly sized pies suggest all groups are equal and all response rates are about the same
● Were yes/no the only options?
● What are “enough transportation options”?

When Pie Charts Are Appropriate

Bar Chart

Measures of association