DEFINITIONS Population Sample Unit of analysis Case Sampling frame.



Some essential definitions
Population: The largest group to which we intend to project (apply) the findings of a study. Example: all the prisoners in Jay's prison.
Sample: Any subgroup of the population. Samples intended to represent a population must be selected in special ways (will come up later).
Unit of analysis: The "container" for the variables. Here, the variables under study are sentence length and type of crime (property or violent). What "contains" them? Prisoners!
Case: A single occurrence of a unit of analysis. Here, it's any one prisoner. Cases are "members" or "elements" of the population from which one or more samples are drawn.
Sampling frame: A list of all "elements" or members of the population.
[Figure: Jay's prison, with labels "Population" and "Sample"]

A more complicated situation
Hypothesis: Youth demeanor → officer disposition
Research question: Do officers treat uncooperative youths more harshly?
Hypothesis: Uncooperative youths are more likely to be arrested
Graduate students rode around in police cars and coded interactions between officers and youths where arrest was not mandatory.
So...
What is the independent (causal) variable?
What is the dependent (effect) variable?
What is the population?
What is the sample?
What is the unit of analysis? (container for the variables)
What is a case? (a single occurrence of a unit of analysis)
From "Police Encounters With Juveniles," Irving Piliavin and Scott Briar, The American Journal of Sociology, Vol. 70, No. 2 (Sep., 1964)

DEPICTING THE DISTRIBUTION OF CATEGORICAL VARIABLES
Frequency
"X" axis and "Y" axis
Bar graph
Table

Depicting the distribution of a categorical variable: the bar graph
Distributions depict the frequency (number of cases) at each value of a variable.
Here there is one variable, gender, with two values, M/F.
Thirty-two students are the "population." Each student is a case.
Frequency means the number of cases (students) at a single value of a variable.
Frequencies are always on the Y axis.
Values of the variable are always on the X axis.
[Figure: bar graph of gender, N = 32, with bars of n = 17 and n = 15. Y axis: how many at each value/score. X axis: value or score of variable.]
Bars are "made" of cases. Here they're made of students, who are arranged by the variable gender.
Distributions depict how cases "distribute" along the values or scores of the variable. Here the proportions of male and female seem nearly equal.
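The counting behind a bar graph can be sketched in a few lines of Python. The data list is hypothetical: the slide reports only the bar heights (17 and 15 out of N = 32), not which gender has which count, so assigning 17 to "F" and 15 to "M" is an assumption made purely for illustration.

```python
from collections import Counter

# Hypothetical data: one entry per case (student). The slide gives only the
# bar heights (17 and 15); which gender has which count is assumed here.
genders = ["F"] * 17 + ["M"] * 15

# Frequency = number of cases at each value of the variable.
frequencies = Counter(genders)
total_cases = sum(frequencies.values())

# Text version of a bar graph: one row per value, bar length = frequency.
for value, count in sorted(frequencies.items()):
    print(f"{value} | {'#' * count} (n={count})")
print(f"N = {total_cases}")
```

Each "#" stands for one case, which mirrors the point that bars are "made" of cases.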

Using a table to display the distribution of two categorical variables
[Table: each axis lists the values or scores of one variable (one of them is officer's disposition); each "cell" holds the number of cases (frequency) at that combination of values/scores.]

DEPICTING THE DISTRIBUTION OF CONTINUOUS VARIABLES
Histogram
Trend line

Depicting the distribution of continuous variables: the histogram
Distributions depict the frequency (number of cases) at each value of a variable.
Here there is one variable: age, measured on a scale of 20-33.
A case is a single unit that "contains" all the variables of interest. Here each student is a case.
Frequency means the number of cases (students) at a single value of a variable.
Frequencies are always on the Y axis.
Values of the variable are always on the X axis.
[Figure: histogram of age with a "trend" line. Y axis: how many at each value/score. X axis: value or score of variable.]
What is the area under the trend line "made of"? Cases, meaning students (arranged by age).

Sometimes, bar graphs are used for continuous variables
[Figure: bar graph. Y axis: how many at each value/score. X axis: value or score of variable.]
What are the bars "made of"? Cases, meaning homicides (arranged by the variable homicides per year).

Continuous variables: What "makes up" the areas under the trend lines? Cases, that's what!
[Figure 1: trend line. Y axis: how many at each value/score. X axis: value or score of variable.]
This graph displays the distribution of one continuous variable: youths murdered with guns each month. Each murdered youth is one "case."
[Figure 2: two trend lines. Y axis: how many at each value/score. X axis: value or score of variable.]
This graph displays distributions of two continuous variables: violent crime rate and imprisonment rate. Each violent crime is one "case." Each commitment to prison is one "case."

SUMMARIZING THE DISTRIBUTION OF CATEGORICAL VARIABLES

Summarizing the distribution of categorical variables using percentage
Instead of using graphs or a lot of words, is there a single statistic that can convey what a distribution "looks like"?
Percentage is a "statistic." It's a proportion with a denominator of 100.
Percentages are used to summarize categorical data.
  70 percent of students are employed; 60 percent of parolees recidivate
Since per cent means per 100, any decimal can be converted to a percentage by multiplying it by 100 (moving the decimal point two places to the right).
  .20 = .20 × 100 = 20 percent (twenty per hundred)
  .368 = .368 × 100 = 36.8 percent (thirty-six point eight per hundred)
When converting, remember that there can be fractions of one percent.
  .0020 = .0020 × 100 = .20 percent (two tenths of one percent)
To obtain a percentage for a category, divide the number of cases in the category by the total number of cases in the sample, then multiply by 100.
  50,000 persons were asked whether crime is a serious problem: 32,700 said "yes." What percentage said "yes"?
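As a rough sketch, the conversions on this slide can be checked in Python. The function names are illustrative only and are not part of the course material.

```python
def decimal_to_percentage(proportion):
    """Convert a decimal proportion to a percentage (per 100)."""
    return proportion * 100


def category_percentage(cases_in_category, total_cases):
    """Percentage of all cases that fall in one category."""
    # Multiply before dividing to limit floating-point rounding.
    return cases_in_category * 100 / total_cases


# Decimal conversions from the slide.
for proportion in (0.20, 0.368, 0.0020):
    print(f"{proportion} -> {decimal_to_percentage(proportion):.2f} percent")

# Survey question from the slide: 32,700 of 50,000 respondents said "yes".
yes_percentage = category_percentage(32_700, 50_000)
print(f"{yes_percentage:.1f} percent said yes")
```

Under these inputs, 32,700 out of 50,000 works out to 65.4 percent.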

Using percentages to compare datasets
Percentages are "normalized" numbers (e.g., per 100), so they can be used to compare datasets of different size.
Last year, 10,000 people were polled. Eight thousand said crime is a serious problem.
This year, 12,000 people were polled. Nine thousand said crime is a serious problem.
Calculate the second percentage and compare it to the first.
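One way to work the comparison, sketched in Python with the poll figures from the slide (the variable names are illustrative):

```python
def percentage(part, whole):
    """Percentage of the whole represented by the part."""
    return part * 100 / whole


# Poll figures from the slide.
last_year = percentage(8_000, 10_000)
this_year = percentage(9_000, 12_000)

print(f"Last year: {last_year:.0f} percent")
print(f"This year: {this_year:.0f} percent")

# More people said "yes" this year (9,000 vs. 8,000), yet the percentage fell,
# because the number of people polled grew faster.
print(f"Change: {this_year - last_year:+.0f} percentage points")
```

The raw counts rise while the percentage drops from 80 to 75, which is why the normalized figure is the fairer comparison.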

Draw a bar graph for each class depicting proportions for gender, then compare the proportions.
[Data tables: TTH Class, Friday Class]

Calculating increases in percentage
Increases in percentage are computed off the base amount.
Example: Jail with 120 inmates. How many will there be...
...with a 100 percent increase?
  100 percent of the base amount, 120, is 120 (120 × 100/100)
  120 base + 120 increase = 240 (2 times the base amount)
...with a 150 percent increase?
  150 percent of 120 is 180 (120 × 150/100)
  120 base + 180 increase = 300 (2½ times the base amount)
How many will there be with a 200 percent increase?
[Figure: the original amount compared with amounts 100% larger (2 times, 2X) and 200% larger (3 times, 3X)]

Percentage changes can mislead
Answer to preceding slide (jail with 120 inmates, 200 percent increase):
  200 percent of 120 is 240 (120 × 200/100)
  120 base + 240 increase = 360 (3 times the base amount)
Percentages can make changes seem large when bases are small.
  Example: Increase from 1 to 3 convictions is two hundred percent
  3 - 1 = 2
  2/base = 2/1 = 2
  2 × 100 = 200%
Percentages can make changes seem small when bases are large.
  Example: Increase from 5,000 to 6,000 convictions is 20 (twenty) percent
  6,000 - 5,000 = 1,000
  1,000/base = 1,000/5,000 = .20 = 20%
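The arithmetic on this slide and the preceding one can be sketched as two small Python helpers. The function names are hypothetical, chosen only for this illustration.

```python
def apply_percent_increase(base, percent):
    """Return the new amount after increasing the base by the given percent."""
    increase = base * percent / 100
    return base + increase


def percent_change(old, new):
    """Percent change from old to new, computed off the old (base) amount."""
    return (new - old) * 100 / old


# Jail with 120 inmates.
for percent in (100, 150, 200):
    print(f"{percent} percent increase: {apply_percent_increase(120, percent):.0f} inmates")

# Small base: a change of 2 convictions looks large.
print(f"1 to 3 convictions: {percent_change(1, 3):.0f} percent")

# Large base: a change of 1,000 convictions looks small.
print(f"5,000 to 6,000 convictions: {percent_change(5_000, 6_000):.0f} percent")
```

Reporting the raw change alongside the percentage (2 convictions vs. 1,000 convictions) is one way to avoid the misleading impression described above.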

SUMMARIZING THE DISTRIBUTION OF CONTINUOUS VARIABLES

Four summary statistics for continuous variables
Continuous variables (review)
  Can take on an infinite number of values (e.g., age, height, weight, sentence length)
  Precise differences between cases
  Equivalent differences: the distance between 15 and 20 years is the same as between 60 and 65 years
Summary statistics for continuous variables
  Mean: arithmetic average of scores
  Median: midpoint of scores (half higher, half lower)
  Mode: most frequent score (or scores, if tied)
  Range: difference between low and high scores

Summarizing the distribution of continuous variables - the mean
Arithmetic average of scores
  Add up all the scores
  Divide the result by the number of scores
Example: Compare arrest productivity for officers in two precincts, each with 20 officers, during dayshift
Method: Use the mean to summarize arrests at each precinct, then compare the means
Variable: no. of arrests per officer
Unit of analysis: police precincts
Case: one precinct
First precinct, arrests per officer: 0, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 6
  (number of officers × arrests) 1 × 0 + 2 × 1 + 4 × 2 + 6 × 3 ... = 60; 60 / 20 = 3.0
  Mean 3.0
Second precinct, arrests per officer: 0, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 16
  (number of officers × arrests) 1 × 0 + 2 × 1 + 4 × 2 + 6 × 3 ... = 70; 70 / 20 = 3.5
  Mean 3.5
Means are pulled in the direction of extreme scores, possibly distorting comparisons.
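A minimal Python sketch of the two ways the slide computes the mean: from the raw scores, and from a frequency table (number of officers at each arrest count). The list names are illustrative; the scores are the ones listed on the slide.

```python
from collections import Counter

# Arrests per officer, as listed on the slide (20 officers per precinct).
first_precinct = [0, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 6]
second_precinct = [0, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 16]


def mean_from_scores(scores):
    """Add up all the scores, then divide by the number of scores."""
    return sum(scores) / len(scores)


def mean_from_frequencies(scores):
    """Same mean, computed as sum(frequency x score) / number of cases."""
    frequencies = Counter(scores)
    total = sum(count * score for score, count in frequencies.items())
    return total / sum(frequencies.values())


print(mean_from_scores(first_precinct), mean_from_frequencies(first_precinct))
print(mean_from_scores(second_precinct), mean_from_frequencies(second_precinct))
```

The two lists differ in a single score (6 vs. 16), yet the mean moves from 3.0 to 3.5, which illustrates how one extreme score pulls the mean.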

Transforming categorical/ordinal variables into continuous variables, then using the mean
Ordinal variables are categorical variables with an inherent order
  Small, medium, large
  Cooperative, uncooperative
Can summarize in the ordinary way: proportions / percentages
Can also transform them into continuous variables by assigning categories points on a scale, then calculating a mean
Not always recommended because "distances" between points on the scale may not be equal, causing misleading results
  Is the distance between "Admonished" and "Informal" the same as between "Informal" and "Citation"? "Citation" and "Arrest"?

Value  Severity of disposition          Youths (freq.)    %
4      Arrested                               16          24
3      Citation or official reprimand          9          14
2      Informal reprimand                     16          24
1      Admonished & released                  25          38
       Total (N)                              66         100

Severity of disposition mean = [(25 × 1) + (16 × 2) + (9 × 3) + (16 × 4)] / 66 = 148 / 66 = 2.24
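The severity mean can be sketched as a weighted mean over the frequency table. The frequencies below are taken from the slide's calculation (25, 16, 9, 16; total 66); the dictionary layout and names are assumptions made for this illustration.

```python
# Scale value -> (label, number of youths), following the slide's calculation.
dispositions = {
    1: ("Admonished & released", 25),
    2: ("Informal reprimand", 16),
    3: ("Citation or official reprimand", 9),
    4: ("Arrested", 16),
}

total_youths = sum(count for _, count in dispositions.values())

# Ordinary summary: percentage of cases in each category.
for value, (label, count) in sorted(dispositions.items(), reverse=True):
    print(f"{value}  {label:<32} {count:>3}  {count * 100 / total_youths:.0f}%")

# Transformed summary: treat the scale values as scores and average them.
weighted_sum = sum(value * count for value, (_, count) in dispositions.items())
severity_mean = weighted_sum / total_youths
print(f"Severity of disposition mean = {severity_mean:.2f}")
```

The mean treats the step from 1 to 2 as equal to the step from 3 to 4, which is exactly the equal-distance assumption the slide warns about.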

Summarizing the distribution of continuous variables - the median
Median is the physical center score (when there are two, use their average)
Median can be used with continuous or ordinal variables
Median is an especially useful summary statistic when extreme scores are present, as they tend to make the mean misleading
Although one distribution has an outlier (16), which pulls the mean higher (3.0 to 3.5), the medians for both distributions are the same (3.0)
First distribution (arrests): 0, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 6
  Mean 3.0; median (3 + 3) / 2 = 3
Second distribution (arrests): 0, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 16
  Mean 3.5; median (3 + 3) / 2 = 3
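The contrast between mean and median can be checked with Python's standard `statistics` module. This is only a sketch; the list names are illustrative.

```python
import statistics

# The two arrest distributions from the slide; they differ only in the last score.
without_outlier = [0, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 6]
with_outlier = [0, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 16]

for label, scores in (("without outlier", without_outlier),
                      ("with outlier", with_outlier)):
    mean = statistics.mean(scores)
    median = statistics.median(scores)  # averages the two center scores when n is even
    print(f"{label}: mean = {mean}, median = {median}")
```

The mean shifts with the outlier while the median stays at 3, which is why the median is preferred when extreme scores are present.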

Summarizing the distribution of continuous variables - the median
Median is the physical center score (when there are two, use their average)
Compute the median for each sample...
  Exercise 1: 2, 3, 5, 5, 8, 12, 17, 19, 21
  Exercise 2: 2, 3, 5, 5, 8, 12, 17, 19, 21, 21
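To make the "physical center" idea concrete, here is a hand-rolled sketch of the median in Python, applied to the two exercises. The function name is hypothetical; in practice `statistics.median` does the same job.

```python
def median_by_position(scores):
    """Sort the scores and return the center one (or the average of the two centers)."""
    ordered = sorted(scores)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd number of scores: a single center score exists.
        return ordered[middle]
    # Even number of scores: average the two center scores.
    return (ordered[middle - 1] + ordered[middle]) / 2


exercise_1 = [2, 3, 5, 5, 8, 12, 17, 19, 21]
exercise_2 = [2, 3, 5, 5, 8, 12, 17, 19, 21, 21]

print(median_by_position(exercise_1))
print(median_by_position(exercise_2))
```

With nine scores the center is the fifth score (8); with ten scores the two centers are 8 and 12, so the median is their average (10).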

Summarizing the distribution of continuous variables - the mode
Score that occurs most often (with the greatest frequency)
[Figure: distribution of arrests] Here the mode is 3
Modes are a useful summary statistic when cases cluster at particular scores, an interesting condition that might otherwise be overlooked
Symmetrical, bell-shaped distributions like this one are called "normal" distributions. In such distributions the mean, mode and median are the same. Near-normal distributions are common.
There can be more than one mode (bi-modal, tri-modal, etc.). Identify the modes:
  Exercise 1: 2, 3, 5, 5, 8, 12, 17, 19, 21
  Exercise 2: 2, 3, 5, 5, 8, 12, 17, 19, 21, 21
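A short sketch of finding modes, including ties, using `statistics.multimode` from Python's standard library (available in Python 3.8 and later). The arrest list is the symmetric 20-officer distribution from the earlier slide on the mean; the variable names are illustrative.

```python
import statistics

arrests = [0, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 6]
exercise_1 = [2, 3, 5, 5, 8, 12, 17, 19, 21]
exercise_2 = [2, 3, 5, 5, 8, 12, 17, 19, 21, 21]

# multimode returns every score tied for the highest frequency,
# so a bi-modal distribution yields a list of two scores.
for label, scores in (("arrests", arrests),
                      ("exercise 1", exercise_1),
                      ("exercise 2", exercise_2)):
    print(f"{label}: mode(s) = {statistics.multimode(scores)}")
```

Exercise 2 is bi-modal because 5 and 21 each occur twice.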

A final way to depict the distribution of continuous variables - the range
Depicts the lowest and highest scores in a distribution
  2, 3, 5, 5, 8, 12, 17, 19, 21: range is "2 to 21"
Range can also be defined as the difference between the scores (21 - 2 = 19). If so, minimum and maximum scores should also be given.
Useful to cite the range if there are outliers (extreme scores) that misleadingly distort the shape of the distribution
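Both definitions of the range can be sketched in Python as follows (the function name is hypothetical):

```python
def describe_range(scores):
    """Return the lowest score, the highest score, and their difference."""
    lowest, highest = min(scores), max(scores)
    return lowest, highest, highest - lowest


scores = [2, 3, 5, 5, 8, 12, 17, 19, 21]
lowest, highest, difference = describe_range(scores)

# Report the endpoints along with the difference, as the slide recommends.
print(f'Range is "{lowest} to {highest}" (difference = {difference})')
```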

Practical exercise
Calculate summary statistics for age and height: mean, median, mode and range
Pictorially depict the distributions for age and height, placing the variables and frequencies on the correct axes
[Data tables: TTH Class, Friday Class]

A preview about dispersal
For age, cases (students) "cluster" at the lower values in both classes. Even so, the means are "pulled" to the right by a sizeable number of relatively old students.
Height is far more "dispersed." So which is a more accurate descriptor: mean age or mean height?
"Dispersal" is measured with two statistics: variance and standard deviation. We'll deal with them later...
[Figures: age and height distributions for the TTH Class and the Friday Class. Age means: 23.41 (21.65 without 25+) and 22.83 (21.88 without 25+). Height means: 65.53 and 65.85.]

Next week and every week, without fail: bring an approved calculator, the same one you will use for the exam. It must be a basic calculator with a square root key. NOT a scientific or graphing calculator. NOT a cell phone, etc.

HOMEWORK (link on weekly schedule)
1. Calculate all appropriate summary statistics for each distribution
2. Pictorially depict the distribution of arrests
3. Pictorially depict the distribution of gender
Case No.   Income No. of arrests Gender 1 15600 4 M 2 21380 3 F 17220 5 18765 23220 6 44500 7 34255 8 21620 9 14890 10 16650 11 12 16730 13 23980 14 14005 15 21550 16 26780 17 18050 18 34500 19 33785 20 21450