Published by Clinton Henderson. Modified over 9 years ago.
1
Statistics for Social and Behavioral Sciences Session #5: The Regression Line (Agresti and Finlay, Chapter 9) Prof. Amine Ouazad
2
Statistics Course Outline
Part I. Introduction and Research Design (Week 1)
Part II. Describing Data (Weeks 2-4)
Part III. Drawing Conclusions from Data: Inferential Statistics (Weeks 5-9)
Part IV. Correlation and Causation: Regression Analysis (Weeks 10-14)
This is where we talk about ZMapp and Ebola! Firenze or Lebanese Express? Where we are right now! Describing associations between two variables
3
Last Session Descriptive statistics summarize data, to make it easier to assimilate the information. Measuring the distribution of a variable Mean, Median. Range, standard deviation. – Applies both to bell-shaped and non bell-shaped distributions (e.g. the superstar distribution). Bell-shaped distributions. ➥ Empirical rule applies! Measuring associations Contingency table. Scatter plot.
4
Outline
1. Scatter plot, linear relationship – Unemployment and Crime
2. The regression line – What is the relationship between height and weight?
3. Warning: Correlation is not causation – Spurious relationships
Next session: Bivariate analysis, Chapter 9 of A&F, continued
5
Unemployment ⇒ Crime? Is there really a link? ATLANTIC CITY - "With the layoffs the city is going to have, we'll have to expect that increase in crime." With an increase in unemployment and crime typically going hand-in-hand, Atlantic City PBA President Paul Barbere believes a challenging time lies ahead for the Atlantic City Police Department. "That's going to require us to respond to more calls for service, more calls for services requires more time out of service for our patrol units, with fewer patrol units, it's going to be difficult," said Barbere. This potential spike in crime comes during a time when Barbere says the department is already short-handed. "With the police department, we're running about 30 men and women short of what the ordinance calls for."
6
Unemployment ⇒ Crime? On the boardwalk, the potential for more crime has the valuable tourists the city relies on questioning what lies ahead. "They should have a plan designed for that, because they certainly don't want to dissuade people from coming here," said Yvette Dilworth of Queens, New York. "I don't know what Atlantic City is going to do to prepare for that but obviously when you're losing jobs the crime rate could come up," said Chris Mascioli of Camden County. "So yeah I'm concerned about it." In addition to the potential increase in calls stemming from unemployment, police will also have to keep an eye on the newly vacant casinos. "We'll have to maintain a certain staff to keep mechanicals going and to ensure the integrity and safety of the buildings themselves. That's not to say people won't try to break in," said Barbere. And even with less officers and more unemployment in the city, Barbere is confident the department is capable of rising to the challenge. "The men and women of the Atlantic City Police Department are well trained and have been dealing with this staffing for sometime now," said Barbere. "So it's nothing they can't handle."
7
United States data
Data set: County Characteristics 2000-2007.
Observation: County. Number of observations?
Variables: Unemployed persons, 2005. Number of murders reported to police, 2004.
Comments?
Self Check
– Observational data / Experimental data
– Unemployed persons: Categorical variable / Quantitative variable
– Unemployed persons: Discrete variable / Continuous variable
– Number of murders: Categorical variable / Quantitative variable
– Number of murders: Discrete variable / Continuous variable
– Survey data / Online data / Administrative data
8
Scatter plot
Number of murders reported to police. Number of observations: 2,957. Mean: 5.07. Median: 0. Std. Dev.: 28.30. Min: 0. Max: 1,038. P25: 0. P75: 2.
Unemployed persons. Number of observations: 3,133. Mean: 2,414.56. Median: 665. Std. Dev.: 7,985. Min: 4. Max: 256,236. P25: 285. P75: 1,683.
Which is the response variable and which is the explanatory variable?
9
Distribution of Murders
Kind of distribution: Bell shaped / Superstar distribution (Spotify)
The Empirical Rule applies: True / False
County Name: Murders in 2004
Los Angeles County: 1,038
Wayne County: 415
Harris County: 346
Philadelphia County: 330
Maricopa County: 281
Dallas County: 278
Baltimore city: 276
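One way to approach the self-check is to plug the summary statistics reported for murders (mean 5.07, standard deviation 28.30, minimum 0, maximum 1,038) into the Empirical Rule and see whether the result is plausible. The sketch below is only illustrative; the variable names are ours, not part of the course material.

```python
# Summary statistics for murders per county, as reported on the scatter plot slide.
mean, sd = 5.07, 28.30
minimum, maximum = 0, 1038

# Under the Empirical Rule, about 95% of values lie within mean +/- 2 standard deviations.
lower, upper = mean - 2 * sd, mean + 2 * sd

# How many standard deviations above the mean is the largest county?
z_max = (maximum - mean) / sd

print(f"mean +/- 2 sd: [{lower:.2f}, {upper:.2f}]")
print(f"largest value is {z_max:.1f} standard deviations above the mean")
print("lower bound is below the smallest possible count:", lower < minimum)
```

A negative lower bound for a count, together with a maximum dozens of standard deviations above the mean, is the kind of pattern a bell-shaped distribution would not produce.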
10
[Scatter plot: number of murders reported to police against unemployed persons; summary statistics repeated from slide 8.]
11
Linear Relationship? y = α + βx. Murders = α + β × Unemployed. Figure labels: + 20,000 unemployed, + 20,000 unemployed. An increasing relationship, β > 0.
12
What a Linear Relationship Implies
An increase in the number of unemployed raises the number of murders by β × the increase.
A decline in the number of unemployed lowers the number of murders by β × the decline.
An increase in the number of unemployed by, say, 10,000, raises the number of murders by the same amount regardless of whether there were initially 0 murders or 300 murders. – No gang formation?
A decline in the number of unemployed by, say, 10,000, lowers the number of murders by the same amount regardless of whether there were initially 0 murders or 300 murders. – Shouldn't it be tougher to lower the number of murders than to raise it?
This is a model, a simplification of the world.
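The constant-slope property can be checked numerically. The sketch below uses hypothetical coefficients (ALPHA and BETA are made-up values, not estimates from the county data) and shows that a step of 10,000 unemployed changes the prediction by the same amount whatever the starting point, and that a decline is the exact mirror image of an increase.

```python
# Hypothetical coefficients, chosen only for illustration (not estimated from the county data).
ALPHA = 0.5    # intercept: predicted murders when there are no unemployed persons
BETA = 0.002   # slope: predicted extra murders per additional unemployed person

def predicted_murders(unemployed: float) -> float:
    """Linear model: murders = ALPHA + BETA * unemployed."""
    return ALPHA + BETA * unemployed

step = 10_000
for start in (20_000, 50_000, 150_000):
    change_up = predicted_murders(start + step) - predicted_murders(start)
    change_down = predicted_murders(start - step) - predicted_murders(start)
    print(f"start at {start:>7,}: +{step:,} gives {change_up:+.1f}, -{step:,} gives {change_down:+.1f}")
```

Every row prints the same pair of changes, which is exactly the simplification the slide questions: the model cannot represent effects that grow or shrink with the initial level.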
13
What we can do with a linear relationship Extrapolate – Predict. – With more local data (census block, census tract, ZIP code level) – With individual data. (Minority report style, possible with Danish or Swedish data). Interpolate – Fill in the gaps. – When data is missing.
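As a rough illustration of the difference between the two uses, the sketch below fills missing values with a line and labels each filled value by whether it falls inside the range of observed x values (interpolation) or outside it (extrapolation). The line's coefficients and the toy records are assumptions made up for this example.

```python
# Hypothetical fitted line y = A + B * x; the numbers are illustrative only.
A, B = 1.0, 2.0

# Toy records (x, y); None marks a missing y value.
records = [(1, 3.0), (2, 5.2), (3, None), (5, 10.8), (6, 13.1), (9, None)]

# Range of x values for which y was actually observed.
observed_x = [x for x, y in records if y is not None]
x_min, x_max = min(observed_x), max(observed_x)

def predict(x):
    """Value of the line at x."""
    return A + B * x

filled = []
for x, y in records:
    if y is None:
        kind = "interpolated" if x_min <= x <= x_max else "extrapolated"
        filled.append((x, predict(x), kind))
    else:
        filled.append((x, y, "observed"))

for row in filled:
    print(row)
```

Extrapolated values deserve more caution than interpolated ones, since nothing in the data confirms that the line still holds outside the observed range.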
14
The Los Angeles Police Department, like many urban police forces today, is both heavily armed and thoroughly computerised. The Real-Time Analysis and Critical Response Division in downtown LA is its central processor. Rows of crime analysts and technologists sit before a wall covered in video screens stretching more than 10 metres wide. Multiple news broadcasts are playing simultaneously, and a real-time earthquake map is tracking the region’s seismic activity. Half-a-dozen security cameras are focused on the Hollywood sign, the city’s icon. In the centre of this video menagerie is an oversized satellite map showing some of the most recent arrests made across the city – a couple of burglaries, a few assaults, a shooting.
On a slightly smaller screen the division’s top official, Captain John Romero, mans the keyboard and zooms in on a comparably micro-scale section of LA. It represents just 500 feet by 500 feet. Over the past six months, this sub-block section of the city has seen three vehicle burglaries and two property burglaries – an atypical concentration. And, according to a new algorithm crunching crime numbers in LA and dozens of other cities worldwide, it’s a sign that yet more crime is likely to occur right here in this tiny pocket of the city.
The algorithm at play is performing what’s commonly referred to as predictive policing. Using years – and sometimes decades – worth of crime reports, the algorithm analyses the data to identify areas with high probabilities for certain types of crime, placing little red boxes on maps of the city that are streamed into patrol cars. “Burglars tend to be territorial, so once they find a neighbourhood where they get good stuff, they come back again and again,” Romero says. “And that assists the algorithm in placing the boxes.”
15
Outline
1. Scatter plot, linear relationship – Back to height and weight.
2. The regression line – What is the relationship between height and weight?
3. Warning: Correlation is not causation – Spurious relationships
Next session: Bivariate analysis, Chapter 9 of A&F, continued
16
Finding the regression line Any line is imperfect…
17
Finding the regression line
Which line is the right one? A line is entirely determined by the choice of α and β.
An essential formula. Notice the difference between b and β, between a and α.
x is the explanatory variable. y is the response variable.
If y increases when x increases, then b > 0. If y decreases when x increases, then b < 0.
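A minimal sketch of how a and b can be computed, assuming the formula meant here is the standard least squares one: b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and a = ȳ − b·x̄. The height and weight figures are made up for illustration, and the function names are ours.

```python
# Made-up data for illustration only.
heights = [150, 160, 170, 180, 190]   # x, explanatory variable (cm)
weights = [52, 58, 69, 77, 84]        # y, response variable (kg)

def least_squares(x, y):
    """Return (a, b) for the line y = a + b * x that minimizes the sum of squared errors."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx          # slope
    a = y_bar - b * x_bar    # intercept
    return a, b

def sse(a, b, x, y):
    """Sum of squared errors of the line y = a + b * x on the data."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

a, b = least_squares(heights, weights)
print(f"a = {a:.2f}, b = {b:.2f}")
print(f"SSE of fitted line:       {sse(a, b, heights, weights):.2f}")
print(f"SSE with a shifted by 1:  {sse(a + 1, b, heights, weights):.2f}")
print(f"SSE with b raised by .01: {sse(a, b + 0.01, heights, weights):.2f}")
```

Moving either coefficient away from the computed values increases the sum of squared errors, which is the sense in which this line is "the right one" among all imperfect lines. Here b is positive because weight tends to increase with height in the toy data.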
18
Why do we call this regression? “Regression towards mediocrity in Hereditary Stature”, Sir Francis Galton, 1886. What are y,x,b here? Sir F. Galton
19
Outline
1. Scatter plot, linear relationship – Back to height and weight.
2. The regression line – What is the relationship between height and weight?
3. Warning: Association is not causation – Spurious relationships
Next session: Bivariate analysis, Chapter 9 of A&F, continued
20
"More than a fifth of people on unemployment benefits have a criminal record, government figures have revealed. The new data showed an estimated 22 per cent of all out-of-work benefit claims - such as Jobseeker's Allowance - were made by people who had been to prison or convicted of an offence in the previous 12 years." Chris Grayling, the Justice Secretary, is pushing through reforms which aim to provide more support to offenders who are released from jail back into the community. Jeremy Wright, the justice minister, said: "We are committed to delivering long-needed changes that will see all offenders released from prison receive targeted support to finally turn themselves around and start contributing to society."
21
Unemployment and Crime “The figures also showed 44 per cent of offenders were claiming benefits a month after being convicted, cautioned or released from jail.” “More than half of offenders - 54 per cent - released from prison were claiming out-of-work benefits one month later, gradually decreasing to 42 per cent two years after.” “In all, 214,000 people claiming out-of-work benefits had been to prison at least once in the previous 12 years, or 4 per cent of the total.” “Previous data published in 2011 estimated the proportion of criminal claimants was slightly higher, at 26 per cent, but a Ministry of Justice spokesman said the sets of figures were not directly comparable.” Chris Grayling, Justice Secretary (UK)
22
Reading is an important skill, and elementary school teachers have observed that the reading ability of their students tends to increase with their shoe size. To help boost reading skills, should policymakers offer prizes to scientists to devise methods to increase the shoe size of elementary school children? Obviously, the tendency for shoe size and reading ability to increase together does not mean that big feet cause improvements in reading skills. Older children have bigger feet, but they also have more developed brains. This natural development of children explains the simple observation that shoe size and reading ability have a tendency to increase together; that is, they are positively correlated. But clearly there is no causal relationship: bigger shoe size does not cause better reading ability.
In economics, correlations are common. But identifying whether the correlation between two or more variables represents a causal relationship is rarely so easy. Countries that trade more with the rest of the world also have higher income levels, but does this mean that trade raises income levels? People with more education tend to have higher earnings, but does this imply that education results in higher earnings? Knowing precise answers to these questions is important. If additional years of schooling caused higher earnings, then policymakers could reduce poverty by providing more funding for education. If an extra year of education resulted in a $20,000 a year increase in earnings, then the benefits of spending on education would be a lot larger than if an extra year of education caused only a $2 a year increase.
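The shoe size example can be reproduced with a small constructed data set. In the sketch below, age drives both shoe size and reading score, while within any single age group the two are unrelated by construction. All numbers are invented for illustration; they are not real measurements.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

# Constructed children: (age, shoe size, reading score).
# Both shoe size and reading score rise with age; the +/- offsets are
# balanced so that, at a given age, they carry no information about each other.
children = []
for age in range(6, 12):
    for d_shoe in (-1, 1):
        for d_read in (-1, 1):
            children.append((age, 2 * age + d_shoe, 10 * age + 3 * d_read))

shoe = [c[1] for c in children]
reading = [c[2] for c in children]
r_all = pearson_r(shoe, reading)

# Correlation among children of the same age.
r_by_age = {}
for age in range(6, 12):
    group = [c for c in children if c[0] == age]
    r_by_age[age] = pearson_r([c[1] for c in group], [c[2] for c in group])

print(f"all children together: r = {r_all:.2f}")
for age, r in r_by_age.items():
    print(f"age {age}: r = {r:.2f}")
```

Pooled across ages the correlation is strong, yet it vanishes once age is held fixed, which is what a spurious association driven by a third variable looks like.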
23
Association is not causation. What is the point of this example?
25
Association is not causation
The response variable may be the explanatory variable and vice versa (reverse causation).
There may be other factors that affect the response variable, other than the explanatory variable. ☞ Multivariate statistics coming up in week 12.
Univariate statistics (Weeks 1 and 2): Inspecting the distribution of one variable. Am I taller than the average? Than the median? What percentile of the distribution do I belong to?
Bivariate statistics (Now and next week): Discovering associations between 2 variables. What is the relationship between parents’ height and children’s height? What is the relationship between unemployment and crime?
Multivariate statistics (Week 12): Uncovering causality: looking at the impact of multiple explanatory variables on one response variable. What factors cause crime? Poverty, unemployment, guns, police headcounts?
26
Wrap up
From a scatter plot to a linear relationship
– A linear relationship is a model, and an imperfect one.
– A linear relationship implies a constant gradient.
– A linear relationship helps predict/extrapolate, and interpolate to fill in missing statistics.
Finding the regression line
– The regression line minimizes the sum of squared errors.
– The formulas for a and b are essential to learn.
Association is not causation
– Does x cause y or does y cause x?
– Is there any other factor that may cause y?
27
Next session: Minority Report continues We keep working on chapter 9: Reading: Agresti and Finlay, Chapter 9. (yes, we’re jumping there and then back to chapter 4 to understand inferential statistics in week 5) For help: Note Irene’s new office hours. Amine Ouazad Office 1135, Social Science building amine.ouazad@nyu.edu Office hour: Wednesday from 4 to 5pm. GAF: Irene Paneda Irene.paneda@nyu.edu Sunday recitations. At the Academic Resource Center, Monday from 2 to 4pm.