Air pollution is the introduction of chemicals and biological materials into the atmosphere that cause damage to the natural environment. We focus on sulfur dioxide (SO2) as a major contributor to air pollution. Sulfur dioxide is a highly reactive gas, a cause of acid rain, and a precursor to respiratory and cardiovascular problems. Air pollution is an ongoing problem worldwide, now more than ever. We conduct a cross-sectional study of air pollution levels, measured by SO2, and related factors for 41 US cities, using the means over the years. By running several regressions, we attempt to determine the likely causes of air pollution.
Data sample (values not preserved). Columns: City, SO2, Temperature, Man, Population, Wind, Rain, RainDays. Cities shown: Phoenix, Little Rock, San Francisco, Denver, Hartford, Wilmington, Washington, Jacksonville, … The data are means over the years
1. City: City name
2. SO2: Sulfur dioxide content of air in micrograms per cubic meter
3. Temp: Average annual temperature in degrees Fahrenheit
4. Man: Number of manufacturing enterprises employing 20 or more workers
5. Pop: Population size in thousands from the 1970 census
6. Wind: Average annual wind speed in miles per hour
7. Rain: Average annual precipitation in inches
8. RainDays: Average number of days with precipitation per year
Histogram of sulfur levels: The Jarque-Bera statistic is high and the distribution is positively skewed, so we reject the hypothesis that sulfur levels are normally distributed.
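The normality check behind this histogram can be sketched in a few lines. This is only an illustration of how a Jarque-Bera statistic is computed from sample skewness and kurtosis; the function name `jarque_bera` and the sample values are made up for the example and are not the study's data.

```python
import math

def jarque_bera(values):
    """Return (skewness, kurtosis, JB statistic, p-value) for a sample."""
    n = len(values)
    mean = sum(values) / n
    # Central moments with divisor n, as used in the Jarque-Bera statistic.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    # Under normality JB is asymptotically chi-square with 2 degrees of
    # freedom, whose upper-tail probability is exp(-x / 2).
    p_value = math.exp(-jb / 2.0)
    return skewness, kurtosis, jb, p_value

# Made-up, right-skewed readings (NOT the study's data).
sample = [9, 10, 12, 14, 17, 23, 26, 29, 31, 36, 47, 56, 65, 94, 110]
skew, kurt, jb, p = jarque_bera(sample)
print(f"skewness={skew:.2f} kurtosis={kurt:.2f} JB={jb:.2f} p={p:.3f}")
```

A large statistic with a small p-value speaks against normality; positive skewness means a long right tail, which is what a few very polluted cities would produce.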
We ran a number of bivariate regressions to find out which independent variables significantly explain SO2 levels, both including and excluding dummy variables. Next we ran a multivariate regression to see whether the variables we found to be significant remain significant in explaining SO2 levels when combined. We then tested for multicollinearity and, lastly, investigated an interesting problem.
Regression output (coefficient values not preserved). Dependent variable: SO2. Method: Least Squares. Date: 11/22/09, 14:03. Sample: 1 to 41 (41 observations). Regressors: TEMPERATURE, C (constant).
Temperature significantly explains SO2 levels, as shown by its high t-statistic and low p-value. The coefficient of temperature is negative, meaning SO2 levels decrease as temperature increases.
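For readers who want to reproduce this kind of bivariate regression outside EViews, here is a minimal sketch in plain Python. The helper `simple_ols` and the temperature and SO2 numbers are hypothetical, chosen only to show a negative slope; they are not the study's data.

```python
import math

def simple_ols(x, y):
    """Fit y = intercept + slope * x by least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual and total sums of squares.
    ssr = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    # Standard error of the slope uses n - 2 degrees of freedom.
    se_slope = math.sqrt(ssr / (n - 2) / sxx)
    return {
        "slope": slope,
        "intercept": intercept,
        "t_stat": slope / se_slope,
        "r_squared": 1.0 - ssr / sst,
    }

# Hypothetical values for illustration only (NOT the study's data).
temperature = [45, 50, 55, 60, 70, 75]
so2 = [40, 31, 28, 22, 12, 10]
fit = simple_ols(temperature, so2)
print(fit)
```

A slope with a large absolute t-statistic (roughly above 2 for a sample of this study's size) is what the slides call "significant"; the sign of the slope gives the direction of the relationship.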
Regression output (coefficient values not preserved). Dependent variable: SO2. Method: Least Squares. Date: 11/22/09, 14:22. Sample: 1 to 41 (41 observations). Regressors: MAN, C (constant).
The number of manufacturing enterprises significantly explains SO2 levels, as shown by its high t-statistic and low p-value. The positive coefficient of Man means that as the number of manufacturing enterprises increases, so do SO2 levels.
Regression output (coefficient values not preserved). Dependent variable: SO2. Method: Least Squares. Date: 11/22/09, 14:16. Sample: 1 to 41 (41 observations). Regressors: POPULATION, C (constant).
Population significantly explains SO2 levels, as shown by its high t-statistic and low p-value. The coefficient of population is positive, meaning that as population increases, so do SO2 levels.
Regression output (coefficient values not preserved). Dependent variable: SO2. Method: Least Squares. Date: 11/22/09, 14:07. Sample: 1 to 41 (41 observations). Regressors: WIND, C (constant).
Wind does not significantly explain SO2 levels, as can be seen from its low t-statistic and the low R-squared. It thus makes sense to take the wind variable out of our regression model.
Regression output (coefficient values not preserved). Dependent variable: SO2. Method: Least Squares. Date: 11/22/09, 14:14. Sample: 1 to 41 (41 observations). Regressors: RAIN, C (constant).
Rain does not significantly explain SO2 levels, as shown by its low t-statistic and the low R-squared. We thus remove the rain variable from our regression model.
Regression output (coefficient values not preserved). Dependent variable: SO2. Method: Least Squares. Date: 11/22/09, 14:09. Sample: 1 to 41 (41 observations). Regressors: RAINYDAYS, C (constant).
RainyDays significantly explains SO2 levels, as shown by its high t-statistic and low p-value. The coefficient of RainyDays is positive, meaning SO2 levels increase as the number of rainy days increases.
Multivariate regression output (coefficient values not preserved). Dependent variable: SO2. Method: Least Squares. Date: 12/03/09, 00:56. Sample: 1 to 41 (41 observations). Regressors: TEMPERATURE, RAINYDAYS, POPULATION, MAN, C (constant).
Regression output (coefficient values not preserved). Dependent variable: RAINYDAYS. Method: Least Squares. Date: 11/22/09, 14:35. Sample: 1 to 41 (41 observations). Regressors: TEMPERATURE, C (constant).
Regressing RainyDays on Temperature gives a high t-statistic and a high R-squared, so the two variables are significantly correlated and multicollinearity is present. The correlation is negative: as temperature goes up, the number of rainy days goes down.
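The strength of this kind of collinearity is often summarized by a variance inflation factor (VIF), which the slides do not report. The sketch below assumes only two regressors, in which case the R-squared of one regressed on the other is the squared Pearson correlation. The helper names and the numbers are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / math.sqrt(sxx * syy)

def vif_two_regressors(x, y):
    """VIF when the model has exactly two regressors: 1 / (1 - r^2)."""
    r = pearson_r(x, y)
    return 1.0 / (1.0 - r ** 2)

# Hypothetical values for illustration only (NOT the study's data).
temperature = [45, 50, 55, 60, 70, 75]
rainy_days = [140, 130, 125, 110, 80, 40]
r_demo = pearson_r(temperature, rainy_days)
vif_demo = vif_two_regressors(temperature, rainy_days)
print(f"r={r_demo:.3f} VIF={vif_demo:.2f}")
```

A VIF near 1 means little inflation of the coefficient variances; a commonly quoted rule of thumb treats values above about 10 as a serious problem, though that threshold is a convention rather than a test.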
Regression output (coefficient values not preserved). Dependent variable: SO2. Method: Least Squares. Date: 11/22/09, 14:44. Sample: 1 to 41 (41 observations). Regressors: TEMPERATURE, RAINYDAYS, C (constant).
Because multicollinearity exists, the individual t-statistics in a regression that uses both variables are unreliable. We can, however, still use the F-statistic to determine whether the two variables jointly have a significant effect on SO2 levels. As it turns out, we cannot tell which of the two variables individually has a significant effect on SO2 levels.
Box plot indicating the two outliers: Providence (94) and Chicago (110). Smallest = 8 (Wichita), Q1 = 12.5, Median = 26 (Richmond), Q3 = 35.5, Largest = 110 (Chicago), IQR = Q3 - Q1 = 23.
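The slides do not say which rule the box plot uses to flag outliers. Assuming the usual convention of fences at 1.5 times the IQR beyond the quartiles, the quartiles given above reproduce the two flagged cities. The function name `iqr_fences` is made up for this sketch; the four readings are the ones named on the slide.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return (IQR, lower fence, upper fence) for the k * IQR rule."""
    iqr = q3 - q1
    return iqr, q1 - k * iqr, q3 + k * iqr

# Quartiles as reported on the box plot slide.
iqr, lower, upper = iqr_fences(12.5, 35.5)

# The four SO2 readings named on the slide.
readings = {"Wichita": 8, "Richmond": 26, "Providence": 94, "Chicago": 110}
outliers = [city for city, so2 in readings.items() if so2 < lower or so2 > upper]
print(iqr, lower, upper, outliers)
```

With these quartiles the upper fence sits at 70, so both Providence and Chicago fall well outside it, while the smallest value (Wichita) is not flagged.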
Regression output (coefficient values not preserved). Dependent variable: SO2. Method: Least Squares. Date: 12/02/09, 14:26. Sample: 1 to 41 (41 observations). Regressors: TEMPERATURE, the two dummy variables, C (constant).
Temperature still significantly explains SO2 levels, as shown by its high t-statistic and low p-value.
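The regressions from this point on include dummy variables. Assuming that the two dummies flag the outlier cities Providence and Chicago (the slides do not state this explicitly), they could be built as simple 0/1 indicator columns. The function name and the short city list are hypothetical.

```python
def outlier_dummies(cities, flagged):
    """Build one 0/1 indicator column per flagged city."""
    return {
        name: [1 if city == name else 0 for city in cities]
        for name in flagged
    }

# Short, made-up city list for illustration.
cities = ["Phoenix", "Chicago", "Denver", "Providence"]
dummies = outlier_dummies(cities, ["Providence", "Chicago"])
for name, column in dummies.items():
    print(name, column)
```

A dummy that equals 1 for a single observation lets the fitted model match that observation exactly, so the outlier no longer influences the other coefficient estimates.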
Regression output (coefficient values not preserved). Dependent variable: SO2. Method: Least Squares. Date: 12/02/09, 14:30. Sample: 1 to 41 (41 observations). Regressors: MAN, the two dummy variables, C (constant).
The number of manufacturing enterprises still significantly explains SO2 levels, as shown by its high t-statistic and low p-value.
Regression output (coefficient values not preserved). Dependent variable: SO2. Method: Least Squares. Date: 12/02/09, 14:31. Sample: 1 to 41 (41 observations). Regressors: POPULATION, the two dummy variables, C (constant).
Population no longer significantly explains SO2 levels, as shown by its low t-statistic and high p-value.
Regression output (coefficient values not preserved). Dependent variable: SO2. Method: Least Squares. Date: 12/02/09, 14:28. Sample: 1 to 41 (41 observations). Regressors: WIND, the two dummy variables, C (constant).
Wind still does not significantly explain SO2 levels, as can be seen from its low t-statistic and the low R-squared.
Regression output (coefficient values not preserved). Dependent variable: SO2. Method: Least Squares. Date: 12/02/09, 14:30. Sample: 1 to 41 (41 observations). Regressors: RAIN, the two dummy variables, C (constant).
Rain still does not significantly explain SO2 levels, as shown by its low t-statistic and the low R-squared.
Regression output (coefficient values not preserved). Dependent variable: SO2. Method: Least Squares. Date: 12/02/09, 14:29. Sample: 1 to 41 (41 observations). Regressors: RAINYDAYS, the two dummy variables, C (constant).
RainyDays still significantly explains SO2 levels, as shown by its high t-statistic and low p-value.
Regression output (coefficient values not preserved). Dependent variable: TEMPERATURE. Method: Least Squares. Date: 12/02/09, 15:38. Sample: 1 to 41 (41 observations). Regressors: RAINYDAYS, the two dummy variables, C (constant).
Multicollinearity still exists because Temperature and RainyDays remain significantly correlated.
Regression output (coefficient values not preserved). Dependent variable: SO2. Method: Least Squares. Date: 12/02/09, 15:30. Sample: 1 to 41 (41 observations). Regressors: TEMPERATURE, RAINYDAYS, the two dummy variables, C (constant).
According to the F-statistic, temperature and rainy days are jointly significantly related to SO2 levels. However, because multicollinearity exists, we cannot rely on the individual t-statistics and therefore do not know how significant each variable is on its own.
Our final model includes the two dummy variables. This regression model has a significant F-statistic and a small p-value.
Regression output (coefficient values not preserved). Dependent variable: SO2. Method: Least Squares. Date: 12/02/09, 14:37. Sample: 1 to 41 (41 observations). Regressors: TEMPERATURE, RAINYDAYS, MAN, the two dummy variables, C (constant).
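The overall F-statistic can be recovered from a model's R-squared. The sketch below assumes that the 72% quoted in the conclusion is the unadjusted R-squared of this final model and that the model has five regressors (three variables plus two dummies) besides the constant; both are assumptions, since the actual output values did not survive.

```python
def f_statistic(r_squared, n_obs, n_regressors):
    """Overall F-statistic for a regression with an intercept.

    n_regressors counts the slope coefficients, excluding the constant.
    """
    df_model = n_regressors
    df_resid = n_obs - n_regressors - 1
    f_value = (r_squared / df_model) / ((1.0 - r_squared) / df_resid)
    return f_value, df_model, df_resid

# Assumed inputs: R-squared = 0.72, 41 cities, 5 regressors plus a constant.
f_value, df1, df2 = f_statistic(0.72, 41, 5)
print(f"F({df1}, {df2}) = {f_value:.2f}")
```

Under these assumptions the statistic comes out near 18 on 5 and 35 degrees of freedom, far above the 5% critical value of roughly 2.5, which would be consistent with the slide's statement that the model is jointly significant.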
Histogram of sulfur levels with dummy variables: The Jarque-Bera statistic is low, its p-value is high, and the distribution is only slightly positively skewed, so we cannot reject the hypothesis that sulfur levels are normally distributed once the dummy variables are included.
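The figure above gives some indication of heteroskedasticity. Heteroskedasticity is common in cross-sectional data; it does not bias the least-squares coefficient estimates, but it can distort their standard errors, so the t-statistics of our final regression should be read with some caution.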
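The visual indication of heteroskedasticity can also be checked with a formal test, which the slides do not report. The sketch below assumes a single regressor and uses the n times R-squared form of the Breusch-Pagan test (Koenker's variant): squared residuals are regressed on the regressor, and the statistic is compared with a chi-square distribution with 1 degree of freedom. All names and numbers are hypothetical.

```python
import math

def fit_line(x, y):
    """Least-squares intercept and slope for y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def r_squared(x, y):
    """R-squared of the least-squares line of y on x."""
    a, b = fit_line(x, y)
    mean_y = sum(y) / len(y)
    ssr = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ssr / sst if sst > 0 else 0.0

def breusch_pagan_lm(x, y):
    """Return (LM statistic, p-value) for one regressor."""
    a, b = fit_line(x, y)
    squared_resid = [(yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)]
    # Auxiliary regression of squared residuals on x; guard against
    # tiny negative values caused by rounding.
    lm = max(0.0, len(x) * r_squared(x, squared_resid))
    # Upper tail of a chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(lm / 2.0))
    return lm, p_value

# Made-up data whose error spread grows with x (NOT the study's data).
x = list(range(1, 11))
y = [xi + (0.5 * xi if i % 2 == 0 else -0.5 * xi) for i, xi in enumerate(x)]
lm, p = breusch_pagan_lm(x, y)
print(f"LM={lm:.2f} p={p:.4f}")
```

A small p-value suggests the error variance depends on the regressor. In that case a common remedy is to keep the coefficient estimates and report heteroskedasticity-robust standard errors.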
From our regression model, we find that temperature, rainy days, and manufacturing all have a significant effect on SO2 levels, together explaining 72% of the variation in SO2 levels. Of the three variables, however, manufacturing enterprises is the most significant explanatory variable. Economic impact: Given that SO2 is a threat to human wellbeing and the environment, lowering SO2 levels can reduce future costs. SO2 pollution is preventable insofar as it stems from human activity. Lower SO2 levels could be achieved by future restrictions on the number of manufacturing enterprises or on the amount of SO2 they emit.