1
Lecture 2. Data Compression for One Variable George Duncan 90-786 Intermediate Empirical Methods for Public Policy and Management
2
Lecture 2: Data Compression for One Variable Forms of data compression Complex thinking about simple means Links between centers and spreads Use of Minitab
3
Forms of Data Compression: Relation to Level of Measurement Level of Measurement
4
Example How prevalent is the mayor-council form of government? What are the units of analysis? How many units have been observed? How many cases are in the sample? What type of analysis do we have? What variables are being measured? What is the level of measurement?
5
Form of Government in Cities Under 25,000 Population in Kansas. Form of government codes: CM = 1, council-manager; MC = 2, mayor-council; CO = 3, commission
6
Governance Frequency Table
7
Governance Bar Chart
8
Governance Pie Chart 1. Council-manager 50% (37) 2. Mayor-council 43.2% (32) 3. Commission 6.8% (5)
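The percentages on the pie chart can be reproduced from the three counts. The sketch below is only an illustration: the individual city records are not in the slides, so the list of codes is rebuilt from the counts 37, 32, and 5, and the variable names are assumptions.

```python
from collections import Counter

# Counts from the pie chart slide; the per-city data is not shown in the deck,
# so the coded list is reconstructed from the counts (an assumption).
codes = [1] * 37 + [2] * 32 + [3] * 5
labels = {1: "Council-manager", 2: "Mayor-council", 3: "Commission"}

counts = Counter(codes)
n = sum(counts.values())

# Frequency table: label -> (count, percentage rounded to one decimal place)
table = {labels[c]: (counts[c], round(100 * counts[c] / n, 1)) for c in sorted(counts)}

for name, (count, pct) in table.items():
    print(f"{name:16s} {count:3d} {pct:5.1f}%")
```

With a nominal variable such as form of government, counts and percentages are the natural compression; a mean of the codes 1, 2, 3 would not be meaningful.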
9
Quality of Fire Departments
10
Fire Insurance Bar Chart
11
Garbage Collection Tons of Trash Collected by the City of Normal, Oklahoma for the Week of June 8, 1992
12
Garbage Histogram [Figure: histogram of Frequency (vertical axis, 0 to 30) against Tons of Garbage (horizontal axis, bins 50-60, 60-70, 70-80, 80-90, 90-100)]
13
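Measures of Central Tendency Median = 73 tons. Mode = 75 tons. Mean (average of all observed values): $\bar{x} = \frac{\sum_i x_i}{n} = 72.97$ tons, where $x_i$ is the $i$-th observed value and $n$ is the number of observations.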
14
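Measures of Dispersion Range = Max - Min. Variance: $S^2 = \frac{\sum_i (x_i - \bar{x})^2}{n - 1}$. Standard deviation: $S = \sqrt{S^2}$. Coefficient of variation: $S / \bar{x}$, where $\bar{x}$ is the mean and $n$ is the number of observations.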
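As a rough sketch of how these four measures are computed, the function below applies the definitions to a small illustrative batch (2, 6, 7). The function name `dispersion` and the batch are assumptions made for the example; the garbage data itself is not reproduced in the slides.

```python
import math

def dispersion(xs):
    """Return range, sample variance (n - 1 divisor), standard deviation, and CV."""
    n = len(xs)
    mean = sum(xs) / n
    variance = sum((x - mean) ** 2 for x in xs) / (n - 1)
    sd = math.sqrt(variance)
    return {
        "range": max(xs) - min(xs),
        "variance": variance,
        "sd": sd,
        "cv": sd / mean,  # coefficient of variation: spread relative to the mean
    }

batch = [2, 6, 7]  # illustrative batch, not the garbage data
result = dispersion(batch)
print(result)
```

Because the coefficient of variation divides by the mean, it is only sensible for ratio-level data with a positive mean.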
15
Measure of Dispersion: Garbage Example Range = 97 - 50 = 47 Variance = 151.3 Standard Deviation = 12.3 Coefficient of Variation = 0.17
16
Box Plot Median; $Q_1$ = 25th percentile; $Q_3$ = 75th percentile; whiskers; interquartile range, IQR = $Q_3 - Q_1$; o = outlier (extreme data value). Inner fences: $Q_1 - 1.5 \times \text{IQR}$ and $Q_3 + 1.5 \times \text{IQR}$. Outer fences: $Q_1 - 3.0 \times \text{IQR}$ and $Q_3 + 3.0 \times \text{IQR}$.
17
Garbage Box Plot Min = 50, $Q_1$ = 64, Median = 73, $Q_3$ = 82.25, Max = 97
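The box plot fences (quartile minus or plus 1.5 and 3.0 times the IQR) can be worked out from the garbage summary values alone. This is a minimal sketch using only the five numbers on the slide; the variable names are assumptions.

```python
# Five-number summary from the garbage box plot slide
minimum, q1, median, q3, maximum = 50.0, 64.0, 73.0, 82.25, 97.0

iqr = q3 - q1  # interquartile range

# Inner and outer fences as (lower, upper) pairs
inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
outer = (q1 - 3.0 * iqr, q3 + 3.0 * iqr)

# Any value beyond an inner fence would be flagged as an outlier;
# checking the minimum and maximum is enough to know whether any exist.
has_outliers = minimum < inner[0] or maximum > inner[1]

print(f"IQR = {iqr}")
print(f"Inner fences = {inner}")
print(f"Outer fences = {outer}")
print(f"Outliers present: {has_outliers}")
```

Since both the minimum and the maximum lie inside the inner fences, no observation in this batch would be drawn as an outlier.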
18
Shapes of Distribution Positive skewness: Mean > Median. Symmetric distribution: Mean = Median. Negative skewness: Mean < Median.
19
Complex Thinking about Simple Means The mean time served for drug law violation by prisoners released from U.S. Federal prisons during 1965 to 1980 was 22.4 months. The median family income in Texas in 1975 was $12,672. The modal number of commercial TV stations in 1980 among the fifty U.S. states was 12 per state.
20
Applications of a Mean As a simple example, if a y-batch is the numbers 2, 6, and 7, then $\sum y = 2 + 6 + 7 = 15$. The count is $n = 3$, so $\bar{y} = \sum y / n = 15/3 = 5$. Some examples of data compression using a mean follow: Earnings of workers in the automobile industry averaged $577.30 per week in the U.S. for 1986. The mean temperature in Minneapolis-St. Paul during January is minus 12 degrees Celsius. The U.S. national rate of motor-vehicle traffic deaths per 100,000 population in 1985 was 18.8.
21
Means can be tricky!
22
Links between Centers and Spreads Data = Fit + Residual XYZ Fit Locate Fit to Minimize a Function of the Residuals
23
Mean and Standard Deviation When the fit is the mean, the average deviation (residual) is zero and the sum of squared deviations is minimized.
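Both properties of the mean can be checked numerically. The sketch below uses a small illustrative batch (2, 6, 7) and a coarse grid of candidate fits; the grid search is only a demonstration device, not how a mean is normally computed.

```python
batch = [2, 6, 7]  # illustrative batch
mean = sum(batch) / len(batch)

def sum_sq(fit):
    """Sum of squared residuals when every value is fitted by `fit`."""
    return sum((y - fit) ** 2 for y in batch)

# Property 1: residuals from the mean sum (and so average) to zero
residuals = [y - mean for y in batch]

# Property 2: among candidate fits 2.0, 2.1, ..., 7.0 the mean gives the smallest sum of squares
candidates = [c / 10 for c in range(20, 71)]
best = min(candidates, key=sum_sq)

print(f"mean = {mean}, residuals = {residuals}")
print(f"best fit on grid = {best}, sum of squares = {sum_sq(best)}")
```

The minimized sum of squares, divided by n - 1, is the sample variance, which is why the mean and the standard deviation are paired as center and spread.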
24
Median and Average Absolute Deviation No more than half of the residuals are less than zero and no more than half of the residuals are greater than zero. The sum of the absolute values of the residuals is as small as possible.
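The same kind of check works for the median. This sketch again uses a small illustrative batch (2, 6, 7) and a grid of candidate fits, both of which are assumptions made for the example.

```python
import statistics

batch = [2, 6, 7]  # illustrative batch
median = statistics.median(batch)

def sum_abs(fit):
    """Sum of absolute residuals when every value is fitted by `fit`."""
    return sum(abs(y - fit) for y in batch)

residuals = [y - median for y in batch]
below = sum(r < 0 for r in residuals)  # residuals less than zero
above = sum(r > 0 for r in residuals)  # residuals greater than zero

# Among candidate fits 2.0, 2.1, ..., 7.0 the median gives the smallest sum
candidates = [c / 10 for c in range(20, 71)]
best = min(candidates, key=sum_abs)

print(f"median = {median}, below = {below}, above = {above}")
print(f"best fit on grid = {best}, sum of absolute residuals = {sum_abs(best)}")
```

For comparison, fitting this batch with its mean of 5 gives a sum of absolute residuals of 6, larger than the 5 obtained with the median.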
25
Mode and Percentage of Misses As many as possible of the residuals are zero.
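The mode can be viewed the same way: it is the fit that leaves the most residuals exactly zero, so the percentage of misses is as small as possible. The batch below is hypothetical, invented only to illustrate the idea.

```python
from collections import Counter

batch = [12, 12, 9, 12, 15, 9]  # hypothetical batch, not data from the lecture

def pct_misses(fit):
    """Percentage of values whose residual from `fit` is not zero."""
    misses = sum(1 for y in batch if y - fit != 0)
    return 100 * misses / len(batch)

# The mode is the most frequent value
mode, mode_count = Counter(batch).most_common(1)[0]

# Compare the percentage of misses for every distinct value as a candidate fit
miss_table = {value: pct_misses(value) for value in sorted(set(batch))}
best = min(miss_table, key=miss_table.get)

print(f"mode = {mode} (occurs {mode_count} times)")
print(f"percentage of misses by candidate fit: {miss_table}")
```

Unlike the mean and the median, this criterion needs only equality comparisons, so it applies to nominal data as well.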
26
Next Time... Friday Workshop--Minitab Applications Lecture 3--Data Compression for Two Variables: Scatterplots