Lecture 2. Data Compression for One Variable George Duncan 90-786 Intermediate Empirical Methods for Public Policy and Management.

Presentation transcript:

Lecture 2. Data Compression for One Variable George Duncan Intermediate Empirical Methods for Public Policy and Management

Lecture 2: Data Compression for One Variable
Forms of data compression
Complex thinking about simple means
Links between centers and spreads
Use of Minitab

Forms of Data Compression: Relation to Level of Measurement
[Table organized by level of measurement]

Example
How prevalent is the mayor-council form of government?
What are the units of analysis?
How many units have been observed?
How many cases are in the sample?
What type of analysis do we have?
What variables are being measured?
What is the level of measurement?

Form of Government in Cities Under 25,000 Population in Kansas
Form of Government:
CM = 1, council-manager
MC = 2, mayor-council
CO = 3, commission

Governance Frequency Table

Governance Bar Chart

Governance Pie Chart
1. Council-manager: 50% (37)
2. Mayor-council: 43.2% (32)
3. Commission: 6.8% (5)
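The percentages on the pie chart follow directly from the counts. The sketch below rebuilds the frequency table in Python from the three counts shown on the slide (37, 32, 5). The individual city records are not in the transcript, so the coded list is reconstructed from the counts, and the variable names are illustrative only.

```python
from collections import Counter

# Counts come from the pie chart slide; the city-level records are not available,
# so the coded observations are rebuilt from those counts (an assumption).
codes = [1] * 37 + [2] * 32 + [3] * 5
labels = {1: "Council-manager", 2: "Mayor-council", 3: "Commission"}

counts = Counter(codes)
n = sum(counts.values())

# Map each label to (frequency, percentage rounded to one decimal place).
table = {labels[c]: (k, round(100 * k / n, 1)) for c, k in sorted(counts.items())}

for name, (k, pct) in table.items():
    print(f"{name:16s} {k:3d} {pct:5.1f}%")
```

For nominal data like this, the frequency table is the natural form of compression: nothing is lost except the identity of the individual cities.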

Quality of Fire Departments

Fire Insurance Bar Chart

Garbage Collection
Tons of Trash Collected by the City of Normal, Oklahoma for the Week of June 8, 1992

Garbage Histogram
[Histogram: Frequency versus Tons of Garbage]

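Measures of Central Tendency
Median = 73 tons
Mode = 75 tons
Mean (average of all observed values): \( \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \)
where \( x_i \) is the i-th observed value and \( n \) is the number of observations.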
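As a rough illustration of the three measures, the following Python sketch uses the standard `statistics` module. The data values are hypothetical and are not the garbage-collection figures, which the transcript does not list.

```python
import statistics

# Hypothetical batch of observations (NOT the Normal, Oklahoma data).
x = [3, 5, 5, 8, 9]

mean = statistics.mean(x)      # sum of the values divided by the count
median = statistics.median(x)  # middle value of the sorted batch
mode = statistics.mode(x)      # most frequently occurring value

print(mean, median, mode)
```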

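Measures of Dispersion
Variance: \( S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \)
Standard Deviation: \( S = \sqrt{S^2} \)
Range = Max - Min
Coefficient of Variation = \( S / \bar{x} \)
where \( \bar{x} \) is the mean and \( n \) is the number of observations.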

Measure of Dispersion: Garbage Example
Range = 47
Variance = (value missing)
Standard Deviation = 12.3
Coefficient of Variation = 0.17
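The dispersion measures can be checked by hand on a small batch. This sketch uses the batch 2, 6, 7 that appears later in the lecture, not the garbage data, and relies on Python's `statistics` module, whose `variance` and `stdev` use the n - 1 divisor.

```python
import statistics

# The small y-batch used later in the lecture; chosen so the arithmetic is easy to verify.
y = [2, 6, 7]

mean = statistics.mean(y)          # 15 / 3 = 5
variance = statistics.variance(y)  # sample variance: (9 + 1 + 4) / (3 - 1) = 7
std_dev = statistics.stdev(y)      # square root of the sample variance
data_range = max(y) - min(y)       # 7 - 2 = 5
coef_var = std_dev / mean          # standard deviation relative to the mean

print(mean, variance, round(std_dev, 3), data_range, round(coef_var, 3))
```

The coefficient of variation is unit-free, which is why it is useful for comparing spread across batches measured on different scales.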

Box Plot
Median
\( Q_1 \) = 25th percentile
\( Q_3 \) = 75th percentile
Whisker
Interquartile range, IQR = \( Q_3 - Q_1 \)
o = Outlier (extreme data value)
Inner fences: a multiple of the IQR below \( Q_1 \) and above \( Q_3 \)
Outer fences: a larger multiple of the IQR below \( Q_1 \) and above \( Q_3 \)
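A minimal sketch of the box plot quantities follows. The fence multipliers did not survive in the transcript, so the code assumes the common convention of 1.5 × IQR for inner fences and 3 × IQR for outer fences. The quartile method (`method="inclusive"`) is also an assumption, and Minitab may compute quartiles slightly differently. The function name and data are hypothetical.

```python
import statistics

def box_plot_summary(data, inner_k=1.5, outer_k=3.0):
    """Return quartiles, IQR, fences, and outliers for a batch of numbers.

    inner_k and outer_k are ASSUMED multipliers (the usual convention),
    not values recovered from the slide.
    """
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    inner = (q1 - inner_k * iqr, q3 + inner_k * iqr)
    outer = (q1 - outer_k * iqr, q3 + outer_k * iqr)
    # Values beyond the inner fences are flagged as outliers.
    outliers = [v for v in data if v < inner[0] or v > inner[1]]
    return {"q1": q1, "median": median, "q3": q3, "iqr": iqr,
            "inner": inner, "outer": outer, "outliers": outliers}

# Hypothetical batch with one extreme value.
summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 40])
print(summary)
```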

Garbage Box Plot
Median = 73
\( Q_1 \) = 64
\( Q_3 \) = (value missing)
Max = 97
Min = (value missing)

Shapes of Distribution
Positive skewness: Mean > Median
Symmetric distribution: Mean = Median
Negative skewness: Mean < Median
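The comparison of mean and median can be turned into a simple check. The sketch below applies the slide's rule of thumb to hypothetical batches. Note that it is only a rule of thumb: equal mean and median do not by themselves guarantee a symmetric distribution. The function name is illustrative.

```python
import statistics

def skew_direction(data):
    """Classify skew by comparing mean and median (rule of thumb only)."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    if mean > median:
        return "positive"
    if mean < median:
        return "negative"
    return "symmetric"

# Hypothetical batches: a long right tail, a long left tail, and a balanced batch.
right_tail = skew_direction([1, 2, 3, 10])
left_tail = skew_direction([1, 8, 9, 10])
balanced = skew_direction([1, 2, 3])
print(right_tail, left_tail, balanced)
```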

Complex Thinking about Simple Means
The mean time served for drug law violation by prisoners released from U.S. Federal prisons during 1965 to 1980 was 22.4 months.
The median family income in Texas in 1975 was $12,672.
The modal number of commercial TV stations in 1980 among the fifty U.S. states was 12 per state.

Applications of a Mean
As a simple example, if a y-batch is the numbers 2, 6, and 7, then \( \sum y \) is 2 + 6 + 7 = 15. The count is n = 3, so \( \bar{y} = \sum y / n = 15/3 = 5 \).
Some examples of data compression using a mean follow:
Earnings of workers in the automobile industry averaged $... per week in the U.S. for ...
The mean temperature in Minneapolis-St. Paul during January is minus 12 degrees Celsius.
The U.S. national rate of motor-vehicle traffic deaths per 100,000 population in 1985 was 18.8.

Means can be tricky!

Links between Centers and Spreads
Data = Fit + Residual
[Diagram: data values X, Y, Z and the Fit]
Locate the fit to minimize a function of the residuals.

Mean and Standard Deviation
Average deviation is zero.
Sum of squared deviations is minimized.
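Both properties of the mean can be verified numerically. The sketch below uses the batch 2, 6, 7 from earlier in the lecture and searches a coarse grid of candidate fits. The grid search is for illustration only and is not a general optimization method.

```python
# Batch from the lecture's simple example.
y = [2, 6, 7]
mean = sum(y) / len(y)  # 5.0

# Residuals from the mean sum (and therefore average) to zero.
residuals = [v - mean for v in y]
total_residual = sum(residuals)

def sum_sq(c):
    """Sum of squared deviations of the batch from a candidate fit c."""
    return sum((v - c) ** 2 for v in y)

# Try candidate fits 0.0, 0.1, ..., 10.0 and keep the one with the smallest sum of squares.
candidates = [k / 10 for k in range(0, 101)]
best_fit = min(candidates, key=sum_sq)

print(total_residual, best_fit, sum_sq(best_fit))
```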

Median and Average Absolute Deviation
No more than half of the residuals are less than zero and no more than half of the residuals are greater than zero.
The sum of the absolute values of the residuals is as small as possible.
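The same kind of check works for the median. This sketch again uses the batch 2, 6, 7 and a coarse grid of candidate fits, purely as an illustration.

```python
import statistics

# Batch from the lecture's simple example.
y = [2, 6, 7]
median = statistics.median(y)  # 6

# Count residuals on each side of the median.
below = sum(1 for v in y if v - median < 0)
above = sum(1 for v in y if v - median > 0)

def sum_abs(c):
    """Sum of absolute deviations of the batch from a candidate fit c."""
    return sum(abs(v - c) for v in y)

# Try candidate fits 0.0, 0.1, ..., 10.0 and keep the one with the smallest sum.
candidates = [k / 10 for k in range(0, 101)]
best_fit = min(candidates, key=sum_abs)

print(below, above, best_fit, sum_abs(best_fit))
```

With an even number of observations, any value between the two middle observations gives the same minimum, so the minimizer need not be unique.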

Mode and Percentage of Misses
As many as possible of the residuals are zero.
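For nominal data, a residual is "zero" when an observation equals the fit, so the mode is the fit with the fewest misses. The sketch below illustrates this with the governance counts from the pie chart slide (37, 32, 5). The coded list is rebuilt from those counts, and the helper name is hypothetical.

```python
from collections import Counter

# Form-of-government codes rebuilt from the slide's counts (an assumption about the raw data).
codes = [1] * 37 + [2] * 32 + [3] * 5

def pct_misses(fit):
    """Percentage of observations that differ from the candidate fit."""
    misses = sum(1 for c in codes if c != fit)
    return 100 * misses / len(codes)

mode = Counter(codes).most_common(1)[0][0]
misses_by_fit = {fit: pct_misses(fit) for fit in (1, 2, 3)}
best_fit = min(misses_by_fit, key=misses_by_fit.get)

print(mode, best_fit, misses_by_fit)
```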

Next Time...
Friday Workshop: Minitab Applications
Lecture 3: Data Compression for Two Variables: Scatterplots