Statistics and Data Analysis


Statistics and Data Analysis Professor William Greene Stern School of Business IOMS Department Department of Economics

Statistics and Data Analysis Part 1 – Data Presentation Telling the story statistically

Samples are surprisingly small: 1010 observations; telephone sample; sampling error

What Does it Mean? Slightly more than one-third of Americans have a favorable opinion of the Democratic-led Congress, a poll said Wednesday. The Pew Research Center for the People & the Press said the 37% expressing a positive opinion represents a decline of 13 points since April. The favorable percentage is one of the lowest in more than two decades of Pew surveys – if not the lowest, the poll said. The previous low was 40% in January, but the result is not statistically significant because of the margin of error. (USA Today) We will develop the idea of the “margin of error” and how it is computed.
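As a rough preview of that computation, the sketch below uses the usual large-sample 95% formula for a sample proportion, 1.96 * sqrt(p(1 - p)/n). The function name is illustrative, and the sample size of 1,010 is an assumption borrowed from the earlier slide; the article quoted here does not state the poll's sample size.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate margin of error for a sample proportion p based on n responses.

    z=1.96 corresponds to the conventional 95% confidence level.
    """
    return z * math.sqrt(p * (1.0 - p) / n)

# Assumed inputs: 37% favorable, 1,010 respondents (the sample size is an assumption).
moe = margin_of_error(0.37, 1010)
print(f"37% plus or minus {100 * moe:.1f} percentage points")
```

Under these assumptions the margin is roughly three percentage points, which is why a reading of 37% is hard to distinguish from one of 40%.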

Really? The following was taken from http://www.msnbc.msn.com/id/27339545/: “An msnbc.com guide to presidential polls: Why results, samples and methodology vary from survey to survey. WASHINGTON - A poll is a small sample of some larger number, an estimate of something about that larger number. For instance, what percentage of people reports that they will cast their ballots for a particular candidate in an election? A sample reflects the larger number from which it is drawn. Let’s say you had a perfectly mixed barrel of 1,000 tennis balls, of which 700 are white and 300 orange. You do your sample by scooping up just 50 of those tennis balls. If your barrel was perfectly mixed, you wouldn’t need to count all 1,000 tennis balls — your sample would tell you that 30 percent of the balls were orange.” A more careful statement: your sample might tell you that approximately 30 percent of the balls were orange.
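The gap between "would" and "might, approximately" can be seen by simulating the scoop many times. This is a minimal sketch: the function name, the seed, and the choice of 1,000 repetitions are illustrative assumptions, while the barrel (700 white, 300 orange) and the scoop of 50 come from the quoted passage.

```python
import random

def orange_share(rng, sample_size=50, white=700, orange=300):
    """Scoop sample_size balls without replacement; return the share that is orange."""
    barrel = ["white"] * white + ["orange"] * orange
    scoop = rng.sample(barrel, sample_size)
    return scoop.count("orange") / sample_size

rng = random.Random(0)  # fixed seed so the run is repeatable
shares = [orange_share(rng) for _ in range(1000)]

print("smallest share:", min(shares))
print("largest share: ", max(shares))
print("average share: ", sum(shares) / len(shares))
```

Individual scoops scatter around 30 percent rather than hitting it every time; only the average over many scoops settles near 0.30.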

The Visual Data Do Tell the Story: Napoleon’s March to and from Moscow

Informative Data Table. Life Expectancy: Highest 15 Countries, 2010. Disability Adjusted Life Expectancy

A Dynamic Picture

Bar Charts vs. Data Tables

Probability of Survival to Age 50, Female at Birth: U.S. and 20 Other Wealthy Countries. It is possible to be misled by a presentation such as this one. Note the vertical axis. What does this graph tell you? What do the probabilities mean? Are the differences meaningful?

Does living longer make people happier? Or do people live longer because they are happier?

Does the Picture Tell the Story? This is the only graphic in the article. The article compares default rates on VA vs. FHA mortgages. Is there anything wrong with this picture? The very technical looking graph/table is unrelated to the article. New York Times, Page RE1, July 24, 2014

Data Presentation Agenda Data Types: Cross Section and Time Series Summarizing Data Graphically Pie chart, bar chart Box plot, histogram Summarizing Data with Descriptive Statistics Central tendency Spread Distribution (shape)

Data = A Set of Facts A picture of some aspect of the world Pizza Sales by Type What do the data tell you? How can you use the information? What additional information would make these data (more) informative?

Data Types and Measurement. Quantitative: Discrete = count (number of car accidents by city by time); Continuous = quantitative measurement (housing prices). Qualitative: Categorical (shopping mall, car brand, trip mode); Ordinal (survey data on attitudes, “How do you feel about…?”: Strongly disagree, Disagree, Neutral, Agree, Strongly agree; Moody’s bond ratings: Aaa, Aa, A, Baa, Ba, B, and so on). Frameworks: cross section, time series.

Discrete, Count Data, Time Series

Continuous Quantitative Data Housing Prices and Incomes

Unordered Qualitative Data Travel Mode Between Sydney and Melbourne by 210 Travelers

Ordered Qualitative Data German Health Satisfaction Survey; 27,326 individuals. On a scale from 0 to 10, how do you feel about your health?

Aggregated Data May Be Easier to Understand: Bad (0-3), Fair (4-6), Good (7-8), Excellent (9-10)
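One way to carry out this kind of aggregation is sketched below. It assumes the four labels pair with the score ranges as Bad (0-3), Fair (4-6), Good (7-8), and Excellent (9-10); the function name and the short list of responses are hypothetical and are not the survey data.

```python
from collections import Counter

def health_category(score):
    """Map a 0-10 satisfaction score to one of four ordered categories."""
    if not 0 <= score <= 10:
        raise ValueError(f"score out of range: {score}")
    if score <= 3:
        return "Bad"
    if score <= 6:
        return "Fair"
    if score <= 8:
        return "Good"
    return "Excellent"

responses = [0, 3, 5, 5, 7, 8, 8, 9, 10, 6]  # hypothetical responses
counts = Counter(health_category(s) for s in responses)

# Print in the natural order of the categories, not alphabetical order.
for label in ["Bad", "Fair", "Good", "Excellent"]:
    print(f"{label:<10}{counts[label]}")
```

Eleven possible answers collapse to four, which is easier to read but discards the distinction between, say, a 4 and a 6.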

Ordered Qualitative Outcomes Bond Ratings Movie Ratings Arithmetic Mean may not be meaningful. (a) Ordinal measure – rankings (b) Look at that distribution!

A Problem with Ordered Survey Response Data: 61 Stern Students’ Ranking of Subway Safety (1994)*

Safety  Label                Count  Percent  Cum Pct
1       Very Unsatisfactory     17    27.87    27.87
2       Unsatisfactory          15    24.59    52.46
3       OK                      17    27.87    80.33
4       Satisfactory            10    16.39    96.72
5       Very Satisfactory        2     3.28   100.00

There is no objective meaning to “3” on some standard scale. Does everyone’s “1” or “2” or “3” … mean the same thing?

* Jeff Simonoff: Data Presentation and Summary, pp. 3-4
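The Percent and Cum Pct columns of a frequency table like this one follow directly from the counts. In the sketch below, the counts for ratings 3 and 5 are inferred from the cumulative percentages and the total of 61 students, so treat them as assumptions; the variable names are illustrative.

```python
from itertools import accumulate

# Counts for ratings 1..5; the entries for ratings 3 and 5 are inferred (assumption).
counts = [17, 15, 17, 10, 2]
total = sum(counts)

percents = [100 * c / total for c in counts]
cum_percents = [100 * running / total for running in accumulate(counts)]

print("Safety  Count  Percent  Cum Pct")
for rating, (c, p, cp) in enumerate(zip(counts, percents, cum_percents), start=1):
    print(f"{rating:>6}  {c:>5}  {p:>7.2f}  {cp:>7.2f}")
```

The arithmetic is well defined even though the ratings are ordinal; what remains open is whether the labels 1 through 5 mean the same thing to every respondent.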

Cross Section Data Housing Prices and Incomes

Time Series Data: Oil Price Graph is much more useful and informative than a table for time series data.

Representing Data In raw form Transformed to a visual form Summarized graphically Summarized statistically

Pie Chart vs. Frequency Table Pizza Pies Sold, by Type Same Information. Which is more useful for your audience?

Data Representation: Bar Chart vs. Pie Chart BAR CHART PIE CHART Same data. Which is easier to understand?

Table vs. Bar Chart (or both) 2013 data. Source: Bloomberg

2013 Valuation of U.S. Sports Teams These figures reveal a league strategy. Football Baseball

A Box Plot Describes the Distribution of Values in a Set of Data Hawaii Box and Whisker Plot for House Price Listings

Raw Data on Housing Prices and Incomes

Making a Box Plot for Per Capita Income: Maximum = 31136; 3rd Quartile = 24933; Median = 22610; 1st Quartile = 21677; Minimum = 17043. Interquartile Range = IQR = 24933 - 21677 = 3256.

Box and Whisker Plot. Figure labels, top to bottom: outliers (extreme observations); upper whisker end = smaller of (Maximum, Median + 1.5 IQR); 75th percentile; median; 25th percentile; lower whisker end = larger of (Minimum, Median – 1.5 IQR). The box spans the interquartile range (IQR). What is an outlier? Why do we believe a particular point is an outlier?
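The whisker rule can be tried on the per capita income summary from the previous slide. The sketch below implements the rule as the figure labels it, with the 1.5 IQR reach measured from the median. Many texts and software packages instead measure the reach from the quartiles (Tukey's convention), so that variant is included for comparison. The function and parameter names are illustrative assumptions.

```python
def whisker_ends(minimum, q1, median, q3, maximum, center="median"):
    """Return (lower, upper) whisker ends using a reach of 1.5 * IQR.

    center="median":   reach measured from the median, as the slide labels it.
    center="quartile": reach measured from Q1 and Q3 (Tukey's convention).
    """
    reach = 1.5 * (q3 - q1)
    if center == "median":
        low_ref, high_ref = median, median
    elif center == "quartile":
        low_ref, high_ref = q1, q3
    else:
        raise ValueError("center must be 'median' or 'quartile'")
    lower = max(minimum, low_ref - reach)   # larger of (minimum, reference - reach)
    upper = min(maximum, high_ref + reach)  # smaller of (maximum, reference + reach)
    return lower, upper

# Per capita income summary from the box plot slide.
summary = dict(minimum=17043, q1=21677, median=22610, q3=24933, maximum=31136)

print("median-based:  ", whisker_ends(**summary))
print("quartile-based:", whisker_ends(**summary, center="quartile"))
```

Under the median-based rule both the minimum (17043) and the maximum (31136) fall outside the whiskers; under the quartile-based rule only the maximum does. Whether a point is flagged as an outlier therefore depends on the convention used.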

Histogram for House Price Listings A histogram describes the sample data and suggests the nature of the underlying data generating process. Note the “skewness” of the distribution of listings.
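A histogram is a count of observations falling in equal-width bins. The sketch below builds one as text and compares the mean with the median, a quick informal check for skewness. The prices are hypothetical values in thousands of dollars, not the listing data behind the slide, and the function name is an assumption.

```python
import statistics

def histogram_counts(values, bin_width):
    """Count values in equal-width bins; returns {bin_start: count} sorted by bin."""
    counts = {}
    for v in values:
        start = (v // bin_width) * bin_width
        counts[start] = counts.get(start, 0) + 1
    return dict(sorted(counts.items()))

# Hypothetical listing prices in thousands of dollars (not the slide's data).
prices = [150, 160, 175, 180, 190, 210, 220, 240, 310, 520]

bins = histogram_counts(prices, 100)
for start, n in bins.items():
    print(f"{start:>4}-{start + 99:<4} {'#' * n}")

mean = statistics.mean(prices)
median = statistics.median(prices)
print("mean:", mean, "median:", median)
```

A mean well above the median, as in this made-up sample, is typical of a distribution with a long right tail.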

Distribution of House Price Listings. Asymmetry (skewness) in the histogram of listing prices shows up in the box and whisker plot. Note the long whisker at the top of the figure.

House Price Listings and Per Capita Incomes, by State. Regression and Correlation. Are these two variables correlated? r = .48. How to describe and summarize them. How to explain the variation across states. How to determine if there is any correlation between the two variables.
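The correlation coefficient r summarizes how closely two variables move together on a scale from -1 to 1. The sketch below computes the Pearson correlation from first principles. The function name and the five data pairs are hypothetical; they are not the state-level data behind the r = .48 reported on the slide.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired observations (for illustration only).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(f"r = {pearson_r(x, y):.2f}")
```

A value near 0.5, like the one on the slide, indicates a moderate positive association: the two variables tend to rise together, but with substantial scatter.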

Big Data: Netflix Cinematch Rating/Recommendation System

Summary. What story does the data presentation tell? Data in raw form tell no story. A visual representation of data tells something about the data. The representation of the data may reveal something about the underlying process that the data measure. What tool is most informative? Reduction to a small number of features; visual displays of data. Tools: data table (organizing the data is often a good start), pie chart, box and whisker plots, bar charts, histograms, time series plots. “There are lies, damned lies and statistics.” (Benjamin Disraeli)