Introduction to Descriptive Statistics 17.871 Spring 2006.

Presentation transcript:

Introduction to Descriptive Statistics Spring 2006

First, Some Words about Graphical Presentation
Aspects of graphical integrity (following Edward Tufte, The Visual Display of Quantitative Information)
– Represent numbers in direct proportion to the numerical quantities presented
– Write clear labels on the graph
– Show data variation, not design variation
– Deflate and standardize money in time series

Population vs. Sample Notation
Population | Vs | Sample
Greeks | | Romans
μ, σ, β | | s, b

Types of Variables
– Nominal (Qualitative); U&H: “categorical”
– ~Nominal (Quantitative)
– Ordinal
– Interval or ratio

Describing data
       | Moment                        | Non-mean based measure
Center | Mean                          | Mode, median
Spread | Variance (standard deviation) | Range, interquartile range
Skew   | Skewness                      | --
Peaked | Kurtosis                      | --

Mean

Variance, Standard Deviation
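
The formulas on the "Mean" and "Variance, Standard Deviation" slides did not survive in the transcript. As a reference point only, the standard population definitions, using Greek letters for population quantities, can be written as follows. The symbols N (population size) and x_i (the i-th observation) are assumptions, not taken from the slides.

```latex
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i,
\qquad
\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2,
\qquad
\sigma = \sqrt{\sigma^2}
```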

Variance, S.D. of a Sample
– Degrees of freedom
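
The degrees-of-freedom point is that a sample variance divides the sum of squared deviations by n - 1 rather than n. A minimal Python sketch of the difference follows; the data and the function names are made up for illustration and are not from the slides.

```python
import math


def mean(xs):
    """Arithmetic mean."""
    return sum(xs) / len(xs)


def variance(xs, sample=True):
    """Sum of squared deviations from the mean, divided by
    n - 1 (sample, one degree of freedom used by the mean) or n (population)."""
    m = mean(xs)
    ss = sum((x - m) ** 2 for x in xs)
    dof = len(xs) - 1 if sample else len(xs)
    return ss / dof


scores = [2, 4, 4, 4, 5, 5, 7, 9]          # hypothetical data, mean = 5
pop_var = variance(scores, sample=False)    # divides by n = 8
samp_var = variance(scores)                 # divides by n - 1 = 7
samp_sd = math.sqrt(samp_var)
print(pop_var, samp_var, samp_sd)
```

The sample version is always slightly larger than the population version for the same data, and the gap shrinks as n grows.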

The z-score or the “standardized score”
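
A z-score expresses each observation as its distance from the mean in standard-deviation units. The sketch below assumes the sample standard deviation (n - 1 divisor) and uses made-up data; `z_scores` is a hypothetical helper name.

```python
import statistics


def z_scores(xs):
    """Standardize: subtract the mean, divide by the sample s.d."""
    m = statistics.mean(xs)
    s = statistics.stdev(xs)  # sample s.d. (n - 1 divisor); an assumption
    return [(x - m) / s for x in xs]


z = z_scores([2, 4, 4, 4, 5, 5, 7, 9])  # hypothetical data
print([round(v, 2) for v in z])
```

After standardizing, the values have mean 0 and standard deviation 1, which is what makes scores on different scales comparable.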

Skewness
Symmetrical distribution
– IQ
– SAT
“No skew”, “Zero skew”, Symmetrical

Skewness
Asymmetrical distribution
– GPA of MIT students
“Negative skew”, “Left skew”

Skewness
(Asymmetrical distribution)
– Income
– Contribution to candidates
– Populations of countries
– “Residual vote” rates
“Positive skew”, “Right skew”

Skewness
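
The formula on this slide is missing from the transcript. One common definition, assumed here, is the moment-based coefficient: the third central moment divided by the second central moment raised to the power 3/2. The data below are invented to show the sign convention.

```python
def central_moment(xs, k):
    """k-th central moment with divisor n."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** k for x in xs) / len(xs)


def skewness(xs):
    """Moment-based skewness: m3 / m2^(3/2). Other definitions exist."""
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5


symmetric = [1, 2, 3, 4, 5]                 # hypothetical: no skew
right_skewed = [1, 1, 2, 2, 3, 20]          # hypothetical: long right tail
left_skewed = [-x for x in right_skewed]    # mirror image: long left tail

print(skewness(symmetric), skewness(right_skewed), skewness(left_skewed))
```

A long right tail gives a positive value, a long left tail a negative one, and a symmetric sample gives zero.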

Kurtosis
– leptokurtic
– platykurtic
– mesokurtic
Beware the “coefficient of excess”

A few words about the normal curve
– Skewness = 0
– Kurtosis = 3
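
Because the normal curve has kurtosis 3, some software reports kurtosis minus 3 (the "coefficient of excess"), so it matters which one a number refers to. The sketch below assumes the moment-based definition (fourth central moment over the squared second central moment) and uses invented data.

```python
def central_moment(xs, k):
    """k-th central moment with divisor n."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** k for x in xs) / len(xs)


def kurtosis(xs):
    """Moment-based kurtosis: m4 / m2^2 (equals 3 for a normal curve)."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2


def excess_kurtosis(xs):
    """Coefficient of excess: kurtosis minus the normal benchmark of 3."""
    return kurtosis(xs) - 3


flat = [-1, 1, -1, 1]            # hypothetical: no tails at all (platykurtic)
peaked = [0] * 8 + [-3, 3]       # hypothetical: sharp peak, heavy tails (leptokurtic)

print(kurtosis(flat), excess_kurtosis(flat))
print(kurtosis(peaked), excess_kurtosis(peaked))
```

Values below 3 (negative excess) are platykurtic, values above 3 (positive excess) are leptokurtic, and values near 3 are mesokurtic.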

More words about the normal curve
34%, 47%, 49%

“Empirical rule”
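
The slide body is missing, but the empirical rule usually refers to the share of a normal distribution lying within 1, 2, and 3 standard deviations of the mean. A short sketch that computes those shares from the error function is given below; `prob_within` is a hypothetical helper name.

```python
import math


def prob_within(k):
    """P(|Z| < k) for a standard normal variable Z."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    both_sides = prob_within(k)
    one_side = both_sides / 2  # area between the mean and k s.d. on one side
    print(k, round(both_sides, 4), round(one_side, 4))
```

The two-sided shares come out near 68%, 95%, and 99.7%; halving them gives the one-sided areas between the mean and each cutoff.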

SEG example
The instructor and/or section leader: | Mean | s.d. | Skew | Kurt | Graph
– Gives well-prepared, relevant presentations
– Explains clearly and answers questions well
– Uses visual aids well
– Uses information technology effectively
– Speaks well
– Encourages questions & class participation
– Stimulates interest in the subject
– Is available outside of class for questions
– Overall rating of teaching

Graph some SEG variables
The instructor and/or section leader: | Mean | s.d. | Skew | Kurt | Graph
– Uses visual aids well
– Encourages questions & class participation

Binary data
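
The content of this slide is missing. One standard fact about 0/1 variables, which may or may not be what the slide showed, is that the mean equals the proportion of ones, p, and the population variance equals p(1 - p). A quick check with invented data:

```python
votes = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]  # hypothetical 0/1 data

p = sum(votes) / len(votes)                                 # mean = share of ones
pop_var = sum((v - p) ** 2 for v in votes) / len(votes)     # divisor n

print(p, pop_var, p * (1 - p))
```

So for binary data the mean alone pins down the spread, and the variance is largest when p = 0.5.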

Commands in STATA for getting univariate statistics
summarize varname
summarize varname, detail
histogram varname, bin() start() width() density/fraction/frequency normal
graph box varnames
tabulate [NB: compare to table]

Example of Sophomore Test Scores
High School and Beyond, 1980: A Longitudinal Survey of Students in the United States (ICPSR Study 7896)
– totalscore = % of questions answered correctly on a battery of questions
– recodedtype = (1 = public school, 2 = religious private, 3 = non-sectarian private)

Explore totalscore some more.
table recodedtype, c(mean totalscore)
recodedtype | mean(totals~e)

Graph totalscore.
hist totalscore

Divide into “bins” so that each bar represents 1% correct
hist totalscore, width(.01)
(bin=124, start= , width=.01)
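
Fixing the bin width determines the number of bins: 124 bins of width .01 implies the scores span roughly 1.24 units. The Python sketch below shows the same width-based binning on made-up numbers. It is an illustration of the idea, not a reproduction of Stata's algorithm, and `histogram_counts` is a hypothetical name.

```python
import math


def histogram_counts(xs, width, start=None):
    """Count observations in equal-width bins beginning at `start`
    (defaults to the minimum). The maximum falls in the last bin."""
    start = min(xs) if start is None else start
    nbins = max(1, math.ceil((max(xs) - start) / width))
    counts = [0] * nbins
    for x in xs:
        i = min(int((x - start) / width), nbins - 1)
        counts[i] += 1
    return start, counts


data = [0.0, 0.25, 0.5, 0.75, 1.0]           # hypothetical scores
start, counts = histogram_counts(data, width=0.25)
print(start, counts)
```

Narrower widths show more detail but noisier bars; wider ones smooth the picture at the cost of hiding structure.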

Add ticks at each 10% mark
histogram totalscore, width(.01) xlabel(-.2 (.1) 1)
(bin=124, start= , width=.01)

Superimpose the normal curve (with the same mean and s.d. as the empirical distribution).
histogram totalscore, width(.01) xlabel(-.2 (.1) 1) normal
(bin=124, start= , width=.01)

Do the previous graph by school types.
histogram totalscore, width(.01) xlabel(-.2 (.1) 1) by(recodedtype)
(bin=124, start= , width=.01)

Main issues with histograms
– Proper level of aggregation
– Non-regular data categories (see next)

A note about histograms with unnatural categories (start here)
From the Current Population Survey (2000), Voter and Registration Survey
How long (have you/has name) lived at this address?
-9 No Response
-3 Refused
-2 Don't know
-1 Not in universe
1 Less than 1 month
months
months
years
years
6 5 years or longer

Simple graph

Solution, Step 1
Map artificial category onto “natural” midpoint
-9 No Response → missing
-3 Refused → missing
-2 Don't know → missing
-1 Not in universe → missing
1 Less than 1 month → 1/24
months → 3.5/12
months → 9/12
years →
years →
6 5 years or longer → 10 (arbitrary)

Graph of recoded data

Density plot of data
Total area of last bar = .557
Width of bar = 11 (arbitrary)
Solve for h: a = w × h, or .557 = 11h => h = .051
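
The calculation above generalizes to every bar in a density plot: the height is the fraction of cases in the category divided by the bar's width, so that area equals the fraction. A small sketch using the slide's numbers for the last bar is given below; `density_height` is a hypothetical name.

```python
def density_height(fraction, width):
    """Bar height such that area (width * height) equals the fraction of cases."""
    return fraction / width


h = density_height(0.557, 11)   # last bar: area .557, width 11 (arbitrary, per the slide)
print(round(h, 3))
print(round(11 * h, 3))         # multiplying back recovers the area
```

With unequal category widths, plotting fractions directly would exaggerate wide categories; dividing by width is what keeps the areas comparable.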

Density plot template
Category | F | X-min | X-max | X-length | Height (density)
< 1 mo / * 1-6 mo /12½ mo..0430½ yr yr yr
* = .0156/.082

Draw the previous graph with a box plot.
graph box totalscore
Figure labels: Upper quartile; Median; Lower quartile; Inter-quartile range; 1.5 x IQR
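
The pieces of a box plot can be computed directly: the quartiles, the inter-quartile range, and whiskers that reach the most extreme observations within 1.5 x IQR of the box. The sketch below uses Python's `statistics.quantiles` with the "inclusive" method, which is an assumption here. Stata may define quartiles slightly differently, and the data are invented.

```python
import statistics


def box_stats(xs):
    """Quartiles, IQR, whisker ends, and outliers for a box plot."""
    q1, med, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in xs if lo_fence <= x <= hi_fence]
    return {
        "q1": q1,
        "median": med,
        "q3": q3,
        "iqr": iqr,
        "whiskers": (min(inside), max(inside)),  # most extreme points inside the fences
        "outliers": [x for x in xs if x < lo_fence or x > hi_fence],
    }


stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])  # hypothetical data
print(stats)
```

Points beyond the whiskers are drawn individually, which is why a single extreme value such as 30 shows up as a separate dot rather than stretching the box.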

Draw the box plots for the different types of schools.
graph box totalscore, by(recodedtype)

Draw the box plots for the different types of schools using “over” option
graph box totalscore, over(recodedtype)

Issue with box plots
– Sometimes overly stylized

Three words about pie charts: don’t use them

So, what’s wrong with them?
– For non-time-series data, it is hard to compare groups; the eye is very bad at judging the relative size of circle slices
– For time series data, it is hard to grasp cross-time comparisons

Time series example

An exception to the no pie chart rule

The worst graph ever published