

Data
Once the data starts to flow, our attention turns to data analysis
– Data preparation: includes editing, coding, and data entry
– Exploring, displaying, and examining data: the search for meaningful patterns
– Data mining: used to extract patterns and predictive trends from databases

Data
Editing: checking entries for correctness and consistency
Coding: assigning numbers
Data entry: spreadsheet, data editor of a statistical program, or database

Exploring Data
You could move directly into the statistical analysis …
When the study’s purpose is not the production of causal inferences, confirmatory data analysis is not required.
When it is, you should discover as much as possible about the data before selecting the appropriate means of confirmation.

Exploratory Data Analysis
A set of techniques.
The flexibility to respond to the patterns revealed by successive iterations in the discovery process is an important attribute.
EDA can be compared to the role of police detectives and other investigators.
Confirmatory analysis can be compared to the role of the judge.
The former are involved in the search for clues; the latter is preoccupied with evaluating the strength of the evidence.

EDA
Free to take many paths in revealing mysteries in the data.
Emphasizes visual representations and graphical techniques over summary statistics.
Summary statistics may obscure or conceal the underlying structure of the data.
When numerical summaries are used exclusively and accepted without visual inspection, the selection of confirmatory models may be based on flawed assumptions and may produce erroneous conclusions.

Techniques for Displaying Data
Frequency Tables
Bar Charts
Pie Charts

Frequency Tables
Information
– Displays the data from the lowest value to the highest
– Columns for percent
– Percent adjusted for missing values
– Cumulative percent

A Frequency Table for Market Sector
Columns: Value Label, Value, Frequency, %, Valid %, Cum. %
Rows: Chemicals, Consumer Products, Durables, Energy, Financial, Health, High-Tech, Insurance, Retailing, Other, Total
Valid Cases 100, Missing Cases 0
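The columns of a frequency table like the one above can be computed with a few lines of Python. This is only a minimal sketch: the function name `frequency_table`, the use of `None` to mark a missing case, and the sample responses are all assumptions made for illustration, not data from the presentation.

```python
from collections import Counter

def frequency_table(values):
    """Build rows of (value, frequency, %, valid %, cumulative %).

    None marks a missing case (an assumption of this sketch).
    """
    total = len(values)
    valid = [v for v in values if v is not None]
    counts = Counter(valid)
    rows = []
    cumulative = 0.0
    for value in sorted(counts):                 # lowest value to highest
        freq = counts[value]
        percent = 100 * freq / total             # share of all cases
        valid_percent = 100 * freq / len(valid)  # adjusted for missing values
        cumulative += valid_percent              # running total of valid %
        rows.append((value, freq, round(percent, 1),
                     round(valid_percent, 1), round(cumulative, 1)))
    return rows

# Hypothetical responses: 10 cases, 2 of them missing.
sectors = ["Energy", "Retailing", "Energy", "Health", None,
           "Energy", "Health", "Retailing", "Energy", None]
rows = frequency_table(sectors)
for row in rows:
    print(row)
```

With missing cases present, the percent and valid percent columns differ; with none missing (as in the market sector table) they would be identical.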

Sector Bar Chart Display

Sector Pie Chart Display

Analysis
The values and percentages are more readily understood in graphic format.
The relative sizes of the sectors can be visualized with the bar and pie charts.

Another Frequency Table (Ratio-Interval Data)
Columns: Row, Value, Freq., %, Cum. % …

Interval-Ratio Data
The last table was not informative.
Its primary contribution was an ordered list of values.
If converted to a bar chart, it would have 48 bars of equal length and two bars with two occurrences.
A pie chart would also be pointless.
Notice that when the variable of interest is measured on an interval-ratio scale and can take one of many potential values, these techniques are not particularly informative.

Histogram
Conventional solution for display of interval-ratio data.
Group the variable’s values into intervals.
Useful for
– Displaying all intervals in a distribution, even those without observed values
– Examining the shape of the distribution for skewness, kurtosis, and the modal pattern

Histogram
Questions to ask
– Is there a single hump?
– Are subgroups identifiable when multiple modes are present?
– Are straggling data values detached from the central concentration?
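Grouping values into intervals, including intervals that contain no observations, can be sketched as follows. The function name `bin_counts`, the interval width, and the data values are hypothetical; the sketch assumes numeric values and intervals of the form [low, high).

```python
def bin_counts(values, width):
    """Count values per interval [low, low + width), keeping empty intervals."""
    low = (min(values) // width) * width            # align first interval to the width
    n_bins = int((max(values) - low) // width) + 1
    counts = [0] * n_bins
    for v in values:
        counts[int((v - low) // width)] += 1
    return [(low + i * width, low + (i + 1) * width, c)
            for i, c in enumerate(counts)]

# Hypothetical data with two straggling values in the upper tail.
values = [54, 61, 63, 67, 72, 75, 78, 81, 84, 118, 175]
bins = bin_counts(values, width=20)
for lo, hi, count in bins:
    print(f"{lo:>3}-{hi:<3} {'*' * count}")
```

The rows printed with no asterisks are the empty intervals; they make gaps and detached values visible, which is what the questions above ask the analyst to look for.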

Histogram when grouping in increments of 20

Observations
Intervals with 0 counts show gaps in the data and alert the analyst to look for problems with spread.
There are two extreme values.
Along with the peaked midpoint and reduced number of observations in the upper tail, this histogram warns us of irregularities in the data.

Stem and Leaf Displays
Closely related to the histogram.
Shares some of its features but offers unique advantages.
Easy to construct by hand for small samples.
In contrast to histograms, which lose information by grouping values into intervals, the actual data can be inspected directly.
The range of the data is apparent at a glance.
Impressions of shape and spread are also immediate.

Stem and Leaf Displays
To develop one, the first digits of each data item are arranged to the left of a vertical line.
Each row is referred to as a stem, and each piece of information on a stem is a leaf.
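The construction described above can be sketched in Python. This is an illustration only: the function name `stem_and_leaf` and the data are hypothetical, and the sketch assumes non-negative whole numbers, with the last digit taken as the leaf and the leading digits as the stem.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return display lines: leading digits | last digits, in sorted order."""
    stems = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)     # leading digits, last digit
        stems[stem].append(leaf)
    lines = []
    for stem in range(min(stems), max(stems) + 1):   # keep stems with no leaves
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}")
    return lines

# Hypothetical sample of two-digit observations.
data = [54, 55, 56, 59, 61, 61, 62, 63, 70, 72, 88]
lines = stem_and_leaf(data)
print("\n".join(lines))
```

Every original value can be read back from the display, which is the advantage over a histogram noted earlier.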

Example of a Stem and Leaf Display

Boxplots
Another technique for exploratory data analysis.
A boxplot reduces the detail of the stem-and-leaf display and provides a different visual image of the distribution’s location, spread, shape, tail length, and outliers.
The summary consists of the median, upper and lower quartiles, and the largest and smallest observations.
The median and quartiles are used because they are particularly resistant statistics.

Resistant Statistics
Example: data set = [5,6,6,7,7,7,8,8,9]
The mean is 7 and the (sample) standard deviation is about 1.22.
Replace the 9 with 90, and the mean becomes 16 and the standard deviation about 27.8.
Changing only one of the nine values has disturbed the location and spread summaries to the point where they no longer represent the other eight values.
Both the mean and the standard deviation are considered nonresistant statistics.
The median remained at 7, and the lower and upper quartiles stayed at 6 and 8, respectively.
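The comparison in this example can be checked with Python's standard `statistics` module. This is a sketch under two assumptions: the standard deviation is the sample standard deviation (`statistics.stdev`), and the quartiles come from `statistics.quantiles` with its default method. The helper name `summarize` is hypothetical.

```python
import statistics

def summarize(data):
    """Return location and spread summaries for a list of numbers."""
    q1, median, q3 = statistics.quantiles(data, n=4)  # quartile cut points
    return {
        "mean": statistics.mean(data),
        "stdev": statistics.stdev(data),   # sample standard deviation
        "q1": q1,
        "median": median,
        "q3": q3,
    }

original = [5, 6, 6, 7, 7, 7, 8, 8, 9]
altered = [5, 6, 6, 7, 7, 7, 8, 8, 90]    # the 9 replaced with 90

before = summarize(original)
after = summarize(altered)
for name, s in (("original", before), ("altered", after)):
    print(f"{name}: mean={s['mean']:.1f} stdev={s['stdev']:.1f} "
          f"q1={s['q1']:.0f} median={s['median']:.0f} q3={s['q3']:.0f}")
```

The mean and standard deviation move sharply between the two runs, while the median and quartiles are identical, which is what makes the latter resistant.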

Boxplots
A rectangular plot that encompasses 50 percent of the data values.
A center line (or other notation) marks the median and goes through the width of the box.
The edges of the box are called hinges.
The whiskers extend from the right and left hinges to the largest and smallest values.

Boxplot Components
Box: 50% of observed values are within the box; its length is the IQR.
Median: the line inside the box.
Whiskers: extend to the smallest observed value within 1.5 IQR of the lower hinge and to the largest observed value within 1.5 IQR of the upper hinge.
Inner fences: lower hinge minus 1.5(IQR), and 1.5(IQR) plus upper hinge.
Outer fences: lower hinge minus 3(IQR), and 3(IQR) plus upper hinge.
Outside value or outlier: lies between an inner fence and an outer fence.
Extreme or far outside value: lies beyond an outer fence.

Example
Minimum = 54.9
Lower hinge = 60.3
Median =
Upper hinge =
Maximum =
IQR = – 60.3 =
(IQR) =
Inner fence lower hinge = 60.3 – ( ) =
Inner fence upper hinge = ( ) =
The smallest and largest values from the distribution within the fences are used to determine the whisker length.
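The fence and whisker arithmetic can be sketched in Python. The sketch does not reproduce the example above; it reuses the altered data set from the resistant statistics slide. It assumes the quartiles returned by `statistics.quantiles` (default method) are an acceptable stand-in for the hinges, and the function name `boxplot_summary` is hypothetical.

```python
import statistics

def boxplot_summary(data):
    """Compute hinges, IQR, fences, whisker ends, and outside values."""
    q1, median, q3 = statistics.quantiles(data, n=4)  # quartiles used as hinges
    iqr = q3 - q1
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)   # inner fences
    outer = (q1 - 3 * iqr, q3 + 3 * iqr)       # outer fences
    inside = [x for x in data if inner[0] <= x <= inner[1]]
    return {
        "hinges": (q1, q3),
        "median": median,
        "iqr": iqr,
        "inner_fences": inner,
        "outer_fences": outer,
        # whiskers end at the extreme observations inside the inner fences
        "whiskers": (min(inside), max(inside)),
        # outside values: beyond an inner fence but not beyond an outer fence
        "outside": [x for x in data
                    if outer[0] <= x < inner[0] or inner[1] < x <= outer[1]],
        # far outside values: beyond an outer fence
        "far_outside": [x for x in data if x < outer[0] or x > outer[1]],
    }

summary = boxplot_summary([5, 6, 6, 7, 7, 7, 8, 8, 90])
for key, value in summary.items():
    print(f"{key}: {value}")
```

Here the value 90 falls beyond the upper outer fence, so it is reported as a far outside value and the upper whisker stops at 8.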

Observations
In preliminary analysis, it is important to separate legitimate outliers from errors in measurement, editing, coding, and data entry.
Outliers that are mistakes should be corrected or removed.

Other Observations
Boxplot shapes shown: Symmetric, Right Skewed, Left Skewed, Small Spread

Visual Techniques of EDA
Gain insight into the data.
More common ways of summarizing location, spread, and shape.
Use resistant statistics.
From these we can make decisions on test selection and whether the data should be transformed or re-expressed before further analysis.