Exploratory Data Analysis Statistics 2126. Introduction If you are going to find out anything about a data set you must first understand the data Basically.

Slides:



Advertisements
Similar presentations
Population vs. Sample Population: A large group of people to which we are interested in generalizing. parameter Sample: A smaller group drawn from a population.
Advertisements

C. D. Toliver AP Statistics
Excursions in Modern Mathematics, 7e: Copyright © 2010 Pearson Education, Inc. 14 Descriptive Statistics 14.1Graphical Descriptions of Data 14.2Variables.
Unit 1.1 Investigating Data 1. Frequency and Histograms CCSS: S.ID.1 Represent data with plots on the real number line (dot plots, histograms, and box.
Descriptive Statistics Summarizing data using graphs.
Beginning the Visualization of Data
It’s an outliar!.  Similar to a bar graph but uses data that is measured.
Reasoning in Psychology Using Statistics
Stem and Leaf Display Stem and Leaf displays are an “in between” a table and a graph – They contain two columns: – The left column contains the first digit.
Descriptive Statistics
Chapter 3: Frequency Distributions
Distributions & Graphs. Variable Types Discrete (nominal) Discrete (nominal) Sex, race, football numbers Sex, race, football numbers Continuous (interval,
© 2005 The McGraw-Hill Companies, Inc., All Rights Reserved. Chapter 12 Describing Data.
Programming in R Describing Univariate and Multivariate data.
Objective To understand measures of central tendency and use them to analyze data.
Exploratory Data Analysis. Computing Science, University of Aberdeen2 Introduction Applying data mining (InfoVis as well) techniques requires gaining.
Chapter 1: Exploring Data AP Stats, Questionnaire “Please take a few minutes to answer the following questions. I am collecting data for my.
Statistics 3502/6304 Prof. Eric A. Suess Chapter 3.
Univariate Data Chapters 1-6. UNIVARIATE DATA Categorical Data Percentages Frequency Distribution, Contingency Table, Relative Frequency Bar Charts (Always.
StatisticsStatistics Graphic distributions. What is Statistics? Statistics is a collection of methods for planning experiments, obtaining data, and then.
Exploratory Data Analysis
Percentiles and Box – and – Whisker Plots Measures of central tendency show us the spread of data. Mean and standard deviation are useful with every day.
Section 1 Topic 31 Summarising metric data: Median, IQR, and boxplots.
An Introduction to Statistics. Two Branches of Statistical Methods Descriptive statistics Techniques for describing data in abbreviated, symbolic fashion.
Categorical vs. Quantitative…
3-5: Exploratory Data Analysis  Exploratory Data Analysis (EDA) data can be organized using a stem and leaf (as opposed to a frequency distribution) 
1 Further Maths Chapter 2 Summarising Numerical Data.
Displaying Quantitative Data Graphically and Describing It Numerically AP Statistics Chapters 4 & 5.
Unit 4 Statistical Analysis Data Representations.
Statistics Chapter 1: Exploring Data. 1.1 Displaying Distributions with Graphs Individuals Objects that are described by a set of data Variables Any characteristic.
Revision Analysing data. Measures of central tendency such as the mean and the median can be used to determine the location of the distribution of data.
Organizing Data AP Stats Chapter 1. Organizing Data Categorical Categorical Dotplot (also used for quantitative) Dotplot (also used for quantitative)
Chapter 3 Displaying Data. 2 Major Points Plotting data: Why bother? Plotting data: Why bother? Histograms Histograms Frequency polygon Frequency polygon.
Vocabulary to know: *statistics *data *outlier *mean *median *mode * range.
More Univariate Data Quantitative Graphs & Describing Distributions with Numbers.
MATH 2311 Section 1.5. Graphs and Describing Distributions Lets start with an example: Height measurements for a group of people were taken. The results.
Chapter 3: Displaying and Summarizing Quantitative Data Part 1 Pg
4.2 Displays of Quantitative Data. Stem and Leaf Plot A stem-and-leaf plot shows data arranged by place value. You can use a stem-and-leaf plot when you.
Describing Data Week 1 The W’s (Where do the Numbers come from?) Who: Who was measured? By Whom: Who did the measuring What: What was measured? Where:
Interpreting Categorical and Quantitative Data. Center, Shape, Spread, and unusual occurrences When describing graphs of data, we use central tendencies.
COMPLETE BUSINESS STATISTICS
UNIT ONE REVIEW Exploring Data.
Prof. Eric A. Suess Chapter 3
Types of variables Discrete VS Continuous Discrete Continuous
Exploratory Data Analysis
Please copy your homework into your assignment book
Descriptive Statistics
Descriptive measures Capture the main 4 basic Ch.Ch. of the sample distribution: Central tendency Variability (variance) Skewness kurtosis.
Warm Up.
Unit 4 Statistical Analysis Data Representations
STATISTICS ELEMENTARY MARIO F. TRIOLA
1st Semester Final Review Day 1: Exploratory Data Analysis
Statistical Reasoning
Jeopardy Final Jeopardy Chapter 1 Chapter 2 Chapter 3 Chapter 4
Box and Whisker Plots Algebra 2.
Topic 5: Exploring Quantitative data
Describing Distributions of Data
Give 2 examples of this type of variable.
DS4 Interpreting Sets of Data
Means & Medians.
Unit 4: Describing Data After 10 long weeks, we have finally finished Unit 3: Linear & Exponential Functions. Now on to Unit 4 which will last 5 weeks.
Honors Statistics Review Chapters 4 - 5
Chapter 1: Exploring Data
Section 1.1: Displaying Distributions
What does it mean to “Interpret Data”?
Chapter 1: Exploring Data
Types of variables. Types of variables Categorical variables or qualitative identifies basic differentiating characteristics of the population.
Math 341 January 24, 2007.
Chapter 1: Exploring Data
Dot Plot A _________ is a type of graphic display used to compare frequency counts within categories or groups using dots. __________ help you see how.
Presentation transcript:

Exploratory Data Analysis Statistics 2126

Introduction If you are going to find out anything about a data set you must first understand the data Basically getting a feel for you numbers –Easier to find mistakes –Easier to guess what actually happened –Easier to find odd values

Introduction One of the most important and overlooked part of statistics is Exploratory Data Analysis or EDA Developed by John Tukey Allows you to generate hypotheses as well as get a feel for you data Get an idea of how the experiment went without losing any richness in the data

Hey look, numbers! x (the value)f (frequency)

Frequency tables make stuff easy 10(1)+23(2)+25(5)+30(2)+33(1)+35(10 = 309

Relative Frequency Histogram You can use this to make a relative frequency histogram Lose no richness in the data Easy to reconstruct data set Allows you to spot oddities

Categorical Data With categorical data you do not get a histogram, you get a bar graph You could do a pie chart too, though I hate them (but I love pie) Pretty much the same thing, but the x axis really does not have a scale so to speak So say we have a STAT 2126 class with 38 Psych majors, 15 Soc, 18 CESD majors and five Bio majors

Like this

Quantitative Variables So with these of course we use a histogram We can see central tendency Spread shape

Skewness

Kurtosis Leptokurtic means peaked Platykurtic means flat

More on shape A distribution can be symmetrical or asymmetrical It may also be unimodal or bimodal It could be uniform

An example Number of goals scored per year by Mario Lemieux A histogram is a good start, but you probably need to group the values

Mario could sorta play Wait a second, what is with that 90? Labels are midpoints, limits are 5-14 … Real limits are 85.5 – 94.5

Careful You have to make sure the scale makes sense Especially the Y axis One of the problems with a histogram with grouped data like this is that you lose some of the richness of the data, which is OK with a big data set, perhaps not here though

Stem and Leaf Plot This one is an ordered stem and leaf You interpret this like a histogram Easy to sp ot outliers Preserves data Easy to get the middle or 50 th percentile which is 44 in this case

The Five Number Summary You can get other stuff from a stem and leaf as well Median First quartile (17.5 in our case) Third quartile (61.5 here) Quartiles are the 25th and 75th percentiles So halfway between the minimum and the median, and the median and the maximum

You said there were five numbers.. Yeah so also there is the minimum 1 And the maximum, 85 –These two by the way, give you the range Now you take those five numbers and make what is called a box and whisker plot, or a boxplot Gives you an idea of the shape of the data

And here you go…