Ch3 Elementary Descriptive Statistics

Section 3.1: Elementary Graphical Treatment of Data

Before doing ANYTHING with data:
– Understand the question. An approximate answer to the exact question is always better than an exact answer to an approximate question. (John Tukey)
– Know how the experiment was conducted.

The FIRST thing to do with the data is to PLOT THE DATA:
– Plot all individual points.
– If there are connections between points, e.g. points come from the same pair (or sometimes the same block), show connections between related points.

Plotting data is an extremely important step. More often than not, the data I get when consulting have problems, such as incorrect values or attributes nobody told me about. Plotting helps reveal relationships and answers. Plotting is also a very effective way to present results: "A picture is worth a thousand words."

Example: 8 lb. test fishing line. Question: Which type(s) of line are strongest?
[Table: listing of the numerical data for Trilene XL, Trilene XT, and Stren]
It's hard to see what's happening without organizing the data.

A "dot" diagram:
[Figure: dot diagrams of the XL, XT, and Stren measurements, on a scale from 10.9 to 11.8]

Stem-and-leaf plot: it shows the distribution shape and at the same time preserves the original values. In the gear runout example, the gears-hung group has data points 7, 8, 8, 10, 10, 10, 10, 11, 11, 11, 12, 13, … A stem-and-leaf plot of these data is:
[Figure: stem-and-leaf plot of the gears-hung runouts]
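A minimal sketch in R (the language of the lecture's own simulation) of how a stem-and-leaf display is built. It assumes non-negative whole numbers, with the tens digit as the stem and the units digit as the leaf. `stem_leaf` is a hypothetical helper written for illustration. Base R's `stem()` draws a similar display but may choose a different stem width.

```r
# Hypothetical helper: bare-bones stem-and-leaf display.
# Stem = tens digit, leaf = units digit; assumes non-negative whole numbers.
stem_leaf <- function(x) {
  x <- sort(x)
  stems <- x %/% 10                          # tens digit
  leaves <- x %% 10                          # units digit
  all_stems <- seq(min(stems), max(stems))   # keep empty stems as well
  vapply(all_stems, function(s) {
    paste0(s, " | ", paste(leaves[stems == s], collapse = ""))
  }, character(1))
}

# Only the first twelve gears-hung runouts listed above (the rest are elided)
runout_hung <- c(7, 8, 8, 10, 10, 10, 10, 11, 11, 11, 12, 13)
writeLines(stem_leaf(runout_hung))
# 0 | 788
# 1 | 000011123
```

Because every value is sorted before it is split, each leaf row reads in increasing order. This means the original values can be read back off the display.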

Two groups can be compared with back-to-back stem-and-leaf diagrams, e.g. stopping distances of bikes with treaded tires vs. smooth tires:
[Figure: back-to-back stem-and-leaf diagram, Treaded tire | Smooth tire]
Or with dot diagrams:
[Figure: dot diagrams of stopping distance, Treaded vs. Smooth]

When there are associations between sets of data values, plot the data accordingly. E.g., snowfall for Duluth and White Bear Lake. A not very good way to plot the data:
[Figure: separate dot diagrams of snowfall for WB Lake and Duluth, on a scale from 20 to 130]

[Figure: snowfall, Duluth vs. White Bear]

A study of trace metals in the South Indian River: T = top water zinc concentration (mg/L), B = bottom water zinc concentration (mg/L).
[Table: Top and Bottom zinc concentrations]

One of the first things to do when analyzing data is to PLOT the data. This is not a useful way to plot the data: there is no clear distinction between bottom water and top water zinc, even though Bottom > Top at all 6 locations.
[Figure: separate dot diagrams for Top and Bottom]

A better way: connect points in the same pair.
[Figure: Top and Bottom values, with a line joining the two values from each location]

Another way: a scatter plot, with the line Bottom = Top for reference.
[Figure: scatter plot of Bottom vs. Top]
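This is a sketch in R of the two better displays for paired data. The first joins each pair with a line, and the second is a scatter plot against the line Bottom = Top. `paired_plots` is a hypothetical helper. The zinc measurements are not reproduced here, so the numbers below are invented placeholders, not the study's data.

```r
# Hypothetical helper: two views of paired Top/Bottom measurements.
paired_plots <- function(top, bottom) {
  stopifnot(length(top) == length(bottom))
  op <- par(mfrow = c(1, 2))   # two panels side by side
  on.exit(par(op))             # restore the previous layout afterwards
  # Left panel: one line per location joining its Top and Bottom values
  matplot(c(1, 2), rbind(top, bottom), type = "b", pch = 19, lty = 1,
          col = "black", xaxt = "n", xlim = c(0.8, 2.2),
          xlab = "", ylab = "Zinc (mg/L)")
  axis(1, at = c(1, 2), labels = c("Top", "Bottom"))
  # Right panel: scatter plot on common axes with the reference line Bottom = Top
  lims <- range(top, bottom)
  plot(top, bottom, pch = 19, xlim = lims, ylim = lims,
       xlab = "Top (mg/L)", ylab = "Bottom (mg/L)")
  abline(a = 0, b = 1, lty = 2)
  invisible(bottom - top)      # paired differences, returned for inspection
}

# Placeholder values only; these are NOT the South Indian River measurements
top    <- c(0.40, 0.50, 0.30, 0.60, 0.50, 0.40)
bottom <- c(0.45, 0.60, 0.40, 0.65, 0.55, 0.50)
d <- paired_plots(top, bottom)
all(d > 0)   # TRUE for these placeholders: every point lies above Bottom = Top
```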

The following plot would imply a natural ordering of sites from 1 to 6. This would not be the best way to plot the data unless sites 1-6 correspond to a natural ordering, such as distance downstream of a factory.

Run charts (a version of the scatter plot): the variable on the x axis is a time variable.
[Table: 30 consecutive outer diameters turned on a lathe]

Over time, the outer diameters tend to get smaller until part 16, where there is a large jump. After that, the diameters again generally decrease over time.

Section 3.2: Quantiles and Related Graphical Tools

Quantile: Roughly speaking, for a number p between 0 and 1, the p quantile of a distribution is a number such that a fraction p of the distribution lies to the left and a fraction 1 - p lies to the right.

The p quantile is the 100p th percentile:
– Q(0.10) = 0.10 quantile = 10th percentile
– Q(0.25) = 0.25 quantile = 25th percentile = first quartile
– Q(0.50) = 0.50 quantile = 50th percentile = median
– Q(0.75) = 0.75 quantile = 75th percentile = third quartile

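For a data set of size n, the p quantile Q(p) is the ordered data point with index $i = np + 0.5$, that is, the $i$th smallest value $x_{(i)}$. When $np + 0.5$ is not a whole number, Q(p) is found by linear interpolation between neighboring ordered points. So the cumulative probability corresponding to the $i$th ordered point is
$$p_i = \frac{i - 0.5}{n}.$$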
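Here is a minimal R sketch of this rule. The ith smallest value is placed at p_i = (i - 0.5)/n, and Q(p) is read off by linear interpolation. `Q` is a hypothetical helper and the data are arbitrary. For p outside [0.5/n, (n - 0.5)/n], the sketch returns the smallest or largest value. The slides do not cover that case, so this choice is an assumption. Base R's `quantile()` with `type = 5` uses the same plotting positions, which the last line checks.

```r
# Hypothetical helper: Q(p) with the ith smallest value at p_i = (i - 0.5)/n
# and straight-line interpolation in between. Outside [0.5/n, (n - 0.5)/n]
# the smallest/largest value is returned (an assumption, via rule = 2).
Q <- function(x, p) {
  x <- sort(x)
  n <- length(x)
  p_i <- (seq_len(n) - 0.5) / n
  approx(p_i, x, xout = p, rule = 2)$y
}

x <- c(11, 2, 8, 4, 7)        # arbitrary illustrative data, n = 5
Q(x, c(0.25, 0.5, 0.75))      # 3.5, 7 and 8.75
# Same convention as base R's quantile(type = 5):
all.equal(Q(x, c(0.25, 0.5, 0.75)),
          quantile(x, c(0.25, 0.5, 0.75), type = 5, names = FALSE))
```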

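Consider the following n = 10 points:
[Table: the 10 data values]
– Q(0.25) = 0.25 quantile = 8572 (the 3rd smallest value, since 10(0.25) + 0.5 = 3)
– Q(0.50) = median, the average of the 5th and 6th smallest values (since 10(0.50) + 0.5 = 5.5)
– Q(0.75) = 0.75 quantile = 9614 (the 8th smallest value)
– IQR = interquartile range = Q(0.75) - Q(0.25) = 9614 - 8572 = 1042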

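To find the 93rd percentile: 0.93 lies between 0.85 and 0.95, the cumulative probabilities of the 9th and 10th ordered points. Since (0.93 - 0.85)/(0.95 - 0.85) = 0.8, Q(0.93) is 0.8 of the way from Q(0.85) to Q(0.95):
$$Q(0.93) = Q(0.85) + 0.8\,\bigl(Q(0.95) - Q(0.85)\bigr) = 0.2\,Q(0.85) + 0.8\,Q(0.95) = 0.2(9614) + 0.8(10{,}688) \approx 10{,}473.$$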
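The 93rd-percentile arithmetic can be written out in R as a check. The four input values are the ones given on the slide. Only the variable names are new.

```r
# Interpolating between two adjacent quantiles from the n = 10 example
p_lo <- 0.85; q_lo <- 9614    # Q(0.85), the 9th smallest value
p_hi <- 0.95; q_hi <- 10688   # Q(0.95), the largest value
p <- 0.93
w <- (p - p_lo) / (p_hi - p_lo)     # fraction of the way from p_lo to p_hi (0.8)
q_93 <- (1 - w) * q_lo + w * q_hi   # 0.2 * 9614 + 0.8 * 10688
round(q_93, 1)                      # 10473.2
```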

Boxplots are useful summaries, particularly when there are too many points for a dot plot. To make a boxplot, we need essentially 5 numbers.
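Here is a sketch in R of those five numbers. It assumes they are the minimum, Q(0.25), the median, Q(0.75), and the maximum, with quartiles taken from the (i - 0.5)/n convention of this section. `five_numbers` is a hypothetical helper and the data are arbitrary. R's `boxplot()` computes its box from Tukey's hinges (`fivenum()`). By default it draws whiskers only out to 1.5 box lengths from the box. As a result, its picture can differ slightly from these five numbers.

```r
# Hypothetical helper: the five numbers behind a basic boxplot.
# Quartiles use quantile(type = 5), i.e. the (i - 0.5)/n convention.
five_numbers <- function(x) {
  q <- quantile(x, c(0.25, 0.5, 0.75), type = 5, names = FALSE)
  c(min = min(x), Q1 = q[1], median = q[2], Q3 = q[3], max = max(x))
}

x <- c(11, 2, 8, 4, 7)          # arbitrary illustrative data
five_numbers(x)                 # min 2, Q1 3.5, median 7, Q3 8.75, max 11
boxplot(x, horizontal = TRUE)   # base R's drawing (box from Tukey's hinges)
```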

Section Q-Q Plots and Comparing Distributional Shapes

Most of the statistical tools we will use in this class assume normal distributions (a bell-shaped distribution for the population of possible values). In order to know if these are the right tools for a particular job, we need to be able to assess whether the data appear to have come from a normal population.

With large amounts of data, one can draw a histogram of the measured values and see if it is bell-shaped. A normal plot is a method for assessing normality that works well with big or small data sets. It gives a good visual check for normality.

Simulation in R: 100 observations from a normal distribution with mean = 5 and standard deviation = 1, followed by a normal plot:

    x <- rnorm(100, mean = 5, sd = 1)
    qqnorm(x)

A normal plot is a plot of the data in a way such that data from a normal population will come out pretty much in a straight line. We plot the quantiles of a "standard normal" distribution against the correspondingly ordered y values.

In other words, to check for normality, we compare our observed data to what we would expect from a sample of standard normal data.

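So if we plot ordered values from a normal population against the corresponding quantiles of a standard normal population, we expect to get a reasonably straight line. This is because any normal distribution is linearly related to the standard normal distribution. If Z is standard normal, then $X = \mu + \sigma Z$ is normal with mean $\mu$ and standard deviation $\sigma$, so the quantiles satisfy $Q_X(p) = \mu + \sigma\,Q_Z(p)$.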

The textbook plots the standard normal quantiles on the vertical axis and the ordered data points on the horizontal axis. Many software packages and other books plot the standard normal quantiles on the horizontal axis and the ordered data points on the vertical axis. Either way, the plot should look "fairly" straight if the data are from a normal distribution.
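A normal plot can also be built by hand, which makes the idea concrete: pair the ordered data with standard normal quantiles at p_i = (i - 0.5)/n. This R sketch uses a hypothetical helper `normal_scores`, an arbitrary seed, and the slide's simulation settings. Summarizing straightness with the correlation of the plotted points is an added convenience, not something the slides prescribe. Base R's `qqnorm()` uses `ppoints(n)`, which equals (i - 0.5)/n only for n > 10.

```r
# Hypothetical helper: standard normal quantiles at p_i = (i - 0.5)/n
# paired with the ordered data.
normal_scores <- function(y) {
  n <- length(y)
  data.frame(z = qnorm((seq_len(n) - 0.5) / n), y = sort(y))
}

set.seed(1)                         # arbitrary seed, only for reproducibility
y <- rnorm(100, mean = 5, sd = 1)   # same settings as the simulation above
ns <- normal_scores(y)
# Software-style orientation (normal quantiles on the horizontal axis);
# use plot(ns$y, ns$z) for the textbook's orientation instead.
plot(ns$z, ns$y, xlab = "Standard normal quantile", ylab = "Ordered data")
abline(lm(y ~ z, data = ns), lty = 2)   # least-squares line through the points
cor(ns$z, ns$y)                         # near 1 when the points fall close to a line
```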

Excel File of Lifetime of Springs Data

Section 3.3: Numerical Summaries

Measures of Location: Around what value are the data spread?
– Median = Q(0.50) = 50th percentile.
– Sample mean = arithmetic mean = average: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$.
The mean is more affected by unusual values than the median.
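Here is a small R check of the last claim, using the three values from the sample-variance example in this section. The unusual value 40 is invented for illustration.

```r
x <- c(8, 9, 4)
c(mean = mean(x), median = median(x))                   # mean 7, median 8
# Hypothetical: replace the 4 with an unusual value of 40
x_unusual <- c(8, 9, 40)
c(mean = mean(x_unusual), median = median(x_unusual))   # mean 19, median 9
```

One unusual value moves the mean from 7 to 19, while the median only moves from 8 to 9.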

Measures of Spread:
– R = range = biggest - smallest. The size of the range can be affected by how many values we have: many numbers will tend to have a larger range than fewer numbers.
– IQR = interquartile range = Q(0.75) - Q(0.25), the range that includes (roughly) the middle half of the values.
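A quick R simulation of the point about the range, assuming standard normal data. The seed and number of repetitions are arbitrary. The last lines compute the IQR with the (i - 0.5)/n quantile convention via `IQR(x, type = 5)`, on arbitrary data. Note that `IQR()` uses `type = 7` unless told otherwise.

```r
set.seed(2)   # arbitrary seed
# Average range over many simulated standard normal samples of size n
avg_range <- function(n, reps = 2000) {
  mean(replicate(reps, diff(range(rnorm(n)))))
}
avg_range(5)     # roughly 2.3
avg_range(100)   # roughly 5.0: more values, larger range on average

# IQR with the (i - 0.5)/n quantile convention
x <- c(11, 2, 8, 4, 7)
IQR(x, type = 5)   # 8.75 - 3.5 = 5.25
```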

– Sample variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$, essentially an average squared deviation from the mean.
– Sample standard deviation: $s = \sqrt{s^2}$.

Example: $x_1 = 8$, $x_2 = 9$, $x_3 = 4$.
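Here is the example worked through in R: the mean, the deviations, the sample variance, and the standard deviation. The results are checked against base R's `var()` and `sd()`, which also use the n - 1 divisor.

```r
x <- c(8, 9, 4)
n <- length(x)
xbar <- sum(x) / n            # (8 + 9 + 4) / 3 = 7
dev <- x - xbar               # deviations: 1, 2, -3
s2 <- sum(dev^2) / (n - 1)    # (1 + 4 + 9) / 2 = 7
s <- sqrt(s2)                 # sqrt(7), about 2.646
c(xbar = xbar, s2 = s2, s = s)
# Base R's var() and sd() use the same n - 1 divisor
all.equal(c(s2, s), c(var(x), sd(x)))   # TRUE
var(8)                        # NA: a single value carries no information about spread
```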

Statistics and Parameters

A statistic is a numerical summary of the sample data:
– $\bar{x}$ = sample mean
– $s^2$ = sample variance

A parameter is a summary of an entire population or of a theoretical distribution, for example a normal distribution:
– $\mu$ = population mean
– $\sigma^2$ = population variance, the average squared deviation from the mean
– $\sigma$ = population standard deviation

For a sample of size n, the sample variance is
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2.$$
Why divide by n - 1? This makes $s^2$ an unbiased estimator of $\sigma^2$. Unbiased means correct on average: $E(s^2) = \sigma^2$.

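Suppose we have a large population of ball bearings whose diameters have mean $\mu$ = 1 cm and variance $\sigma^2$. We take many samples of size n and compute the sample mean of each. If we knew $\mu$, we would find that $\frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2$ equals $\sigma^2$ on average. Fact:
$$\sum_{i=1}^{n}(x_i - \bar{x})^2 = \sum_{i=1}^{n}(x_i - \mu)^2 - n(\bar{x} - \mu)^2 \le \sum_{i=1}^{n}(x_i - \mu)^2.$$
So $\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$ would be too small for $\sigma^2$ on average. Dividing by n - 1 makes $s^2$ come out right ($E(s^2) = \sigma^2$) on average.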
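Here is a simulation sketch in R of this argument. It assumes normally distributed diameters, a sample size of n = 5, and σ = 0.01 cm. None of these values comes from the slides; only μ = 1 cm does. The sketch compares the average of the divide-by-n estimator and the divide-by-(n - 1) estimator with σ².

```r
set.seed(3)          # arbitrary seed
n <- 5               # assumed sample size for the demo
mu <- 1              # population mean diameter, 1 cm
sigma <- 0.01        # hypothetical population SD in cm (not given in the slides)
reps <- 20000
sims <- replicate(reps, {
  x <- rnorm(n, mean = mu, sd = sigma)   # assumes normally distributed diameters
  ss <- sum((x - mean(x))^2)
  c(divide_by_n = ss / n, divide_by_n_minus_1 = ss / (n - 1))
})
# Average of each estimator relative to the true variance sigma^2
rowMeans(sims) / sigma^2   # about 0.8 (= (n - 1)/n) and about 1.0
```

The normal assumption is only for convenience. The unbiasedness of $s^2$ holds for any population with a finite variance.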

Notice that $s^2$ is undefined if n = 1: we can't divide by zero. This makes sense. If we have only one number, that number tells us nothing about potential spread in the population.

Plotting summary statistics over time is useful for issues such as quality control. Read section for general information.