The symmetry statistic

Slides:



Advertisements
Similar presentations
Chapter 3, Numerical Descriptive Measures
Advertisements

Lecture 2 Summarizing the Sample. WARNING: Today’s lecture may bore some of you… It’s (sort of) not my fault…I’m required to teach you about what we’re.
1 1 Slide © 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole.
Descriptive Statistics: Numerical Measures
Jan Shapes of distributions… “Statistics” for one quantitative variable… Mean and median Percentiles Standard deviations Transforming data… Rescale:
Lecture 19: Tues., Nov. 11th R-squared (8.6.1) Review
1 Empirical and probability distributions 0.4 exploratory data analysis.
Governing equation for concentration as a function of x and t for a pulse input for an “open” reactor.
Those who don’t know statistics are condemned to reinvent it… David Freedman.
Use of Quantile Functions in Data Analysis. In general, Quantile Functions (sometimes referred to as Inverse Density Functions or Percent Point Functions)
1 CHAPTER 7 Homework:5,7,9,11,17,22,23,25,29,33,37,41,45,51, 59,65,77,79 : The U.S. Bureau of Census publishes annual price figures for new mobile homes.
Chapter 3 - Part B Descriptive Statistics: Numerical Methods
Exploratory Data Analysis. Computing Science, University of Aberdeen2 Introduction Applying data mining (InfoVis as well) techniques requires gaining.
1 1 Slide © 2014 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole.
Chap 6-1 Copyright ©2013 Pearson Education, Inc. publishing as Prentice Hall Chapter 6 The Normal Distribution Business Statistics: A First Course 6 th.
Numerical Descriptive Techniques
1 1 Slide © 2009 Thomson South-Western. All Rights Reserved Slides by JOHN LOUCKS St. Edward’s University.
Dr. Serhat Eren 1 CHAPTER 6 NUMERICAL DESCRIPTORS OF DATA.
Chapter 3, Part B Descriptive Statistics: Numerical Measures n Measures of Distribution Shape, Relative Location, and Detecting Outliers n Exploratory.
Correlation – Recap Correlation provides an estimate of how well change in ‘ x ’ causes change in ‘ y ’. The relationship has a magnitude (the r value)
Chapter 6: Interpreting the Measures of Variability.
Lecture 1 Fadzly, N Normality and data transformation + Non-parametric tests.
Thursday, May 12, 2016 Report at 11:30 to Prairieview
Lecture #3 Tuesday, August 30, 2016 Textbook: Sections 2.4 through 2.6
Parameter, Statistic and Random Samples
Step 1: Specify a null hypothesis
MATH-138 Elementary Statistics
BAE 6520 Applied Environmental Statistics
Statistical Data Analysis - Lecture /04/03
BAE 5333 Applied Water Resources Statistics
Descriptive Measures Descriptive Measure – A Unique Measure of a Data Set Central Tendency of Data Mean Median Mode 2) Dispersion or Spread of Data A.
Statistical Data Analysis - Lecture 05 12/03/03
Two-way ANOVA with significant interactions
Data Mining: Concepts and Techniques
Size of a hypothesis test
Data description and transformation
Statistical Data Analysis - Lecture10 26/03/03
STAT 206: Chapter 6 Normal Distribution.
CHAPTER 3 Data Description 9/17/2018 Kasturiarachi.
Numerical Descriptive Measures
Regression model Y represents a value of the response variable.
Statistical Methods For Engineers
Quartile Measures DCOVA
Shape of Distributions
Describing Data with Numerical Values (Chapter 2)
Chapter 1: Exploring Data
Statistics for Managers Using Microsoft® Excel 5th Edition
Box-and-Whisker Plots
Summary (Week 1) Categorical vs. Quantitative Variables
Honors Statistics Review Chapters 4 - 5
Indicator Variables Response: Highway MPG
Chapter 1: Exploring Data
Chapter 1: Exploring Data
MBA 510 Lecture 2 Spring 2013 Dr. Tonya Balan 4/20/2019.
Chapter 1: Exploring Data
Sampling Distributions (§ )
Chapter 1: Exploring Data
The Five-Number Summary
Chapter 1: Exploring Data
Chapter 1: Exploring Data
Advanced Algebra Unit 1 Vocabulary
Chapter 1: Exploring Data
Unit 4 Quiz: Review questions
Chapter 1: Exploring Data
Chapter 1: Exploring Data
The Normal Distribution
Chapter 1: Exploring Data
Introductory Statistics
Presentation transcript:

The symmetry statistic Statistical Data Analysis - Lecture 04 11/03/03 The symmetry statistic If S is less than 0.5 then the data is right skewed The  quantile is closer to the median than the 1 -  quantile If S is greater than 0.5 then the data is left skewed The 1 -  quantile is closer to the median than the  quantile Statistical Data Analysis - Lecture 04 11/03/03

Statistical Data Analysis - Lecture 04 11/03/03  d1= m - q d2= q1- - q 2d1 < d2  S = d1/d2 < 0.5 d1 d2 2d1 > d2  S = d1/d2 >0.5 d1 d2 Statistical Data Analysis - Lecture 04 11/03/03

Data description and transformation continued. Note: for p < 0 we must transform the data x to –xp to preserve the order of the data. Box-Cox transformations Box-Cox transformations try to make the data more normal Box-Cox method assumes data raised to some power p, comes from a normal distribution with mean  and std. dev.  To find p,  and  we use maximum likelihood estimation (mle) Statistical Data Analysis - Lecture 04 11/03/03

Box-Cox transformations Minimise: This is a “profile likelihood” – which you might learn about in the third or fourth year. YOU DO NOT NEED TO REMEMBER THIS OR EVEN KNOW HOW IT WORKS! Typically this is minimised using a graphical method or by some sort of numerical optimisation routing. R can do both. A routine to perform this, is available on the web page and from the server. Statistical Data Analysis - Lecture 04 11/03/03

Statistical Data Analysis - Lecture 04 11/03/03

Transformations - summary Only works on positive data Box-Cox usually works well Make sure you can explain your transformation Taking the log often is the easiest thing to do Don’t transform if you don’t need to! Statistical Data Analysis - Lecture 04 11/03/03

Symmetry and skewness measures We’ve seen the symmetry statistic S. Robust to outliers Crude symmetry statistic can be obtained using the quartiles. These can be gained in R by typing: quantile(x) For the standard symmetry statistic, there is a function called s in the file called symmetry.r on the web and in the course folder. We type s(x) Statistical Data Analysis - Lecture 04 11/03/03

Statistical Data Analysis - Lecture 04 11/03/03 Skewness statistic Traditional measure of skewness is defined as: where s is the sample standard deviation. This is sometimes called the 3rd central moment (the variance is the second) If this statistic is positive, then the data are right skewed and if it is negative then they are left skewed. Statistical Data Analysis - Lecture 04 11/03/03

Statistical Data Analysis - Lecture 04 11/03/03 Quantile plots Box-Cox transformations try to “transform to normality” Is there any way we can find out whether the data behave as though they came from a normal distribution? Yes – a normal quantile plot or a norplot. Statistical Data Analysis - Lecture 04 11/03/03

Statistical Data Analysis - Lecture 04 11/03/03 Norplots Define the quantiles of a standard normal distribution N(0,1) If the data come from a standard normal distribution then the empirical quantiles will be in approximately the same position as the theoretical quantiles Statistical Data Analysis - Lecture 04 11/03/03

Statistical Data Analysis - Lecture 04 11/03/03 Example of a norplot Statistical Data Analysis - Lecture 04 11/03/03

Statistical Data Analysis - Lecture 04 11/03/03 Norplots What happens if the data don’t come from a standard normal? Remember that if Y~N(,) and Z~N(0,1) then Therefore if the data are normal, but not standard normal, the data will have slope  and intercept  Statistical Data Analysis - Lecture 04 11/03/03

Statistical Data Analysis - Lecture 04 11/03/03

Statistical Data Analysis - Lecture 04 11/03/03 Quantile Plots It is relatively easy to construct quantile plots for any of the standard distributions. However, the parameters are often unknown and so most people don’t bother for anything other than a uniform distribution Sometimes, however, we wish to see if two sets of data are similarly distributed Statistical Data Analysis - Lecture 04 11/03/03

Statistical Data Analysis - Lecture 04 11/03/03 QQ-Plots QQ or quantile-quantile plots compare the quantiles of two sets of data If the data sets are similarly distributed then then quantiles will be similar and hence the points will cluster around the 45° line If the data sets have the same spread but differing means the relationship will be linear but the intercept will be different from zero If the data sets have the differing means and spreads the relationship will be linear but the slope will be different from 1 and the intercept non-zero Statistical Data Analysis - Lecture 04 11/03/03

Constructing a QQ-plot Let X represent data set 1, and Y data set 2. If X and Y are of equal length then we can plot the order statistics of X vs. the order statistics of Y If nx > nY then we need to interpolate the quantiles from Y with Statistical Data Analysis - Lecture 04 11/03/03

Statistical Data Analysis - Lecture 04 11/03/03

Statistical Data Analysis - Lecture 04 11/03/03

QQ-Plot – different distributions Statistical Data Analysis - Lecture 04 11/03/03

Some points about QQ-plots Statistical Data Analysis - Lecture 04 11/03/03 Some points about QQ-plots Easier to interpret if aspect ratio is 1:1 Good for detect mean and variance/std. dev. shifts Not really convincing evidence against normality (need a lot of data) Heavy curvature usually means the data sets are from different distributions Statistical Data Analysis - Lecture 04 11/03/03