Description and measurement


Description and measurement Dr Kwang Lee Email: k.h.lee@sheffield.ac.uk 03/05/2013

Outline
1. Concepts of scales of measurement (types of data, e.g. categorical, continuous)
2. Sampling methods, frequency and probability distributions
3. Summary statistics and graphs: outliers, stem-and-leaf plots, box plots, scattergrams

Scales of Measurement: categorical data
- Nominal scale: labels represent the levels of a categorical variable, with no inherent order. Examples: gender, ethnicity, or marital status. Statistical test: chi-square.
- Ordinal scale: labels represent an order that indicates either preference or ranking. Example: quality of food (0, 1, 2, etc.). Statistical tests: Spearman's rank-order correlation (rho), Mann-Whitney U.
Nominal (unordered; male, female) vs. ordinal (ordered; food quality score 0, 1, 2, 3)

Scales of Measurement: continuous data
- Interval scale: numerical labels indicate order and distance between elements. There is no absolute zero, and multiples of measures are not meaningful. Examples: most personality measures and scale scores. Statistical tests: t-test, ANOVA, regression, factor analysis, etc.
- Ratio scale: numerical labels indicate order and distance between elements. There is an absolute zero, and multiples of measures are meaningful. Examples: length or distance in centimetres, inches, etc., which have an absolute zero.

Ordinal vs. interval scale Most personality measures & scale scores

Classify the data according to the level of measurement: 1. temperature, 2. salary, 3. time, 4. postcode, 5. grade
A) interval, nominal, interval, ratio, interval
B) nominal, ratio, interval, ordinal, ratio
C) ratio, ordinal, ordinal, interval, ratio
D) interval, ratio, ratio, nominal, ordinal

A study was conducted to investigate the effect of a coal-fired generating plant on the water quality of a river. As part of an environmental impact study, fish were captured, tagged, and released. The following information was recorded for each fish: sex (0 = female, 1 = male), length (cm), maturation (0 = young, 1 = adult), weight (g). The scale of these variables is:
(a) nominal, ratio, nominal, ratio
(b) nominal, interval, ordinal, ratio
(c) nominal, ratio, ordinal, ratio
(d) ordinal, ratio, nominal, ratio
(e) ordinal, interval, ordinal, ratio

Descriptive statistics
Methods of organising, summarising, and presenting data in a convenient and informative way. These methods include:
- numerical techniques
- graphical techniques
The actual method used depends on what information you would like to extract. Are you interested in measures of central location, measures of variability (dispersion), or both?

Measures of central location

MEAN
The mean is probably the most common indicator of the central tendency of a variable. It is defined as the arithmetic average of all values:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
where n is the sample size.

Median: a different kind of average
The "middle value" of the ordered data:
- When n is odd: the middle value
- When n is even: the average of the two middle values
Example: 05 11 21 24 27 28 30 42 50 52 gives median = average of 27 and 28 = 27.5
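The odd/even rule above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides; the function name `median` and the nine-value variant of the data are assumptions made for the example.

```python
def median(values):
    """Return the median of a non-empty sequence of numbers."""
    ordered = sorted(values)          # step 1: order the data
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # n odd: take the middle value
        return ordered[mid]
    # n even: average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


# The ten values from the slide (n is even).
even_data = [5, 11, 21, 24, 27, 28, 30, 42, 50, 52]
even_median = median(even_data)

# Hypothetical odd-sized variant: the same data without the last value.
odd_median = median(even_data[:-1])

print(even_median, odd_median)
```

With ten values the function averages the 5th and 6th ordered values; with nine it returns the 5th directly.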

Median is "robust"
Robust: resistant to skew and outliers.
This data set has a mean (x̄) of 1600: 1362 1439 1460 1614 1666 1792 1867
This data set has an outlier (9867) and a mean of 2743: 1362 1439 1460 1614 1666 1792 9867
The median is 1614 in both instances; it was not influenced by the outlier.
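The comparison on this slide can be checked with Python's standard `statistics` module. This is only a sketch; the variable names are chosen for the example.

```python
import statistics

# The two data sets from the slide: identical except for the last value.
clean = [1362, 1439, 1460, 1614, 1666, 1792, 1867]
with_outlier = [1362, 1439, 1460, 1614, 1666, 1792, 9867]

mean_clean = statistics.mean(clean)
mean_outlier = statistics.mean(with_outlier)
median_clean = statistics.median(clean)
median_outlier = statistics.median(with_outlier)

# The mean shifts by more than a thousand; the median does not move.
print(f"mean:   {mean_clean:.0f} -> {mean_outlier:.0f}")
print(f"median: {median_clean} -> {median_outlier}")
```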

Mode
Mode: the value with the greatest frequency, e.g. {4, 7, 7, 7, 8, 8, 9} has mode = 7.
Used only in very large data sets. The mode is used less frequently than the mean or the median.
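A frequency count is enough to find the mode. The sketch below uses `collections.Counter`; the helper name `modes` is an assumption for the example, and it returns a list because a data set can have several values tied for the greatest frequency.

```python
from collections import Counter


def modes(values):
    """Return all values that occur with the greatest frequency, sorted."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


slide_modes = modes([4, 7, 7, 7, 8, 8, 9])   # the example from the slide
tied_modes = modes([1, 1, 2, 2, 3])          # hypothetical data with a tie

print(slide_modes, tied_modes)
```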

Mean, Median, Mode
- Symmetrical data: mean = median
- Positive skew: mean > median (the mean gets "pulled" by the tail)
- Negative skew: mean < median

Measures of variability

Range
The simplest way to describe the spread of a data set is to quote the minimum (lowest) and maximum (highest) values. The range is the difference between them, e.g. minimum 116, maximum 170: range = 170 - 116 = 54.
The range is affected by extreme values.
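As a small illustration of why the range is sensitive to extreme values, consider the sketch below. The slide gives only the minimum (116) and maximum (170); the values in between, and the extreme value of 250, are hypothetical and chosen for the example.

```python
def data_range(values):
    """Return the range: maximum minus minimum."""
    return max(values) - min(values)


# Hypothetical data whose minimum and maximum match the slide.
data = [116, 128, 135, 142, 150, 170]
base_range = data_range(data)

# Adding one hypothetical extreme value changes the range drastically.
stretched_range = data_range(data + [250])

print(base_range, stretched_range)
```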

Quartiles Quartiles divide the values of a data set into four subsets of equal size, each comprising 25% of the observations.

Inter-Quartile Range
The middle 50% of the observations lie between Q1 and Q3, with 25% below Q1 and 25% above Q3.
Inter-Quartile Range = IQR = Q3 - Q1
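Quartiles and the IQR can be computed with `statistics.quantiles` from the Python standard library. This is a sketch under one assumption: the slides do not say which quartile convention is intended, and the "inclusive" method used here is only one of several in common use, so other software may report slightly different values. The data are the ten values from the median example.

```python
import statistics

data = [5, 11, 21, 24, 27, 28, 30, 42, 50, 52]

# n=4 asks for the three cut points that split the data into quarters.
# method="inclusive" is an assumed convention, not one named in the slides.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

print(f"Q1={q1}, median={q2}, Q3={q3}, IQR={iqr}")
```

Note that the second quartile agrees with the median of 27.5 found earlier for the same data.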

Variance and standard deviation
- The variance of a set of data is a measure of spread about the mean of a distribution.
- The variance uses all the data.
- The standard deviation is the square root of the variance.

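The Variance
Variance is one of the most frequently used measures of spread.
For a population of size N with mean \mu:
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$$
For a sample of size n with mean \bar{x}:
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$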

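The Standard Deviation
Since variance is given in squared units, we often use the standard deviation instead, which is the square root of the variance.
For a population of size N with mean \mu:
$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2}$$
For a sample of size n with mean \bar{x}:
$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$$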
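The population and sample versions differ only in the divisor. The sketch below uses the standard `statistics` module, where `pvariance`/`pstdev` divide by N and `variance`/`stdev` divide by n - 1; that the slides intend the n - 1 divisor for samples is an assumption, since the original formulas did not survive in the transcript. The data are the seven values with mean 1600 from the "robust median" slide.

```python
import statistics

data = [1362, 1439, 1460, 1614, 1666, 1792, 1867]   # mean = 1600

# Sum of squared deviations from the mean, computed by hand.
mean = statistics.mean(data)
ss = sum((x - mean) ** 2 for x in data)

pop_var = statistics.pvariance(data)    # ss / N
samp_var = statistics.variance(data)    # ss / (n - 1)
pop_sd = statistics.pstdev(data)        # square root of pop_var
samp_sd = statistics.stdev(data)        # square root of samp_var

print(f"sum of squares = {ss}")
print(f"population: variance = {pop_var:.1f}, sd = {pop_sd:.1f}")
print(f"sample:     variance = {samp_var:.1f}, sd = {samp_sd:.1f}")
```

The sample variance is always the larger of the two for the same data, because the same sum of squares is divided by a smaller number.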

Shape of the Distribution: Skewness
- Values need not be symmetrically distributed around the central point; distributions can be skewed.
- Mean and standard deviation are insufficient to describe the distribution.
[Figure: frequency curve of a distribution skewed to the right (positively skewed), with the mode, median, and mean marked on the x axis]

Consequences of a Skewed Distribution
- Socio-economic data in particular (wages, income, wealth and related variables) are frequently skewed.
- Skewed variables can lead to undesirable effects: test statistics and confidence intervals are biased.
- If the variable is not significantly skewed, continue.
- If the variable is skewed, transform the variable. For this reason you often find the logarithm of income, the square root of the mortality rate, etc.
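The effect of a log transformation on a right-skewed variable can be sketched as follows. Everything here is an assumption made for illustration: the income figures are invented, and skewness is measured with the moment coefficient m3 / m2^(3/2), which is one common definition among several.

```python
import math


def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (population moments)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical incomes with a long right tail.
incomes = [12000, 15000, 18000, 20000, 22000, 25000, 30000, 45000, 120000]

raw_skew = skewness(incomes)
log_skew = skewness([math.log(v) for v in incomes])

print(f"skewness of income:      {raw_skew:.2f}")
print(f"skewness of log(income): {log_skew:.2f}")
```

For these made-up values the logarithm reduces the skewness without removing it entirely, which is typical: a transformation pulls in the long tail but does not guarantee symmetry.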

Kurtosis: a measure of the "peakedness"
Two variables can have equal mean and standard deviation and both be symmetrically distributed, yet differ in kurtosis.
[Figure: density curves f(x) and f(y) with common mean m and equal standard deviations s_x and s_y; variable y has the larger kurtosis]
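The situation described on this slide, equal mean and standard deviation but different kurtosis, can be reproduced with two tiny symmetric data sets. The data are constructed for this example and do not come from the slides; kurtosis is computed as the moment ratio m4 / m2^2, an assumed definition (some software reports "excess" kurtosis, which subtracts 3).

```python
import math


def moments(values):
    """Return (mean, population variance, kurtosis m4 / m2**2)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return mean, m2, m4 / m2 ** 2


# Hypothetical variable x: values spread evenly around 0.
x = [-2, -1, 0, 1, 2]
# Hypothetical variable y: mass concentrated at 0 with two distant values,
# scaled so that its variance equals that of x.
a = math.sqrt(5)
y = [-a, 0, 0, 0, a]

mean_x, var_x, kurt_x = moments(x)
mean_y, var_y, kurt_y = moments(y)

print(f"x: mean={mean_x}, variance={var_x}, kurtosis={kurt_x:.2f}")
print(f"y: mean={mean_y}, variance={var_y:.2f}, kurtosis={kurt_y:.2f}")
```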

Describing samples: graphs
Box plot and stem-and-leaf diagram

Box Plot
Visual display of:
- Max value
- Third quartile
- Mean
- Median
- First quartile
- Min value
Visual display of central tendency, variability, departure from symmetry, and outliers.
Box plots give a good graphical image of the concentration of the data. They also show how far the extreme values are from most of the data.
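The numbers a box plot displays can be computed directly. The sketch below uses the data set with the outlier from the "robust median" slide. Two things are assumptions not stated in the slides: the "inclusive" quartile convention, and the common rule of flagging points more than 1.5 × IQR beyond the quartiles as outliers.

```python
import statistics

data = [1362, 1439, 1460, 1614, 1666, 1792, 9867]

# Quartiles (assumed convention: "inclusive").
q1, med, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Assumed outlier rule: beyond 1.5 * IQR from the box.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

five_number_summary = (min(data), q1, med, q3, max(data))

print("min, Q1, median, Q3, max:", five_number_summary)
print("mean:", round(statistics.mean(data)))
print("outliers:", outliers)
```

Here the mean lies above the third quartile, which is the kind of departure from symmetry a box plot makes visible at a glance.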

STEM AND LEAF DIAGRAMS
A stem and leaf diagram is a way of sorting data. It is laid out in two columns, STEM and LEAVES: the data is split into tens (the stem) and units (the leaves).

STEM AND LEAF DIAGRAMS
We are going to put this data into a stem and leaf diagram: 12, 32, 22, 16, 24, 34, 12, 10, 25, 30, 28
We have numbers in the tens, twenties and thirties, so these become our stems. Now we need to enter the leaves. The first number, twelve, has a 2 in the units column, so this becomes the leaf.
STEM | LEAVES
1    | 2
2    |
3    |

STEM AND LEAF DIAGRAMS
12, 32, 22, 16, 24, 34, 12, 10, 25, 30, 28
The next number is 32. This has a 2 in the units column, so it goes next to stem 3. The rest go as shown.
STEM | LEAVES
1    | 2 6 2 0
2    | 2 4 5 8
3    | 2 4 0
Key: 1 | 2 = 12

STEM AND LEAF DIAGRAMS
If an ORDERED stem and leaf diagram is required, then you have to put the leaves in numerical order.
STEM | LEAVES
1    | 0 2 2 6
2    | 2 4 5 8
3    | 0 2 4
Key: 1 | 2 = 12
We can now use this to find the median. There are 11 pieces of data, so the median is the 6th number.
Median = 24
A stem and leaf diagram is a good choice when the data sets are small!
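Building the ordered diagram is mechanical: sort the data, then split each value into its tens digit (stem) and units digit (leaf). The following sketch does this for the eleven values used in the slides; the function name `stem_and_leaf` is an assumption, and the split by integer division only suits non-negative whole numbers like these.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return {stem: [leaves in ascending order]} for non-negative integers."""
    stems = defaultdict(list)
    for v in sorted(values):          # sorting first keeps the leaves ordered
        stems[v // 10].append(v % 10)
    return dict(stems)


data = [12, 32, 22, 16, 24, 34, 12, 10, 25, 30, 28]
diagram = stem_and_leaf(data)

for stem in sorted(diagram):
    print(stem, "|", " ".join(str(leaf) for leaf in diagram[stem]))

# Read the values back in order and pick the middle one (n = 11 is odd).
ordered = [stem * 10 + leaf for stem in sorted(diagram) for leaf in diagram[stem]]
median_value = ordered[len(ordered) // 2]
print("median =", median_value)
```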

If most of the measurements in a large data set are of approximately the same magnitude except for a few measurements that are quite a bit larger, how would the mean and median of the data set compare and what shape would a histogram of the data set have?
(a) The mean would be smaller than the median and the histogram would be skewed with a long left tail.
(b) The mean would be larger than the median and the histogram would be skewed with a long right tail.
(c) The mean would be larger than the median and the histogram would be skewed with a long left tail.
(d) The mean would be smaller than the median and the histogram would be skewed with a long right tail.
(e) The mean would be equal to the median and the histogram would be symmetrical.

When extreme values are present in a set of data, which of the following descriptive summary measures are most appropriate?
(a) Coefficient of variation and range.
(b) Mean and standard deviation.
(c) Median and inter-quartile range.
(d) Mode and variance.

The weights of the male and female students in a class are summarized in the following boxplots. Which of the following is NOT correct?
(a) About 50% of the male students have weights between 150 and 185 lbs.
(b) About 25% of female students have weights more than 130 lbs.
(c) The median weight of male students is about 162 lbs.
(d) The mean weight of female students is about 120 because of symmetry.
(e) The male students have less variability than the female students.

The following is a stem-plot of the birth weights of male babies born to a group of mothers who smoked during pregnancy. The stems are in units of kg.
The median birth weight is:
(a) 13.5
(b) 3.2
(c) 3.5
(d) 3.7
(e) Average of 13 and 14.
The first quartile (25th percentile) of the weights is:
(a) 2.3
(b) 2.7
(c) .25
(d) 6.5
(e) 2.8

Thank you