Basic Univariate Statistics (Chapter 6)

Presentation transcript:

6 - 1 Basic Univariate Statistics Chapter 6

6 - 2 Basic Statistics
A statistic is a number, such as a mean or variance, computed from sample data.
– The objective of Statistics is to make inferences (predictions, decisions) about a population based upon information contained in a sample.
» Mendenhall
– The objective of Statistics is to make predictions about the cost of a weapon system based upon information in analogous systems.
» DoD Cost Analyst

6 - 3 Measures of Central Tendency
These statistics describe the “middle region” of the sample.
– Mean
» The arithmetic average of the data set.
– Median
» The “middle” of the data set.
– Mode
» The value in the data set that occurs most frequently.
These are almost never the same, unless you have a perfectly symmetric, unimodal population.
[Figure: two distribution curves, one with Mode, Median, and Mean marked at different points and one where Mode = Median = Mean]

6 - 4 Mean
The Sample Mean ($\bar{y}$) is the arithmetic average of a data set. It is used to estimate the population mean ($\mu$).
It is calculated by dividing the sum of the observed values ($y_i$) by the number of observations ($n$):
$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$$
[Table: Historical Transmogrifier Average Unit Production Costs; columns: System, FY97$K]
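
As a minimal sketch of the calculation on this slide, the sample mean can be computed in Python. The data values are illustrative only (they are borrowed from the median example in this deck, not from the transmogrifier cost table), and `sample_mean` is a hypothetical helper name.

```python
from statistics import fmean

def sample_mean(values):
    """Return the arithmetic average: the sum of the values divided by n."""
    if not values:
        raise ValueError("sample_mean requires at least one observation")
    return sum(values) / len(values)

# Illustrative data only (not the transmogrifier costs).
observations = [1, 2, 4, 5, 5, 6, 8]

mean_by_hand = sample_mean(observations)
mean_by_library = fmean(observations)  # standard-library equivalent

print(mean_by_hand, mean_by_library)
```

Both calls should agree; the hand-written version simply makes the "sum divided by n" step explicit.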

6 - 5 Median
The Median is the middle observation of a data set ordered from low to high
Examples:
– 1, 2, 4, 5, 5, 6, 8
» Here, the middle observation is 5, so the median is 5
– 1, 3, 4, 4, 5, 7, 8, 8
» Here, there is no “middle” observation, so we take the average of the two observations at the center: (4 + 5) / 2 = 4.5
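
The two cases on this slide (odd and even number of observations) can be sketched as follows. `middle_value` is a hypothetical helper written for illustration; the standard library's `statistics.median` applies the same rule.

```python
from statistics import median

def middle_value(values):
    """Return the median: the middle observation of the ordered data,
    or the average of the two central observations when n is even."""
    if not values:
        raise ValueError("middle_value requires at least one observation")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_example = [1, 2, 4, 5, 5, 6, 8]       # seven observations
even_example = [1, 3, 4, 4, 5, 7, 8, 8]   # eight observations

odd_median = middle_value(odd_example)
even_median = middle_value(even_example)

print(odd_median, even_median)
print(median(odd_example), median(even_example))
```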

6 - 6 Mode
The Mode is the value of the data set that occurs most frequently
Example:
– 1, 2, 4, 5, 5, 6, 8
» Here the Mode is 5, since 5 occurred twice and no other value occurred more than once
Data sets can have more than one mode, while the mean and median each have one unique value
Data sets can also have NO mode, for example:
– 1, 3, 5, 6, 7, 8, 9
» Here, no value occurs more frequently than any other, therefore no mode exists
» You could also argue that this data set contains 7 modes, since each value occurs as frequently as every other
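
Both readings of the "no mode" case can be sketched in Python. `modes_or_none` is a hypothetical helper that follows the "no mode exists" convention, while the standard library's `statistics.multimode` follows the "every value is a mode" reading.

```python
from collections import Counter
from statistics import multimode

def modes_or_none(values):
    """Return the most frequent values, or [] when every distinct value
    occurs equally often (the 'no mode exists' convention)."""
    if not values:
        raise ValueError("modes_or_none requires at least one observation")
    counts = Counter(values)
    highest = max(counts.values())
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return [v for v, c in counts.items() if c == highest]

one_mode = [1, 2, 4, 5, 5, 6, 8]
no_mode = [1, 3, 5, 6, 7, 8, 9]

modes_a = modes_or_none(one_mode)
modes_b = modes_or_none(no_mode)
all_tied = multimode(no_mode)  # reports all seven values as modes

print(modes_a, modes_b, all_tied)
```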

6 - 7 Dispersion Statistics
The Mean, Median and Mode by themselves are not sufficient descriptors of a data set
Example:
– Data Set 1: 48, 49, 50, 51, 52
– Data Set 2: 5, 15, 50, 80, 100
Note that the Mean and Median of both data sets are identical (both equal 50), but the data sets are glaringly different!
The difference is in the dispersion of the data points
Dispersion Statistics we will discuss are:
– Range
– Variance
– Standard Deviation

6 - 8 Range
The Range is simply the difference between the largest and smallest observations in a data set
Example
– Data Set 1: 48, 49, 50, 51, 52
– Data Set 2: 5, 15, 50, 80, 100
The Range of data set 1 is 52 - 48 = 4
The Range of data set 2 is 100 - 5 = 95
So, while both data sets have the same mean and median, the dispersion of the data, as depicted by the range, is much smaller in data set 1
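
A short sketch can confirm the comparison on this slide: the two data sets share a mean and median but differ sharply in range. `data_range` is a hypothetical helper name.

```python
from statistics import fmean, median

def data_range(values):
    """Return the largest observation minus the smallest."""
    return max(values) - min(values)

data_set_1 = [48, 49, 50, 51, 52]
data_set_2 = [5, 15, 50, 80, 100]

range_1 = data_range(data_set_1)
range_2 = data_range(data_set_2)

for name, data in (("Data Set 1", data_set_1), ("Data Set 2", data_set_2)):
    print(name, "mean:", fmean(data), "median:", median(data),
          "range:", data_range(data))
```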

6 - 9 Variance
The Variance, $s^2$, represents the amount of variability of the data relative to their mean
The variance is the “average” of the squared deviations of the observations about their mean:
$$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}$$
The Variance, $s^2$, is the sample variance, and is used to estimate the actual population variance, $\sigma^2$
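
The definition can be sketched in Python using the two data sets from the dispersion slides. This sketch assumes the usual sample-variance convention of dividing by n - 1, which is also what `statistics.variance` uses; `sample_variance` is a hypothetical helper name.

```python
from statistics import variance

def sample_variance(values):
    """Return the sum of squared deviations about the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample_variance requires at least two observations")
    mean = sum(values) / n
    return sum((y - mean) ** 2 for y in values) / (n - 1)

data_set_1 = [48, 49, 50, 51, 52]
data_set_2 = [5, 15, 50, 80, 100]

var_1 = sample_variance(data_set_1)
var_2 = sample_variance(data_set_2)

print(var_1, var_2)
print(variance(data_set_1), variance(data_set_2))  # library cross-check
```

For data set 1 the squared deviations sum to 10, giving 10 / 4 = 2.5; for data set 2 they sum to 6650, giving 6650 / 4 = 1662.5.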

Standard Deviation
The Variance is not a “common sense” statistic because it describes the data in terms of squared units
The Standard Deviation, $s$, is simply the square root of the variance: $s = \sqrt{s^2}$
The Standard Deviation, $s$, is the sample standard deviation, and is used to estimate the actual population standard deviation, $\sigma$
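
As a rough illustration, the standard deviation can be obtained by taking the square root of the sample variance. The data are the two data sets from the dispersion slides, and `sample_std_dev` is a hypothetical helper name; `statistics.stdev` is the standard-library equivalent.

```python
import math
from statistics import stdev, variance

def sample_std_dev(values):
    """Return the square root of the sample variance (n - 1 divisor)."""
    return math.sqrt(variance(values))

data_set_1 = [48, 49, 50, 51, 52]
data_set_2 = [5, 15, 50, 80, 100]

sd_1 = sample_std_dev(data_set_1)
sd_2 = sample_std_dev(data_set_2)

# Results are in the same units as the data, unlike the variance.
print(round(sd_1, 2), round(sd_2, 2))
```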

Standard Deviation
The sample standard deviation, s, is measured in the same units as the data from which it is calculated
[Table: columns: System, FY97$K; Average = 9.06]
The standard deviation of these costs, $6.67K, represents the average estimating error for predicting subsequent observations
In other words: on average, when estimating the cost of transmogrifiers that belong to the same population as the ten systems above, we would expect to be off by $6.67K

Coefficient of Variation
For a given data set, the standard deviation is $100,000. Is that good or bad? It depends…
– A standard deviation of $100K for a task estimated at $5M would be very good indeed.
– A standard deviation of $100K for a task estimated at $100K is clearly useless.
What constitutes a “good” standard deviation?
The “goodness” of the standard deviation is not its value per se, but rather what percentage the standard deviation is of the estimated value.
The Coefficient of Variation (CV) is defined as the “average” percent estimating error when predicting subsequent observations within the representative population.
The CV is the ratio of the standard deviation to the mean.

Coefficient of Variation
In the first example, the CV is $100K/$5M = 2%
In the second example, the CV is $100K/$100K = 100%
These values are unitless and can be readily compared.
The CV is the “average” percent estimating error for the population when using the sample mean as the estimator.
Or, the CV is the “average” percent estimating error when estimating the cost of future tasks.
Calculate the CV from our previous transmogrifier cost database:
– CV = $6.67K/$9.06K = 73.6%
Therefore, for subsequent observations we would expect to be off on “average” by 73.6% when using $9.06K as the estimated cost.
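
The three CV calculations above can be reproduced with a small sketch. `coefficient_of_variation` is a hypothetical helper name; the inputs are the standard deviations and means quoted on the slides.

```python
def coefficient_of_variation(std_dev, mean):
    """Return the ratio of the standard deviation to the mean (unitless)."""
    if mean == 0:
        raise ValueError("the CV is undefined when the mean is zero")
    return std_dev / mean

cv_large_task = coefficient_of_variation(100_000, 5_000_000)  # $100K on $5M
cv_small_task = coefficient_of_variation(100_000, 100_000)    # $100K on $100K
cv_transmogrifier = coefficient_of_variation(6.67, 9.06)      # both in FY97$K

for label, cv in (("$5M task", cv_large_task),
                  ("$100K task", cv_small_task),
                  ("transmogrifiers", cv_transmogrifier)):
    print(f"{label}: CV = {cv:.1%}")
```

Because the units cancel, the inputs only need to be in the same units as each other (dollars in the first two calls, thousands of FY97 dollars in the third).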

Normal Distribution
Continuous random variables are associated with populations which contain an infinitely large number of observations
The heights of trees, the weights of humans, the cost of weapons, and the error when predicting the cost of a weapon system are all examples of continuous random variables.
The distributions of these continuous random variables can take on a variety of shapes, many of which approximate a bell-shaped curve.
These bell-shaped curves approximate the normal probability distribution.