Measures of Center and Variation
Prof. Felix Apfaltrer
Office: N518. Phone: X7421. Office hours: Tue, Thu 1:30-3pm.



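2 Measures of center - mean
A measure of center is a value that represents the center of the data set.
The mean (also called the arithmetic mean) is the most important measure of center.
- Sample mean: $\bar{x} = \frac{\sum x}{n}$
- Population mean: $\mu = \frac{\sum x}{N}$
Here $\sum$ denotes addition of values, $x$ is the variable (individual data values), $n$ is the sample size, and $N$ is the population size.
Example. Lead (Pb) in air at BMCC (μg/m³; 1.5 is high): 5.4, 1.1, 0.42, 0.73, 0.48, 1.1.
$\bar{x} = \frac{5.4 + 1.1 + 0.42 + 0.73 + 0.48 + 1.1}{6} = \frac{9.23}{6} \approx 1.54$
Outlier has strong effect on mean!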
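A minimal sketch of the outlier effect, using the lead readings from the slide. The variable names are illustrative, and dropping the 5.4 reading is only done here to show how much a single value moves the mean.

```python
from statistics import mean

# Lead readings from the slide
lead = [5.4, 1.1, 0.42, 0.73, 0.48, 1.1]

# Mean of all six readings
mean_all = mean(lead)

# Mean after dropping the outlier 5.4, for comparison only
mean_without_outlier = mean([x for x in lead if x != 5.4])

print(round(mean_all, 2))
print(round(mean_without_outlier, 3))
```

With the outlier the mean sits just above the 1.5 threshold; without it the mean is roughly half as large.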

3 Measures of center - median
The mean is good but sensitive to outliers! Large values can have a dramatic effect!
The median is the middle value of the original data arranged in increasing order.
- If n is odd: the exact middle value
- If n is even: the average of the 2 middle values
Previous example, reordered: 0.42, 0.48, 0.73, 1.1, 1.1, 5.4. Median = (0.73 + 1.1)/2 = 0.915.
If we had an extra data point: 5.4, 1.1, 0.42, 0.73, 0.48, 1.1, 0.66. After reordering we have 0.42, 0.48, 0.66, 0.73, 1.1, 1.1, 5.4. Median = 0.73.
Outlier has strong effect on mean, not so on median!
Used for example in median household income: $36,078
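The even and odd cases can be checked with Python's standard library. This is a small sketch; the variable names are illustrative.

```python
from statistics import median

lead = [5.4, 1.1, 0.42, 0.73, 0.48, 1.1]

# n = 6 (even): average of the two middle values, 0.73 and 1.1
median_even = median(lead)

# n = 7 (odd) after adding the extra data point 0.66: exact middle value
median_odd = median(lead + [0.66])

print(median_even, median_odd)
```

`statistics.median` sorts the data internally, so the values can be passed in their original order.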

4 Measures of Center - mode and midrange
Mode M: the value that occurs most frequently
- if 2 values are most frequent: bimodal
- if more than 2: multimodal
- if no value is repeated: no mode
- needs no numerical values
Midrange = (highest value + lowest value)/2
- outliers have very strong weight
Examples:
a. 5.4, 1.1, 0.42, 0.73, 0.48, 1.1
b. 27, 27, 27, 55, 55, 55, 88, 88, 99
c. 1, 2, 3, 6, 7, 8, 9, 10
Solutions:
a. Unimodal: 1.1. Midrange: (0.42 + 5.4)/2 = 2.91
b. Bimodal: 27 and 55. Midrange: (27 + 99)/2 = 63
c. No mode. Midrange: (1 + 10)/2 = 5.5
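The three examples can be reproduced with a short sketch. The helper names `modes` and `midrange` are made up for illustration. Returning an empty list when no value repeats follows the slide's "no mode" convention, which differs from `statistics.multimode`.

```python
from collections import Counter

def modes(data):
    """Return the most frequent values, or [] if no value is repeated."""
    counts = Counter(data)
    top = max(counts.values())
    if top == 1:
        return []  # slide convention: no repeated value means no mode
    return sorted(v for v, c in counts.items() if c == top)

def midrange(data):
    """Average of the highest and lowest value."""
    return (max(data) + min(data)) / 2

a = [5.4, 1.1, 0.42, 0.73, 0.48, 1.1]
b = [27, 27, 27, 55, 55, 55, 88, 88, 99]
c = [1, 2, 3, 6, 7, 8, 9, 10]

for name, data in (("a", a), ("b", b), ("c", c)):
    print(name, modes(data), midrange(data))
```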

5 Mode and more ...
Mode: not much used with numerical data.
Example: Survey shows students own:
- 84% TV
- 76% VCR
- 69% CD player
- 39% video game player
- 35% DVD
TV is the mode! No mean, median or midrange!
Mean from frequency distribution: $\bar{x} = \frac{\sum (f \cdot x)}{\sum f}$, with $x$ the class midpoint and $f$ the class frequency.
Weighted mean: $\bar{x} = \frac{\sum (w \cdot x)}{\sum w}$ (example on page 23 of BMCC booklet).
(Dis)advantages of different measures of center.
Skewness.
Round-off: carry one more decimal than in data!
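A sketch of the weighted mean, which also covers the mean from a frequency distribution (the frequencies act as weights). All numbers below are hypothetical and are not the booklet's example; the function name is illustrative.

```python
def weighted_mean(values, weights):
    """Sum of weight*value divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Hypothetical scores with hypothetical weights 2, 3 and 5
scores = [85, 90, 75]
weights = [2, 3, 5]
course_mean = weighted_mean(scores, weights)

# Hypothetical frequency table: class midpoints and their frequencies
midpoints = [5, 15, 25]
frequencies = [2, 5, 3]
frequency_mean = weighted_mean(midpoints, frequencies)

print(course_mean, frequency_mean)
```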

6 Measures of variation
Variation measures consistency.
Range = highest value - lowest value
Standard deviation: $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
Example: precision arrows vs. jungle arrows. Same mean length, but different variation!

7 Standard deviation
- Measure of variation of all values from the mean
- Positive or zero (zero only when all data values are equal)
- Larger deviations, larger s
- Can increase dramatically with outliers
- Same units as original data values
Recipe:
1. Compute the mean.
2. Subtract the mean from the individual values.
3. Square the differences.
4. Add the squared differences.
5. Divide by n-1.
6. Take the square root.
Example: waiting times at Bank Consistency (6, 5, 4, 4, 6, 5) and Bank Unpredictable. For Bank Consistency:
1. Mean: (6+5+4+4+6+5)/6 = 5
2. (6-5)=1, (5-5)=0, (4-5)=-1, (4-5)=-1, (6-5)=1, (5-5)=0
3. 1^2=1, 0^2=0, (-1)^2=1, (-1)^2=1, 1^2=1, 0^2=0
4. ∑ = 4
5. n-1 = 6-1 = 5, so 4/5 = 0.8
6. √0.8 ≈ 0.9 min (vs. 6.3 min for Bank Unpredictable)
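The six-step recipe translates almost line by line into code. This is a minimal sketch using the consistent bank data set (6, 5, 4, 4, 6, 5); the function name `sample_std` is illustrative.

```python
from math import sqrt

def sample_std(data):
    n = len(data)
    mean = sum(data) / n                    # 1. compute the mean
    deviations = [x - mean for x in data]   # 2. subtract the mean
    squared = [d ** 2 for d in deviations]  # 3. square the differences
    total = sum(squared)                    # 4. add the squared differences
    variance = total / (n - 1)              # 5. divide by n-1
    return sqrt(variance)                   # 6. take the square root

waits = [6, 5, 4, 4, 6, 5]
s = sample_std(waits)
print(round(s, 1))
```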

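8 Standard deviation of sample and population
Fast formula: $s = \sqrt{\frac{n \sum x^2 - (\sum x)^2}{n(n-1)}}$
Example using fast formula (data 6, 5, 4, 4, 6, 5): find values of $n$, $\sum x$, $\sum x^2$
- $n = 6$ (6 values in sample)
- $\sum x = 30$ (adding the values)
- $\sum x^2 = 36 + 25 + 16 + 16 + 36 + 25 = 154$
- $s = \sqrt{\frac{6 \cdot 154 - 30^2}{6 \cdot 5}} = \sqrt{\frac{24}{30}} = \sqrt{0.8} \approx 0.9$
Standard deviation of a population: divide by N
$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$
- μ (mu): population mean
- σ (sigma): standard deviation of the population
Different notations in calculators
- Excel: STDEVP instead of STDEV
Estimating s and σ: (highest value - lowest value)/4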
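As a hedged check, the fast formula can be compared with the standard library's sample and population standard deviations (`statistics.stdev` divides by n-1, `statistics.pstdev` by N). The data are the consistent bank waiting times; the variable names are illustrative.

```python
from math import sqrt
from statistics import pstdev, stdev

waits = [6, 5, 4, 4, 6, 5]

n = len(waits)
sum_x = sum(waits)                   # sum of the values
sum_x2 = sum(x * x for x in waits)   # sum of the squared values

# Fast formula for the sample standard deviation
s_fast = sqrt((n * sum_x2 - sum_x ** 2) / (n * (n - 1)))

s_sample = stdev(waits)   # divides by n-1
sigma = pstdev(waits)     # divides by N

# Rough estimate: (highest value - lowest value)/4
range_estimate = (max(waits) - min(waits)) / 4

print(s_fast, s_sample, sigma, range_estimate)
```

Treating the same six values as a whole population gives a smaller result than treating them as a sample, because the divisor is 6 rather than 5.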

9 Example: class grades
A statistics class of 20 students obtains the grades listed in the class table.
To rapidly approximate the mean, we take a random sample of 5 students. At random, we pick 78, 92, 64, 83, 78.
$\bar{x} = (78 + 92 + 64 + 83 + 78)/5 = 395/5 = 79$
$s = \sqrt{\frac{(78-79)^2 + (92-79)^2 + (64-79)^2 + (83-79)^2 + (78-79)^2}{4}}$
$= \sqrt{\frac{(-1)^2 + 13^2 + (-15)^2 + 4^2 + (-1)^2}{4}}$
$= \sqrt{\frac{1 + 169 + 225 + 16 + 1}{4}} = \sqrt{\frac{412}{4}} = \sqrt{103} \approx 10.15$
The population mean is obtained by adding all grades and dividing by 20. The population variance can be obtained using Excel.

10 Variance and coefficient of variation
Variance = square of standard deviation
- sample: $s^2$
- population: $\sigma^2$
General terms referring to variation: dispersion, spread, variation. Variance has a specific definition.
Ex: finding a variance: 0.8, 40.
Examples: In the class grade case, the sample standard deviation was $\sqrt{103} \approx 10.15$. Therefore, $s^2 = 103$. The population standard deviation was 10.71, therefore $\sigma^2 = 10.71^2 \approx 114.7$.

11 Coefficient of variation
The coefficient of variation (CV) describes the standard deviation relative to the mean:
- sample: $CV = \frac{s}{\bar{x}} \cdot 100\%$
- population: $CV = \frac{\sigma}{\mu} \cdot 100\%$
It allows us to compare the dispersion of completely different data sets.
- Ex: consistent bank data set 6, 5, 4, 4, 6, 5: x̄ = 5, s = 0.9, CV = 0.9/5 = 0.18
- Class sample: x̄ = 79, s = 10.1, CV = 10.1/79 = 0.13
- Variation of the consistent bank is larger than that of the class in relative terms!
In the previous example, CV(sample) = 10.1/79 = 12.8% and CV(population) = 10.71/μ = 13.4%.
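A short sketch comparing the two data sets from the slides. The helper name is illustrative, and the result is kept as a fraction rather than a percentage.

```python
from statistics import mean, stdev

def coefficient_of_variation(data):
    """Sample standard deviation relative to the sample mean."""
    return stdev(data) / mean(data)

bank = [6, 5, 4, 4, 6, 5]             # consistent bank waiting times
grades_sample = [78, 92, 64, 83, 78]  # random sample of 5 class grades

cv_bank = coefficient_of_variation(bank)
cv_class = coefficient_of_variation(grades_sample)

print(round(cv_bank, 2), round(cv_class, 3))
```

The bank's standard deviation is much smaller in absolute terms, yet its CV is larger because its mean is also much smaller.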

12 More on variance and standard deviation
Why use variance, when standard deviation is more intuitive?
- (Independent) variances have additive properties
- Probabilistic properties
- Standard deviation is more intuitive
Why divide sample st. dev. by n-1?
- Only n-1 free parameters (degrees of freedom)
Skewness: Pearson's index
- I = 3(mean - median)/s
- If I < -1 or I > 1: significantly skewed
Empirical rule for data with normal distribution
- 68% of data lie within 1 standard deviation of the mean
- 95% of data lie within 2 standard deviations of the mean
- 99.7% of data lie within 3 standard deviations of the mean
Example: Adult IQ scores have a bell-shaped distribution with a mean of 100 and a standard deviation of 15. What percentage of adults have an IQ in the 55:145 range? s = 15, 3s = 45, x̄ - 3s = 55, x̄ + 3s = 145. Hence, 99.7% of adults have IQs in that range.
Chebyshev's theorem: The proportion of the data lying within k standard deviations of the mean is at least 1 - 1/k^2 (for k > 1).
Ex: At least 1 - 1/3^2 = 8/9 ≈ 89% of the data lie within 3 st. dev. of the mean.
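The two rules above can be sketched as follows. The function names are illustrative, and `chebyshev_lower_bound` assumes an integer k so that the bound can be kept as an exact fraction.

```python
from fractions import Fraction

def chebyshev_lower_bound(k):
    """Minimum fraction of data within k standard deviations (integer k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem needs k > 1")
    return 1 - Fraction(1, k ** 2)

def pearson_skewness_index(mean, median, s):
    """Pearson's index I = 3(mean - median)/s."""
    return 3 * (mean - median) / s

# IQ example: the range of 3 standard deviations around the mean
iq_mean, iq_sd = 100, 15
low, high = iq_mean - 3 * iq_sd, iq_mean + 3 * iq_sd

print(chebyshev_lower_bound(2), chebyshev_lower_bound(3), (low, high))
```

Chebyshev's bound holds for any distribution, so for k = 3 it guarantees only 8/9 of the data, whereas the empirical rule gives 99.7% for bell-shaped data.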

13 And more on variance and standard deviation
Finding s from a frequency distribution: $s = \sqrt{\frac{n \sum (f \cdot x^2) - (\sum f \cdot x)^2}{n(n-1)}}$
Interpreting a known value of the standard deviation s: if s is known, use it to find rough estimates of the minimum and maximum "usual" sample values:
- max "usual" value ≈ mean + 2 × (st. dev.)
- min "usual" value ≈ mean - 2 × (st. dev.)
Why n-1: data 3, 6, 9, with μ = 6 and σ^2 = 6. Take all 9 samples of size n = 2 (with replacement). For each sample compute x̄ and ∑(x - x̄)^2.
- s^2 (divide by n-1 = 2-1): mean value of s^2 = 54/9 = 6, which equals σ^2
- divide by n = 2 instead: mean value = 27/9 = 3, which underestimates σ^2
Example: cotinine levels of smokers. Using Excel we obtain the mean and standard deviation, with which we calculate the minimum and maximum "usual" values.
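The n-1 demonstration can be reproduced by enumerating all nine samples. This is a minimal sketch; the variable and function names are illustrative.

```python
from itertools import product

population = [3, 6, 9]
n = 2

# All 9 samples of size 2, drawn with replacement
samples = list(product(population, repeat=n))

def sum_sq_dev(sample):
    """Sum of squared deviations from the sample mean."""
    m = sum(sample) / len(sample)
    return sum((x - m) ** 2 for x in sample)

# Average of the sample variances when dividing by n-1 and by n
mean_s2_n_minus_1 = sum(sum_sq_dev(s) / (n - 1) for s in samples) / len(samples)
mean_s2_n = sum(sum_sq_dev(s) / n for s in samples) / len(samples)

# Population variance (divide by N)
mu = sum(population) / len(population)
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)

print(mean_s2_n_minus_1, mean_s2_n, sigma2)
```

Dividing by n-1 makes the sample variances average out to the population variance; dividing by n gives half of it in this example.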

14 Measures of relative standing
Useful for comparing different data sets.
z scores: number of standard deviations that a value x is above or below the mean
- sample: $z = \frac{x - \bar{x}}{s}$
- population: $z = \frac{x - \mu}{\sigma}$
Example: NBA Jordan: x = 78, μ = 69, σ = 2.8. WNBA Lobo: x = 76, μ = 63.6, σ = 2.5.
- J: z = (x - μ)/σ = (78 - 69)/2.8 = 3.21
- L: z = (x - μ)/σ = (76 - 63.6)/2.5 = 4.96
Percentiles:
- Percentile of value x = (number of values less than x)/(total number of values) × 100
Example: data point 48 in Smoker data: 8/40 × 100 = 20, so it is the 20th percentile, P_20.
Exercise: Locate the percentiles of data points 1, 130 and 250.
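A sketch of both measures. The function names are illustrative, and because the Smoker data table is not reproduced here, the percentile function takes the two counts directly.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

def percentile_of_value(values_below, total):
    """Percentile of x from the count of values less than x."""
    return 100 * values_below / total

z_jordan = z_score(78, 69, 2.8)
z_lobo = z_score(76, 63.6, 2.5)

# Data point 48 in the Smoker data: 8 of the 40 values are smaller
p_48 = percentile_of_value(8, 40)

print(round(z_jordan, 2), round(z_lobo, 2), p_48)
```

Lobo's z score is larger, so relative to her own population she stands further above the mean than Jordan does relative to his.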

15 Quartiles and percentiles

16 Percentiles and Quartiles
Conversely, if you are looking for the data value at the k-th percentile: L = (k/100) × n
- n: total number of values
- k: percentile being used
- L: locator that gives the position of a value (for the 12th value in the sorted list, L = 12)
- P_k: k-th percentile (ex: P_25 is the 25th percentile)
Procedure:
1. Sort the data.
2. Compute L = (k/100) × n.
3. Is L a whole number?
   - Yes: P_k is the average of the L-th and (L+1)-st values.
   - No: round L up; P_k is the L-th value.
Example: In class table (n = 20)
- find the value of the 21st percentile: L = 21/100 × 20 = 4.2; round up to the 5th data point, so P_21 = 71
- find the 80th percentile: L = 80/100 × 20 = 16, a whole number, so P_80 = (89 + 92)/2 = 90.5
Quartiles: Q_1 = P_25, Q_2 = P_50 = median, Q_3 = P_75
Percentile of the value in position L: k = (L - 1)/n × 100, that is, k = (number of values less than x)/(total number of values) × 100
Example: data point 48 in Smoker data is 9th on the table, n = 40. (9 - 1)/40 × 100 = 20, so 48 is P_20, the 20th percentile, which lies in the first quarter of the data (below Q_1). Data point 234 is 28th: k = (28 - 1)/40 × 100 = 67.5, about the 68th percentile, which lies in the third quarter (between Q_2 and Q_3).
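The locator procedure can be sketched as below. The function name is illustrative, it assumes an integer percentile k between 1 and 99, and the test data are hypothetical (the values 1 to 20), since the class table is not reproduced here.

```python
def percentile_value(sorted_data, k):
    """Value of the k-th percentile using the locator L = (k/100) * n."""
    if not 0 < k < 100:
        raise ValueError("k must be between 1 and 99")
    n = len(sorted_data)
    # Integer arithmetic avoids floating-point trouble when testing for a whole L
    whole, remainder = divmod(k * n, 100)
    if remainder == 0:
        # L is whole: average the L-th and (L+1)-st values (1-based positions)
        return (sorted_data[whole - 1] + sorted_data[whole]) / 2
    # L is not whole: round up, so the position is whole + 1 (index `whole`)
    return sorted_data[whole]

# Hypothetical sorted data set with n = 20
data = list(range(1, 21))

print(percentile_value(data, 21))  # L = 4.2, so the 5th value
print(percentile_value(data, 80))  # L = 16, so the average of the 16th and 17th values
```

Other textbooks and software (including spreadsheet percentile functions) use different interpolation rules, so results may not match this procedure exactly.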

17 Exploratory Data Analysis
Exploratory data analysis is the process of using statistical tools (graphs, measures of center and variation) to investigate data sets in order to understand their characteristics.
Box plots:
- have less information than histograms and stem-and-leaf plots
- are not that often used with only one set of data
- are good when comparing many different sets of data
Outlier: extreme value (often they are typos made when collecting data, but not always). An outlier:
- can have a dramatic effect on the mean
- can have a dramatic effect on the standard deviation
- can have a dramatic effect on the histogram

18 Probability - Chapter 3
False positives and negatives
- False positive: test incorrectly indicates that a woman is pregnant when she is not.
- False negative: test incorrectly indicates that a woman is not pregnant when she is.
- True positive: test correctly indicates that a woman is pregnant when she is.
- True negative: test correctly indicates that a woman is not pregnant when she is not.
- Test sensitivity: the probability of a true positive.
- Test specificity: the probability of a true negative.
Ex: Abbot test pack indicates that their urine test has a 0.2% false positive rate and a 0.6% false negative rate.

19 Overview
Rare event rule: If, under a given assumption (the lottery is fair), the probability of a particular observed event (5 consecutive lottery wins by the same person) is extremely small, the assumption is probably not correct.

20 Fundamentals
Definitions:
- Procedure: an action whose outcome (result) is random.
- Event: any collection of outcomes of a procedure.
- Simple event: an event that cannot be simplified any further.
- Sample space of a procedure: the set of all simple events.
Examples:
- Procedure: rolling a die, rolling 2 dice, tossing a coin, ...
- Event: for 1 die, any of 1, 2, 3, 4, 5, 6, "even", "greater than 3". For 2 dice: "sum is 7", "sum is bigger than 10", "1-1", "1-2", "2-1", "both even".
- Simple events: for 1 die: 1, 2, 3, 4, 5, 6. For 2 dice: 1-1, 1-2, 1-3, 1-4, 1-5, 1-6, 2-1, 2-2, 2-3, 2-4, 2-5, 2-6, 3-1, ..., 6-6.
Notation:
- P: probability
- A, B, C: specific events
- P(A): the probability of the event A occurring

21 Defining a probability
Relative Frequency Approach: Observe a procedure a large number of times and count the number of times that event A occurs. Then P(A) is estimated by
P(A) = (number of times A occurs)/(number of trials)
Example: A tack falls point up. Repeat the experiment 1000 times and count how many times the tack falls point up; then P(A) is the number of times it falls point up over the number of times the tack was thrown.
Classical Approach: If a procedure has n different simple events that are equally likely, and there are s different ways that A can occur, then
P(A) = (number of ways A can occur)/(number of simple events) = s/n
Example: Rolling a die. Assuming the die is not loaded, each face has the same chance of landing face up. P(even) = (# of ways face is even)/(total # of options) = 3/6
Subjective Probability: P(A), the probability of the event A, is estimated based on knowledge of relevant circumstances.
Example: Weather forecast. One needs to be an expert to estimate wisely whether it will rain tomorrow or not.
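To contrast the first two approaches, the sketch below computes P(even) for a fair die classically and then estimates it by simulation. The function name, the number of trials and the seed are arbitrary choices for illustration.

```python
import random
from fractions import Fraction

faces = [1, 2, 3, 4, 5, 6]

# Classical approach: s ways for the event out of n equally likely simple events
classical = Fraction(sum(1 for f in faces if f % 2 == 0), len(faces))

def relative_frequency(trials, seed=0):
    """Estimate P(even) by simulating `trials` rolls of a fair die."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = sum(1 for _ in range(trials) if rng.choice(faces) % 2 == 0)
    return hits / trials

estimate = relative_frequency(10_000)
print(classical, estimate)
```

The simulated estimate should land near 1/2 without matching it exactly.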

22 More examples
Flying on a commercial plane: Find the probability that a randomly selected adult has flown on a plane.
- 2 events: flown, or not
- the events are not equally likely (cannot use the classical approach), so use the relative frequency approach
- Gallup poll: of 815 randomly selected adults, 710 indicated that they have flown
- P(flew on commercial plane) = 710/815 ≈ 0.871
Roulette: Bet on number 13 in a roulette game. What is the probability that you will lose?
- 38 slots, all equally likely, so use the classical approach
- 37 result in a loss
- P(loss) = 37/38 ≈ 0.974
Meteorites: What is the probability that your house will be hit by a meteorite? In the absence of historical data, we need the 3rd approach. We know the chance is very small, say 0.000,000,001. This is a subjective estimate, a general ballpark.

23 Law of large numbers
Law of large numbers: As a procedure is repeated again and again, the relative frequency probability of an event tends to approach the actual probability.
Example: Death penalty. In a Gallup poll, adults are randomly selected and asked if they are in favor of or against the death penalty. The responses include 319 who are for it, 133 who are against it, and 39 that have no opinion (491 total). Based on these results, estimate the probability that a randomly selected person is in favor of the death penalty.
P(for) = 319/491 ≈ 0.650
Example: 2 boys, 1 girl. What is the probability that when a couple has 3 children, exactly 2 out of the 3 are boys? Assuming that having boys or girls is equally likely, use the classical approach. Options are:
- boy-boy-boy
- boy-boy-girl
- boy-girl-boy
- boy-girl-girl
- girl-boy-boy
- girl-boy-girl
- girl-girl-boy
- girl-girl-girl
8 possible outcomes, 3 correspond to exactly 2 boys.
P(exactly 2 boys) = s/n = 3/8 = 0.375
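The eight outcomes can be listed and counted mechanically. This is a small sketch of the classical approach, under the slide's assumption that boys and girls are equally likely; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All birth orders for 3 children
outcomes = list(product(["boy", "girl"], repeat=3))

# Outcomes with exactly 2 boys
favorable = [o for o in outcomes if o.count("boy") == 2]

p_two_boys = Fraction(len(favorable), len(outcomes))
print(len(outcomes), len(favorable), p_two_boys)
```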

24 Complementary probabilities and properties
Properties:
- The probability of the impossible event is 0: P(∅) = 0.
- The probability of the certain event is 1: P(Ω) = 1.
- For any event A, 0 ≤ P(A) ≤ 1.
- P(A) = 0 only if A cannot happen.
- P(A) = 1 only if A happens for sure.
- If A^c denotes the complement event to A, then P(A) + P(A^c) = 1.
Example: Thanksgiving day. What is the probability that Thanksgiving day falls on a a) Wednesday? b) Thursday? Thanksgiving is always on a Thursday!
a) Impossible: P(Thxgiv. Wed) = 0
b) Always true: P(Thxgiv. Thu) = 1
Examples: If X denotes the number the face of a die shows when it lands, then
- P(X = 7) = 0
- P(X ≤ 7) = 1
- P(X ≥ 0) = 1
- P(X not even) = 1 - P(X even)
- P({X ≤ 2}^c) = 1 - P({X ≤ 2}) = 1 - 2/6 = 4/6 = 2/3 = P(X > 2)
If Y denotes the sum of the numbers on the faces when throwing 2 dice:
- P(Y = 1) = 0
- P(2 ≤ Y ≤ 12) = 1
- P(Y = 4) = 3/36, namely 1-3, 2-2, and 3-1
- P({Y = 2}^c) = 1 - P(Y = 2) = 1 - 1/36 = 35/36
HW: p.120 #1-7
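The two-dice statements can be checked by enumerating the 36 equally likely rolls. This is a minimal sketch; the helper `prob` is an illustrative name.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of throwing 2 dice
rolls = list(product(range(1, 7), repeat=2))

def prob(event):
    """Classical probability: favorable outcomes over all outcomes."""
    return Fraction(sum(1 for r in rolls if event(r)), len(rolls))

p_sum_1 = prob(lambda r: sum(r) == 1)          # impossible event
p_certain = prob(lambda r: 2 <= sum(r) <= 12)  # certain event
p_sum_4 = prob(lambda r: sum(r) == 4)          # 1-3, 2-2, 3-1
p_not_2 = 1 - prob(lambda r: sum(r) == 2)      # complement rule

print(p_sum_1, p_certain, p_sum_4, p_not_2)
```

`Fraction` reduces results automatically, so 3/36 prints as 1/12.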

25 Addition Rule
A compound event is an event combining 2 or more simple events.
Notation:
- P(A ∩ B): intersection of A and B (both A and B occur)
- P(A ∪ B): union of A and B (either A or B or both occur)
Events A and B are disjoint (or mutually exclusive) if they cannot both occur together. In such a case, the intersection of the events is empty: A ∩ B = ∅, and we recall that P(∅) = 0. We then have P(A ∪ B) = P(A) + P(B).
Addition Rule: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
Idea: count data only once!
Venn diagrams: overlapping events vs. non-overlapping (disjoint) events.
Example: Mendel's hybridization experiments. Peas with purple (p) and white (w) flowers, green (g) and yellow (y) pods: 8 p, 6 w, 9 g, 5 y.
P(g ∪ p) = P(g) + P(p) - P(g ∩ p) = 9/14 + 8/14 - 5/14 = 12/14

26 Examples: addition rule
Clinical trials of pregnancy test: Assuming that 1 person is selected at random from the 99 people in the test, find the probability of selecting a subject who is pregnant or had a positive test result.
- P(pregnant) = (80 + 5)/99
- P(test positive) = (80 + 3)/99
- P(pregnant and test positive) = 80/99
P(pregnant or test positive) = P(pregnant) + P(test positive) - P(pregnant and test positive) = 85/99 + 83/99 - 80/99 = 88/99 = 8/9 ≈ 0.889
Alternatively, P(pregnant or positive) = P(pregnant and positive) + P(pregnant and negative) + P(not pregnant but positive) = 80/99 + 5/99 + 3/99 = 88/99
Note that
- Pregnant = (pregnant & pos.) + (pregnant & neg.)
- Positive = (pregnant & pos.) + (pos. & not pregnant)
Subtract to avoid double counting!
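A sketch of the same calculation from a table of counts. The cell for "not pregnant and negative" (11) is not stated on the slide; it is derived here as 99 - 80 - 5 - 3. The dictionary layout and helper name are illustrative.

```python
from fractions import Fraction

# Counts keyed by (pregnancy status, test result); 11 is derived from the total of 99
counts = {
    ("pregnant", "positive"): 80,
    ("pregnant", "negative"): 5,
    ("not pregnant", "positive"): 3,
    ("not pregnant", "negative"): 11,
}
total = sum(counts.values())

def prob(condition):
    """Probability that a randomly selected subject satisfies `condition`."""
    return Fraction(sum(c for key, c in counts.items() if condition(key)), total)

p_pregnant = prob(lambda k: k[0] == "pregnant")
p_positive = prob(lambda k: k[1] == "positive")
p_both = prob(lambda k: k == ("pregnant", "positive"))

# Addition rule: subtract the overlap so it is counted only once
p_either = p_pregnant + p_positive - p_both

# Direct count for comparison
p_direct = prob(lambda k: k[0] == "pregnant" or k[1] == "positive")

print(p_either, p_direct)
```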

27 Multiplication rule
P(A and B) = P(A ∩ B)
Example: Answer at random
1. True/false: A pound of feathers is heavier than a pound of gold.
2. Which has affected society most:
   a) Remote control
   b) Sneakers with high heels
   c) Hostess Twinkies
   d) Computers
   e) Phone
To answer q. 1 at random, each choice has probability 1/2.
To answer q. 2 at random, each choice has probability 1/5.
P(both answers correct) = P(T and (d)) = P(T) × P(d) = 1/2 × 1/5 = 1/10
Tree diagram: T and F each branch into a, b, c, d, e.

28 Multiplication rule: independent events
If events A and B are independent, then P(A ∩ B) = P(A) P(B)
Example: Throwing 2 dice. What is the probability that the first number is even and the second one is larger than 4?
Answer: Independent? YES!
- A: first die even. P(A) = 3/6 = 1/2
- B: second die larger than 4. P(B) = P("face shows 5 or 6") = 2/6 = 1/3
- P(A ∩ B) = P(A) P(B) = 1/2 × 1/3 = 1/6
From the graph there are 6 options that are good: 2-5, 2-6, 4-5, 4-6, 6-5, 6-6, so P(A ∩ B) = 6/36 = 1/6
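The product rule can be confirmed by counting outcomes directly. This is a brief sketch with illustrative names.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes (first die, second die)
rolls = list(product(range(1, 7), repeat=2))

def prob(event):
    """Classical probability: favorable outcomes over all outcomes."""
    return Fraction(sum(1 for r in rolls if event(r)), len(rolls))

p_a = prob(lambda r: r[0] % 2 == 0)                  # A: first die even
p_b = prob(lambda r: r[1] > 4)                       # B: second die larger than 4
p_both = prob(lambda r: r[0] % 2 == 0 and r[1] > 4)  # A and B, counted directly

print(p_a, p_b, p_both, p_a * p_b)
```

The directly counted probability equals the product P(A) P(B), which is what independence means for these two events.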