Slide 1 Lecture # 4&5 CHS 221 DR. Wajed Hatamleh
Slide 2 Variance
Slide 3 Definition The variance of a set of values is a measure of variation equal to the square of the standard deviation. Sample variance: square of the sample standard deviation s. Population variance: square of the population standard deviation σ.
Slide 4 Variance - Notation (standard deviation squared) s²: sample variance. σ²: population variance.
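The relationship between standard deviation and variance can be checked numerically. The sketch below uses Python's standard `statistics` module; the data values are borrowed from the percentile example in this deck purely for illustration, and the variable names are our own.

```python
import statistics

# Illustrative data (borrowed from the percentile example in this deck).
values = [5, 12, 13, 14, 17, 19, 23, 28]

s = statistics.stdev(values)       # sample standard deviation (divides by n - 1)
sigma = statistics.pstdev(values)  # population standard deviation (divides by n)

sample_variance = statistics.variance(values)       # equals s squared
population_variance = statistics.pvariance(values)  # equals sigma squared

print(f"s = {s:.3f}, s^2 = {sample_variance:.3f}")
print(f"sigma = {sigma:.3f}, sigma^2 = {population_variance:.3f}")
```

Which pair applies depends on whether the values are treated as a sample or as the whole population.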
Slide 5 Measures of Relative Standing
Slide 6 Definition z Score (or standard score): the number of standard deviations that a given value x is above or below the mean.
Slide 7 Measures of Position: z score. Sample: z = (x - x̄) / s. Population: z = (x - µ) / σ. Round z to 2 decimal places.
Slide 8 Interpreting Z Scores (FIGURE 2-14) Whenever a value is less than the mean, its corresponding z score is negative. Ordinary values: z score between -2 and 2. Unusual values: z score < -2 or z score > 2.
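A minimal sketch of the z score calculation and the "unusual value" check. The function names and the numbers in the usage lines (mean 100, standard deviation 15) are hypothetical and chosen only for illustration.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above or below the mean,
    rounded to 2 decimal places as the slides suggest."""
    return round((x - mean) / sd, 2)


def is_unusual(z):
    """Unusual values have a z score below -2 or above 2."""
    return z < -2 or z > 2


# Hypothetical example: mean 100, standard deviation 15.
z_high = z_score(135, 100, 15)  # above the mean, so positive
z_low = z_score(85, 100, 15)    # below the mean, so negative
print(z_high, is_unusual(z_high))
print(z_low, is_unusual(z_low))
```

The same function serves for a sample (pass x̄ and s) or a population (pass µ and σ).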
Slide 9 Percentiles. Measures of position that divide a group of data into 100 parts. At least n% of the data lie below the nth percentile, and at most (100 - n)% of the data lie above the nth percentile. Example: the 90th percentile indicates that at least 90% of the data lie below it, and at most 10% of the data lie above it. The median and the 50th percentile have the same value. Applicable for ordinal, interval, and ratio data. Not applicable for nominal data.
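Slide 10 Percentiles: Computational Procedure. 1. Organize the data into an ascending ordered array. 2. Calculate the percentile location: i = (P/100)(n), where P is the percentile of interest and n is the number of values. 3. Determine the percentile’s location and its value: if i is a whole number, the percentile is the average of the values at the i and (i+1) positions; if i is not a whole number, round it up to the next whole number and take the value at that position.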
Slide 11 Percentiles: Example. Raw data: 14, 12, 19, 23, 5, 13, 28, 17. Ordered array: 5, 12, 13, 14, 17, 19, 23, 28. Location of 30th percentile: i = (30/100)(8) = 2.4. The location index, i, is not a whole number; round it up to 3. The percentile is the value at the 3rd position, 13.
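The computational procedure can be sketched in Python. This assumes the location index is i = (P/100)·n, which is consistent with the 30th percentile example; the function name `percentile` is our own, and the sketch only handles whole-number percentiles strictly between 0 and 100.

```python
import math


def percentile(data, p):
    """Percentile by the location rule: average positions i and i + 1 when
    i is a whole number, otherwise round i up and take that position."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(data)
    n = len(ordered)
    i = p * n / 100
    if (p * n) % 100 == 0:
        k = int(i)
        # Positions are 1-based on the slides, list indexes are 0-based.
        return (ordered[k - 1] + ordered[k]) / 2
    return ordered[math.ceil(i) - 1]


raw = [14, 12, 19, 23, 5, 13, 28, 17]
print(percentile(raw, 30))  # i = 2.4, rounded up to position 3
```

Statistical packages often use interpolation rules that differ from this one, so their percentiles may not match values computed by hand with this procedure.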
Slide 12 Quartiles. Measures of position that divide a group of data into four subgroups. Q1: 25% of the data set is below the first quartile. Q2: 50% of the data set is below the second quartile. Q3: 75% of the data set is below the third quartile. Q1 is equal to the 25th percentile. Q2 is located at the 50th percentile and equals the median. Q3 is equal to the 75th percentile. Quartile values are not necessarily members of the data set.
Slide 13 Definition Q1 (First Quartile) separates the bottom 25% of sorted values from the top 75%. Q2 (Second Quartile) is the same as the median; it separates the bottom 50% of sorted values from the top 50%. Q3 (Third Quartile) separates the bottom 75% of sorted values from the top 25%.
Slide 14 Quartiles [Figure: Q1, Q2, and Q3 mark off four parts, each containing 25% of the data]
Slide 15 Quartiles. Q1, Q2, and Q3 divide ranked scores into four equal parts. [Figure: scale running from the minimum to the maximum, with Q1, Q2 (median), and Q3 each separating 25% of the scores]
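Slide 16 Quartiles: Example. Ordered array: 106, 109, 114, 116, 121, 122, 125, 129. Q1: i = (25/100)(8) = 2, so Q1 = (109 + 114)/2 = 111.5. Q2: i = (50/100)(8) = 4, so Q2 = (116 + 121)/2 = 118.5. Q3: i = (75/100)(8) = 6, so Q3 = (122 + 125)/2 = 123.5.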
Slide 17 Interquartile Range. Range of values between the first and third quartiles (Q3 - Q1). Range of the “middle half”. Less influenced by extremes.
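As a rough illustration, the interquartile range for the ordered array in the quartiles example can be computed by treating Q1 and Q3 as the 25th and 75th percentiles. The helper `value_at_percentile` is a hypothetical name, and it assumes the location rule i = (P/100)·n from the percentile slides.

```python
def value_at_percentile(ordered, p):
    """Apply the location rule i = (p/100) * n to an already sorted list."""
    whole, remainder = divmod(p * len(ordered), 100)
    if remainder == 0:
        # i is a whole number: average the values at positions i and i + 1.
        return (ordered[whole - 1] + ordered[whole]) / 2
    # i is not whole: rounding up gives 1-based position whole + 1.
    return ordered[whole]


ordered = [106, 109, 114, 116, 121, 122, 125, 129]
q1 = value_at_percentile(ordered, 25)
q3 = value_at_percentile(ordered, 75)
iqr = q3 - q1  # width of the "middle half" of the data
print(q1, q3, iqr)
```

Because only Q1 and Q3 enter the calculation, replacing 129 with a far larger maximum would leave the result unchanged.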
Slide 18 Recap In this section we have discussed: z Scores; z Scores and unusual values; Quartiles; Percentiles; Converting a percentile to corresponding data values; Other statistics.
Slide 19 Exploratory Data Analysis (EDA)
Slide 20 Definition Exploratory Data Analysis is the process of using statistical tools (such as graphs, measures of center, and measures of variation) to investigate data sets in order to understand their important characteristics.
Slide 21 Definition An outlier is a value that is located very far away from almost all the other values
Slide 22 Important Principles. An outlier can have a dramatic effect on the mean. An outlier can have a dramatic effect on the standard deviation. An outlier can have a dramatic effect on the scale of the histogram so that the true nature of the distribution is totally obscured.
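The first two principles can be illustrated with a short sketch. The base data are the values from the quartiles example in this deck; the added value of 500 is a hypothetical outlier invented for the demonstration.

```python
import statistics

base = [106, 109, 114, 116, 121, 122, 125, 129]
with_outlier = base + [500]  # hypothetical outlier, not from the slides

for label, data in (("without outlier", base), ("with outlier", with_outlier)):
    print(
        f"{label}: mean = {statistics.mean(data):.2f}, "
        f"median = {statistics.median(data):.2f}, "
        f"sd = {statistics.stdev(data):.2f}"
    )
```

The median is printed alongside for comparison, since it moves far less than the mean when the extreme value is added.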
Slide 23 Definitions. For a set of data, the 5-number summary consists of the minimum value; the first quartile, Q1; the median (or second quartile, Q2); the third quartile, Q3; and the maximum value. A boxplot (or box-and-whisker diagram) is a graph of a data set that consists of a line extending from the minimum value to the maximum value, and a box with lines drawn at the first quartile, Q1; the median; and the third quartile, Q3.
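A 5-number summary can be sketched with Python's standard library. Note the assumption: `statistics.quantiles` interpolates between data points, so its Q1 and Q3 may differ from quartiles found with the hand-calculation rule on the percentile slides. The function name `five_number_summary` is our own.

```python
import statistics


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum).

    Quartiles use the library's "inclusive" interpolation method, which is
    one of several conventions in use.
    """
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return (min(data), q1, q2, q3, max(data))


data = [106, 109, 114, 116, 121, 122, 125, 129]
summary = five_number_summary(data)
print(summary)
```

These five numbers are exactly what a boxplot draws: the line runs from the first to the last, and the box spans the second to the fourth with a mark at the median.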
Slide 24 Boxplots Figure 2-16
Slide 25 Figure 2-17 Boxplots
Slide 26 Recap In this section we have looked at: Exploratory Data Analysis; Effects of outliers; 5-number summary and boxplots.
Slide 27 Probability
Slide 28 Copyright © 2004 Pearson Education, Inc. Definitions. Event: any collection of results or outcomes of a procedure. Simple event: an outcome or an event that cannot be further broken down into simpler components. Sample space: consists of all possible simple events. That is, the sample space consists of all outcomes that cannot be broken down any further.
Slide 29 Experiments & Outcomes. 1. Experiment: process of obtaining an observation, outcome, or simple event. 2. Sample point: most basic outcome of an experiment. 3. Sample space (S): collection of all possible outcomes.
Slide 30 Outcome Examples
Experiment | Sample Space
Toss a coin, note face | Head, Tail
Toss 2 coins, note faces | HH, HT, TH, TT
Select 1 card, note kind | each of the 52 cards (2 through A in each suit)
Select 1 card, note color | Red, Black
Play a football game | Win, Lose, Tie
Observe gender | Male, Female
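Slide 31 Tree Diagram. Experiment: toss 2 coins, note faces. [Tree diagram: the first toss branches to H or T, and each branch splits again into H or T for the second toss, giving the outcomes HH, HT, TH, TT.] Sample space: S = {HH, HT, TH, TT}.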
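The branches of the tree diagram correspond to a Cartesian product of the single-toss outcomes. A minimal sketch using `itertools.product`; the function name `coin_sample_space` is hypothetical.

```python
from itertools import product


def coin_sample_space(tosses):
    """List every sequence of H/T for the given number of coin tosses."""
    return ["".join(faces) for faces in product("HT", repeat=tosses)]


S = coin_sample_space(2)
print(S)       # each element is one path through the tree diagram
print(len(S))  # number of simple events in the sample space
```

Each extra toss doubles the number of paths, so three tosses would give 8 simple events.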
Slide 32 Notation for Probabilities P - denotes a probability. A, B, and C - denote specific events. P (A) - denotes the probability of event A occurring.
Slide 33 Basic Rules for Computing Probability. Rule 1: Relative Frequency Approximation of Probability. Conduct (or observe) a procedure a large number of times, and count the number of times event A actually occurs. Based on these actual results, P(A) is estimated as follows: P(A) = (number of times A occurred) / (number of times trial was repeated)
Slide 34 Basic Rules for Computing Probability. Rule 2: Classical Approach to Probability (Requires Equally Likely Outcomes). Assume that a given procedure has n different simple events and that each of those simple events has an equal chance of occurring. If event A can occur in s of these n ways, then P(A) = s / n = (number of ways A can occur) / (number of different simple events)
Slide 35 Basic Rules for Computing Probability Rule 3: Subjective Probabilities P(A), the probability of event A, is found by simply guessing or estimating its value based on knowledge of the relevant circumstances.
Slide 36 Law of Large Numbers As a procedure is repeated again and again, the relative frequency probability (from Rule 1) of an event tends to approach the actual probability.
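The law of large numbers can be explored with a small simulation. This is only a sketch: it assumes a fair coin (actual probability 0.5) and uses a seeded pseudo-random generator so runs are repeatable; `relative_frequency` is a hypothetical helper name.

```python
import random


def relative_frequency(trials, seed=0):
    """Estimate P(heads) by Rule 1: heads observed / number of trials."""
    rng = random.Random(seed)  # seeded so the simulation is repeatable
    heads = sum(rng.random() < 0.5 for _ in range(trials))
    return heads / trials


for trials in (10, 100, 1_000, 100_000):
    print(trials, relative_frequency(trials))
```

With few trials the estimate can sit well away from 0.5; with many trials it should settle close to it, although any single run can still deviate.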
Slide 37 Example: Roulette. You plan to bet on number 13 on the next spin of a roulette wheel. What is the probability that you will lose? Solution: A roulette wheel has 38 different slots, only one of which is the number 13. A roulette wheel is designed so that the 38 slots are equally likely. Among these 38 slots, there are 37 that result in a loss. Because the sample space includes equally likely outcomes, we use the classical approach (Rule 2) to get P(loss) = 37/38
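The roulette calculation is a direct application of Rule 2 and can be sketched with exact fractions. The function name `classical_probability` is our own.

```python
from fractions import Fraction


def classical_probability(ways, simple_events):
    """Rule 2: P(A) = s / n, valid only for equally likely simple events."""
    return Fraction(ways, simple_events)


# 37 of the 38 equally likely slots result in a loss.
p_loss = classical_probability(37, 38)
print(p_loss, round(float(p_loss), 3))
```

Keeping the value as a fraction avoids rounding until the final answer is reported.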
Slide 38 Probability Limits. The probability of an event that is certain to occur is 1. The probability of an impossible event is 0. 0 ≤ P(A) ≤ 1 for any event A.
Slide 39 What is Probability? 1. Numerical measure of likelihood that an event will occur, written P(Event), P(A), or Prob(A). 2. Lies between 0 (impossible) and 1 (certain). 3. Sum of outcome probabilities is 1.
Slide 40 Possible Values for Probabilities Figure 3-2
Slide 41 Definition The complement of event A, denoted by Ā (A with an overbar), consists of all outcomes in which the event A does not occur.
Slide 42 Example: Birth Genders. In reality, more boys are born than girls. In one typical group, there are 205 newborn babies, 105 of whom are boys. If one baby is randomly selected from the group, what is the probability that the baby is not a boy? Solution: Because 105 of the 205 babies are boys, it follows that 100 of them are girls, so P(not selecting a boy) = P(not boy) = P(girl) = 100/205 = 0.488
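The birth-gender answer can also be reached through the complement, since an event and its complement together cover every outcome. A minimal sketch with exact fractions; the variable names are our own.

```python
from fractions import Fraction

p_boy = Fraction(105, 205)  # 105 boys among 205 newborns
p_not_boy = 1 - p_boy       # probability of the complement of "boy"

print(p_not_boy, round(float(p_not_boy), 3))
```

Subtracting from 1 gives the same value as counting the 100 girls directly.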
Slide 43 Rounding Off Probabilities When expressing the value of a probability, either give the exact fraction or decimal or round off final decimal results to three significant digits. (Suggestion: When the probability is not a simple fraction such as 2/3 or 5/9, express it as a decimal so that the number can be better understood.)
Slide 44 Definitions. The actual odds against event A occurring are the ratio P(Ā)/P(A), usually expressed in the form of a:b (or “a to b”), where a and b are integers having no common factors. The actual odds in favor of event A occurring are the reciprocal of the actual odds against the event. If the odds against A are a:b, then the odds in favor of A are b:a. The payoff odds against event A represent the ratio of the net profit (if you win) to the amount bet. payoff odds against event A = (net profit) : (amount bet)
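Actual odds can be derived from a probability by reducing the ratio of the complement's probability to the event's probability. The sketch below uses hypothetical function names and takes the roulette bet on 13 (probability 1/38 of winning) as its example; payoff odds are not modeled because they depend on the casino's terms rather than on P(A).

```python
from fractions import Fraction


def odds_against(p):
    """Return (a, b) so that the actual odds against the event are a:b."""
    p = Fraction(p)
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    ratio = (1 - p) / p  # P(complement of A) / P(A), reduced automatically
    return ratio.numerator, ratio.denominator


def odds_in_favor(p):
    """Odds in favor are the reciprocal of the odds against: b:a."""
    a, b = odds_against(p)
    return b, a


p_win = Fraction(1, 38)  # betting on 13 in roulette
print("odds against: %d:%d" % odds_against(p_win))
print("odds in favor: %d:%d" % odds_in_favor(p_win))
```

`Fraction` keeps a and b as integers with no common factors, matching the a:b form in the definition.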
Slide 45 Recap In this section we have discussed: Rare event rule for inferential statistics. Probability rules. Law of large numbers. Complementary events. Rounding off probabilities. Odds.