Jerry Post Copyright © 2003 1 Database Management Systems: Data Mining Statistics Review.


1 Database Management Systems: Data Mining Statistics Review. Jerry Post, Copyright © 2003.

2 Probability
- Relative frequency approach: The number of times that an event occurs out of the total population of events.
  - You have 3 red balls and 7 white balls in a bag. The probability of drawing a white ball on the first try is 70%.
  - Your customers are distributed across five cities: 35% in City A, 25% in City B, 20% in City C, 15% in City D, 15% in City E.
- Subjective probability: A belief in the likelihood of an outcome. Often subjective because of lack of full information. Generally modified over time based on acquisition of new information. It is important (but difficult) to separate belief from preference, and also important that subjective probabilities remain consistent.
  - There is a 65% chance that the Federal Reserve Board will reduce interest rates at the next meeting.

3 Probability: Frequency
- Need a complete count of events.
  - Permutations: order counts.
  - Combinations: order does not count.
- Basic multiplication rule: if a single action can be performed in k ways and the action is performed n times, the total number of possible outcomes is k * k * … * k = k^n.
  - Flip a coin five times (n = 5). A single flip has two outcomes (k = 2), so there are 2^5 = 32 possible outcomes.
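The multiplication rule can be checked by brute force. The sketch below enumerates every ordered sequence of five coin flips and compares the count with k^n; the variable names are illustrative and not from the slides.

```python
from itertools import product

# Each flip has k = 2 outcomes; the action is repeated n = 5 times.
outcomes_per_flip = ["H", "T"]
n_flips = 5

# Enumerate every ordered sequence of five flips.
sequences = list(product(outcomes_per_flip, repeat=n_flips))

# Multiplication rule: k * k * ... * k = k ** n.
total_by_rule = len(outcomes_per_flip) ** n_flips

print(len(sequences), total_by_rule)  # 32 32
```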

4 Counting: Permutation
- How many ways can objects (or actions) be rearranged?
- You have four cards: A, K, Q, J. How many ways can they be arranged?
- Four items (n = 4), all four placed (r = 4), one card at a time: 4 * 3 * 2 * 1 = 24 arrangements.
[Tree diagram: any of the four cards (A, K, Q, J) can go first, leaving 3 choices for the second position, 2 for the third, and 1 for the last.]

5 Permutation: General
- Ways to rearrange n items taken r at a time:
  - P(n, r) = n(n-1)(n-2)…(n-r+1) = n! / (n-r)!
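As a rough check of the product n(n-1)(n-2)…(n-r+1), the sketch below counts arrangements of the four cards directly and compares the result with the product. The helper name count_permutations is hypothetical; math.perm is the standard-library equivalent (Python 3.8+).

```python
import math
from itertools import permutations

cards = ["A", "K", "Q", "J"]

def count_permutations(n, r):
    """Multiply n * (n-1) * ... * (n-r+1): the number of ordered selections."""
    result = 1
    for factor in range(n, n - r, -1):
        result *= factor
    return result

# Enumerate every ordering of the four cards.
all_orders = list(permutations(cards))

print(len(all_orders))                             # 24
print(count_permutations(4, 4))                    # 24
print(count_permutations(4, 2), math.perm(4, 2))   # 12 12
```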

6 Combinations
- Number of ways of selecting items when order does not count.
- There are fewer combinations than permutations.
  - Divide the number of permutations by the number of ways of arranging the r selected objects (r!):
  - C(n, r) = n! / (r! (n-r)!)
- Elect three people from a group of ten: n = 10, r = 3, so C(10, 3) = (10 * 9 * 8) / (3 * 2 * 1) = 120.
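The election example can be worked through in a few lines. This sketch divides the number of ordered selections by r! and cross-checks against a direct enumeration; numbering the ten people 1 through 10 is an assumption made for illustration.

```python
import math
from itertools import combinations

n, r = 10, 3

# Ordered selections of 3 people out of 10 (permutations).
ordered = math.perm(n, r)                  # 10 * 9 * 8 = 720

# Each group of 3 appears in 3! = 6 different orders, so divide by r!.
unordered = ordered // math.factorial(r)   # 120

# Cross-check by listing every 3-person group explicitly.
committees = list(combinations(range(1, n + 1), r))

print(ordered, unordered, len(committees))  # 720 120 120
```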

7 Probability Rules: Complement
- Complement (opposite): E’ is the event that E does not occur.
  - P(E) + P(E’) = 1
  - The probability that an event either happens or does not happen is one.

8 Probability Rules: Mutually Exclusive
- Mutually exclusive: only one event of a group can happen. The probability of both occurring is zero.
  - P(A ∩ B) = 0
- Then the probability of one or the other of the events occurring is the sum of the probabilities:
  - P(A ∪ B) = P(A) + P(B)
- Example: pool balls numbered 1 through 10.
  - Event A: draw a ball numbered <= 3
  - Event B: draw a ball numbered >= 6
  - P(A or B) = 3/10 + 5/10 = 8/10
  - Can also be found as the complement of drawing a 4 or 5: 1 - 2/10 = 8/10
- In general, for mutually exclusive events, P(E1 ∪ E2 ∪ … ∪ En) = Σ P(Ei)
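The pool-ball example can be verified with sets. The sketch below assumes each of the ten balls is equally likely to be drawn; the helper prob is a hypothetical name.

```python
from fractions import Fraction

balls = range(1, 11)                        # balls numbered 1 through 10
event_a = {b for b in balls if b <= 3}      # {1, 2, 3}
event_b = {b for b in balls if b >= 6}      # {6, 7, 8, 9, 10}

def prob(event):
    """Probability of an event when every ball is equally likely."""
    return Fraction(len(event), len(balls))

# Mutually exclusive: the events share no outcomes.
print(prob(event_a & event_b))              # 0

# Addition rule and the complement route give the same answer.
p_union = prob(event_a | event_b)
p_sum = prob(event_a) + prob(event_b)
p_complement = 1 - prob(set(balls) - (event_a | event_b))
print(p_union, p_sum, p_complement)         # 4/5 4/5 4/5
```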

9 Probability Rules: Independence
- Events are independent (pairwise) if they have no influence on each other.
- If events are independent, the probability of both events occurring is found by multiplying their individual probabilities:
  - P(A ∩ B) = P(A) P(B)
- Example: An urn has 3 red balls and 7 white ones. Draw a ball and then flip a coin. What is the probability that you draw a white ball and flip heads?
  - P(A ∩ B) = 0.7 * 0.5 = 0.35
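One way to see why the probabilities multiply is to list the combined sample space. This sketch pairs every ball with every coin face, assuming all 20 pairs are equally likely, and compares the counted probability with the product rule.

```python
from fractions import Fraction
from itertools import product

urn = ["red"] * 3 + ["white"] * 7
coin = ["heads", "tails"]

# Every (ball, coin face) pair: 10 * 2 = 20 equally likely outcomes.
sample_space = list(product(urn, coin))
favorable = [pair for pair in sample_space if pair == ("white", "heads")]

p_joint = Fraction(len(favorable), len(sample_space))
p_white = Fraction(7, 10)
p_heads = Fraction(1, 2)

print(p_joint, p_white * p_heads)   # 7/20 7/20
print(float(p_joint))               # 0.35
```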

10 Conditional Probability
- The probability that event A will occur given that event B has already happened: P(A | B)
- Example 1: An urn has 3 red balls and 7 white ones. On the first draw you pull out a white ball (event B). If you do not replace that ball in the urn, what is the probability of drawing a red ball next (event A)?
  - Answer: P(A | B) = 3/9. Note that these events are not independent.
- In general, the probability of two events both occurring:
  - P(A ∩ B) = P(A) P(B | A)
- Example 2: Draw 2 cards from a 52-card deck without replacement. What is the probability that both are kings?
  - P(King1) = 4/52 and P(King2 | King1) = 3/51
  - P(King2 ∩ King1) = 4/52 * 3/51 = 1/221, about 0.0045
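The two-kings example can be confirmed by enumerating every ordered pair of cards. In this sketch the deck encoding (ranks 1 to 13, with 13 standing for the king) is an assumption chosen for illustration.

```python
from fractions import Fraction
from itertools import permutations

# 52 cards as (rank, suit); rank 13 represents the king in this encoding.
deck = [(rank, suit) for rank in range(1, 14) for suit in "SHDC"]
KING = 13

# Every ordered draw of two different cards: 52 * 51 = 2652 pairs.
ordered_pairs = list(permutations(deck, 2))
both_kings = sum(
    1 for first, second in ordered_pairs
    if first[0] == KING and second[0] == KING
)

p_enumerated = Fraction(both_kings, len(ordered_pairs))
p_rule = Fraction(4, 52) * Fraction(3, 51)   # P(King1) * P(King2 | King1)

print(p_enumerated, p_rule)                  # 1/221 1/221
```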

11 Probability: Joint and Conditional Table

               Female   Male   Total
Married          .42     .18     .60
Not Married      .28     .12     .40
Total            .70     .30    1.00

- P(Female) = .70
- P(Married ∩ Female) = .42
- P(Married | Female) = P(M ∩ F)/P(F) = .42/.70 = .60
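The marginal and conditional probabilities can be derived from the four joint cells alone. The following is a minimal sketch; the dictionary layout and the function names are assumptions, not part of the slides.

```python
from fractions import Fraction

# Joint probabilities from the table, keyed by (marital status, gender).
joint = {
    ("Married", "Female"): Fraction(42, 100),
    ("Married", "Male"): Fraction(18, 100),
    ("Not Married", "Female"): Fraction(28, 100),
    ("Not Married", "Male"): Fraction(12, 100),
}

def p_gender(gender):
    """Marginal probability: sum the joint cells in one column."""
    return sum(p for (_, g), p in joint.items() if g == gender)

def p_status_given_gender(status, gender):
    """Conditional probability: joint cell divided by the column total."""
    return joint[(status, gender)] / p_gender(gender)

print(p_gender("Female"))                          # 7/10
print(p_status_given_gender("Married", "Female"))  # 3/5
```

In this particular table the conditional probability equals the overall P(Married) = .60, so marital status and gender happen to be independent.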

12 Joint Probability: Tree Diagram
Manufacturing: Group A has 4 machines with a 5% defect rate; Group B has 6 machines with a 10% defect rate. Choose a machine, then a product. What is the probability that the product is defective?

[Tree diagram]
P(A) = .4
  P(D | A) = .05     P(A ∩ D) = .02
  P(D’ | A) = .95    P(A ∩ D’) = .38
P(B) = .6
  P(D | B) = .10     P(B ∩ D) = .06
  P(D’ | B) = .90    P(B ∩ D’) = .54
Total of the four joint probabilities: 1.00

P(D) = P(A ∩ D) + P(B ∩ D) = .02 + .06 = .08

13 Joint Probabilities: Table

Conditional probabilities of each outcome, given the machine group:
Probability    Defective (D)   Non-defective (D’)
P(A) = 0.4         0.05             0.95
P(B) = 0.6         0.10             0.90

Joint probabilities:
Production     Defective (D)   Non-defective (D’)
A                  0.02             0.38
B                  0.06             0.54
Total              0.08             0.92

P(A ∩ D) = P(A) * P(D | A) = 0.4(0.05) = 0.02
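The joint table follows mechanically from the priors and the conditional defect rates. The sketch below rebuilds it with the rule P(group ∩ outcome) = P(group) * P(outcome | group); the dictionary names are illustrative.

```python
# Prior probability of picking a machine from each group (4 of 10, 6 of 10).
priors = {"A": 0.4, "B": 0.6}
# Conditional defect rate for each group.
p_defect_given = {"A": 0.05, "B": 0.10}

# Joint probability = prior * conditional, for both outcomes.
joint = {}
for group, prior in priors.items():
    joint[(group, "D")] = prior * p_defect_given[group]
    joint[(group, "D'")] = prior * (1 - p_defect_given[group])

# Column total: overall probability of a defective part.
p_defective = sum(p for (_, outcome), p in joint.items() if outcome == "D")

for key, p in joint.items():
    print(key, round(p, 2))
print("P(D) =", round(p_defective, 2))
```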

14 Bayes’ Theorem
Now, in a sense, work backwards. We sample a part at random and it is defective. What is the probability that it came from a machine in Group A? In Group B?
- P(A | D) = P(A ∩ D)/P(D) = 0.02/0.08 = 1/4
- P(B | D) = P(B ∩ D)/P(D) = 0.06/0.08 = 3/4
In this example, the machine group is the state of nature we wish to identify, and whether the part is defective is the information.

15 Bayes’ Theorem in General
We know:
(1) There are n states of nature S1, S2, …, Sn
(2) An initial (a priori) probability P(Si) for each state
(3) Some type of information I
(4) The conditional probabilities P(I | Si)
We can compute the posterior probabilities, given the new information:
P(Si | I) = P(Si) P(I | Si) / [P(S1) P(I | S1) + P(S2) P(I | S2) + … + P(Sn) P(I | Sn)]
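The posterior computation can be sketched as a small function: multiply each prior by its conditional probability, then divide by the sum of those products. The function name posterior and the dict-based interface are assumptions made for illustration; the inputs are the machine-group numbers from the earlier defect example.

```python
def posterior(priors, likelihoods):
    """Return P(state | information) for every state.

    priors:      {state: P(state)}
    likelihoods: {state: P(information | state)}
    """
    # Joint probability of each state together with the information.
    joint = {s: priors[s] * likelihoods[s] for s in priors}
    # Total probability of observing the information at all.
    evidence = sum(joint.values())
    return {s: joint[s] / evidence for s in joint}

# Machine groups: priors .4 and .6, defect rates .05 and .10.
machine_posterior = posterior(
    priors={"A": 0.4, "B": 0.6},
    likelihoods={"A": 0.05, "B": 0.10},
)
print({s: round(p, 2) for s, p in machine_posterior.items()})
```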

16 Bayes’ Theorem Example
- Chao: Statistics for Management, 2e
- States of the economy: S1: recession, S2: stable, S3: prosperity
- P(S1) = .25, P(S2) = .50, P(S3) = .25 (in general/a priori)
- We have forecasts as information. The forecasts are either optimistic (I) or pessimistic (I’).
- The results of the forecasts in the past are as follows:

Prior Probability   State of Economy   Optimistic (I)   Pessimistic (I’)
P(S1) = .25         S1                 0.1              0.9
P(S2) = .50         S2                 0.5              0.5
P(S3) = .25         S3                 0.8              0.2

17 Example: Joint Probability

Prior Probability   State of Economy   Optimistic (I): P(I | Si)   Pessimistic (I’): P(I’ | Si)
P(S1) = .25         S1                 0.1                         0.9
P(S2) = .50         S2                 0.5                         0.5
P(S3) = .25         S3                 0.8                         0.2

State   Optimistic (I)        Pessimistic (I’)
S1      P(S1 ∩ I) = 0.025     P(S1 ∩ I’) = 0.225
S2      P(S2 ∩ I) = 0.250     P(S2 ∩ I’) = 0.250
S3      P(S3 ∩ I) = 0.200     P(S3 ∩ I’) = 0.050
Total   P(I) = 0.475          P(I’) = 0.525

18 Bayes’ Example

State   Optimistic (I)        Pessimistic (I’)
S1      P(S1 ∩ I) = 0.025     P(S1 ∩ I’) = 0.225
S2      P(S2 ∩ I) = 0.250     P(S2 ∩ I’) = 0.250
S3      P(S3 ∩ I) = 0.200     P(S3 ∩ I’) = 0.050
Total   P(I) = 0.475          P(I’) = 0.525

Probability that next year is prosperous (S3) if the forecast is optimistic (I):
P(S3 | I) = P(S3 ∩ I)/P(I) = 0.200/0.475 = .421

19 Bayes: Prior and Posterior Probabilities
Probability estimates at the start (a priori) are naïve:
- P(S1) = 0.25
- P(S2) = 0.50
- P(S3) = 0.25
Probabilities after the forecast (posterior) reflect the new information:

State   Given optimistic (I)   Given pessimistic (I’)
S1      P(S1 | I) = 0.053      P(S1 | I’) = 0.429
S2      P(S2 | I) = 0.526      P(S2 | I’) = 0.476
S3      P(S3 | I) = 0.421      P(S3 | I’) = 0.095
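The posterior probabilities for the economy example can be recomputed from the priors and the forecast track record. This is a sketch under the assumption that a pessimistic forecast is simply the complement of an optimistic one, as in the example; the variable names are illustrative.

```python
# Prior probabilities for recession (S1), stable (S2), prosperity (S3).
priors = {"S1": 0.25, "S2": 0.50, "S3": 0.25}
# Past record: P(optimistic forecast | state).
p_optimistic = {"S1": 0.1, "S2": 0.5, "S3": 0.8}

posteriors = {}
for forecast in ("I", "I'"):
    # P(forecast | state); pessimistic (I') is the complement of optimistic (I).
    likelihood = {
        s: (p if forecast == "I" else 1 - p) for s, p in p_optimistic.items()
    }
    # Joint probabilities and their total, P(forecast).
    joint = {s: priors[s] * likelihood[s] for s in priors}
    total = sum(joint.values())
    # Posterior: P(state | forecast), rounded to three decimals.
    posteriors[forecast] = {s: round(joint[s] / total, 3) for s in priors}

for forecast, table in posteriors.items():
    print(forecast, table)
```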

20 Mean and Standard Deviation
[Figure: distribution with mean = 0, marked at 1, 2, and 3 standard deviations.]

21 Cumulative Normal
[Figure: cumulative normal distribution.]
- P(X <= 0) = 0.5000
- P(X <= 1) = 0.8413
- P(X <= 2) = 0.9772
- P(X <= 3) = 0.9987
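Cumulative normal probabilities like these can be computed with the Python standard library (statistics.NormalDist, Python 3.8+). The sketch assumes X is standard normal, with mean 0 and standard deviation 1, which the slide implies but does not state.

```python
from statistics import NormalDist

# Assumption: X follows a standard normal distribution.
standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Cumulative probability P(X <= z) at 0, 1, 2, and 3 standard deviations.
cumulative = {z: standard_normal.cdf(z) for z in (0, 1, 2, 3)}

for z, p in cumulative.items():
    print(f"P(X <= {z}) = {p:.4f}")
```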

22 Hypothesis Testing
[Figure: hypothesis testing diagram with the critical value marked.]

