Probability and Statistics

1 Probability and Statistics
Joyeeta Dutta-Moscato May 24, 2016

2 There are three kinds of lies: lies, damned lies and statistics
- Mark Twain, who attributed it to Benjamin Disraeli

3 Terms and concepts
Descriptive Statistics: sample vs population; central tendency (mean, median, mode); variance, standard deviation; normal distribution; cumulative distribution
Statistical Hypothesis Testing: hypothesis; null hypothesis (H0); alternate hypothesis (HA); significance; p-value; confidence interval
Statistical Models: method of least squares; Euclidean distance; overfitting & generalization

4 Central tendency and Spread
Mean, median, mode; variance, standard deviation; normal distribution
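As a minimal sketch, these summaries can be computed with Python's standard `statistics` module. The sample below is made up for illustration; the code shows both the population versions of variance and standard deviation (divide by n) and the sample versions (divide by n - 1), which is where the sample vs population distinction matters in practice.

```python
import statistics

# Hypothetical sample, chosen so the results are easy to check by hand.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)        # sum of values / number of values
median = statistics.median(data)    # middle value (average of the two middle values here)
mode = statistics.mode(data)        # most frequent value

# Population formulas divide by n; sample formulas divide by n - 1.
pop_var = statistics.pvariance(data)
pop_sd = statistics.pstdev(data)
sample_var = statistics.variance(data)
sample_sd = statistics.stdev(data)

print("mean:", mean, "median:", median, "mode:", mode)
print("population variance / sd:", pop_var, pop_sd)
print("sample variance / sd:", round(sample_var, 3), round(sample_sd, 3))
```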

5 Central tendency and Spread

6 But do numbers tell the full story?

7 Anscombe’s Quartet
Good graphics reveal data.
[Figure: Anscombe’s quartet]
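To see why the numbers alone can mislead, the sketch below computes the usual summary statistics for the four data sets of Anscombe's quartet (Anscombe, 1973), using the values as they are commonly reproduced. All four give essentially the same means, variances, correlation and least-squares line, even though their scatter plots look completely different. The `summarize` helper is written for this illustration only.

```python
# Anscombe's quartet: four small data sets with nearly identical summary statistics.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def summarize(xs, ys):
    """Means, sample variances, Pearson r and least-squares line y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {
        "mean_x": mx, "mean_y": my,
        "var_x": sxx / (n - 1), "var_y": syy / (n - 1),
        "r": sxy / (sxx * syy) ** 0.5,
        "slope": slope, "intercept": my - slope * mx,
    }

stats = {name: summarize(xs, ys) for name, (xs, ys) in quartet.items()}
for name, s in stats.items():
    print(name, {key: round(value, 2) for key, value in s.items()})
```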

8 Building a model from data
Fitting the data to a model: y = f(x)
Objective: minimize the mean squared error (MSE)
Does MSE = 0 mean this is the best model?
What would that mean about the relationship between x and y?
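One way to explore these questions is to compare two models on the same data. The sketch below, on made-up points that roughly follow y = 2x, fits a straight line by ordinary least squares and also builds a Lagrange interpolating polynomial through every point. The polynomial is a stand-in chosen here for an overly flexible model: it reaches a training MSE of exactly 0 by memorizing the points, which says nothing about how well it generalizes. The helper names (`fit_line`, `interpolate`, `mse`) are hypothetical.

```python
# Hypothetical noisy data, roughly following y = 2x (made up for illustration).
xs = [0, 1, 2, 3, 4, 5]
ys = [0.1, 1.9, 4.2, 5.8, 8.1, 9.9]

def mse(model, xs, ys):
    """Mean squared error of model(x) against the observed ys."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x (closed form)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return (lambda x: a + b * x), a, b

def interpolate(xs, ys):
    """Lagrange polynomial of degree n-1 that passes through every data point."""
    def p(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return p

line, a, b = fit_line(xs, ys)
poly = interpolate(xs, ys)

print(f"line: y = {a:.3f} + {b:.3f}x, training MSE = {mse(line, xs, ys):.4f}")
print(f"degree-5 polynomial: training MSE = {mse(poly, xs, ys):.4f}")
# Compare the two models outside the training range.
print("prediction at x = 7:", round(line(7), 2), "vs", round(poly(7), 2))
```

A lower training error is therefore not the same as a better model; checking predictions on data not used for fitting is what reveals overfitting.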

9 Correlation
When we say that two genes are correlated, we mean that they vary together. But how do we quantify the degree of correlation?
Pearson’s r measures the extent to which two random variables are linearly related:
Perfect linear correlation: r = 1
No linear correlation: r = 0
Perfect anti-correlation: r = -1
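A minimal sketch of Pearson's r, written out from its definition (covariance divided by the product of the standard deviations) rather than taken from a library. The last example is a made-up quadratic relationship: y depends entirely on x, yet r = 0, because r only measures linear association.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of the standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2 * v + 1 for v in x]))   # perfectly linear, increasing
print(pearson_r(x, [-3 * v for v in x]))      # perfectly linear, decreasing
print(pearson_r(x, [4, 1, 0, 1, 4]))          # y = (x - 3)^2: strong dependence, no linear trend
```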

10 Positive Correlations

11 Negative Correlations

12 What do correlations tell us?
Interesting site:
So how do we make statements about causality? We can ask: how likely is event X given event Y?

13 Probability: How likely is it?
How likely is a certain observation?
Coin: possible outcomes Head, Tail. P(Head) = ? P(Tail) = ?
Die: possible outcomes 1, 2, 3, 4, 5, 6. P(1) = ? P(2) = ? … P(6) = ?

14 Probability of Multiple Events
Toss a coin twice. How likely are you to observe 2 heads?
Key condition: INDEPENDENCE
P(2 Heads) = P(Head) × P(Head)
What is the DISTRIBUTION of outcomes?

15 Probability of Multiple Events
P(2 Heads) = ¼
P(2 Tails) = ¼
P(1 Head) = P(Head then Tail) + P(Tail then Head) = ¼ + ¼ = ½
Key condition: the probabilities must sum to 1
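The two-toss distribution can be checked by brute-force enumeration. This sketch assumes a fair coin and independent tosses, lists the four equally likely ordered outcomes, and uses exact fractions so the probabilities visibly sum to 1.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All ordered outcomes of two tosses: HH, HT, TH, TT.
outcomes = list(product("HT", repeat=2))
p_each = Fraction(1, 2) * Fraction(1, 2)   # independence: P(a, b) = P(a) * P(b)

# Distribution of the number of heads.
dist = Counter()
for outcome in outcomes:
    dist[outcome.count("H")] += p_each

for heads in sorted(dist):
    print(heads, "heads:", dist[heads])
print("total:", sum(dist.values()))
```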

16 Probability of Multiple Events
[Figure: histogram of outcomes of 10 tosses]

17 Probability of Multiple Events
As the number of independent tosses grows, the distribution of the number of heads approaches a NORMAL (GAUSSIAN) distribution.
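This convergence can be checked numerically. The sketch compares the exact binomial probability of k heads in n = 100 fair tosses with the normal density that has the same mean (np) and standard deviation (sqrt(np(1 - p))), which is the de Moivre–Laplace approximation, a special case of the central limit theorem. The choice of n and of the k values printed is arbitrary.

```python
import math

def binomial_pmf(k, n, p=0.5):
    """P(exactly k heads in n independent tosses with P(Head) = p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def normal_pdf(x, mu, sigma):
    """Density of the normal (Gaussian) distribution with mean mu and std dev sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

n, p = 100, 0.5
mu, sigma = n * p, math.sqrt(n * p * (1 - p))   # mean and std dev of the binomial

for k in (40, 45, 50, 55, 60):
    print(k, round(binomial_pmf(k, n, p), 4), round(normal_pdf(k, mu, sigma), 4))
```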

18 Cumulative Distribution
The probability distribution gives the probability of each value x.
The cumulative distribution gives the probability of a value less than or equal to x: F(x) = P(X ≤ x).
Wikipedia:
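For a discrete variable, the cumulative distribution is just a running total of the probability distribution. A minimal sketch for a fair six-sided die (an example chosen here, not from the slides), using exact fractions:

```python
from fractions import Fraction
from itertools import accumulate

# Fair six-sided die: each face has probability 1/6.
faces = [1, 2, 3, 4, 5, 6]
pmf = [Fraction(1, 6)] * len(faces)

# CDF at x = running total of the probabilities up to and including x, i.e. P(X <= x).
cdf = list(accumulate(pmf))

for x, p, c in zip(faces, pmf, cdf):
    print(f"x={x}  P(X = x) = {p}  P(X <= x) = {c}")
```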

19 Statistical Hypothesis Testing
You are running experiments to test the effect of a drug on subjects. How likely is it that the observed effect would occur even if no real relation exists?
If this probability is sufficiently small (e.g. < 1%), we conclude that a real relation probably exists. Otherwise, the observed effect may simply be due to chance.
H0 (null hypothesis): no relation exists
HA (alternate hypothesis): there is some sort of relation
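One direct way to ask "how likely is this effect if no relation exists?" is a permutation test, shown here as a sketch with made-up treatment and control measurements. Under H0 the group labels are arbitrary, so the code enumerates every way of splitting the pooled values into two groups of the same sizes and counts how often the difference in means is at least as large as the one observed (a one-sided test).

```python
from itertools import combinations

# Hypothetical measurements (made-up numbers for illustration only).
treatment = [7.1, 8.3, 6.9, 7.8, 8.0]
control = [5.2, 6.1, 5.8, 4.9, 6.0]

def mean(values):
    return sum(values) / len(values)

observed = mean(treatment) - mean(control)

# Under H0, every relabelling of the pooled values is equally likely.
pooled = treatment + control
n_t = len(treatment)
count = total = 0
for idx in combinations(range(len(pooled)), n_t):
    chosen = set(idx)
    group_t = [pooled[i] for i in idx]
    group_c = [pooled[i] for i in range(len(pooled)) if i not in chosen]
    diff = mean(group_t) - mean(group_c)
    total += 1
    if diff >= observed - 1e-12:   # at least as extreme as observed (small float tolerance)
        count += 1

p_value = count / total
print(f"observed difference = {observed:.2f}, p = {count}/{total} = {p_value:.4f}")
```

Because every treatment value in this made-up data exceeds every control value, only the observed split is that extreme, so p = 1/252 ≈ 0.004.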

20 Statistical Hypothesis Testing
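The SIGNIFICANCE LEVEL (e.g. 0.1, 0.05, 0.01) is chosen a priori as the threshold for deciding whether to reject H0.
The P-VALUE is the probability, assuming H0 is true, of observing a result at least as extreme as the one actually observed.
If P-VALUE < significance level, H0 is rejected, i.e. the result is considered STATISTICALLY SIGNIFICANT. Otherwise H0 is not rejected (which is not the same as proving that H0 is true).
Wikipedia: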
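The decision rule can be illustrated with an exact one-sided binomial test. The scenario (16 heads in 20 tosses, testing H0 "the coin is fair" against HA "the coin favours heads") and the significance level 0.01 are hypothetical stand-ins for the drug experiment; `binomial_p_value` is a helper defined here, not a library function.

```python
from math import comb

def binomial_p_value(heads, n, p=0.5):
    """One-sided p-value: P(X >= heads) under H0, where X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(heads, n + 1))

alpha = 0.01          # significance level, fixed before looking at the data
heads, n = 16, 20     # hypothetical result: 16 heads in 20 tosses
p_value = binomial_p_value(heads, n)

print(f"p = {p_value:.4f}")
print("reject H0" if p_value < alpha else "do not reject H0")
```

Here p = 6196/2^20 ≈ 0.0059, which is below 0.01, so H0 would be rejected at this significance level.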

21 Error reporting
How reliable is the measurement (or the estimate)?
E.g. a 95% CONFIDENCE INTERVAL: we are 95% confident that the true value lies within this interval, in the sense that intervals constructed this way contain the true value in 95% of repeated samples.
STANDARD ERROR can be used to approximate confidence intervals.
Standard error = standard deviation of the sampling distribution of the estimate
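For the common case of a sample mean, the standard error is approximately s / sqrt(n), and an approximate 95% confidence interval is the mean ± 1.96 standard errors. The sketch below applies this to made-up measurements; note that for a sample this small a t-distribution multiplier (about 2.26 for n = 10) would be more appropriate than 1.96.

```python
import math
import statistics

# Hypothetical repeated measurements of the same quantity (made-up values).
sample = [4.8, 5.1, 5.0, 4.7, 5.3, 5.2, 4.9, 5.0, 5.1, 4.9]

n = len(sample)
mean = statistics.mean(sample)
sd = statistics.stdev(sample)      # sample standard deviation (n - 1 denominator)
se = sd / math.sqrt(n)             # standard error of the mean

# Large-sample normal approximation: mean +/- 1.96 standard errors.
low, high = mean - 1.96 * se, mean + 1.96 * se
print(f"mean = {mean:.3f}, SE = {se:.3f}, approx. 95% CI = ({low:.3f}, {high:.3f})")
```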

22 Back to Probability
0 ≤ P(A) ≤ 1
P(A) = 1 – P(A^C)   [A^C = complement of A]
If events A and B are independent (event B has no effect on the probability of event A), then: P(A, B) = P(A) · P(B)
In general, whether or not they are independent: P(A, B) = P(A|B) · P(B)
P(A, B) = JOINT PROBABILITY of A and B
P(A|B) = CONDITIONAL PROBABILITY of A given B
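These rules can be verified exactly by enumerating the 36 equally likely outcomes of rolling two fair dice (an example chosen here for illustration). "First die is 6" and "second die is 6" are independent, so their joint probability factorizes; "sum is at least 10" depends on the first die, so only the general rule P(A, B) = P(A|B) · P(B) holds. The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice.
SPACE = list(product(range(1, 7), repeat=2))

def prob(event):
    """P(event): favourable outcomes divided by all outcomes."""
    return Fraction(sum(1 for o in SPACE if event(o)), len(SPACE))

def cond(event, given):
    """P(event | given) = P(event, given) / P(given)."""
    return prob(lambda o: event(o) and given(o)) / prob(given)

def first_six(o):
    return o[0] == 6

def second_six(o):
    return o[1] == 6

def sum_at_least_10(o):
    return o[0] + o[1] >= 10

# Independent events: P(A, B) = P(A) * P(B).
both_six = prob(lambda o: first_six(o) and second_six(o))
print(both_six, prob(first_six) * prob(second_six))

# Dependent events: P(A, B) = P(A|B) * P(B), but not P(A) * P(B).
joint = prob(lambda o: sum_at_least_10(o) and first_six(o))
print(joint, cond(sum_at_least_10, first_six) * prob(first_six))
print(prob(sum_at_least_10) * prob(first_six))   # differs from the joint probability

# Complement rule: P(A) = 1 - P(A^C).
print(prob(first_six), 1 - prob(lambda o: not first_six(o)))
```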

23 Example We are given 2 urns, each containing a collection of colored balls. Urn 1 contains 2 white and 3 blue balls; Urn 2 contains 3 white and 4 blue balls. A ball is drawn at random from urn 1 and put into urn 2, and then a ball is picked at random from urn 2 and examined. What is the probability that the ball is blue?

24 Example
Scenario 1: the ball picked from urn 1 is blue (probability 3/5). Urn 2 then holds 3 white and 5 blue balls, so the probability of then drawing blue is 5/8.
Scenario 2: the ball picked from urn 1 is white (probability 2/5). Urn 2 then holds 4 white and 4 blue balls, so the probability of then drawing blue is 4/8.
$$P(\text{blue}) = \frac{3}{5}\cdot\frac{5}{8} + \frac{2}{5}\cdot\frac{4}{8} = \frac{15}{40} + \frac{8}{40} = \frac{23}{40} = 0.575$$
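The same answer falls out of a short exact computation that conditions on the colour of the transferred ball (the law of total probability). This is a sketch; the dictionary layout and the `p_draw` helper are choices made for this illustration.

```python
from fractions import Fraction

# Urn contents from the example.
urn1 = {"white": 2, "blue": 3}
urn2 = {"white": 3, "blue": 4}

def p_draw(urn, color):
    """Probability of drawing the given colour uniformly at random from the urn."""
    return Fraction(urn[color], sum(urn.values()))

# Condition on the colour moved from urn 1 to urn 2.
p_blue = Fraction(0)
for moved in ("blue", "white"):
    urn2_after = dict(urn2)
    urn2_after[moved] += 1                 # urn 2 now holds 8 balls
    p_blue += p_draw(urn1, moved) * p_draw(urn2_after, "blue")

print(p_blue, float(p_blue))
```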

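25 Bayes Theorem
$$P(A\mid B) = \frac{P(B\mid A)\,P(A)}{P(B)}$$
How?
$$P(A, B) = P(A\mid B)\,P(B) \quad\Rightarrow\quad P(A\mid B) = \frac{P(A, B)}{P(B)}$$
$$P(A, B) = P(B, A) = P(B\mid A)\,P(A)$$
$$\text{so}\quad P(A\mid B) = \frac{P(B\mid A)\,P(A)}{P(B)}$$
Also, expanding $P(B) = P(B\mid A)\,P(A) + P(B\mid A^C)\,P(A^C)$, this is equivalent to:
$$P(A\mid B) = \frac{P(B\mid A)\,P(A)}{P(B\mid A)\,P(A) + P(B\mid A^C)\,P(A^C)}$$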
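A minimal sketch of the expanded form of Bayes' theorem, applied (as an illustration chosen here) to the urn example run in reverse: given that the ball drawn from urn 2 is blue, how likely is it that the transferred ball was blue? The `bayes` helper is hypothetical.

```python
from fractions import Fraction

def bayes(p_b_given_a, p_a, p_b_given_not_a):
    """P(A|B), with P(B) expanded by the law of total probability."""
    p_not_a = 1 - p_a
    p_b = p_b_given_a * p_a + p_b_given_not_a * p_not_a
    return p_b_given_a * p_a / p_b

# A = "the ball moved from urn 1 was blue", B = "the ball drawn from urn 2 is blue".
p_a = Fraction(3, 5)               # 3 of the 5 balls in urn 1 are blue
p_b_given_a = Fraction(5, 8)       # urn 2 then holds 3 white, 5 blue
p_b_given_not_a = Fraction(4, 8)   # urn 2 then holds 4 white, 4 blue

print(bayes(p_b_given_a, p_a, p_b_given_not_a))
```

With these numbers the posterior is 15/23 ≈ 0.65, up from the prior 3/5 = 0.6: seeing a blue ball from urn 2 makes it somewhat more likely that a blue ball was transferred.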

26 Contingency Table Courtesy: Rich Tsui, PhD

27 Contingency Table
You have developed a test to detect a certain disease.
What are the True Positive Rate (TPR) and True Negative Rate (TNR) of this test?
Sensitivity = TPR = TP / (TP + FN) = P(Test+ | Disease+)
Specificity = TNR = TN / (TN + FP) = P(Test- | Disease-)
What are the Positive Predictive Value (PPV) and Negative Predictive Value (NPV)?
PPV = TP / (TP + FP) = P(Disease+ | Test+)
NPV = TN / (TN + FN) = P(Disease- | Test-)
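The four formulas translate directly into code. The counts below are hypothetical, and `diagnostic_metrics` is a helper written for this sketch.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and NPV from the four cells of a 2x2 contingency table."""
    return {
        "sensitivity (TPR)": tp / (tp + fn),   # P(Test+ | Disease+)
        "specificity (TNR)": tn / (tn + fp),   # P(Test- | Disease-)
        "PPV": tp / (tp + fp),                 # P(Disease+ | Test+)
        "NPV": tn / (tn + fn),                 # P(Disease- | Test-)
    }

# Hypothetical counts for illustration only.
metrics = diagnostic_metrics(tp=45, fp=30, fn=5, tn=920)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that sensitivity and specificity are conditioned on the true disease status, whereas PPV and NPV are conditioned on the test result.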

28 Sensitivity (TPR): the probability that a sick person is correctly identified as having the condition.
Specificity (TNR): the probability that a healthy person is correctly identified as not having the condition.
Positive predictive value (PPV): given that you test positive, the probability that you actually have the condition.
Negative predictive value (NPV): given that you test negative, the probability that you actually do not have the condition.

29 The Prevalence of a particular disease is 1/10.
A test for this disease provides a correct diagnosis in 90% of cases (i.e. if you have the disease, 90% of the time you will test positive, and if you do not have the disease, 90% of the time you will test negative). Given that you test positive for the disease, what is the probability that you actually have the disease?

30 The Prevalence of a particular disease is 1/10.
Prevalence = prior probability of the disease in the population
T+: test positive; T-: test negative; D+: disease present; D-: disease absent
Solution:
P(D+) = 0.1, therefore P(D-) = 1 – 0.1 = 0.9
P(T+|D+) = 0.9
P(T-|D-) = 0.9, therefore P(T+|D-) = 1 – 0.9 = 0.1
$$P(D^{+}\mid T^{+}) = \frac{P(T^{+}\mid D^{+})\,P(D^{+})}{P(T^{+}\mid D^{+})\,P(D^{+}) + P(T^{+}\mid D^{-})\,P(D^{-})} = \frac{(0.9)(0.1)}{(0.9)(0.1) + (0.1)(0.9)} = \frac{0.09}{0.18} = 0.5$$
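The same result can be checked with natural frequencies: imagine a hypothetical cohort of 1,000 people and count who tests positive. Exact fractions are used here so the arithmetic is not affected by floating-point rounding.

```python
from fractions import Fraction

# Hypothetical cohort size; the rates are the ones given in the problem.
population = 1000
prevalence = Fraction(1, 10)     # P(D+)
sensitivity = Fraction(9, 10)    # P(T+ | D+)
specificity = Fraction(9, 10)    # P(T- | D-)

diseased = population * prevalence          # people with the disease
healthy = population - diseased             # people without it
true_pos = diseased * sensitivity           # sick people who test positive
false_pos = healthy * (1 - specificity)     # healthy people who test positive

p_disease_given_pos = true_pos / (true_pos + false_pos)
print(f"{true_pos} true positives, {false_pos} false positives, P(D+|T+) = {p_disease_given_pos}")
```

Half of all positive results come from the much larger healthy group, which is why a positive result only raises the probability of disease to 0.5 despite 90% sensitivity and specificity.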

