Probability and Statistics

Probability and Statistics
Joyeeta Dutta-Moscato
May 24, 2016

"There are three kinds of lies: lies, damned lies, and statistics." - Mark Twain, who attributed the remark to Disraeli

Terms and concepts
Descriptive Statistics: sample vs population; central tendency (mean, median, mode); variance, standard deviation; normal distribution; cumulative distribution
Statistical Hypothesis Testing: hypothesis; null hypothesis (H0); alternate hypothesis (HA); significance; P-value; confidence interval
Statistical Models: method of least squares; Euclidean distance; overfitting & generalization

Central tendency and Spread
Mean
Median
Mode
Variance, standard deviation
Normal distribution
http://ceaccp.oxfordjournals.org/content/7/4/127.full
http://www.mathsisfun.com/data/standard-normal-distribution.html
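As a minimal sketch of the measures listed on this slide, the following uses Python's standard `statistics` module. The sample values are hypothetical and chosen only for illustration; note that `variance` and `stdev` here are the sample versions (dividing by n - 1).

```python
import statistics

# Hypothetical sample values, invented purely for illustration.
sample = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(sample)          # arithmetic average
median = statistics.median(sample)      # middle value of the sorted data
mode = statistics.mode(sample)          # most frequent value
variance = statistics.variance(sample)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(sample)      # square root of the sample variance

print("mean:", mean)
print("median:", median)
print("mode:", mode)
print("variance:", variance)
print("standard deviation:", std_dev)
```

For a whole population rather than a sample, `statistics.pvariance` and `statistics.pstdev` divide by n instead.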

But do numbers tell the full story? https://en.wikipedia.org/wiki/Anscombe's_quartet

Anscombe’s Quartet Good graphics reveal data Anscombe’s quartet

Building a model from data
Fitting the data to a model: y = f(x)
Objective: Minimize mean square error
Does mean square error = 0 mean this is the best model?
What does this mean about the relationship between x and y?
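One way to make the objective concrete is the closed-form least-squares fit of a straight line. This is only a sketch under the assumption that f(x) is linear; the function names `fit_line` and `mean_square_error` and the data points are hypothetical, not taken from the slides.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the line that minimizes mean square error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sum of squared x deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_square_error(xs, ys, slope, intercept):
    """Average squared difference between observed y and the line's prediction."""
    residuals = [(y - (slope * x + intercept)) for x, y in zip(xs, ys)]
    return sum(r ** 2 for r in residuals) / len(xs)


# Hypothetical noisy measurements that roughly follow y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
mse = mean_square_error(xs, ys, slope, intercept)
print(f"y = {slope:.2f} x + {intercept:.2f}, MSE = {mse:.4f}")
```

A mean square error of exactly 0 on the fitted data does not by itself make a model the best one: a model flexible enough to pass through every point may be fitting noise and generalize poorly to new data.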

Correlation
When we say that two genes are correlated, we mean that they vary together. But how do we quantify the degree of correlation?
Pearson’s r measures the extent to which two random variables are linearly related.
Perfect positive linear correlation: r = 1
No linear correlation: r = 0
Perfect negative linear correlation (anti-correlation): r = -1
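The sketch below computes Pearson's r from its definition (covariance divided by the product of the standard deviations). The function name `pearson_r` and the small data sets are hypothetical and serve only to illustrate the three reference values.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical data sets.
r_positive = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])         # y rises with x
r_negative = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])         # y falls as x rises
r_zero = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])     # y = x squared

print(r_positive, r_negative, r_zero)
```

The last case is worth noting: y is completely determined by x (y = x squared), yet r is 0, because Pearson's r only captures linear relationships.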

Positive Correlations

Negative Correlations

What do correlations tell us?
Interesting site: http://www.tylervigen.com/
So how do we make statements about causality?
We can ask the question: How likely is event X given an event Y?

Probability: How likely is it?
How likely is a certain observation?
Coin toss. Possible outcomes: Head, Tail. P(Head) = ? P(Tail) = ?
Die roll. Possible outcomes: 1, 2, 3, 4, 5, 6. P(1) = ? P(2) = ? ... P(6) = ?

Probability of Multiple Events
Toss a coin twice. How likely are you to observe 2 Heads?
Key condition: INDEPENDENCE
P(2 Heads) = P(Head) x P(Head)
What is the DISTRIBUTION of outcomes?
P(2 Heads) = ¼
P(2 Tails) = ¼
P(1 Head) = P(1 Head, 1 Tail) + P(1 Tail, 1 Head) = ¼ + ¼ = ½
Key condition: Must sum to 1
[Figure: histogram of outcomes of 10 tosses]
As the number of independent (random) events grows, the distribution approaches a NORMAL or GAUSSIAN distribution
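The distribution of outcomes can be checked by brute force. The sketch below assumes a fair coin and independent tosses, enumerates every possible sequence, and counts the heads in each; the function name `heads_distribution` is hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def heads_distribution(n_tosses):
    """Exact distribution of the number of heads in n fair, independent tosses."""
    # Every sequence of H/T is equally likely, so count sequences per head count.
    counts = Counter(seq.count("H") for seq in product("HT", repeat=n_tosses))
    total = 2 ** n_tosses
    return {k: Fraction(counts[k], total) for k in range(n_tosses + 1)}


two_tosses = heads_distribution(2)
print(two_tosses)  # probabilities of 0, 1 and 2 heads

# Text histogram for 10 tosses: the bell shape is already visible.
ten_tosses = heads_distribution(10)
for heads, p in ten_tosses.items():
    print(f"{heads:2d} {'#' * round(float(p) * 100)}")
```

Using `Fraction` keeps the probabilities exact, which makes it easy to confirm that they sum to 1.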

Cumulative Distribution
The probability distribution shows the probability of the value X.
The cumulative distribution shows the probability of a value less than or equal to X.
Wikipedia: http://en.wikipedia.org/wiki/Cumulative_distribution_function
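For a discrete variable, the cumulative distribution is just the running sum of the probability distribution. The sketch below illustrates this for a fair six-sided die, which is an assumed example rather than one taken from this slide.

```python
from fractions import Fraction
from itertools import accumulate

faces = [1, 2, 3, 4, 5, 6]

# Probability distribution: P(X = face) for a fair die.
pmf = {face: Fraction(1, 6) for face in faces}

# Cumulative distribution: P(X <= face), the running total of the pmf.
cdf = dict(zip(faces, accumulate(pmf[face] for face in faces)))

for face in faces:
    print(f"P(X = {face}) = {pmf[face]},  P(X <= {face}) = {cdf[face]}")
```

The cumulative distribution never decreases and reaches 1 at the largest possible value.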

Statistical Hypothesis Testing
You are running experiments to test the effect of a drug on subjects.
How likely is it that the effect would be observed even if no real relation exists?
If that likelihood is sufficiently small (e.g. < 1%), then it can be assumed that a real relation exists. Otherwise, any observed effect may simply be due to chance.
H0: Null hypothesis. No relation exists.
HA: Alternate hypothesis. There is some sort of relation.

Statistical Hypothesis Testing
The SIGNIFICANCE LEVEL is chosen a priori and determines whether H0 is rejected (e.g. 0.1, 0.05, 0.01).
If P-VALUE < significance level, then H0 is rejected, i.e. the result is considered STATISTICALLY SIGNIFICANT.
Wikipedia: http://en.wikipedia.org/wiki/P-value
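To illustrate the decision rule, here is a minimal sketch of a one-sided exact binomial test. The scenario is assumed for illustration: H0 says a coin is fair, and 9 heads are observed in 10 tosses. The function name `binomial_p_value` and the chosen significance level are hypothetical.

```python
from math import comb


def binomial_p_value(n_tosses, n_heads, p_head=0.5):
    """One-sided p-value: P(at least n_heads heads in n_tosses) if H0 is true."""
    return sum(
        comb(n_tosses, k) * p_head ** k * (1 - p_head) ** (n_tosses - k)
        for k in range(n_heads, n_tosses + 1)
    )


alpha = 0.05                         # significance level, fixed before the experiment
p_value = binomial_p_value(10, 9)    # assumed observation: 9 heads in 10 tosses
reject_h0 = p_value < alpha

print(f"p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

Here the p-value is 11/1024, so H0 would be rejected at the 0.05 level but not at the 0.01 level, which is why the level has to be fixed in advance.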

Error reporting
How reliable is the measurement? (How reliable is the estimate?)
E.g. 95% CONFIDENCE INTERVAL: we are 95% confident that the true value is within this interval.
STANDARD ERROR can be used to approximate confidence intervals.
Standard error = standard deviation of the sampling distribution
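A common approximation, sketched below, estimates the standard error of the mean as the sample standard deviation divided by the square root of n, and takes mean ± 1.96 standard errors as an approximate 95% confidence interval. The measurements are hypothetical, and the 1.96 multiplier assumes a normal sampling distribution; for small samples a multiplier from the t-distribution is usually more appropriate.

```python
import math
import statistics

# Hypothetical repeated measurements of the same quantity.
measurements = [4.8, 5.1, 5.0, 4.9, 5.2, 5.0, 4.7, 5.3]

n = len(measurements)
mean = statistics.mean(measurements)
std_dev = statistics.stdev(measurements)   # sample standard deviation
std_error = std_dev / math.sqrt(n)         # estimated standard error of the mean

z = 1.96                                   # normal-approximation multiplier for 95%
ci_low = mean - z * std_error
ci_high = mean + z * std_error

print(f"mean = {mean:.3f}, standard error = {std_error:.3f}")
print(f"approximate 95% CI: [{ci_low:.3f}, {ci_high:.3f}]")
```

Collecting more measurements shrinks the standard error, and with it the width of the interval.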

Back to Probability
0 ≤ P(A) ≤ 1
P(A) = 1 - P(A^C) [A^C = complement of A]
If events A and B are independent (event B has no effect on the probability of event A), then: P(A, B) = P(A) · P(B)
In general, including when they are not independent: P(A, B) = P(A|B) · P(B)
P(A, B) = JOINT PROBABILITY of A and B
P(A|B) = CONDITIONAL PROBABILITY of A given B
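These rules can be verified by enumeration. The sketch below uses two fair dice, which is an assumed example; the events and helper names (`prob`, `both`, `first_even`, and so on) are hypothetical.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))


def prob(event):
    """Probability of an event, given as a predicate on an outcome."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))


def both(event_a, event_b):
    """Joint event: both predicates hold."""
    return lambda o: event_a(o) and event_b(o)


def first_even(o):
    return o[0] % 2 == 0


def second_even(o):
    return o[1] % 2 == 0


def sum_is_8(o):
    return o[0] + o[1] == 8


# Independent events: the joint probability is the product.
joint_independent = prob(both(first_even, second_even))
print(joint_independent, prob(first_even) * prob(second_even))

# Dependent events: the product rule fails, the conditional form holds.
joint_dependent = prob(both(first_even, sum_is_8))
p_a_given_b = joint_dependent / prob(sum_is_8)   # P(A|B) = P(A, B) / P(B)
print(joint_dependent, prob(first_even) * prob(sum_is_8), p_a_given_b)
```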

Example We are given 2 urns, each containing a collection of colored balls. Urn 1 contains 2 white and 3 blue balls; Urn 2 contains 3 white and 4 blue balls. A ball is drawn at random from urn 1 and put into urn 2, and then a ball is picked at random from urn 2 and examined. What is the probability that the ball is blue?

Example (solution)
Scenario 1: The ball picked from Urn 1 is blue (probability 3/5). Urn 2 then holds 3 white and 5 blue balls, so P(blue from Urn 2) = 5/8.
Scenario 2: The ball picked from Urn 1 is white (probability 2/5). Urn 2 then holds 4 white and 4 blue balls, so P(blue from Urn 2) = 4/8.
P(blue) = 3/5 x 5/8 + 2/5 x 4/8 = 23/40 = 0.575
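The urn calculation can be reproduced with exact fractions. This sketch simply sums over the two scenarios for the ball moved from Urn 1; the variable names are illustrative.

```python
from fractions import Fraction

# Urn 1: 2 white, 3 blue. Probability of each colour being moved to Urn 2.
p_blue_moved = Fraction(3, 5)
p_white_moved = Fraction(2, 5)

# Urn 2 starts with 3 white, 4 blue and gains the moved ball (8 balls in total).
p_blue_given_blue_moved = Fraction(4 + 1, 8)   # 5 blue out of 8
p_blue_given_white_moved = Fraction(4, 8)      # 4 blue out of 8

# Sum over both scenarios, weighting each by its probability.
p_blue = (p_blue_moved * p_blue_given_blue_moved
          + p_white_moved * p_blue_given_white_moved)

print(p_blue, float(p_blue))
```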

Bayes Theorem
P(A|B) = P(B|A) · P(A) / P(B)
How?
P(A, B) = P(A|B) · P(B), so P(A|B) = P(A, B) / P(B)
P(B, A) = P(B|A) · P(A)
P(A, B) = P(B, A), so P(A|B) = P(B|A) · P(A) / P(B)
Also, this is equivalent to:
P(A|B) = P(B|A) · P(A) / [P(B|A) · P(A) + P(B|A^C) · P(A^C)]

Contingency Table Courtesy: Rich Tsui, PhD

Contingency Table
You have developed a test to detect a certain disease.
What are the True Positive Rate (TPR) and True Negative Rate (TNR) of this test?
Sensitivity = TPR = TP / (TP + FN) = P(Test+ | Disease+)
Specificity = TNR = TN / (TN + FP) = P(Test- | Disease-)
What are the Positive Predictive Value (PPV) and Negative Predictive Value (NPV)?
PPV = TP / (TP + FP) = P(Disease+ | Test+)
NPV = TN / (TN + FN) = P(Disease- | Test-)

Sensitivity (TPR): the probability that a sick person is correctly identified as having the condition.
Specificity (TNR): the probability that a healthy person is correctly identified as not having the condition.
Positive predictive value (PPV): given that you test positive, the probability that you actually have the condition.
Negative predictive value (NPV): given that you test negative, the probability that you actually do not have the condition.

The Prevalence of a particular disease is 1/10. A test for this disease provides a correct diagnosis in 90% of cases (i.e. if you have the disease, 90% of the time you will test positive, and if you do not have the disease, 90% of the time you will test negative). Given that you test positive for the disease, what is the probability that you actually have the disease?

Solution
Prevalence = prior probability in the population
T+ = test positive, T- = test negative, D+ = disease present, D- = disease absent
P(D+) = 0.1, therefore P(D-) = 1 - 0.1 = 0.9
P(T+|D+) = 0.9
P(T-|D-) = 0.9, therefore P(T+|D-) = 1 - 0.9 = 0.1
P(D+|T+) = P(T+|D+) · P(D+) / [P(T+|D+) · P(D+) + P(T+|D-) · P(D-)]
= (0.9) · (0.1) / [(0.9) · (0.1) + (0.1) · (0.9)]
= 0.09 / 0.18 = 0.5
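The same calculation can be written as a small function, which makes it easy to see how strongly the answer depends on prevalence. This is a sketch; the function name `posterior_given_positive` and the second prevalence value are assumptions added for illustration.

```python
def posterior_given_positive(prevalence, sensitivity, specificity):
    """P(D+ | T+) from Bayes theorem with the expanded denominator."""
    true_positive = sensitivity * prevalence               # P(T+|D+) * P(D+)
    false_positive = (1 - specificity) * (1 - prevalence)  # P(T+|D-) * P(D-)
    return true_positive / (true_positive + false_positive)


# Values from the problem: prevalence 1/10, test correct 90% of the time.
p_disease = posterior_given_positive(0.1, 0.9, 0.9)
print(f"P(D+ | T+) = {p_disease:.3f}")

# Assumed rarer disease (prevalence 1/100) with the same test.
p_rare = posterior_given_positive(0.01, 0.9, 0.9)
print(f"P(D+ | T+) at 1% prevalence = {p_rare:.3f}")
```

With the same test, lowering the prevalence from 10% to 1% drops the probability of disease after a positive result from 0.5 to about 0.08.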