Probability and Statistics for Data Mining COMP5318.



Question 1
Question: Suppose you randomly select a credit card holder and the person has defaulted on their credit card. What is the probability that the person selected is a ‘Female’?
Gender   % of credit card holders   % of gender who default
Male     60                         55
Female   40                         35

Probability
Probability is the mathematical language for reasoning about uncertainty. We need to make decisions in the presence of uncertainty, which is ever present.
Example: The Earth is warming, a phenomenon known as Global Warming (GW). Is modern human activity the cause of GW?
–Physics driven approach
–Data driven approach

Experiments and Observation
When an experiment is carried out we observe the outcome, which is often uncertain.
–If it were not uncertain, why carry out the experiment?
We look into a random shopping basket. Does it contain a packet of “Tofu”?
We toss a coin. Does it land on “Heads”?
We ask a question: “Is it raining in Broome, WA, right now?”

Building Blocks of Probability
The space of all possible outcomes is called the sample space.
–Choosing it is non-trivial: the same experiment can be given different sample spaces.
Single Coin Toss. The space is {H,T}.
Shopping Basket. The space of all possible combinations of all items sold in the store.
Shopping Basket: {Tofu, Not-Tofu}.

Events
Events are subsets of the sample space. Events are often defined in familiar terms.
In the shopping basket scenario:
–A vegetarian shopping basket is an event.
–It consists of all possible vegetarian item combinations.
Throw of a die. The event we are looking for could be: Even Number = {2,4,6}, where the sample space = {1,2,3,4,5,6}.

Events
Let G be the set of all galaxies. Characterize each galaxy by three numbers:
–d: distance from earth
–a: major axis
–b: minor axis
Elliptic Galaxies (EG)
–EG = {(a,b,d) | a/b > 1.5}
Distant Spiral Galaxies (DSG)
–DSG = {(a,b,d) | a/b 10}

Events
Let G be the set of all genes. Each gene can be “on” or “off”. Let E correspond to the event: all genes which are “on” when the skin cells are “starved”.

Events are Sets
At the most basic level events are sets. Therefore we can carry out set union, difference and intersection on events. For example:
–E1: shopping baskets which contain Tofu
–E2: shopping baskets which contain Milk
–E1 ∪ E2: shopping baskets which contain Tofu or Milk (or both)
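Because events are sets, the operations on this slide map directly onto set operations in code. The following is a minimal sketch in Python; the four baskets and their contents are hypothetical, invented purely for illustration.

```python
# Hypothetical baskets (ids 1..4), invented for illustration only.
baskets = {
    1: {"Tofu", "Rice"},
    2: {"Milk", "Bread"},
    3: {"Tofu", "Milk"},
    4: {"Eggs"},
}

# The sample space here is the set of basket ids.
sample_space = set(baskets)

# E1: baskets which contain Tofu; E2: baskets which contain Milk.
E1 = {b for b, items in baskets.items() if "Tofu" in items}
E2 = {b for b, items in baskets.items() if "Milk" in items}

union = E1 | E2                 # Tofu or Milk (or both)
intersection = E1 & E2          # both Tofu and Milk
difference = E1 - E2            # Tofu but not Milk
complement = sample_space - E1  # no Tofu

print(union, intersection, difference, complement)
```

Each result is itself a subset of the sample space, so it is again an event.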

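Probability
Let S be the space of all possible elementary outcomes. Let F = Power(S) be the power set of S. Then the probability P is a function P : F → [0,1] that satisfies the following properties (axioms):
–P(S) = 1
–P(A) ≥ 0 for every event A in F
–If A1, A2, … are pairwise disjoint events, then P(A1 ∪ A2 ∪ …) = P(A1) + P(A2) + …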

Interpretation of Probability
Physical or Ontological: Long term frequency
–50% chance that a coin will land on heads.
–20% of all Woolworth shopping baskets are vegetarian.
–22% of all Woolworth shopping baskets in Northbridge plaza are vegetarian.
Epistemological: Degree of Belief
–20% chance that my neighbours are watering their lawn on “dry” days.
–99% chance that the green immovable object outside my house is a Tree.
–90% chance that Australia will win the cricket world cup.

Consequences of Axioms
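The standard consequences of the probability axioms can be summarised as follows. This is a sketch in assumed notation, with A^c denoting the complement of A in the sample space S; the last identity is the one used in the two-coin-toss example.

```latex
\begin{align*}
P(\emptyset) &= 0 \\
P(A^c) &= 1 - P(A) \\
A \subseteq B &\implies P(A) \le P(B) \\
P(A \cup B) &= P(A) + P(B) - P(A \cap B)
\end{align*}
```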

Example
Two coin tosses. Let H1 be the event that a heads occurs on toss 1 and H2 a heads on toss 2. All outcomes are equally likely.
Sample space = {HH, HT, TH, TT}
–H1 = {HH, HT}
–H2 = {HH, TH}
–P(H1 ∪ H2) = P(H1) + P(H2) − P(H1 ∩ H2) = ½ + ½ − ¼ = 3/4
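The two-coin-toss calculation can be checked by enumerating the sample space. This is a small sketch that assumes equally likely outcomes; the helper name `prob` is made up for the illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space for two coin tosses: {"HH", "HT", "TH", "TT"}.
sample_space = {"".join(t) for t in product("HT", repeat=2)}

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

H1 = {w for w in sample_space if w[0] == "H"}  # heads on toss 1
H2 = {w for w in sample_space if w[1] == "H"}  # heads on toss 2

lhs = prob(H1 | H2)
rhs = prob(H1) + prob(H2) - prob(H1 & H2)
print(lhs, rhs)  # 3/4 3/4
```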

Example
Two events A and B are independent if
–P(A ∩ B) = P(A)P(B)
P(A ∩ B) is also written as P(AB) and P(A,B).
If A and B are disjoint events with P(A) > 0 and P(B) > 0, then A and B cannot be independent:
–P(A ∩ B) = 0, yet P(A)P(B) > 0
Except for this case, you cannot determine independence by looking at a Venn diagram.
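The definition of independence can be tested mechanically on a small sample space. The sketch below reuses the two-coin-toss setting with equally likely outcomes; `is_independent` is a hypothetical helper name, not something from the slides.

```python
from fractions import Fraction

sample_space = {"HH", "HT", "TH", "TT"}

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

def is_independent(a, b):
    """Check the definition P(A ∩ B) = P(A)P(B)."""
    return prob(a & b) == prob(a) * prob(b)

H1 = {"HH", "HT"}  # heads on toss 1
H2 = {"HH", "TH"}  # heads on toss 2
both_heads = {"HH"}
both_tails = {"TT"}

print(is_independent(H1, H2))                  # True: 1/4 == 1/2 * 1/2
print(is_independent(both_heads, both_tails))  # False: 0 != 1/4 * 1/4
```

The second pair is disjoint with positive probabilities, which is exactly the case the slide says can never be independent.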

Question
A shopping basket can either be kosher or not. The probability that it will be kosher is 3/4. Examine 10 baskets at a checkout counter. What is the probability that there will be at least one kosher basket?

Answer
Let E be the event “At least one kosher basket.” Let NK_i be the event that the i-th basket is non-kosher. Then
P(E) = 1 − P(NK_1 ∩ … ∩ NK_10) = 1 − P(NK_1) ⋯ P(NK_10) (by independence) = 1 − (1/4)^10 ≈ 0.999999
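The complement argument for the kosher-basket question can be evaluated exactly. This is a minimal sketch that assumes the 10 baskets are independent, each kosher with probability 3/4; the variable names are illustrative.

```python
from fractions import Fraction

p_kosher = Fraction(3, 4)
n_baskets = 10

# P(no kosher basket) = P(NK_1) * ... * P(NK_10), assuming independence.
p_none = (1 - p_kosher) ** n_baskets

# P(at least one kosher basket) is the complement.
p_at_least_one = 1 - p_none

print(p_at_least_one)  # 1048575/1048576
```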

Example
For an Online Book Seller (OBS) the conversion rate is 1/100, i.e., on average one in every 100 visitors ends up making a purchase. What is the probability that at least one purchase will be made in 10 consecutive visits (by distinct customers)?
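The same complement trick applies to the book seller. The sketch below assumes the 10 visitors decide independently, each purchasing with probability 1/100, and compares the closed form with a seeded simulation; the function names `at_least_one` and `simulate` are hypothetical.

```python
import random

def at_least_one(p, n):
    """P(at least one success in n independent trials) = 1 - (1 - p)^n."""
    return 1 - (1 - p) ** n

def simulate(p, n, trials, seed=0):
    """Estimate the same probability by simulating batches of n visits."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # A batch counts as a hit if any of its n visitors purchases.
        if any(rng.random() < p for _ in range(n)):
            hits += 1
    return hits / trials

exact = at_least_one(0.01, 10)
estimate = simulate(0.01, 10, trials=100_000)

print(round(exact, 4))  # 0.0956
print(estimate)         # close to the exact value
```

So with a 1% conversion rate, roughly one batch of 10 visits in ten contains a purchase.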

Example
Two people take turns to sink a basketball. P1 succeeds with probability 1/3 and P2 with 1/4. What is the probability that P1 succeeds before P2?
Requires clever setting up of the events.
–Let E be the event that P1 succeeds before P2.
–Let A_i be the event that P1 succeeds before P2 on the ith trial.
–A_i ∩ A_j = Ø for i ≠ j, and E = ∪_{i=1}^∞ A_i
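One way to finish the basketball calculation is sketched below. It assumes that P1 shoots first in each round and that all shots are independent, neither of which the slide states explicitly. Under those assumptions A_i requires both players to miss in rounds 1 to i−1 and P1 to score in round i, so P(A_i) = (2/3 · 3/4)^(i−1) · 1/3, and the disjoint A_i sum to a geometric series.

```python
from fractions import Fraction

p1 = Fraction(1, 3)  # P1's success probability per shot
p2 = Fraction(1, 4)  # P2's success probability per shot

# Probability that both players miss in one full round (assumes independence).
both_miss = (1 - p1) * (1 - p2)

def p_A(i):
    """P(A_i): i-1 rounds of misses, then P1 scores (assumes P1 shoots first)."""
    return both_miss ** (i - 1) * p1

# Partial sum of the disjoint events A_1..A_50, and the geometric-series limit.
partial = sum(p_A(i) for i in range(1, 51))
closed_form = p1 / (1 - both_miss)

print(closed_form)     # 2/3
print(float(partial))  # approaches 2/3
```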

Conditional Probability
Very Important Concept
P(A|B) is the “fraction of occurrences of B in which A also occurs”
–P(A|B) = P(A ∩ B)/P(B); P(B) > 0
For a fixed B, P(.|B) is a probability
–Therefore if A1 and A2 are disjoint then
–P(A1 ∪ A2 | B) = P(A1|B) + P(A2|B)
Note: in general, P(A|B ∪ C) ≠ P(A|B) + P(A|C)
Also, in general, P(A|B) ≠ P(B|A)
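These properties can be checked on the fair-die sample space used earlier. The sketch below assumes equally likely outcomes; the events and the helper `cond_prob` are chosen for illustration.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}  # one throw of a fair die

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

def cond_prob(a, b):
    """P(A|B) = P(A ∩ B) / P(B), defined only when P(B) > 0."""
    if prob(b) == 0:
        raise ValueError("P(B) must be positive")
    return prob(a & b) / prob(b)

A = {2, 4, 6}  # even number
B = {5, 6}     # greater than 4

print(cond_prob(A, B))  # 1/2
print(cond_prob(B, A))  # 1/3, so P(A|B) != P(B|A)

# Additivity over disjoint events for a fixed conditioning event.
A1, A2 = {2}, {4}
additive = cond_prob(A1 | A2, A) == cond_prob(A1, A) + cond_prob(A2, A)
print(additive)  # True
```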

Standard Example
[Table: test result (+/−) against disease status (D, Dc); cell values not recovered]
Suppose a test is positive. What is the probability of disease? D is disease; +/− is test positive or negative.

Standard Data Mining Example
Suppose the data above closely resembles the behaviour of the population at large. What is the chance that those who buy a Diaper will also buy Beer?
P(Beer | Diaper) = P(Diaper ∩ Beer)/P(Diaper) = 0.6/0.8 = 0.75
Is Diaper an Event?
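The ratio 0.6/0.8 can be reproduced from transaction data. In the sketch below the five transactions are hypothetical, chosen only so that the frequencies match the slide's numbers, and the helper names `support` and `confidence` are borrowed from association-rule terminology rather than from the slides. It also suggests one answer to the closing question: "Diaper" can be read as the event "the set of baskets that contain a Diaper".

```python
from fractions import Fraction

# Hypothetical transactions, chosen so that P(Diaper) = 0.8 and
# P(Diaper and Beer) = 0.6 as on the slide.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support(items):
    """Fraction of transactions containing every item in `items`."""
    count = sum(1 for t in transactions if items <= t)
    return Fraction(count, len(transactions))

def confidence(antecedent, consequent):
    """Estimate of P(consequent | antecedent) from the transactions."""
    return support(antecedent | consequent) / support(antecedent)

p_beer_given_diaper = confidence({"Diaper"}, {"Beer"})
print(p_beer_given_diaper)  # 3/4
```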

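Conditional Independence
If A and B are independent then P(A|B) = P(A) (for P(B) > 0).
In general, P(AB) = P(A|B)P(B).
Law of Total Probability: if B1, …, Bn partition the sample space, then P(A) = P(A|B1)P(B1) + … + P(A|Bn)P(Bn).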

Bayes Theorem
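The usual statement of Bayes' theorem combines the definition of conditional probability with the law of total probability. The notation below is an assumption: A and B are events with P(A) > 0, and B_1, …, B_n partition the sample space.

```latex
\begin{align*}
P(B \mid A) &= \frac{P(A \mid B)\,P(B)}{P(A)} \\
P(B_j \mid A) &= \frac{P(A \mid B_j)\,P(B_j)}{\sum_{i=1}^{n} P(A \mid B_i)\,P(B_i)}
\end{align*}
```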

Question 1
Question: Suppose you randomly select a credit card holder and the person has defaulted on their credit card. What is the probability that the person selected is a ‘Female’?
Gender   % of credit card holders   % of gender who default
Male     60                         55
Female   40                         35

Answer to Question 1
But what do G=F and D=Y mean? We have not even formally defined them.
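One way to work out Question 1 is to apply Bayes' theorem to the table directly. The sketch below reads G=F as "the selected holder is female" and D=Y as "the selected holder has defaulted", which is an informal interpretation rather than a definition given in the slides; the dictionary names are illustrative.

```python
from fractions import Fraction

# From the table: share of card holders by gender, and default rate per gender.
prior = {"Male": Fraction(60, 100), "Female": Fraction(40, 100)}
default_rate = {"Male": Fraction(55, 100), "Female": Fraction(35, 100)}

# Law of total probability: P(D=Y) = sum over g of P(D=Y | G=g) P(G=g).
p_default = sum(prior[g] * default_rate[g] for g in prior)

# Bayes' theorem: P(G=F | D=Y) = P(D=Y | G=F) P(G=F) / P(D=Y).
posterior_female = prior["Female"] * default_rate["Female"] / p_default

print(posterior_female)  # 14/47
```

Under this reading, about 29.8% of defaulters are female, lower than the 40% share of female card holders, because the female default rate is lower than the male one.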