Data analysis and uncertainty

Outline: Random Variables, Estimation, Sampling

Introduction: Reasons for Uncertainty. Prediction: making a prediction about tomorrow based on data we have today. Sampling: the data may be a sample from a population, and we do not know how our sample differs from other samples (or from the population as a whole). Missing or unknown values: we need to estimate these values. Example: censored data.

Introduction: Dealing with Uncertainty. Two main approaches: probability and fuzzy methods. Probability theory vs. probability calculus: probability theory is the mapping from the real world to the mathematical representation; probability calculus is based on well-defined and generally accepted axioms, and its aim is to explore the consequences of those axioms.

Introduction: Frequentist view (probability is objective). The probability of an event is defined as the limiting proportion of times that the event would occur in identical situations. Examples: the proportion of times a head comes up when tossing the same coin repeatedly; assessing the probability that a supermarket customer will buy a certain item (using similar customers).

Introduction: Bayesian view (subjective probability). Explicit characterization of all uncertainty, including any parameters estimated from the data. Probability is an individual's degree of belief that a given event will occur. Frequentist vs. Bayesian: toss a coin 10 times and get 7 heads. The frequentist estimate is P(A) = 7/10. A Bayesian starts from a prior belief, say P(A) = 0.5, then combines this prior with the data to estimate the probability.
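The contrast can be made concrete with a small sketch. The Beta(5, 5) prior below is one hypothetical way to encode the "probably fair" belief P(A) = 0.5; the frequentist estimate is just the observed proportion.

```python
from fractions import Fraction

# Coin data from the slide: 10 tosses, 7 heads
n, heads = 10, 7

# Frequentist: the point estimate is the observed proportion
p_freq = Fraction(heads, n)

# Bayesian: a Beta(a, b) prior with a = b = 5 centers belief at 0.5;
# the posterior mean blends prior pseudo-counts with the data.
a, b = 5, 5
p_bayes = Fraction(a + heads, a + b + n)

print(p_freq, p_bayes)  # 7/10 3/5
```

With more data the Bayesian estimate converges toward the frequentist one, since the prior pseudo-counts are eventually swamped.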

Random variable: a mapping from a property of objects to a variable that can take a set of possible values, via a process that appears to the observer to have an element of unpredictability. Examples: a coin toss (domain is the set {heads, tails}); the number of times a coin has to be tossed to get a head (domain is the positive integers); a student's score (domain is the set of integers between 0 and 100).

Properties of a single random variable. X is a random variable and x is a value of X. If the domain is finite, X has a probability mass function p(x); if the domain is the real line, X has a probability density function f(x). The expectation of X is E[X] = Σx x p(x) in the discrete case and E[X] = ∫ x f(x) dx in the continuous case.

Multivariate random variable: a set of several random variables, i.e., a p-dimensional vector x = (x1, …, xp) with a joint mass function p(x1, …, xp).

The joint mass function, for example: roll two fair dice, with X the first die's result and Y the second's. Then p(X=3, Y=3) = 1/6 * 1/6 = 1/36.
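This joint mass function is small enough to enumerate directly:

```python
from fractions import Fraction
from itertools import product

# Joint mass function for two independent fair dice:
# p(x, y) = p(x) * p(y) = 1/36 for every pair (x, y)
p = {(x, y): Fraction(1, 36) for x, y in product(range(1, 7), repeat=2)}

print(p[(3, 3)])        # 1/36
print(sum(p.values()))  # 1  (a valid joint pmf sums to one)
```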


Marginal probability mass function: the marginal of X is p(x) = Σy p(x, y), and similarly the marginal of Y is p(y) = Σx p(x, y).

In the continuous case, the marginal probability density functions of X and Y are f(x) = ∫ f(x, y) dy and f(y) = ∫ f(x, y) dx.

Conditional probability: the density of a single variable (or of a subset of the complete set of variables) given (or "conditioned on") particular values of the other variables. The conditional density of X given a value y of Y is denoted f(x|y) and defined as f(x|y) = f(x, y) / f(y).

Conditional probability, for example: a student's score is drawn at random from the sample space S = {0, 1, …, 100}. What is the probability that the student fails? Given that the student's score is even (including 0), what is the probability that the student fails?

Supermarket data

Conditional independence. A generic problem in data mining is finding relationships between variables: is purchasing item A related to purchasing item B? Variables are independent if there is no relationship; otherwise they are dependent. X and Y are independent if p(x, y) = p(x)p(y).

Conditional Independence: More than 2 variables. X is conditionally independent of Y given Z if for all values of x, y, z we have p(x, y | z) = p(x | z) p(y | z).

Conditional Independence: More than 2 variables. Example: P(F) = 60/101 while P(F|E) = 30/51, so E and F are dependent. Conditioning on B, the event that the student's score is not 100: P(F|B) = 60/100, P(E|B) = 1/2, and P(E∩F|B) = 30/100 = 60/100 * 1/2. Given B, E and F are independent.

Conditional Independence: More than 2 variables. Example, continued: conditioning on C, the event that the student's score is 100: P(F|C) = 0, P(E|C) = 1, and P(E∩F|C) = 0 = 1 * 0. Given C, E and F are again independent, and we can now calculate P(E∩F) by total probability: P(E∩F) = P(E∩F|B)P(B) + P(E∩F|C)P(C).
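The score example can be made concrete by assuming, as the numbers on these slides suggest, scores uniform on {0, …, 100}, with E = "score is even", F = "score below 60 (fail)", B = "score ≠ 100", and C = "score = 100":

```python
from fractions import Fraction

scores = set(range(101))                 # uniform scores 0..100
E = {s for s in scores if s % 2 == 0}    # even score
F = {s for s in scores if s < 60}        # fail: score below 60
B = scores - {100}                       # score != 100
C = {100}                                # score == 100

def p(A, given=None):
    """Exact probability of A, optionally conditioned on an event."""
    given = scores if given is None else given
    return Fraction(len(A & given), len(given))

print(p(E & F) == p(E) * p(F))              # False: marginally dependent
print(p(E & F, B) == p(E, B) * p(F, B))     # True: independent given B
print(p(E & F, C) == p(E, C) * p(F, C))     # True: independent given C
# Total probability recovers the joint: P(E∩F) = P(E∩F|B)P(B) + P(E∩F|C)P(C)
print(p(E & F) == p(E & F, B) * p(B) + p(E & F, C) * p(C))  # True
```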

Conditional independence does not imply marginal independence. Note also the converse: X and Y may be unconditionally independent but conditionally dependent given Z.

On assuming independence: independence is a strong assumption, frequently violated in practice, but it simplifies modeling: fewer parameters and more understandable models.

Dependence and correlation. Covariance measures how X and Y vary together: it is large and positive if large X is associated with large Y and small X with small Y, and negative if large X is associated with small Y. Two variables may be dependent but not linearly correlated.
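A standard illustration of "dependent but not linearly correlated": if X is symmetric around zero, Y = X² is completely determined by X, yet their linear correlation is near zero.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)   # symmetric around 0
y = x**2                       # fully determined by x, hence dependent

# Cov(X, X^2) = E[X^3] = 0 for a symmetric distribution, so the
# sample correlation is near zero despite total dependence.
r = np.corrcoef(x, y)[0, 1]
print(round(r, 3))
```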

Correlation and causation. Two variables may be highly correlated without any causal relationship between them. Yellow-stained fingers and lung cancer may be correlated, but they are causally linked only through a third variable: smoking. Human reaction time and earned income are negatively correlated, but that does not mean one causes the other: a third variable, age, is causally related to both.

Samples and statistical inference. Samples can be used to model the data. If the goal is to detect small deviations in the data, the size of the sample will affect the result.

Dual Role of Probability and Statistics in Data Analysis

Outline: Random Variables; Estimation (Maximum Likelihood Estimation, Bayesian Estimation); Sampling

Estimation. In inference we want to make statements about the entire population from which the sample is drawn. The two important methods for estimating the parameters of a model are Maximum Likelihood Estimation and Bayesian Estimation.

Desirable properties of estimators. Let θ̂ be an estimate of parameter θ. Two measures of estimator quality: the bias, the difference between the expected value of the estimate and the true value, E[θ̂] − θ; and the variance of the estimate.

Mean squared error: the mean of the squared difference between the value of the estimator and the true value of the parameter, E[(θ̂ − θ)²]. The mean squared error can be partitioned as the sum of the squared bias and the variance: MSE = Bias² + Variance.
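The decomposition can be verified numerically. The shrunken estimator 0.9·x̄ below is a deliberately biased choice used only for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
theta, sigma, n, trials = 2.0, 1.0, 10, 200_000

samples = rng.normal(theta, sigma, size=(trials, n))
est = 0.9 * samples.mean(axis=1)   # a deliberately biased estimator

mse = np.mean((est - theta) ** 2)
bias = est.mean() - theta
var = est.var()
# MSE decomposes exactly into squared bias plus variance
print(np.isclose(mse, bias**2 + var))  # True
```

The identity holds exactly (up to floating-point rounding) for the sample quantities, which is why no simulation error appears in the check.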


Maximum Likelihood Estimation: the most widely used method for parameter estimation. The likelihood function L(θ) is the probability that the data D would have arisen for a given value of θ. The value of θ for which the data has the highest probability is the MLE.

Example of MLE for the binomial. Customers either purchase or do not purchase milk; we want an estimate of the proportion purchasing. The data are binomial with unknown parameter θ: samples x(1), …, x(1000), of which r purchase milk. Assuming conditional independence, the likelihood function is L(θ) = θ^r (1 − θ)^(n−r).

Log-likelihood function. Because the maximizer is unchanged by a monotone transformation, it is easier to work with the log-likelihood l(θ) = r log θ + (n − r) log(1 − θ). Differentiating and setting the derivative equal to zero gives r/θ − (n − r)/(1 − θ) = 0, so the MLE is θ̂ = r/n.

Example of MLE for the binomial: r milk purchases out of n customers, where θ is the probability that milk is purchased by a random customer. For three data sets, r = 7, n = 10; r = 70, n = 100; r = 700, n = 1000, the MLE is the same, but the uncertainty becomes smaller as n increases.
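The three data sets can be compared with a small grid search over θ; in each case the maximizer is r/n = 0.7, and only the sharpness of the peak changes:

```python
import numpy as np

# Binomial log-likelihood: l(theta) = r*log(theta) + (n - r)*log(1 - theta)
def loglik(theta, r, n):
    return r * np.log(theta) + (n - r) * np.log(1 - theta)

theta = np.linspace(0.001, 0.999, 9999)
mles = []
for r, n in [(7, 10), (70, 100), (700, 1000)]:
    ll = loglik(theta, r, n)
    mles.append(theta[np.argmax(ll)])

# The maximizer is r/n = 0.7 in every case; the peak merely sharpens with n
print([round(m, 2) for m in mles])  # [0.7, 0.7, 0.7]
```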


Likelihood under the normal distribution: variance known (here 1), mean θ unknown. The likelihood function is the product of the normal densities, L(θ) = ∏ (1/√(2π)) exp(−(x(i) − θ)²/2).

To find the MLE, set the derivative of the log-likelihood, d l(θ)/dθ = Σ (x(i) − θ), to zero; this gives θ̂ = (1/n) Σ x(i), the sample mean.
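A quick numerical check, on hypothetical data, that the maximizer of the normal log-likelihood is the sample mean:

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(5.0, 1.0, size=20)

# With known unit variance the log-likelihood is -sum((x - theta)^2)/2 + const,
# so maximizing it means minimizing the sum of squared deviations.
theta = np.linspace(0.0, 10.0, 100_001)
ll = -0.5 * ((x[:, None] - theta) ** 2).sum(axis=0)
mle = theta[np.argmax(ll)]

# The grid maximizer coincides with the sample mean (up to grid resolution)
print(abs(mle - x.mean()) < 1e-4)  # True
```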

Likelihood under the normal distribution: θ̂ is the estimated mean. For two randomly generated data sets, one with 20 data points and one with 200 data points, the likelihood is more sharply peaked for the larger sample.


Sufficient statistic. A quantity s(D) is a sufficient statistic for θ if the likelihood l(θ) depends on the data only through s(D): no other statistic calculated from the same sample provides any additional information about the value of the parameter.

Interval estimate. A point estimate does not convey the uncertainty associated with it; an interval estimate provides a confidence interval.
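As a sketch of what a confidence interval means: for a normal mean with known σ, the interval x̄ ± 1.96σ/√n contains the true mean in about 95% of repeated samples (simulated data, hypothetical parameters):

```python
import numpy as np

rng = np.random.default_rng(2)
sigma, mu, n, trials = 1.0, 5.0, 100, 20_000

x = rng.normal(mu, sigma, size=(trials, n))
half = 1.96 * sigma / np.sqrt(n)       # half-width of the 95% interval
means = x.mean(axis=1)

# Fraction of intervals x_bar +/- half that contain the true mean
coverage = np.mean((means - half < mu) & (mu < means + half))
print(abs(coverage - 0.95) < 0.01)  # True
```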


(figures: formulas for the mean and variance)

Outline: Random Variables; Estimation (Maximum Likelihood Estimation, Bayesian Estimation); Sampling

Bayesian approach. Frequentist approach: the parameters of the population are fixed but unknown; the data are a random sample, and the intrinsic variability lies in the data. Bayesian approach: the data are known, and the parameters θ are random variables; θ has a distribution of values, reflecting the degree of belief about where the true parameters may be.

Bayesian estimation: the prior is modified by Bayes' rule, which leads to a distribution rather than a single value; a single value can be obtained from the mean or mode of the posterior.

Bayesian estimation: p(θ|D) = p(D|θ)p(θ)/p(D), where p(D) is a constant independent of θ, for a given data set D and a particular model (model = distributions for the prior and likelihood). If we have only a weak belief about the parameter before collecting data, choose a wide prior (e.g., a normal with large variance).

Binomial example: a single binary variable X, whose parameter θ we wish to estimate. The usual prior for a parameter in [0, 1] is the Beta distribution, p(θ) ∝ θ^(α−1) (1 − θ)^(β−1).

Binomial example: the likelihood function is θ^r (1 − θ)^(n−r). Combining likelihood and prior, we get another Beta distribution, with parameters r + α and n − r + β.
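A sketch of this conjugate update. The data values are hypothetical, chosen so that a Beta(5, 5) prior yields a Beta(145, 145) posterior:

```python
from fractions import Fraction

# Conjugate Beta-Binomial update: prior Beta(a, b) plus r successes in
# n trials gives posterior Beta(a + r, b + n - r).
a, b = 5, 5            # prior pseudo-counts, centered at 0.5
r, n = 140, 280        # hypothetical observed purchases

a_post, b_post = a + r, b + n - r
post_mean = Fraction(a_post, a_post + b_post)
print(a_post, b_post, post_mean)  # 145 145 1/2
```

Because the data are split evenly, the posterior mean stays at 1/2, but the posterior is far more concentrated than the prior.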

(figures: density plots of Beta(5,5) and Beta(145,145); Beta(5,5); Beta(45,50))

Advantages of the Bayesian approach: it retains full knowledge of all problem uncertainty, e.g., by calculating the full posterior distribution on θ, and it updates that distribution naturally as new data arrive.

Predictive distribution. In the equation that modifies the prior into the posterior, the denominator p(D) = ∫ p(D|θ)p(θ) dθ is called the predictive distribution of D. It is useful for model checking: if the observed data have only a small probability under it, the model is unlikely to be correct.

Normal distribution example: suppose x comes from a normal distribution with unknown mean θ and known variance α. The prior distribution for θ is also normal, and the posterior is then normal as well.
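Assuming the standard normal-normal conjugate update (prior θ ~ N(mu0, tau0²), data variance known), the posterior mean and variance can be computed directly; the data values here are hypothetical:

```python
import numpy as np

# Normal-Normal conjugate update with known data variance sigma^2:
# posterior precision is the sum of prior and data precisions.
mu0, tau0 = 0.0, 10.0     # weak (wide) prior
sigma = 2.0               # known data standard deviation
x = np.array([4.8, 5.1, 5.4, 4.9, 5.3])
n = len(x)

prec = 1 / tau0**2 + n / sigma**2                    # posterior precision
mu_post = (mu0 / tau0**2 + x.sum() / sigma**2) / prec
var_post = 1 / prec

# With a weak prior the posterior mean is close to the sample mean
print(mu_post, x.mean())
```

This illustrates the earlier advice: a wide prior (large tau0) lets the data dominate the posterior.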


Jeffreys prior: a reference prior, defined from the Fisher information I(θ) as p(θ) ∝ √I(θ).

Conjugate priors: p(θ) is a conjugate prior for p(x|θ) if the posterior distribution p(θ|x) is in the same family as the prior p(θ). Examples: Beta prior to Beta posterior (binomial likelihood); normal prior to normal posterior (normal likelihood).

Outline: Random Variables; Estimation (Maximum Likelihood Estimation, Bayesian Estimation); Sampling

Sampling in data mining. The data set at hand is not necessarily fit for statistical analysis. "Experimental design" in statistics is concerned with optimal ways of collecting data, but data miners usually cannot control the data collection process: the data may be ideally suited to the purposes for which they were collected, yet not adequate for their data mining uses.

Sampling in data mining: two ways in which samples arise. The database may itself be a sample of the population; or the database contains every case, but the analysis is based on a sample of it. The latter is not appropriate when we want to find unusual records.

Why sampling: draw a sample from the database that allows us to construct a model reflecting the structure of the data in the database. It is efficient, quicker, and easier, but the sample must be representative of the entire database.

Systematic sampling tries to ensure representativeness, e.g., by taking one out of every two records. It can lead to problems when there are regularities in the database, e.g., a data set where the records are of married couples.

Random sampling avoids such regularities. EPSEM sampling (equal probability of selection method): each record has the same probability of being chosen.

Variance of the mean of a random sample. If the variance of a population of size N is σ², the variance of the mean of a simple random sample of size n drawn without replacement is σ²(1/n − 1/N). Usually N >> n, so the second term is small, and the variance decreases as the sample size increases.

Example: 2000 points with population mean 0.0108. Draw random samples of size n = 10, 100, and 1000, repeating each 200 times.
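The experiment on this slide can be sketched as follows, with a hypothetical population of 2000 standard-normal points (the original data are not given):

```python
import numpy as np

rng = np.random.default_rng(3)
population = rng.normal(0, 1, size=2000)

# Spread of the sample mean shrinks as the sample size grows
spread = {}
for n in [10, 100, 1000]:
    means = [rng.choice(population, size=n, replace=False).mean()
             for _ in range(200)]
    spread[n] = np.std(means)
    print(n, round(spread[n], 3))
```

The printed standard deviations fall roughly like √(1/n − 1/N), matching the formula on the previous slide.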


Stratified random sampling: split the population into non-overlapping subpopulations, or strata. Advantage: it enables making statements about each of the subpopulations separately. For example, one of the credit card companies we work with categorizes transactions into 26 categories: supermarket, gas station, and so on.

Mean of a stratified sample. The total size of the population is N; stratum k has N_k elements, of which n_k are chosen for the sample. The sample mean within stratum k is x̄_k, and the estimate of the population mean is Σ_k (N_k / N) x̄_k.
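A minimal sketch of the stratified estimator, with hypothetical strata sizes and sample values:

```python
import numpy as np

# Stratified estimate of the population mean: weight each stratum's
# sample mean by its population share N_k / N.
strata = {   # hypothetical strata: (N_k, sample drawn from that stratum)
    "supermarket": (600, np.array([20.0, 25.0, 30.0])),
    "gas station": (300, np.array([40.0, 45.0])),
    "other":       (100, np.array([10.0])),
}
N = sum(Nk for Nk, _ in strata.values())

estimate = sum(Nk / N * sample.mean() for Nk, sample in strata.values())
print(estimate)  # 28.75
```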

Cluster sampling: when every cluster contains many elements, a simple random sample of individual elements is not appropriate; select clusters, not elements.