1
Data analysis and uncertainty
2
Outline: Random Variables, Estimate, Sampling
3
Introduction: Reasons for Uncertainty
- Prediction: we make a prediction about tomorrow based on the data we have today.
- Sample: the data may be only a sample from the population, and we do not know how our data differ from another sample (or from the population).
- Missing or unknown values: we need to guess these values. Example: censored data.
4
Introduction: Dealing with Uncertainty
- Approaches: probability, fuzzy logic
- Probability theory vs. probability calculus
  - Probability theory: the mapping from the real world to the mathematical representation.
  - Probability calculus: based on well-defined and generally accepted axioms; the aim is to explore the consequences of those axioms.
5
Introduction: Frequentist (probability is objective)
- The probability of an event is defined as the limiting proportion of times that the event would occur in identical situations.
- Examples:
  - The proportion of times a head comes up when the same coin is tossed repeatedly.
  - Assessing the probability that a customer in a supermarket will buy a certain item, using similar customers.
6
Introduction: Bayesian (subjective probability)
- Explicit characterization of all uncertainty, including that of any parameters estimated from the data.
- Probability is an individual degree of belief that a given event will occur.
- Frequentist vs. Bayesian: toss a coin 10 times and get 7 heads.
  - Frequentist: the estimated probability of heads is P(A) = 7/10.
  - Bayesian: start from a prior guess such as P(A) = 0.5, then combine this prior belief with the data to estimate the probability.
7
Random variable
- A mapping from a property of objects to a variable that can take a set of possible values, via a process that appears to the observer to have an element of unpredictability.
- Examples:
  - Coin toss: the domain is the set {heads, tails}.
  - Number of times a coin has to be tossed to get a head: the domain is the positive integers.
  - Student's score: the domain is the set of integers from 0 to 100.
8
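Properties of a single random variable
- X is a random variable and x is its value.
- Domain is finite or countable: probability mass function p(x).
- Domain is the real line: probability density function f(x).
- Expectation of X: $E[X] = \sum_x x\,p(x)$ in the discrete case, $E[X] = \int x\,f(x)\,dx$ in the continuous case.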
9
Multivariate random variable
- A set of several random variables.
- For a p-dimensional vector $\mathbf{x} = (x_1, \dots, x_p)$, the joint mass function is $p(\mathbf{x}) = p(X_1 = x_1, \dots, X_p = x_p)$.
10
The joint mass function
- Example: roll two fair dice; X is the result of the first die and Y the result of the second.
- Then $p(x = 3, y = 3) = 1/6 \times 1/6 = 1/36$.
11
The joint mass function
12
Marginal probability mass function
- The marginal probability mass functions of X and Y are $p(x) = \sum_y p(x, y)$ and $p(y) = \sum_x p(x, y)$.
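The two-dice example can be checked numerically. The sketch below is illustrative only: it builds the joint mass function for two fair, independent dice and recovers each marginal by summing over the other variable. The names `joint`, `marginal_x` and `marginal_y` are not from the slides.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)

# Joint mass function of two fair, independent dice: every pair has mass 1/36.
joint = {(x, y): Fraction(1, 36) for x, y in product(faces, faces)}

# Marginals: sum the joint mass over the other variable.
marginal_x = {x: sum(joint[(x, y)] for y in faces) for x in faces}
marginal_y = {y: sum(joint[(x, y)] for x in faces) for y in faces}

print(joint[(3, 3)])    # 1/36
print(marginal_x[3])    # 1/6
```

Exact fractions are used so that the sums come out as 1/6 rather than a rounded float.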
13
Continuous case
- The marginal probability density functions of X and Y are $f(x) = \int f(x, y)\,dy$ and $f(y) = \int f(x, y)\,dx$.
14
Conditional probability
- The density of a single variable (or of a subset of the complete set of variables) given, or "conditioned on", particular values of other variables.
- The conditional density of X given some value y of Y is denoted f(x|y) and defined as $f(x \mid y) = f(x, y) / f(y)$.
15
Conditional probability
- Example: a student's score is drawn at random from the sample space S = {0, 1, …, 100}.
- What is the probability that the student fails?
- Given that the student's score is even (including 0), what is the probability that the student fails?
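A small enumeration answers both questions. This is a sketch under two assumptions that the slide does not state: all 101 scores are equally likely, and "fail" means a score below 60 (the constant `PASS_MARK` is hypothetical). The helper `prob` is likewise illustrative.

```python
from fractions import Fraction

scores = range(0, 101)   # sample space S = {0, 1, ..., 100}
PASS_MARK = 60           # assumption: a score below 60 is a fail

def prob(event, given=lambda s: True):
    """P(event | given) when every score is equally likely."""
    conditioned = [s for s in scores if given(s)]
    hits = [s for s in conditioned if event(s)]
    return Fraction(len(hits), len(conditioned))

is_fail = lambda s: s < PASS_MARK
is_even = lambda s: s % 2 == 0

p_fail = prob(is_fail)                             # 60/101
p_fail_given_even = prob(is_fail, given=is_even)   # 30/51
```

Conditioning simply shrinks the sample space to the 51 even scores before counting the failing ones.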
16
Supermarket data
17
Conditional independence
- A generic problem in data mining is finding relationships between variables: is purchasing item A likely to be related to purchasing item B?
- Variables are independent if there is no relationship; otherwise they are dependent.
- X and Y are independent if p(x, y) = p(x) p(y) for all values x and y.
18
Conditional Independence: More than 2 variables
- X is conditionally independent of Y given Z if, for all values of X, Y, Z, we have $p(x, y \mid z) = p(x \mid z)\,p(y \mid z)$.
19
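Conditional Independence: More than 2 variables
- Example: let F be the event that the student fails, E the event that the score is even, and B the event that the score is not 100.
- P(F) = 60/101 but P(F|E) = 30/51, so E and F are dependent.
- If the student's score != 100, then P(F|B) = 60/100, P(E|B) = 1/2, and P(E∩F|B) = 30/100 = 60/100 × 1/2.
- Given B, E and F are independent.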
20
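Conditional Independence: More than 2 variables
- Example: let C be the event that the student's score == 100. Then P(F|C) = 0, P(E|C) = 1, and P(E∩F|C) = 0 = 1 × 0.
- Given C, E and F are independent.
- Now we can calculate P(E∩F) = P(E∩F|B) P(B) + P(E∩F|C) P(C) = 30/100 × 100/101 + 0 = 30/101.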
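The example can be verified by enumeration. The sketch assumes equally likely scores and that "fail" means a score below 60, which reproduces the slide's P(F) = 60/101; the helper `prob` and the lambda names are illustrative.

```python
from fractions import Fraction

scores = list(range(0, 101))

def prob(event, given=lambda s: True):
    """P(event | given) when every score is equally likely."""
    pool = [s for s in scores if given(s)]
    return Fraction(sum(1 for s in pool if event(s)), len(pool))

E = lambda s: s % 2 == 0      # even score
F = lambda s: s < 60          # fail (assumed pass mark of 60)
B = lambda s: s != 100
C = lambda s: s == 100
E_and_F = lambda s: E(s) and F(s)

# Marginally, E and F are dependent: P(E and F) != P(E) P(F).
marginally_independent = prob(E_and_F) == prob(E) * prob(F)

# Conditionally on B, and on C, the product rule holds.
indep_given_B = prob(E_and_F, B) == prob(E, B) * prob(F, B)
indep_given_C = prob(E_and_F, C) == prob(E, C) * prob(F, C)

# Law of total probability over the partition {B, C}.
p_E_and_F = prob(E_and_F, B) * prob(B) + prob(E_and_F, C) * prob(C)
```

Here `marginally_independent` is False while both conditional checks are True, and `p_E_and_F` equals 30/101.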
21
Conditional Independence
- Conditional independence does not imply marginal independence.
- Conversely, X and Y may be unconditionally independent but conditionally dependent given Z.
22
On assuming independence
- Independence is a strong assumption, frequently violated in practice.
- But it simplifies modeling: fewer parameters, more understandable models.
23
Dependence and Correlation
- Covariance measures how X and Y vary together.
  - Large and positive if large X is associated with large Y, and small X with small Y.
  - Negative if large X is associated with small Y.
- Two variables may be dependent but not linearly correlated.
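A toy calculation illustrates all three cases. The data sets below are invented for illustration: one where Y rises with X, one where it falls, and one where Y = X² is fully determined by X yet has zero covariance with it.

```python
from fractions import Fraction

def covariance(pairs):
    """Population covariance of equally likely (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(Fraction(x) for x, _ in pairs) / n
    mean_y = sum(Fraction(y) for _, y in pairs) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in pairs) / n

xs = [-2, -1, 0, 1, 2]
linear = [(x, 3 * x + 1) for x in xs]    # large X goes with large Y
inverse = [(x, -x) for x in xs]          # large X goes with small Y
parabola = [(x, x * x) for x in xs]      # Y depends on X, but not linearly

cov_linear = covariance(linear)       # 6
cov_inverse = covariance(inverse)     # -2
cov_parabola = covariance(parabola)   # 0
```

The parabola case is the one to remember: zero covariance rules out a linear relationship, not dependence.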
24
Correlation and Causation
- Two variables may be highly correlated without a causal relationship between the two.
- Yellow-stained fingers and lung cancer may be correlated, but they are causally linked only through a third variable: smoking.
- Human reaction time and earned income are negatively correlated. This does not mean one causes the other: a third variable, age, is causally related to both.
25
Samples and Statistical inference
- Samples can be used to model the data.
- If the goal is to detect small deviations from the data, the sample size will affect the result.
26
Dual Role of Probability and Statistics in Data Analysis
27
Outline Random Variable Estimate Sampling
Maximum Likelihood Estimation Bayesian Estimation Sampling
28
Estimation
- In inference we want to make statements about the entire population from which the sample is drawn.
- Two important methods for estimating the parameters of a model:
  - Maximum Likelihood Estimation
  - Bayesian Estimation
29
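Desirable properties of estimators
- Let $\hat{\theta}$ be an estimate of the parameter $\theta$.
- Two measures of estimator quality:
  - Bias, the difference between the expected value of the estimate and the true value: $\mathrm{Bias}(\hat{\theta}) = E[\hat{\theta}] - \theta$.
  - Variance of the estimate: $\mathrm{Var}(\hat{\theta}) = E\big[(\hat{\theta} - E[\hat{\theta}])^2\big]$.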
30
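Mean squared error
- The mean of the squared difference between the value of the estimator $\hat{\theta}$ and the true value $\theta$ of the parameter: $E\big[(\hat{\theta} - \theta)^2\big]$.
- The mean squared error can be partitioned as the sum of the squared bias and the variance: $E\big[(\hat{\theta} - \theta)^2\big] = \big(E[\hat{\theta}] - \theta\big)^2 + E\big[(\hat{\theta} - E[\hat{\theta}])^2\big]$.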
31
Mean squared error
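The partition into squared bias and variance follows from adding and subtracting $E[\hat{\theta}]$ inside the square. The derivation below is a standard sketch, writing $\hat{\theta}$ for the estimator and $\theta$ for the true parameter value; the cross term vanishes because $E\big[\hat{\theta} - E[\hat{\theta}]\big] = 0$.

```latex
\begin{aligned}
\mathrm{MSE}(\hat{\theta})
 &= E\big[(\hat{\theta} - \theta)^2\big] \\
 &= E\big[(\hat{\theta} - E[\hat{\theta}] + E[\hat{\theta}] - \theta)^2\big] \\
 &= E\big[(\hat{\theta} - E[\hat{\theta}])^2\big]
    + 2\,(E[\hat{\theta}] - \theta)\,E\big[\hat{\theta} - E[\hat{\theta}]\big]
    + (E[\hat{\theta}] - \theta)^2 \\
 &= \mathrm{Var}(\hat{\theta}) + \mathrm{Bias}(\hat{\theta})^2 .
\end{aligned}
```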
32
Maximum Likelihood Estimation
- The most widely used method for parameter estimation.
- The likelihood function $L(\theta \mid D) = p(D \mid \theta)$ is the probability that the data D would have arisen for a given value of θ.
- The value of θ for which the data have the highest probability is the maximum likelihood estimate (MLE).
33
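Example of MLE for Binomial
- Customers either purchase or do not purchase milk; we want an estimate of the proportion purchasing.
- Binomial with unknown parameter θ.
- Samples x(1), …, x(1000), of which r purchase milk.
- Assuming conditional independence, the likelihood function is $L(\theta) = \prod_{i=1}^{1000} \theta^{x(i)} (1 - \theta)^{1 - x(i)} = \theta^{r} (1 - \theta)^{1000 - r}$.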
34
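Log-likelihood Function
- We want the θ with the highest likelihood; because the logarithm is monotonic, we can maximize the log-likelihood instead: $l(\theta) = \log L(\theta) = r \log \theta + (1000 - r) \log(1 - \theta)$.
- Differentiating and setting equal to zero: $\frac{dl}{d\theta} = \frac{r}{\theta} - \frac{1000 - r}{1 - \theta} = 0$, which gives $\hat{\theta}_{ML} = r / 1000$.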
35
Example of MLE for Binomial
- r milk purchases out of n customers; θ is the probability that milk is purchased by a random customer.
- Three data sets:
  - r = 7, n = 10
  - r = 70, n = 100
  - r = 700, n = 1000
- Uncertainty becomes smaller as n increases.
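The shrinking uncertainty can be made concrete by evaluating the binomial log-likelihood on a grid for the three data sets. This is a rough sketch: the grid search and the "half-maximum width" (the range of θ whose likelihood is at least half the maximum) are illustrative choices, not something the slides prescribe.

```python
import math

def log_likelihood(theta, r, n):
    """Binomial log-likelihood l(theta) = r log(theta) + (n - r) log(1 - theta)."""
    return r * math.log(theta) + (n - r) * math.log(1 - theta)

def summarize(r, n, steps=1000):
    """Return (MLE on a grid, width of the region where L >= half its maximum)."""
    grid = [i / steps for i in range(1, steps)]   # skip theta = 0 and theta = 1
    values = [log_likelihood(t, r, n) for t in grid]
    best = max(values)
    mle = grid[values.index(best)]
    inside = [t for t, v in zip(grid, values) if v >= best + math.log(0.5)]
    return mle, max(inside) - min(inside)

for r, n in [(7, 10), (70, 100), (700, 1000)]:
    mle, width = summarize(r, n)
    print(f"r={r}, n={n}: MLE={mle:.3f}, half-maximum width={width:.3f}")
```

All three data sets give the same MLE, r/n = 0.7, but the region of plausible θ narrows as n grows.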
36
Example of MLE for Binomial
37
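Likelihood under Normal Distribution
- Unit variance, unknown mean θ.
- Likelihood function: $L(\theta) = \prod_{i=1}^{n} (2\pi)^{-1/2} \exp\!\big(-\tfrac{1}{2}(x(i) - \theta)^2\big)$.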
38
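Log-likelihood function
- For a unit-variance normal with unknown mean θ: $l(\theta) = -\frac{n}{2} \log 2\pi - \frac{1}{2} \sum_{i=1}^{n} (x(i) - \theta)^2$.
- To find the MLE, set the derivative $\frac{dl}{d\theta} = \sum_{i=1}^{n} (x(i) - \theta)$ to zero, which gives $\hat{\theta}_{ML} = \frac{1}{n} \sum_{i=1}^{n} x(i)$, the sample mean.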
39
Likelihood under Normal Distribution
- θ is the estimated mean.
- Two randomly generated data sets:
  - 20 data points
  - 200 data points
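A small simulation reproduces the comparison between 20 and 200 points. The sketch assumes unit-variance normal data with true mean 0 and an arbitrary fixed seed; neither value comes from the slides. The MLE of the mean is the sample mean, and the log-likelihood falls by n·d²/2 when θ moves a distance d away from it, so the curve is sharper for the larger sample.

```python
import math
import random

def log_likelihood(theta, data):
    """Log-likelihood of mean theta for unit-variance normal data."""
    n = len(data)
    return -0.5 * n * math.log(2 * math.pi) - 0.5 * sum((x - theta) ** 2 for x in data)

def mle_mean(data):
    """Setting dl/dtheta = sum(x - theta) to zero gives the sample mean."""
    return sum(data) / len(data)

rng = random.Random(0)   # fixed seed, arbitrary choice
for n in (20, 200):
    data = [rng.gauss(0.0, 1.0) for _ in range(n)]   # true mean 0 is an assumption
    theta_hat = mle_mean(data)
    # Drop in log-likelihood when theta moves 0.5 away from the MLE.
    drop = log_likelihood(theta_hat, data) - log_likelihood(theta_hat + 0.5, data)
    print(n, round(theta_hat, 3), round(drop, 3))
```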
40
Likelihood under Normal Distribution
41
Sufficient statistic
- A quantity s(D) is a sufficient statistic for θ if the likelihood l(θ) depends on the data only through s(D).
- No other statistic that can be calculated from the same sample provides any additional information about the value of the parameter.
42
Interval estimate
- A point estimate does not convey the uncertainty associated with it.
- An interval estimate provides a confidence interval.
43
Likelihood under Normal Distribution
44
Mean
45
Variance
46
Outline Random Variable Estimate Sampling
Maximum Likelihood Estimation Bayesian Estimation Sampling
47
Bayesian approach
- Frequentist approach
  - The parameters of the population are fixed but unknown.
  - The data are a random sample.
  - Intrinsic variability lies in the data.
- Bayesian approach
  - The data are known.
  - The parameters θ are random variables.
  - θ has a distribution of values that reflects the degree of belief about where the true parameters may be.
48
Bayesian estimation
- The prior is modified by Bayes' rule: $p(\theta \mid D) = \frac{p(D \mid \theta)\,p(\theta)}{p(D)}$.
- This leads to a distribution rather than a single value.
- A single value can be obtained from the mean or mode.
49
Bayesian estimation
- p(D) is a constant independent of θ, so $p(\theta \mid D) \propto p(D \mid \theta)\,p(\theta)$ for a given data set D and a particular model (model = distributions for the prior and the likelihood).
- If we have only a weak belief about the parameter before collecting data, choose a wide prior (for example, a normal with large variance).
50
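Binomial example
- Single binary variable X: we wish to estimate $\theta = p(X = 1)$.
- The prior for a parameter in [0, 1] is the Beta distribution: $p(\theta) \propto \theta^{\alpha - 1} (1 - \theta)^{\beta - 1}$.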
51
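Binomial example
- Likelihood function for r successes in n trials: $L(\theta) = \theta^{r} (1 - \theta)^{n - r}$.
- Combining the likelihood with a Beta(α, β) prior: $p(\theta \mid D) \propto \theta^{r + \alpha - 1} (1 - \theta)^{n - r + \beta - 1}$.
- We get another Beta distribution, with parameters $r + \alpha$ and $n - r + \beta$.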
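The Beta update amounts to adding counts to the prior parameters. The sketch below uses an illustrative Beta(5, 5) prior and the coin data from earlier in the deck (7 heads in 10 tosses); the function names are hypothetical, and the mean and mode formulas are the standard ones for a Beta distribution.

```python
def beta_posterior(alpha, beta, r, n):
    """Posterior Beta parameters after r successes in n trials."""
    return alpha + r, beta + (n - r)

def beta_mean(alpha, beta):
    return alpha / (alpha + beta)

def beta_mode(alpha, beta):
    """Mode of Beta(alpha, beta); valid when alpha > 1 and beta > 1."""
    return (alpha - 1) / (alpha + beta - 2)

prior = (5, 5)                                   # illustrative prior centred on 0.5
posterior = beta_posterior(*prior, r=7, n=10)    # (12, 8)
print(posterior, beta_mean(*posterior), beta_mode(*posterior))
```

The posterior mean, 0.6, sits between the prior mean 0.5 and the maximum likelihood estimate 0.7; more data would pull it further toward the MLE.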
52
Beta(5,5) and Beta(145,145)
53
Beta(5,5)
54
Beta(45,50)
55
Advantages of Bayesian approach
- Retains full knowledge of all problem uncertainty by calculating the full posterior distribution on θ.
- Natural updating of the distribution.
56
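Predictive distribution
- In the equation that modifies the prior into the posterior, the denominator $p(D) = \int p(D \mid \theta)\,p(\theta)\,d\theta$ is called the predictive distribution of D.
- Useful for model checking: if the observed data have only a small probability under the model, then the model is unlikely to be correct.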
57
Normal distribution example
- Suppose x comes from a normal distribution with unknown mean θ and known variance α: $x \sim N(\theta, \alpha)$.
- The prior distribution for θ is $\theta \sim N(\theta_0, \alpha_0)$.
58
Normal distribution example
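For a normal likelihood with known variance and a normal prior on the mean, the posterior is again normal, with precisions (inverse variances) adding. The function below is a sketch of that standard result; the parameter names `prior_mean`, `prior_var` and `data_var` are illustrative, not notation from the slides.

```python
def normal_posterior(prior_mean, prior_var, data, data_var):
    """Posterior mean and variance of theta for a normal prior and known data variance."""
    n = len(data)
    post_var = 1.0 / (1.0 / prior_var + n / data_var)
    post_mean = post_var * (prior_mean / prior_var + sum(data) / data_var)
    return post_mean, post_var

# One observation x = 2 with unit variance, prior N(0, 1):
# the posterior mean lands halfway between prior mean and observation.
post_mean, post_var = normal_posterior(0.0, 1.0, [2.0], 1.0)   # (1.0, 0.5)
```

With a wide prior (large `prior_var`) the posterior mean approaches the sample mean, matching the advice to choose a wide prior when prior belief is weak.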
59
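Jeffreys prior
- A reference prior.
- Fisher information: $I(\theta \mid x) = -E\left[\frac{\partial^2 \log L(\theta \mid x)}{\partial \theta^2}\right]$.
- Jeffreys prior: $p(\theta) \propto \sqrt{I(\theta \mid x)}$.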
60
Conjugate priors
- p(θ) is a conjugate prior for p(x|θ) if the posterior distribution p(θ|x) is in the same family as the prior p(θ).
- Examples: Beta to Beta, normal distribution to normal distribution.
61
Outline Random Variable Estimate Sampling
Maximum Likelihood Estimation Bayesian Estimation Sampling
62
Sampling in Data Mining
- The data set is not always fit for statistical analysis.
- "Experimental design" in statistics is concerned with optimal ways of collecting data.
- Data miners cannot control the data collection process.
- The data may be ideally suited to the purposes for which they were collected, but not adequate for data mining uses.
63
Sampling in Data Mining
- Two ways in which samples arise:
  - The database is a sample of the population.
  - The database contains every case, but the analysis is based on a sample. This is not appropriate when we want to find unusual records.
64
Why sampling
- Draw a sample from the database that allows us to construct a model reflecting the structure of the data in the database.
- Efficiency: quicker and easier.
- The sample must be representative of the entire database.
65
Systematic sampling
- Tries to ensure representativeness, for example by taking one out of every two records.
- Can lead to problems when there are regularities in the database, for example a data set whose records are of married couples.
66
Random Sampling
- Avoids regularities.
- Epsem sampling (equal probability of selection method): each record has the same probability of being chosen.
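The married-couples problem is easy to reproduce. The sketch below uses a hypothetical database in which each couple is stored as two consecutive records, husband first; taking every second record then selects only husbands, while a simple random sample does not share that bias. All names and sizes are invented for illustration.

```python
import random

# Hypothetical database: couples stored as consecutive (husband, wife) records.
records = [(couple, role) for couple in range(500) for role in ("husband", "wife")]

def systematic_sample(rows, step=2, start=0):
    """Take one out of every `step` records."""
    return rows[start::step]

def simple_random_sample(rows, size, seed=0):
    """Epsem sample: every record has the same chance of selection."""
    return random.Random(seed).sample(rows, size)

def share_of_husbands(sample):
    return sum(1 for _, role in sample if role == "husband") / len(sample)

print(share_of_husbands(systematic_sample(records)))          # 1.0
print(share_of_husbands(simple_random_sample(records, 500)))  # close to 0.5
```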
67
Variance of Mean of Random Sample
- If the variance of a population of size N is $\sigma^2$, the variance of the mean of a simple random sample of size n drawn without replacement is $\frac{\sigma^2}{n}\left(1 - \frac{n}{N}\right)$.
- Usually N >> n, so the second term n/N is small, and the variance decreases as the sample size increases.
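The formula can be checked exactly on a tiny population by enumerating every possible sample. This sketch makes one assumption worth flagging: the population variance is computed with divisor N − 1, the convention under which the identity holds exactly. The six population values are arbitrary.

```python
from fractions import Fraction
from itertools import combinations

def mean(values):
    values = list(values)
    return sum(values, Fraction(0)) / len(values)

def variance(values, ddof):
    """Variance with divisor len(values) - ddof."""
    values = list(values)
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - ddof)

population = [Fraction(v) for v in (2, 3, 5, 7, 11, 13)]   # arbitrary values
N, n = len(population), 3

s2 = variance(population, ddof=1)   # population variance, N - 1 divisor (assumption)

# Every sample of size n without replacement is equally likely.
sample_means = [mean(s) for s in combinations(population, n)]
var_of_mean = variance(sample_means, ddof=0)

formula = s2 / n * (1 - Fraction(n, N))
print(var_of_mean == formula)   # True
```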
68
Example
- 2000 points, population mean = 0.0108.
- Random samples of size n = 10, 100, 1000, each repeated 200 times.
69
Example
70
Stratified Random Sampling
- Split the population into non-overlapping subpopulations, or strata.
- Advantage: enables making statements about each of the subpopulations separately.
- For example, one of the credit card companies we work with categorizes transactions into 26 categories: supermarket, gas station, and so on.
71
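Mean of Stratified Sample
- The total size of the population is N.
- The kth stratum has $N_k$ elements in it, and $n_k$ are chosen for the sample from this stratum.
- The sample mean within the kth stratum is $\bar{x}_k$.
- Estimate of the population mean: $\frac{1}{N} \sum_k N_k \bar{x}_k$.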
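The stratified estimate weights each stratum's sample mean by the stratum's share of the population. The sketch below uses two invented strata; the function name and the numbers are illustrative only.

```python
def stratified_mean(strata):
    """Estimate the population mean from (stratum_size, sample_values) pairs.

    Computes sum_k N_k * mean_k / N, where N is the sum of the stratum sizes.
    """
    total = sum(size for size, _ in strata)
    return sum(size * sum(values) / len(values) for size, values in strata) / total

strata = [
    (800, [10.0, 12.0, 11.0]),   # large stratum, sample mean 11
    (200, [50.0, 54.0]),         # small stratum, sample mean 52
]
estimate = stratified_mean(strata)   # (800 * 11 + 200 * 52) / 1000 = 19.2
```

Pooling the five sampled values without weights would give 27.4, overstating the influence of the small stratum, which is oversampled relative to its size.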
72
Cluster Sampling
- Every cluster contains many elements.
- A simple random sample of elements is not appropriate.
- Select clusters, not elements.