More Parameter Learning, Multinomial and Continuous Variables


More Parameter Learning, Multinomial and Continuous Variables Baran Barut CSE 970 – PATTERN RECOGNITION

OUTLINE

Multinomial Variables
- Learning a Relative Frequency
- Probability Intervals and Regions
- Learning Parameters in a Bayesian Network
- Missing Data Items
- Variances in Computed Relative Frequencies

Continuous Variables
- Normally Distributed Variable
- Multivariate Normally Distributed Variable
- Gaussian Bayesian Networks

Dirichlet distribution: modeling our beliefs concerning relative frequencies. The density of F = (F1, ..., Fr) is

ρ(f1, ..., fr) = Dir(f1, ..., fr; a1, ..., ar) = Γ(N) / (Γ(a1) ⋯ Γ(ar)) · f1^(a1−1) ⋯ fr^(ar−1),

where N = a1 + ⋯ + ar.

Introductory formulas: if we knew that the relative frequency of the k'th outcome is fk, then P(X = k | fk) = fk; a priori, P(X = k) = E(Fk) = ak / N.

The probability of the data set, where D is a multinomial sample of size M governed by F and sk is the number of outcomes in d equal to k:

P(d | f1, ..., fr) = f1^(s1) ⋯ fr^(sr), and so P(d) = Γ(N)/Γ(N + M) · ∏k Γ(ak + sk)/Γ(ak).

How to update the distribution using a data set: the posterior is again Dirichlet,

ρ(f1, ..., fr | d) = Dir(f1, ..., fr; a1 + s1, ..., ar + sr).

Updated probabilities of the outcomes: P(X = k | d) = (ak + sk) / (N + M).
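The Dirichlet update above can be sketched in a few lines; the prior parameters and counts below are illustrative values, not from the slides:

```python
import numpy as np

# Hypothetical prior Dirichlet parameters a_k (equivalent sample size N = sum(a))
a = np.array([2.0, 2.0, 2.0])          # symmetric prior over 3 outcomes
N = a.sum()

# Observed counts s_k from a multinomial sample of size M
s = np.array([5, 1, 2])
M = s.sum()

# Posterior is Dirichlet with parameters a_k + s_k
a_post = a + s

# Predictive probability of the next outcome: (a_k + s_k) / (N + M)
p_next = a_post / (N + M)
print(p_next)                          # posterior predictive probabilities
```

Note that the predictive probabilities always sum to 1, and the prior acts like N pseudo-observations mixed with the M real ones.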

How confident are we about the estimate of the relative frequency fk? The marginal distribution of Fk is beta(fk; ak, N − ak), so we can compute a probability interval containing, say, 95% of the posterior mass.

A multinomial Bayesian network has variables Xi, each with a space of size ri ≥ 2.

Global independence of the Fi's: the parameter sets belonging to different variables are mutually independent. Local independence of the Fij's: for each variable, the parameter sets belonging to different parent instantiations are mutually independent.

Equivalent sample size N: if G, F, and N are specified, then aijk = N / (ri · qi), or equivalently Σk aijk = N / qi, where qi is the number of instantiations of Xi's parents.
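As a quick sketch of the equivalent-sample-size rule (the numbers are made up for illustration):

```python
# Hypothetical node X_i with r_i = 3 values and q_i = 2 parent instantiations,
# and equivalent sample size N = 12
N, r_i, q_i = 12, 3, 2

# Prior Dirichlet parameter for every (j, k) pair: a_ijk = N / (r_i * q_i)
a_ijk = N / (r_i * q_i)
print(a_ijk)   # 2.0

# Sanity check: the parameters for one parent instantiation sum to N / q_i
assert a_ijk * r_i == N / q_i
```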

Normal Distribution

Unknown Mean and Known Variance

Sample of size M:
1. Each outcome X(h) has the real numbers as its range.
2. F = {A, r}: the mean A is unknown, the precision r is known. D is called a normal sample of size M with parameter {A, r}.

Posterior density of A: with prior ρ(a) = N(a; μ, 1/(v·r)), the posterior is

ρ(a | d) = N(a; μ*, 1/(v*·r)), where v* = v + M and μ* = (vμ + M·x̄) / (v + M).

Assumptions about a hypothetical sample: the prior hyperparameters μ and v are interpreted as the mean and size of a hypothetical sample concerning A. The r = 1 case corresponds to unit precision; the v = 0 case corresponds to no prior belief.

Probability of the next outcome:

ρ(x | d) = N(x; μ*, 1/r + 1/(v*·r));

remember that initially ρ(x) = N(x; μ, 1/r + 1/(v·r)).
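The known-precision updating steps above can be sketched as follows; the prior hyperparameters and data are illustrative values:

```python
import numpy as np

# Hypothetical prior: A ~ N(mu, 1/(v*r)) with known precision r,
# v = size of the hypothetical sample, mu = its mean
r, v, mu = 1.0, 2.0, 0.0

# Observed normal sample of size M
d = np.array([1.2, 0.8, 1.0, 1.4])
M, xbar = len(d), d.mean()

# Conjugate update: v* = v + M,  mu* = (v*mu + M*xbar) / (v + M)
v_star = v + M
mu_star = (v * mu + M * xbar) / (v + M)

# Predictive distribution of the next outcome: N(mu*, 1/r + 1/(v* r))
pred_var = 1.0 / r + 1.0 / (v_star * r)
print(mu_star, pred_var)
```

The posterior mean is a precision-weighted compromise between the prior mean μ and the sample mean x̄, and the predictive variance shrinks toward 1/r as v* grows.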

Gamma Distribution: if X1, X2, ..., Xk are k independent random variables, each with distribution N(x; 0, σ²), and V = X1² + X2² + ⋯ + Xk², then V has distribution gamma(v; k/2, 1/(2σ²)).

Known Mean and Unknown Variance

Sample of size M:
1. Each outcome has the real numbers as its range.
2. F = {a, R}: the mean a is known, the precision R is unknown. D is called a normal sample of size M with parameter {a, R}.

Posterior density of R: with prior ρ(r) = gamma(r; α/2, β/2), the posterior is

ρ(r | d) = gamma(r; (α + M)/2, (β + s)/2), where s = Σh (x(h) − a)².
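A minimal sketch of the known-mean, unknown-precision update, assuming illustrative hyperparameter choices for α and β:

```python
import numpy as np

# Known mean a; prior for the precision R is gamma(r; alpha/2, beta/2)
a, alpha, beta = 0.0, 2.0, 2.0

d = np.array([0.5, -0.3, 1.1, -0.9])
M = len(d)
s = ((d - a) ** 2).sum()     # sum of squared deviations from the known mean

# Conjugate update: gamma(r; (alpha + M)/2, (beta + s)/2)
alpha_star = alpha + M
beta_star = beta + s

print(alpha_star / 2, beta_star / 2)   # updated shape and rate parameters
```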

t-distribution

Unknown Mean and Unknown Variance

How to update? v* = v + M, μ* = (vμ + M·x̄)/(v + M), α* = α + M, and β* = β + s + vM(x̄ − μ)²/(v + M), where s = Σh (x(h) − x̄)².

Meaning of the parameters v, μ:
- μ is the mean of the hypothetical sample concerning the value of A.
- v is the size of the hypothetical sample concerning the value of A.

Meaning of the parameter β: β plays the role of s (the sum of squared deviations) for the hypothetical sample.
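Putting the unknown-mean, unknown-variance update together as a sketch; the prior hyperparameters v, μ, α, β and the data are illustrative values:

```python
import numpy as np

# Hypothetical prior hyperparameters: v = hypothetical sample size for A,
# mu = its mean, alpha and beta = gamma parameters for the unknown precision
v, mu, alpha, beta = 1.0, 0.0, 1.0, 1.0

d = np.array([2.1, 1.9, 2.4, 1.6, 2.0])
M, xbar = len(d), d.mean()
s = ((d - xbar) ** 2).sum()          # sum of squared deviations from x-bar

# Normal-gamma conjugate update
v_star = v + M
mu_star = (v * mu + M * xbar) / (v + M)
alpha_star = alpha + M
beta_star = beta + s + (v * M * (xbar - mu) ** 2) / (v + M)

print(v_star, mu_star, alpha_star, beta_star)
```

The extra term vM(x̄ − μ)²/(v + M) in β* accounts for the disagreement between the prior mean and the sample mean.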

bivariate normal distribution

Vector Notation

Positive definite – positive semidefinite: a symmetric n × n matrix A is positive definite if xᵀAx > 0 for every nonzero x ∈ ℝⁿ; it is positive semidefinite if xᵀAx ≥ 0 for every x ∈ ℝⁿ.

Invertibility: if a symmetric n × n matrix A is positive definite, then it is nonsingular; if it is positive semidefinite but not positive definite, then it is singular.

Wishart-distribution

Multivariate t-distribution

Unknown Mean and Unknown Variance

How to update? v* = v + M, μ* = (vμ + M·x̄)/(v + M), α* = α + M, and β* = β + S + vM(x̄ − μ)(x̄ − μ)ᵀ/(v + M), where S = Σh (x(h) − x̄)(x(h) − x̄)ᵀ.

Meaning of the parameters v, μ, β:
- μ is the mean of the hypothetical sample concerning the value of A.
- v is the size of the hypothetical sample concerning the value of A.
- β plays the role of the scatter matrix S for the hypothetical sample.
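The multivariate version of the update can be sketched the same way; the 2-D prior values and data below are illustrative assumptions:

```python
import numpy as np

# Hypothetical prior: v and mu as in the univariate case, alpha = degrees of
# freedom, beta = scatter matrix of the hypothetical sample
v, mu = 1.0, np.zeros(2)
alpha = 3.0
beta = np.eye(2)

d = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0],
              [2.0, 2.0]])
M = d.shape[0]
xbar = d.mean(axis=0)
S = (d - xbar).T @ (d - xbar)        # scatter matrix of the observed sample

# Multivariate conjugate update
v_star = v + M
mu_star = (v * mu + M * xbar) / (v + M)
alpha_star = alpha + M
diff = (xbar - mu).reshape(-1, 1)
beta_star = beta + S + (v * M / (v + M)) * (diff @ diff.T)

print(mu_star)
```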

Gaussian Bayesian network: each node is a linear function of the values of the nodes that precede it in the ordering, plus independent Gaussian noise.

How to find the precision matrix

Complete Gaussian Bayesian Network

Covariance Matrix: what if the b's are 0? Then the nodes are independent and the covariance matrix is diagonal.
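As an illustrative sketch (the coefficients and precisions below are made up), the covariance of a linear-Gaussian network x_i = Σ_j b_ij x_j + w_i can be computed from the coefficient matrix; with all b's zero it reduces to a diagonal matrix:

```python
import numpy as np

# Hypothetical 3-node network in topological order X1 -> X2 -> X3:
# x_i = sum_j B[i, j] * x_j + w_i,  with noise w_i ~ N(0, 1/t_i)
B = np.array([[0.0, 0.0, 0.0],
              [2.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])      # strictly lower triangular coefficients
t = np.array([1.0, 2.0, 4.0])        # precisions of the noise terms

# x = B x + w  =>  x = (I - B)^{-1} w, so
# Cov(x) = (I - B)^{-1} diag(1/t) (I - B)^{-T}
I = np.eye(3)
M = np.linalg.inv(I - B)
cov = M @ np.diag(1.0 / t) @ M.T
print(cov)

# With all b's equal to 0 the covariance is just diag(1/t): independent nodes
print(np.diag(1.0 / t))
```

The precision matrix is the inverse of this covariance, (I − B)ᵀ diag(t) (I − B), which is what the "how to find the precision matrix" step computes.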

How to update? Convert the complete Gaussian Bayesian network to its equivalent multivariate normal distribution, apply the multivariate updates above, and convert back.

Approximations!
- Gaussian Bayesian networks stand for N(x; μ, T⁻¹), whereas x was actually given by the multivariate t-distribution t(x; α, μ, T).
- We don't assign distributions to the Fi's; instead we assess distributions for the random variables A and R.