Ch3: Model Building through Regression

3.1 Introduction

Regression: to model, or to determine through statistical data analysis, the explicit relationship between a set of random variables; mathematically,

$D = f(X_1, X_2, \ldots, X_M)$

where $D$: dependent variable (response); $X_1, \ldots, X_M$: independent variables (regressors).

Regression model:

$d = f(\mathbf{x}) + \varepsilon$

where $\mathbf{x}, d$: sample values of the regressors and the response; $\varepsilon$: expectational error, accounting for the uncertainty of the stochastic environment.

3.2 Linear Regression Model

Linear function $f$ → linear regression model; nonlinear function $f$ → nonlinear regression model.

Problem: the unknown stochastic environment is to be probed using a set of examples $(\mathbf{x}, d)$. Consider the linear regression model

$d_i = \mathbf{w}^T \mathbf{x}_i + \varepsilon_i, \qquad i = 1, 2, \ldots, N$

where $\mathbf{w}$ is the unknown parameter vector of the environment.
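As a concrete illustration, here is a minimal Python sketch of such an environment; the parameter vector w_true, the noise level sigma, and the sizes M and N are assumed values chosen for illustration, not taken from the slides.

import numpy as np

rng = np.random.default_rng(0)

# Assumed environment parameters (illustrative only)
M, N = 3, 200                        # dimension of w, number of examples
w_true = np.array([1.0, -2.0, 0.5])  # hypothetical "true" parameter vector
sigma = 0.3                          # std of the expectational error

# Linear regression model: d_i = w^T x_i + eps_i, with Gaussian iid noise
X = rng.normal(size=(N, M))          # regressors x_i stored as rows
eps = rng.normal(0.0, sigma, size=N) # expectational errors
d = X @ w_true + eps                 # responses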

3.3 ML and MAP Estimation of w

Problem (under a stochastic environment): given the joint statistics of $X$, $D$, $W$, estimate $\mathbf{w}$.

Methods of estimation: (i) maximum likelihood (ML); (ii) maximum a posteriori (MAP).

Note that (i) $\mathbf{x}$ bears no relation to $\mathbf{w}$, and (ii) the information about $\mathbf{w}$ is contained in $d$. We therefore focus on the joint probability density function $p(\mathbf{w}, d \mid \mathbf{x})$.

From the rules of conditional probability (Bayes' theorem),

$p(\mathbf{w}, d \mid \mathbf{x}) = p(\mathbf{w} \mid d, \mathbf{x})\, p(d \mid \mathbf{x}) = p(d \mid \mathbf{w}, \mathbf{x})\, p(\mathbf{w} \mid \mathbf{x})$

where $p(d \mid \mathbf{w}, \mathbf{x})$ is the observation density of the response $d$ due to the regressor $\mathbf{x}$, given the parameter $\mathbf{w}$. It is often reformulated as the likelihood function:

$l(\mathbf{w} \mid d, \mathbf{x}) = p(d \mid \mathbf{w}, \mathbf{x})$

Since $\mathbf{x}$ bears no relation to $\mathbf{w}$, we write $p(\mathbf{w} \mid \mathbf{x}) = \pi(\mathbf{w})$: the prior density of $\mathbf{w}$ before any observation. Let $\pi(\mathbf{w} \mid d)$ denote the posterior density of $\mathbf{w}$ after the observations, and $p(d \mid \mathbf{x})$ the evidence of the information contained in $d$. Bayes' equation then becomes

$\pi(\mathbf{w} \mid d) = \dfrac{l(\mathbf{w} \mid d, \mathbf{x})\, \pi(\mathbf{w})}{p(d \mid \mathbf{x})}$

Maximum likelihood (ML) estimate of $\mathbf{w}$:

$\hat{\mathbf{w}}_{ML} = \arg\max_{\mathbf{w}}\; l(\mathbf{w} \mid d, \mathbf{x})$

Maximum a posteriori (MAP) estimate of $\mathbf{w}$:

$\hat{\mathbf{w}}_{MAP} = \arg\max_{\mathbf{w}}\; l(\mathbf{w} \mid d, \mathbf{x})\, \pi(\mathbf{w})$

or, equivalently,

$\hat{\mathbf{w}}_{MAP} = \arg\max_{\mathbf{w}}\; [\log l(\mathbf{w} \mid d, \mathbf{x}) + \log \pi(\mathbf{w})] \qquad (A)$

ML ignores the prior. How do we come up with an appropriate $\pi(\mathbf{w})$? Consider a Gaussian environment. Let $\mathcal{T} = \{\mathbf{x}_i, d_i\}_{i=1}^{N}$ be the training sample.

Assumptions:

(1) The $N$ samples are statistically independent and identically distributed (iid).
(2) Each expectational error $\varepsilon_i$ is described by a Gaussian density of zero mean and common variance $\sigma^2$, i.e., $\varepsilon_i \sim \mathcal{N}(0, \sigma^2)$.
(3) The elements of $\mathbf{w}$ are iid; each element is governed by a Gaussian density of zero mean and common variance $\sigma_w^2$.

The likelihood function measures the similarity between the response $d_i$ and the model output $\mathbf{w}^T \mathbf{x}_i$, and in turn their difference $\varepsilon_i = d_i - \mathbf{w}^T \mathbf{x}_i$. From Assumption (2),

$l(\mathbf{w} \mid d_i, \mathbf{x}_i) = p(d_i \mid \mathbf{w}, \mathbf{x}_i) = \dfrac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\dfrac{(d_i - \mathbf{w}^T \mathbf{x}_i)^2}{2\sigma^2}\right)$

From Assumption (1), the overall likelihood function is the product over the training sample:

$l(\mathbf{w} \mid d, \mathbf{x}) = \prod_{i=1}^{N} l(\mathbf{w} \mid d_i, \mathbf{x}_i) = \dfrac{1}{(2\pi\sigma^2)^{N/2}} \exp\!\left(-\dfrac{1}{2\sigma^2} \sum_{i=1}^{N} (d_i - \mathbf{w}^T \mathbf{x}_i)^2\right) \qquad (B)$
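In code, the corresponding log-likelihood is just a negative sum of squared errors plus a constant. The sketch below reuses the assumed X, d, and sigma from the earlier synthetic-data sketch.

def log_likelihood(w, X, d, sigma):
    """Gaussian iid log-likelihood log l(w | d, x) for the linear model."""
    resid = d - X @ w                # epsilon_i = d_i - w^T x_i
    n = len(d)
    return (-0.5 * n * np.log(2.0 * np.pi * sigma**2)
            - np.sum(resid**2) / (2.0 * sigma**2))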

From Assumption (3), the prior density of $\mathbf{w}$ is

$\pi(\mathbf{w}) = \dfrac{1}{(2\pi\sigma_w^2)^{M/2}} \exp\!\left(-\dfrac{\|\mathbf{w}\|^2}{2\sigma_w^2}\right) \qquad (C)$

where $M$ is the dimension of $\mathbf{w}$ and $\|\mathbf{w}\|^2 = \sum_{k=1}^{M} w_k^2$.

Substituting Eqs. (B) and (C) into Eq. (A) and dropping the terms independent of $\mathbf{w}$,

$\hat{\mathbf{w}}_{MAP} = \arg\max_{\mathbf{w}} \left[-\dfrac{1}{2\sigma^2} \sum_{i=1}^{N} (d_i - \mathbf{w}^T \mathbf{x}_i)^2 - \dfrac{1}{2\sigma_w^2} \|\mathbf{w}\|^2\right]$

Setting the gradient of the bracketed objective with respect to $\mathbf{w}$ to zero yields the closed-form solution below.
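As a hedged sketch, the bracketed objective can be written down directly, negated so that standard minimizers apply; sigma_w is the assumed prior standard deviation.

def neg_log_posterior(w, X, d, sigma, sigma_w):
    """Negative MAP objective (constants dropped): data term + prior term."""
    resid = d - X @ w
    return (np.sum(resid**2) / (2.0 * sigma**2)
            + np.dot(w, w) / (2.0 * sigma_w**2))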

Let $\lambda = \sigma^2 / \sigma_w^2$. We obtain

$\hat{\mathbf{w}}_{MAP} = (\mathbf{R}_{xx} + \lambda \mathbf{I})^{-1} \mathbf{r}_{dx}$

where

$\mathbf{R}_{xx} = \sum_{i=1}^{N} \mathbf{x}_i \mathbf{x}_i^T$ : time-averaged correlation matrix of the regressors;

$\mathbf{r}_{dx} = \sum_{i=1}^{N} \mathbf{x}_i d_i$ : time-averaged cross-correlation vector between regressor and response.

If $\sigma_w^2$ is large, the prior distribution of each element of $\mathbf{w}$ is close to uniform and $\lambda$ is close to zero, so the MAP estimate reduces to the ML estimate:

$\hat{\mathbf{w}}_{ML} = \mathbf{R}_{xx}^{-1} \mathbf{r}_{dx}$

The ML estimator is unbiased, i.e., $E[\hat{\mathbf{w}}_{ML}] = \mathbf{w}$, while the MAP estimator is biased.
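A minimal numerical check of these closed forms, continuing the assumed synthetic data above (sigma_w = 10 is an assumed prior standard deviation, made large so that lambda is small):

Rxx = X.T @ X                        # time-averaged correlation matrix
rdx = X.T @ d                        # time-averaged cross-correlation vector

w_ml = np.linalg.solve(Rxx, rdx)     # ML estimate: Rxx^{-1} rdx

sigma_w = 10.0                       # assumed prior std; large => lambda ~ 0
lam = sigma**2 / sigma_w**2
w_map = np.linalg.solve(Rxx + lam * np.eye(M), rdx)

print(np.linalg.norm(w_map - w_ml))  # small: MAP is close to ML here

# Optional cross-check: a generic minimizer of the negative log-posterior
# should land on the same point as the closed form.
from scipy.optimize import minimize
res = minimize(neg_log_posterior, np.zeros(M), args=(X, d, sigma, sigma_w))
print(np.allclose(res.x, w_map, atol=1e-4))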

3.4 Relationship between Regularized LS and MAP Estimation of w

Least squares (LS) estimation: define the cost function as

$\mathcal{E}(\mathbf{w}) = \dfrac{1}{2} \sum_{i=1}^{N} (d_i - \mathbf{w}^T \mathbf{x}_i)^2$

Solving for $\mathbf{w}$ by minimizing $\mathcal{E}(\mathbf{w})$, we obtain

$\hat{\mathbf{w}}_{LS} = \mathbf{R}_{xx}^{-1} \mathbf{r}_{dx}$

which is the same solution as the ML one. Now modify the cost function to

$\mathcal{E}_{reg}(\mathbf{w}) = \dfrac{1}{2} \sum_{i=1}^{N} (d_i - \mathbf{w}^T \mathbf{x}_i)^2 + \dfrac{\lambda}{2} \|\mathbf{w}\|^2$

where the added term $\frac{\lambda}{2}\|\mathbf{w}\|^2$ is the structural regularizer.

Solving for $\mathbf{w}$ by minimizing $\mathcal{E}_{reg}(\mathbf{w})$, we obtain

$\hat{\mathbf{w}} = (\mathbf{R}_{xx} + \lambda \mathbf{I})^{-1} \mathbf{r}_{dx}$

which is the same solution as the MAP one. Regularized LS with a quadratic regularizer is therefore equivalent to MAP estimation under a zero-mean Gaussian prior on $\mathbf{w}$.
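One way to confirm the equivalence numerically is the standard augmented-data trick: minimizing the regularized cost is an ordinary LS fit on X stacked with sqrt(lambda) times the identity. This sketch continues the assumed variables (X, d, M, lam, w_map) from the blocks above.

# Regularized LS via augmented data: the normal equations of this LS fit
# are exactly (Rxx + lam*I) w = rdx, i.e., the MAP closed form.
X_aug = np.vstack([X, np.sqrt(lam) * np.eye(M)])
d_aug = np.concatenate([d, np.zeros(M)])
w_reg, *_ = np.linalg.lstsq(X_aug, d_aug, rcond=None)

print(np.allclose(w_reg, w_map))     # True: regularized LS == MAP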