Recap: Conditional Exponential Model

Recap: Conditional Exponential Model
- Prediction probability: p(y|x) = exp(w_y · x + c_y) / Σ_y' exp(w_y' · x + c_y')
- Model parameters: for each class y, a weight vector w_y and a threshold c_y
- Maximum likelihood estimation: choose the parameters that maximize the conditional log-likelihood Σ_i log p(y_i|x_i) of the training data
- Translation invariance: adding the same vector to every w_y and the same constant to every c_y leaves p(y|x) unchanged, so the parameters are not uniquely identified
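As a concrete illustration, here is a minimal NumPy sketch of the prediction probability (a softmax over the per-class scores w_y · x + c_y); the weights, thresholds, and input are made-up numbers, and the final check demonstrates the translation invariance noted above.

```python
import numpy as np

def predict_proba(x, W, c):
    """Conditional exponential model: p(y|x) proportional to exp(w_y . x + c_y).
    W holds one weight row per class, c one threshold per class."""
    scores = W @ x + c
    scores -= scores.max()          # shift for numerical stability
    e = np.exp(scores)
    return e / e.sum()

# toy model with 3 classes and 2 input features (hypothetical numbers)
W = np.array([[1.0, -0.5], [0.2, 0.3], [-1.0, 1.0]])
c = np.array([0.1, 0.0, -0.2])
x = np.array([0.5, 2.0])

p = predict_proba(x, W, c)

# translation invariance: shifting every w_y by the same vector v and
# every c_y by the same scalar b adds the same constant to every score,
# so the softmax output is unchanged
v, b = np.array([3.0, -1.0]), 5.0
p_shifted = predict_proba(x, W + v, c + b)
print(np.allclose(p, p_shifted))   # True
```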

Modified Conditional Exponential Model
- Remove the redundancy by anchoring one class: set w_1 to the zero vector and c_1 to zero
- Prediction probability: p(y|x) = exp(w_y · x + c_y) / (1 + Σ_{y' ≠ 1} exp(w_y' · x + c_y')), where the 1 in the denominator is the term for class 1
- Model parameters are again estimated by maximum likelihood
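A short sketch (with made-up parameters) showing that the anchoring can be applied to any fitted model: subtracting the class-1 parameters from every class zeroes out w_1 and c_1 while leaving the predicted probabilities unchanged.

```python
import numpy as np

def predict_proba(x, W, c):
    """Softmax over per-class scores w_y . x + c_y."""
    s = W @ x + c
    e = np.exp(s - s.max())
    return e / e.sum()

# hypothetical 3-class model
W = np.array([[1.0, -0.5], [0.2, 0.3], [-1.0, 1.0]])
c = np.array([0.1, 0.0, -0.2])
x = np.array([0.5, 2.0])

# anchor class 1: w_1 = 0, c_1 = 0 (subtract class-1 parameters everywhere)
W_anchored = W - W[0]
c_anchored = c - c[0]

print(np.allclose(predict_proba(x, W, c),
                  predict_proba(x, W_anchored, c_anchored)))  # True
```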

MaxEnt for Classification Problems
- Favor uniform distributions: among all distributions consistent with the training data, pick the one that maximizes the entropy H(p) = -Σ p log p
- Consistency with the training data is expressed as constraints: the expectation of each input feature under the model must equal its empirical average
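The "favor uniform distributions" intuition can be made concrete: over a fixed set of outcomes, entropy is largest for the uniform distribution. A small check with two made-up distributions over 5 outcomes:

```python
import numpy as np

def entropy(p):
    """Shannon entropy H(p) = -sum p log p (0 log 0 treated as 0)."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return -(nz * np.log(nz)).sum()

uniform = [0.2] * 5
peaked  = [0.8, 0.05, 0.05, 0.05, 0.05]

print(entropy(uniform))  # ln(5) = 1.609..., the maximum for 5 outcomes
print(entropy(peaked))   # strictly smaller
```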

Translation Problem
- Goal: estimate the probabilities p(dans), p(en), p(au-cours-de), p(à), p(pendant) of the candidate French translations
- Represent each French word with two binary features: f1 = 1 if the word is in {dans, en}, f2 = 1 if the word is in {dans, à}

Word              f1 ({dans, en})   f2 ({dans, à})
dans                    1                 1
en                      1                 0
au-cours-de             0                 0
à                       0                 1
pendant                 0                 0
Empirical average      0.3               0.5

Constraints
- Normalization: p(dans) + p(en) + p(au-cours-de) + p(à) + p(pendant) = 1
- Feature 1: p(dans) + p(en) = 0.3
- Feature 2: p(dans) + p(à) = 0.5
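The three constraints do not pin down a single distribution; they leave two degrees of freedom. A small sketch (a hypothetical parameterization, not from the slides) makes this explicit: fix d = p(dans), then p(en) and p(à) are determined, and the leftover mass 0.2 + d is split between au-cours-de and pendant.

```python
def feasible(d, split=0.5):
    """One distribution satisfying the three constraints, given
    d = p(dans) in (0, 0.3) and a split of the leftover mass."""
    rest = 1.0 - d - (0.3 - d) - (0.5 - d)   # = 0.2 + d
    return {"dans": d, "en": 0.3 - d, "à": 0.5 - d,
            "au-cours-de": split * rest, "pendant": (1.0 - split) * rest}

p = feasible(0.1)
assert abs(sum(p.values()) - 1.0) < 1e-9         # normalization
assert abs(p["dans"] + p["en"] - 0.3) < 1e-9     # feature 1 constraint
assert abs(p["dans"] + p["à"] - 0.5) < 1e-9      # feature 2 constraint
```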

Maximum Entropy Formulation for the Translation Problem
- Maximize the entropy H(p) = -Σ_w p(w) log p(w) subject to the three constraints
- Solution: p(dans) = 0.2, p(à) = 0.3, p(en) = 0.1, p(au-cours-de) = 0.2, p(pendant) = 0.2
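A quick sanity check in plain Python (probability values taken from the slide, comparison distributions made up) confirms that the stated solution satisfies all three constraints and has higher entropy than other distributions that also satisfy them:

```python
import math

# solution from the slide
p = {"dans": 0.2, "à": 0.3, "en": 0.1, "au-cours-de": 0.2, "pendant": 0.2}

# it satisfies all three constraints
assert abs(sum(p.values()) - 1.0) < 1e-9
assert abs(p["dans"] + p["en"] - 0.3) < 1e-9
assert abs(p["dans"] + p["à"] - 0.5) < 1e-9

def H(q):
    """Entropy of a distribution given as a dict of probabilities."""
    return -sum(v * math.log(v) for v in q.values() if v > 0)

# two other distributions that also satisfy the constraints
# (alternatives made up for comparison)
alt1 = {"dans": 0.2, "à": 0.3, "en": 0.1, "au-cours-de": 0.3, "pendant": 0.1}
alt2 = {"dans": 0.1, "à": 0.4, "en": 0.2, "au-cours-de": 0.15, "pendant": 0.15}
assert H(p) > H(alt1) and H(p) > H(alt2)
print(round(H(p), 3))  # 1.557
```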