Constrained Optimization for Validation-Guided Conditional Random Field Learning
Minmin Chen, Yixin Chen, Michael Brent, Aaron Tenney
Washington University in St. Louis
KDD 2009. Presented by: Qiang Yang, HKUST

Conditional Random Fields
Conditional Random Fields (Lafferty, McCallum & Pereira, 2001): a probabilistic model for segmenting and labeling sequential data.
[Figure: a labeling sequence aligned with an observation sequence (DNA bases such as G, A, T, C), with features linking the two.]
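For reference, the linear-chain CRF of Lafferty et al. defines the conditional probability of a labeling sequence y given an observation sequence x as follows (standard textbook form; the slide itself shows only the figure):

```latex
P(y \mid x; W) = \frac{1}{Z(x)} \exp\!\Big(\sum_{t=1}^{T} \sum_{k} w_k\, f_k(y_{t-1}, y_t, x, t)\Big),
\qquad
Z(x) = \sum_{y'} \exp\!\Big(\sum_{t=1}^{T} \sum_{k} w_k\, f_k(y'_{t-1}, y'_t, x, t)\Big)
```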

Applications
Natural Language Processing: Lafferty, McCallum & Pereira (2001); Sha & Pereira (2003); Sarawagi & Cohen (2004)
Computer Vision: Sminchisescu, Kanaujia & Metaxas (2005); Vishwanathan et al. (2006)
Bioinformatics: Culotta, Kulp & McCallum (2005); Gross et al. (2007)

Challenges and Related Work
Overfitting:
Likelihood training is prone to overfitting.
Model flexibility and a large feature set make overfitting worse.
Related work:
Regularization (Sha 2003; Vail 2007)
Smoothing methods (Newman 1977; Rosenfeld 1996)
Regularization with a Gaussian prior works as well as or better than smoothing methods, but the regularization parameter is hard to tune.
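The Gaussian-prior regularizer referenced above is the standard penalized conditional log-likelihood (a standard formula, not reproduced on the slide), where sigma is the parameter that is hard to tune:

```latex
\max_{W} \; \mathcal{L}(W) - \frac{\lVert W \rVert^{2}}{2\sigma^{2}},
\qquad
\mathcal{L}(W) = \sum_{i} \log P\big(y^{(i)} \mid x^{(i)}; W\big)
```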

Motivation of the Proposed Work
Cross-validation is often used to estimate the accuracy of a classifier and to select models.
Generally, performance on the validation set is strongly correlated with the trained model's performance on unseen data.
Constraints prescribing the performance of the trained model on the validation set are used to guide the learning process and keep the model from fitting too freely and tightly to the training data.

Single Training Multiple Validation (STMV) Framework
[Figure: two data layouts, one for a small data set and one for a large data set, each partitioned into a training set, a testing set, and multiple validation sets V1, V2, V3.]
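A minimal sketch of what an STMV-style partition might look like (hypothetical helper with assumed contiguous, disjoint splits; the paper's exact splitting scheme is not recoverable from the slide):

```python
def stmv_split(sequences, n_validation=3, test_frac=0.25, val_frac=0.25):
    """Partition a list of sequences into one training set, one testing set,
    and n_validation disjoint validation sets (an STMV-style layout)."""
    n = len(sequences)
    n_test = int(n * test_frac)
    n_val_each = int(n * val_frac) // n_validation
    testing = sequences[:n_test]
    validations = [
        sequences[n_test + i * n_val_each : n_test + (i + 1) * n_val_each]
        for i in range(n_validation)
    ]
    training = sequences[n_test + n_validation * n_val_each:]
    return training, testing, validations
```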

Constrained Formulation
Original objective: maximize the log-likelihood of the labeling sequences given the observation sequences of the training data.
Constraints prescribe, for each validation sequence v(j), the difference between two scores: the score of the most likely labeling sequence found by Viterbi under the current model, and the score of the true labeling sequence. This forces the model to take its performance on the validation sets into consideration.
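The formulation itself did not survive the transcript; the following is a plausible reconstruction from the slide's description (notation assumed, not verbatim from the paper), where S(y, x; W) denotes the score of labeling y for sequence x, y-hat^(j) is the Viterbi-optimal labeling of validation sequence v^(j) under the current model, y^(j) is its true labeling, and tau_j bounds the allowed score gap:

```latex
\begin{aligned}
\max_{W}\;& \sum_{i} \log P\big(y^{(i)} \mid x^{(i)}; W\big) \\
\text{s.t.}\;& S\big(\hat{y}^{(j)}, v^{(j)}; W\big) - S\big(y^{(j)}, v^{(j)}; W\big) \le \tau_{j}, \qquad j = 1, \dots, m
\end{aligned}
```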

Extended Saddle Point (ESP) Theory
Extended Saddle Point Theory (Wah & Chen, 2005):
Introduces a necessary and sufficient condition for a Constrained Local Minimum (CLM) of a constrained nonlinear programming problem in a continuous, discrete, or mixed space.
Offers several salient advantages over previous constraint-handling theories:
It does not require the constraints to be differentiable or in closed form.
It is satisfied over an extended region of penalty values.
It is both necessary and sufficient.
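Paraphrasing the "extended region of penalty values" property (notation assumed, not shown on the slide): a point x* is a CLM if and only if there exist finite penalty multipliers alpha* such that x* locally minimizes the penalty function for every alpha at least as large as alpha*:

```latex
x^{*} \text{ is a CLM}
\iff
\exists\, \alpha^{*} \ge 0 \text{ finite such that } \forall\, \alpha \ge \alpha^{*}:\;
x^{*} \in \arg\min_{x \in \mathcal{N}(x^{*})}
\Big[ f(x) + \sum_{j} \alpha_{j} \max\big(0,\, g_{j}(x)\big) \Big]
```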

Extended Saddle Point (ESP) Search Algorithm
Transforms the constrained formulation into a penalty form (the penalty function was not preserved in the transcript).
Outer loop: updates the extended penalty values.
Inner loop: minimizes the penalty function over W.
Challenge: efficient calculation of the gradient of the penalty function. The first term of each constraint is determined by the most likely sequence found by the Viterbi algorithm, and a change of the parameters W can result in a very different sequence, so the constraint is non-differentiable.
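A minimal sketch of the two-loop structure described above (hypothetical function names; the gradient and penalty-update rules are simplified relative to the paper):

```python
import numpy as np

def esp_search(loss_grad, constraints, W, alpha_step=1.0,
               lr=0.01, inner_iters=50, outer_iters=20):
    """Schematic two-loop extended saddle-point search.

    loss_grad(W): gradient of the negative training log-likelihood at W.
    constraints:  list of functions g_j(W) -> (violation, subgradient),
                  where the constraint is satisfied when violation <= 0.
    """
    alphas = np.zeros(len(constraints))  # extended penalty multipliers
    for _ in range(outer_iters):
        # Inner loop: minimize the penalty function over W for fixed alphas.
        for _ in range(inner_iters):
            grad = loss_grad(W)
            for alpha, g in zip(alphas, constraints):
                violation, subgrad = g(W)
                if violation > 0:        # only violated constraints contribute
                    grad = grad + alpha * subgrad
            W = W - lr * grad            # gradient step on the penalty function
        # Outer loop: raise penalties on constraints that remain violated.
        for j, g in enumerate(constraints):
            violation, _ = g(W)
            if violation > 0:
                alphas[j] += alpha_step
    return W
```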

Approximation of the Discontinuous Gradient
[Figure: a trellis over states and time; annotations mark the highest probability of reaching a given state at time t and a feature value f(s1, s2, X). The details of the approximation were not preserved in the transcript.]

Experimental Results: Gene Prediction
Task: find the protein-coding regions and their associated components.
The STMV framework and ESP search algorithm are applied to CONTRAST, a state-of-the-art gene predictor.
Data set: fruit fly genome, 27,463 genes, evenly divided into 4 sets.
The feature set includes around 33,000 features.
Performance is compared to the original CONTRAST and to CONTRAST with regularization, using sensitivity and specificity at the gene, exon, and nucleotide levels.

Experimental Results: Gene Prediction (cont.)
[Table: performance of the original CRF, the regularized CRF (CRFr), and the constrained CRF (CRFc) on CONTRAST; values not preserved in the transcript.]

Experimental Results: Stock Price Prediction
Task: predict whether tomorrow's stock price will rise or fall relative to today's, based on historical data.
Preprocessing techniques smooth out the noise in the raw data.
Data set: 1,741 stocks from NASDAQ and NYSE; each contains stock prices over a fixed date range (the dates were not preserved in the transcript).
The feature set includes 2,000 features.
To predict day T+1: the training sequence is day 1 through day T, and the validation sequence is day T-V+1 through day T (V = 100).
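A minimal sketch of the window layout described above (hypothetical variable names; V = 100 as stated on the slide):

```python
def stock_windows(prices, t, v=100):
    """For predicting day t+1, use days 1..t as the training sequence and
    the most recent v days (t-v+1 .. t) as the validation sequence."""
    assert t >= v, "need at least v days of history"
    training = prices[:t]          # days 1 .. T (0-indexed: 0 .. t-1)
    validation = prices[t - v:t]   # days T-V+1 .. T
    target_day = t                 # day T+1 (0-indexed: t)
    return training, validation, target_day
```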

Experimental Results: Stock Price Prediction (cont.)
[Figure: stock price prediction results; content not preserved in the transcript.]

Conclusion
Single Training Multiple Validation (STMV) framework: integrates validation into the training process by modeling validation quality as constraints in the problem formulation, effectively reducing overfitting of CRF models.
Approximation scheme: efficiently approximates the discontinuous gradient of the constrained formulation.
Extended Saddle Point (ESP) search algorithm: robustly finds a constrained local minimum of the constrained formulation.

Questions?