Constrained Optimization for Validation-Guided Conditional Random Field Learning Minmin Chen , Yixin Chen , Michael Brent , Aaron Tenney Washington University in St. Louis 19/4/2015KDD 2009 Presented by: Qiang Yang, HKUST
Conditional Random Fields Conditional Random Fields ( Lafferty, McCallum& Pereira 2001 ) Probabilistic model for sequential data segmentation and labeling GAT 12 CG Labeling Sequence Observation Sequence 9/4/2015KDD 2009 FEATURE
Applications Natural Language Processing Lafferty J., McCallum A.& Pereira F Sha F. & Pereira F Sarawagi S. & Cohen W. W Computer Vision Sminchisescu C., Kanaujia A. & Metaxas D Vishwanathan S. V. N. et al Bioinformatics Culotta A., Kulp D. & McCallum A Gross, S. S. et al /4/2015KDD 2009
Challenges and Related Work Overfitting Likelihood training Overfitting Model flexibility Large feature set Overfitting Related Work Regularization (Sha 2003, Vail 2007) Smoothing methods (Newman 1977, Rosenfeld 1996) Regularization with a Gaussian prior works as well as or better than smoothing methods, but the regularization parameter is hard to tune 9/4/2015KDD 20094
Motivation of Proposed Work Cross-validation is often used to estimate the accuracy of a classifier and to select models. Generally, the performance on the validation set is strongly correlated to the performance of the trained model on the unseen data Constraints prescribing the performance of the trained model on the validation set are used to guide the learning process and avoid the model from fitting freely and tightly to the training data 9/4/2015KDD 20095
Single Training Multiple Validation (STMV) Framework STMV Framework Small data set Large data set 9/4/2015KDD Training Testing V1 V2 V3 TrainingTestingV1 V2V3
Constrained Formulation Where Constraints prescribing the difference of scores to ensure the model take consideration of the performance on the validation sets 9/4/2015KDD Original objective : maximize the log likelihood of the labeling sequence given the observation sequence of the training data Score of a specific labeling sequence y for an observation sequence x Score of the most likely sequence found by Viterbi under current model for validation sequence v(j) Score of real labeling sequence for validation sequence v(j)
Extended Saddle Point (ESP) Theory Extended Saddle Point Theory ( Wah & Chen 2005 ) Introduces a necessary and sufficient condition on the Constrained Local Minimum (CLM) of a constrained nonlinear programming problem in a continuous, discrete, or mixed space Offer several salient advantages over previous constraint handling theory Does not require the constraints to be differentiable or in closed form Satisfied over an extended region of penalty values Necessary and sufficient 9/4/2015KDD 20098
Extended Saddle Point (ESP) Search Algorithm Extended Saddle Point Search Algorithm Transform the constrained formulation into a penalty form where Outer loop updates the extended penalty values Inner loop minimizes Challenges Efficient calculation of the gradient of the penalty function The first term of the constraint is determined by the most likely sequence found by the Viterbi Algorithm. Change of the parameter W can result in a very different sequence non-differentiable 9/4/2015KDD 20099
Approximation of the discontinuous Gradient 9/4/2015KDD Highest prob. Of reaching this state at time t f(s1, s2, X)
Experimental Results : Gene Prediction Find out the protein coding regions and associated components Apply our STMV framework and ESP search algorithm on CONTRAST, a state-of-the-art gene predictor Data set: Fruit fly genome, 27,463 genes, evenly divided into 4 sets Feature set includes around 33,000 features Compare performance to original CONTRAST, and CONTRAST with regularization Gene, exon, nucleotide level sensitivity and specificity 9/4/2015KDD
Experimental Results : Gene Prediction(cont) The performance of original CRF, regularized CRF(CRFr) and constrained CRF(CRFc) on CONTRAST 9/4/2015KDD
Experimental Result: Stock Price Prediction Predict if tomorrow’s stock price will raise or fall comparing to today’s, based on historical data; Preprocessing techniques to smooth out the noise in the raw data Date set: 1741 stocks from NASDAQ and NYSE Each contains stock prices from to Feature set includes 2,000 features Predict on day T+1: Training Sequence : day 1 ~ T Validation Sequence: day T-V+1 ~ T (V = 100) 9/4/2015KDD
Experimental Result: Stock Price Prediction(cont) 9/4/2015KDD
Conclusion A Single Training Multiple Validation (STMV) framework Integrate validation into the training process by modeling the validation quality as constraints in the problem formulation Effectively avoid overfitting of CRF models Approximation Scheme Efficiently approximate the discontinuous gradient of our constrained formulation Extended Saddle Point (ESP) search algorithm Robustly find a constrained local minimum for our constrained formulation 9/4/2015KDD
Questions? 9/4/2015KDD