Training Conditional Random Fields using Virtual Evidence Boosting
Lin Liao, Tanzeem Choudhury†, Dieter Fox, and Henry Kautz
University of Washington; †Intel Research

Introduction
- Goal: develop an efficient feature selection and parameter estimation technique for Conditional Random Fields (CRFs).
- Application domain: learn human activity models from continuous, multi-modal sensory inputs.

Application: Human Activity Recognition
- Model human activities and select discriminatory features from multimodal sensor data; sensors include accelerometer, audio, light, temperature, etc.
- The model infers a context sequence and an activity sequence.

Approaches to Training Conditional Random Fields (CRFs)

Maximum Likelihood (ML)
- Runs numerical optimization to find the optimal weights, which requires inference at each iteration.
- Inefficient for complex structures.
- Inadequate for continuous observations and feature selection.

Maximum Pseudo-Likelihood (MPL)
- Converts the CRF into separate patches; each consists of a hidden node and the true values of its neighbors.
- Runs ML learning on the separate patches.
- Efficient, but may over-estimate inter-dependencies.
- Inadequate for continuous observations and feature selection.

Our Approach: Virtual Evidence Boosting (VEB)
- Converts the CRF into separate patches; each consists of a hidden node and the virtual evidence of its neighbors.
- Alternates between boosting (to select features) and belief propagation (to update the virtual evidence).
- An efficient, unified approach to feature selection and parameter estimation.
- Suitable for both discrete and continuous observations.

Algorithms

Extension of LogitBoost with Virtual Evidence
- Traditional boosting algorithms assume feature values are deterministic.
- We extend the LogitBoost algorithm to handle virtual evidence, i.e., a feature can also be a likelihood value or a probability distribution.

  INPUTS:  training samples
  OUTPUT:  F (linear combination of features)
  FOR each iteration
      FOR each sample
          Compute likelihood
          Compute sample weight
          Compute working response
      END
      Obtain the best weak learner by solving a weighted least-squares problem
      Add the weak learner to F
  END

Virtual Evidence Boosting for CRFs

  INPUTS:  structure of the CRF and training samples
  OUTPUT:  F (linear combination of features)
  FOR each iteration
      Run belief propagation using the current F to get virtual evidence ve(x_i, n(y_i))
      FOR each sample
          Compute likelihood
          Compute sample weight
          Compute working response
      END
      Obtain the best weak learner by solving a weighted least-squares problem
      Add the weak learner to F
  END

Experiments

Boosted Random Fields versus VEB
- The closest related work to VEB is Boosted Random Fields (BRFs) (Torralba et al. 2004).
- BRFs combine boosting and belief propagation, but assume a dense graph structure and weak pairwise influence.
- We compare the two approaches as the pairwise influence changes; VEB performs significantly better when the relations are strong.

Feature Selection
- VEB can be used to extract sparse structure from complex models. In this experiment it finds the exact order of a high-order HMM, and thus outperforms the other learning alternatives.

Indoor Activities
- Activities: computer usage, meal, TV, meeting, and sleeping.
- Linear-chain CRF with 315 continuous input features.
- 1100 minutes of data over 12 days.

  Training algorithm        Average accuracy
  VEB                       94.1%
  BRF                       88.0%
  ML + all observations     87.7%
  ML + boosting             88.5%
  MPL + all observations    87.9%
  MPL + boosting            88.5%

Physical Activities and Spatial Contexts
- Contexts: indoors, outdoors, and vehicles.
- Activities: stationary, walking, running, driving, and going up/down stairs.
- Approximately 650 continuous input features.
- 400 minutes of data over 12 episodes.

  Training algorithm        Average accuracy
  VEB                       88.8%
  MPL + all observations    72.1%
  MPL + boosting            70.9%
  HMM + AdaBoost            85.8%
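To make the boxed procedures concrete, here is a minimal, self-contained sketch (not the authors' code) of LogitBoost extended to accept virtual evidence for a single binary label. Each feature value is supplied as a distribution over values rather than a hard observation, and the weak learner sees the expected (soft) indicator under that distribution. The belief-propagation step of full VEB is omitted, so this corresponds to the first algorithm box only; all names (`logitboost_ve`, `_wls`, `predict`) and the stump-style weak learner are illustrative assumptions.

```python
import math

def _wls(x, z, w):
    """Weighted least squares for z ~ a*x + b."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    mz = sum(wi * zi for wi, zi in zip(w, z)) / sw
    var = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    cov = sum(wi * (xi - mx) * (zi - mz) for wi, xi, zi in zip(w, x, z))
    a = cov / var if var > 1e-12 else 0.0
    return a, mz - a * mx

def logitboost_ve(samples, labels, n_iters=50):
    """LogitBoost with virtual evidence (sketch, binary labels 0/1).

    Each sample is a list of features; a feature is a dict mapping
    feature value -> probability (virtual evidence). A hard observation
    is the special case {value: 1.0}. Weak learners are linear stumps
    f(x) = a * P(feature == v) + b on one feature value.
    Returns the ensemble F as a list of (feat_idx, value, a, b).
    """
    n = len(samples)
    ensemble = []
    F = [0.0] * n  # current score for each training sample
    for _ in range(n_iters):
        # LogitBoost step: likelihood p, sample weight w, working response z.
        p = [1.0 / (1.0 + math.exp(-2.0 * Fi)) for Fi in F]
        w = [max(pi * (1.0 - pi), 1e-9) for pi in p]
        z = [(labels[i] - p[i]) / w[i] for i in range(n)]
        # Best weak learner by weighted least squares; the learner's input
        # is the expected indicator under virtual evidence, E[1{feat == v}].
        best = None
        for j in range(len(samples[0])):
            values = set()
            for s in samples:
                values.update(s[j].keys())
            for v in values:
                x = [s[j].get(v, 0.0) for s in samples]  # soft indicator
                a, b = _wls(x, z, w)
                err = sum(w[i] * (z[i] - (a * x[i] + b)) ** 2 for i in range(n))
                if best is None or err < best[0]:
                    best = (err, j, v, a, b)
        _, j, v, a, b = best
        ensemble.append((j, v, a, b))
        # Add the weak learner to F (halved, as in standard LogitBoost).
        for i in range(n):
            F[i] += 0.5 * (a * samples[i][j].get(v, 0.0) + b)
    return ensemble

def predict(ensemble, sample):
    F = sum(0.5 * (a * sample[j].get(v, 0.0) + b) for j, v, a, b in ensemble)
    return 1 if F > 0 else 0
```

The one design point worth noting: virtual evidence enters only through the expectation `s[j].get(v, 0.0)` inside the weighted least-squares fit, which is exactly where a hard feature value would otherwise appear, so hard and soft observations are handled by the same update.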