Multi-core Structural SVM Training
Kai-Wei Chang, Department of Computer Science, University of Illinois at Urbana-Champaign
Joint work with Vivek Srikumar and Dan Roth
Motivation
Inference with General Constraint Structure [Roth & Yih '04, '07]: Recognizing Entities and Relations
Example sentence: "Dole's wife, Elizabeth, is a native of N.C."
[Slide figure: local classifier score tables for the entity candidates E1, E2, E3 (other / per / loc) and the relation candidates R12, R23 (irrelevant / spouse_of / born_in).]
Improvement over no inference: 2-5%
Structured Learning and Inference
Outline
- Structural SVM: Inference and Learning
- DEMI-DCD for Structural SVM
- Related Work
- Experiments
- Conclusions
Outline (current section: Structural SVM: Inference and Learning)
Structured Prediction: Inference
The prediction is the highest-scoring feasible structure, y* = argmax_{y in Y(x)} w^T phi(x, y), where Y(x) is the set of allowed structures (often specified by constraints), w is the weight vector (estimated during learning), and phi(x, y) are the features on the input-output pair.
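To make the inference step concrete, here is a minimal brute-force sketch; `phi` (the feature function) and `candidates` (an enumeration of the allowed structures Y(x)) are hypothetical caller-supplied stand-ins, and real systems replace the exhaustive loop with ILP, dynamic programming, or another combinatorial solver.

```python
import numpy as np

def predict(w, x, candidates, phi):
    """Return argmax_{y in Y(x)} w . phi(x, y) by exhaustive search.

    `candidates(x)` yields the allowed structures Y(x) (e.g., those
    satisfying the constraints); `phi(x, y)` returns a feature vector.
    Both are placeholders for this sketch, not part of the original slides.
    """
    best_y, best_score = None, float("-inf")
    for y in candidates(x):
        score = float(np.dot(w, phi(x, y)))
        if score > best_score:
            best_y, best_score = y, score
    return best_y
```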
Structural SVM
The training objective requires, for every training example and every feasible structure, that the score of the gold structure exceed the score of the competing structure by at least the loss, up to a slack variable.
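For reference, the objective these slide labels annotate can be written in the standard margin-rescaling form; this is a sketch of the common formulation, not necessarily the exact variant on the slide (the L2-loss variant squares the slack):

```latex
\min_{w,\; \xi \ge 0} \quad \frac{1}{2}\|w\|^2 + C \sum_i \xi_i
\qquad \text{s.t.} \qquad
w^\top \phi(x_i, y_i) - w^\top \phi(x_i, y) \;\ge\; \Delta(y_i, y) - \xi_i
\quad \forall i,\ \forall y \in \mathcal{Y}(x_i)
```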
Dual Problem of Structural SVM
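The dual itself did not survive extraction; as a reference point, here is one standard form, assuming the L2-loss (squared-slack) variant commonly paired with dual coordinate descent, with \phi_i(y) = \phi(x_i, y_i) - \phi(x_i, y). This is a sketch of the textbook formulation, not necessarily the exact one shown on the slide.

```latex
\min_{\alpha \ge 0} \quad
\frac{1}{2} \Big\| \sum_{i,\, y} \alpha_{i,y}\, \phi_i(y) \Big\|^2
\;+\; \frac{1}{4C} \sum_i \Big( \sum_{y} \alpha_{i,y} \Big)^2
\;-\; \sum_{i,\, y} \Delta(y_i, y)\, \alpha_{i,y},
\qquad
w(\alpha) = \sum_{i,\, y} \alpha_{i,y}\, \phi_i(y)
```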
Active Set: the set of candidate structures maintained for each training example, over which the dual variables are updated; it is grown by loss-augmented inference.
Outline (current section: DEMI-DCD for Structural SVM)
Overview of DEMI-DCD: training is decoupled into two parts that run in parallel, Learning (model update) and Active Set Selection (inference), as in the sketch below.
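A schematic sketch of the decoupling (not the authors' implementation): one learning thread keeps running dual coordinate descent over the current active sets, while separate threads keep enlarging the active sets via loss-augmented inference using the latest weights. The names `loss_augmented_inference`, `dcd_update`, and the `model` dictionary are hypothetical placeholders.

```python
import threading

def selection_loop(examples, model, active_sets, lock, stop, loss_augmented_inference):
    """Active-set selection: keep adding loss-augmented predictions to each A_i."""
    while not stop.is_set():
        for i, (x, y_gold) in enumerate(examples):
            w = model["w"]                                # read the latest weights
            y_hat = loss_augmented_inference(w, x, y_gold)
            with lock:
                active_sets[i].add(y_hat)                 # grow the active set

def learning_loop(examples, model, active_sets, lock, stop, dcd_update):
    """Learning: dual coordinate descent over the structures currently in each A_i."""
    while not stop.is_set():
        for i, (x, y_gold) in enumerate(examples):
            with lock:
                candidates = list(active_sets[i])
            for y in candidates:
                dcd_update(model, x, y_gold, y)           # updates model["w"] and the duals

def train(examples, model, loss_augmented_inference, dcd_update,
          n_selection_threads=3, seconds=60.0):
    """Run one learning thread and several selection threads for a fixed time budget."""
    active_sets = [set() for _ in examples]
    lock, stop = threading.Lock(), threading.Event()
    threads = [threading.Thread(target=learning_loop,
                                args=(examples, model, active_sets, lock, stop, dcd_update))]
    threads += [threading.Thread(target=selection_loop,
                                 args=(examples, model, active_sets, lock, stop,
                                       loss_augmented_inference))
                for _ in range(n_selection_threads)]
    for t in threads:
        t.start()
    stop.wait(seconds)        # run for the time budget, then signal all threads to stop
    stop.set()
    for t in threads:
        t.join()
    return model
```

The point of this structure is that the learning thread never blocks waiting for inference, which is what lets all cores stay busy.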
Learning Thread: updates w by dual coordinate descent over the structures currently in the active sets.
Synchronization
Outline (current section: Related Work)
A Parallel Dual Coordinate Descent Algorithm (master-slave): the master sends the current w to the slaves; the slaves solve loss-augmented inference and update the active set A; the master then updates w based on A. A rough sketch of this loop follows.
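For contrast with DEMI-DCD, here is a rough sketch of the master-slave pattern (the MS-DCD baseline), again with hypothetical `inference` and `update` callables: the master broadcasts w, waits for all slaves to finish loss-augmented inference (a synchronization barrier during which learning makes no progress), and only then updates w on the enlarged active sets.

```python
from multiprocessing import Pool

def run_master_slave(examples, w, inference, update, rounds=10, workers=4):
    """Master-slave loop: alternate parallel inference with sequential weight updates.

    `inference(w, x, y_gold)` and `update(w, x, y_gold, y)` are placeholder
    callables; `inference` must be a picklable top-level function for Pool.
    """
    active_sets = [set() for _ in examples]
    with Pool(workers) as pool:
        for _ in range(rounds):
            # Slaves: solve loss-augmented inference with the current w, in parallel.
            jobs = [(w, x, y_gold) for (x, y_gold) in examples]
            predictions = pool.starmap(inference, jobs)   # barrier: wait for every slave
            for i, y_hat in enumerate(predictions):
                active_sets[i].add(y_hat)
            # Master: update w over the active sets (inference cores idle during this phase).
            for i, (x, y_gold) in enumerate(examples):
                for y in active_sets[i]:
                    w = update(w, x, y_gold, y)
    return w
```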
Structured Perceptron and its Parallel Version (SP-IPM: structured perceptron with iterative parameter mixing)
Outline (current section: Experiments)
Experiment Settings
POS tagging (POS-WSJ): assign a POS label to each word in a sentence. We use the standard Penn Treebank Wall Street Journal corpus with 39,832 sentences.
Entity and Relation Recognition (Entity-Relation): assign entity types to mentions and identify relations among them; 5,925 training samples. Inference is solved by an ILP solver.
We compare the following methods:
- DEMI-DCD: the proposed method.
- MS-DCD: a master-slave style parallel implementation of DCD.
- SP-IPM: parallel structured perceptron.
Convergence on Primal Function Value
[Figure: relative primal function value difference over training time (log scale), on POS-WSJ and Entity-Relation.]
Test Performance
[Figure: test performance over training time on POS-WSJ.] SP-IPM converges to a different model.
Test Performance
[Figure: test performance over training time on the Entity-Relation task; panels show entity F1 and relation F1.]
Moving Average of CPU Usage
[Figure: moving average of CPU usage on POS-WSJ and Entity-Relation.] DEMI-DCD fully utilizes the CPU power; for the synchronization-based methods, CPU usage drops at the synchronization points.
Different Numbers of Threads
[Figure: relative primal function value over training time with different numbers of threads, on POS-WSJ and Entity-Relation.]
Outline (current section: Conclusions)
Conclusion
We proposed DEMI-DCD for training structural SVMs on multi-core machines. The proposed method decouples the model update and inference phases of learning. As a result, it can fully utilize all available processors to speed up learning.
Software will be available at:
Thank you.