Multi-core Structural SVM Training
Kai-Wei Chang, Department of Computer Science, University of Illinois at Urbana-Champaign
Joint work with Vivek Srikumar and Dan Roth
1

Motivation 2

Inference with General Constraint Structure [Roth & Yih '04, '07]: Recognizing Entities and Relations
Example: "Dole's wife, Elizabeth, is a native of N.C.", with entity candidates E1 (Dole), E2 (Elizabeth), E3 (N.C.) and relation candidates R12 (Dole, Elizabeth) and R23 (Elizabeth, N.C.).
Local classifier scores:
  E1:  other 0.05, per 0.85, loc 0.10
  E2:  other 0.10, per 0.60, loc 0.30
  E3:  other 0.05, per 0.50, loc 0.45
  R12: irrelevant 0.05, spouse_of 0.45, born_in 0.50
  R23: irrelevant 0.10, spouse_of 0.05, born_in 0.85
Improvement over no inference: 2-5%
3

Structured Learning and Inference 4

Outline
- Structural SVM: Inference and Learning
- DEMI-DCD for Structural SVM
- Related Work
- Experiments
- Conclusions
5

Outline
- Structural SVM: Inference and Learning
- DEMI-DCD for Structural SVM
- Related Work
- Experiments
- Conclusions
6

Structured Prediction: Inference
Find the highest-scoring structure in the set of allowed structures, which is often specified by constraints; the score combines weight parameters (to be estimated during learning) with features on the input-output pair.
7
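The equation on this slide is an image in the original deck, and the phrases above are its annotations. A standard inference objective consistent with those annotations (the notation is assumed, not taken from the slide) is:

```latex
y^{*} = \operatorname*{argmax}_{y \in \mathcal{Y}} \; w^{\top} \phi(x, y)
```

where \mathcal{Y} is the set of allowed structures, w the weight vector, and \phi(x, y) the feature vector of an input-output pair.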

Structural SVM
For all samples and feasible structures, the score of the gold structure must beat the score of the predicted structure by at least the loss function, minus a slack variable.
8
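The optimization problem on this slide is likewise an image; a standard margin-rescaling structural SVM matching the annotations (notation assumed) is:

```latex
\min_{w,\,\xi \ge 0} \;\; \frac{1}{2}\|w\|^{2} + C \sum_{i} \xi_{i}
\quad \text{s.t.} \quad
w^{\top}\phi(x_i, y_i) - w^{\top}\phi(x_i, y) \;\ge\; \Delta(y_i, y) - \xi_{i}
\quad \forall i, \; \forall y \in \mathcal{Y}_i
```

Here w^{\top}\phi(x_i, y_i) is the score of the gold structure, w^{\top}\phi(x_i, y) the score of a predicted structure, \Delta the loss function, and \xi_i the slack variable.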

Dual Problem of Structural SVM 9
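The dual is an image in the deck as well. One standard dual of the L1-slack formulation above, with one variable \alpha_{i,y} per (sample, structure) pair, is shown below; the exact variant used in the talk (e.g., an L2-loss dual) may differ, so treat this as an assumption:

```latex
\max_{\alpha \ge 0} \;\; \sum_{i,y} \Delta(y_i, y)\, \alpha_{i,y}
- \frac{1}{2} \Big\| \sum_{i,y} \alpha_{i,y} \big( \phi(x_i, y_i) - \phi(x_i, y) \big) \Big\|^{2}
\quad \text{s.t.} \quad \sum_{y} \alpha_{i,y} \le C \;\; \forall i
```

The weights are recovered as w = \sum_{i,y} \alpha_{i,y} (\phi(x_i, y_i) - \phi(x_i, y)). Because the number of feasible structures is exponential, only a small number of \alpha_{i,y} can ever be nonzero, which motivates the active set on the next slide.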

Active Set 10
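The slide body is an image; the working definition, standard for dual methods of this kind, is that each example keeps only the structures whose dual variables are nonzero:

```latex
\mathcal{A}_i = \{\, y \in \mathcal{Y}_i : \alpha_{i,y} > 0 \,\}, \qquad \mathcal{A} = \textstyle\bigcup_i \mathcal{A}_i
```

Coordinate descent then touches only variables in \mathcal{A}, and loss-augmented inference proposes new structures to add to it.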

Outline
- Structural SVM: Inference and Learning
- DEMI-DCD for Structural SVM
- Related Work
- Experiments
- Conclusions
11

Overview of DEMI-DCD
The threads are split into two roles: learning (model update) and active set selection (inference).
12
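Both thread bodies are shown as diagrams in the deck. Below is a minimal runnable sketch of the decoupling on a toy multiclass task, assuming one coarse lock and a simplified per-variable box constraint; names like selection_thread and learning_thread are placeholders, not the authors' code:

```python
import threading
from collections import defaultdict

# Toy instance of the decoupled design: "structures" are labels 0..K-1,
# so (loss-augmented) inference is an argmax over K choices. The real
# system uses structured inference (e.g., an ILP solver).
K, C, ROUNDS = 3, 1.0, 200
lock = threading.Lock()
w = defaultdict(float)        # shared sparse weight vector
alpha = defaultdict(float)    # dual variables alpha[(i, y)]
A = defaultdict(set)          # shared active sets A[i]

def phi(x, y):                # block feature map: x's features in block y
    return {(y, j): v for j, v in enumerate(x)}

def score(fvec):
    return sum(w[k] * v for k, v in fvec.items())

def delta(y_gold, y):         # 0/1 loss for the toy task
    return 0.0 if y == y_gold else 1.0

def selection_thread(data):
    """Active set selection: loss-augmented inference with the current w."""
    for _ in range(ROUNDS):
        for i, (x, y_gold) in enumerate(data):
            with lock:
                y_hat = max(range(K),
                            key=lambda y: score(phi(x, y)) + delta(y_gold, y))
                if y_hat != y_gold:
                    A[i].add(y_hat)

def learning_thread(data):
    """Model update: dual coordinate descent over the active sets.
    Simplification: a box constraint 0 <= alpha <= C per variable
    replaces the true per-example sum constraint of the dual."""
    for _ in range(ROUNDS):
        for i, (x, y_gold) in enumerate(data):
            with lock:
                for y in list(A[i]):
                    fd = dict(phi(x, y_gold))            # feature difference
                    for k, v in phi(x, y).items():
                        fd[k] = fd.get(k, 0.0) - v
                    q = sum(v * v for v in fd.values())
                    if q == 0.0:
                        continue
                    g = score(fd) - delta(y_gold, y)     # dual gradient
                    new_a = min(max(alpha[(i, y)] - g / q, 0.0), C)
                    step, alpha[(i, y)] = new_a - alpha[(i, y)], new_a
                    for k, v in fd.items():
                        w[k] += step * v

data = [([1.0, 0.0], 0), ([0.0, 1.0], 1), ([1.0, 1.0], 2)]
threads = [threading.Thread(target=selection_thread, args=(data,)),
           threading.Thread(target=learning_thread, args=(data,))]
for t in threads: t.start()
for t in threads: t.join()
print([max(range(K), key=lambda y: score(phi(x, y))) for x, _ in data])
```

The essential property survives even in the toy: inference and model updates run concurrently and communicate only through the shared active sets, so neither phase waits for the other to finish a full pass.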

Learning Thread 13
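The slide content is an image; under the L1-slack dual sketched earlier (an assumed variant), the coordinate step the learning thread performs for an active variable \alpha_{i,y} is:

```latex
\alpha_{i,y} \leftarrow \Pi_{[0,\,C]}\!\left( \alpha_{i,y}
- \frac{ w^{\top} \Delta\phi_{i,y} - \Delta(y_i, y) }{ \|\Delta\phi_{i,y}\|^{2} } \right),
\qquad
w \leftarrow w + \big( \alpha_{i,y}^{\mathrm{new}} - \alpha_{i,y}^{\mathrm{old}} \big)\, \Delta\phi_{i,y}
```

where \Delta\phi_{i,y} = \phi(x_i, y_i) - \phi(x_i, y) and \Pi_{[0,C]} clips to the feasible interval. Maintaining w alongside \alpha keeps each step linear in the number of nonzero features of \Delta\phi_{i,y}.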

Synchronization 14

Outline
- Structural SVM: Inference and Learning
- DEMI-DCD for Structural SVM
- Related Work
- Experiments
- Conclusions
15

A Parallel Dual Coordinate Descent Algorithm (MS-DCD)
Master: send the current w to the slaves.
Slaves: solve loss-augmented inference and update the active set A.
Master: update w based on A.
16
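For contrast, here is a self-contained toy of the master-slave round structure: slaves run inference in parallel, then everyone waits while the master updates the model alone. A simplified perceptron-style update stands in for the DCD step to keep the sketch short; all names are illustrative:

```python
from concurrent.futures import ThreadPoolExecutor

K = 3
w = {}            # model, touched only by the master between rounds
A = {}            # active sets filled by the slaves

def phi(x, y):
    return {(y, j): v for j, v in enumerate(x)}

def score(fvec):
    return sum(w.get(k, 0.0) * v for k, v in fvec.items())

def slave(task):
    i, (x, y_gold) = task
    # Loss-augmented inference with the w the master just sent.
    y_hat = max(range(K), key=lambda y: score(phi(x, y)) + (y != y_gold))
    return i, y_hat

data = [([1.0, 0.0], 0), ([0.0, 1.0], 1), ([1.0, 1.0], 2)]
with ThreadPoolExecutor(max_workers=4) as pool:
    for _ in range(20):
        # Slaves work in parallel; the implicit barrier is the map() itself.
        for i, y_hat in pool.map(slave, enumerate(data)):
            A.setdefault(i, set()).add(y_hat)
        # Master updates w alone while the slave threads sit idle.
        for i, (x, y_gold) in enumerate(data):
            for y in A.get(i, ()):
                if y != y_gold and score(phi(x, y)) >= score(phi(x, y_gold)):
                    for k, v in phi(x, y_gold).items():
                        w[k] = w.get(k, 0.0) + v
                    for k, v in phi(x, y).items():
                        w[k] = w.get(k, 0.0) - v
```

This barrier between the inference phase and the single-threaded update phase is what shows up later as dropping CPU usage.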

Structured Perceptron and its Parallel Version 17
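The slide body is an image. SP-IPM in the experiments denotes the structured perceptron with iterative parameter mixing (McDonald et al., 2010): each shard of the data trains its own perceptron for an epoch, and the shard models are then averaged. A sketch under the same toy feature map as above, with uniform mixing weights as a simplifying assumption:

```python
# Structured perceptron with iterative parameter mixing (sketch).
K = 3

def phi(x, y):
    return {(y, j): v for j, v in enumerate(x)}

def perceptron_epoch(w, shard):
    w = dict(w)
    for x, y_gold in shard:
        sc = lambda y: sum(w.get(k, 0.0) * v for k, v in phi(x, y).items())
        y_hat = max(range(K), key=sc)
        if y_hat != y_gold:                  # standard perceptron update
            for k, v in phi(x, y_gold).items():
                w[k] = w.get(k, 0.0) + v
            for k, v in phi(x, y_hat).items():
                w[k] = w.get(k, 0.0) - v
    return w

def sp_ipm(shards, epochs=10):
    w = {}
    for _ in range(epochs):
        # Each shard trains independently (in parallel in practice) ...
        models = [perceptron_epoch(w, s) for s in shards]
        # ... then the shard models are mixed with uniform weights.
        w = {}
        for m in models:
            for k, v in m.items():
                w[k] = w.get(k, 0.0) + v / len(models)
    return w

w = sp_ipm([[([1.0, 0.0], 0)], [([0.0, 1.0], 1)], [([1.0, 1.0], 2)]])
```

The per-epoch work parallelizes almost perfectly, but as the experiments below note, SP-IPM can converge to a different model than the SVM-based methods.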

Outline
- Structural SVM: Inference and Learning
- DEMI-DCD for Structural SVM
- Related Work
- Experiments
- Conclusions
18

Experiment Settings
POS tagging (POS-WSJ):
- Assign a POS label to each word in a sentence.
- We use the standard Penn Treebank Wall Street Journal corpus with 39,832 sentences.
Entity and Relation Recognition (Entity-Relation):
- Assign entity types to mentions and identify relations among them.
- 5,925 training samples.
- Inference is solved by an ILP solver.
We compare the following methods:
- DEMI-DCD: the proposed method.
- MS-DCD: a master-slave style parallel implementation of DCD.
- SP-IPM: parallel structured perceptron.
19

Convergence on Primal Function Value
[Figure: relative primal function value difference vs. training time (log scale), on POS-WSJ and Entity-Relation.]
20

Test Performance
[Figure: test performance vs. training time on POS-WSJ.]
SP-IPM converges to a different model.
21

Test Performance
[Figure: test performance vs. training time on the Entity-Relation task, measured by entity F1 and relation F1.]
22

Moving Average of CPU Usage
[Figure: moving average of CPU usage over training time, on POS-WSJ and Entity-Relation.]
DEMI-DCD fully utilizes CPU power; CPU usage drops because of the synchronization.
23

Different Numbers of Threads
[Figure: relative primal function value vs. training time with different numbers of threads, on POS-WSJ and Entity-Relation.]
24

Outline
- Structural SVM: Inference and Learning
- DEMI-DCD for Structural SVM
- Related Work
- Experiments
- Conclusions
25

Conclusion
We proposed DEMI-DCD for training structural SVMs on multi-core machines. The proposed method decouples the model-update and inference phases of learning; as a result, it can fully utilize all available processors to speed up learning.
Software will be available at:
Thank you.
26