Efficient Inference on Sequence Segmentation Models

Presentation transcript:

Efficient Inference on Sequence Segmentation Models
Sunita Sarawagi, IIT Bombay
sunita@iitb.ac.in

Sequence segmentation models
- Flexible and accurate models for many applications:
  - Speech segmentation on phonemes (from Keshet et al., NIPS '05 workshop)
  - Syntactic chunking
  - Protein/gene finding
  - Information extraction with entity-level features, e.g.:
    - Whole-entity match with a database of entities
    - Length of entity between 3 and 8 words
    - Third or fourth token of entity is a "-"
    - Last three tokens are digits
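To make the entity-level feature examples concrete, here is a minimal sketch of such features as functions over a whole candidate segment. The lexicon, feature names, and 0/1 return convention are illustrative assumptions, not the paper's feature set.

```python
# Sketch (not from the paper) of entity-level features that score a whole
# candidate segment tokens[start:end+1] rather than a single token.

ENTITY_DB = {"r. fagin", "j. halpern"}  # hypothetical entity lexicon

def whole_entity_match(tokens, start, end):
    """1 if the full segment appears in the entity database."""
    return 1.0 if " ".join(tokens[start:end + 1]).lower() in ENTITY_DB else 0.0

def length_between_3_and_8(tokens, start, end):
    """1 if the segment is 3 to 8 words long."""
    return 1.0 if 3 <= end - start + 1 <= 8 else 0.0

def third_or_fourth_token_is_dash(tokens, start, end):
    """1 if the segment's third or fourth token is '-'."""
    return 1.0 if "-" in tokens[start:end + 1][2:4] else 0.0

def last_three_tokens_digits(tokens, start, end):
    """1 if the segment's last three tokens are all digits."""
    seg = tokens[start:end + 1]
    return 1.0 if len(seg) >= 3 and all(t.isdigit() for t in seg[-3:]) else 0.0
```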

Sequence vs. segmentation models
Example input x: "R. Fagin and J. Halpern Belief Awareness Reasoning" (tokens at positions 1-8).
- Sequence model: output y = y1, ..., y8, one label (Author / Other / Title) per token; features describe a single word, e.g. "Fagin".
- Segmentation model: output is a list of labeled segments (l, u): (l1=1, u1=2) Author, (l2=u2=3) Other, (l3=4, u3=5) Author, (l4=6, u4=8) Title; features describe the full entity, e.g. the degree of match of the entire entity with an entity in the database, or the similarity to the author column in the database.
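A small sketch contrasting the two output representations on the slide's example (0-based positions here; token and label names come from the slide):

```python
# The same example expressed as a per-token labeling (sequence model)
# and as labeled segments (segmentation model).

tokens = ["R.", "Fagin", "and", "J.", "Halpern",
          "Belief", "Awareness", "Reasoning"]

# Sequence model output: one label per token.
token_labels = ["Author", "Author", "Other", "Author", "Author",
                "Title", "Title", "Title"]

# Segmentation model output: (start, end, label), 0-based inclusive.
segments = [(0, 1, "Author"), (2, 2, "Other"),
            (3, 4, "Author"), (5, 7, "Title")]

for start, end, label in segments:
    print(label, "->", " ".join(tokens[start:end + 1]))
```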

Segmentation models
- Input: sequence x = x1, x2, ..., xn; label set Y
- Output: segmentation s = s1, s2, ..., sp, where sj = (start position, end position, label) = (tj, uj, yj)
- Score: F(x, s) = Σ_j [ A(tj, y_{j-1}, yj) + ψ(tj, uj, yj) ]
  - Transition potentials A(i, y', y): segment starting at i has label y and the previous label is y'
  - Segment potentials ψ(i', i, y): segment starting at i', ending at i, with label y; all positions from i' to i get the same label
- Inference:
  - Most likely segmentation (max-margin trainers)
  - Marginals around segments (likelihood-based and exponentiated-gradient trainers)
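As a minimal sketch of the scoring rule above (an assumed interface, not the paper's code), F(x, s) just sums one transition and one segment potential per segment:

```python
# Sketch: score F(x, s) of a segmentation as a sum of transition and
# segment potentials.  The callable signatures mirror A and psi above.

def score(segments, transition, segment_potential):
    """segments: list of (start, end, label) in left-to-right order.
    transition(start, prev_label, label) -> float   (A in the slides)
    segment_potential(start, end, label) -> float   (psi in the slides)
    """
    total, prev_label = 0.0, None   # no previous label before the first segment
    for start, end, label in segments:
        total += transition(start, prev_label, label)
        total += segment_potential(start, end, label)
        prev_label = label
    return total
```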

Inference: marginal for a segment
- Forward messages (L = max segment length), computed in O(nL²):
  α(i, y) = Σ_{d=1..L} Σ_{y'} α(i-d, y') · exp( A(i-d+1, y', y) + ψ(i-d+1, i, y) )
- Matrix notation for the L = n case
- Segment marginal for (i', i, y): combine the forward message α(i'-1, ·), the segment's transition and segment potentials, and the backward message β(i, y)
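Below is a runnable sketch of these recursions following the standard semi-Markov CRF formulation; it is a hedged reconstruction, not the paper's optimized implementation, and the function names are mine:

```python
# Standard semi-CRF forward/backward messages and a segment marginal
# (0-based positions; a hedged reconstruction of the slide's recursions).
import math

def forward_backward(n, labels, L, A, psi):
    """A(i, y_prev, y): transition into a segment starting at i with label y,
    previous label y_prev (None at the sequence start).
    psi(i, j, y): potential of the segment covering positions i..j inclusive."""
    # alpha[j][y]: total exp-score of segmentations of positions 0..j-1
    # whose last segment carries label y.
    alpha = [{y: 0.0 for y in labels} for _ in range(n + 1)]
    for j in range(1, n + 1):
        for y in labels:
            for d in range(1, min(L, j) + 1):
                i = j - d                       # segment covers i..j-1
                seg = math.exp(psi(i, j - 1, y))
                if i == 0:
                    alpha[j][y] += seg * math.exp(A(0, None, y))
                else:
                    alpha[j][y] += seg * sum(
                        alpha[i][yp] * math.exp(A(i, yp, y)) for yp in labels)
    # beta[i][y]: total exp-score of segmentations of positions i..n-1,
    # given the previous segment ended at i-1 with label y.
    beta = [{y: 0.0 for y in labels} for _ in range(n + 1)]
    beta[n] = {y: 1.0 for y in labels}
    for i in range(n - 1, 0, -1):
        for yp in labels:
            beta[i][yp] = sum(
                math.exp(A(i, yp, y) + psi(i, i + d - 1, y)) * beta[i + d][y]
                for d in range(1, min(L, n - i) + 1) for y in labels)
    Z = sum(alpha[n][y] for y in labels)        # partition function
    return alpha, beta, Z

def segment_marginal(i, j, y, labels, alpha, beta, Z, A, psi):
    """Probability that segment (i..j, label y) occurs in the segmentation."""
    into = (math.exp(A(0, None, y)) if i == 0 else
            sum(alpha[i][yp] * math.exp(A(i, yp, y)) for yp in labels))
    return into * math.exp(psi(i, j, y)) * beta[j + 1][y] / Z
```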

Goal
- Speed up segmentation models: currently 3 to 8 times slower than sequence models
- Eliminate L, the hard limit on segment length
- Efficiently handle a mix of potentials spanning varying numbers of tokens
  - Pay the penalty of segmentation models only for the longer entity-level features, not for all of them
- Empirical results on extraction tasks: segmentation models with a few entity features reach higher accuracy at the same cost as sequence models

Succinct potentials
- Key insight: compactly represent features on overlapping segments
- Main challenge: inference algorithms on compact potentials whose cost is independent of the number of segments a potential applies to
- Four kinds of potentials
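One way to picture a succinct potential (a hypothetical encoding, not necessarily the paper's): store the feature once, with a predicate over segment boundaries, instead of expanding it to every (start, end, label) triple it fires on:

```python
# Illustrative sketch: a "succinct" potential stored as a single record
# with a boundary condition, rather than one entry per matching segment.
from dataclasses import dataclass
from typing import Callable

@dataclass
class SuccinctPotential:
    label: str                            # label the potential fires for
    applies: Callable[[int, int], bool]   # condition on (start, end)
    weight: float

# "Entity length between 3 and 8 words" fires on O(nL) segments, but is
# represented here by one record.  Label and weight are made up.
length_3_to_8 = SuccinctPotential(
    label="Title",
    applies=lambda start, end: 3 <= end - start + 1 <= 8,
    weight=0.7,
)
```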

Applications with mixed potentials
- Named entity recognition
- Speech segmentation on phonemes

Efficient inference: forward pass
- Sharing computation: split each potential ψ into two parts:
  - a part common to all segments ending after i-1
  - a part common to all segments starting before i-m
- m = maximum gap between the boundaries any ψ refers to

Optimized forward pass
- Two sets of modified forward messages: O(nm²)
- Two similar sets of backward messages
- Same strategy for max-product inference

Marginals around potentials
- Direct computation of marginals is O(n²)
- Reduced to O(1) by two tricks:
  - Decomposing potentials as above
  - Sharing computations across adjacent potentials (a bit more tricky)

Complexity and data structures
- Complexity of computing marginals: optimized O(nm + H) vs. original O(nL + G)
  - H = number of features in succinct form
  - G = O(L²H); on real data, |G| is 5 to 10 times |H|
- Achieved via incremental computation of ψ:
  - A special data structure for storing ψ lets ψ_{i':i} be computed in O(1) time from the previous ψ_{i':i-1}
  - Marginals μ are computed in sorted order: increasing start boundary, decreasing end boundary
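A toy sketch of the incremental idea: extending a segment's end boundary by one token updates ψ in O(1) instead of re-scanning the segment. The digit-count feature and weight are illustrative assumptions, not the paper's data structure:

```python
# Incremental segment-potential computation: psi for (start, end) is
# obtained in O(1) from psi for (start, end-1) by updating a running count.

def segment_potentials_for_start(tokens, start, weight=0.5):
    """Yield (start, end, psi) for all segments beginning at `start`,
    where psi = weight * (number of digit tokens in the segment)."""
    digit_count = 0
    for end in range(start, len(tokens)):
        digit_count += tokens[end].isdigit()   # O(1) update per extension
        yield start, end, weight * digit_count

tokens = "volume 12 pages 100 110".split()
for start, end, psi in segment_potentials_for_start(tokens, 0):
    print((start, end), psi)
```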

Empirical evaluation
- Tasks:
  - Citations: Cora, articles (L = 20)
  - Address: Indian addresses (L = 7)
- Features:
  - Token-level: orthographic properties / lexicon match of words at the start, end, middle, left, and right of a segment
  - Entity-level: TF-IDF match with a lexicon, entity length
- Methods:
  - Sequence-BCEU: Begin-Continue-End-Unique labels
  - Segment: original un-optimized algorithm
  - Segment-Opt: optimized inference with compact potentials

Running time and Accuracy

Limit on segment length (L)
- L (hard limit on segment length):
  - Too small → reduced accuracy (90 → 81)
  - Too large → increased running time (30 minutes → 1 hour)
- m (maximum gap spanned by entity-level features):
  - Reduced by half → accuracy still 3% higher than Sequence
  - Too large → running time increases by only 30%

Concluding remarks
- Segmentation models: natural, flexible, accurate
- Main limitation: expensive inference
  - Addressed via a compact design of shared potentials and new efficient inference algorithms
- Pays the penalty of entity-level features only when needed
  - Running time comparable to sequence models
  - No hard limit on segment length
- Future work:
  - Features that are functions of distance from the segment boundary
  - Other models: 2-D segmentation?
- Code: http://crf.sourceforge.net