Exploiting Dictionaries in Named Entity Extraction: Combining Semi-Markov Extraction Processes and Data Integration Methods William W. Cohen, Sunita Sarawagi.

Presentation transcript:

Exploiting Dictionaries in Named Entity Extraction: Combining Semi-Markov Extraction Processes and Data Integration Methods. William W. Cohen, Sunita Sarawagi. Presented by: Quoc Le, CS591CXZ – General Web Mining.

Motivation
Information Extraction
– Deriving structured data from unstructured sources.
– Using existing structured data as guidance to improve extraction from unstructured sources.
Named Entity Recognition
– Extracting names, locations, times.
– Improving NER systems with external dictionaries.

Approaches
Look up entities in a (large) dictionary.
– Surface forms differ from dictionary entries; prone to noise and errors.
Take an existing NER system and link it to an external dictionary.
– Mismatch: high-performance NER classifies individual words into classes, while dictionary similarity is computed over the entire entity.

Problem Formulation
Name finding as word tagging
– E.g.: (Fred)_Person (please stop by)_Other (my office)_Loc (this afternoon)_Time
– x: a sequence of words is mapped to y: a sequence of labels; training data is a set of (x,y) pairs.
Model the conditional distribution of y given x (HMM/CMM); see the reconstruction below.
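The formula on this slide did not survive the transcript. As a hedged reconstruction, the token-level conditional (chain) model referred to here is standardly written as

P(y \mid x) = \prod_{i=1}^{|x|} P(y_i \mid y_{i-1}, x),

i.e., each label y_i depends on the previous label and the observed word sequence x.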

Semi-Markovian NER
Segmentation: S = <s_1, ..., s_p>, where each segment s_j = (t_j, u_j, l_j) has a start position t_j, an end position u_j, and a label l_j.
– E.g.: S = {(1,1,Person), (2,4,Other), (5,6,Loc), (7,8,Time)}
Conditional semi-Markov model (CSMM): labels whole segments instead of single tokens (see the reconstruction below).
Inference and learning problems.
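The CSMM equation on this slide is likewise missing from the transcript. By analogy with the token-level model above, the segment-level conditional model can be written roughly as

P(S \mid x) = \prod_{j=1}^{|S|} P(s_j \mid s_{j-1}, x), \qquad s_j = (t_j, u_j, l_j),

so each segment's boundaries and label depend on the previous segment and the full input x; inference must search over segmentations rather than over tag sequences.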

Comparison to other approaches
– CMM: the CSMM predicts a tag plus segment positions, not just a tag per token.
– Order-L CMM: the CSMM conditions on the tokens inside the segment, not just the previous L tokens.
– Treating the external dictionary as extra training examples: prone to misspellings, and the dictionary may be large or drawn from a different distribution (though this helps when training data is limited).
– N-gram classification: candidate entities may overlap.
– Using the dictionary to bootstrap the search for extraction patterns: rule-based rather than probabilistic.

Training the SMM
A modified version of Collins' perceptron-based algorithm for training HMMs.
Assume a local feature function f that maps a pair (x,S) and an index j to a feature vector f(j,x,S).
Define the global feature vector F(x,S) = Σ_j f(j,x,S), and let W be the weight vector over the components of F.
– Inference: compute V(W,x), the Viterbi decoding of x with W.
– Training: learn a W that leads to the best performance.
The Viterbi search can be done with a recurrence over V_{x,W}(i,y); a sketch follows.
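As a rough illustration only (not the authors' code), here is a minimal Python sketch of the semi-Markov Viterbi recurrence, assuming a bound L on segment length and a user-supplied score(x, t, u, label, prev_label) that returns the dot product of W with the local features of one candidate segment:

def semi_markov_viterbi(x, labels, score, L):
    """Return the highest-scoring segmentation of token sequence x (sketch)."""
    n = len(x)
    # V[i][y] = best score of a segmentation of x[0:i] whose last label is y
    V = [{y: float("-inf") for y in labels} for _ in range(n + 1)]
    back = [{y: None for y in labels} for _ in range(n + 1)]
    for y in labels:
        V[0][y] = 0.0
    for i in range(1, n + 1):
        for y in labels:
            # the last segment covers tokens t..i-1 (inclusive), length <= L
            for t in range(max(0, i - L), i):
                for y_prev in labels:
                    s = V[t][y_prev] + score(x, t, i - 1, y, y_prev)
                    if s > V[i][y]:
                        V[i][y] = s
                        back[i][y] = (t, y_prev)
    # recover the best segmentation by backtracking
    y = max(labels, key=lambda lab: V[n][lab])
    i, segments = n, []
    while i > 0:
        t, y_prev = back[i][y]
        segments.append((t, i - 1, y))
        i, y = t, y_prev
    return list(reversed(segments))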

Perceptron-based SMM learning
Let SCORE(x,W,S) = W · F(x,S).
For each example (x_t, S_t):
– Find the K segmentations S_1, ..., S_K with the highest scores.
– Let W_{t+1} = W_t.
– For each i such that SCORE(x_t, W_t, S_i) is greater than (1−β) · SCORE(x_t, W_t, S_t), update W_{t+1} = W_{t+1} + F(x_t, S_t) − F(x_t, S_i).
Return the average of all W_t. (A sketch of this update loop follows.)
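A minimal Python sketch of the update loop, assuming hypothetical helpers top_k_segmentations(x, W, K) (e.g. a top-K Viterbi) and feature_vector(x, S) that returns F(x,S) as a dict; these names are illustrative, not from the paper. The margin test uses >= rather than strict > so that ties (e.g. the all-zero initial weights) still trigger an update:

from collections import defaultdict

def dot(W, F):
    """Dot product between a weight dict and a sparse feature dict."""
    return sum(W.get(k, 0.0) * v for k, v in F.items())

def train_semi_markov_perceptron(examples, top_k_segmentations, feature_vector,
                                 K=2, beta=0.05, epochs=20):
    """Averaged-perceptron training loop for a semi-Markov model (sketch)."""
    W = defaultdict(float)
    W_sum, steps = defaultdict(float), 0
    for _ in range(epochs):
        for x, S_true in examples:
            F_true = feature_vector(x, S_true)
            true_score = dot(W, F_true)
            W_next = defaultdict(float, W)          # W_{t+1} starts as W_t
            for S_i in top_k_segmentations(x, W, K):
                F_i = feature_vector(x, S_i)
                # penalize candidates that score too close to the true answer
                if dot(W, F_i) >= (1 - beta) * true_score:
                    for k in set(F_true) | set(F_i):
                        W_next[k] += F_true.get(k, 0.0) - F_i.get(k, 0.0)
            W = W_next
            for k, v in W.items():
                W_sum[k] += v
            steps += 1
    return {k: v / steps for k, v in W_sum.items()}  # averaged weights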

Features
Examples: value of the segment, length of the segment, left window, right window, etc.
Most can also be applied to an HMM-based NER system.
Segment-level features are more powerful and meaningful, e.g. the pattern "X+ X+" is more indicative of a name than "X+" alone.
Distance features: similarity of the segment to entries in an external dictionary (see the sketch after the example below).
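For illustration only (these exact feature names are not from the paper), a few segment-level features of the kind listed above might be computed like this:

def segment_features(x, t, u, label):
    """Toy segment-level features for the segment x[t..u] with the given label."""
    tokens = x[t:u + 1]
    feats = {}
    feats[f"value={'_'.join(tokens)}:{label}"] = 1.0
    feats[f"length={u - t + 1}:{label}"] = 1.0
    # crude capitalization pattern, e.g. ["Fred", "Smith"] -> "X+ X+"
    pattern = " ".join("X+" if tok[:1].isupper() else "x+" for tok in tokens)
    feats[f"pattern={pattern}:{label}"] = 1.0
    if t > 0:
        feats[f"left={x[t - 1]}:{label}"] = 1.0   # left window (one token)
    if u + 1 < len(x):
        feats[f"right={x[u + 1]}:{label}"] = 1.0  # right window (one token)
    return feats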

Distance features
D: dictionary; d: distance metric; e: entity name in D; e': candidate segment.
– Define g_{D,d}(e') = min_{e in D} d(e, e'), the distance from the segment to its closest dictionary entry (a toy implementation follows).
Distance metrics: Jaccard (word-level), Jaro-Winkler (character-level), TF-IDF (word-level), SoftTFIDF (hybrid measure).
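A minimal sketch (not the paper's implementation) of the g_{D,d} feature using word-level Jaccard distance:

def jaccard_distance(a, b):
    """Word-level Jaccard distance between two strings."""
    A, B = set(a.lower().split()), set(b.lower().split())
    if not A and not B:
        return 0.0
    return 1.0 - len(A & B) / len(A | B)

def dictionary_distance(segment, dictionary, d=jaccard_distance):
    """g_{D,d}(e'): distance from the segment to its closest dictionary entry."""
    return min(d(e, segment) for e in dictionary)

For example, dictionary_distance("william cohen", {"William W. Cohen", "Sunita Sarawagi"}) returns a small value, which can be used directly (or after binning) as a feature of the candidate segment.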

Experiments
HMM-VP(1): predicts two labels y: one for tokens inside an entity and one for tokens outside.
HMM-VP(4): encoding scheme uses labels with the tags unique, begin, continue, end, and other.
– E.g.: (Fred)_Person-unique please stop by the (fourth)_Loc-begin (floor meeting)_Loc-continue (room)_Loc-end
SMM (K = 2, E = 20, β = 0.05): any, first, last, exact.
Data sets: Indian addresses, student e-mails, job postings. (A small tag-encoding helper is sketched below.)
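For illustration, a toy helper that expands a segmentation into the per-token unique/begin/continue/end tags of the HMM-VP(4) encoding (function and tag names here are hypothetical, not from the paper):

def encode_bceu(n_tokens, segments):
    """Convert a segmentation [(start, end, label), ...] into per-token tags.

    Single-token entities get 'unique', multi-token entities get
    'begin'/'continue'/'end', and all remaining tokens get 'other'.
    """
    tags = ["other"] * n_tokens
    for start, end, label in segments:
        if label == "Other":
            continue
        if start == end:
            tags[start] = f"{label}-unique"
        else:
            tags[start] = f"{label}-begin"
            tags[end] = f"{label}-end"
            for i in range(start + 1, end):
                tags[i] = f"{label}-continue"
    return tags

# e.g. encode_bceu(8, [(0, 0, "Person"), (1, 3, "Other"), (4, 5, "Loc"), (6, 7, "Time")])
# -> ['Person-unique', 'other', 'other', 'other',
#     'Loc-begin', 'Loc-end', 'Time-begin', 'Time-end']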

Considerations
– Evaluating exact matching against the dictionary: low recall and errors, but it gives a good indication of the dictionary's quality.
– Normalizing dictionary entries: both yes and no, e.g. "Will" vs. "will".
– For HMMs, partial distances between individual tokens and dictionary entries could be used.
– Segment size is bounded by a fixed constant.

Evaluation
– Combinations of NER methods: without an external dictionary, with binary dictionary features, and with distance features.
– Train with only 10% of the data, test on the rest; repeat 7 times and report the average.
– Partial extractions get no credit.
– Metrics: precision, recall, and F1.

Results
– SMM-VP is best: it outperforms HMM-VP(4) in 13 out of 15 cases.
– HMM-VP(1) is worst: HMM-VP(4) outperforms HMM-VP(1) in 13 out of 15 cases.
– Binary dictionary features are helpful, but distance features are more helpful.
– See Table 1 (details) and Table 4 (summary).

Effects
– Improvements over Collins' method (Table 5).
– The gap between SMM and HMM-VP(4) shrinks as the training size increases, but the two still converge at different rates (Table 2).
– A higher-order HMM does not improve performance much (Table 6).
– Alternative (less related) dictionaries: both methods seem fairly robust.

Conclusion & Questions
Conclusion
– Incorporates dictionary knowledge nicely.
– Applicable to sequential models.
– Improvements are significant, but the method uses more resources and runs 3-5 times slower.
Questions
– What if the dictionary is not a superset of the target entities, or is unrelated?
– Harder types of data, where named entities are not easy to obtain?