Published by Blaise Willis. Modified over 9 years ago.
Information Extraction
Entity Extraction: Statistical Methods
Sunita Sarawagi
What Are Statistical Methods?
"Statistical methods of entity extraction convert the extraction task to a problem of designing a decomposition of the unstructured text and then labeling various parts of the decomposition, either jointly or independently."
- Models: token-level, segment-level, grammar-based
- Training: likelihood-based, max-margin
Token-level Models
- The input is a sequence of tokens (characters, words, or n-grams)
- An entity label is assigned to each token
- A generalization of the standard classification problem
- Feature selection is important
Features
- Word features: the surface word itself is a strong indicator of which label to use
- Orthographic features: capitalization patterns (cap-words), presence of special characters, alphanumeric generalization of the characters in the token
- Dictionary-lookup features
- Each feature is a real-valued function f : (x, y, i) → R of the input x, a candidate label y, and a token position i
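The feature types above can be made concrete. A minimal sketch, assuming hypothetical labels (PER, LOC, YEAR) and a toy city dictionary that are not part of the original slides:

```python
import re

# Illustrative token-level feature functions f(x, y, i) -> R, where x is the
# token sequence and y is a candidate label for the token at position i.
# All label names and the tiny dictionary are hypothetical.

def word_feature(x, y, i):
    """Surface-word feature: fires when token i is 'Paris' labeled LOC."""
    return 1.0 if x[i] == "Paris" and y == "LOC" else 0.0

def capitalized_feature(x, y, i):
    """Orthographic feature: capitalization pattern paired with the label."""
    return 1.0 if x[i][0].isupper() and y != "OTHER" else 0.0

def digit_feature(x, y, i):
    """Alphanumeric generalization: token contains a digit."""
    return 1.0 if re.search(r"\d", x[i]) and y == "YEAR" else 0.0

def dictionary_feature(x, y, i, cities=frozenset({"paris", "london"})):
    """Dictionary-lookup feature against a (toy) list of city names."""
    return 1.0 if x[i].lower() in cities and y == "LOC" else 0.0

x = ["Sarawagi", "visited", "Paris", "in", "1999"]
print(word_feature(x, "LOC", 2))         # 1.0
print(capitalized_feature(x, "PER", 0))  # 1.0
print(digit_feature(x, "YEAR", 4))       # 1.0
print(dictionary_feature(x, "LOC", 2))   # 1.0
```

In practice a model uses thousands of such functions, generated from templates rather than written by hand.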
Models for Labeling Tokens
- Logistic classifier
- Support Vector Machine (SVM)
- Hidden Markov Models (HMMs)
- Maximum Entropy Markov Models (MEMMs), also called Conditional Markov Models (CMMs)
- Conditional Random Fields (CRFs): define a single joint distribution Pr(y|x) through a global scoring function
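To make the CRF's single joint distribution Pr(y|x) concrete, here is a minimal sketch with toy emission/transition count features and brute-force normalization (real CRFs compute the partition function Z with dynamic programming); all feature names and weights are illustrative:

```python
import math
from collections import Counter
from itertools import product

def features(x, y):
    """Feature vector f(x, y): counts of (token, label) and (label, label) pairs."""
    f = Counter()
    for i, tok in enumerate(x):
        f[("emit", tok, y[i])] += 1
        if i > 0:
            f[("trans", y[i - 1], y[i])] += 1
    return f

def score(w, f):
    """Linear score w . f(x, y)."""
    return sum(w.get(k, 0.0) * v for k, v in f.items())

def crf_prob(w, x, y, labels):
    """Pr(y | x) = exp(w . f(x, y)) / Z(x), Z summed over all label sequences."""
    Z = sum(math.exp(score(w, features(x, yp)))
            for yp in product(labels, repeat=len(x)))
    return math.exp(score(w, features(x, y))) / Z

w = {("emit", "Paris", "LOC"): 2.0, ("trans", "O", "LOC"): 0.5}
x = ["in", "Paris"]
labels = ("O", "LOC")
total = sum(crf_prob(w, x, y, labels) for y in product(labels, repeat=len(x)))
best = max(product(labels, repeat=len(x)), key=lambda y: crf_prob(w, x, y, labels))
print(round(total, 3), best)  # 1.0 ('O', 'LOC')
```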
Segment-level Models
- The input is decomposed into a sequence of segments
- An entity label is assigned to each segment
- Features can span multiple tokens
Entity-level Features
- Exact match of the segment against known entities
- Similarity to known entities under a function such as TF-IDF
- Segment length
Global Segmentation Models
- Define a probability distribution over segmentations of the input
- Goal: find the segmentation s that maximizes w·f(x, s)
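The search for the segmentation s maximizing w·f(x, s) can be sketched as a dynamic program over segment boundaries (a semi-Markov Viterbi); the segment scoring function below is a hypothetical stand-in for w·f on a single segment:

```python
def best_segmentation(x, labels, seg_score, max_len=3):
    """Return (best score, segments), each segment a (start, end, label) triple."""
    n = len(x)
    V = [0.0] + [float("-inf")] * n   # V[j]: best score of a segmentation of x[:j]
    back = [None] * (n + 1)
    for j in range(1, n + 1):
        for length in range(1, min(max_len, j) + 1):
            for y in labels:
                s = V[j - length] + seg_score(x, j - length, j, y)
                if s > V[j]:
                    V[j], back[j] = s, (j - length, y)
    segs, j = [], n
    while j > 0:                       # recover segments from back-pointers
        i, y = back[j]
        segs.append((i, j, y))
        j = i
    return V[n], segs[::-1]

def seg_score(x, i, j, y):
    """Toy segment score: reward labeling the span 'New York' as LOC."""
    if y == "LOC":
        return 2.0 if x[i:j] == ["New", "York"] else -1.0
    return 0.0                         # label O is neutral

score, segs = best_segmentation(["I", "love", "New", "York"], ("O", "LOC"), seg_score)
print(score, segs)  # 2.0 [(0, 1, 'O'), (1, 2, 'O'), (2, 4, 'LOC')]
```

Bounding the segment length by `max_len` keeps the search quadratic rather than cubic in the sentence length.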
Grammar-based Models
- Organized around production rules
- Produce parse trees
- A scoring function is attached to each production
Training Algorithms
- The model outputs some y: a sequence of labels for sequence models, a segmentation of x for segment-level models, or a parse tree for grammar-based models
- y is chosen as the argmax of s(y) = w·f(x, y), where f(x, y) is a feature vector
- Two families of training methods: likelihood-based training and max-margin training
Likelihood Trainer
- Defines a probability distribution Pr(y|x) over outputs
- Trains by maximizing the log-likelihood of the training data with respect to the weight vector w
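A sketch of what the likelihood trainer computes, assuming a toy log-linear model: the gradient of the log-likelihood in each weight is the observed feature value minus its expectation under the model, here evaluated by brute-force enumeration (toy sizes only):

```python
import math
from collections import Counter
from itertools import product

def features(x, y):
    """Toy feature vector: counts of (token, label) pairs."""
    f = Counter()
    for i, tok in enumerate(x):
        f[("emit", tok, y[i])] += 1
    return f

def expected_features(w, x, labels):
    """E[f] under Pr(y|x): feature vectors averaged by sequence probability."""
    seqs = list(product(labels, repeat=len(x)))
    scores = [sum(w.get(k, 0.0) * v for k, v in features(x, y).items()) for y in seqs]
    Z = sum(math.exp(s) for s in scores)
    ef = Counter()
    for y, s in zip(seqs, scores):
        p = math.exp(s) / Z
        for k, v in features(x, y).items():
            ef[k] += p * v
    return ef

def gradient(w, x, y_true, labels):
    """d log Pr(y_true|x) / dw = f(x, y_true) - E[f]."""
    obs, exp = features(x, y_true), expected_features(w, x, labels)
    return {k: obs[k] - exp[k] for k in set(obs) | set(exp)}

g = gradient({}, ["Paris"], ("LOC",), ("O", "LOC"))
print(g)  # with w = 0 the model is uniform, so the observed feature gets +0.5
```

A gradient-ascent trainer repeatedly adds a step in this direction, usually with an L2 penalty on w to avoid overfitting.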
Max-margin Training
- "An extension of support vector machines for training structured models"
- Find a weight vector w under which the correct output scores higher than every alternative output, by a margin
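The full max-margin trainer solves a quadratic program; the sketch below instead shows the structured perceptron, a simpler relative that shares the key step of comparing the model's best-scoring output against the true one and pushing the weights toward the truth (all names and features are illustrative):

```python
from itertools import product

def features(x, y):
    """Toy emission features: one indicator per (token, label) pair."""
    return {(x[i], y[i]): 1.0 for i in range(len(x))}

def dot(w, f):
    return sum(w.get(k, 0.0) * v for k, v in f.items())

def predict(w, x, labels):
    """argmax_y w . f(x, y), by enumeration (toy sizes only)."""
    return max(product(labels, repeat=len(x)), key=lambda y: dot(w, features(x, y)))

def perceptron_update(w, x, y_true, labels, lr=1.0):
    """On a mistake, add the true structure's features and subtract the guess's."""
    y_hat = predict(w, x, labels)
    if y_hat != y_true:
        for k, v in features(x, y_true).items():
            w[k] = w.get(k, 0.0) + lr * v
        for k, v in features(x, y_hat).items():
            w[k] = w.get(k, 0.0) - lr * v
    return w

w = {}
for _ in range(3):
    perceptron_update(w, ["Paris"], ("LOC",), ("O", "LOC"))
print(predict(w, ["Paris"], ("O", "LOC")))  # ('LOC',)
```

Max-margin training strengthens this update by requiring the true output to win by a margin that grows with the loss of the competing output.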
Inference Algorithms
MAP for Sequential Labeling
- The highest-scoring label sequence is found exactly with the Viterbi dynamic program
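The MAP problem argmax_y w·f(x, y) for sequential labeling is solved exactly by the Viterbi dynamic program when features decompose over adjacent labels; a minimal sketch with hypothetical emission and transition scores:

```python
def viterbi(x, labels, emit, trans):
    """Return the highest-scoring label sequence for token sequence x."""
    n = len(x)
    V = [{y: emit(x, 0, y) for y in labels}]   # V[i][y]: best score ending in y
    back = []
    for i in range(1, n):
        V.append({})
        back.append({})
        for y in labels:
            prev = max(labels, key=lambda yp: V[i - 1][yp] + trans(yp, y))
            V[i][y] = V[i - 1][prev] + trans(prev, y) + emit(x, i, y)
            back[i - 1][y] = prev
    best = max(labels, key=lambda y: V[n - 1][y])
    path = [best]
    for i in range(n - 2, -1, -1):             # follow back-pointers
        path.append(back[i][path[-1]])
    return path[::-1]

def emit(x, i, y):
    """Toy score: capitalized tokens prefer the entity label."""
    return 2.0 if (x[i][0].isupper()) == (y == "ENT") else 0.0

def trans(yp, y):
    return 0.5 if yp == y else 0.0             # mild preference for label runs

print(viterbi(["Sunita", "Sarawagi", "wrote", "this"], ("ENT", "O"), emit, trans))
# ['ENT', 'ENT', 'O', 'O']
```

The run takes O(n · |labels|²) time, versus the exponential cost of enumerating every label sequence.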
MAP for Segmentations
- The best segmentation is found with a semi-Markov extension of Viterbi that searches over segment boundaries and labels
MAP for Parse Trees
- The best parse is found by dynamic programming over spans, as in the CKY algorithm
Expected Feature Values for Sequential Labelings
- Computed with the forward-backward algorithm rather than by enumerating label sequences
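For sequence models, the expected feature values under Pr(y|x) that the likelihood trainer's gradient requires come from the forward-backward algorithm. A minimal sketch computing per-token label marginals from toy emission and transition potentials (all values hypothetical):

```python
import math

def marginals(n, labels, phi, psi):
    """Token marginals Pr(y_i = y | x) via forward-backward.
    phi(i, y): emission potential; psi(yp, y): transition potential (both > 0)."""
    alpha = [{y: phi(0, y) for y in labels}]           # forward scores
    for i in range(1, n):
        alpha.append({y: phi(i, y) * sum(alpha[i - 1][yp] * psi(yp, y)
                                         for yp in labels)
                      for y in labels})
    beta = [dict() for _ in range(n)]                  # backward scores
    beta[n - 1] = {y: 1.0 for y in labels}
    for i in range(n - 2, -1, -1):
        beta[i] = {y: sum(psi(y, yn) * phi(i + 1, yn) * beta[i + 1][yn]
                          for yn in labels)
                   for y in labels}
    Z = sum(alpha[n - 1][y] for y in labels)           # partition function
    return [{y: alpha[i][y] * beta[i][y] / Z for y in labels} for i in range(n)]

phi = lambda i, y: math.exp(1.0 if (i == 1 and y == "ENT") else 0.0)  # toy potentials
psi = lambda yp, y: math.exp(0.5 if yp == y else 0.0)
m = marginals(3, ("ENT", "O"), phi, psi)
print([round(m[i]["ENT"], 3) for i in range(3)])  # each position's marginals sum to 1
```

Summing these marginals against each feature's indicator gives the expected feature counts in O(n · |labels|²) time.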
Summary
- Most prominent models in use: maximum entropy taggers (MaxEnt), Hidden Markov Models (HMMs), Conditional Random Fields (CRFs)
- CRFs are now established as the state of the art
- Segment-level and grammar-based CRFs are not as popular
Further Readings
- Active learning
- Bootstrapping from structured data
- Transfer learning and domain adaptation
- Collective inference