
Slide 1: Information Extraction with HMM Structures Learned by Stochastic Optimization
Dayne Freitag and Andrew McCallum
Presented by Tal Blum for the course: Machine Learning Approaches to Information Extraction and Information Integration

Slide 2: Outline
– Background on HMM transition structure selection
– The algorithm for the sparse IE task
– Comparison between their algorithm and the Borkar et al. algorithm
– Discussion
– Results

Slide 3: HMMs for IE
HMMs have been used successfully in many tasks:
– Speech recognition
– Information extraction (Bikel et al., Borkar et al.)
– IE in bioinformatics (Leek)
– POS tagging (Ratnaparkhi)

Slide 4: The Sparse Extraction Task
– Fields are extracted from a long document
– Most of the document is irrelevant
– Examples: named entities; conference time and location

Slide 5: Learning HMM Structure?
[Figures: a Bayesian network over variables X, Y, Z, W, S; an HMM drawn as a Bayesian network with states S1, S2, S3 and observations Obs1, Obs2, Obs3; and the same HMM drawn as a dynamic Bayesian network unrolled over time t.]

Slide 6: Constrained Transition
[Figure: transition graphs over states X1–X4, contrasting a fully connected transition structure with constrained structures that allow only a subset of the transitions.]

Slide 7: HMM Structure Learning
– Unlike BN structure learning
– Learn the structure of the transition matrix A
– Learn structures with different numbers of states
[Figure: alternative transition structures for address segmentation over states such as St. #, street, Zip code, and country, including a variant with two Zip-code states.]
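
To make the idea of a constrained transition matrix concrete, here is a minimal sketch (not from the paper): structural zeros in A encode an allowed-transition graph, so constrained entries never need to be estimated. The state names, mask, and counts are illustrative assumptions only.

```python
import numpy as np

# Hypothetical 4-state segmentation model; the state names and the allowed
# transitions are illustrative, not the structure from the paper.
states = ["St. #", "street", "Zip code", "country"]

# Structural mask: mask[i, j] = 1 means the transition states[i] -> states[j]
# is allowed.  Constraining the structure pins the other entries of A at zero.
mask = np.array([
    [0, 1, 0, 0],   # St. #    -> street
    [0, 1, 1, 0],   # street   -> street | Zip code
    [0, 0, 0, 1],   # Zip code -> country
    [0, 0, 0, 1],   # country  -> country
])

# Toy counts, zeroed wherever the structure forbids a transition,
# then row-normalized into a valid transition matrix A.
counts = (np.random.rand(4, 4) + 0.1) * mask
A = counts / counts.sum(axis=1, keepdims=True)
print(np.round(A, 2))
```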

Slide 8: HMM Structure Example

Slide 9: Example Hierarchical HMM

Slide 10: Why Learn HMM Structure?
– HMMs are not specifically suited to IE tasks
– Including structural bias reduces the number of parameters to learn and therefore requires less data
– The parameters will be more accurate
– It can constrain the number of times a class can appear in a document
– It can represent class lengths more accurately
– The emission probability might be multi-modal
– It can model the left and right context of a class for the sparse IE task

Slide 11: Fully Observed vs. Partially Observed
– Structure learning is only required when the data is partially observed
– Partially observed: a field is represented by several states, but the label identifies only the field
– With fully observed data we can let the probabilities “learn” the structure: edges that are never observed get zero probability
– Learning the transition structure involves incorporating new states
– Naively allowing arbitrary transitions will not generalize well

Slide 12: The Problem
– How to select the additional states and the state-transition structure
– Manual selection doesn’t scale well
– Human intuition does not always correspond to the best structures

Slide 13: The Solution
– A system that automatically selects an HMM transition structure
– The system starts from a simple initial model and extends it sequentially with a set of operations, searching for a better model
– Model quality is measured by its discrimination on a validation dataset
– The best model is returned
– The system is comparable with human-constructed HMM structures and on average outperforms them

Slide 14: IE with HMMs
– Each extracted field has its own HMM
– Each HMM contains two kinds of states: target states and non-target states
– All of the field HMMs are concatenated into a single consistent HMM
– The entire document is used to train the models, with no need for pre-processing
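
As an illustration of how such a model is used at extraction time, the sketch below runs standard Viterbi decoding and keeps the tokens aligned with target states. It is a generic stand-in under assumed inputs (token indices, dense A, B, pi), not the authors' exact setup with concatenated per-field models.

```python
import numpy as np

def viterbi(obs, A, B, pi):
    """Most likely state path for a sequence of observation indices."""
    n_states, T = A.shape[0], len(obs)
    delta = np.zeros((T, n_states))
    back = np.zeros((T, n_states), dtype=int)
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A        # scores[i, j]: i -> j
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, obs[t]]
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):                 # backtrace
        path.append(int(back[t][path[-1]]))
    return path[::-1]

def extract(tokens, obs, A, B, pi, target_states):
    """Return the tokens whose Viterbi state is a target state."""
    path = viterbi(obs, A, B, pi)
    return [tok for tok, s in zip(tokens, path) if s in target_states]
```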

Slide 15: Parameter Estimation
– Transition probabilities are estimated by maximum likelihood:
  – Unique path – ratio of counts
  – Non-unique path – use EM
– Emission probabilities require smoothing with priors:
  – shrinkage with EM
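
A minimal sketch of what the count-based case might look like, assuming fully labeled (unique-path) training sequences. The fixed interpolation weight `lam` is an illustrative stand-in for the EM-learned shrinkage weights mentioned on the slide.

```python
from collections import Counter, defaultdict

def estimate_transitions(state_sequences):
    """Maximum-likelihood transitions from labeled (unique-path) sequences:
    P(s'|s) = count(s -> s') / count(s -> *)."""
    counts = defaultdict(Counter)
    for seq in state_sequences:
        for s, s_next in zip(seq, seq[1:]):
            counts[s][s_next] += 1
    return {s: {t: c / sum(nxt.values()) for t, c in nxt.items()}
            for s, nxt in counts.items()}

def shrink_emissions(state_word_counts, global_word_counts, lam=0.7):
    """Smooth each state's emission distribution by interpolating with a
    global (parent) distribution; `lam` stands in for EM-learned weights."""
    g_total = sum(global_word_counts.values())
    global_p = {w: c / g_total for w, c in global_word_counts.items()}
    smoothed = {}
    for s, wc in state_word_counts.items():
        total = sum(wc.values())
        smoothed[s] = {w: lam * wc.get(w, 0) / total + (1 - lam) * p
                       for w, p in global_p.items()}
    return smoothed

# Example:
# estimate_transitions([["prefix", "target", "suffix"],
#                       ["prefix", "prefix", "target", "suffix"]])
```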

Slide 16: Learning State-Transition Structure
States:
– Target
– Prefix
– Suffix
– Background

Slide 17: Model Expansion Choices
States:
– Target
– Prefix
– Suffix
– Background
Model expansion choices:
– Lengthen a prefix
– Split a prefix
– Lengthen a suffix
– Split a suffix
– Lengthen a target string
– Split a target string
– Add a background state
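
As a toy illustration of the expansion step, the sketch below enumerates candidate models one operation away from the current one. The flat dictionary representation and operation set are my simplifications: the split operations, which create parallel paths, would need a real transition-graph representation and are omitted here.

```python
import copy

# Toy model representation: a count of states of each kind.
def initial_model():
    return {"prefix": 1, "target": 1, "suffix": 1, "background": 1}

def lengthen(kind):
    """Operation factory: add one more state of the given kind."""
    def op(model):
        new = copy.deepcopy(model)
        new[kind] += 1
        return new
    return op

OPERATIONS = {
    "lengthen prefix": lengthen("prefix"),
    "lengthen target": lengthen("target"),
    "lengthen suffix": lengthen("suffix"),
    "add background state": lengthen("background"),
}

def expand(model):
    """All candidate models one operation away from `model`."""
    return [(name, op(model)) for name, op in OPERATIONS.items()]

for name, candidate in expand(initial_model()):
    print(name, candidate)
```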

Slide 18: The Algorithm
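
The slide's figure of the algorithm is not reproduced here. Below is a minimal, greedy stand-in for the search loop described on the previous slides; the paper's actual procedure is a stochastic search scored by extraction performance on a validation set. `expand` and `score` are assumed callbacks, e.g. the toy `expand` above and an F1 score on held-out data.

```python
def learn_structure(initial_model, expand, score, max_iters=20):
    """Greedy sketch of structure search: repeatedly expand the current
    model, move to the best-scoring candidate (scored on validation data),
    and return the best model seen overall."""
    current = initial_model
    best_model, best_score = current, score(current)
    for _ in range(max_iters):
        candidates = [m for _, m in expand(current)]
        if not candidates:
            break
        current = max(candidates, key=score)
        s = score(current)
        if s > best_score:
            best_model, best_score = current, s
    return best_model

# Example with the toy pieces above and a dummy score (a real score would be
# extraction F1 of the trained HMM on a validation set):
# learn_structure(initial_model(), expand,
#                 score=lambda m: -abs(m["target"] - 2))
```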

Slide 19: Discussion
– Structure learning is similar to rule learning for word or boundary classification
– The search for the best structure is not exhaustive
– There is no attempt to generalize better by sharing emission probabilities across different states

Slide 20: Comparison with the Borkar et al. Algorithm
Differences:
– Segmentation vs. sparse extraction
– Modeling of the background and of boundaries
– Unique paths – no need for EM
– Backward search vs. forward search
Both assume known boundaries and that position is the most relevant feature for distinguishing different states.

Slide 21: Experimental Results
Tested on 8 extraction tasks over 4 datasets:
– Seminar Announcements (485)
– Reuters Corporate Acquisition articles (600)
– Job Announcements (298)
– Call for Papers (363)
Training and test sets were of equal size; performance is averaged over 10 splits.

Slide 22: Learned Structure

Slide 23: Experimental Results
The learned structure is compared to 4 other approaches:
– Grown HMM – the structure learned by the system
– SRV – rule learning (Freitag 1998)
– Rapier – rule learning (Califf 1998)
– Simple HMM
– Complex HMM

Slide 24: Experimental Results

Slide 25: Conclusions
– HMMs have proved to be a state-of-the-art method for IE
– Constraining the transition structure has a crucial effect on performance
– Automatic transition-structure learning matches and on average outperforms manually crafted HMM structures, which require laborious manual construction

Slide 26: The End! Questions?

