INSTITUTE OF COMPUTING TECHNOLOGY
Forest-based Semantic Role Labeling
Hao Xiong, Haitao Mi, Yang Liu and Qun Liu
Institute of Computing Technology, Chinese Academy of Sciences
AAAI 2010, Atlanta, 7/15/10
Semantic Role Labeling
Given a sentence and its verbs (predicates):
- identify the arguments of each verb;
- assign each argument a semantic label (the role it plays).
Example: [This company]Agent [last year]AM-TMP sold [1000 cars]Patient [in the U.S.]AM-LOC
AM-TMP: temporal modifier (ArgMod-TeMPoral); AM-LOC: locative modifier (ArgMod-LOCation)
Role inventory: PropBank (Kingsbury and Palmer 2002)
One Conventional Approach
Example: [the role of Celimene]Patient is played by [Kim Cattrall]Agent
One Conventional Approach
Arguments are labeled on the constituents of the 1-best parse tree.
[Figure: 1-best parse tree (S → NP PP VP, with VP → AUX VP and VP → VBN PP) of "the role of Celimene is played by Kim Cattrall", arguments labeled Patient and Agent]
One Conventional Approach
Problem: in this 1-best tree the PP "of Celimene" attaches to S, so "the role of Celimene" is not a constituent and the Patient argument cannot be labeled ("?").
Such parsing errors are common: more than 15%.
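The failure case above can be measured: a tree-based labeler can never recover a gold argument whose token span is not the span of some constituent. The minimal sketch below computes that rate for one sentence. The half-open span encoding, the helper name `unmatched_rate`, and the exact bracketing of the erroneous tree are assumptions for illustration.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]  # half-open token span [start, end)

def unmatched_rate(constituents: Set[Span], arguments: List[Span]) -> float:
    """Fraction of gold argument spans that match no constituent of the parse."""
    if not arguments:
        return 0.0
    missing = sum(1 for span in arguments if span not in constituents)
    return missing / len(arguments)

# Tokens: the(0) role(1) of(2) Celimene(3) is(4) played(5) by(6) Kim(7) Cattrall(8)
# Hypothetical erroneous parse S -> NP PP VP: "the role of Celimene" (0, 4) is missing.
bad_parse = {(0, 9), (0, 2), (2, 4), (3, 4), (4, 9), (5, 9), (6, 9), (7, 9)}
gold_args = [(0, 4), (7, 9)]  # hypothetical Patient and Agent spans
print(unmatched_rate(bad_parse, gold_args))  # 0.5
```

A corpus-level rate would simply pool the counts over all sentences before dividing.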
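Solution: k-best Parses
[Figure: parse trees 1, 2, 3, …, k of the example sentence, differing mainly in where the PP attaches]
- Limited scope: k parses cover only about log2 k independent binary ambiguities (2^5 < 50 < 2^6, so a 50-best list covers only 5 to 6)
- Too much redundancy: the k trees share most of their structure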
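The scope argument is plain arithmetic under an idealized assumption that a sentence's ambiguities are independent binary choices: n such choices yield 2^n parses, while a k-best list can enumerate every combination of at most floor(log2 k) of them. The helper names below are hypothetical.

```python
def ambiguities_covered(k: int) -> int:
    """Largest n with 2**n <= k: binary ambiguities a k-best list fully enumerates."""
    return k.bit_length() - 1  # equals floor(log2 k) for k >= 1

def parses_from_ambiguities(n: int) -> int:
    """Number of parses when a sentence has n independent binary ambiguities."""
    return 2 ** n

print(ambiguities_covered(50))      # 5, since 2**5 = 32 <= 50 < 64 = 2**6
print(parses_from_ambiguities(20))  # 1048576
```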
Our Solution: Forest
- A compact representation of many parses
- Shares common sub-derivations
- A polynomial-space encoding of an exponentially large set of parses
[Figure: a packed forest that unpacks into the k-best trees]
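To make the polynomial-versus-exponential claim concrete, here is a minimal sketch assuming a forest stored as a map from each node to its incoming hyperedges, each hyperedge being the list of its child nodes. Counting the trees it encodes takes one memoized pass. The chain forest is a hypothetical toy, not parser output: it has 4n hyperedges but 2^n trees.

```python
from typing import Dict, List

# Forest: node -> list of incoming hyperedges; a hyperedge is its list of child nodes.
# Nodes with no incoming hyperedges are leaves (words).
Forest = Dict[str, List[List[str]]]

def count_trees(forest: Forest, root: str) -> int:
    """Number of distinct trees rooted at `root`, by dynamic programming."""
    memo: Dict[str, int] = {}

    def count(v: str) -> int:
        if v not in memo:
            edges = forest.get(v, [])
            if not edges:
                memo[v] = 1  # a leaf derives exactly one way
            else:
                total = 0
                for tails in edges:  # sum over alternative hyperedges
                    product = 1
                    for t in tails:  # multiply over the children of one hyperedge
                        product *= count(t)
                    total += product
                memo[v] = total
        return memo[v]

    return count(root)

def chain_forest(n: int) -> Forest:
    """Hypothetical forest with n independent binary ambiguities."""
    forest: Forest = {}
    prev = "X0"  # leaf
    for i in range(1, n + 1):
        forest[f"A{i}"] = [[prev]]  # two alternative analyses of the same span,
        forest[f"B{i}"] = [[prev]]  # both sharing the sub-derivation `prev`
        forest[f"X{i}"] = [[f"A{i}", f"w{i}"], [f"B{i}", f"w{i}"]]
        prev = f"X{i}"
    return forest

f = chain_forest(20)
print(sum(len(es) for es in f.values()))  # 80 hyperedges
print(count_trees(f, "X20"))              # 1048576 trees
```

Memoization is what keeps the count linear in the forest size: each node's count is computed once and reused by every hyperedge that shares it.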
Outline
- Tree-based Semantic Role Labeling: parsing, selecting candidates, extracting features, classifying
- Forest-based Semantic Role Labeling
- Experiments
- Conclusion
Parsing
[Figure: 1-best parse tree of "This company last year sold 1000 cars in the U.S."]
Selecting Candidates
[Figure: the same parse tree, with the constituents selected as candidate arguments of "sold"]
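The slide does not say how candidates are chosen. One widely used heuristic, the pruning algorithm of Xue and Palmer (2004), walks from the predicate up to the root, collects the siblings of each node on the way, and also adds the children of any sibling PP. The sketch below assumes that heuristic and a hand-written bracketing of the example sentence; neither is taken from the slides.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass(eq=False)
class Node:
    label: str
    word: Optional[str] = None  # set only on preterminals
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def words(self) -> List[str]:
        if self.word is not None:
            return [self.word]
        return [w for c in self.children for w in c.words()]

def parse(s: str) -> Node:
    """Read a Penn-style bracketed tree such as (S (NP (DT This) ...) ...)."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read() -> Node:
        nonlocal pos
        label = tokens[pos + 1]  # tokens[pos] is "("
        pos += 2
        if tokens[pos] != "(":  # preterminal: (TAG word)
            node = Node(label, word=tokens[pos])
            pos += 2  # skip the word and ")"
            return node
        node = Node(label)
        while tokens[pos] == "(":
            child = read()
            child.parent = node
            node.children.append(child)
        pos += 1  # skip ")"
        return node

    return read()

def find_word(node: Node, word: str) -> Optional[Node]:
    if node.word == word:
        return node
    for c in node.children:
        hit = find_word(c, word)
        if hit is not None:
            return hit
    return None

def candidates(pred: Node) -> List[Node]:
    """Sibling-collection pruning in the style of Xue and Palmer (2004)."""
    out: List[Node] = []
    cur = pred
    while cur.parent is not None:
        for sib in cur.parent.children:
            if sib is cur:
                continue
            out.append(sib)
            if sib.label == "PP":  # also consider the PP's own children
                out.extend(sib.children)
        cur = cur.parent
    return out

tree = parse("(S (NP (DT This) (NN company)) (NP (JJ last) (NN year)) "
             "(VP (VBD sold) (NP (CD 1000) (NNS cars)) "
             "(PP (IN in) (NP (DT the) (NNP U.S.)))))")
pred = find_word(tree, "sold")
for c in candidates(pred):
    print(c.label, " ".join(c.words()))
```

The walk visits each ancestor once, so the candidate set stays linear in tree height times branching factor rather than covering every constituent.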
Extracting Features
Path to the predicate: for the candidate "This company", NP↑S↓VP↓VBD
[Figure: the parse tree with this path marked]
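The path feature can be read off parent pointers: climb from the candidate to the lowest common ancestor it shares with the predicate, then descend to the predicate. The minimal sketch below uses a hand-built tree (the bracketing is an assumption) and the ↑/↓ notation commonly used for this feature.

```python
from typing import List, Optional

class Node:
    """Minimal parse-tree node with a parent pointer."""
    def __init__(self, label: str, *kids: "Node", word: Optional[str] = None):
        self.label, self.kids, self.word = label, list(kids), word
        self.parent: Optional["Node"] = None
        for k in kids:
            k.parent = self

def ancestors(node: Optional[Node]) -> List[Node]:
    """The node itself, its parent, ..., up to the root."""
    chain = []
    while node is not None:
        chain.append(node)
        node = node.parent
    return chain

def path_feature(arg: Node, pred: Node) -> str:
    up = ancestors(arg)
    down = ancestors(pred)
    down_ids = [id(n) for n in down]
    i = next(k for k, n in enumerate(up) if id(n) in down_ids)  # lowest common ancestor
    j = down_ids.index(id(up[i]))
    rising = [n.label for n in up[: i + 1]]          # candidate ... common ancestor
    falling = [n.label for n in reversed(down[:j])]  # below the ancestor ... predicate
    return "↑".join(rising) + "".join("↓" + lab for lab in falling)

pred = Node("VBD", word="sold")
subj = Node("NP", Node("DT", word="This"), Node("NN", word="company"))
obj = Node("NP", Node("CD", word="1000"), Node("NNS", word="cars"))
tree = Node("S", subj, Node("NP", Node("JJ", word="last"), Node("NN", word="year")),
            Node("VP", pred, obj))
print(path_feature(subj, pred))  # NP↑S↓VP↓VBD
print(path_feature(obj, pred))   # NP↑VP↓VBD
```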
Extracting Features
Position: left (the candidate "This company" precedes the predicate)
Extracting Features
Head word: company
Extracting Features
Head POS tag: NN
Features for "This company" so far: path NP↑S↓VP↓VBD, position left, head word company, head POS tag NN, …
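Position, head word and head POS tag are local to one constituent. In the sketch below, position compares the candidate's first word with the predicate's index, and the head is found with a deliberately tiny, hypothetical head table (rightmost noun for NP, leftmost verb for VP, leftmost preposition for PP); real systems use full head-percolation rules such as Collins's. The tree bracketing is also an assumption.

```python
from typing import Dict, List, Optional

class Node:
    def __init__(self, label: str, *kids: "Node", word: Optional[str] = None):
        self.label, self.kids, self.word = label, list(kids), word

    def leaves(self) -> List["Node"]:
        if self.word is not None:
            return [self]
        return [leaf for k in self.kids for leaf in k.leaves()]

# Toy head rules (an assumption, far simpler than real head tables):
# phrase label -> (child-label prefix to look for, search direction)
HEAD_RULES = {"NP": ("NN", "right"), "VP": ("VB", "left"), "PP": ("IN", "left")}

def head_leaf(node: Node) -> Node:
    """Descend through head children until reaching a preterminal."""
    while node.word is None:
        prefix, direction = HEAD_RULES.get(node.label, ("", "right"))
        kids = node.kids if direction == "left" else node.kids[::-1]
        # first child matching the prefix in search order, else the first child searched
        node = next((k for k in kids if k.label.startswith(prefix)), kids[0])
    return node

def features(root: Node, arg: Node, pred: Node) -> Dict[str, str]:
    order = [id(leaf) for leaf in root.leaves()]
    arg_start = order.index(id(arg.leaves()[0]))
    pred_index = order.index(id(pred))
    head = head_leaf(arg)
    return {
        "position": "left" if arg_start < pred_index else "right",
        "head_word": head.word or "",
        "head_pos": head.label,
    }

pred = Node("VBD", word="sold")
subj = Node("NP", Node("DT", word="This"), Node("NN", word="company"))
obj = Node("NP", Node("CD", word="1000"), Node("NNS", word="cars"))
loc = Node("PP", Node("IN", word="in"),
           Node("NP", Node("DT", word="the"), Node("NNP", word="U.S.")))
tree = Node("S", subj, Node("NP", Node("JJ", word="last"), Node("NN", word="year")),
            Node("VP", pred, obj, loc))
print(features(tree, subj, pred))  # {'position': 'left', 'head_word': 'company', 'head_pos': 'NN'}
print(features(tree, obj, pred))   # {'position': 'right', 'head_word': 'cars', 'head_pos': 'NNS'}
```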
Classifying
A trained classifier scores every label for each candidate constituent, e.g.:
- S(Agent)=0.1, S(Patient)=0.1, S(None)=0.5, …
- S(AM-TMP)=0.9, S(Patient)=0.1, S(None)=0.1, …
- S(Agent)=0.2, S(Patient)=0.8, S(None)=0.1, …
- S(Agent)=0.8, S(Patient)=0.1, S(None)=0.1, …
- S(AM-LOC)=0.9, S(Agent)=0.1, S(None)=0.1, …
Classifying
- Take the best score for each constituent: S(Agent)=0.8, S(AM-TMP)=0.9, S(Patient)=0.8, S(AM-LOC)=0.9, S(None)=0.5
- Simply sort them
- Choose the best label sequence
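One way to read "sort them, then choose the best label sequence" is a greedy decoder: keep each candidate's best non-None label, visit candidates from highest to lowest score, and accept a candidate only if its span does not overlap one already accepted. The sketch below assumes that non-overlap constraint and hypothetical token spans; attaching the None=0.5 scores to "the U.S." is for illustration only, and the overlapping candidate (5, 10) is invented to show a rejection.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # half-open token span [start, end)

def greedy_decode(cands: Dict[Span, Dict[str, float]]) -> List[Tuple[Span, str]]:
    # 1. best label for each constituent (drop candidates whose best label is None)
    best = []
    for span, scores in cands.items():
        label = max(scores, key=lambda lab: scores[lab])
        if label != "None":
            best.append((scores[label], span, label))
    # 2. sort by score, highest first (the sort is stable, so ties keep input order)
    best.sort(key=lambda item: -item[0])
    # 3. accept candidates that do not overlap anything already chosen
    chosen: List[Tuple[Span, str]] = []
    for _score, (s, e), label in best:
        if all(e <= cs or ce <= s for (cs, ce), _lab in chosen):
            chosen.append(((s, e), label))
    return sorted(chosen)

# Tokens: This(0) company(1) last(2) year(3) sold(4) 1000(5) cars(6) in(7) the(8) U.S.(9)
cands = {
    (0, 2): {"Agent": 0.8, "Patient": 0.1, "None": 0.1},
    (2, 4): {"AM-TMP": 0.9, "Patient": 0.1, "None": 0.1},
    (5, 7): {"Agent": 0.2, "Patient": 0.8, "None": 0.1},
    (7, 10): {"AM-LOC": 0.9, "Agent": 0.1, "None": 0.1},
    (8, 10): {"Agent": 0.1, "Patient": 0.1, "None": 0.5},
    (5, 10): {"Patient": 0.6, "None": 0.3},  # hypothetical overlapping candidate
}
print(greedy_decode(cands))
# [((0, 2), 'Agent'), ((2, 4), 'AM-TMP'), ((5, 7), 'Patient'), ((7, 10), 'AM-LOC')]
```

The (5, 10) candidate is rejected because it overlaps spans already accepted with higher scores.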
Classifying
Result: [This company]Agent [last year]AM-TMP [sold]V [1000 cars]Patient [in the U.S.]AM-LOC
Outline
- Tree-based Semantic Role Labeling
- Forest-based Semantic Role Labeling: parsing into a forest, selecting candidates, extracting features on the forest, classifying
- Experiments
- Conclusion
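Forest
A forest is a hypergraph: each node is a constituent (a label over a span), and each hyperedge links a node to one sequence of child nodes it can be built from.
[Figure: packed forest of "the role of Celimene is played by Kim Cattrall", with a node and a hyperedge marked]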
Selecting Candidates
[Figure: the forest, with the nodes selected as candidate arguments]
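On a forest, the candidate pool is drawn from all forest nodes, not just the nodes of one tree. The sketch below assumes a forest stored as (label, start, end) nodes mapped to their alternative hyperedges. The toy forest for the example sentence is hand-written, containing both the 1-best S → NP PP VP analysis and the correct S → NP VP analysis. The sketch also applies one simple filter that is an assumption rather than something the slides state: discard nodes whose span contains the predicate.

```python
from typing import Dict, List, Set, Tuple

Node = Tuple[str, int, int]  # (label, start, end), half-open token span

# Tokens: the(0) role(1) of(2) Celimene(3) is(4) played(5) by(6) Kim(7) Cattrall(8)
# Hypothetical forest: head node -> alternative hyperedges (lists of child nodes).
FOREST: Dict[Node, List[List[Node]]] = {
    ("S", 0, 9): [[("NP", 0, 4), ("VP", 4, 9)],                 # correct analysis
                  [("NP", 0, 2), ("PP", 2, 4), ("VP", 4, 9)]],  # 1-best analysis
    ("NP", 0, 4): [[("NP", 0, 2), ("PP", 2, 4)]],
    ("VP", 4, 9): [[("AUX", 4, 5), ("VP", 5, 9)]],
    ("VP", 5, 9): [[("VBN", 5, 6), ("PP", 6, 9)]],
    ("PP", 6, 9): [[("IN", 6, 7), ("NP", 7, 9)]],
}

def forest_nodes(forest: Dict[Node, List[List[Node]]]) -> Set[Node]:
    nodes = set(forest)
    for edges in forest.values():
        for tails in edges:
            nodes.update(tails)
    return nodes

def forest_candidates(forest: Dict[Node, List[List[Node]]], pred_index: int) -> List[Node]:
    """All forest nodes whose span does not contain the predicate token."""
    keep = [n for n in forest_nodes(forest) if not (n[1] <= pred_index < n[2])]
    return sorted(keep, key=lambda n: (n[1], -n[2]))

# Includes ("NP", 0, 4), "the role of Celimene", which the 1-best tree lacks.
print(forest_candidates(FOREST, pred_index=5))
```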
Extracting Features
Path to the predicate: NP↑NP↑S↓VP↓VP↓VBN
[Figure: the forest with this path marked]
Extracting Features
Path to the predicate: the forest offers more than one path, e.g. NP↑S↓VP↓VP↓VBN and NP↑NP↑S↓VP↓VP↓VBN; the shortest one (NP↑S↓VP↓VP↓VBN) is used.
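Because a forest node can have several parents, there can be several paths from a candidate to the predicate. The sketch below assumes the same kind of hand-written toy forest and a breadth-first search over parent links. It climbs from both the candidate and the predicate, picks the common ancestor that minimizes the total length, and also returns the candidate's parent on that shortest path, which is what the parent-label feature on the following slides uses. All names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Node = Tuple[str, int, int]  # (label, start, end)
Forest = Dict[Node, List[List[Node]]]

# Tokens: the(0) role(1) of(2) Celimene(3) is(4) played(5) by(6) Kim(7) Cattrall(8)
FOREST: Forest = {
    ("S", 0, 9): [[("NP", 0, 4), ("VP", 4, 9)],
                  [("NP", 0, 2), ("PP", 2, 4), ("VP", 4, 9)]],
    ("NP", 0, 4): [[("NP", 0, 2), ("PP", 2, 4)]],
    ("VP", 4, 9): [[("AUX", 4, 5), ("VP", 5, 9)]],
    ("VP", 5, 9): [[("VBN", 5, 6), ("PP", 6, 9)]],
    ("PP", 6, 9): [[("IN", 6, 7), ("NP", 7, 9)]],
}

def parent_map(forest: Forest) -> Dict[Node, Set[Node]]:
    par: Dict[Node, Set[Node]] = {}
    for head, edges in forest.items():
        for tails in edges:
            for t in tails:
                par.setdefault(t, set()).add(head)
    return par

def upward_paths(start: Node, par: Dict[Node, Set[Node]]) -> Dict[Node, List[Node]]:
    """BFS over parent links: shortest node sequence from `start` to each ancestor."""
    paths = {start: [start]}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for h in sorted(par.get(v, set())):
            if h not in paths:
                paths[h] = paths[v] + [h]
                queue.append(h)
    return paths

def shortest_path(forest: Forest, arg: Node, pred: Node) -> Tuple[str, str]:
    par = parent_map(forest)
    up_arg, up_pred = upward_paths(arg, par), upward_paths(pred, par)
    common = [a for a in up_arg if a in up_pred]
    top = min(common, key=lambda a: len(up_arg[a]) + len(up_pred[a]))
    rising = up_arg[top]                        # candidate ... top
    falling = list(reversed(up_pred[top]))[1:]  # below top ... predicate
    path = "↑".join(n[0] for n in rising) + "".join("↓" + n[0] for n in falling)
    parent_label = rising[1][0] if len(rising) > 1 else ""
    return path, parent_label

print(shortest_path(FOREST, ("NP", 0, 2), ("VBN", 5, 6)))  # ('NP↑S↓VP↓VP↓VBN', 'S')
```

For "the role", the alternative route through NP(0, 4) gives the longer NP↑NP↑S↓VP↓VP↓VBN, so the search prefers the direct S parent.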
Extracting Features
Parent label: a candidate node can have more than one parent in the forest (path shown: NP↑S↓VP↓VP↓VBN)
[Figure: the forest with the candidate's parents marked]
Extracting Features
Parent label: the parent of the candidate in the shortest path NP↑S↓VP↓VP↓VBN (here S)
New Features
Parsing score: the fractional value of a node (Mi et al., 2008), i.e. its marginal probability computed with the inside-outside algorithm, written f(NP_3) for the node NP_3
[Figure: the forest with the path NP↑S↓VP↓VP↓VBN marked]
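A node's marginal probability can be computed with the inside-outside algorithm. The inside score β(v) sums over the ways to build v, the outside score α(v) sums over the contexts v can appear in, and the marginal is α(v)β(v)/β(root). The sketch below assumes hyperedge probabilities on a hypothetical toy forest: 0.6 for the wrong S → NP PP VP analysis and 0.4 for the correct one. It orders nodes by span width, which is a valid topological order only because this forest has no unary rules. It shows the standard posterior; the exact definition of the fractional value in Mi et al. (2008) may differ in detail.

```python
from collections import defaultdict
from math import prod
from typing import Dict, List, Tuple

Node = Tuple[str, int, int]      # (label, start, end)
Edge = Tuple[float, List[Node]]  # (probability, child nodes)

FOREST: Dict[Node, List[Edge]] = {
    ("S", 0, 9): [(0.4, [("NP", 0, 4), ("VP", 4, 9)]),
                  (0.6, [("NP", 0, 2), ("PP", 2, 4), ("VP", 4, 9)])],
    ("NP", 0, 4): [(1.0, [("NP", 0, 2), ("PP", 2, 4)])],
    ("VP", 4, 9): [(1.0, [("AUX", 4, 5), ("VP", 5, 9)])],
    ("VP", 5, 9): [(1.0, [("VBN", 5, 6), ("PP", 6, 9)])],
    ("PP", 6, 9): [(1.0, [("IN", 6, 7), ("NP", 7, 9)])],
}

def marginals(forest: Dict[Node, List[Edge]], root: Node) -> Dict[Node, float]:
    nodes = set(forest)
    for edges in forest.values():
        for _, tails in edges:
            nodes.update(tails)
    order = sorted(nodes, key=lambda n: n[2] - n[1])  # children before parents here
    inside: Dict[Node, float] = {}
    for v in order:
        edges = forest.get(v)
        inside[v] = 1.0 if not edges else sum(
            p * prod(inside[t] for t in tails) for p, tails in edges)
    outside: Dict[Node, float] = defaultdict(float)
    outside[root] = 1.0
    for v in reversed(order):  # parents before children
        for p, tails in forest.get(v, []):
            for i, t in enumerate(tails):
                siblings = prod(inside[w] for j, w in enumerate(tails) if j != i)
                outside[t] += outside[v] * p * siblings
    z = inside[root]
    return {v: outside[v] * inside[v] / z for v in nodes}

m = marginals(FOREST, ("S", 0, 9))
print(round(m[("NP", 0, 4)], 3))  # 0.4: "the role of Celimene" is a constituent in 40% of the mass
print(round(m[("NP", 0, 2)], 3))  # 1.0: "the role" appears in every tree
```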
Classifying
Each candidate node in the forest is scored, e.g.:
- S(Patient)=0.8, S(Agent)=0.1, S(None)=0.2, …
- S(Patient)=0.5, S(Agent)=0.1, S(None)=0.3, …
- S(Agent)=0.8, S(Patient)=0.1, S(None)=0.2, …
Classifying
Result: [the role of Celimene]Patient (S(Patient)=0.8) is played by [Kim Cattrall]Agent (S(Agent)=0.8)
Outline
- Tree-based Semantic Role Labeling
- Forest-based Semantic Role Labeling
- Experiments
- Conclusion
Experiments
Corpus: CoNLL-2005 shared task
- Training set: Sections 02-21 of PropBank
- Development set: Section 24
- Test set: Section 23
- Total: 43,594 sentences, 262,281 arguments
Experiments
Training:
1. Parse the training sentences into 1-best trees and forests
2. Prune the forests using the inside-outside algorithm
3. Train the classifiers
Decoding:
1. Parse the test sentences into 1-best trees and forests
2. Prune the forests using the inside-outside algorithm
3. Label arguments with the trained classifiers
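The slides do not give the pruning criterion. A common choice for parse forests, assumed here, is Viterbi (max-product) inside-outside pruning in the style of the forest pruning used by Huang (2008). The merit of a hyperedge is the log-probability of the best tree that uses it, and a hyperedge is dropped when its merit falls more than a threshold below the best tree's. The toy forest, its probabilities and the threshold values are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Node = Tuple[str, int, int]      # (label, start, end)
Edge = Tuple[float, List[Node]]  # (probability, child nodes)

FOREST: Dict[Node, List[Edge]] = {
    ("S", 0, 9): [(0.4, [("NP", 0, 4), ("VP", 4, 9)]),
                  (0.6, [("NP", 0, 2), ("PP", 2, 4), ("VP", 4, 9)])],
    ("NP", 0, 4): [(1.0, [("NP", 0, 2), ("PP", 2, 4)])],
    ("VP", 4, 9): [(1.0, [("AUX", 4, 5), ("VP", 5, 9)])],
    ("VP", 5, 9): [(1.0, [("VBN", 5, 6), ("PP", 6, 9)])],
    ("PP", 6, 9): [(1.0, [("IN", 6, 7), ("NP", 7, 9)])],
}

def prune(forest: Dict[Node, List[Edge]], root: Node, threshold: float) -> Dict[Node, List[Edge]]:
    nodes = set(forest)
    for edges in forest.values():
        for _, tails in edges:
            nodes.update(tails)
    order = sorted(nodes, key=lambda n: n[2] - n[1])  # children before parents here
    # Viterbi inside: best log-probability of any subtree rooted at v
    inside: Dict[Node, float] = {}
    for v in order:
        edges = forest.get(v)
        inside[v] = 0.0 if not edges else max(
            math.log(p) + sum(inside[t] for t in tails) for p, tails in edges)
    # Viterbi outside: best log-probability of the rest of a tree around v
    outside = {v: -math.inf for v in nodes}
    outside[root] = 0.0
    for v in reversed(order):
        for p, tails in forest.get(v, []):
            for i, t in enumerate(tails):
                rest = sum(inside[w] for j, w in enumerate(tails) if j != i)
                outside[t] = max(outside[t], outside[v] + math.log(p) + rest)
    best = inside[root]
    kept: Dict[Node, List[Edge]] = {}
    for head, edges in forest.items():
        survivors = [(p, tails) for p, tails in edges
                     if best - (outside[head] + math.log(p)
                                + sum(inside[t] for t in tails)) <= threshold]
        if survivors:
            kept[head] = survivors
    return kept

tight = prune(FOREST, ("S", 0, 9), threshold=0.3)
loose = prune(FOREST, ("S", 0, 9), threshold=3.0)
print(len(tight[("S", 0, 9)]), ("NP", 0, 4) in tight)  # 1 False
print(len(loose[("S", 0, 9)]), ("NP", 0, 4) in loose)  # 2 True
```

With the tight threshold, the correct S → NP VP hyperedge and the NP(0, 4) node are pruned together, so the forest collapses to the 1-best tree; the loose threshold keeps both analyses.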
Features
- Predicate lemma
- Path to predicate
- Path length
- Partial path
- Position
- Voice
- Head word/POS tag
- …
Results on Dev Set
[Table: precision, recall and F for 1-best, 50-best, forest(p3) (9.63×10^5) and forest(p5) (5.78× …)]
Results on Test Set
Conclusion
Forest:
- Encodes exponentially many parses
- Enlarges the candidate space
- Enables richer features
- Improves labeling quality significantly
- Does not need to be very large
- Cannot be simulated with k-best lists
Future work: more features on the forest
Thank you!