Graph-based Analysis of Espresso-style Minimally-supervised Bootstrapping Algorithms Jan 15, 2010 Mamoru Komachi Nara Institute of Science and Technology

Supervised learning has succeeded in many natural language processing tasks, but it needs time-consuming annotation (data creation). Why not learn from a minimally annotated resource? [Slide diagram: a large-scale corpus and a high-quality dictionary yield a high-accuracy classifier]

Corpus-based extraction of a semantic category
Input: seed instance, e.g., Singapore (extracted from corpus: "Visa for Singapore", "Travel guide to Singapore")
Patterns: "Visa for __", "History of __"
Output: new instances, e.g., Hong Kong, China, Australia, Egypt
Instance extraction and pattern extraction alternate step by step.

Semantic drift is the central problem of bootstrapping: errors propagate to successive iterations until the semantic category changes. [Slide diagram: instance → pattern → new instance; e.g., Singapore, card → patterns "__ is", "__ card" → Australia, messages, greeting, words: the semantic category has changed!]
Generic patterns: patterns co-occurring with many irrelevant instances.

Two major problems addressed by this work:
1. Why does semantic drift occur?
2. Is there any way to prevent semantic drift?

Answers to the problems of semantic drift
1. Suggest a parallel between semantic drift in Espresso-style bootstrapping [Pantel and Pennacchiotti, 2006] and topic drift in HITS [Kleinberg, 1999]
2. Solve semantic drift using a "relatedness" measure (the regularized Laplacian) instead of the "importance" measure (HITS authority) used in the link analysis community

Table of contents
2. Overview of Bootstrapping Algorithms
3. Espresso-style Bootstrapping Algorithms
4. Graph-based Analysis of Espresso-style Bootstrapping Algorithms
5. Word Sense Disambiguation
6. Bilingual Dictionary Construction
7. Learning Semantic Categories

Preliminaries: Espresso and HITS

Espresso Algorithm [Pantel and Pennacchiotti, 2006]
Repeat:
1. Pattern extraction
2. Pattern ranking
3. Pattern selection
4. Instance extraction
5. Instance ranking
6. Instance selection
Until a stopping criterion is met
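As a rough sketch (not the paper's implementation), the loop might be organized like this in Python; the extraction and ranking callbacks are hypothetical, corpus-specific hooks supplied by the caller:

```python
def espresso(seeds, extract_patterns, rank_patterns,
             extract_instances, rank_instances,
             k=20, m=100, max_iter=10):
    """Skeleton of the Espresso outer loop (sketch).

    The callbacks are stand-ins for corpus-specific steps;
    max_iter stands in for the stopping criterion.
    """
    instances, patterns = list(seeds), []
    for _ in range(max_iter):
        patterns += extract_patterns(instances)              # pattern extraction
        patterns = rank_patterns(patterns, instances)[:k]    # ranking + selection
        instances += extract_instances(patterns)             # instance extraction
        instances = rank_instances(instances, patterns)[:m]  # ranking + selection
    return instances, patterns
```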

Pattern/instance ranking is mutually defined in Espresso:
Score for pattern p:  r_π(p) = (1/|I|) Σ_{i∈I} [ pmi(p,i) / max pmi ] · r_ι(i)
Score for instance i: r_ι(i) = (1/|P|) Σ_{p∈P} [ pmi(p,i) / max pmi ] · r_π(p)
p: pattern; i: instance; P: set of patterns; I: set of instances; pmi: pointwise mutual information; max pmi: maximum of pmi over all the patterns and instances.
Reliable instances are supported by reliable patterns, and vice versa.
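These two updates can be written directly in NumPy. A minimal sketch, assuming `pmi` is a precomputed, nonnegative |P|×|I| matrix of pointwise mutual information values (a stand-in for the corpus statistics):

```python
import numpy as np

def espresso_ranking(pmi, inst_scores):
    """One round of Espresso's mutual ranking (sketch)."""
    W = pmi / pmi.max()                          # pmi(p,i) / max pmi
    pat_scores = W @ inst_scores / W.shape[1]    # r_pi(p): average support from instances
    inst_scores = W.T @ pat_scores / W.shape[0]  # r_iota(i): average support from patterns
    return pat_scores, inst_scores
```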

HITS (Hypertext-Induced Topic Search) finds hubs and authorities in a linked graph.
Hub score of h = sum of the authority scores of all nodes pointed to by h
Authority score of a = sum of the hub scores of all nodes pointing to a

HITS Algorithm [Kleinberg 1999]
Input: initial hub score vector h, adjacency matrix A
Main loop: repeat
  a ← α A^T h  (authority update)
  h ← β A a  (hub update)
until a and h converge
Output: hub and authority score vectors a and h
(α, β: normalization factors)

HITS converges to a fixed point regardless of the initial input:
  a^(k) ∝ (A^T A)^(k-1) A^T h^(0),  h^(k) ∝ (A A^T)^k h^(0)
(a^(k), h^(k): the vectors a and h on the k-th iteration)
HITS authority vector a = the principal eigenvector of A^T A
HITS hub vector h = the principal eigenvector of A A^T
where A^T A is the co-citation matrix and A A^T is the bibliographic coupling matrix.
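A tiny power-iteration demo illustrates the claim; the 4-node graph is invented for illustration:

```python
import numpy as np

A = np.array([[0, 1, 1, 0],     # toy adjacency matrix (4 pages)
              [0, 0, 1, 0],
              [1, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)

def hits(A, h, iters=100):
    for _ in range(iters):
        a = A.T @ h                 # authority update
        a /= np.linalg.norm(a)      # alpha: normalization
        h = A @ a                   # hub update
        h /= np.linalg.norm(h)      # beta: normalization
    return a, h

a1, _ = hits(A, np.ones(4))
a2, _ = hits(A, np.array([1.0, 2.0, 3.0, 4.0]))
print(np.allclose(a1, a2))  # True: same authorities regardless of the start vector
```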

Graph-based Analysis of Espresso-style Bootstrapping Algorithms: how Espresso works, and how Espresso fails to solve semantic drift

Make Espresso look like HITS
Pattern ranking:  p = (1/|I|) A i
Instance ranking: i = (1/|P|) A^T p
p: pattern score vector; i: instance score vector; A: pattern-instance matrix; |P|: number of patterns; |I|: number of instances. The factors 1/|I| and 1/|P| are normalization factors that keep the score vectors from growing too large.

Espresso uses the pattern-instance matrix A as the adjacency matrix in HITS: a |P|×|I|-dimensional matrix holding the (normalized) pointwise mutual information (pmi) between patterns and instances,
  [A]_{p,i} = pmi(p,i) / max_{p,i} pmi(p,i)
with rows indexed by patterns (1, …, |P|) and columns by instances (1, …, |I|).
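For concreteness, a sketch of how such a matrix could be built from raw co-occurrence counts; the counts are toy data, and clipping negative pmi to zero is an assumption made here to keep A nonnegative:

```python
import numpy as np

C = np.array([[8, 2, 0],          # toy pattern-instance co-occurrence counts
              [1, 5, 3],          # rows: patterns, columns: instances
              [0, 1, 6]], dtype=float)

joint = C / C.sum()                          # P(p, i)
p_pat = joint.sum(axis=1, keepdims=True)     # P(p)
p_ins = joint.sum(axis=0, keepdims=True)     # P(i)
with np.errstate(divide="ignore"):
    pmi = np.where(joint > 0, np.log(joint / (p_pat * p_ins)), 0.0)
pmi = np.clip(pmi, 0.0, None)                # drop negative associations (assumption)
A = pmi / pmi.max()                          # [A]_{p,i} = pmi(p,i) / max pmi
```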

Three simplifications to reduce Espresso to HITS
For graph-theoretic analysis, we will introduce three simplifications to Espresso's loop (repeat: pattern extraction, pattern ranking, pattern selection, instance extraction, instance ranking, instance selection; until a stopping criterion is met).

Simplification 1: keep the pattern-instance matrix constant in the main loop
Remove the pattern/instance extraction steps; instead, pre-compute all patterns and instances once at the beginning of the algorithm. The loop becomes: compute the pattern-instance matrix; repeat: pattern ranking, pattern selection, instance ranking, instance selection; until a stopping criterion is met.

Simplification 2: remove the pattern/instance selection heuristics
The selection steps retain only the highest-scoring k patterns / m instances for the next iteration (i.e., they reset the scores of all other items to 0). Instead, retain the scores of all patterns and instances. The loop becomes: compute the pattern-instance matrix; repeat: pattern ranking, instance ranking; until a stopping criterion is met.

Simplification 3: remove the early stopping heuristic
Run until convergence. The loop becomes: compute the pattern-instance matrix; repeat: pattern ranking, instance ranking; until the score vectors p and i converge.

Simplified Espresso
Input: initial score vector of seed instances i, pattern-instance co-occurrence matrix A
Main loop: repeat
  p ← (1/|I|) A i  (pattern ranking)
  i ← (1/|P|) A^T p  (instance ranking)
until i and p converge
Output: instance and pattern score vectors i and p

HITS Algorithm [Kleinberg 1999], repeated for comparison
Input: initial hub score vector h, adjacency matrix A. Repeat a ← α A^T h, h ← β A a until a and h converge. Output: hub and authority score vectors a and h.

Simplified Espresso is essentially HITS: the ranking vector i tends to the principal eigenvector of A^T A as the iteration proceeds, regardless of the seed instances!
Problem: no matter which seed you start with, the same instance is always ranked topmost. This is semantic drift (called topic drift in HITS); the sketch below checks it numerically.
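A minimal sketch of the simplified loop, run from two different seeds on a random nonnegative toy matrix:

```python
import numpy as np

def simplified_espresso(A, seed, iters=200):
    i = seed / np.linalg.norm(seed)      # initial score vector of seed instances
    for _ in range(iters):
        p = A @ i / A.shape[1]           # pattern ranking
        i = A.T @ p / A.shape[0]         # instance ranking
        i /= np.linalg.norm(i)           # keep scores from shrinking to zero
    return i

A = np.random.rand(5, 8)                     # toy pattern-instance matrix
i1 = simplified_espresso(A, np.eye(8)[0])    # seed = instance 0
i2 = simplified_espresso(A, np.eye(8)[3])    # seed = instance 3
print(np.allclose(i1, i2))  # True: the seed does not matter (semantic drift)
```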

How about Espresso? Espresso has two heuristics not present in Simplified Espresso: early stopping, and pattern/instance selection. Do these heuristics really help reduce semantic drift? And how?

Experiments on semantic drift: do the heuristics in original Espresso help reduce drift?

Word sense disambiguation task of Senseval-3 English Lexical Sample: predict the sense of "bank".
  "… the financial benefits of the bank (finance)'s employee package (cheap mortgages and pensions, etc), bring this up to …"
  "In that same year I was posted to South Shields on the south bank (bank of the river) of the River Tyne and quickly became aware that I had an enormous burden."
  "Possibly aligned to water a sort of bank (???) by a rushing river."
Training instances are annotated with their sense; predict the sense of the target word in the test set.

Word sense disambiguation by Espresso
Seed instance = the instance whose sense is to be predicted
Proximity measure = the instance score vector given by Espresso (obtained by running the pattern ranking / instance ranking loop from that seed)

Example of k-NN classification by Espresso
System output = majority sense among the k nearest neighbors (k = 3)
i = (0.9, 0.1, 0.8, 0.5, 0, 0, 0.95, 0.3, 0.2, 0.4) → sense A
The seed instance is scored against labeled training instances such as "… the financial benefits of the bank (finance)'s employee package …" and "… on the south bank (bank of the river) of the River Tyne …".
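A sketch of the prediction step, using the score vector from the slide; the sense labels are invented for illustration:

```python
import numpy as np
from collections import Counter

scores = np.array([0.9, 0.1, 0.8, 0.5, 0, 0, 0.95, 0.3, 0.2, 0.4])
senses = ["A", "B", "A", "B", "A", "B", "A", "B", "A", "B"]  # hypothetical labels

k = 3
top = np.argsort(scores)[::-1][:k]               # 3 highest-scoring training instances
pred = Counter(senses[j] for j in top).most_common(1)[0][0]
print(pred)  # "A": majority vote among the k nearest neighbors
```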

Two heuristics in Espresso
Early stopping: plot results at each iteration.
Pattern and instance selection: number of patterns to retain p = 20 (increase p by 1 at each iteration); number of instances to retain m = 100 (increase m by 100 at each iteration).
Evaluation metric: Recall = |correct instances| / |total true instances|

Convergence process of Espresso [plot: recall per iteration for original Espresso, Simplified Espresso, and the most-frequent-sense baseline]
The heuristics in Espresso help reduce semantic drift (however, early stopping is required for optimal performance). Simplified Espresso suffers semantic drift: it outputs the most frequent sense regardless of input.

Learning curve of Espresso: per-sense breakdown [plot: most frequent sense vs. other senses per iteration]
The number of most-frequent-sense predictions increases, and recall for infrequent senses worsens even with original Espresso.

Summary: Espresso and semantic drift
Semantic drift happens because Espresso is designed like HITS, and HITS gives the same ranking list regardless of seeds.
Some heuristics reduce semantic drift; early stopping is crucial for optimal performance.
Still, these heuristics require many parameters to be calibrated, and calibration is difficult.

Main contributions of this work
1. Suggest a parallel between semantic drift in Espresso-like bootstrapping and topic drift in HITS (Kleinberg, 1999)
2. Solve semantic drift by graph kernels used in the link analysis community

Q. What caused drift in Espresso?
A. Espresso's resemblance to HITS. HITS is an importance computation method: it gives a single ranking list for any seeds. Why not use another type of link analysis measure, one that takes the seeds into account: a "relatedness" measure, which gives different rankings for different seeds?

The regularized Laplacian kernel: a relatedness measure with only one parameter
Normalized graph Laplacian: L = D^{-1/2} (D - A) D^{-1/2}
Regularized Laplacian matrix: R_β = Σ_{n=0}^{∞} β^n (-L)^n = (I + βL)^{-1}
A: adjacency matrix of the graph; D: (diagonal) degree matrix; β: parameter.
Each column of R_β gives the rankings relative to a node.
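A minimal NumPy sketch on a toy undirected graph (the talk applies the kernel to the pattern-instance graph):

```python
import numpy as np

A = np.array([[0, 1, 1, 0],          # toy adjacency matrix
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

deg = A.sum(axis=1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
L = np.eye(len(A)) - D_inv_sqrt @ A @ D_inv_sqrt   # normalized graph Laplacian
beta = 1e-2
R = np.linalg.inv(np.eye(len(A)) + beta * L)       # R_beta = (I + beta*L)^(-1)

print(R[:, 0])   # column 0: relatedness of every node to seed node 0
```

Unlike the HITS authority vector, different columns (i.e., different seeds) give different rankings.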

Word Sense Disambiguation: evaluation of the regularized Laplacian against Espresso and other graph-based algorithms

Label prediction of "bank" (recall): comparison of Simplified Espresso, Espresso (after convergence), Espresso (optimal stopping), and the regularized Laplacian (β = 10^-2), broken down into the most frequent sense vs. other senses [table values lost in the transcript].
The regularized Laplacian keeps high recall for infrequent senses; Espresso suffers from semantic drift (unless stopped at the optimal stage).

WSD on all nouns in Senseval-3 (recall):
Most frequent sense (baseline): 54.5
HyperLex [Agirre et al. 2005]: 64.6
PageRank [Agirre et al. 2005]: 64.6
Simplified Espresso: 44.1
Espresso (after convergence): 46.9
Espresso (optimal stopping): 66.5
Regularized Laplacian (β = 10^-2): [value lost in the transcript]
The regularized Laplacian outperforms the other graph-based methods; Espresso needs optimal stopping to achieve equivalent performance.

The regularized Laplacian is stable across parameter values [plot: recall as a function of β].

Conclusions
Semantic drift in Espresso is a parallel form of topic drift in HITS.
The regularized Laplacian reduces semantic drift in bootstrapping for natural language processing tasks: it is inherently a relatedness measure, not an importance measure.

Future work
Investigate whether a similar analysis applies to a wider class of bootstrapping algorithms (including co-training).
Investigate the influence of seed selection on bootstrapping algorithms and propose a way to select effective seed instances.