Extracting Keyphrases to Represent Relations in Social Networks from Web Junichiro Mori and Mitsuru Ishizuka University of Tokyo Yutaka Matsuo National.

Extracting Keyphrases to Represent Relations in Social Networks from Web
Junichiro Mori and Mitsuru Ishizuka, University of Tokyo
Yutaka Matsuo, National Institute of Advanced Industrial Science and Technology
IJCAI-07

Abstract
The goal is to extract the underlying relations between entities that are embedded in social networks. The algorithm automatically extracts labels that describe relations among entities. The algorithm
–clusters similar entity pairs
–obtains the underlying relations between entities from the results of the clustering

Introduction
Social networks for AI and the Semantic Web
–trust estimation
–ontology construction
–end-user ontology
Building social networks
–automatic extraction of social networks from various sources of information
  Flink: Web pages, messages, and publications
  Polyphonet [www06]

Introduction
Explore underlying relations
Most automatic extraction methods take a superficial approach
–co-occurrence analysis
Non-profound assessment
–Flink: provides a clue to the strength of relations
–Polyphonet: defines four kinds of relations: Co-Author, Co-Lab, Co-Proj, Co-Conf

Related Work
A supervised method
–needs large annotated corpora
–requires gathering domain-specific knowledge
–requires defining the relations to extract a priori
Ontology population (semantic annotation)
–pattern-based approaches
–context-based approaches
The Web is highly heterogeneous and unstructured
–this paper takes a context-based approach, using a bag-of-words model of the context [Turney, 2005]

Method - Concept (1/4)
The social network was extracted according to the co-occurrence of entities on the Web.
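As a rough illustration of co-occurrence-based network extraction, the sketch below scores an entity pair from search-engine hit counts and keeps the pair as an edge when the score passes a threshold. The slide does not say which co-occurrence measure or threshold was used, so the Jaccard coefficient, the `threshold` value, and the hit counts here are all assumptions made for the example.

```python
def jaccard(hits_a, hits_b, hits_both):
    """Jaccard coefficient from hit counts (assumed measure, not from the slides)."""
    union = hits_a + hits_b - hits_both
    return hits_both / union if union > 0 else 0.0


def build_edges(single_hits, pair_hits, threshold=0.1):
    """Keep an edge for every entity pair whose co-occurrence score reaches the threshold.

    single_hits: dict entity -> number of pages mentioning the entity
    pair_hits:   dict (entity_a, entity_b) -> number of pages mentioning both
    """
    edges = []
    for (a, b), both in pair_hits.items():
        score = jaccard(single_hits[a], single_hits[b], both)
        if score >= threshold:
            edges.append((a, b, score))
    return edges


# Hypothetical hit counts, for illustration only.
single_hits = {"A": 100, "B": 50, "C": 200}
pair_hits = {("A", "B"): 30, ("A", "C"): 3}
edges = build_edges(single_hits, pair_hits)
```

With these made-up counts only the pair (A, B) survives, because A and C rarely appear on the same page relative to how often each appears alone.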

Method - Concept (2/4)
Given entity pairs in the social network
–discover relevant keyphrases by analyzing the local context surrounding each pair where the two entities co-occur on the Web (keyword extraction)

Method - Concept (3/4)
The keywords are ordered according to TF-IDF-based scoring.

Method - Concept (4/4)
Hypothesis
–if the local contexts of entity pairs on the Web are similar, the entity pairs share a similar relation
–[Harris, 1968; Schutze, 1998]: words are similar to the extent that their contextual representations are similar
According to that hypothesis
–the method clusters entity pairs according to the similarity of their collective contexts
–each cluster represents a different relation, and each entity pair in a cluster is an instance of a similar relation

Method - Procedure

Method - Context Model and Similarity Calculation
$C_{i,j}(n, m) = t_1, \ldots, t_N$
–$C_{i,j}$ is the context model of an entity pair $(e_i, e_j)$
–$t_1, \ldots, t_N$ are the $N$ terms extracted from the context of the entity pair
–$m$ is the number of intervening terms between $e_i$ and $e_j$
–$n$ is the number of words to the left and right of either entity
–the feature weight of $t_i$ is TF-IDF
  TF: term frequency of term $t_i$ in the contexts
  IDF: $\log(|C| / df(t_i)) + 1$
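The context model and its weighting can be sketched as follows. This is a minimal illustration under several assumptions that are not stated on the slide: entities are single tokens, the window is read as n tokens before the first entity, every token between the two entities (when at most m tokens separate them), and n tokens after the second entity, |C| is taken to be the number of context models, and the logarithm is natural. The function names are hypothetical.

```python
import math
from collections import Counter


def context_terms(tokens, e_i, e_j, n=2, m=5):
    """Collect context terms around co-occurrences of e_i and e_j in one token list."""
    terms = []
    pos_i = [k for k, tok in enumerate(tokens) if tok == e_i]
    pos_j = [k for k, tok in enumerate(tokens) if tok == e_j]
    for a in pos_i:
        for b in pos_j:
            lo, hi = min(a, b), max(a, b)
            # Skip identical positions and pairs separated by more than m tokens.
            if lo == hi or hi - lo - 1 > m:
                continue
            terms += tokens[max(0, lo - n):lo]   # n tokens to the left
            terms += tokens[lo + 1:hi]           # intervening tokens
            terms += tokens[hi + 1:hi + 1 + n]   # n tokens to the right
    return terms


def tfidf_vectors(contexts):
    """Weight each term by TF * (log(|C| / df) + 1).

    contexts: dict entity_pair -> list of context terms
    returns:  dict entity_pair -> {term: weight}
    """
    df = Counter()
    for terms in contexts.values():
        df.update(set(terms))  # document frequency: one count per context model
    total = len(contexts)
    vectors = {}
    for pair, terms in contexts.items():
        tf = Counter(terms)
        vectors[pair] = {t: tf[t] * (math.log(total / df[t]) + 1) for t in tf}
    return vectors


tokens = ["the", "prime", "minister", "koizumi", "visited", "japan", "today"]
window = context_terms(tokens, "koizumi", "japan", n=2, m=5)

vectors = tfidf_vectors({
    "p1": ["minister", "minister", "visit"],
    "p2": ["visit", "professor"],
})
```

A term that occurs in every context model keeps an IDF of 1 rather than 0 because of the "+ 1" in the formula, so common terms are down-weighted but never discarded.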

Method - Clustering and Label Selection
TF-IDF-based cosine similarity
Hierarchical agglomerative clustering
–complete linkage
–the similarity between two clusters $CL_1$ and $CL_2$ is evaluated by considering their two most dissimilar elements
Given a cluster CL whose labels $l_1, \ldots, l_n$ are scored according to term relevancy, an entity pair $(e_i, e_j)$ that belongs to CL can be regarded as holding the relations described by $l_1, \ldots, l_n$.
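A small pure-Python sketch of this step is given below: cosine similarity over sparse TF-IDF vectors, complete-linkage agglomerative clustering, and label selection. The stopping rule (a similarity `threshold`) and the label score (summed TF-IDF weight within the cluster) are assumptions for illustration, since the slide only says that labels are scored by term relevancy. The example vectors are invented.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as {term: weight}."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def complete_linkage(vectors, threshold):
    """Merge clusters until the best complete-linkage similarity falls below threshold."""
    clusters = [[pair] for pair in vectors]

    def sim(c1, c2):
        # Complete linkage: similarity of the two most dissimilar members.
        return min(cosine(vectors[a], vectors[b]) for a in c1 for b in c2)

    while len(clusters) > 1:
        best, i, j = max(
            (sim(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if best < threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


def cluster_labels(cluster, vectors, k=3):
    """Top-k terms by summed weight over the cluster (assumed relevancy score)."""
    scores = Counter()
    for pair in cluster:
        scores.update(vectors[pair])  # adds the weights term by term
    return [term for term, _ in scores.most_common(k)]


vectors = {
    "a": {"minister": 2.0, "election": 1.0},
    "b": {"minister": 1.0, "election": 1.0},
    "c": {"professor": 1.0, "lab": 2.0},
}
clusters = complete_linkage(vectors, threshold=0.5)
labels = cluster_labels(clusters[0], vectors, k=1)
```

Complete linkage tends to produce compact clusters, because one dissimilar pair is enough to keep two clusters apart.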

Experiment – 1/3
Test data
–143 distinct entity pairs from a political social network, each a pair of a politician and a geo-political entity
–421 entity pairs from a researcher network, each a pair of Japanese AI researchers
Context model of each entity pair
–100 Web pages
–noun phrases (NP) and nouns, identified by part-of-speech (POS) tagging
–stop words excluded

Experiment – 2/3
Clustering
–complete-linkage agglomerative clustering
–five distinct clusters for the political social network
–twelve distinct clusters for the researcher network
Two human subjects
–assigned three or fewer possible labels to each pair
–the cluster label is the most frequent term among the manually assigned relation labels of the entity pairs in the cluster
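The rule for choosing a cluster's reference label can be written in a few lines. The sketch assumes the manual annotations are available as a mapping from entity pair to a set of labels. The name `majority_label` and the sample data are hypothetical, and the slide does not say how ties are broken.

```python
from collections import Counter


def majority_label(cluster, manual_labels):
    """Return the most frequent manually assigned label among the pairs in a cluster."""
    counts = Counter(label for pair in cluster for label in manual_labels[pair])
    return counts.most_common(1)[0][0]


# Invented annotations: up to three labels per entity pair.
manual_labels = {
    "p1": {"mayor"},
    "p2": {"mayor", "governor"},
    "p3": {"governor", "mayor"},
}
label = majority_label(["p1", "p2", "p3"], manual_labels)
```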

Experiment – 3/3

Evaluation
For each cluster cl
–$EP_{cl,correct}$: the number of entity pairs in cluster cl whose manually assigned relation labels include the label of cluster cl
–$EP_{cl,total}$: the number of entity pairs in cluster cl
For each relation l
–$EP_{l,correct}$: the number of entity pairs with relation label l whose cluster label is l
–$EP_{l,total}$: the number of entity pairs that have the relation label l
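The four counts above can be computed as in the sketch below. The slide text does not show how the counts are combined, so the final ratios (summed correct counts over summed totals, read as precision over clusters and recall over relations) are an assumption, and the clusters and annotations are invented for the example.

```python
def evaluate(clusters, cluster_label, manual_labels):
    """Return (precision, recall) under the assumed summed-ratio definition.

    clusters:      list of clusters, each a list of entity pairs
    cluster_label: list with one label per cluster, aligned with clusters
    manual_labels: dict entity_pair -> set of manually assigned labels
    """
    # Cluster side: EP_cl,correct and EP_cl,total summed over all clusters.
    correct_cl = total_cl = 0
    for members, label in zip(clusters, cluster_label):
        total_cl += len(members)
        correct_cl += sum(1 for p in members if label in manual_labels[p])

    # Relation side: EP_l,correct and EP_l,total summed over all relations.
    assigned = {p: label
                for members, label in zip(clusters, cluster_label)
                for p in members}
    relations = set().union(*manual_labels.values())
    correct_l = total_l = 0
    for rel in relations:
        holders = [p for p, labels in manual_labels.items() if rel in labels]
        total_l += len(holders)
        correct_l += sum(1 for p in holders if assigned.get(p) == rel)

    return correct_cl / total_cl, correct_l / total_l


clusters = [["p1", "p2"], ["p3"]]
cluster_label = ["mayor", "colleague"]
manual_labels = {
    "p1": {"mayor"},
    "p2": {"governor"},
    "p3": {"colleague", "coauthor"},
}
precision, recall = evaluate(clusters, cluster_label, manual_labels)
```

In this toy case two of the three clustered pairs carry their cluster's label, while only two of the four (pair, relation) annotations are recovered, so the relation-side score is lower than the cluster-side score.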

Evaluation

Conclusions
Automatically extracting labels
–that describe relations between entities in social networks
–unsupervised and domain-independent
Utilizing the Web to obtain the collective contexts
–Semantic Web
–Web mining
Future work
–other types of social networks
–enriching social networks