Extracting Keyphrases to Represent Relations in Social Networks from Web
Junichiro Mori and Mitsuru Ishizuka, University of Tokyo; Yutaka Matsuo, National Institute of Advanced Industrial Science and Technology


1 Extracting Keyphrases to Represent Relations in Social Networks from Web
Junichiro Mori and Mitsuru Ishizuka, University of Tokyo
Yutaka Matsuo, National Institute of Advanced Industrial Science and Technology
IJCAI-07

2 Abstract
The goal is to extract the underlying relations between entities embedded in social networks.
The algorithm automatically extracts labels that describe the relations among entities:
– it clusters similar entity pairs
– the underlying relations between entities are obtained from the clustering results

3 Introduction
Social networks for AI and the Semantic Web
– trust estimation
– ontology construction
– end-user ontology
Building social networks
– social networks are extracted automatically from various sources of information
– Flink: Web pages, e-mail messages, and publications
– Polyphonet [WWW06]

4 Introduction
Exploring underlying relations
Most automatic extraction methods take a superficial approach:
– co-occurrence analysis
– shallow assessment of relations
– Flink: provides a clue to the strength of relations
– Polyphonet: defines four kinds of relations (Co-Author, Co-Lab, Co-Proj, Co-Conf)

5 Related Work
Supervised methods
– need large annotated corpora
– need domain-specific knowledge to be gathered
– must define the relations to be extracted a priori
Ontology population (semantic annotation)
– pattern-based approaches
– context-based approaches
The Web is highly heterogeneous and unstructured
– this paper therefore takes a context-based approach, representing the context as a bag of words [Turney, 2005]

6 Method - Concept (1/4)
The social network is extracted from the co-occurrence of entities on the Web.
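The slide does not say how co-occurrence becomes an edge. One common choice, which is assumed here, is to query a search engine for each entity and each pair. Two entities are then linked when a co-occurrence coefficient of the page counts exceeds a threshold. This sketch uses the Jaccard coefficient. The page counts, the 0.1 threshold, and the names `jaccard` and `build_network` are all hypothetical.

```python
from itertools import combinations


def jaccard(hits_a: int, hits_b: int, hits_ab: int) -> float:
    """Jaccard coefficient of page counts: |A and B| / |A or B|."""
    union = hits_a + hits_b - hits_ab
    return hits_ab / union if union > 0 else 0.0


def build_network(hits: dict[str, int],
                  pair_hits: dict[frozenset, int],
                  threshold: float = 0.1) -> list[tuple[str, str, float]]:
    """Link every pair of entities whose co-occurrence score exceeds the threshold.

    The coefficient and the threshold are illustrative assumptions.
    """
    edges = []
    for a, b in combinations(sorted(hits), 2):
        hits_ab = pair_hits.get(frozenset((a, b)), 0)  # pages mentioning both
        score = jaccard(hits[a], hits[b], hits_ab)
        if score > threshold:
            edges.append((a, b, score))
    return edges


# Hypothetical page counts for three entities, for illustration only.
hits = {"Alice": 1000, "Bob": 800, "Carol": 500}
pair_hits = {frozenset(("Alice", "Bob")): 300,
             frozenset(("Alice", "Carol")): 20}
print(build_network(hits, pair_hits))  # [('Alice', 'Bob', 0.2)]
```

Other coefficients, such as overlap or mutual information, would slot into `jaccard`'s place without changing the loop.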

7 Method - Concept (2/4)
Given the entity pairs in the social network:
– discover relevant keyphrases by analyzing the local context in which each pair co-occurs on the Web (keyword extraction)
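One way to collect the local context of a pair is sketched below. It assumes tokenized page text and single-token entity names. It finds every place where both entities appear with at most m intervening terms, then keeps the terms between them plus n words to the left and right. This mirrors the n and m parameters defined on the context-model slide. The function name `local_contexts`, the default values, and the whitespace tokenization are assumptions.

```python
def local_contexts(tokens: list[str], e_i: str, e_j: str,
                   n: int = 3, m: int = 5) -> list[list[str]]:
    """Return context windows where e_i and e_j co-occur.

    m: maximum number of intervening terms between the two entities.
    n: number of words kept to the left and right of the pair.
    Assumes each entity is a single token (a simplifying assumption).
    """
    positions_i = [k for k, t in enumerate(tokens) if t == e_i]
    positions_j = [k for k, t in enumerate(tokens) if t == e_j]
    windows = []
    for p in positions_i:
        for q in positions_j:
            left, right = min(p, q), max(p, q)
            if right - left - 1 > m:  # too many intervening terms
                continue
            start = max(0, left - n)
            end = min(len(tokens), right + n + 1)
            # Keep the surrounding terms and drop the entity names themselves.
            windows.append([t for t in tokens[start:end] if t not in (e_i, e_j)])
    return windows


text = "last week Alice visited Tokyo to attend a workshop".split()
print(local_contexts(text, "Alice", "Tokyo", n=2, m=3))
# [['last', 'week', 'visited', 'to', 'attend']]
```

Concatenating the windows from all retrieved pages gives the collective context of a pair, which the later steps weight and compare.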

8 Method - Concept (3/4)
The keywords are ranked by TF-IDF-based scoring.

9 Method - Concept (4/4)
Hypothesis:
– if the local contexts of entity pairs on the Web are similar, the entity pairs share a similar relation
– [Harris, 1968; Schütze, 1998]: words are similar to the extent that their contextual representations are similar
Following this hypothesis:
– the method clusters entity pairs according to the similarity of their collective contexts
– each cluster represents a different relation, and each entity pair in a cluster is an instance of that relation

10 Method - Procedure

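11 Method - Context Model and Similarity Calculation
$C_{i,j}(n,m) = \{t_1, \ldots, t_N\}$
– $C_{i,j}$ is the context model of an entity pair $(e_i, e_j)$
– $t_1, \ldots, t_N$ are the $N$ terms extracted from the context of the entity pair
– $m$ is the number of intervening terms between $e_i$ and $e_j$
– $n$ is the number of words to the left and right of either entity
– the feature weight of a term $t_k$ is its TF-IDF score, $w(t_k) = \mathrm{tf}(t_k) \cdot \mathrm{idf}(t_k)$, where
– TF: the frequency of $t_k$ in the contexts
– IDF: $\mathrm{idf}(t_k) = \log\left(|C| / \mathrm{df}(t_k)\right) + 1$, with $|C|$ the total number of contexts and $\mathrm{df}(t_k)$ the number of contexts containing $t_k$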
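The weighting on this slide can be sketched as follows. The sketch assumes each entity pair's collective context is already a flat list of candidate terms and that TF is a raw count. It also uses the natural logarithm, since the slide does not state the base. The function names are hypothetical. Sorting a pair's weights also yields the TF-IDF keyword ranking mentioned in Concept (3/4).

```python
import math
from collections import Counter


def tfidf_context_models(contexts: dict[tuple[str, str], list[str]]
                         ) -> dict[tuple[str, str], dict[str, float]]:
    """Weight each term of each pair's context by tf * (log(|C| / df) + 1)."""
    num_contexts = len(contexts)  # |C|
    df = Counter()
    for terms in contexts.values():
        df.update(set(terms))  # count each term once per context
    models = {}
    for pair, terms in contexts.items():
        tf = Counter(terms)  # raw term frequency (assumed)
        models[pair] = {t: tf[t] * (math.log(num_contexts / df[t]) + 1)
                        for t in tf}
    return models


def top_keywords(model: dict[str, float], k: int = 3) -> list[str]:
    """Rank terms by weight, highest first; ties are broken alphabetically."""
    ranked = sorted(model.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


# Toy contexts for three hypothetical entity pairs.
contexts = {
    ("A", "B"): ["election", "party", "election", "visit"],
    ("A", "C"): ["visit", "summit", "visit"],
    ("D", "E"): ["party", "paper"],
}
models = tfidf_context_models(contexts)
print(top_keywords(models[("A", "B")]))  # ['election', 'party', 'visit']
```

The "+1" keeps a term that appears in every context at a nonzero weight, because $\log 1 = 0$.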

12 Method - Clustering and Label Selection
Similarity: cosine similarity between TF-IDF-weighted context models
Hierarchical agglomerative clustering
– complete linkage: the similarity between clusters $CL_1$ and $CL_2$ is determined by their two most dissimilar elements
Label selection: with the labels $l_1, \ldots, l_n$ of a cluster $CL$ scored by term relevance, an entity pair $(e_i, e_j)$ that belongs to $CL$ can be regarded as holding the relations described by $l_1, \ldots, l_n$.
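The clustering and labeling steps are sketched below as a minimal, quadratic-time implementation. It assumes each weighted context model is a dict mapping term to weight, and it stops at a fixed number of clusters. The slides report 5 and 12 clusters but do not state the stopping rule. The sketch scores a cluster's labels by summing member weights, which is only an assumed stand-in for the unspecified "term relevancy". All names are hypothetical.

```python
import math
from collections import defaultdict

Vector = dict[str, float]
Pair = tuple[str, str]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two sparse term-weight vectors."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def complete_linkage(vectors: dict[Pair, Vector],
                     num_clusters: int) -> list[list[Pair]]:
    """Agglomerative clustering with complete linkage, down to num_clusters."""
    clusters = [[key] for key in vectors]
    while len(clusters) > num_clusters:
        best, best_pair = -2.0, (0, 1)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Complete linkage: similarity of the least similar cross pair.
                sim = min(cosine(vectors[x], vectors[y])
                          for x in clusters[a] for y in clusters[b])
                if sim > best:
                    best, best_pair = sim, (a, b)
        a, b = best_pair
        clusters[a].extend(clusters[b])
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


def cluster_labels(cluster: list[Pair], vectors: dict[Pair, Vector],
                   k: int = 3) -> list[str]:
    """Rank a cluster's terms by summed weight (an assumed relevance score)."""
    score: dict[str, float] = defaultdict(float)
    for key in cluster:
        for term, w in vectors[key].items():
            score[term] += w
    ranked = sorted(score.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


vectors = {
    ("A", "X"): {"visit": 2.0, "troops": 1.0},
    ("B", "X"): {"visit": 1.0, "troops": 2.0},
    ("C", "Y"): {"election": 2.0, "party": 1.0},
}
clusters = complete_linkage(vectors, num_clusters=2)
print(clusters)                               # [[('A', 'X'), ('B', 'X')], [('C', 'Y')]]
print(cluster_labels(clusters[0], vectors))   # ['troops', 'visit']
```

For real data, a library implementation such as scikit-learn's `AgglomerativeClustering` with complete linkage and a cosine metric avoids the cubic loop shown here.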

13 Experiment – 1/3
Test data
– 143 distinct entity pairs from a political social network, each a pair of a politician and a geo-political entity
– 421 entity pairs from a researcher network, each a pair of Japanese AI researchers
Context model of each entity pair
– built from 100 Web pages
– nouns and noun phrases (NP), identified by part-of-speech (POS) tagging
– stop words excluded
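The term-filtering step can be sketched roughly as follows. The sketch assumes the text has already been POS-tagged with Penn Treebank tags, where noun tags start with "NN". It keeps only nouns and drops stop words. Grouping nouns into noun phrases would need a chunker and is not shown. The stop list and the name `candidate_terms` are hypothetical, and the slides do not say which tagger or stop list was used.

```python
# Hypothetical stop list; the slides do not say which stop words were used.
STOP_WORDS = {"home", "page", "site", "link"}


def candidate_terms(tagged: list[tuple[str, str]]) -> list[str]:
    """Keep nouns (Penn Treebank tags starting with "NN"), lower-cased, minus stop words."""
    terms = []
    for word, tag in tagged:
        term = word.lower()
        if tag.startswith("NN") and term not in STOP_WORDS:
            terms.append(term)
    return terms


tagged = [("The", "DT"), ("minister", "NN"), ("visited", "VBD"),
          ("the", "DT"), ("troops", "NNS"), ("on", "IN"),
          ("the", "DT"), ("home", "NN"), ("page", "NN")]
print(candidate_terms(tagged))  # ['minister', 'troops']
```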

14 Experiment – 2/3
Clustering
– complete-linkage agglomerative clustering
– five distinct clusters for the political social network
– twelve distinct clusters for the researcher network
Two human subjects
– assigned three or fewer possible relation labels to each entity pair
– the label of a cluster is the most frequent term among the manually assigned relation labels of the entity pairs in that cluster
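The cluster-labeling rule on this slide (most frequent manual label in the cluster) is short enough to show directly. The sketch assumes that both subjects' labels for a pair are merged into one list per pair. Ties go to the label encountered first, which is how `Counter.most_common` orders equal counts. The data and the name `cluster_label` are hypothetical.

```python
from collections import Counter


def cluster_label(cluster: list[tuple[str, str]],
                  manual_labels: dict[tuple[str, str], list[str]]) -> str:
    """Most frequent manually assigned relation label in a non-empty cluster."""
    counts = Counter(label for pair in cluster for label in manual_labels[pair])
    return counts.most_common(1)[0][0]  # ties: first label encountered wins


# Hypothetical manual labels (both subjects' labels merged per pair).
manual_labels = {
    ("A", "X"): ["visit", "war"],
    ("B", "X"): ["visit"],
    ("C", "X"): ["aid", "visit"],
}
print(cluster_label([("A", "X"), ("B", "X"), ("C", "X")], manual_labels))  # visit
```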

15 Experiment – 3/3

16 Evaluation
For each cluster $cl$
– $EP_{cl,\mathrm{correct}}$: the number of entity pairs in $cl$ whose manually assigned relation labels include the label of $cl$
– $EP_{cl,\mathrm{total}}$: the number of entity pairs in $cl$
For each relation $l$
– $EP_{l,\mathrm{correct}}$: the number of entity pairs with relation label $l$ whose cluster label is $l$
– $EP_{l,\mathrm{total}}$: the number of entity pairs that have relation label $l$
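The slide defines the counts but does not name the measures built from them. The reading assumed here is a per-cluster precision $EP_{cl,\mathrm{correct}} / EP_{cl,\mathrm{total}}$ and a per-relation recall $EP_{l,\mathrm{correct}} / EP_{l,\mathrm{total}}$. A sketch under that assumption, with hypothetical names and toy data:

```python
Pair = tuple[str, str]


def cluster_precision(cluster: list[Pair], label: str,
                      manual_labels: dict[Pair, list[str]]) -> float:
    """EP_cl,correct / EP_cl,total for one non-empty cluster with the given label."""
    correct = sum(1 for pair in cluster if label in manual_labels[pair])
    return correct / len(cluster)


def relation_recall(relation: str, assigned: dict[Pair, str],
                    manual_labels: dict[Pair, list[str]]) -> float:
    """EP_l,correct / EP_l,total; `assigned` maps each pair to its cluster's label."""
    having = [p for p, labels in manual_labels.items() if relation in labels]
    if not having:
        return 0.0
    correct = sum(1 for p in having if assigned.get(p) == relation)
    return correct / len(having)


manual_labels = {("A", "X"): ["visit"], ("B", "X"): ["visit", "war"],
                 ("C", "Y"): ["war"], ("D", "Y"): ["visit"]}
clusters = {"visit": [("A", "X"), ("B", "X")], "war": [("C", "Y"), ("D", "Y")]}
assigned = {p: lab for lab, members in clusters.items() for p in members}

print(cluster_precision(clusters["war"], "war", manual_labels))  # 0.5
print(relation_recall("war", assigned, manual_labels))           # 0.5
```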

17 Evaluation

18 Conclusions
Automatically extracting labels
– describing relations between entities in social networks
– unsupervised and domain-independent
Utilizing the Web to obtain collective contexts
– Semantic Web
– Web mining
Future work
– other types of social networks
– enriching social networks

