Ontology Engineering and Feature Construction for Predicting Friendship Links in the Live Journal Social Network Author:Vikas Bahirwani 、 Doina Caragea.

Slides:



Advertisements
Similar presentations
A Comparison of Implicit and Explicit Links for Web Page Classification Dou Shen 1 Jian-Tao Sun 2 Qiang Yang 1 Zheng Chen 2 1 Department of Computer Science.
Advertisements

Schema Matching and Query Rewriting in Ontology-based Data Integration Zdeňka Linková ICS AS CR Advisor: Július Štuller.
Sequence Clustering and Labeling for Unsupervised Query Intent Discovery Speaker: Po-Hsien Shih Advisor: Jia-Ling Koh Source: WSDM’12 Date: 1 November,
Seeing the forest for the trees : using the Gene Ontology to restructure hierarchical clustering Dikla Dotan-Cohen, Simon Kasif and Avraham A. Melkman.
Iterative Optimization of Hierarchical Clusterings Doug Fisher Department of Computer Science, Vanderbilt University Journal of Artificial Intelligence.
Clustering… in General In vector space, clusters are vectors found within  of a cluster vector, with different techniques for determining the cluster.
1 Lecture 5: Automatic cluster detection Lecture 6: Artificial neural networks Lecture 7: Evaluation of discovered knowledge Brief introduction to lectures.
Relational Data Mining in Finance Haonan Zhang CFWin /04/2003.
Tutorial 6 & 7 Symbol Table
Basic Data Mining Techniques
Data Mining with Decision Trees Lutz Hamel Dept. of Computer Science and Statistics University of Rhode Island.
Semi-Supervised Clustering Jieping Ye Department of Computer Science and Engineering Arizona State University
Ranking by Odds Ratio A Probability Model Approach let be a Boolean random variable: document d is relevant to query q otherwise Consider document d as.
ML ALGORITHMS. Algorithm Types Classification (supervised) Given -> A set of classified examples “instances” Produce -> A way of classifying new examples.
Distributed Representations of Sentences and Documents
Applications of Data Mining in Microarray Data Analysis Yen-Jen Oyang Dept. of Computer Science and Information Engineering.
Chapter 5 Data mining : A Closer Look.
Ontology Learning and Population from Text: Algorithms, Evaluation and Applications Chapters Presented by Sole.
Clustering Unsupervised learning Generating “classes”
Evaluating Performance for Data Mining Techniques
Data Mining Techniques
Using Friendship Ties and Family Circles for Link Prediction Elena Zheleva, Lise Getoor, Jennifer Golbeck, Ugur Kuter (SNAKDD 2008)
Copyright R. Weber Machine Learning, Data Mining ISYS370 Dr. R. Weber.
An Integrated Approach to Extracting Ontological Structures from Folksonomies Huairen Lin, Joseph Davis, Ying Zhou ESWC 2009 Hyewon Lim October 9 th, 2009.
Intelligent Database Systems Lab 國立雲林科技大學 National Yunlin University of Science and Technology Advisor : Dr. Hsu Student : Sheng-Hsuan Wang Department.
Machine Learning1 Machine Learning: Summary Greg Grudic CSCI-4830.
Machine Learning Approach for Ontology Mapping using Multiple Concept Similarity Measures IEEE/ACIS International Conference on Computer and Information.
Presented by Tienwei Tsai July, 2005
1 A Graph-Theoretic Approach to Webpage Segmentation Deepayan Chakrabarti Ravi Kumar
Introduction to machine learning and data mining 1 iCSC2014, Juan López González, University of Oviedo Introduction to machine learning Juan López González.
Data Mining: Classification & Predication Hosam Al-Samarraie, PhD. Centre for Instructional Technology & Multimedia Universiti Sains Malaysia.
南台科技大學 資訊工程系 A web page usage prediction scheme using sequence indexing and clustering techniques Adviser: Yu-Chiang Li Speaker: Gung-Shian Lin Date:2010/10/15.
A Language Independent Method for Question Classification COLING 2004.
Prepared by: Mahmoud Rafeek Al-Farra College of Science & Technology Dep. Of Computer Science & IT BCs of Information Technology Data Mining
Mining Social Networks for Personalized Prioritization Shinjae Yoo, Yiming Yang, Frank Lin, II-Chul Moon [KDD ’09] 1 Advisor: Dr. Koh Jia-Ling Reporter:
Exploiting Context Analysis for Combining Multiple Entity Resolution Systems -Ramu Bandaru Zhaoqi Chen Dmitri V.kalashnikov Sharad Mehrotra.
Graph-based Text Classification: Learn from Your Neighbors Ralitsa Angelova , Gerhard Weikum : Max Planck Institute for Informatics Stuhlsatzenhausweg.
LOGO Summarizing Conversations with Clue Words Giuseppe Carenini, Raymond T. Ng, Xiaodong Zhou (WWW ’07) Advisor : Dr. Koh Jia-Ling Speaker : Tu.
Algorithmic Detection of Semantic Similarity WWW 2005.
Chapter 11 Statistical Techniques. Data Warehouse and Data Mining Chapter 11 2 Chapter Objectives  Understand when linear regression is an appropriate.
LOGO 1 Corroborate and Learn Facts from the Web Advisor : Dr. Koh Jia-Ling Speaker : Tu Yi-Lang Date : Shubin Zhao, Jonathan Betz (KDD '07 )
Clustering.
Prepared by: Mahmoud Rafeek Al-Farra
Slides for “Data Mining” by I. H. Witten and E. Frank.
DATA MINING WITH CLUSTERING AND CLASSIFICATION Spring 2007, SJSU Benjamin Lam.
CS 8751 ML & KDDData Clustering1 Clustering Unsupervised learning Generating “classes” Distance/similarity measures Agglomerative methods Divisive methods.
Compiled By: Raj Gaurang Tiwari Assistant Professor SRMGPC, Lucknow Unsupervised Learning.
Intelligent Database Systems Lab 國立雲林科技大學 National Yunlin University of Science and Technology 1 A self-organizing map for adaptive processing of structured.
Evaluation of DBMiner By: Shu LIN Calin ANTON. Outline  Importing and managing data source  Data mining modules Summarizer Associator Classifier Predictor.
26/01/20161Gianluca Demartini Ranking Categories for Faceted Search Gianluca Demartini L3S Research Seminars Hannover, 09 June 2006.
Semantic Grounding of Tag Relatedness in Social Bookmarking Systems Ciro Cattuto, Dominik Benz, Andreas Hotho, Gerd Stumme ISWC 2008 Hyewon Lim January.
Data Mining By Farzana Forhad CS 157B. Agenda Decision Tree and ID3 Rough Set Theory Clustering.
Clustering Algorithms Sunida Ratanothayanon. What is Clustering?
Hybrid Content and Tag-based Profiles for recommendation in Collaborative Tagging Systems Latin American Web Conference IEEE Computer Society, 2008 Presenter:
Predicting Short-Term Interests Using Activity-Based Search Context CIKM’10 Advisor: Jia Ling, Koh Speaker: Yu Cheng, Hsieh.
Dr. Chen, Data Mining  A/W & Dr. Chen, Data Mining Chapter 3 Basic Data Mining Techniques Jason C. H. Chen, Ph.D. Professor of MIS School of Business.
Linear Models & Clustering Presented by Kwak, Nam-ju 1.
Mustafa Gokce Baydogan, George Runger and Eugene Tuv INFORMS Annual Meeting 2011, Charlotte A Bag-of-Features Framework for Time Series Classification.
Network Management Lecture 13. MACHINE LEARNING TECHNIQUES 2 Dr. Atiq Ahmed Université de Balouchistan.
EUROCONTROL EXPERIMENTAL CENTRE1 / 29/06/2016  Raphaël CHRISTIEN  Network Capacity & Demand Management  5 th USA/Europe ATM 2003 R&D seminar  23 rd.
Unsupervised Learning: Clustering
Unsupervised Learning: Clustering
Semi-Supervised Clustering
Constrained Clustering -Semi Supervised Clustering-
Machine Learning with Weka
Topic Oriented Semi-supervised Document Clustering
Block Matching for Ontologies
Leverage Consensus Partition for Domain-Specific Entity Coreference
Text Categorization Berlin Chen 2003 Reference:
Hierarchical, Perceptron-like Learning for OBIE
Presentation transcript:

Ontology Engineering and Feature Construction for Predicting Friendship Links in the Live Journal Social Network Author:Vikas Bahirwani 、 Doina Caragea 、 Waleed Aljandal 、 Wiliam H. Hsu Advisor: Dr. Jai-Ling Koh Speaker: Shun-hong Sie

Outline Introduction Background Hierarchical Agglomerative and Divisive Clustering Evaluation of the ontology through learning Experimental & Result Conclusion

Introduction Users’ interests can be grouped in a concept hierarchy that makes explicit the implicit relationships between various interest concepts, thus helping in the process of data understanding and analysis. Only from the common interests of two users are ineffective at predicting friend relationships.

Background Clustering is an unsupervised learning task that can be defined as the process of partitioning data into groups, corresponding to higher level concepts, of similar entities. A hierarchical clustering algorithm builds a tree of clusters by successively grouping the closest cluster pairs, until no further grouping is possible.

Background (Cont.) Kim [22] introduces a top-down divisive hierarchical clustering method to recursively divide parent clusters into child clusters until a terminating criterion is met. Godoy’s[14] approach requires each individual interest “entering” into the clusters below the root to be matched against the associated cluster concepts.

Background (Cont.) These two papers consider a bag of words (BoW) approach to describe instances (web pages), and thus each instance is represented as a fixed length array. The approach in [14] does not allow instances to belong to several concepts. The incremental nature of the unsupervised learning approach used in [14] is not appropriate for our social network data. A cluster is cohesive enough to produce a concept out of it, but the concept cannot be assigned descriptive terms.

Hierarchical Agglomerative and Divisive Clustering A hybrid clustering algorithm. Designed to make the ontology extraction process as fast as possible and at the same time to produce a sensible and useful ontology.

HAD steps Obtaining Interest Definitions. Divisive Clustering. Agglomerative Clustering.

Obtaining Interest Definitions Fetch data from WordNet, IMDB and Amazon(AWS). WordNet – character, n, reference| character| formal|describing| qualifications| dependability... – character, n, grapheme| graphic| symbol| ritten|symbol| used| represent| speech... – character, n, genetics| functional| determined|gene| group| genes... – character, v, engrave| inscribe| characters... IMDB – character, reality tv, reality| film| fantasy|history| character| movie| dream| rocky... AWS – character, novel, butcher| covey| davenport|detective| dresden| effective| favor| files...

Divisive Clustering Dividing the data into several clusters before applying the clustering algorithm. – We can apply the algorithm in parallel to these clusters, resulting in faster ontology construction. – The source from which the definitions of an interest are obtained can inform us about other concepts that this interest can be associated with.

Agglomerative Clustering Bottom-up fashion and takes as input the set of instances in a particular cluster. Cluster

Visual Inspection of the Ontology Visual inspection of an ontology can provide useful insights about how various instances have been organized into concepts that capture the semantic knowledge in a domain. Open source tool “Cytoscape”.

Ontology of terms related to “networking”

Evaluation of the ontology through learning Interest-based features – User interests themselves can be indicative of user friendships and could be used as interest-based nominal features. – If two users have many interests in common, then it is possible that they are friends, regardless of what exactly those interests are. Graph-based features – Such as in-degree of users, out-degree of users, forward deleted distances, etc.

Evaluation of the ontology through learning Numerical interest based measures using Association Rules.

Experimental & Result In each experiment, the training and test data sets are independent and each consists of 1000 user pairs. The training set contains approximately 50% friend pairs and 50% non-friend pairs, while the test set contains user pairs selected randomly from the original distribution. Any overlap in terms of users is removed. Assumption: Two users are not friends if there is no direct link between them in the network graph.

WEKA’s classifiers J48 decision trees Support vector machines(build logistic model option enabled) Random forests. Logistic regression(Logistic) One-attribute-rule(OneR)

Result

ROC curves for different classifiers when using graph- based and interest-based numerical features, derived using the ontology, to predict friends

ROC curves for SVM when using different sets of attributes to predict friend

Conclusion HAD does not cluster users and their interests together, and uses the semantic information captured in the ontology to address the problem of predicting friend relationships. The second subtask of social network discovery i.e. visual mining of the underlying network has also received much attention in the recent research on social networks.

Conclusion The HAD algorithm, proposed in this paper, constructs a concept hierarchy of interests of users, enabling classification algorithms to use semantic information in the link prediction. The algorithm is implemented by combining hierarchical agglomerative and divisive clustering approaches to overcome the shortcomings of other related approaches. A network (social or otherwise) does not always have well defined links among nodes. Predicting links using the concept hierarchies can help in completing the network, which can then be used for other problems.