Open Problems in Relational Data Clustering
Adam Anthony, Marie desJardins
University of Maryland Baltimore County

Overview
Data clustering is the task of detecting patterns in a set of data. Most algorithms take non-relational data as input and are sometimes unable to find significant patterns. Many data sets include relational information as well as independent object attributes. Relational data clustering techniques can help find strong patterns in such sets. Two areas of interest in relational data clustering are clustering heterogeneous data and relation selection.

Prior Research
Join related objects to form independent compound objects, then cluster normally (Yin et al., 2005).
Use attribute-based distance measures as weights in a relation graph; adapt a graph cutting algorithm to use edge weights (Neville et al., 2003).
Use a probabilistic relational model with an adapted EM algorithm (Taskar et al., 2001).
Calculate a hybrid metric that linearly combines relation similarity and attribute similarity, then run a single-link algorithm (Bhattacharya and Getoor, 2005).

Feature Space
A feature space is a set of objects with attributes, FS = {o_1, o_2, …, o_n}, where o_i =

Relation Space
A relation space is a set of relation graphs, RS = {RG_1, RG_2, ..., RG_K}, where RG_i = {O_i, R_i}, O_i ⊆ FS, and R_i is a set of edges for a specific relation.

Example Data Sets
Internet Movie Database: Attributes include personal data such as awards received, financial earnings, age, gender, or Hollywood Stock Exchange rating. Examples of relations are acted-in, directed, and sequel.
CIA World Factbook: Attribute values come from categories like government, economics, and population. Relations can be derived from sources such as common membership in international organizations.

Heterogeneous Data
It can be very difficult to compare objects of different types. For example, how can actors be compared to directors? One possibility is an inter-cluster relation signature.
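The three-step procedure this poster gives for the signature (cluster one homogeneous set, count each object's links into those clusters, then cluster on the counts) might be sketched as follows. This is only an illustration under stated assumptions: the reference clustering is taken as given, the object and movie names are hypothetical placeholders, and the final step groups objects with identical signatures as a stand-in for whatever clustering algorithm one would actually use.

```python
from collections import Counter


def relation_signatures(links, reference_clusters, cluster_ids):
    """Build one count vector per object.

    links: dict mapping an object to the reference objects it links to
    reference_clusters: dict mapping a reference object to its cluster id
    cluster_ids: ordered cluster ids, fixing the layout of each vector
    """
    signatures = {}
    for obj, neighbors in links.items():
        # Count how many of this object's links land in each reference cluster.
        counts = Counter(
            reference_clusters[n] for n in neighbors if n in reference_clusters
        )
        signatures[obj] = [counts[c] for c in cluster_ids]
    return signatures


def group_by_signature(signatures):
    """Stand-in for step 3: put objects with identical signatures together."""
    groups = {}
    for obj, sig in signatures.items():
        groups.setdefault(tuple(sig), []).append(obj)
    return groups


# Step 1 (assumed already done): a reference clustering of hypothetical movies.
movie_cluster = {"m1": "boxing", "m2": "boxing", "m3": "comedy", "m4": "drama"}
cluster_ids = ["boxing", "comedy", "drama"]

# Hypothetical directed / acted-in links from people to movies.
links = {
    "director_a": ["m1", "m3"],
    "actor_b": ["m2", "m3"],
    "actor_c": ["m1", "m4"],
}

signatures = relation_signatures(links, movie_cluster, cluster_ids)  # step 2
groups = group_by_signature(signatures)                              # step 3
```

Here a director and an actor end up with the same signature because both link once into the boxing cluster and once into the comedy cluster, which is what lets objects of different types be compared on a common footing.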
The inter-cluster relation signature is computed in three steps:
1. Cluster one set of homogeneous data. This is the reference clustering.
2. For each object, create a vector that records the number of links from that object to each cluster discovered in step 1. This is the inter-cluster relation signature.
3. Cluster all objects based on their inter-cluster relation signatures.

[Figure: Ron Howard, Norman Jewison, Carl Weathers, and Talia Shire linked by directed and acted-in relations to the movie clusters Boxing, Comedy, and Drama; the example signatures read "1 Boxing, 1 Comedy" and "1 Boxing, 1 Drama".]

Relation Selection
Just as some features are not helpful for clustering a data set, some relations may provide little information to a relational clustering algorithm, or may even harm its performance. As relational clustering algorithms continue to develop, detecting such relation graphs will become more important.

[Figure: two relation graphs linking Botswana, Kenya, Thailand, Japan, China, the US, the UK, and Italy through shared membership in the AU, G-77, AsDB, G-8, and UNSC; the left graph omits the WTO relation and the right graph includes it.]

The graph on the right includes an additional relation graph (blue links) that represents the World Trade Organization, which fully connects all countries shown (redundant links omitted). Including the WTO as one of the relation graphs obscures the patterns that can be seen in the graph on the left, making a clustering harder to find. This situation is similar to cases in the feature space where an attribute has the same value for all objects. Removing the WTO graph reduces the size of the total graph and makes finding patterns easier.

Conclusion
Early research in relational clustering has been successful. Analyzing relational patterns can help us develop methods for comparing heterogeneous data objects. Development of relation selection techniques will help improve existing relational clustering algorithms.

This research was funded by NSF grant #

References
Bhattacharya, I., & Getoor, L. (2005). Entity resolution in graph data (Technical Report CS-TR-4758). University of Maryland.
Neville, J., Adler, M., & Jensen, D. (2003). Clustering relational data using attribute and link information. Proceedings of the Text Mining and Link Analysis Workshop.
Taskar, B., Segal, E., & Koller, D. (2001). Probabilistic classification and clustering in relational data. Proceedings of IJCAI-01, 17th International Joint Conference on Artificial Intelligence (pp. 870–878). Seattle, US.
Yin, X., Han, J., & Yu, P. S. (2005). Cross-relational clustering with user's guidance. KDD '05: Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining (pp. 344–353). New York, NY, USA: ACM Press.
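As a rough illustration of the WTO example in the Relation Selection discussion, a relation graph that links every pair of objects behaves like an attribute with a single value. The sketch below flags such graphs by edge density. This heuristic is an assumption made for illustration, not a method proposed on the poster; the function names, the threshold, and the small membership lists are likewise illustrative.

```python
from itertools import combinations


def edge_density(nodes, edges):
    """Fraction of possible undirected node pairs that are linked."""
    n = len(nodes)
    if n < 2:
        return 0.0
    # Treat edges as undirected, drop duplicates and self-loops.
    unique = {frozenset(e) for e in edges if len(set(e)) == 2}
    return len(unique) / (n * (n - 1) / 2)


def uninformative_relations(nodes, relation_graphs, threshold=1.0):
    """Names of relation graphs whose density reaches the threshold."""
    return [
        name
        for name, edges in relation_graphs.items()
        if edge_density(nodes, edges) >= threshold
    ]


countries = ["Botswana", "Kenya", "Japan", "Italy"]

relation_graphs = {
    # Shared membership links; the WTO graph fully connects all countries.
    "AU": [("Botswana", "Kenya")],
    "G-8": [("Japan", "Italy")],
    "WTO": list(combinations(countries, 2)),
}

flagged = uninformative_relations(countries, relation_graphs)
```

Lowering `threshold` would also flag nearly complete graphs, though where to set it is a judgment call that depends on the data set.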