Density-based Place Clustering in Geo-Social Networks
Jieming Shi, Nikos Mamoulis, Dingming Wu, David W. Cheung
Department of Computer Science, The University of Hong Kong

Clustering
- Spatial clustering: grouping spatial objects (geographic places, in our case) into clusters
- Useful for marketing and urban planning
- Density-based clustering divides a large collection of points into densely populated regions, separated by sparse regions whose points are treated as noise

DBSCAN algorithm
- DBSCAN, proposed in 1996, is one of the most commonly used clustering algorithms.
- For each place p, it finds all places within radius ε of p; this set is the ε-neighborhood (eps-neighborhood) of p.
- If the ε-neighborhood of p contains at least MinPts places, p is called a core point: it either starts a new cluster or becomes part of an existing one.
- Two dense ε-neighborhoods are put into the same cluster if each contains the other's core point (i.e., the two core points are within ε of each other).
- Places that are not core points and fall in no core point's ε-neighborhood are labeled as noise.
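The procedure above can be sketched in a few lines of Python. This is a minimal, unindexed illustration (every ε-neighborhood is found by a linear scan, so it runs in O(n²)); the names `region_query` and `dbscan` are hypothetical, and, as in the original DBSCAN definition, a point counts toward its own ε-neighborhood.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
NOISE = -1


def region_query(points: Sequence[Point], i: int, eps: float) -> List[int]:
    """Indices of all points within distance eps of points[i], including i itself."""
    px, py = points[i]
    return [j for j, (qx, qy) in enumerate(points)
            if math.hypot(px - qx, py - qy) <= eps]


def dbscan(points: Sequence[Point], eps: float, min_pts: int) -> List[Optional[int]]:
    """Label every point with a cluster id (0, 1, ...) or NOISE (-1)."""
    labels: List[Optional[int]] = [None] * len(points)  # None = not visited yet
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later become a border point of some cluster
            continue
        # i is a core point: start a new cluster and grow it
        labels[i] = cluster_id
        seeds = list(neighbors)
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                # j is also a core point, so its neighborhood is merged in
                seeds.extend(j_neighbors)
        cluster_id += 1
    return labels  # every entry has been assigned by now
```

For example, with two tight groups of four points and one far-away point, `dbscan(pts, 1.5, 4)` on `pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11), (50, 50)]` returns `[0, 0, 0, 0, 1, 1, 1, 1, -1]`. A real implementation would replace the linear scan in `region_query` with a spatial index.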

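Example (figure omitted): DBSCAN with MinPts = 4; the cluster grows through the ε-neighborhoods of successive core points (steps 1, 2, 3, …) until it is finished.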

DBSCAN result example

Use of geo-social network data
- Current spatial clustering models disregard information about the people who are related to the clustered places.
- A social network with geographic check-ins includes:
  - users
  - friendship connections between users
  - check-ins (a user visiting a place)

Motivation
- Urban planning: land managers are interested in identifying regions with uniform demographics (for example, areas that elderly people prefer to visit, or areas whose visitors share special transportation or living needs).
- Data cleaning: nearby geo-social network locations collected from user check-ins could belong to the same physical place.
- Marketing: if two or more places belong to the same geo-social cluster, a user who likes one of them will probably be interested in visiting the others.

(Figure omitted: a geo-social network consisting of users, places, friendship connections, and check-ins.)

Example 1, Example 2 (figures omitted)

Density-based Clustering of Places in Geo-Social Networks (DCPGS)

Input

DCPGS - Geo-social ε-neighborhood definition
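The transcript keeps only the title of this slide, so its exact definition is not reproduced here. As a hedged illustration, the sketch below assumes that the geo-social distance between two places is infinite when their spatial distance exceeds a threshold maxD (the parameter named on the DCPGS-G slide), and otherwise a weighted sum, with a hypothetical weight `w`, of the spatial distance normalized by maxD and a social distance in [0, 1]. The geo-social ε-neighborhood of p is then every place within geo-social distance ε of p. All function names and `w` are illustrative assumptions.

```python
import math
from typing import Callable, Dict, List, Tuple

Point = Tuple[float, float]


def geo_social_distance(p: Point, q: Point, social: float,
                        max_d: float, w: float) -> float:
    """Hypothetical combined distance (an assumption, not the slide's formula):
    infinite beyond max_d, otherwise a w-weighted sum of the spatial distance
    normalized by max_d and a social distance in [0, 1]."""
    d = math.hypot(p[0] - q[0], p[1] - q[1])
    if d > max_d:
        return math.inf
    return w * (d / max_d) + (1 - w) * social


def geo_social_neighborhood(places: Dict[str, Point], p: str, eps: float,
                            max_d: float, w: float,
                            social_dist: Callable[[str, str], float]) -> List[str]:
    """All places within geo-social distance eps of place p (p itself included)."""
    return sorted(
        q for q in places
        if geo_social_distance(places[p], places[q], social_dist(p, q), max_d, w) <= eps
    )
```

Under these assumptions, a place that is spatially close but socially unrelated can fall outside the neighborhood, while a spatially farther place with a strongly overlapping visitor community can fall inside it. Substituting this neighborhood for the purely spatial one in a DBSCAN-style expansion gives the general shape of the DCPGS idea.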

DCPGS algorithm idea

Distance functions

Social distance

Alternative ways to compute social distance – (1) Jaccard
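A minimal sketch of a Jaccard-based social distance, assuming it is computed over the sets of users who checked in at each of the two places (the slide's exact formulation is not in the transcript). The function name and the convention for places with no visitors are hypothetical.

```python
from typing import Set


def jaccard_distance(users_p: Set[str], users_q: Set[str]) -> float:
    """1 - |U_p ∩ U_q| / |U_p ∪ U_q| over the users who checked in at p and at q."""
    union = users_p | users_q
    if not union:
        return 1.0  # assumption: places with no visitors are maximally distant
    return 1.0 - len(users_p & users_q) / len(union)
```

For example, `jaccard_distance({"u1", "u2", "u3"}, {"u2", "u3", "u4"})` is `0.5` (two shared visitors out of four distinct ones). Note that this measure only rewards shared visitors; friendships between the two visitor sets play no role.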

Alternative ways to compute social distance – (2) SimRank
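SimRank rests on the recursive idea that two nodes are similar if their neighbors are similar: s(a, a) = 1 and s(a, b) = C / (|N(a)| |N(b)|) · Σ s(x, y) over x ∈ N(a), y ∈ N(b). The sketch below iterates this definition on an undirected graph, using neighbors in place of in-neighbors. Which graph DCPGS runs it on is not stated in the transcript; the decay C = 0.8, the iteration count, and turning similarity into a distance as 1 − s are assumptions.

```python
from itertools import product
from typing import Dict, List, Tuple


def simrank(graph: Dict[str, List[str]], c: float = 0.8,
            iterations: int = 10) -> Dict[Tuple[str, str], float]:
    """Naive iterative SimRank on an undirected adjacency list."""
    nodes = list(graph)
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, repeat=2)}
    for _ in range(iterations):
        new = {}
        for a, b in product(nodes, repeat=2):
            if a == b:
                new[(a, b)] = 1.0  # every node is fully similar to itself
                continue
            na, nb = graph[a], graph[b]
            if not na or not nb:
                new[(a, b)] = 0.0  # isolated nodes share no evidence
                continue
            # average similarity of all neighbor pairs from the previous round
            total = sum(sim[(x, y)] for x in na for y in nb)
            new[(a, b)] = c * total / (len(na) * len(nb))
        sim = new
    return sim
```

In the star graph `{"a": ["x"], "b": ["x"], "x": ["a", "b"]}`, nodes `a` and `b` share their only neighbor, so their similarity is `0.8`. This naive version costs O(k · n² · d²) for k iterations and average degree d, which is one reason SimRank is expensive on large social networks.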

Alternative ways to compute social distance – (3) Katz
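The Katz index scores a pair of nodes by counting all walks between them, damping each walk of length k by β^k: K(i, j) = Σ_k β^k (A^k)_{ij}. The sketch below truncates the series at `max_len` and computes scores from one source node by repeated vector-matrix products. It assumes an unweighted adjacency matrix (for example, of the friendship graph) and a β small enough for the full series to converge (β below the reciprocal of the largest eigenvalue of A); the parameter values and the conversion of scores into a distance are not given in the transcript.

```python
from typing import List


def katz_scores(adj: List[List[int]], source: int,
                beta: float = 0.1, max_len: int = 6) -> List[float]:
    """Truncated Katz index from `source` to every node:
    sum over k = 1..max_len of beta**k * (number of walks of length k)."""
    n = len(adj)
    walks = [0.0] * n
    walks[source] = 1.0  # the single walk of length 0 starts at the source
    scores = [0.0] * n
    for k in range(1, max_len + 1):
        # walks of length k ending at j = sum over i of walks of length k-1 ending at i, times A[i][j]
        walks = [sum(walks[i] * adj[i][j] for i in range(n)) for j in range(n)]
        for j in range(n):
            scores[j] += beta ** k * walks[j]
    return scores
```

On the path graph 0–1–2 with β = 0.1 and `max_len=2`, the scores from node 0 are approximately `[0.01, 0.1, 0.01]`: node 1 is reached by one walk of length 1, while node 0 (back) and node 2 are each reached by one walk of length 2.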

Alternative ways to compute social distance – (4) Commute Time
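The commute time between two nodes of a connected undirected graph is the expected number of random-walk steps needed to go from one node to the other and back. It equals vol(G) · R_eff(i, j), where vol(G) is the sum of all degrees (2m for m edges) and R_eff is the effective resistance when every edge is a unit resistor. The sketch computes R_eff by grounding node j and solving the reduced Laplacian system with plain Gaussian elimination. It assumes a connected, unweighted graph (such as the friendship graph); how DCPGS turns such user-level values into a place-level distance is not stated in the transcript, and the helper names are hypothetical.

```python
from typing import List


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def commute_time(adj: List[List[int]], i: int, j: int) -> float:
    """Commute time vol(G) * R_eff(i, j) on a connected, unweighted graph."""
    if i == j:
        return 0.0
    n = len(adj)
    degree = [sum(row) for row in adj]
    keep = [k for k in range(n) if k != j]  # ground node j (potential 0)
    # Laplacian L = D - A restricted to the non-grounded nodes
    lap = [[(degree[r] if r == c else 0) - adj[r][c] for c in keep] for r in keep]
    # inject one unit of current at i; it leaves through the grounded node j
    rhs = [1.0 if k == i else 0.0 for k in keep]
    potentials = _solve(lap, rhs)
    r_eff = potentials[keep.index(i)]  # R_eff = v_i - v_j = v_i
    return sum(degree) * r_eff
```

On the path 0–1–2 the effective resistance between the endpoints is 2 and vol(G) = 4, so `commute_time` returns `8.0`. Because the cost grows roughly cubically with the number of nodes, exact commute times are expensive on large social networks.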

Algorithms DCPGS-R and DCPGS-G

DCPGS-R: R-tree based
- The algorithm uses an R-tree to speed up the search for the geo-social ε-neighborhood of a given place.
- For efficiency, the social network is stored in a hash table, with each pair of friends as one entry.
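Storing each friendship as one hash-table entry makes "are u and v friends?" an expected O(1) lookup, which matters when social distances are computed for many candidate places. A minimal sketch using a Python `set` of ordered pairs follows; the helper names, including `friend_pairs_between` (one hypothetical way such lookups might be used), are assumptions and not taken from the slides.

```python
from typing import Iterable, Set, Tuple

Pair = Tuple[str, str]


def _key(u: str, v: str) -> Pair:
    """Order the two user ids so (u, v) and (v, u) map to the same entry."""
    return (u, v) if u < v else (v, u)


def build_friend_table(edges: Iterable[Pair]) -> Set[Pair]:
    """Hash set with one entry per pair of friends (self-loops are dropped)."""
    return {_key(u, v) for u, v in edges if u != v}


def are_friends(table: Set[Pair], u: str, v: str) -> bool:
    return _key(u, v) in table  # expected O(1) hash lookup


def friend_pairs_between(table: Set[Pair], users_p: Set[str], users_q: Set[str]) -> int:
    """Number of (visitor of p, visitor of q) combinations that are friends."""
    return sum(1 for u in users_p for v in users_q if u != v and are_friends(table, u, v))
```

Normalizing each pair's order means duplicate or reversed edges in the input collapse to a single entry, so the table holds exactly one entry per friendship.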

(Pseudocode figure omitted.) Annotations:
- Spatial query: uses the R-tree
- The distance has already been computed
- Compute the social and geo-social distances

DCPGS-G: Grid-based
- An individual R-tree range query finds all places within radius maxD of a given place in O(log n + k) time, where k is the number of places returned; since k is usually small, this is O(log n) in most cases.
- But with millions of places, we need to perform millions of such queries, one per place.
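One common way to avoid per-place tree traversals is a uniform grid whose cells have side maxD: every place within maxD of p must then lie in p's own cell or one of its 8 neighboring cells, so a range query scans only 9 buckets. The sketch below shows that idea; the transcript does not give the details of DCPGS-G, so the cell size, function names, and query strategy are assumptions.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Cell = Tuple[int, int]


def build_grid(places: Dict[str, Point], max_d: float) -> Dict[Cell, List[str]]:
    """Bucket every place into a square cell of side max_d."""
    grid: Dict[Cell, List[str]] = defaultdict(list)
    for pid, (x, y) in places.items():
        grid[(math.floor(x / max_d), math.floor(y / max_d))].append(pid)
    return dict(grid)


def grid_range_query(places: Dict[str, Point], grid: Dict[Cell, List[str]],
                     pid: str, max_d: float) -> List[str]:
    """Places within max_d of pid. Because cells have side max_d, every such place
    lies in pid's own cell or one of its 8 neighbors, so only 9 cells are scanned."""
    x, y = places[pid]
    cx, cy = math.floor(x / max_d), math.floor(y / max_d)
    found = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for q in grid.get((cx + dx, cy + dy), []):
                qx, qy = places[q]
                if math.hypot(x - qx, y - qy) <= max_d:
                    found.append(q)
    return sorted(found)
```

Building the grid is a single O(n) pass, and each query touches a constant number of cells, so the total work depends on how many places fall in neighboring cells rather than on a logarithmic tree descent per place.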

DCPGS-G: Grid-based

Results

Visualization-based Analysis

Social Entropy based Evaluation

- Commute Time and Katz have the lowest social entropy; however, these methods produce small clusters and too many outliers.
- Jaccard also has low social entropy, for the same reason.
- DCPGS performs better than SimRank on this measure.
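The transcript does not define social entropy. To illustrate why small clusters and many outliers push it down, the sketch below assumes a simple definition: the Shannon entropy (in bits) of how a cluster's check-ins are spread over users, averaged over clusters and weighted by their number of check-ins. This definition and the function names are assumptions for illustration, not necessarily the measure used in the evaluation.

```python
import math
from collections import Counter
from typing import List


def social_entropy(checkin_users: List[str]) -> float:
    """Shannon entropy (bits) of the distribution of one cluster's check-ins over users."""
    total = len(checkin_users)
    if total == 0:
        return 0.0
    counts = Counter(checkin_users)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def average_social_entropy(clusters: List[List[str]]) -> float:
    """Average over clusters, weighted by each cluster's number of check-ins."""
    total = sum(len(c) for c in clusters)
    if total == 0:
        return 0.0
    return sum(len(c) * social_entropy(c) for c in clusters) / total
```

Under this assumed definition, one cluster visited once each by `u1` and `u2` has entropy `1.0`, but splitting it into two single-user clusters, `average_social_entropy([["u1"], ["u2"]])`, gives `0.0`. Fragmenting the data into tiny clusters therefore lowers the score without producing more meaningful clusters, which is why low entropy alone does not favor Commute Time, Katz, or Jaccard.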