Published by Godwin Bishop. Modified over 8 years ago.
1
Density-based Place Clustering in Geo-Social Networks
Jieming Shi, Nikos Mamoulis, Dingming Wu, David W. Cheung
Department of Computer Science, The University of Hong Kong
2
Clustering
Spatial clustering: grouping of spatial objects (geographic places in our case) into clusters.
Useful for marketing and urban planning.
Density-based clustering divides a large collection of points into densely populated regions.
3
DBSCAN algorithm
DBSCAN, proposed in 1996, is one of the most common data clustering algorithms.
For each place p, it finds all the places within radius ε of p (the ε-neighborhood of p).
If the ε-neighborhood contains at least MinPts places, p is called a core point: it either starts a new cluster or becomes part of an existing one.
Dense ε-neighborhoods are put into the same cluster if they contain each other's core points.
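The steps on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the presenters' code: it assumes 2-D points, Euclidean distance, and a brute-force neighborhood search, and the names `dbscan`, `eps`, and `min_pts` are chosen here for illustration.

```python
from math import dist


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # Brute-force eps-neighborhood; it includes the point itself.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)  # j is also a core point: keep expanding
        cluster += 1
    return labels
```

With two tight groups of four points and one isolated point, `eps = 1.5`, and `min_pts = 4`, the sketch yields two clusters and marks the isolated point as noise.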
4
Example
[Figure: DBSCAN cluster expansion with radius ε and MinPts = 4; steps labeled 1, 2, 3, …, finish]
5
DBSCAN result example
6
Use of geo-social network data
Current spatial clustering models disregard information about the people who are related to the clustered places.
A social network with geographic check-ins includes:
Users
Friendship connections
Check-ins
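One possible in-memory representation of these three components is sketched below. The class name `GeoSocialNetwork`, its fields, and the choice of plain (x, y) coordinates are assumptions made for illustration; the slides do not prescribe a data layout.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class GeoSocialNetwork:
    # user -> set of that user's friends
    friends: dict = field(default_factory=lambda: defaultdict(set))
    # place -> set of users who checked in there
    visitors: dict = field(default_factory=lambda: defaultdict(set))
    # place -> (x, y) coordinates (hypothetical planar coordinates)
    coords: dict = field(default_factory=dict)

    def add_friendship(self, u, v):
        # Friendship is treated as symmetric.
        self.friends[u].add(v)
        self.friends[v].add(u)

    def add_place(self, place, x, y):
        self.coords[place] = (x, y)

    def add_checkin(self, user, place):
        self.visitors[place].add(user)
```

Keeping the visitor set per place, rather than a raw check-in log, makes it cheap to compare the audiences of two places later on.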
7
Motivation
Urban planning: land managers are interested in identifying regions with uniform demographic statistics (for example, areas that elderly people prefer to visit, or areas whose visitors share special transportation or living needs).
Data cleaning: nearby geo-social network locations collected from user check-ins could belong to the same physical place.
Marketing: if two or more places belong to the same geo-social cluster, a user who likes one place will probably be interested in visiting the others.
8
[Figure: geo-social network showing users, places, friendship connections, and check-ins]
9
[Figures: Example 1 and Example 2]
10
Density-based Clustering Places in Geo-Social Networks (DCPGS)
11
Input
12
DCPGS - Geo-social ε-neighborhood definition
13
DCPGS algorithm idea
14
Distance functions
15
Social distance
16
Alternative ways to compute social distance – (1) Jaccard
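The slide names Jaccard only by title. As a rough sketch, the standard Jaccard distance can be applied to the sets of users who checked in at two places; that choice of sets, and the function name `jaccard_distance`, are assumptions here rather than the presenters' definition.

```python
def jaccard_distance(visitors_a, visitors_b):
    """Return 1 - |A ∩ B| / |A ∪ B| for two collections of user ids."""
    a, b = set(visitors_a), set(visitors_b)
    union = a | b
    if not union:
        # Assumption: two places with no visitors are treated as maximally distant.
        return 1.0
    return 1.0 - len(a & b) / len(union)
```

Two places that share two of four distinct visitors get a distance of 0.5. Places with no common visitor get 1.0 even if their visitors are all friends with each other, since this measure looks only at direct overlap.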
17
Alternative ways to compute social distance – (2) SimRank
19
Alternative ways to compute social distance – (3) Katz
20
Alternative ways to compute social distance – (4) Commute Time
21
Algorithms DCPGS-R and DCPGS-G
22
DCPGS-R: R-tree based
The algorithm uses an R-tree to speed up the search for the geo-social ε-neighborhood of a given place.
For efficiency, the social network is stored in a hash table, with each pair of friends as an entry.
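The hash-table idea can be illustrated as follows. This is a sketch under the assumption that friendship is symmetric, so each pair is stored as an order-independent key; `build_friend_table` and `are_friends` are hypothetical helper names.

```python
def build_friend_table(friend_pairs):
    """Store each friendship as an order-independent key in a hash set."""
    return {frozenset(pair) for pair in friend_pairs}


def are_friends(table, u, v):
    """Expected constant-time check of whether u and v are friends."""
    return frozenset((u, v)) in table
```

Because the lookup takes expected constant time, checking friendship between the visitors of two places costs time proportional to the number of user pairs examined, independent of the size of the social network.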
25
Spatial query: uses the R-tree.
The distance has already been computed.
Compute the social and geo-social distances.
26
DCPGS-G: Grid-based
An individual R-tree range query finds all the places within radius maxD of a given geographic place in O(log n) time plus a term for the places returned, which amounts to O(log n) in most cases.
But with millions of places, we need to perform millions of such queries.
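A grid avoids one tree traversal per place by bucketing places into cells once and then inspecting only nearby cells. The sketch below is a generic uniform grid, not necessarily the layout used by DCPGS-G: the cell side `cell`, the requirement `cell >= max_d`, and the function names are assumptions made for illustration.

```python
from collections import defaultdict
from math import dist, floor


def build_grid(places, cell):
    """Map each grid cell (ix, iy) to the ids of the places inside it."""
    grid = defaultdict(list)
    for pid, (x, y) in places.items():
        grid[(floor(x / cell), floor(y / cell))].append(pid)
    return grid


def range_query(places, grid, cell, pid, max_d):
    """Return ids of places within max_d of place pid, excluding pid.

    Assumes cell >= max_d, so only the 3x3 block of cells around the
    cell of pid can contain matches.
    """
    x, y = places[pid]
    cx, cy = floor(x / cell), floor(y / cell)
    found = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for other in grid.get((cx + dx, cy + dy), []):
                if other != pid and dist(places[pid], places[other]) <= max_d:
                    found.append(other)
    return found
```

The grid is built in one pass over the places, and each query then touches at most nine cells, so the per-query cost depends on local density rather than on the total number of places.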
27
DCPGS-G: Grid-based
32
Results
33
Visualization-based Analysis
36
Social Entropy based Evaluation
38
Social Entropy based Evaluation
CommuteTime and Katz have the lowest social entropy; however, these methods produce small clusters and too many outliers.
Jaccard also has low social entropy, for the same reason.
DCPGS performs better than SimRank.