Clustering of location-based data. Mohammad Rezaei, May 2013.


Data mining and clustering
- Huge amount of location-based data
- Need for mechanisms to extract knowledge
- Clustering as an important field in spatio-temporal data mining

Clustering

Some applications
- Routing
- Interesting places
- Recommendation of services
- Marketing management
- Users with same interests
- Visualization

Clustering problems in Mopsi
- Clutter of markers on the map
- Similar services or photos in a list
- Categorization of services
- Distribution of users' locations
- Timeline view of photos
- Clustering of events

Clutter of markers

Search results: clustering

Photos

Users

Solutions
- Grid based clustering
- Distance based clustering

Google Maps version
- Using location in pixels for grid-based clustering
- 22 zoom levels
- 256*256 in zoom level 0 to [...]*[...] in zoom level [...]
- [...]*10^12 cells in zoom level 21 with cell size (60, 80)
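The pixel locations mentioned on this slide can be sketched as follows. This is a minimal illustration, assuming the standard Web Mercator projection with a 256-pixel map side at zoom level 0 that doubles at each zoom level. The function names are hypothetical, and reading the cell size (60, 80) as width 60 and height 80 is also an assumption.

```python
import math

TILE_SIZE = 256  # side of the whole map in pixels at zoom level 0


def world_size(zoom):
    """Side length in pixels of the whole map at the given zoom level."""
    return TILE_SIZE * 2 ** zoom


def latlon_to_pixel(lat, lon, zoom):
    """Project (lat, lon) in degrees to Web Mercator pixel coordinates."""
    size = world_size(zoom)
    x = (lon + 180.0) / 360.0 * size
    sin_lat = math.sin(math.radians(lat))
    y = (0.5 - math.log((1 + sin_lat) / (1 - sin_lat)) / (4 * math.pi)) * size
    return x, y


def pixel_to_cell(x, y, cell_w=60, cell_h=80):
    """Column and row of the grid cell containing pixel (x, y).

    Assumption: the cell size (60, 80) means width 60 and height 80.
    """
    return int(x // cell_w), int(y // cell_h)
```

For example, `latlon_to_pixel(0.0, 0.0, 0)` places the point at the centre of the 256*256 map, and `world_size(21)` gives the map side at the deepest of the 22 zoom levels.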

Some issues
- Photos are added or deleted dynamically
- Querying for a certain time, a certain user, or according to photo description
- Different zoom levels, moving the map

Hierarchical clustering on server

Hierarchical clustering on server
- Individual clustering for different zoom levels
- Clustering of the whole data
- How to extract clusters for a specific query?
- Can clusters for a lower zoom level be derived from a higher level?

Client side clustering
- Query from server (resulting in N objects)
- Take the zoom view: not too many cells
- Take the objects in the zoom view and do clustering only for them (M objects)
- It takes O(N) to find the objects in the zoom view!

Grid based clustering
Input
- Location (lat, lon) of markers
- Width and height of markers (W_m, H_m)
- Width and height of cells in the grid (W, H)
Output
- Location of clusters
(Figure: a marker of size W_m x H_m and its location inside a grid cell of size W x H.)

Representation - Middle of cell
- No overlap
- Locations can be misleading

Representation - First object

Representation - Average location

Proposed approach
- The grid starts from the beginning of the whole map
- The grid is extended into the current zoom view, so clusters do not change when the map is moved
- Average location is used as the representative, so clusters do not change when the map is moved
(Figure: grid with cells of size W x H covering the area from (x_min, y_min) to (x_max, y_max).)

Algorithm
    nRow = ceil((y_max - y_min) / H)
    nColumn = ceil((x_max - x_min) / W)
    nCell = nRow * nColumn
    Clusters = all cells  // empty clusters
    for all the markers (x, y)
        row = floor((y - y_min) / H)
        column = floor((x - x_min) / W)
        cellNum = row * nColumn + column
        add the marker to Clusters[cellNum]
        update the cluster Clusters[cellNum]
(Figure: grid with cells of size W x H from (x_min, y_min) to (x_max, y_max), showing a marker at (x, y) and its cell number.)
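The grid assignment can be sketched in Python as below. This is an illustrative version only, not the Mopsi implementation. It takes rows from y and columns from x, consistent with how the row and column of each marker are computed, keeps a running average location as the cluster representative, and stores only non-empty cells in a dictionary instead of allocating all cells (a simplification made here). The function name and the `(x, y, count)` tuple layout are assumptions.

```python
import math


def grid_cluster(markers, x_min, x_max, W, H, y_min=0.0):
    """Assign (x, y) markers to grid cells of width W and height H.

    Returns {cell_number: (x_avg, y_avg, count)} for non-empty cells only.
    """
    n_column = math.ceil((x_max - x_min) / W)
    clusters = {}
    for x, y in markers:
        row = math.floor((y - y_min) / H)
        column = math.floor((x - x_min) / W)
        cell = row * n_column + column
        cx, cy, n = clusters.get(cell, (0.0, 0.0, 0))
        # Update the running average location of the cluster.
        clusters[cell] = ((cx * n + x) / (n + 1), (cy * n + y) / (n + 1), n + 1)
    return clusters
```

With cells of size 60 x 80 over a map 120 pixels wide, the markers (10, 10) and (20, 30) fall into cell 0 and (70, 10) into cell 1.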

Merging algorithm - Average location as representative
    MergeClusters(clusters)
        sort the clusters in descending order of size
        set the parent of each cluster to the cluster itself
        k = 1  // K is the number of clusters
        while (k < K)
            if (k is not processed)
                checkNeighbors(k)
                mark the cluster k processed
            k = k + 1

    checkNeighbors(k)
        cluster1 = clusters[k]
        for all 8 neighbors
            cluster2 = one of the neighbors
            if cluster2 is not an empty cell
                checkNeighbor(cluster1, cluster2)

Merging algorithm
    checkNeighbor(cluster1, cluster2)
        find the distance d between the two clusters
        if d < T  // distance threshold T
            while (cluster2 is processed)  // means it has been merged
                cluster2 = clusters[cluster2.parent]
            MergeClusters(cluster1, cluster2)

    MergeClusters(cluster1, cluster2)
        n1 and n2: sizes of the clusters
        (x1, y1) and (x2, y2): locations of the clusters
        x = (n1*x1 + n2*x2) / (n1 + n2)
        y = (n1*y1 + n2*y2) / (n1 + n2)
        x1 ← x and y1 ← y
        mark the second cluster processed
        cluster2.parent = k
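A compact Python sketch of this merging step is given below. It follows the slide's outline under several assumptions that are not in the original: clusters are stored as `{cell_number: (x, y, n)}` with `cell_number = row * n_column + column`, a cluster counts as merged when its parent is no longer itself (instead of a separate processed flag), the size of the absorbing cluster is updated to `n1 + n2` after each merge, and a neighbor that already belongs to the current cluster is skipped.

```python
import math


def merge_grid_clusters(cells, n_column, T):
    """Merge neighboring grid clusters whose representatives are closer than T.

    cells: {cell_number: (x, y, n)}. Returns a list of merged (x, y, n) tuples.
    """
    state = {c: {"x": x, "y": y, "n": n, "parent": c}
             for c, (x, y, n) in cells.items()}

    def root(c):
        # Follow parent links until reaching a cluster that was not merged away.
        while state[c]["parent"] != c:
            c = state[c]["parent"]
        return c

    # Process clusters in descending order of size.
    for k in sorted(state, key=lambda c: state[c]["n"], reverse=True):
        if state[k]["parent"] != k:
            continue  # already merged into another cluster
        row, col = divmod(k, n_column)
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                c = col + dc
                if (dr == 0 and dc == 0) or c < 0 or c >= n_column:
                    continue
                nb = (row + dr) * n_column + c
                if nb not in state:
                    continue  # empty cell
                a, b = state[k], state[nb]
                if math.hypot(a["x"] - b["x"], a["y"] - b["y"]) >= T:
                    continue
                j = root(nb)
                if j == k:
                    continue  # neighbor is already part of this cluster
                b = state[j]
                n = a["n"] + b["n"]
                a["x"] = (a["n"] * a["x"] + b["n"] * b["x"]) / n
                a["y"] = (a["n"] * a["y"] + b["n"] * b["y"]) / n
                a["n"] = n  # assumption: the merged size is carried forward
                b["parent"] = k
    return [(s["x"], s["y"], s["n"]) for c, s in state.items() if s["parent"] == c]
```

As on the slide, the distance test uses the location stored for the neighboring cell, and only afterwards is the neighbor resolved to the cluster it was merged into.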

Grid based clustering
- Width and height of a cell: H > H_m and W > W_m
- Minimum distance d between markers to avoid overlap
(Figure: two markers of size W_m x H_m at distance d, with the location of each marker indicated.)

Distance based clustering
Input
- Location (lat, lon) of markers
- Width and height of markers (W_m, H_m)
Output
- Location of clusters
Time complexity: O(N^2)

Algorithm
    i = 0
    while (i < N)  // N = number of markers
        if (marker i is not clustered)
            label marker i as clustered
            calculate the distance d_j to the other non-clustered markers
            for all markers j
                if d_j < T  // T: distance threshold
                    merge the markers i and j
                    label marker j as clustered
        i = i + 1
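The distance based algorithm can be sketched as below. This is a minimal illustration, assuming Euclidean distance on already projected (x, y) coordinates and measuring distances from the seed marker i rather than from an updated cluster centre. The slide does not specify either choice. The function name is hypothetical.

```python
import math


def distance_cluster(markers, T):
    """Greedy O(N^2) clustering of (x, y) markers with distance threshold T.

    Returns a list of clusters, each a list of marker indices.
    """
    clustered = [False] * len(markers)
    clusters = []
    for i, (xi, yi) in enumerate(markers):
        if clustered[i]:
            continue
        clustered[i] = True
        members = [i]
        for j, (xj, yj) in enumerate(markers):
            # Only markers that are not yet clustered can join marker i.
            if not clustered[j] and math.hypot(xi - xj, yi - yj) < T:
                members.append(j)
                clustered[j] = True
        clusters.append(members)
    return clusters
```

The nested loop over all marker pairs is what gives the O(N^2) time complexity.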

Timeline view of photos
- Displaying n photos in a limited space

Timeline view of photos
Input
- Timestamps
- Number of clusters
Output
- Partitions
Algorithm
- K-means
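A small one-dimensional k-means over timestamps might look like the sketch below. The slide names only the algorithm. The initialisation (centroids spread evenly over the sorted timestamps), the iteration cap, and the function name are assumptions made for illustration.

```python
def kmeans_1d(timestamps, k, iterations=100):
    """Partition numeric timestamps into k groups with Lloyd's algorithm.

    Assumes 1 <= k <= len(timestamps). Returns a list of k partitions.
    """
    values = sorted(timestamps)
    # Assumed initialisation: pick centroids evenly spaced in the sorted list.
    centroids = [values[(2 * i + 1) * len(values) // (2 * k)] for i in range(k)]
    partitions = [[] for _ in range(k)]
    for _ in range(iterations):
        partitions = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centroids[c]))
            partitions[nearest].append(v)
        # Keep the old centroid if a partition ends up empty.
        updated = [sum(p) / len(p) if p else centroids[i]
                   for i, p in enumerate(partitions)]
        if updated == centroids:
            break  # assignments are stable
        centroids = updated
    return partitions
```

Each returned partition could then be shown as one group on the timeline.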

Location clusters
(Figure: map with clusters labeled Homes of users, Shop, Walking street, Market place, Swim hall, Science park.)

Clustering of trajectories

Similarity or distance
- Start and end of the routes

Similarity or distance
- Speed, length, acceleration, time, etc.
- Two of the routes are more similar in speed than the others
(Figure: routes labeled with their speeds, including 72 km/h, 50 km/h, 30 km/h and 60 km/h.)

Similarity or distance
- Closeness of points and shape (comparing whole routes or segments of the routes)
- Closest pair distance
- Sum of pair distance
(Figure: two trajectories, T1 with points t1 to t8 and T2 with points t1 to t4, shown once for each measure.)
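The two point-based measures named on this slide can be illustrated as below. The slide does not define them precisely, so the definitions here are assumptions: the closest pair distance is taken as the smallest distance between any point of one route and any point of the other, and the sum of pair distance as the sum, over the points of the first route, of the distance to the nearest point of the second route. Routes are assumed to be lists of projected (x, y) points.

```python
import math


def point_distance(p, q):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def closest_pair_distance(route_a, route_b):
    """Smallest distance between any point of route_a and any point of route_b."""
    return min(point_distance(p, q) for p in route_a for q in route_b)


def sum_of_pair_distance(route_a, route_b):
    """Sum over route_a of the distance from each point to its nearest point in route_b.

    Note: under this assumed definition the measure is not symmetric.
    """
    return sum(min(point_distance(p, q) for q in route_b) for p in route_a)
```

The closest pair distance depends on a single pair of points, while the sum reflects how closely one whole route follows the other.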

Cluttering problem for routes