Yongli Zhang and Christoph F. Eick University of Houston, USA

1 ST-DCONTOUR: A Serial, Density-contour Based Spatio-temporal Clustering Approach to Cluster Location Streams
Yongli Zhang and Christoph F. Eick, University of Houston, USA
7th ACM SIGSPATIAL International Workshop on GeoStreaming (IWGS 2016), October 31st, 2016, San Francisco, California, USA
Data Analysis and Intelligent Systems Lab

2 Talk Outline
Introduction
ST-DCONTOUR
Experimental Results
Future Work

3 1. Introduction
With the development of remote sensors and sensor networks, different types of spatio-temporal datasets are becoming increasingly available.
Our work centers on developing spatio-temporal clustering and hotspot discovery algorithms that are capable of identifying dense regions in the location/time space.

4 Past Work
Kulldorff [1] introduces a spatial scan statistic for the detection of spatio-temporal cylinders where the point objects occur consistently for a significant period of time.
Iyengar [2] extends the basic scan statistics using the flexible square pyramid shape to detect clusters with restrictive shapes; the proposed framework can model growth and shifts in location over time.
Wang et al. [3] propose a spatio-temporal clustering algorithm, ST-GRID, which maps the spatial and temporal dimensions into multidimensional cells and then extracts and merges spatio-temporal dense regions to obtain the final clusters.
Birant and Kut [4] propose ST-DBSCAN as an extension of DBSCAN for spatio-temporal clustering, introducing a second parameter, the temporal neighborhood radius, in addition to the spatial neighborhood radius.

5 Motivation
However, most spatio-temporal clustering algorithms are not suitable for large data streams, as they:
pass over the data several times
cannot deal with very large datasets
use time and location in a parallel fashion
For example, ST-DBSCAN [4] treats time and location in parallel, assumes that the dataset fits into main memory instead of arriving in batches as a stream, and scans through the dataset multiple times during the clustering process.

6 Talk Outline
Introduction
ST-DCONTOUR
Experimental Results
Future Work

7 2. ST-DCONTOUR
ST-DCONTOUR is a serial approach that:
Subdivides the incoming data into batches
Generates spatial clusters for each batch first
Then forms spatio-temporal clusters by identifying continuing relationships between spatial clusters in consecutive batches
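
The batching step can be pictured with a minimal Python sketch. It assumes fixed-length time windows, and the function name `split_into_batches`, the tuple layout, and the sample coordinates are all illustrative choices rather than anything taken from the talk.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def split_into_batches(points, start, batch_length):
    """Group (timestamp, lon, lat) tuples into consecutive fixed-length batches.

    Returns a dict mapping batch index -> list of (lon, lat) locations.
    Fixed-length windows are an assumption made for this sketch.
    """
    batches = defaultdict(list)
    for timestamp, lon, lat in points:
        if timestamp < start:
            continue  # ignore observations that precede the stream start
        # timedelta // timedelta yields an integer batch index
        index = (timestamp - start) // batch_length
        batches[index].append((lon, lat))
    return dict(batches)


# Made-up pickup records, for illustration only.
stream = [
    (datetime(2016, 1, 6, 23, 5), -73.985, 40.758),
    (datetime(2016, 1, 6, 23, 50), -73.990, 40.750),
    (datetime(2016, 1, 7, 0, 10), -73.986, 40.757),
    (datetime(2016, 1, 7, 1, 30), -73.984, 40.759),
]
batches = split_into_batches(stream, datetime(2016, 1, 6, 23, 0), timedelta(hours=1))
```

Each batch can then be clustered spatially on its own before any relationships across batches are considered, which is what makes the approach serial.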

8 ST-DCONTOUR cont.
Inputs:
Point cloud stream (e.g. taxi pickup location streams, described by longitude, latitude, and pickup time)
Data collection area (e.g. the New York metropolitan area)
Outputs:
Spatio-temporal clusters, which are graphs of related spatial clusters

9 The Three Phases of ST-DCONTOUR
1. Obtain a spatial density function for the spatial point cloud collected in each batch.
2. Identify spatial clusters for each batch as polygons that are created from density contour lines of the spatial density function.
3. Identify relationships between spatial clusters in consecutive batches, and construct spatio-temporal clusters as continuing spatial clusters in consecutive batches.
Data Analytics and Artificial Systems Lab

10 ST-DCONTOUR Phase 1
In our approach, we use EM to obtain a Gaussian Mixture Model (GMM) for the spatial point cloud collected in each batch; the obtained GMM serves as the spatial density function (Eq. 1).
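
As a rough illustration of how a fitted GMM acts as a density function, the sketch below evaluates the textbook mixture form, a weighted sum of bivariate normal densities, at a single location. The parameter layout and the function names are assumptions made for this example, and the EM fitting itself is not shown; the component parameters are taken as already estimated (for instance by a library such as scikit-learn's `GaussianMixture`).

```python
import math


def gaussian_pdf_2d(x, y, mean, cov):
    """Density of a bivariate normal at (x, y).

    cov is ((sxx, sxy), (sxy, syy)); this layout is an assumption of the sketch.
    """
    (sxx, sxy), (_, syy) = cov
    det = sxx * syy - sxy * sxy
    dx, dy = x - mean[0], y - mean[1]
    # Quadratic form d^T * inverse(cov) * d, using the closed-form 2x2 inverse.
    quad = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))


def gmm_density(x, y, components):
    """Mixture density: sum over components of weight * component density."""
    return sum(w * gaussian_pdf_2d(x, y, m, c) for w, m, c in components)


# Hypothetical two-component mixture: (weight, mean, covariance).
components = [
    (0.6, (0.0, 0.0), ((1.0, 0.0), (0.0, 1.0))),
    (0.4, (3.0, 3.0), ((1.0, 0.0), (0.0, 1.0))),
]
density_at_origin = gmm_density(0.0, 0.0, components)
```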

11 ST-DCONTOUR Phase 2
1. Grid the data collection area.
2. Calculate a probability density for all grid intersection points using the spatial density function in Eq. 1, and obtain a density matrix.
3. Pass triples (longitude, latitude, density value) for each grid intersection point, along with a density threshold, to the contouring algorithm CONREC [5], which returns a set of contour lines.
4. Close open contour lines.
5. Classify the obtained contour lines into holes and clusters.
6. Remove insignificant spatial clusters.
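
The gridding and density-matrix steps might look roughly like the following Python sketch. It does not reimplement CONREC: it only builds the density matrix and the (longitude, latitude, density) triples that would be handed to a contouring routine, and adds a simple threshold mask to show which grid points a contour at that level would enclose. The toy density function, the grid size, and all function names are assumptions for illustration.

```python
import math


def density_matrix(density, lon_range, lat_range, n_lon, n_lat):
    """Evaluate density(lon, lat) at every grid intersection point.

    Returns (lons, lats, matrix) with matrix[i][j] = density(lons[j], lats[i]).
    """
    lon_min, lon_max = lon_range
    lat_min, lat_max = lat_range
    lons = [lon_min + j * (lon_max - lon_min) / (n_lon - 1) for j in range(n_lon)]
    lats = [lat_min + i * (lat_max - lat_min) / (n_lat - 1) for i in range(n_lat)]
    matrix = [[density(lon, lat) for lon in lons] for lat in lats]
    return lons, lats, matrix


def to_triples(lons, lats, matrix):
    """Flatten the grid into (longitude, latitude, density) triples."""
    return [
        (lon, lat, matrix[i][j])
        for i, lat in enumerate(lats)
        for j, lon in enumerate(lons)
    ]


def above_threshold(matrix, threshold):
    """Boolean mask of grid points whose density exceeds the threshold."""
    return [[value > threshold for value in row] for row in matrix]


def toy_density(lon, lat):
    """Stand-in for the fitted spatial density function (hypothetical)."""
    return math.exp(-(lon * lon + lat * lat))


lons, lats, matrix = density_matrix(toy_density, (-2.0, 2.0), (-2.0, 2.0), 5, 5)
triples = to_triples(lons, lats, matrix)
mask = above_threshold(matrix, 0.3)
```

A finer grid gives smoother contour lines at the cost of more density evaluations per batch.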

12 Pseudo Code Phase2

13 ST-DCONTOUR Phase 3
For every two consecutive batches i and i+1, we define an overlap matrix for the two cluster sets X and X' obtained for those batches, with X' holding a list of N clusters. Each entry of the matrix is computed for one cluster in X and one cluster in X'.
If two spatial clusters in two consecutive batches overlap significantly, we conclude that the spatial cluster does not change significantly across the two batches, and we create a 'continuing' relationship between the two clusters.
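
One way to picture the overlap matrix is the Python sketch below. It rests on two assumptions that are not taken from the talk: each spatial cluster is approximated by the set of grid cells it covers instead of by its polygon, and overlap is measured as shared cells divided by the cells covered by either cluster. The threshold value and all names are likewise hypothetical.

```python
def overlap(cluster_a, cluster_b):
    """Assumed overlap measure: shared cells / cells covered by either cluster."""
    union = cluster_a | cluster_b
    return len(cluster_a & cluster_b) / len(union) if union else 0.0


def overlap_matrix(clusters_now, clusters_next):
    """Entry [i][j] compares cluster i of one batch with cluster j of the next."""
    return [[overlap(a, b) for b in clusters_next] for a in clusters_now]


def continuing_pairs(matrix, min_overlap):
    """Index pairs (i, j) whose overlap reaches the (assumed) threshold."""
    return [
        (i, j)
        for i, row in enumerate(matrix)
        for j, value in enumerate(row)
        if value >= min_overlap
    ]


# Clusters as sets of (row, column) grid cells; the values are illustrative only.
batch_i = [{(0, 0), (0, 1), (1, 0), (1, 1)}]
batch_next = [{(0, 1), (1, 1), (0, 2), (1, 2)}, {(5, 5), (5, 6)}]

matrix = overlap_matrix(batch_i, batch_next)
pairs = continuing_pairs(matrix, min_overlap=0.3)
```

Chaining such pairs across successive batches yields a graph of related spatial clusters, which matches the output form described for ST-DCONTOUR.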

14 Pseudo Code Phase3

15 Talk Outline
Introduction and Motivation
ST-DCONTOUR
Experimental Results
Future Work

16 3. Experimental Results
Dataset: NYC taxi trip dataset [6]
Contains data for over 1.1 billion taxi trips from January 2009 through June 2016.
Each individual trip record contains the precise location coordinates where the trip started and ended, timestamps for when the trip started and ended, the trip distance, and the fare.

17 Taxi-Cab Results
We analyzed yellow cab pickup streams for 3 consecutive hours, from 11pm on January 6th to 2am on January 7th, 2016.
The figure shows two spatio-temporal clusters that ST-DCONTOUR created: SC1 continues for 3 consecutive batches, and SC2 appears at batch 2 and continues for two batches.
We conclude that the midtown area of New York City is quite busy around midnight (11pm-2am) as far as taxi pickups are concerned, which suggests that many people are out in that area during that time, especially around Times Square.
A newly appearing cluster, such as SC2, located at Times Square in batch 2, suggests that people start leaving that area at midnight but do not require taxi services earlier, as there is no spatial cluster there for the 11pm-to-midnight batch.

18 Talk Outline
Introduction
ST-DCONTOUR
Experimental Results
Future Work

19 4. Future Work
Enable ST-DCONTOUR to support dynamic or adaptive batch sizes.
Investigate semi-automatic and automatic parameter selection tools to facilitate the use of ST-DCONTOUR.
Conduct a more thorough experimental evaluation.
Replace the parametric GMM approach, which does not work well at all, with a non-parametric density estimation approach to obtain the spatial density function.

20 Future Work cont.
Extend our approach to support multiple density thresholds.

21 References
[1] Martin Kulldorff. Spatial scan statistics: models, calculations, and applications. In Scan Statistics and Applications, pages 303-322. Springer, 1999.
[2] Vijay S. Iyengar. On detecting space-time clusters. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 587-592. ACM, 2004.
[3] Min Wang, Aiping Wang, and Anbo Li. Mining spatial-temporal clusters from geo-databases. In International Conference on Advanced Data Mining and Applications, pages 263-270. Springer, 2006.
[4] Derya Birant and Alp Kut. ST-DBSCAN: An algorithm for clustering spatial-temporal data. Data & Knowledge Engineering, 60(1):208-221, 2007.
[5] Paul D. Bourke. A contouring subroutine. Byte, 12(6):143-150, 1987.
[6] record data.shtml, (accessed August 23, 2016).

22 Any Questions?

