Smarter Outlier Detection and Deeper Understanding of Large-Scale Taxi Trip Records: A Case Study of NYC Jianting Zhang Department of Computer Science.

Presentation transcript:

Smarter Outlier Detection and Deeper Understanding of Large-Scale Taxi Trip Records: A Case Study of NYC Jianting Zhang Department of Computer Science The City College of New York

Outline
- Introduction
- Background and Related Work
- Method and Discussions
- Experiments and Results
- Summary

Introduction
Taxi trip records
- ~300 million trips in about two years
- ~170 million trips (300 million passengers) in /5 of that of subway riders and 1/3 of that of bus riders in NYC
- The dataset is not perfect...
- 13,000 Medallion taxi cabs
- License priced at $600,000 in 2007
- Car services and taxi services are separate
- Only taxis with a Medallion license can be hailed (the rule may be changing outside Manhattan...)

Introduction
Record fields: Medallion#, Shift#, Trip#, Trip_Pickup_DateTime, Trip_Dropoff_DateTime, Trip_Pickup_Location, Trip_Dropoff_Location, Start_Lon, Start_Lat, End_Lon, End_Lat, Payment_Type, Surcharge, Total_Amt, Rate_Code, Passenger_Count, Fare_Amt, Tolls_Amt, Tip_Amt, Trip_Time, Trip_Distance, vendor_name, date_loaded, store_and_forward, time_between_service, distance_between_service, Start_Zip_Code, End_Zip_Code, and start_x, start_y, end_x, end_y (local projection).
Deliberately scrambled due to privacy concerns.

Introduction
In addition:
- Some of the data fields are empty
- Pickup and drop-off locations can be in the Hudson River
- The recorded trip distance/duration can be unreasonable
- ...
Outlier detection is needed for data cleaning.
Handling 170 million trips is easier with the help of U²SOD-DB.

Background and Related Work
Existing approaches to outlier detection in urban computing:
- Thresholding: e.g., 200 m < dist < 30 km
- Flagging values that fall in unusual ranges of distributions
- Spatial analysis: within a region or a land use type
- Matching trajectories with road segments and treating unmatched ones as outliers
Some techniques require complete GPS traces, while we only have O-D (origin-destination) locations.
Large-scale shortest path computation has not been used for outlier detection.
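The thresholding approach named on this slide can be sketched as follows. This is only an illustration: the 200 m and 30 km bounds are the example values from the slide, and the function names and the sample distances are hypothetical.

```python
def within_thresholds(dist_m, low_m=200.0, high_m=30_000.0):
    """Return True when a trip distance in meters lies strictly inside the range."""
    return low_m < dist_m < high_m


def split_by_threshold(distances_m, low_m=200.0, high_m=30_000.0):
    """Separate distances into those kept and those flagged as outliers."""
    kept, flagged = [], []
    for d in distances_m:
        if within_thresholds(d, low_m, high_m):
            kept.append(d)
        else:
            flagged.append(d)
    return kept, flagged


# Made-up sample distances in meters, not taken from the dataset.
kept, flagged = split_by_threshold([50.0, 1200.0, 8000.0, 45000.0])
```

A fixed range like this is cheap to apply but knows nothing about the street network, which is the gap the shortest-path comparison is meant to fill.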

Background and Related Work
Shortest path computation
- Dijkstra and A*
- New-generation algorithms
- Contraction Hierarchy (CH) based
Open source implementations of CH:
- MoNav
- OSRM
Much faster than the ArcGIS Network Analyst (NA) module
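For reference, the baseline Dijkstra algorithm mentioned on this slide can be sketched in a few lines. This is a plain textbook version on a toy graph, not a Contraction Hierarchy implementation and not the code used in the study; the graph layout and node names are assumptions for illustration.

```python
import heapq


def dijkstra(graph, source, target):
    """graph maps node -> list of (neighbor, length).

    Returns (distance, path), or (inf, []) when target is unreachable.
    """
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue  # stale queue entry
        done.add(u)
        if u == target:
            break
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in done:
        return float("inf"), []
    # Walk predecessors back from the target to recover the path.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return dist[target], path


# Toy directed street graph with made-up segment lengths.
toy = {
    "a": [("b", 1.0), ("c", 4.0)],
    "b": [("c", 2.0), ("d", 5.0)],
    "c": [("d", 1.0)],
    "d": [],
}
distance, path = dijkstra(toy, "a", "d")
```

CH-based engines answer the same query after a preprocessing step, which is what makes millions of queries practical.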

Background and Related Work
Network centrality (Brandes, 2008)
- Node based
- Edge based
- Can be easily derived after shortest paths are computed
Mapping node/edge betweenness centrality can reveal the connection strengths among different parts of cities.
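One way to read "easily derived after shortest paths are computed" is to count how often each node and edge appears on the computed paths. The sketch below does that, weighting each path by a trip count. The weighting, the function name and the sample paths are assumptions for illustration; this is not Brandes' all-pairs algorithm and may differ from the measure used in the study.

```python
from collections import Counter


def accumulate_centrality(paths_with_counts):
    """Accumulate trip-weighted node and edge scores from shortest paths.

    paths_with_counts: iterable of (path, trips), where path is a list of nodes.
    """
    node_score = Counter()
    edge_score = Counter()
    for path, trips in paths_with_counts:
        # Interior nodes only: endpoints are not "between" anything.
        for node in path[1:-1]:
            node_score[node] += trips
        # Every consecutive pair of nodes is one traversed edge.
        for u, v in zip(path, path[1:]):
            edge_score[(u, v)] += trips
    return node_score, edge_score


# Made-up paths and trip counts.
paths = [
    (["a", "b", "c", "d"], 3),
    (["b", "c"], 2),
    (["a", "b", "d"], 1),
]
node_score, edge_score = accumulate_centrality(paths)
```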

Method and Discussions
Processing flow:
1. Start from the raw taxi trip data.
2. Match pickup/drop-off point locations to street segments within distance D0.
3. Successful? If not, the trip is a Type I outlier (spatial analysis).
4. Assign pickup/drop-off nodes by picking the closer ones.
5. Aggregate unique (sid,tid) pairs.
6. Compute the shortest path.
7. CD > D1 AND CD > W*RD? If so, the trip is a Type II outlier (network analysis).
8. Otherwise, update the centrality measurements.
CD: computed shortest distance. RD: recorded trip distance.
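The two outlier tests in this flow can be sketched as a single classification function. This is a minimal illustration, not the authors' implementation: the function name, the use of None for "no segment found", and the return labels are assumptions, and the defaults reuse the parameter values reported on the experiments slide (D0 = 200 feet, D1 = 3 miles, W = 2).

```python
def classify_trip(pickup_snap_ft, dropoff_snap_ft, cd_mi, rd_mi,
                  d0_ft=200.0, d1_mi=3.0, w=2.0):
    """Label one trip as 'type_I', 'type_II' or 'ok'.

    pickup_snap_ft / dropoff_snap_ft: distance from each point to its nearest
    street segment in feet, or None when no segment was found (assumed encoding).
    cd_mi: computed shortest distance; rd_mi: recorded trip distance (miles).
    """
    # Type I (spatial analysis): an endpoint cannot be matched within D0.
    for snap in (pickup_snap_ft, dropoff_snap_ft):
        if snap is None or snap > d0_ft:
            return "type_I"
    # Type II (network analysis): CD > D1 AND CD > W * RD.
    if cd_mi > d1_mi and cd_mi > w * rd_mi:
        return "type_II"
    return "ok"


# Made-up trips for illustration.
labels = [
    classify_trip(50.0, 500.0, 1.0, 1.0),   # drop-off too far from any street
    classify_trip(50.0, 60.0, 10.0, 3.0),   # shortest path far exceeds recorded distance
    classify_trip(50.0, 60.0, 2.5, 0.5),    # CD below D1, so not flagged
    classify_trip(50.0, 60.0, 10.0, 8.0),   # recorded distance is plausible
]
```

The D1 condition keeps short trips from being flagged, since a small absolute difference can still be a large ratio.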

Method and Discussions
The approach is approximate in nature:
- Taxi drivers do not always follow the shortest path
- Especially for short trips and in heavily congested areas
- But we only care about aggregated centrality measurements, and the errors have a chance to cancel each other out
Increasing D0 will reduce the number of type I outliers, but locations might be matched to the wrong segments.
Reducing D1 and/or W will increase the number of type II outliers but may generate false positives.

Experiments and Results
Overall distributions of trip distance, time, speed and fare

Experiments and Results
Mapping of Computed Shortest Paths Overlaid with NYC Community Districts Map
- Parameters: D0 = 200 feet, D1 = 3 miles, W = 2
- 166 million trips, 25 million unique
- ~2.5 million (1.5%) type I outliers
- ~18,000 type II outliers
- Shortest path computation completes in less than 2 hours (5,952 seconds) on a single CPU core (2.26 GHz)
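The figures on this slide can be cross-checked with a little arithmetic. The sketch below recomputes the type I share and the running time in hours, and derives a rough throughput under the assumption that one shortest path is computed per unique entry (the slide does not say this explicitly). The variable names are hypothetical.

```python
total_trips = 166_000_000
unique_entries = 25_000_000      # "25 million unique" on the slide
type_i_outliers = 2_500_000      # approximate count from the slide
elapsed_seconds = 5_952

# Share of trips flagged as type I outliers, in percent.
type_i_percent = 100.0 * type_i_outliers / total_trips

# Running time expressed in hours.
elapsed_hours = elapsed_seconds / 3600.0

# Rough throughput, ASSUMING one shortest-path query per unique entry.
paths_per_second = unique_entries / elapsed_seconds

print(f"type I share: {type_i_percent:.1f}%")
print(f"elapsed: {elapsed_hours:.2f} h")
print(f"throughput: about {paths_per_second:.0f} paths/s")
```

Under that assumption the single core handles on the order of a few thousand shortest-path queries per second.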

Experiments and Results Examples of Detected Type II Outliers

Experiments and Results Mapping Betweenness Centralities (All hours)

Experiments and Results
Mapping Betweenness Centralities (bi-hourly); panels: 00H, 02H, 04H, 06H, 08H, 10H, 12H, 14H, 16H, 18H, 20H, 22H, with legend

Summary
- Large-scale taxi trip records are error-prone due to a combination of device, human and information system induced errors, so outlier detection and data cleaning are important preprocessing steps.
- Our approach detects trips that cannot be snapped to street segments (spatial analysis) and/or whose computed shortest distance differs significantly from the recorded trip distance (network analysis).
- The work is preliminary: a more comprehensive framework is needed (e.g., incorporating pickup and drop-off times, trip duration and fare information).
- It would be interesting to generate betweenness maps under different traffic conditions, e.g., peak/non-peak, morning/afternoon and weekdays/weekends, and to explore connection strengths among NYC regions.

Q&A