Capstone Project
NYC Taxi DataSet
The data is stored in CSV format, organized by year and month. Each row of a file represents a single taxi trip. Table 1 below gives a small sample of this data. The data covers four years, with several entries per second. The raw trip data takes up about 116 GB as CSV text.
The trip data is organized as follows:
Medallion: car ID.
Hack license: driver ID.
Vendor id.
Rate_code: taximeter rate.
Store_and_fwd_flag: unknown attribute.
Pickup datetime: start time of the trip, mm-dd-yyyy hh24:mm:ss EDT.
Dropoff datetime: end time of the trip, mm-dd-yyyy hh24:mm:ss EDT.
Passenger count: number of passengers on the trip; the default value is one.
Trip time in secs: trip time measured by the taximeter, in seconds.
Trip distance: trip distance measured by the taximeter, in miles.
Pickup_longitude and pickup_latitude: GPS coordinates at the start of the trip.
Dropoff longitude and dropoff latitude: GPS coordinates at the end of the trip.
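As a rough illustration of the trip attributes listed above, the following Python sketch parses one CSV row into a typed record. The column names, the `Trip` class, the helper functions, and the sample row are all assumptions made for illustration; the actual header names and timestamp layout should be checked against the files themselves.

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime

# Assumed timestamp layout (mm-dd-yyyy, 24-hour clock); adjust if the files differ.
TIME_FORMAT = "%m-%d-%Y %H:%M:%S"


@dataclass
class Trip:
    medallion: str            # car ID
    hack_license: str         # driver ID
    vendor_id: str
    rate_code: str            # taximeter rate
    store_and_fwd_flag: str
    pickup_datetime: datetime
    dropoff_datetime: datetime
    passenger_count: int
    trip_time_in_secs: int
    trip_distance: float      # miles
    pickup_longitude: float
    pickup_latitude: float
    dropoff_longitude: float
    dropoff_latitude: float


def parse_trip(row: dict) -> Trip:
    """Convert one CSV row (a dict of strings) into a Trip."""
    return Trip(
        medallion=row["medallion"],
        hack_license=row["hack_license"],
        vendor_id=row["vendor_id"],
        rate_code=row["rate_code"],
        store_and_fwd_flag=row["store_and_fwd_flag"],
        pickup_datetime=datetime.strptime(row["pickup_datetime"], TIME_FORMAT),
        dropoff_datetime=datetime.strptime(row["dropoff_datetime"], TIME_FORMAT),
        # A missing passenger count falls back to the default of one.
        passenger_count=int(row["passenger_count"] or 1),
        trip_time_in_secs=int(row["trip_time_in_secs"]),
        trip_distance=float(row["trip_distance"]),
        pickup_longitude=float(row["pickup_longitude"]),
        pickup_latitude=float(row["pickup_latitude"]),
        dropoff_longitude=float(row["dropoff_longitude"]),
        dropoff_latitude=float(row["dropoff_latitude"]),
    )


def read_trips(lines):
    """Yield a Trip for every row of an iterable of CSV lines with a header."""
    for row in csv.DictReader(lines):
        yield parse_trip(row)


# Made-up sample row, for illustration only (not taken from the dataset).
SAMPLE = (
    "medallion,hack_license,vendor_id,rate_code,store_and_fwd_flag,"
    "pickup_datetime,dropoff_datetime,passenger_count,trip_time_in_secs,"
    "trip_distance,pickup_longitude,pickup_latitude,"
    "dropoff_longitude,dropoff_latitude\n"
    "M1,H1,V1,1,N,01-01-2013 15:11:48,01-01-2013 15:18:10,,382,1.0,"
    "-73.978165,40.757977,-73.989838,40.751171\n"
)

trips = list(read_trips(io.StringIO(SAMPLE)))
```

Reading rows lazily with a generator matters here because the raw files are far too large to hold in memory at once.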
Fare data is also available. A sample of the fare data is shown in Table 2 below. This dataset contains the following attributes:
Medallion: car ID.
Hack license: driver ID.
Vendor id.
Pickup datetime: start time of the trip, mm-dd-yyyy hh24:mm:ss EDT.
Fare amount: the meter fare, in USD; it should include the Newark surcharge.
Surcharge: extra fees, such as rush hour and overnight surcharges, in USD.
MTA tax: Metropolitan commuter transportation mobility tax, in USD.
Tip amount: tip amount, in USD.
Tolls amount: total price paid for tolls, summed across all tolls for the trip, in USD.
Total amount: all charges presented to the passenger at the time of fare payment (includes the tip for non-cash trips), in USD.
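The trip and fare datasets share four attributes: medallion, hack license, vendor id, and pickup datetime. One plausible way to combine them is to join on those four values, as in the minimal Python sketch below. It assumes that the four attributes together identify a single trip, which the source does not state; the function names, column names, and sample records are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# Attributes that appear in both the trip data and the fare data.
JOIN_KEYS = ("medallion", "hack_license", "vendor_id", "pickup_datetime")


def join_key(record: Dict[str, str]) -> Tuple[str, ...]:
    """Build the composite key used to match a trip with its fare."""
    return tuple(record[k] for k in JOIN_KEYS)


def join_trips_with_fares(
    trips: Iterable[Dict[str, str]],
    fares: Iterable[Dict[str, str]],
) -> List[Dict[str, str]]:
    """Inner join: keep only trips that have a matching fare record."""
    fares_by_key = {join_key(f): f for f in fares}
    joined = []
    for trip in trips:
        fare = fares_by_key.get(join_key(trip))
        if fare is not None:
            merged = dict(trip)   # copy so the input record is not modified
            merged.update(fare)   # add the fare columns
            joined.append(merged)
    return joined


# Made-up records, for illustration only.
sample_trips = [
    {"medallion": "M1", "hack_license": "H1", "vendor_id": "V1",
     "pickup_datetime": "01-01-2013 15:11:48", "trip_distance": "1.0"},
    {"medallion": "M2", "hack_license": "H2", "vendor_id": "V1",
     "pickup_datetime": "01-01-2013 16:00:00", "trip_distance": "2.5"},
]
sample_fares = [
    {"medallion": "M1", "hack_license": "H1", "vendor_id": "V1",
     "pickup_datetime": "01-01-2013 15:11:48", "total_amount": "8.50"},
]

joined = join_trips_with_fares(sample_trips, sample_fares)
```

The dictionary holds every fare record in memory, so for the full dataset this would need to be done one month at a time or with a tool built for large joins.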
Trajectory Data Query Model
Existing query models for trajectory data are concerned with finding trajectories or trips with respect to a given range or point (e.g. “find all objects within a given area (or at a given point) sometime during a given time interval” or “find the k-closest objects with respect to a given point at a given time interval”).
The coordinate-based queries:
Point Queries: e.g. find the location of a specific object between 1:00pm and 1:30pm.
Region Queries: e.g. find all trajectories or trips that passed through region R between 1:00pm and 1:30pm.
K-Nearest Neighbor Queries: e.g. find all trajectories or trips within 500m of a gas station between 1:00pm and 1:30pm.
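The three coordinate-based query types can be sketched in Python as linear scans over timestamped positions. This is only an illustration: the `Sample` record, the function names, the rectangular region, and the sample data are assumptions, and a real system would use a spatio-temporal index rather than scanning every record. The nearest-neighbor version here returns the k closest objects, in the sense of the "k-closest objects" example given earlier.

```python
import math
from datetime import datetime
from typing import List, NamedTuple, Set, Tuple


class Sample(NamedTuple):
    object_id: str
    time: datetime
    lon: float
    lat: float


def in_window(s: Sample, start: datetime, end: datetime) -> bool:
    return start <= s.time <= end


def point_query(samples, object_id, start, end) -> List[Tuple[float, float]]:
    """Locations of one object during the time window."""
    return [(s.lon, s.lat) for s in samples
            if s.object_id == object_id and in_window(s, start, end)]


def region_query(samples, min_lon, min_lat, max_lon, max_lat,
                 start, end) -> Set[str]:
    """Objects seen inside a rectangular region during the time window."""
    return {s.object_id for s in samples
            if in_window(s, start, end)
            and min_lon <= s.lon <= max_lon
            and min_lat <= s.lat <= max_lat}


def haversine_m(lon1, lat1, lon2, lat2) -> float:
    """Great-circle distance in meters between two (lon, lat) points."""
    r = 6371000.0  # mean Earth radius in meters
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))


def knn_query(samples, lon, lat, k, start, end) -> List[str]:
    """The k objects that came closest to (lon, lat) during the time window."""
    best = {}  # object_id -> smallest distance seen so far
    for s in samples:
        if in_window(s, start, end):
            d = haversine_m(s.lon, s.lat, lon, lat)
            if s.object_id not in best or d < best[s.object_id]:
                best[s.object_id] = d
    return sorted(best, key=best.get)[:k]


# Made-up positions, for illustration only.
samples = [
    Sample("A", datetime(2013, 1, 1, 13, 5), -73.98, 40.75),
    Sample("B", datetime(2013, 1, 1, 13, 10), -73.90, 40.70),
    Sample("C", datetime(2013, 1, 1, 14, 0), -73.98, 40.75),
]
start = datetime(2013, 1, 1, 13, 0)
end = datetime(2013, 1, 1, 13, 30)

a_locations = point_query(samples, "A", start, end)
in_region = region_query(samples, -74.0, 40.7, -73.95, 40.8, start, end)
nearest = knn_query(samples, -73.98, 40.75, 2, start, end)
```

A "within 500m" variant, as in the gas station example, would filter on `haversine_m(...) <= 500` instead of taking the first k results.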
The trajectory-based queries:
Topological Queries: e.g. “When did vehicle X most recently enter street Y?”
Navigational Queries: e.g. “What is the current speed of vehicle X?”
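Both kinds of trajectory-based query can be approximated with simple Python functions, shown below as a sketch. Several things are assumed here: the street is modeled as a bounding box, the vehicle track is a hypothetical list of timestamped GPS fixes (the taxi dataset itself records only pickup and dropoff points), and "speed" is taken as the average speed of a trip computed from its taximeter distance and time. All names are illustrative.

```python
from datetime import datetime
from typing import List, Optional, Tuple

Fix = Tuple[datetime, float, float]        # (time, lon, lat)
Box = Tuple[float, float, float, float]    # (min_lon, min_lat, max_lon, max_lat)


def inside(box: Box, lon: float, lat: float) -> bool:
    min_lon, min_lat, max_lon, max_lat = box
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


def last_entry_time(track: List[Fix], box: Box) -> Optional[datetime]:
    """Topological query: the most recent time the track moved from outside
    the box to inside it. A track whose first fix is already inside counts
    as entering at that fix. Returns None if the box is never entered."""
    entered = None
    was_inside = False
    for t, lon, lat in sorted(track):      # process fixes in time order
        now_inside = inside(box, lon, lat)
        if now_inside and not was_inside:
            entered = t
        was_inside = now_inside
    return entered


def average_speed_mph(trip_distance_miles: float,
                      trip_time_in_secs: int) -> Optional[float]:
    """Navigational query: average speed of a trip in miles per hour.
    Returns None when the recorded trip time is not positive."""
    if trip_time_in_secs <= 0:
        return None
    return trip_distance_miles * 3600.0 / trip_time_in_secs


# Made-up track and street box, for illustration only.
street = (-73.99, 40.74, -73.97, 40.76)
track = [
    (datetime(2013, 1, 1, 13, 0), -73.95, 40.70),   # outside
    (datetime(2013, 1, 1, 13, 5), -73.98, 40.75),   # enters
    (datetime(2013, 1, 1, 13, 10), -73.95, 40.70),  # leaves
    (datetime(2013, 1, 1, 13, 20), -73.98, 40.75),  # enters again
    (datetime(2013, 1, 1, 13, 25), -73.98, 40.75),  # still inside
]

entry = last_entry_time(track, street)
speed = average_speed_mph(2.0, 600)
```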
A Study of New York City Taxi Trips
From: Visual Exploration of Big Spatio-Temporal Urban Data: A Study of New York City Taxi Trips. Nivan Ferreira, Jorge Poco, Huy T. Vo, Juliana Freire, and Claudio T. Silva.
For NYC DataSet: 2013 : – 2013: NYC TaxiVis Paper: