The latte Stream-Archive Query Project - Exploring Stream+Archive Data in Intelligent Transportation Systems Jin Li (with Kristin Tufte, Vassilis Papadimos,

Presentation transcript:

The latte Stream-Archive Query Project - Exploring Stream+Archive Data in Intelligent Transportation Systems Jin Li (with Kristin Tufte, Vassilis Papadimos, David Maier, and Robert L. Bertini) Summer, 2007 Funded by the National Science Foundation

2 Why?
- Stream queries provide the current, real-time state of a system.
- However, you might also want to know:
  - Is today's (vehicle) traffic better or worse than average?
  - Are the loop sensors on I-5 NB malfunctioning?
- Answering these questions requires combining the live stream data with data from the archive.
  - In particular, compare with 'similar' data from the archive.

3 latte architecture
[Diagram: NiagaraST/latte stream operators, fed by the Live Stream (ODOT) and the PORTAL Archive (PostgreSQL).]

4 Outline: Introduction/Background on latte; Retrieving archive data; latte query evaluation; Demo

5 NiagaraST – Stream Query Engine
- Stream query processing system
  - Extended from the Niagara query engine
  - Joint work with UW-Madison
  - Supports XML input
- Support for stream queries
  - Window query semantics and evaluation (formally defines windows)
  - Data semantics: punctuation
  - Out-of-order processing
  - Handling data skew
[Diagram: example query plan with the operators scan, select (I-5 mp 297.5), window avg(speed).]

6 PORTAL – Data Archive
- PORTAL: Portland Oregon Regional Transportation Archive Listing
- Official transportation archive for the Portland metropolitan region
- Relational database archive (PostgreSQL)
- PORTAL receives a stream of live data from ODOT
  - Every 20 seconds it receives speed, volume, occupancy, and status from 500 loop detectors in the Portland area; archived since July 2004
  - Data is provided in XML format
  - Every 20 seconds, scripts parse the data and insert it into the database
  - Aggregations are performed every 5 minutes and overnight

7 PORTAL – Data Archive (Cont.)
- Loop Detector Data: 20 s count, lane occupancy, and speed from 500 detectors; archived since July 2004
- Incident Data: >155,000 since 1999
- Bus AVL Data: under development
- DMS Data: 19 VMS since 1999
- Data Archive
  - Archived loop detector data since July 2004
  - About 3 million records/day
  - About 500 GB, almost 3 billion rows in the database

8 Data Quality Issues in ITS
- Data quality is a big issue (about 20% dirty data)
- Missing data
  - Communication failure
  - Construction
  - Cabinet damage
- Corrupted data
  - Detectors degrade over time
  - Calibration errors (loop spacing)
  - Physical incidents (equipment parking)
- ITS experts provide constraints on data values
  - e.g., readings with both low speed and low volume are likely to be wrong
- Goal: fill in the data gaps intelligently
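
The expert constraint on this slide (low speed together with low volume is likely wrong) can be illustrated with a small filter. This is a minimal sketch, not PORTAL's or latte's actual cleaning logic: the `Reading` type, the function names, and both thresholds are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical thresholds, for illustration only; the slides give no values.
MIN_PLAUSIBLE_SPEED_MPH = 5.0
MIN_PLAUSIBLE_VOLUME = 2  # vehicles per 20-second interval


@dataclass
class Reading:
    detector_id: int
    speed: Optional[float]  # mph; None when the reading is missing
    volume: Optional[int]   # vehicle count in the 20 s interval


def is_suspect(r: Reading) -> bool:
    """Flag missing readings and readings with both low speed and low volume."""
    if r.speed is None or r.volume is None:
        return True
    return r.speed < MIN_PLAUSIBLE_SPEED_MPH and r.volume < MIN_PLAUSIBLE_VOLUME


def clean(readings: List[Reading]) -> List[Reading]:
    """Keep only the readings that pass the constraint."""
    return [r for r in readings if not is_suspect(r)]
```

Note that a low speed with a high volume is kept, since that combination is what congestion looks like rather than a sensor fault.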

9 Outline: Introduction/Background on latte; Retrieving archive data; latte query evaluation; Demo

10 latte architecture
[Diagram: NiagaraST/latte stream operators, fed by the Live Stream (ODOT) and the PORTAL Archive (PostgreSQL).]
- For each piece of stream data, we retrieve 'similar' data from the archive
- What does 'similar' mean?

11 What does similar mean?
- Application dependent
  - No standard
- User defined
  - Preprocessing can be hard or infeasible
- Similarity definitions for ITS
  - Compare today's data to data from the five previous weekdays
    - But Friday traffic is very different from Wednesday traffic
  - Compare today to the five previous Wednesdays
    - What about weather?
  - Better: compare today to the five previous Wednesdays on which the weather (rainfall) was similar to today's
  - Hard: compare to days with traffic conditions (speed, volume) similar to today's
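
The "five previous Wednesdays" and "similar weather" definitions can be sketched as a date search. This is a minimal illustration, not latte's similarity specification: the function name, the `is_similar` predicate, and the `max_weeks` look-back limit are all assumptions.

```python
from datetime import date, timedelta
from typing import Callable, List


def previous_same_weekdays(
    today: date,
    n: int = 5,
    is_similar: Callable[[date], bool] = lambda d: True,
    max_weeks: int = 52,
) -> List[date]:
    """Return up to n earlier dates on the same day of the week as `today`,
    most recent first, keeping only dates accepted by `is_similar`
    (for example, a rainfall comparison)."""
    matches: List[date] = []
    for k in range(1, max_weeks + 1):
        candidate = today - timedelta(weeks=k)
        if is_similar(candidate):
            matches.append(candidate)
            if len(matches) == n:
                break
    return matches
```

With the default predicate this gives the plain same-weekday definition; passing a weather predicate gives the "better" definition at the cost of looking further back in the archive.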

12 Retrieving Similar Data
- Expect database retrieval to be the bottleneck
- Reactive: query per tuple/event
  - May perform poorly
- Continuity of similarity definitions
  - If measurements at time A and time B are similar, measurements at time A+1 and time B+1 tend to be similar too
  - Continuous quantities: time, weather, speed, vehicle position
[Diagram: latte architecture.]

13 Retrieving Similar Data (Cont.)
- Predictive: prefetching based on the similarity definition (latte)
  - Fetching database data too early requires buffering
  - Fetching data too late forces the stream to stall
  - Dynamically adapt to database load
[Diagram: latte architecture.]

14 Outline: Introduction/Background on latte; Retrieving archive data; latte query evaluation; Demo

15 latte query plan
[Diagram: query plan inside NiagaraST/latte. Operators: stream scan, filter (data cleaning), window aggregate, adaptive extractor (db scan over segmentids), window aggregate, join, project.]
- Stream input: speed, volume, occupancy for each detector; updates received every 20 seconds
- Filter output: cleaned stream data
- Stream-side window aggregate output: average stream speed grouped by segmentId and windowId
- Adaptive extractor and PORTAL: SQL queries go to the database, database tuples come back; the database queries perform low-level (pane) aggregation
- Archive-side window aggregate output: average archive speed grouped by segmentId and windowId
- Join output: windowId, segmentId, stream speed, archive speed
- Adaptive extractor (db scan):
  - Builds and issues queries to the db
  - Dynamically adapts
  - Punctuates the 'stream' of data from the db (punctuation)
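
The last step of this plan pairs the stream-side and archive-side averages on (windowId, segmentId). The sketch below shows only that pairing, over in-memory dictionaries; it is not NiagaraST's join operator, and the function name and input shape are assumptions.

```python
from typing import Dict, List, Tuple

Key = Tuple[int, int]  # (window_id, segment_id)


def join_averages(
    stream_avgs: Dict[Key, float],
    archive_avgs: Dict[Key, float],
) -> List[Tuple[int, int, float, float]]:
    """Pair stream and archive average speeds that share a key.

    Output rows are (window_id, segment_id, stream_speed, archive_speed),
    ordered by key; keys present on only one side are dropped.
    """
    rows = []
    for key in sorted(stream_avgs):
        if key in archive_avgs:
            window_id, segment_id = key
            rows.append((window_id, segment_id, stream_avgs[key], archive_avgs[key]))
    return rows
```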

16 Database Access – Porthole Scan
- Our ideal view of data archive access
- The database continuously provides archive data from several different places, in synchronization with the input stream
  - Assumes continuity of the similarity definition
[Diagram: the live stream (22 April, 4:34) passes through a sliding-window avg and is joined (⋈) with a sliding-window aggr. over archive data from the last 4 same-day-of-week days (15 April, 8 April, 1 April, … at 4:34).]

17 Database Access – Paned Aggregation
- Based on a query template (derived from a similarity specification) and the current stream time (from punctuation), the extractor builds and issues queries to the database
- Pane aggregation is done in the database queries to reduce data communication
- A simplified example:
    SELECT floor(extract('epoch' from (timestamp - (TIMESTAMP ' :00:00' - interval '7 days')))/60) AS pane_id, seg_id, sum(speed), count(*)
    FROM loopdata_20sec
    WHERE timestamp > (curtime - interval '7 days')
      AND timestamp <= (curtime - interval '7 days') + interval '5 minutes'
    GROUP BY pane_id, seg_id
[Diagram: same layout as the porthole scan, with live stream at 22 April, 4:34 and archive days 15 April, 8 April, 1 April, …; pane aggregation happens in the database, and a window roll-up precedes the join (⋈) with the stream's sliding-window avg.]
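
Because the database returns a sum and a count per pane rather than an average, the window roll-up can combine panes exactly: a window's average is the total of its pane sums divided by the total of its pane counts. The sketch below illustrates this under assumed conventions (window `w` covers panes `w * slide_panes` up to, but not including, `w * slide_panes + panes_per_window`); the function name and row layout are hypothetical, not latte's implementation.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def roll_up(
    pane_rows: Iterable[Tuple[int, int, float, int]],
    panes_per_window: int,
    slide_panes: int = 1,
) -> Dict[Tuple[int, int], float]:
    """Roll per-pane (pane_id, seg_id, speed_sum, count) rows up into
    per-window average speeds keyed by (window_id, seg_id)."""
    sums: Dict[Tuple[int, int], float] = defaultdict(float)
    counts: Dict[Tuple[int, int], int] = defaultdict(int)
    for pane_id, seg_id, speed_sum, count in pane_rows:
        # A pane contributes to every window whose pane range contains it.
        last_w = pane_id // slide_panes
        # Ceiling division, written with floor division on the negated value.
        first_w = max(0, -(-(pane_id - panes_per_window + 1) // slide_panes))
        for w in range(first_w, last_w + 1):
            sums[(w, seg_id)] += speed_sum
            counts[(w, seg_id)] += count
    return {k: sums[k] / counts[k] for k in sums if counts[k] > 0}
```

This sketch also reports windows whose later panes have not arrived yet; a stream system would normally wait for punctuation before emitting a window's result.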

18 Database Access – Adaptation
- Decides how much data each database query should fetch, and when
- Dynamically adapts the granularity and the amount of prefetching for database queries, based on:
  - Response time of database queries
  - High-watermark time of archive data
  - High-watermark time of stream data
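
One way to picture this adaptation is a controller that resizes the time span requested by the next database query from the three signals listed. The policy below (double when the archive data is not far enough ahead of the stream, halve when it is far ahead) is purely an illustrative assumption, as are the function name and all default values; the slides do not describe latte's actual policy.

```python
def next_query_span(
    current_span_s: float,
    response_time_s: float,
    archive_hwm_s: float,
    stream_hwm_s: float,
    target_lead_s: float = 300.0,   # hypothetical desired lead of archive over stream
    min_span_s: float = 60.0,       # hypothetical lower bound on a query's time span
    max_span_s: float = 3600.0,     # hypothetical upper bound on a query's time span
) -> float:
    """Choose the time span (seconds of archive data) for the next query."""
    # How far the fetched archive data runs ahead of the stream.
    lead_s = archive_hwm_s - stream_hwm_s
    # Slow queries need a larger lead so results arrive before they are needed.
    wanted_lead_s = max(target_lead_s, 2.0 * response_time_s)
    if lead_s < wanted_lead_s:
        span = current_span_s * 2.0   # falling behind: prefetch more per query
    elif lead_s > 2.0 * wanted_lead_s:
        span = current_span_s / 2.0   # far ahead: fetch less to limit buffering
    else:
        span = current_span_s
    return min(max_span_s, max(min_span_s, span))
```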

19 Future Work: more on adaptive archive access; multi-query sharing

20 Demo

21 Similarity-Condition Selection and Speed Map-Display

22 Adaptivity and Database Access
- Each box represents a query issued to the PORTAL database; each query requests data for a certain period of time on a certain day.
- The horizontal axis is time; the horizontal span of a box represents the coverage (time extent) of the database query.
- Figure label: current stream time
- Box colors:
  - Green: query completed, data buffered in NiagaraST
  - Yellow: query in progress
  - Empty: data purged
  - Pink: query in progress, data late
  - Red: query not started yet, data late
- Lag = (database high-water mark) − (stream high-water mark)
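
The color legend and the lag formula on this slide can be written down as two small functions. This is a sketch of the legend only, not the demo's code: the `QueryState` enum and function names are hypothetical, and "late" is assumed here to mean that the stream time has already reached the start of the period the query covers.

```python
from enum import Enum
from typing import Optional


class QueryState(Enum):
    NOT_STARTED = 1
    IN_PROGRESS = 2
    COMPLETED = 3
    PURGED = 4


def lag(db_hwm_s: float, stream_hwm_s: float) -> float:
    """Lag = (database high-water mark) - (stream high-water mark)."""
    return db_hwm_s - stream_hwm_s


def box_color(state: QueryState, span_start_s: float, stream_time_s: float) -> Optional[str]:
    """Map a query's state to the legend color on the slide."""
    # Assumption: data is "late" once the stream has reached the query's period.
    late = stream_time_s >= span_start_s
    if state is QueryState.COMPLETED:
        return "green"
    if state is QueryState.PURGED:
        return "empty"
    if state is QueryState.IN_PROGRESS:
        return "pink" if late else "yellow"
    # Not started: the legend names a color only for the late case.
    return "red" if late else None
```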

23 Questions?