Elke A. Rundensteiner
Database Systems Research Group
Office: Fuller 238
Phone: Ext. 5815
Web pages:
Topics: projects in database and information systems, such as web information systems, distributed databases, etc.
Project Topics in a Nutshell:
Distributed Data Sources [NSF’96, NSF02, NSF05?]: EVE: Data Warehousing over Distributed Data; TOTAL-ETL: Distributed Extract-Transform-Load
XML/Web Data Systems [Verizon, IBM, NSF05, NSF05?]: RAINBOW: XML to Relational Databases; MASS: Native XQuery Processing System
Databases & Visualization [NSF’97, NSF01, NSF05]: Scalable Visual High-Dimensional Data Exploration; Data and Visual Quality Support in XMDV
Stream Monitoring System [NSF05a?, NSF05b?]: Scalable Query Engine for Data Streams; Fire Prediction and Monitoring Application
CAPE: Engine for Querying and Monitoring Streaming Data
Examples of stream data applications:
Market analysis: streams of stock exchange data (get rich)
Critical care: streams of vital sign measurements (save lives)
Physical plant monitoring: streams of environmental readings (protect the environment)
Databases Upside Down
Traditional databases: one-time queries over static data
Stream systems: streams of data flow through standing queries
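To make the contrast concrete, here is a minimal Python sketch, assuming a hypothetical `(sensor_id, temperature)` reading format and a simple threshold filter. The one-time query scans a stored list once and returns a complete answer; the standing query is a generator that stays active and yields each qualifying reading as it arrives. The names `one_time_query`, `standing_query`, and `sensor_feed` are illustrative only, not part of CAPE.

```python
from typing import Iterable, Iterator

# Hypothetical reading format: (sensor_id, temperature).
Reading = tuple[str, float]

def one_time_query(table: list[Reading], threshold: float) -> list[Reading]:
    """Traditional model: the query runs once over data that is already stored,
    returns a complete answer, and is then gone."""
    return [r for r in table if r[1] > threshold]

def standing_query(stream: Iterable[Reading], threshold: float) -> Iterator[Reading]:
    """Stream model: the query stays active while data flows through it and
    emits each qualifying reading as soon as it arrives (a streaming result)."""
    for reading in stream:
        if reading[1] > threshold:
            yield reading

def sensor_feed() -> Iterator[Reading]:
    """Stand-in for an unbounded sensor stream (finite here for the demo)."""
    yield ("s1", 22.0)
    yield ("s3", 80.0)
    yield ("s1", 90.0)

print(one_time_query([("s1", 21.0), ("s2", 75.5)], 60.0))  # [('s2', 75.5)]
for alert in standing_query(sensor_feed(), 60.0):
    print("alert:", alert)
```

A generator is a convenient stand-in because it consumes input lazily and never needs the whole stream in memory; a real engine must additionally cope with unbounded input, changing arrival rates, and many concurrent queries.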
Stream Query Processing
Users register continuous queries with a distributed stream query engine, which consumes streaming data and returns a streaming result to them as answers.
Challenges:
Real-time and accurate responses are required.
Streams may have time-varying rates and high volumes.
The engine faces a high workload of registered queries.
Memory and CPU resources are limited, and the resources available for executing each operator may vary over time.
Hence run-time distribution and adaptation are required.
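The register-then-push flow on this slide can be sketched as a toy, single-process Python engine. This is an assumption-laden illustration, not CAPE's architecture: queries are plain predicates, each one has a hypothetical `sink` callback that receives its streaming answers, and nothing is distributed or adaptive.

```python
from typing import Any, Callable

Record = dict[str, Any]
Predicate = Callable[[Record], bool]
Sink = Callable[[Record], None]

class StreamQueryEngine:
    """Toy stand-in for a stream query engine: clients register continuous
    queries, then every arriving record is pushed through all of them and
    matches are delivered to each query's sink as a streaming result."""

    def __init__(self) -> None:
        self._queries: dict[str, tuple[Predicate, Sink]] = {}

    def register(self, name: str, predicate: Predicate, sink: Sink) -> None:
        self._queries[name] = (predicate, sink)

    def unregister(self, name: str) -> None:
        self._queries.pop(name, None)

    def push(self, record: Record) -> int:
        """Evaluate one arriving record against every registered query;
        return how many answers were delivered."""
        delivered = 0
        # Iterate over a snapshot so a sink may (un)register queries safely.
        for predicate, sink in list(self._queries.values()):
            if predicate(record):
                sink(record)
                delivered += 1
        return delivered

# Hypothetical usage: two standing queries over room sensor records.
engine = StreamQueryEngine()
hot: list[Record] = []
smoky: list[Record] = []
engine.register("hot_rooms", lambda r: r["temp"] > 60, hot.append)
engine.register("smoke", lambda r: r["smoke"] > 0.5, smoky.append)
for r in [{"room": "A", "temp": 22, "smoke": 0.1},
          {"room": "B", "temp": 85, "smoke": 0.7}]:
    engine.push(r)
print([r["room"] for r in hot], [r["room"] for r in smoky])  # ['B'] ['B']
```

Delivering answers through callbacks rather than accumulating them inside the engine keeps the engine from buffering results, which matters under the memory limits this slide lists.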
Good news … for a research student
We can lean on the oldies and goodies, yet so many new and unsolved problems are at our fingertips when old problems are seen in a new light!
Interesting (yet doable) research challenges
Even possibilities for a start-up (if you are so inclined)
Research Contributions
Scalable Query Operators (Punctuations): operators adapt and select among tasks such as memory purging, stream reading, memory-to-disk shuffling, punctuation propagation, index selection, etc.
Synchronized Plan Spilling: operators selectively spill data to disk to offset system overload, with adaptive reloading to improve performance.
Adaptive Operator Scheduling: a selector scores alternative scheduling algorithms by their effect on QoS requirements and selects the best candidate.
On-line Query Plan Migration: on-line plan restructuring followed by on-line migration to the new plan, even for stateful operators.
Distributed Plan Execution: computations are adaptively distributed across multiple machines to optimize QoS requirements without information loss.
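As an illustration of the first contribution, the Python sketch below shows the general idea behind punctuation-driven memory purging in a symmetric hash join. It is a simplified stand-in, not CAPE's operator: a punctuation here only asserts "no more tuples with this exact key will arrive on this side" (general punctuations can describe ranges or patterns), and the class and method names are hypothetical.

```python
from collections import defaultdict
from typing import Any, Hashable

class PunctuatedHashJoin:
    """Symmetric hash join on an equality key. A punctuation from one side
    promises that no more tuples with that key will arrive on that side, so the
    opposite side's stored tuples for the key can never find a new partner and
    are purged."""

    def __init__(self) -> None:
        # Per-side hash tables: key -> list of stored payloads.
        self.state: dict[str, defaultdict] = {"L": defaultdict(list), "R": defaultdict(list)}
        # Keys each side has punctuated (grows without bound in this sketch).
        self.closed: dict[str, set] = {"L": set(), "R": set()}

    @staticmethod
    def _other(side: str) -> str:
        return "R" if side == "L" else "L"

    def on_tuple(self, side: str, key: Hashable, payload: Any) -> list[tuple[Any, Any]]:
        other = self._other(side)
        matches = self.state[other].get(key, [])
        # Emit results as (left_payload, right_payload) pairs.
        out = [(payload, m) if side == "L" else (m, payload) for m in matches]
        # Store the tuple only if the other side may still send partners for it.
        if key not in self.closed[other]:
            self.state[side][key].append(payload)
        return out

    def on_punctuation(self, side: str, key: Hashable) -> int:
        """Record the punctuation and purge the opposite side; return #purged."""
        self.closed[side].add(key)
        return len(self.state[self._other(side)].pop(key, []))

    def state_size(self) -> int:
        return sum(len(v) for table in self.state.values() for v in table.values())

join = PunctuatedHashJoin()
join.on_tuple("L", 1, "a1")   # no partner yet; a1 is stored
join.on_tuple("R", 1, "b1")   # [('a1', 'b1')]
join.on_punctuation("L", 1)   # purges b1 from the right-side state
join.on_tuple("R", 1, "b2")   # [('a1', 'b2')]; b2 is not stored
print(join.state_size())      # 1 (only a1 remains)
join.on_punctuation("R", 1)   # purges a1
print(join.state_size())      # 0
```

The key observation is asymmetric: a punctuation from one input frees state on the opposite input, and it also lets later tuples with that key skip insertion. The set of punctuated keys still grows here; a fuller design would need to bound that bookkeeping too.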
We got it all... and more
If you like theory: algorithms for NP-complete optimization, graph theory
If you like systems: distributed allocation, scheduling, and parallelism of query execution
If you like networking: quality-of-query, load shedding, grid computing
If you like AI: learning for scheduling selection, run-time adaptation
If you like software engineering: a huge query engine code base; we really need you
So where is the database in this stuff?
One answer: Who cares? If it’s fun, it’s database stuff.
Second answer: Development of a new generation of “data query engines”.
A driving application: FIRE
Sensors in Rooms
Engineering Data for Fire Science
Futuristic Monitoring Queries
Track a smoke cloud (a moving cluster) in terms of its speed and severity.
Find the scope and direction of the fire’s spread.
Match given sensor readings of a fire with a fire stream simulation to determine their similarity.
Is this a prank (an outlier), or are we dealing with an actual fire?
Which path should people take to leave this building?
Are any sensor readings faulty, and should they be ignored?
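For the prank and faulty-sensor questions, one simple heuristic (an assumption here, not the FireEngine method) is that a single sensor disagreeing sharply with the other sensors in its room is more likely a prank or a faulty device, whereas a real fire should raise many readings together. The Python sketch below swaps in the modified z-score of Iglewicz and Hoaglin, based on the median absolute deviation (MAD), for that test; `flag_suspect_sensors` and the sample readings are hypothetical.

```python
from statistics import median

def flag_suspect_sensors(readings: dict[str, float], k: float = 3.5) -> set[str]:
    """Return sensors whose latest reading is far from the rest of the group,
    using the modified z-score 0.6745 * |x - median| / MAD > k
    (Iglewicz and Hoaglin's rule of thumb is k = 3.5)."""
    if len(readings) < 3:
        return set()  # too few sensors to judge agreement
    values = list(readings.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # Most sensors agree exactly; any differing reading stands out.
        return {s for s, v in readings.items() if v != med}
    return {s for s, v in readings.items() if 0.6745 * abs(v - med) / mad > k}

# Hypothetical temperature snapshots (deg C) from the sensors in one room.
one_hot_sensor = {"s1": 22.0, "s2": 23.0, "s3": 21.5, "s4": 22.5, "s5": 95.0}
whole_room_hot = {"s1": 90.0, "s2": 92.0, "s3": 88.0, "s4": 95.0}
print(flag_suspect_sensors(one_hot_sensor))  # {'s5'}
print(flag_suspect_sensors(whole_room_hot))  # set()
```

Using the median and MAD instead of the mean and standard deviation keeps one extreme reading from masking itself. A streaming version would apply the same test to each window of recent readings.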
FireEngine : Fire Stream Processing
If you have questions, contact me:
Better yet, drop by the DSRG labs: Fuller 319 & 318
My office: Fuller 238