Elke A. Rundensteiner Database Systems Research Group Office: Fuller 238 Phone: Ext. – 5815 WebPages:

Presentation transcript:


Elke A. Rundensteiner
Project topics in database and information systems, such as web information systems, distributed databases, etc.
Database Systems Research Lab
Office: Fuller 238
Phone: x – 5815
Webpages:

Project Topics in a Nutshell:
• Distributed Data Sources:
  • EVE: Data Warehousing over Distributed Data
  • TOTAL-ETL: Distributed Extract Transform Load
  [NSF’96, NSF02, NSF05?]
• XML/Web Data Systems:
  • RAINBOW: XML to Relational Databases
  • MASS: Native XQuery Processing System
  [Verizon, IBM, NSF05, NSF05?]
• Databases & Visualization:
  • Scalable Visual High-Dim. Data Exploration
  • Data and Visual Quality Support in XMDV
  [NSF’97, NSF01, NSF05]
• Stream Monitoring System:
  • Scalable Query Engine for Data Streams
  • Fire Prediction and Monitoring Appl.
  [NSF05a?, NSF05b?]

CAPE: Engine for Querying and Monitoring Streaming Data
Examples of stream data applications:
• Market Analysis: streams of stock exchange data (get rich)
• Critical Care: streams of vital sign measurements (save lives)
• Physical Plant Monitoring: streams of environmental readings (protect the environment)

Databases Upside Down
• Traditional database: one-time queries run against static data.
• Stream system: standing queries stay in place while streams of data flow past them.
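
To make the "upside down" contrast concrete, here is a minimal Python sketch. It is not CAPE code; the function names `one_time_query` and `standing_query` and the dictionary-based tuples are assumptions made purely for illustration. In the first function the data is stored and the query visits it once; in the second the query is registered first and each arriving tuple is pushed through it.

```python
from typing import Callable, Iterable, Iterator

# Hypothetical tuple type for this sketch: a plain dict of field -> value.
Tuple = dict
Predicate = Callable[[Tuple], bool]


def one_time_query(table: list, predicate: Predicate) -> list:
    """Classic database: the data is static, the query runs once and returns."""
    return [row for row in table if predicate(row)]


def standing_query(stream: Iterable[Tuple], predicate: Predicate) -> Iterator[Tuple]:
    """Stream engine: the query is registered up front and stays in place.

    Results are produced lazily, one at a time, as tuples arrive.
    """
    for tup in stream:
        if predicate(tup):
            yield tup


def is_hot(reading: Tuple) -> bool:
    """Example predicate; the 50-degree threshold is an arbitrary assumption."""
    return reading["temp"] > 50
```

With a stored list, `one_time_query(readings, is_hot)` returns all matches at once. With `standing_query(source, is_hot)`, the caller iterates over results as long as the source keeps producing tuples, which is the behavior a continuous query needs.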

Stream Query Processing
Users register continuous queries with the distributed stream query engine and receive answers. Streaming data flows in; streaming results flow out.
• Real-time and accurate responses required
• Streaming data may have time-varying rates and high volumes
• High workload of queries
• Memory and CPU resource limitations
• Available resources for executing each operator may vary over time
• Run-time distribution and adaptations required

Good news … for a research student
• We can lean on the oldies but goodies,
• Yet so many new and unsolved problems at our fingertips due to the new light!
• Interesting (yet doable) research challenges
• Even possibilities for a start-up (if you are so inclined)

Research Contributions
• Scalable Query Operators (Punctuations)
  Adapt and select among tasks such as memory purging, stream reading, memory-to-disk shuffling, punctuation propagation, index selection, etc.
• Synchronized Plan Spilling
  Operators selectively spill data to disk to offset system overload, with adaptive re-load to improve performance.
• Adaptive Operator Scheduling
  A selector scores alternative scheduling algorithms based on their effect on QoS requirements and selects a candidate.
• On-line Query Plan Migration
  On-line plan restructuring and then on-line migration to the new plan, even for stateful operators.
• Distributed Plan Execution
  Adaptively distribute computations across multiple machines to optimize QoS requirements without information loss.
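
The Adaptive Operator Scheduling item can be sketched roughly as follows. This is an illustration under stated assumptions, not the selector used in CAPE: the metrics in `Observation`, the min-max normalization, the weighted-sum score, and the scheduler labels in the usage note are all hypothetical. The idea shown is only the one named on the slide: score each candidate scheduling algorithm by its observed effect on the QoS requirements, then pick the best.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """Statistics gathered while one scheduling algorithm was active.

    Field names and units are assumptions for this sketch.
    """
    output_rate: float  # result tuples per second (higher is better)
    memory_mb: float    # peak queue memory (lower is better)
    delay_ms: float     # average tuple delay (lower is better)


def normalize(values, higher_is_better):
    """Min-max scale values to [0, 1], where 1 is always the best candidate."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0] * len(values)
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if higher_is_better else [1.0 - s for s in scaled]


def select_scheduler(observed, weights):
    """Return the name of the candidate with the highest weighted QoS score.

    observed: dict mapping scheduler name -> Observation
    weights:  dict with keys "output_rate", "memory_mb", "delay_ms"
              expressing how much each QoS requirement matters
    """
    names = list(observed)
    rate = normalize([observed[n].output_rate for n in names], True)
    mem = normalize([observed[n].memory_mb for n in names], False)
    delay = normalize([observed[n].delay_ms for n in names], False)
    scores = {
        name: (weights["output_rate"] * rate[i]
               + weights["memory_mb"] * mem[i]
               + weights["delay_ms"] * delay[i])
        for i, name in enumerate(names)
    }
    return max(scores, key=scores.get)
```

For example, given observations for three candidates labeled `"fifo"`, `"round_robin"`, and `"chain"`, a memory-heavy weighting favors whichever candidate kept queues smallest, while a throughput-heavy weighting can pick a different one from the same observations. A real selector would re-run this periodically as the statistics change.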

We got it all... and more
• If you like theory: algorithms for NP-complete optimization, graph theory
• If you like systems: distributed allocation, scheduling, and parallelism of query execution
• If you like networking: quality-of-query, load-shedding, grid-computing
• If you like AI: learning of scheduling selection, run-time adaptation
• If you like software engineering: huge query engine code base, we really need you

So where is the database in this stuff?

• One answer: Who cares? If it’s fun, it’s database stuff.
• Second answer: Development of a new generation of “data query engine”.

A driving application: FIRE

Sensors in Rooms

Engineering Data for Fire Science

Futuristic Monitoring Queries?
• Track a smoke cloud (moving cluster) in terms of its speed and severity?
• Find the scope and direction of fire spread?
• Match given sensor readings of a fire with a fire stream simulation to determine similarity?
• Is this a prank (outlier), or are we dealing with an actual fire?
• What path should people take to leave this building?
• Are any sensor readings faulty, and should they be ignored?
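
As a rough illustration of the "prank (outlier) or actual fire" question, the sketch below keeps a short sliding window of temperature readings per sensor in one room and compares how many sensors currently look hot. Everything here is an assumption for illustration: the `RoomMonitor` class, the window size, the 60-degree threshold, and the majority rule are not taken from the FireEngine system.

```python
from collections import defaultdict, deque
from statistics import median


class RoomMonitor:
    """Standing query over the temperature sensors of a single room."""

    def __init__(self, window=5, threshold=60.0):
        self.threshold = threshold  # assumed "hot" cutoff in degrees Celsius
        # sensor_id -> the most recent `window` readings for that sensor
        self.recent = defaultdict(lambda: deque(maxlen=window))

    def observe(self, sensor_id, temp_c):
        """Feed one reading and return the room's current classification."""
        self.recent[sensor_id].append(temp_c)
        return self.classify()

    def classify(self):
        # A sensor counts as hot when the median of its window is hot,
        # so a single spiky reading does not flip the result.
        hot = [s for s, temps in self.recent.items()
               if median(temps) >= self.threshold]
        if not hot:
            return "normal"
        # A strict majority of hot sensors is treated as a real fire;
        # an isolated hot sensor is flagged as a possible prank or fault.
        if len(hot) * 2 > len(self.recent):
            return "fire"
        return "suspect-outlier"
```

Each call to `observe` is one step of the continuous query: a single sensor reporting heat while its neighbors stay cool yields `"suspect-outlier"`, and the answer changes to `"fire"` only once most sensors in the room agree.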

FireEngine : Fire Stream Processing

If Questions, me:
Better, drop by
DSRG Labs: Fuller 319 & 318
My office: Fuller 238