Advanced Database Systems: DBS CB, 2nd Edition

Presentation transcript:

Advanced Database Systems: DBS CB, 2nd Edition. Advanced Topics of Interest: DB in the Cloud, and SQL & Stream Processing

Outline: DB in the Cloud; SQL and Stream Processing

DB in the Cloud

Cloud Introduction Provide “why”, “what”, and “how” around SQL in the cloud Cloud perception: costs? New capabilities? Massively scalable computing? IaaS, PaaS, SaaS Private vs. public clouds Interoperability & standards

Cloud Introduction Definition: Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction. The cloud model promotes availability and is composed of five essential characteristics, three service models, and four deployment models.

Cloud Characteristics 5 Essential Cloud Characteristics: On-demand self-service (service & deployment model); Broad network access (deployment model); Resource pooling, location independent (service & deployment model); Rapid elasticity (deployment model); Measured service (service model)

Cloud Business View Reducing cost New capabilities Risk Opportunities Remaking value, decision, influence chain

Cloud: Platform Dimensions Productivity Scale Trust Viability Velocity Relationship Platform vendors succeed when the platform helps others succeed

Cloud Computing Architecture Infrastructure-as-a-Service Security-as-a-Service Storage-as-a-Service Integration-as-a-Service Database-as-a-Service Information-as-a-Service Process-as-a-Service Platform-as-a-Service Application-as-a-Service Management/Governance-as-a-Service Testing-as-a-Service

Cloud: Success Factors Utility Computing Capability: technical capability, datacenter innovation capability. Application Pattern Capability: not just about the browser; platform, delivery, and tooling. Platform Ecosystem: work with ISVs, SIs, VARs & business to get to Cloud

Cloud Challenges: Identity and Access Management; Composition / Workflow; Trust; Availability; Performance; Information protection; Latency (it matters); Separating logical / physical administration

QoS-Aware Cloud Design and development of QoS: ability to meet QoS application requirements as specified in the hosting SLA. Current application server (AS) technology is not fully instrumented to meet those requirements. Two principal middleware services: Configuration Service (CS) and Monitoring Service (MS), complemented by an adaptive Load Balancing Service. They operate both on a single AS and on a cluster of ASs.

Cloud: Evolution of Virtualization

Cloud: Evolution of Virtualization Power Saving with Distributed Power Management (DPM)

SQL and Stream Processing

Agenda
Data Streams: What are they? Why now? Applications
DSMS: Architecture & Issues
Query Processing

Data Streams – What and Where? Continuous, unbounded, rapid, time-varying streams of data elements (tuples) Occur in a variety of modern applications Network monitoring and traffic engineering Sensor networks, RFID tags Telecom call records Financial applications Web logs and click-streams Manufacturing processes DSMS = Data Stream Management System

DBMS versus DSMS
DBMS: Persistent relations; One-time queries; Random access; Access plan determined by query processor and physical DB design
DSMS: Transient streams (and persistent relations); Continuous queries; Sequential access; Unpredictable data characteristics and arrival patterns

Continuous Queries One-time queries: run once to completion over the current data set. Continuous queries: issued once and then continuously evaluated over the data. Examples: Notify me when the temperature drops below X. Tell me when the price of stock Y > 300.
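
To make the contrast with one-time queries concrete, here is a minimal Python sketch of a continuous query: the predicate is registered once and then checked against every tuple as it arrives. The tuple shape, the `continuous_query` helper, and the threshold value are illustrative assumptions, not part of any DSMS API.

```python
from typing import Callable, Iterable, Iterator


def continuous_query(stream: Iterable[dict],
                     predicate: Callable[[dict], bool]) -> Iterator[dict]:
    """Hypothetical helper: evaluate a registered predicate on each arriving tuple."""
    for tup in stream:
        if predicate(tup):
            # A result is emitted as soon as a matching tuple arrives.
            yield tup


# Assumed sample data standing in for an unbounded sensor stream.
readings = [
    {"sensor": "s1", "temp": 21.5},
    {"sensor": "s1", "temp": 18.0},
    {"sensor": "s2", "temp": 25.1},
    {"sensor": "s1", "temp": 17.2},
]

THRESHOLD = 19.0  # the "X" in "notify me when the temperature drops below X"
alerts = list(continuous_query(readings, lambda r: r["temp"] < THRESHOLD))
```

Because the helper is a generator, it would work unchanged on a source that never ends; the list above is only there to keep the sketch finite.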

The (Simplified) Big Picture [Figure labels: DSMS, Scratch Store, Input streams, Register Query, Streamed Result, Stored, Archive, Relations] Stanford stream data manager

(Simplified) Network Monitoring [Figure labels: Register Monitoring Queries; DSMS; Scratch Store; Network measurements, Packet traces; Intrusion Warnings; Online Performance Metrics; Archive; Lookup Tables]

Triggers? Recall triggers in traditional DBMSs? Why not use triggers to process continuous queries over data streams?

Making Things Concrete
Two streams arrive at the DSMS:
Outgoing (call_ID, caller, time, event)
Incoming (call_ID, callee, time, event)
where event = start or end
[Figure labels: Central Office, ALICE, BOB]

Query 1 (self-join)
Find all outgoing calls longer than 2 minutes
SELECT O1.call_ID, O1.caller
FROM Outgoing O1, Outgoing O2
WHERE (O2.time - O1.time > 2
  AND O1.call_ID = O2.call_ID
  AND O1.event = 'start'
  AND O2.event = 'end')
Result requires unbounded storage
Can provide result as data stream
Can output after 2 min, without seeing end
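
The remark that Query 1 can output after 2 minutes without seeing the end event can be sketched in Python as follows. This is an illustration under stated assumptions, not how any particular DSMS evaluates the query: tuples are assumed to arrive in time order, time is measured in minutes, and `long_calls` is a hypothetical name.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Assumed tuple layout, mirroring Outgoing(call_ID, caller, time, event).
Event = Tuple[int, str, float, str]


def long_calls(outgoing: Iterable[Event],
               limit: float = 2.0) -> Iterator[Tuple[int, str]]:
    """Emit (call_ID, caller) as soon as a call is known to exceed `limit` minutes."""
    open_calls: Dict[int, Tuple[str, float]] = {}  # call_ID -> (caller, start time)
    for call_id, caller, time, event in outgoing:
        # Every arriving tuple advances the clock, so check the calls still open.
        overdue = [cid for cid, (_, start) in open_calls.items()
                   if time - start > limit]
        for cid in overdue:
            # Reported calls are dropped, so state holds only unreported open calls.
            yield cid, open_calls.pop(cid)[0]
        if event == "start":
            open_calls[call_id] = (caller, time)
        elif event == "end":
            open_calls.pop(call_id, None)


events = [
    (1, "alice", 0.0, "start"),
    (2, "bob", 0.5, "start"),
    (2, "bob", 1.5, "end"),      # 1 minute long, never reported
    (3, "carol", 3.0, "start"),  # clock reaches 3.0, so call 1 is reported here
    (3, "carol", 4.0, "end"),
    (1, "alice", 5.0, "end"),
]
reported = list(long_calls(events))
```

In this sketch the state is limited to calls that are in progress and not yet reported, which is one way to avoid the unbounded storage that a naive self-join would need.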

Query 2 (join)
Pair up callers and callees
SELECT O.caller, I.callee
FROM Outgoing O, Incoming I
WHERE O.call_ID = I.call_ID
Can still provide result as data stream
Requires unbounded temporary storage, unless streams are near-synchronized
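
One common way to join two streams incrementally is a symmetric hash join, sketched below in Python. The sketch assumes a single merged arrival sequence and exactly one tuple per call_ID on each stream (for example, only the start events); `pair_calls` and the tuple layout are hypothetical.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Assumed arrival format: (stream name, call_ID, caller or callee).
Arrival = Tuple[str, int, str]


def pair_calls(arrivals: Iterable[Arrival]) -> Iterator[Tuple[str, str]]:
    """Emit (caller, callee) as soon as both sides of a call_ID have arrived."""
    waiting_out: Dict[int, str] = {}  # call_ID -> caller, not yet matched
    waiting_in: Dict[int, str] = {}   # call_ID -> callee, not yet matched
    for source, call_id, party in arrivals:
        if source == "outgoing":
            if call_id in waiting_in:
                yield party, waiting_in.pop(call_id)
            else:
                waiting_out[call_id] = party
        else:
            if call_id in waiting_out:
                yield waiting_out.pop(call_id), party
            else:
                waiting_in[call_id] = party


arrivals = [
    ("outgoing", 1, "alice"),
    ("incoming", 1, "bob"),
    ("incoming", 2, "dave"),   # the incoming side may arrive first
    ("outgoing", 2, "carol"),
]
pairs = list(pair_calls(arrivals))
```

The two dictionaries are the temporary storage the slide refers to: they stay small when matching tuples arrive close together and grow without bound when one stream lags far behind the other.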

Query 3 (group-by aggregation)
Total connection time for each caller
SELECT O1.caller, sum(O2.time - O1.time)
FROM Outgoing O1, Outgoing O2
WHERE (O1.call_ID = O2.call_ID
  AND O1.event = 'start'
  AND O2.event = 'end')
GROUP BY O1.caller
Cannot provide result in (append-only) stream
Output updates? Provide current value on demand? Memory?
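
The "output updates?" option can be illustrated with a short Python sketch that emits a new running total for a caller each time one of that caller's calls ends. It is only one of the possibilities the slide raises; the function name and the in-order arrival of tuples are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

# Assumed tuple layout, mirroring Outgoing(call_ID, caller, time, event).
Event = Tuple[int, str, float, str]


def connection_time_updates(outgoing: Iterable[Event]) -> Iterator[Tuple[str, float]]:
    """Emit (caller, total so far) whenever a call ends; later updates supersede earlier ones."""
    started: Dict[int, float] = {}                     # call_ID -> start time
    totals: Dict[str, float] = defaultdict(float)      # caller -> total connection time
    for call_id, caller, time, event in outgoing:
        if event == "start":
            started[call_id] = time
        elif event == "end" and call_id in started:
            totals[caller] += time - started.pop(call_id)
            yield caller, totals[caller]


events = [
    (1, "alice", 0.0, "start"),
    (1, "alice", 3.0, "end"),
    (2, "alice", 4.0, "start"),
    (2, "alice", 9.0, "end"),
    (3, "bob", 10.0, "start"),
    (3, "bob", 10.5, "end"),
]
updates = list(connection_time_updates(events))
```

The output is not append-only in meaning, since each tuple for a caller replaces the previous one, and the `totals` table still grows with the number of distinct callers, which is the memory question the slide asks.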

DSMS – Architecture & Issues Data streams and stored relations: architectural differences. Declarative language for registering continuous queries. Flexible query plans and execution strategies. Centralized? Distributed?

DSMS – Issues Relation: Tuple Set or Sequence? Updates: Modifications or Appends? Query Answer: Exact or Approximate? Query Evaluation: One or Multiple Passes? Query Plan: Fixed or Adaptive?

Architectural Issues
DSMS: Resource (memory, per-tuple computation) limited; Reasonably complex, near real time, query processing; Useful to identify what data to populate in database; Query Evaluation: One pass; Query Plan: Adaptive
DBMS: Resource (memory, disk, per-tuple computation) rich; Extremely sophisticated query processing, analysis; Useful to audit query results of data stream systems; Query Evaluation: Arbitrary; Query Plan: Fixed

STREAM System Challenges Must cope with: Stream Rates that may be high, variable, and bursty Stream data that may be unpredictable, variable Continuous query loads that may be high, variable Query Answer: Exact or Approximate?

Query Model [Figure labels: User/Application, Query Processor, DSMS]

Agenda
Data Streams: What are they? Why now? Applications
DSMS: Architecture & Issues
Query Processing: Language, Operators, Optimization, Multi-Query Optimization

Stream Query Language SQL extension. Queries reference/produce relations or streams. Examples: GSQL [Gigascope], CQL [STREAM] [Figure: Stream or Finite Relation -> Stream Query Language -> Stream or Finite Relation]

Example: Continuous Query Language – CQL Start with SQL Then add… Streams as new data type Continuous instead of one-time semantics Windows on streams (derived from SQL-99) Sampling on streams (basic)

Impact of Limited Memory Continuous streams grow unboundedly Queries may require unbounded memory One solution: Approximate query evaluation

Approximate Query Evaluation Why? Handling load – streams coming too fast Avoid unbounded storage and computation Ad hoc queries need approximate history How? Sliding windows, synopsis, samples, load-shed Major Issues? Metric for set-valued queries Composition of approximate operators How is it understood/controlled by user? Integrate into query language Query planning and interaction with resource allocation Accuracy-efficiency-storage tradeoff and global metric
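
As one example of the "samples" technique listed above, the following Python sketch keeps a fixed-size uniform sample of a stream using reservoir sampling (often called Algorithm R). The slides do not say which sampling method is meant, so this choice, the function name, and the seeded generator are assumptions.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, rng: random.Random) -> List[T]:
    """Keep a uniform random sample of up to k items using O(k) memory."""
    sample: List[T] = []
    for n, item in enumerate(stream, start=1):
        if n <= k:
            # The first k items fill the reservoir.
            sample.append(item)
        else:
            # Item n replaces a random slot with probability k / n.
            j = rng.randrange(n)  # uniform integer in [0, n)
            if j < k:
                sample[j] = item
    return sample


# A seeded generator keeps the sketch repeatable; range() stands in for a stream.
sample = reservoir_sample(range(1000), 10, random.Random(0))
```

Memory use depends only on k, not on the length of the stream, which is the property that makes sampling attractive when storage is limited.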

Windows
Mechanism for extracting a finite relation from an infinite stream
Various window proposals for restricting operator scope: Windows based on ordering attribute (e.g. time); Windows based on tuple counts; Windows based on explicit markers (e.g. punctuations); Variants (e.g., partitioning tuples in a window)
[Figure: Stream -> Window specifications -> Finite relations manipulated using SQL -> streamify -> Stream]
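
The first two window types can be sketched in Python as generators that turn a stream into a sequence of finite relations. The function names, the (timestamp, value) tuple layout, and the half-open interval (t - width, t] are illustrative assumptions.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List, Tuple

Tup = Tuple[float, str]  # assumed layout: (timestamp, value)


def count_window(stream: Iterable[Tup], n: int) -> Iterator[List[Tup]]:
    """Window based on tuple counts: the last n tuples seen."""
    window: Deque[Tup] = deque(maxlen=n)  # the oldest tuple is dropped automatically
    for tup in stream:
        window.append(tup)
        yield list(window)


def time_window(stream: Iterable[Tup], width: float) -> Iterator[List[Tup]]:
    """Window based on the ordering attribute: tuples with timestamp in (t - width, t]."""
    window: Deque[Tup] = deque()
    for ts, value in stream:  # assumes tuples arrive in timestamp order
        window.append((ts, value))
        while window[0][0] <= ts - width:
            window.popleft()
        yield list(window)


stream = [(1, "a"), (2, "b"), (4, "c"), (7, "d")]
by_count = list(count_window(stream, 2))
by_time = list(time_window(stream, 3))
```

Each yielded list is a finite relation that ordinary relational operators could then process, which is the role the window plays between the stream and SQL.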

Windows Terminology [Figure: timeline with Start time, Current time, and points t1 to t5, showing a Sliding Window and a Tumbling Window over time]
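
The difference between the two window kinds is easiest to see on an aggregate. The Python sketch below sums values over tumbling (non-overlapping) and sliding (overlapping) time windows; the names, the integer timestamps, and the decision to skip empty tumbling windows are assumptions made for the example.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple

Tup = Tuple[int, int]  # assumed layout: (timestamp, value), in timestamp order


def tumbling_sums(stream: Iterable[Tup], width: int) -> Iterator[Tuple[int, int]]:
    """Sum per window [k*width, (k+1)*width); emit (window start, sum) when it closes."""
    current = None
    total = 0
    for ts, value in stream:
        bucket = ts // width
        if current is not None and bucket != current:
            yield current * width, total  # windows with no tuples are skipped
            total = 0
        current = bucket
        total += value
    if current is not None:
        yield current * width, total  # flush the last window at end of input


def sliding_sums(stream: Iterable[Tup], width: int) -> Iterator[Tuple[int, int]]:
    """Sum over (t - width, t], re-emitted on every arriving tuple."""
    window: Deque[Tup] = deque()
    total = 0
    for ts, value in stream:
        window.append((ts, value))
        total += value
        while window[0][0] <= ts - width:
            total -= window.popleft()[1]
        yield ts, total


stream = [(0, 1), (1, 2), (2, 3), (3, 4), (5, 5)]
tumbling = list(tumbling_sums(stream, 2))
sliding = list(sliding_sums(stream, 2))
```

A tumbling window assigns each tuple to exactly one window, while a sliding window lets a tuple contribute to several consecutive results.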

Query Operators Selections - Where clause Projections - Select clause Joins - From clause Group-by (Aggregations) – Group-by clause

Query Operators Selections and projections on streams - straightforward Local per-element operators Projection may need to include ordering attribute Joins – Problematic May need to join tuples that are arbitrarily far apart Equijoin on stream ordering attributes may be tractable Majority of the work focuses on joins using windows

Blocking Operators
Blocking: No output until entire input seen
Streams: input never ends
Simple Aggregates: output “update” stream
Set Output (sort, group-by): Root could maintain output data structure; Intermediate nodes try non-blocking analogs
Join: Apply sliding-window restrictions

Optimization in DSMS Traditionally, table-based cardinalities are used in the query optimizer. Goal of query optimizer: Minimize the size of intermediate results. Problematic in a streaming environment: all streams are unbounded = infinite size! Need novel optimization objectives that are relevant when the input sources are streams

Query Optimization in DSMS Novel notions of optimization: Stream rate based [e.g. NiagaraCQ] Resource-based [e.g. STREAM] QoS based [e.g. Aurora] Continuous adaptive optimization Possibilities that objectives cannot be met: Resource constraints Bursty arrivals under limited processing capabilities.

Stream Projects Amazon/Cougar (Cornell) – sensors Aurora (Brown/MIT) – sensor monitoring, dataflow Hancock (AT&T) – telecom streams Niagara (OGI/Wisconsin) – Internet XML databases OpenCQ (Georgia) – triggers, incr. view maintenance Stream (Stanford) – general-purpose DSMS Tapestry (Xerox) – pub/sub content-based filtering Telegraph (Berkeley) – adaptive engine for sensors Tribeca (Bellcore) – network monitoring

END