Advanced Database Systems: DBS CB, 2nd Edition
Advanced Topics of Interest: DB in the Cloud, and SQL & Stream Processing

Outline
- DB in the Cloud
- SQL and Stream Processing

DB in the Cloud

Cloud Introduction
- Provide the “why”, “what”, and “how” of SQL in the cloud
- Cloud perception: costs? New capabilities? Massively scalable computing?
- IaaS, PaaS, SaaS
- Private vs. public clouds
- Interoperability and standards

Cloud Introduction
Definition: Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.
The cloud model promotes availability and is composed of five essential characteristics, three service models, and four deployment models.

Cloud Characteristics
Five essential cloud characteristics:
- On-demand self-service (service & deployment model)
- Broad network access (deployment model)
- Resource pooling (service & deployment model)
  - Location independent
- Rapid elasticity (deployment model)
- Measured service (service model)

Cloud Business View
- Reducing cost
- New capabilities
- Risk
- Opportunities
- Remaking the value, decision, and influence chain

Cloud: Platform Dimensions
Platform dimensions:
- Productivity
- Scale
- Trust
- Viability
- Velocity
- Relationship
Platform vendors succeed when the platform helps others succeed.

Cloud Computing Architecture
- Infrastructure-as-a-Service
- Security-as-a-Service
- Storage-as-a-Service
- Integration-as-a-Service
- Database-as-a-Service
- Information-as-a-Service
- Process-as-a-Service
- Platform-as-a-Service
- Application-as-a-Service
- Management/Governance-as-a-Service
- Testing-as-a-Service

Cloud: Success Factors
Cloud success factors:
- Utility computing capability
  - Technical capability
  - Datacenter innovation capability
- Application pattern capability
  - Not just about the browser
  - Platform, delivery, and tooling
- Platform ecosystem
  - Work with ISVs, SIs, VARs, and businesses to get to the cloud

Cloud Challenges
Challenges:
- Identity and access management
- Composition / workflow
- Trust
  - Availability
  - Performance
  - Information protection
- Latency
  - It matters
- Separating logical / physical administration

QoS-Aware Cloud
Design and development of QoS:
- Ability to meet application QoS requirements as specified in the hosting SLA
- Current application server (AS) technology is not fully instrumented to meet those requirements
Two principal middleware services:
- Configuration Service (CS)
- Monitoring Service (MS)
- Complemented by an adaptive Load Balancing Service
These services operate both on a single AS and on a cluster of ASs.

Cloud: Evolution of Virtualization

Cloud: Evolution of Virtualization
Power saving with Distributed Power Management (DPM)

SQL and Stream Processing

Agenda Data Streams  What are they?  Why now? Applications… DSMS: Architecture & Issues Query Processing 16

Data Streams: What and Where?
- Continuous, unbounded, rapid, time-varying streams of data elements (tuples)
- Occur in a variety of modern applications:
  - Network monitoring and traffic engineering
  - Sensor networks, RFID tags
  - Telecom call records
  - Financial applications
  - Web logs and click-streams
  - Manufacturing processes
- DSMS = Data Stream Management System

DBMS versus DSMS
DBMS:
- Persistent relations
- One-time queries
- Random access
- Access plan determined by query processor and physical DB design
DSMS:
- Transient streams (and persistent relations)
- Continuous queries
- Sequential access
- Unpredictable data characteristics and arrival patterns

Continuous Queries
- One-time queries: run once to completion over the current data set
- Continuous queries: issued once and then continuously evaluated over the data
  - Example: Notify me when the temperature drops below X
  - Tell me when prices of stock Y >
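
A minimal Python sketch can make the one-time/continuous contrast concrete using the temperature example. It assumes readings arrive as hypothetical (timestamp, temperature) pairs. It interprets "drops below X" as a transition from at-or-above X to below X. The function names below_now and below_threshold are illustrative only.

```python
from typing import Iterable, Iterator, List, Tuple

Reading = Tuple[int, float]  # hypothetical layout: (timestamp, temperature)


def below_now(snapshot: List[Reading], x: float) -> List[Reading]:
    """One-time query: runs once to completion over the current data set."""
    return [r for r in snapshot if r[1] < x]


def below_threshold(readings: Iterable[Reading], x: float) -> Iterator[Reading]:
    """Continuous query: registered once, then evaluated as each reading arrives."""
    was_below = False
    for ts, temp in readings:
        is_below = temp < x
        # Assumed semantics: notify only when the temperature crosses from >= x to < x.
        if is_below and not was_below:
            yield ts, temp
        was_below = is_below


readings = [(1, 21.0), (2, 18.5), (3, 17.0), (4, 22.0), (5, 15.0)]
print(below_now(readings, 19.0))              # [(2, 18.5), (3, 17.0), (5, 15.0)]
print(list(below_threshold(readings, 19.0)))  # [(2, 18.5), (5, 15.0)]
```

On a real stream, below_threshold would consume an unbounded iterator and never return. Calling list() on it only works for this finite demo.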

The (Simplified) Big Picture (Stanford Stream Data Manager)
[Figure: input streams and registered queries enter the DSMS, which uses a scratch store, an archive, and stored relations, and returns streamed and stored results.]

(Simplified) Network Monitoring
[Figure: monitoring queries are registered with the DSMS, which consumes network measurements and packet traces, uses a scratch store, an archive, and lookup tables, and produces intrusion warnings and online performance metrics.]

Triggers?
- Recall triggers in traditional DBMSs.
- Why not use triggers to process continuous queries over data streams?

Making Things Concrete
[Figure: central offices serving ALICE and BOB feed two streams into the DSMS.]
- Outgoing (call_ID, caller, time, event)
- Incoming (call_ID, callee, time, event)
- event = start or end

Query 1 (self-join)
Find all outgoing calls longer than 2 minutes:

    SELECT O1.call_ID, O1.caller
    FROM Outgoing O1, Outgoing O2
    WHERE O2.time - O1.time > 2
      AND O1.call_ID = O2.call_ID
      AND O1.event = 'start'
      AND O2.event = 'end'

- Result requires unbounded storage
- Can provide result as data stream
- Can output after 2 min, without seeing end
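
An incremental evaluator can illustrate the note that Query 1 "can output after 2 min, without seeing end". This Python sketch makes several assumptions:
- Outgoing tuples arrive in time order as (call_ID, caller, time, event), with time in minutes.
- Each arrival's timestamp acts as the current time. A real DSMS would also need heartbeats or punctuations to advance time when no tuples arrive.
- The function long_calls is hypothetical.

```python
import heapq
from typing import Dict, Iterable, Iterator, List, Tuple

Event = Tuple[int, str, float, str]  # hypothetical: (call_ID, caller, time, event)


def long_calls(outgoing: Iterable[Event], limit: float = 2.0) -> Iterator[Tuple[int, str]]:
    """Report each call as soon as it has lasted longer than `limit`, even before its end."""
    open_calls: Dict[int, Tuple[str, float]] = {}  # call_ID -> (caller, start time)
    by_start: List[Tuple[float, int]] = []          # min-heap of (start time, call_ID)
    for call_id, caller, t, event in outgoing:
        # Treat t as "now": every still-open call that started more than `limit` ago qualifies.
        while by_start and t - by_start[0][0] > limit:
            start, cid = heapq.heappop(by_start)
            # Skip heap entries for calls that already ended (lazy deletion).
            if cid in open_calls and open_calls[cid][1] == start:
                yield cid, open_calls.pop(cid)[0]
        if event == "start":
            open_calls[call_id] = (caller, t)
            heapq.heappush(by_start, (t, call_id))
        elif event == "end":
            # Ended within the limit, or already reported: drop its state either way.
            open_calls.pop(call_id, None)


events = [
    (1, "alice", 0.0, "start"),
    (2, "carol", 0.5, "start"),
    (2, "carol", 1.5, "end"),
    (3, "dave", 1.6, "start"),
    (1, "alice", 3.0, "end"),
    (4, "erin", 3.7, "start"),  # advances the clock past dave's 2-minute mark
    (3, "dave", 4.0, "end"),
]
print(list(long_calls(events)))  # [(1, 'alice'), (3, 'dave')]
```

Here Dave's call is reported at time 3.7, before its end tuple arrives. State is kept only for open calls, but it still grows without bound if calls never end.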

Query 2 (join)
Pair up callers and callees:

    SELECT O.caller, I.callee
    FROM Outgoing O, Incoming I
    WHERE O.call_ID = I.call_ID

- Can still provide result as data stream
- Requires unbounded temporary storage, unless the streams are near-synchronized
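
One standard non-blocking way to evaluate Query 2 is a symmetric hash join, shown here as a minimal Python sketch. The sketch keeps a hash table per stream, and every arrival probes the other table and is then inserted into its own. The arrival layout (side, call_ID, name) and the function name are hypothetical. The sketch assumes one tuple per call on each side, so each call pairs once.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, Iterator, List, Tuple

# Hypothetical arrival layout: (side, call_ID, name), where side is "O" for
# Outgoing (name = caller) or "I" for Incoming (name = callee).
Arrival = Tuple[str, int, str]


def symmetric_hash_join(arrivals: Iterable[Arrival]) -> Iterator[Tuple[str, str]]:
    """Non-blocking equijoin on call_ID that yields (caller, callee) pairs."""
    tables: Dict[str, DefaultDict[int, List[str]]] = {
        "O": defaultdict(list),
        "I": defaultdict(list),
    }
    for side, call_id, name in arrivals:
        other = "I" if side == "O" else "O"
        # Probe the other stream's hash table for earlier tuples with this call_ID.
        for match in tables[other].get(call_id, []):
            yield (name, match) if side == "O" else (match, name)
        # Remember this tuple for future matches; nothing is ever evicted,
        # which is why this join needs unbounded temporary storage.
        tables[side][call_id].append(name)


arrivals = [("O", 1, "alice"), ("O", 2, "carol"), ("I", 2, "dave"),
            ("I", 1, "bob"), ("I", 3, "erin")]
print(list(symmetric_hash_join(arrivals)))  # [('carol', 'dave'), ('alice', 'bob')]
```

If the two streams are near-synchronized, a matching tuple tends to arrive soon. Entries could then be purged after a bounded delay instead of being kept forever.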

Query 3 (group-by aggregation)
Total connection time for each caller:

    SELECT O1.caller, SUM(O2.time - O1.time)
    FROM Outgoing O1, Outgoing O2
    WHERE O1.call_ID = O2.call_ID
      AND O1.event = 'start'
      AND O2.event = 'end'
    GROUP BY O1.caller

- Cannot provide result in (append-only) stream
  - Output updates?
  - Provide current value on demand?
  - Memory?
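
The "output updates" option can be sketched in Python. Here each completed call revises its caller's running total and emits the new value, so later outputs supersede earlier ones. The sketch assumes time-ordered Outgoing tuples (call_ID, caller, time, event). The function name is hypothetical.

```python
from typing import Dict, Iterable, Iterator, Tuple

Event = Tuple[int, str, float, str]  # hypothetical: (call_ID, caller, time, event)


def connection_time_updates(outgoing: Iterable[Event]) -> Iterator[Tuple[str, float]]:
    """Emit (caller, new total) every time one of the caller's calls ends."""
    starts: Dict[int, float] = {}  # open calls: call_ID -> start time
    totals: Dict[str, float] = {}  # running SUM per caller (the GROUP BY state)
    for call_id, caller, t, event in outgoing:
        if event == "start":
            starts[call_id] = t
        elif event == "end" and call_id in starts:
            totals[caller] = totals.get(caller, 0.0) + (t - starts.pop(call_id))
            # Not append-only: this output replaces the caller's previous value.
            yield caller, totals[caller]


events = [
    (1, "alice", 0.0, "start"),
    (2, "bob", 1.0, "start"),
    (1, "alice", 3.0, "end"),
    (3, "alice", 4.0, "start"),
    (2, "bob", 5.0, "end"),
    (3, "alice", 6.0, "end"),
]
print(list(connection_time_updates(events)))
# [('alice', 3.0), ('bob', 4.0), ('alice', 5.0)]
```

The "current value on demand" alternative would instead expose the totals dictionary to readers. Either way, memory grows with the number of distinct callers and open calls.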

DSMS: Architecture & Issues
- Data streams and stored relations: architectural differences
- Declarative language for registering continuous queries
- Flexible query plans and execution strategies
- Centralized or distributed?

Agenda Data Streams  What are they?  Why now? Applications.. DSMS: Architecture & Issues Query Processing 28

DSMS: Issues
- Relation: tuple set or sequence?
- Updates: modifications or appends?
- Query answer: exact or approximate?
- Query evaluation: one or multiple passes?
- Query plan: fixed or adaptive?

Architectural Issues
DSMS:
- Resources (memory, per-tuple computation) limited
- Reasonably complex, near-real-time query processing
- Useful to identify what data to populate in the database
- Query evaluation: one pass
- Query plan: adaptive
DBMS:
- Resources (memory, disk, per-tuple computation) rich
- Extremely sophisticated query processing and analysis
- Useful to audit query results of data stream systems
- Query evaluation: arbitrary
- Query plan: fixed

STREAM System Challenges
Must cope with:
- Stream rates that may be high, variable, and bursty
- Stream data that may be unpredictable and variable
- Continuous query loads that may be high and variable
Query answer: exact or approximate?

Query Model
[Figure: User/Application and DSMS Query Processor]

Agenda Data Streams  What are they?  Why now? Applications.. DSMS: Architecture & Issues Query Processing  Language  Operators  Optimization  Multi-Query Optimization 34

Stream Query Language
- SQL extension
- Queries reference/produce relations or streams
- Examples: GSQL [Gigascope], CQL [STREAM]
[Figure: streams or finite relations as the inputs and outputs of a stream query language]

Example: Continuous Query Language (CQL)
Start with SQL, then add:
- Streams as a new data type
- Continuous instead of one-time semantics
- Windows on streams (derived from SQL-99)
- Sampling on streams (basic)

Impact of Limited Memory
- Continuous streams grow unboundedly
- Queries may require unbounded memory
- One solution: approximate query evaluation

Approximate Query Evaluation Why?  Handling load – streams coming too fast  Avoid unbounded storage and computation  Ad hoc queries need approximate history How? Sliding windows, synopsis, samples, load-shed Major Issues?  Metric for set-valued queries  Composition of approximate operators  How is it understood/controlled by user?  Integrate into query language  Query planning and interaction with resource allocation  Accuracy-efficiency-storage tradeoff and global metric 38

Windows
- Mechanism for extracting a finite relation from an infinite stream
- Various window proposals for restricting operator scope:
  - Windows based on ordering attribute (e.g., time)
  - Windows based on tuple counts
  - Windows based on explicit markers (e.g., punctuations)
  - Variants (e.g., partitioning tuples in a window)
[Figure: window specifications turn a stream into finite relations, which are manipulated using SQL and then streamified back into a stream.]
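
A window based on the ordering attribute can be maintained incrementally. This Python sketch keeps a time-based sliding window of width w over (timestamp, value) tuples, expiring tuples with timestamp <= now - w, and reports the average over the window after each arrival. It assumes non-decreasing timestamps. The function name sliding_avg is hypothetical.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple


def sliding_avg(stream: Iterable[Tuple[float, float]],
                width: float) -> Iterator[Tuple[float, float]]:
    """For each arrival (ts, v), yield (ts, average of values with ts - width < t' <= ts)."""
    if width <= 0:
        raise ValueError("width must be positive")
    window: Deque[Tuple[float, float]] = deque()
    total = 0.0
    for ts, value in stream:
        window.append((ts, value))
        total += value
        # Expire tuples whose timestamps have left the window.
        while window[0][0] <= ts - width:
            total -= window.popleft()[1]
        yield ts, total / len(window)


print(list(sliding_avg([(0, 10), (1, 20), (2, 30), (3, 40)], width=2)))
# [(0, 10.0), (1, 15.0), (2, 25.0), (3, 35.0)]
```

A count-based window would differ only in its expiry rule: pop from the left whenever the deque holds more than N tuples.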

Windows Terminology
[Figure: a timeline from start time to current time, with instants t1 to t5, contrasting a sliding window with a tumbling window.]
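
A small Python sketch can show the difference between the two kinds of window:
- Tumbling windows of length `size` partition the timeline into non-overlapping intervals. Each tuple belongs to exactly one window.
- Sliding (hopping) windows of length `size` advance by `slide`. When slide < size, one tuple belongs to several windows.

The sketch assumes windows are aligned to time 0, timestamps are non-negative and non-decreasing, and empty windows are simply not reported. Both function names are hypothetical.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def tumbling_counts(timestamps: Iterable[float], size: float) -> Iterator[Tuple[float, int]]:
    """Yield (window_start, count) for each non-empty, non-overlapping window."""
    current_start: Optional[float] = None
    count = 0
    for ts in timestamps:
        start = (ts // size) * size
        if current_start is not None and start != current_start:
            yield current_start, count  # a later window began, so the previous one is closed
            count = 0
        current_start = start
        count += 1
    if current_start is not None:
        yield current_start, count      # flush the last window at end of input


def hopping_windows(ts: float, size: float, slide: float) -> List[float]:
    """Start times of every window [start, start + size) that contains ts."""
    starts = []
    start = (ts // slide) * slide       # latest window start at or before ts
    while start > ts - size:
        if start >= 0:
            starts.append(start)
        start -= slide
    return sorted(starts)


print(list(tumbling_counts([0, 1, 3, 4, 5, 9], size=2)))  # [(0, 2), (2, 1), (4, 2), (8, 1)]
print(hopping_windows(5, size=4, slide=2))                # [2, 4]
```

With slide equal to size, hopping_windows returns exactly one start per timestamp, which is the tumbling case.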

Query Operators
- Selections: WHERE clause
- Projections: SELECT clause
- Joins: FROM clause
- Group-by (aggregations): GROUP BY clause
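
Selection and projection need no state on a stream: each tuple is handled on its own as it arrives. This Python sketch models tuples as hypothetical attribute-to-value dictionaries. It always carries the ordering attribute (assumed to be named "time") through the projection, so downstream window operators can still use it.

```python
from typing import Callable, Dict, Iterable, Iterator, Sequence

Row = Dict[str, object]  # hypothetical: one stream tuple as attribute -> value


def select(stream: Iterable[Row], predicate: Callable[[Row], bool]) -> Iterator[Row]:
    """WHERE clause: keep or drop each tuple on its own; no state is needed."""
    return (row for row in stream if predicate(row))


def project(stream: Iterable[Row], attrs: Sequence[str],
            ordering_attr: str = "time") -> Iterator[Row]:
    """SELECT clause: keep only attrs, always retaining the ordering attribute."""
    keep = [ordering_attr] + [a for a in attrs if a != ordering_attr]
    return ({a: row[a] for a in keep} for row in stream)


outgoing = [
    {"call_ID": 1, "caller": "alice", "time": 0.0, "event": "start"},
    {"call_ID": 1, "caller": "alice", "time": 3.0, "event": "end"},
]
starts = project(select(outgoing, lambda r: r["event"] == "start"), ["caller"])
print(list(starts))  # [{'time': 0.0, 'caller': 'alice'}]
```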

Query Operators
- Selections and projections on streams: straightforward
  - Local per-element operators
  - Projection may need to include the ordering attribute
- Joins: problematic
  - May need to join tuples that are arbitrarily far apart
  - Equijoin on stream ordering attributes may be tractable
- The majority of the work focuses on joins using windows
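
A sliding-window join bounds the join state by pairing tuples only when they are at most `window` time units apart. This Python sketch joins the Outgoing and Incoming streams on call_ID. It assumes the two streams are merged into one time-ordered sequence of hypothetical (side, time, call_ID, name) tuples, and it probes with a simple nested loop. The function name is illustrative.

```python
from collections import deque
from typing import Deque, Dict, Iterable, Iterator, Tuple

# Hypothetical arrival layout: (side, time, call_ID, name), side "O" or "I",
# with arrivals from both streams merged in non-decreasing time order.
Arrival = Tuple[str, float, int, str]
Entry = Tuple[float, int, str]


def window_join(arrivals: Iterable[Arrival], window: float) -> Iterator[Tuple[str, str]]:
    """Join Outgoing and Incoming on call_ID, pairing tuples at most `window` apart."""
    buffers: Dict[str, Deque[Entry]] = {"O": deque(), "I": deque()}
    for side, t, call_id, name in arrivals:
        other = "I" if side == "O" else "O"
        # Evict expired tuples from both sides: state covers only the last `window` time units.
        for buf in buffers.values():
            while buf and buf[0][0] < t - window:
                buf.popleft()
        # Probe the other side's window with a simple nested loop.
        for _, other_id, other_name in buffers[other]:
            if other_id == call_id:
                yield (name, other_name) if side == "O" else (other_name, name)
        buffers[side].append((t, call_id, name))


arrivals = [("O", 0.0, 1, "alice"), ("I", 1.0, 1, "bob"),
            ("O", 2.0, 2, "carol"), ("I", 10.0, 2, "dave")]
print(list(window_join(arrivals, window=5.0)))  # [('alice', 'bob')]
```

Carol and Dave share a call_ID but arrive 8 time units apart, so the window restriction drops that pair. This is the tradeoff between bounded memory and an approximate answer.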

Blocking Operators Blocking  No output until entire input seen  Streams – input never ends Simple Aggregates – output “update” stream Set Output (sort, group-by)  Root – could maintain output data structure  Intermediate nodes – try non-blocking analogs Join  Apply sliding-window restrictions 43

Optimization in DSMS
- Traditionally, the query optimizer uses table-based cardinalities.
  - Goal of the query optimizer: minimize the size of intermediate results
- Problematic in a streaming environment: all streams are unbounded, i.e., of infinite size!
- Need novel optimization objectives that are relevant when the input sources are streams

Query Optimization in DSMS
- Novel notions of optimization:
  - Stream rate based [e.g., NiagaraCQ]
  - Resource based [e.g., STREAM]
  - QoS based [e.g., Aurora]
- Continuous adaptive optimization
- Possibility that objectives cannot be met:
  - Resource constraints
  - Bursty arrivals under limited processing capabilities
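
A rate-based objective replaces "size of intermediate results" with work per unit time. The illustration below uses a chain of independent, commutative filters on a stream arriving at `rate` tuples per second. Each filter i has a per-tuple cost c_i and selectivity s_i, and the expected work is rate × Σ c_i × (product of the selectivities of the filters before it).

A classic result is that, under these independence assumptions, ordering filters by ascending c_i / (1 − s_i) minimizes that cost. This is a generic example, not the specific algorithm of NiagaraCQ, STREAM, or Aurora. The filter tuples and function names are hypothetical.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

Filter = Tuple[str, float, float]  # hypothetical: (name, cost per tuple, selectivity)


def cost_per_second(order: Sequence[Filter], rate: float) -> float:
    """Expected work per second when the filters run in this order."""
    total, surviving = 0.0, rate
    for _, cost, sel in order:
        total += surviving * cost  # every surviving tuple is evaluated by this filter
        surviving *= sel           # only a fraction `sel` passes on to the next one
    return total


def rank_order(filters: Sequence[Filter]) -> List[Filter]:
    """Sort by cost / (1 - selectivity); filters that drop nothing go last."""
    return sorted(filters,
                  key=lambda f: float("inf") if f[2] >= 1.0 else f[1] / (1.0 - f[2]))


filters = [("a", 1.0, 0.9), ("b", 5.0, 0.1), ("c", 2.0, 0.5)]
best = rank_order(filters)
print([f[0] for f in best], cost_per_second(best, rate=100.0))  # ['c', 'b', 'a'] 455.0
```

Because the ranking depends only on per-filter statistics, an adaptive optimizer could re-sort the filters cheaply whenever its observed costs or selectivities drift.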

Stream Projects
- Amazon/Cougar (Cornell): sensors
- Aurora (Brown/MIT): sensor monitoring, dataflow
- Hancock (AT&T): telecom streams
- Niagara (OGI/Wisconsin): Internet XML databases
- OpenCQ (Georgia): triggers, incremental view maintenance
- Stream (Stanford): general-purpose DSMS
- Tapestry (Xerox): pub/sub content-based filtering
- Telegraph (Berkeley): adaptive engine for sensors
- Tribeca (Bellcore): network monitoring

END