Advanced Database Systems: DBS CB, 2nd Edition


1 Advanced Database Systems: DBS CB, 2nd Edition
Advanced Topics of Interest: DB in the Cloud, and SQL & Stream Processing

2 Outline DB in the Cloud SQL and Stream Processing

3 DB in the Cloud

4 Cloud Introduction Provide “why”, “what”, and “how” around SQL in the cloud Cloud perception: costs? New capabilities? Massively scalable computing? IaaS, PaaS, SaaS Private vs. public clouds Interoperability & standards

5 Cloud Introduction
Definition: Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.
The cloud model promotes availability and is composed of five essential characteristics, three service models, and four deployment models.

6 Cloud Characteristics
Five essential cloud characteristics:
On-demand self-service (service & deployment model)
Broad network access (deployment model)
Resource pooling (service & deployment model), including location independence
Rapid elasticity (deployment model)
Measured service (service model)

7 Cloud Business View Reducing cost New capabilities Risk Opportunities
Remaking value, decision, influence chain

8 Cloud: Platform Dimensions
Productivity Scale Trust Viability Velocity Relationship Platform vendors succeed when the platform helps others succeed

9 Cloud Computing Architecture
Infrastructure-as-a-Service Security-as-a-Service Storage-as-a-Service Integration-as-a-Service Database-as-a-Service Information-as-a-Service Process-as-a-Service Platform-as-a-Service Application-as-a-Service Management/Governance-as-a-Service Testing-as-a-Service

10 Cloud: Success Factors
Utility Computing Capability Technical capability Datacenter Innovation capability Application Pattern Capability Not just about the browser Platform, delivery, and tooling Platform Ecosystem Work with ISVs, SIs, VARs & Business to get to Cloud

11 Cloud Challenges
Identity and Access Management
Composition / Workflow
Trust
Availability
Performance
Information protection
Latency: it matters
Separating logical / physical administration

12 QoS-Aware Cloud Design and development of QoS
Ability to meet QoS application requirements as specified in hosting SLA
Current application server (AS) technology is not fully instrumented to meet those requirements
Two principal middleware services:
  Configuration Service (CS)
  Monitoring Service (MS)
Complemented by an adaptive Load Balancing Service
Operate both on a single AS and on a cluster of ASs

13 Cloud: Evolution of Virtualization

14 Cloud: Evolution of Virtualization
Power Saving with Distributed Power Management (DPM)

15 SQL and Stream Processing

16 Agenda
Data Streams: What are they? Why now? Applications…
DSMS: Architecture & Issues
Query Processing

17 Data Streams – What and Where?
Continuous, unbounded, rapid, time-varying streams of data elements (tuples) Occur in a variety of modern applications Network monitoring and traffic engineering Sensor networks, RFID tags Telecom call records Financial applications Web logs and click-streams Manufacturing processes DSMS = Data Stream Management System

18 DBMS versus DSMS
DBMS:
  Persistent relations
  One-time queries
  Random access
  Access plan determined by query processor and physical DB design
DSMS:
  Transient streams (and persistent relations)
  Continuous queries
  Sequential access
  Unpredictable data characteristics and arrival patterns

19 Continuous Queries
One-time queries: run once to completion over the current data set
Continuous queries: issued once and then continuously evaluated over the data
Examples:
  Notify me when the temperature drops below X
  Tell me when the price of stock Y exceeds 300
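
The contrast between the two query types can be sketched in Python. This is only an illustration under assumed names: the `(timestamp, temperature)` schema and both function names are hypothetical, not part of any DSMS.

```python
from typing import Iterable, Iterator

Reading = tuple[int, float]  # (timestamp, temperature); hypothetical schema


def one_time_query(table: list[Reading], threshold: float) -> list[Reading]:
    """Run once to completion over the data stored right now."""
    return [r for r in table if r[1] < threshold]


def continuous_query(stream: Iterable[Reading], threshold: float) -> Iterator[Reading]:
    """Registered once; yields a notification each time a new reading qualifies."""
    for reading in stream:
        if reading[1] < threshold:
            yield reading


readings = [(1, 21.5), (2, 19.0), (3, 22.1), (4, 18.4)]
alerts = list(continuous_query(iter(readings), threshold=20.0))
```

The one-time query sees only what is stored when it runs, while the generator keeps producing answers for as long as the input keeps arriving.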

20 The (Simplified) Big Picture
[Figure: a DSMS with a scratch store, an archive, and stored relations; input streams and registered queries flow in, streamed results flow out. Source: Stanford stream data manager]

21 (Simplified) Network Monitoring
[Figure: monitoring queries are registered with a DSMS that uses a scratch store, an archive, and lookup tables; network measurements and packet traces flow in, intrusion warnings and online performance metrics flow out]

22 Triggers? Recall triggers in traditional DBMSs?
Why not use triggers to process continuous queries over data streams?

23 Making Things Concrete
Streams arriving at the DSMS:
  Outgoing (call_ID, caller, time, event)
  Incoming (call_ID, callee, time, event)
  event = start or end
[Figure labels: DSMS, Central Office, ALICE, BOB]

24 Query 1 (self-join)
Find all outgoing calls longer than 2 minutes
  SELECT O1.call_ID, O1.caller
  FROM Outgoing O1, Outgoing O2
  WHERE (O2.time - O1.time > 2
    AND O1.call_ID = O2.call_ID
    AND O1.event = start
    AND O2.event = end)
Result requires unbounded storage
Can provide result as data stream
Can output after 2 min, without seeing end
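
The observation that a long call can be reported after 2 minutes, before its end tuple arrives, can be sketched as follows. This is a minimal sketch, assuming tuples arrive in time order and that the timestamp of the newest tuple serves as the clock; the function name `long_calls` is hypothetical.

```python
def long_calls(outgoing, limit=2):
    """Yield (call_ID, caller) as soon as a call has been open longer than `limit`.

    Assumes Outgoing tuples (call_ID, caller, time, event) arrive in time order.
    """
    open_calls = {}  # call_ID -> (caller, start time) for calls not yet reported
    for call_id, caller, time, event in outgoing:
        # Stream time has advanced to `time`: report calls open longer than the limit.
        overdue = [c for c, (_, t0) in open_calls.items() if time - t0 > limit]
        for cid in overdue:
            yield cid, open_calls.pop(cid)[0]
        if event == "start":
            open_calls[call_id] = (caller, time)
        elif event == "end":
            # Either the call ended within the limit or it was already reported.
            open_calls.pop(call_id, None)


events = [
    (1, "alice", 0, "start"),
    (2, "bob", 1, "start"),
    (2, "bob", 2, "end"),
    (3, "carol", 4, "start"),
    (1, "alice", 5, "end"),
]
result = list(long_calls(events))
```

In this sketch the state holds only calls that are still open and not yet reported, so it is bounded by the number of calls started within the last `limit` time units rather than by the length of the stream.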

25 Query 2 (join)
Pair up callers and callees
  SELECT O.caller, I.callee
  FROM Outgoing O, Incoming I
  WHERE O.call_ID = I.call_ID
Can still provide result as data stream
Requires unbounded temporary storage … unless streams are near-synchronized
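
One way to see why near-synchronized streams keep the temporary storage small is a symmetric hash join that drops a tuple as soon as its partner arrives. The sketch below assumes each call_ID appears exactly once on each stream and that the two streams are delivered as one interleaved sequence tagged "O" or "I"; the function name and the tagging scheme are hypothetical.

```python
def pair_up(events):
    """Join Outgoing and Incoming on call_ID, yielding (caller, callee) pairs.

    `events` is an interleaved sequence of (side, call_ID, party) tuples,
    where side is "O" (Outgoing, party = caller) or "I" (Incoming, party = callee).
    """
    waiting_callers = {}  # call_ID -> caller, seen on Outgoing only so far
    waiting_callees = {}  # call_ID -> callee, seen on Incoming only so far
    for side, call_id, party in events:
        if side == "O":
            if call_id in waiting_callees:
                yield party, waiting_callees.pop(call_id)
            else:
                waiting_callers[call_id] = party
        else:
            if call_id in waiting_callers:
                yield waiting_callers.pop(call_id), party
            else:
                waiting_callees[call_id] = party


events = [("O", 1, "alice"), ("I", 1, "bob"), ("I", 2, "dave"), ("O", 2, "carol")]
pairs = list(pair_up(events))
```

If one stream lags far behind the other, the waiting dictionary for the faster stream grows without bound, which is the storage problem the slide points at.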

26 Query 3 (group-by aggregation)
Total connection time for each caller
  SELECT O1.caller, sum(O2.time - O1.time)
  FROM Outgoing O1, Outgoing O2
  WHERE (O1.call_ID = O2.call_ID
    AND O1.event = start
    AND O2.event = end)
  GROUP BY O1.caller
Cannot provide result in (append-only) stream
Output updates?
Provide current value on demand?
Memory?
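
The "output updates" option can be sketched as an incremental aggregate that emits a new (caller, total) pair whenever a call ends. This is an illustrative sketch, not the behavior of any specific system; `connection_time_updates` is a hypothetical name and the tuple layout follows the Outgoing schema above.

```python
def connection_time_updates(outgoing):
    """Yield (caller, running total connection time) each time a call ends."""
    started = {}  # call_ID -> start time of calls still in progress
    totals = {}   # caller -> total connection time so far
    for call_id, caller, time, event in outgoing:
        if event == "start":
            started[call_id] = time
        elif event == "end" and call_id in started:
            totals[caller] = totals.get(caller, 0) + time - started.pop(call_id)
            yield caller, totals[caller]  # an update that supersedes earlier values


events = [
    (1, "alice", 0, "start"),
    (1, "alice", 3, "end"),
    (2, "alice", 5, "start"),
    (2, "alice", 9, "end"),
]
updates = list(connection_time_updates(events))
```

Each emitted pair replaces the previous value for that caller, so the output is an update stream rather than an append-only one, and `totals` still grows with the number of distinct callers.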

27 DSMS – Architecture & Issues
Data streams and stored relations – Architectural differences. Declarative language for registering continuous queries Flexible query plans and execution strategies Centralized ? Distributed ?

28 Agenda
Data Streams: What are they? Why now? Applications…
DSMS: Architecture & Issues
Query Processing

29 DSMS – Issues
Relation: Tuple Set or Sequence?
Updates: Modifications or Appends?
Query Answer: Exact or Approximate?
Query Evaluation: One or Multiple Passes?
Query Plan: Fixed or Adaptive?

30 Architectural Issues
DSMS:
  Resource (memory, per-tuple computation) limited
  Reasonably complex, near-real-time query processing
  Useful to identify what data to populate in database
  Query evaluation: one pass
  Query plan: adaptive
DBMS:
  Resource (memory, disk, per-tuple computation) rich
  Extremely sophisticated query processing, analysis
  Useful to audit query results of data stream systems
  Query evaluation: arbitrary
  Query plan: fixed

31 STREAM System Challenges
Must cope with: Stream Rates that may be high, variable, and bursty Stream data that may be unpredictable, variable Continuous query loads that may be high, variable Query Answer: Exact or Approximate?

32 Query Model
[Figure labels: User/Application, Query Processor, DSMS]

33 Agenda
Data Streams: What are they? Why now? Applications…
DSMS: Architecture & Issues
Query Processing: Language, Operators, Optimization, Multi-Query Optimization

34 Agenda
Data Streams: What are they? Why now? Applications…
DSMS: Architecture & Issues
Query Processing: Language, Operators, Optimization, Multi-Query Optimization

35 Stream Query Language
SQL extension
Queries reference/produce relations or streams
Examples: GSQL [Gigascope], CQL [STREAM]
[Figure: a stream query language takes streams or finite relations as input and produces streams or finite relations as output]

36 Example: Continuous Query Language – CQL
Start with SQL Then add… Streams as new data type Continuous instead of one-time semantics Windows on streams (derived from SQL-99) Sampling on streams (basic)

37 Impact of Limited Memory
Continuous streams grow unboundedly Queries may require unbounded memory One solution: Approximate query evaluation

38 Approximate Query Evaluation
Why?
  Handling load: streams coming too fast
  Avoid unbounded storage and computation
  Ad hoc queries need approximate history
How?
  Sliding windows, synopses, samples, load shedding
Major issues?
  Metric for set-valued queries
  Composition of approximate operators
  How is it understood/controlled by user?
  Integrate into query language
  Query planning and interaction with resource allocation
  Accuracy-efficiency-storage tradeoff and global metric
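
As one concrete instance of the "samples" technique, reservoir sampling keeps a fixed-size uniform sample of a stream of unknown length in bounded memory. The sketch below uses the classic Algorithm R; it is offered as an example of a sampling synopsis, not as the method of any system named in these slides, and the `rng` parameter is an assumption added for reproducibility.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Return a uniform random sample of up to k items using O(k) memory."""
    rng = rng or random.Random()
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)      # fill the reservoir first
        else:
            j = rng.randint(0, i)    # inclusive on both ends
            if j < k:
                sample[j] = item     # item i is kept with probability k / (i + 1)
    return sample


sample = reservoir_sample(range(1000), 10, random.Random(0))
```

Aggregates computed over the sample are approximate answers; how much accuracy is lost for a given k is the accuracy-efficiency-storage tradeoff listed above.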

39 Windows
Mechanism for extracting a finite relation from an infinite stream
Various window proposals for restricting operator scope:
  Windows based on ordering attribute (e.g. time)
  Windows based on tuple counts
  Windows based on explicit markers (e.g. punctuations)
  Variants (e.g., partitioning tuples in a window)
[Figure: window specifications turn a stream into finite relations, which are manipulated using SQL and then streamified]
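
The first two window kinds can be sketched with a deque: after each arrival, the window contents form the finite relation an ordinary SQL operator would see. This is a minimal sketch, assuming tuples of the form `(timestamp, value)` that arrive in timestamp order and a positive window width; both function names are hypothetical.

```python
from collections import deque


def time_window(stream, width):
    """After each arrival, yield the tuples with timestamp in (now - width, now]."""
    window = deque()
    for ts, value in stream:
        window.append((ts, value))
        while window[0][0] <= ts - width:  # expire tuples that fell out of range
            window.popleft()
        yield list(window)


def count_window(stream, n):
    """After each arrival, yield the most recent n tuples."""
    window = deque(maxlen=n)  # deque discards the oldest tuple automatically
    for item in stream:
        window.append(item)
        yield list(window)


stream = [(1, "a"), (2, "b"), (5, "c")]
by_time = list(time_window(stream, width=3))
by_count = list(count_window(stream, n=2))
```

With the time-based window the relation can shrink when arrivals are sparse, while the count-based window always holds the last n tuples regardless of how old they are.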

40 Windows Terminology
[Figure: a timeline marked t1 to t5 running from start time to current time, illustrating a sliding window and a tumbling window]
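
A tumbling window partitions time into consecutive, non-overlapping intervals, whereas a sliding window moves forward with every arrival and overlaps its predecessors. The tumbling case can be sketched as below, assuming integer timestamps in arrival order; the function name is hypothetical, and the final flush exists only so that a finite demo input produces its last window.

```python
def tumbling_windows(stream, width):
    """Yield (window start, values) for each non-empty window [start, start + width)."""
    current, batch = None, []
    for ts, value in stream:
        index = ts // width               # which window this tuple belongs to
        if current is not None and index != current:
            yield current * width, batch  # the previous window has closed
            batch = []
        current = index
        batch.append(value)
    if batch:                             # flush the last window of a finite input
        yield current * width, batch


stream = [(0, 1), (1, 2), (4, 3), (5, 4), (9, 5)]
windows = list(tumbling_windows(stream, width=4))
```

Every tuple lands in exactly one tumbling window, so per-window aggregates can be emitted once and never revised.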

41 Query Operators Selections - Where clause Projections - Select clause
Joins - From clause Group-by (Aggregations) – Group-by clause

42 Query Operators Selections and projections on streams - straightforward Local per-element operators Projection may need to include ordering attribute Joins – Problematic May need to join tuples that are arbitrarily far apart Equijoin on stream ordering attributes may be tractable Majority of the work focuses on joins using windows
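
A window join restricts matches to tuples that are close together in time, which bounds the state each side must keep. The following is a sketch under stated assumptions: the two inputs arrive as one time-ordered sequence of `(side, timestamp, key, payload)` tuples, and `window_join` is a hypothetical name rather than an operator from a particular DSMS.

```python
from collections import deque


def window_join(events, width):
    """Equijoin two streams on key, matching only tuples at most `width` apart in time.

    `events` is a time-ordered sequence of (side, ts, key, payload), side "L" or "R".
    Yields (left payload, right payload) pairs.
    """
    buffers = {"L": deque(), "R": deque()}
    for side, ts, key, payload in events:
        other = "R" if side == "L" else "L"
        for buf in buffers.values():
            while buf and buf[0][0] < ts - width:  # expire tuples outside the window
                buf.popleft()
        for _, o_key, o_payload in buffers[other]:
            if o_key == key:
                yield (payload, o_payload) if side == "L" else (o_payload, payload)
        buffers[side].append((ts, key, payload))


events = [
    ("L", 0, "k", "l0"),
    ("R", 1, "k", "r1"),
    ("R", 10, "k", "r10"),
    ("L", 11, "k", "l11"),
]
matches = list(window_join(events, width=5))
```

The pair (l0, r10) is never produced because the two tuples are more than `width` apart, which is exactly the approximation a window imposes on the unrestricted join.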

43 Blocking Operators
Blocking: no output until entire input seen
Streams: input never ends
Simple aggregates: output “update” stream
Set output (sort, group-by):
  Root: could maintain output data structure
  Intermediate nodes: try non-blocking analogs
Join: apply sliding-window restrictions

44 Optimization in DSMS
Traditionally, the query optimizer relies on table-based cardinalities.
Goal of query optimizer: minimize the size of intermediate results
Problematic in a streaming environment: all streams are unbounded, i.e. of infinite size!
Need novel optimization objectives that are relevant when the input sources are streams

45 Query Optimization in DSMS
Novel notions of optimization: Stream rate based [e.g. NiagaraCQ] Resource-based [e.g. STREAM] QoS based [e.g. Aurora] Continuous adaptive optimization Possibilities that objectives cannot be met: Resource constraints Bursty arrivals under limited processing capabilities.
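
To make the idea of a stream-oriented objective concrete, the sketch below minimizes expected per-tuple processing cost for a chain of independent filters instead of intermediate result size. It is an illustration only and does not reproduce the optimizers of NiagaraCQ, STREAM, or Aurora; the filter names, costs, and selectivities are made up, and the rank rule assumes independent filters with selectivity below 1.

```python
from itertools import permutations

# Hypothetical filters: name -> (per-tuple cost, selectivity = fraction of tuples passed)
FILTERS = {"A": (1.0, 0.9), "B": (5.0, 0.1), "C": (2.0, 0.5)}


def expected_cost(order, filters=FILTERS):
    """Expected work per input tuple when filters run in the given order."""
    cost, surviving = 0.0, 1.0
    for name in order:
        c, s = filters[name]
        cost += surviving * c  # only surviving tuples reach this filter
        surviving *= s
    return cost


def rank_order(filters=FILTERS):
    """Order filters by increasing cost / (1 - selectivity)."""
    return sorted(filters, key=lambda n: filters[n][0] / (1.0 - filters[n][1]))


best = rank_order()
brute_force = min(expected_cost(p) for p in permutations(FILTERS))
```

Because selectivities and arrival rates drift over time on a stream, such an ordering would need to be re-evaluated periodically, which is one motivation for continuous adaptive optimization.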

46 Stream Projects
Amazon/Cougar (Cornell): sensors
Aurora (Brown/MIT): sensor monitoring, dataflow
Hancock (AT&T): telecom streams
Niagara (OGI/Wisconsin): Internet XML databases
OpenCQ (Georgia): triggers, incr. view maintenance
Stream (Stanford): general-purpose DSMS
Tapestry (Xerox): pub/sub content-based filtering
Telegraph (Berkeley): adaptive engine for sensors
Tribeca (Bellcore): network monitoring

47 END

