Index Tuning for Adaptive Multi-Route Data Stream Systems
Karen Works, Elke A. Rundensteiner, and Emmanuel Agu
kworks@wpi.edu
Database Systems Research Group (DSRG)
Computer Science Department
Worcester Polytechnic Institute
This work is supported under NSF Grant 0917017, NSF CNS CRI Grant 0551584 (equipment grant), NSF Grant 0414567, and GAANN Grant.

Rudiments of Stream Processing
- Essential to produce rapid results
- Must function over long periods of time
- Data arrival rates commonly experience frequent fluctuations

Adaptive Multi-Route Systems (AMR)
1) Memory/CPU utilization
2) Query responsiveness
Q1: Select * From StreamA, StreamB, StreamC Where StreamA.z = StreamB.z and StreamB.y = StreamC.y and StreamC.x = StreamA.x [Range 5 Minutes]
[Figure: AMR system. Labels: Streams A, B, C (new tuples); Eddy; STeMs (join operators); States A, B, C (stored tuples); output results.]
* Avnur et al., Eddies: continuously adaptive query processing (SIGMOD'00).
* Raman et al., Using State Modules for Adaptive Query Processing (ICDE'03).

Background
- Indexing research
- Adaptive Multi-Route system research

State   Possible indices
A       x, z, and (x and z combined)
B       y, z, and (y and z combined)
C       x, y, and (x and y combined)

[Figure: AMR system with access modules. Labels: Streams A, B, C (new tuples); Eddy; STeMs (join operators); States A, B, C (stored tuples); access modules with index labels x, X&Z, Y, x, X&Y, Y; output results.]
1) Memory/CPU utilization
2) Query responsiveness
Q1: Select * From StreamA, StreamB, StreamC Where StreamA.z = StreamB.z and StreamB.y = StreamC.y and StreamC.x = StreamA.x [Range 5 Minutes]
* Avnur et al., Eddies: continuously adaptive query processing (SIGMOD'00).
* Raman et al., Using State Modules for Adaptive Query Processing (ICDE'03).
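The possible indices listed for each state are the non-empty combinations of that state's join attributes in Q1. A minimal sketch of that enumeration follows; the function name `candidate_indices` and the dictionary `Q1_JOIN_ATTRS` are illustrative names, not part of the original system.

```python
from itertools import combinations

def candidate_indices(join_attrs):
    """Return every non-empty combination of a state's join attributes."""
    attrs = sorted(join_attrs)
    return [combo
            for size in range(1, len(attrs) + 1)
            for combo in combinations(attrs, size)]

# Join attributes per state, read off the join predicates of Q1.
Q1_JOIN_ATTRS = {"A": {"x", "z"}, "B": {"y", "z"}, "C": {"x", "y"}}

for state in sorted(Q1_JOIN_ATTRS):
    print(state, candidate_indices(Q1_JOIN_ATTRS[state]))
```

With two join attributes per state this yields three candidates each; with k join attributes the count grows to 2^k - 1, which is one reason index selection needs guidance from assessment statistics.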

Goal
- Indexing research
- Adaptive Multi-Route (AMR) system research
Can we customize an index design for AMR systems to improve query responsiveness?

Index Requirements for AMR
[Figure: Eddy; Streams A, B, C; STeMs; States A, B, C; new index design; results.]
1) support many access patterns
2) require minimal CPU to maintain
3) maintainable in main memory
4) easily adaptable to workloads

Index Data Structure
Data structure: bit-address index based solution
IMportance-based Partitioning Index (IMP Index)

Insert tuple (A1 = 1001, A2 = 'student', A3 = 'MA'):
  hashA1(1001) = 7 = 00111
  hashA2('student') = 3 = 11
  hashA3('MA') = 2 = 010
  bucket_addr = 0011111010 = 250

Search request (A1 = 1001, A2 = *, A3 = 'MA'):
  hashA1(1001) = 7 = 00111
  hashA2(*) = 00 ~ 11
  hashA3('MA') = 2 = 010
  bucket_addr1 = 0011100010 = 226
  bucket_addr2 = 0011101010 = 234
  bucket_addr3 = 0011110010 = 242
  bucket_addr4 = 0011111010 = 250

[Figure: address book with entries 0 to 1023 pointing to Bucket 0 through Bucket 1023; the partition address is formed from the A1, A2, and A3 hash bits; the example tuple is stored in Bucket 250.]

A. Aho et al., Optimal Partial-Match Retrieval When Fields Are Independently Specified (ACM TODS '79).
L. Ding et al., Index Tuning for Parameterized Streaming Groupby Queries (SSPS'08).
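The insert and search examples on this slide can be reproduced with a small sketch of a bit-address index. It assumes 5, 2, and 3 address bits for A1, A2, and A3 (as the slide's bit strings suggest) and uses lookup-table hash functions that simply return the slide's values; the class name `BitAddressIndex` and its methods are hypothetical and are not taken from the authors' implementation.

```python
from itertools import product

class BitAddressIndex:
    """Toy bit-address index: the bucket address concatenates per-attribute hash bits."""

    def __init__(self, bits_per_attr, hash_fns):
        self.bits = bits_per_attr      # address bits given to each attribute
        self.hash_fns = hash_fns       # one hash function per attribute
        self.buckets = {}              # bucket address -> list of stored tuples

    def _segment(self, i, value):
        # Keep only as many bits as attribute i was assigned.
        return self.hash_fns[i](value) % (1 << self.bits[i])

    def _address(self, segments):
        addr = 0
        for seg, width in zip(segments, self.bits):
            addr = (addr << width) | seg
        return addr

    def insert(self, tup):
        addr = self._address([self._segment(i, v) for i, v in enumerate(tup)])
        self.buckets.setdefault(addr, []).append(tup)
        return addr

    def candidate_addresses(self, pattern):
        # None plays the role of '*': every value of that bit segment must be visited.
        choices = [range(1 << self.bits[i]) if v is None else [self._segment(i, v)]
                   for i, v in enumerate(pattern)]
        return [self._address(segs) for segs in product(*choices)]

    def search(self, pattern):
        hits = []
        for addr in self.candidate_addresses(pattern):
            for tup in self.buckets.get(addr, []):
                if all(p is None or p == t for p, t in zip(pattern, tup)):
                    hits.append(tup)
        return hits

# Hypothetical lookup-table hash functions that return the values shown on the slide.
HASH_FNS = [
    lambda v: {1001: 7}[v],
    lambda v: {"student": 3}[v],
    lambda v: {"MA": 2}[v],
]

index = BitAddressIndex([5, 2, 3], HASH_FNS)
print(index.insert((1001, "student", "MA")))          # 250
print(index.candidate_addresses((1001, None, "MA")))  # [226, 234, 242, 250]
print(index.search((1001, None, "MA")))               # [(1001, 'student', 'MA')]
```

The wildcard on A2 costs four bucket probes because A2 owns two address bits. Giving more bits to attributes that are frequently specified in search requests reduces the number of buckets a typical request must visit.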

Bit-address Index Meets the Requirements
[Figure: Eddy; Streams A, B, C; STeMs; States A, B, C; bit-address index; results.]
1) support many access patterns
2) require minimal CPU to maintain
3) maintainable in main memory
4) easily adaptable to workloads

Index Assessment
Goal: gather statistics about the query paths selected by the router.
1) Should all possible statistics be maintained?
   Periodically the router sends search requests to suboptimal operators to update system statistics. These suboptimal search requests occur at extremely low frequencies, so they are unlikely to influence the final indices selected, yet they add overhead.
2) How many resources should be dedicated to index assessment?
   The overhead of assessment must not affect query responsiveness (i.e., index assessment must be lightweight).

Index Assessment: Statistics Collected

Assessment Statistics Storage: Option 1
Self Reliant Index Assessment (SRIA)
What? Store the count of every access pattern received.
How? Hash table that maps each access pattern to a unique binary representation.
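A minimal sketch of SRIA-style bookkeeping is given here, assuming the binary representation uses one bit per attribute (1 if the search request specifies that attribute). The class name `SRIA`, its methods, and the attribute names A1 to A3 are illustrative choices, not the authors' code.

```python
from collections import Counter

class SRIA:
    """Counts every access pattern, keyed by a bitmask over the state's attributes."""

    def __init__(self, attrs):
        self.attrs = list(attrs)
        self.counts = Counter()

    def encode(self, used_attrs):
        # The first attribute maps to the leftmost bit of the pattern.
        mask = 0
        for pos, attr in enumerate(self.attrs):
            if attr in used_attrs:
                mask |= 1 << (len(self.attrs) - 1 - pos)
        return mask

    def record(self, used_attrs):
        self.counts[self.encode(used_attrs)] += 1

    def frequencies(self):
        total = sum(self.counts.values())
        if total == 0:
            return {}
        width = len(self.attrs)
        return {format(mask, f"0{width}b"): count / total
                for mask, count in self.counts.items()}

sria = SRIA(["A1", "A2", "A3"])
for _ in range(3):
    sria.record({"A1", "A3"})   # pattern 101
sria.record({"A2"})             # pattern 010
print(sria.frequencies())       # {'101': 0.75, '010': 0.25}
```

Because every distinct pattern keeps its own counter, the table can grow to 2^k entries for k attributes, which is the cost the compact variants aim to reduce.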

Assessment Statistics Storage: Option 2
Compact Self Reliant Index Assessment (CSRIA)*
What? Remove access patterns that fall below a preset threshold.
How? Hash table that maps each access pattern to a unique binary representation.
- During assessment: removes the statistics that fall below a preset error rate.
- End of assessment: returns all statistics above a preset threshold.
* Modeled after a heavy hitter algorithm proposed by Manku and Motwani, Approximate frequency counts over data streams (VLDB'02).
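The pruning behaviour described for CSRIA resembles the lossy counting scheme of Manku and Motwani. The sketch below implements plain lossy counting over access patterns and is an illustration under that assumption, not the authors' exact variant. The class name `LossyCounter` and the synthetic request stream are hypothetical; the error rate of 5% and threshold of 10% are the values reported in the experimental setup.

```python
import math

class LossyCounter:
    """Lossy counting: approximate frequency counts with bounded undercount."""

    def __init__(self, epsilon):
        self.epsilon = epsilon
        self.width = math.ceil(1 / epsilon)   # requests per bucket
        self.n = 0                            # requests seen so far
        self.entries = {}                     # pattern -> [count, max undercount]

    def record(self, pattern):
        self.n += 1
        bucket = (self.n + self.width - 1) // self.width
        if pattern in self.entries:
            self.entries[pattern][0] += 1
        else:
            self.entries[pattern] = [1, bucket - 1]
        if self.n % self.width == 0:
            # Bucket boundary: drop patterns that cannot exceed the error rate.
            self.entries = {p: e for p, e in self.entries.items()
                            if e[0] + e[1] > bucket}

    def frequent(self, threshold):
        cutoff = (threshold - self.epsilon) * self.n
        return {p: e[0] for p, e in self.entries.items() if e[0] >= cutoff}

# Synthetic stream: 60% pattern 101, 30% pattern 010, 10% one-off patterns.
counter = LossyCounter(epsilon=0.05)
for i in range(100):
    if i % 10 == 9:
        counter.record(f"rare{i}")
    elif i % 10 < 6:
        counter.record("101")
    else:
        counter.record("010")

print(counter.frequent(threshold=0.10))   # {'101': 60, '010': 30}
```

The one-off patterns are evicted at each bucket boundary, so the table holds only the two dominant patterns at the end of the run.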

CSRIA Example

Relationships between access patterns

Assessment Storage: Option 3
Dependent Index Assessment (DIA)
What? Store the count of every access pattern received and keep search benefit relationships.
How? Logically, a lattice; physically, a hash table.
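The slides do not spell out the search benefit relationship, so the sketch below works under a stated assumption: an index over a set of attributes benefits a search request whenever all of those attributes are specified by the request (a subset relation, which is what gives the patterns a lattice structure). The function names and the sample counts are hypothetical.

```python
def index_serves(index_mask, request_mask):
    """Assumed benefit rule: every attribute of the index is specified by the request."""
    return index_mask != 0 and (index_mask & request_mask) == index_mask

def servable_counts(counts, num_attrs):
    """For each candidate index pattern, total the requests it could help answer."""
    totals = {}
    for index_mask in range(1, 1 << num_attrs):
        totals[index_mask] = sum(
            count for request_mask, count in counts.items()
            if index_serves(index_mask, request_mask)
        )
    return totals

# Hypothetical access pattern counts over three attributes (bit order A1 A2 A3).
counts = {0b101: 6, 0b100: 3, 0b001: 1}
totals = servable_counts(counts, 3)
for mask in sorted(totals):
    print(format(mask, "03b"), totals[mask])
```

Under this assumption, an index on A1 alone (100) could help 9 of the 10 requests, while the combined index 101 helps only the 6 requests that specify both attributes. Keeping the relationships lets the selector weigh such trade-offs instead of ranking patterns by raw count alone.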

Assessment Storage: Option 4
Compressed Dependent Index Assessment (CDIA)*
What? Combine access patterns that fall below a preset threshold.
How? Hash table that keeps search benefit relationships.
- During assessment: removes the statistics that fall below a preset error rate.
- End of assessment: returns all statistics above a preset threshold.
Combination strategies:
- Random combination: randomly picks a single parent node.
- Highest count combination: picks the single parent node with the highest frequency count thus far.
* Modeled after a hierarchical heavy hitter algorithm proposed by Cormode et al., Finding hierarchical heavy hitters in data streams (VLDB'03).
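The two combination strategies can be sketched as a single compression pass over the pattern counts. This is an illustration only: it assumes a parent of a pattern is obtained by dropping exactly one specified attribute, and that single-attribute patterns are left in place. The names `parent_masks` and `compress`, the `min_count` parameter, and the sample counts are hypothetical.

```python
import random

def parent_masks(mask):
    """Assumed parents: patterns with exactly one specified attribute dropped."""
    parents = []
    bit = 1
    while bit <= mask:
        if mask & bit and mask != bit:
            parents.append(mask & ~bit)
        bit <<= 1
    return parents

def compress(counts, num_attrs, min_count, strategy="highest", rng=random):
    """Fold each low-count pattern into one parent, most specific level first."""
    result = dict(counts)
    for level in range(num_attrs, 1, -1):
        at_level = [m for m in result if bin(m).count("1") == level]
        for mask in at_level:
            if result[mask] >= min_count:
                continue
            options = parent_masks(mask)
            if strategy == "random":
                parent = rng.choice(options)
            else:
                # Highest count combination: parent with the largest count so far.
                parent = max(options, key=lambda p: result.get(p, 0))
            result[parent] = result.get(parent, 0) + result.pop(mask)
    return result

# Hypothetical counts over three attributes (bit order A1 A2 A3).
counts = {0b101: 1, 0b100: 5, 0b001: 2, 0b110: 7}
compressed = compress(counts, num_attrs=3, min_count=3)
print({format(m, "03b"): c for m, c in compressed.items()})
```

Here pattern 101 falls below the minimum count and is folded into parent 100, whose count rises from 5 to 6. Both strategies preserve the total number of requests; they differ only in which parent absorbs the low-count pattern.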

CDIA Example
[Figure: Level 4, Level 3, Level 2, Level 1; shown before compression and after compression.]
Locates the optimal index configuration.

AMRI Framework
[Figure: AMR Query Executor (Eddy; Streams A, B, C; STeMs; States A, B, C; bit-address index; results) and AMR Online Index Tuner (Index Assessor, Index Selector). Labels between the two: access pattern statistics; index configuration.]

Experiments

Experimental Setup
- Testing system: CAPE*, a prototype continuous query engine
- Testing machine: 3 GHz Intel® Pentium-IV, 1 GB RAM, Windows XP, Java 1.5.0_06 SDK
- Design: 4-way join query across 4 data streams; the IC on each state uses 64 bits; maximum error = 5% and threshold = 10%
* E. A. Rundensteiner et al., CAPE: Continuous Query Engine with Heterogeneous-Grained Adaptivity (VLDB Demo, 2004).

Summary of Experimental Results
- CDIA using highest count compression produced on average 19% more results (cumulative throughput) than both DIA and SRIA, and 30% more results than CSRIA, over the same period of time.
- AMRI produced on average 93% more results (cumulative throughput) than the current indexing approach and 75% more results than the bitmap indexing approach over the same period of time.

Conclusion
- We developed the first customized Adaptive Multi-Route Index (AMRI) for AMR systems.
- We proposed 4 customized assessment methods for AMR systems (SRIA, CSRIA, DIA, and CDIA).
- Our experiments demonstrate the overall effectiveness of AMRI at improving throughput in dynamic stream environments compared to the state-of-the-art approach.

Thank you!
Welcome to the DSRG website: http://davis.wpi.edu/dsrg/
