A Software-Defined Networking based Approach for Performance Management of Analytical Queries on Distributed Data Stores Pengcheng Xiong (NEC Labs America)


1 A Software-Defined Networking based Approach for Performance Management of Analytical Queries on Distributed Data Stores. Pengcheng Xiong (NEC Labs America), Hakan Hacigumus (NEC Labs America), Jeffrey F. Naughton (Univ. of Wisconsin)

2 Agenda. Why? Motivation and background. How? System architecture and implementation. So what? Real system and benchmark query evaluation. Conclusion.

3 Motivation. Data analytics applications and data scientists query data from distributed stores. This puts a huge amount of data traffic on the network (e.g., joins). Many applications want to share a cluster (data backup, video streaming, etc.). Response time is critical (deadline-driven reports). Queries need service differentiation (batch queries, interactive queries).

4 An example query (TPC-H Q14). We assume that tables are distributed across relational data stores, and that the data stores are connected by a network.

5 Network change implies plan perf. change. [Figure: Phase 1, Phase 2, Phase 3; network status changes.] (1) Huge gap. (2) The best plan can become the worst one.

6 What if? What if the query optimizer could dynamically monitor the network bandwidth and adaptively choose a plan? [Figure: Phase 1, Phase 2, Phase 3.] An adaptive plan is chosen and query execution time is kept short.
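The adaptive choice on this slide can be illustrated with a small sketch. It is not the authors' optimizer: the plan names, the per-plan numbers, and the cost formula (local processing time plus shipped data divided by the monitored bandwidth) are assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Plan:
    name: str             # hypothetical plan label
    local_seconds: float  # assumed time spent on local processing
    shipped_mb: float     # assumed data volume moved between stores


def estimated_seconds(plan: Plan, bandwidth_mbps: float) -> float:
    """Local time plus transfer time at the monitored bandwidth."""
    if bandwidth_mbps <= 0:
        return float("inf")
    return plan.local_seconds + plan.shipped_mb * 8 / bandwidth_mbps


def choose_plan(plans: Sequence[Plan], bandwidth_mbps: float) -> Plan:
    """Pick the candidate with the lowest estimated time."""
    return min(plans, key=lambda p: estimated_seconds(p, bandwidth_mbps))


# Two made-up candidates: one ships a lot of data, one ships little.
plans = [
    Plan("ship_big_table", local_seconds=10, shipped_mb=1000),
    Plan("ship_small_table", local_seconds=60, shipped_mb=100),
]

for bandwidth in (800, 80):  # Mbps values a monitor might report
    best = choose_plan(plans, bandwidth)
    print(bandwidth, best.name, estimated_seconds(best, bandwidth))
```

With these illustrative numbers, the plan that ships more data wins on a fast link and loses on a slow one, which is the "best plan can become the worst one" effect in miniature.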

7 Network busy implies no good plan. User: "Run query right now and right away. I need that ASAP to catch my deadline!" Distributed DBMS: "Well… I am sorry. None of the candidate plans can meet your deadline due to current busy network status."

8 What if? What if the query optimizer could control the network? User: "Run query right now and right away. I need that ASAP to catch my deadline!" Distributed DBMS: "OK. Although the network is currently busy, I can control it to prioritize bandwidth for the query."
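One way to picture this exchange is as a deadline check followed by a bandwidth request. The sketch below is an assumption-laden illustration, not the system's actual logic: the function names, the simple "local time plus transfer time" model, and all parameters are hypothetical.

```python
from typing import Optional


def required_bandwidth_mbps(shipped_mb: float, local_seconds: float,
                            deadline_seconds: float) -> Optional[float]:
    """Bandwidth needed to finish by the deadline, or None if impossible."""
    transfer_budget = deadline_seconds - local_seconds
    if transfer_budget <= 0:
        return None  # local work alone already misses the deadline
    return shipped_mb * 8 / transfer_budget


def reservation_to_request(shipped_mb: float, local_seconds: float,
                           deadline_seconds: float, observed_mbps: float,
                           link_capacity_mbps: float) -> Optional[float]:
    """Return the Mbps to reserve, 0.0 if none is needed, None if infeasible."""
    needed = required_bandwidth_mbps(shipped_mb, local_seconds, deadline_seconds)
    if needed is None or needed > link_capacity_mbps:
        return None  # not achievable even with the whole link
    if needed <= observed_mbps:
        return 0.0   # the busy network is still fast enough
    return needed


# A plan that ships 1000 MB, needs 10 s locally, and has a 30 s deadline
# while only 100 Mbps is currently observed on a 1000 Mbps link.
print(reservation_to_request(1000, 10, 30, 100, 1000))
```

How such a request would be handed to an SDN controller is outside this sketch.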

9 Distributed query optimizer monitors and controls the network?

10 Sounds like mission impossible. A database always treats the underlying network as a black box: it is unable to monitor the network, let alone control it. With software-defined networking (SDN), it can inquire about the current status of the network, or control the network with directives. [Diagram: Networking: unable to monitor, let alone to control. With SDN: able to inquire and control.]

11 Sounds interesting, but how? [Diagram: Ethernet Switch/Router.]

12 [Diagram: Data Path (Hardware); Control Path (Software).]

13 [Diagram: Data Path (Hardware); Control Path; OpenFlow; OpenFlow Controller; OpenFlow Protocol (SSL/TCP); Dist. Query Optimizer; API; Our contribution.]

14 System architecture

15 System implementation: NEC PFS5240

16 Plan generation. [Diagram labels: Stores lineitem table; Stores part table.]

17 Cost estimation. Cost model for network operator: amount of data transferred; real-time transfer speed. (Monitor) Take any bandwidth left. (Control) Assign the highest priority; make a bandwidth reservation.
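The slide gives only the ingredients of the network-operator cost, so the following is a minimal sketch under stated assumptions rather than the authors' model. It assumes cost is data volume divided by an effective transfer speed, and that the three options map to leftover bandwidth, full link capacity (highest priority), and a reserved rate. The `Link` type, the mode names, and the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Link:
    capacity_mbps: float    # nominal link capacity
    background_mbps: float  # competing traffic observed by the monitor


def effective_bandwidth(link: Link, mode: str, reserved_mbps: float = 0.0) -> float:
    """Assumed transfer speed for each monitor/control option."""
    if mode == "leftover":     # monitor only: take any bandwidth left
        return max(link.capacity_mbps - link.background_mbps, 0.0)
    if mode == "priority":     # control: assumed to get the whole link
        return link.capacity_mbps
    if mode == "reservation":  # control: assumed to get the reserved rate
        return min(reserved_mbps, link.capacity_mbps)
    raise ValueError(f"unknown mode: {mode}")


def transfer_seconds(data_mb: float, bandwidth_mbps: float) -> float:
    """Network operator cost: data transferred / transfer speed."""
    if bandwidth_mbps <= 0:
        return float("inf")
    return data_mb * 8 / bandwidth_mbps


link = Link(capacity_mbps=1000, background_mbps=600)
for mode in ("leftover", "priority", "reservation"):
    speed = effective_bandwidth(link, mode, reserved_mbps=800)
    print(mode, speed, transfer_seconds(500, speed))
```

Treating the highest priority as "the whole link" is a simplification; on a real switch the result depends on how the queues are configured.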

18 Evaluation. Setup: TPC-H, scale factor 100, Q14. Small tables (supplier, nation, region) are replicated. Other tables are placed at a single data store site. Neighbor traffic generator: iperf. Summary of case studies.

19 Case 1: single user, single-thread, iperf. [Figure: Phase 1, Phase 2, Phase 3; Bottleneck.] Based on SDN, the query optimizer can dynamically monitor the network bandwidth and adaptively choose the best plan.

20 Case 3: multiple users, multi-thread, no contention traffic, priority queue. Based on SDN, premium queries run faster than regular ones. Based on SDN, all queries run faster.

21 Case 5: single user, multi-thread, iperf, weighted-fair queue. Based on SDN, a larger reservation makes queries run faster.
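Why a larger reservation helps can be seen from an idealized weighted-fair-queue model, sketched below. This is a textbook simplification, not the switch's actual scheduler: it assumes every flow always has data to send, so each flow's share is its weight divided by the sum of weights. The flow names, weights, and data size are made up.

```python
from typing import Dict


def weighted_fair_shares(capacity_mbps: float,
                         weights: Dict[str, float]) -> Dict[str, float]:
    """Split link capacity in proportion to each flow's weight."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return {flow: capacity_mbps * w / total for flow, w in weights.items()}


def transfer_seconds(data_mb: float, bandwidth_mbps: float) -> float:
    """Time to move data_mb megabytes at bandwidth_mbps megabits/s."""
    return data_mb * 8 / bandwidth_mbps


# Hypothetical 1000 Mbps link shared by a query flow and iperf traffic.
for query_weight in (1, 3, 9):
    shares = weighted_fair_shares(1000, {"query": query_weight, "iperf": 1})
    print(query_weight, shares["query"], transfer_seconds(1000, shares["query"]))
```

In this model the query's share grows from 500 to 900 Mbps as its weight rises from 1 to 9, so its transfer time shrinks accordingly.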

22 Conclusion. SDN can be effectively exploited for performance management of analytical queries on distributed data stores: directly monitor the network and adaptively pick the best plan; control the priority of network traffic or make network bandwidth reservations to differentiate the query service. Lots of opportunities.

23 Thanks!

