HStorage-DB: Heterogeneity-aware Data Management to Exploit the Full Capability of Hybrid Storage Systems Tian Luo Rubao Lee Xiaodong Zhang Michael Mesnier.

Presentation transcript:

hStorage-DB: Heterogeneity-aware Data Management to Exploit the Full Capability of Hybrid Storage Systems Tian Luo Rubao Lee Xiaodong Zhang Michael Mesnier Feng Chen The Ohio State University Intel Labs

2 Heterogeneous Storage Resources vs. Diverse QoS Requirements of DB Requests
Storage advances provide us with
– High-capacity, low-cost, but slow hard disk devices (HDD)
– Fast, low-power, but expensive solid state devices (SSD)
– HDD and SSD co-exist due to their unique merits and limits
DB requests have diverse QoS requirements
– Different access patterns: bandwidth/latency demands
– Different priorities of data processing requests
– Dynamic changes of requirements
Hybrid storage can satisfy the diverse QoS requirements of DB requests
– It should be automatic and adaptive, with low overhead
– But there are challenges

3 Challenges for Hybrid Storage Systems to Satisfy Diverse QoS Requirements
DBMS (What I/O services do I need as a storage user?)
– Classification of I/O requests into different types
– hStorage awareness
– DBMS enhancements to utilize classifications automatically
hStorage (What can I do for you as a service provider?)
– Clear definition of supported QoS classifications
– Hide device details from the DBMS
– Efficient data management among heterogeneous devices
Communication between DBMS and hStorage
– Rich information to deliver, but limited by interface abilities
– Need a standard, general-purpose protocol

4 Current interface to access storage
read/write(int fd, void *buf, size_t count);
– fd: on-disk location
– buf: in-memory data
– count: request size
This interface cannot tell the storage system the QoS requirement of each request, so we must take other approaches.

5 DBA-based Approach
DBAs decide data placement among heterogeneous devices based on experience
Limitations:
– Significant human effort: requires expertise in both DB and storage
– Large granularity, e.g. table/partition-based data placement
– Static storage layout: tuned for the “common” case, so it cannot respond well to execution dynamics
[Figure: a DBMS with indexes and other data placed across SSD and HDD]

6 Monitoring-based Solutions
Storage systems automatically make data placement and replacement decisions by monitoring access patterns
– LRU (a basic structure), LIRS (MySQL), ARC (IBM I/O controller)
– Examples from industry: Solid State Hybrid Drive (Seagate), Easy Tier (IBM)
Limitations:
– Takes time to recognize access patterns, so it is hard to handle dynamics in short periods
– With concurrency, access patterns cannot be easily detected
– Certain critical insights are not related to access patterns: domain information (available from the DBMS) is not utilized

7 What information from the DBMS can we use?
System catalog
– Data type: index, regular table
– Ownership of data sets: e.g. VIP user, regular user
Query optimizer
– Order of operations and access paths
– Estimated frequency of accesses to related data
Query planner
– Access patterns
Execution engine
– Life cycles of data usage
Together, these are unorganized semantic information about I/O requests

8 DBMS Knowledge is not Well Utilized
[Figure: DBMS components (Query Optimizer, System Catalog, Execution Engine, …) → Request → Buffer Pool Manager → Storage Manager → I/O Request → Storage]
Block interface: r/w, LBN, data, size
Does not consider critical semantic information for storage management

9 Goal: organize/utilize DBMS semantic information
[Figure: a semantic gap separates the DBMS from storage. DBMS side: buffer pool, query optimizer, checkpoint, vacuum, background processes, connection pool (User1, User2, …). Access patterns: sequential, random, repeated scan. Data types: system table, index, user table, temporary data]
The mission of hStorage-DB is to fill this gap.

10 hStorage-DB: DBMS for hStorage
Objectives:
– Automatic system management
– High performance
Utilize available domain knowledge within the DBMS for storage I/O
Fine-grained data management (block granularity)
Respond well to the dynamics of DB requests with different QoS requirements
System Design Outline
– An hStorage system specifies a set of QoS policies
– At runtime, the DBMS selects the needed policy for each I/O request based on the organized semantic information
– I/O requests and their QoS policies are passed to the hStorage system
– The hStorage system takes data placement actions accordingly

11 Outline
– Introduction
– hStorage-DB
– Caching priority of each I/O request
– Evaluation

12 Structure of hStorage-DB
[Figure: the Query Optimizer, Query Planner, Execution Engine, … pass each request plus semantic information to the Buffer Pool Manager and Storage Manager; a policy assignment table maps Info 1 … Info N to a QoS policy; the I/O request plus its QoS policy goes to the storage system, whose control logic manages SSD … HDD]

13 Highlights of hStorage-DB
Policy assignment table
– Stores all the rules that assign a QoS policy to each I/O request
– Assignments are based on organized DB semantic information
Communication between a DBMS and hStorage
– The QoS policy for each I/O request is delivered to an hStorage system via the “Differentiated Storage Services” protocol (SOSP’11)
– The hStorage system takes action accordingly

14 The Interface Used in hStorage-DB
fd = open("foo", O_RDWR|O_CLASSIFIED, 0666);   /* open with a flag */
qos = 19;
myiov[0].iov_base = &qos;                      /* QoS policy of this request */
myiov[0].iov_len = 1;
myiov[1].iov_base = "Hello, world!";           /* payload */
myiov[1].iov_len = 13;
writev(fd, myiov, 2);
The QoS policy is delivered with the payload
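
The call sequence on this slide can be wrapped in a small helper with error handling. This is a minimal sketch, not the authors' code: the function name `classified_write` and the added `O_CREAT | O_TRUNC` flags are assumptions for illustration, and `O_CLASSIFIED` is not defined in stock system headers, so the fallback value of 0 only lets the sketch compile and does not enable classification.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* O_CLASSIFIED is named on the slide but is absent from stock headers.
 * The 0 fallback is an assumption so that the sketch compiles; on an
 * unmodified kernel the QoS byte is then written as ordinary file data. */
#ifndef O_CLASSIFIED
#define O_CLASSIFIED 0
#endif

/* Hypothetical helper: write `len` bytes of `data` to `path`, with a
 * one-byte QoS policy in iovec slot 0 and the payload in slot 1.
 * Returns the writev() result, or -1 if the file cannot be opened. */
ssize_t classified_write(const char *path, unsigned char qos,
                         const void *data, size_t len)
{
    struct iovec iov[2];
    ssize_t n;
    int fd;

    assert(path != NULL && data != NULL);

    /* O_CREAT and O_TRUNC are additions so the sketch works on a fresh file. */
    fd = open(path, O_RDWR | O_CREAT | O_TRUNC | O_CLASSIFIED, 0666);
    if (fd < 0)
        return -1;

    iov[0].iov_base = &qos;          /* QoS policy of this request */
    iov[0].iov_len  = 1;
    iov[1].iov_base = (void *)data;  /* payload */
    iov[1].iov_len  = len;

    n = writev(fd, iov, 2);          /* policy and payload travel together */
    close(fd);
    return n;
}
```

Calling `classified_write("/tmp/hstorage_demo.bin", 19, "Hello, world!", 13)` on an unmodified system returns 14, because the policy byte and the 13 payload bytes are gathered into one write.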

15 QoS Policies
They are high-level abstractions of hStorage systems
– Hide device complexities
– Match resource characteristics
QoS policy examples:
– High bandwidth (parallelism in SSD/disk array)
– Low latency for random accesses (SSD)
– Low latency for large sequential accesses (HDD)
– Reliability (data duplication)
For a caching system, the policies are caching priorities: Priority 1, Priority 2, …, Bypass

16 Outline
– Introduction
– Design of hStorage-DB
– Caching priority for each I/O request
– Evaluation

17 Caching Priorities as QoS Policies
Priorities are enumerated
– E.g. 1, 2, 3, …, N
– Priority 1 is the highest priority
Data from high-priority requests can evict data cached for low-priority requests
Special “priorities”
– Bypass: requests with this priority do not affect in-cache data
– Eviction: data accessed by requests with the eviction “priority” is immediately evicted from the cache
– Write buffer
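
To make the priority rules concrete, here is a minimal sketch of a tiny cache that honors numbered priorities plus the Bypass and Eviction policies. All names (`cache_access`, `POLICY_BYPASS`, `POLICY_EVICT`, the 4-slot size) are hypothetical, the write buffer is left out because the slide gives no detail on it, and the tie-breaking choices (a hit adopts the priority of the new request, equal priority may replace) are assumptions, not the authors' design.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_SLOTS   4      /* tiny cache, for illustration only */
#define POLICY_BYPASS (-1)   /* hypothetical encodings of the special policies */
#define POLICY_EVICT  (-2)

struct slot  { int valid; long block; int prio; };
struct cache { struct slot slots[CACHE_SLOTS]; };

/* Return the slot holding `block`, or NULL if it is not cached. */
static struct slot *find(struct cache *c, long block)
{
    for (size_t i = 0; i < CACHE_SLOTS; i++)
        if (c->slots[i].valid && c->slots[i].block == block)
            return &c->slots[i];
    return NULL;
}

/* Process one request. `policy` is a priority >= 1 (1 is the highest),
 * POLICY_BYPASS or POLICY_EVICT. Returns 1 on a cache hit, 0 on a miss. */
int cache_access(struct cache *c, long block, int policy)
{
    struct slot *s, *victim = NULL;

    assert(c != NULL);
    assert(policy >= 1 || policy == POLICY_BYPASS || policy == POLICY_EVICT);

    s = find(c, block);
    if (policy == POLICY_EVICT) {        /* drop the block right away */
        if (s != NULL)
            s->valid = 0;
        return s != NULL;
    }
    if (policy == POLICY_BYPASS)         /* leave in-cache data untouched */
        return s != NULL;
    if (s != NULL) {                     /* hit: adopt the new priority (assumption) */
        s->prio = policy;
        return 1;
    }

    /* Miss: prefer a free slot, else the lowest-priority (largest number) slot. */
    for (size_t i = 0; i < CACHE_SLOTS; i++) {
        struct slot *cand = &c->slots[i];
        if (!cand->valid) { victim = cand; break; }
        if (victim == NULL || cand->prio > victim->prio)
            victim = cand;
    }
    /* Everything cached has strictly higher priority: do not admit the block. */
    if (victim->valid && victim->prio < policy)
        return 0;

    victim->valid = 1;
    victim->block = block;
    victim->prio  = policy;
    return 0;
}
```

A real cache would also need an ordering within one priority level (for example LRU); the sketch simply replaces the first lowest-priority slot it finds.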

18 From Semantic Information to Caching Priorities
Principles:
1. Possibility of data reuse: no reuse, no cache
2. Benefit from cache: no benefit, no cache (repeated scan)
Methodology:
1. Classify requests into different types (focus on OLAP)
– Sequential access
– Random access
– Temporary data requests
– Update requests
2. Associate each type with a caching priority
– Some types are further divided into subtypes
3. The hStorage system makes placement decisions accordingly upon receiving each I/O request

19 Policy Assignment Table
[Figure: table assigning each request type (sequential accesses, random accesses, temporary data accesses, temporary data delete, updates) to a policy (Priority 1, Priority 2, …, Priority N, Bypass, Eviction, Write Buffer)]

20 Random Requests
Priority is determined by the operator's position in the query plan tree
Follows the iteration model
[Figure: query plan tree of joins over index scans on t.a, t.b and t.c and a sequential scan; operators are labeled Priority 2, Priority 4 and Bypass]
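
A policy assignment table of this kind can be sketched as a lookup function. The transcript does not preserve which request type maps to which policy, so the mapping below is an assumption chosen for illustration, as are the names `assign_policy` and `LOWEST_PRIORITY`, the formula `2 + plan_level` for random requests, and the rule that levels beyond the lowest priority fall back to Bypass.

```c
#include <assert.h>

/* Request types from the classification slides. */
enum req_type {
    REQ_SEQUENTIAL,
    REQ_RANDOM,
    REQ_TEMP_DATA,
    REQ_TEMP_DELETE,
    REQ_UPDATE
};

/* Hypothetical encodings; positive values are caching priorities. */
#define POLICY_BYPASS       (-1)
#define POLICY_EVICT        (-2)
#define POLICY_WRITE_BUFFER (-3)
#define LOWEST_PRIORITY     4    /* "N"; the value 4 is an assumption */

/* Map a request type to a policy. `plan_level` is the position of the
 * issuing operator in the query plan tree (0 = assumed most cache-worthy)
 * and is only consulted for random requests. The mapping is illustrative. */
int assign_policy(enum req_type type, int plan_level)
{
    assert(plan_level >= 0);

    switch (type) {
    case REQ_SEQUENTIAL:  return POLICY_BYPASS;        /* no benefit, no cache */
    case REQ_TEMP_DATA:   return 1;                    /* keep for its lifetime */
    case REQ_TEMP_DELETE: return POLICY_EVICT;         /* lifetime has ended */
    case REQ_UPDATE:      return POLICY_WRITE_BUFFER;
    case REQ_RANDOM: {
        int prio = 2 + plan_level;                     /* assumed formula */
        return prio <= LOWEST_PRIORITY ? prio : POLICY_BYPASS;
    }
    }
    return POLICY_BYPASS;                              /* unknown type */
}
```

With these assumptions, `assign_policy(REQ_RANDOM, 0)` yields priority 2, `assign_policy(REQ_RANDOM, 2)` yields priority 4, and any deeper level falls back to Bypass.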

21 Concurrent Queries
Concurrent queries may access the same object
– This makes the priority of random requests non-deterministic, because each query may have a different query plan tree
Solution
– A data structure that “aggregates” all concurrent query plan trees
– The data structure is updated at the start and end of each query
– Each of the concurrent queries is assigned a QoS policy based on analysis of this structure

22 Outline
– Introduction
– Design of hStorage-DB
– Caching priority for each I/O request
– Evaluation

23 Experimental setup
Dual-machine setup (with 10GB Ethernet)
– A DBMS: hStorage-DB based on PostgreSQL
– A dedicated storage system, with an SSD cache
Configuration
– Xeon, 2-way, quad-core 2.33GHz, 8GB RAM
– 2 Seagate 15.7K rpm HDD
– SSD cache: Intel 320 Series 300GB (use 32GB)
Workload
– (46GB with 7 indexes)

24 Diverse Request Types in TPC-H
Most queries are dominated by sequential requests
Queries 2, 8, 9, 20, 21 have a large number of random requests
Query 18 has a large number of temporary data requests

25 No overhead for cache-insensitive queries
Current SSDs cannot speed up these queries
Caching may harm performance (LRU)
hStorage-DB does not incur overhead for sequential requests

26 Working Well for Cache-Effective Queries
Random requests benefit from SSD
High locality can be captured by the traditional LRU
hStorage-DB achieves high performance without monitoring efforts
[Figure labels: 5.77X, 5.86X, 7.19X]

27 Efficiently Handling Temporary Data Requests
hStorage-DB:
– Temporary data is cached for as long as its lifetime, and evicted immediately at the end of its lifetime
– The lifetime is hard to detect if it is not conveyed semantically
[Figure labels: 1.49X, 1.46X, 1.03X]

28 Concurrency (Throughput)
[Figure panels: performance in concurrency; performance in independent execution]

29 Summary
A DBMS could exploit organized semantic information
A DBMS should be hStorage-aware (QoS policies)
A set of rules determines the QoS policy (caching priority) for each I/O request
Experiments on hStorage-DB show its effectiveness

30 Thank you! Questions?