Presented by Vigneshwar Raghuram

Slides:



Advertisements
Similar presentations
HANA database lectures March ©2013 SAP AG or an SAP affiliate company. All rights reserved.2 Outline Part 1 Motivation - Why main memory processing.
Advertisements

Outline What is a data warehouse? A multi-dimensional data model Data warehouse architecture Data warehouse implementation Further development of data.
HadoopDB Inneke Ponet.  Introduction  Technologies for data analysis  HadoopDB  Desired properties  Layers of HadoopDB  HadoopDB Components.
Transaction.
6.814/6.830 Lecture 8 Memory Management. Column Representation Reduces Scan Time Idea: Store each column in a separate file GM AAPL.
Final Exam Coverage. E/R Converting E/R to Relations. SQL. –Joins and outerjoins –Subqueries –Aggregations –Views –Inserts, updates, deletes –Ordering.
Xyleme A Dynamic Warehouse for XML Data of the Web.
Chapter 3 Database Management
Physical Database Monitoring and Tuning the Operational System.
Chapter 17 Methodology – Physical Database Design for Relational Databases Transparencies © Pearson Education Limited 1995, 2005.
Copying, Managing, and Transforming Data With DTS.
Copyright © 2013, Oracle and/or its affiliates. All rights reserved. 1 Preview of Oracle Database 12 c In-Memory Option Thomas Kyte
Cloud Computing Lecture Column Store – alternative organization for big relational data.
IT The Relational DBMS Section 06. Relational Database Theory Physical Database Design.
Chapter Oracle Server An Oracle Server consists of an Oracle database (stored data, control and log files.) The Server will support SQL to define.
Context Tailoring the DBMS –To support particular applications Beyond alphanumerical data Beyond retrieve + process –To support particular hardware New.
Task Scheduling for Highly Concurrent Analytical and Transactional Main-Memory Workloads Iraklis Psaroudakis (EPFL), Tobias Scheuer (SAP AG), Norman May.
Introduction to Databases A line manager asks, “If data unorganized is like matter unorganized and God created the heavens and earth in six days, how come.
Lecture 9 Methodology – Physical Database Design for Relational Databases.
1 Oracle Database 11g – Flashback Data Archive. 2 Data History and Retention Data retention and change control requirements are growing Regulatory oversight.
September 2011Copyright 2011 Teradata Corporation1 Teradata Columnar.
Lecture Set 14 B new Introduction to Databases - Database Processing: The Connected Model (Using DataReaders)
Workload-Aware Aggregate Maintenance in Columnar In-Memory Databases Stephan Müller, Lars Butzmann, Stefan Klauck, Hasso Plattner 2013 IEEE International.
Massively Distributed Database Systems - Distributed DBS Spring 2014 Ki-Joune Li Pusan National University.
Sensor Database System Sultan Alhazmi
DATABASE MGMT SYSTEM (BCS 1423) Chapter 5: Methodology – Conceptual Database Design.
1 Data Warehouses BUAD/American University Data Warehouses.
Daniel J. Abadi · Adam Marcus · Samuel R. Madden ·Kate Hollenbach Presenter: Vishnu Prathish Date: Oct 1 st 2013 CS 848 – Information Integration on the.
Efficiently Processing Queries on Interval-and-Value Tuples in Relational Databases Jost Enderle, Nicole Schneider, Thomas Seidl RWTH Aachen University,
1 CS 430 Database Theory Winter 2005 Lecture 16: Inside a DBMS.
MIS2502: Data Analytics The Information Architecture of an Organization.
1 CS 430 Database Theory Winter 2005 Lecture 2: General Concepts.
Lecture Set 14 B new Introduction to Databases - Database Processing: The Connected Model (Using DataReaders)
The Replica Location Service The Globus Project™ And The DataGrid Project Copyright (c) 2002 University of Chicago and The University of Southern California.
The Volcano Optimizer Generator Extensibility and Efficient Search.
Efficient RDF Storage and Retrieval in Jena2 Written by: Kevin Wilkinson, Craig Sayers, Harumi Kuno, Dave Reynolds Presented by: Umer Fareed 파리드.
Supporting Large-scale Social Media Data Analyses with Customizable Indexing Techniques on NoSQL Databases.
Introduction.  Administration  Simple DBMS  CMPT 454 Topics John Edgar2.
Chapter 8 Physical Database Design. Outline Overview of Physical Database Design Inputs of Physical Database Design File Structures Query Optimization.
Session 1 Module 1: Introduction to Data Integrity
© 2003 Prentice Hall, Inc.3-1 Chapter 3 Database Management Information Systems Today Leonard Jessup and Joseph Valacich.
Last Updated : 27 th April 2004 Center of Excellence Data Warehousing Group Teradata Performance Optimization.
Chapter 1 Database Access from Client Applications.
SSIS – Deep Dive Praveen Srivatsa Director, Asthrasoft Consulting Microsoft Regional Director | MVP.
1 Information Retrieval and Use De-normalisation and Distributed database systems Geoff Leese September 2008, revised October 2009.
REX: RECURSIVE, DELTA-BASED DATA-CENTRIC COMPUTATION Yavuz MESTER Svilen R. Mihaylov, Zachary G. Ives, Sudipto Guha University of Pennsylvania.
October 15-18, 2013 Charlotte, NC Accelerating Database Performance Using Compression Joseph D’Antoni, Solutions Architect Anexinet.
An Overview of Data Warehousing and OLAP Technology
What Should a DBMS Do? Store large amounts of data Process queries efficiently Allow multiple users to access the database concurrently and safely. Provide.
Oracle Announced New In- Memory Database G1 Emre Eftelioglu, Fen Liu [09/27/13] 1 [1]
Bigtable A Distributed Storage System for Structured Data.
Data Integrity & Indexes / Session 1/ 1 of 37 Session 1 Module 1: Introduction to Data Integrity Module 2: Introduction to Indexes.
Intro to MIS – MGS351 Databases and Data Warehouses
Indexes By Adrienne Watt.
Methodology – Physical Database Design for Relational Databases
Physical Database Design for Relational Databases Step 3 – Step 8
Data Warehouse.
Paritosh Aggarwal Rushi Nadimpally
Data, Databases, and DBMSs
SQL 2014 In-Memory OLTP What, Why, and How
Database management concepts
KISS-Tree: Smart Latch-Free In-Memory Indexing on Modern Architectures
MIS2502: Data Analytics The Information Architecture of an Organization Acknowledgement: David Schuff.
View and Index Selection Problem in Data Warehousing Environments
Introduction to SAP HANA
Database management concepts
Data Warehousing Concepts
ANTOL, HOLIČ, KEDA PA195 SPRING 2016
Presentation transcript:

Presented by Vigneshwar Raghuram 4/15/2017 8:00 PM Efficient Transaction Processing in SAP HANA Database – The End of a Column Store Myth - Vishal Sikka, Franz Farber, Wolfgang Lehner, Sang Kyun Cha, Thomas Peh, Christof Bornhovd Presented by Vigneshwar Raghuram © 2007 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

Outline Motivation Architecture of SAP HANA Lifecycle Management of Database Records Merge Optimization Conclusion

Motivation Usage perspective -various types of workloads and usage patterns OLTP - high concurrency, frequent updates, and selective point queries OLAP - long transactions, infrequent updates, aggregation queries, and historical data Zoo of specialized systems Complex and error-prone High total cost of ownership (TCO) Used for performance

SAP HANA Appliance At a Glance Replace the zoo of specialized systems with a flexible platform

Features of SAP HANA database Has a girl's name (Hanna) Comprises multiple engines from relational data to graphs to unstructured text data Supports application-specific business objects and logic directly Communicates with the application layer efficiently Supports efficient processing for both OLTP and OLAP workloads

Outline Motivation Architecture of SAP HANA Lifecycle Management of Database Records Merge Optimization Conclusion

Architecture of SAP HANA

Calculation Graph Model An internal representation of query is mapped to a Calculation Graph Source nodes - table structures or outcome of other calc graphs Inner nodes - logical operators Operators Intrinsic operators like projection, joins, union etc Business algorithms like currency conversion Dynamic SQL nodes, custom nodes, R nodes, and L nodes Split and combine

Calculation Graph Model - Example

Outline Motivation Architecture of SAP HANA Lifecycle Management of Database Records Merge Optimization Conclusion

Lifecycle Management of Database Records

L1-delta Storage L1-delta Storage Accepts all incoming data requests Stores records in row format (write-optimized) No data compression Holds 10,000 to 100,000 rows per single-node

L2-delta Storage Accepts bulk inserts Stores records in column format (an index vector) Uses dictionary encoding for better memory usage Unsorted dictionary CSB-Tree based secondary index for point access Inverted index mapping value IDs to positions

Main Store Stores records in column format Employs a sorted dictionary Highest compression rate Positions in the dictionary are stored in a bit-packed manner Dictionary is also compressed using RLE and other techniques

Unified Table Access A common abstract interface to access different stores Records are propagated asynchronously Two transformations between stores called merge steps

L1-to-L2-delta Merge Row format to column format conversion Merge Steps Appending new entries to the dictionary (in parallel) Storing column values using the dictionary encodings (in parallel) Removing propagated entries from the L1-delta

Merge from L1-delta to L2-delta A straightforward task The first two steps can be performed in parallel L2-delta data structures are not reconstructed Incremental merge Minimally invasive to running transactions

L2-delta-to-main Merge A resource intensive task The old L2-delta is closed for updates A new empty L2-delta is created A new main structure is created The merge is retried on failures

Persistency Mapping No fine-grained UNDO mechanisms Using REDO logs for new data in L1- and L2-delta and the event of merge Propagating pages that contain data structures in L2- delta to persistent storage at next savepoint Storing a new version of the main store on the persistent storage

Outline Motivation Architecture of SAP HANA Lifecycle Management of Database Records Merge Optimization Conclusion

Merge Optimization The classic merge needs optimization L2-delta to main merge is resource intensive Main store needs high compression rate Re-sorting merge: higher compression rate Partial merge: reduce overhead of merge

Classic Merge

Re-Sorting Merge Individual columns are re-sorted to gain higher compression rate A mapping table of row positions is added to reconstruct the row Sort order of columns are based on statistics from main and L2-delta

Re-Sorting Merge

Partial Merge Reduce merge overhead due to a large table size Split the main into two independent structures Passive main - not part of the merge process Active main - takes part in the merge process with the L2-delta - only holds new values not in the passive main Accesses are resolved in both dictionaries and parallel scans are performed on both structures

Partial Merge

Conclusion The HANA database is the core of SAP application ecosystem a main-memory database that efficiently supports both OLTP and OLAP consisting of different states of data structures but providing a common interface optimized for memory requirements and query processing