Next-Generation Databases Miguel Branco on behalf of the RAW team.

Presentation transcript:

Next-Generation Databases Miguel Branco on behalf of the RAW team

Trends
More complex hardware
- Multicores, GPUs, Cloud, NUMA*, PoP+SoC**, …
More complex questions
- “Last month sales” → “Next month sales”
More complex apps
- Distributed, Service-oriented, Rack-aware, ...
More data analysts
- Ease of use, Interactivity, Collaboration, ...
More data
- Volume, File Formats, ...
* Non-uniform memory architectures
** Package on Package, System on a Chip

No data loading
- No “physical” data copy: support existing file formats
No database tuning
- Instead, self-tuned based on actual usage patterns
Not restricted to tables
- Add support for trees, vectors, matrices, …
Not just SQL
- Instead, enable domain-specific languages

Traditional Database
Data adapts to the query engine
[Diagram labels: DBMS, SQL, CSV, XML, JSON]

RAW
Query engine adapts to the data
[Diagram labels: DBMS, SQL, CSV, XML, JSON, RAW lang, “DSL”]

How RAW adapts to data
Inputs:
- CSV … containing “good” run numbers
- ROOT … containing physics events
Query:
SELECT event.jet…
FROM csv, root
WHERE csv.RunNumber = root.RunNumber
AND root.EF_2mu13 == TRUE
AND …
Plan operators: join, scan root, scan csv, filter
Steps:
- Code Generate the Access Paths
- Code Generate the Query
- Build Position and Data Caches
Adapt to format, file instance and query just-in-time
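The slide names position caches but does not say how they are laid out. As one possible reading, the sketch below records the byte offset of each row while a text file is scanned once, so a later query can jump straight to a row instead of re-tokenizing everything before it. The names `buildRowPositions` and `rowAt` are hypothetical and are not taken from RAW.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Assumption: the "position cache" is simply the byte offset at which each row starts.
std::vector<std::size_t> buildRowPositions(const std::string& data) {
    std::vector<std::size_t> positions;
    std::size_t start = 0;
    while (start < data.size()) {
        positions.push_back(start);
        const std::size_t newline = data.find('\n', start);
        if (newline == std::string::npos) break;  // last row has no trailing newline
        start = newline + 1;
    }
    return positions;
}

// Returns row `index` (0-based) by jumping to its cached offset.
std::string rowAt(const std::string& data,
                  const std::vector<std::size_t>& positions,
                  std::size_t index) {
    const std::size_t start = positions.at(index);  // throws if index is out of range
    const std::size_t newline = data.find('\n', start);
    return data.substr(start,
                       newline == std::string::npos ? std::string::npos : newline - start);
}
```

The cache costs one offset per row and is only valid for the file instance it was built from, which fits the slide's point about adapting to the file instance.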

Adapting to schema & query
GENERAL-PURPOSE
readInt(); skipField(); readFloat(); skipRestLine();
JUST-IN-TIME
Remove overhead of generic operators
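To make the contrast concrete, here is a minimal sketch of a general-purpose CSV row parser next to a version specialized for one schema and one query. It assumes a comma-separated line where the query needs only the first field (an integer) and the third field (a float). `Row`, `parseGeneric` and `parseSpecialized` are hypothetical names, and the specialized function is hand-written to imitate what a code generator might emit; it is not RAW's generated code.

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical result type: the query needs column 0 (int) and column 2 (float).
struct Row {
    int id;
    float value;
};

// General-purpose path: materialize every field, then branch on a per-column plan.
// plan[i] is 'i' (read int), 'f' (read float) or 's' (skip).
Row parseGeneric(const std::string& line, const std::vector<char>& plan) {
    Row row{0, 0.0f};
    std::size_t start = 0;
    for (std::size_t col = 0; col < plan.size(); ++col) {
        std::size_t end = line.find(',', start);
        if (end == std::string::npos) end = line.size();
        const std::string field = line.substr(start, end - start);
        switch (plan[col]) {  // per-field branching and copying is the generic overhead
            case 'i': row.id = std::atoi(field.c_str()); break;
            case 'f': row.value = std::strtof(field.c_str(), nullptr); break;
            default: break;
        }
        start = (end < line.size()) ? end + 1 : end;
    }
    return row;
}

// Specialized path: straight-line code for this schema and query only.
Row parseSpecialized(const std::string& line) {
    const char* p = line.c_str();
    char* next = nullptr;
    Row row;
    row.id = static_cast<int>(std::strtol(p, &next, 10));  // readInt()
    p = next;
    if (*p == ',') ++p;
    while (*p != '\0' && *p != ',') ++p;                    // skipField()
    if (*p == ',') ++p;
    row.value = std::strtof(p, &next);                      // readFloat()
    return row;                                             // skipRestLine(): nothing to do
}
```

Both functions return the same row; the specialized one has no plan lookup, no per-field `switch` and no temporary strings, which is the overhead the slide refers to.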

Adapting to format
Unroll Columns
readInt(); skipField(); readFloat(); skipRest();
Free navigation in files
-fieldLength:10
-tupleLength:100
-Need fields 2 & 5 of 2nd row
moveTo(110); readInt(); moveTo(140); readFloat();
Embedded indexes/existing APIs
-Bitmaps, R-Trees etc.
-readNextField() vs. readField(filename,id)
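The moveTo(110) and moveTo(140) targets follow from simple arithmetic on a fixed-width layout: with 1-based row and field numbers, offset = (row - 1) * tupleLength + (field - 1) * fieldLength. The sketch below works that out; `FixedLayout` and `fieldOffset` are hypothetical names, and it assumes every field has the same width.

```cpp
#include <cstddef>

// Hypothetical description of a fixed-width layout (values in bytes).
struct FixedLayout {
    std::size_t fieldLength;  // width of each field
    std::size_t tupleLength;  // width of each row
};

// Byte offset of a field, using 1-based row and field numbers as on the slide.
constexpr std::size_t fieldOffset(const FixedLayout& layout,
                                  std::size_t row,
                                  std::size_t field) {
    return (row - 1) * layout.tupleLength + (field - 1) * layout.fieldLength;
}
```

With fieldLength 10 and tupleLength 100, field 2 of row 2 starts at 100 + 10 = 110 and field 5 at 100 + 40 = 140, matching the slide.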

Ad-Hoc Operators for Raw Data
CSV file: SELECT col9 WHERE col1 < [X]
[Diagram: two plans, JIT - OPTION 1 and JIT - OPTION 2. Operators shown: Scan CSV Columns 1,9; Scan CSV Column 1; Scan CSV Column 9; Filter; Tuple Construction]
Fine-grained, raw-data-aware decisions
Processes 3/5 raw col9 fields
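The two plans differ in when col9 is read from the raw file. The sketch below compares an eager plan (convert col1 and col9 for every row, then filter) with a late plan (convert col1, filter, then touch col9 only for qualifying rows) and counts col9 conversions. All names are hypothetical, and which plan the slide labels OPTION 1 or OPTION 2 is not recoverable from the transcript.

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Query result plus a counter of raw col9 fields converted from text.
struct ScanResult {
    std::vector<double> col9;
    std::size_t col9Conversions = 0;
};

// Hypothetical helper: text of the 1-based column `col` in a comma-separated line.
std::string rawField(const std::string& line, std::size_t col) {
    std::size_t start = 0;
    for (std::size_t i = 1; i < col; ++i) {
        const std::size_t comma = line.find(',', start);
        if (comma == std::string::npos) return "";
        start = comma + 1;
    }
    const std::size_t end = line.find(',', start);
    return line.substr(start, end == std::string::npos ? std::string::npos : end - start);
}

// Eager plan: scan columns 1 and 9 together, then filter.
ScanResult scanThenFilter(const std::vector<std::string>& lines, double x) {
    ScanResult out;
    for (const std::string& line : lines) {
        const double c1 = std::strtod(rawField(line, 1).c_str(), nullptr);
        const double c9 = std::strtod(rawField(line, 9).c_str(), nullptr);
        ++out.col9Conversions;  // paid for every row, qualifying or not
        if (c1 < x) out.col9.push_back(c9);
    }
    return out;
}

// Late plan: scan column 1, filter, then read column 9 only for qualifying rows.
ScanResult filterThenScan(const std::vector<std::string>& lines, double x) {
    ScanResult out;
    for (const std::string& line : lines) {
        const double c1 = std::strtod(rawField(line, 1).c_str(), nullptr);
        if (!(c1 < x)) continue;
        out.col9.push_back(std::strtod(rawField(line, 9).c_str(), nullptr));
        ++out.col9Conversions;  // paid only for rows that pass the filter
    }
    return out;
}
```

If three of five rows pass the filter, the late plan converts three col9 fields instead of five, which is the "3/5" on the slide. The late plan pays for a second pass over each qualifying line, so which plan wins depends on selectivity and on how expensive it is to locate and convert a field.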

HEP analysis: Data

RAW
Event: eventID INT, runNumber INT
Muon: eventID INT, eta FLOAT, pt FLOAT
Electron: eventID INT, eta FLOAT, pt FLOAT
Jet: eventID INT, eta FLOAT, pt FLOAT

ROOT - C++
class Event {
  class Muon { float pt, eta; … }
  class Electron { float pt, eta; … }
  class Jet { float pt, eta; … }
  int runNumber;
  vector<Muon> muons;
  vector<Electron> electrons;
  vector<Jet> jets;
}

HEP analysis: Queries
“Identify events of interest → Filter out background events → Plot aggregated results in a histogram”

RAW
SELECT event
FROM root:/data1/ATLAS/*.root, csv:/data1/ATLAS/events.csv
WHERE ( csv.id = event.id
  AND event.EF_e24vhi_medium1
  OR event.EF_e60_medium1
  OR event.EF_2e12Tvh_loose1
  OR event.EF_mu24i_tight
  OR event.EF_mu36_tight
  OR event.EF_2mu13)
AND event.muon.mu_ptcone20 < 0.1 * event.muon.mu_pt
AND event.muon.mu_pt >
AND ABS(event.muon.mu_eta) < 2.4
AND …

ROOT - C++
lines of C++
for (unsigned int imuon = 0 ; imuon size(); imuon++) {
  if (((*curr_entries)[jentry].mu_ptcone20)->at(imuon) < 0.1 * ((*curr_entries)[jentry].mu_pt)->at(imuon)
      && ((*curr_entries)[jentry].mu_pt)->at(imuon) >
      && fabs(((*curr_entries)[jentry].mu_eta)->at(imuon)) < 2.4
      && … }
...

RAW vs. the ROOT framework
[Xeon CPU 2.13GHz, 1TB HDD RPM, 192GB RAM]
ROOT: 900 GB in 127 files
CSV: 1 “table” of IDs
Declarative queries + up to 90x improvement

RAW for High-Energy Physics
End-users:
- Performance (JIT, codegen, vectorwise, …)
- Easy-to-use (declarative) query language
Infrastructure Providers:
- Data kept in original location & file format
- Declarative query language → More optimization opportunities
“Event” caches
Thank You!