
Presentation transcript:

10-fold increase in data volume every 5 years.
"DW has shifted almost entirely towards the appliance model due to speed of the balanced appliance and scalability of scale out (MPP) solutions." (Jim Kobielus, Forrester Research)
Mobile and social technologies are driving an explosion of unstructured data.
At the same time, Gartner estimates that by 2016 over 70% of existing data warehouses will require replacement as they fail to provide Big Data integration.

Scale out

SQL Server PDW Appliance

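[Diagram: query flow for SELECT count(*) FROM SalesWeb]
1. The PDW Engine on the Control Node receives SELECT count(*) FROM SalesWeb.
2. The same query, SELECT count(*) FROM SalesWeb, runs on Compute Node 1 and Compute Node 2, each against its own part of SalesWeb.
3. The per-node results are collected into TempTable.
4. The final result is computed from TempTable and returned.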
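The scale-out query flow on this slide can be sketched in a few lines of Python. This is a minimal illustration only, not PDW code: the function names, the sample rows and the use of threads to stand in for compute nodes are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def count_on_compute_node(rows):
    """Stands in for one compute node running SELECT count(*) on its local rows."""
    return len(rows)


def distributed_count(distributions):
    """Stands in for the control node: fan out, collect partial counts, combine."""
    with ThreadPoolExecutor(max_workers=len(distributions)) as pool:
        # The list of partial counts plays the role of TempTable.
        temp_table = list(pool.map(count_on_compute_node, distributions))
    return sum(temp_table)


# Hypothetical SalesWeb rows, already split across two compute nodes.
sales_web_node_1 = [("a1", 10), ("a2", 20), ("a3", 30)]
sales_web_node_2 = [("b1", 5), ("b2", 15)]

total = distributed_count([sales_web_node_1, sales_web_node_2])
print(total)
```

The point of the sketch is that each node only ever scans its own slice, and the control node combines one small value per node rather than moving rows.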

xVelocity in SQL Server 2012 PDW
  5-10x performance improvement in customer workloads
  2-3x compression improvement
New HW/SW Architecture
  Scalability: ¼ rack to 5 PB
  Double the memory, 70% better disk IO
  Continued investment into DMS and PDW Engine
Lower acquisition costs
  ½ the cost in terms of hardware
  Moving from SAN to JBODs
  Virtualization to reduce overhead costs
Matching size and requirements
  Smaller entry point and smaller increments
Lower operational costs
  Solution simplicity
  Alignment with SQL Server ecosystem and tools
High performance access to data from Hadoop
  Integrated native query, fully parallelized w/o user intervention
  Without loading into PDW first
Query across structured and unstructured data
  Full SQL support
  Full metadata support
  Normal tools (PowerView etc.) function

General Details
  Windows Server 2012 on all hosts and VMs.
  Fabric and workload activity happens in VMs.
  Fabric VMs, MAD01 and CTL share 1 server, which lowers overhead costs, especially for small topologies.
  Windows Storage Spaces handles mirroring and spares, which allows the use of lower-cost DAS (JBODs) rather than SAN.
  VM-based provisioning cuts down time and complexity for setup and other maintenance tasks.
PDW Workload Details
  SQL Server 2012 Enterprise Edition (PDW build) is used on the control node and compute nodes for the PDW workload.
[Diagram: Host 1, Host 2, Host 3, Host 4 and a JBOD, connected by IB & Ethernet and direct-attached SAS. VMs: CTL, FAB AD, MAD 01, VMM, Compute 1, Compute 2.]
Compute VM: Windows Server 2012, DMS Core, SQL Server 2012. Similar layout relative to V1, but more files per filegroup to leverage the larger number of spindles in parallel.
Control VM: Windows Server 2012, PDW engine, DMS Manager, SQL Server. Shell DBs just as in AU3+.

CONTROL RACK + DATA RACK (Control Node, Mgmt. Node, LZ, Backup Node)
  Estimated total HW component list price: $1MM
  Infiniband & Ethernet, Fibre Channel
  160 cores on 10 compute nodes
  1.28 TB of RAM on compute
  Up to 30 TB of temp DB
  Up to 150 TB of user data
RACK 1
  Estimated total HW component list price: $500K
  Infiniband & Ethernet
  128 cores on 8 compute nodes
  2 TB of RAM on compute
  Up to 168 TB of temp DB
  Up to 1 PB of user data
Pure hardware costs are ~50% lower.
Price per raw TB is close to 70% lower due to higher capacity.
70% more disk I/O bandwidth.

Start small, then easily scale to petabytes
  2 to 56 compute nodes
  15TB to 1.3PB raw
  Up to 6PB user data
  Capacity additions at small increments

[Table columns: Configuration, Compute, Incr., Spare, Raw disk: 1TB, Raw disk: 3TB, Capacity (TB). Only the Configuration, Compute and Incr. values survive in this transcript.]

DELL
  Configuration      Compute  Incr.
  Quarter-rack       3        N/A
  2 thirds           6        100%
  Full rack          9        50%
  One and third      12       33%
  One and 2 third    15       25%
  2 racks            18       20%
  2 and a third      21       17%
  2 and 2 thirds     24       14%
  Three racks        27       13%
  Four racks         36       33%
  Five racks         45       25%
  Six racks          54       20%

HP
  Configuration      Compute  Incr.
  Quarter-rack       2        N/A
  Half               4        100%
  Three-quarters     6        50%
  Full rack          8        33%
  One-&-quarter      10       25%
  One-&-half         12       20%
  Two racks          16       33%
  Two and a half     20       25%
  Three racks        24       20%
  Four racks         32       33%
  Five racks         40       25%
  Six racks          48       20%
  Seven racks        56       17%
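The percentages that follow each compute-node count on this slide appear to be the number of nodes added relative to the previous configuration. That reading is an assumption, not something the slide states, but the short sketch below reproduces the listed values from the node counts alone. The function name is hypothetical.

```python
def increment_percentages(compute_nodes):
    """Percent growth from each configuration to the next, rounded half up.

    Assumes the increment is (nodes added) / (previous node count).
    """
    result = []
    for prev, cur in zip(compute_nodes, compute_nodes[1:]):
        added = cur - prev
        # Integer arithmetic for round-half-up of 100 * added / prev.
        result.append((200 * added + prev) // (2 * prev))
    return result


# Compute-node counts as listed on the slide.
dell = [3, 6, 9, 12, 15, 18, 21, 24, 27, 36, 45, 54]
hp = [2, 4, 6, 8, 10, 12, 16, 20, 24, 32, 40, 48, 56]

dell_incr = increment_percentages(dell)
hp_incr = increment_percentages(hp)
print(dell_incr)
print(hp_incr)
```

Under this reading, the relative step size shrinks as the appliance grows, then jumps back to 33% whenever the step changes from a fraction of a rack to a whole rack.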

Supports all PDW data types
Full DML support
Support for CREATE TABLE, CTAS, ALTER TABLE, partition switching, etc.
Uses PDW cost model
Mixed-mode processing: the presence of row operators does not prevent other operators from being executed in batch mode
Batch mode spilling
More operators supported (e.g., inner and outer joins, union all, local aggs)
Overarching goal: offer the same functionality as row store, while providing the performance boost.
Column store is the preferred storage engine for SQL Server 2012 PDW.
  CREATE TABLE user_db.dbo.user_table (C1 int, C2 varchar(20))
  WITH (DISTRIBUTION = HASH (C1), CLUSTERED COLUMNSTORE INDEX)

Dramatic performance increases
  5-10x on customer workloads
  Confirmed with TAP customer workloads
Improved compression on disk and in backups
  2-3x better compression vs. row store
Preserved appliance model
  Few tuning knobs
Improved memory management
  Run-time memory mgmt. respects resource governor
  Batch processing can now spill
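One general reason column stores tend to compress better than row stores is that the values within a single column are often repetitive. The sketch below shows run-length encoding as a simple illustration of that idea. It is not a description of how PDW compresses data internally, and the sample column is made up.

```python
from itertools import groupby


def rle_encode(column):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(column)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in runs for _ in range(count)]


# Hypothetical column with long runs of repeated values.
region = ["EU"] * 4 + ["US"] * 3 + ["APAC"] * 2

runs = rle_encode(region)
print(runs)
print(len(region), "values stored as", len(runs), "runs")
```

Stored row by row, the same values would be interleaved with other columns and the runs would be broken up, which is why the columnar layout matters for this kind of encoding.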

[Diagram: unstructured data from Sensor & RFID, Web Apps, Social Apps and Mobile Apps goes to Hadoop; structured data from traditional schema-based DW applications goes to the RDBMS.]
How to overcome the 'Impedance Mismatch'?
Increasingly massive amounts of unstructured data are driven by new sources.
At the same time, vast amounts of corporate data and data sources, and the bulk of their data analysis, remain in the RDBMS.
Polybase addresses this challenge for advanced data analytics by allowing native query across PDW and Hadoop, integrating structured and unstructured data.

[Diagram: unstructured data from Sensor & RFID, Web Apps, Social Apps and Mobile Apps in Hadoop; structured data from traditional DW applications in PDW.]
Components: enhanced PDW query engine, External Table, DMS Reader 1 … DMS Reader N (parallel importing), HDFS bridge (parallel HDFS reads), CTAS results.

External table over the HDFS file:
  CREATE EXTERNAL TABLE ClickStream(url varchar(50), event_date date, user_IP varchar(50))
  WITH (LOCATION = 'hdfs://MyHadoop:5000/tpch1GB/employee.tbl',
        FORMAT_OPTIONS (FIELD_TERMINATOR = '|'));

Parallel import into PDW with CTAS:
  CREATE TABLE ClickStream_PDW
  WITH (DISTRIBUTION = HASH(url))
  AS SELECT url, event_date, user_IP FROM ClickStream

Query Examples
  1. SELECT url.description FROM ClickStream cs, Url_Description url WHERE cs.url = url.name and cs.url = '…'
  2. SELECT user_name FROM ClickStream cs, Users u WHERE cs.user_IP = u.user_IP and cs.url = '…'
  3. SELECT top 10 (url) FROM ClickStream where user_IP = '…'
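What the import does with each record can be sketched roughly as follows: split a delimited line on the field terminator, then place the row in a distribution chosen by hashing the url column. This is an illustration under assumptions, not the DMS implementation. The CRC32 hash, the function names and the sample lines are all hypothetical.

```python
import zlib


def parse_line(line, field_terminator="|"):
    """Split one delimited text line into the three ClickStream columns."""
    url, event_date, user_ip = line.rstrip("\n").split(field_terminator)
    return {"url": url, "event_date": event_date, "user_IP": user_ip}


def distribution_for(key, num_distributions):
    """Pick a distribution for a key; CRC32 is a stand-in for the real hash."""
    return zlib.crc32(key.encode("utf-8")) % num_distributions


def import_rows(lines, num_distributions):
    """Parse every line and place each row by the hash of its url column."""
    distributions = [[] for _ in range(num_distributions)]
    for line in lines:
        row = parse_line(line)
        target = distribution_for(row["url"], num_distributions)
        distributions[target].append(row)
    return distributions


# Hypothetical pipe-delimited records, as an HDFS text file might contain.
sample_lines = [
    "example.com/a|2013-01-01|10.0.0.1\n",
    "example.com/b|2013-01-01|10.0.0.2\n",
    "example.com/a|2013-01-02|10.0.0.3\n",
]

placed = import_rows(sample_lines, num_distributions=4)
print([len(d) for d in placed])
```

Because placement depends only on the url value, all rows for the same url end up in the same distribution, which is what lets later joins and aggregations on url run locally.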

[Diagram: unstructured data from Sensor & RFID, Web Apps, Social Apps and Mobile Apps on HDFS data nodes; structured data from traditional DW applications in PDW.]
Components: enhanced PDW query engine (parallel reading), DMS Writer 1 … DMS Writer N, HDFS bridge (parallel HDFS writes), External Table, CETAS results.

Parallel export from PDW to HDFS with CETAS:
  CREATE EXTERNAL TABLE ClickStream (url, event_date, user_IP)
  WITH (LOCATION = 'hdfs://MyHadoop:5000/users/outputDir',
        FORMAT_OPTIONS (FIELD_TERMINATOR = '|'))
  AS SELECT url, event_date, user_IP FROM ClickStream_PDW

Windows 2012 Storage Spaces

Resource classes are implemented as pre-built server roles. The user (DBA) can add members to or remove members from resource classes:
  ALTER SERVER ROLE resource_class_name { { ADD | DROP } MEMBER server_principal }
Each resource class maps to pre-built resource governor settings on compute nodes (the control node is not governed).
PDW honors resource classes at run-time, with no need to reconnect (though running queries continue unchanged).
By default, the product preserves current (V1) behavior. One has to explicitly opt in to use resource classes.
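As a small illustration of the ALTER SERVER ROLE form shown on this slide, the helper below builds the statement text for adding a login to, or dropping it from, a resource class. The helper, the role name `example_rc` and the login `report_user` are hypothetical; the sketch only formats a string and does not connect to an appliance.

```python
import re

# Conservative identifier check so the sketch never emits unquoted odd names.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def alter_resource_class(resource_class_name, action, server_principal):
    """Return an ALTER SERVER ROLE statement for an ADD or DROP of a member."""
    action = action.upper()
    if action not in ("ADD", "DROP"):
        raise ValueError("action must be ADD or DROP")
    for name in (resource_class_name, server_principal):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsupported identifier: {name!r}")
    return f"ALTER SERVER ROLE {resource_class_name} {action} MEMBER {server_principal};"


# Hypothetical role and login names.
statement = alter_resource_class("example_rc", "add", "report_user")
print(statement)
```

Per the slide, a membership change like this would take effect for new queries without the user reconnecting.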