
1

2

3 The modern data warehouse
 Self-service, collaboration, corporate, predictive, mobile
 Extract, transform, load; single query model; data quality; master data management
 Non-relational, relational, analytical, streaming
 Internal & external data sources, including non-relational data

4 About Analytics Platform System
SQL Server Parallel Data Warehouse & HDInsight in a single appliance
 Pre-built hardware appliance
 Windows Server 2012 R2 + SQL Server 2014
 Massively Parallel Processing (MPP) to scale to 6 PB
 In-memory columnstore for 100x speed improvement
 Dedicated region for HDInsight (Hadoop)
 Integrated query model joining relational and HDInsight (Hadoop) data
 Available from HP and Dell

5 Integrate relational + non-relational
Integrated query with PolyBase in SQL Server PDW
 Query relational and Hadoop data in parallel
 Single query, single result set
 No need to ETL Hadoop data into the data warehouse
 Query Hadoop with existing T-SQL skills
(Diagram: SQL in, result set out; PolyBase spans relational data and semi-structured, unstructured, and streaming data in Hadoop)
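As a hedged illustration of the single-query model (the table and column names below are hypothetical, not from the deck), a single T-SQL statement can join a regular PDW table with an external table that PolyBase maps onto HDFS files:

-- Hypothetical example: join warehouse sales with web clickstream data stored in HDFS.
-- dbo.FactSales is assumed to be an ordinary distributed PDW table;
-- dbo.Clickstream is assumed to be an external table defined over HDFS via PolyBase.
SELECT s.ProductKey,
       SUM(s.SalesAmount) AS TotalSales,
       COUNT(c.SessionId) AS PageViews
FROM dbo.FactSales AS s
JOIN dbo.Clickstream AS c            -- external (HDFS) table
    ON s.ProductKey = c.ProductKey
GROUP BY s.ProductKey;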

6 What is Hadoop?
Hadoop = MapReduce + HDFS
 MapReduce: job scheduling/execution system
 HDFS: Hadoop Distributed File System
 Surrounding ecosystem: HBase (column DB) / Cassandra / CouchDB / MongoDB, Hive, Pig, Mahout, Oozie, Sqoop, Avro, ZooKeeper, Flume, Cascading, R, Ambari, HCatalog

7 Move HDFS into the warehouse before analysis?
Doing so means ETL-ing Hadoop data into the warehouse, learning new skills beyond T-SQL, and building, integrating, managing, maintaining, and supporting a Hadoop ecosystem just to reach the "new" data sources.

8 One standard node type
 2x 8-core Intel processors, 256 GB memory
 Newest InfiniBand connectivity (FDR, 56 Gbit/sec)
 Uses JBODs: Windows Server 2012 R2 technologies manage the JBOD drives to achieve the same level of reliability and robustness as we know from SAN solutions
Backup server (BU) and load server (LS) are not in the appliance
 Customers can use their own hardware
 Customers can use more than one BU or LS for high availability
Scale unit concept
 Base unit: minimum configuration; populates the rack with networking
 Scale unit: adds capacity, 2–3 compute nodes and related storage
 Passive unit: increases high availability (HA) capacity by adding more spares
(Diagram: hosts HST01–HST04 and HSA01–HSA04 attached to JBODs via direct-attached SAS, connected over InfiniBand and Ethernet)

9 General details
 All hosts run Windows Server 2012 R2 Standard
 All virtual machines run Windows Server 2012 R2 Standard as a guest operating system
 All fabric and workload activity happens in Hyper-V virtual machines
 Fabric virtual machines and CTL share one server, lowering overhead costs especially for small topologies
 PDW Agent runs on all hosts and all virtual machines and collects appliance health data on fabric and workload
 DWConfig and Admin Console continue to exist; minor extensions expose host-level information
 Windows Storage Spaces handles mirroring and spares and enables use of lower-cost DAS (JBODs)
PDW workload details
 SQL Server 2014 Enterprise Edition (PDW build) control node and compute nodes for the PDW workload
 Shell databases, just as in older versions
Storage details
 2 files on 2 LUNs per filegroup, 8 filegroups per compute node
 Each LUN is configured as RAID 1, so large numbers of spindles are used in parallel
(Diagram: fabric hosts HST01/HST02 running the CTL, WDS, AD01, VMM, and AD02 VMs with the PDW engine and DMS Manager; compute hosts HSA01/HSA02 running Compute 1 and Compute 2 VMs with DMS Core and SQL Server 2014 Enterprise Edition (PDW build); JBOD over direct-attached SAS; InfiniBand and Ethernet)

10 SQL Server PDW 2014 control architecture
 PDW 2014 uses SQL Server on the control node to run a "shell appliance"
 Every database, with all its objects, exists in the shell appliance as an empty "shell," lacking the user data (which sits on the compute nodes)
 Every DDL operation is executed against both the shell and the compute nodes
 Large parts of basic RDBMS functionality are now provided by that shell: authentication and authorization of queries (in fact the full security system), schema binding, and the metadata catalog
(Diagram: a SELECT enters the control node, where the engine service and cost-based query optimizer use the shell appliance (SQL Server) to produce plan steps executed on the compute nodes, each running SQL Server)
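As a small, hedged illustration of the shell's role (my example, not from the deck): catalog queries such as the following are answered from the shell databases' metadata on the control node, while the user data itself lives only on the compute nodes.

-- Standard catalog views: the shell appliance on the control node holds the
-- metadata catalog (and the security model), so these return appliance-wide
-- object definitions without reading the user data stored on the compute nodes.
SELECT name, create_date FROM sys.databases;
SELECT name, type_desc   FROM sys.tables;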

11 General details
 All hosts run Windows Server 2012 R2 Standard
 All virtual machines run Windows Server 2012 R2 Standard as a guest operating system
 All HDI workload activity happens in Hyper-V virtual machines, lowering overhead costs especially for small topologies
 Windows Storage Spaces handles mirroring and spares and enables use of lower-cost DAS (JBODs) rather than SAN
HDI workload details
 Windows HDI head, security, and management nodes plus data nodes for the HDI workload
Storage details
 16 data disks per data node
 No RAID 1, single drives only; but each file is stored 3 times in HDFS
Software details
 Windows HDI distribution software (version 2.x)
(Diagram: hosts HST03/HST04 and HSA03/HSA04 with a JBOD over direct-attached SAS, InfiniBand and Ethernet; VMs HMN01, HHN01, HSN01, and Data 1–Data 4, all on Windows Server 2012 R2 Standard running the Windows HDI distribution software, version 2.x)

12 JBOD disk layout
 Each LUN is composed of two drives in a RAID 1 mirroring configuration
 Distributions are split across two files/LUNs (for example, Node 1: Distribution A, files 1 and 2; Distribution B, files 1 and 2; through Distribution H, files 1 and 2)
 TempDB and Log are spread across all 16 LUNs; there is no fixed tempDB or log size allocation
 Hot spares are kept in the JBOD
 Fabric storage (VHDXs for the nodes) is on the JBODs to ensure HA
 Disk I/O is further parallelized
 Bandwidth: 2 cables with 4x 6 Gbit/sec lanes each
(Diagram: disks 1–70 in the JBOD, mapped to distribution files, TempDB, Log, hot spares, and fabric storage)

13 Distribution storage
 PDW stores eight distributions and one replicate per compute node. This value is fixed: it cannot be configured dynamically nor preconfigured differently in the factory, and it does not change between smaller and larger appliances.
 Each distribution is stored on separate physical disks in the JBOD.
 Replicated tables are striped across all disks in the JBOD.
 The physical location of each distribution is controlled via SQL Server filegroups at the compute node level.
(Diagram: data distribution layout; compute nodes 1 and 2 each hold distributions A–H plus the replicated data)
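A hedged T-SQL sketch of how tables end up in this layout (the table definitions are hypothetical; the DISTRIBUTION option is the standard PDW CREATE TABLE syntax):

-- A hash-distributed fact table: rows are spread across the eight
-- distributions of every compute node by hashing the distribution column.
CREATE TABLE dbo.FactSales
(
    SaleId      bigint NOT NULL,
    ProductKey  int    NOT NULL,
    SalesAmount money  NOT NULL
)
WITH (DISTRIBUTION = HASH(ProductKey));

-- A replicated dimension table: a full copy is stored on each compute node,
-- striped across all disks in that node's JBOD.
CREATE TABLE dbo.DimProduct
(
    ProductKey  int           NOT NULL,
    ProductName nvarchar(100) NOT NULL
)
WITH (DISTRIBUTION = REPLICATE);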

14 Distribution storage: data skew
 Data skew occurs when the rows stored in one distribution are disproportionate to the row counts of the other distributions. Generally this happens because the value chosen as the distribution key occurs significantly more often than other values in the data set.
 A real-world scenario: a table was distributed on an IP address, so the distribution should have been even, given that the IP is unique. In this case, however, a number of connections came through a proxy server, so the IP address was identical for many different connections, and the data skewed accordingly.
 Skew affects performance and limits storage capacity by creating a hot spot for CPU and storage on a single compute node.
(Chart: distribution of data across distributions for the skewed table)
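A hedged way to check for this kind of skew (the table name is hypothetical; DBCC PDW_SHOWSPACEUSED is the PDW command that reports per-distribution space usage):

-- Reports rows and space used for each distribution of the table,
-- which makes a hot distribution easy to spot.
DBCC PDW_SHOWSPACEUSED ("dbo.FactConnections");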

15 Rack #1 holds the networking (InfiniBand and Ethernet), the PDW region (control node, passive node, and compute nodes with economical disk storage), and the HDInsight region (master node, passive node, and compute nodes with economical disk storage). Rack #2 adds further InfiniBand and Ethernet, a passive node, and more compute nodes with economical disk storage (HDI extension base units and HDI active scale units).
 Active unit: adds two or three compute nodes, depending on OEM hardware configuration, and related storage
 Passive unit: a host available to accept the workload from the associated workload nodes
 Failover node: high availability for the rack

16 1/4 rack: 15.1 TB (raw)
1/2 rack: 30.2 TB (raw)
3/4 rack: 45.3 TB (raw)
Full rack: 60 TB (raw)
1 1/4 rack: 75.5 TB (raw)
1 1/2 rack: 90.6 TB (raw)
2 racks: 120.8 TB (raw)
3 racks: 181.2 TB (raw)

17 1/3 rack: 22.6 TB (raw)
2/3 rack: 45.3 TB (raw)
Full rack: 67.9 TB (raw)

18 Capacity in TB by number of compute nodes (1 TB drives) and compression ratio:

Compression ratio | 2 | 4 | 6 | 8 | 10 | 12 | 16 | 20 | 24 | 32 | 40 | 48 | 56 (compute nodes)
1  | 15.1 | 30.2 | 45.3 | 60.4 | 75.5 | 90.6 | 121 | 151 | 181 | 242 | 302 | 362 | 423
2  | 30.2 | 60.4 | 90.6 | 121 | 151 | 181 | 242 | 302 | 362 | 483 | 604 | 725 | 846
3  | 45.3 | 90.6 | 136 | 181 | 227 | 272 | 362 | 453 | 544 | 725 | 906 | 1087 | 1268
4  | 60.4 | 121 | 181 | 242 | 302 | 362 | 483 | 604 | 725 | 966 | 1208 | 1450 | 1691
5  | 75.5 | 151 | 227 | 302 | 378 | 453 | 604 | 755 | 906 | 1208 | 1510 | 1812 | 2114
6  | 90.6 | 181 | 272 | 362 | 453 | 544 | 725 | 906 | 1087 | 1450 | 1812 | 2174 | 2537
7  | 106 | 211 | 317 | 423 | 529 | 634 | 846 | 1057 | 1268 | 1691 | 2114 | 2537 | 2960
8  | 121 | 242 | 362 | 483 | 604 | 725 | 966 | 1208 | 1450 | 1933 | 2416 | 2899 | 3382
9  | 136 | 272 | 408 | 544 | 680 | 815 | 1087 | 1359 | 1631 | 2174 | 2718 | 3262 | 3805
10 | 151 | 302 | 453 | 604 | 755 | 906 | 1208 | 1510 | 1812 | 2416 | 3020 | 3624 | 4228
11 | 166 | 332 | 498 | 664 | 831 | 997 | 1329 | 1661 | 1993 | 2658 | 3322 | 3986 | 4651
12 | 181 | 362 | 544 | 725 | 906 | 1087 | 1450 | 1812 | 2174 | 2899 | 3624 | 4349 | 5074
13 | 196 | 393 | 589 | 785 | 982 | 1178 | 1570 | 1963 | 2356 | 3141 | 3926 | 4711 | 5496
14 | 211 | 423 | 634 | 846 | 1057 | 1268 | 1691 | 2114 | 2537 | 3382 | 4228 | 5074 | 5919
15 | 227 | 453 | 680 | 906 | 1133 | 1359 | 1812 | 2265 | 2718 | 3624 | 4530 | 5436 | 6342

19 Capacity in TB by number of compute nodes (1 TB drives) and compression ratio:

Compression ratio | 3 | 6 | 9 | 12 | 15 | 18 | 21 | 24 | 27 | 36 | 45 | 54 (compute nodes)
1  | 23 | 45 | 68 | 91 | 113 | 136 | 159 | 181 | 204 | 272 | 340 | 408
2  | 45 | 91 | 136 | 181 | 227 | 272 | 317 | 362 | 408 | 544 | 680 | 815
3  | 68 | 136 | 204 | 272 | 340 | 408 | 476 | 544 | 612 | 815 | 1019 | 1223
4  | 91 | 181 | 272 | 362 | 453 | 544 | 634 | 725 | 815 | 1087 | 1359 | 1631
5  | 113 | 227 | 340 | 453 | 566 | 680 | 793 | 906 | 1019 | 1359 | 1699 | 2039
6  | 136 | 272 | 408 | 544 | 680 | 815 | 951 | 1087 | 1223 | 1631 | 2039 | 2446
7  | 159 | 317 | 476 | 634 | 793 | 951 | 1110 | 1268 | 1427 | 1903 | 2378 | 2854
8  | 181 | 362 | 544 | 725 | 906 | 1087 | 1268 | 1450 | 1631 | 2174 | 2718 | 3262
9  | 204 | 408 | 612 | 815 | 1019 | 1223 | 1427 | 1631 | 1835 | 2446 | 3058 | 3669
10 | 227 | 453 | 680 | 906 | 1133 | 1359 | 1586 | 1812 | 2039 | 2718 | 3398 | 4077
11 | 249 | 498 | 747 | 997 | 1246 | 1495 | 1744 | 1993 | 2242 | 2990 | 3737 | 4485
12 | 272 | 544 | 815 | 1087 | 1359 | 1631 | 1903 | 2174 | 2446 | 3262 | 4077 | 4892
13 | 294 | 589 | 883 | 1178 | 1472 | 1767 | 2061 | 2356 | 2650 | 3533 | 4417 | 5300
14 | 317 | 634 | 951 | 1268 | 1586 | 1903 | 2220 | 2537 | 2854 | 3805 | 4757 | 5708
15 | 340 | 680 | 1019 | 1359 | 1699 | 2039 | 2378 | 2718 | 3058 | 4077 | 5096 | 6116

20 Seconds to scan a 1 TB table at 200 MB/second:

Compression ratio | 2 nodes (16 distributions) | 3 (24) | 4 (32) | 6 (48) | 8 (64) | 9 (72)
1  | 327.68 | 218.45 | 163.84 | 109.23 | 81.92 | 72.82
2  | 163.84 | 109.23 | 81.92 | 54.61 | 40.96 | 36.41
3  | 109.23 | 72.82 | 54.61 | 36.41 | 27.31 | 24.27
4  | 81.92 | 54.61 | 40.96 | 27.31 | 20.48 | 18.20
5  | 65.54 | 43.69 | 32.77 | 21.85 | 16.38 | 14.56
6  | 54.61 | 36.41 | 27.31 | 18.20 | 13.65 | 12.14
7  | 46.81 | 31.21 | 23.41 | 15.60 | 11.70 | 10.40
8  | 40.96 | 27.31 | 20.48 | 13.65 | 10.24 | 9.10
9  | 36.41 | 24.27 | 18.20 | 12.14 | 9.10 | 8.09
10 | 32.77 | 21.85 | 16.38 | 10.92 | 8.19 | 7.28
11 | 29.79 | 19.86 | 14.89 | 9.93 | 7.45 | 6.62
12 | 27.31 | 18.20 | 13.65 | 9.10 | 6.83 | 6.07
13 | 25.21 | 16.80 | 12.60 | 8.40 | 6.30 | 5.60
14 | 23.41 | 15.60 | 11.70 | 7.80 | 5.85 | 5.20
15 | 21.85 | 14.56 | 10.92 | 7.28 | 5.46 | 4.85
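The values in this table follow a simple pattern (this derivation is my reading of the numbers, not stated on the slide): every distribution is scanned in parallel at 200 MB/s, so the scan time is 1 TB divided by 200 MB/s times the number of distributions times the compression ratio. A quick check in T-SQL for 2 nodes (16 distributions) at compression ratio 1:

-- Worked example: 1 TB = 1,048,576 MB scanned by 16 parallel readers at 200 MB/s.
SELECT 1048576.0 / (200 * 16 * 1) AS scan_seconds;   -- returns 327.68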

21

22 General details
 Physical hosts run the virtual machines required to maintain the appliance infrastructure as well as the workload virtual machines
 Windows Storage Spaces handles mirroring and spares
PDW workload details
 SQL Server 2014 Enterprise Edition (PDW build)
 Control node and compute nodes for the PDW workload
Storage details
 2 files on 2 LUNs per filegroup, 8 filegroups; each LUN configured as RAID 1
 Takes advantage of a large number of spindles in parallel
HDI workload details
 Windows HDI (100% Apache Hadoop) based on Hortonworks HDP
 Head, security, and management nodes plus data nodes
(Diagram: fabric domain, PDW workload region, and HDI workload region; hosts HST01–HST04 and HSA01–HSA04 with JBODs, InfiniBand and Ethernet, direct-attached SAS; VMs CTL, WDS, AD01, VMM, AD02, Compute 1–2, HMN01, HHN01, HSN01, and Data 1–4)

23 Fabric Active Directory (AD01 and AD02)
 Manages access between physical hosts
 Required to support the clusters
 Holds the appliance Active Directory and DNS
 The VMs are stored on local disks, not on CSVs; for redundancy there are two AD VMs on two physical machines
Windows Deployment Services (WDS), new in v2 AU3
 Used to deploy Windows operating systems over the appliance network
 The VM is stored on CSVs
SC Virtual Machine Manager (VMM)
 Manages the configuration/image of virtual machines within the appliance
 The VM is stored on CSVs
(Diagram: same appliance layout as before, highlighting the AD01, AD02, WDS, and VMM VMs on the fabric hosts)

24 Control node (CTL)
 Client connections always go through the control node
 Contains no persistent user data
 Contains metadata: system metadata and user shell databases
 Runs the Parallel Data Warehouse engine: processes SQL requests, prepares the execution plan, and orchestrates distributed execution
 The local SQL Server processes the final query plan and aggregates results
 Runs the "Data Movement Service" (DMS): manages connectivity to the compute nodes and manages query execution
 Runs the "Azure Data Management Gateway" (since v2 AU3), which enables querying from the cloud to on-premises through APS
 The VM is stored on CSVs
(Diagram: same appliance layout, highlighting the CTL VM)

25 Compute nodes
 Each MPP node is a highly tuned symmetric multi-processing (SMP/NUMA) node with standard interfaces
 Provides dedicated hardware, database, and storage
 Runs SQL Server 2014 Enterprise Edition (PDW build)
 Runs the Data Movement Service (DMS)
 VMs are stored on CSVs
Just a Bunch Of Disks (JBOD)
 Storage managed by Windows Server 2012 R2 Storage Spaces
 Cross-connected via dual 4x 6 Gbit/sec SAS connections
(Diagram: same appliance layout, highlighting the Compute 1 and Compute 2 VMs and the JBOD)

26 General details
 Client connections always go through the head node
Head node (HHN01)
 Ambari agent, Namenode (1,2,3), JobTracker, History Server, HiveServer (1,2,3), Hive metastore, Oozie service, WebHCat server, …
Secure gateway node (HSN01)
 IIS, Developer Dashboard, Secure Gateway
Management node (HMN01)
 IIS, Ambari agent, SQL Server
Data nodes (HDN001–HDNxxx)
 Ambari agent, Datanode, TaskTracker
(Diagram: same appliance layout, highlighting the HDI VMs)

27 Failover details
 One cluster across the whole appliance, using Windows Failover Cluster Manager
 Failover Cluster Manager starts a virtual machine on a new host after a failure; virtual machine images are automatically started on the new host in the event of failover
 Rules are enforced by affinity and anti-affinity maps
 Failback continues to be through CSS
 Adding a passive unit increases HA capacity and enables another virtual machine to fail without disabling the appliance
 All hosts connected to a single JBOD cannot fail over
Cluster Shared Volumes
 Enable all nodes to access the LUNs on the JBOD as long as at least one of the hosts attached to the JBOD is active
 Uses the SMB3 protocol
(Diagram: sample PDW region (base unit, HP) showing the CTL and Compute 2 VMs restarting on surviving hosts after a host failure)

28 Node replacement
 Single type of node; the sole differentiator is storage attached vs. storage unattached
 Execution is common regardless of which host is being replaced; workloads migrate with their virtual machines
 Replace Node follows a subset of the bare-metal provisioning using WDS and executes the APS Setup.exe with the replace-node action specified, along with the necessary information targeting the replacement node
 Workload virtual machines do not have to be re-provisioned
 Workload virtual machines are failed back using Windows Failover Cluster Manager; failback still incurs a small downtime
 There may be a small performance impact from failed-over compute nodes; documentation will suggest failing back
 Live Migration is currently not used for failover and failback
(Diagram: sample PDW region (base unit, HP))

29 Adding capacity
 Additions to the appliance come in the form of one or more scale units; the IHV owns installation and cabling of new scale units
 Software provisioning consists of three phases: bare-metal provisioning of the new nodes (online since AU1), provisioning of the workload virtual machines (online since AU1), and redistribution of data (offline)
 CSS assistance (may have to help prepare user data): tools to validate the environment/data transition and develop a strategy for a successful addition, such as deleting old data, partition switching from the largest tables, or CRTAS to move data off the appliance temporarily
 The PDW region must have enough free space to redistribute the largest table
(Diagram: sample PDW region (base unit, HP) growing from hosts HSA01/HSA02 with Compute 1–2 to HSA03/HSA04 with Compute 3–4 and an additional JBOD)

30 Replace Node: hardware failure. Replace VM: VM corruption.

31 Background
 Research done by the Gray Systems Lab, led by Technical Fellow David DeWitt
High-level goals for V2
 Seamless integration with Hadoop via regular T-SQL
 Enhancing the PDW query engine to process data coming from the Hadoop Distributed File System (HDFS)
 Fully parallelized query processing for high-performance data import from and export to HDFS
 Integration with various Hadoop implementations: Hadoop on Windows Server, Hortonworks, and Cloudera

32  PDW and Hadoop are both distributed systems
 Parallel data access between PDW and Hadoop
 Different goals and internal architecture
 Combined power of Big Data integration
(Diagram: PDW control node and compute nodes alongside the Hadoop name node and data node)

33  Direct parallel data access between PDW compute nodes and Hadoop data nodes
 Support of all HDFS file formats
 Introducing "structure" on the "unstructured" data
(Diagram: "SQL in, results out": a query reads from Hadoop HDFS and the PDW database; "SQL in, results stored in HDFS": a query writes its results back to HDFS)

34  Cost-based decision on how much data needs to be pushed to PDW
 SQL operations on HDFS data pushed into Hadoop as MapReduce jobs
(Diagram: HDFS, Hadoop, the PDW database, and a MapReduce job generated from the query)
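A hedged sketch of the kind of query where this pushdown applies (the table and column names are hypothetical, not from the deck; whether the work is actually pushed down is the cost-based decision described above):

-- A selective predicate and an aggregation over an external (HDFS) table are
-- candidates for being pushed into Hadoop as a MapReduce job when the optimizer
-- estimates that is cheaper than streaming all of the HDFS data into PDW.
SELECT c.ProductKey, COUNT(*) AS ErrorHits
FROM dbo.Clickstream AS c          -- external table over HDFS
WHERE c.HttpStatus = 500
GROUP BY c.ProductKey;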

35 External tables
 Introduce structure to semi-structured and unstructured data
 Represent data residing in Hadoop/HDFS
 Introduced through new T-SQL syntax
 The syntax is very similar to that of a regular table

36 --Create a new external table in SQL Server PDW
CREATE EXTERNAL TABLE [ database_name . [ dbo ] . | dbo. ] table_name
    ( <column_definition> [ ,...n ] )
    WITH (
        LOCATION = 'hdfs_folder_or_filepath',
        DATA_SOURCE = external_data_source_name,
        FILE_FORMAT = external_file_format_name
        [ , <reject_options> [ ,...n ] ]
    )
[;]

<reject_options> ::=
{
      REJECT_TYPE = value | percentage
    | REJECT_VALUE = reject_value
    | REJECT_SAMPLE_VALUE = reject_sample_value
}

Callouts: 1. EXTERNAL indicates an external table. 2. LOCATION is the required location of the Hadoop file or folder. 3. DATA_SOURCE is the required data source definition of the Hadoop cluster. 4. FILE_FORMAT carries the file format options associated with data import from HDFS (for example, arbitrary field delimiters and reject-related thresholds).

37 --STEP 1: Create an external data source for Hadoop
-- DROP EXTERNAL DATA SOURCE FXR_TEST_DSRC;
CREATE EXTERNAL DATA SOURCE FXR_TEST_DSRC
WITH (
    TYPE = HADOOP,
    LOCATION = 'hdfs://192.168.210.145:8020',
    JOB_TRACKER_LOCATION = '192.168.210.145:50300'   -- default ports: 8021 = Cloudera; 50300 = HDInsight
);

--STEP 2: Create an external file format for a Hadoop text-delimited file.
-- DROP EXTERNAL FILE FORMAT FXR_Test_Format;
CREATE EXTERNAL FILE FORMAT FXR_Test_Format
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (
        FIELD_TERMINATOR = N';',
        USE_TYPE_DEFAULT = TRUE,
        STRING_DELIMITER = '"'
    )
);

38 --STEP 3: Create a new external table in SQL Server PDW
DROP EXTERNAL TABLE Test;
GO
CREATE EXTERNAL TABLE Test
(
    name                 nvarchar(17),
    startzeitpunkt       nvarchar(35),
    endzeitpunkt         varchar(35),
    flms_system_realtime nvarchar(19),
    dummy                nvarchar(19) NULL,
    Counter1DTonDur      nvarchar(19),
    Counter1DMileage     nvarchar(19),
    dummy2               nvarchar(2) NULL
)
WITH (
    LOCATION     = '/user/fxr47511/pdwtest',
    DATA_SOURCE  = FXR_TEST_DSRC,
    FILE_FORMAT  = FXR_Test_Format,
    REJECT_TYPE  = value,
    REJECT_VALUE = 1000
);
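Once the external table exists, it can be queried like any other table. A hedged follow-on sketch (the CTAS target name and the choice of distribution column are hypothetical, not from the deck):

-- Query the HDFS-backed data directly with T-SQL.
SELECT TOP 10 name, startzeitpunkt, endzeitpunkt
FROM Test;

-- Or land a copy inside PDW as a distributed table using CTAS,
-- so that subsequent queries no longer touch Hadoop.
CREATE TABLE dbo.Test_Local
WITH (DISTRIBUTION = HASH(name))
AS SELECT * FROM Test;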

39

40

41 HDFS Bridge
 Direct HDFS access
 A functional part of the Data Movement Service (DMS)
 Hides HDFS complexity
 HDFS file types are supported through the appropriate RecordReader interface
(Diagram: on each PDW node, DMS with an HDFS Bridge sits next to SQL Server and talks to HDFS in the Hadoop cluster)

42 Query flow against an external table
1. A query against an external table is executed in PDW
2. The HDFS Bridge reads data blocks using the Hadoop RecordReader interface
3. Each row is converted for bulk insert and hashed on the distribution column
4. The hashed row is sent to the appropriate node's receiver for loading
5. The row is bulk-inserted into the destination table
(Diagram: in the APS appliance, the control node runs the PDW engine, load manager, and DMS manager; each compute node runs DMS (converter, sender, receiver, writer) and an HDFS Bridge next to SQL Server, reading from HDFS in the Hadoop cluster)

43

44 exec sp_configure 'hadoop connectivity', 1

Parameter | Enables support for…
0 | Disable Hadoop connectivity
1 | Hortonworks (HDP 1.3) for Windows Server; HDInsight on Analytics Platform System (version AU1) (HDP 1.3); Azure blob storage on Microsoft Azure (WASB[S]) (AU1)
2 | Hortonworks (HDP 1.3) for Linux
3 | Cloudera CDH 4.3 for Linux
4 | Hortonworks (HDP 2.0) for Windows Server; HDInsight on Analytics Platform System (AU2) (HDP 2.0); Azure blob storage on Microsoft Azure (WASB[S]) (AU2)
5 | Hortonworks (HDP 2.0) for Linux
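A hedged example of switching the setting (the note about restarting is my assumption about how the change takes effect, not something stated on the slide):

-- Point PolyBase at Hortonworks HDP 2.0 for Windows Server / WASB (option 4).
EXEC sp_configure 'hadoop connectivity', 4;
RECONFIGURE;
-- The affected services typically need to be restarted before the new
-- connectivity setting is picked up.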

45 Columnstore overview
 A clustered columnstore index consists of two parts: the columnstore and the deltastore
 Data is compressed into segments, ideally ~1 million rows each (subject to system resource availability)
 A collection of segments representing a set of entire rows is called a row group
 The minimum unit of I/O between disk and memory is a segment
 Execution in batch mode (as opposed to traditional row mode) moves multiple rows (~1,000) between iterators
 Dictionaries (primary and secondary) are used to store additional metadata about segments
(Diagram: columns C1–C6 organized into row groups and segments in the columnstore, alongside a delta (row) store)

46 Columnstore indexes are now clustered
 Only a single index exists per table: clustered index + non-clustered columnstore is not supported, and clustered columnstore + non-clustered row store index is not supported
 Full DML (INSERT, UPDATE, DELETE, SELECT) is supported directly on the columnstore; a previous workaround involved maintaining a separate secondary row store table with UNION ALL
 All PDW data types are supported in columnstore indexes: decimal with precision greater than 18, binary/varbinary, and datetimeoffset with scale greater than 2 were not supported in SQL Server 2012
Query processing
 Batch mode hash join spill is implemented (previously this would revert to row mode)
 Aggregations without GROUP BY are supported
Some limitations still apply
 Avoid string data types for filtering or join conditions
 Some SQL clauses, for example ROW_NUMBER() / RANK() etc. with OVER (PARTITION BY … ORDER BY …)
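A hedged sketch of creating such a table in PDW (the column list is hypothetical; the table name matches the example used on the next slide, and the WITH options are standard PDW CREATE TABLE syntax):

-- A distributed table stored as a clustered columnstore index:
-- the only index on the table, supporting full DML as described above.
CREATE TABLE dbo.FactInternetSales_Column
(
    SalesOrderNumber int   NOT NULL,
    ProductKey       int   NOT NULL,
    OrderQuantity    int   NOT NULL,
    SalesAmount      money NOT NULL
)
WITH (
    DISTRIBUTION = HASH(ProductKey),
    CLUSTERED COLUMNSTORE INDEX
);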

47 Batch mode processing
 SQL Server 2012 implemented batch mode processing to handle rows a batch at a time in addition to a row at a time; SQL Server 2008 and before only had row processing
 Typically batches of about 1,000 rows are moved between iterators
 Significantly less CPU is required because the average number of instructions per row decreases
 Batch mode processing: hash join and hash aggregate are supported; merge join, nested loop join, and stream aggregate are not
Batch vs. row mode scan example:
 SELECT COUNT(*) FROM FactInternetSales_Column  (batch mode scan: 352 ms)
 SELECT COUNT(*) FROM FactInternetSales_Row  (row mode scan: 6704 ms)
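A hedged way to set up this comparison (the source table layout is the hypothetical one above; CTAS with the HEAP option is my choice for producing the row-store copy, not something shown in the deck):

-- Build a row-store copy of the columnstore table with CTAS,
-- then run the two COUNT(*) scans shown above against each copy.
CREATE TABLE dbo.FactInternetSales_Row
WITH (DISTRIBUTION = HASH(ProductKey), HEAP)
AS SELECT * FROM dbo.FactInternetSales_Column;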

48 INSERT a single record into a table with a clustered columnstore index
 Screenshots were taken from the DMV sys.pdw_nodes_column_store_row_groups (a subset of the total rows returned)
 Before the row is inserted: only compressed row groups exist
 After the single row is inserted: a deltastore has been created and the single row is represented there
 After REBUILD: the segment row count has increased by 1
 REORGANIZE only moves "CLOSED" deltastore segments into a "COMPRESSED" status; REBUILD affects the entire index (or the entire index partition)

49 The state_description field has three states: COMPRESSED, OPEN, and CLOSED
 COMPRESSED represents a row group that is stored in columnstore format
 OPEN represents a deltastore that is accepting new rows
 CLOSED represents a full deltastore ready for REORGANIZE
 When inserting 102,400 rows or more in a single batch into a columnstore index distribution, the data is compressed automatically
 When inserting 102,399 or fewer rows in a single batch, the data is stored in the deltastore
 The actual maximum number of rows per deltastore is 1,048,576, at which point it is CLOSED; this is also the ideal segment size that SQL Server tries to create when first building a columnstore index from a table
 When the index build encounters memory pressure, DOP is reduced first and then the segment size is reduced
 Only the REBUILD statement can compress a deltastore that is not in the CLOSED state; neither REORGANIZE nor the Tuple Mover process has any effect on an OPEN deltastore
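A hedged sketch of inspecting and maintaining these row groups (the table name is the hypothetical one used earlier; the DMV is the one named on the previous slide):

-- Inspect row-group states (OPEN / CLOSED / COMPRESSED) across the appliance.
SELECT *
FROM sys.pdw_nodes_column_store_row_groups;

-- Compress CLOSED deltastores only.
ALTER INDEX ALL ON dbo.FactInternetSales_Column REORGANIZE;

-- Rebuild the whole index (or one partition), compressing OPEN deltastores as well.
ALTER INDEX ALL ON dbo.FactInternetSales_Column REBUILD;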

50

51 Get started today!
 Sign up for a free architectural design session with your Microsoft representative
 Learn about the Microsoft Analytics Platform System at www.microsoft.com/aps
 Try HDInsight at www.microsoft.com/bigdata
 Try SQL Server for data warehousing in Windows Azure VMs at www.windowsazure.com
 Try SQL Server 2014 at www.microsoft.com/en-us/sqlserver/sql-server-2014.aspx

52 DBI-B337 Polybase in the Modern Data Warehouse

53

54 www.microsoft.com/learning
http://developer.microsoft.com
http://microsoft.com/technet
http://channel9.msdn.com/Events/TechEd

55

56

57

58 A financial customer required a powerful analytics platform to improve performance, deliver an enhanced level of customer service, and handle terabytes of data with an unprecedented level of query complexity. The right solution: Microsoft Analytics Platform System, with queries up to 960x faster.

59 Appliance topology: password reset, time zone, network
Parallel Data Warehouse topology: certificate, firewall, PDW service status, instant file initialization, restore master database
HDInsight topology: certificate, firewall, HDI service status, user management

60 Dashboard, queries activity, load activity, backup and restore, active locks, active sessions, alerts, appliance state (https://controlnodeipaddress)

61 HDFS health, Map/Reduce, storage, performance monitor; for each HDI node: OS and data

62 Actual performance figures for data export from PDW
 PDW = HP full-rack V2 appliance (8 nodes)
 Spoke = HP DL980, 4 processors (32 cores), 1 TB RAM
 Data nodes = VMs built within the spoke DL980 (2 cores, 64 GB RAM)

