Tamir Dresher Senior Software Architect July 2, 2014 Where is my Data? (In the Cloud)

Presentation transcript:

Tamir Dresher Senior Software Architect July 2, 2014 Where is my Data? (In the Cloud)

About Me
– Software architect, consultant and instructor
– Software Engineering, Ruppin Academic Center
– Technology addict
– 10 years of experience
– .NET and native Windows

Agenda
– Storage
– Blob
– Relational DB
– NoSql DB
– MapReduce

Storage

Numbers – 1 Second is
– 1,132 Instagram photos uploaded
– 1,365 Tumblr posts
– 7,241 Tweets sent
– 44,512 Google searches
– 84,921 YouTube videos viewed

Storage Prices

Types of information
– Product catalogs
– Employee data
– User profiles
– Images
– Session state
– Shopping cart
– Game scores and state
– Social feeds
– Query output results
– Airline seating charts
– Inventory management system
– Game leaderboards
– Performance counters
– Weather
– Stock quotes

Gartner Magic Quadrant – IaaS, PaaS

Windows Azure – Growing Global Presence
– Data centers: North America, Europe, Asia Pacific
– Storage SLA – 99.99% minutes per year

AZURE BLOBS

What is a BLOB
– BLOB – Binary Large OBject
– Storage for any type of entity, such as binary files and text documents
– Distributed File Service (DFS) – scalability and high availability
– A BLOB file is distributed between multiple servers and replicated at least 3 times

Azure Blob Storage Concepts
– Hierarchy: Account -> Container -> Blob -> Blocks / Pages
– Example account contoso: container images holds PIC01.JPG and PIC02.JPG; container videos holds VID1.AVI

Amazon Simple Storage Service (S3) Concepts
– Hierarchy: Account -> Bucket -> Object
– Endpoint: s3.amazonaws.com/
– Example account contoso: bucket images holds PIC01.JPG and PIC02.JPG; bucket videos holds VID1.AVI
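The two hierarchies above map directly onto the address of a stored file. The following is a minimal sketch of that mapping, not SDK code: the helper names are hypothetical, the Azure host pattern `<account>.blob.core.windows.net` and the `https` scheme do not appear on the slides, and the S3 form assumes the path-style endpoint shown on the slide.

```csharp
using System;

// Hypothetical helper: Account -> Container -> Blob becomes host/container/blob.
// The host pattern is an assumption that is not taken from the slides.
string AzureBlobUrl(string account, string container, string blob) =>
    $"https://{account}.blob.core.windows.net/{container}/{blob}";

// Hypothetical helper: Bucket -> Object on the path-style endpoint from the slide.
string S3ObjectUrl(string bucket, string objectKey) =>
    $"https://s3.amazonaws.com/{bucket}/{objectKey}";

string azureUrl = AzureBlobUrl("contoso", "images", "PIC01.JPG");
string s3Url = S3ObjectUrl("images", "PIC01.JPG");

Console.WriteLine(azureUrl);
Console.WriteLine(s3Url);
```

Note that in the Azure form the account is part of the host name, while in the path-style S3 form the bucket is the first path segment and the account does not appear in the address at all.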

Blob Operations – REST

DEMO: Creating a Blob

BLOBS - Azure
– Block blobs – up to 200 GB in size
– Page blobs – up to 1 TB in size
– Total Account Capacity TB

BLOBS - AWS
– Object size – up to 5 TB
– An AWS account can own up to 100 buckets at a time, with unlimited objects
– % durability, 99.99% availability
– Reduced Redundancy Storage (RRS) – % durability and 99.99%
– Amazon Glacier – a low-cost storage service for data archival

Pricing - AWS
– Pay for what you use
– Components:
  – Storage capacity used (per GB per month)
  – Data transfer out (per GB per month)
  – Requests (per n thousand requests per month)

Pricing - Azure
– Pay for what you use, or a 6- or 12-month plan
– Components:
  – Storage capacity used (per GB per month)
  – Replication option (LRS, GRS, RA-GRS)
  – Number of requests (per n thousand requests per month)
  – Data egress (per GB per month)
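Both price lists share the same shape: a storage term, a data-transfer term and a request term. A minimal sketch of how the components add up is shown below; every rate in it is a made-up placeholder, not a real AWS or Azure price, and the function name is hypothetical.

```csharp
using System;

// Sums the three pay-per-use components from the pricing slides.
// All rates passed in are hypothetical placeholders.
decimal MonthlyCost(
    decimal storedGb, decimal egressGb, long requests,
    decimal pricePerGbMonth, decimal pricePerEgressGb, decimal pricePer10kRequests)
{
    decimal storage = storedGb * pricePerGbMonth;              // capacity used
    decimal egress = egressGb * pricePerEgressGb;              // data transfer out
    decimal reqs = requests / 10_000m * pricePer10kRequests;   // request count
    return storage + egress + reqs;
}

// 100 GB stored, 20 GB transferred out, 10 million requests (placeholder rates).
decimal cost = MonthlyCost(100m, 20m, 10_000_000, 0.05m, 0.10m, 0.01m);
Console.WriteLine(cost);
```

With these placeholder rates the request term (10.00) outweighs the storage term (5.00), which is why the request count matters as much as the data size when comparing offers.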

RELATIONAL DB

Relational Database Service (RDS)
– MySQL, Oracle, or Microsoft SQL Server in the cloud
– No administrative overheads
– Dedicated hardware
– High availability
– Pay-as-you-grow pricing
– Familiar development model*
* Despite missing features and some limitations

SQL Azure
– SQL Server in the cloud
– No administrative overheads
– Shared or reserved (dedicated) hardware
– High availability
– Pay-as-you-grow pricing
– Familiar development model*
* Despite missing features and some limitations

DEMO: Creating and Using SQL Azure

Pricing - SQL Azure

Pricing - RDS
– Pay for what you use
– Components:
  – Storage capacity used (per GB-month and per million I/O requests)
  – Deployment type – Single-AZ / Multi-AZ (AZ – Availability Zone)
  – DB instance hours (per hour)
  – Additional backup storage (per GB-month)
  – Data transfer in / out (per GB per month)

Case Study

Case Study
– records-on.html
– How do I make querying 154 million addresses as fast as possible?
– If I want 100 GB of SQL Server and I want to hit it 10 million times, it’ll cost me $176 a month (now it’s ~$20)

NoSql - Azure Tables, DynamoDB

NoSql
– Relational technology has long been the dominant approach for data.
– Large amounts of data – scaling across many servers is challenging.
– Different kinds of data on a relational DB:
  – JSON documents
  – Graphs
– ACID – Atomicity, Consistency, Isolation, Durability.
– CAP – Consistency, Availability, Partition tolerance.
– BASE – Basically Available, Soft state, Eventual consistency.

Table Storage Concepts
– Hierarchy: Account -> Table -> Entity
– Example account contoso: table customers holds entities (Name = …, = …) and (Name = …, Add =); table photos holds entities (Photo ID = …, Date = …) and (Photo ID = …, Date = …)

Table Storage
– Not an RDBMS – no relationships between entities – NoSql
– An entity can have up to 255 properties, up to 1 MB per entity
– Mandatory properties for every entity:
  – PartitionKey & RowKey (the only indexed properties)
    – Together they uniquely identify an entity
    – The same RowKey can be used in different PartitionKeys
    – They define the sort order
  – Timestamp – optimistic concurrency
– Strongly consistent
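The key rules above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions, not the storage service or its SDK: a `SortedDictionary` stands in for the table, the ordinal string comparison is an assumption about ordering, and the property names and values are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Order entities by PartitionKey first, then by RowKey (ordinal comparison assumed).
var byKeys = Comparer<(string Partition, string Row)>.Create((a, b) =>
{
    int c = string.CompareOrdinal(a.Partition, b.Partition);
    return c != 0 ? c : string.CompareOrdinal(a.Row, b.Row);
});

// In-memory stand-in for a table: the composite key maps to a schema-free property bag.
var table = new SortedDictionary<(string Partition, string Row), Dictionary<string, object>>(byKeys);

table.Add(("Smith", "Jeff"), new Dictionary<string, object> { ["Age"] = 40 });
table.Add(("Harp", "Walter"), new Dictionary<string, object>());
// The same RowKey ("Jeff") is fine under a different PartitionKey.
table.Add(("Harp", "Jeff"), new Dictionary<string, object> { ["FavSport"] = "Canoeing" });

// A second entity with the same (PartitionKey, RowKey) pair is rejected.
bool duplicateRejected = false;
try { table.Add(("Smith", "Jeff"), new Dictionary<string, object>()); }
catch (ArgumentException) { duplicateRejected = true; }

// Enumeration follows the key order.
string order = string.Join(", ", table.Keys.Select(k => $"{k.Partition}/{k.Row}"));
Console.WriteLine(order);
Console.WriteLine(duplicateRejected);
```

Because only the two keys are indexed, a lookup by both keys is the cheap path, which is why the choice of PartitionKey and RowKey drives the design of a table.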

No Fixed Schema
FIRST | LAST | BIRTHDATE
Wade | Wegner | 2/2/1981
Nathan | Totten | 3/15/1965
Nick | Harris | May 1, 1976
FAV SPORT: Canoeing

Table Object Model
– ITableEntity interface – PartitionKey, RowKey, Timestamp, and ETag properties
– Implemented by TableEntity and DynamicTableEntity
// This class defines one additional property of integer type;
// since it derives from TableEntity, it will be automatically
// serialized and deserialized.
public class SampleEntity : TableEntity
{
    public int SampleProperty { get; set; }
}

Sample – Inserting an Entity into a Table
// You will need the following using statements
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Table;
// Create the table client.
CloudTableClient tableClient = storageAccount.CreateCloudTableClient();
CloudTable peopleTable = tableClient.GetTableReference("people");
peopleTable.CreateIfNotExists();
// Create a new customer entity.
CustomerEntity customer1 = new CustomerEntity("Harp", "Walter");
customer1. = customer1.PhoneNumber = " ";
// Create an operation to add the new customer to the people table.
TableOperation insertCustomer1 = TableOperation.Insert(customer1);
// Submit the operation to the table service.
peopleTable.Execute(insertCustomer1);

Retrieve
// Create the table client.
CloudTableClient tableClient = storageAccount.CreateCloudTableClient();
CloudTable peopleTable = tableClient.GetTableReference("people");
// Retrieve the entity with partition key of "Smith" and row key of "Jeff".
// The type argument tells the client which entity class to deserialize into.
TableOperation retrieveJeffSmith = TableOperation.Retrieve<CustomerEntity>("Smith", "Jeff");
// Retrieve entity
CustomerEntity specificEntity = (CustomerEntity)peopleTable.Execute(retrieveJeffSmith).Result;

Table Storage – Important Points
– Azure Tables can store TBs of data
– Table operations are fast
– Tables are distributed
  – PartitionKey defines the partition
  – A table might be stored in different partitions on different storage devices

Pricing

Case Study

Case Study
– How do I make querying 154 million addresses as fast as possible?
  – The domain is the partition key and the alias is the row key
– If I want 100 GB of storage and I want to hit it 10 million times, it’ll cost me $8 a month
– SQL Server will cost $176 a month – 22 times more expensive

DynamoDB
– An item can be up to 64 KB in size
– Items are stored on SSDs and replicated across multiple Availability Zones in a Region
– Each item has a primary key, which can be either a single-attribute hash key or a composite hash-range key
– Supports secondary indexes

DynamoDB
– Eventually consistent reads (by default) and strongly consistent reads (optional)
– Provisioned Throughput – the request throughput you want your table to be able to achieve
  – 10 units of Write Capacity (enough capacity to do up to 36,000 writes per hour)*
  – 50 units of Read Capacity (enough capacity to do up to 180,000 strongly consistent reads, or 360,000 eventually consistent reads, per hour)
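The hourly figures above follow from simple arithmetic if one capacity unit is taken as one operation per second, with eventually consistent reads counting double. That reading is inferred from the slide's own numbers; the function names below are hypothetical and item-size effects are ignored in this sketch.

```csharp
using System;

const int SecondsPerHour = 3600;

// One write unit is treated as one write per second.
long WritesPerHour(int writeUnits) => (long)writeUnits * SecondsPerHour;

// One read unit is treated as one strongly consistent read per second.
long StrongReadsPerHour(int readUnits) => (long)readUnits * SecondsPerHour;

// Eventually consistent reads get twice the throughput per unit.
long EventualReadsPerHour(int readUnits) => 2 * StrongReadsPerHour(readUnits);

long writes = WritesPerHour(10);
long strongReads = StrongReadsPerHour(50);
long eventualReads = EventualReadsPerHour(50);

Console.WriteLine($"{writes} writes/hour");
Console.WriteLine($"{strongReads} strongly consistent reads/hour");
Console.WriteLine($"{eventualReads} eventually consistent reads/hour");
```

The three results reproduce the 36,000, 180,000 and 360,000 per-hour figures quoted on the slide.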

Pricing
– Pay for what you use
– Components:
  – Provisioned throughput capacity (per hour)
  – Indexed data storage (per GB per month)
  – Data transfer out (per GB per month)

MapReduce on the Cloud

Hadoop in the cloud
– Hadoop on Azure Cloud
– Some facts:
  – In 2013, global mobile data traffic reached 1.5 exabytes per month
  – Cisco predicts 1.1 zettabytes (1 zettabyte = 1,000 exabytes) of internet traffic in 2016

MapReduce – The BigData Power
– Map – takes input and outputs key/value pairs:
  (Key1, Value1), (Key2, Value2), …, (Key n, Value n)

MapReduce – The BigData Power
– Reduce – takes the group of values for each key and produces a new group of values:
  Key1: [value1-1, value1-2…] -> [new_value1-1, new_value1-2…]
  Key2: [value2-1, value2-2…] -> [new_value2-1, new_value2-2…]
  …
  Key n: [valueN-1, valueN-2…] -> [new_valueN-1, new_valueN-2…]
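The two phases can be sketched in memory with LINQ: map every input to key/value pairs, group the pairs by key, then reduce each group. This is an illustration of the idea only, not how Hadoop distributes the work; `RunMapReduce` is a hypothetical helper and the word-count input is made up for the demonstration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Map each input to (key, value) pairs, group the pairs by key, reduce each group.
IEnumerable<(TKey Key, TOut Value)> RunMapReduce<TIn, TKey, TValue, TOut>(
    IEnumerable<TIn> inputs,
    Func<TIn, IEnumerable<(TKey Key, TValue Value)>> map,
    Func<TKey, IEnumerable<TValue>, TOut> reduce) =>
    inputs.SelectMany(input => map(input))
          .GroupBy(pair => pair.Key, pair => pair.Value)
          .Select(group => (group.Key, reduce(group.Key, group)));

// Demonstration: count how often each word occurs in the (made-up) input lines.
var counts = RunMapReduce<string, string, int, int>(
        new[] { "blob table blob", "table queue" },
        line => line.Split(' ').Select(word => (word, 1)),  // Map: word -> 1
        (word, ones) => ones.Sum())                         // Reduce: sum per word
    .ToDictionary(result => result.Key, result => result.Value);

foreach (var entry in counts)
    Console.WriteLine($"{entry.Key}: {entry.Value}");
```

On a real cluster the grouping step is the shuffle between mapper and reducer machines; here `GroupBy` plays that role inside a single process.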

MapReduce - How Does It Work? (diagram: Files, Server)

So How Does It Work? (diagram: Server, Runtime, Code)

Elastic Map Reduce (EMR)
– Amazon Hadoop on the Cloud
– Hortonworks and Microsoft Hadoop to Windows
– Cluster of EC2 instances
– Pricing:
  – Hourly rate for every instance hour (by instance type)
  – Additional EMR price per EC2 instance

HDInsight
– MS Hadoop on (not only) Azure Cloud
– Hortonworks and Microsoft Hadoop to Windows
– Native integration with .NET

Finding common friends
– Facebook shows you how many common friends you have with someone
– There were 1,310,000,000 active users on Facebook, with 130 friends on average
– Calculating the mutual friends

Finding common friends
– We can represent the Friend relationship as: Someone -> [list of his/her friends]
– Note that a Friend relationship is symmetrical – if A is a friend of B, then B is a friend of A

Example of Friends file
U1 -> U2 U3 U4
U2 -> U1 U3 U4 U5
U3 -> U1 U2 U4 U5
U4 -> U1 U2 U3 U5
U5 -> U2 U3 U4

Designing our MapReduce job - Mapper
– Each line of the file is an input line to the Mapper
– The Mapper will output key-value pairs
  – Key: (user, friend) – sorted, so friend might come before user
  – Value: list of friends
– Having the key sorted helps the reducer: identical pairs are delivered together

Mapper Example – final result
Given the line: U1 -> U2 U3 U4
Mapper output:
(U1 U2) -> U2 U3 U4
(U1 U3) -> U2 U3 U4
(U1 U4) -> U2 U3 U4
Given the line: U2 -> U1 U3 U4 U5
Mapper output:
(U1 U2) -> U1 U3 U4 U5
(U2 U3) -> U1 U3 U4 U5
(U2 U4) -> U1 U3 U4 U5
(U2 U5) -> U1 U3 U4 U5
Given the line: U3 -> U1 U2 U4 U5
Mapper output:
(U1 U3) -> U1 U2 U4 U5
(U2 U3) -> U1 U2 U4 U5
(U3 U4) -> U1 U2 U4 U5
(U3 U5) -> U1 U2 U4 U5
Given the line: U4 -> U1 U2 U3 U5
Mapper output:
(U1 U4) -> U1 U2 U3 U5
(U2 U4) -> U1 U2 U3 U5
(U3 U4) -> U1 U2 U3 U5
(U4 U5) -> U1 U2 U3 U5
Given the line: U5 -> U2 U3 U4
Mapper output:
(U2 U5) -> U2 U3 U4
(U3 U5) -> U2 U3 U4
(U4 U5) -> U2 U3 U4

Designing our MapReduce job - Reducer
– The input for the reducer will be structured as: (friend1, friend2) -> (friend1 friends) (friend2 friends)
– The reducer will find the intersection between the lists
– Output: (friend1, friend2) -> (intersection of friend1 and friend2 friends)

Reducer Example
Given the line | Reducer output
(U1 U2) -> (U1 U3 U4 U5) (U2 U3 U4) | (U1 U2) -> (U3 U4)
(U1 U3) -> (U1 U2 U4 U5) (U2 U3 U4) | (U1 U3) -> (U2 U4)
(U1 U4) -> (U1 U2 U3 U5) (U2 U3 U4) | (U1 U4) -> (U2 U3)
(U2 U3) -> (U1 U2 U4 U5) (U1 U3 U4 U5) | (U2 U3) -> (U1 U4 U5)
(U2 U4) -> (U1 U2 U3 U5) (U1 U3 U4 U5) | (U2 U4) -> (U1 U3 U5)
(U2 U5) -> (U1 U3 U4 U5) (U2 U3 U4) | (U2 U5) -> (U3 U4)
(U3 U4) -> (U1 U2 U3 U5) (U1 U2 U4 U5) | (U3 U4) -> (U1 U2 U5)
(U3 U5) -> (U1 U2 U4 U5) (U2 U3 U4) | (U3 U5) -> (U2 U4)
(U4 U5) -> (U1 U2 U3 U5) (U2 U3 U4) | (U4 U5) -> (U2 U3)
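The whole common-friends job can be checked locally before it is run on a cluster. The sketch below replays the mapper and reducer design on the five-line example file using LINQ only; it is an in-memory stand-in, not the Hadoop SDK, and the local `Map` function is a hypothetical helper that also skips the "->" separator shown in the example file.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

string[] friendsFile =
{
    "U1 -> U2 U3 U4",
    "U2 -> U1 U3 U4 U5",
    "U3 -> U1 U2 U4 U5",
    "U4 -> U1 U2 U3 U5",
    "U5 -> U2 U3 U4",
};

// Mapper: for every friend of the user, emit (sorted pair, the user's friend list).
IEnumerable<(string Key, string[] Friends)> Map(string line)
{
    var parts = line.Split(new[] { ' ', '-', '>' }, StringSplitOptions.RemoveEmptyEntries);
    var user = parts[0];
    var friends = parts.Skip(1).ToArray();
    foreach (var friend in friends)
    {
        var pair = new[] { user, friend };
        Array.Sort(pair, StringComparer.Ordinal); // sorted key: both users emit the same pair
        yield return (string.Join(" ", pair), friends);
    }
}

// Shuffle (GroupBy) and Reducer: intersect the friend lists collected for each pair.
var commonFriends = friendsFile
    .SelectMany(line => Map(line))
    .GroupBy(kv => kv.Key, kv => kv.Friends)
    .ToDictionary(
        group => group.Key,
        group => string.Join(" ", group.Aggregate((a, b) => a.Intersect(b).ToArray())));

foreach (var entry in commonFriends)
    Console.WriteLine($"({entry.Key}) -> ({entry.Value})");
```

The resulting nine pairs match the reducer output listed above, for example (U1 U2) -> (U3 U4) and (U2 U3) -> (U1 U4 U5).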

Creating C# MapReduce

Creating C# MapReduce - Mapper
using System;
using System.Linq;
using Microsoft.Hadoop.MapReduce;
public class CommonFriendsMapper : MapperBase
{
    public override void Map(string inputLine, MapperContext context)
    {
        var strings = inputLine.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        if (strings.Any())
        {
            // The first token is the user; the remaining tokens are the user's friends.
            var currentUser = strings[0];
            var friends = strings.Skip(1);
            foreach (var friend in friends)
            {
                // Sort the pair so that both users emit the same key.
                var keyArr = new[] { currentUser, friend };
                Array.Sort(keyArr);
                var key = String.Join(" ", keyArr);
                context.EmitKeyValue(key, string.Join(" ", friends));
            }
        }
    }
}

Creating C# MapReduce - Reducer
public class CommonFriendsReducer : ReducerCombinerBase
{
    public override void Reduce(string key, IEnumerable<string> strings, ReducerCombinerContext context)
    {
        // Each pair key arrives with two friend lists, one emitted from each user's line.
        var friendsLists = strings.Select(friendList => friendList.Split(' ')).ToList();
        var intersection = friendsLists[0].Intersect(friendsLists[1]);
        context.EmitKeyValue(key, string.Join(" ", intersection));
    }
}

Creating C# MapReduce – Hadoop Job
HadoopJobConfiguration myConfig = new HadoopJobConfiguration();
myConfig.InputPath = "wasb:///example/data/friends/friends";
myConfig.OutputFolder = "wasb:///example/data/friends/output";
var hadoop = Hadoop.Connect(clusterUri, clusterUserName, hadoopUserName, clusterPassword,
    azureStorageAccount, azureStorageKey, azureStorageContainer, createContinerIfNotExist);
// The type arguments select the mapper and reducer classes to run.
var jobResult = hadoop.MapReduceJob.Execute<CommonFriendsMapper, CommonFriendsReducer>(myConfig);
int exitCode = jobResult.Info.ExitCode; // (0 – success, otherwise – failure)

Pricing
10 node cluster that will exist for 24 hours:
– Secure Gateway Node – free
– Head node USD per 24-hour day
– 1 data node USD per 24-hour day
– 10 data nodes USD per 24-hour day
– Total: $92.16 USD

WRAP UP

Comparing the alternatives
BLOB
– When should you use: unstructured data, files
– Implications: application logic responsibility; consider using HDInsight (Hadoop)
Relational DB
– When should you use: structured relational data, ACID transactions
– Implications: SQL DML + DDL; could affect scalability; BI abilities; reporting
Azure Tables, DynamoDB
– When should you use: structured data, loose schema, geo replication (high DR), auto sharding
– Implications: OData, REST; application logic responsibility (multiple schemas)

What have we seen
– Blobs
– Relational DB
– NoSql
– MapReduce in the Cloud

What’s Next
– NoSql – MongoDB, Cassandra, CouchDB, RavenDB
– Hadoop ecosystem – Hive, Pig, Sqoop, Mahout
– Cache options – Amazon ElastiCache, Azure Cache, InRole Cache, Redis

Presenter contact details c: t: e: b: TamirDresher.com w: