Published by Rosamund Watts. Modified over 8 years ago.
1
Where is my Data? (In the Cloud)
Tamir Dresher, Senior Software Architect
July 2, 2014
2
About Me
– Software architect, consultant and instructor
– Software Engineering Lecturer @ Ruppin Academic Center
– Technology addict
– 10 years of experience
– .NET and Native Windows Programming
– @tamir_dresher
– tamirdr@codevalue.net
– http://www.TamirDresher.com
3
Agenda
– Storage
– Blob
– Relational DB
– NoSql DB
– MapReduce
4
Storage
Where is my data > Storage
9
Numbers – 1 Second is
– 1,132 Instagram photos uploaded
– 1,365 Tumblr posts
– 7,241 Tweets sent
– 44,512 Google searches
– 84,921 YouTube videos viewed
Sources: http://www.internetlivestats.com/one-second/ and http://onesecond.designly.com/
10
Storage Prices
11
Types of information
– Product catalogs
– Employee data
– User profiles
– Images
– Session state
– Shopping cart
– Game scores and state
– Social feeds
– Query output results
– Airline seating charts
– Inventory management system
– Game leaderboards
– Performance counters
– Weather
– Stock quotes
12
Gartner Magic Quadrant
[Figure: Magic Quadrant charts for IaaS and PaaS]
13
Windows Azure Growing Global Presence
– Data centers in North America, Europe and Asia Pacific
– Storage SLA – 99.99%, which allows at most 52.56 minutes of downtime per year
– http://azure.microsoft.com/en-us/support/legal/sla
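The 52.56-minute figure follows directly from the SLA percentage. The sketch below shows the arithmetic, assuming a 365-day year; the `Sla` class and its method name are illustrative and not part of any Azure SDK.

```csharp
using System;
using System.Globalization;

public static class Sla
{
    // Minutes in a 365-day year (leap years are ignored, which is an assumption).
    private const decimal MinutesPerYear = 365m * 24m * 60m;

    // Maximum downtime per year that still meets the given availability percentage.
    public static decimal DowntimeMinutesPerYear(decimal availabilityPercent)
    {
        return MinutesPerYear * (100m - availabilityPercent) / 100m;
    }
}

public static class Program
{
    public static void Main()
    {
        decimal minutes = Sla.DowntimeMinutesPerYear(99.99m);
        Console.WriteLine(minutes.ToString("0.00", CultureInfo.InvariantCulture)); // prints 52.56
    }
}
```

The same method gives roughly 8.76 hours per year for a 99.9% SLA, which shows how much each extra nine is worth.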
14
AZURE BLOBS
15
What is a BLOB
– BLOB – Binary Large OBject
– Storage for any type of entity, such as binary files and text documents
– Distributed File Service (DFS)
  – Scalability and high availability
  – A BLOB file is distributed across multiple servers and replicated at least 3 times
16
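Azure Blob Storage Concepts
– Hierarchy: Account > Container > Blob > Blocks / Pages
– URL format: http://<account>.blob.core.windows.net/<container>/<blob>
– Example: account contoso holds container images (PIC01.JPG, PIC02.JPG) and container videos (VID1.AVI); each blob is made of blocks or pages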
17
Amazon Simple Storage Service (S3) Concepts
– Hierarchy: Account > Bucket > Object
– URL format: http://<bucket>.s3.amazonaws.com/<object>
– Example: account contoso holds bucket images (PIC01.JPG, PIC02.JPG) and bucket videos (VID1.AVI)
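Both services address an object with a URL built from the names in the hierarchy. A minimal sketch, assuming the usual `<account>.blob.core.windows.net/<container>/<blob>` and `<bucket>.s3.amazonaws.com/<object>` layouts; the `StorageAddress` helper is hypothetical and does not come from either SDK.

```csharp
using System;

public static class StorageAddress
{
    // Azure: the account is part of the host name, the container and blob form the path.
    public static Uri AzureBlob(string account, string container, string blob)
    {
        return new Uri(string.Format("http://{0}.blob.core.windows.net/{1}/{2}",
            account, container, Uri.EscapeDataString(blob)));
    }

    // S3: the bucket is part of the host name, the object key forms the path.
    public static Uri S3Object(string bucket, string objectKey)
    {
        return new Uri(string.Format("http://{0}.s3.amazonaws.com/{1}",
            bucket, Uri.EscapeDataString(objectKey)));
    }
}

public static class Program
{
    public static void Main()
    {
        // prints http://contoso.blob.core.windows.net/images/PIC01.JPG
        Console.WriteLine(StorageAddress.AzureBlob("contoso", "images", "PIC01.JPG").AbsoluteUri);
        // prints http://images.s3.amazonaws.com/PIC01.JPG
        Console.WriteLine(StorageAddress.S3Object("images", "PIC01.JPG").AbsoluteUri);
    }
}
```

Note the difference: in Azure the account name appears in the URL, while in S3 the bucket name itself does, so bucket names share one global namespace.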
18
Blob Operations
– REST
19
DEMO: Creating a Blob
20
BLOBS – Azure
– Block blobs – up to 200 GB in size
– Page blobs – up to 1 TB in size
– Total account capacity – 500 TB
21
BLOBS – AWS
– Object size – up to 5 TB
– An AWS account can own up to 100 buckets at a time, with unlimited objects
– 99.999999999% durability, 99.99% availability
– Reduced Redundancy Storage (RRS) – 99.99% durability and 99.99% availability
– Amazon Glacier – low-cost storage service for data archival
22
Pricing – AWS
– Pay for what you use
– Components:
  – Storage capacity used (per GB per month)
  – Data transfer out (per GB per month)
  – Requests (per n thousand requests per month)
– http://aws.amazon.com/s3/pricing/
23
Pricing – Azure
– Pay for what you use, or commit to a 6- or 12-month plan
– Components:
  – Storage capacity used (per GB per month)
  – Replication option (LRS, GRS, RA-GRS)
  – Number of requests (per n thousand requests per month)
  – Data egress (per GB per month)
– http://azure.microsoft.com/en-us/pricing/details/storage/
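For both providers the monthly blob bill is a sum over the same kinds of components: storage, outbound transfer and requests. The sketch below only illustrates that formula. Every rate in it is a made-up placeholder, not a real AWS or Azure price, and the `BlobCostEstimate` name is hypothetical.

```csharp
using System;

public static class BlobCostEstimate
{
    // Sums the three pay-per-use components. All rates are supplied by the caller.
    public static decimal Monthly(
        decimal storedGb, decimal pricePerGbMonth,
        decimal egressGb, decimal pricePerEgressGb,
        long requests, decimal pricePerThousandRequests)
    {
        decimal storage = storedGb * pricePerGbMonth;
        decimal egress = egressGb * pricePerEgressGb;
        decimal requestCost = (requests / 1000m) * pricePerThousandRequests;
        return storage + egress + requestCost;
    }
}

public static class Program
{
    public static void Main()
    {
        // Placeholder rates for illustration only; check the pricing pages for real ones.
        decimal total = BlobCostEstimate.Monthly(
            100m, 0.03m,      // 100 GB stored
            10m, 0.10m,       // 10 GB transferred out
            1000000, 0.005m); // one million requests
        Console.WriteLine(total);
    }
}
```

On Azure the replication option (LRS, GRS, RA-GRS) would change the per-GB storage rate passed in rather than add a separate term.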
24
RELATIONAL DB
25
Relational Database Service (RDS)
– MySQL, Oracle, or Microsoft SQL Server in the cloud
– No administrative overhead
– Dedicated hardware
– High availability
– Pay-as-you-grow pricing
– Familiar development model*
* Despite missing features and some limitations: http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_SQLServer.html
26
SQL Azure
– SQL Server in the cloud
– No administrative overhead
– Shared or reserved (dedicated) hardware
– High availability
– Pay-as-you-grow pricing
– Familiar development model*
* Despite missing features and some limitations: http://msdn.microsoft.com/en-us/library/ff394115.aspx
27
DEMO: Creating and Using SQL Azure
28
Pricing – SQL Azure
29
Pricing – RDS
– Pay for what you use
– Components:
  – Storage capacity used (per GB-month and per million I/O requests)
  – Deployment type – Single-AZ or Multi-AZ (AZ = Availability Zone)
  – DB instance hours (per hour)
  – Additional backup storage (per GB-month)
  – Data transfer in / out (per GB per month)
– http://aws.amazon.com/rds/pricing/
30
Case Study – https://haveibeenpwned.com/
31
Case Study – https://haveibeenpwned.com/
– http://www.troyhunt.com/2013/12/working-with-154-million-records-on.html
– How do I make querying 154 million email addresses as fast as possible?
– "If I want 100GB of SQL Server and I want to hit it 10 million times, it'll cost me $176 a month" (now it's about $20)
32
NoSql – Azure Tables, DynamoDB
33
NoSql
– Relational technology has long been the dominant approach for data.
– Large amounts of data – scaling across many servers is challenging.
– Different kinds of data are hard to fit into a relational DB
  – JSON documents
  – Graphs
– ACID – Atomicity, Consistency, Isolation, Durability.
– CAP – Consistency, Availability, Partition tolerance.
– BASE – Basically Available, Soft state, Eventual consistency.
35
Table Storage Concepts
– Hierarchy: Account > Table > Entity
– Example: account contoso holds table customers (entities with Name = … and Email = …, or Name = … and EMailAdd = …) and table photos (entities with Photo ID = … and Date = …)
36
Table Storage
– Not an RDBMS
  – No relationships between entities
  – NoSql
– An entity can have up to 255 properties and up to 1 MB of data
– Mandatory properties for every entity
  – PartitionKey & RowKey (the only indexed properties)
    – Together they uniquely identify an entity
    – The same RowKey can be used in different partitions
    – They define the sort order
  – Timestamp – used for optimistic concurrency
– Strongly consistent
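The role of the two keys is easier to see in a toy model. The sketch below is an in-memory stand-in, not the Azure SDK: `InMemoryTable` and `Entity` are hypothetical names, and the ordinal string ordering is an assumption made for the illustration. It shows that the PartitionKey and RowKey pair must be unique, that a RowKey may repeat across partitions, and that a scan returns entities ordered by PartitionKey and then RowKey.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class Entity
{
    public string PartitionKey { get; set; }
    public string RowKey { get; set; }
    public string Email { get; set; }
}

// Toy model only: partitions sorted by PartitionKey, rows sorted by RowKey.
public sealed class InMemoryTable
{
    private readonly SortedDictionary<string, SortedDictionary<string, Entity>> partitions =
        new SortedDictionary<string, SortedDictionary<string, Entity>>(StringComparer.Ordinal);

    public void Insert(Entity entity)
    {
        SortedDictionary<string, Entity> rows;
        if (!partitions.TryGetValue(entity.PartitionKey, out rows))
        {
            rows = new SortedDictionary<string, Entity>(StringComparer.Ordinal);
            partitions.Add(entity.PartitionKey, rows);
        }
        rows.Add(entity.RowKey, entity); // throws ArgumentException on a duplicate key pair
    }

    // Point lookup by the full key; returns null when nothing matches.
    public Entity Retrieve(string partitionKey, string rowKey)
    {
        SortedDictionary<string, Entity> rows;
        Entity entity;
        return partitions.TryGetValue(partitionKey, out rows) && rows.TryGetValue(rowKey, out entity)
            ? entity
            : null;
    }

    // Full scan in key order: PartitionKey first, then RowKey.
    public IEnumerable<Entity> Scan()
    {
        return partitions.Values.SelectMany(rows => rows.Values);
    }
}

public static class Program
{
    public static void Main()
    {
        var table = new InMemoryTable();
        table.Insert(new Entity { PartitionKey = "Smith", RowKey = "Jeff" });
        table.Insert(new Entity { PartitionKey = "Harp", RowKey = "Walter" });
        table.Insert(new Entity { PartitionKey = "Smith", RowKey = "Ben" });

        // prints Harp/Walter, Smith/Ben, Smith/Jeff (one per line)
        foreach (Entity e in table.Scan())
        {
            Console.WriteLine(e.PartitionKey + "/" + e.RowKey);
        }
    }
}
```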
37
No Fixed Schema
FIRST   LAST    BIRTHDATE
Wade    Wegner  2/2/1981
Nathan  Totten  3/15/1965
Nick    Harris  May 1, 1976
An entity can also carry a property that the others lack, for example FAV SPORT = Canoeing.
38
Table Object Model
– ITableEntity interface
  – PartitionKey, RowKey, Timestamp, and ETag properties
  – Implemented by TableEntity and DynamicTableEntity

    // This class defines one additional property of integer type.
    // Since it derives from TableEntity, it is automatically
    // serialized and deserialized.
    public class SampleEntity : TableEntity
    {
        public int SampleProperty { get; set; }
    }
39
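Sample – Inserting an Entity into a Table

    // You will need the following using statements
    using Microsoft.WindowsAzure.Storage;
    using Microsoft.WindowsAzure.Storage.Table;

    // Assumed entity type (not shown on the slide): the last name is the
    // PartitionKey and the first name is the RowKey.
    public class CustomerEntity : TableEntity
    {
        public CustomerEntity() { }
        public CustomerEntity(string lastName, string firstName)
        {
            PartitionKey = lastName;
            RowKey = firstName;
        }
        public string Email { get; set; }
        public string PhoneNumber { get; set; }
    }

    // storageAccount is assumed to be a CloudStorageAccount created elsewhere,
    // for example with CloudStorageAccount.Parse(connectionString).

    // Create the table client.
    CloudTableClient tableClient = storageAccount.CreateCloudTableClient();
    CloudTable peopleTable = tableClient.GetTableReference("people");
    peopleTable.CreateIfNotExists();

    // Create a new customer entity.
    CustomerEntity customer1 = new CustomerEntity("Harp", "Walter");
    customer1.Email = "Walter@contoso.com";
    customer1.PhoneNumber = "425-555-0101";

    // Create an operation to add the new customer to the people table.
    TableOperation insertCustomer1 = TableOperation.Insert(customer1);

    // Submit the operation to the table service.
    peopleTable.Execute(insertCustomer1);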
40
Retrieve

    // Create the table client.
    CloudTableClient tableClient = storageAccount.CreateCloudTableClient();
    CloudTable peopleTable = tableClient.GetTableReference("people");

    // Retrieve the entity with partition key of "Smith" and row key of "Jeff".
    TableOperation retrieveJeffSmith = TableOperation.Retrieve<CustomerEntity>("Smith", "Jeff");

    // Execute the operation; Result is null when no such entity exists.
    CustomerEntity specificEntity = (CustomerEntity)peopleTable.Execute(retrieveJeffSmith).Result;
41
Table Storage – Important Points
– Azure Tables can store TBs of data
– Table operations are fast
– Tables are distributed
  – PartitionKey defines the partition
  – A table might be stored in different partitions on different storage devices
42
Pricing – Azure Tables
43
Case Study – https://haveibeenpwned.com/
44
Case Study – https://haveibeenpwned.com/
– How do I make querying 154 million email addresses as fast as possible?
– For foo@bar.com, the domain (bar.com) is the partition key and the alias (foo) is the row key
– "If I want 100GB of storage and I want to hit it 10 million times, it'll cost me $8 a month"
– SQL Server will cost $176 a month, which is 22 times more expensive
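The key scheme in this case study can be sketched in a few lines. `EmailKeys` is a hypothetical helper written for this example. It only splits the address at the last `@` and does no further validation or normalization, both of which a real system would probably need.

```csharp
using System;

public static class EmailKeys
{
    // Splits "alias@domain" so that the domain becomes the partition key
    // and the alias becomes the row key.
    public static void Split(string email, out string partitionKey, out string rowKey)
    {
        int at = email.LastIndexOf('@');
        if (at <= 0 || at == email.Length - 1)
        {
            throw new ArgumentException("Not an email address: " + email, "email");
        }
        rowKey = email.Substring(0, at);
        partitionKey = email.Substring(at + 1);
    }
}

public static class Program
{
    public static void Main()
    {
        string partitionKey, rowKey;
        EmailKeys.Split("foo@bar.com", out partitionKey, out rowKey);
        Console.WriteLine(partitionKey + " / " + rowKey); // prints bar.com / foo
    }
}
```

With these keys, checking one address is a single point lookup on PartitionKey and RowKey, the only indexed properties of a table entity.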
45
DynamoDB
– An item can be up to 64 KB in size
– Items are stored on SSDs and replicated across multiple Availability Zones in a Region
– An item has a primary key, which can be either a single-attribute hash key or a composite hash-range key
– Supports secondary indexes
46
DynamoDB
– Eventually consistent reads (by default) and strongly consistent reads (optional)
– Provisioned throughput – the request throughput you want your table to be able to achieve
  – 10 units of write capacity (enough for up to 36,000 writes per hour)
  – 50 units of read capacity (enough for up to 180,000 strongly consistent reads, or 360,000 eventually consistent reads, per hour)
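The hourly figures on this slide come from multiplying capacity units by the seconds in an hour. The sketch below reproduces them under two assumptions: one capacity unit covers one operation per second for an item small enough to fit a single unit, and an eventually consistent read costs half as much as a strongly consistent one. The `Throughput` class is illustrative, not an AWS API.

```csharp
using System;

public static class Throughput
{
    private const long SecondsPerHour = 3600;

    // One write unit is assumed to allow one write per second.
    public static long WritesPerHour(int writeUnits)
    {
        return writeUnits * SecondsPerHour;
    }

    // One read unit is assumed to allow one strongly consistent read per second.
    public static long StronglyConsistentReadsPerHour(int readUnits)
    {
        return readUnits * SecondsPerHour;
    }

    // Eventually consistent reads are assumed to get twice the rate per unit.
    public static long EventuallyConsistentReadsPerHour(int readUnits)
    {
        return 2 * StronglyConsistentReadsPerHour(readUnits);
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(Throughput.WritesPerHour(10));                    // prints 36000
        Console.WriteLine(Throughput.StronglyConsistentReadsPerHour(50));   // prints 180000
        Console.WriteLine(Throughput.EventuallyConsistentReadsPerHour(50)); // prints 360000
    }
}
```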
47
Pricing – DynamoDB
– Pay for what you use
– Components:
  – Provisioned throughput capacity (per hour)
  – Indexed data storage (per GB per month)
  – Data transfer out (per GB per month)
– http://aws.amazon.com/dynamodb/pricing/
49
MapReduce on the Cloud
50
Hadoop in the cloud
– Hadoop on Azure Cloud
– Some facts:
  – In 2013, global mobile data traffic reached 1.5 exabytes per month
  – Cisco predicts 1.1 zettabytes (1 zettabyte = 1,000 exabytes) of internet traffic in 2016
– Source: Cisco
51
MapReduce – The BigData Power
– Map – takes input and outputs key-value pairs
    (Key1, Value1)
    (Key2, Value2)
    :
    (Key n, Value n)
52
MapReduce – The BigData Power
– Reduce – takes the group of values for each key and produces a new group of values
    Key1: [value1-1, value1-2…]   ->  [new_value1-1, new_value1-2…]
    Key2: [value2-1, value2-2…]   ->  [new_value2-1, new_value2-2…]
    :
    Key n: [valueN-1, valueN-2…]  ->  [new_valueN-1, new_valueN-2…]
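The map and reduce steps can be tried out in memory with LINQ before involving a cluster. The sketch below is a single-process illustration, not Hadoop: `MiniMapReduce` is a hypothetical helper, `GroupBy` stands in for the shuffle that groups values by key, and word counting is used only as a familiar example job.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MiniMapReduce
{
    // map:    one input record -> zero or more (key, value) pairs
    // reduce: one key plus all of its values -> one result
    public static Dictionary<TKey, TResult> Run<TInput, TKey, TValue, TResult>(
        IEnumerable<TInput> inputs,
        Func<TInput, IEnumerable<KeyValuePair<TKey, TValue>>> map,
        Func<TKey, IEnumerable<TValue>, TResult> reduce)
    {
        return inputs
            .SelectMany(input => map(input))                 // map phase
            .GroupBy(pair => pair.Key, pair => pair.Value)   // group values by key
            .ToDictionary(group => group.Key,
                          group => reduce(group.Key, group)); // reduce phase
    }
}

public static class Program
{
    public static void Main()
    {
        Dictionary<string, int> counts = MiniMapReduce.Run<string, string, int, int>(
            new[] { "a b a", "b c" },
            line => line.Split(' ').Select(word => new KeyValuePair<string, int>(word, 1)),
            (word, ones) => ones.Sum());

        // prints a: 2, b: 2, c: 1 (one per line)
        foreach (KeyValuePair<string, int> pair in counts.OrderBy(p => p.Key, StringComparer.Ordinal))
        {
            Console.WriteLine(pair.Key + ": " + pair.Value);
        }
    }
}
```

On a real cluster the same two functions run on many servers at once, and the grouping step moves data between them.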
53
MapReduce – How Does It Work?
[Figure: files and servers]
54
So How Does It Work?
[Figure: server, runtime and code]
55
Elastic Map Reduce (EMR)
– Amazon Hadoop on the cloud
– Runs on a cluster of EC2 instances
– Pricing:
  – Hourly rate for every instance hour (by instance type)
  – Additional EMR price per EC2 instance
  – http://aws.amazon.com/elasticmapreduce/pricing/
56
HDInsight
– MS Hadoop on (not only) Azure Cloud
– Hortonworks and Microsoft brought Hadoop to Windows
– Native integration with .NET
57
Finding common friends
– Facebook shows you how many common friends you have with someone
– There were 1,310,000,000 active Facebook users with 130 friends on average (01.01.2014)
– The task: calculating the mutual friends
58
Finding common friends
– We can represent the friend relationship as: Someone -> [list of his/her friends]
– Note that the friend relationship is symmetrical: if A is a friend of B, then B is a friend of A
59
Example of Friends file
    U1 -> U2 U3 U4
    U2 -> U1 U3 U4 U5
    U3 -> U1 U2 U4 U5
    U4 -> U1 U2 U3 U5
    U5 -> U2 U3 U4
61
Designing our MapReduce job – Mapper
– Each line from the file is an input line to the Mapper
– The Mapper outputs key-value pairs
  – Key: (user, friend), sorted, so friend might come before user
  – Value: the user's list of friends
– Sorting the key helps the reducer: both records for the same pair arrive together
64
Mapper Example – final result
Given the line: U1 -> U2 U3 U4
Mapper output:
    (U1 U2) -> U2 U3 U4
    (U1 U3) -> U2 U3 U4
    (U1 U4) -> U2 U3 U4
Given the line: U2 -> U1 U3 U4 U5
Mapper output:
    (U1 U2) -> U1 U3 U4 U5
    (U2 U3) -> U1 U3 U4 U5
    (U2 U4) -> U1 U3 U4 U5
    (U2 U5) -> U1 U3 U4 U5
Given the line: U3 -> U1 U2 U4 U5
Mapper output:
    (U1 U3) -> U1 U2 U4 U5
    (U2 U3) -> U1 U2 U4 U5
    (U3 U4) -> U1 U2 U4 U5
    (U3 U5) -> U1 U2 U4 U5
Given the line: U4 -> U1 U2 U3 U5
Mapper output:
    (U1 U4) -> U1 U2 U3 U5
    (U2 U4) -> U1 U2 U3 U5
    (U3 U4) -> U1 U2 U3 U5
    (U4 U5) -> U1 U2 U3 U5
Given the line: U5 -> U2 U3 U4
Mapper output:
    (U2 U5) -> U2 U3 U4
    (U3 U5) -> U2 U3 U4
    (U4 U5) -> U2 U3 U4
65
Designing our MapReduce job – Reducer
– The input for the reducer is structured as: (friend1, friend2) -> (friend1's friends) (friend2's friends)
– The reducer finds the intersection between the two lists
– Output: (friend1, friend2) -> (intersection of friend1's and friend2's friends)
66
Reducer Example
Given the line                                Reducer output
(U1 U2) -> (U1 U3 U4 U5) (U2 U3 U4)           (U1 U2) -> (U3 U4)
(U1 U3) -> (U1 U2 U4 U5) (U2 U3 U4)           (U1 U3) -> (U2 U4)
(U1 U4) -> (U1 U2 U3 U5) (U2 U3 U4)           (U1 U4) -> (U2 U3)
(U2 U3) -> (U1 U2 U4 U5) (U1 U3 U4 U5)        (U2 U3) -> (U1 U4 U5)
(U2 U4) -> (U1 U2 U3 U5) (U1 U3 U4 U5)        (U2 U4) -> (U1 U3 U5)
(U2 U5) -> (U1 U3 U4 U5) (U2 U3 U4)           (U2 U5) -> (U3 U4)
(U3 U4) -> (U1 U2 U3 U5) (U1 U2 U4 U5)        (U3 U4) -> (U1 U2 U5)
(U3 U5) -> (U1 U2 U4 U5) (U2 U3 U4)           (U3 U5) -> (U2 U4)
(U4 U5) -> (U1 U2 U3 U5) (U2 U3 U4)           (U4 U5) -> (U2 U3)
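The whole common-friends job can be checked in memory against the tables above. This is a sketch, not the HDInsight code: `CommonFriends` is a hypothetical class and LINQ's `GroupBy` replaces the cluster's shuffle. It assumes the input lines use the `U1 -> U2 U3 U4` form of the example file, so the `->` token is skipped.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CommonFriends
{
    // Mapper: emits one (sorted pair, friend list) record per friend on the line.
    public static IEnumerable<KeyValuePair<string, string[]>> Map(string line)
    {
        string[] tokens = line
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Where(token => token != "->")
            .ToArray();
        if (tokens.Length == 0) yield break;

        string user = tokens[0];
        string[] friends = tokens.Skip(1).ToArray();
        foreach (string friend in friends)
        {
            // Sorting makes (U1, U2) and (U2, U1) share one key.
            string[] pair = { user, friend };
            Array.Sort(pair, StringComparer.Ordinal);
            yield return new KeyValuePair<string, string[]>(string.Join(" ", pair), friends);
        }
    }

    // Reducer: intersects all friend lists received for one pair.
    public static string[] Reduce(IEnumerable<string[]> friendLists)
    {
        return friendLists.Aggregate((a, b) => a.Intersect(b).ToArray());
    }

    public static SortedDictionary<string, string[]> Run(IEnumerable<string> lines)
    {
        var result = new SortedDictionary<string, string[]>(StringComparer.Ordinal);
        var groups = lines
            .SelectMany(line => Map(line))
            .GroupBy(record => record.Key, record => record.Value);
        foreach (var group in groups)
        {
            result[group.Key] = Reduce(group);
        }
        return result;
    }
}

public static class Program
{
    public static void Main()
    {
        string[] friendsFile =
        {
            "U1 -> U2 U3 U4",
            "U2 -> U1 U3 U4 U5",
            "U3 -> U1 U2 U4 U5",
            "U4 -> U1 U2 U3 U5",
            "U5 -> U2 U3 U4"
        };
        // First line printed: (U1 U2) -> (U3 U4)
        foreach (var entry in CommonFriends.Run(friendsFile))
        {
            Console.WriteLine("(" + entry.Key + ") -> (" + string.Join(" ", entry.Value) + ")");
        }
    }
}
```

Running it on the five-line example file yields the nine pairs of the reducer table, which is a quick way to validate the design before submitting a real job.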
67
Creating C# MapReduce
68
Creating C# MapReduce – Mapper

    using System;
    using System.Linq;
    using Microsoft.Hadoop.MapReduce;

    public class CommonFriendsMapper : MapperBase
    {
        public override void Map(string inputLine, MapperContext context)
        {
            var strings = inputLine.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
            if (strings.Any())
            {
                var currentUser = strings[0];
                var friends = strings.Skip(1);
                foreach (var friend in friends)
                {
                    // Sort the pair so that (U1 U2) and (U2 U1) produce the same key.
                    var keyArr = new[] { currentUser, friend };
                    Array.Sort(keyArr);
                    var key = String.Join(" ", keyArr);
                    context.EmitKeyValue(key, string.Join(" ", friends));
                }
            }
        }
    }
69
Creating C# MapReduce – Reducer

    using System.Collections.Generic;
    using System.Linq;
    using Microsoft.Hadoop.MapReduce;

    public class CommonFriendsReducer : ReducerCombinerBase
    {
        public override void Reduce(string key, IEnumerable<string> strings, ReducerCombinerContext context)
        {
            // Each value is one user's friend list; a symmetric friends file
            // yields exactly two lists per key.
            var friendsLists = strings.Select(friendList => friendList.Split(' ')).ToList();
            var intersection = friendsLists[0].Intersect(friendsLists[1]);
            context.EmitKeyValue(key, string.Join(" ", intersection));
        }
    }
70
Creating C# MapReduce – Hadoop Job

    HadoopJobConfiguration myConfig = new HadoopJobConfiguration();
    myConfig.InputPath = "wasb:///example/data/friends/friends";
    myConfig.OutputFolder = "wasb:///example/data/friends/output";
    Environment.SetEnvironmentVariable("HADOOP_HOME", @"c:\hadoop");
    Environment.SetEnvironmentVariable("JAVA_HOME", @"c:\hadoop\jvm");
    // The cluster and storage settings below must be supplied by the caller.
    var hadoop = Hadoop.Connect(clusterUri, clusterUserName, hadoopUserName, clusterPassword,
        azureStorageAccount, azureStorageKey, azureStorageContainer, createContinerIfNotExist);
    var jobResult = hadoop.MapReduceJob.Execute<CommonFriendsMapper, CommonFriendsReducer>(myConfig);
    int exitCode = jobResult.Info.ExitCode; // 0 = success, otherwise failure
71
Pricing – HDInsight
A 10-node cluster that exists for 24 hours:
– Secure gateway node – free
– Head node – 15.36 USD per 24-hour day
– 1 data node – 7.68 USD per 24-hour day
– 10 data nodes – 76.80 USD per 24-hour day
– Total: 92.16 USD
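The total on this slide is simple to recompute for other cluster sizes. The sketch below uses the per-day rates quoted on the slide (2014 figures, which have very likely changed since); the `HdInsightCost` class is a hypothetical helper and the secure gateway node is treated as free, as the slide states.

```csharp
using System;
using System.Globalization;

public static class HdInsightCost
{
    // Per-day rates quoted on the slide, in USD.
    private const decimal HeadNodePerDay = 15.36m;
    private const decimal DataNodePerDay = 7.68m;

    // Cost of one head node plus the given number of data nodes.
    public static decimal ClusterCost(int dataNodes, int days)
    {
        return (HeadNodePerDay + dataNodes * DataNodePerDay) * days;
    }
}

public static class Program
{
    public static void Main()
    {
        decimal total = HdInsightCost.ClusterCost(10, 1);
        Console.WriteLine(total.ToString("0.00", CultureInfo.InvariantCulture)); // prints 92.16
    }
}
```

Because the cluster is billed only while it exists, deleting it after the job finishes and keeping the data in blob storage keeps the bill close to this one-day figure.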
72
WRAP UP
73
Comparing the alternatives
Storage type: BLOB
  When should you use: unstructured data, files
  Implications: application logic responsibility; consider using HDInsight (Hadoop)
Storage type: Relational DB
  When should you use: structured relational data, ACID transactions
  Implications: SQL DML + DDL; could affect scalability; BI abilities; reporting
Storage type: Azure Tables, DynamoDB
  When should you use: structured data, loose schema, geo replication (high DR), auto sharding
  Implications: OData, REST; application logic responsibility (multiple schemas)
74
What have we seen
– Blobs
– Relational DB
– NoSql
– MapReduce in the Cloud
75
What's Next
– NoSql – MongoDB, Cassandra, CouchDB, RavenDB
– Hadoop ecosystem – Hive, Pig, SQOOP, Mahout
– Cache options – Amazon ElastiCache, Azure Cache, InRole Cache, Redis
– http://blogs.msdn.com/b/windowsazure/
– http://blogs.msdn.com/b/windowsazurestorage/
– http://blogs.msdn.com/b/bigdatasupport/
76
Presenter contact details
– c: +972-52-4772946
– t: @tamir_dresher
– e: tamirdr@codevalue.net
– b: TamirDresher.com
– w: www.codevalue.net