1 Introduction to Big Data and NoSQL
SQL Azure Saturday, April 21, 2012
Don Demsak, Advisory Solutions Architect, EMC Consulting
www.donxml.com


2 Meet Don
Advisory Solutions Architect – EMC Consulting
Application Architecture, Development & Design
DonXml.com, Twitter: donxml
Email: don@donxml.com
SlideShare: http://www.slideshare.net/dondemsak

3 The era of Big Data

4 How did we get here?
Expensive
– Processors
– Disk space
– Memory
– Operating systems
– Software
– Programmers
Monoculture
– Limited CPU cycles
– Limited disk space
– Limited memory
– Limited OS development
– Limited software
– Programmers: mono-lingual, mono-persistence

5 Typical RDBMS Implementations
Fixed table schemas
Small but frequent reads/writes
Large batch transactions
Focus on ACID
– Atomicity
– Consistency
– Isolation
– Durability

6 How we scale RDBMS implementations

7 1st Step – Build a relational database
Diagram: a single Database

8 2nd Step – Table Partitioning
Diagram: one Database split into partitions p1, p2, p3
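Table partitioning routes each row to a partition by comparing its partition key against a sorted list of boundary values. A minimal sketch, assuming hypothetical boundaries 1000 and 2000 and "a boundary value belongs to the partition on its right" semantics (similar in spirit to a SQL Server partition function declared RANGE RIGHT); `partitionFor` is an illustrative name, not a SQL Server API.

```javascript
// Range partitioning sketch: sorted boundary values split the key space
// into partitions numbered 1..boundaries.length + 1 (p1, p2, p3 here).
// Assumed semantics: a key equal to a boundary goes to the partition on the
// boundary's right, like a RANGE RIGHT partition function.
function partitionFor(key, boundaries) {
  // Binary search for how many boundaries are <= key.
  let lo = 0;
  let hi = boundaries.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (boundaries[mid] <= key) lo = mid + 1;
    else hi = mid;
  }
  return lo + 1; // partitions are numbered from 1
}

// Hypothetical boundaries: p1 = key < 1000, p2 = 1000..1999, p3 = key >= 2000
const boundaries = [1000, 2000];
console.log(partitionFor(42, boundaries));   // 1
console.log(partitionFor(1000, boundaries)); // 2
console.log(partitionFor(5000, boundaries)); // 3
```

Because the boundaries are sorted, the lookup costs O(log n) in the number of partitions, and queries that filter on the key can skip partitions whose ranges cannot match.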

9 3rd Step – Database Partitioning
Diagram: a separate stack per customer
Customer #1: Browser → Web Tier → B/L Tier → Database
Customer #2: Browser → Web Tier → B/L Tier → Database
Customer #3: Browser → Web Tier → B/L Tier → Database

10 4th Step – Move to the cloud?
Diagram: the same per-customer stacks, with each database replaced by a SQL Azure Federation
Customer #1: Browser → Web Tier → B/L Tier → SQL Azure Federation
Customer #2: Browser → Web Tier → B/L Tier → SQL Azure Federation
Customer #3: Browser → Web Tier → B/L Tier → SQL Azure Federation
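A federation spreads one logical database over members, each owning a contiguous range of a federation key (here, a customer ID). Members can be split as data grows; SQL Azure Federations exposed this through `ALTER FEDERATION ... SPLIT AT`. The class below is a hypothetical in-memory model of the routing and splitting idea only, not the SQL Azure Federations API; the member names are made up.

```javascript
// Hypothetical model of a federation: each member owns [low, high) of the
// federation key. This imitates the idea, not the SQL Azure API.
class Federation {
  constructor() {
    // Start with one member covering the entire key space.
    this.members = [{ name: "member0", low: -Infinity, high: Infinity }];
    this.nextId = 1;
  }

  // Return the name of the member whose range contains the key.
  route(customerId) {
    const owner = this.members.find((mem) => customerId >= mem.low && customerId < mem.high);
    return owner.name;
  }

  // Split the member containing `at` into [low, at) and [at, high).
  split(at) {
    const i = this.members.findIndex((mem) => at > mem.low && at < mem.high);
    if (i === -1) throw new Error(`no member can be split at ${at}`);
    const old = this.members[i];
    const left = { name: `member${this.nextId++}`, low: old.low, high: at };
    const right = { name: `member${this.nextId++}`, low: at, high: old.high };
    this.members.splice(i, 1, left, right);
  }
}

const fed = new Federation();
fed.split(1000); // customers below 1000 and at/above 1000 now live apart
console.log(fed.route(1), fed.route(2), fed.route(3)); // same member for all three
console.log(fed.route(1500)); // the member holding IDs >= 1000
```

Because the application only ever asks "which member owns this customer?", a split moves data without changing the calling code.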

11 There have to be other ways

12 Polyglot Persistence

13 Polyglot Programmer

14

15 Where Did NoSQL Originate?
1998 – Carlo Strozzi
– NoSQL project: a lightweight, open-source relational DB with no SQL interface
2009 – Eric Evans & Johan Oskarsson of Last.fm wanted to organize an event to discuss open-source distributed databases

16 NoSQL (loose) Definition
(Often) open source
Non-relational
Distributed
(Often) don’t guarantee ACID

17 Atlanta 2009
No:sql(east) conference
– select fun, profit from real_world where relational=false
Billed as “conference of no-rel datastores”

18 Types of NoSQL Data Stores

19 5 Groups of Data Models
Relational, Document, Key Value, Graph, Column Family

20 Document Store
Apache Jackrabbit, CouchDB, MongoDB, SimpleDB
XML Databases
– MarkLogic Server
– eXist

21 Document?
Okay, think of a web page...
– The relational model requires a column per tag
– Lots of empty columns
– Wasted space
The document model just stores the pages as is
– Saves on space
– Very flexible
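The "lots of empty columns" point can be made concrete. A small illustration with hypothetical page fields: each page stored as a document keeps only the fields it actually has, while forcing the same pages into fixed relational rows needs one column for every field any page might use.

```javascript
// Hypothetical column list: every tag that any page might contain.
const allColumns = ["title", "author", "video", "gallery", "comments", "map"];

// Two web pages stored as documents: each keeps only its own fields.
const pages = [
  { title: "Home", author: "Ann" },
  { title: "Trip photos", gallery: ["a.jpg", "b.jpg"], map: "Atlanta" },
];

// The same pages forced into fixed-width relational rows (null = empty cell).
const rows = pages.map((doc) => allColumns.map((col) => doc[col] ?? null));

const emptyCells = rows.flat().filter((v) => v === null).length;
console.log(`${emptyCells} of ${rows.length * allColumns.length} cells are null`); // 7 of 12 cells are null
```

Adding a new kind of tag to the document form needs no schema change; in the row form it means adding a column to every row.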

22 Graph Storage
AllegroGraph, Core Data, Neo4j, DEX, FlockDB
Microsoft Trinity (research project)
– http://research.microsoft.com/en-us/projects/trinity/

23 What’s a graph?
A graph consists of
– Nodes (the ‘stations’ of the graph)
– Edges (the lines between them)
FlockDB
– Created by the Twitter folks
– Nodes = users
– Edges = nature of the relationship between nodes
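With users as nodes and "follows" as directed edges, one simple representation keeps each edge in two adjacency sets, so both "who does A follow?" and "who follows B?" are single lookups. This is a minimal in-memory sketch in the spirit of the FlockDB description; the class and method names are hypothetical, not FlockDB's API.

```javascript
// Get (or create) the Set stored under `key` in `map`.
function setFor(map, key) {
  if (!map.has(key)) map.set(key, new Set());
  return map.get(key);
}

// Hypothetical follow graph: each edge "a follows b" is recorded twice,
// once in a's forward set and once in b's backward set.
class FollowGraph {
  constructor() {
    this.following = new Map(); // user -> Set of users they follow
    this.followers = new Map(); // user -> Set of users following them
  }
  follow(a, b) {
    setFor(this.following, a).add(b);
    setFor(this.followers, b).add(a);
  }
  unfollow(a, b) {
    setFor(this.following, a).delete(b);
    setFor(this.followers, b).delete(a);
  }
  followingOf(user) {
    return [...(this.following.get(user) ?? [])];
  }
  followersOf(user) {
    return [...(this.followers.get(user) ?? [])];
  }
}

const g = new FollowGraph();
g.follow("ann", "bob");
g.follow("carl", "bob");
g.follow("bob", "ann");
console.log(g.followersOf("bob")); // [ 'ann', 'carl' ]
console.log(g.followingOf("bob")); // [ 'ann' ]
```

Storing both directions doubles the writes but keeps every read a single-node lookup, which matches the read-heavy paging workload described later in the deck.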

24 Key/Value Stores
On disk
Cache in RAM
Eventually consistent
– Weak definition: “If no updates occur for a period, eventually all updates will propagate through the system and all replicas will be consistent”
– Strong definition: “for a given update and a given replica, eventually either the update reaches the replica or the replica retires”
Ordered
– An order-preserving distributed hash table keeps keys in lexicographic order, which allows lexicographical (range) processing
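What "ordered" buys you is that keys sharing a prefix, or falling in a range, sit next to each other, so a scan reads one contiguous run. A minimal single-node sketch, assuming a hypothetical `entity:id:field` key naming scheme; a hash-partitioned store scatters such keys and cannot answer the same scan cheaply.

```javascript
// Single-node sketch of an ordered key/value store (not a distributed one).
class OrderedKV {
  constructor() {
    this.keys = [];          // all keys, kept in lexicographic order
    this.values = new Map(); // key -> value
  }

  // Index of the first key >= k, found by binary search.
  lowerBound(k) {
    let lo = 0;
    let hi = this.keys.length;
    while (lo < hi) {
      const mid = (lo + hi) >> 1;
      if (this.keys[mid] < k) lo = mid + 1;
      else hi = mid;
    }
    return lo;
  }

  put(k, v) {
    // New keys are inserted at their sorted position; existing keys are overwritten.
    if (!this.values.has(k)) this.keys.splice(this.lowerBound(k), 0, k);
    this.values.set(k, v);
  }

  get(k) {
    return this.values.get(k);
  }

  // All [key, value] pairs with start <= key < end, in key order.
  range(start, end) {
    const out = [];
    for (let i = this.lowerBound(start); i < this.keys.length && this.keys[i] < end; i++) {
      out.push([this.keys[i], this.values.get(this.keys[i])]);
    }
    return out;
  }

  // All [key, value] pairs whose key starts with `prefix`; they are contiguous.
  scanPrefix(prefix) {
    const out = [];
    for (let i = this.lowerBound(prefix); i < this.keys.length && this.keys[i].startsWith(prefix); i++) {
      out.push([this.keys[i], this.values.get(this.keys[i])]);
    }
    return out;
  }
}

const kv = new OrderedKV();
kv.put("user:bob:name", "Bob");
kv.put("order:1001", { customer: "ann" });
kv.put("user:ann:name", "Ann");
kv.put("user:ann:city", "Atlanta");
console.log(kv.scanPrefix("user:ann:")); // both user:ann: entries, city before name
```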

25 Key/Value Examples
Azure AppFabric Cache, Memcached, VMware vFabric GemFire

26 Object Databases
db4o, GemStone/S, InterSystems Caché, Objectivity/DB, ZODB

27 Tabular
BigTable, Mnesia, HBase, Hypertable, Azure Table Storage, SQL Server 2012

28 Azure Table Storage Demo

29 Big Data

30 Big Data Definition
Volumes and volumes of data
Unstructured
Semi-structured
Not suited for relational databases
Often utilizes MapReduce frameworks

31 Big Data Examples
Cassandra, Hadoop, Greenplum, Azure Storage, EMC Atmos, Amazon S3, SQL Azure (with Federations support)

32 Real World Example: Twitter
The challenges
– Needs to store many graphs: who you are following, who’s following you, who you receive phone notifications from, etc.
– Delivering a tweet requires rapid paging of followers
– Heavy write load as followers are added and removed
– Set arithmetic for @mentions (intersection of users)
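As a hypothetical illustration of the @mention set arithmetic, assume a delivery rule where a tweet from ann that begins with "@bob" goes only to users who follow both ann and bob. That audience is the intersection of two follower lists. With both lists kept sorted by user ID, a merge-style intersection reads each list once from front to back. The follower IDs below are made up.

```javascript
// Intersect two ascending arrays of IDs in one linear pass.
function intersectSorted(a, b) {
  const out = [];
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) {
      out.push(a[i]);
      i++;
      j++;
    } else if (a[i] < b[j]) {
      i++; // a's ID is too small to appear in b from here on
    } else {
      j++; // b's ID is too small to appear in a from here on
    }
  }
  return out;
}

// Hypothetical follower lists, sorted by user ID.
const followers = {
  ann: [2, 5, 7, 9, 12],
  bob: [3, 5, 9, 11, 12],
};
// Under the assumed rule, these users see ann's "@bob ..." tweet.
console.log(intersectSorted(followers.ann, followers.bob)); // [ 5, 9, 12 ]
```

Because the merge only moves forward through both inputs, it can also run over lists fetched page by page, which is the "page-able set arithmetic" requirement that comes up with FlockDB.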

33 What did they try?
Started with relational databases
Tried key-value storage of denormalized lists
Did it work?
– Nope
– Each approach was good either at handling the write load or at paging large amounts of data, but not both

34 What did they need?
The simplest possible thing that would work
Allow for horizontal partitioning
Allow write operations to
– Arrive out of order
– Or be processed more than once
– Failures should result in redundant work, not lost work!

35 The Result was FlockDB
Stores graph data
Not optimized for graph traversal operations
Optimized for large adjacency lists
– List of all edges in a graph
– Each edge is keyed by the pair of nodes at its end points
Optimized for fast reads and writes
Optimized for page-able set arithmetic
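Serving a huge adjacency list (say, one user's followers) a page at a time keeps every request small. A minimal sketch of cursor-based paging, assuming the list is sorted by ID and the cursor is simply the last ID returned; that cursor format is an assumption for illustration, not FlockDB's actual one.

```javascript
// Return one page of IDs after `cursor`, plus the cursor for the next page.
function pageOf(sortedIds, cursor, pageSize) {
  // Start at the first ID strictly greater than the cursor (binary search).
  let lo = 0;
  let hi = sortedIds.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (sortedIds[mid] <= cursor) lo = mid + 1;
    else hi = mid;
  }
  const items = sortedIds.slice(lo, lo + pageSize);
  // A null cursor means there is nothing after this page.
  const next = lo + pageSize < sortedIds.length ? items[items.length - 1] : null;
  return { items, next };
}

const followerIds = Array.from({ length: 10 }, (_, i) => (i + 1) * 10); // 10, 20, ..., 100
const pages = [];
let cursor = -Infinity; // "before the first ID"
do {
  const { items, next } = pageOf(followerIds, cursor, 4);
  pages.push(items);
  cursor = next;
} while (cursor !== null);
console.log(pages.map((p) => p.length)); // [ 4, 4, 2 ]
```

Using a value rather than an offset as the cursor means followers added or removed earlier in the list between requests do not shift the pages that follow.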

36 How Does It Work?
Stores graphs as sets of edges between nodes
Data is partitioned by node
– All queries can be answered by a single partition
Write operations are idempotent
– They can be applied multiple times without changing the result
And commutative
– Applying them in a different order doesn’t change the result
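One common way to get idempotent, commutative writes is to stamp every write with a timestamp and keep, per edge, only the "largest" write. Taking a maximum gives the same answer however often and in whatever order writes arrive, so a retried or reordered write is harmless. This is a sketch under that assumption, with a hypothetical write shape `{ src, dst, state, ts }`; it is not FlockDB's code.

```javascript
// Order writes by timestamp; break ties on state so the order is total.
function isNewer(a, b) {
  return a.ts > b.ts || (a.ts === b.ts && a.state > b.state);
}

// Apply one write to the edge map, keeping the newest write per edge.
function applyWrite(edges, write) {
  const key = `${write.src}->${write.dst}`;
  const current = edges.get(key);
  if (!current || isNewer(write, current)) {
    edges.set(key, { state: write.state, ts: write.ts });
  }
  return edges; // returned so the function can be used with Array.reduce
}

const writes = [
  { src: "ann", dst: "bob", state: "normal", ts: 1 },  // ann follows bob
  { src: "ann", dst: "bob", state: "removed", ts: 2 }, // ann unfollows bob
];

// In order, once each.
const inOrder = writes.reduce(applyWrite, new Map());
// Reversed, then every write replayed a second time.
const shuffled = [...writes].reverse().concat(writes).reduce(applyWrite, new Map());

console.log(inOrder.get("ann->bob").state);  // removed
console.log(shuffled.get("ann->bob").state); // removed
```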

37 Working With Big Data

38 ACID
Atomicity – all or nothing
Consistency – valid according to all defined rules
Isolation – no transaction should be able to interfere with another transaction
Durability – once a transaction has been committed, it will remain so, even in the event of power loss, crashes, or errors
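Atomicity ("all or nothing") can be shown with a tiny in-memory store: either every write in a transaction is applied or, if any step fails, the store is restored to its earlier state. A minimal sketch assuming a single-threaded `Map` as the store; it illustrates atomicity only, not isolation or durability, and `transact`, `debit`, and `credit` are hypothetical helpers.

```javascript
// Apply every [key, updateFn] write, or none of them if one throws.
function transact(store, writes) {
  const snapshot = new Map(store); // state before any change
  try {
    for (const [key, update] of writes) {
      store.set(key, update(store.get(key)));
    }
  } catch (err) {
    // Roll back to the pre-transaction state, then report the failure.
    store.clear();
    for (const [k, v] of snapshot) store.set(k, v);
    throw err;
  }
}

const debit = (amount) => (balance) => {
  if (balance < amount) throw new Error("insufficient funds");
  return balance - amount;
};
const credit = (amount) => (balance) => balance + amount;

const accounts = new Map([["ann", 100], ["bob", 50]]);
transact(accounts, [["bob", credit(80)], ["ann", debit(80)]]); // succeeds

// The credit is listed first so the rollback is visible: ann only has 20 left.
let failed = false;
try {
  transact(accounts, [["bob", credit(80)], ["ann", debit(80)]]);
} catch (e) {
  failed = true;
}
console.log(failed, accounts.get("ann"), accounts.get("bob")); // true 20 130
```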

39 BASE
Basically Available – high availability, but not always consistent
Soft state – background cleanup mechanism
Eventual consistency – given a sufficiently long period of time over which no changes are sent, all updates can be expected to propagate eventually through the system, and all the replicas will be consistent
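Eventual consistency can be simulated with a few replicas that accept writes independently and later exchange state. This sketch assumes last-writer-wins conflict resolution on a (timestamp, replica ID) pair, chosen only for simplicity; real systems use a variety of strategies. Until the replicas merge, reads disagree; once updates stop and a gossip round runs, they all agree.

```javascript
// One replica: a key/value map plus a simple logical clock.
class Replica {
  constructor(id) {
    this.id = id;
    this.data = new Map(); // key -> { value, ts, origin }
    this.clock = 0;
  }
  write(key, value) {
    this.clock += 1;
    this.data.set(key, { value, ts: this.clock, origin: this.id });
  }
  read(key) {
    return this.data.get(key)?.value;
  }
  // Anti-entropy: pull another replica's entries, keeping the newer one per key.
  mergeFrom(other) {
    for (const [key, theirs] of other.data) {
      const mine = this.data.get(key);
      if (!mine || newer(theirs, mine)) this.data.set(key, theirs);
    }
    this.clock = Math.max(this.clock, other.clock);
  }
}

// Last-writer-wins: higher timestamp wins; replica ID breaks ties.
function newer(a, b) {
  return a.ts > b.ts || (a.ts === b.ts && a.origin > b.origin);
}

const replicas = ["r1", "r2", "r3"].map((id) => new Replica(id));
const [r1, r2, r3] = replicas;
r1.write("color", "red");
r2.write("color", "blue");
console.log(r1.read("color"), r2.read("color"), r3.read("color")); // red blue undefined

// With no further updates, a gossip round brings every replica into agreement.
for (const a of replicas) for (const b of replicas) if (a !== b) a.mergeFrom(b);
console.log(r1.read("color"), r2.read("color"), r3.read("color")); // blue blue blue
```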

40 Traditional (relational) Approach
Transactional Data Store → Extract → Transform → Load → Data Warehouse
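The traditional pipeline moves data in batches from the transactional store into a reporting shape. A minimal sketch of the three steps, assuming hypothetical order rows and a warehouse table of daily revenue per product; all names and figures are illustrative.

```javascript
// Hypothetical rows in the transactional store.
const transactionalStore = [
  { orderId: 1, day: "2012-04-20", product: "A", qty: 2, price: 15 },
  { orderId: 2, day: "2012-04-20", product: "B", qty: 1, price: 20 },
  { orderId: 3, day: "2012-04-21", product: "A", qty: 1, price: 15 },
];
const warehouse = new Map(); // "day|product" -> revenue

// Extract: copy rows out of the operational system.
const extract = (store) => store.map((row) => ({ ...row }));

// Transform: reshape into revenue per (day, product).
function transform(rows) {
  const totals = new Map();
  for (const { day, product, qty, price } of rows) {
    const key = `${day}|${product}`;
    totals.set(key, (totals.get(key) ?? 0) + qty * price);
  }
  return totals;
}

// Load: write the transformed records into the warehouse table.
function load(totals, target) {
  for (const [key, revenue] of totals) target.set(key, revenue);
}

load(transform(extract(transactionalStore)), warehouse);
console.log(warehouse.get("2012-04-20|A")); // 30
```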

41 Big Data Approach
MapReduce pattern/framework
– An input reader
– A map function, to transform the data into a common shape (format)
– A partition function
– A compare function
– A reduce function
– An output writer
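The six pieces listed above fit together as a pipeline. A minimal single-process word count sketch, assuming a hypothetical `mapReduce` driver and string hash; real frameworks run the map and reduce tasks on many machines and shuffle data between them, which this does not attempt.

```javascript
// Single-process MapReduce driver wiring the six pieces together.
function mapReduce({ input, reader, map, partition, compare, reduce, writer, numReducers }) {
  const buckets = Array.from({ length: numReducers }, () => new Map());
  // Input reader + map: turn each record into (key, value) pairs.
  for (const record of reader(input)) {
    for (const [key, value] of map(record)) {
      // Partition function: pick the reducer that will receive this key.
      const bucket = buckets[partition(key, numReducers)];
      if (!bucket.has(key)) bucket.set(key, []);
      bucket.get(key).push(value);
    }
  }
  const output = [];
  for (const bucket of buckets) {
    // Compare function: each reducer handles its keys in sorted order.
    for (const key of [...bucket.keys()].sort(compare)) {
      // Reduce function + output writer.
      output.push(writer(key, reduce(key, bucket.get(key))));
    }
  }
  return output;
}

// Simple string hash used by the partition function (illustrative only).
function hash(s) {
  let h = 0;
  for (const ch of s) h = (h * 31 + ch.charCodeAt(0)) >>> 0;
  return h;
}

const result = mapReduce({
  input: "the quick fox\nthe lazy dog\nthe end",
  reader: (text) => text.split("\n"),
  map: (line) => line.split(" ").map((word) => [word, 1]),
  partition: (key, n) => hash(key) % n,
  compare: (a, b) => (a < b ? -1 : a > b ? 1 : 0),
  reduce: (key, values) => values.reduce((sum, v) => sum + v, 0),
  writer: (key, total) => `${key}\t${total}`,
  numReducers: 2,
});
console.log(result.join("\n"));
```

Which words land in which reducer depends on the hash, so the overall output order varies with the partition function, but each word's total does not.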

42 MongoDB Example
// map function: emit (tag, { count : 1 }) for every tag of the current document
m = function () {
    this.tags.forEach(
        function (z) {
            emit(z, { count : 1 });
        }
    );
};
// reduce function: add up the counts emitted for one tag
r = function (key, values) {
    var total = 0;
    for (var i = 0; i < values.length; i++)
        total += values[i].count;
    return { count : total };
};
// execute: run map/reduce over db.things and write the results to the "myoutput" collection
res = db.things.mapReduce(m, r, { out : "myoutput" });
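To see what the shell example computes without a MongoDB server, the same `m` and `r` can be driven by a small plain-JavaScript stand-in. `runMapReduce` is hypothetical and imitates only a few documented behaviours: `this` is the current document inside map, `emit` collects pairs, reduce is not called for a key with a single value, and each result has the shape `{ _id, value }`. The sample documents are made up.

```javascript
// Hypothetical stand-in for db.things.mapReduce (a sketch, not MongoDB).
function runMapReduce(docs, mapFn, reduceFn) {
  const groups = new Map(); // key -> emitted values
  globalThis.emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  try {
    for (const doc of docs) mapFn.call(doc); // `this` is the document
  } finally {
    delete globalThis.emit;
  }
  const results = [];
  for (const [key, values] of groups) {
    // Like MongoDB, skip reduce when a key has only one value.
    const value = values.length === 1 ? values[0] : reduceFn(key, values);
    results.push({ _id: key, value });
  }
  return results.sort((a, b) => (a._id < b._id ? -1 : a._id > b._id ? 1 : 0));
}

// The map and reduce functions from the slide.
const m = function () {
  this.tags.forEach(function (z) {
    emit(z, { count: 1 });
  });
};
const r = function (key, values) {
  let total = 0;
  for (let i = 0; i < values.length; i++) total += values[i].count;
  return { count: total };
};

// Hypothetical documents in the "things" collection.
const things = [
  { _id: 1, tags: ["dog", "cat"] },
  { _id: 2, tags: ["cat"] },
  { _id: 3, tags: ["mouse", "cat", "dog"] },
];
console.log(JSON.stringify(runMapReduce(things, m, r)));
// [{"_id":"cat","value":{"count":3}},{"_id":"dog","value":{"count":2}},{"_id":"mouse","value":{"count":1}}]
```

Note that `r` returns the same `{ count }` shape that `m` emits. MongoDB may call reduce again on its own earlier output, so the two shapes have to match.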

43 MongoDB Demo

44 Big Data on Azure
Azure Table Storage
– Azure Service Bus
SQL Azure Federations
MongoDB on Azure
– http://www.mongodb.org/display/DOCS/MongoDB+on+Azure
Hadoop on Azure
– https://www.hadooponazure.com/

45 Using Azure for Computing
Diagram: Client, Master (Job/Task Scheduler), Workers, and Data, connected via Sockets

46 Moving to Event-Based Architecture
Diagram: Web Roles place requests (Req) on a Queue, which a Worker Role processes
Monitor queue length against users’ expectations
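"Monitor queue length against users' expectations" can be turned into a scaling rule. The estimate below is an assumption for illustration, not an Azure API: with W worker roles draining the queue in parallel, the last queued request waits roughly queueLength × secondsPerRequest / W seconds, so W is chosen to keep that wait under a target. The parameter names and default limits are hypothetical.

```javascript
// Estimate how many worker roles keep the expected wait under the target.
function workersNeeded({ queueLength, secondsPerRequest, targetWaitSeconds, minWorkers = 1, maxWorkers = 20 }) {
  // Smallest W with queueLength * secondsPerRequest / W <= targetWaitSeconds.
  const needed = Math.ceil((queueLength * secondsPerRequest) / targetWaitSeconds);
  // Clamp to the allowed range of instances.
  return Math.min(maxWorkers, Math.max(minWorkers, needed));
}

console.log(workersNeeded({ queueLength: 300, secondsPerRequest: 2, targetWaitSeconds: 60 })); // 10
console.log(workersNeeded({ queueLength: 5, secondsPerRequest: 2, targetWaitSeconds: 60 }));   // 1
```

Tying the rule to users' expected wait, rather than CPU load, is what lets the web roles stay responsive: they only enqueue, and capacity follows the backlog.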

47 Aggregate Stores

48 Visualizing Aggregates
Order aggregate
– ID: 1001
– Customer: Ann
– Line Items: 324112342$48$96, 7074232341$56456, 1251451$24
– Payment Details: Card: AmEx, CC#: 12343, Expiration: 07/2015
Relational tables: Orders, Customers, Order Lines, Credit Cards

49 Visualizing Aggregates
Order aggregate (ID: 1001, Customer: Ann, Line Items: 324112342$48$96, 7074232341$56456, 1251451$24, Payment Details: Card: AmEx, CC#: 12343, Expiration: 07/2015) as a single document:
{ "SalesOrdersView": { "ID": 1001, "Customer": "Ann", "LineItems": [ … ], … } }
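The aggregate view can be built by joining the normalized rows (Orders, Customers, Order Lines, Credit Cards) into one document that is stored and read as a unit. A minimal sketch: order 1001, Ann, AmEx, CC# 12343 and 07/2015 come from the slide, while the line-item values, the IDs linking the rows, and the field names are hypothetical.

```javascript
// Normalized rows; line items and linking IDs are hypothetical.
const customers = [{ id: 7, name: "Ann" }];
const orders = [{ id: 1001, customerId: 7, cardId: 3 }];
const orderLines = [
  { orderId: 1001, productId: "P-1", qty: 2, price: 48 },
  { orderId: 1001, productId: "P-2", qty: 1, price: 24 },
];
const creditCards = [{ id: 3, card: "AmEx", number: "12343", expiration: "07/2015" }];

// Join everything that belongs to one order into a single aggregate.
function salesOrderView(orderId) {
  const order = orders.find((o) => o.id === orderId);
  const customer = customers.find((c) => c.id === order.customerId);
  const card = creditCards.find((c) => c.id === order.cardId);
  return {
    SalesOrdersView: {
      ID: order.id,
      Customer: customer.name,
      LineItems: orderLines
        .filter((line) => line.orderId === orderId)
        .map((line) => ({ product: line.productId, qty: line.qty, price: line.price, total: line.qty * line.price })),
      PaymentDetails: { Card: card.card, CC: card.number, Expiration: card.expiration },
    },
  };
}

console.log(JSON.stringify(salesOrderView(1001), null, 2));
```

In an aggregate store this join happens once, on write, and every later read of the order fetches one document instead of four tables.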

50 MongoDB on Azure Demo

51 Next Steps
Learn a NoSQL product
– Great places to start: AppFabric Cache, Azure Table Storage, MongoDB
Pick a new programming language to learn
– Not Java or C#/VB
– Node.js, JavaScript, F#

52 THANK YOU

