Part III BigData Analysis Tools (Storm) Yuan Xue


1 Part III BigData Analysis Tools (Storm) Yuan Xue (yuan.xue@vanderbilt.edu)

2 Introduction
  - Limitations of Hadoop (MapReduce)
    - At its heart, a batch-oriented big data solution
    - Leaves gaps in ad-hoc and real-time data processing at massive scale
  - The need for a dedicated real-time analytics solution
    - “There’s no hack that will turn Hadoop into a realtime system; realtime data processing has a fundamentally different set of requirements than batch processing” -- Nathan Marz
  - Solutions
    - Dremel (Google BigQuery) supports ad-hoc analytics
    - Storm (Twitter’s real-time computation engine) addresses real-time data analytics
    - Storm was originally developed by BackType and now runs under Twitter’s name, after Twitter acquired BackType

3 Storm Architecture
  - The Storm architecture closely resembles the Hadoop architecture
  - Two types of nodes: a master node and worker nodes
  - The master node runs Nimbus, which copies code to the cluster nodes and assigns tasks to the workers; its role is similar to that of the JobTracker in Hadoop
  - Each worker node runs a Supervisor, which starts and stops worker processes; its role is similar to that of a TaskTracker in Hadoop
  - All coordination and state between Nimbus and the Supervisors is managed by ZooKeeper
  - [Figure: Storm architecture diagram]
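
The division of roles on this slide can be modelled roughly as follows. This is a minimal plain-Python sketch, not Storm code: `CoordinationStore` is a hypothetical in-memory stand-in for ZooKeeper, the round-robin placement is an assumption made for illustration (it is not Storm's scheduler), and all class and task names are invented. The point it shows is that Nimbus and the Supervisors never call each other directly; they only read and write shared state.

```python
from dataclasses import dataclass, field


class CoordinationStore:
    """Hypothetical in-memory stand-in for ZooKeeper: shared assignment state."""

    def __init__(self):
        self.assignments = {}  # supervisor id -> set of task names


@dataclass
class Nimbus:
    store: CoordinationStore

    def submit(self, tasks, supervisor_ids):
        """Place tasks round-robin (an assumption) and record the plan in the store."""
        plan = {sid: set() for sid in supervisor_ids}
        for i, task in enumerate(tasks):
            plan[supervisor_ids[i % len(supervisor_ids)]].add(task)
        self.store.assignments = plan


@dataclass
class Supervisor:
    supervisor_id: str
    store: CoordinationStore
    running: set = field(default_factory=set)

    def sync(self):
        """Start and stop local workers so they match the plan in the store."""
        wanted = self.store.assignments.get(self.supervisor_id, set())
        started = wanted - self.running
        stopped = self.running - wanted
        self.running = set(wanted)
        return started, stopped


store = CoordinationStore()
nimbus = Nimbus(store)
supervisors = [Supervisor("sup-1", store), Supervisor("sup-2", store)]

nimbus.submit(["spout-0", "bolt-0", "bolt-1"], ["sup-1", "sup-2"])
for sup in supervisors:
    sup.sync()
```

Resubmitting a smaller plan and calling `sync()` again makes each Supervisor report the workers it would stop, which mirrors the start/stop responsibility described above.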

4 Storm Concepts
  - Stream
    - Unbounded sequence of tuples
  - Spout
    - A node that produces data for other nodes to process; it can read from HTTP streams, databases, files, message queues, etc.
  - Bolt
    - A node that can both receive and produce data within the Storm cluster
    - Executes functions, filters, aggregations, joins, database access
  - Topology
    - The object that configures what the Storm cluster looks like: which spouts and bolts it has and how they are chained together
    - Similar to a MapReduce job
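
To make these terms concrete, here is a small word-count pipeline written with plain Python generators. It is only an analogy under stated assumptions: it does not use the Storm API, the function names are invented, it runs in a single process, and the spout is finite so the example terminates (a real stream is unbounded).

```python
from collections import Counter


def sentence_spout():
    """Spout: the source of the stream. Finite here so the example terminates."""
    for sentence in ["the cow jumped", "the moon"]:
        yield (sentence,)


def split_bolt(stream):
    """Bolt: consumes sentence tuples and emits one tuple per word."""
    for (sentence,) in stream:
        for word in sentence.split():
            yield (word,)


def count_bolt(stream):
    """Bolt: aggregates word tuples and emits a running (word, count) tuple."""
    counts = Counter()
    for (word,) in stream:
        counts[word] += 1
        yield (word, counts[word])


def run_topology():
    """Topology: wires spout -> split bolt -> count bolt and drains the output."""
    return list(count_bolt(split_bolt(sentence_spout())))


results = run_topology()
```

Each stage only sees tuples, which is what lets a topology chain spouts and bolts freely; in Storm the stages would additionally run as parallel tasks across the cluster.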

5 Stream Grouping
  - Question: when a tuple is emitted, which task does it go to?
  - Shuffle grouping: pick a random task
  - Fields grouping: consistent hashing on a subset of tuple fields, so tuples with equal values in those fields go to the same task
  - All grouping: send to all tasks
  - Global grouping: pick the task with the lowest id
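
The four groupings can be sketched as functions that map a tuple to the list of task indexes that should receive it. This is an illustrative plain-Python model, not Storm's implementation: the function signatures are invented, task ids are assumed to be `0..num_tasks-1`, and the fields grouping uses a simple CRC32 hash modulo the task count as a stand-in for the hashing scheme named on the slide.

```python
import random
import zlib


def shuffle_grouping(tup, num_tasks, rng=random):
    """Pick one task at random."""
    return [rng.randrange(num_tasks)]


def fields_grouping(tup, num_tasks, field_indexes):
    """Hash the selected fields so equal values always map to the same task.

    Simplification: hash modulo task count, chosen for determinism in this sketch.
    """
    key = repr(tuple(tup[i] for i in field_indexes)).encode("utf-8")
    return [zlib.crc32(key) % num_tasks]


def all_grouping(tup, num_tasks):
    """Replicate the tuple to every task."""
    return list(range(num_tasks))


def global_grouping(tup, num_tasks):
    """Send every tuple to the task with the lowest id (assumed to be 0)."""
    return [0]


# Two tuples that share the grouped field (index 0) land on the same task.
task_a = fields_grouping(("the", 1), 4, [0])
task_b = fields_grouping(("the", 99), 4, [0])
```

Fields grouping is what makes a word count correct in parallel: every occurrence of a given word reaches the same counting task, so no count is split across tasks.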

6 Example Code
  - https://github.com/nathanmarz/storm-starter/blob/master/src/jvm/storm/starter/WordCountTopology.java

7 The Lambda architecture
  - http://www.manning.com/marz/BDmeapch1.pdf
  - http://lambda-architecture.net/

8 The Lambda architecture – Detailed View
  - http://jameskinley.tumblr.com/post/37398560534/the-lambda-architecture-principles-for-architecting

9 Merge Realtime View into Batch View
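
One common reading of this step is that a query is answered by combining the precomputed batch view with the realtime view covering data the batch layer has not yet absorbed. The sketch below assumes both views hold additive counts keyed by the same ids, and that the two views do not overlap in the data they cover; the view contents and function names are made up for illustration.

```python
def merged_count(key, batch_view, realtime_view):
    """Answer a query by adding the batch count and the realtime delta."""
    return batch_view.get(key, 0) + realtime_view.get(key, 0)


def merge_views(batch_view, realtime_view):
    """Build a merged view over every key present in either view."""
    keys = set(batch_view) | set(realtime_view)
    return {k: merged_count(k, batch_view, realtime_view) for k in keys}


# Hypothetical page-view counts: batch covers older data, realtime the recent tail.
batch_view = {"page-a": 100, "page-b": 40}
realtime_view = {"page-a": 3, "page-c": 1}
merged = merge_views(batch_view, realtime_view)
```

Non-additive aggregates (for example, distinct counts) would need a different merge function than simple addition.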

10 References
  - http://bighadoop.wordpress.com/tag/storm/
  - http://jameskinley.tumblr.com/post/37398560534/the-lambda-architecture-principles-for-architecting
  - http://www.manning.com/marz/BDmeapch1.pdf
  - https://github.com/nathanmarz/storm/wiki/Tutorial

