Published by Paula Norman. Modified over 8 years ago.
강호영, 2013. 02. 14.
Contents
Storm introduction
– Storm Architecture
– Concepts of Storm
– Operation Modes: Local Mode vs. Remote (Cluster) Mode
– The key properties of Storm
– Storm vs. Hadoop
Development on Local Mode
Construction of Storm Cluster
Deployment in Storm Cluster
Case Study: Storm-Contrib
To-do List in the future
※ License
- Storm: Eclipse Public License Version 1.0
- Zookeeper: Apache License Version 2.0
- ZeroMQ: GNU Lesser General Public License (LGPL): no explicit patent grant
- JZMQ: GNU General Public License (GPL) Version 3: cannot be used in closed-source code
Storm introduction (1/5)
Storm Architecture
– Prerequisite software stack: Java, Python, ØMQ (ZeroMQ), JZMQ, Storm
– Master Node: Nimbus (with UI)
  · Distributes code around the cluster, assigns tasks to each worker node, and monitors for failures
  ※ Analogous to a Hadoop JobTracker
– Worker Nodes: Supervisor 1 … Supervisor N
  · Each executes a portion of a topology
  ※ Analogous to a Map-Reduce job
  ※ Topology: a graph of spouts and bolts; a spout is a source of streams, and a bolt transforms the data in some way
– Storm uses Zookeeper for coordinating the cluster. Zookeeper is not used for message passing, so the load Storm places on Zookeeper is quite low.
Storm introduction (2/5)
Concepts of Storm
– Topologies: the logic for a realtime application; analogous to a MapReduce job in Hadoop
  · A Storm topology runs forever until you kill it; a MapReduce job eventually finishes
– Spouts: a source of streams in a topology; they read tuples from an external source and emit them into the topology
– Bolts: all processing in topologies is done in bolts (filtering, functions, aggregations, joins, talking to databases, and more)
– Stream groupings: tell how to send tuples between sets of tasks
  · Shuffle grouping, Fields grouping, All grouping, Global grouping, None grouping, Direct grouping
※ Example pipeline: word-reader (spout) → word-normalizer (bolt) → word-counter (bolt)
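The word-reader → word-normalizer → word-counter pipeline above can be illustrated without Storm itself. The sketch below is a plain-Java simulation; the class and method names are illustrative stand-ins, not Storm APIs, and a real topology would wire these stages together with `TopologyBuilder` and stream groupings instead of direct method calls.

```java
import java.util.*;

// Plain-Java simulation of the word-reader -> word-normalizer -> word-counter
// pipeline. Class and method names are illustrative, not Storm APIs.
public class WordPipeline {

    // "Spout": emits raw lines as the source of the stream.
    static List<String> wordReader(List<String> lines) {
        return new ArrayList<>(lines);
    }

    // "Bolt" 1: normalizes each line into lower-case words.
    static List<String> wordNormalizer(List<String> lines) {
        List<String> words = new ArrayList<>();
        for (String line : lines) {
            for (String w : line.trim().split("\\s+")) {
                if (!w.isEmpty()) words.add(w.toLowerCase());
            }
        }
        return words;
    }

    // "Bolt" 2: counts words. In Storm, a fields grouping on "word" would
    // guarantee the same word always reaches the same counter task.
    static Map<String, Integer> wordCounter(List<String> words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : words) counts.merge(w, 1, Integer::sum);
        return counts;
    }

    static Map<String, Integer> run(List<String> lines) {
        return wordCounter(wordNormalizer(wordReader(lines)));
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("Storm is fast", "storm is distributed")));
    }
}
```

Unlike this single-threaded simulation, Storm runs each stage as parallel tasks, which is why the stream groupings listed above matter.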
Storm introduction (3/5)
Operation Modes: Local Mode vs. Remote (Cluster) Mode
– Purpose
  · Local Mode: development, testing, debugging
  · Cluster Mode: production (real life)
– Machines
  · Local Mode: 1 (Storm topologies run on the local machine in a single JVM)
  · Cluster Mode: Nimbus (Master Node): 1; Zookeeper (Cluster Coordinator): more than 1 (odd number); Supervisor (Worker Node): many nodes
– OS
  · Local Mode: Linux or Windows
  · Cluster Mode: Linux recommended (but can be run on Windows)
– Running
  · Local Mode: using Maven or lein, can be run automatically (simple and easy)
  · Cluster Mode: Zookeeper: zkServer.sh start / Nimbus: storm nimbus / Supervisor: storm supervisor / UI (diagnostics on web): storm ui
※ Maven: a software project management and comprehension tool.
※ Leiningen: a build automation and dependency management tool.
Storm introduction (4/5)
The key properties of Storm
– Extremely broad set of use cases
  · Stream processing: processing messages and updating databases
  · Continuous computation: doing a continuous query on data streams and streaming the results into clients
  · Distributed RPC: parallelizing an intense query (like a search query) on the fly
– Scalable: scale by adding machines and increasing the parallelism settings of the topology, thanks to Storm's use of Zookeeper for cluster coordination (e.g., one of Storm's initial applications processed 1,000,000 messages per second on a 10-node cluster)
– Guarantees no data loss: Storm guarantees that every message will be processed
– Extremely robust: Storm is easy to manage, unlike Hadoop
– Fault-tolerant: if there are faults during execution of your computation, Storm will reassign tasks as necessary
– Programming language agnostic: Storm topologies and processing components can be defined in any language
Storm introduction (5/5)
Storm vs. Hadoop
– Cluster coordination: Storm: Zookeeper
– Master node daemon: Storm: Nimbus / Hadoop: JobTracker
– Worker node daemon: Storm: Supervisor / Hadoop: TaskTracker (runs Map-Reduce tasks)
– Computation: Storm: runs forever (until you kill the computation) / Hadoop: runs once and finishes
– Strong purpose: Storm: real-time processing ("real-time Hadoop") / Hadoop: batch processing
– Key function: Storm: runs incremental functions / Hadoop: runs idempotent functions
– Latency: Storm: very low / Hadoop: high
※ Idempotent: calling a method multiple times produces the same result as calling it once (the same request returns the same result on each call).
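The incremental-vs-idempotent contrast in the comparison above can be made concrete. In this hypothetical sketch, setTotal is idempotent (repeating the call changes nothing, like rewriting a batch job's output), while addDelta is incremental (every call changes state, like a streaming update), which is why blindly replaying an incremental update double-counts:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrates the "idempotent vs. incremental" distinction with two
// hypothetical update functions over a key -> total map.
public class UpdateFunctions {
    final Map<String, Integer> totals = new HashMap<>();

    // Idempotent: calling this many times with the same arguments leaves
    // the map exactly as a single call would (Hadoop-style batch output).
    void setTotal(String key, int value) {
        totals.put(key, value);
    }

    // Incremental: every call changes state (Storm-style streaming update),
    // so a replayed call double-counts unless deduplicated upstream.
    void addDelta(String key, int delta) {
        totals.merge(key, delta, Integer::sum);
    }

    public static void main(String[] args) {
        UpdateFunctions u = new UpdateFunctions();
        u.setTotal("clicks", 10);
        u.setTotal("clicks", 10);   // still 10: idempotent
        u.addDelta("views", 5);
        u.addDelta("views", 5);     // now 10: incremental
        System.out.println(u.totals);
    }
}
```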
Development on Local Mode
Setting up a development environment on Windows
– JDK 1.6: after installing, set the JAVA_HOME and PATH variables
– Maven 3.0.4: to download the Storm dependencies (the Storm Maven dependencies reference all the libraries required to run Storm in Local Mode)
– Git 1.4.1: to download (clone) the source code from GitHub
– Storm 0.8.1: to interact with remote clusters for deployment after developing
Building and executing the example source code
– Download the source code:
  git clone https://github.com/storm-book/examples-ch02-getting_started.git
– Download the Storm dependencies:
  mvn dependency:copy-dependencies
– Compile the source code:
  mvn compile
– Execute the Storm topology in Local Mode:
  mvn exec:java -Dexec.mainClass="TopologyMain" -Dexec.args="src/main/resources/words.txt"
※ The example source code has 1 spout and 2 bolts:
  · WordReader (spout): responsible for reading words
  · WordNormalizer (bolt): normalizes words
  · WordCounter (bolt): counts words
Construction of Storm Cluster (1/4)
Install prerequisite software (on all nodes)
– JDK: sudo apt-get install openjdk-7-jdk
– Python 2.7.3: sudo apt-get install python
– ZeroMQ 3.2.2
  wget http://download.zeromq.org/zeromq-3.2.2.tar.gz
  tar xvfzp zeromq-3.2.2.tar.gz
  cd zeromq-3.2.2
  ./configure
  make
  sudo make install
– JZMQ (ongoing project)
  git clone https://github.com/nathanmarz/jzmq.git
  cd jzmq
  ./autogen.sh
  ./configure
  make
  sudo make install
Construction of Storm Cluster (2/4)
Overview of the Storm Cluster
– Nimbus (Master Node): 192.168.0.101
– Zookeeper (Coordinator): 192.168.0.102
– Supervisor 1 (Worker Node): 192.168.0.103
– Supervisor 2 (Worker Node): 192.168.0.104
– Supervisor 3 (Worker Node): 192.168.0.105
– Stream constructors (sources): Source Router, Healthcare Device, Other Device
Construction of Storm Cluster (3/4)
Install ZooKeeper 3.4.5 (on the Coordinator Node)
– To install ZooKeeper, run:
  wget http://apache.mirror.cdnetworks.com/zookeeper/stable/zookeeper-3.4.5.tar.gz
  tar xvfzp zookeeper-3.4.5.tar.gz
  sudo mv zookeeper-3.4.5 /usr/local/zookeeper
– Create and edit the configuration file zoo.cfg:
  cd /usr/local/zookeeper/conf
  cp -rp zoo_sample.cfg zoo.cfg
  sudo vi zoo.cfg
    dataDir=/home/csos/storm/zk_data
    server.1=192.168.0.102:2888:3888
– Add the following to the $PATH variable:
  sudo vi /etc/profile
    export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-armhf
    export ZOOKEEPER_HOME=/usr/local/zookeeper
    export PATH=$PATH:$JAVA_HOME/bin:$ZOOKEEPER_HOME/bin
  source /etc/profile
– Create the data directory:
  mkdir -p /home/csos/storm/zk_data
  cd /home/csos/storm/zk_data
  vi myid
    1
– Run the Zookeeper daemon under supervision:
  zkServer.sh start
– Stop the Zookeeper daemon:
  zkServer.sh stop
Construction of Storm Cluster (4/4)
Install Storm 0.8.1 (on the Master Node and Worker Nodes)
– To install Storm, run:
  wget https://github.com/downloads/nathanmarz/storm/storm-0.8.1.zip
  unzip storm-0.8.1.zip
– Modify the configuration file storm.yaml:
  cd storm-0.8.1
  vi conf/storm.yaml
    storm.zookeeper.servers:
      - "192.168.0.102"
    storm.local.dir: "/home/csos/storm"
    nimbus.host: "192.168.0.101"
    supervisor.slots.ports:
      - 6700
      - 6701
      - 6702
      - 6703
– Add the following to the $PATH variable:
  sudo vi /etc/profile
    export STORM_HOME=/home/csos/storm/storm-0.8.1
    export PATH=$PATH:$STORM_HOME/bin
– Run the Nimbus daemon under supervision (on the Master Node): storm nimbus
– Run the Supervisor daemon under supervision (on Worker Nodes): storm supervisor
– Run the Storm UI (on the Master Node): storm ui
Deployment in Storm Cluster (1/2)
Change LocalCluster to StormSubmitter
– LocalCluster: runs the topology on the local computer (Local Mode)
– StormSubmitter: submits the topology to a running Storm cluster (Remote Mode)
★ After developing, debugging, and testing are finished, change LocalCluster to StormSubmitter to run in Remote Mode
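The switch described above can be sketched structurally. The TopologySubmitter interface and both stub classes below are illustrative stand-ins, not Storm classes; in real code the two branches would call backtype.storm.LocalCluster's submitTopology and backtype.storm.StormSubmitter.submitTopology respectively, and only that one construction site changes between modes.

```java
// Structural sketch of the LocalCluster -> StormSubmitter switch.
// TopologySubmitter and both implementations are hypothetical stand-ins,
// not Storm APIs.
public class SubmitModes {

    interface TopologySubmitter {
        String submit(String topologyName);
    }

    // Local Mode: everything runs in one JVM on the developer machine.
    static class LocalClusterStub implements TopologySubmitter {
        public String submit(String topologyName) {
            return "local:" + topologyName;
        }
    }

    // Remote Mode: the topology jar is handed to the running cluster's Nimbus.
    static class StormSubmitterStub implements TopologySubmitter {
        public String submit(String topologyName) {
            return "remote:" + topologyName;
        }
    }

    // Only this one line changes when moving from Local Mode to Remote Mode.
    static String deploy(boolean remoteMode, String topologyName) {
        TopologySubmitter s = remoteMode ? new StormSubmitterStub()
                                         : new LocalClusterStub();
        return s.submit(topologyName);
    }

    public static void main(String[] args) {
        System.out.println(deploy(false, "Getting-Started-Topologies"));
        System.out.println(deploy(true, "Getting-Started-Topologies"));
    }
}
```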
Deployment in Storm Cluster (2/2)
Submit the topology to the running Storm cluster
– To package it in jar format using Maven, run:
  mvn package
  (after it completes, you can see the jar file in the target directory)
– To submit the topology using the Storm client, run:
  storm jar target/Getting-Started-0.0.1-SNAPSHOT.jar TopologyMain /home/csos/temp/target/resources/words.txt
– To stop/kill it, run:
  storm kill Getting-Started-Topologies
Case Study: Storm-Contrib (1/3)
Storm-Contrib
– A collection of spouts, bolts, serializers, DSLs, and other goodies to use with Storm (https://github.com/nathanmarz/storm-contrib)
– The main projects (relevant to our projects):
  · storm-signals: Storm primitives that allow out-of-band messaging to Storm spouts and bolts
  · storm-example-projects: a library to compute moving averages and spike detection for a continuous stream of data; applicable to finance or any other streams
  · RealTimeTraffic: real-time traffic conditions using GPS data, based on a Twitter Storm stream compute cluster
  · storm-tutorial: demonstrates realtime stream processing with the Storm framework
  · mongo-storm: a MongoDB Storm spout
  · storm-websockets: use a WebSockets stream as a spout in Storm
Case Study: Storm-Contrib (2/3)
storm-signals v0.1.0
– Storm-Signals aims to provide a way to send messages ("signals") to components (spouts/bolts) in a Storm topology
  · Storm topologies are static: modifying a topology's behavior normally requires redeployment
  · Storm-Signals provides a simple way to modify a topology's behavior at runtime without redeployment
– Use cases
  · Start/stop/pause/resume a spout from a process external to the Storm topology
  · Change the source of a spout's stream without redeploying the topology
  · Initiate processing of a set/batch of data based on a schedule
  · Periodically send a dynamic SQL query to a spout
– Usage
  · Spout: inherit from BaseSignalSpout and override the onSignal() and open() methods
  · Signal client: used to send signals
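The start/pause/resume use case above can be mimicked without storm-signals. In the sketch below, SignalAwareSpout only imitates the shape of the pattern (a spout-like class reacting to an out-of-band byte[] signal); it is a hypothetical stand-in for BaseSignalSpout, whose real signals arrive through Zookeeper from an external SignalClient.

```java
import java.nio.charset.StandardCharsets;

// Minimal imitation of the storm-signals pattern: a spout-like class that
// pauses or resumes emission when it receives an out-of-band signal.
// SignalAwareSpout is a hypothetical stand-in for BaseSignalSpout.
public class SignalAwareSpout {
    private boolean paused = false;
    private int emitted = 0;

    // In storm-signals, an onSignal(byte[]) callback is delivered via
    // Zookeeper from a signal client external to the topology.
    public void onSignal(byte[] data) {
        String signal = new String(data, StandardCharsets.UTF_8);
        if (signal.equals("pause")) paused = true;
        else if (signal.equals("resume")) paused = false;
    }

    // Storm would call nextTuple() in a loop; here it just counts emits.
    public void nextTuple() {
        if (!paused) emitted++;
    }

    public int emittedCount() {
        return emitted;
    }

    public static void main(String[] args) {
        SignalAwareSpout spout = new SignalAwareSpout();
        spout.nextTuple();
        spout.onSignal("pause".getBytes(StandardCharsets.UTF_8));
        spout.nextTuple();                     // ignored while paused
        spout.onSignal("resume".getBytes(StandardCharsets.UTF_8));
        spout.nextTuple();
        System.out.println(spout.emittedCount());
    }
}
```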
Case Study: Storm-Contrib (3/3)
storm-example-projects
– Aims to create a library that continuously listens to and analyzes streams of data generated by various sensor devices
  · Computes moving averages and spike detection for a continuous stream of data
  · Benchmark: processes 96,000 sensor values per second on a cluster of 3 machines and detects spikes within a few milliseconds
– Modify the spout
  · LightEventSpout.java: change the PORT_NAMES[] entry to the serial port on your machine
– To execute this topology, run:
  storm jar movingAverageSpikeDetection.jar movingAverage.SpikeDetectionTopology
  · This fails to run in our environment because the proper environment is not set up (the project targets an Arduino kit with a photoresistor circuit)
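The moving-average spike-detection idea behind this project can be sketched independently of Storm and of the Arduino hardware. The window size and spike threshold below are arbitrary illustration values, not taken from storm-example-projects:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Storm-free sketch of a moving-average spike detector; window size and
// spike threshold are arbitrary illustration values.
public class SpikeDetector {
    private final int window;
    private final double threshold;   // spike if value > threshold * average
    private final Deque<Double> values = new ArrayDeque<>();
    private double sum = 0.0;

    public SpikeDetector(int window, double threshold) {
        this.window = window;
        this.threshold = threshold;
    }

    // Feed one sensor reading; returns true if it spikes above the current
    // moving average, then folds the reading into the window.
    public boolean offer(double value) {
        boolean spike = !values.isEmpty() && value > threshold * average();
        values.addLast(value);
        sum += value;
        if (values.size() > window) sum -= values.removeFirst();
        return spike;
    }

    public double average() {
        return values.isEmpty() ? 0.0 : sum / values.size();
    }

    public static void main(String[] args) {
        SpikeDetector d = new SpikeDetector(4, 2.0);
        double[] readings = {10, 11, 9, 10, 50};  // last value is a spike
        for (double r : readings) {
            System.out.println(r + " spike=" + d.offer(r));
        }
    }
}
```

In a topology this logic would live in a bolt fed by the sensor spout, one detector instance per sensor stream.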
To-do List in the future
Run storm-contrib projects on our Storm cluster
– Modify and customize them for our healthcare device environment (~2/20)
– Apply the topologies to our source router and other realtime streamers (~2/25)
Documentation of the 2012 Source Router program
– Purpose: both understanding and refactoring/re-architecting
– Relevant modules: SAMAL, SR, PillReminder, HDC, FPS, etc.
– Schedule
  · Tool-chain setup (remote Linux programming): remote building, debugging, testing, and deployment using VS2010 (already finished)
  · Release the Source Router API draft document (~2/22)
  · Release a refactored/re-architected draft version of the Source Router program (~2/28)
Translate ICE-based code into ZeroMQ-based code
– After the documentation, I will try this and compare it to the original
Thank you