An Introduction to JStorm
Longda Feng, Alibaba
Agenda: Background; Basic Concepts & Scenarios; Why Start JStorm?; JStorm vs Storm; Questions and Answers.
Who are we? The JStorm team was among the earliest users of Storm in China. Storm: 0.5.1/0.5.4/0.6.0/0.6.2/0.7.0/0.7.1. JStorm: 0.7.1/0.9.0/0.9.1/0.9.2/0.9.3/… Our duties: application development; JStorm system development; JStorm system operation.
Who Is Using JStorm? Many small Chinese companies are using JStorm.
How Big? More than 3,000 servers; more than 3 trillion messages per day.
What is JStorm? JStorm is a distributed programming framework. It is similar to Hadoop MapReduce but designed for real-time, in-memory scenarios. Users can build powerful distributed applications from very simple APIs.
What is JStorm? Storm redesigned in Java. Proven stable in very large clusters. Much faster. Much more powerful.
Basic Concept: pipelined data processing.
Advantage 1: Easy to learn. Simple building blocks: Topology/Spout/Bolt APIs. Out of the box: RPC/fault tolerance/real-time; data grouping & combining.
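To make the Topology/Spout/Bolt building blocks concrete, here is a minimal plain-Java sketch of a pipeline. It does not use the JStorm API: `MiniPipeline` and `run` are hypothetical names, a Java `Iterator` stands in for a spout and a `Function` for a bolt, and the stages run one after another rather than concurrently as they would in a real topology.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Plain-Java model of a spout -> bolt -> bolt pipeline.
// MiniPipeline is a hypothetical name, not a JStorm class.
class MiniPipeline {
    // Drains the "spout", then passes every tuple through each "bolt" in order.
    // A bolt may emit zero, one, or many output tuples per input tuple.
    static List<String> run(Iterator<String> spout,
                            List<Function<String, List<String>>> bolts) {
        List<String> current = new ArrayList<>();
        spout.forEachRemaining(current::add);
        for (Function<String, List<String>> bolt : bolts) {
            List<String> next = new ArrayList<>();
            for (String tuple : current) {
                next.addAll(bolt.apply(tuple));
            }
            current = next;
        }
        return current;
    }

    public static void main(String[] args) {
        // First bolt splits a line into words, second bolt upper-cases each word.
        Function<String, List<String>> split = line -> Arrays.asList(line.split(" "));
        Function<String, List<String>> upper = word -> Arrays.asList(word.toUpperCase());
        List<String> out = run(Arrays.asList("hello jstorm", "real time").iterator(),
                               Arrays.asList(split, upper));
        System.out.println(out); // prints [HELLO, JSTORM, REAL, TIME]
    }
}
```

In a real topology each stage would be a separate component connected by streams; the sketch only shows the data flow.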
Advantage 2: Excellent scalability. Horizontally scalable; DAG-based; adjustable parallelism for each component.
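One way to see how per-component parallelism interacts with data grouping is a hash-based router: tuples with the same key always reach the same task, and changing the parallelism only changes the modulus. This is a sketch in plain Java under that assumption; `FieldsGroupingSketch` and `taskFor` are hypothetical names, not JStorm classes.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of key-based (fields-style) grouping across N parallel tasks.
class FieldsGroupingSketch {
    // Maps a key to a task index in [0, parallelism).
    // floorMod keeps the result non-negative even for negative hash codes.
    static int taskFor(String key, int parallelism) {
        if (parallelism <= 0) {
            throw new IllegalArgumentException("parallelism must be positive");
        }
        return Math.floorMod(key.hashCode(), parallelism);
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("user-1", "user-2", "user-1", "user-3");
        int parallelism = 4; // adjustable per component
        for (String key : keys) {
            System.out.println(key + " -> task " + taskFor(key, parallelism));
        }
    }
}
```

Because the mapping depends only on the key and the task count, both occurrences of `user-1` land on the same task, which is what makes per-key aggregation possible.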
Stability: guaranteed fault tolerance. No single point of failure: Nimbus HA; any Supervisor can be shut down; a new worker is spawned automatically to replace a failed one.
Accuracy: the acking framework guarantees that no data is lost; the transaction framework guarantees data accuracy.
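Storm's documented acker tracks each tuple tree with a single XOR checksum, and the sketch below assumes JStorm's acking framework follows the same idea. It is a plain-Java illustration, not JStorm code: `XorAckTracker`, `track`, and `ack` are hypothetical names, and the small ids in the example stand in for the random 64-bit ids a real system would use.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of XOR-based ack tracking for tuple trees.
class XorAckTracker {
    // root tuple id -> XOR of all tuple ids that are emitted but not yet acked
    private final Map<Long, Long> pending = new HashMap<>();

    // Registers a tuple that was emitted as part of the tree rooted at rootId.
    void track(long rootId, long tupleId) {
        pending.merge(rootId, tupleId, (a, b) -> a ^ b);
    }

    // Records that ackedId was processed and that it produced emittedIds.
    // Returns true once every tuple in the tree has been acked (checksum is 0).
    boolean ack(long rootId, long ackedId, long... emittedIds) {
        Long value = pending.get(rootId);
        if (value == null) {
            return false; // unknown or already completed tree
        }
        long checksum = value ^ ackedId;
        for (long id : emittedIds) {
            checksum ^= id;
        }
        if (checksum == 0) {
            pending.remove(rootId);
            return true;
        }
        pending.put(rootId, checksum);
        return false;
    }

    public static void main(String[] args) {
        XorAckTracker tracker = new XorAckTracker();
        tracker.track(1L, 100L);                             // spout emits tuple 100
        System.out.println(tracker.ack(1L, 100L, 200L, 300L)); // false: 200 and 300 pending
        System.out.println(tracker.ack(1L, 200L));             // false: 300 pending
        System.out.println(tracker.ack(1L, 300L));             // true: tree complete
    }
}
```

Each id is XORed in once when emitted and once when acked, so the checksum returns to zero only when nothing is outstanding; a tree that never reaches zero can be timed out and replayed.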
Scenarios: stateless computation, where all data comes from the tuple. Use cases: log analysis; pipelined systems; message conversion; statistical analysis; real-time recommendation algorithms.
Why start JStorm? The Storm community is not as active as we expected. JStorm is tailored for the enterprise environment; fixes critical bugs in Storm; provides professional technical support and speeds up application development; reduces operational cost.
How Many Versions? 0.9.6 (2014/9/22); (2014/9/14); (2014/8/27); (2014/8/15); 0.9.4 (2014/7/18); (2014/5/31); (2014/5/10); (2014/4/8); 0.9.1 (2014/1/24); 0.9.0 (2013/12/30); 0.7.1 (2013/4/28).
JStorm is a superset of Storm: programs that run on Storm can run on JStorm without code changes.
More stable (1): Nimbus HA. Dual-Nimbus HA.
More stable (2): RPC. Netty supports two RPC modes: async and sync. Sending speed keeps pace with receiving speed, so the data flow is more stable.
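The effect of tying sending speed to receiving speed can be illustrated with a bounded queue: the sender blocks whenever the receiver falls behind, so the backlog never exceeds the queue capacity. This is only an analogy in plain Java, assuming the slide refers to flow control between sender and receiver; it is not JStorm's Netty code, and `BoundedSendSketch` and `maxBacklog` are hypothetical names.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch: a bounded channel makes the sender wait for the receiver.
class BoundedSendSketch {
    // Sends `count` messages through a channel of the given capacity and
    // returns the largest backlog the sender observed after a send.
    static int maxBacklog(int count, int capacity) {
        BlockingQueue<Integer> channel = new ArrayBlockingQueue<>(capacity);
        Thread receiver = new Thread(() -> {
            try {
                for (int i = 0; i < count; i++) {
                    channel.take(); // blocks while the channel is empty
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        receiver.start();
        int max = 0;
        try {
            for (int i = 0; i < count; i++) {
                channel.put(i); // blocks while the channel is full
                max = Math.max(max, channel.size());
            }
            receiver.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while sending", e);
        }
        return max;
    }

    public static void main(String[] args) {
        System.out.println("max backlog: " + maxBacklog(10_000, 16));
    }
}
```

With an unbounded queue the same sender could run arbitrarily far ahead of the receiver, which is the instability the slide describes avoiding.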
More stable (3): resource isolation. A misbehaving worker cannot interfere with other workers. CPU isolation with cgroups is supported; memory isolation is supported; resource quotas can be enforced on each group (before 0.9.5).
More stable (4): monitoring. Monitor every component in your topology; many more metrics (70+) than Storm; user-defined metrics supported; user-defined alerts supported.
More stable (5): CPU usage. Better CPU utilization; improved disruptor implementation; CPU usage drops from 300% to 10% when the processing queue is full; CPU spin-waiting is avoided; nextTuple/ack/fail work is moved to a separate thread.
More stable (6): more exception handling. try-catch added throughout: Nimbus/Supervisor main threads; spout/bolt initialization and cleanup; all I/O operations and serialization/deserialization; all ZK operations.
More stable (7): ZK. Reduced unnecessary ZK usage: removed useless watchers; increased ZK heartbeat frequency; failed workers are detected without a full scan of the entire ZK directory.
More stable (8): other. Improved GC tuning; all workers are guaranteed to be killed after the kill command is issued; a single supervisor/nimbus per instance is guaranteed; excessive use of local ports by the Netty client is avoided; ...
More powerful scheduler: balances tasks with regard to CPU, memory, disk, and network.
CPU assignment: by default each worker is assigned a single CPU slot; an application can be configured to use more slots. Why: some tasks create extra threads to do other work; in Alimama, one CPU slot does not meet the requirement.
Memory usage: default worker memory is 2 GB; an application can be configured to use more memory slots. Why: in the Alipay Mdrill application, the Solr bolt requires much more memory.
Smarter balancing: with the JStorm scheduler, tasks that exchange data heavily tend to be assigned to the same worker to avoid network cost.
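A simple way to picture traffic-aware placement is a greedy merge: visit task pairs from heaviest to lightest traffic and co-locate a pair whenever the combined group still fits in one worker. The sketch below is an illustration of that idea only, not the JStorm scheduler's actual algorithm; `LocalityPlacementSketch`, `place`, and the per-worker task limit are hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch of greedy, traffic-aware task grouping.
class LocalityPlacementSketch {
    // Each edge is {taskA, taskB, traffic}. Returns group[i], the id of the
    // worker group that task i ends up in (tasks with equal ids share a worker).
    static int[] place(int taskCount, int[][] edges, int maxTasksPerWorker) {
        int[] group = new int[taskCount];
        int[] size = new int[taskCount];
        for (int i = 0; i < taskCount; i++) {
            group[i] = i; // every task starts alone
            size[i] = 1;
        }
        int[][] sorted = edges.clone();
        Arrays.sort(sorted, (a, b) -> Integer.compare(b[2], a[2])); // heaviest first
        for (int[] edge : sorted) {
            int ga = group[edge[0]];
            int gb = group[edge[1]];
            if (ga == gb || size[ga] + size[gb] > maxTasksPerWorker) {
                continue; // already together, or the merged group would not fit
            }
            for (int i = 0; i < taskCount; i++) {
                if (group[i] == gb) {
                    group[i] = ga;
                }
            }
            size[ga] += size[gb];
            size[gb] = 0;
        }
        return group;
    }

    public static void main(String[] args) {
        int[][] edges = { {0, 1, 100}, {1, 2, 10}, {2, 3, 90} };
        System.out.println(Arrays.toString(place(4, edges, 2))); // prints [0, 0, 2, 2]
    }
}
```

Here the two heavy links (0-1 and 2-3) stay inside a worker, and only the light link 1-2 crosses the network.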
User-defined scheduler: users can pin a task to a designated worker; users can set how many CPU slots and memory slots are used. Why: in the Taobao TAE project, some bolts need to run on user-defined nodes.
Tasks on different nodes: tasks of one component can be scheduled to run on different nodes. Why: in Alipay Mdrill, the Solr bolts must run on different nodes.
Tasks on a single node: all tasks can be scheduled to run on a single node. Why: Taobao TLog has many small jobs; to reduce network cost, all tasks of one job must run on a single node.
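The two placement constraints, spreading a component's tasks across distinct nodes and packing a job's tasks onto one node, can be sketched as two small functions. This is a plain-Java illustration with hypothetical names (`NodePlacementSketch`, `spread`, `pack`); it does not show how JStorm is actually configured for either mode.

```java
import java.util.Arrays;

// Hypothetical sketch of "one task per node" versus "all tasks on one node".
class NodePlacementSketch {
    // Spread: task i runs on nodes[i], so no two tasks share a node.
    static String[] spread(int tasks, String[] nodes) {
        if (tasks > nodes.length) {
            throw new IllegalArgumentException("need at least one node per task");
        }
        return Arrays.copyOf(nodes, tasks);
    }

    // Pack: every task runs on the same node, so no tuple crosses the network.
    static String[] pack(int tasks, String node) {
        String[] assignment = new String[tasks];
        Arrays.fill(assignment, node);
        return assignment;
    }

    public static void main(String[] args) {
        String[] nodes = { "node-a", "node-b", "node-c" };
        System.out.println(Arrays.toString(spread(2, nodes)));    // prints [node-a, node-b]
        System.out.println(Arrays.toString(pack(3, "node-a")));   // prints [node-a, node-a, node-a]
    }
}
```

Spreading trades network cost for isolation between heavy tasks, while packing does the opposite for small jobs.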
Old assignment ("Last Assignment Policy"): by default, a task runs on the machine it ran on previously. Why: in Alibaba CDO, users wanted to reuse the old workers when restarting an application.
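The policy amounts to a preference rule: reuse the previous host if it is still available, otherwise fall back to another one. Below is a minimal plain-Java sketch of that rule; `LastAssignmentSketch` and `chooseHost` are hypothetical names, and the fallback (the alphabetically first alive host) is an arbitrary choice made only to keep the example deterministic.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;

// Hypothetical sketch of a "prefer the previous host" assignment rule.
class LastAssignmentSketch {
    static String chooseHost(String previousHost, Collection<String> aliveHosts) {
        if (aliveHosts.isEmpty()) {
            throw new IllegalStateException("no alive hosts");
        }
        if (previousHost != null && aliveHosts.contains(previousHost)) {
            return previousHost; // reuse the old placement
        }
        return Collections.min(aliveHosts); // arbitrary but deterministic fallback
    }

    public static void main(String[] args) {
        System.out.println(chooseHost("host-b", Arrays.asList("host-a", "host-b"))); // prints host-b
        System.out.println(chooseHost("host-z", Arrays.asList("host-b", "host-a"))); // prints host-a
    }
}
```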
Pluggable: able to run on Hadoop YARN (more stable than Storm); Alibaba Apsara Cloud System; Alibaba Elastic Resource Pool.
Classloader: resolves jar conflicts between the application and JStorm.
More convenient UI: more useful stats collected and displayed; browse worker logs in the UI.
Support for libjar: no need to assemble all dependency jars into one jar; submit library jars with the libjar parameter; worker.classpath is supported.
Faster: test setup of 6 servers (24 cores/98 GB); 18 spouts/18 bolts/18 ackers.
JStorm: sending speed of 41W/s (410,000 per second).
Storm: sending speed of 41W/s (410,000 per second).
Why faster: reduced memory copying with ZeroMQ; dedicated deserializing thread; better-tuned sampling logic; better-tuned acking framework; better-tuned GC.
Other improvements: more than 100 improvements. Fixed a race when assigning topologies; reset the rebalance/reassigned worker timeout to 4 minutes; graceful worker shutdown; improvements to the Thrift server; workers are no longer killed by mistake while rebalancing jobs; ...
More documentation: Google Group; Wangwang: JStorm; QQ; Laiwang: JStorm.
Join us: you are welcome to join us.
纪君祥 (Longda Feng)