Published by Marcel Tillis. Modified over 9 years ago.
1
Jianwu Wang, Daniel Crawl, Ilkay Altintas
San Diego Supercomputer Center, University of California, San Diego
9500 Gilman Drive, MC 0505, La Jolla, CA 92093-0505, U.S.A.
{jianwu, crawl, altintas}@sdsc.edu
Presentation by Woodrow H. Edwards
2
Kepler
Open source scientific workflow system
○ Executable model of the many stages transforming data into the desired result in a scientific domain
Scientific domains using Kepler
○ Bioinformatics, Computational Chemistry, Ecoinformatics, and Geoinformatics
○ All have large data sets and require a lot of computation
3
Kepler
User-friendly GUI that connects data sources to built-in procedures or independent applications by drag and drop
Promotes component reuse and sharing
Written in Java
Designed to run on clusters, grids, or the Web
A natural fit for integration with MapReduce
4
Kepler
Components of a Kepler workflow
Actors
○ Independently process data
○ Atomic or composite
○ Ports input and output data (tokens) or signals
○ Could be R or MATLAB scripts or an outside application
Channels
○ Link actors
○ Carry data or other signals
Directors
○ Specify when actors run
○ Sequential (SDF) or parallel (PN)
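The actor, channel, and director roles listed above can be sketched in a few lines of Python. This is only an illustration of the concepts, not Kepler's API: the class names `Channel`, `Actor`, and `SequentialDirector`, and the three-actor pipeline, are all hypothetical.

```python
from collections import deque


class Channel:
    """Links one actor's output port to another actor's input port."""

    def __init__(self):
        self._tokens = deque()

    def put(self, token):
        self._tokens.append(token)

    def get(self):
        return self._tokens.popleft()

    def has_token(self):
        return bool(self._tokens)


class Actor:
    """Atomic actor: reads one token per input channel, emits one result."""

    def __init__(self, name, func, inputs=(), outputs=()):
        self.name = name
        self.func = func
        self.inputs = list(inputs)
        self.outputs = list(outputs)

    def ready(self):
        # An actor with no inputs (a source) is always ready.
        return all(ch.has_token() for ch in self.inputs)

    def fire(self):
        args = [ch.get() for ch in self.inputs]
        result = self.func(*args)
        for ch in self.outputs:
            ch.put(result)
        return result


class SequentialDirector:
    """Fires actors one at a time in the order given (a fixed schedule)."""

    def __init__(self, actors):
        self.actors = list(actors)

    def run(self):
        fired = []
        for actor in self.actors:
            if not actor.ready():
                raise RuntimeError(f"{actor.name} has no input token")
            actor.fire()
            fired.append(actor.name)
        return fired


# Hypothetical three-actor pipeline: source -> square -> sink.
a_to_b, b_to_c = Channel(), Channel()
results = []
source = Actor("source", lambda: 7, outputs=[a_to_b])
square = Actor("square", lambda x: x * x, inputs=[a_to_b], outputs=[b_to_c])
sink = Actor("sink", results.append, inputs=[b_to_c])

order = SequentialDirector([source, square, sink]).run()
```

The actors know nothing about scheduling; swapping the director for one that runs each actor in its own thread would change when they fire without touching the actors themselves, which is the separation the slide describes.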
5
Figure 1: Example Kepler workflow [2]
6
Hadoop
Open source implementation of MapReduce
○ map(in_key, in_value) → (out_key, intermediate_value) list
○ reduce(out_key, intermediate_value list) → out_value list
HDFS (Hadoop Distributed File System)
Data partitioning, scheduling, load balancing, and fault tolerance
Also written in Java
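The two signatures above can be made concrete with a single-process Python sketch. It does not use Hadoop and models none of HDFS, partitioning, scheduling, or fault tolerance; the driver `run_mapreduce` and the first-letter index example are assumptions chosen only to show how the map output is grouped by `out_key` before reduce sees it.

```python
from collections import defaultdict


def run_mapreduce(records, map_fn, reduce_fn):
    """Apply map_fn to each (in_key, in_value), group by out_key, then reduce."""
    groups = defaultdict(list)
    for in_key, in_value in records:
        # map(in_key, in_value) -> (out_key, intermediate_value) list
        for out_key, intermediate_value in map_fn(in_key, in_value):
            groups[out_key].append(intermediate_value)
    # reduce(out_key, intermediate_value list) -> out_value list
    return {
        out_key: reduce_fn(out_key, values)
        for out_key, values in sorted(groups.items())
    }


def map_fn(in_key, in_value):
    # in_key: document name (unused here); in_value: document text.
    return [(word[0], word) for word in in_value.lower().split()]


def reduce_fn(out_key, intermediate_values):
    # Return the distinct words that start with this letter.
    return sorted(set(intermediate_values))


records = [("doc1", "Kepler runs workflows"), ("doc2", "Hadoop runs reduce")]
index = run_mapreduce(records, map_fn, reduce_fn)
```

Here `index["r"]` holds `["reduce", "runs"]`: both documents contributed values under the same key, and reduce received them as one list.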
7
Kepler + Hadoop
Implement a MapReduce composite actor
Map actor
○ MapInputKey: in_key
○ MapInputValue: in_value
○ MapOutputList: (out_key, intermediate_value) list
Reduce actor
○ ReduceInputKey: out_key
○ ReduceInputList: intermediate_value list
○ ReduceOutputValue: out_value list
Figure 2: (a) MapReduce composite actor. (b) Map actor. (c) Reduce actor. [1]
8
Kepler + Hadoop
Figure 3: Hierarchical execution of MapReduce composite actor with Hadoop [1]
9
Kepler + Hadoop
Figure 4: (a) Word Count workflow. (b) Map actor. (c) Reduce actor. (d) IterateOverArray actor. [1]
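Since the figure itself is not reproduced here, the following Python sketch gives a rough idea of what a Word Count workflow built from these actors computes. The port names (MapInputKey, MapInputValue, MapOutputList, ReduceInputKey, ReduceInputList, ReduceOutputValue) come from the MapReduce composite actor slide; everything else is an assumption, including the function names, the local driver standing in for Hadoop, and the placement of IterateOverArray inside the Map sub-workflow.

```python
from collections import defaultdict


def iterate_over_array(values, func):
    """Stand-in for the IterateOverArray actor: apply func to each element."""
    return [func(v) for v in values]


def map_actor(ports):
    """Map sub-workflow: split one line and emit a (word, 1) pair per word."""
    words = ports["MapInputValue"].split()
    return {"MapOutputList": iterate_over_array(words, lambda w: (w, 1))}


def reduce_actor(ports):
    """Reduce sub-workflow: sum the counts collected for one word."""
    return {"ReduceOutputValue": [sum(ports["ReduceInputList"])]}


def mapreduce_composite(lines):
    """Hypothetical local driver; the real composite actor delegates to Hadoop."""
    grouped = defaultdict(list)
    for line_no, line in enumerate(lines):
        out = map_actor({"MapInputKey": line_no, "MapInputValue": line})
        for key, value in out["MapOutputList"]:
            grouped[key].append(value)
    result = {}
    for key, values in grouped.items():
        out = reduce_actor({"ReduceInputKey": key, "ReduceInputList": values})
        result[key] = out["ReduceOutputValue"][0]
    return result


counts = mapreduce_composite(["to be or not", "to be"])
```

The Map and Reduce sub-workflows only read and write their named ports, so the workflow author never touches the grouping step in between.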
10
Kepler + Hadoop
Takes 10 to 15% longer than native Hadoop MapReduce
Makes up for it in ease of implementation
Scientists can use MapReduce without needing to know the framework
They only need to know where their workflow can benefit from parallelism
11
References
1. J. Wang, D. Crawl, and I. Altintas. Kepler + Hadoop: A General Architecture Facilitating Data-Intensive Applications in Scientific Workflow Systems. In WORKS 09, ACM, Nov. 2009.
2. The Kepler Project. https://kepler-project.org
3. The Apache Hadoop Project. http://hadoop.apache.org