Chameleon: A Resource Scheduler in a Data Grid Environment
Sang Min Park, Jai-Hoon Kim
Ajou University, South Korea
Contents
- Introduction to Data Grid
- Related Works
- Scheduling Model
- Scheduler Implementation
- Testbed and Applications
- Results
- Conclusions
Introduction to Data Grid
Data Grid motivations
- Petabyte-scale data production
- Distributed data storage, with each site storing part of the data
- Distributed computing resources that process the data
Two most important approaches for Data Grid
- Secure, reliable, and efficient data transport protocols (e.g., GridFTP)
- Replication (e.g., Replica Catalog)
Replication
- Large files are partially replicated across sites
- Replication reduces data access time
- Application scheduling and dynamic replication are emerging issues
Related Works
Data Grid
- Replica Catalog: maps a logical file name to its physical instances
- GridFTP: a secure, reliable, and efficient file transfer protocol
Job scheduling
- Various scheduling algorithms have been proposed for computational Grids
- Application Level Scheduling (AppLeS)
- Large data collections have not been considered
Job scheduling in Data Grid
- Only rough analytical and simulation studies have been presented
- Our work defines a more in-depth scheduling model
Scheduling Model - Assumptions
- Each site has both data storage and computing facilities
- Files are replicated at a subset of the Grid sites
- Sites differ in the amount of computational capability they provide
- Grid users request job execution through job schedulers
Scheduling Model - System Factors
Dynamic system factors (factors that change over time)
- Network bandwidth
  - Data transfer time is inversely proportional to network bandwidth
  - NWS (Network Weather Service) is a tool for measuring and forecasting network bandwidth
- Available computing nodes
  - Determine the execution time of jobs
  - Depend on the job load at a site
System attributes
- Machine architecture (clusters, MPPs, etc.)
- Processor speed, available memory, I/O performance, etc.
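Transfer time falls as bandwidth rises, so the basic relation is time = size / bandwidth. The following is a minimal sketch that assumes sizes in MB and bandwidth in MB/s, for instance a value forecast by NWS. The function name `transfer_time` and the bandwidth values are hypothetical.

```python
def transfer_time(size_mb: float, bandwidth_mb_s: float) -> float:
    """Seconds needed to move size_mb megabytes over a link of bandwidth_mb_s MB/s.

    Hypothetical helper: units (MB, MB/s) are an assumption for illustration.
    """
    if bandwidth_mb_s <= 0:
        raise ValueError("bandwidth must be positive")
    # Time is inversely proportional to bandwidth.
    return size_mb / bandwidth_mb_s


# Doubling the (hypothetical) bandwidth halves the time to move a 502 MB replica.
print(transfer_time(502, 10))  # 50.2
print(transfer_time(502, 20))  # 25.1
```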
Scheduling Model - System Factors
Application-specific factors (factors unique to Data Grid applications)
- Size of input data (replica)
  - If the replica is not at the computing site, it must be fetched
  - Transferring large data takes a long time
- Size of application code
  - Application code must be migrated to the sites that perform the computation
  - Not critical to overall performance (the code is small)
- Size of produced output data
  - When the job runs at a remote site, the result data must be returned to the local site
  - Strongly related to the size of the input data
Scheduling Model - Application Scenarios
The model consists of 5 distinct application scenarios:
1. Local Data and Local Execution
2. Local Data and Remote Execution
3. Remote Data and Local Execution
4. Remote Data and Same Remote Execution
5. Remote Data and Different Remote Execution
Scheduling Model - Application Scenarios
Terms used in the scenarios:
- Number of available computing nodes at the site
- Size of input data (replica)
- Size of application code
- Size of produced output data
- Bandwidth of the WAN connection between sites
- Bandwidth of the LAN connection between nodes
- Expected execution time of jobs
Scheduling Model - Application Scenarios
1. Local Data and Local Execution
The input data (replica) is located at the local site, and processing is performed on the locally available processors.
Data moved: input data (replica), application code, output data
Cost consists of:
1. Data transfer time between the master and computing nodes via LAN
2. Job execution time on the local processors
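The two cost terms of scenario 1 can be written as a small function. This is a minimal sketch, not Chameleon's code. The function and parameter names are hypothetical. Sizes are in MB and bandwidth is in MB/s. The expected execution time is passed in as an already-predicted value, because the slides do not say how it is derived from the number of available nodes. The sketch also assumes that input, code, and output each cross the LAN once.

```python
def scenario1_local_local(input_mb: float, code_mb: float, output_mb: float,
                          local_lan_mb_s: float, local_exec_s: float) -> float:
    """Predicted response time (s) for Local Data and Local Execution (hypothetical model)."""
    # Term 1: input, code, and output move between master and computing nodes over the LAN.
    lan_s = (input_mb + code_mb + output_mb) / local_lan_mb_s
    # Term 2: expected execution time on the local processors (supplied by the caller).
    return lan_s + local_exec_s


# Hypothetical values: 519 MB over an 8 MB/s LAN plus 600 s of execution.
print(scenario1_local_local(502, 7, 10, 8, 600))  # 664.875
```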
Scheduling Model - Application Scenarios
2. Local Data and Remote Execution
The locally held replica is transferred to a remote computation site.
Cost consists of:
1. Data (input + code + output) movement time via WAN between the local and remote sites
2. Data movement time via LAN within the remote site
3. Job execution time at the remote site
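Scenario 2 adds a WAN term to the LAN and execution terms. The sketch below uses the same assumptions as before: hypothetical names, units of MB and MB/s, and a predicted execution time passed in. It also assumes that one WAN bandwidth applies both to the outbound transfer and to the returned output.

```python
def scenario2_local_remote(input_mb: float, code_mb: float, output_mb: float,
                           wan_mb_s: float, remote_lan_mb_s: float,
                           remote_exec_s: float) -> float:
    """Predicted response time (s) for Local Data and Remote Execution (hypothetical model)."""
    total_mb = input_mb + code_mb + output_mb
    # Term 1: replica and code go out, and output comes back, over the WAN.
    wan_s = total_mb / wan_mb_s
    # Term 2: the same data is distributed over the remote site's LAN.
    lan_s = total_mb / remote_lan_mb_s
    # Term 3: expected execution time at the remote site.
    return wan_s + lan_s + remote_exec_s


# Hypothetical values: 2 MB/s WAN, 8 MB/s remote LAN, 300 s remote execution.
print(scenario2_local_remote(502, 7, 10, 2, 8, 300))  # 624.375
```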
Scheduling Model - Application Scenarios
3. Remote Data and Local Execution
A remote replica is copied to the local site, and processing is performed locally.
Cost consists of:
1. Input data movement time via WAN between the remote and local sites
2. Data movement time via LAN within the local site
3. Job execution time on the local processors
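In scenario 3, only the input replica crosses the WAN. This sketch uses the same hypothetical naming and units. The slide does not specify which data the local LAN step moves, so the sketch assumes input, code, and output, as in scenario 1.

```python
def scenario3_remote_local(input_mb: float, code_mb: float, output_mb: float,
                           wan_mb_s: float, local_lan_mb_s: float,
                           local_exec_s: float) -> float:
    """Predicted response time (s) for Remote Data and Local Execution (hypothetical model)."""
    # Term 1: only the input replica crosses the WAN (remote site -> local site).
    wan_s = input_mb / wan_mb_s
    # Term 2: LAN distribution inside the local site (assumed: input + code + output).
    lan_s = (input_mb + code_mb + output_mb) / local_lan_mb_s
    # Term 3: expected execution time on the local processors.
    return wan_s + lan_s + local_exec_s


# Hypothetical values: 2 MB/s WAN, 8 MB/s local LAN, 600 s local execution.
print(scenario3_remote_local(502, 7, 10, 2, 8, 600))  # 915.875
```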
Scheduling Model - Application Scenarios
4. Remote Data and Same Remote Execution
The remote site holding the replica performs the computation.
Cost consists of:
1. Data (code + output) movement time via WAN between the local and remote sites
2. Data movement time via LAN within the remote site
3. Job execution time at the remote site
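Scenario 4 leaves the large replica where it is, so only code and output cross the WAN. This sketch uses hypothetical names and units. The LAN step is assumed to move input, code, and output within the remote site.

```python
def scenario4_remote_same(input_mb: float, code_mb: float, output_mb: float,
                          wan_mb_s: float, remote_lan_mb_s: float,
                          remote_exec_s: float) -> float:
    """Predicted response time (s) for Remote Data and Same Remote Execution (hypothetical model)."""
    # Term 1: code goes out and output comes back over the WAN; the replica stays put.
    wan_s = (code_mb + output_mb) / wan_mb_s
    # Term 2: LAN distribution inside the remote site (assumed: input + code + output).
    lan_s = (input_mb + code_mb + output_mb) / remote_lan_mb_s
    # Term 3: expected execution time at the remote site.
    return wan_s + lan_s + remote_exec_s


# Hypothetical values: 2 MB/s WAN, 8 MB/s remote LAN, 300 s remote execution.
print(scenario4_remote_same(502, 7, 10, 2, 8, 300))  # 373.375
```

Moving all 519 MB of input, code, and output over the same 2 MB/s WAN would take 259.5 s. Here only 17 MB crosses that link, which takes 8.5 s. This gap is why running at the site that already holds the replica can pay off.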
Scheduling Model - Application Scenarios
5. Remote Data and Different Remote Execution
Remote site j performs the computation using a replica copied from remote site i.
Cost consists of:
1. Input replica movement time via WAN between remote sites i and j
2. Data (code + output) movement time via WAN between the local site and remote site j
3. Data movement time via LAN within remote site j
4. Job execution time at remote site j
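Scenario 5 has two separate WAN legs: one between remote sites i and j, and one between the local site and j. This sketch uses hypothetical names and units, and each leg takes its own bandwidth. The LAN step is assumed to move input, code, and output within site j.

```python
def scenario5_remote_different(input_mb: float, code_mb: float, output_mb: float,
                               wan_ij_mb_s: float, wan_local_j_mb_s: float,
                               lan_j_mb_s: float, exec_j_s: float) -> float:
    """Predicted response time (s) for Remote Data and Different Remote Execution (hypothetical model)."""
    # Term 1: replica is copied from remote site i to remote site j over the WAN.
    replica_s = input_mb / wan_ij_mb_s
    # Term 2: code goes from the local site to j, and output comes back, over the WAN.
    code_output_s = (code_mb + output_mb) / wan_local_j_mb_s
    # Term 3: LAN distribution inside site j (assumed: input + code + output).
    lan_s = (input_mb + code_mb + output_mb) / lan_j_mb_s
    # Term 4: expected execution time at site j.
    return replica_s + code_output_s + lan_s + exec_j_s


# Hypothetical values: 4 MB/s between i and j, 2 MB/s local<->j, 8 MB/s LAN at j, 150 s at j.
print(scenario5_remote_different(502, 7, 10, 4, 2, 8, 150))  # 348.875
```

With these hypothetical numbers, copying the replica from i to j costs 125.5 s. That cost is worth paying only when site j's execution time is low enough to offset it.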
Scheduling Model - Scheduler
Operations of the scheduler:
1. Predict the response time of each scenario
2. Compare the predicted response times of the scenarios
3. Choose the best scenario, together with the sites that hold the data and that perform the job execution
4. Request data movement and job execution
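Steps 2 and 3 reduce to taking the minimum over candidate plans. This minimal sketch assumes that step 1 has already produced a predicted response time for every (scenario, data site, execution site) combination. The `Plan` key layout, the site labels, and the times are hypothetical. Step 4, requesting the transfers and the job through Globus services, is not shown.

```python
from typing import Dict, Tuple

# Hypothetical plan key: (scenario number, site holding the data, site executing the job).
Plan = Tuple[int, str, str]


def choose_plan(predicted_s: Dict[Plan, float]) -> Tuple[Plan, float]:
    """Steps 2-3: compare predicted response times and return the fastest plan."""
    if not predicted_s:
        raise ValueError("no candidate plans to compare")
    best = min(predicted_s, key=lambda plan: predicted_s[plan])
    return best, predicted_s[best]


# Output of step 1 (prediction); these response times are hypothetical.
candidates = {
    (1, "Local", "Local"): 905.0,
    (3, "E", "Local"): 780.0,
    (4, "E", "E"): 640.0,
    (5, "E", "D"): 700.0,
}
print(choose_plan(candidates))  # ((4, 'E', 'E'), 640.0)
```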
Scheduler Implementation
- We developed a scheduler prototype, called Chameleon, to evaluate the scheduling model
- Built on top of services provided by Globus: GRAM, MDS, GridFTP, and Replica Catalog
- NWS is used for measuring and forecasting network bandwidth
- The scheduling algorithms are based on the scheduling models presented above
Testbed for Experiments
Site | Location | Number of processors | Local scheduler
Ajou University | S. Korea | 8 | PBS
Yonsei Univ. 1 | S. Korea | 12 | PBS
Yonsei Univ. 2 | S. Korea | 12 | PBS
KISTI | S. Korea | 36 | LSF
KUT | S. Korea | 6 | PBS
Chonbuk Univ. | S. Korea | 1 | Fork
Pusan Univ. | S. Korea | 24 | PBS
POSTECH | S. Korea | 8 | PBS
AIST | Japan | 10 | SGE
Applications
Gene sequence comparison applications (bioinformatics)
- Computationally intensive analysis of large protein databases
- Bio-scientists predict the structure and functions of a newly found protein by comparing it with well-known protein databases
- Database sizes exceed 500 MB
- Various versions of the protein databases exist
- Large databases are replicated in the Data Grid
- Two well-known applications, BLAST and FASTA, are executed
Applications - Parameters
Parameter | PSI-BLAST | FASTA
Size of input replica (protein database) | 502 MB | 502 MB
Size of output data | 10 MB | 200 MB
Size of application code | 7 MB | 1 MB
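The sizes in the parameter table show how much data each scenario pushes across the WAN. This sketch tallies WAN volume only and assumes both applications read the same 502 MB database. It ignores bandwidth, LAN time, and execution time. For scenario 5 the volume is split across two different links. The dictionary and function names are hypothetical.

```python
# Sizes in MB from the parameter table (input assumed identical for both applications).
PARAMS = {
    "PSI-BLAST": {"input": 502, "code": 7, "output": 10},
    "FASTA": {"input": 502, "code": 1, "output": 200},
}


def wan_volume_mb(app: str, scenario: int) -> int:
    """Total MB crossing the WAN in each scenario (ignores which link carries it)."""
    p = PARAMS[app]
    if scenario == 1:
        return 0                                     # everything stays local
    if scenario == 3:
        return p["input"]                            # replica fetched to the local site
    if scenario == 4:
        return p["code"] + p["output"]               # code out, results back
    if scenario in (2, 5):
        return p["input"] + p["code"] + p["output"]  # replica, code, and results all move
    raise ValueError(f"unknown scenario {scenario}")


for app in PARAMS:
    print(app, [wan_volume_mb(app, s) for s in range(1, 6)])
# PSI-BLAST [0, 519, 502, 17, 519]
# FASTA [0, 703, 502, 201, 703]
```

For PSI-BLAST, running where the replica lives (scenario 4) moves 17 MB over the WAN. For FASTA, the 200 MB output raises that to 201 MB. Remote execution therefore saves less for FASTA relative to fetching the 502 MB replica.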
Experimental Results (1)
Replication scenario
Results when executing PSI-BLAST
Experimental Results (2)
Results when executing FASTA in the same replication scenario as Experimental Results (1)
(The PSI-BLAST results are on the previous slide.)
Experimental Results (3)
No replication takes place
Results when executing PSI-BLAST
Experimental Results (4)
Number of replicas | Sites with a replica
1 | Local
2 | Local, E
3 | Local, E, D
4 | Local, E, D, F
5 | Local, E, D, F, G
6 | Local, E, D, F, G, H
7 | Local, E, D, F, G, H, B
8 | Local, E, D, F, G, H, B, A
9 | Local, E, D, F, G, H, B, A, C
Increasing the number of replicas decreases the response time.
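This sketch shows one way to see why adding replicas cannot make the best choice worse. If the scheduler always picks the cheapest replica, the best predicted time is a running minimum over the replica sites. The per-site times are hypothetical. The sketch also ignores the load and bandwidth changes that shape measured results.

```python
from itertools import accumulate

# Hypothetical predicted response times (s) when the job uses the replica at each site.
cost_via_site = {"Local": 900.0, "E": 700.0, "D": 750.0, "F": 650.0, "G": 680.0}
# Order in which replicas are added, following the first rows of the table.
order = ["Local", "E", "D", "F", "G"]

# Best achievable time after each new replica: a running minimum never increases.
best_so_far = list(accumulate((cost_via_site[s] for s in order), min))
print(best_so_far)  # [900.0, 700.0, 700.0, 650.0, 650.0]
```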
Conclusions
- We presented job scheduling models for Data Grids, consisting of 5 distinct scenarios
- A scheduler prototype, called Chameleon, was developed based on the presented scheduling models
- Experiments with Chameleon were performed on a constructed Grid testbed
- We achieve better performance by considering data locations as well as computational capabilities
References
- ANTZ:
- ApGrid:
- B. Allcock, J. Bester, J. Bresnahan, A. Chervenak, I. Foster, C. Kesselman, S. Meder, V. Nefedova, D. Quesnel, S. Tuecke. “Secure, Efficient Data Transport and Replica Management for High-Performance Data-Intensive Computing,” IEEE Mass Storage Conference,
- Mark Baker, Rajkumar Buyya and Domenico Laforenza. “The Grid: International Efforts in Global Computing,” International Conference on Advances in Infrastructure for E-Business, Science, and Education on the Internet, SSGRR2000, L'Aquila, Italy, July
- F. Berman and R. Wolski. “The AppLeS project: A status report,” Proceedings of the 8th NEC Research Symposium, Berlin, Germany, May
- Rajkumar Buyya, Kim Branson, Jon Giddy and David Abramson. “The Virtual Laboratory: A Toolset for Utilising the World-Wide Grid to Design Drugs,” 2nd IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2002), Berlin, Germany, May
- CERN DataGrid Project:
- Ann Chervenak, Ian Foster, Carl Kesselman, Charles Salisbury and Steven Tuecke. “The Data Grid: Towards an Architecture for the Distributed Management and Analysis of Large Scientific Datasets,” Journal of Network and Computer Applications, 23: ,
- Dirk Düllmann, Wolfgang Hoschek, Javier Jaen-Martinez, Asad Samar, Heinz Stockinger and Kurt Stockinger. “Models for Replica Synchronisation and Consistency in a Data Grid,” 10th IEEE Symposium on High Performance and Distributed Computing (HPDC-10), San Francisco, California, August
- I. Foster and C. Kesselman. “The Grid: Blueprint for a New Computing Infrastructure,” Morgan Kaufmann,
- I. Foster, C. Kesselman and S. Tuecke. “The Anatomy of the Grid: Enabling Scalable Virtual Organizations,” International J. Supercomputer Applications, 15(3),
- Cynthia Gibas. “Developing Bioinformatics Computer Skills,” O’Reilly, April
- The Globus Project:
- Leanne Guy, Erwin Laure, Peter Kunszt, Heinz Stockinger, and Kurt Stockinger. “Replica management in data grids,” Technical report, Global Grid Forum Informational Document, GGF5, Edinburgh, Scotland, July
- Wolfgang Hoschek, Javier Jaen-Martinez, Asad Samar, Heinz Stockinger and Kurt Stockinger. “Data Management in an International Data Grid Project,” 1st IEEE/ACM International Workshop on Grid Computing (Grid'2000), Bangalore, India, Dec
- Kavitha Ranganathan and Ian Foster. “Decoupling Computation and Data Scheduling in Distributed Data-Intensive Applications,” 11th IEEE International Symposium on High Performance Distributed Computing (HPDC-11), Edinburgh, Scotland, July
- Kavitha Ranganathan and Ian Foster. “Design and Evaluation of Dynamic Replication Strategies for a High Performance Data Grid,” International Conference on Computing in High Energy and Nuclear Physics, Beijing, September
- Kavitha Ranganathan and Ian Foster. “Identifying Dynamic Replication Strategies for a High Performance Data Grid,” International Workshop on Grid Computing, Denver, November
- Heinz Stockinger, Kurt Stockinger, Erich Schikuta and Ian Willers. “Towards a Cost Model for Distributed and Replicated Data Stores,” 9th Euromicro Workshop on Parallel and Distributed Processing PDP 2001, Mantova, Italy, February
- S. Vazhkudai, S. Tuecke and I. Foster. “Replica Selection in the Globus Data Grid,” Proceedings of the First IEEE/ACM International Conference on Cluster Computing and the Grid (CCGRID 2001), Brisbane, Australia, May
- Rich Wolski, Neil Spring, and Jim Hayes. “The Network Weather Service: A Distributed Resource Performance Forecasting Service for Metacomputing,” Journal of Future Generation Computing Systems, Volume 15, Numbers 5-6, pp , October 1999.