Slide 1: Sphinx: A Scheduling Middleware for Data Intensive Applications on a Grid
Richard Cavanaugh, University of Florida
Collaborators: Janguk In, Sanjay Ranka, Paul Avery, Laukik Chitnis, Gregory Graham (FNAL), Pradeep Padala, Rajendra Vippagunta, Xing Yan
Data Mining and Exploration Middleware for Distributed and Grid Computing – University of Minnesota, 18 September 2003

Slide 2: The Problem of Grid Scheduling
- Decentralised ownership
  - No one controls the grid
- Heterogeneous composition
  - Difficult to guarantee execution environments
- Dynamic availability of resources
  - Ubiquitous monitoring infrastructure needed
- Complex policies
  - Issues of trust
  - Lack of accounting infrastructure
  - May change with time
- Information gathering and processing is critical!

Slide 3: A Real Life Example
- Merge two grids into a single multi-VO "inter-grid"
- How to ensure that
  - neither VO is harmed?
  - both VOs actually benefit?
  - there are answers to questions like: "With what probability will my job be scheduled and complete before my conference deadline?"
- Clear need for a scheduling middleware!
(Map of participating sites: FNAL, Rice, UI, MIT, UCSD, UF, UW, Caltech, UM, UTA, ANL, IU, UC, LBL, SMU, OU, BU, BNL)

Slide 4: Some Requirements for Effective Grid Scheduling
- Information requirements
  - Past and future dependencies of the application
  - Persistent storage of workflows
  - Resource usage estimation
  - Policies (expected to vary slowly over time)
  - Global views of job descriptions
  - Request tracking and usage statistics (state information is important)
  - Resource properties and status (expected to vary slowly with time)
  - Grid weather (latency measurement is important)
  - Replica management
- System requirements
  - Distributed, fault-tolerant scheduling
  - Customisability
  - Interoperability with other scheduling systems
  - Quality of Service
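
The information requirements above map naturally onto a handful of record types kept in persistent storage. Here is a minimal Python sketch of that idea; the class and field names are hypothetical illustrations, not Sphinx's actual warehouse schema.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class ResourceStatus:
    """Snapshot of one site's properties and 'grid weather'."""
    site: str
    cpus_total: int
    cpus_free: int
    latency_ms: float          # measured network latency to the site
    updated: datetime = field(default_factory=datetime.utcnow)

@dataclass
class Policy:
    """A provider- or VO-defined constraint on a class of submitters."""
    site: str
    subject: str               # VO, group, user, or proxy/workflow id
    max_running_jobs: int
    valid_from: datetime
    valid_to: datetime

@dataclass
class WorkflowRecord:
    """Persistent description of a submitted workflow and its tracking state."""
    workflow_id: str
    owner: str
    job_dependencies: dict     # job id -> list of parent job ids
    status: str = "submitted"
```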

Slide 5: Incorporate Requirements into a Framework
- Assume the GriPhyN Virtual Data Toolkit (VDT):
  - Client (request/job submission)
    - Globus clients
    - Condor-G/DAGMan
    - Chimera Virtual Data System
  - Server (resource gatekeeper)
    - Globus services
    - RLS (Replica Location Service)
    - MonALISA Monitoring Service
    - etc.
(Diagram: VDT Client and VDT Server; the scheduling component is still an open question.)

Slide 6: Incorporate Requirements into a Framework (continued)
- Assume the GriPhyN Virtual Data Toolkit (VDT):
  - Client (request/job submission)
    - Clarens Web Service
    - Globus clients
    - Condor-G/DAGMan
    - Chimera Virtual Data System
  - Server (resource gatekeeper)
    - MonALISA Monitoring Service
    - Globus services
    - RLS (Replica Location Service)
- Framework design principles:
  - Information driven
  - Flexible client-server model
  - General, but pragmatic and simple
  - Implement now; learn; extend over time
  - Avoid adding middleware requirements on grid resources
  - Take what is offered!
(Diagram: VDT Client and VDT Server, with the scheduler as the remaining open component.)

Slide 7: The Sphinx Framework
(Architecture diagram) Components: Sphinx Client (Chimera Virtual Data System, Condor-G/DAGMan, VDT Client); Sphinx Server (Request Processing, Data Warehouse, Data Management, Information Gathering); a Clarens WS Backbone connecting client and server; and VDT server site services (MonALISA Monitoring Service, Globus Resource, Replica Location Service).
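
To make the division of labour concrete, here is a highly simplified sketch of one request passing through a server of this shape. Every name in it (schedule_request, warehouse.allows, warehouse.record_plan, monitor, replica_catalog) is an illustrative assumption, not the real Sphinx, Clarens, MonALISA, or RLS interface.

```python
def schedule_request(dag, warehouse, monitor, replica_catalog):
    """Assign each job in an abstract DAG to a grid site (toy sketch).

    dag:             {job_id: {"parents": [...], "input_files": [...]}}
    warehouse:       object holding policies and request-tracking state
    monitor:         callable returning {site: free_cpu_count}
    replica_catalog: callable mapping a file name to the sites holding it
    """
    load = monitor()                       # "grid weather" snapshot
    plan = {}
    for job_id, job in dag.items():
        # Prefer sites that already hold the job's input data ...
        candidates = set(load)
        for f in job["input_files"]:
            holders = set(replica_catalog(f)) & candidates
            if holders:
                candidates = holders
        # ... then pick the least-loaded candidate that the policies allow.
        allowed = [s for s in candidates if warehouse.allows(job_id, s)]
        plan[job_id] = max(allowed or candidates, key=lambda s: load[s])
    warehouse.record_plan(plan)            # request tracking
    return plan
```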

Slide 8: Sphinx Scheduling Server
- Functions as the nerve centre
- Data Warehouse
  - Policies, account information, grid weather, resource properties and status, request tracking, workflows, etc.
- Control Process
  - Finite state machine (see the sketch after this slide)
  - Different modules modify jobs, graphs, workflows, etc. and change their state
  - Flexible
  - Extensible
(Diagram of server modules around the Data Warehouse and Control Process: Message Interface, Job Predictor, Graph Predictor, Graph Reducer, Graph Data Planner, Job Execution Planner, Graph Tracker, Job Admission Control, Graph Admission Control, Data Management, Information Gatherer.)
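
A finite-state control process of this kind can be sketched as a table of states and the module responsible for moving work out of each state. The module names below come from the slide's diagram; the state names and pipeline order are assumptions for illustration, not Sphinx's actual state set.

```python
# Hypothetical job states and the module that handles each one.
PIPELINE = [
    ("unreduced",   "graph_reducer"),      # drop jobs whose outputs already exist
    ("unpredicted", "job_predictor"),      # estimate resource usage
    ("unplanned",   "execution_planner"),  # choose a site for the job
    ("planned",     "job_submitter"),      # hand off to Condor-G/DAGMan
]

class ControlProcess:
    def __init__(self, modules):
        self.modules = modules              # name -> callable(job) -> None

    def step(self, job):
        """Advance a job by one state: run the module for its current state."""
        for i, (state, module_name) in enumerate(PIPELINE):
            if job["state"] == state:
                self.modules[module_name](job)
                job["state"] = (PIPELINE[i + 1][0]
                                if i + 1 < len(PIPELINE) else "submitted")
                return job
        return job                          # terminal state: nothing to do
```

The flexibility claimed on the slide shows up here as a design choice: new behaviour is added by registering another (state, module) pair rather than rewriting the control loop.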

Slide 9: Policy Constraints
- Defined by resource providers
  - Actual grid sites (resource centres)
  - VO management
- Applied to request submitters
  - VO, group, user, or even a proxy request (e.g. a workflow)
- Valid over a period of time
  - Can be dynamic (e.g. periodic) or constant
- Global accounting and book-keeping is necessary (see the sketch after this slide)
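
One simple way to represent such constraints is as time-bounded quotas keyed by site and submitter, checked against globally book-kept usage. This is a hedged sketch: the flat running-job quota and the field names are assumptions, not the actual Sphinx policy language.

```python
from datetime import datetime

class PolicyStore:
    def __init__(self):
        # (site, subject) -> list of (valid_from, valid_to, max_running)
        self.quotas = {}
        # (site, subject) -> currently running jobs (global book-keeping)
        self.usage = {}

    def add_quota(self, site, subject, valid_from, valid_to, max_running):
        self.quotas.setdefault((site, subject), []).append(
            (valid_from, valid_to, max_running))

    def allows(self, site, subject, now=None):
        """True if the submitter may start one more job at this site right now."""
        now = now or datetime.utcnow()
        running = self.usage.get((site, subject), 0)
        for valid_from, valid_to, max_running in self.quotas.get((site, subject), []):
            if valid_from <= now <= valid_to and running >= max_running:
                return False
        return True
```

The `subject` key is what lets the same mechanism serve a VO, a group, a user, or a single proxy request; a periodic policy would simply regenerate its validity windows.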

Slide 10: Quality of Service
- For grid computing to become economically viable, a quality of service is needed
  - "Can the grid possibly handle my request within my required time window?"
  - If not, why not? When might it be able to accommodate such a request?
  - If yes, with what probability? (see the sketch after the next slide)
- But grid computing today typically:
  - Relies on "greedy" job placement strategies
  - Works well in a resource-rich (user-poor) environment
  - Assumes no correlation between job placement choices
  - Provides no QoS

Slide 11: Quality of Service (continued)
- As a grid becomes resource limited:
  - QoS becomes even more important!
  - "Greedy" strategies may not be a good choice
  - There is strong correlation between job placement choices
- Sphinx is designed to provide QoS through time-dependent, global views of
  - Requests (workflows, jobs, allocation, etc.)
  - Policies
  - Resources
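
The deadline question posed on slide 10 can be answered, at least crudely, from the historical data such a warehouse already keeps: the fraction of comparable past jobs that finished within the remaining time budget gives a probability estimate. A minimal sketch under that assumption (the function and its inputs are illustrative, not a Sphinx interface):

```python
def completion_probability(past_durations_hours, hours_until_deadline,
                           queue_wait_hours=0.0):
    """Estimate P(job finishes before the deadline) from historical runtimes.

    past_durations_hours: runtimes of comparable jobs on the candidate site
    queue_wait_hours:     expected wait before the job starts (assumed known)
    """
    if not past_durations_hours:
        return None                       # no basis for an estimate
    budget = hours_until_deadline - queue_wait_hours
    finished_in_time = sum(1 for d in past_durations_hours if d <= budget)
    return finished_in_time / len(past_durations_hours)

# e.g. completion_probability([4.0, 5.5, 6.1, 12.0], hours_until_deadline=8.0)
# returns 0.75, i.e. "with roughly 75% probability" for these illustrative numbers
```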

Slide 12: Resource Usage Estimation
- User requirements
  - Upper limits on CPU, memory, storage, and bandwidth usage
- Domain-specific knowledge
  - Applications are often known to depend logarithmically, linearly, etc. on certain input parameters, data size, or type
- Historical estimates
  - Record the performance of all applications
  - Statistically estimate resource usage within some confidence level (see the sketch after this slide)
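
Combining the domain knowledge (a known functional form) with the recorded history amounts to a small regression plus a confidence band. Here is a standard-library sketch; the linear form, the normal-approximation band, and the example numbers are assumptions for illustration.

```python
import statistics

def estimate_runtime(history, input_size, z=1.96):
    """Point estimate plus a rough 95% band from the residual spread.

    history: list of (input_size, runtime) pairs for one application.
    Assumes an approximately linear dependence (the 'domain knowledge').
    """
    xs, ys = zip(*history)
    fit = statistics.linear_regression(xs, ys)         # Python 3.10+
    residuals = [y - (fit.slope * x + fit.intercept) for x, y in history]
    spread = statistics.stdev(residuals) if len(residuals) > 1 else 0.0
    prediction = fit.slope * input_size + fit.intercept
    return prediction - z * spread, prediction, prediction + z * spread

# Recorded (input size in GB, runtime in hours) for one application:
history = [(1, 1.1), (2, 2.0), (4, 4.2), (8, 7.9)]
print(estimate_runtime(history, input_size=6))         # (low, point, high) in hours
```

A logarithmic or other dependence would be handled the same way by transforming the inputs before fitting.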

Slide 13: Data Management
- Smart replication:
  - Graph based
    - Examine the workflow graph and insert replication nodes to minimise overall completion time
    - Distribute and collect required data
    - Particularly useful for data parallelism
  - "Hot spot" based
    - Monitor current and historical data access patterns and replicate to optimise future access (see the sketch after this slide)
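
The "hot spot" variant, in particular, reduces to bookkeeping over observed accesses. A minimal sketch, with the threshold, the one-replica-per-file rule, and the class itself chosen arbitrarily for illustration:

```python
from collections import Counter, defaultdict

class HotSpotReplicator:
    """Suggest replicas for files whose recent access count crosses a threshold."""

    def __init__(self, threshold=10):
        self.threshold = threshold
        self.access_counts = Counter()                # file -> recent accesses
        self.accessing_sites = defaultdict(Counter)   # file -> site -> accesses

    def record_access(self, filename, site):
        # In practice these counts would be decayed or windowed so that both
        # current and historical access patterns are reflected.
        self.access_counts[filename] += 1
        self.accessing_sites[filename][site] += 1

    def replication_candidates(self, replica_locations):
        """Yield (file, target_site) pairs for 'hot' files not yet held locally.

        replica_locations: file -> set of sites currently holding a copy
        """
        for filename, count in self.access_counts.items():
            if count < self.threshold:
                continue
            for site, _ in self.accessing_sites[filename].most_common():
                if site not in replica_locations.get(filename, set()):
                    yield filename, site
                    break                             # one new replica per hot file
```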

Slide 15: Early Sphinx Prototype Test Results
- Simple sanity checks
  - 120 canonical virtual data workflows submitted to the US-CMS Grid
- Round-robin strategy
  - Equally distribute work to all sites
- Upper-limit strategy
  - Makes use of global information (site capacity)
  - Throttles jobs using just-in-time planning
  - 40% better throughput (given the grid topology)
- Conclusion: the prototype is working! (The two strategies are contrasted in the sketch after this slide.)
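
The two strategies differ only in how a site is chosen for the next ready job. A schematic comparison; the capacity model and the specific "upper limit" rule here are illustrative assumptions, not the actual test configuration.

```python
import itertools

def round_robin(sites):
    """Equally distribute work: ignore capacity and just cycle through the sites."""
    cycle = itertools.cycle(sites)
    def choose(running):
        return next(cycle)
    return choose

def upper_limit(sites, capacity):
    """Throttle with global information: only fill a site up to its known capacity."""
    def choose(running):
        open_sites = [s for s in sites if running.get(s, 0) < capacity[s]]
        # Pick the site with the most free slots; hold the job back if none is open.
        return max(open_sites, key=lambda s: capacity[s] - running.get(s, 0),
                   default=None)
    return choose
```

Under round-robin an overloaded small site becomes the bottleneck for the whole workload; throttling against known capacities (with just-in-time planning) is what produced the roughly 40% throughput gain reported above for that grid topology.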

Slide 16: Some Current and Future Activities
- Policy-based scheduling
- Quality of Service
- Graph partitioning
- Data parallelism
- Prediction module
- Useful views and fusion of monitoring data

Slide 17: Conclusions
- Scheduling on a grid has unique requirements
  - Information
  - System
- Decisions based on global views providing a quality of service are important
  - Particularly in a resource-limited environment
- Sphinx is an extensible, flexible grid middleware which
  - Already implements many required features for effective global scheduling
  - Provides an excellent "workbench" for future activities!