18.09.2003 Data Mining and Exploration Middleware for Distributed and Grid Computing – University of Minnesota. Sphinx: A Scheduling Middleware for Data.


1 Sphinx: A Scheduling Middleware for Data Intensive Applications on a Grid
18.09.2003, Data Mining and Exploration Middleware for Distributed and Grid Computing – University of Minnesota
Richard Cavanaugh, University of Florida
Collaborators: Janguk In, Sanjay Ranka, Paul Avery, Laukik Chitnis, Gregory Graham (FNAL), Pradeep Padala, Rajendra Vippagunta, Xing Yan

2 The Problem of Grid Scheduling
o Decentralised ownership
  o No one controls the grid
o Heterogeneous composition
  o Difficult to guarantee execution environments
o Dynamic availability of resources
  o Ubiquitous monitoring infrastructure needed
o Complex policies
  o Issues of trust
  o Lack of accounting infrastructure
  o May change with time
o Information gathering and processing is critical!

3 A Real Life Example
o Merge two grids into a single multi-VO “inter-grid”
o How to ensure that
  o neither VO is harmed?
  o both VOs actually benefit?
  o there are answers to questions like:
    o “With what probability will my job be scheduled and complete before my conference deadline?”
o Clear need for a scheduling middleware!
[Figure: grid sites FNAL, Rice, UI, MIT, UCSD, UF, UW, Caltech, UM, UTA, ANL, IU, UC, LBL, SMU, OU, BU, BNL]

4 Some Requirements for Effective Grid Scheduling
o Information requirements
  o Past & future dependencies of the application
    o Persistent storage of workflows
  o Resource usage estimation
  o Policies
    o Expected to vary slowly over time
  o Global views of job descriptions
  o Request Tracking and Usage Statistics
    o State information important
  o Resource Properties and Status
    o Expected to vary slowly with time
  o Grid weather
    o Latency measurement important
  o Replica management
o System requirements
  o Distributed, fault-tolerant scheduling
  o Customisability
  o Interoperability with other scheduling systems
  o Quality of Service

5 Incorporate Requirements into a Framework
o Assume the GriPhyN Virtual Data Toolkit:
  o Client (request/job submission)
    o Globus clients
    o Condor-G/DAGMan
    o Chimera Virtual Data System
  o Server (resource gatekeeper)
    o Globus services
    o RLS (Replica Location Service)
    o MonALISA Monitoring Service
    o etc
[Diagram: VDT Client and VDT Server, with the component between them marked "? ? ?"]

6 Incorporate Requirements into a Framework
o Assume the GriPhyN Virtual Data Toolkit:
  o Client (request/job submission)
    o Clarens Web Service
    o Globus clients
    o Condor-G/DAGMan
    o Chimera Virtual Data System
  o Server (resource gatekeeper)
    o MonALISA Monitoring Service
    o Globus services
    o RLS (Replica Location Service)
o Framework design principles:
  o Information driven
  o Flexible client-server model
  o General, but pragmatic and simple
    o Implement now; learn; extend over time
  o Avoid adding middleware requirements on grid resources
    o Take what is offered!
[Diagram labels: VDT Client, VDT Server, "? Scheduler"]

7 The Sphinx Framework
[Diagram; component labels: Sphinx Server, VDT Client, VDT Server Site, MonALISA Monitoring Service, Globus Resource, Replica Location Service, Condor-G/DAGMan, Request Processing, Data Warehouse, Data Management, Information Gathering, Sphinx Client, Chimera Virtual Data System, Clarens WS Backbone]

8 Sphinx Scheduling Server
o Functions as the Nerve Centre
o Data Warehouse
  o Policies, Account Information, Grid Weather, Resource Properties and Status, Request Tracking, Workflows, etc
o Control Process
  o Finite State Machine
    o Different modules modify jobs, graphs, workflows, etc and change their state
  o Flexible
  o Extensible
[Diagram; Sphinx Server labels: Control Process, Job Execution Planner, Graph Reducer, Graph Tracker, Job Predictor, Graph Data Planner, Job Admission Control, Message Interface, Graph Predictor, Graph Admission Control, Data Warehouse, Data Management, Information Gatherer]
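
The finite-state-machine control process on this slide can be illustrated with a minimal Python sketch. The module names are borrowed from the diagram labels, but the state names, the order in which modules run, and the dictionary-based job record are assumptions made for illustration; the slides do not specify them.

```python
from enum import Enum, auto


class JobState(Enum):
    # Hypothetical states; the slides do not enumerate the real ones.
    SUBMITTED = auto()
    REDUCED = auto()
    PREDICTED = auto()
    ADMITTED = auto()
    PLANNED = auto()


def make_module(name, next_state):
    """Build a stand-in module that records its name and returns the next state."""
    def module(job):
        job["history"].append(name)
        return next_state
    return module


# One module is registered per state (assumed ordering).
MODULES = {
    JobState.SUBMITTED: make_module("graph_reducer", JobState.REDUCED),
    JobState.REDUCED: make_module("job_predictor", JobState.PREDICTED),
    JobState.PREDICTED: make_module("job_admission_control", JobState.ADMITTED),
    JobState.ADMITTED: make_module("job_execution_planner", JobState.PLANNED),
}


def control_process(job):
    """Hand the job to the module for its current state until none is registered."""
    while job["state"] in MODULES:
        job["state"] = MODULES[job["state"]](job)
    return job
```

Because the control loop only consults the state-to-module table, adding or replacing a module means editing one table entry, which is one way to read the "flexible" and "extensible" bullets.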

9 Policy Constraints
o Defined by Resource Providers
  o Actual grid sites (resource centres)
  o VO management
o Applied to Request Submitters
  o VO, group, user, or even a proxy request (e.g. workflow)
o Valid over a Period of Time
  o Can be dynamic (e.g. periodic) or constant
o Global accounting and book-keeping is necessary
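
The three dimensions of a policy constraint (provider, submitter, validity period) plus the accounting requirement can be sketched as a small record and a quota check. The `Policy` fields, the `remaining_quota` helper, and the idea of expressing a policy as a numeric quota are all hypothetical; the slides do not describe how Sphinx stores policies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    provider: str       # grid site or VO management that grants the quota
    submitter: str      # VO, group, user, or workflow the policy applies to
    resource: str       # e.g. "cpu_hours" (assumed resource name)
    quota: float        # amount granted while the policy is valid
    valid_from: float   # start of validity, in arbitrary time units
    valid_until: float  # end of validity (exclusive)


def remaining_quota(policies, usage, provider, submitter, resource, now):
    """Quota still available to a submitter at a provider at time `now`.

    `usage` is the global accounting record: a dict keyed by
    (provider, submitter, resource) holding the amount already consumed.
    """
    granted = sum(
        p.quota
        for p in policies
        if p.provider == provider
        and p.submitter == submitter
        and p.resource == resource
        and p.valid_from <= now < p.valid_until
    )
    used = usage.get((provider, submitter, resource), 0.0)
    return max(granted - used, 0.0)
```

The check needs both the policy table and the usage record, which illustrates why the slide calls global accounting and book-keeping necessary.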

10 Quality of Service
o For grid computing to become economically viable, a Quality of Service is needed
  o “Can the grid possibly handle my request within my required time window?”
    o If not, why not? When might it be able to accommodate such a request?
    o If yes, with what probability?
o But grid computing today typically:
  o Relies on “greedy” job placement strategies
    o Works well in a resource rich (user poor) environment
  o Assumes no correlation between job placement choices
  o Provides no QoS
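
As a toy illustration of the "with what probability?" question, one could take the fraction of comparable past requests that finished within the requested window. This empirical estimate and the `completion_probability` name are assumptions for illustration only; the slides do not say how Sphinx computes such a probability.

```python
def completion_probability(past_turnarounds, time_window):
    """Fraction of past requests whose turnaround fit inside the window."""
    if not past_turnarounds:
        raise ValueError("no historical turnaround times recorded")
    on_time = sum(1 for t in past_turnarounds if t <= time_window)
    return on_time / len(past_turnarounds)
```

Such an estimate is only as good as the history behind it, and it ignores the correlation between placement choices that the slide warns about.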

11 Quality of Service
o As a grid becomes resource limited,
  o QoS becomes even more important!
  o “greedy” strategies may not be a good choice
  o Strong correlation between job placement choices
o Sphinx is designed to provide QoS through time dependent, global views of
  o Requests (workflows, jobs, allocation, etc)
  o Policies
  o Resources

12 Resource Usage Estimation
o User Requirements
  o Upper limits on CPU, memory, storage, bandwidth usage
o Domain Specific Knowledge
  o Applications are often known to depend logarithmically, linearly, etc on certain input parameters, data size or type
o Historical Estimates
  o Record the performance of all applications
  o Statistically estimate resource usage within some confidence level
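
A minimal sketch of a historical estimate, assuming the recorded performance is a list of past usage values for one application: return the smallest recorded value that covers the requested fraction of runs, optionally capped by the user's stated upper limit. The empirical-quantile choice and the function name are assumptions, not the method described in the talk.

```python
import math


def usage_upper_bound(history, confidence=0.9, user_limit=None):
    """Smallest recorded usage that covers `confidence` of past runs.

    `user_limit`, if given, caps the estimate at the user's stated upper limit.
    """
    if not history:
        raise ValueError("no historical runs recorded")
    if not 0.0 < confidence <= 1.0:
        raise ValueError("confidence must be in (0, 1]")
    ordered = sorted(history)
    # Index of the empirical quantile (nearest-rank definition).
    index = math.ceil(confidence * len(ordered)) - 1
    estimate = ordered[index]
    if user_limit is not None:
        estimate = min(estimate, user_limit)
    return estimate
```

Domain specific knowledge, such as a known linear dependence on input size, could be folded in by normalising each recorded value before taking the quantile.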

13 Data Management
o Smart Replication:
  o Graph based
    o Examine and insert replication nodes to minimise overall completion time
    o Distribute and collect required data
    o Particularly useful in data parallelism
  o “Hot Spot” based
    o Monitor current and historical data access patterns and replicate to optimise future access
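
The “Hot Spot” idea can be sketched as counting remote reads per file and site, then proposing a replica wherever the count reaches a threshold. The log format, the fixed threshold rule, and the `hot_spot_replicas` name are hypothetical; the slides only state that access patterns are monitored.

```python
from collections import Counter


def hot_spot_replicas(access_log, replica_sites, threshold):
    """Propose (file, site) pairs where a new replica may be worthwhile.

    access_log: iterable of (file, requesting_site) tuples.
    replica_sites: dict mapping each file to the set of sites holding a copy.
    threshold: minimum number of remote reads before a replica is proposed.
    """
    remote_reads = Counter(
        (name, site)
        for name, site in access_log
        if site not in replica_sites.get(name, set())
    )
    return sorted(pair for pair, count in remote_reads.items() if count >= threshold)
```

A production scheme would presumably also weigh file size, storage limits, and transfer cost, none of which this sketch models.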

15 Early Sphinx Prototype Test Results
o Simple sanity checks
  o 120 canonical virtual data workflows submitted to US-CMS Grid
o Round-robin strategy
  o Equally distribute work to all sites
o Upper-limit strategy
  o Makes use of global information (site capacity)
  o Throttle jobs using just-in-time planning
  o 40% better throughput (given grid topology)
o Conclusion: Prototype is working!
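
The difference between the two strategies can be sketched as follows. The function names, the dictionary-based site capacities, and the "most free slots first" tie-breaking are assumptions for illustration; the prototype's actual planner is not described in the slides.

```python
from itertools import cycle


def round_robin(jobs, sites):
    """Assign jobs to sites in turn, ignoring how busy each site is."""
    plan = {site: [] for site in sites}
    for job, site in zip(jobs, cycle(sites)):
        plan[site].append(job)
    return plan


def upper_limit(jobs, capacity, running):
    """Place a job only where a site is below its capacity; hold the rest.

    capacity: dict site -> maximum concurrent jobs (the global information).
    running: dict site -> jobs currently running there.
    Returns (plan, held); held jobs wait for a later planning pass.
    """
    plan = {site: [] for site in capacity}
    free = {site: max(capacity[site] - running.get(site, 0), 0) for site in capacity}
    held = []
    for job in jobs:
        site = max(free, key=free.get) if free else None
        if site is None or free[site] == 0:
            held.append(job)  # throttle: no site has a free slot right now
            continue
        plan[site].append(job)
        free[site] -= 1
    return plan, held
```

Calling `upper_limit` again whenever slots free up, with the held jobs as input, approximates the just-in-time throttling mentioned on the slide.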

16 Some Current and Future Activities
o Policy Based Scheduling
o Quality of Service
o Graph Partitioning
o Data Parallelism
o Prediction Module
o Useful Views and Fusion of Monitoring Data

17 Conclusions
o Scheduling on a grid has unique requirements
  o Information
  o System
o Decisions based on global views providing a Quality of Service are important
  o Particularly in a resource limited environment
o Sphinx is an extensible, flexible grid middleware which
  o Already implements many required features for effective global scheduling
  o Provides an excellent “workbench” for future activities!

