1 NAREGI Middleware Beta 1 and Beyond Satoshi Matsuoka Professor, Global Scientific Information and Computing Center, Deputy Director, NAREGI Project Tokyo Institute of Technology / NII http://www.naregi.org

2 CyberScience Infrastructure for Advanced Science (by NII) — to innovate academia and industry: publication of scientific results from academia, human resource development and strong organization, NAREGI Middleware, Virtual Organizations for science, UPKI, scientific repositories, industry liaison and social benefit, and global contribution. SuperSINET: a next-generation network infrastructure supported by NII and the 7 national computer centers — Hokkaido University, Tohoku University, University of Tokyo, Nagoya University, Kyoto University, Osaka University, Kyushu University — plus NII and others (Tokyo Institute of Technology, Waseda University, KEK, etc.).

3 Hokkaido University Information Initiative Center HITACHI SR11000 5.6 Teraflops Tohoku University Information Synergy Center NEC SX-7 NEC TX7/AzusA University of Tokyo Information Technology Center HITACHI SR8000 HITACHI SR11000 6 Teraflops Others (in institutes) Nagoya University Information Technology Center FUJITSU PrimePower2500 11 Teraflops Osaka University CyberMedia Center NEC SX-5/128M8 HP Exemplar V2500/N 1.2 Teraflops Kyoto University Academic Center for Computing and Media Studies FUJITSU PrimePower2500 10 Teraflops Kyushu University Computing and Communications Center FUJITSU VPP5000/64 IBM Power5 p595 5 Teraflops University Computer Centers (excl. National Labs) circa Spring 2006 10Gbps SuperSINET Interconnecting the Centers Tokyo Inst. Technology Global Scientific Information and Computing Center 2006 NEC/SUN TSUBAME 85 Teraflops University of Tsukuba FUJITSU VPP5000 CP-PACS 2048 (SR8000 proto) National Inst. of Informatics SuperSINET/NAREGI Testbed 17 Teraflops ~60 SC Centers in Japan - 10 Petaflop center by 2011

4 The Titech TSUBAME Production Supercomputing Cluster, Spring 2006: Sun Galaxy 4 (Opteron dual-core, 8-way), 10,480 cores / 655 nodes, 50.4 TeraFlops; ClearSpeed CSX600 SIMD accelerators, 360 boards, 35 TeraFlops (current); storage of 1 Petabyte (Sun "Thumper", 500 GB x 48 disks per unit) plus 0.1 Petabyte (NEC iStore), with Lustre FS and NFS (v4?); NEC SX-8 small vector nodes (under plan); a unified Voltaire ISR9288 InfiniBand network, 10 Gbps x2 (xDDR) x ~700 ports, plus a 10 Gbps+ external network; OS Linux (SuSE 9, 10) with NAREGI Grid MW. 7th on the June 2006 Top500 at 38.18 TFlops.

5 Titech TSUBAME ~80+ racks 350m2 floor area 1.2 MW (peak)

6 2002/3-2006/3: Titech "TeraScale" Campus Grid — System Image. The Titech Grid is a large-scale, campus-wide, pilot commodity Grid deployment for next-generation E-Science application development within the campuses of Tokyo Institute of Technology (Titech): high-density blade PC server systems consisting of 800 high-end PC processors installed at 13 locations throughout the Titech campuses, interconnected via the Super TITANET backbone (1-4 Gbps), with Super SINET (the 10 Gbps MOE national backbone network) linking to other Grids; the Oo-okayama and Suzukake-dai campuses are 30 km apart. It was the first campus-wide pilot Grid system deployment in Japan, providing a next-generation high-performance "virtual parallel computer" infrastructure for high-end computational E-Science: >1300 processors, >2.5 TeraFlops, 15 clusters and 4 supercomputers/servers, including high-density GSIC main clusters ((256 processors) x 2 systems in just 5 cabinets, NEC Express 5800 series blade servers) and 24-processor satellite systems at each department (x12 systems), with a grid-wide single system image via Grid middleware: Globus, Ninf-G, Condor, NWS, …

7 TSUBAME Physical Installation: 3 rooms (600 m2), 350 m2 service area; 76 racks incl. network & storage, 46.3 tons (incl. 10 storage racks); 32 AC units, 12.2 tons; total 58.5 tons (excl. rooftop AC heat exchangers); max 1.2 MWatts; ~3 weeks construction time. The machines occupy the 1st floor and two 2nd-floor rooms (A and B), housing the Titech Grid Cluster, TSUBAME, and TSUBAME & storage.

8 TSUBAME in Production, June 17-24 2006 (phase 2): ~4700 CPUs in use across the n1ge batch and interactive queues.

9 Titech Supercomputing Grid 2006: ~13,000 CPUs, 90 TeraFlops, ~26 TeraBytes memory, ~1.1 Petabytes disk. CPU cores (x86): TSUBAME (~10,600), Campus Grid Cluster (~1,000), COE-LKR (knowledge) cluster (~260), WinCCS (~300), plus ClearSpeed CSX600 (720 chips). The resources are spread across the Ookayama and Suzukakedai campuses (35 km apart, 10 Gbps link; on-campus sites about 1.2 km apart), with a Mathematical/Computational Science Center and a Computational Engineering Center planned.

10 SuperSINET provides a 10 Gbps lambda backbone.

11 Grid for enabling Collaborative Computing: a Virtual Organization spanning University A, overseas Lab B, and domestic Lab C over SuperSINET lets researchers run experiments using special devices, analyses using supercomputers, and searches in database servers — realizing a heterogeneous large-scale computational environment and sharing large, expensive devices and databases. Security is a key issue to be solved!

12 Scaling Towards Petaflops… (chart spanning 1 TF to 10 PF over 2002-2012): Titech Campus Grid 1.3 TF (2002); Earth Simulator 40 TF (2002); BlueGene/L 360 TF (2005); KEK 59 TF BG/L+SR11000; Titech Supercomputing Campus Grid (incl. TSUBAME) ~90 TF (2006); Korean machine >100 TF (2006~7); Chinese national machine >100 TF (2007~8); US petascale (2007~8); TSUBAME upgrade >200 TF (2008-2H); US HPCS (2010); next-gen Titech "PetaGrid" 1 PF (2010); "Keisoku" >10 PF (2011); US 10 PF (2011~12?). 2010 Titech "PetaGrid" => interim 200 TeraFlops @ 2008 => "Petascale" @ 2010 — the norm for a typical Japanese center? HPC software is the key!

13 NAREGI is/has/will… Is THE National Research Grid in Japan –Part of CSI and future Petascale initiatives –METI “Business Grid” counterpart 2003-2005 Has extensive commitment to WS/GGF-OGSA –Entirely WS/Service Oriented Architecture –Set industry standards e.g. 1st impl. of OGSA-EMS Will work with EU/US/AP counterparts to realize a “global research grid” –Various talks have started, incl. SC05 interoperability meeting Will deliver first OS public beta in May 2006 –To be distributed @ GGF17/GridWorld in Tokyo

14 NAREGI is not/doesn’t/won’t… Is NOT an academic research project –All professional developers from Fujitsu, NEC, Hitachi, (No students) –IMPORTANT for Vendor buy-in and tech transfer Will NOT develop all software by itself –Will rely on external components in some cases –Must be WS and other industry standards compliant –IMPORTANT for Vendor buy-in and tech transfer Will NOT hinder industry adoption at all costs –Intricate open source copyright and IP policies –We want companies to save/make money using NAREGI MW –IMPORTANT for Vendor buy-in and tech transfer

15 Standards: Setting & Adoption — Critical to Grid Adoption & Longevity. Project lifetimes are limited, and most production centers, commercial entities, and users won't commit their time & money unless longevity is guaranteed — a few brave ones do (high-risk, high-return), and success legacies are overtold. It is the responsible action for major research grid infrastructural projects (e.g. NAREGI) to guarantee longevity beyond the project lifetime. Commercial adoption by IT vendors — ISPs, ISVs, ASPs, etc. — is key to such longevity. But how do we achieve that?

16 Approach: "Bait and Catch" --- the typical Japanese national project strategy --- IT vendors are involved (seriously) from day 1. Public-entity project benefactors should not expend their grants solely to sustain themselves — most of the money goes to IT vendors to have them commit. OTOH, vendors are later pressured by the government not to make direct profit, so (exclusive IPR acquisition or) standardization is used as the bait for commercial adoption — thus, even a research grid infrastructural project MUST FOCUS on standards, above all other priorities. Then the standards & supporting technologies are DIRECTLY transferred to IT vendors: productization and support => longevity => adoption!

17 A Picture is Worth a 1000 Words…

18 NAREGI Grid Middleware Roadmap, FY2003-FY2007: a UNICORE-based R&D framework followed by an OGSA/WSRF-based R&D framework; prototyping of the NAREGI middleware components; development and integration of the α release (private), evaluated on the NII-IMS testbed, with the component technologies applied to nano applications and evaluated; a midpoint evaluation; development and integration of the β release (public, followed by a series of β releases), built as OGSA-based middleware and evaluated on the NAREGI wide-area testbed and by IMS and other collaborating institutes; deployment of β; and verification & evaluation of Ver. 1 leading to the Version 1.0 release.

19 NAREGI Software Stack (beta 1, 2006) — a WS(RF)-based (OGSA) SW stack: Grid-enabled nano-applications (WP6); Grid PSE, Grid Workflow (WFML: Unicore + WF), and Grid Visualization (WP3); Grid Programming (WP2): GridRPC and GridMPI; the Super Scheduler and Grid VM (WP1); the Distributed Information Service (CIM), Data (WP4), and Packaging; Grid security and high-performance grid networking (WP5); all hosted on WSRF (GT4 + Fujitsu WP1) plus GT4 and other services, over SuperSINET, spanning computing resources and virtual organizations at NII, IMS, research organizations, and major university computing centers.

20 List of NAREGI "Standards" (beta 1 and beyond). GGF standards and pseudo-standard activities set/employed by NAREGI: GGF "OGSA CIM profile", GGF AuthZ, GGF DAIS, GGF GFS (Grid Filesystems), GGF Grid CP (GGF CAOPs), GGF GridFTP, GGF GridRPC API (as Ninf-G2/G4), GGF JSDL, GGF OGSA-BES, GGF OGSA-ByteIO, GGF OGSA-DAI, GGF OGSA-EMS, GGF OGSA-RSS, GGF RUS, GGF SRM (planned for beta 2), GGF UR, GGF WS-I RUS, GGF ACS, GGF CDDLM. Other industry standards employed by NAREGI: ANSI/ISO SQL, DMTF CIM, IETF OCSP/XKMS, MPI 2.0, OASIS SAML 2.0, OASIS WS-Agreement, OASIS WS-BPEL, OASIS WSRF 2.0, OASIS XACML. De facto standards / commonly used software platforms employed by NAREGI: Ganglia, GFarm 1.1, Globus 4 GRAM, Globus 4 GSI, Globus 4 WSRF (also Fujitsu WSRF for C binding), IMPI (as GridMPI), Linux (RH8/9 etc.), Solaris (8/9/10), AIX, …, MyProxy, OpenMPI, Tomcat (and associated WS/XML standards), Unicore WF (as NAREGI WFML), VOMS. Necessary for longevity and vendor buy-in; a metric of WP evaluation; implement "specs" early, even if nascent, if seemingly viable.

21 Demonstrates process flow and several functionalities working together as Grid Services: users can execute huge jobs such as coupled simulations using appropriate resources across multiple computer centers; VO members across organizations can share the resources; NAREGI widely adopts grid standards and will contribute to standardization and multi-grid interoperation; and the open-source package is self-contained ⇒ a sustainable research grid infrastructure. The NAREGI middleware takes the stated requirements and performs allocation and scheduling of the Grid Services accordingly.

22 NAREGI R&D Assumptions and Goals. Future research grid metrics for petascale: 10s of institutions/centers and various project VOs; >100,000 users and >100,000~1,000,000 CPUs; very heterogeneous machines — CPUs (supercomputers, clusters, desktops), OSes, local schedulers; 24/7 usage with production deployment; server grid, data grid, metacomputing, … High emphasis on standards: start with Globus, Unicore, and Condor, with extensive collaboration; GGF contributions, esp. an OGSA reference implementation. Win the support of users: application and experimental deployment are essential; R&D for production-quality (free) software; nano-science (and now bio) involvement and a large testbed.

23 β Release Features. Resource and execution management: GGF OGSA-EMS based architecture; automatic resource brokering and job scheduling; reservation-based co-allocation; VO and local-policy based access control; network traffic measurement and control; GGF JSDL based job description; DMTF-CIM based resource information model; GGF OGSA-RUS based accounting. Data Grid (new from FY2005): WSRF based grid-wide data sharing.

24 β Release Features, continued. User-level tools: web-based portal; workflow tool with NAREGI-WFML; application contents and deployment service; large-scale interactive grid visualization. Grid-ready programming tools & libraries: standards-compliant GridMPI and GridRPC; a bridging method for different types of applications. VO support: production-quality CA, APGrid PMA; VOMS-based VO user management; and more…

25 NAREGI Programming Models. High-throughput computing, but with complex data exchange in between — via the NAREGI Workflow or GridRPC (see the task-parallel sketch below). Metacomputing (cross-machine parallel) — Workflow (w/ co-scheduling) + GridMPI, or GridRPC (for task-parallel or task/data-parallel work). Coupled multi-resolution simulation — Workflow (w/ co-scheduling) + GridMPI + coupling components: Mediator (the coupled simulation framework) and GIANT (the coupled simulation data exchange framework).
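As a concrete illustration of the task-parallel GridRPC model above, here is a minimal C sketch against the standard GridRPC API (GFD-R.P 52) as implemented by Ninf-G; the configuration file name, the server list, and the remote entry name "work/compute" are hypothetical placeholders for whatever the workflow or information service would supply, and the argument list must match the entry's IDL declaration.

    #include <stdio.h>
    #include "grpc.h"                 /* GridRPC API header shipped with Ninf-G */

    #define NSERVERS 3

    int main(int argc, char *argv[])
    {
        grpc_function_handle_t handles[NSERVERS];
        grpc_sessionid_t ids[NSERVERS];
        /* Hypothetical server names; real ones would come from the VO's resources. */
        const char *servers[NSERVERS] =
            { "hostA.example.org", "hostB.example.org", "hostC.example.org" };
        double results[NSERVERS];
        int i;

        if (grpc_initialize("client.conf") != GRPC_NO_ERROR) return 1;

        /* One handle per server for the (hypothetical) remote entry "work/compute". */
        for (i = 0; i < NSERVERS; i++)
            grpc_function_handle_init(&handles[i], (char *)servers[i], "work/compute");

        /* Launch all tasks asynchronously: the task-parallel, high-throughput style. */
        for (i = 0; i < NSERVERS; i++)
            grpc_call_async(&handles[i], &ids[i], i, &results[i]);

        grpc_wait_all(ids, NSERVERS);          /* wait for every outstanding session */

        for (i = 0; i < NSERVERS; i++) {
            printf("task %d -> %f\n", i, results[i]);
            grpc_function_handle_destruct(&handles[i]);
        }
        grpc_finalize();
        return 0;
    }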

26 Nano-science: coupled simulations on the Grid as the sole future for true scalability — the only way to achieve true scalability between continuum and quanta (10^-6 to 10^-9 m). Material physics (infinite systems): fluid dynamics, statistical physics, condensed matter theory, …; molecular science: quantum chemistry, molecular orbital methods, molecular dynamics, …; multi-physics sits between the limit of computing capability and the limit of idealization. The old HPC environment offered decoupled resources, limited users, and special software; the Grid coordinates those decoupled resources for meta-computing, high-throughput computing, and multi-physics simulation with components and data from different groups within a VO, composed in real time.

27 Lifecycle of Grid Apps and Infrastructure: VO application developers & managers register user apps (GridRPC/GridMPI) with the Application Contents Service and place & register data (Data 1 … Data n) on the Grid, assigning metadata, through the grid-wide data management service (GridFS, metadata, staging, etc.); many VO users then run metacomputing workflows and coupled apps described as high-level workflows (NAREGI WFML), executed on distributed servers by the SuperScheduler and GridVM with the help of the distributed Grid Info Service.

28 NAREGI Application Mediator (WP6) for Coupled Applications: supports data exchange between coupled simulations. The workflow of co-allocated jobs (Job 1 … Job n — e.g. Simulation A and Simulation B with their Mediator A and Mediator B components) is launched by the Super Scheduler onto GridVM, and the Information Service keeps the coupling parameters (global job ID, allocated nodes, transfer protocol, etc.) as SBC-XML (SBC: storage-based communication). The Mediator components handle data transformation management (semantic transformation libraries for different simulations, a coupled accelerator) and data transfer management (synchronized file transfer over multiple protocols, GridFTP/MPI), built on MPI, OGSA-DAI, WSRF 2.0, SQL, JNI, and the Globus Toolkit 4.0.1 GridFTP API.

29 RISM-FMO Coupled Simulation: RISM (solvent distribution) and FMO (electronic structure) are coupled through the Mediator. The solvent charge distribution is transformed from regular to irregular meshes, and the Mulliken charge is transferred as the partial charge of the solute molecules; the electronic structure of nano-scale molecules in solvent is calculated self-consistently by exchanging the solvent charge distribution and the partial charges of the solute molecules. *The original RISM and FMO codes are developed by the Institute for Molecular Science and the National Institute of Advanced Industrial Science and Technology, respectively.

30 RISM-FMO Application: a coupled simulation of the electronic structure of a molecule in solution. RISM (Reference Interaction Site Model) performs the solvent distribution analysis, FMO (Fragment Molecular Orbital method) performs the electronic structure analysis, and the Mediator handles data transformation between the different meshes; the components are initialized and communicate via GridMPI through the IMPI server. Resource requirements — RISM: CPU=Power4, OS=AIX, #CPUs=32, #nodes=1; FMO: CPU=Xeon, OS=Linux, #CPUs=10, #nodes=5; Mediator: CPU=Xeon, OS=Linux, #CPUs=1, #nodes=2; IMPI: CPU=Xeon, OS=Linux, #CPUs=1, #nodes=1.

31 a – c, 1: Job Submission, Client Side. A multi-physics coupled MPI job (the RISM job, the FMO job, a Mediator job, and the IMPI server) runs across sites α, μ, and ρ, each with its own GridVM and local scheduler, co-allocated by the Super Scheduler. Client-side steps: a: sign-on at the Portal (user cert via CA/RA, MyProxy, and VOMS, yielding a proxy cert carrying the user DN and VO); b: deployment via the PSE; c: editing the workflow (MPI source) with the Workflow Tool and defining the application requirements. The subsequent steps are 1: submission; 2: resource discovery and monitoring via the Information Service; 3: negotiation/agreement; 4: reservation; 5: IMPI starts; 6: the MPI job starts; 7: MPI init. (GridMPI); 8: visualization; 9: accounting — with the DataGrid / Grid File System and the network monitor also involved.

32 a: Single Sign-On. Users access the portal, get the VOMS proxy certificates (with their VO attributes) that were deposited beforehand, and submit their jobs to the NAREGI M/W using Grid tools. Flow: the user obtains a user certificate and private key from the NAREGI CA (ssh + put certificate; the CP/CPS audit criteria are a subset of the WebTrust Programs for CA, authorized by APGridPMA); ssh + voms-myproxy-init / voms-proxy-init puts a proxy cert with VO attributes — issued by the VOMS servers of VO1/VO2 (Virtual Organization Membership Service) via the certificate management server — into MyProxy; the client environment (Portal, WFT, PSE, GVM, SS client) then performs a MyProxy sign-on to get a delegated proxy cert with VO attributes; the Super Scheduler sends a signed job description and queries the VO Information Service (job requirements + VO attributes) for the resources in the VO, whose GridVM-local resource info (incl. VO attributes) is published there; and finally it submits via globusrun-ws to the GridVM services, which enforce an authorization policy in terms of VO attributes; the DataGrid likewise uses the delegated proxy.

33 b-1: Registration & Deployment of Applications — application sharing in research communities. The application developer ① registers the application (summary, program source files, input files, resource requirements, etc.) with the PSE server and ACS (Application Contents Service); the PSE ② selects a compiling host using resource info from the Information Service, ③ compiles the application, ④ sends back the compiled application environment, ⑤ selects deployment hosts based on test runs (OK/NG), ⑥ deploys to them, and ⑦ registers the deployment info in the Information Service.

34 b-2: Data Import to the Data Sharing Service. The Data Grid components — the grid-wide file system with metadata management, data access management, and data resource management — let users place & register data (Data 1 … Data n) on the Grid, assign metadata to the data, store the data into distributed file nodes, and import the data into grid workflows (Jobs 1 … n) through the grid-wide data sharing service.

35 c: Description of Workflow and Job Submission Requirements. The user edits the workflow in a JSDL applet served by the web server (Apache) / workflow servlet (Tomcat); program icons (Appli-A, Appli-B) carry application information from the PSE, and data icons carry global file information from the DataGrid (/gfarm/..) and the Information Service. The workflow description in NAREGI-WFML is converted to BPEL + JSDL-A and passed over http(s) to the NAREGI JM I/F module of the Super Scheduler as BPEL+JSDL; stdout/stderr come back via GridFTP.

36 2 – 6: Resource Discovery, Co-Allocation, Advance Reservation and Submission (same coupled-job scenario as slide 31): 2: resource discovery and monitoring through the Information Service; 3: negotiation/agreement; 4: reservation; 5: IMPI starts; 6: the MPI job starts — the Super Scheduler co-allocates the RISM, FMO, and Mediator jobs across sites α, μ, and ρ (each with GridVM and a local scheduler), followed by 7: MPI init. (GridMPI), 8: visualization, and 9: accounting.

37 2: Resource Discovery. Distributed Information Services maintain various kinds of information — CPU, memory, OS, job queue, account, usage record, etc. — across multiple administrative domains: heterogeneous resources (Linux, AIX, UNIX, Solaris, …) are abstracted with the CIM schema into associated RDB tables (Resource, Performance, Log, User, Budget, Policy, Account); the resource DB is retrieved through a Grid Service (OGSA-DAI); and resource info is accessed according to the user's rights. The Super Scheduler queries the Distributed Information Service with the sub-job requirements coming from the WFT.
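Because the candidate-set-generation step literally builds an SQL query from the JSDL requirements (the "Generate SQL Query From JSDL" box on slides 38 and 51), a tiny illustrative C sketch of that step is given below; the table and column names are hypothetical stand-ins, not the actual CIM-to-RDB mapping used by NAREGI.

    #include <stdio.h>

    /* Sub-job requirements as they might arrive from the WFT via JSDL. */
    struct requirements { int cpus; const char *os; int mem_mb; };

    /* Render the requirements as SQL against a hypothetical resource table. */
    static void build_query(const struct requirements *r, char *buf, size_t len)
    {
        snprintf(buf, len,
                 "SELECT hostname, free_cpus FROM resource "
                 "WHERE os_name = '%s' AND free_cpus >= %d AND free_mem_mb >= %d",
                 r->os, r->cpus, r->mem_mb);
    }

    int main(void)
    {
        struct requirements rism = { 32, "AIX", 8192 };   /* e.g. the RISM sub-job */
        char query[256];
        build_query(&rism, query, sizeof(query));
        printf("%s\n", query);    /* the real CSG would run this through OGSA-DAI */
        return 0;
    }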

38 3, 4: Co-allocation and Reservation. The Super Scheduler architecture (detailed on slide 51: the JM/BPEL engine, EPS, CSG, RS, IS, AGG-SC and GridVM-specific SCs over WS-GRAM/GridVM, the FTS-SC with the GFarm server, and the GNIS) is used to allocate and execute jobs on multiple sites simultaneously: the NAREGI-WFML workflow is translated to BPEL including JSDL, the EPS and CSG select resources via SQL queries generated from the JSDL against the CIM 2.10 Information Service (OGSA-DAI over PostgreSQL), and MakeReservation/CancelReservation calls on the co-allocation service containers (CES, i.e. BES+) reserve the resources on local schedulers such as PBS and LoadLeveler.

39 5, 6: Advance Reservation and Job Execution (GridMPI job). The WorkFlow Tool produces abstract JSDL extended for parallel jobs and co-allocation; the Super Scheduler creates concrete JSDL (if no single resource satisfies the requirements, the job is divided automatically), reserves resources for the parallel jobs, and starts the jobs at the reserved time on Sites A and B. GridVM generates a shell script for the local MPI job according to the JSDL and runs it, configuring the IMPI server and the execution nodes, e.g.:
    export IMPI_SERVER_IP=clusterX.naregi.org
    export IMPI_SERVER_PORT=5963
    export IMPI_CLIENT_ID=1
    export _YAMPII_CONFIG_FILE=…
    export GRIDVM_GLOBAL_JOB_ID=XXX
    export GRIDVM_GLOBAL_USER_ID='/O=Grid/...'
    export GRIDVM_SUBJOB_ID=YYY
    export GRIDVM_LAUNCHER=gvm_launch
    $GRIDVM_LAUNCHER lmpirun -np 128 ./app

40 NAREGI Resource Management and OGSA-EMS. The Open Grid Services Architecture Execution Management Services will be standardized by the OGSA EMS-WG; NAREGI will contribute its middleware as a reference implementation of OGSA-EMS. Mapping: the NAREGI WFT submits JSDL; the NAREGI Super Scheduler implements the Job Manager, Execution Planning Services, and Candidate Set Generator (discover & select, query); the NAREGI Information Service corresponds to the Information Services (register, update); NAREGI GridVM to the Service Container, reservation, and Accounting Services (submit, reserve); and the NAREGI PSE / Application Contents Service to provisioning, deployment, and configuration.

41 7: GridMPI Communication and Network Monitoring (same scenario as slide 31): after 5: IMPI starts and 6: the MPI job starts, step 7 performs the MPI init. and the co-allocated RISM and FMO jobs communicate across sites via GridMPI, while the network monitoring infrastructure observes the traffic; 8: visualization and 9: accounting follow.

42 7: GridMPI. GridMPI is a library which enables MPI communication between parallel systems in the grid environment (in a metropolitan area: ≥10 Gbps, ≤10 ms), targeting huge jobs whose data size cannot be handled within a single cluster system and multi-physics jobs in heterogeneous CPU architecture environments; Cluster A (YAMPII) and Cluster B (vendor MPI) interoperate through an IMPI server. (1) Interoperability: an IMPI (Interoperable MPI) compliant communication protocol; a strict implementation of the MPI 2.0 standard (test-suite compliant). (2) High performance: a simple implementation. (A minimal sketch of such an MPI program follows below.)
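To make the point concrete: because GridMPI implements the standard MPI interface on top of the IMPI protocol, an ordinary MPI program such as the minimal C sketch below should run unchanged across the coupled clusters once it is launched through the GridVM/IMPI machinery described on the surrounding slides; the program itself is a generic illustration, not NAREGI-specific code.

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int rank, size;
        double local, global;

        MPI_Init(&argc, &argv);                /* started under IMPI/GridMPI or plain MPI alike */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);  /* ranks may span Cluster A and Cluster B */

        local = (double)rank;                  /* stand-in for a real per-rank computation */
        MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

        if (rank == 0)
            printf("%d ranks across the grid, sum = %f\n", size, global);

        MPI_Finalize();
        return 0;
    }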

43 7': Network Monitoring. Network resources are monitored, controlled, and managed with Network Middleboxes at each site (active + passive measurement, IP tunnels for route control). The network resource management component feeds NW topology and monitoring data (CIM) to the Information Service, provides brokering hints to the SuperScheduler on request, and offers a control/monitor GUI for the NW admin (route-control requests, visualization of topology & measurement results).

44 8, 9: Grid Visualization and Accounting (same scenario, here across Sites A, B, and C): the RISM job runs on an SMP machine with 64 CPUs and the FMO job on a PC cluster with 128 CPUs, coupled via IMPI/GridMPI with their input and output files on the Grid File System; after the MPI job completes, 8: the results are visualized with GVS and 9: accounting records are collected via the Information Service.

45 8: Visualization Service — an integrated WSRF-based remote visualization environment. MPI-parallel visualizer service providers run on the back-end computing nodes (a PC cluster for the FMO result, a supercomputer for the RISM result); the front-end portal client exchanges parameters, requests, images, and status/info with them over HTTP(S)/sockets, supporting post-processing visualization and loosely coupled concurrent visualization.

46 9: Accounting. Each GridVM_Engine process on the computing nodes reports usage to the GridVM_Scheduler, which passes UR-XML to the "cell domain" Information Service; cluster/site Information Services and the upper-layer Information Service "node" answer RUS queries, collecting and maintaining job usage records in the form of the GGF proposed standards (RUS: Resource Usage Service; UR: Usage Record). Users can search and summarize their URs by global user ID, global job ID, global resource ID, time range, …
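As a plain illustration of the kind of summarization the RUS interface enables (not the RUS/UR-XML protocol itself), the C sketch below filters an in-memory list of usage records by global user ID and time range and totals the CPU time; the field names and sample DNs are illustrative only.

    #include <stdio.h>
    #include <string.h>

    /* Simplified usage record; real URs are GGF UR XML documents stored via RUS. */
    struct usage_record {
        const char *global_user_id;
        const char *global_job_id;
        const char *global_resource_id;
        long start;                    /* epoch seconds */
        long end;
        double cpu_seconds;
    };

    static double total_cpu(const struct usage_record *ur, int n,
                            const char *user, long from, long to)
    {
        double sum = 0.0;
        int i;
        for (i = 0; i < n; i++)
            if (strcmp(ur[i].global_user_id, user) == 0 &&
                ur[i].start >= from && ur[i].end <= to)
                sum += ur[i].cpu_seconds;
        return sum;
    }

    int main(void)
    {
        struct usage_record urs[] = {
            { "/O=Grid/CN=alice", "job-001", "siteA/cluster1", 1000, 5000, 3600.0 },
            { "/O=Grid/CN=alice", "job-002", "siteB/cluster2", 6000, 9000, 1800.0 },
            { "/O=Grid/CN=bob",   "job-003", "siteA/cluster1", 2000, 4000,  900.0 },
        };
        printf("alice used %.0f CPU-seconds\n",
               total_cpu(urs, 3, "/O=Grid/CN=alice", 0, 10000));
        return 0;
    }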

47 Ninf-G: A Reference Implementation of the GGF GridRPC API — large-scale computing across supercomputers on the Grid: the user ① calls remote procedures/libraries on remote supercomputers over the Internet and ② is notified of the results. What is GridRPC? A programming model using RPCs on a Grid that provides an easy and simple programming interface; the GridRPC API is published as a proposed recommendation (GFD-R.P 52). What is Ninf-G? A reference implementation of the standard GridRPC API, built on the Globus Toolkit and now in NMI Release 8 (the first non-US software in NMI). Easy three steps to make your program Grid-aware: write an IDL file that specifies the interface of your library; compile it with the IDL compiler ng_gen; and modify your client program to use the GridRPC API (see the client sketch below).
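A minimal client-side sketch of the three steps in C, against the GridRPC API (GFD-R.P 52); the configuration file, the server name, and the remote entry name "mylib/matmul" (which would have been declared in an IDL file and compiled with ng_gen) are hypothetical placeholders.

    #include <stdio.h>
    #include "grpc.h"               /* GridRPC API header shipped with Ninf-G */

    #define N 512

    int main(int argc, char *argv[])
    {
        grpc_function_handle_t h;
        static double a[N*N], b[N*N], c[N*N];     /* input/output arrays (filling omitted) */

        if (grpc_initialize("client.conf") != GRPC_NO_ERROR) {
            fprintf(stderr, "grpc_initialize failed\n");
            return 1;
        }

        /* Bind a handle to the (hypothetical) remote entry "mylib/matmul" on a server;
           the entry's interface was described in the IDL file compiled with ng_gen. */
        grpc_function_handle_init(&h, "server.example.org", "mylib/matmul");

        /* Synchronous remote call: arguments follow the IDL-declared parameter list. */
        if (grpc_call(&h, N, a, b, c) != GRPC_NO_ERROR)
            fprintf(stderr, "grpc_call failed\n");

        grpc_function_handle_destruct(&h);
        grpc_finalize();
        return 0;
    }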

48 GridMPI: MPI applications run on the Grid environment as a single (monolithic) MPI application spanning computing resource sites A and B over the wide-area network. Within a metropolitan-area, high-bandwidth environment (≥10 Gbps, within ~500 miles, i.e. less than 10 ms one-way latency) full parallel computation is practical; at larger than metropolitan scale, MPI-IO is the intended mode (see the MPI-IO sketch below).
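For the wider-area case the slide points at MPI-IO rather than tightly coupled computation; below is a minimal C sketch of collective MPI-IO (standard MPI-2, which GridMPI's strict MPI 2.0 implementation covers), where each rank writes its own block of a shared file — the /gfarm path is a hypothetical placeholder for a grid-visible file system.

    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int rank, i;
        double buf[1024];
        MPI_File fh;
        MPI_Offset offset;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        for (i = 0; i < 1024; i++) buf[i] = rank + i * 1e-3;    /* stand-in data */

        /* Each rank writes its own contiguous block of one shared file. */
        MPI_File_open(MPI_COMM_WORLD, "/gfarm/demo/output.dat",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
        offset = (MPI_Offset)rank * 1024 * sizeof(double);
        MPI_File_write_at_all(fh, offset, buf, 1024, MPI_DOUBLE, MPI_STATUS_IGNORE);
        MPI_File_close(&fh);

        MPI_Finalize();
        return 0;
    }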

49 New NAREGI Super-Scheduler: a GGF OGSA-EMS implementation written in C. Supports resource discovery: it discovers the available resources for a job, selects the appropriate system(s), and submits the job. Supports complex job execution — co-allocation: it enables an MPI job to run on more than one resource; if an MPI job cannot be matched to a single appropriate resource, the SuperScheduler divides it into multiple fragment jobs and runs them on multiple resources. Full-feature toolkit: HTTP(S), SOAP, WS-Addressing, WS-ResourceFramework, WS-Notification, BPEL compiler/interpreter, WSDL compiler, JSDL, OGSA-EMS (CSG, EPS, RS, SC, JM).

50 NAREGI WSRF Kit (NWK): base software for the SuperScheduler. Features: WSRF, WS-Notification, WS-Addressing; a WSDL compiler (generates stub/skeleton code); interoperability with Globus 4; developed in C with libxml2.

51 NAREGI beta 1 SSS Architecture — an extended OGSA-EMS incarnation. The NAREGI-WP3 tools (WorkFlowTool, PSE, GVS) submit NAREGI-WFML through the JM-Client (Submit/Status/Delete/Cancel) to the NAREGI JM(SS) Java I/F module, where WFML2BPEL produces BPEL (including JSDL) for the NAREGI JM (BPEL engine), with BPEL2WFST handling the reverse mapping. The JM invokes the EPS (SelectResourceFromJSDL), which uses the CSG (GenerateCandidateSet, generating an SQL query from the JSDL) and the IS (CIM 2.10 in an RDB, PostgreSQL, accessed via OGSA-DAI, plus the GNIS GetGroupsOfNodes and is-query). Jobs flow through the CES interfaces (BES+: CreateActivity(FromBPEL/FromJSDL), GetActivityStatus, RequestActivityStateChanges, MakeReservation, CancelReservation) to the AGG-SC/RS and on to GRAM4-specific SCs (GVM-SC) over WS-GRAM/GridVM with local schedulers (PBS, LoadLeveler); the FTS-SC performs co-allocation file transfer via globus-url-copy, uber-ftp, globusrun-ws, Fork/Exec, and the GFarm server. Abbreviations — SS: Super Scheduler; JSDL: Job Submission Description Language; JM: Job Manager; EPS: Execution Planning Service; CSG: Candidate Set Generator; RS: Reservation Service; IS: Information Service; SC: Service Container; AGG-SC: Aggregate SC; GVM-SC: GridVM SC; FTS-SC: File Transfer Service SC; BES: Basic Execution Service I/F; CES: Co-allocation Execution Service I/F (BES+); CIM: Common Information Model; GNIS: Grid Network Information Service.

52 3, 4: Co-allocation and Reservation. A metacomputing scheduler is required to allocate and execute jobs on multiple sites simultaneously: the Super Scheduler negotiates with the local Reservation Services (Local RS 1 and Local RS 2, one per cluster/site) on job execution time and reserves resources which can execute the jobs simultaneously. In the figure, an abstract JSDL for 10 CPUs is passed to the Execution Planning Services with meta-scheduling (①, ②); the Candidate Set Generator returns candidates (Local RS 1 EPR with 8 CPUs, Local RS 2 EPR with 6 CPUs) using the Distributed Information Service (③); and the Super Scheduler creates agreement instances, splitting the request into concrete JSDLs of 8 and 2 CPUs reserved for the same time slot (e.g. 15:00-18:00, from a 3-hour request starting no earlier than 14:00) on the two sites' service containers and GridVM (④-⑪). A simplified sketch of the time-slot intersection idea follows below.
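The essence of the co-allocation step is finding a common time window across the candidate sites' local reservation services. The small C sketch below is purely conceptual (not NAREGI's WS-Agreement/BPEL implementation): it assumes each local RS reports one available slot and intersects them, reproducing the 15:00-18:00 outcome of the example above.

    #include <stdio.h>

    /* One candidate slot from a (hypothetical) local reservation service,
       in minutes since midnight for simplicity. */
    struct slot { const char *site; int start; int end; int cpus; };

    /* Returns 1 and fills [*start, *end) if a common window of `duration`
       minutes exists and the combined CPUs meet `need`; 0 otherwise. */
    static int coallocate(const struct slot *s, int n, int need, int duration,
                          int *start, int *end)
    {
        int lo = 0, hi = 24 * 60, cpus = 0, i;
        for (i = 0; i < n; i++) {
            if (s[i].start > lo) lo = s[i].start;   /* latest common start  */
            if (s[i].end   < hi) hi = s[i].end;     /* earliest common end  */
            cpus += s[i].cpus;
        }
        if (cpus < need || hi - lo < duration) return 0;
        *start = lo;
        *end   = lo + duration;
        return 1;
    }

    int main(void)
    {
        /* e.g. the 8-CPU and 2-CPU fragments of a 10-CPU abstract JSDL request */
        struct slot s[2] = { { "site1", 14 * 60, 18 * 60, 8 },
                             { "site2", 15 * 60, 19 * 60, 2 } };
        int start, end;
        if (coallocate(s, 2, 10, 180, &start, &end))
            printf("reserve %02d:%02d-%02d:%02d on both sites\n",
                   start / 60, start % 60, end / 60, end % 60);
        else
            printf("no common window: divide further or renegotiate\n");
        return 0;
    }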

53 The NAREGI SSS Architecture (2007/3), as a stack of service buses: PETABUS (Peta Application Services Bus) with application-specific services; WESBUS (Workflow Execution Services Bus) with the JM BPEL interpreter service; CESBUS (Co-allocation Execution Services Bus, a.k.a. the BES+ bus) with the CSG, EPS, FTS-SC, and AGG-SC with RS (aggregate SCs); and BESBUS (Basic Execution Services Bus) with the GRAM-SC (Globus WS-GRAM I/F with reservation, over GridVM) and the UniGridS-SC (UniGridS atomic services with reservation) on the grid resources.

54 NAREGI Info Service (beta) Architecture. On each node a lightweight CIMOM service classifies info from CIM providers (OS, processor, file system, job queue, performance via Ganglia, etc.) according to the CIM-based schema; the info is aggregated and accumulated in RDBs hierarchically — cell domain information services on nodes A/B/C feed information service nodes via hierarchical filtered aggregation, with parallel query and ACLs. Clients (resource brokers, publishers, and viewers for users and admins) use a client library with a Java API that utilizes the OGSA-DAI client toolkit against the data and aggregator services; accounting info is accessed through the Resource Usage Service (RUS::insertURs from chargeable services such as GridVM).

55 NAREGI IS: Standards Employed in the Architecture — the same information-service architecture annotated with standards: GT 4.0.1 and Tomcat 5.0.28 as the hosting platform; OGSA-DAI WSRF 2.1 and the OGSA-DAI client toolkit for distributed query; CIM schema 2.10 with extensions, following the CIM spec and CIM/XML, for the CIMOM and providers; WS-I RUS and GGF UR for accounting (RUS::insertURs from the GridVM chargeable service); Ganglia for performance data; and clients such as OGSA-RSS and OGSA-BES based applications.

56 GridVM Features. Platform independence as an OGSA-EMS SC: a WSRF OGSA-EMS Service Container interface for heterogeneous platforms and local schedulers; "extends" Globus 4 WS-GRAM; job submission using JSDL; job accounting using UR/RUS; a CIM provider for resource information. Meta-computing and coupled applications: advance reservation for co-allocation. Site autonomy: WS-Agreement based job execution (beta 2); XACML-based access control of resource usage. Virtual Organization (VO) management: access control and job accounting based on VOs (VOMS & GGF-UR).

57 NAREGI GridVM (beta) Architecture: a virtual execution environment on each site, virtualizing heterogeneous resources and providing resource and job management services with a unified I/F. On each site (e.g. AIX/LoadLeveler, Linux/PBSPro) a GridVM scheduler with a GRAM4 WSRF I/F sits above the local scheduler and GridVM engine, enforcing the site policy; it offers advance reservation, monitoring, control, accounting, resource info, and sandboxed job execution to the Super Scheduler and Information Service, with GridMPI spanning the sites.

58 NAREGI GridVM: Standards Employed in the Architecture — the same per-site structure annotated with standards: an XACML-like access control policy for the site policy; a CIM-based resource info provider; GT4 GRAM integration and WSRF-based extension services; UR/RUS-based job accounting; and job submission based on JSDL with NAREGI extensions.

59 GT4 GRAM-GridVM Integration: GridVM is integrated as an extension module to GT4 GRAM, aiming to make both sets of functionality available. The SS submits via globusrun with RSL+JSDL'; the GRAM services (delegation, RFT file transfer, scheduler event generator) provide basic job management plus authentication and authorization, and a GRAM adapter hands jobs to the GridVM extension service (GridVMJobFactory, GridVMJob, via SUDO), which drives the GridVM scheduler/engine and the local scheduler (PBS Pro, LoadLeveler, …) at the site.

60 Next Steps for WP1 – Beta 2: stability, robustness, ease of install. Standard-setting core OGSA-EMS: OGSA-RSS, OGSA-BES/ESI, etc. More supported platforms (VM): SX series, Solaris 8-10, etc.; more batch queues — NQS, n1ge, Condor, Torque. "Orthogonalization" of the SS, VM, IS, and WSRF components: better, more orthogonal WSRF APIs that minimize the sharing of state (e.g. reservation APIs, event-based notification); mix-and-match of multiple SS/VM/IS/external components, with many benefits; robustness. Better and more realistic center VO support. Better interoperability with external grid MW stacks, e.g. Condor-C.

61 VO and Resources in Beta 2: decoupling of the WP1 components for pragmatic VO deployment. Research organizations RO1 (sites A.RO1 … N.RO1), RO2, and RO3 each run GridVM + IS under a site policy listing the VOs they accept (VO-RO1, VO-RO2, VO-APL1, VO-APL2), while each VO (VO-RO1, VO-APL1, VO-RO2) has its own IS + SS pair that its clients talk to, so application VOs can span resources across the resource-provider organizations.

62 NAREGI Data Grid beta 1 Architecture (WP4): the Data Grid components — the grid-wide file system (currently GFarm v1.x) with metadata management, data access management, and data resource management — place & register data (Data 1 … Data n) on the Grid, assign metadata, store the data into distributed file nodes, and import it into grid workflows (Jobs 1 … n) through the grid-wide data sharing service.

63 Security Requirements in AAA — current issues to be solved and future issues; the NAREGI-CA developed here is to be deployed in UPKI. Authentication: PKI-based user authentication; compatibility with GSI standards; trust federation between CAs. Authorization: VO management for inter-organizational collaboration; interoperability with other Grid projects. Accounting: ID federation for authorization & traceability — with privacy protection!

64 Virtual Organization and Security Domain. A virtual organization (VO) is a dynamic collection of resources and users unified by a common goal and potentially spanning multiple administrative domains: services and users from Organization A (a PKI domain, under Contract A) and Organization B (under Contract B) are exposed in the VO domain — e.g. user 1 acting as VO manager alongside users 2, 3, p, q, and r and services a, b, c, x, y, and z. VO management approaches defined at GGF: CAS (Community Authorization Service) and VOMS (Virtual Organization Membership Service).

65 VOMS-type VO Management developed in EGEE: the user obtains a user cert from the CA/RA and a proxy cert with VO attributes (DN, VO, group, role, capability) from VOMS, then submits the grid job to GRAM at an EGEE grid site; the site uses the CRL, mk-gridmapfile to build the gridmap file (mapping DNs to pseudo accounts), GACL, and LCAS for authorization.

66 VOMS-type VO Management adopted in NAREGI: as in EGEE, the user gets a user cert from the CA/RA and a proxy cert with VO attributes (DN, VO info) from VOMS; grid job submission is managed by the Super Scheduler, and at the NAREGI grid site GRAM/GridVM performs account mapping with the gridmap file and policy file, publishing to the Information Service. But certificate handling is too hard for users.

67 Job Submission Mechanism in the NAREGI Middleware β version: integrated and easy handling of VOMS and MyProxy. The user's certificate and private key live in the User Management Server (UMS); on portal log-in a VOMS proxy certificate is created and deposited in MyProxy, and the portal services (WFT, PSE, GVS) and the SS client use it. The workflow (WF) credential — a user proxy cert passed through to the SS with the delegation protocol — is kept in the WF credential repository; the Super Scheduler receives the WF and deploys the grid jobs with further delegation to GridVM.

68 VO and User Management Service. Adoption of VOMS for VO management: proxy certificates with VO attributes are used for interoperability with EGEE; GridVM is used instead of LCAS/LCMAPS. Integration of the MyProxy and VOMS servers: with the UMS (User Management Server) to realize a one-stop service at the NAREGI Grid Portal; gLite code implemented at the UMS connects to the VOMS server. Workflow credential repository: a user proxy cert is used as the workflow credential to realize safe delegation between the NAREGI Grid Portal and the Super Scheduler, in the same way as MyProxy; the Super Scheduler receives the workflow (BPEL) and reserves resources to deploy grid jobs through the GSI interface.

69 Current Issues and the Future Plan. Current issues in VO management: VOMS platform — gLite runs on GT2, while the NAREGI middleware runs on GT4; GridVM — interoperability of the authorization policy with other Grid projects is yet to be realized; proxy certificate renewal — a new mechanism needs to be invented. Future plan: cooperation with GGF security-area members to realize mutual interoperability; a proposal of a new VO management methodology and a trial reference implementation.

70 AuthN & AuthZ Services in the Future: the user logs in at the web server, holding a user cert from the CA/RA and a proxy cert in MyProxy; grid job submission goes through the Super Scheduler to GRAM (GridVM), where a Policy Enforcement Point consults an authentication & authorization service — a Policy Decision Point and Policy Information Point using SAML + XACML, VO management, the CRL, OCSP/XKMS, and LDAP.

71 NAREGI beta 1 Security Architecture (WP5): the client environment (Portal, WFT, PSE, GVS, SS client) holds the user certificate and private key in the User Management Server (UMS), obtains VOMS proxy certificates, and deposits them in MyProxy (and MyProxy+ for the Super Scheduler); the NAREGI CA issues the certificates; the Super Scheduler, GridVM, the Grid File System (AIST Gfarm), and the Data Grid disk nodes all operate on the delegated VOMS proxy certificates.

72 NAREGI beta 1 Security Architecture, WP5 — the standards: the client environment (Portal, WFT, PSE, GVM, SS client) logs in, requests/gets a certificate from the NAREGI CA service (the CP/CPS follows the GRID CP of GGF CAOPs; the audit criteria are a subset of the WebTrust Programs for CA), obtains VOMS attributes from the certificate management server (ssh + voms-myproxy-init / voms-proxy-init), and puts the proxy certificate with VO into MyProxy; the Super Scheduler queries the Information Service (requirements + VO info) for resources in the VO, whose GridVM-local info (including VO) is published there, signs the job description, and submits via globusrun-ws to the GridVM services (incl. GSI); the VO and certificate management service thus ties together VO info, execution info, and resource info.

73 VO and User Management Service. Adoption of VOMS for VO management: proxy certificates with VO attributes for interoperability with EGEE; GridVM used instead of LCAS/LCMAPS. Integration of the MyProxy and VOMS servers into NAREGI: with the UMS (User Management Server) to realize a one-stop service at the NAREGI Grid Portal; gLite code implemented at the UMS connects to the VOMS server. MyProxy+ for the SuperScheduler: a special-purpose certificate repository to realize safe delegation between the NAREGI Grid Portal and the Super Scheduler; the Super Scheduler receives jobs with the user's signature, just like UNICORE, and submits them through the GSI interface.

74 Computational Resource Allocation based on VO: different resource mapping for different VOs. The same workflow (jobs of 1, 2, 4, and 8 CPUs) maps onto different parts of the resource configuration (hosts pbg2039 with 8 CPUs, png2040 with 2 CPUs, png2041 with 4 CPUs, pbg1042 with 4 CPUs) depending on whether it runs in VO1 or VO2.

75 Local-File Access Control (GridVM): provides VO-based access control functionality that does not use gridmap files. File access is controlled based on a policy specified as tuples of Subject, Resource, and Action, where the Subject is a grid user ID (DN) or a VO name — e.g. Permit: Subject=X, Resource=R, Action=read,write; Deny: Subject=Y, Resource=R, Action=read — enforced by GridVM access control in front of the local account and resource R.

76 Structure of the Local-File Access Control Policy (UML outline): a policy has a TargetUnit, a Default, and a RuleCombiningAlgorithm, and contains rules whose Effect states who (a user or VO) may perform which access type (read / write / execute) on which resource (file or directory), with the control being permit or deny.

77 Policy Example (1): an access-protection policy (gvmcf:AccessProtection with gvmac:Default="Permit" and gvmac:RuleCombiningAlgorithm="Permit-overrides") containing a rule that names user User1, resource /etc/passwd, and action read; requests not matched by the rule fall back to the default, and applicable rules are combined with permit-overrides.

78 Policy Example (2): a rule keyed on a VO name rather than a user — VO name bio, resource name /opt/bio/bin./apps, actions read and execute.
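To illustrate how such (subject, resource, action) rules behave under a permit-overrides combining algorithm, here is a small self-contained C sketch; it is a conceptual illustration only, not the GridVM or XACML implementation, and the rule effects marked below are assumptions made for the example.

    #include <stdio.h>
    #include <string.h>

    /* One access-control rule: the subject is a grid user DN or a VO name. */
    struct rule { const char *subject; const char *resource; const char *action; int permit; };

    /* Permit-overrides: any applicable permit wins; otherwise an applicable
       deny wins; otherwise the policy default applies. */
    static int evaluate(const struct rule *rules, int n, int def_permit,
                        const char *subject, const char *resource, const char *action)
    {
        int saw_deny = 0, i;
        for (i = 0; i < n; i++) {
            if (strcmp(rules[i].subject, subject) != 0 ||
                strcmp(rules[i].resource, resource) != 0 ||
                strcmp(rules[i].action, action) != 0)
                continue;                      /* rule not applicable */
            if (rules[i].permit) return 1;     /* permit overrides */
            saw_deny = 1;
        }
        return saw_deny ? 0 : def_permit;
    }

    int main(void)
    {
        /* Rules in the spirit of the two policy examples above. */
        struct rule rules[] = {
            { "User1", "/etc/passwd",        "read",    0 },  /* effect assumed: deny   */
            { "bio",   "/opt/bio/bin./apps", "read",    1 },  /* effect assumed: permit */
            { "bio",   "/opt/bio/bin./apps", "execute", 1 },
        };
        int n = (int)(sizeof(rules) / sizeof(rules[0]));

        printf("User1 read /etc/passwd -> %s\n",
               evaluate(rules, n, 1, "User1", "/etc/passwd", "read") ? "permit" : "deny");
        printf("bio execute apps       -> %s\n",
               evaluate(rules, n, 1, "bio", "/opt/bio/bin./apps", "execute") ? "permit" : "deny");
        return 0;
    }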

79 VO-based Resource Mapping in the Global File System (β2): the next release of Gfarm (version 2.0) will have access-control functionality, and we will extend the Gfarm metadata server for data-resource mapping based on VO — the client goes through the Gfarm metadata server, which maps VO1 and VO2 onto different sets of file servers.

80 Current Issues and the Future Plan. Current issues in VO management: VOMS platform — gLite runs on GT2 and the NAREGI middleware on GT4; authorization control on the resource side — new functions for resource control on GridVM (Web services, reservation, etc.) need to be implemented; proxy certificate renewal — a new mechanism needs to be invented. Future plans: cooperation with GGF security-area members to realize interoperability with other grid projects; a proposal of a new VO management methodology and a trial reference implementation.

81 The NAREGI Middleware β release is now on www.naregi.org as open source software — a general-purpose research grid e-infrastructure. Users describe a workflow to be executed with abstract requirements for its sub-jobs; the NAREGI M/W discovers fitting grid resources, concretizes the requirements, allocates the jobs appropriately, and executes them on virtualized environments, as a reference implementation of OGSA-EMS. Programming models: GridMPI & GridRPC; user interface: deployment, visualization, WFT; the Grid File System is supported; VOMS-based VO management. It is under evaluation on the NAREGI testbed (~15 TF, over SuperSINET).

82 NAREGI beta on "VM Grid": create a "Grid-on-Demand" environment using Xen and Globus Workspaces — a vanilla personal virtual grid/cluster using our Titech lab's research results, with dynamic deployment of the NAREGI beta image.

83 "VM Grid" – Prototype "S" (http://omiij-portal.soum.co.jp:33980/gridforeveryone.php): request a number of virtual grid nodes, fill in the necessary info in the form, and when the confirmation page appears follow the instructions; then SSH-login to an NMI stack (GT2 + Condor) plus selected NAREGI beta MW (Ninf-G, etc.). An entire beta installation is in the works, as is other "Instant Grid" research in the lab.

84 The Ideal World: ubiquitous VO & user management for international e-Science. Grid regional infrastructural efforts — Europe: EGEE, UK e-Science, …; US: TeraGrid, OSG; Japan: NII CyberScience (w/ NAREGI), …; other Asian efforts (GFK, China Grid, etc.) — hold collaborative talks on PMA, etc., while science VOs (the HEP Grid VO, the NEES-ED Grid VO, the Astro IVO) span them: different software stacks, but interoperable. Standardization and commonality in software platforms will realize this.

85

86 From Interoperation to Interoperability — GGF16 "Grid Interoperations Now". Charlie Catlett, Director, NSF TeraGrid (on vacation); Satoshi Matsuoka, Sub Project Director, NAREGI Project, Tokyo Institute of Technology / NII.

87 Interoperation Activities. The GGF GIN (Grid Interoperations Now) effort: real interoperation between major Grid projects; four interoperation areas identified — security, data management, information service, and job submission (not scheduling). EGEE/gLite – NAREGI interoperation: based on the four GIN areas; several discussions, including a 3-day meeting at CERN in mid-March and email exchanges; updates at GGF17 Tokyo next week; some details in my talk tomorrow.

88 The Reality: Convergence/Divergence of Project Forces (original slide by Stephen Pickles, edited by Satoshi Matsuoka) — a map of projects and their technology bases: NGS (UK), TeraGrid (US), OSG (US), EGEE (EU), LCG (EU), GridPP (UK), OMII (UK), DEISA (EU), UniGrids (EU), NMI (US), Globus (US), Condor (US), AIST-GTRC, APAC Grid (Australia), EU-China Grid (China), NAREGI (JP), CSI (JP), IBM, Unicore, and GGF. Some projects build on GT4 WSRF (OGSA?), some on gLite/GT2, and some on their own WSRF & OGSA (NAREGI: WSRF & OGSA with GT4/Fujitsu WSRF; UniGrids: WS-I+ & OGSA?); the links between them include common staff & procedures, common users, and interoperable-infrastructure talks.

89 GGF Grid Interoperation Now: started Nov. 17, 2005 @ SC05 by Catlett and Matsuoka, now with participation by all major grid projects — "agreeing to agree on what needs to be agreed first". Identified 4 essential key common services: authentication, authorization, and identity management (individuals, communities/VOs); jobs — submission, auditing, tracking (job submission interface, job description language, etc.); data management — data movement, remote access, filesystems, metadata management; resource discovery and information service — resource description schema, information services.

90 "Interoperation" versus "Interoperability". Interoperability: "the ability of software and hardware on multiple machines from multiple vendors to communicate" — based on commonly agreed, documented specifications and procedures. Interoperation: "just make it work together" — whatever it takes; possibly ad hoc, undocumented, and fragile; the low-hanging fruit on the way to future interoperability.

91 Interoperation Status: GIN meetings at GGF16 and GGF17, plus a 3-day meeting at CERN at the end of March. Security: common VOMS/GSI infrastructure; NAREGI's use of GSI/MyProxy and proxy delegation is more complicated, but should be OK. Data: SRM commonality and data catalog integration; GFarm and dCache consolidation. Information service: CIM vs. GLUE schema differences; fairly different monitoring systems; schema translation (see the next slides). Job submission: JDL vs. JSDL, Condor-C/CE vs. OGSA SS/SC-VM architectural differences, etc.; simple job submission only (see the next slides).

92 Information Service Characteristics. Basic syntax: resource description schemas (e.g., GLUE, CIM); data representations (e.g., XML, LDIF); query languages (e.g., SQL, XPath); client query interfaces (e.g., WS Resource Properties queries, LDAP, OGSA-DAI). Semantics: what pieces of data are needed by each Grid (various previous works & actual deployment experiences already exist). Implementation: information service software systems (e.g., MDS, BDII) and the ultimate sources of the information (e.g., PBS, Condor, Ganglia, WS-GRAM, GridVM, various grid monitoring systems, etc.).

93 NAREGI Information Service (the same architecture and standards as slide 55): GT 4.0.1 and Tomcat 5.0.28 hosting; OGSA-DAI WSRF 2.1 with the OGSA-DAI client toolkit for distributed query; CIM schema 2.10 with extensions over CIM/XML in the lightweight CIMOM and providers; hierarchical filtered aggregation of cell-domain information services with ACLs; Ganglia-fed performance providers; and WS-I RUS / GGF UR accounting (RUS::insertURs) from the GridVM chargeable service, serving OGSA-RSS and OGSA-BES based clients.

94 Relational Grid Monitoring Architecture (R-GMA): an implementation of the GGF Grid Monitoring Architecture (GMA). All data is modelled as tables: a single schema gives the impression of one virtual database per VO. Producer applications publish tuples (SQL "CREATE TABLE" / "INSERT") through the producer service, which registers with the registry service; consumer applications send queries (SQL "SELECT") through the consumer service, which locates producers via the registry and mediator and receives the tuples back; a schema service with a vocabulary manager holds the common schema.

95 Syntax Interoperability Matrix (schema, data, query language, client I/F, software per grid). TeraGrid: GLUE schema, XML data, XPath queries, WSRF RP query client I/F, MDS4 software. OSG: GLUE schema, LDIF data, LDAP queries, BDII software. NAREGI: CIM 2.10+ext schema, relational data, SQL queries, OGSA-DAI and WS-I RUS client I/F, CIMOM + OGSA-DAI software. EGEE/LCG: GLUE schema, with LDIF data / LDAP queries / BDII software, and relational data / SQL queries / R-GMA i/f / R-GMA software. NorduGrid: ARC schema, LDIF data, LDAP queries, GIIS software.

96 Low-Hanging Fruit: "just make it work by GLUEing". Identify the minimum common set of information required for interoperation in the respective information services; employ GLUE and extended CIM as the base schemas for the respective grids; let each grid's info service act as an information provider for the other; embed a schema translator to perform the schema conversion (a sketch follows below); and present the data in a common fashion on each grid: WebMDS, NAREGI CIM Viewer, SCMSWeb, …
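A toy C sketch of the schema-translation idea mentioned above: a static mapping table converts a few GLUE attribute names into CIM-style counterparts before re-publication. The CIM-side class and property names here are illustrative stand-ins, not the actual GLUE-to-CIM mapping chosen in the GIN work.

    #include <stdio.h>
    #include <string.h>

    /* Illustrative GLUE -> CIM-style attribute mapping (CIM names are made up). */
    struct mapping { const char *glue; const char *cim; };

    static const struct mapping table[] = {
        { "GlueCEStateStatus",           "NRG_ComputingService.Status"    },
        { "GlueCEInfoTotalCPUs",         "NRG_ComputingService.TotalCPUs" },
        { "GlueHostOperatingSystemName", "CIM_OperatingSystem.Name"       },
    };

    static const char *translate(const char *glue_attr)
    {
        size_t i;
        for (i = 0; i < sizeof(table) / sizeof(table[0]); i++)
            if (strcmp(table[i].glue, glue_attr) == 0)
                return table[i].cim;
        return NULL;    /* attribute outside the agreed minimal common set */
    }

    int main(void)
    {
        const char *attrs[] = { "GlueCEStateStatus", "GlueCEInfoTotalCPUs", "GlueSEUniqueID" };
        size_t i;
        for (i = 0; i < sizeof(attrs) / sizeof(attrs[0]); i++) {
            const char *cim = translate(attrs[i]);
            printf("%-28s -> %s\n", attrs[i], cim ? cim : "(not in the common set, dropped)");
        }
        return 0;
    }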

97 Minimal Common Attributes: define the minimal common set of attributes required; each grid's system components (JM, EPS, CSG, SC, DC, IS, ACS, reservation, accounting, provisioning) will only access the translated information exchanged as common attributes between the two stacks.

98 GLUE→CIM translation: development of information providers that translate from the GLUE data model to CIM for selected common attributes, such as the up/down status of grid services. On the gLite/R-GMA side, a CIM→GLUE producer publishes tuples into R-GMA (producer/registry/consumer/mediator services, SQL CREATE TABLE / INSERT / SELECT); on the NAREGI side, a GLUE→CIM translator feeds a CIM provider skeleton (NRG_Unicore, Account, OS, System) under the lightweight CIMOM of a cell domain information service for EGEE resources, whose RDB is aggregated via OGSA-DAI together with the CDIS for NAREGI into a multi-grid information service node, based on a GLUE-CIM mapping of the selected minimal attributes.

99 Interoperability: NAREGI Short-Term Policy. gLite: simple/single jobs (up to SPMD); bi-directional submission — NAREGI → gLite via GT2-GRAM, gLite → NAREGI via Condor-C; exchange of resource information. GIN: simple/single jobs (up to SPMD); NAREGI → GIN submission via WS-GRAM; exchange of resource information. BES/ESI: TBD — the status is somewhat confusing; is ESI middleware already developed? Globus 4.X and/or UnicoreGS?

100 Job Submission Standards Comparison: Goals (columns: ESI, BES, NAREGI SS, SC(VM)). Use JSDL ✔ *1 ✔; WSRF OGSA Base Profile 1.0 platform ✔✔✔✔; job management service ✔✔✔✔; extensible support for resource models ✔✔; reliability ✔✔✔✔; use WS-RF modeling conventions ✔✔✔✔; use WS-Agreement; advance reservation ✔✔; bulk operations ✔; generic management frameworks (WSDM); define alternative renderings; server-side workflow ✔. *1: extended.

101 Job Factory Operations, compared across ESI (0.6), BES (draft v16), and NAREGI (β). Original operations include ESI's CreateManagedJob (with a subscribe option), the BES activity operations (CreateActivityFromJSDL, GetActivityStatus, RequestActivityStateChanges, StopAcceptingNewActivities, StartAcceptingNewActivities, IsAcceptingNewActivities, GetActivityJSDLDocuments), and NAREGI's MakeReservations / CommitReservations. WS-ResourceProperties: GetResourceProperty, GetMultipleResourceProperties, QueryResourceProperties (NAREGI: GetResourceProperty, GetMultipleResourceProperties, SetResourceProperties). WS-ResourceLifetime: ImmediateResourceDestruction and ScheduledResourceDestruction versus Destroy and SetTerminationTime (NAREGI). WS-BaseNotification: NotificationProducer versus Notify and Subscribe (NAREGI). Now working with Dave Snelling et al. to converge on the ESI API (which is similar to WS-GRAM), plus advance reservation and bulk submission.

102 Interoperation: gLite – NAREGI. A gLite-to-NAREGI bridge (NAREGI client lib) performs ClassAd → JSDL conversion, NAREGI-WF generation, job submission/control, and status propagation (certification still open?), letting the gLite-WMS (JDL) reach the NAREGI-SS (JSDL) via Condor-C; an IS bridge translates GLUE → CIM between the gLite-IS and the NAREGI-IS. In the other direction, an Interop-SC under the NAREGI-SC performs JSDL → RSL conversion, job submission/control, and status propagation to a gLite-CE via GT2-GRAM, alongside the NAREGI GridVM; gLite users and NAREGI users each keep their own entry points (gLite WMS, NAREGI Portal).

103 Interoperation: GIN (Short Term). An IS bridge translates GLUE → CIM between another grid's IS (CIM or GLUE) and the NAREGI-IS; an Interop-SC under the NAREGI-SS/SC performs JSDL → RSL conversion, job submission/control, and status propagation to the other grid's CE (GT4) via WS-GRAM, alongside the NAREGI GridVM; NAREGI users submit through the NAREGI Portal, while users of the other grid use their own JSDL/GT4 front end. Sorry!! One way only for now.

