1
NAREGI Middleware Beta 1 and Beyond
Kazushige Saga, Center for Grid Research and Development, National Institute of Informatics
Satoshi Matsuoka, Professor, Global Scientific Information and Computing Center, Tokyo Institute of Technology / NII; Deputy Director, NAREGI Project
http://www.naregi.org
2
CyberScience Infrastructure for Advanced Science (by NII) – to innovate academia and industry
[Diagram] Super-SINET, a next-generation network infrastructure supported by NII, interconnects the 7 national computer centers (Hokkaido Univ., Tohoku Univ., Univ. of Tokyo, Nagoya Univ., Kyoto Univ., Osaka Univ., Kyushu Univ.) with NII and other institutes (Titech, Waseda, KEK, etc.). On top of this infrastructure sit UPKI, the NAREGI Middleware, and virtual organizations for science, supporting publication of scientific results from academia, a scientific repository, human resource development and strong organization, industry liaison and social benefit, and global contribution.
3
Titech "TeraScale" Campus Grid (2002/3-2006/3) – System Image
The Titech Grid is a large-scale, campus-wide, pilot commodity Grid deployment for next-generation E-Science application development within the campuses of the Tokyo Institute of Technology (Titech). High-density blade PC server systems (NEC Express 5800 series) comprising 800 high-end PC processors are installed at 13 locations throughout the Titech campuses, interconnected via the Super TITANET backbone (1-4 Gbps) between the Oo-okayama and Suzukake-dai campuses, 30 km apart, with Super SINET (the 10 Gbps MOE national backbone network) linking to other Grids. It is the first campus-wide pilot Grid system deployment in Japan, providing a next-generation high-performance "virtual parallel computer" infrastructure for high-end computational E-Science. A Grid-wide single system image is provided by Grid middleware: Globus, Ninf-G, Condor, NWS, and others. The system totals more than 1,300 processors and over 2.5 TeraFlops across 15 clusters and 4 supercomputers/servers, including the high-density GSIC main clusters (256 processors x 2 systems in just 5 cabinets) and 24-processor satellite systems at each department (x 12 systems).
4
Titech Supercomputing Grid 2006
[Diagram] The Oo-Okayama and Suzukake-dai campuses (35 km apart, linked at 10 Gbps; 1.2 km spans within campus) host TSUBAME, the Campus Grid Cluster, the COE-LKR (Knowledge) cluster, WinCCS, and two planned centers (Mathematical and Computing Sciences Center, Computational Engineering Center).
Totals: ~13,000 CPUs, 90 TeraFlops, ~26 TeraBytes of memory, ~1.1 PetaBytes of disk.
CPU cores (x86): TSUBAME (~10,600), Campus Grid Cluster (~1,000), COE-LKR cluster (~260), WinCCS (~300), plus ClearSpeed CSX600 accelerators (720 chips).
5
The Titech TSUBAME Production Supercomputing Cluster, Spring 2006
Compute: Sun Galaxy 4 nodes (8-way dual-core Opteron), 10,480 cores / 655 nodes, 50.4 TeraFlops, running Linux (SuSE 9, 10) and the NAREGI Grid middleware; ClearSpeed CSX600 SIMD accelerators, 360 boards, 35 TeraFlops (current); NEC SX-8 small vector nodes (under plan).
Interconnect: unified InfiniBand network, Voltaire ISR9288, 10 Gbps x2 (xDDR), ~700 ports, plus a 10 Gbps+ external network.
Storage: 1 PetaByte (Sun "Thumper", 48 x 500 GB disks per unit) and 0.1 PetaByte (NEC iStore), with Lustre FS and NFS (v4?).
Ranked 7th on the June 2006 Top500 at 38.18 TFlops.
6
TSUBAME Physical Installation
3 rooms (600 m²), 350 m² service area; 76 racks including network and storage (10 of them storage racks), 46.3 tons; 32 AC units, 12.2 tons; total 58.5 tons (excluding rooftop AC heat exchangers); max 1.2 MWatts; ~3 weeks construction time.
Layout: 1st floor – Titech Grid Cluster; 2nd floor A – TSUBAME; 2nd floor B – TSUBAME & storage.
7
Titech TSUBAME: ~80+ racks, 350 m² floor area, 1.2 MW (peak)
8
Scaling Towards Petaflops…
[Timeline chart, 1 TF to 10 PF, 2002-2012] Titech Campus Grid 1.3 TF (2002); Earth Simulator 40 TF (2002); BlueGene/L 360 TF (2005); KEK 59 TF BG/L + SR11000; Titech Supercomputing Campus Grid (incl. TSUBAME) ~90 TF (2006); Korean machine >100 TF (2006-7); Chinese national machine >100 TF (2007-8); US petascale (2007-8); TSUBAME upgrade >200 TF (2008-2H); US HPCS (2010); next-generation Titech "PetaGrid" 1 PF (2010); "Keisoku" >10 PF (2011); US 10 PF (2011-12?).
2010 Titech "PetaGrid" => interim 200 TeraFlops @ 2008 => "Petascale" @ 2010: the norm for a typical Japanese center? → HPC software is the key!
9
NAREGI is/has/will…
Is the National Research Grid in Japan
– Part of CSI and future petascale initiatives
– Counterpart of the METI "Business Grid" project, 2003-2005
Has extensive commitment to WS/GGF-OGSA
– Entirely WS/service-oriented architecture
– Sets industry standards, e.g. the first implementation of OGSA-EMS
Will work with EU/US/AP counterparts to realize a "global research grid"
– Various talks have started, incl. the SC05 interoperability meeting
The first open-source public beta is available
– http://www.naregi.org/download/
10
NAREGI is not/doesn't/won't…
Is NOT an academic research project
– All professional developers from Fujitsu, NEC, Hitachi (no students)
– IMPORTANT for vendor buy-in and tech transfer
Will NOT develop all software by itself
– Will rely on external components in some cases
– Must be WS and other industry standards compliant
– IMPORTANT for vendor buy-in and tech transfer
Will NOT hinder industry adoption at all costs
– Intricate open-source copyright and IP policies
– We want companies to save/make money using the NAREGI MW
– IMPORTANT for vendor buy-in and tech transfer
11
NAREGI R&D Assumptions and Goals
Future research grid metrics for petascale
– 10s of institutions/centers, various project VOs
– > 100,000 users, > 100,000-1,000,000 CPUs
– Machines are very heterogeneous: CPUs (supercomputers, clusters, desktops), OSes, local schedulers
– 24/7 usage, production deployment
– Server grid, data grid, metacomputing, …
High emphasis on standards
– Start with Globus, Unicore, Condor; extensive collaboration
– GGF contributions, esp. an OGSA™ reference implementation
Win support of users
– Application and experimental deployment essential
– R&D for production-quality (free) software
– Nano-science (and now bio) involvement, large testbed
12
NAREGI Programming Models
High-throughput computing
– But with complex data exchange in between
– NAREGI Workflow or GridRPC (see the GridRPC sketch below)
Metacomputing (cross-machine parallel)
– Workflow (w/ co-scheduling) + GridMPI
– GridRPC (for task-parallel or task/data-parallel)
Coupled multi-resolution simulation
– Workflow (w/ co-scheduling) + GridMPI + coupling components: Mediator (coupled simulation framework) and GIANT (coupled simulation data exchange framework)
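For the high-throughput and task-parallel models above, NAREGI provides GridRPC as Ninf-G, which implements the GGF GridRPC API. The following is a minimal client sketch in C against that API; the remote function name "example/compute", its one-IN/one-OUT double signature, and the configuration file name are hypothetical placeholders standing in for an interface that would be registered on the GridRPC server side.

```c
#include <stdio.h>
#include <grpc.h>   /* GGF GridRPC API header as shipped with Ninf-G */

int main(int argc, char **argv)
{
    grpc_function_handle_t handle;
    grpc_error_t err;
    double input = 3.14, result = 0.0;

    /* Initialize the GridRPC client from a client configuration file. */
    err = grpc_initialize(argc > 1 ? argv[1] : "client.conf");
    if (err != GRPC_NO_ERROR) {
        fprintf(stderr, "grpc_initialize: %s\n", grpc_error_string(err));
        return 1;
    }

    /* Bind a handle to the remote function on its default server.
       "example/compute" is a hypothetical name; it must match an
       interface registered on the GridRPC server. */
    err = grpc_function_handle_default(&handle, "example/compute");
    if (err == GRPC_NO_ERROR) {
        /* Synchronous remote call; the argument list must follow the
           remote function's declared signature (IN double, OUT double). */
        err = grpc_call(&handle, input, &result);
        if (err == GRPC_NO_ERROR)
            printf("remote result = %f\n", result);
        else
            fprintf(stderr, "grpc_call: %s\n", grpc_error_string(err));
        grpc_function_handle_destruct(&handle);
    } else {
        fprintf(stderr, "grpc_function_handle_default: %s\n",
                grpc_error_string(err));
    }

    grpc_finalize();
    return 0;
}
```

For task-parallel, high-throughput runs, the same API also defines asynchronous variants (grpc_call_async with grpc_wait / grpc_wait_all), which is the usual pattern for farming many independent calls out to grid resources.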
13
NAREGI Software Stack (beta 1, 2006) – a WS(RF)-based (OGSA) software stack
[Layer diagram] Grid-Enabled Nano-Applications (WP6) on top; user-level tools: Grid PSE, Grid Workflow (WFML, based on Unicore WF), and Grid Visualization (WP3); Grid Programming (WP2): GridRPC and GridMPI; Super Scheduler and Distributed Information Service (CIM) (WP1); Grid VM (WP1); Packaging; Data (WP4); Grid Security and High-Performance Grid Networking (WP5). All of these are hosted on WSRF (GT4 + Fujitsu WSRF for WP1) plus GT4 and other services, running over SuperSINET on the computing resources and virtual organizations of NII, IMS, other research organizations, and major university computing centers.
14
List of NAREGI "Standards" (beta 1 and beyond)
GGF standards and pseudo-standard activities set/employed by NAREGI: GGF "OGSA CIM profile", GGF AuthZ, GGF DAIS, GGF GFS (Grid Filesystems), GGF Grid CP (GGF CAOPs), GGF GridFTP, GGF GridRPC API (as Ninf-G2/G4), GGF JSDL, GGF OGSA-BES, GGF OGSA-ByteIO, GGF OGSA-DAI, GGF OGSA-EMS, GGF OGSA-RSS, GGF RUS, GGF SRM (planned for beta 2), GGF UR, GGF WS-I RUS, GGF ACS, GGF CDDLM.
Other industry standards employed by NAREGI: ANSI/ISO SQL, DMTF CIM, IETF OCSP/XKMS, MPI 2.0, OASIS SAML 2.0, OASIS WS-Agreement, OASIS WS-BPEL, OASIS WSRF 2.0, OASIS XACML.
De facto standards / commonly used software platforms employed by NAREGI: Ganglia, GFarm 1.1, Globus 4 GRAM, Globus 4 GSI, Globus 4 WSRF (also Fujitsu WSRF for the C binding), IMPI (as GridMPI), Linux (RH8/9 etc.), Solaris (8/9/10), AIX, …, MyProxy, OpenMPI, Tomcat (and associated WS/XML standards), Unicore WF (as NAREGI WFML), VOMS.
These are necessary for longevity and vendor buy-in, and serve as a metric of WP evaluation: implement "specs" early, even if nascent, if seemingly viable.
15
Release Features
Resource and execution management
– GGF OGSA-EMS based architecture: automatic resource brokering and job scheduling; reservation-based co-allocation
– VO and local policy based access control
– Network traffic measurement and control
– GGF JSDL based job description
– DMTF CIM based resource information model
– GGF OGSA-RUS based accounting
Data Grid (new from FY2005)
– WSRF based grid-wide data sharing
16
Release Features (continued)
User-level tools
– Web based portal
– Workflow tool w/ NAREGI-WFML
– Application contents and deployment service
– Large-scale interactive grid visualization
Grid-ready programming tools and libraries
– Standards-compliant GridMPI and GridRPC (see the GridMPI sketch below)
– Bridge method for different types of applications
VO support
– VOMS based VO user management
– Production-quality CA, APGrid PMA, and more…
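Because GridMPI implements the standard MPI interface (using IMPI for cross-site communication), an ordinary MPI program in C needs no source changes to run across sites under it. A minimal sketch, assuming the job is launched through the GridMPI/workflow machinery rather than a single-cluster mpirun:

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    double local, global;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each rank contributes a value; under GridMPI the ranks may be
       spread over clusters at different sites, but the program is
       plain, standards-compliant MPI. */
    local = (double)rank;
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum over %d ranks = %f\n", size, global);

    MPI_Finalize();
    return 0;
}
```

The same source compiles and runs against any other MPI implementation, which is the practical meaning of the standards-compliance claim above.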
17
Resource and Execution Management Overview (β release)
[Architecture diagram] A client submits a workflow (abstract JSDL) to the Super Scheduler, which queries resources through the Distributed Information Service (CIM, accessed via DAI) and performs reservation-based co-allocation, then issues reservation, submission, query, and control requests, as concrete JSDL, to the GridVM on each computing resource. GridVM feeds resource information and accounting data (CIM, UR/RUS) back to the information service, and network monitoring and control supplies network information.
18
Co-allocation and Reservation
[Component diagram] The NAREGI-WP3 WorkflowTool, PSE, and GVS interact through the JM-Client (Submit, Status, Delete, Cancel) with the Super Scheduler (SS), where NAREGI-WFML is converted to BPEL including JSDL (WFML2BPEL), and the NAREGI JM (a BPEL engine), via the SS Java I/F module, invokes the EPS and the service containers (CreateActivity from BPEL/JSDL, GetActivityStatus, RequestActivityStateChanges). The EPS selects resources from the JSDL with the help of the CSG, which generates candidate sets by issuing SQL queries generated from the JSDL against the IS (OGSA-DAI over a CIM 2.10 PostgreSQL RDB, fed by the GNIS via GetGroupsOfNodes and by is-query). The AGG-SC / RS then makes and cancels reservations (MakeReservation, CancelReservation) through CES interfaces on GRAM4-specific GridVM SCs (WS-GRAM + GridVM over PBS or LoadLeveler, driven by globusrun-ws), while the FTS-SC handles co-allocation file transfer (globus-url-copy, uber-ftp, GFarm server), so that jobs are allocated and executed on multiple sites simultaneously.
Abbreviations – SS: Super Scheduler; JSDL: Job Submission Description Language; JM: Job Manager; EPS: Execution Planning Service; CSG: Candidate Set Generator; RS: Reservation Service; IS: Information Service; SC: Service Container; AGG-SC: Aggregate SC; GVM-SC: GridVM SC; FTS-SC: File Transfer Service SC; BES: Basic Execution Service I/F; CES: Co-allocation Execution Service I/F (BES+); CIM: Common Information Model; GNIS: Grid Network Information Service.
19
2: Interoperation with EGEE (gLite)
[Diagram] An EGEE user submits via gLite-WMS (JDL), which reaches NAREGI through a Bridge-CE (JDL), Condor, and the SS-GAHP with the SS client library; a NAREGI user submits via the NAREGI Portal (JSDL). ClassAd/JDL is converted to JSDL and a NAREGI workflow is generated. On the NAREGI side, the NAREGI-SS (JSDL) drives the NAREGI-SC and the NAREGI GridVM (JSDL), handling job submission/control and status propagation; an Interop-SC converts JSDL to RSL for job submission/control and status propagation to a gLite-CE via GT2 GRAM. An IS bridge maps between the gLite-IS (GLUE schema) and the NAREGI-IS (CIM). Certification interoperability remains an open question (?).
20
RISM-FMO Coupled Simulation
RISM computes the solvent distribution and FMO the electronic structure, coupled through the Mediator: the solvent charge distribution is transformed from regular to irregular meshes, and the Mulliken charge is transferred as the partial charge of the solute molecules. The electronic structure of nano-scale molecules in solvent is thus calculated self-consistently by exchanging the solvent charge distribution and the partial charges of the solute molecules; a schematic sketch of this exchange loop follows below.
*The original RISM and FMO codes were developed by the Institute for Molecular Science and the National Institute of Advanced Industrial Science and Technology, respectively.
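The self-consistent exchange described above can be pictured as a fixed-point iteration between the two codes. The sketch below is schematic only: fmo_electronic_structure and rism_solvent_distribution are hypothetical stand-ins (reduced here to scalar toy functions) for the actual FMO and RISM applications, and in the real workflow the data exchange and mesh transformation are performed by GridMPI and the Mediator, not by direct function calls.

```c
#include <stdio.h>
#include <math.h>

/* Hypothetical stand-ins for the coupled codes; in the real workflow these
   are separate FMO and RISM applications exchanging whole charge
   distributions via the Mediator, not scalar-valued C functions. */
static double fmo_electronic_structure(double solvent_charge)
{
    return 0.5 * solvent_charge + 0.1;   /* toy map: solute Mulliken charge */
}

static double rism_solvent_distribution(double mulliken_charge)
{
    return 0.8 * mulliken_charge;        /* toy map: solvent charge distribution */
}

int main(void)
{
    double mulliken = 0.0, solvent = 0.0, prev;
    const double tol = 1e-8;
    int iter = 0;

    /* Self-consistent loop: FMO updates the solute electronic structure from
       the solvent charge distribution, RISM updates the solvent distribution
       from the solute partial charges, until the exchanged values converge. */
    do {
        prev = mulliken;
        mulliken = fmo_electronic_structure(solvent);
        solvent  = rism_solvent_distribution(mulliken);
        iter++;
    } while (fabs(mulliken - prev) > tol && iter < 1000);

    printf("converged after %d iterations, Mulliken charge = %f\n",
           iter, mulliken);
    return 0;
}
```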
21
NAREGI Grid Middleware Roadmap (FY2003-FY2007)
[Timeline chart] A UNICORE-based R&D framework, later transitioning to an OGSA/WSRF-based R&D framework; prototyping of NAREGI middleware components; application of component technologies to nano applications and their evaluation; development and integration of the α release (private) and its evaluation on the NII-IMS testbed; midpoint evaluation; development of OGSA-based middleware; development and integration of the β release (public), followed by a series of β releases (public) and deployment of β; evaluation of the β release by IMS and other collaborating institutes; evaluation on, and utilization of, the NAREGI wide-area testbed; verification and evaluation of the Version 1.0 release.
22
Thank you!