Australian Partnership for Advanced Computing
“providing advanced computing and grid infrastructure for eResearch”
Rhys Francis, Manager, APAC grid program
Partners:
–Australian Centre for Advanced Computing and Communications (ac3) in NSW
–The Australian National University (ANU)
–Commonwealth Scientific and Industrial Research Organisation (CSIRO)
–Interactive Virtual Environments Centre (iVEC) in WA
–Queensland Parallel Supercomputing Foundation (QPSF)
–South Australian Partnership for Advanced Computing (SAPAC)
–The University of Tasmania (TPAC)
–Victorian Partnership for Advanced Computing (VPAC)
APAC Programs
National Facility Program
–a world-class advanced computing service
–currently 232 projects and 659 users (27 universities)
–major upgrade in capability (1650-processor Altix 3700 system)
APAC Grid Program
–integrates the National Facility and Partner Facilities
–allows users easier access to the facilities
–provides an infrastructure for Australian eResearch
Education, Outreach and Training Program
–increases skills to use advanced computing and grid systems
–courseware project
–outreach activities
–national and international activities
APAC Grid program structure (organisation diagram): a Steering Committee oversees two streams, APAC Grid Development and APAC Grid Operation, driven by an Engineering Taskforce and an Implementation Taskforce; each activity has a Project Leader, and research activities also have a Research Leader. Scale: 140 people, >50 full-time equivalents, $8M pa in people, plus compute/data resources.
Projects
Grid Infrastructure
–Computing Infrastructure: Globus middleware, certificate authority, system monitoring and management (grid operation centre)
–Information Infrastructure: resource broker (SRB), metadata management support (Intellectual Property control), resource discovery
–User Interfaces and Visualisation Infrastructure: portals to application software, workflow engines, visualisation tools
Grid Applications
–Astronomy
–High-Energy Physics
–Bioinformatics
–Computational Chemistry
–Geosciences
–Earth Systems Science
Organisation Chart
Experimental High-Energy Physics
Belle Physics Collaboration
–KEK B-factory detector, Tsukuba, Japan
–matter/anti-matter investigations
–45 institutions, 400 users worldwide; 10 TB of data currently
–Australian grid for KEK-B data: testbed demonstrations, data grid centred on the APAC National Facility
Atlas Experiment
–Large Hadron Collider (LHC) at CERN: 3.5 PB of data per year (now 15 PB pa), operational in 2007
–installing LCG (GridPP), will follow EGEE
Belle Experiment
Simulated collisions or events
–used to predict what we’ll see (features of data)
–essential to support design of systems
–essential for analysis
2 million lines of code
Belle simulations
Computationally intensive
–simulate beam particle collisions, interactions, decays
–all components and materials: 10 × 10 × 20 m at 100 µm accuracy
–tracking and energy deposition through all components
–all electronics effects (signal shapes, thresholds, noise, cross-talk)
–data acquisition system (DAQ)
Need 3 times as many simulated events as real events to reduce statistical fluctuations
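The 3:1 ratio follows from simple error propagation: when a measurement is corrected with a Monte Carlo prediction, the finite MC sample adds its own statistical fluctuation in quadrature, scaled down by the factor k by which the MC sample exceeds the data. A back-of-envelope sketch of that formula (not Belle's actual analysis code):

```python
import math

def stat_inflation(k: float) -> float:
    """Factor by which a finite MC sample (k times the data statistics)
    inflates the statistical error of a measurement corrected with an
    MC-derived prediction.

    Data variance ~ N; the MC prediction, scaled by 1/k, contributes ~ N/k,
    so the combined error is sqrt(N * (1 + 1/k)) = sqrt(1 + 1/k) * sqrt(N).
    """
    return math.sqrt(1.0 + 1.0 / k)

# With equal-sized samples (k=1) the error grows by ~41%;
# with 3x the simulated events, as on the slide, only by ~15%.
for k in (1, 3, 10):
    print(k, round(stat_inflation(k), 3))
```

Beyond k = 3 the returns diminish quickly, which is why the collaboration settled on roughly three simulated events per real one.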
Belle status
–apparatus at KEK in Japan; simulation work done worldwide
–data shared using an SRB federation: KEK, ANU, VPAC, Korea, Taiwan, Krakow, Beijing… (led by Australia!)
–previous research work used script-based workflow control; the project is currently evaluating LCG middleware for workflow management
–testing in progress: LCG job management, APAC grid job execution (2 sites), APAC grid SRB data management (2 sites) with data flow using international SRB federations
–the limitation is international networking
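The script-based workflow control being replaced by LCG middleware amounts to dependency-ordered job submission: each production stage runs only after the stages it consumes have finished. A toy sketch of that pattern (the job names form a hypothetical Belle-style chain, not the actual production scripts):

```python
def run_workflow(jobs, deps, submit):
    """Run jobs in dependency order.

    jobs:   iterable of job names requested by the user
    deps:   dict mapping a job to the jobs that must finish first
    submit: callable invoked once per job when its inputs are ready
    Returns the order in which jobs were actually submitted.
    """
    done, order = set(), []

    def run(job):
        if job in done:
            return
        for d in deps.get(job, []):   # recurse into prerequisites first
            run(d)
        submit(job)
        done.add(job)
        order.append(job)

    for j in jobs:
        run(j)
    return order

# Hypothetical simulation chain: generate hits, digitise, reconstruct, analyse.
deps = {"digitise": ["simulate"],
        "reconstruct": ["digitise"],
        "analyse": ["reconstruct"]}
order = run_workflow(["analyse", "simulate"], deps, submit=lambda j: None)
# -> ['simulate', 'digitise', 'reconstruct', 'analyse']
```

A real workflow manager (LCG, or the DAGMan component of the VDT's Condor) adds what this sketch omits: cycle detection, retries on failure, and submission to remote compute elements rather than a local callable.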
Earth Systems Science
Workflow access to data products
–Intergovernmental Panel on Climate Change scenarios of future climate (3 TB)
–ocean colour products of the Australasian and Antarctic region (10 TB)
–1/8-degree ocean simulations (4 TB)
–weather research products (4 TB)
–Earth Systems simulations
–terrestrial land surface data
Grid services
–Globus-based version of OPeNDAP (UCAR/NCAR/URI)
–server-side analysis tools for data sets: GrADS, NOMADS
–client-side visualisation from on-line servers
–THREDDS (catalogues of OPeNDAP repositories)
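Server-side subsetting is what makes OPeNDAP practical for multi-terabyte holdings: a client appends a constraint expression to the dataset URL, so only the requested hyperslab crosses the network. A sketch of the DAP2 URL construction (the dataset URL and variable name here are hypothetical; real servers accept the same pattern):

```python
def dap_subset_url(dataset_url: str, var: str, slices) -> str:
    """Build a DAP2 constraint-expression URL requesting
    var[start:stop][start:stop]... for each (start, stop) index range.
    The .dods suffix asks the server for binary data rather than metadata."""
    projection = var + "".join(f"[{a}:{b}]" for a, b in slices)
    return f"{dataset_url}.dods?{projection}"

# Hypothetical climate-scenario dataset: surface temperature, first time
# step, a window of latitude/longitude indices over Australasia.
url = dap_subset_url("http://example.org/dap/ipcc_scenario", "tas",
                     [(0, 0), (100, 140), (200, 240)])
# -> http://example.org/dap/ipcc_scenario.dods?tas[0:0][100:140][200:240]
```

Client libraries such as pydap or netCDF4 hide this URL mangling, but the protocol underneath is exactly this string.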
Workflow vision (diagram): discovery, visualisation and digital-library components layered over OPeNDAP, with job/data management, an analysis toolkit and a metadata crawler spanning the APAC NF, VPAC, ac3, SAPAC and iVEC sites.
Workflow components (diagram): a Gridsphere portal hosting discovery, visualisation, get-data and analysis-toolkit portlets; a web-services layer (Web Map Service, Web Processing Service, Web Coverage Service, OAI library API in Java); an application layer (Live Access Server, OPeNDAP server, processing applications, metadata crawler); a data layer (digital repository, configuration and metadata); and a hardware layer (compute engine).
OPeNDAP services
APAC NF (Canberra)
–international IPCC model results (10–50 TB)
–TPAC 1/8-degree ocean simulations (7 TB)
Met Bureau Research Centre (Melbourne)
–near real-time LAPS analysis products (<1 GB)
–sea- and sub-surface temperature products
TPAC & ACE CRC (Hobart)
–NCEP2 (150 GB), WOCE3 Global (90 GB)
–Antarctic AWS (150 GB), climate modelling (4 GB)
–sea-ice simulations
CSIRO Marine Research (Hobart)
–ocean colour products & climatologies (1 TB)
–satellite altimetry data (<1 GB)
–sea-surface temperature product
CSIRO HPSC (Melbourne)
–IPCC CSIRO Mk3 model results (6 TB)
AC3 Facility (Sydney)
–land surface datasets
Australian Virtual Observatory (diagram): a user queries the AVD Registry and issues SIA/SSA queries, each returning a list of matches; the matching data are then retrieved with SRB get() from SRB/MCAT storage.
APAC Grid Geoscience
Resources to integrate: conceptual models, databases, modelling codes, mesh generators, visualisation packages, people, high-performance computers, mass storage facilities.
Mantle Convection
Observational databases
–access via SEE Grid Information Services standards
Earthbytes 4D Data Portal
–allows users to track observations through geological time and use them as model boundary conditions and/or to validate process simulations
Mantle convection
–solved via Snark on HPC resources
Modelling archive
–stores problem descriptions so they can be mined and audited
Trial application provided by: D. Müller (Univ. of Sydney), L. Moresi (Monash Univ./MC2/VPAC)
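Tracking an observation through geological time, as the Earthbytes portal does, reduces to rotating its present-day coordinates about a plate's Euler pole by the accumulated rotation angle. A toy sketch of one such finite rotation using Rodrigues' formula (the pole and rotation rate below are made up for illustration, not real plate-motion data):

```python
import math

def rotate(point, axis, angle_deg):
    """Rotate a unit vector `point` about a unit vector `axis` by
    angle_deg degrees, via Rodrigues' rotation formula:
    v' = v cos(t) + (k x v) sin(t) + k (k . v)(1 - cos(t))."""
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    px, py, pz = point
    ax, ay, az = axis
    dot = ax * px + ay * py + az * pz
    cross = (ay * pz - az * py, az * px - ax * pz, ax * py - ay * px)
    return tuple(c * p + s * cr + (1 - c) * dot * a
                 for p, cr, a in zip(point, cross, (ax, ay, az)))

# Made-up example: a point on the equator, an Euler pole at the north
# pole, and 30 Myr of motion at 1 degree/Myr = a 30-degree rotation.
moved = rotate((1.0, 0.0, 0.0), (0.0, 0.0, 1.0), 30.0)
```

Real reconstructions compose a sequence of such finite rotations per plate per time interval; the portal's job is to look up the right rotation for each observation's plate and age.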
Workflows and services (diagram): a user logs in through AAA, edits a problem description, runs the simulation and monitors the job; a job management service drives the Snark and EarthBytes services on HPC resources; resource and service registries, a results archive, a local repository with archive search, and a data management service connect to state geology and rock-property databases (Geology SA, Geology WA, Rock Prop. NSW, Rock Prop. WA).
Status update: APAC National Grid
Key steps
–implementation of our own CA
–adoption of VDT middleware packaging
–agreement on a GT2 base for 2005, GT4 in 2006
–agreement on portal implementation technology
–adoption of federated SRB as the base for shared data
–development of gateways for the site grid architecture
–support for inclusion of ‘associated’ systems
–implementation of VOMS/VOMRS
–development of user and provider policies
VDT components
–Apache HTTPD; Apache Tomcat
–Clarens v0.7.2; jClarens v0.6.0; jClarens Web Service Registry v0.6.0
–ClassAds v0.9.7
–Condor/Condor-G, with VDT Condor configuration script
–CVS v1.79 (4 April 2004)
–DOE and LCG CA Certificates v4 (includes LCG 0.25 CAs)
–DRM v1.2.9
–EDG CRL Update v1.2.5; EDG Make Gridmap v2.1.0
–Fault Tolerant Shell (ftsh)
–Generic Information Provider v1.2
–gLite CE Monitor v1.0.2
–Globus Toolkit, pre-web-services, with patches and VDT Globus configuration script; Globus Toolkit, web-services, v4.0.1
–GLUE Schema v1.2 draft 7; GLUE Information Providers
–Grid User Management System (GUMS) v1.1.0
–GriPhyN Virtual Data System v1.4.1 (containing Chimera and Pegasus)
–GSI-Enabled OpenSSH v3.5
–Java SDK v1.4.2_08
–JobMon v0.2
–KX509
–MonALISA
–MyProxy v2.2
–MySQL
–Nest v0.9.7-pre1
–Netlogger v3.2.4
–PPDG Cert Scripts v1.6
–PRIMA Authorization Module v0.3
–PyGlobus
–RLS
–SRM Tester v1.0
–UberFTP v1.15
–VOMS v1.6.7; VOMS Admin v1.1.0-r0 (client 1.0.7, interface 1.0.2, server 1.1.2)
Our most important design decision
Installing Gateway Servers at all grid sites, using VM technology to support multiple grid stacks
–gateways will support GT2, GT4, LCG/EGEE, data grid (SRB etc.), production portals, development portals and experimental grid stacks
–high-bandwidth, dedicated private networking (V-LAN) between grid sites
(diagram: gateway servers fronting the server cluster and datastore cluster at each site)
Gateway Systems
Support the basic operation of the APAC National Grid and translate grid protocols into site-specific actions
–limit the number of systems that need grid components installed and managed
–enhance security: many grid protocols and associated ports only need to be open between the gateways
–in many cases only the local gateways need to interact with site systems
–support roll-out and control of the production grid configuration
–support production and development grids, and local experimentation, using a virtual machine implementation
Grid pulse – every 30 minutes
Heartbeat reports from the gateway services across sites:
–NG1 (Globus Toolkit 2 services): ANU, iVEC, VPAC
–NG2 (Globus Toolkit 4 services): iVEC, SAPAC (down), VPAC
–NGDATA (SRB & GridFTP): ANU, iVEC, VPAC (down)
–NGLCG (special physics stack): VPAC
–NGPORTAL (Apache/Tomcat): iVEC, VPAC
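A grid-pulse display like the one above can be driven by a simple heartbeat rule: a gateway is "Up" if its last pulse arrived within a tolerance of the 30-minute reporting interval. A minimal sketch (the two-missed-pulses threshold and the gateway names are assumptions for illustration, not the documented APAC policy):

```python
PULSE_INTERVAL = 30 * 60       # gateways report every 30 minutes (seconds)
MISSED_PULSES_ALLOWED = 2      # assumed tolerance before declaring "Down"

def gateway_status(last_pulse: float, now: float) -> str:
    """Classify one gateway from the age of its last heartbeat."""
    age = now - last_pulse
    return "Up" if age <= MISSED_PULSES_ALLOWED * PULSE_INTERVAL else "Down"

def pulse_report(last_seen: dict, now: float) -> dict:
    """Map {gateway: last_pulse_time} to {gateway: 'Up'/'Down'}."""
    return {gw: gateway_status(t, now) for gw, t in last_seen.items()}

# Example with epoch-second timestamps (gateway names are hypothetical):
now = 100_000
report = pulse_report({"ng2.sapac": now - 7200,   # four missed pulses
                       "ng1.vpac": now - 600},    # reported recently
                      now)
# -> {'ng2.sapac': 'Down', 'ng1.vpac': 'Up'}
```

The tolerance matters: a threshold of one interval flaps on any delayed report, while too many intervals hides real outages from the operations centre.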
A National Grid
GrangeNet backbone, Centie/GrangeNet link and AARNet links connecting the sites: Townsville and Brisbane (QPSF), Canberra (ANU), Melbourne (VPAC, CSIRO), Sydney (ac3), Perth (iVEC, CSIRO), Adelaide (SAPAC) and Hobart (TPAC, CSIRO); aggregate processors plus 3 PB of near-line storage.
Significant Resource Base
Mass stores (15 TB cache, 200+ TB holdings, 3 PB capacity)
–ANU and CSIRO mass stores, plus several TB stores
Compute systems (aggregate processors)
–Altix, Itanium-II: 3.6 TB memory, 120 TB disk
–NEC, 168 SX-6 vector CPUs: 1.8 TB memory, 22 TB disk
–IBM, 160 Power5 CPUs: 432 GB memory
–2 × Altix, Itanium-II: 160 GB memory
–2 × Altix, Itanium-II: 120 GB memory, NUMA
–Altix, Itanium-II: 180 GB memory, 5 TB disk, NUMA
–374 × 3.06 GHz Xeon: 374 GB memory, Gigabit Ethernet
–258 × 2.4 GHz Xeon: 258 GB memory, Myrinet
–188 × 2.8 GHz Xeon: 160 GB memory, Myrinet
–168 × 3.2 GHz Xeon: 224 GB memory, GigE (28 with InfiniBand)
–152 × 2.66 GHz P4: 153 GB memory, 16 TB disk, GigE
Functional decomposition
A matrix spanning Resources and Users across Data, Compute, Monitoring, Constraints, Activities and Interfaces:
–resources: files, DBs, streams; binaries, libraries, licenses; queues; operating systems and hardware; firewalls, NATs and physical networks
–compute and data: grid staging and execution; data movement; workflow processing (job execution); global resource allocation and scheduling
–monitoring: progress monitoring; resource availability; resource registration; resource discovery; accounting; history, auditing; reporting, analysis and summarisation
–constraints: authentication (identity mgmt); authorisation (policy & enforcement); VO mgmt (rights, shares, delegations); configuration mgmt; security: agreements, obligations, standards, installation, configuration, verification
–activities: application development; data and metadata mgmt (curation)
–interfaces: portals, workflow access services; grid interfaces; command-line access to resources; portal for grid mgmt (GOC); AccessGrid interaction; 3rd-party GUIs for applications and activities
APAC National Grid
One virtual system of computational facilities: ANU, CSIRO, ac3, iVEC, QPSF, SAPAC, TPAC, VPAC and the APAC National Facility
Basic services
–single ‘sign-on’ to the facilities
–portals to the computing and data systems
–access to software on the most appropriate system
–resource discovery and monitoring