CYBERINFRASTRUCTURE FOR THE GEOSCIENCES
GEON Systems Report
Karan Bhatia, San Diego Supercomputer Center
Friday Aug
Year 2 Goals & Accomplishments

Goals:
–Procure and deploy physical resources for partner sites
–Provide infrastructure for systems management, including mechanisms for collaboration and communication
–Provide basic production services for data
–Provide basic grid services for applications

Accomplishments:
Physical Layer
–Purchased and deployed hardware
Systems Layer
–Developed management software and collaborations with partner sites
–Developed the GEON software stack
Grid Layer
–Beginning to build out services
  Portal & security done (end of August)
  Naming & discovery, data management & replication, and mediation under way
–Basic research still being done
Applications Layer
–Some applications ready; used as templates for how to build applications in GEON
GEONgrid Development
–Physical Deployment: hardware, clusters, networks
–Systems Layer: OS & software layer
–Grid Layer: grid system services
–Applications: end-user apps & services
Physical Deployment

Vendors:
–Dell (27 production systems + 9 development systems)
  PowerEdge 2650-based systems
  Dual 2.8 GHz Pentium processors, 2 GB RAM
–ProMicro (3 systems)
  Dual Pentium, 4 TB storage + RAID
–HP cluster donation (9 systems)
  rx2600-based, dual 1.4 GHz

15 partner sites:
–1 PoP node
–Optional small cluster (4 systems)
–Optional data node

Miscellaneous equipment as needed:
–Switches, racks, etc.
Deployment Architecture

Similar to the BIRN architecture:
–Each site runs a PoP
–Optional cluster and data nodes

Users access resources through the PoP:
–The PoP provides the point of entry
–The PoP provides access to global services

Developers add services & data hosted on GEON resources:
–Web services / grid services
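The PoP's two roles above (entry point for local services, gateway to global services) can be sketched in a few lines. This is an illustrative model only, not GEON's implementation; all service names and URLs below are invented for the example.

```python
# Sketch: a site PoP serves locally hosted services directly and falls
# back to a shared global registry for everything else. All names and
# URLs are hypothetical.

GLOBAL_SERVICES = {"naming": "https://naming.geongrid.org"}  # hypothetical

class PoP:
    def __init__(self, site, local_services):
        self.site = site
        self.local = local_services  # services hosted on this site's nodes

    def lookup(self, name):
        """Resolve a service name: local services first, then global."""
        if name in self.local:
            return self.local[name]
        if name in GLOBAL_SERVICES:
            return GLOBAL_SERVICES[name]
        raise KeyError(f"unknown service: {name}")

# A hypothetical partner-site PoP hosting one local service:
utep = PoP("utep", {"gravity-db": "https://pop.utep.geongrid.org/gravity-db"})
```

A user at the UTEP site would reach both the locally hosted gravity database and the shared naming service through the same `lookup` entry point.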
GEONgrid Current Status

Physical resources:
–All PoPs deployed, 3 data nodes deployed, clusters all up
–HP cluster delivered

Software stack:
–Mix of GeonRocks 0.1 (Red Hat 9-based) and Red Hat 9
Systems Layer

Unified software stack definition:
–Custom GEON Roll
  Web/grid services software stack
  Common GEON applications and services

Focus on scalable systems management:
–Modified Rocks for wide-area cluster management (see [Sacerdoti04])

Collaborations with partner sites:
–Identified appropriate contacts
GEON Software Roll

Development:
–OGSI 1.0 (from GT 3.0.2) --> GT 3.2 (packaged by NMI)
–Web services (Jakarta, Axis, Ant, etc.)
–GridSphere 2.02 portal framework

Database:
–IBM DB2 (packaged for the Protein Data Bank)
–Postgres --> PostGIS
–SRB client software
–OPeNDAP roll (UNAVCO)

Security:
–DB2 with GSI plugin (developed by TeraGrid)
–Tripwire

System monitoring:
–Grid Monitor
–INCA testing and monitoring framework (TeraGrid), with GRASP benchmarks
–Network Weather Service (NWS)

GEON Software Stack version 1.0 to be deployed starting Sept 1, 2004!
Wide-Area Cluster Management

Frederico Sacerdoti, Sandeep Chandra, and Karan Bhatia, "Grid Systems Deployment and Management Using Rocks", Cluster 2004, September 2004, San Diego, California.
Additional Infrastructure

Production/development servers:
–8 development servers used for various activities
–Main production portal
–Blogs, forums, RSS
–Production application services

CVS services:
–cvs.geongrid.org

GEON certificate authority:
–ca.geongrid.org
Grid Layer

Goals:
–Evaluate core software infrastructure
  CAS, Handle.net, RLS (Replica Location Service), VOMS (Virtual Organization Management), Firefish, MCS (Metadata Catalog Service), SRB, CSF (Community Scheduler Framework)
–Integrate or build as necessary:
  1. Portal infrastructure
  2. Security infrastructure
  3. Naming and discovery infrastructure
  4. Data management and replication
  5. Generic mediation
1. Portal Infrastructure

GridSphere portal framework:
–Developed by GridLab (Jason Novotny and others), Albert Einstein Institute, Berlin, Germany
–Java/JSP portlet container
  JSR 168 support; WSRP and JSF coming
–Supports:
  Collaboration (standard portlet API)
  Personalization (e.g., my.yahoo.com)
  Grid services (GSI support)
  Web services

Other frameworks:
–Open Grid Computing Environments (OGCE)
  Apache Jetspeed-based --> Sakai
2. Security Infrastructure

GSI-based:
–Collaboration with Telescience & BIRN
–GEON certificate authority: ca.geongrid.org
  SDSC CACL system
–Role-based access control using the Globus Community Authorization Service (CAS)
  geonAdmin, geonPI, geonUser, public
–Portal integration
  Account requests, certificate management
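The role-based scheme above can be sketched as a mapping from a user's certificate identity to a role, and from roles to permitted actions. This is a minimal illustration in the spirit of CAS, not its actual protocol; the role names come from the slide, while the DNs and action names are invented.

```python
# Sketch of role-based authorization: a certificate DN maps to one of the
# four GEON roles, and each role carries a set of allowed actions.
# DNs and action names are hypothetical.

ROLE_ACTIONS = {
    "geonAdmin": {"read", "publish", "manage"},
    "geonPI":    {"read", "publish"},
    "geonUser":  {"read"},
    "public":    set(),
}

USER_ROLES = {  # hypothetical DN-to-role assignments
    "/O=GEON/CN=Alice": "geonAdmin",
    "/O=GEON/CN=Bob":   "geonUser",
}

def authorize(dn, action):
    """Unknown DNs fall back to the 'public' role (no actions)."""
    role = USER_ROLES.get(dn, "public")
    return action in ROLE_ACTIONS[role]
```

Keeping the role-to-action table in one place (rather than per-service checks) is what lets a community authorization service manage policy centrally across all GEON resources.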
3. Naming and Discovery

Naming:
–All service instances, datasets, and applications
–Two-level naming scheme to support replication and versioning
–GeoID, similar to LSID (Life Sciences ID)
–Globally unique and resolvable

Resolution:
–GeoID --> usable reference (e.g., WSDL)
–Handle System (CNRI)

Discovery:
–Discover resources in heterogeneous metadata repositories
  MCAT, MCS, Geography Network (ESRI), OPeNDAP
–Firefish (LBL)
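The two-level scheme above can be illustrated with a toy resolver: the first level maps an abstract GeoID to its registered versions/replicas, and the second maps each concrete replica to a usable reference such as a WSDL URL. All identifiers and URLs below are invented; the real system uses the CNRI Handle System rather than in-memory tables.

```python
# Sketch of two-level GeoID resolution supporting replication.
# Level 1: abstract GeoID -> concrete replica identifiers.
# Level 2: concrete identifier -> usable reference (e.g. WSDL URL).
# All names and URLs are hypothetical.

LEVEL1 = {
    "geon:gravity-data": ["geon:gravity-data.v2@sdsc",
                          "geon:gravity-data.v2@utep"],
}
LEVEL2 = {
    "geon:gravity-data.v2@sdsc": "https://sdsc.geongrid.org/gravity?wsdl",
    "geon:gravity-data.v2@utep": "https://utep.geongrid.org/gravity?wsdl",
}

def resolve(geoid, preferred_site=None):
    """Resolve an abstract GeoID to a usable reference, optionally
    preferring a replica at a given site."""
    replicas = LEVEL1[geoid]
    if preferred_site:
        for replica in replicas:
            if replica.endswith("@" + preferred_site):
                return LEVEL2[replica]
    return LEVEL2[replicas[0]]  # default: first registered replica
```

The indirection is the point: clients hold only the stable abstract name, so replicas can be added, moved, or versioned without breaking any stored references.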
4. Data Management & Replication

Installed services:
–GridFTP
–SRB server

GMR testing:
–Grid Movement and Replication
–With IBM Research

OGSA-DAI performance:
–With GRASP (Baru, Casanova, Snavely)
5. Mediation Services

GIS map integration:
–See next talk (Ludaescher)
Application Layer

SYNSEIS:
–Integration with TeraGrid resources
–Template for app development

Gravity App:
–Template for data movement
Year 2 Summary

Physical Layer
–Purchased and deployed hardware
Systems Layer
–Developed management software and collaborations with partner sites
–Developed the GEON software stack
Grid Layer
–Beginning to build out services
  Portal & security done (end of August)
  Naming & discovery, data management & replication, and mediation under way
–Basic research still being done
Applications Layer
–Some applications ready; used as templates for how to build applications in GEON
Looking Ahead, Year 3

Goals:
–Provide core software infrastructure
–Integrate with outside resources
–Encourage software development and integration with partners
–More data, more apps, more tools
Questions?
Additional Material
Grid Movement and Replication (with IBM)

–Data is stored in the PostgreSQL database at UTEP on the GEON node.
–The GMR capture service running at UTEP reads and replicates data to the PostgreSQL database running at SDSC.
–The GMR apply and monitor services run at SDSC to store the data sent by the capture service.
–An OGSA-DAI data access service provides access to the databases on both the UTEP and SDSC nodes.
–The user application grid service accepts two parameters: the name of the node to access, and an SQL query selecting the data of interest to be sent to the grav application.
–Based on the SQL query, an XML query document is generated; based on the node, an appropriate service handle is selected.
–The application grid service invokes the OGSA-DAI grid service handle to access data from the database.
–The application grid service receives the data and parses it to extract the relevant values, which are submitted to the grav application.
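The capture/apply flow described above can be sketched with two in-memory SQLite databases standing in for the PostgreSQL instances at UTEP and SDSC. This is a toy illustration of the replication pattern only, not the GMR or OGSA-DAI APIs; the table and column names are invented.

```python
import sqlite3

# Two in-memory databases standing in for the site databases.
# Table schema and data are hypothetical.
utep = sqlite3.connect(":memory:")  # source site
sdsc = sqlite3.connect(":memory:")  # replica site
for db in (utep, sdsc):
    db.execute("CREATE TABLE gravity (station TEXT, value REAL)")

# New data arrives at the source site.
utep.execute("INSERT INTO gravity VALUES ('ST1', 9.81)")

def capture(src):
    """Capture service: read rows to replicate from the source database."""
    return src.execute("SELECT station, value FROM gravity").fetchall()

def apply_changes(dst, rows):
    """Apply service: write the captured rows into the replica database."""
    dst.executemany("INSERT INTO gravity VALUES (?, ?)", rows)

# One replication round: capture at UTEP, apply at SDSC.
apply_changes(sdsc, capture(utep))
replica_rows = sdsc.execute("SELECT station, value FROM gravity").fetchall()
```

After the round, the SDSC replica holds the same rows as UTEP, and a client can query either copy; in the real setup that node choice is exactly the first parameter of the user application grid service.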