DEGREE IST: Dissemination and Exploitation of GRids in Earth sciencE
[Diagram: Earth Science community requirements, technologies, feedback, applications and dissemination, linking ES VOs with EGEE and various Grids]
Dissemination and exploitation of Grids in Earth Science
Ladislav Hluchy (II SAS), Monique Petitdidier (CNRS)
DEGREE IST ─ 3rd EGEE User Forum, Clermont-Ferrand, February
Introduction
ES community: atmosphere, ocean, biosphere, cryosphere, coupled interdisciplinary processes, complex sub-surface modelling
[Figure: complex web of sensors; optimal trajectory; first guess; complex data analysis; noisy observations]
Global, regional and local applications
–Alternative use of the data at different temporal and spatial resolutions
Large, distributed historical archives
–Long-term data archives to be exploited
Near-real-time access to data
–For processing, value adding and dissemination
–For nowcasting and alerts
Models providing long-term trends and forecasts
–Processing-intensive, data-intensive and complex applications
–Data fusion, data assimilation, data mining, modelling …
Integration of different data sources
Application examples
–Production (3 algorithms) and validation of 7 years of GOME ozone profiles
–Rapid earthquake analysis (mechanism and epicenter)
–Modeling seawater intrusion in coastal aquifers (SWIMED)
–Geosciences: GeoCluster for academia and industry
–Flooding of the Danube river: a cascade of models
–Specfem3D: seismic wave propagation model, MPI benchmark (2 to 2000 CPUs)
DEGREE project
DEGREE aims to create a bridge between the Earth Science and Grid communities. To achieve this goal, the consortium members will:
–Identify key Earth Science requirements
–Disseminate Earth Science application requirements to Grid projects
–Evaluate Grid middleware, tools and standards against Earth Science requirements
–Provide feedback to Grid developers
ES requirements
Report on ES families of applications and their Grid requirements (D1.1)
The document contains:
–A definition of the families of applications
–Chapters on monitoring requirements
–5 requirement categories: Security, Data and metadata management, Workflow, Application development, Miscellaneous
–Both listed and described requirements (providing context)
D1.1 is a living document:
–It contains previously stated requirements (DataGrid, EGEE)
–It contains an analysis of 15 application scenarios
–After delivery to the EU, 5 more scenarios were added
–Requirements are (re)analyzed
–It will be used to monitor progress
–It is the basis for deliverable D1.2, Final progress report on ES families of applications and their Grid requirements
–78 requirements defined
The document and the application descriptions are available on the DEGREE site.
ES key requirements
Security
–Fine-grained, at group and role level
–Enforcement of data policies
Data and metadata management
–Access to databases both inside and outside the Grid
–GIS integration
Workflow
–Complex workflow integration
Portals
–Web service interfaces to Grid middleware functionality
Miscellaneous
–Near real time: not only ‘time to completion/delivery’, but also a specific start time is needed
–Use of licensed software
These requirements may not be ‘unique’ to the ES domain, but they are essential for ES.
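The licensed-software requirement implies that a job should only be dispatched when a license is actually free. The following is a minimal Python sketch of such a check, assuming a hypothetical pool of floating licenses per software package; `LicensePool`, `dispatch` and the package name `solver_a` are invented for illustration and are not part of any Grid middleware.

```python
from collections import deque


class LicensePool:
    """Hypothetical floating-license pool: package name -> number of free tokens."""

    def __init__(self, tokens):
        self.free = dict(tokens)

    def try_acquire(self, package):
        # Take one token if available; report whether the job may start.
        if self.free.get(package, 0) > 0:
            self.free[package] -= 1
            return True
        return False

    def release(self, package):
        # Return a token when a licensed job finishes.
        self.free[package] = self.free.get(package, 0) + 1


def dispatch(queue, pool):
    """Split queued (job_id, package) pairs into started and still-waiting jobs.

    A package of None means the job needs no license. Jobs whose license is
    unavailable stay queued in their original order instead of being sent
    to a site where they would fail at start-up.
    """
    started, waiting = [], deque()
    while queue:
        job_id, package = queue.popleft()
        if package is None or pool.try_acquire(package):
            started.append(job_id)
        else:
            waiting.append((job_id, package))
    return started, waiting


pool = LicensePool({"solver_a": 1})
queue = deque([("j1", "solver_a"), ("j2", "solver_a"), ("j3", None)])
started, waiting = dispatch(queue, pool)
print(started)        # ['j1', 'j3']
print(list(waiting))  # [('j2', 'solver_a')]
```

Running the check before a job leaves the workload manager, rather than inside each job's start-up script, is what lets the scheduler keep unlicensed jobs waiting instead of failing.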
Test suite introduction
Goals:
–Provide Grid developers with a clear case for our requirements
–Monitor how far Grid middleware fulfils the requirements (requirement coverage)
Each test suite consists of:
–Documentation: test suite description, test procedures, pass/fail criteria
–An ES contact
–A real application*: application software, data needed, data schemas, workflow schemas
* The application is provided ‘as is’, insofar as policies permit. No programming is done by DEGREE.
Document content
Organization document (high-level document):
–Test suite organization
–Test suite descriptions
–Test coverage
–Templates: test suite template, test reporting template
8 test suites, each containing:
–Application description
–Contact information (for application, data, etc.)
–Test environment setup
–Test cases
–Report template
All are available on the DEGREE website.
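Test coverage and pass/fail reporting can be summarized mechanically once each test case lists the requirements it exercises. A minimal Python sketch, assuming a simple mapping from test cases to requirement IDs; the IDs such as `SEC-1` and `TC-GOME-01` are hypothetical and do not come from the DEGREE templates.

```python
def coverage(requirements, test_cases):
    """Fraction of requirements exercised by at least one test case.

    requirements: iterable of requirement IDs
    test_cases:   mapping test-case ID -> set of requirement IDs it covers
    Returns (fraction covered, sorted list of uncovered requirement IDs).
    """
    required = set(requirements)
    if not required:
        return 1.0, []
    covered = set().union(*test_cases.values()) & required
    return len(covered) / len(required), sorted(required - covered)


def fulfilled(test_cases, results):
    """Requirements for which every covering test case passed.

    results: mapping test-case ID -> True (pass) / False (fail)
    """
    status = {}
    for tc, reqs in test_cases.items():
        for req in reqs:
            status[req] = status.get(req, True) and results[tc]
    return sorted(req for req, ok in status.items() if ok)


requirements = ["SEC-1", "SEC-2", "DATA-1", "WF-1"]
test_cases = {"TC-GOME-01": {"SEC-1", "DATA-1"}, "TC-FFSC-01": {"WF-1"}}
results = {"TC-GOME-01": True, "TC-FFSC-01": False}
print(coverage(requirements, test_cases))  # (0.75, ['SEC-2'])
print(fulfilled(test_cases, results))      # ['DATA-1', 'SEC-1']
```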
Test suites available
Application | Family | Special focus
GOME level 2 product processing (ozone profile) | Simple | Job control, data management
Centroid Seismic Moment Tensor (seismology) | Simple | Job control, data access
GRIMI 2 test suite (Envisat/MIPAS level 2 processor) | Complex | MPI needed
GeoCluster (GTC) | Complex | Job control, portal
GOME validation (ozone profile validation) | Complex workflow | Data policy, data access, portal
Space Physics Interactive Data Resources (SPIDR) (meteorology) | Complex workflow | Data access
Flood Forecasting Simulation Cascade (FFSC) | Complex workflow | Complex workflow, portal
PUMA (atmospheric model) | Complex workflow | Workflow, data access
Data management
[Diagram: databases (DB), GIS and files]
Work done:
–Survey of the existing data technologies in ES
–Survey of data usage policies in ES
–Grid-based data management for ES applications
–Test cases
Data management: results
Survey of the existing data technologies and usage policies in ES
–More than 20 ES applications analyzed
–Focus on: data provision and data flow; data access; data policies; authorization, authentication, accounting
–Requirements gathered
Test suites
–Data management test suites developed
Grid-based data management for ES applications
–Main Grid middleware distributions analyzed against the identified requirements: gLite, ARC/NorduGrid, Globus Toolkit, Unicore 5/6, NAREGI, GRIA
–Gaps identified
–Middleware services analyzed for specific problem areas (e.g. metadata services)
Requirements on data management
Intensive use of file resources
–Large files / large numbers of files
Data access
–Databases
–GIS systems
Strong data access policies
–Authorization, authentication, accounting
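Strong data access policies combine group/role-based authorization with accounting of every access decision. A minimal Python sketch, assuming a hypothetical policy table keyed by dataset and attributes shaped like VOMS FQANs (`/<vo>/<group>/Role=<role>`); the VO `esvo`, the dataset names and the exact-string matching are simplifications invented for illustration, not a real VOMS or gLite authorization API.

```python
from datetime import datetime, timezone

# Hypothetical policy: dataset -> attributes allowed to read it.
POLICY = {
    "gome_l2_profiles": {"/esvo/ozone/Role=NULL", "/esvo/ozone/Role=validator"},
    "seismic_raw": {"/esvo/seismo/Role=analyst"},
}

ACCOUNTING_LOG = []


def authorize(user_dn, fqans, dataset):
    """Grant access if any of the user's group/role attributes is allowed.

    Unknown datasets are denied by default. Every decision, granted or
    denied, is appended to the accounting log.
    """
    allowed = POLICY.get(dataset, set())
    granted = any(fqan in allowed for fqan in fqans)
    ACCOUNTING_LOG.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user_dn,
        "dataset": dataset,
        "granted": granted,
    })
    return granted


alice = "/DC=org/DC=example/CN=Alice"
print(authorize(alice, ["/esvo/ozone/Role=validator"], "gome_l2_profiles"))  # True
print(authorize(alice, ["/esvo/ozone/Role=validator"], "seismic_raw"))       # False
```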
Gaps in data management
–Metadata management
–Database access from the Grid, and replication (mirroring)
–Toolkits providing integration of existing functionality as atomic operations
–QoS mechanisms for data management
–Middleware and Grid services for specialized data access protocols, and interoperability at the data level
AMGA: missing support for spatial metadata attributes
Some functionality required by ES can be achieved by combining the functionality of multiple existing Grid toolkits:
–Replicated data validity/integrity checks as part of replica management operations
–File registration + metadata registration as an atomic operation
–Support for special protocols (e.g. SEED for seismic data)
–Support for data format description
–Data format transformation
–Interoperability of different ES/environmental risk models
Access protocols and transformation services are often ES-specific; however, a well-defined generic framework for specialized data access protocols and data transformation services would be advantageous.
GRelC: filled the gap for database access from the Grid, an important improvement for the ES community
Grid middleware for database replication and consistency is missing
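Two of the combinations listed above, replica integrity checks and atomic file-plus-metadata registration, can be sketched together. This Python sketch assumes hypothetical in-memory catalogues standing in for real services (it does not use AMGA or any other real catalogue API); the class names, the mandatory `instrument` attribute and the `srm://...example.org` URLs are invented for illustration.

```python
import hashlib


class RegistrationError(Exception):
    pass


class Catalogs:
    """Hypothetical in-memory stand-ins for a file catalogue and a metadata catalogue."""

    REQUIRED = {"instrument"}  # assumed mandatory metadata attribute

    def __init__(self):
        self.files = {}     # logical file name -> sorted list of (replica URL, checksum)
        self.metadata = {}  # logical file name -> attribute dict

    def set_metadata(self, lfn, attributes):
        # Simulates a metadata service that may reject a request.
        missing = self.REQUIRED - set(attributes)
        if missing:
            raise RegistrationError(f"missing metadata: {sorted(missing)}")
        self.metadata[lfn] = dict(attributes)


def register(cat, lfn, replicas, attributes):
    """Register file replicas and metadata as one all-or-nothing operation.

    replicas: mapping replica URL -> bytes read back from that storage element.
    """
    if not replicas:
        raise RegistrationError("no replicas given")
    if lfn in cat.files:
        raise RegistrationError(f"{lfn} already registered")
    # Integrity check: every replica must have the same SHA-256 digest.
    sums = {url: hashlib.sha256(data).hexdigest() for url, data in replicas.items()}
    if len(set(sums.values())) != 1:
        raise RegistrationError(f"replica checksums differ for {lfn}")
    cat.files[lfn] = sorted(sums.items())
    try:
        cat.set_metadata(lfn, attributes)
    except Exception:
        # Roll back so that no file entry exists without its metadata.
        del cat.files[lfn]
        raise


cat = Catalogs()
data = b"GOME level 2 ozone profile"
register(cat, "lfn:/gome/orbit_0001",
         {"srm://se1.example.org/o1": data, "srm://se2.example.org/o1": data},
         {"instrument": "GOME"})
try:
    register(cat, "lfn:/gome/orbit_0002", {"srm://se1.example.org/o2": data}, {})
except RegistrationError as exc:
    print(exc)  # missing metadata: ['instrument']
print(sorted(cat.files))  # ['lfn:/gome/orbit_0001']
```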
Job management
[Diagram: layered view of job management: applications; tools (workflow managers, metaschedulers, monitoring); middleware (LCG, gLite, ARC, Globus, Unicore); storage and computing elements]
Work done in job management
Survey of existing technologies
–Middleware: LCG, gLite, Unicore
–Tools: workflow managers, metaschedulers, monitoring tools
Identifying the most critical ES requirements
–Near-real-time execution
–Reliability
–Distributed job management
Identifying missing technologies
Creating test suites
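Reliability is often approached by bounded resubmission of failed jobs. A minimal Python sketch of that pattern, assuming a hypothetical caller-supplied `submit` hook that returns a result or raises on failure; it is a generic retry loop, not the resubmission mechanism of any particular middleware.

```python
import time


def run_with_resubmission(submit, job, max_attempts=3, backoff_s=0.0):
    """Run `job` through the caller-supplied `submit` hook, resubmitting on failure.

    submit(job) is a hypothetical hook that returns the job's result or raises
    an exception when the job fails (e.g. because a site became unavailable).
    Returns (result, number of attempts used).
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return submit(job), attempt
        except Exception as exc:  # a real client would distinguish error kinds
            last_error = exc
            if attempt < max_attempts:
                time.sleep(backoff_s * attempt)  # linear back-off between attempts
    raise RuntimeError(f"{job} failed after {max_attempts} attempts") from last_error


calls = {"n": 0}


def flaky_submit(job):
    # Simulated site that fails twice before succeeding.
    calls["n"] += 1
    if calls["n"] < 3:
        raise OSError("site unavailable")
    return f"{job}: done"


print(run_with_resubmission(flaky_submit, "flood-sim"))  # ('flood-sim: done', 3)
```

Bounding the number of attempts keeps the worst-case delay predictable, which matters when the same job also has a near-real-time deadline.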
Gaps in job management
Job execution: near real time, reliability
–EGEE provides support only for short jobs (Short Deadline Jobs)
–QoS and fault tolerance are still not guaranteed
Workflow: automatic workflow composition, dynamic workflow
–gLite supports workflow management by specifying job dependencies in a JDL document. Such a JDL document must be submitted through a proxy service (WMProxy), which then submits the jobs it describes according to their dependency constraints.
–What is missing: real-time workflow modification, user interaction with workflow management, fine-grained workflow monitoring, visualization of the workflow process
Monitoring: expected job start/end time, notification, progress
–Partial support in gLite with job perusal
MPI support: a unified and transparent way to submit MPI jobs
–It is difficult to find the correct settings for MPI jobs
License management and scheduling
–The WMS should have an integrated license manager to check license availability
Co-scheduling of data
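The job-dependence ordering that a gLite JDL workflow expresses can be illustrated, independently of gLite, with Python's standard `graphlib` module (Python 3.9+). The flood-cascade job names below are hypothetical; the sketch only shows how dependencies translate into waves of jobs that may run in parallel, not how WMProxy actually schedules them.

```python
from graphlib import TopologicalSorter

# Hypothetical flood-forecasting cascade: each job maps to the jobs it depends on.
cascade = {
    "meteo": set(),
    "hydrology_upper": {"meteo"},
    "hydrology_lower": {"meteo"},
    "hydraulics": {"hydrology_upper", "hydrology_lower"},
    "maps": {"hydraulics"},
}


def execution_waves(dependencies):
    """Group jobs into waves; all jobs in a wave can run in parallel
    because their dependencies finished in earlier waves."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the workflow has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


print(execution_waves(cascade))
# [['meteo'], ['hydrology_lower', 'hydrology_upper'], ['hydraulics'], ['maps']]
```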
ES portals
ES Portals for Grid, SOA and e-Collaboration
Objective
–The portal is in the critical path between the user and the middleware
–As such, it is a key element for increasing Grid uptake and exploitation
–Where do we stand now, and what needs to be done to ensure adequacy for the future, on the middleware (and services) side and on the portal side?
Approach to the task
–Establish the current state of the art in ES and in other e-Science communities
–Analyze the present solutions: what are the ES portal requirements? Are they met? What can be improved, and how?
ES Portals for Grid, SOA and e-Collaboration
ES Grid Portals Survey
–Establish the current state-of-the-art baseline
–Extent of uptake of Grid technologies in ES portals
–Methods, techniques and middleware used
–Types of solutions where the Grid is used
–ES portal requirements
–Deliverable D4.1 (available on the web)
Generic Grid Portals Survey
–Looking outside the ES community
–Other e-Science communities' approaches to Grid portals
–Focus on generic portals and middleware solutions
–Analysis of ES Grid portal requirements vs. middleware gaps
–Recommendations and inputs for the ES Grid Roadmap
–Deliverable D4.2 (available on the web)
ES Portals for Grid, SOA and e-Collaboration
Results
–Wide range of ES portal scenarios exploiting Grid, SOA and e-Collaboration
–ES portal classification: Data dissemination ("Discover, identify and access ES data"); Collaborative ("Online collaboration in ES virtual communities"); Grid-based ("ES data-intensive processing")
–ES major requirements: integration of heterogeneous distributed services (Grid and geo-services); support for "Gridification" of geo-services and spatial data standards; standard "off-the-shelf" tools for integrated Grid security and user management; strong emphasis on metadata and data, their discovery and access...
Gaps in portal functionality
The focus must be on ES functionality; the Grid, working as the back-end, does not have to be visible.
Higher-level components targeted at ES
–Computation submission without descending to Grid-job level, possibly even hiding the Grid completely
Strong emphasis on metadata and data, their discovery and access
–Browsing and accessing datasets the ES way
Support for spatial data: INSPIRE, SDI, ...
–Spatial data searches and OGC services (e.g. WMS)
–Tools integrated with the Grid
Interoperability and interchange
–Support for standard tools/protocols (ISO 19115, OPeNDAP, LAS, DODS, NetCDF, integration with OGSA-DAI)
–Ontology / semantic web (rudiments under development)
Publish, subscribe, notify
Search, locate, access and process ES datasets of interest
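Missing spatial metadata attributes (as in AMGA) and the need for spatial data searches in portals both come down to filtering dataset records by geographic footprint. A minimal Python sketch, assuming each metadata record carries a longitude/latitude bounding box; the record layout and dataset IDs are hypothetical, and this is not the AMGA or OGC API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    """Longitude/latitude box in degrees (boxes crossing the 180° meridian are not handled)."""
    west: float
    south: float
    east: float
    north: float

    def intersects(self, other: "BBox") -> bool:
        # Two boxes overlap unless one lies entirely beyond the other on an axis.
        return not (other.west > self.east or other.east < self.west
                    or other.south > self.north or other.north < self.south)


def spatial_search(records, area):
    """Return IDs of metadata records whose footprint intersects `area`."""
    return [rec["id"] for rec in records if rec["bbox"].intersects(area)]


records = [
    {"id": "danube_basin_dem", "bbox": BBox(8.0, 42.0, 30.0, 50.0)},
    {"id": "andes_seismic_stations", "bbox": BBox(-80.0, -40.0, -60.0, 0.0)},
]
area = BBox(16.0, 47.0, 18.0, 49.0)  # e.g. a rectangle selected on a portal map
print(spatial_search(records, area))  # ['danube_basin_dem']
```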
Gaps in portal functionality
Graphical interfaces for different kinds of ES data, activated by data type
–Input specification, e.g. selecting an area on a map to choose a data subset
–Output visualization and browsing: components for displaying time series of images, image layering
–Such components exist, but they use different technologies and APIs
Standard "off-the-shelf" tools for integrated Grid security and user management
–Interfacing Grid security and ES security
Portal login models: user management integrated with certificate management
–Certificates generated transparently by the portal, on the portal
–Logging in to a portal should be enough to authenticate the user
There are existing activities and software to remedy this, e.g. PURSE (EarthScienceGrid), GAMA, ...
Gaps in portal functionality
Another approach: integrating the Grid into existing portals
–Lightweight service interfaces to Grid functionality, for easy integration and/or mash-up creation
–Like Google Maps, but for Grids
–Would allow easy integration of Grid services into (existing) ES portals
Conclusion
ES applications have important requirements.
The existing technologies have been reviewed and missing features have been identified:
–These features are essential for ES applications
Expected feedback:
–Improvement of Grid technologies (in both existing and new Grid projects) to provide the missing features
–Better support for porting ES applications to the Grid
–Wider adoption of Grid technologies in the ES community