EPOS e-Infrastructure Keith G Jeffery Natural Environment Research Council (with Jean-Pierre Vilotte and Alberto Michelini)
Structure of Presentation Who? EPOS Rationale and approach e-Infrastructure Basics Related Projects (Torild van Eck) Proposed Approach Conclusion
STFC Rutherford Appleton Laboratory
EPOS Rationale
EPOS Concept Massimo Cocco
e-Infrastructure Basics
GRIDs
Clouds
Web 2.0
SOA (Service-Oriented Architecture)
Research process – Fourth paradigm (Data Intensive Scientific Discovery)
Virtualisation
Autonomicity
Security, Privacy, Trust
Performance
Development
Maintenance
CONTEXT
Internet
– 1.5 billion fixed connections
– Estimated 4 billion mobile connections
Digital Storage
– Estimated 280 billion Gigabytes (280 exabytes – 280*10**18 bytes)
Expect all to grow ~ 1 order of magnitude in 4 years (and accelerating)
Users:
– Asia 550 million, 14% penetration
– Europe 350 million, 50% penetration
– USA 250 million, 70% penetration
Scalability
Trust & security & privacy
Manageability
Accessibility
Usability
Representativity
Last 20 years: CPU 10**16, Storage 10**18, Networks 10**4
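The storage and growth figures on this slide can be checked with a few lines of arithmetic. This is only a sketch: it assumes decimal units (1 gigabyte = 10**9 bytes) and steady growth, whereas the slide says growth is accelerating, so the annual factor below is best read as a rough lower bound.

```python
import math

# Assumption: decimal units, i.e. 1 gigabyte = 10**9 bytes, 1 exabyte = 10**18 bytes.
GIGABYTE = 10**9
EXABYTE = 10**18

storage_gigabytes = 280 * 10**9  # "280 billion Gigabytes" from the slide
storage_bytes = storage_gigabytes * GIGABYTE
storage_exabytes = storage_bytes // EXABYTE

# Assumption: steady growth. A tenfold increase over 4 years then corresponds
# to a constant annual factor of 10**(1/4) and a fixed doubling time.
annual_factor = 10 ** (1 / 4)
doubling_time_years = 4 * math.log(2) / math.log(10)

print(f"{storage_exabytes} exabytes")
print(f"annual growth factor ~ {annual_factor:.2f}")
print(f"doubling time ~ {doubling_time_years:.2f} years")
```

Under these assumptions, one order of magnitude in 4 years means growing by a factor of roughly 1.78 per year, or doubling about every 1.2 years.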
The GRIDs Architecture: Layering
Knowledge Layer
Information Layer
Computation / Data Layer
Data to Knowledge
Control
Cloud Computing: The Intention
Low cost of entry for customers
Device and location independence
Capacity at reasonable cost (performance, space)
Cloud Operator manages resource sharing, balancing different peak loads
Scalable as demand rises from user
Security due to data centralisation and software centralisation
Sustainable and environmentally friendly – concentrated power
It is a service, and the user does not know or care from where, by whom, and how it is provided, as long as the SLA (service level agreement) is satisfied
Web 2.0
Features:
– creativity, communications, secure information sharing, collaboration and functionality
Examples:
– Social networking, video-sharing, wikis, blogs, folksonomies
– Crowdsourcing to gather information / knowledge wisdom?
If you don’t know what Web 2.0 is, your kids do!
Bringing it Together: e-, i-, k-infrastructure
[Diagram labels: server, detectors; e-, i-, k-; Physical; Information Systems; Deduction & induction – human or machine]
Middleware – and as SOKUs (Service-Oriented Knowledge Utilities)
e- i- k-
Lower middleware (hides physical heterogeneity)
Upper middleware (hides syntactic heterogeneity)
K- upper middleware (resolves semantic heterogeneity)
K- lower middleware (presents declared semantics)
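The division of labour between the middleware layers can be illustrated with a toy example. The sketch below is an assumption-laden illustration, not the SOKU design itself: the two sources, their formats, the field names and all function names are invented for the purpose.

```python
import csv
import io
import json

# Hypothetical sources holding the same kind of record in different formats
# and with different field names. Everything here is illustrative.
SOURCES = {
    "station_a": {"format": "csv", "payload": "id,mag\nev1,4.2\n"},
    "station_b": {"format": "json", "payload": '[{"event": "ev2", "magnitude": 5.1}]'},
}

def fetch(source_name):
    """Lower middleware: hide where and how the data are physically held."""
    source = SOURCES[source_name]
    return source["format"], source["payload"]

def parse(fmt, payload):
    """Upper middleware: hide syntactic differences (CSV vs JSON)."""
    if fmt == "csv":
        return list(csv.DictReader(io.StringIO(payload)))
    if fmt == "json":
        return json.loads(payload)
    raise ValueError(f"unsupported format: {fmt}")

# K- lower middleware: each source declares what its fields mean.
DECLARED_SEMANTICS = {
    "station_a": {"id": "event_id", "mag": "magnitude"},
    "station_b": {"event": "event_id", "magnitude": "magnitude"},
}

def harmonise(source_name, records):
    """K- upper middleware: resolve semantic differences via the declarations."""
    mapping = DECLARED_SEMANTICS[source_name]
    unified_records = []
    for record in records:
        unified = {mapping[key]: value for key, value in record.items()}
        unified["magnitude"] = float(unified["magnitude"])
        unified_records.append(unified)
    return unified_records

def query_all():
    """Return records from every source in one common shape."""
    results = []
    for name in SOURCES:
        fmt, payload = fetch(name)
        results.extend(harmonise(name, parse(fmt, payload)))
    return results

print(query_all())
```

The caller of `query_all` sees one record shape and never learns which source used CSV or which called the field `mag`.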
Research Process: 4th Paradigm
Observations, Contextual metadata, Pre-processing, Digital preservation, Availability, Analysis, Visualisation, Hypothesis, Experimentation
Observations, Contextual metadata, Pre-processing, Digital preservation, Availability, Analysis, Visualisation, Hypothesis, Characterisation, Simulation/modelling
Observations, Contextual metadata, Pre-processing, Digital preservation, Availability, Analysis, Visualisation
Observational Science, Experimental Science, Modelling Science
DATA-INTENSIVE SCIENCE (Concept from Jim Gray)
Related Projects
EPOS e-infrastructure has to fit in with
a) ESFRI Roadmap projects in Environmental Cluster (ENVRI)
b) ESFRI roadmap projects in other clusters
   a) Physical sciences (STM)
   b) Astronomy & Astrophysics
   c) Economic/social science
   d) Arts and humanities
   e) PRACE (supercomputing)
   f) EGI/NGIs (Data and Computing Grid)
c) European INFRA projects (VERCE, EUDAT…)
d) National e-infrastructures for e-Research
   a) Especially geoscience
e) Other international projects (North America, Japan, Pacific Rim, South America…)
EPOS IT-relevant EC projects + proposals (summary)
EPOS (ESFRI roadmap)
– NERA: Seismology & Seismic Engineering; ETHZ + ORFEUS/KNMI (D. Giardini; T. van Eck)
– EPOS PP: Solid Earth ESFRI project; INGV (Massimo Cocco)
– SHARE: Hazard; ETHZ (D. Giardini)
– GEM: Hazard
– VERCE (INFRA): Earthquake & Seismology; CNRS-IPGP (J-P Vilotte); UEDIN, ORFEUS/KNMI, EMSC, INGV, LMU, Univ Liverpool, BADW-LRZ, CINECA, Fraunhofer/SCAI
– EUDAT (INFRA): Data Infrastructure; CSC Finland (Kimmo Koski); EPOS (GFZ, INGV), LifeWatch, …, CINECA, UEDIN, …
– ENVRI (INFRA): Environment Research Infrastructure; LifeWatch (Wouter Los); EPOS (ORFEUS/KNMI), LifeWatch, EPOS, EMSO, EISCAT, ICOS, STFC, UEDIN, …
– QUEST (Training network): Computational Seismology; LMU (H. Igel)
Legend: EC projects starting 2010; Project proposals 2010 INFRASTR Call 8/9; Under negotiation
e-Infrastructure Requirement
Data collection, calibration, validation
Data cataloguing and indexing
Data preservation and curation
Information processing – retrieval, analysis, visualisation
Hypothesis processing – simulation, modelling, analysis, visualisation
Hypothesis generation – data mining
Knowledge processing – integration of ICT with human processing – theory processing, user interface, scholarly communication (open access)
External interoperation – physical and medical sciences, economic and social sciences, arts and humanities
Dissemination – outreach (website plus)
Education and training
Management and Coordination
Key e-Infrastructure Principles
Mobile code: ability to move code to data, because data large and costly to transport
Virtualisation: user neither knows nor cares where computing done or where data located, as long as QoS/SLA met
Autonomicity (self-*): because human management of ICT too expensive / slow
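The mobile-code principle comes down to comparing how many bytes each option puts on the network. The sketch below is a deliberately simplified model under stated assumptions: only transfer time counts, bandwidth is constant, and the sizes and the 1 Gbit/s link are invented for illustration.

```python
def transfer_seconds(size_bytes, bandwidth_bytes_per_s):
    """Time to move a payload over a link of constant bandwidth."""
    return size_bytes / bandwidth_bytes_per_s

def choose_placement(code_bytes, data_bytes, result_bytes, bandwidth_bytes_per_s):
    """Pick the option that spends less time on the network.

    Moving code to the data costs shipping the code out and the result back;
    moving data to the code costs shipping the whole data set.
    """
    move_code = transfer_seconds(code_bytes + result_bytes, bandwidth_bytes_per_s)
    move_data = transfer_seconds(data_bytes, bandwidth_bytes_per_s)
    if move_code <= move_data:
        return "move code to data", move_code
    return "move data to code", move_data

# Illustrative numbers: 10 MB of code, 1 TB of data, 100 MB of results,
# over a 1 Gbit/s link (125 * 10**6 bytes per second).
decision, seconds = choose_placement(
    code_bytes=10 * 10**6,
    data_bytes=10**12,
    result_bytes=100 * 10**6,
    bandwidth_bytes_per_s=125 * 10**6,
)
print(decision, f"{seconds:.2f} s")
```

With these numbers, shipping the code and results takes under a second, while shipping the terabyte of data would take 8000 seconds. A fuller model would also weigh compute capacity at the data site and any queueing delay there.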
Key e-Infrastructure Challenges
Interoperation
– Access to heterogeneous distributed data sources
– Schema integration – syntactic and semantic
Security/privacy/trust
– Identification – authentication – authorisation – accounting
Performance
– Towards exascale processing (simulation/modelling)
– Towards exabyte data streams (1.0*10**18)
Steps to achieve EPOS e-Infrastructure (1)
Define / Agree requirements of end-user (document dynamically)
– Including expected future requirements
Survey available data/information sources (document dynamically)
– Detector systems
– Repositories / databases / file systems
– Data, documents, metadata, contextual data
– Conditions of use
– QoS, SLA (link to governance)
Define schema mappings, convertors for interoperation (document dynamically)
– Canonical interoperation standard? Note CERIF (Common European Research Information Format)
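One common argument for a canonical interoperation standard is the number of convertors needed. Direct conversion between every ordered pair of n schemas needs n*(n-1) convertors, whereas routing through a canonical form needs 2*n. The sketch below shows the count and a toy conversion. The two network schemas and the canonical record are invented for illustration and are not CERIF.

```python
def pairwise_converter_count(n_schemas):
    """Convertors needed when each schema maps directly to every other one."""
    return n_schemas * (n_schemas - 1)

def canonical_converter_count(n_schemas):
    """Convertors needed when each schema maps only to and from a canonical form."""
    return 2 * n_schemas

# Hypothetical canonical record: {"station", "lat", "lon"}.
TO_CANONICAL = {
    "network_a": lambda r: {"station": r["sta"], "lat": r["latitude"], "lon": r["longitude"]},
    "network_b": lambda r: {"station": r["code"], "lat": r["pos"][0], "lon": r["pos"][1]},
}
FROM_CANONICAL = {
    "network_a": lambda c: {"sta": c["station"], "latitude": c["lat"], "longitude": c["lon"]},
    "network_b": lambda c: {"code": c["station"], "pos": (c["lat"], c["lon"])},
}

def convert(record, source, target):
    """Convert between any two registered schemas via the canonical form."""
    return FROM_CANONICAL[target](TO_CANONICAL[source](record))

record_a = {"sta": "ABC", "latitude": 41.9, "longitude": 12.5}
print(convert(record_a, "network_a", "network_b"))
print(pairwise_converter_count(10), canonical_converter_count(10))
```

For 10 schemas this is 90 direct convertors against 20 canonical ones. The two approaches break even at 3 schemas, and the canonical route grows only linearly beyond that.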
Steps to achieve EPOS e-Infrastructure (2)
Survey available computing and computation resources (document dynamically)
– Detector systems
– Data servers
– HPC
– Conditions of use
– QoS, SLA (link to governance)
Define access and utilisation of ICT (document dynamically)
– User identification, authentication, authorisation, accounting (security, privacy)
– Available services
– Conditions of use
– QoS, SLA (link to governance)
Design first-cut ICT architecture (document dynamically)
– GEANT network
– GRIDs (EGI) middleware
– Web services software
– Web portal(s) user interface
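The identification, authentication, authorisation and accounting chain can be sketched as four checks applied in order to every request. This is a toy illustration under loud assumptions: the user, the permission names, the fixed salt and the in-memory log are all hypothetical, and a real deployment would rely on an established federated identity service rather than code like this.

```python
import hashlib
import hmac

SALT = b"demo-salt"  # hypothetical; real systems use a random salt per user

def hash_password(password):
    """Derive a password hash with PBKDF2 (standard library)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), SALT, 10_000)

# Hypothetical user store and permission table.
USERS = {"alice": hash_password("correct horse")}
PERMISSIONS = {"alice": {"read:waveforms"}}
ACCOUNTING_LOG = []

def request(user, password, action):
    """Run one request through the four steps and record the outcome."""
    stored_hash = USERS.get(user)
    if stored_hash is None:
        outcome = "unknown user"  # identification failed
    elif not hmac.compare_digest(stored_hash, hash_password(password)):
        outcome = "authentication failed"
    elif action not in PERMISSIONS.get(user, set()):
        outcome = "not authorised"
    else:
        outcome = "granted"
    # Accounting: every attempt is logged, whether or not it succeeded.
    ACCOUNTING_LOG.append((user, action, outcome))
    return outcome

print(request("alice", "correct horse", "read:waveforms"))
print(request("alice", "correct horse", "submit:hpc-job"))
```

Logging refused requests as well as granted ones is a deliberate choice here, since usage records are what link access control back to conditions of use and SLAs.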
Conclusion (take-home messages)
EPOS is a HUGE CHALLENGE
EPOS requires LEADING EDGE ICT to support LEADING EDGE GEOSCIENCE
EPOS e-Infrastructure is the ‘GLUE’
EPOS is going to be FUN!
EPOS is open to collaboration
Prof Keith G Jeffery CEng, CITP, FGS, FBCS, HFICS