The Grid a brief briefing Carole Goble Information Management Group.


1 The Grid a brief briefing Carole Goble Information Management Group

2 Roadmap What is the Grid? Example projects Relationship to the Semantic Web Example architectures The international programme

3 Take Home The Grid is an international activity The Grid has attracted high profile industrial and government support and funding The Information/Knowledge Grid is in many ways indistinguishable from the Semantic Web The Grid Community’s understanding of generic and theoretical issues for the IK Grid is immature and hackery.

4 So what’s the Grid? Isn’t it just High Performance Computing for High Energy Physicists?

5 Why Grids? Large-scale science and engineering are done through the interaction of people, heterogeneous computing resources, information systems, and instruments, all of which are geographically and organizationally dispersed. The overall motivation for “Grids” is to facilitate the routine interactions of these resources in order to support large-scale science and engineering. From Bill Johnston 27 July 01

6 CERN: Large Hadron Collider (LHC) Raw Data: 1 Petabyte / sec Filtered 100 Mbyte / sec = 1 Petabyte / year = 1 Million CD ROMs CMS Detector
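The figures on this slide can be checked with a short back-of-envelope calculation. This is a sketch only: the slide does not say how many seconds per year the detector records data, so the 10^7 seconds used here is an assumption (a common rule of thumb for accelerator running time), and the 700 MB CD-ROM capacity is likewise assumed.

```python
# Back-of-envelope check of the slide's LHC data-volume figures.
MB = 10**6   # bytes in a megabyte (decimal)
PB = 10**15  # bytes in a petabyte (decimal)

raw_rate = 1 * PB          # bytes per second before filtering (from the slide)
filtered_rate = 100 * MB   # bytes per second after filtering (from the slide)

seconds_running = 10**7    # ASSUMPTION: about 10^7 s of data-taking per year
cd_capacity = 700 * MB     # ASSUMPTION: one CD-ROM holds roughly 700 MB

bytes_per_year = filtered_rate * seconds_running
petabytes_per_year = bytes_per_year / PB
cds_per_year = bytes_per_year / cd_capacity
reduction_factor = raw_rate / filtered_rate

print(f"Filtered volume: {petabytes_per_year:.1f} PB per year")
print(f"Equivalent CD-ROMs: {cds_per_year:,.0f}")
print(f"Filtering reduces the raw rate by a factor of {reduction_factor:.0e}")
```

Under these assumptions the filtered stream comes to 1 petabyte per year, and the CD-ROM count lands on the order of a million, consistent with the slide.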

7 Why Grids? A biochemist exploits 10,000 computers to screen 100,000 compounds in an hour; A biologist combines a range of diverse and distributed resources (databases, tools, instruments) to answer complex questions; 1,000 physicists worldwide pool resources for petaop analyses of petabytes of data Civil engineers collaborate to design, execute, & analyze shake table experiments From Steve Tuecke 12 Oct. 01

8 Why Grids? (contd.) Climate scientists visualize, annotate, & analyze terabyte simulation datasets An emergency response team couples real time data, weather model, population data A multidisciplinary analysis in aerospace couples code and data in four companies A home user invokes architectural design functions at an application service provider From Steve Tuecke 12 Oct. 01

9 Why Grids? (contd.) An application service provider purchases cycles from compute cycle providers Scientists working for a multinational soap company design a new product A community group pools members’ PCs to analyze alternative designs for a local road From Steve Tuecke 12 Oct. 01

10 The Grid Vision “…flexible, secure, coordinated resource-sharing among dynamic collections of individuals, institutions, and resources–what we refer to as virtual organisations” “The Anatomy of the Grid: Enabling Scalable Virtual Organizations” Foster, Kesselman and Tuecke, 2001

11 The Grid Problem Enable communities (“virtual organizations”) to share geographically distributed resources as they pursue common goals -- assuming the absence of… central location, central control, omniscience, existing trust relationships. From Steve Tuecke 12 Oct. 01

12 Large scale Multi-disciplinary simulation Decision support and optimization Virtual prototyping Collaborative analysis and visualization Large scale distributed data management Large scale distributed computation High speed communications Dynamic collaborative virtual organisations Visualisation Data Computation stretch

13 interrogation workflows results Collaboration Grid Technology Grid What is it? Where is it? How to get it? When did it happen? Who knows it? Why does it? What are you doing? Governance & Control

14 Online Access to Scientific Instruments DOE X-ray grand challenge: ANL, USC/ISI, NIST, U.Chicago tomographic reconstruction real-time collection wide-area dissemination Advanced Photon Source archival storage From Steve Tuecke 12 Oct. 01 desktop & VR clients with shared controls

15 Supernova Cosmology

16 GRID Software Components An efficient data transfer mechanism A resource broker An interface for coupled applications An interface for "computing-on-demand" An interface for interactive use Distributed Simulation Codes for e-Science Testbed Biomolecular simulations Weather prediction Coupled CAE simulations ASP-type services Real-time data processing …

17 Network for Earthquake Engineering Simulation NEESgrid: national infrastructure to couple earthquake engineers with experimental facilities, databases, computers, & each other On-demand access to experiments, data streams, computing, archives, collaboration NEESgrid: Argonne, Michigan, NCSA, UIUC, USC From Steve Tuecke 12 Oct. 01

18 Home Computers Evaluate AIDS Drugs Community = 1000s of home computer users Philanthropic computing vendor (Entropia) Research group (Scripps) Common goal= advance AIDS research From Steve Tuecke 12 Oct. 01

19 myGrid Personalised extensible environments for data-intensive in silico experiments in biology Straightforward discovery, interoperation, sharing Workflow oriented provenance propagating change Individual creativity & collaborative working personalisation

20 myGrid resources Question: Nucleotide binding protein in mouse Answer: P12345 in Swiss-Prot is an ATPase Terri Attwood is an expert on this Jackson Labs have a database but you need to register A paper has just been published in Proteins by the Stanford lab on this.

21 GeoDISE – engineering design optimisation Access to knowledge repository Access to optimisation and search tools Industrial analysis codes Distributed computing and data resources in design optimisation Applied to industrial problems - large scale CFD codes Demonstrate scalability across distributed computational and data resources and teams of designers

22 GeoDISE Modern engineering firms are global and distributed “Not just a problem of using HPC” CAD and analysis tools, user interfaces, PSEs, and Visualization Optimisation methods Data archives (e.g. design/ system usage) Knowledge repositories & knowledge capture and reuse tools. Management of distributed compute and data resources How to … ? … improve design environments … cope with legacy code / systems … integrate large-scale systems in a flexible way … produce optimized designs … archive and re-use design history … capture and re-use knowledge

23 Virtual Sky http://virtualsky.org/

24 Broader Context “Grid Computing” has much in common with major industrial thrusts Business-to-business, Peer-to-peer, Application Service Providers, Storage Service Providers, Distributed Computing, Internet Computing… Sharing issues not adequately addressed by existing technologies Complicated requirements: “run program X at site Y subject to community policy P, providing access to data at Z according to policy Q” High performance: unique demands of advanced & high-performance systems From Steve Tuecke 12 Oct. 01
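The "complicated requirements" example on this slide combines two independent policies, one set by the community and one by the data owner. A minimal sketch of that idea follows. It is not taken from Globus or any other toolkit: the `Request` type, both policy functions, and all user, site, and program names are hypothetical, chosen only to mirror the X, Y, Z, P, and Q in the slide's wording.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Request:
    """A hypothetical request: run a program at a site against remote data."""
    user: str
    program: str        # "program X"
    site: str           # "site Y"
    data_location: str  # "data at Z"


Policy = Callable[[Request], bool]


def community_policy(req: Request) -> bool:
    """Policy P (hypothetical): only community members may run at approved sites."""
    members = {"alice", "bob"}
    approved_sites = {"site-y"}
    return req.user in members and req.site in approved_sites


def data_policy(req: Request) -> bool:
    """Policy Q (hypothetical): the owner of archive-z admits only vetted programs."""
    vetted_programs = {"program-x"}
    if req.data_location != "archive-z":
        return True  # this policy only governs archive-z
    return req.program in vetted_programs


def authorize(req: Request, policies: Iterable[Policy]) -> bool:
    """Sharing is conditional: every applicable policy must agree."""
    return all(policy(req) for policy in policies)


policies = [community_policy, data_policy]
allowed = Request("alice", "program-x", "site-y", "archive-z")
outsider = Request("mallory", "program-x", "site-y", "archive-z")
unvetted = Request("alice", "program-w", "site-y", "archive-z")

for req in (allowed, outsider, unvetted):
    print(req.user, req.program, "->", authorize(req, policies))
```

Keeping each policy as a separate function reflects the point that no single party controls the whole decision: the community and the data owner each contribute their own rule.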

25 Elements of the Problem Resource sharing Computers, storage, sensors, networks, … Sharing always conditional: issues of trust, policy, negotiation, payment, … Coordinated problem solving Beyond client-server: distributed data analysis, computation, collaboration, … Dynamic, multi-institutional virtual organisations Community overlays on classic org structures Large or small, static or dynamic Problem Solving Environments From Steve Tuecke 12 Oct. 01

26 Broader Context “Grid Computing” has much in common with major industrial thrusts Business-to-business, Peer-to-peer, Application Service Providers, Storage Service Providers, Distributed Computing, Internet Computing… Sharing issues not adequately addressed by existing technologies Complicated requirements: “run program X at site Y subject to community policy P, providing access to data at Z according to policy Q” High performance: unique demands of advanced & high-performance systems From Steve Tuecke 12 Oct. 01

27 The Globus Project™ Close collaboration with real Grid projects in science and industry Development and promotion of standard Grid protocols to enable interoperability and shared infrastructure Development and promotion of standard Grid software APIs and SDKs to enable portability and code sharing The Globus Toolkit™: Open source, reference software base for building grid infrastructure and applications Global Grid Forum: Development of standard protocols and APIs for Grid computing From Steve Tuecke 12 Oct. 01

28 Doesn’t Globus solve it all? Globus ToolKit is focused on the Data/Computational layer No database connectivity Little brokering, and static not dynamic Weak metadata management, workflow Trashes firewalls No, not everything is JCL, FTP and LDAP Distributed computation dominates etc…etc…

29 Is it done? NASA Information Power Grid is the only one really working http://www.ipg.nasa.gov Linking similar supercomputers owned by the same organisation Computation-focused High Energy Physics is atypical

30 Example Application Projects AstroGrid: astronomy, etc. (UK) Earth Systems Grid: environment (US DOE) EU DataGrid: physics, environment, etc. (EU) EuroGrid: various (EU) Fusion Collaboratory (US DOE) GridLab: astrophysics, etc. (EU) Grid Physics Network (US NSF) MetaNEOS: numerical optimization (US NSF) NEESgrid: civil engineering (US NSF) RealityGrid (UK) DAME (UK) Comb-e-Chem (UK) GeoDISE (UK) iVDGL, StarLight (US/EU) DiscoveryNet (UK) myGrid (UK) GridPP (UK) Particle Physics Data Grid (US DOE) etc…

31 “… Since the early days of mankind the primary motivation for the establishment of communities has been the idea that by being part of an organized group the capabilities of an individual are improved. The great progress in the area of inter-computer communication led to the development of means by which stand-alone processing sub-systems can be integrated into multi-computer ‘communities’. …” Miron Livny, “Study of Load Balancing Algorithms for Decentralized Distributed Processing Systems”, Ph.D. thesis, July 1983.

32 Every Community needs a Matchmaker! Condor uses Matchmakers to build Computing Communities out of Commodity Components … someone has to bring together community members who have requests for goods and services with members who offer them. Both sides are looking for each other Both sides have constraints Both sides have preferences
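The matchmaking idea on this slide, where both sides have constraints and both sides have preferences, can be sketched as follows. This is an illustration only and does not use Condor's own advertisement language or API: the `Ad` class, the `match` function, and all attribute names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Ad:
    """A hypothetical advertisement published by a requester or a provider."""
    name: str
    attrs: Dict[str, object]
    requirements: Callable[[Dict[str, object]], bool]  # constraint on the other side
    rank: Callable[[Dict[str, object]], float]         # preference over the other side


def match(request: Ad, offers: List[Ad]) -> Optional[Ad]:
    """Return the best offer whose constraints and the request's are both met."""
    compatible = [
        offer for offer in offers
        if request.requirements(offer.attrs) and offer.requirements(request.attrs)
    ]
    if not compatible:
        return None
    # The requester's preference decides first; the provider's breaks ties.
    return max(
        compatible,
        key=lambda offer: (request.rank(offer.attrs), offer.rank(request.attrs)),
    )


job = Ad(
    name="job-1",
    attrs={"owner": "alice", "memory_needed": 512},
    requirements=lambda m: m["memory"] >= 512 and m["os"] == "linux",
    rank=lambda m: m["mips"],  # prefers faster machines
)

machines = [
    Ad("slow-box", {"memory": 1024, "os": "linux", "mips": 100},
       requirements=lambda j: True, rank=lambda j: 0.0),
    Ad("fast-box", {"memory": 2048, "os": "linux", "mips": 400},
       requirements=lambda j: j["owner"] != "mallory", rank=lambda j: 0.0),
    Ad("tiny-box", {"memory": 256, "os": "linux", "mips": 900},
       requirements=lambda j: True, rank=lambda j: 0.0),
]

best = match(job, machines)
print(best.name if best else "no match")
```

Here `tiny-box` is the fastest machine but fails the job's memory constraint, so the matchmaker picks `fast-box` from the two compatible offers.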

33 Let's look at some Architectures

34 Desiderata (adapted from Globus) Software development toolkits e.g. Globus toolkit Standard protocols, services & APIs A modular “bag of technologies” Enable incremental development of grid-enabled tools and applications Reference implementations Learn through deployment and applications Open source Diverse global services Core services Local OS Applications

35

36 Globus Layered Grid Architecture CERN - High Energy Physics Application Fabric “Controlling things locally”: Access to, & control of, resources Connectivity “Talking to things”: communication (Internet protocols) & security Resource “Sharing single resources”: negotiating access, controlling use Collective “Coordinating multiple resources”: ubiquitous infrastructure services, app-specific distributed services Internet Transport Application Link Internet Protocol Architecture From Steve Tuecke 12 Oct. 01

37 Keith Jeffery

38 Scientific Problems Processes Knowledge Information Jobs and Data Data Raw Resources Knowledge / capability Semantics / process Data / applications Value chain Interoperability, higher level ontologies, reasoning, discovery, Reasoning services, Discovery services Fulfillment Grid "Reproduced by permission of the IT Innovation Centre, University of Southampton." http://www.it-innovation.soton.ac.uk Three Layer Grid Abstraction

39 Grid Information Service Uniform Resource Access Brokering Global Queuing Global Event Services Co-Scheduling Data Cataloguing Uniform Data Access Collaboration and Remote Instrument Services Network Cache Communication Services Authentication Authorization Security Services Auditing Fault Management Monitoring Grid Common Services: Standardized Services and Resources Interfaces Applications: Simulations, Data Analysis, etc. Toolkits: Visualization, Data Publication/Subscription, etc. Distributed Resources Discipline Specific Portals and Scientific Workflow Management Systems Condor pools network caches tertiary storage national user facilities clusters national supercomputer facilities High-speed Networks and Communications Services = Globus services Architecture of a Grid

40 Architecture of a Grid – upper layers Problem Solving Environments Knowledge based query Tools to implement the human interfaces, e.g. SciRun, ECCE, WebFlow, ... Mechanisms to express, organize, and manage the workflow of problem solutions (“frameworks”) Access control application codes visualization toolkits collaboration toolkits instrument management toolkits data publish and subscribe toolkits Applications and Supporting Tools Grid enabled libraries (security, communication services, data access, global event management, etc.) Globus MPI CORBA Condor-G Java/Jini DCOM Application Development and Execution Support Distributed Resources Grid Common Services

41 “Knowledge Based” Data Grids Attributes Semantics Knowledge Information Data Ingest Services Management Access Services (Model-based Access) (Data Handling System - SRB) MCAT/HDF Grids XML DTD SDLIP XTM DTD Rules - KQL Information Repository Attribute-based Query Feature-based Query Knowledge or Topic-Based Query / Browse Knowledge Repository for Rules Relationships Between Concepts Fields Containers Folders Storage (Replicas, Persistent IDs) National Partnership for Advanced Computational Infrastructure

42 Compute Resources Catalogs Data Archives Information Discovery Metadata delivery Data Discovery Data Delivery Catalog Mediator Data mediator 1. Portals and Workbenches Bulk Data Analysis Catalog Analysis Metadata View Data View 4. Grid Security Caching Replication Backup Scheduling 2. Knowledge & Resource Management Standard Metadata format, Data model, Wire format Catalog/Image Specific Access Standard APIs and Protocols Concept space 3. 5. 6. 7. Derived Collections Astronomy Sky Survey Data Grid

43 referenced items & collections referenced items & collections Referenced Items & Collections NSDL Services NSDL Services Other NSDL Services CI Services visualization... CI Services discussion CI Services personalization CI Services topic-map registry CI Services query transform Core Services: annotation Core Collection-Building Services persistent storage Core Collection-Building Services metadata harvesting Core Services: metadata normalizing Portals & Clients Portals & Clients Portals & Clients Usage Enhancement Collection Building User Interfaces NSDL Collections NSDL Collections NSDL Collections Metadata & data access-based services Core NSDL Bus Metadata delivery Data delivery Query Global Ids Security Network Virtual Collections & Mediators Information about collections Delivery Presentation Aggregation - Channels NSDL

44 ERA Concept model

45

46 The De Roure Triangle Agents Web Services Semantic Web Grid Computing e-Business e-Science ?

47 California Institute of Technology Roy Williams Paul Messina

48 So what is going on? UK: http://www.escience-grid.org.uk/ International: http://www.gridforum.org/

49 £80m Collaborative projects E-Science Steering Committee DG Research Councils Director Director’s Management Role Director’s Awareness and Co-ordination Role Generic Challenges EPSRC (£15m), DTI (£15m) Industrial Collaboration (£40m) Academic Application Support Programme Research Councils (£74m), DTI (£5m) PPARC (£26m) BBSRC (£8m) MRC (£8m) NERC (£7m) ESRC (£3m) EPSRC (£17m) CLRC (£5m) Grid TAG From Tony Hey 27 July 01 E-Science Programme

50 Key Elements of UK Grid Development Plan Development of Generic Grid Middleware Network of Grid Core Programme e-Science Centres National Centre http://www.nesc.ac.uk/ Regional Centres http://www.esnw.ac.uk/ Grid IRC Grand Challenge Project Support for e-Science Pilots Short term funding for e-Science demonstrators Grid Network Team * Grid Engineering Team Grid Support Centre * Task Forces Adapted from Tony Hey 27 July 01

51 Take Home The Grid is an international activity The Grid has attracted high profile industrial and government support and funding The Information/Knowledge Grid is in many ways indistinguishable from the Semantic Web The Grid Community’s understanding of generic and theoretical issues for the IK Grid is immature and hackery.

52 Spares

53 Supernova Cosmology

54 Home Computers Evaluate AIDS Drugs Community = 1000s of home computer users Philanthropic computing vendor (Entropia) Research group (Scripps) Common goal= advance AIDS research From Steve Tuecke 12 Oct. 01

55 Grid viewpoints interrogation workflows results Access Grid New Biology Technology Grid private public What is it? Where is it? How to get it? When did it happen? Who knows it? Why does it? What are you doing? Governance & Control

56 Particle Physics and Astronomy Research Council (PPARC) GridPP (http://www.gridpp.ac.uk/) to develop the Grid technologies required to meet the LHC computing challenge ASTROGRID (http://www.astrogrid.ac.uk/) a ~£4M project aimed at building a data-grid for UK astronomy, which will form the UK contribution to a global Virtual Observatory

57 Infrastructure Deployments Institutional Grid deployments: deploying services and network infrastructure DISCOM, IPG, TeraGrid, DOE Science Grid, DOD Grid, NEESgrid, ASCI (Netherlands) International deployments: supporting international experiments and science iVDGL, StarLight Support centers U.K. Grid Center U.S. GRIDS Center

