Published by Hilary Hicks. Modified over 9 years ago.
1
1 The Challenge of Data Integration: Data + Grid = Discovery? Prof. Malcolm Atkinson, Director, www.nesc.ac.uk, 22nd January 2003
2
2 Overview Essentials of e-Science Collaboration Resource Sharing Data Sharing Mutual Dependence Essentials of the Grid Distributed Virtual Machine? Essentials of Data Sharing Database Research did it? New Challenges Data Access & Integration Building Bricks Band Wagon v Research Opportunity Thresholds, Visions and Questions
4
4 £80m Collaborative projects E-Science Steering Committee DG Research Councils Director Director’s Management Role Director’s Awareness and Co-ordination Role Generic Challenges EPSRC (£15m), DTI (£15m) Industrial Collaboration (£40m) Academic Application Support Programme Research Councils (£74m), DTI (£5m) PPARC (£26m) BBSRC (£8m) MRC (£8m) NERC (£7m) ESRC (£3m) EPSRC (£17m) CLRC (£5m) Grid TAG UK e-Science Programme (1) 2001 - 2003
5
5 UK e-Science From presentation by Tony Hey
6
6 UK e-Science Investment
Map locations: Cambridge, Newcastle, Edinburgh, Oxford, Glasgow, Manchester, Cardiff, Southampton, London, Belfast, Daresbury Lab, RAL, Hinxton
National e-Science Centre
HPC(x)
Projects: > 60 started, > 30 proposed, + EU Projects
7
7 £80m Collaborative projects E-Science Steering Committee DG Research Councils Director Director’s Management Role Director’s Awareness and Co-ordination Role Generic Challenges EPSRC (£15m), DTI (£15m) Industrial Collaboration (£40m) Academic Application Support Programme Research Councils (£74m), DTI (£5m) PPARC (£26m) BBSRC (£8m) MRC (£8m) NERC (£7m) ESRC (£3m) EPSRC (£17m) CLRC (£5m) Grid TAG UK e-Science Programme (2) 2003 - 2005
9
9 Collaboration Growing Hard Problems, Multi-disciplinary, Expense Sharing Ideas Thought processes and Stimuli Effort Resources Requires Communication Common understanding & Framework Mechanisms for sharing fairly Organisation and Infrastructure Scientists have done this for Centuries
10
10 Collaboration Growing Data, Policy & Digital Infrastructure Key Sharing Ideas Thought processes and Stimuli Effort Resources Requires Communication Common understanding & Framework Mechanisms for sharing fairly Organisation and Infrastructure Text, digital media, structured, organised & curated data, annotation, computable models, visualisation, shared instruments, shared systems, shared administration, … Nationally & Internationally Distributed, … Routine, Daily, Automated, … That Requires very Significant Investment in Digital Systems and their Support
11
11 Collaboration Growing Digital Communication, Metadata, … Sharing Ideas Thought processes and Stimuli Effort Resources Requires Communication Common understanding & Framework Mechanisms for sharing fairly Organisation and Infrastructure Digital networks, digital workplaces, digital instruments, … Metadata, ontologies, standards, shared curated data, shared codes, … Common platforms, shared software, shared training, … The Grid SHOULD make this much easier by providing a common, supported high-level of Software and Organisational infrastructure Authentication, Authorisation, Accounting, Provenance, Policies, … Shared Provision of Platform,
12
12 Interdependence
Science has relied on experiment and theory
Simulation, Data Mining, Analysis
Theory - Greece, 400 BC
Experiment - Italy, 1500 AD
Simulation - Europe, 1980 AD
For problems which are:
- too large/small
- too fast/slow
- too complex
- too expensive, unethical, ...
- Testing Understanding
13
13 Interdependence: Theory, Experiment, Computing, Models, Data
14
14 Database Growth PDB protein structures
16
16 Globus Toolkit ® History
DARPA, NSF, and DOE begin funding Grid work
NASA begins funding Grid work, DOE adds support
The Grid: Blueprint for a New Computing Infrastructure published
GT 1.0.0 Released
Early Application Successes Reported
NSF & European Commission Initiate Many New Grid Projects
Anatomy of the Grid Paper Released
Significant Commercial Interest in Grids
Physiology of the Grid Paper Released
GT 2.0 Released
Does not include downloads from: NMI, UK eScience, EU Datagrid, IBM, Platform, etc.
17
17 Encompassing Vision data archives sensor nets computers software colleagues instruments
18
18 People & Industry
Global Grid Forum:
GGF2    260     Jul 01
GGF3    220     Oct 01
GGF4    400     Feb 02
GGF5    900     Jul 02
GGF6    450     Oct 02
GGF7    >1000   Mar 03
UK All Hands:
AHM’02  350     Sep 02
GlobusWorld 1   450     Jan 03
IBM, this week: “IBM DRIVES GRID COMPUTING FOR COMMERCIAL BUSINESS WITH TEN NEW GRID OFFERINGS”
Targets: Financial, Life Sciences, Automotive & Aerospace, Governments
Partners: Platform, DataSynapse, Avaki, Entropia, United Devices
IBM, last 20 months: Leaders of OGSI, Development teams, Grid Jamboree
[Chart: GGF1 to GGF5, vertical axis 0 to 900]
20
20 High-Altitude Views A Rallying Cry Meeting a Hard Challenge requires Many Minds Operating & Maintaining Infrastructure requires Many Hands & Many Companies Another Stab at Distributed Computing Hard Challenge: Intellectually and Practically Important Dependable Ubiquity over Heterogeneity & Fallibility An Ambitious Virtual Machine Consistent large scale computational environments A Global Operating System Collective Resources, Common Management
21
21 An Architectural View Grid Plumbing & Security Infrastructure Scheduling Accounting Authorisation Monitoring Diagnosis Logging Application Data & Compute Resources Operations Teams Distributed Providers Application Users Common Application Platform for Group of Applications Application & Platform Developers
22
22 Open Grid Services Infrastructure Confluence of Web Services & Grid Consistent Interface Description Based on WSDL 1.2 proposal Extend Properties Separate Binding from Interface Function Composition & Inheritance Exploit WS* Investment Grid Features Security Life-Time Management Service (state) Information via Data Elements Discovery Grouping Notification OGSI Version 1 Proposal at GGF7 (March 03)
23
23 Open Grid Services Architecture Ubiquitous Building Blocks Using OGSI Platform Open & Extensible Encourage Refactoring Experiments Initially The Globus 2 model Except State Information now distributed Example New Features Global Name Mapping Service Replication and Caching Service Data Access & Integration Metering, Logging, Authorisation, Charging, …
24
24 Grid Challenge Balancing “Direct” Access to the “Platforms” with Abstraction & Virtualisation Developers often have exploitable application knowledge Automation necessary & helpful Interface matching, operation validation, … Optimisation at many scales There isn’t enough effort to develop Languages & Abstractions
26
26 Data Integration
Data Resource 1, Data Resource 2, Scientist with Idea
1) Find Data
2) Extract Data
3) Transform Data
4) Combine Data
5) Interpret Data
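The first four steps on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the catalogue, the two in-memory resources, the field names, and the functions `find`, `extract`, `transform`, `combine`, and `integrate` are all hypothetical and do not come from the presentation. Step 5, interpretation, is left to the scientist.

```python
from typing import Dict, List

Record = Dict[str, str]

# Hypothetical catalogue: which resources hold data about a topic (step 1).
CATALOGUE: Dict[str, List[str]] = {"gene": ["resource_1", "resource_2"]}

# Hypothetical data resources with different schemas and key conventions.
RESOURCES: Dict[str, List[Record]] = {
    "resource_1": [{"gene": "G1", "expression": "2.5"},
                   {"gene": "G2", "expression": "0.7"}],
    "resource_2": [{"gene_id": "g1", "phenotype": "A"},
                   {"gene_id": "g3", "phenotype": "B"}],
}


def find(topic: str) -> List[str]:
    """Step 1: look up the names of resources that hold data on a topic."""
    return CATALOGUE.get(topic, [])


def extract(name: str) -> List[Record]:
    """Step 2: copy the records out of one resource."""
    return list(RESOURCES[name])


def transform(records: List[Record], key_field: str) -> List[Record]:
    """Step 3: rename the key field to 'gene' and upper-case its value."""
    out = []
    for record in records:
        converted = {k: v for k, v in record.items() if k != key_field}
        converted["gene"] = record[key_field].upper()
        out.append(converted)
    return out


def combine(left: List[Record], right: List[Record]) -> List[Record]:
    """Step 4: inner join of the two record sets on the shared 'gene' key."""
    index = {record["gene"]: record for record in right}
    return [{**l, **index[l["gene"]]} for l in left if l["gene"] in index]


def integrate(topic: str) -> List[Record]:
    """Run steps 1 to 4 for the two resources registered under a topic."""
    first, second = find(topic)
    left = transform(extract(first), "gene")
    right = transform(extract(second), "gene_id")
    return combine(left, right)
```

Even in this toy form, most of the code sits in the transform and combine steps, where the two resources disagree on field names and key conventions.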
27
27 Wellcome Trust: Cardiovascular Functional Genomics Glasgow Edinburgh Leicester Oxford London Netherlands Shared data Public curated data
28
28 OGSA-DAI Partners
EPCC & NeSC, IBM UK, IBM USA, Manchester e-SC, Newcastle e-SC, Oracle
£3 million, 18 months, started February 2002
Map labels: Oxford, Glasgow, Cardiff, Southampton, London, Belfast, Daresbury Lab, RAL, EPCC & NeSC, Newcastle, IBM USA, IBM Hursley, Oracle, Manchester, Cambridge, Hinxton
29
29 OGSA-DAI Data Access and Integration for the New Grid Uniform Service Interfaces for Accessing Multiple Data Sources within the Open Grid Services Architecture. UK e-Science Contribution to GT3
30
30 DAI Key Services
GridDataService (GDS): Access to data & DB operations
GridDataServiceFactory (GDSF): Makes GDS & GDSF
GridDataServiceRegistry (GDSR): Discovery of GDS(F) & Data
GridDataTranslationService (GDTS): Translates or Transforms Data
GridDataTransportDepot (GDTD): Data transport with persistence
Integrated Structured Data Transport
Relational & XML models supported
Role-based Authorisation
Binary structured files (later)
31
31 DAI Architecture Grid Infrastructure Scheduling Accounting Monitoring Diagnosis Data Intensive Applications for Science X Compute, Data & Storage Resources Distributed Authorisation Data Access Services Data Integration Services Structured Data Simulation, Analysis & Integration Technology for Science X Data Intensive X Scientists Data Integration Architecture GridFTP Naming Caching Generic Virtual Data Access and Integration Technology
32
32 1a. Request to Registry for sources of data about “x”
1b. Registry responds with Factory handle
2a. Request to Factory for access to database
2b. Factory creates GridDataService to manage access
2c. Factory returns handle of GDS to client
3a. Client queries GDS with XPath, SQL, etc
3b. GDS interacts with database
3c. Results of query returned to client as XML
Diagram labels: Client, Registry, Factory, Grid Data Service, XML / Relational database
Legend: SOAP/HTTP, service creation, API interactions
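The interaction above follows a registry, factory, service pattern, which can be sketched as follows. The class names echo the boxes on the slide, but every class, method, table, and value here is hypothetical and is not the OGSA-DAI API. An in-memory `sqlite3` database stands in for the relational database, and plain method calls stand in for SOAP/HTTP messages.

```python
import sqlite3
import xml.etree.ElementTree as ET
from typing import Dict


class GridDataService:
    """Hypothetical service that manages access to one database."""

    def __init__(self, connection: sqlite3.Connection) -> None:
        self._connection = connection

    def query(self, sql: str) -> str:
        """Steps 3a-3c: run the query and return the rows as an XML string."""
        cursor = self._connection.execute(sql)
        columns = [description[0] for description in cursor.description]
        root = ET.Element("results")
        for row in cursor:
            row_element = ET.SubElement(root, "row")
            for name, value in zip(columns, row):
                ET.SubElement(row_element, name).text = str(value)
        return ET.tostring(root, encoding="unicode")


class Factory:
    """Hypothetical factory that creates a service for its database."""

    def __init__(self, connection: sqlite3.Connection) -> None:
        self._connection = connection

    def create_service(self) -> GridDataService:
        """Steps 2a-2c: create a service and hand it back to the client."""
        return GridDataService(self._connection)


class Registry:
    """Hypothetical registry mapping a topic to a factory."""

    def __init__(self) -> None:
        self._factories: Dict[str, Factory] = {}

    def register(self, topic: str, factory: Factory) -> None:
        self._factories[topic] = factory

    def lookup(self, topic: str) -> Factory:
        """Steps 1a-1b: answer a request for sources of data about a topic."""
        return self._factories[topic]


def demo() -> str:
    # Illustrative table and values only.
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE protein (name TEXT, length INTEGER)")
    connection.execute("INSERT INTO protein VALUES ('P1', 100)")

    registry = Registry()
    registry.register("protein", Factory(connection))

    factory = registry.lookup("protein")      # 1a, 1b
    service = factory.create_service()        # 2a, 2b, 2c
    return service.query("SELECT name, length FROM protein")  # 3a, 3b, 3c
```

In this sketch the client never touches the database connection directly; it only holds the handles the registry and factory return, which is the indirection the slide's numbered steps describe.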
33
33 1a. Request to Registry for sources of data about “x” & “y”
1b. Registry responds with Factory handle
2a. Request to Factory for access and integration to databases
2b. Factory creates GridDataServices network
2c. Factory returns handle of GDS to client
3a. Client submits set of queries to GDS with XPath, SQL, etc
3b. Tell consumer
3c. Results of queries returned to consumer as XML or binary
Diagram labels: Client, Registry, Factory, GDS, Consumer, XML / Relational database (two)
Legend: SOAP/HTTP, service creation, API interactions
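The difference from the single-database case is that results go to a separate consumer instead of back to the client. A minimal sketch of that delivery step follows, under stated assumptions: the `Consumer` class, the `make_source` and `run_queries` functions, the `obs` table, and its values are all hypothetical, with in-memory `sqlite3` databases standing in for the two data sources.

```python
import sqlite3
from typing import Dict, List, Tuple

Rows = List[Tuple]


class Consumer:
    """Hypothetical third party that receives results instead of the client."""

    def __init__(self) -> None:
        self.received: List[Tuple[str, Rows]] = []

    def deliver(self, source: str, rows: Rows) -> None:
        self.received.append((source, rows))


def make_source(rows: Rows) -> sqlite3.Connection:
    """Create an illustrative in-memory database with one table."""
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE obs (id TEXT, value REAL)")
    connection.executemany("INSERT INTO obs VALUES (?, ?)", rows)
    return connection


def run_queries(sources: Dict[str, sqlite3.Connection],
                queries: Dict[str, str],
                consumer: Consumer) -> int:
    """Run each query at its own source and push the rows to the consumer.

    The client only gets back the number of result sets delivered.
    """
    for name, sql in queries.items():
        rows = sources[name].execute(sql).fetchall()
        consumer.deliver(name, rows)
    return len(queries)


def demo() -> Tuple[int, List[Tuple[str, Rows]]]:
    sources = {
        "x": make_source([("a", 1.0), ("b", 2.0)]),
        "y": make_source([("a", 10.0)]),
    }
    queries = {
        "x": "SELECT id, value FROM obs WHERE value > 1.5",
        "y": "SELECT id, value FROM obs",
    }
    consumer = Consumer()
    delivered = run_queries(sources, queries, consumer)
    return delivered, consumer.received
```

Routing results straight to the consumer means bulk data need not pass through the client, which only coordinates the queries.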
34
34 Biomedical (or ANY) Data
Opportunities: Global Production of Published Data; Volume; Diversity; Combination; Analysis; Discovery
Challenges: Data Huggers; Meagre metadata; Ease of Use; Automated, optimised integration; Traceability, Dependability
Opportunities: Specialised Indexing; Structurally varied replication; Consistent Structured Universe of Discourse; Data & Computation Integration
Challenges: Approximate Matching; Multi-scale optimisation; Bad habits / industrial structures; Safety and Multi-scale optimisation
35
35 Data Integration Challenges High-Level Languages Describing the Data Extraction Recipes Describing the Sources & Components Metadata that drives automation & validation Mobility Code & Data Integrating Existing DB technology Moving the DBMS to the Grid context New Optimisation Challenges Data & Computation & Storage & Movement Shared Distributed Annotation Systems How to Reference Provenance & Acknowledgement
37
37 Challenges A Programming & Development Model Dependability at this Scale Foundations for Trust Raising the Level of Automation Supporting New Forms of Collaboration Data