Download presentation
Presentation is loading. Please wait.
Published byLawrence Foster Modified over 9 years ago
1
UK e-Science Future Infrastructure for Scientific Data Mining, Integration and Visualisation Malcolm Atkinson Director of National e-Science Centre www.nesc.ac.uk 25 th October 2002 SDMIV workshop, e-Science Institute Edinburgh
2
Overview UK e-Science Reminder of Investment and Infrastructure International e-Science Examples and Collaboration Data Access and Integration Lego Bricks for Scientific Application Developers Tailored: Application and Computing Scientists A Computer Scientist’s Christmas List Diversity and Opportunity The Way Ahead
3
e-Science Fundamentally about Collaboration Sharing Ideas Thought processes and Stimuli Effort Resources Requires Communication Common understanding & Framework Mechanisms for sharing fairly Organisation and Infrastructure Scientists (Biologists) have done this for Centuries
4
e-Science (take 2) Fundamentally about Collaboration Sharing Ideas Thought processes and Stimuli Effort Resources Requires Communication Common understanding & Framework Mechanisms for sharing fairly Organisation and Infrastructure Text, digital media, structured, organised & curated data, computable models, visualisation, shared instruments, shared systems, shared administration, … Nationally & Internationally Distributed, … Routine, Daily, Automated, … That Requires very Significant Investment in Digital Systems and their Support
5
e-Science (take 3) Fundamentally about Collaboration Sharing Ideas Thought processes and Stimuli Effort Resources Requires Communication Common understanding & Framework Mechanisms for sharing fairly Organisation and Infrastructure Digital networks, digital work- places, digital instruments, … Metadata, ontologies, standards, shared curated data, shared codes, … Common platforms, shared software, shared training, … The Grid SHOULD make this much easier by providing a common, supported high-level of Software and Organisational infrastructure Citation, Authentication, Authorisation, Accounting, Provenance, Policies, … Shared Provision of Platform,
6
Grid Expectations Persistence Always there, Always Working, Always Supported Stability You can build on foundations that don’t move Trustworthy & Predictable Honours commitments Digital policies, digital contracts, security, … Data integrity, longevity and accessibility Performance High-level & Extensible The capabilities you need are already there Ubiquitous Your collaborators use it
7
Grid Reality Persistence Always there, Always Working, Always Supported Stability You can build on foundations that don’t move Trustworthy & Predictable Honours commitments Digital policies, digital contracts, security, … Data integrity, longevity and accessibility Performance High-level & Extensible The capabilities you need are already there Ubiquitous Your collaborators use it Political, Economic & Technical issues to Solve Early days but Open Grid Services link with Web Services + GGF standardisation Not yet but very substantial global effort to achieve this Good basis for extension Commitment to basic functionality WS + Community effort Global & Industrial Rallying Cry Must work with Web Services
8
Cambridge Newcastle Edinburgh Oxford Glasgow Manchester Cardiff Southampton London Belfast Daresbury Lab RAL Hinxton UK Grid Network National e- Science Centre always-on video walls Access Grid always-on video walls HPC(x)
9
Scotland via Glasgow NNW Northern Ireland MidMAN TVN South Wales MAN SWAN& BWEMAN WorldCom Glasgow WorldCom Edinburgh WorldCom Manchester WorldCom Reading WorldCom Leeds WorldCom Bristol WorldCom London WorldCom Portsmouth Scotland via Edinburgh YHMAN NorMAN EMMAN EastNet External Links LMN Kentish MAN LeNSE 10Gbps 622Mbps 155Mbps SuperJanet4, June 2002 20Gbps 2.5Gbps Tony Hey July 2001
10
National e-Science Centre Events Workshops Research Meetings International Meetings History of Events GGF5 HPDC11 Summer school > 50 workshops held > 1000 people in total Many return often Planned Events 25 workshops Conferences to 2005 Visitors 3 arrived 4 arranged International collaboration, visits & visitors China Argonne National Lab SDSC NCSA … Centre Projects Pilot Projects Regional Support Research Projects EPSRC, MRC, WT, SHEFC
11
A day in the life of NeSC
12
UCSF UIUC From Klaus Schulten, Center for Biomollecular Modeling and Bioinformatics, Urbana-Champaign
13
DataGrid Testbed Dubna Moscow RAL Lund Lisboa Santander Madrid Valencia Barcelona Paris Berlin Lyon Grenoble Marseille Brno Prague Torino Milano BO-CNAF PD-LNL Pisa Roma Catania ESRIN CERN HEP sites ESA sites IPSL Estec KNMI (>40) Francois.Etienne@in2p3.frFrancois.Etienne@in2p3.fr - Antonia.Ghiselli@cnaf.infn.itAntonia.Ghiselli@cnaf.infn.it Testbed Sites
14
A Simplified Grid Anatomy Grid Plumbing & Security Infrastructure SchedulingAccountingAuthorisation MonitoringDiagnosisLogging Scientific Application Data & Compute Resources Operations Team Application Developers Distributed Owners Scientific Users
15
The Crux Grid Plumbing & Security Infrastructure SchedulingAccountingAuthorisation MonitoringDiagnosisLogging Scientific Application Data & Compute Resources Operations Team Application Developers Distributed Owners Scientific Users Keep all the (pink) groups HAPPY
16
A SDMIV Grid Anatomy Grid Plumbing & Security Infrastructure SchedulingAccountingAuthorisation MonitoringDiagnosisLogging Scientific Application Data & Compute Resources Distributed SDMIV Users Data Access Data Integration Structured Data Data Providers Data Curators
17
Database Growth PDB protein structures
18
Data Mining : Science vs Commerce Data in files FTP a local copy /subset. ASCII or Binary. Each scientist builds own analysis toolkit Analysis is tcl script of toolkit on local data. Some simple visualization tools: x vs y Data in a database Standard reports for standard things. Report writers for non-standard things GUI tools to explore data. Decision trees Clustering Anomaly finders Jim Gray UCSC April 2002
19
But…some science is hitting a wall FTP and GREP are not adequate You can GREP 1 MB in a second You can GREP 1 GB in a minute You can GREP 1 TB in 2 days You can GREP 1 PB in 3 years. Oh!, and 1PB ~10,000 disks At some point you need indices to limit search parallel data search and analysis This is where databases can help You can FTP 1 MB in 1 sec You can FTP 1 GB / min (= 1 $/GB) … 2 days and 1K$ … 3 years and 1M$ Jim Gray UCSC April 2002 50,000 Kg 250 KW 60 Racks = 120m 2
20
Web Services Grid Technology Grid Services OGSA & OGSI www.gridforum.org/ogsi-wg www.gridforum.org/ogsa-wg www.gridforum.org/
21
Web Services Rapid Integration Dynamic binding Commercial Power Financial & Political Independence Client from Service Service from Client Separation Function from Delivery Description WSDL, WSC, WSEF, … Tools & Platforms Java ONE, Visual.NET WebSphere, Oracle, … www. w3c. org / TR / SOAP or TR/wsdl
22
Grid Technology Virtual Organisations Sharing & Collaboration Security Single Sign in, delegation Distribution & fast FTP But Various Protocols Resource Mangement Discovery Process Creation Scheduling Monitoring Portability Ubiquitous APIs & Modules Gov’nm’t Agency Buy in Industrial Buy in Foster, I., Kesselman, C. and Tuecke, S., The Anatomy of the Grid: Enabling Virtual Organisations, Intl. J. Supercomputer Applications, 15(3), 2001 http://www.gridforum.org/ogsi-wg
23
Open Grid Services Architecture Virtual Grid Services Applications Multiple implementations of Grid Services Using operations Implemented by OGS infrastructure Foster, I., Kesselman, C., Nick, J. and Tuecke, S., The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration
24
Scientific Data Deluge of Data Exponential growth Doubling times Astronomy12 months Bio-Sequences9 months Functional Genomics6 months Bytes/dollar12 to 18 months Not How big it is but
25
Scientific Data Deluge of Data Exponential growth Doubling times Astronomy12 months Bio-Sequences9 months Functional Genomics6 months Bytes/dollar12 to 18 months Not How big it is but What you do with it Sharing Curation Metadata Automated movement, access & integration Computational Access
26
Scientific Data Deluge of Data Exponential growth Doubling times Astronomy12 months Bio-Sequences9 months Functional Genomics6 months Bytes/dollar12 to 18 months Not How big it is but How you Embrace & Manage Change The Database is a Knowledge chest The Database is a Communication Hub Autonomously Managed (Curated) change An Essential part of e-BioMedical, Astronomical, …, Science & Engineering
27
Wellcome Trust: Cardiovascular Functional Genomics Glasgow Edinburgh Leicester Oxford London Netherlands Shared data Public curated data
28
Data Access & Integration Central to e-Science Astronomy, Earth Sciences, Ecology, Biology, Medicine, … Collaboration Shared Databases Curated Knowledge Accumulated Observations Accumulated Simulations Computation Data mining Input to models Calibration of models Presentation Publication of results Visualisation
29
GGF DAIS WG Chairs Norman Paton (Manchester Uni.) Leanne Guy (CERN) Dave Pearson (Oracle UK) Activity BoF GGF4 Toronto WG Meeting GGF5 Edinburgh Papers for GGF6 Workshops & Mail lists Goals Agree Standards for Database Access & Integration Freely available reference implementations OGSA-DAI one source & focus for discussions Norman Paton, Inderpal Narang, Leanne Guy, Susan Maliaka, Greg Ricardi, … http://www.cs.man.ac.uk/grid-db/
30
OGSA-DAI project Lego kit for Data Access & Integration Components for e-Science Applications Accelerated Application Development Multiple Data Models Distributed Data Access via Grid & Proxies Integration, Translation & Transformation Open Source Reference Implementation For DAIS-WG standard Trigger for Component Construction Start a community
31
Oxford Glasgow Cardiff Southampton London Belfast Daresbury Lab RAL OGSA-DAI Partners EPCC & NeSC Newcastle IBM USA IBM Hursley Oracle Manchester EPCC & NeSC IBM UK IBM USA Manchester e-SC Newcastle e-SC Oracle £3 million, 18 months, started February 2002 Cambridge Hinxton
32
Primary Components
33
Advanced Components
34
Composed Components
35
Composing Components OGSA-DAI Component Data Transport
36
DAI Key Components GridDataServiceGDSAccess to data & DB operations GridDataServiceFactoryGDSFMakes GDS & GDSF GridDataServiceRegistryGDSRDiscovery of GDS(F) & Data GridDataTranslationServiceTranslates or Transforms Data GridDataTransportDepotGDTDData transport with persistence Relational & XML models supported Role-based Authorisation Binary structured files
37
OGSA Relationship ClassGridServiceRegistryNotificationConsumerNotificationProducer GDSMandatory OptionalNormal GDSFMandatory OptionalNormal GDSRMandatory Normal GDTSMandatory GDTDMandatory OptionalNormal
38
DAI portType Usage ClassGridDataServiceDataTransportFactory GDSMandatoryNormal GDSFOptionalNormalMandatory GDSROptional GDTSOptionalMandatory GDTDOptionalMandatory
39
Distributed Query
40
OGSA-DAI Time Line Feb ’02May ’02Jul ’02Sep ’02Dec ’02Feb ’03May ’03Sep ’03 Ship Alpha Release for GT3 Integration RDB + GT2 / OGSA Prototypes Available XML + OGSA Prototype Available Design Documents & Demos for DAIS WG @ GGF5 XML + OGSA Prototypes for Early Adopters WS + GSI UK support ( > 100 downloads) Phase 2 Starts Phase 1 Starts Presentation & Beta @ GGF7 GGF6 WG Papers & Prototypes Productisation, RAMPS & Extension
41
OGSA-DAI Summary On Schedule & Going Well Contributions via DAIS-WG @ GGF5 & 6 Releases with GT3 Releases scheduled Status: Early Days Released prototypes Tested Architectural Design Using OGSA Working with Early Adopter Pilot Projects AstroGrid & MyGrid First PRODUCT release Dec ‘02 Influence OGSA-DAI direction Via DAIS-WG & Direct messages to us
42
Data Processing Processing Characteristics -Well defined work flow -Correction, calibration, transformation,filtering, merging -Relatively static reference data -Stable processing functions (audited changes) -Periodic reprocessing from archive Instrument Raw Data Reference Data Multi-stage Processing Processed Data Archive In Silico Dave Pearson Provenance and Derivation workshop 18 Oct 02, Chicago
43
Analysis and Interpretation Summarisation Processed Data Archive Summarised Data Analysis Characteristics - Variable workflow - Standard functions - Standard and personal filtering and summarisation - Retain drill down capability Dave Pearson Provenance and Derivation workshop 18 Oct 02, Chicago
44
Analysis and Interpretation Analysis and Interpretation Characteristics - Highly dynamic work flow - Multiple data types - Volatile data - Annotations, inferences, conclusions - Evidential reasoning - Shared multiple versions of truth - Periodic version consolidation Processed Data Result data Retrieval & Update Summarised Data Personalised DatabaseConclusions/Inferences - Descriptions - Trends - Correlations - Relationships Dave Pearson Provenance and Derivation workshop 18 Oct 02, Chicago
45
Metadata Requirements Technical Metadata Direct referencing - Physical location and data schema/structure Data currency/status – version, time stamping Accreditation/Access permissions - Ownership (Dublin Core) Query time/Governance - data volume, no. of records, access paths Contextual Metadata Logical referencing physical data – semantic/syntactic ontologies Lexical translation – Thesaurus, ontological mapping Named derivations (summarisations) Scope of Requirements All science communities Related to provenance Dave Pearson Provenance and Derivation workshop 18 Oct 02, Chicago
46
Metadata Requirements Data Versioning Distinguish latest/agreed version of data Maintain history record of change Synchronise and mirror replicated data Distinguish shared personal interpretations and/or annotations Provenance Record of data processing – calibration, filtering, transformation Record of workflow – methods, standards and protocols Reasoning – evidential justification for inferences & conclusions Scope of Requirements All science communities Includes Technical and Contextual Metadata Dave Pearson Provenance and Derivation workshop 18 Oct 02, Chicago
47
Provenance Issues Schema evolution Granularity of record Processed v Derived Inheritance Lack of structured annotations, ontologies Interactive analysis = dynamic workflow Multiple derived data sources Context of usage Best practice can change Multiple versions of the truth Evidential reasoning Existing data & applications Where is the provenance record stored Dave Pearson Provenance and Derivation workshop 18 Oct 02, Chicago
48
Collaborative Annotation See DAS Distributed Annotation Service Challenges Autonomy Selective viewing Identification Provenance Derivation
49
Biomedical e-Scientists Is this one species? Understanding bird energy Understanding a river / ocean interaction Understanding a biochemical pathway Understanding a cell Understanding a Heart or Brain Understanding Rhododendra Understanding Evolution … No One-Size fits all solutions But sharable re-usable components
50
Opportunities Many, many … More than we can address Compute needs Data management needs Data integration needs … Must choose some pioneers To meet a range of common requirements To provoke rich & high-level platform To generate re-usable components A Long-Term Commitment Needed
51
Advancing SDMIV Grid Grid Plumbing & Security Infrastructure SchedulingAccountingAuthorisation MonitoringDiagnosisLogging Scientific Application Data & Compute Resources Distributed SDMIV Users Data Access Data Integration Structured Data SDMIV (Grid) Application Component Library
52
Summary e-Science Data as well as Compute Challenges Needed to be put together Need ubiquitous supported consistent platforms Grid A (potentially) invaluable platform Only show in town Data Integration Hard Develop & Use Standard kit of parts Started to build the kit No ready made general integration Combines application & computing science Opportunities No one-size fits all, but re-usable subsystems Invest in wider range of Problem driven pioneering Strategic choices needed
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.