Research at the National e-Science Centre Dr. Dave Berry Research Manager www.nesc.ac.uk 6th November 2003.

Presentation transcript:

Research at the National e-Science Centre Dr. Dave Berry Research Manager 6th November 2003

Three Pillars of e-Science Research: Foundations, Technology, Applications. Apply known results. Focus for new work. Enable new science. Steering of development. Edinburgh: Informatics; Physics & Astronomy. Glasgow: Computing Science; Physics & Astronomy. EPCC. ETF & Testbeds. edikt. Repositories. Computing Industry. Research Departments. Research Institutes. Other Universities. Commercial Customers.

Information Grids: Foundations, Technology, Applications. Apply known results. Focus for new work. Enable new science. Steering of development. Publishing Scientific Data. GridPP. ScotGrid. QCDGrid. OGSA-DAI/DAIT. edikt (eldas and BinX). ODD-Genes. AstroGrid. BRIDGES. FirstDIG. Biological Spatio-Temporal Databases. 1,000th Download Sep 2003. Peter Buneman's Group. Tony Doyle & Steve Playfer. Richard Kenway. Richard Baldock.

Computation Grids: Foundations, Technology, Applications. Apply known results. Focus for new work. Enable new science. Steering of development. GridPP. ScotGrid. RealityGrid. Enhance. SunDCG. ODD-Genes. PGPGrid. Murray Cole. > 3000 doc downloads. Paul Cockshott.

Fabrics and Platforms: Foundations, Technology, Applications. Apply known results. Focus for new work. Enable new science. Steering of development. AMUSE. Dynamic Configuration of Grid Fabrics. Dependable Grid Services. MS.NETGrid. GridWeaver. OGSA Test Grid. IBM Grid Evaluation. Joe Sventek. Stuart Anderson. LCFG + SmartFrog.

More foundations. Service Composition: Deductive Synthesis Techniques …; Inferring QoS Properties for Grid Applications. Mobile Code: Mobile Resource Guarantees. IRCs: CoAKTinG; EQUATOR. Security: Technologies for Information Environment Security. Alan Bundy. Don Sannella, Stephen Gilmore. Austin Tate. Matthew Chalmers.

More applications Physics CDF Grid Development NeuroInformatics Grid-enabled Modelling Tools and Databases for Neuroinformatics BioInformatics e-Diamond (mammography) David Wilshaw Rob Procter

Data Repositories. Medical Genetics: Generation Scotland. Human Genetics Unit: Mouse Atlas; Nuclear Protein Database. Roslin Institute: ArkDB. Informatics: EUSTACE Corpus; FlyTrap. GeoSciences: Antarctic Survey data; Continental seismic survey data; BGS offshore survey.

Example: ODD-Genes. ODD-Genes is a demonstrator. It demonstrates how Grid technologies enable e-Science, accelerating scientific discovery. SunDCG's TOG software allows job submission on remote compute resources. OGSA-DAI provides access, control and discovery of data resources. ODD-Genes was used to investigate Wilms Tumour: routine statistical conditioning of microarray results; data-driven discovery of novel targets for investigation and potential therapy. Collaborative project: NeSC/EPCC; Scottish Centre for Genomic Technology and Informatics (GTI); Human Genetics Unit at MRC, Western General Hospital (HGU). "This project has demonstrated how Grid technologies can be used to enable true e-Science - discoveries that would not otherwise have been achieved without this infrastructure in place." Professor Peter Ghazal, Director, GTI.

SunDCG - Enabling Routine Statistical Conditioning. Choose analysis to perform. Automates analysis process. Provides predetermined workflow. Can run more than one analysis at a time. Multiple reproducible avenues for investigation. Reduces cost (human, machine), increases availability. TOG enables this by allowing access to HPC resources.

SunDCG Compute Scheduler. (Diagram labels: User A, User B, Grid Engine, A, B, Globus 2.) Integrates Grid Engine and Globus 2. GE execution methods provide job submission/control. GE job context stores job-specific information. Globus GSI for security. Globus GRAM enables interaction with remote resource. GASS for small data transfer, GridFTP for large datasets.
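The split between GASS for small transfers and GridFTP for large datasets can be illustrated with a small selection routine. This is a sketch only: the 10 MiB cut-off, the function names `choose_transfer` and `plan_staging`, and the example file names are assumptions made for illustration; the slides do not state a threshold, and no Globus API is called here.

```python
# Assumed cut-off between "small" and "large" transfers (not from the slides).
SMALL_TRANSFER_LIMIT_BYTES = 10 * 1024 * 1024


def choose_transfer(size_bytes, limit=SMALL_TRANSFER_LIMIT_BYTES):
    """Return the name of the transfer mechanism suited to a file size."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    # Small files go via GASS; anything larger goes via GridFTP.
    return "GASS" if size_bytes <= limit else "GridFTP"


def plan_staging(files):
    """Map each file name to a transfer mechanism, given {name: size}."""
    return {name: choose_transfer(size) for name, size in files.items()}


# Hypothetical job inputs: a small script and a large dataset.
plan = plan_staging({"job.sh": 2_048, "microarray.dat": 5 * 1024**3})
print(plan)
```

In practice the choice would be made by the scheduler integration rather than by the user; the point is only that file size drives the mechanism.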

OGSA-DAI - Results Investigation. Multiple views of data: Raw, Heat Map, Cluster Map. Wilms Tumour study takes a new direction: two genes appear significant in early development. Researchers would like more info on these genes…

OGSA-DAI - Data Resource Discovery. OGSA-DAI uses keywords to locate relevant data resources. May return data resources previously unknown to the researcher. Researcher selects the most interesting data resource to query for information about the gene. Researcher selects the Mouse Atlas, a narrow, deep database of spatial gene expression in mouse embryonic development. Contrast with the GTI database of broad, shallow genome-wide gene expression across multiple organisms, stages & conditions.

OGSA-DAI - Data Resource Query. OGSA-DAI returns data from query. Data and annotation displayed. Data contains references to related images. Researcher rapidly moves from numeric and textual description to spatial representation of relevant gene expression. These show that the genes are stem cell markers. Targets for focussed investigation, potential therapy.

Data Access & Integration Services
1a. Request to Registry for sources of data about “x”
1b. Registry responds with Factory handle
2a. Request to Factory for access to database
2b. Factory creates GridDataService to manage access
2c. Factory returns handle of GDS to client
3a. Client queries GDS with XPath, SQL, etc.
3b. GDS interacts with database
3c. Results of query returned to client as XML
(Diagram labels: Client; Registry; Factory; Grid Data Service; XML / Relational database; SOAP/HTTP; service creation; API interactions.)
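The registry, factory and Grid Data Service exchange in steps 1a to 3c can be sketched with a small in-memory model. This is illustrative only: the classes `Registry`, `Factory` and `GridDataService`, their methods, the keyword, and the sample table are hypothetical and are not the OGSA-DAI API. Real interactions travel over SOAP/HTTP; here plain method calls stand in for them, and an in-memory SQLite database stands in for the data resource.

```python
import sqlite3
from xml.etree import ElementTree as ET


class GridDataService:
    """Hypothetical stand-in for a GDS managing access to one database."""

    def __init__(self, connection):
        self._connection = connection

    def perform(self, sql):
        # 3b. The GDS interacts with the database on the client's behalf.
        cursor = self._connection.execute(sql)
        columns = [description[0] for description in cursor.description]
        # 3c. Results of the query are returned to the client as XML.
        root = ET.Element("results")
        for row in cursor:
            row_element = ET.SubElement(root, "row")
            for name, value in zip(columns, row):
                ET.SubElement(row_element, name).text = str(value)
        return ET.tostring(root, encoding="unicode")


class Factory:
    """Hypothetical factory that creates a GDS per request."""

    def __init__(self, connection):
        self._connection = connection

    def create_service(self):
        # 2b/2c. Create a GDS and return its handle (here, the object itself).
        return GridDataService(self._connection)


class Registry:
    """Hypothetical keyword registry mapping topics to factories."""

    def __init__(self):
        self._factories = {}

    def register(self, keyword, factory):
        self._factories.setdefault(keyword, []).append(factory)

    def find(self, keyword):
        # 1a/1b. Look up factories for data about the given keyword.
        return list(self._factories.get(keyword, []))


def demo():
    database = sqlite3.connect(":memory:")
    database.execute("CREATE TABLE expression (gene TEXT, stage TEXT)")
    database.execute("INSERT INTO expression VALUES ('geneA', 'early')")

    registry = Registry()
    registry.register("gene expression", Factory(database))

    factory = registry.find("gene expression")[0]   # steps 1a, 1b
    service = factory.create_service()              # steps 2a to 2c
    return service.perform("SELECT gene, stage FROM expression")  # 3a to 3c


print(demo())
```

The client never touches the database connection directly; it only holds the handle returned by the factory, which is the property the slide's sequence is meant to show.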

Example: Mobile Resource Guarantees. The MRG technology consists of programming languages; type systems for the languages; logics for expressing statements of resource consumption; and proof technology for proving these statements. Camelot, a high-level functional programming language with objects and resource control. Grail, a strongly-typed intermediate language which is the target language of the Camelot compiler and is interconvertible with Java byte code. A cost model, a formal semantics for byte code execution which tracks execution time and space allocation. A byte code logic allowing the expression of costs, embedded in a generic proof system (Isabelle).

Resource-bounded mobile code

Relevance to Grids. Grid service providers need to schedule competing requests for access to resources. With 25Kb of code and 1Pb of sky survey data it is infeasible to ship the data to the code. There are projects which have supported scientific programming in functional languages (e.g. Psicho). An alternative would be to transfer the MRG technology to Java or Java-like languages (ESC/Java, SpecialJ, and Pizza).
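The argument for shipping code to data rather than data to code can be made concrete with a rough transfer-time estimate. The figures below are assumptions for illustration: the sizes are read as 25 kilobytes and 1 petabyte (decimal units), and the 1 Gbit/s sustained link speed is hypothetical and does not come from the slides.

```python
def transfer_seconds(size_bytes, link_bits_per_second):
    """Idealised transfer time: payload bits divided by link rate."""
    return size_bytes * 8 / link_bits_per_second


LINK_BITS_PER_SECOND = 1e9      # assumed 1 Gbit/s sustained throughput
CODE_BYTES = 25 * 10**3         # 25 KB of mobile code
DATA_BYTES = 10**15             # 1 PB of sky survey data

code_seconds = transfer_seconds(CODE_BYTES, LINK_BITS_PER_SECOND)
data_seconds = transfer_seconds(DATA_BYTES, LINK_BITS_PER_SECOND)
data_days = data_seconds / 86_400

print(f"code: {code_seconds:.4f} s")
print(f"data: {data_seconds:.0f} s, about {data_days:.1f} days")
```

Under these assumptions the code moves in a fraction of a millisecond while the data would take roughly three months, which is why the host needs guarantees about what the incoming code will consume.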

Example: AMUSE, Autonomic Management of Ubiquitous Systems for e-Health. Automated management of complex distributed application systems. Architectural pattern and prototype implementations for closed-loop management of such systems. Policy-based management. AMUSE will integrate these to address automated management of e-Health applications.

Closed-loop Management Pattern (Self-Managed Cell) Measurement Adapters “System” Under Test Provisioning Analysis, Simulation, Optimization Measurement “System” Configuration Service Goals System Policy Policy Management Topology, Other Event Bus Trends & Prediction Raw Measurement Management Application
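A closed management loop of this kind (measure, analyse against a goal, apply policy, re-provision) can be sketched as components wired through an event bus. Everything here is a hypothetical illustration: the `EventBus` and `ManagedSystem` classes, the topic names, and the single "add one worker" policy are assumptions, not part of AMUSE.

```python
from collections import defaultdict


class EventBus:
    """Minimal publish/subscribe bus connecting the loop's components."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, payload):
        for handler in list(self._subscribers[topic]):
            handler(payload)


class ManagedSystem:
    """Toy system under management: a fixed load spread over workers."""

    def __init__(self, workers, load):
        self.workers = workers
        self.load = load

    def utilisation(self):
        return self.load / self.workers


def build_cell(system, target_utilisation):
    """Wire analysis and policy handlers onto a new event bus."""
    bus = EventBus()

    def analyse(measured):
        # Analysis: compare the raw measurement with the goal.
        if measured > target_utilisation:
            bus.publish("policy", "scale_up")

    def apply_policy(action):
        # Policy and provisioning: reconfigure the managed system.
        if action == "scale_up":
            system.workers += 1

    bus.subscribe("measurement", analyse)
    bus.subscribe("policy", apply_policy)
    return bus


def run(system, bus, rounds):
    """Publish one measurement per round and record the worker count."""
    history = []
    for _ in range(rounds):
        bus.publish("measurement", system.utilisation())
        history.append(system.workers)
    return history


system = ManagedSystem(workers=2, load=8)
bus = build_cell(system, target_utilisation=1.0)
history = run(system, bus, rounds=8)
print(history)
```

The loop stops changing the system once the measured utilisation meets the goal, which is the self-stabilising behaviour the pattern is after.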

Two-level nesting Management Application Level n Agents “System” Prov Infer Meas Config Policy Event Bus Measurement Adapter Provisioning Analysis, Simulation, Optimization Measurement “System” Configuration Service Goals System Policy Policy Management Topology, Other Event Bus Trends & Prediction Raw Measurement Level n-2 Level n-1

GGF: Standardisation. Grid Research Oversight Committee & Programme Committee: Prof. Malcolm Atkinson. Data Access and Integration Services Working Group: Dr Mario Antonioletti (Group Secretary & Editor); Dr Amy Krause (Editor); Prof. Malcolm Atkinson, Dr Martin Westhead, Neil Chue Hong (Authors); Dr Mike Jackson. Data Format Definition Language Working Group: Dr Martin Westhead (Founder and Chair). Job Submission Definition Language Working Group: Dr Ali Anjomshoaa (Founder and Chair). Open Grid Services Architecture Working Group: Dr Dave Berry. Open Grid Services Infrastructure Working Group: Dr Mike Jackson, Daragh Byrne.

Data Services. GGF Data Access & Integration Services (DAIS): OGSI-compliant interfaces to access relational and XML databases. Needs to be generalized to encompass other data sources (see next slide…). Generalized DAIS becomes the foundation for: Replication (data located in multiple locations); Federation (composition of multiple sources); Provenance (how was data generated?).

Future DAI Services
1a. Request to Registry for sources of data about “x” & “y”
1b. Registry responds with Factory handle
2a. Request to Factory for access and integration from resources Sx and Sy
2b. Factory creates GridDataServices network
2c. Factory returns handle of GDS to client
3a. Client submits sequence of scripts, each with a set of queries to GDS in XPath, SQL, etc.
3b. Client tells analyst
3c. Sequences of result sets returned to analyst as formatted binary described in a standard XML notation
(Diagram labels: Data Registry; Data Access & Integration master; Client; Analyst; GDS (1, 2, 3); GDTS (1, 2); Sx; Sy; XML database; Relational database; Problem Solving Environment; Semantic Meta data; Application Code; “scientific” Application coding scientific insights; SOAP/HTTP; service creation; API interactions.)

Take Home Message. In addition to our national services, NeSC has a thriving research programme: Foundation departments; Technology development (EPCC, NeSC, Globus Alliance); Research scientists. Wide breadth of interest. Particular focus on scientific data. OGSA-DAI is here now. Join in making better DAI services & standards. Bioinformatics and Astronomy are Priority Application Areas. There are many opportunities for collaboration.

OGSA Infrastructure Architecture OGSI: Interface to Grid Infrastructure Data Intensive Applications for Science X Compute, Data & Storage Resources Distributed Simulation, Analysis & Integration Technology for Science X Data Intensive Users Virtual Integration Architecture Generic Virtual Data Access and Integration Layer Structured Data Integration Structured Data Access Structured Data Relational XML Semi-structured- Transformation Registry Job Submission Data Transport Resource Usage Banking Brokering Workflow Authorisation

ODD-Genes Caveats & Further Work. ODD-Genes is a demonstrator: need to develop production applications for both routine statistical processing and data resource discovery and query; need to parameterise routine conditioning appropriately to complete automation. ODD-Genes requires GRID infrastructure: participating researchers need to partner with centres who host application front-ends (or host the infrastructure themselves); however, alternatives are often proprietary, expensive, less flexible. ODD-Genes requires registration by data-hosts: critical mass of registered data sources.

SunDCG - Conditioning Results. Results of conditioning can be analysed and investigated. Researcher has potentially several views of data to explore, all presented simultaneously in parallel (cf. traditional serialised, manual process). Researcher can reproduce this initial condition for repeated analyses. Researcher need not perform each step manually and serially, or ask a dedicated statistician to do so.

“OGSA Data Services”. Foster, Tuecke, Unger, editors. Describes a conceptual model for representing all manner of data sources as Web services: databases, filesystems, devices, programs, … Integrates WS-Agreement. A data service is an OGSI-compliant Web service that implements one or more of the base data interfaces: DataDescription, DataAccess, DataFactory, DataManagement. These would be extended and combined for specific domains (including DAIS).

OGSA-DAI Approach. Reuse existing technologies and standards: OGSA, query languages, Java, transport. Build portTypes and services which will enable: controlled exposure of heterogeneous data resources on an OGSI-compliant grid; access to these resources via common interfaces using existing underlying query mechanisms; (ultimately) data integration across distributed data resources. OGSA-DAI (the software) seeks to be a reference implementation of the GGF DAIS WG standard. Can’t keep up with frequent standard changes, so software releases track specific drafts. See for details.