Lessons learnt building OGSA-DAI. EGC 2005. Malcolm Atkinson, Director. www.nesc.ac.uk. 15th February 2005.

Presentation transcript:

Lessons learnt building OGSA-DAI
EGC 2005
Malcolm Atkinson, Director
15th February 2005

Contents
  Why invest in shared software
  Facilitating Applications
  Facilitating Production use
  Improving code quality
  The Data Bonanza
  OGSA-DAI
  International Collaboration
  Foundation for economic high-quality e-Infrastructure
  Summary & Take Home Message

Conflicting Views?
Governments, EU Commission, …
  Shared e-Infrastructure will transform
    • Economy
    • Society
  Stimulate creativity and innovation
  Improve our diagnoses, research, decisions, designs & businesses
Researchers, …
  Want to pursue their particular goals
  Want no change if it doesn’t help them
  Want new facilities ASAP, if their research needs it
  Want convenient, easy to understand and use facilities
  Want long-term commitment to support
  Want reliability & performance
  Prefer to pay as little as possible

Conflicting views?
Resource providers
  Fund providers
  Institutions hosting developers and operations
  Specific missions
  Must demonstrate they have delivered
    • Better than the other providers
  Want best value for money
    • But in their current time scales
  Politically unable to give long-term commitment
    • With some exceptions?
Technology and Service vendors
  Profit and business survival informs their decisions
  Risk averse
  Incremental approach – where is the business this year
  Distinctive business models
    • Inform their view on standards

Eternal Triangle
Applications, Operations, Developers: all want reliability, dependability, security, performance, long-term stability
How do you balance innovation and safe engineering?

Eternal Triangle
Applications, Operations, Developers: all want reliability, dependability, security, performance, long-term stability
  Want familiar trustworthy code
  No distractions
  With just the additions crucial to their goals
  → Many e-Infrastructures

Eternal Triangle
Applications, Operations, Developers: all want reliability, dependability, security, performance, long-term stability
  Many e-Infrastructures: Mix & Match model
  Want familiar trusted tools & libraries
  Few & stable deployment contexts
  Cost of testing & maintenance dominates
  Many customers / version of code
  → Few e-Infrastructures

Eternal Triangle
Applications, Operations, Developers: all want reliability, dependability, security, performance, long-term stability
  Many e-Infrastructures: Mix & Match model
  Prefer just one e-Infrastructure: testing & maintenance limit innovation
  Want familiar trusted tools & systems
  Stability
  Cost of Systems Administration dominates
  Operator error dominates loss of production
  → One e-Infrastructure

Eternal Triangle
Applications, Operations, Developers: all want reliability, dependability, security, performance, long-term stability
  Many e-Infrastructures: Mix & Match model
  Prefer just one e-Infrastructure: testing & maintenance limit innovation
  One e-Infrastructure: stable with good management tools
Compromise: One e-Infrastructure – select services & libraries
Agreed & simple APIs / mappings wrap all common functions

Generating & Storing Data gets Easier
More, faster & cheaper digital devices
  Higher resolution
  Greater deployment
  Faster duty cycles
  Do they produce required metadata?
Larger, faster & cheaper storage technology
  Economic to store (multiple copies of) primary data
  Economic to store derived data
  Crucial to store sufficient good quality metadata
Digital Communications – higher bandwidth & cheaper
  Practical to access and copy remote data
  Latency not decreasing – get what you need in a few trips

Curation and Publishing Data
Invest in preserving data
  High guarantees that an observation will not be lost
Invest in organising data
  Efficient access for popular queries
  Registration and description
  Provenance records
  Metadata (how to interpret this data)
  Annotation (related data & comments)
Invest in publishing data
  Obligation from funders – democratisation
  Expensive and technically hard
  Collaboration necessary
  Creative scientific contribution
  Recognition? Attribution? Citation? Responsibility?

Multi-dimensional Growth
  The Number of Data Collections grows rapidly
  The Size of each Data Collection grows rapidly
  The Complexity of each Data Collection grows rapidly & autonomously
  The Interdependencies between Collections grow rapidly
  The User communities grow rapidly – dispersed, diverse & mingling

Data Integration is Everything
Motivation
  No business or research team is satisfied with one data resource
Data Curation Expertise
Human Centred Integration
  Human centred
  Domain-specialist driven
  Dynamic specification of combination function
  Iterative processes
    • Revised request minutes later
    • Revised request after months of thought
Sources inevitably heterogeneous
  Time-varying content, structure & policies
Robust, stable steerable integration services
  Higher-level services over multiple resources
  Fundamental requirements for (re)negotiation
Federation or Virtualisation preceding integration, or kit of integration tools to be interwoven with an application?
Scientific Insight needed here
This is the motivation for & home of OGSA-DAI
  Identify the recurrent requirements
  Provide one infrastructure that meets them
  Wide use enables a robust, reliable and supported set of facilities
  Steadily increase power of facilities
  Steadily raise the level of abstraction
  Standardise & achieve multi-national investment

OGSA-DAI Project
• OGSA-DAI is one of the Grid Middleware Centre Projects
• Collaboration between:
  – EPCC
  – IBM (+ Oracle in phase 1)
  – National e-Science Centre
  – Manchester University
  – Newcastle University
• Project funding:
  – OGSA-DAI, , £3.3 million from the UK Core e-Science funding programme
  – DAIT (DAI Two), £1.3 million from the UK e-Science Core Programme II
• "OGSA-DAI" is a trade mark
Funded by UK’s Department of Trade & Industry + Engineering & Physical Sciences Research Council as part of the e-Science Core Programme
Thanks to Mario Antonioletti for these EPCC slides

Geographically Distributed Team
  EPCC Team, Edinburgh
  IBM Development Team, Hursley
  ESNW, Manchester
  Neresc, Newcastle
  NeSC, Edinburgh

Communication vital Web Site Support Desk Web Site Training Telephone conferences Face-to-face meetings Access Grid meetings Bugzilla Twiki IRC Mailing Lists

Infrastructure IBM EPCC Test Machines & Databases IRC Mailing Lists Access Grid Telephone NeSC CVS Repository Twiki

Basic Operational Model Data Resource Container DAISGR Client GDSF GDS
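The labels on this slide suggest a registry, factory and service pattern: a client looks up a factory (GDSF) in the registry (DAISGR), asks it to create a Grid Data Service (GDS), and then sends requests to that GDS. The following is a minimal Python sketch of that flow under those assumptions. Every class and method name here is hypothetical and chosen for illustration; none of it is the OGSA-DAI API, and an in-memory SQLite database stands in for the data resource.

```python
"""Illustrative registry -> factory -> service flow; not the OGSA-DAI API."""
import sqlite3
import uuid


class GridDataService:
    """Hypothetical stand-in for a GDS bound to one data resource."""

    def __init__(self, connection):
        self.handle = str(uuid.uuid4())  # each service instance gets its own id
        self._connection = connection

    def perform(self, sql):
        # A real GDS accepts a perform document; a bare SQL string keeps this short.
        return self._connection.execute(sql).fetchall()


class GridDataServiceFactory:
    """Hypothetical stand-in for a GDSF: creates a GDS on request."""

    def __init__(self, resource_name, connection):
        self.resource_name = resource_name
        self._connection = connection

    def create_service(self):
        return GridDataService(self._connection)


class Registry:
    """Hypothetical stand-in for the DAISGR: maps resource names to factories."""

    def __init__(self):
        self._factories = {}

    def register(self, factory):
        self._factories[factory.resource_name] = factory

    def find(self, resource_name):
        return self._factories[resource_name]


def demo():
    # Data resource: a throwaway in-memory database.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE stars (name TEXT, magnitude REAL)")
    db.execute("INSERT INTO stars VALUES ('Sirius', -1.46), ('Vega', 0.03)")

    registry = Registry()
    registry.register(GridDataServiceFactory("stars-db", db))

    factory = registry.find("stars-db")   # 1. client queries the registry
    service = factory.create_service()    # 2. factory creates a service
    return service.perform(               # 3. client sends a request to the service
        "SELECT name FROM stars ORDER BY magnitude"
    )
```

Keeping discovery, creation and request handling in separate objects is what lets a client stay unaware of where a data resource lives until it asks the registry.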

More Complex Behaviour
Data Resource, Container, Client, GDS, GDT; Data Resource, Container, GDS, GDT
  Deliver data back to the client.
  Deliver data to a third party.
  Deliver data to another GDS.
And there's a lot more that you can do …

Grid Data Service Data Resource Perform Document Response Document Result Data

Predefined Activities
  gzipCompression, zipArchive, xslTransform
  sqlQueryStatement, sqlStoredProcedure, sqlUpdateStatement, sqlBulkLoadRowset
  xPathStatement, xUpdateStatement, xQueryStatement
  xmlResourceManagement, xmlCollectionManagement, relationalResourceManager
  inputStream, outputStream
  DeliverFromURL, DeliverToURL, DeliverToGFTP, DeliverFromGFTP, DeliverToStream, DeliverFromGDT, DeliverToGDT, DeliverToFile, DeliverFromFile
  fileWriting, directoryAccess, fileAccess, fileManipulation
Developers encouraged to roll their own – many do
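One way to read the activity list is as building blocks that a single request chains together: query, then transform, then deliver. The Python sketch below illustrates that idea only. The function names borrow three activity names from the slide (sqlQueryStatement, gzipCompression, DeliverToFile), but the signatures, the CSV intermediate format and the SQLite data resource are all assumptions made for illustration, not OGSA-DAI behaviour.

```python
"""Illustrative activity chain: query -> compress -> deliver. Not the OGSA-DAI API."""
import csv
import gzip
import io
import os
import sqlite3
import tempfile


def sql_query_statement(connection, sql):
    """Run a query and serialise the rows as CSV bytes (format is an assumption)."""
    cursor = connection.execute(sql)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([column[0] for column in cursor.description])  # header row
    writer.writerows(cursor)
    return buffer.getvalue().encode("utf-8")


def gzip_compression(data):
    """Compress the output of the previous activity."""
    return gzip.compress(data)


def deliver_to_file(data, path):
    """Write the final output to a file and return its path."""
    with open(path, "wb") as handle:
        handle.write(data)
    return path


def run_pipeline(connection, sql, path):
    """Feed each activity's output into the next one."""
    return deliver_to_file(gzip_compression(sql_query_statement(connection, sql)), path)


def demo():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE genes (symbol TEXT, chromosome TEXT)")
    db.execute("INSERT INTO genes VALUES ('BRCA1', '17'), ('TP53', '17')")
    with tempfile.TemporaryDirectory() as directory:
        path = run_pipeline(
            db,
            "SELECT symbol, chromosome FROM genes ORDER BY symbol",
            os.path.join(directory, "genes.csv.gz"),
        )
        # Read the delivered file back to check the round trip.
        with gzip.open(path, "rb") as handle:
            return handle.read().decode("utf-8")
```

Because each step only consumes bytes and produces bytes, a user-written activity can be dropped into the chain without touching the others, which fits the slide's remark that developers roll their own.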

Standardisation is important
[Diagram of standards bodies and groups. Labels: GGF (Arch, Data, ISP, SRM; OGSA, CMM, GIR, GSM, GFS, DAIS, CGS, GRAAP, Policy, OREP, DFDL, INFOD, GridFTP, DT, TM BoF, ADF BoF); IETF (SNMP); W3C (XQuery); ANSI (SQL); DMTF (CIM); OASIS (WS-DM, WS-RF, WS-N, WS Policy, WS Address); JCP (JDBC); Other Standards Bodies ????]

Example Projects Using OGSA-DAI
OGSA-DAI, AstroGrid, BioSimGrid, BioGrid, Bridges, eDiaMoND, FirstDig, GeneGrid, GEON, IU RGRBench, myGrid, N2Grid, ODD-Genes, OGSA-WebDB, INWA

OGSA-DAI User Project classification OGSA-DAI Biological Sciences Physical Sciences Commercial Applications Computer Sciences FirstDig INWA Bridges AstroGrid BioSimGrid BioGrid eDiamond myGrid ODD-Genes N2Grid GEON MCS IU RGBench OGSA Web-DB GeneGrid GridMiner

OGSA-DAI Downloads
  690 downloads since May 04
  Actual user downloads, not search engine crawlers
  Does not include downloads as part of GT3.2 releases
  Data from 13 December 04
  Total of 966 registered users

  Release          Downloads
  R1.0 (Jan 03)          107
  R1.5 (Feb 03)          110
  R2.0 (Apr 03)          254
  R2.5 (Jun 03)          294
  R3.0 (Jul 03)          792
  R3.1 (Feb 04)          655
  R4.0 (May 04)          939
  R5.0 (Dec 04)          138
  Total                 3323

OGSA-DAI Conclusion
• OGSA-DAI provides middleware tools to grid-enable existing databases: access, discovery, integration, transformation, collaboration

Further Information
• The OGSA-DAI Project Site:
• The DAIS-WG site:
• OGSA-DAI Users Mailing list
  – General discussion on grid DAI matters
• Formal support for OGSA-DAI releases
• OGSA-DAI training courses

Platforms & Users
Currently on OGSI (GT3)
  Discontinue support when ~ R6
Currently on WS-I+ (OMII1)
  Will be in next release
  Without asynchronous & third-party data transfers
Currently in Preview on WSRF (GT4)
  Not yet a supported release
  ~GT4 release
Users about equally divided
  Some still use R3!
Re-designed architecture
Long list of requested features
Many projects want long-term support commitments

[Diagram: data service (DS) model covering Mobius, DAIS and OGSA-DAI. Labels: DRAM, initiateDataService( ), Registry, DSDL, Request, TADD, DR (Initiates/Manages, 0 n, Id – UUID), performRequest(), Single Service, Session (Id – UUID), DID, Type, Format, Txn, Response, TADD, Compute & storage resources, Registry2, Local Store, DRs, Logging Service]

OGSA-DAI & Triangle
Applications, Operations, Developers: all want reliability, dependability, security, performance, long-term stability
  One client library
  Increasingly important
  More abstraction needed
  Mostly use client library
  Some use protocols
  No extra tools yet
  No tools or interfaces yet
  Motivation for new architecture
Compromise: One e-Infrastructure – select services & libraries
Develop: Higher-level Client Library & Tools, more Integration, Operations support

OGSA-DAI team needs
Agreed data naming system
  OGSA effort – 3-level: human, abstract & physical address
Addresses of state & data resources
  WS-Addressing
Life Time Management
  WS-RF Resource LifeTime – “imported into OGSA-DAI”
Properties
  WS-Resource Properties – easily implemented look alike
Error reporting
  WS-BaseFaults
Agreed Data Transport Abstraction
  OGSA-Data Design + InfoD meeting at GGF13 Seoul
Most of all we need these standards with APIs: only one of each!

The Ultimate Challenge
Testing
  Large-scale, always on, distributed persistent infrastructure
  Product space of platforms and external components
    {Oracle, DB2, MySQL, Postgres, …} × {Xindice, eXist, …} × {files, DFDL, semi-structured, indexed, text-mined, …} × {java, J2EE, .NET, …} × {OGSI, WS-I+, WSRF, …} × {client libraries in: java, C, C++, C#, …} × …
  Growing proportion of team effort – though mechanised
Maintenance
  Fixing bugs ( 20%)
  Providing new required functions (~60%)
  Better coding and testing can at best save 20%
  Maintenance is a life sentence
  No remission for good behaviour!
  Grows to dominate costs and limit development
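To get a feel for how quickly the testing product space grows, the Python sketch below enumerates it using only the items the slide names explicitly. The dimension labels are my own, the elided "…" entries are left out, and treating every dimension as independent is an assumption, so the count is a lower bound on the slide's product space rather than a figure from the talk.

```python
"""Size of the test matrix implied by the slide's named platforms (illustrative)."""
from itertools import product
from math import prod

# Dimension labels are hypothetical; the members are the items named on the slide.
DIMENSIONS = {
    "relational database": ["Oracle", "DB2", "MySQL", "Postgres"],
    "XML database": ["Xindice", "eXist"],
    "file-based data": ["files", "DFDL", "semi-structured", "indexed", "text-mined"],
    "runtime": ["java", "J2EE", ".NET"],
    "service platform": ["OGSI", "WS-I+", "WSRF"],
    "client library language": ["java", "C", "C++", "C#"],
}


def count_configurations(dimensions):
    """Number of combinations: the product of the dimension sizes."""
    return prod(len(values) for values in dimensions.values())


def configurations(dimensions):
    """Yield each combination as a dict, one entry per dimension."""
    names = list(dimensions)
    for combination in product(*dimensions.values()):
        yield dict(zip(names, combination))


if __name__ == "__main__":
    print(count_configurations(DIMENSIONS))  # 4 * 2 * 5 * 3 * 3 * 4 = 1440
    print(next(configurations(DIMENSIONS)))
```

Even with the elided entries omitted, the named items alone give 1440 combinations, and adding one more option to any dimension multiplies the total, which is why testing takes a growing share of team effort even when mechanised.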

To Meet the Challenge 1
Agree an Architecture: OGSA + NextGRID + …
  To partition the problem space
  To raise the level of abstraction & discourse
  To provide a framework for collaboration
    • Environments in which alternative solutions can perform roles
  Incremental progress to agreement
    • Profiles
Invest in APIs
  Protect Application Developer investment
  Protect Middleware subsystem investment
  Clarify requirements
  Specify semantics

To Meet the Challenge 2
Raise the Level of Abstraction
  Greater benefit for Application Developers
  Greater benefit for other Middleware Developers
  Easier comprehension for designers, implementers & exploiters
  Opportunity for implementation improvement increases
Form ≤ 2 International Alliances / Consortia
  Agree on target e-Infrastructure function and properties
  Safety of Open Source – future maintenance always possible
    • But only affordable through alliances
  Agree partitioned R&D task:
    • Country X delivers A and Country Y delivers B
  Incremental development of relationship
    • Compete → partnership → trust → mutual dependence
  Avoid brittle dependency
    • Minimum functionality base platform in which subsystems can work
    • OGSA base profile a good candidate

To Meet the Challenge 3
Desist from Starting from Scratch
  “Pencil sharpening” auto-distraction
  Short-term illusion of progress and success
  Long-term – another body of software to maintain
  Division of effort
  Your legacy: Transition and translation problems
  Your legacy: Another body of software to maintain
Darwinian survival of the fittest
  Doesn’t result in “best”
  Expensive & slow
  Some diversity and competition valuable
  But don’t let it split users, developers, operations, training, …
Discard ego trips, “nationalism” & excess competition – wasteful and harmful

Observations 1
E-Infrastructure
  Disruptive technology
    • Will change what we do and how we do it
  Opportunity to reap major benefits
    • Is Europe prepared? Education, Education, Education
    • Will we focus effort? Critical mass. Don’t divide & conquer ourselves
  Education essential
Must Collaborate Internationally
  To agree, build and operate e-Infrastructure
  To give adequate support to our users
  To afford maintenance and operations
  To facilitate international research, business & decisions

Observations 2 – OGSA-DAI
>30 staff years of effort, Release 5, coming soon R6
3 platforms, >1000 users, world-wide use, diversity
Backed by standards effort – hard work!
User community & User group
Major investment in client-side API
  Beneficial for users, application developers & training
  Provides implementation options
  Undergoing re-design
Flexible and Extensible framework
  Essential for applications
  Contributors build using this – webDB, streaming data, …
  No contributor code shipped with releases yet
    • Diverse demand for new features
Diverse & multi-platform
  Looking for reassurance about future support and maintenance
Perhaps 25% of original OGSA-DAI vision built so far

Reserve slides for questions End of slide show

From OGSA Status and Future, Hiro Kishimoto and Ian Foster, GGF12 slide originally from Michael Behrens, DISA consultant

Provided by David Snelling (Fujitsu) and Mark Linesch (GGF & HP).

Why Invest in OGSArchitecture 4
Integration, Completeness, Abstraction, Cooperation
OGSA partitions the e-Infrastructure implementation
  Encourages independent concurrent & coordinated
    • Development or evaluation of each part’s standards
    • R&D on implementation of each part
  Promises assembly of the parts
  Basic profile provides context for concurrent R&D
    • Context for each M/W developer to build against
    • Reduced interdependence – each can deliver if others don’t
Focus effort on reaching minimum threshold that makes this work

Back OGSA more
Invest effort in OGSA
  Investigating, evaluating, contributing, commenting
  Implementing profiles
  Using it
It is the ONLY show in town
  Which offers
    • integration, completeness, abstraction
    • A foundation for collaboration
UK focus on Data Design Team
UK efforts in other design teams
  EMS, Grid markets, JSDL, GSM, mySpace, …

Use OGSA for Collaboration
Big push to Reach OGSA Basic Profile
  Sufficient platform, context & framework
  For safely partitioning further R&D
Agree a division of work
  Upgrading / alternative trade-off components
  New components
  Higher level facilities
Minimise duplication
Maximise combined efforts to deliver Function, Stability, Quality & Abstraction
Bury the egos, project competition & national pride silos