Neil Chue Hong Project Manager, EPCC +44 131 650 5957 OGSA-DAI data access and integration NERC GridGIS workshop eSI, 1 February.

Presentation transcript:

OGSA-DAI data access and integration
Neil Chue Hong, Project Manager, EPCC
NERC GridGIS workshop, eSI, 1 February 2006

Overview
The Data Deluge
– challenges of increasing data availability
– benefits of bringing data together
OGSA-DAI
– overview
– use as a data integration base layer

The Data Deluge
Entering an age of data
– Data Explosion
– CERN: LHC will generate 1 GB/s = 10 PB/y
– VLBA (NRAO) generates 1 GB/s today
– Pixar generates 100 TB/movie
– Storage getting cheaper
Data stored in many different ways
– Data resources
– Relational databases
– XML databases / files
– Result files
Need ways to facilitate
– Data discovery
– Data access
– Data integration
Empower e-Business and e-Science
– The Grid is a vehicle for achieving this
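A quick check of the "1 GB/s = 10 PB/y" figure. The slide does not say how the two numbers are related; one reading that makes them consistent assumes roughly 10^7 seconds of data-taking per year rather than a full calendar year. That live-time value and the function name are assumptions introduced here, not part of the talk.

```python
# Back-of-envelope check of "1 GB/s = 10 PB/y" using decimal units.
GB = 10**9
PB = 10**15

# ASSUMPTION (not on the slide): about 1e7 seconds of data-taking per year.
ASSUMED_LIVE_SECONDS = 1e7
SECONDS_PER_CALENDAR_YEAR = 365 * 24 * 3600


def yearly_volume_pb(rate_gb_per_s, live_seconds):
    """Return the yearly data volume in petabytes for a sustained rate."""
    return rate_gb_per_s * GB * live_seconds / PB


if __name__ == "__main__":
    # With the assumed live time the slide's figure is reproduced.
    print(yearly_volume_pb(1, ASSUMED_LIVE_SECONDS))       # 10.0
    # Running non-stop for a calendar year would give about three times more.
    print(yearly_volume_pb(1, SECONDS_PER_CALENDAR_YEAR))  # 31.536
```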

Composing Observations in Astronomy
No. & sizes of data sets as of mid-2002, grouped by wavelength
12 waveband coverage of large areas of the sky
Total about 200 TB data
Doubling every 12 months
Largest catalogues near 1B objects
Data and images courtesy Alex Szalay, Johns Hopkins

Data Services: motives
Key to Integration of Scientific Methods
– Publication and sharing of results
– Primary data from observation, simulation & experiment
– Encourages novel uses
– Allows validation of methods and derivatives
– Enables discovery by combining data collected independently
Key to Large-scale Collaboration
– Economies: data production, publication & management
– Sharing cost of storage, management and curation
– Many researchers contributing increments of data
– Pooling annotation leads to rapid incremental publication
– Accommodates global distribution
– Data & code travel faster and more cheaply
– Accommodates temporal distribution
– Researchers assemble data
– Later (other) researchers access data

Data Services: challenges to management
Scale
– Many sites, large collections, many uses
Longevity
– Research requirements outlive technical decisions
Diversity
– No one-size-fits-all solution will work
– Primary Data, Data Products, Meta Data, Administrative data, …
Many Data Resources
– Independently owned & managed
– No common goals
– No common design
– Work hard for agreements on foundation types and ontologies
– Autonomous decisions change data, structure, policy, …
– Geographically distributed
and I haven't even mentioned security yet!

Small problems
Not just Grand Challenges!
– Also the small problems
For instance:
– What happens to data when a researcher leaves a team?
– How can a research leader point to popular data when a new researcher joins?
– How can you manage your data when you start to run out of local storage space?
– How do I get my data from one format/database to another?
– How do I combine my data with your data?
You need to manage your data

What is a data service?
An interface to a stored collection of data
– e.g. Google and Amazon
– web services
But the data could be:
– replicated
– shared
– federated
– virtual
– incomplete
Don't care about the underlying representation
– do care about the information it represents
Adding a service layer to existing data sources can improve composability

Examples of Data Services
Many Data Services and applications
– Commercial databases
– Web interfaces
– Applications developed individually by groups and projects
Also many places to get hold of public data
– Publications and citation servers
– Results servers
But… no such thing as a free lunch
– Things are not yet Plug and Play
– You need to expend some effort to use these services effectively

Use Cases for Data Services
Data Filtering:
– single source producing large amounts of data distributed to many sites downstream
Data Discovery:
– many sources, many query entry points in a linked system
Data Translation:
– source to sink, conversion of data model / structure
Data Federation:
– many sources, linked to provide view as a single source
Data Replication:
– full or partial copies to improve throughput
Data Integration (model aggregation):
– e.g. integration of time variant data, streams, files
Data Integration (knowledge expansion):
– forming links between databases to increase knowledge

Trade Offs
Speed vs completeness
– do you require the exact answer or an answer?
Application specific vs language specific queries
– how will users interrogate a data service?
Static system vs Dynamic Discovery
– do you actually have dynamic resources?
Static vs Dynamic data
– READ only, READ/INSERT only, UPDATE permitted
Static vs Dynamic queries
– optimisation over flexibility
Intranet vs Internet
– speed over security
Single data model vs mixed data models
– ease/speed over integration
Queries vs Questions
– assume that we know the structure when we form the query

Requirements on Data Services?
Common Data Model e.g. RowSet
Common Query Language(s) e.g. XQuery, SQL
Standard access to
– data resource schema information for schema mapping
– physical data resource information for optimisation purposes
– data resource descriptive information for discovery / integration
Single, seamless security model
Dynamic publication and discovery
Multiple, efficient delivery methods
Move computation towards data
Data aggregation functionality
Provenance information
Replication information

OGSA-DAI In One Slide
An extensible framework for data access and integration.
Expose heterogeneous data resources to a grid through web services.
Interact with data resources:
– Queries and updates
– Data transformation / compression
– Data delivery
Customise for your project using
– Additional Activities
– Client Toolkit APIs
– Data Resource handlers
A base for higher-level services
– federation, mining, visualisation, …

OGSA-DAI team
IBM Development Team, Hursley
NEReSC, Newcastle
NeSC, Edinburgh
EPCC Team, Edinburgh
ESNW, Manchester
IBM Dissemination Team

OGSA-DAI Design Principles – I
Efficient client-server communication
– Minimise where possible
– One request specifies multiple operations
No unnecessary data movement
– Move computation to the data
– Utilise third-party delivery
– Apply transforms (e.g., compression)
Build on existing standards
– Fill in gaps where necessary
– DAIS specifications from DAIS WG at GGF
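The idea that one request specifies multiple operations, with transforms applied next to the data, can be sketched as follows. This is an illustration only and not the OGSA-DAI request format or API: the step names, the run_request function and the in-memory SQLite table are all assumptions made for the example.

```python
import csv
import gzip
import io
import sqlite3


def sql_query(conn, _data, step):
    """Run the query and return a header row followed by the data rows."""
    cur = conn.execute(step["sql"])
    header = [col[0] for col in cur.description]
    return [header] + [list(row) for row in cur.fetchall()]


def to_csv(_conn, rows, _step):
    """Serialise rows as CSV bytes."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue().encode("utf-8")


def gzip_compress(_conn, payload, _step):
    """Compress the payload before it leaves the server."""
    return gzip.compress(payload)


# Hypothetical step names; each maps to one server-side operation.
STEPS = {"sqlQuery": sql_query, "toCsv": to_csv, "gzip": gzip_compress}


def run_request(conn, request):
    """Execute every step of a single request, piping each output to the next."""
    data = None
    for step in request:
        data = STEPS[step["activity"]](conn, data, step)
    return data


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE station (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO station VALUES (?, ?)",
                     [(1, "Leith"), (2, "Dunbar")])
    # One request carries three operations, so only compressed bytes travel back.
    request = [
        {"activity": "sqlQuery", "sql": "SELECT id, name FROM station ORDER BY id"},
        {"activity": "toCsv"},
        {"activity": "gzip"},
    ]
    return run_request(conn, request)
```

The client makes one round trip and receives the compressed result, instead of issuing a query, fetching raw rows and compressing them itself.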

OGSA-DAI Design Principles – II
Do not hide underlying data model
– Users must know where to target queries
– Data virtualisation is hard
Extensible architecture
– Modular and customisable
– e.g., to accommodate stronger security
Extensible activity framework
– Cannot anticipate all desired functionality
– Activity = unit of functionality
– Allow users to plug in their own
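"Activity = unit of functionality" with user plug-ins can be pictured with a small registry. The Activity and Engine classes below are hypothetical names chosen for this sketch; they are not OGSA-DAI's Java interfaces.

```python
from abc import ABC, abstractmethod


class Activity(ABC):
    """One unit of functionality (hypothetical interface for illustration)."""

    name = ""

    @abstractmethod
    def process(self, data):
        """Transform the incoming data and return the result."""


class Engine:
    """Holds registered activities and runs them in the order requested."""

    def __init__(self):
        self._activities = {}

    def register(self, activity):
        self._activities[activity.name] = activity

    def perform(self, names, data):
        for name in names:
            if name not in self._activities:
                raise KeyError(f"no activity registered under {name!r}")
            data = self._activities[name].process(data)
        return data


class UpperCase(Activity):
    """Stands in for an activity shipped with the framework."""

    name = "upperCase"

    def process(self, data):
        return [item.upper() for item in data]


class Deduplicate(Activity):
    """Stands in for an activity a user plugs in; the engine is unchanged."""

    name = "deduplicate"

    def process(self, data):
        return list(dict.fromkeys(data))
```

Usage: after registering both activities, engine.perform(["upperCase", "deduplicate"], ["a", "A", "b"]) returns ["A", "B"].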

The OGSA-DAI Framework
[Figure: architecture diagram. Labels: Application; Client Toolkit; OGSA-DAI service; Engine; Activities (SQLQuery, XPath, readFile, XSLT, GZip, GridFTP); Data Resources (JDBC databases: MySQL, DB2, SQL Server; XMLDB: Xindice; File: SWISSPROT).]

Intermediary
Simple intermediary
– potential to accelerate development, logging, or filtering
Persistent intermediary
– e.g. to allow efficient local indexing

Redirector, Coordinator, Network
Allowing composition and decentralisation

Extensibility Example
[Figure: two service diagrams. Labels: OGSA-DAI service; Engine; SQLQuery; JDBC; MySQL; Multiple SQL GDS; SQLQuery; JDBC and SQL (repeated for several databases).]
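The slide only survives as diagram labels, so the following is one reading of "Multiple SQL": a single query sent to several SQL databases and the rows gathered into one result. The function names and the in-memory SQLite databases are assumptions for illustration, not the actual extension shown in the talk.

```python
import sqlite3


def multi_sql_query(connections, sql, params=()):
    """Run the same SQL on every connection and tag each row with its source."""
    results = []
    for source, conn in connections.items():
        for row in conn.execute(sql, params):
            results.append((source, *row))
    return results


def make_db(rows):
    """Create a throwaway in-memory database holding a 'reading' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE reading (site TEXT, value REAL)")
    conn.executemany("INSERT INTO reading VALUES (?, ?)", rows)
    return conn


if __name__ == "__main__":
    databases = {
        "db_a": make_db([("x", 1.0)]),
        "db_b": make_db([("y", 2.0), ("z", 0.5)]),
    }
    query = "SELECT site, value FROM reading WHERE value >= ? ORDER BY site"
    for row in multi_sql_query(databases, query, (1.0,)):
        print(row)
```

Tagging each row with its source keeps the underlying data resources visible to the caller, in line with the principle of not hiding where the data lives.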

Map Retrieval: Current
[Figure labels: OGC browser, Internet, Service, GIS, Oracle, EDINA.]

Map Retrieval: Grid Prototype
Basic client to demonstrate proof of concept
[Figure labels: OGC, GIS, Oracle, OGSA-DAI 1, Client, EDINA, SO-OGC.]

Map Retrieval: Security
Exploit NGS infrastructure to provide secure access layer
[Figure labels: OGC, ODS 1, GIS, Oracle, Portlet, Allowed users dn, SO-OGC, NGS Authentication, EDINA.]

Map Retrieval: Integration
Exploit OGSA-DAI extensibility to add e.g. overlay
[Figure labels: OGC, ODS 2, GIS, Oracle, Portlet, ODS 1, Oracle, Census, ODS 3, Application data, SO-OGC, JDBC, SO-OGC, SQL/XML, NGS Authentication.]

OGSA-DAI / EDINA prototyping work
Stage 1: Using existing OGSA-DAI technology
Stage 2: Extending OGSA-DAI
[Figure labels: OGSA-DAI service, HTTP Data Resource, WMS Server, DeliverFromURL, GIS Client, GIS Client, URL Input Parameters, Image/XML File, HTTP Request, HTTP Response, GIS Activities.]
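The diagram labels describe URL input parameters turned into an HTTP request to a WMS server, which returns an image or XML file. A minimal sketch of building such a request URL is given below, using the standard WMS 1.1.1 GetMap parameter names. The host name, layer name and function names are placeholders invented for this example, and no request is actually sent.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_getmap_url(base_url, layers, bbox, width, height,
                     srs="EPSG:4326", image_format="image/png"):
    """Assemble a WMS 1.1.1 GetMap URL from its input parameters."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": ",".join(layers),
        "STYLES": "",
        "SRS": srs,
        "BBOX": ",".join(str(v) for v in bbox),  # minx,miny,maxx,maxy
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": image_format,
    }
    return f"{base_url}?{urlencode(params)}"


def parse_params(url):
    """Return the query parameters of a URL, keeping empty values."""
    return parse_qs(urlsplit(url).query, keep_blank_values=True)


if __name__ == "__main__":
    # Placeholder endpoint and layer; a real service would publish its own.
    print(build_getmap_url("http://wms.example.org/map", ["coast"],
                           (-8.0, 54.0, 2.0, 61.0), 400, 300))
```

A delivery step that fetches from a URL would take this string as its input and pass the HTTP response body back to the GIS client.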

Core features of OGSA-DAI – I
A framework for building applications
– Supports data access, insert and update
  – Relational: MySQL, Oracle, DB2, SQL Server, Postgres
  – XML: Xindice, eXist
  – Files: CSV, BinX, EMBL, OMIM, SWISSPROT, …
– Supports data delivery
  – SOAP over HTTP
  – FTP; GridFTP
  – Inter-service
– Supports data transformation
  – XSLT
  – ZIP; GZIP
– Supports security
  – X.509 certificate based security

Core features of OGSA-DAI – II
A framework for building data clients
– Client toolkit library for application developers
A framework for developing functionality
– Extend existing activities, or implement your own
– Mix and match activities to provide functionality you need
Highly extensible
– Customise our out-of-the-box product
– Provide your own services, client-side support and data-related functionality
Comprehensive documentation and tutorials
Latest release supports GT4.0 and Axis 1.2 / OMII_2 using Java 1.4

Distributed Query Processing
Higher level services building on OGSA-DAI
– specialised metadata extraction
Execute queries in parallel over multiple data resources
Queries mapped to algebraic expressions for evaluation
Parallelism represented by partitioning queries
– Use exchange operators
Equality based joins in current release
– supported types: long, integer, string, double and float
[Figure: query plan. Labels: table_scan (protein); table_scan termID=S92 (proteinTerm); reduce; hash_join (proteinId); op_call (Blast); reduce; exchange; node labels "3,4" and "12".]
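A rough sketch of how an equality-based hash join can be partitioned by an exchange step, using the operator names from the query plan. This is a single-process illustration with invented sample rows, not the OGSA-DQP implementation; in a distributed evaluation each partition pair would be handled by a separate node.

```python
from collections import defaultdict


def table_scan(table, predicate=lambda row: True):
    """Return the rows of a table that satisfy the predicate."""
    return [row for row in table if predicate(row)]


def exchange(rows, key, n_partitions):
    """Route each row to a partition chosen by hashing its join key."""
    partitions = [[] for _ in range(n_partitions)]
    for row in rows:
        partitions[hash(row[key]) % n_partitions].append(row)
    return partitions


def hash_join(left, right, key):
    """Equality join: build a hash index on the left input, probe with the right."""
    index = defaultdict(list)
    for row in left:
        index[row[key]].append(row)
    return [{**l, **r} for r in right for l in index.get(r[key], [])]


def partitioned_join(left, right, key, n_partitions=2):
    """Join matching partitions independently, then gather the results."""
    joined = []
    pairs = zip(exchange(left, key, n_partitions),
                exchange(right, key, n_partitions))
    for left_part, right_part in pairs:  # each pair could run on its own node
        joined.extend(hash_join(left_part, right_part, key))
    return joined


if __name__ == "__main__":
    # Invented sample rows for illustration only.
    protein = [{"proteinId": i, "name": f"P{i}"} for i in (1, 2, 3)]
    protein_term = [{"proteinId": 1, "termID": "S92"},
                    {"proteinId": 2, "termID": "S10"},
                    {"proteinId": 3, "termID": "S92"}]
    selected = table_scan(protein_term, lambda r: r["termID"] == "S92")
    print(partitioned_join(protein, selected, "proteinId"))
```

Because rows with equal keys always hash to the same partition, no matching pair is split across partitions, which is what makes the equality join safe to run in parallel.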

DQP architecture

GridMiner: Data Mediation Service
Principles
– Tight Federation: global (relational) schema
– Virtual integration: leave the data where it is; always up-to-date data
– Build on data access from OGSA-DAI
– Not bound to special architecture
Supported data sources:
– RDBMS (via JDBC), XMLDB (Xindice), CSV files
Operators: Union all and inner join
Operators are XQuery based (using SAXON)

Data Integration Scenario
Heterogeneities:
– Name in A is First Last (as the target format)
– Name in C has to be combined
Distribution:
– 3 data sources
Java based schema mapping to global schema
– types limited by WebRowSet

Data Integration Scenario (cont.)
Query: SELECT p_name FROM patient WHERE id=10
[Figure: query plans. Labels: Standard, optimized.]
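The scenario can be sketched as per-source mapping functions into a global patient schema, a union over the mapped sources, and the example query applied on top. Only the formats of A and C are given in the slides; the "Last, First" format used for B, the column names and the sample rows are assumptions made for this sketch, and GridMiner's own mappings were Java and XQuery based.

```python
def map_a(row):
    """Source A already stores the name in the target 'First Last' format."""
    return {"id": row["id"], "p_name": row["name"]}


def map_b(row):
    """ASSUMED format for source B: 'Last, First' (not stated in the slides)."""
    last, first = [part.strip() for part in row["name"].split(",", 1)]
    return {"id": row["id"], "p_name": f"{first} {last}"}


def map_c(row):
    """Source C keeps the name parts separately, so they have to be combined."""
    return {"id": row["id"], "p_name": f"{row['first']} {row['last']}"}


def global_patient(sources):
    """UNION ALL of every source after mapping it to the global schema."""
    for mapper, rows in sources:
        for row in rows:
            yield mapper(row)


def select_p_name(sources, patient_id):
    """Equivalent of: SELECT p_name FROM patient WHERE id = patient_id."""
    return [r["p_name"] for r in global_patient(sources) if r["id"] == patient_id]


if __name__ == "__main__":
    # Invented sample rows for illustration only.
    sources = [
        (map_a, [{"id": 1, "name": "Ada Lovelace"}]),
        (map_b, [{"id": 10, "name": "Hopper, Grace"}]),
        (map_c, [{"id": 11, "first": "Alan", "last": "Turing"}]),
    ]
    print(select_p_name(sources, 10))
```

This corresponds to the unoptimised plan, where every source is mapped before filtering; pushing the id filter down to each source before mapping would move less data.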

caBIG
Object-Oriented view of data
– Data types are well-defined and registered in a repository
– Standardized metadata facilitates discovery
– Custom query language implemented as an activity

LEAD
[Figure labels: IU; NCSA Illinois; UA Huntsville; Millersville; UCAR Unidata; Okla Univ; Master catalog.]
Each satellite replicates its contents to the master catalog

FirstDIG
Data mining with the First Transport Group, UK
– Example: When buses are more than 10 minutes late there is an 82% chance that revenue drops by at least 10%
– "The results of this exercise will revolutionise the way we do things in the bus industry." Darren Unwin, Divisional Manager, First South Yorkshire
– Client based joins, using temporary tables
[Figure labels: OGSA-DAI, OGSA-DAI, Client Application, Data Mining Application.]
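"Client based joins, using temporary tables" can be pictured as follows: the client pulls rows from two separate data services, loads them into temporary tables in a local database, and joins them there. The table layout, column names and sample values are invented for this sketch and do not come from the FirstDIG project.

```python
import sqlite3


def client_side_join(delay_rows, revenue_rows):
    """Load rows fetched from two services into temp tables and join locally."""
    local = sqlite3.connect(":memory:")
    local.execute(
        "CREATE TEMP TABLE delay (route TEXT, day TEXT, minutes_late REAL)")
    local.execute(
        "CREATE TEMP TABLE revenue (route TEXT, day TEXT, takings REAL)")
    local.executemany("INSERT INTO delay VALUES (?, ?, ?)", delay_rows)
    local.executemany("INSERT INTO revenue VALUES (?, ?, ?)", revenue_rows)
    rows = local.execute(
        "SELECT d.route, d.day, d.minutes_late, r.takings "
        "FROM delay d JOIN revenue r ON d.route = r.route AND d.day = r.day "
        "ORDER BY d.route, d.day"
    ).fetchall()
    local.close()
    return rows


if __name__ == "__main__":
    # Hypothetical rows standing in for results returned by two data services.
    delays = [("52", "2005-03-01", 12.0), ("52", "2005-03-02", 3.0)]
    revenue = [("52", "2005-03-01", 410.0), ("52", "2005-03-02", 480.0)]
    for row in client_side_join(delays, revenue):
        print(row)
```

The trade-off is that both inputs cross the network in full before the join; it keeps the services simple at the cost of data movement.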

OGSA-DAI Challenges
Metadata extraction
– define a common model for e.g. database schema?
Intermediate representation
– between multiple models (relational, XML, …)
– XML WebRowSet is flexible (cf. GridMiner) but expansive
– DFDL and GridFTP/parallel HTTP?
Query definition
– translation of queries
– aggregation of results
Data transport and workflow
– workflow is typically compute driven
Move computation to data
– mobile code activities?
– data services hosted on DBMS?

Contributing to OGSA-DAI
Additional functionality:
– Provide activities which implement specific functionality
– Provide extra client functionality
– Provide different security mechanisms
– Provide higher level components and applications
Different levels of contributions
– Based on OGSA-DAI?
– Works with OGSA-DAI?
– Part of OGSA-DAI?

In the near future
A new version of the OGSA-DAI Engine
– should look mostly the same externally
– better support for concurrency, sessions and monitoring
Implementing new versions of specifications
– DAIS Specifications
Key things that we will be addressing:
– Performance
– A Security Model which can be applied across platforms
– Full Transactions framework, distributed transactions
– More data integration facilities
– Better abstraction over DBMS variation
Application centric queries
– collaborating with other projects
Research projects looking at:
– schema mapping
– extended data resources

Associated Meetings and Workshops
DIALOGUE Workshops
– Data Integration Applications: Linking Organisations to Gain Understanding and Experience
– Bringing together Data Integration middleware and application providers with users
– Next one at NeSC: 9-10 February 2006
Next Generation Distributed Data Management (HPDC15, Paris)
Data Management on Grids (VLDB06, Seoul)

Conclusions
The benefits of trying to integrate data are hindered by challenges such as heterogeneity, scale and distribution
A common data service layer should make data integration easier
OGSA-DAI provides an extensible, data service based framework which makes it easier to implement data integration
GIS data is amenable to integration using data services

Further information
The OGSA-DAI Project Site:
The DAIS-WG site:
OGSA-DAI Users Mailing list
– General discussion on grid DAI matters
Formal support for OGSA-DAI releases
OGSA-DAI training courses