Published by Ashlynn Marianna Fletcher. Modified over 9 years ago.
1
Neil Chue Hong
Project Manager, EPCC
N.ChueHong@epcc.ed.ac.uk
+44 131 650 5957
OGSA-DAI Requirements Gathering Exercise
2nd DIALOGUE workshop
eSI, 9-10 February 2006
2
2nd DIALOGUE workshop - 9-10 February 2006
OGSA-DAI Requirements Gathering
Aims
–learn more about the data access and integration challenges that other projects are facing
–use this information to inform the future development of the OGSA-DAI software
Timescale
–Nov 2005 – Jan 2006
Gatherers
–Ally Hume
–Amy Krause
–Tom Sugden
3
Projects
AstroGrid
–(www.astrogrid.org) – distributed queries over large astronomy databases.
AutoMed and ISpider
–(www.doc.ic.ac.uk/automed/) and (www.ispider.man.ac.uk) – model-based data integration and Grid-based informatics platform for proteomics.
CancerGrid
–(www.cancergrid.org) – storage and analysis of distributed data containing clinical trial and lab data.
ESSC
–(www.nerc-essc.ac.uk) – environmental and atmospheric simulations.
GOLD
–(www.goldproject.ac.uk) – provides infrastructure for virtual organisations.
NTRAC
–(www.ntrac.org.uk) – similar to CancerGrid.
4
Structure of Meeting Reports
Data
–the kind of data that the project is concerned with, including the structure, quantity and types of data resource.
Queries
–the types of queries that are performed against this data, including the query languages used and the typical size of result sets.
The problem
–the main problems that the project is currently facing with regard to data access and integration.
What Can OGSA-DAI Provide?
–the functionality that the project would like OGSA-DAI to provide.
Checklist
–summarises the importance of various aspects of data access and integration for the project.
5
AstroGrid
a number of distributed databases, each of which contains astronomical data captured from different modalities
Almost all the tables in these databases contain a spatial coordinate of each feature and some numerical attributes associated with that feature.
want to do distributed queries using their algorithmic domain-specific joins.
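The slide does not say what AstroGrid's domain-specific joins look like. As one plausible illustration only, the sketch below assumes a positional cross-match: rows from two catalogues are joined when their sky coordinates fall within a small angular radius. The catalogue layout, the field names (`id`, `ra`, `dec`) and the function names are all hypothetical, and the nested loop is a naive stand-in for whatever algorithm the project actually uses.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two positions given in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # The haversine form stays numerically stable for very small separations.
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def cross_match(catalogue_a, catalogue_b, radius_deg):
    """Naive O(n*m) join: pair rows whose positions lie within radius_deg."""
    matches = []
    for a in catalogue_a:
        for b in catalogue_b:
            sep = angular_separation_deg(a["ra"], a["dec"], b["ra"], b["dec"])
            if sep <= radius_deg:
                matches.append((a["id"], b["id"]))
    return matches


# Hypothetical rows standing in for tables held in two different databases.
optical = [{"id": "opt-1", "ra": 10.0000, "dec": 20.0000},
           {"id": "opt-2", "ra": 50.0000, "dec": -5.0000}]
xray = [{"id": "x-1", "ra": 10.0002, "dec": 20.0001},
        {"id": "x-2", "ra": 120.0000, "dec": 30.0000}]

# Join with a 2 arcsecond matching radius.
pairs = cross_match(optical, xray, radius_deg=2.0 / 3600)
```

A join predicate like this cannot be expressed as a plain equality join, which is one reason such queries are hard to push down to each database and hard to distribute.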
6
AutoMed and ISpider
middleware to transform schemas from different data sources (relational databases, XML documents, etc.) and evaluate distributed queries expressed in their own IQL language.
By creating a path of schema-transformations, it is possible to federate multiple data sources so that they appear as a single data source to the user
how to optimise distributed queries using metadata such as data size, occurrence of indexes, performance rates, etc.
how to fit AutoMed into a grid architecture
7
CancerGrid
By analysing laboratory data and correlating it with hospital and trials data, it is hoped that new subsets of patients can be discovered who respond best to particular treatments
Security is a major concern because many of the owners of data are aware of the value of their data and consequently are concerned about who has access to it.
A good means of transforming trial forms (XML documents) into a format suitable for automatic insertion into relational tables is required.
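To make the last requirement concrete, here is a minimal sketch of flattening an XML form into rows and inserting them into a relational table. The XML layout, the element and attribute names, and the `trial_entry` table are invented for illustration; CancerGrid's real trial forms and schema are not described in the slides. It uses only the Python standard library (`xml.etree.ElementTree` and `sqlite3`).

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical trial form; the real CancerGrid form structure is not given.
FORM_XML = """
<trialForm trial="T-001">
  <patient id="P-17">
    <field name="age">54</field>
    <field name="treatment">A</field>
  </patient>
  <patient id="P-18">
    <field name="age">61</field>
    <field name="treatment">B</field>
  </patient>
</trialForm>
"""


def form_to_rows(xml_text):
    """Flatten one form document into (trial, patient_id, age, treatment) tuples."""
    root = ET.fromstring(xml_text)
    trial = root.get("trial")
    rows = []
    for patient in root.findall("patient"):
        # Collect the named fields of this patient into a dictionary.
        values = {f.get("name"): f.text for f in patient.findall("field")}
        rows.append((trial, patient.get("id"), int(values["age"]), values["treatment"]))
    return rows


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trial_entry (trial TEXT, patient_id TEXT, age INTEGER, treatment TEXT)"
)
# Parameterised inserts keep form content from being interpreted as SQL.
conn.executemany("INSERT INTO trial_entry VALUES (?, ?, ?, ?)", form_to_rows(FORM_XML))
conn.commit()

stored = conn.execute(
    "SELECT patient_id, age, treatment FROM trial_entry ORDER BY patient_id"
).fetchall()
```

The mapping here is hard-coded; a general solution would presumably need a declarative mapping from form fields to columns, which is the harder part of the requirement.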
8
ESSC
dealing with large data sets of between 2 and 3 terabytes, stored mostly on a single machine.
The user requests portions of data, often assembled from various files.
Uniform web service interfaces are provided for accessing data sets using the standard APIs associated with the binary data file formats that are used (netCDF, GRIB, HDF, etc.).
The queries used by ESSC are currently synchronous, which causes request timeout problems when the resulting datasets are large.
Sceptical of current WS-Notification implementations that require open ports on client machines.
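One common way to avoid both the timeouts and the need for open client ports is a submit-then-poll pattern: the client gets a job identifier back immediately and asks for the status itself, so every connection is initiated by the client. The sketch below simulates this in a single process. `JobStore`, `extract_subset` and `wait_for` are hypothetical names, not part of ESSC's services or of OGSA-DAI, and the in-memory store stands in for a remote service.

```python
import threading
import time
import uuid


class JobStore:
    """In-memory stand-in for a data service that runs requests in the background."""

    def __init__(self):
        self._jobs = {}
        self._lock = threading.Lock()

    def submit(self, work, *args):
        """Start the work on a background thread and return a job id at once."""
        job_id = str(uuid.uuid4())
        with self._lock:
            self._jobs[job_id] = {"status": "RUNNING", "result": None}
        threading.Thread(target=self._run, args=(job_id, work, args), daemon=True).start()
        return job_id

    def _run(self, job_id, work, args):
        result = work(*args)
        with self._lock:
            self._jobs[job_id] = {"status": "DONE", "result": result}

    def poll(self, job_id):
        """Return a snapshot of the job state; the client calls this repeatedly."""
        with self._lock:
            return dict(self._jobs[job_id])


def extract_subset(n_values):
    """Placeholder for a slow extraction from large binary files."""
    time.sleep(0.2)
    return list(range(n_values))


def wait_for(store, job_id, interval=0.05, timeout=5.0):
    """Client-side loop: poll until the job is done or the overall timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state = store.poll(job_id)
        if state["status"] == "DONE":
            return state["result"]
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} did not finish within {timeout}s")


store = JobStore()
job = store.submit(extract_subset, 5)   # returns immediately
data = wait_for(store, job)             # each poll is a short request
```

The trade-off is extra polling traffic and some latency between completion and pickup; error handling for failed jobs is omitted here.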
9
GOLD
develop an infrastructure to facilitate collaboration within virtual organisations
Data storage services will be used for capturing interactions amongst parties of a VO in order to facilitate auditing and VO-playback.
Data analysis services will be used for performing particular types of analysis of data existing mostly in relational database back-ends.
primary concern is managing security policies and service access rights of different types of user dynamically.
10
NTRAC
build platforms to bring different systems together
Many of the data resources that they are accessing are stored in private networks (e.g. NHS patient information) with no open gateway to the public.
Researchers want to mine the data to find people to recruit into studies.
11
Prioritised Requirements
12
Notes on requirements
Prioritised based on a judgement of their importance to the various projects that were investigated.
–Whether or not they are within the scope of the OGSA-DAI project, or have already been satisfied by OGSA-DAI, is not considered here.
Frequent mention of the non-functional requirement: ease of use.
–Some concern that installation and configuration remain too complex when compared with typical WAR-based web service deployment.
Hope to publish the full document in the near future
–let me know if you want a copy
13
Conclusions
Efficient transportation of large quantities of data between heterogeneous data resources is a crucial requirement for several projects from distinct domains.
–This is also an implicit requirement for projects requiring data federation and distributed query processing.
–If we could solve this problem, it would be of great benefit to these projects, and also to higher-level middleware projects such as OGSA-DQP
Security remains a major concern because of the commercial and sensitive nature of much data
–want a generalised, role-based mechanism for exposing different views of data resources to different users, and managing these views dynamically.
–is this outside the scope of data integration middleware?
While we were previously aware of most of the requirements described in this document, associating them with actual projects can help with prioritisation.
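As a rough illustration of what a role-based view mechanism could mean at its simplest, the sketch below filters each record down to the columns a role is allowed to see, and changes the role table at run time. The roles, column names and the `view_for` function are assumptions made for this example; they do not describe any OGSA-DAI feature or any project's actual policy model.

```python
# Hypothetical mapping from role to the columns that role may see.
ROLE_VIEWS = {
    "clinician": {"patient_id", "age", "treatment", "outcome"},
    "statistician": {"age", "treatment", "outcome"},
}


def view_for(role, rows):
    """Project each row onto the columns permitted for the role.

    Unknown roles get no columns at all (deny by default).
    """
    allowed = ROLE_VIEWS.get(role, set())
    return [{key: value for key, value in row.items() if key in allowed} for row in rows]


records = [{"patient_id": "P-17", "age": 54, "treatment": "A", "outcome": "responded"}]

stat_view = view_for("statistician", records)

# "Dynamic" management: a new role is added while the system is running.
ROLE_VIEWS["auditor"] = {"patient_id"}
auditor_view = view_for("auditor", records)

unknown_view = view_for("visitor", records)
```

A real mechanism would also need row-level restrictions, authentication of the caller's role, and policy storage outside the application, which is part of why the slide asks whether this belongs in data integration middleware at all.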
14
Further information
The OGSA-DAI Project Site:
–http://www.ogsadai.org.uk
The DAIS-WG site:
–http://forge.gridforum.org/projects/dais-wg/
OGSA-DAI Users Mailing list
–users@ogsadai.org.uk
–General discussion on grid DAI matters
Formal support for OGSA-DAI releases
–http://bugs.ogsadai.org.uk/
OGSA-DAI training courses