Brian Matthews, CRIS 2002, 31/08/02 1 Accessing the Outputs of Scientific Projects Brian Matthews, Michael Wilson, Business & Information Technology Dept,


Presentation transcript:

Slide 1: Accessing the Outputs of Scientific Projects
Brian Matthews, Michael Wilson, Business & Information Technology Dept, CLRC
Kerstin Kleese-van Dam, E-Science Centre, CLRC
CRIS 2002, 31/08/02

Slide 2: Overview
Science produces two outputs:
– Conventional publications
– Science data sets
In traditional science, the first is used as a measure of success; the second is locked away.
In this talk I shall discuss:
– A general-purpose science data portal for allowing access to data sets
– Potential links to publications
The goal: to make all the outputs of science available.

Slide 3: Who we are (CLRC)
Central Laboratory of the Research Councils
1700 staff, supporting scientists and engineers from universities and industry
Based at 3 sites:
– Daresbury Laboratory
– Rutherford Appleton Laboratory
– Chilbolton Observatory
A multidisciplinary laboratory

Slide 4: A Multidisciplinary Laboratory
– Spallation Neutron and Muon Source (ISIS)
– Synchrotron Radiation Source (SRS)
– Lasers
– Microstructures
– Space Science and Technology
– Molecular Spectroscopy
– Earth Observation
– Atmospheric Science
– Computational Science
– Energy Research
– Information Technology
– Particle Physics
– Radio Communications
– Surfaces, Transforms and Interfaces

Slide 5: The Problem
Scientific institutions generate vast quantities of data:
– CLRC: ISIS, SRS, Space Science, Particle Physics, Computational Science, ...
More data is coming on stream all the time:
– CERN-LHC, Diamond, CASIM, HGP, ...
Institutions are very good at handling large amounts of data, but take diverse approaches to organising and distributing it.
Need a usable way of gaining access to the data.

Slide 6: User Scenarios
Lecturer:
– This published study would be a good example for teaching. Is the raw data publicly available?
Researcher:
– This is an interesting paper. Can I check the data?
Experiment Proposer:
– Have there been any neutron or X-ray studies of this molecule at 100 K? What reports and papers have been published on them?
Instrument Scientist:
– The instrument seems a bit unstable recently. Fetch me the results of all calibration runs from the last 3 months. Is there a report on this instrument?
Need a usable way of gaining access to publications together with data.

Slide 7: The Data Portal Concept
Single point of access to the CLRC data resources
Encompasses a wide range of data holdings:
– Describes what data is available from the facilities
– Links to the data held at the facility
– Different archiving methods
Caters for a wide range of users:
– from the general community to data curators
Supports a wide range of queries:
– employing data mining, thesauri, ...

Slide 8: Combine Diverse Users & Searches...
[Figure labels. Search types: Discovery, Excavation. User types: General community, Wider science community, Experimenter, Specialist user, Data curator.]

Slide 9: … with Distributed Data Silos…
[Figure labels: Facility 1, Facility 2, Facility 3, Facility 4.]

Slide 10: … using a central common metadata index...
[Figure labels: Client; http; CLRC Data Access Server; XML wrapper; Common metadata catalogue database; Facility 1 (XML wrapper, Local metadata, Local data).]
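The arrangement of per-facility XML wrappers feeding one common catalogue can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the element names (`study`, `name`, `keyword`, `location`), the local record keys, the `CentralCatalogue` class and the example URLs are all hypothetical and do not come from the CLRC system.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple


def wrap_local_record(facility: str, local: dict) -> str:
    """Hypothetical XML wrapper: express a facility's local metadata
    record in a shared XML form (element names are assumptions)."""
    study = ET.Element("study", {"facility": facility})
    ET.SubElement(study, "name").text = local["title"]
    for keyword in local.get("keywords", []):
        ET.SubElement(study, "keyword").text = keyword
    ET.SubElement(study, "location").text = local["url"]
    return ET.tostring(study, encoding="unicode")


class CentralCatalogue:
    """Toy stand-in for the common metadata catalogue database."""

    def __init__(self) -> None:
        # keyword (lower-cased) -> list of (facility, study name, location)
        self._index: Dict[str, List[Tuple[str, str, str]]] = {}

    def ingest(self, xml_text: str) -> None:
        study = ET.fromstring(xml_text)
        entry = (
            study.get("facility"),
            study.findtext("name"),
            study.findtext("location"),
        )
        for keyword in study.findall("keyword"):
            self._index.setdefault(keyword.text.lower(), []).append(entry)

    def search(self, keyword: str) -> List[Tuple[str, str, str]]:
        return list(self._index.get(keyword.lower(), []))


catalogue = CentralCatalogue()
catalogue.ingest(wrap_local_record(
    "Facility 1",
    {"title": "Study A", "keywords": ["Copper"],
     "url": "http://facility1.example/study-a"},
))
catalogue.ingest(wrap_local_record(
    "Facility 2",
    {"title": "Study B", "keywords": ["copper", "Palladium"],
     "url": "http://facility2.example/study-b"},
))
```

A single call such as `catalogue.search("copper")` then returns entries from both facilities, while the data itself stays where each facility holds it.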

Slide 11: … and a Web based interface
Exploit the existing Web infrastructure:
– use new technologies (XML/RDF);
– rapidly disseminated;
– widely accessible;
– database and user platform independent;
– can be developed now, but with the GRID in mind.
Every user who needs to can get to the information.

Slide 12: Metadata
[Figure labels: Science Metadata Model; ISIS, SRS, HEP, Space Science, Social Science, Env. Science.]
A generic metadata model for all scientific applications, with specialisation for each domain.
– Can answer questions across domains
– Can answer questions about specific domains

Slide 13: Metadata Model
The Metadata Object has six parts:
– Topic: keywords providing an index on what the study is about.
– Study Description: provenance about what the study is, who did it and when.
– Access Conditions: conditions of use, providing information on who can access the data and how.
– Data Description: detailed description of the organisation of the data into datasets and files.
– Data Location: locations providing navigation to where the data on the study can be found.
– Related Material: references into the literature and community, providing context about the study.
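As a rough illustration, the six parts of the metadata object can be pictured as a record type. The Python sketch below is an assumption-laden reading of the slide: the class names, field names and all example values are hypothetical and are not the actual CLRC schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StudyDescription:
    """Provenance: what the study is, who did it and when."""
    name: str
    investigators: List[str]
    date: str
    purpose: str = ""


@dataclass
class MetadataObject:
    """Hypothetical field names mirroring the six parts of the model."""
    topic: List[str]               # keywords indexing the study
    study: StudyDescription        # provenance
    access_conditions: List[str]   # who may access the data, and how
    data_description: List[str]    # data sets and files (simplified)
    data_location: List[str]       # where the data can be found
    related_material: List[str] = field(default_factory=list)


# Example values are placeholders, not taken from a real record.
record = MetadataObject(
    topic=["Chemistry", "Crystal Structure"],
    study=StudyDescription(
        name="example-study",
        investigators=["A. Example"],
        date="2002-08-31",
        purpose="Placeholder purpose text.",
    ),
    access_conditions=["A. Example"],
    data_description=["raw", "intermediate", "final"],
    data_location=["http://facility.example/example-study/"],
)
```

Keeping `related_material` optional reflects the idea that links to publications may only exist once a report has been written.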

Slide 14: Study Description
The Study is the basic unit for a scientific activity. It can be further divided into:
– Programmes: for connected studies.
– Investigations: for a single measurement, experiment or simulation.

Slide 15: Hierarchy of Data Holdings
With investigations, there are associated data holdings. These are themselves arranged in a hierarchy: data sets and files, with links between them.
Logical organisation: identity is separated from location.
[Figure labels: Investigation; Data Holding; Data-Set 1 (Raw), Data-Set 2 (Inter), Data-Set 3 (Final); each data set contains File 1 (name, date).]
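One way to read "identity separated from location" is that files are named logically in the hierarchy, and a separate lookup maps each logical name to one or more physical copies. The following Python sketch makes that reading concrete; the class names, the `resolve` helper and the URLs are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DataFile:
    name: str  # logical identity, independent of where the file is stored
    date: str


@dataclass
class DataSet:
    name: str
    stage: str  # "raw", "intermediate" or "final"
    files: List[DataFile] = field(default_factory=list)


@dataclass
class DataHolding:
    investigation: str
    data_sets: List[DataSet] = field(default_factory=list)


def resolve(file: DataFile, locations: Dict[str, List[str]]) -> List[str]:
    """Look up the physical copies of a logical file (empty if unknown)."""
    return locations.get(file.name, [])


raw = DataSet("Data-Set 1", "raw", [DataFile("file-1", "2002-08-31")])
holding = DataHolding(
    "example-investigation",
    [raw, DataSet("Data-Set 2", "intermediate"), DataSet("Data-Set 3", "final")],
)

# The location table lives apart from the hierarchy, so copies can be
# added or moved without touching the logical description.
locations = {
    "file-1": [
        "http://facility1.example/file-1",
        "http://mirror.example/file-1",
    ],
}
```

With this split, a user could pick whichever copy returned by `resolve` is most convenient to download.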

Slide 16: Metadata example
Chemistry Crystal Structure Copper...
Crystal Structure: Copper : Palladium: :complex: 150K...
Porter... University of Peebles... EPSRC... 21/04/1999…
To study the structure of Copper and Palladium co-ordination complexes at 150K.
Teat... SRS Station 9.8, BRUKER AXS SMART 1K... Wavelength... Angstrom… Crystal-to-detector distance cm
The user has to be one of: Prof. F. Porter…

Slide 17: Metadata collection
Metadata collection and maintenance is a big problem. But doing science is a process.
[Figure labels. Process steps: Submit proposal; Prepare experiment; Generate results; Analyse results; Write report. Metadata gathered along the way: provenance metadata + access conditions; data description + data location; related material.]
Collecting the metadata can then become part of the experimental support environment.

Slide 18: Architecture
[Figure labels: Users; Other Data Portals; Grid middleware; CLRC Data Portal (CLRC broker, XML wrapper, Common metadata catalogue database); Facilities 1 to 4, each with XML wrapper, Local metadata and Local data.]

Slide 19: Server Architecture
[Figure labels: USER; User input interpreter; pre-set XSL Script; Query Generator; Central metadata repository; Local metadata repository; XML File; XML Parser; Response Generator; User output generator. Key: Internal; http; Ascii file; External agent module.]

Slide 20: Example
Result of searching: a search across facilities returns XML to the session and displays a summary.

Slide 21: Expand Results
Give more details from the same XML.
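The point that the summary and the expanded view both come from the same XML held in the session can be illustrated with a small Python sketch. The XML layout, the `session` dictionary and the function names are assumptions made for this example; the portal's real result format is not shown in the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical result document, as might be returned by a cross-facility search.
RESULT_XML = """
<results>
  <study id="s1" facility="Facility 1">
    <name>Example study one</name>
    <purpose>Placeholder purpose.</purpose>
    <investigator>A. Example</investigator>
  </study>
  <study id="s2" facility="Facility 2">
    <name>Example study two</name>
    <purpose>Another placeholder purpose.</purpose>
    <investigator>B. Example</investigator>
  </study>
</results>
""".strip()

# Parse once and keep the tree with the user's session.
session = {"results": ET.fromstring(RESULT_XML)}


def summary(session: dict) -> list:
    """Short listing: one (id, name) pair per study."""
    return [
        (study.get("id"), study.findtext("name"))
        for study in session["results"].findall("study")
    ]


def details(session: dict, study_id: str) -> dict:
    """Expanded view of one study, read from the same parsed XML."""
    for study in session["results"].findall("study"):
        if study.get("id") == study_id:
            return {child.tag: child.text for child in study}
    raise KeyError(study_id)
```

Expanding a result then needs no second query to the facilities, only another read of the stored tree.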

Slide 22: Going Deeper
Can browse the data sets.

Slide 23: Select data
Pick the required data files and download them from a convenient location.

Slide 24: Current developments
Pilot completed
Consolidate and broaden the existing system:
– move towards a development system
– handle a greater diversity of data sources, e.g. Max Planck Institute for Meteorology
Enhance the technology:
– Web services (SOAP, WSDL, OGSA, XML Query)
Provide links to other information sources:
– Library systems
– Thesauri

Slide 25: Interface with existing archives
CLRC maintains existing data archives:
– Atmospheric, earth observation, STP, astronomy
– Existing access mechanisms (Web, Z39.50)
– Existing metadata catalogues and formats
Can we use the Data Portal to access them?
– Use the metadata format as a framework that is specialised to express each existing metadata format
– XML Query as a query layer on the archive

Slide 26: Re-architect system
Break up the portal middleware into components.
[Figure labels: DP; Results collation; Data source location; Query generation; Ontology service; Security service; Replication service; User service; Globus GIS - MDS; Globus GSI; Grid Enable with Web Services; RDF+DAML+OIL; XML Query.]

Slide 27: Access to Data and Publications
The Data Portal offers the potential to integrate the outputs of scientific research: data and publications.
Need to have a common search mechanism over library and data portals:
– Can abstract the science metadata to Dublin Core.
– Links to CERIF would further deepen the connection.
– Access to common thesauri for classification.
Common web service interface:
– Data Portal provides this.
– XML Query as a communication mechanism.

Slide 28: Mapping between Dublin Core and Science Metadata
Title
– Study: Name
Creator
– Study: Investigator: Name (Role is principal investigator)
Subject
– Topic: Keyword
Description
– Study: Study Information: Purpose
Publisher
– Investigation: Data Manager
Contributor
– Study: Investigator: Name; Investigation: Data Manager
Date
– Study: Study Information: Time
Resource Type
– Collection; or Dataset
Format
– Data Description: File Format
Resource Identifier
– Study: Study Id (whole study)
– Data Description: File: URI (for individual data files)
Source
– Data Description: Data sets: Related Data sets
– Related Material: Related work
Language
– Not covered in the current metadata format, but a simple extension
Relation
– Related Material: Related work
Coverage
– Data Description: Logical Description: Coverage
Rights Management
– Access Conditions
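The mapping above lends itself to a simple translation function. The Python sketch below follows the listed correspondences for a whole study; the input dictionary layout and its key names (`study`, `investigation`, `data_description` and so on) are assumptions made for the example, while the output keys are the fifteen Dublin Core element names.

```python
def to_dublin_core(meta: dict) -> dict:
    """Map a science-metadata record (hypothetical dict layout)
    to the fifteen Dublin Core elements for a whole study."""
    study = meta["study"]
    investigation = meta.get("investigation", {})
    data = meta.get("data_description", {})
    related = meta.get("related_material", [])
    investigators = study.get("investigators", [])
    data_manager = investigation.get("data_manager")

    # Contributor: all investigators plus the data manager, if any.
    contributors = [inv["name"] for inv in investigators]
    if data_manager:
        contributors.append(data_manager)

    return {
        "title": study["name"],
        "creator": [
            inv["name"] for inv in investigators
            if inv.get("role") == "principal investigator"
        ],
        "subject": meta.get("topic", []),
        "description": study.get("purpose"),
        "publisher": data_manager,
        "contributor": contributors,
        "date": study.get("time"),
        "type": "Collection",  # a whole study; single data sets would be "Dataset"
        "format": data.get("file_format"),
        "identifier": study["id"],
        "source": data.get("related_data_sets", []) + related,
        "language": None,  # not covered by the current metadata format
        "relation": related,
        "coverage": data.get("coverage"),
        "rights": meta.get("access_conditions"),
    }


# Placeholder record; every value here is invented for illustration.
example = {
    "topic": ["Chemistry"],
    "study": {
        "id": "study-001",
        "name": "example-study",
        "purpose": "Placeholder purpose.",
        "time": "2002-08-31",
        "investigators": [
            {"name": "A. Example", "role": "principal investigator"},
            {"name": "B. Example", "role": "co-investigator"},
        ],
    },
    "investigation": {"data_manager": "C. Example"},
    "data_description": {"file_format": "text", "related_data_sets": ["ds-0"]},
    "related_material": ["report-1"],
    "access_conditions": "restricted to named users",
}
dc = to_dublin_core(example)
```

A library portal that already understands Dublin Core could then search such records alongside conventional publications.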

Slide 29: Where are we?
Data Portal up and running:
– Being developed in the E-Science Centre in CLRC
– Science metadata proving very robust
– Trying to extend its use into other areas of science: materials science, environmental science
Beginning to approach the problem of integrating with electronic library resources.