e-Science and the Grid – Data, Information and Knowledge
Tony Hey, Director of UK e-Science Core Programme

Licklider's Vision
"Lick had this concept – all of the stuff linked together throughout the world, that you can use a remote computer, get data from a remote computer, or use lots of computers in your job."
Larry Roberts, Principal Architect of the ARPANET

A Definition of e-Science
"e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it."
John Taylor, Director General of Research Councils, Office of Science and Technology
The purpose of the e-Science initiative is to allow scientists to do faster, different, better research.

The e-Science Paradigm
- The Integrative Biology Project involves the University of Oxford (and others) in the UK and the University of Auckland in New Zealand
- Models of the electrical behaviour of heart cells developed by Denis Noble's team in Oxford
- Mechanical models of the beating heart developed by Peter Hunter's group in Auckland
- Researchers need to be able to easily build a secure Virtual Organisation allowing access to each group's resources
- Will enable researchers to do different science

e-Infrastructure/Cyberinfrastructure for Research: The Virtual Laboratory
[Diagram components: Group A and Group B, each with private resources; shared resources; generic services; common fabric]

The Global Grid = A set of core middleware services running on top of Global Terabit Research Networks

The Grid Vision of Foster, Kesselman and Tuecke
- The Grid is a software infrastructure that enables flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions and resources
- Includes computational systems and data storage resources and specialized facilities
- The long-term goal for Grid middleware infrastructure is to allow scientists to build transient Virtual Organisations routinely

RCUK e-Science Funding
First Phase: 2001-2004
- Application Projects: £74M, all areas of science and engineering
- Core Programme
  - £15M Research infrastructure
  - £20M Collaborative industrial projects
Second Phase: 2003-2006
- Application Projects: £96M, all areas of science and engineering
- Core Programme
  - £16M Research infrastructure
  - DTI Technology Fund

Some Example e-Science Projects
- Particle Physics: global sharing of data and computation
- Astronomy: Virtual Observatory for multi-wavelength astrophysics
- Chemistry: remote control of equipment and electronic logbooks
- Engineering: industrial healthcare and virtual organisations
- Bioinformatics: data integration, knowledge discovery and workflow
- Healthcare: sharing normalized mammograms

CERN Users in the World: A Global VO
- Europe: 267 institutes, 4603 users
- Elsewhere: 208 institutes, 1632 users

Powering the Virtual Universe
(Edinburgh, Belfast, Cambridge, Leicester, London, Manchester, RAL)
[Image: multi-wavelength views of the jet in M87, from top to bottom: Chandra X-ray, HST optical, Gemini mid-IR, VLA radio]

Comb-e-Chem Project
[Diagram components: X-Ray e-Lab, Diffractometer, Video, Analysis, Properties, Simulation, Properties e-Lab, Grid Middleware, Structures Database]

DAME Project
[Diagram components: in-flight data, airline, maintenance centre, ground station, global network (e.g. SITA), Internet, pager, DS&S Engine Health Center, data centre]

myGrid Project
- Imminent deluge of data
- Highly heterogeneous
- Highly complex and inter-related
- Convergence of data and literature archives

Nucleotide Annotation Workflows (Discovery Net Project)
- Download sequence from Reference Server
- Execute distributed annotation workflow against NCBI, EMBL, TIGR, SNP, InterPro, SMART, SWISS-PROT, GO and KEGG
- Save to Distributed Annotation Server
- Interactive Editor & Visualisation
- Done by hand: 1800 clicks, 500 Web accesses, 200 copy/paste operations, 3 weeks of work
- Done as a workflow: 1 workflow and a few seconds of execution
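The slide contrasts roughly three weeks of manual clicking and copying with a single automated workflow. As a minimal sketch of that idea, assuming each remote service can be wrapped as a plain Python function, the three steps (download, annotate against several sources, save) can be chained as below. `run_annotation_workflow` and the stub services are hypothetical; this is not Discovery Net's actual API, and real annotators would wrap services such as NCBI or InterPro.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# A remote annotation service: takes a sequence, returns annotation strings.
Annotator = Callable[[str], List[str]]


def run_annotation_workflow(
    accession: str,
    fetch: Callable[[str], str],
    annotators: Dict[str, Annotator],
    save: Callable[[str, Dict[str, List[str]]], None],
) -> Dict[str, List[str]]:
    """Download a sequence, annotate it against several sources in parallel, then save."""
    # Step 1: download the sequence from the reference server
    sequence = fetch(accession)
    # Step 2: submit the sequence to every annotation source concurrently
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn, sequence) for name, fn in annotators.items()}
        annotations = {name: fut.result() for name, fut in futures.items()}
    # Step 3: store the merged annotations (e.g. on an annotation server)
    save(accession, annotations)
    return annotations


if __name__ == "__main__":
    # Hypothetical stand-in services, used only to exercise the pipeline
    store: Dict[str, Dict[str, List[str]]] = {}
    result = run_annotation_workflow(
        "SEQ001",
        fetch=lambda acc: "ACGTGC",
        annotators={
            "length": lambda s: [f"length={len(s)}"],
            "gc": lambda s: [f"gc={s.count('G') + s.count('C')}"],
        },
        save=lambda acc, ann: store.__setitem__(acc, ann),
    )
    print(result)
```

Passing the services in as functions keeps the workflow definition separate from the services it orchestrates, so sources can be added or swapped without touching the pipeline.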

eDiaMoND Project
- Mammograms have different appearances, depending on image settings and acquisition systems
- Standard Mammo Format
- Temporal mammography
- Computer Aided Detection
- 3D View

UK e-Science Grid
[Map of sites: Cambridge, Newcastle, Edinburgh, Oxford, Glasgow, Manchester, Cardiff, Southampton, London, Belfast, DL, RAL, Hinxton]

A Status Report on UK e-Science
- An exciting portfolio of Research Council e-Science projects
  - Beginning to see e-Science infrastructure deliver some early wins in several areas
  - TeraGyroid success at SC03: heroic achievement
  - Astronomy, Chemistry, Bioinformatics, Engineering, Environment, Healthcare, ...
- The UK is unique in having a strong collaborative industrial component
  - Nearly 80 UK companies contributing over £30M
  - Engineering, Pharmaceutical, Petrochemical, IT companies, Commerce, Media, ...

Identifiable UK Focus
- Data Access and Integration
  - OGSA-DAI and DAIT project
- Grid Data Services
  - Workflow, Provenance, Notification
  - Distributed Query, Knowledge Management
- Data Curation and Data Handling
  - Digital Curation Centre
- Security, AA and all that
  - Digital Certificates and Single Sign-On
  - Federated Shibboleth framework for universities

Metadata & Ontologies
- Metadata: computationally accessible data about the services
- Ontologies: the shared and common understanding of a domain
  - A vocabulary of terms
  - Definition of what those terms mean
  - A shared understanding for people and machines
  - Usually organised into a taxonomy
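To make the ontology bullets concrete, here is a small sketch of a vocabulary in which each term carries a human-readable definition and a link to a broader term, so that the terms form a taxonomy a program can reason over. The `Term` class and the term names are hypothetical, loosely modelled on the raw/derived/results data of the e-Bank project; real ontologies would normally be written in a standard language such as RDFS or OWL rather than ad hoc Python.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Term:
    name: str
    definition: str               # what the term means, readable by people
    parent: Optional[str] = None  # broader term; None marks a root of the taxonomy


# Hypothetical mini-vocabulary loosely based on the e-Bank data model
VOCAB: Dict[str, Term] = {
    t.name: t
    for t in [
        Term("Dataset", "Data files produced by one investigation"),
        Term("RawData", "Data as recorded by the instrument", parent="Dataset"),
        Term("DerivedData", "Data processed from raw data", parent="Dataset"),
        Term("Results", "Final outputs of an analysis", parent="Dataset"),
        Term("CIF", "Crystallographic Information File", parent="Results"),
    ]
}


def ancestors(term: str, vocab: Dict[str, Term]) -> List[str]:
    """Follow parent links upward, nearest broader term first."""
    chain: List[str] = []
    parent = vocab[term].parent
    while parent is not None:
        chain.append(parent)
        parent = vocab[parent].parent
    return chain


def is_a(term: str, broader: str, vocab: Dict[str, Term]) -> bool:
    """True if `term` is `broader` or sits anywhere beneath it in the taxonomy."""
    return term == broader or broader in ancestors(term, vocab)
```

Because the broader-term links are explicit data, a machine can answer "is a CIF a kind of Dataset?" without a person having to interpret the definitions.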

The Semantic Grid: Data to Knowledge
[Diagram axes: computational complexity, data complexity]

JISC Committee for Support of Research (JCSR)
- Ensure JISC addresses the needs of the HE research community
- Recurrent budget of £3M p.a.
- Strategy to co-fund some of the JCSR activities with Research Councils
- Report on e-Science Data Curation available
- JISC emphasis on the D of R&D and on Best Practice, Training and Services

JISC/JCSR e-Science Support
- Digital Curation Centre
  - Joint funding with e-Science Core Programme
- The e-Bank Project
  - Uses Comb-e-Chem Project as exemplar
- Text Mining Centre
  - Led by UMIST

2.4 Petabytes Today

Digital Curation Centre (DCC)
- In the next 5 years e-Science projects will produce more scientific data than has been collected in the whole of human history
- In 20 years we can guarantee that the operating system, the spreadsheet program and the hardware used to store the data will not exist
- Research curation technologies and best practice
- Need to liaise closely with individual research communities, data archives and libraries
- Edinburgh, with Glasgow, CLRC and UKOLN, selected as site of DCC

Terminology: Digital Curation
- Digital Curation = Digital Preservation and Data Curation
- Actions needed to maintain and utilise digital data and research results over the entire life-cycle
  - For current and future generations of users
- Digital Preservation
  - Long-run technological/legal accessibility and usability
- Data curation in science
  - Maintenance of a body of trusted data to represent the current state of knowledge in an area of research

Digital Preservation: The issues
- Long-term preservation
  - Preserving the bits for a long time (digital objects)
  - Preserving the interpretation (emulation vs. migration)
- Political/social
  - Appraisal: what to keep?
  - Responsibility: who should keep it?
  - Legal: can you keep it?
- Size
  - Storage of/access to petabytes of regular data
  - Grid issues
- Finding and extracting metadata
  - Descriptions of digital objects
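One routine way to address "preserving the bits" is fixity checking: record a cryptographic checksum when data is ingested and recompute it during periodic audits. The sketch below assumes files on a local file system and uses SHA-256 from Python's standard `hashlib`; the function names are illustrative, not part of any particular archive's tooling. It detects silent corruption or loss only; preserving the interpretation (emulation vs. migration) is a separate problem.

```python
import hashlib
from pathlib import Path
from typing import Dict, Iterable, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large data files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_manifest(paths: Iterable[Path]) -> Dict[str, str]:
    """Record a checksum for each file at ingest time."""
    return {str(p): sha256_of(p) for p in paths}


def audit(manifest: Dict[str, str]) -> List[str]:
    """Return files that are missing or whose bits no longer match the manifest."""
    damaged = []
    for name, expected in manifest.items():
        p = Path(name)
        if not p.is_file() or sha256_of(p) != expected:
            damaged.append(name)
    return damaged
```

In practice the manifest itself would be stored separately from the data (and ideally replicated), since a checksum kept alongside a damaged file may be damaged too.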

Data Publishing: The Background
- In some areas, notably biology, databases are replacing (paper) publications as a medium of communication
  - These databases are built and maintained with a great deal of human effort
  - They often do not contain source experimental data, sometimes just annotation/metadata
  - They borrow extensively from, and refer to, other databases
  - You are now judged by your databases as well as your (paper) publications!
  - Upwards of 1000 public databases in genetics

Data Publishing: The issues
- Data integration
  - Tying together data from various sources
- Annotation
  - Adding comments/observations to existing data
  - Becoming a new form of communication among scientists
- Provenance
  - Where did this data come from?
- Exporting/publishing in agreed formats
  - To other programs as well as people
- Security
  - Specifying/enforcing read/write access to parts of your data
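Provenance questions such as "Where did this data come from?" can be answered mechanically if every record keeps links to the records it was derived from. A minimal sketch, assuming an in-memory store and hypothetical `Record` fields (`origin`, `derived_from`); a production system would persist this derivation graph rather than hold it in a dictionary.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Record:
    id: str
    origin: str                                            # instrument, database or process that produced it
    derived_from: List[str] = field(default_factory=list)  # ids of the records it was computed from


def lineage(record_id: str, store: Dict[str, Record]) -> List[str]:
    """Every upstream record id, found by walking derived_from links depth-first."""
    seen: List[str] = []
    stack = list(store[record_id].derived_from)
    while stack:
        rid = stack.pop()
        if rid in seen:
            continue  # shared ancestors are reported once
        seen.append(rid)
        stack.extend(store[rid].derived_from)
    return seen


def primary_sources(record_id: str, store: Dict[str, Record]) -> List[str]:
    """Answer 'where did this data come from?': upstream records with no parents."""
    return sorted(rid for rid in lineage(record_id, store) if not store[rid].derived_from)
```

For example, an annotation derived from a refined structure (itself derived from raw diffractometer data) and from a public database entry traces back to both primary sources, which is exactly what a reader checking the annotation needs to see.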

Edinburgh has research positions in databases, digital curation, XML, web technology and fundamentals.
- Top-rated department
- World-class database group
- Good connections with logical foundations, scientific DBs, distributed computation (Grid)
- Edinburgh is a great place to live!
- Contact Peter Buneman

The e-Bank JISC e-Science Project
- School of Chemistry and School of Electronics and Computer Science, University of Southampton
- UKOLN, University of Bath
- Psigate, University of Manchester

or Referee on demand?
- High data throughput
- Any given data set is not that important
- Cannot justify a full referee process for each
- Better to make data available rather than simply leave it alone
- Need to have access to raw data to allow users to check

Goals of e-Bank Project
- Provide a self-archive of results plus the raw and analysed data
- Provide a route to disseminate these results
- Links from traditionally published work provide the provenance of the work
- Disseminate for public review: raw data provided so that users can check for themselves
- Avoid the publication bottleneck but still provide the quality check

Crystallographic e-Prints
[Diagram components: EPrint (Local), EBank (World), Journal Publication, Data Holding, Investigation, Raw, Derived, Results, CIF, Structure Report, Dataset (contains datafiles), Report (EPrint), EBank Report]

Crystallographic e-Prints
Note: this is a fully rotatable 3D image of the molecule.

Direct access to data: derived data
Links to download the raw and processed data.

Direct access to data: raw data
Raw data sets can be very large; these are stored at the Atlas Datastore (using an SRB server) and made available via a URI resolver.
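A URI resolver lets a dataset keep one stable, citable identifier while the bytes behind it move between storage systems. The sketch below is a hypothetical resolver written as a plain WSGI application; the identifier scheme and the `datastore.example.org` location are invented for illustration and say nothing about how the Atlas Datastore or SRB are actually addressed.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# Hypothetical table: stable identifier -> current storage location.
# Only this table changes when data migrates; published identifiers stay the same.
LOCATIONS: Dict[str, str] = {
    "ebank/raw/0001": "https://datastore.example.org/ebank/raw/0001.tar",
}

StartResponse = Callable[[str, List[Tuple[str, str]]], object]


def resolve(identifier: str) -> Optional[str]:
    """Look up the current location for an identifier, or None if unknown."""
    return LOCATIONS.get(identifier)


def resolver_app(environ: dict, start_response: StartResponse) -> Iterable[bytes]:
    """Minimal WSGI app: GET /<identifier> answers with a 302 redirect to the data."""
    identifier = environ.get("PATH_INFO", "").lstrip("/")
    target = resolve(identifier)
    if target is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"unknown identifier\n"]
    start_response("302 Found", [("Location", target)])
    return [b""]
```

To try it locally, `wsgiref.simple_server.make_server("localhost", 8000, resolver_app).serve_forever()` serves the app on port 8000. When raw data migrates to new storage, only the `LOCATIONS` entry changes; identifiers already cited from e-prints keep working, and the large files are never copied through the repository itself.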

Moving on from Crystallography
- Crystallography only a start
  - Chosen due to suitability of data
  - International agreement on representation of much of the data
- Next stage: spectroscopic data
  - Interest of several instrument manufacturers
  - Again use international standards

e-Bank: Some Comments
- Data as well as traditional bibliographic information is made available via an OAI interface
- Can construct high-level search on data: aggregate data from many e-print systems
- Build new data services
- Will make provision of real spectra (rather than very reduced summaries) for chemistry publications
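The slide does not name the protocol version, but the OAI interface exposed by e-print archives is normally OAI-PMH 2.0. Assuming that, the sketch below shows the two halves of a simple harvester: building a `ListRecords` request (with `resumptionToken` paging) and pulling record identifiers and Dublin Core titles out of the response. The endpoint URL and the sample response are hypothetical, and the HTTP fetch itself is left out.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc",
                     resumption_token: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request; follow-up pages send only the token."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text: str) -> Tuple[List[Tuple[str, List[str]]], Optional[str]]:
    """Return (identifier, Dublin Core titles) per record plus the next resumption token."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        ident = rec.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        titles = [t.text or "" for t in rec.iterfind(".//dc:title", NS)]
        records.append((ident, titles))
    # An empty (or absent) resumptionToken means there are no further pages
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS)
    return records, (token or None)


# Hypothetical response from an e-print archive, trimmed to the elements parsed above
EXAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:ebank.example.org:42</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example crystal structure record</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""
```

An aggregator would loop over many archives, requesting pages until `parse_list_records` returns `None` as the token, which is how high-level search across many e-print systems becomes possible.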

Entire e-Science Cycle
Encompassing experimentation, analysis, publication, research, learning
[Diagram components: Grid; e-Scientists; Institutional Archive; Local Web; Publisher Holdings; Digital Library; Graduate Students; Undergraduate Students; Virtual Learning Environment; e-Experimentation; Technical Reports; Reprints; Peer-Reviewed Journal & Conference Papers; Preprints & Metadata; Certified Experimental Results & Analyses; Data, Metadata & Ontologies]

JCSR Text Mining Centre
- Initial focus is the biology/biomedicine domain
  - Growth of biomedical knowledge means users need new tools to deal with an increasingly large body of biomedical articles
- Attempt to discover new, previously unknown information by applying techniques from natural language processing, data mining and information retrieval
- Develop prototype service for academia and industry
- UMIST/University of Manchester selected as Centre
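As an illustration of "discovering previously unknown information" from literature, the sketch below implements a toy version of Swanson's ABC model of literature-based discovery: if term A co-occurs with B, and B with C, but A never appears alongside C, then C is a candidate hidden link. This is a swapped-in textbook technique, not a description of the Centre's own methods; tokenisation is deliberately naive and vocabulary terms are assumed to be lowercase single words.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Set


def cooccurrence(docs: Iterable[str], vocabulary: Set[str]) -> Dict[str, Set[str]]:
    """Link two vocabulary terms whenever they appear in the same document."""
    links: Dict[str, Set[str]] = defaultdict(set)
    for text in docs:
        words = set(text.lower().split())  # naive whitespace tokenisation, no stemming
        present = sorted(vocabulary & words)
        for a, b in combinations(present, 2):
            links[a].add(b)
            links[b].add(a)
    return links


def abc_candidates(a: str, links: Dict[str, Set[str]]) -> Set[str]:
    """Terms C reached from A via some intermediate B, but never co-mentioned with A."""
    direct = links.get(a, set())
    indirect: Set[str] = set()
    for b in direct:
        indirect |= links.get(b, set())
    return indirect - direct - {a}
```

With two invented sentences echoing Swanson's fish oil and Raynaud's example, "Fish oil lowers blood viscosity" and "Raynaud patients show raised blood viscosity", and the vocabulary {"fish", "viscosity", "raynaud"}, `abc_candidates("fish", ...)` returns {"raynaud"}: a link that no single document states.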

Grids in Education?
- Exploiting e-Science Grids whose resources can be adapted for use in education
  - Opportunity to make education more real and to give students an idea of what scientific research is like
- Support the teachers and learners with Community Grids
  - Heterogeneous community with teachers, learners, parents, employers, publishers, informal education, university staff, ...
- Education Grid as a Grid of Grids?

Education as a Grid of Grids (with thanks to Geoffrey Fox)
[Diagram components: Education Grid; Teacher Educator Grids; Informal Education (Museum) Grid; Student/Parent ... Community Grid; Science Grids (Bioinformatics, Earth Science, ...); a typical Science Grid service, such as a research database or simulation, transformed by a Grid filter into a form suitable for education; Learning Management Grid; Publisher Grid; Campus or Enterprise Administrative Grid; Digital Library Grid]

The JISC
[Diagram layers: Communities; Portals; Applications; Content; Metadata & Delivery tools; Finding/Access tools; e-Learning, Digital libraries, e-Science; Internet; AAA Services]

MIT DSpace Vision
Much of the material produced by faculty, such as datasets, experimental results and rich media data as well as more conventional document-based material (e.g. articles and reports), is housed on an individual's hard drive or department Web server. Such material is often lost forever as faculty and departments change over time.

A Definition of e-Research?
The invention and exploitation of advanced IT
- to generate, curate and analyse research data
- to develop and explore models and simulations
- to enable dynamic distributed virtual organisations

Acknowledgements
With special thanks to Peter Buneman, Peter Burnhill, Jeremy Frey, David Gavaghan, Carole Goble and Liz Lyon