Neil Chue Hong Project Manager, EPCC +44 131 650 5957 Data Services What, Why, How e-Research Meeting NeSC, 2 nd.

Presentation transcript:

Data Services: What, Why, How
Neil Chue Hong, Project Manager, EPCC
e-Research Meeting, NeSC, 2nd March 2005

e-Research within The University of Edinburgh
Overview
  – The difficulty with data
  – Data Services
  – Data Middleware
  – Data Repositories

The Data Deluge
Entering an age of data
  – Data Explosion
  – CERN: LHC will generate 1 GB/s = 10 PB/y
  – VLBA (NRAO) generates 1 GB/s today
  – Pixar generates 100 TB per movie
  – Storage getting cheaper
Data stored in many different ways
  – Data resources
  – Relational databases
  – XML databases / files
  – Result files
Need ways to facilitate
  – Data discovery
  – Data access
  – Data integration
Empower e-Business and e-Science
  – The Grid is a vehicle for achieving this
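
The slide equates 1 GB/s with 10 PB/y without saying how. One reading, which is an assumption here and not stated in the talk, is that data is recorded for roughly 10^7 seconds per year rather than around the clock. A minimal Python sketch of that arithmetic, using decimal units (1 GB = 10^9 bytes, 1 PB = 10^15 bytes):

```python
GB = 10**9   # bytes, decimal units
PB = 10**15  # bytes, decimal units

SECONDS_PER_CALENDAR_YEAR = 365 * 24 * 3600
# Assumption for illustration only: about 10^7 seconds of data taking per year.
ASSUMED_RUNNING_SECONDS = 10**7


def yearly_volume_pb(rate_bytes_per_s: int, seconds: int) -> float:
    """Return the volume in PB produced at a constant rate over `seconds`."""
    return rate_bytes_per_s * seconds / PB


# Under the assumed running time, 1 GB/s gives the slide's 10 PB/y.
print(yearly_volume_pb(1 * GB, ASSUMED_RUNNING_SECONDS))
# Running continuously for a calendar year would give roughly three times more.
print(yearly_volume_pb(1 * GB, SECONDS_PER_CALENDAR_YEAR))
```

Either way, the order of magnitude is what matters for the argument: tens of petabytes per year from a single source.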

What is e-Science?
Goal: to enable better research
Method: Invention and exploitation of advanced computational methods
  – to generate, curate and analyse research data
    – From experiments, observations and simulations
    – Quality management, preservation and reliable evidence
  – to develop and explore models and simulations
    – Computation and data at extreme scales
    – Trustworthy, economic, timely and relevant results
  – to enable dynamic distributed virtual organisations
    – Facilitate collaboration with resource sharing
    – Security, reliability, accountability, and manageability
Multiple, independently managed sources of data, each with its own time-varying structure
Creative researchers discover new knowledge by combining data from multiple sources

Composing Observations in Astronomy
No. & sizes of data sets as of mid-2002, grouped by wavelength
  – 12 waveband coverage of large areas of the sky
  – Total about 200 TB data
  – Doubling every 12 months
  – Largest catalogues near 1B objects
Data and images courtesy Alex Szalay, Johns Hopkins
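
To make "doubling every 12 months" concrete, the sketch below projects the total forward from the mid-2002 figure of about 200 TB. It assumes the doubling time stays constant, which the slide does not claim; the function name is illustrative.

```python
def projected_size_tb(initial_tb: float, years: float,
                      doubling_time_years: float = 1.0) -> float:
    """Project a data volume forward under a constant doubling time."""
    return initial_tb * 2 ** (years / doubling_time_years)


# Starting point from the slide: about 200 TB in mid-2002.
for years in (0, 1, 3, 5):
    print(years, projected_size_tb(200, years))
```

Under that assumption the collection is eight times larger after three years, which is why storage and access strategies that work today cannot simply be kept unchanged.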

Data Services: motives
Key to Integration of Scientific Methods
  – Publication and sharing of results
    – Primary data from observation, simulation & experiment
    – Encourages novel uses
    – Allows validation of methods and derivatives
    – Enables discovery by combining data collected independently
Key to Large-scale Collaboration
  – Economies: data production, publication & management
    – Sharing cost of storage, management and curation
    – Many researchers contributing increments of data
    – Pooling annotation leads to rapid incremental publication
  – Accommodates global distribution
    – Data & code travel faster and more cheaply
  – Accommodates temporal distribution
    – Researchers assemble data
    – Later (other) researchers access data

Data Services: challenges
Scale
  – Many sites, large collections, many uses
Longevity
  – Research requirements outlive technical decisions
Diversity
  – No one-size-fits-all solution will work
  – Primary Data, Data Products, Meta Data, Administrative data, …
Many Data Resources
  – Independently owned & managed
    – No common goals
    – No common design
    – Work hard for agreements on foundation types and ontologies
    – Autonomous decisions change data, structure, policy, …
  – Geographically distributed
and I haven't even mentioned security yet!

The Discovery Process
Choosing data sources
  – How do you find them?
  – How do they describe and advertise them?
  – Is the equivalent of Google possible?
Obtaining access to that data
  – Overcoming administrative barriers
  – Overcoming technical barriers
Understanding that data and extracting from multiple sources
  – The parts you care about for your research
Combining them using sophisticated models
  – The picture of reality in your head
Analysis on scales required by statistics
  – Coupling data access with computation
Repeated Processes
  – Examining variations, covering a set of candidates
  – Monitoring the emerging details

Small problems
Not just Grand Challenges!
  – Also the small problems
For instance:
  – What happens to data when a researcher leaves a team?
  – How can a research leader point to popular data when a new researcher joins?
  – How can you manage your data when you start to run out of local storage space?
  – How do I get my data from one format/database to another?
  – How do I combine my data with your data?
You need to manage your data: metadata

What is a data service?
An interface to a stored collection of data
  – e.g. Google and Amazon
  – web services
But the data could be:
  – replicated
  – shared
  – federated
  – virtual
  – incomplete
Don't care about the underlying representation
  – do care about the information it represents
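
As a rough illustration of "don't care about the underlying representation", the sketch below puts one interface in front of two different stores: a plain Python list and an in-memory SQLite table. The class names, the `find` method and the sample records are all hypothetical, invented for this example; they are not taken from any of the services discussed in the talk.

```python
from __future__ import annotations

import sqlite3
from abc import ABC, abstractmethod

# Illustrative sample records, not real observations.
RECORDS = [
    {"object": "M31", "waveband": "optical"},
    {"object": "M87", "waveband": "radio"},
    {"object": "M87", "waveband": "x-ray"},
]


class DataService(ABC):
    """Hypothetical interface: callers ask for information, not storage."""

    @abstractmethod
    def find(self, **criteria: str) -> list[dict]:
        """Return the records whose fields equal the given criteria."""


class InMemoryService(DataService):
    """Keeps the records in a Python list."""

    def __init__(self, records: list[dict]) -> None:
        self._records = [dict(r) for r in records]

    def find(self, **criteria: str) -> list[dict]:
        return [dict(r) for r in self._records
                if all(r.get(k) == v for k, v in criteria.items())]


class SqliteService(DataService):
    """Keeps the same records in a relational table."""

    _COLUMNS = {"object", "waveband"}

    def __init__(self, records: list[dict]) -> None:
        self._conn = sqlite3.connect(":memory:")
        self._conn.row_factory = sqlite3.Row
        self._conn.execute(
            "CREATE TABLE observations (object TEXT, waveband TEXT)")
        self._conn.executemany(
            "INSERT INTO observations VALUES (:object, :waveband)", records)

    def find(self, **criteria: str) -> list[dict]:
        unknown = set(criteria) - self._COLUMNS
        if unknown:
            raise ValueError(f"unknown fields: {sorted(unknown)}")
        # Column names are checked above; values are bound as parameters.
        where = " AND ".join(f"{k} = :{k}" for k in criteria) or "1"
        rows = self._conn.execute(
            "SELECT object, waveband FROM observations "
            f"WHERE {where} ORDER BY rowid", criteria)
        return [dict(row) for row in rows]


# The caller cannot tell which representation sits behind the interface.
for service in (InMemoryService(RECORDS), SqliteService(RECORDS)):
    print(type(service).__name__, service.find(object="M87"))
```

The same idea extends to the replicated, federated or virtual cases on the slide: the caller keeps the one interface while the implementation behind it changes.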

Examples of Data Services
Many Data Services and applications
  – Commercial databases
  – Web interfaces
  – Applications developed individually by groups and projects
Also many places to get hold of public data
  – Publications and citation servers
  – Results servers
Highlight a few of these
  – principally ones trying to bridge the gap between local and distributed
But… no such thing as a free lunch
  – Things are not yet Plug and Play
  – You will need to expend some effort to use these tools effectively

OGSA-DAI / DQP
Data Access and Integration / Distributed Query Processing
  – Provides a way to access and query heterogeneous, structured data resources
    – Relational databases
    – XML databases
    – files
  – Provides a framework for extending services
    – more smarts, closer to the data
  – Everything looks like a database
National Grid Service starting to host
  – both through OGSA-DAI and Oracle
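
The phrase "everything looks like a database" can be illustrated without any grid middleware. The sketch below is not OGSA-DAI code and uses none of its API; it only uses Python's standard `csv` and `sqlite3` modules to expose a flat file and a relational table through the same SQL query. The table names, the helper `load_csv_as_table` and the flux values are invented for illustration.

```python
import csv
import io
import sqlite3

# Stand-in for a results file on disk; the flux values are made up.
CSV_TEXT = "object,flux\nM31,4.2\nM87,7.9\n"


def load_csv_as_table(conn: sqlite3.Connection, name: str, text: str) -> None:
    """Expose CSV content as a table so it can be queried like a database.

    `name` and the CSV header are trusted here; a real service would
    validate them before building SQL from them.
    """
    reader = csv.DictReader(io.StringIO(text))
    columns = list(reader.fieldnames or [])
    conn.execute(f"CREATE TABLE {name} ({', '.join(columns)})")
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f"INSERT INTO {name} VALUES ({placeholders})",
        ([row[c] for c in columns] for row in reader),
    )


conn = sqlite3.connect(":memory:")

# An ordinary relational resource.
conn.execute(
    "CREATE TABLE catalogue (object TEXT PRIMARY KEY, constellation TEXT)")
conn.executemany(
    "INSERT INTO catalogue VALUES (?, ?)",
    [("M31", "Andromeda"), ("M87", "Virgo")],
)

# A file-based resource, presented through the same interface.
load_csv_as_table(conn, "measurements", CSV_TEXT)

# One query now spans both resources.
rows = conn.execute(
    "SELECT c.object, c.constellation, CAST(m.flux AS REAL) "
    "FROM catalogue AS c JOIN measurements AS m ON m.object = c.object "
    "ORDER BY c.object"
).fetchall()
print(rows)
```

In a distributed setting the two resources would sit at different sites, and a distributed query processor would decide where each part of the join runs; the sketch only shows the uniform view that the user sees.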

SRB
Storage Resource Broker
  – Provides a way to access data sets and resources based on their attributes and/or logical names rather than their names or physical locations
  – may be heterogeneous, distributed and/or replicated
  – Many different ways of connecting
  – Can connect SRB systems together
    – zoneSRB
  – Everything looks like a filesystem
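
Access by logical name and attributes, rather than by physical location, can be pictured as a small catalogue that maps each logical name to its metadata and its replicas. The sketch below is a toy model of that idea only; it is not the SRB API, and the `Catalogue` class, host names and paths are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """Metadata and physical replicas recorded for one logical name."""
    attributes: dict = field(default_factory=dict)
    replicas: list = field(default_factory=list)


class Catalogue:
    """Toy catalogue: users see logical names, never storage paths."""

    def __init__(self) -> None:
        self._entries = {}

    def register(self, logical_name: str, location: str, **attributes) -> None:
        """Record a replica, and optionally attributes, for a logical name."""
        entry = self._entries.setdefault(logical_name, Entry())
        entry.attributes.update(attributes)
        if location not in entry.replicas:
            entry.replicas.append(location)

    def locate(self, logical_name: str) -> list:
        """Return every physical location holding this logical data set."""
        return list(self._entries[logical_name].replicas)

    def find(self, **attributes) -> list:
        """Return the logical names whose attributes match all criteria."""
        return sorted(
            name for name, entry in self._entries.items()
            if all(entry.attributes.get(k) == v for k, v in attributes.items())
        )


catalogue = Catalogue()
# Hypothetical hosts and paths, for illustration only.
catalogue.register("survey/m31-optical", "srv-a.example.org:/disk3/f0017.fits",
                   waveband="optical", target="M31")
catalogue.register("survey/m31-optical", "srv-b.example.org:/tape/0042/f0017.fits")
catalogue.register("survey/m87-radio", "srv-b.example.org:/disk1/r0001.fits",
                   waveband="radio", target="M87")

print(catalogue.find(waveband="optical"))
print(catalogue.locate("survey/m31-optical"))
```

Because users only ever hold the logical name, replicas can be added, moved or retired without breaking anything that refers to the data.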

SRM and more
Storage Resource Managers
  – a joint effort between a number of institutions
    – EU DataGrid/CERN, Fermilab, LBNL, JL
  – to define a standardised interface to Storage Resource Managers so that different implementations can work together
  – principally between physics communities, extending further now
Many other examples of data middleware
  – Replication management and location: RLS, QCDGrid
  – Many datagrids: SciDAC, Gfarm
  – GridFTP for efficient transfer
  – Packaged software: Virtual Data Toolkit

EDINA and friends
EDINA
  – Offers the UK tertiary education and research community networked access to a library of data, information and research resources, e.g. geographical data
Digital Curation Centre
  – Supports UK institutions in storing, managing and preserving these data to ensure their enhancement and their continuing long-term use
Other national data centres:
  – MIMAS, UKDA, CCLRC DataPortal…

Summary
Data is important to research
  – across all disciplines
There is already a large amount of data
  – but it's sometimes difficult to find and bring together
Data Services are built to standards
  – which define particular functionality
Data Services should be composable
  – so that it is easier to work with data
There is already software out there
  – so it is possible to evaluate against your requirements