Data and storage services on the NGS Mike Mineter Training Outreach and Education

Slides:



Advertisements
Similar presentations
The Quantum Chromodynamics Grid James Perry, Andrew Jackson, Matthew Egbert, Stephen Booth, Lorna Smith EPCC, The University Of Edinburgh.
Advertisements

Abstraction Layers Why do we need them? –Protection against change Where in the hourglass do we put them? –Computer Scientist perspective Expose low-level.
The National Grid Service Mike Mineter.
The Storage Resource Broker and.
Neil Chue Hong Project Manager, EPCC Data Services What, Why, How e-Research Meeting NeSC, 2 nd.
National e-Science Centre Glasgow e-Science Hub Opening: Remarks NeSCs Role Prof. Malcolm Atkinson Director 17 th September 2003.
Data services on the NGS.
The National Grid Service and OGSA-DAI Mike Mineter
Peter Berrisford RAL – Data Management Group SRB Services.
Kensington Oracle Edition: Open Discovery Workflow Meets Oracle 10g Professor Yike Guo.
ASCR Data Science Centers Infrastructure Demonstration S. Canon, N. Desai, M. Ernst, K. Kleese-Van Dam, G. Shipman, B. Tierney.
MTA SZTAKI Hungarian Academy of Sciences Grid Computing Course Porto, January Introduction to Grid portals Gergely Sipos
Porto, January Grid Computing Course Summary of day 2.
Web-based Portal for Discovery, Retrieval and Visualization of Earth Science Datasets in Grid Environment Zhenping (Jane) Liu.
QCDgrid Technology James Perry, George Beckett, Lorna Smith EPCC, The University Of Edinburgh.
Managing the Record of Research At the Smithsonian Using SIdora SAA Research Forum August 12, 2014.
Research Data Management At the Smithsonian Using SIdora Nano Tech Working Group May 15, 2014.
Managing Research Data – The Organisational Challenge at Oxford James A J Wilson Friday 6 th December,
KNMI Applications on Testbed 1 …and other activities.
INFSO-RI Enabling Grids for E-sciencE gLite Data Management Services - Overview Mike Mineter National e-Science Centre, Edinburgh.
David Adams ATLAS ATLAS Distributed Analysis David Adams BNL March 18, 2004 ATLAS Software Workshop Grid session.
QCDGrid Progress James Perry, Andrew Jackson, Stephen Booth, Lorna Smith EPCC, The University Of Edinburgh.
The National Grid Service Guy Warner.
INFSO-RI Enabling Grids for E-sciencE Supporting legacy code applications on EGEE VOs by GEMLCA and the P-GRADE portal P. Kacsuk*,
Grid Execution Management for Legacy Code Applications Grid Enabling Legacy Code Applications Tamas Kiss Centre for Parallel.
Web: Minimal Metadata for Data Services Through DIALOGUE Neil Chue Hong AHM2007.
National Partnership for Advanced Computational Infrastructure San Diego Supercomputer Center Persistent Management of Distributed Data Reagan W. Moore.
June 24-25, 2008 Regional Grid Training, University of Belgrade, Serbia Introduction to gLite gLite Basic Services Antun Balaž SCL, Institute of Physics.
SEEK Welcome Malcolm Atkinson Director 12 th May 2004.
OGSA-DAI.
10/24/09CK The Open Ontology Repository Initiative: Requirements and Research Challenges Ken Baclawski Todd Schneider.
MTA SZTAKI Hungarian Academy of Sciences Introduction to Grid portals Gergely Sipos
Next Steps: becoming users of the NGS Mike Mineter
Technical Update 2008 Sandy Payette, Executive Director Eddie Shin, Senior Developer April 3, 2008 Open Repositories 2008, Fedora User Group.
Campus grids: e-Infrastructure within a University Mike Mineter National e-Science Centre 14 February 2006.
Next Steps.
Creating and running an application.
The National Grid Service Mike Mineter.
EGEE-III INFSO-RI Enabling Grids for E-sciencE Ricardo Rocha CERN (IT/GS) EGEE’08, September 2008, Istanbul, TURKEY Experiment.
1 e-Science AHM st Aug – 3 rd Sept 2004 Nottingham Distributed Storage management using SRB on UK National Grid Service Manandhar A, Haines K,
Introduction to The Storage Resource.
Research Data Management At the Smithsonian Using Sidora CNI December 10, 2013.
EGEE-0 / LCG-2 middleware Practical.
Securing the Grid & other Middleware Challenges Ian Foster Mathematics and Computer Science Division Argonne National Laboratory and Department of Computer.
AHM04: Sep 2004 Nottingham CCLRC e-Science Centre eMinerals: Environment from the Molecular Level Managing simulation data Lisa Blanshard e- Science Data.
Data and storage services on the NGS.
INFSO-RI Enabling Grids for E-sciencE The EGEE Project Owen Appleton EGEE Dissemination Officer CERN, Switzerland Danish Grid Forum.
Rob Allan Daresbury Laboratory NW-GRID Training Event 26 th January 2007 Next Steps R.J. Allan CCLRC Daresbury Laboratory.
MAPS Middleware Action Plan & Strategy Project Middleware Action Plan & Strategy Project (MAPS) Patricia McMillan, Project Manager.
The National Grid Service Mike Mineter.
The National Grid Service Mike Mineter
NeST: Network Storage John Bent, Venkateshwaran V Miron Livny, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau.
Copyright 2007, Information Builders. Slide 1 iWay Web Services and WebFOCUS Consumption Michael Florkowski Information Builders.
1 A Scalable Distributed Data Management System for ATLAS David Cameron CERN CHEP 2006 Mumbai, India.
The Storage Resource Broker and.
Monitoring Guy Warner NeSC Training.
EGEE-II INFSO-RI Enabling Grids for E-sciencE Data management in EGEE.
ETICS An Environment for Distributed Software Development in Aerospace Applications SpaceTransfer09 Hannover Messe, April 2009.
EGEE-II INFSO-RI Enabling Grids for E-sciencE Overview of gLite, the EGEE middleware Mike Mineter Training Outreach Education National.
Campus grids: e-Infrastructure within a University Mike Mineter National e-Science Centre 22 February 2006.
OGSA-DAI.
InSilicoLab – Grid Environment for Supporting Numerical Experiments in Chemistry Joanna Kocot, Daniel Harężlak, Klemens Noga, Mariusz Sterzel, Tomasz Szepieniec.
EGI-InSPIRE RI EGI-InSPIRE EGI-InSPIRE RI EGI Services for Distributed e-Infrastructure Access Tiziana Ferrari on behalf.
Kathleen Shearer Data management: The new frontier for libraries.
(on behalf of the POOL team)
Data services on the NGS
GGF OGSA-WG, Data Use Cases Peter Kunszt Middleware Activity, Data Management Cluster EGEE is a project funded by the European.
Induction course Mike Mineter Training Outreach and Education
Data services on the NGS
Data services in gLite “s” gLite and LCG.
Presentation transcript:

Data and storage services on the NGS Mike Mineter Training Outreach and Education

2 Policy for re-use This presentation can be re-used for academic purposes. However if you do so then please let training- know. We need to gather statistics of re-use: no. of events, number of people trained. Thank you!!training-

3 Data and storage Yesterday: how to run jobs BUT –Can’t have computation without data! AND BUT –It’s the data that drives research –Data → Information → Knowledge AND YEAH BUT NO BUT –Can’t have digital data without computation!

4 Astronomy No. & sizes of data sets as of mid-2002, grouped by wavelength 12 waveband coverage of large areas of the sky Total about 200 TB data Doubling every 12 months Largest catalogues near 1Billion objects Data and images courtesy Alex Szalay, John Hopkins University

5 Biomedical PDB 39,853 Protein structures EMBL DB 111,416,302,701 nucleotides PDB - Protein Data Bank EMBL Nucleotide Sequence Database

6 Data, Data Everywhere Entering an age of data: data explosion – growing volumes –CERN: LHC will generate 1GB/s = 10PB/y Data stored in many different ways – growing diversity –Relational databases –XML databases –Flat files From –Legacy datasets –Simulations –New measurements, sources, analyses…. Need integration across personal, public, licensed, and project datasets

7 So how do we…. Mine data riches for nuggets of information? –Discovery depends on insights –Unpredictable or unexpected use of data Facilitate –Data discovery –Data understanding –Data access –Data integration Empower e-Business and e-Research A Grid is a vehicle for achieving this

8 A Grid and data Grid: diverse services sharing AuthN and AuthZ across admin domains, including: –Data storage Controlled AA –Data transfer Between stores, stores and compute nodes –Data catalogues, registries,… How can I find the data in the resources that I can access? Key issue: –Move data to where computation will happen? (Yesterday – staging files to compute nodes) –Move computation to be close to data?

9 Move computation to the data Assumption: code size << data size –Minimise data transport Computation hosted close to storage –I know where data are – stage or preinstall executables –Later today – run workflow “very close” to data

10 Meta-data: describing data Choosing data sources –How do you find them? –How are they described and advertised? Meta-data is required describing –Content –Provenance –Structure –Types, Formats & Ontologies –Operations available –Access requirements –Quality of service Necessary to find, understand, access, automate, assess quality and manage But very hard to create & maintain: Incentives & tools Solutions include: Relational databases, repositories (like Fedora)

11 Files on Grids Simple data files on grid-specific storage Middleware supporting –Replica files to be close to where you want computation For resilience –Catalogue: maps logical name to physical storage device/file –Virtual filesystems, POSIX-like I/O –Services provided: storage, transfer, catalogue that maps logical filenames to replicas. Solutions include –gLite (EGEE): data service –Globus: Data Replication Service –Storage Resource Broker – deployed by NGS

12 Files on the NGS Accounts –Not reliable as long-term storage –Pool of accounts – undefined persistence Useful with care! E.g. you are likely to see the fjles you used yesterday Storage Resource Broker –THE way to hold files –Accessible across all nodes, and desktop/workstations….

13 Oracle and the NGS The requirement for data hosting will grow Oracle database: for both users and services offered by the NGS. Additional application needed after joining the NGS

14 OGSA-DAI DAI – Data Access and Integration Structured data you don’t want in Oracle –E.g. It would be a replica of existing service –Structured data: RDBMS, XML databases,… –Files on project’s filesystems –Data that may already have other user communities not using a Grid Require extendable middleware tools to support –Computation near to data –Controlled exposure of data without replication Basis for integration and federation

15 Summary StorageTransferCatalogue Basic functions NGS services Oracle Storage Resource Broker These basic functions are provides by different services for different types of data OGSA-DAIGridFTP