Supporting the Scientific Data Lifecycle | ISGC 2015, Taipei | Patrick Fuhrmann | 17 March 2015 | 1 Patrick Fuhrmann On behave of the project team Supporting.

Slides:



Advertisements
Similar presentations
Data Storage Solutions Module 1.2. Data Storage Solutions Upon completion of this module, you will be able to: List the common storage media and solutions.
Advertisements

Operating System.
Cloud Storage in Czech Republic Czech national Cloud Storage and Data Repository project.
SLA-Oriented Resource Provisioning for Cloud Computing
Cloud Computing to Satisfy Peak Capacity Needs Case Study.
HPC USER FORUM I/O PANEL April 2009 Roanoke, VA Panel questions: 1 response per question Limit length to 1 slide.
Clouds C. Vuerli Contributed by Zsolt Nemeth. As it started.
An Approach to Secure Cloud Computing Architectures By Y. Serge Joseph FAU security Group February 24th, 2011.
Virtualization and the Cloud
Chapter 21: Mobile Virtualization Infrastracture and Related Security Issues Guide to Computer Network Security.
Cloud computing Tahani aljehani.
CERN IT Department CH-1211 Genève 23 Switzerland t Next generation of virtual infrastructure with Hyper-V Michal Kwiatek, Juraj Sucik, Rafal.
Hands-On Microsoft Windows Server 2008 Chapter 1 Introduction to Windows Server 2008.
Cloud Computing.
F Run II Experiments and the Grid Amber Boehnlein Fermilab September 16, 2005.
Hands-On Microsoft Windows Server 2008
Cloud Computing Saneel Bidaye uni-slb2181. What is Cloud Computing? Cloud Computing refers to both the applications delivered as services over the Internet.
 Cloud computing  Workflow  Workflow lifecycle  Workflow design  Workflow tools : xcp, eucalyptus, open nebula.
DMF Configuration for JCU HPC Dr. Wayne Mallett Systems Manager James Cook University.
Open Science Grid Software Stack, Virtual Data Toolkit and Interoperability Activities D. Olson, LBNL for the OSG International.
Connecting OurGrid & GridSAM A Short Overview. Content Goals OurGrid: architecture overview OurGrid: short overview GridSAM: short overview GridSAM: example.
Cloud Computing. What is Cloud Computing? Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable.
A Lightweight Platform for Integration of Resource Limited Devices into Pervasive Grids Stavros Isaiadis and Vladimir Getov University of Westminster
Profiling Grid Data Transfer Protocols and Servers George Kola, Tevfik Kosar and Miron Livny University of Wisconsin-Madison USA.
M.A.Doman Short video intro Model for enabling the delivery of computing as a SERVICE.
NAS Last Update Copyright Kenneth N. Chipps Ph.D. 1.
Cloud Computing in NASA Missions Dan Whorton CTO, Stinger Ghaffarian Technologies June 25, 2010 All material in RED will be updated.
Large Scale Test of a storage solution based on an Industry Standard Michael Ernst Brookhaven National Laboratory ADC Retreat Naples, Italy February 2,
Using Virtual Servers for the CERN Windows infrastructure Emmanuel Ormancey, Alberto Pace CERN, Information Technology Department.
Evolution, by tackling new challenges| CHEP 2015, Japan | Patrick Fuhrmann | 16 April 2015 | 1 Patrick Fuhrmann On behave of the project team Evolution,
D C a c h e Michael Ernst Patrick Fuhrmann Tigran Mkrtchyan d C a c h e M. Ernst, P. Fuhrmann, T. Mkrtchyan Chep 2003 Chep2003 UCSD, California.
Introduction to dCache Zhenping (Jane) Liu ATLAS Computing Facility, Physics Department Brookhaven National Lab 09/12 – 09/13, 2005 USATLAS Tier-1 & Tier-2.
Grid Lab About the need of 3 Tier storage 5/22/121CHEP 2012, The need of 3 Tier storage Dmitri Ozerov Patrick Fuhrmann CHEP 2012, NYC, May 22, 2012 Grid.
Developing & Managing A Large Linux Farm – The Brookhaven Experience CHEP2004 – Interlaken September 27, 2004 Tomasz Wlodek - BNL.
WNoDeS – Worker Nodes on Demand Service on EMI2 WNoDeS – Worker Nodes on Demand Service on EMI2 Local batch jobs can be run on both real and virtual execution.
© 2005 Princeton Softech, Inc. Princeton Softech Anatomy of an Archive Project Let’s Talk About Data!! April 18, 2007 Alan Schneider.
Virtualization for the LHCb Online system CHEP Taipei Dedicato a Zio Renato Enrico Bonaccorsi, (CERN)
Owen SyngeTitle of TalkSlide 1 Storage Management Owen Synge – Developer, Packager, and first line support to System Administrators. Talks Scope –GridPP.
RDA Data Support Section. Topics 1.What is it? 2.Who cares? 3.Why does the RDA need CISL? 4.What is on the horizon?
Overview of grid activities in France in relation to FKPPL FKPPL Workshop Thursday February 26th, 2009 Dominique Boutigny.
Future home directories at CERN
Scientific Storage at FNAL Gerard Bernabeu Altayo Dmitry Litvintsev Gene Oleynik 14/10/2015.
Grid Technology CERN IT Department CH-1211 Geneva 23 Switzerland t DBCF GT Upcoming Features and Roadmap Ricardo Rocha ( on behalf of the.
3/12/2013Computer Engg, IIT(BHU)1 CLOUD COMPUTING-1.
Web Technologies Lecture 13 Introduction to cloud computing.
Andrea Manzi CERN On behalf of the DPM team HEPiX Fall 2014 Workshop DPM performance tuning hints for HTTP/WebDAV and Xrootd 1 16/10/2014.
3/12/2013Computer Engg, IIT(BHU)1 CLOUD COMPUTING-2.
CMS: T1 Disk/Tape separation Nicolò Magini, CERN IT/SDC Oliver Gutsche, FNAL November 11 th 2013.
BNL dCache Status and Plan CHEP07: September 2-7, 2007 Zhenping (Jane) Liu for the BNL RACF Storage Group.
1 5/4/05 Fermilab Mass Storage Enstore, dCache and SRM Michael Zalokar Fermilab.
DCache features overview | Academia Sinica, Taipei| Patrick Fuhrmann | 17 March 2013 | 1 dCache federation design Head Node Request Headnode SITE Plus.
Virtual Server Server Self Service Center (S3C) JI July.
IT-DSS Alberto Pace2 ? Detecting particles (experiments) Accelerating particle beams Large-scale computing (Analysis) Discovery We are here The mission.
WP4 Summary Patrick Fuhrmann for the WP4 Tream RIA
EMI INFSO-RI Patrick Fuhrmann EMI Data area leader At the EGI Technical Forum 2011, in Lyon EMI-Data The second year.
CRISP WP18, High-speed data recording Krzysztof Wrona, European XFEL PSI, 18 March 2013.
© 2012 Eucalyptus Systems, Inc. Cloud Computing Introduction Eucalyptus Education Services 2.
New directions in storage | ISGC 2015, Taipei | Patrick Fuhrmann | 19 March 2015 | 1 Presenter: Patrick Fuhrmann dCache.org Patrick Fuhrmann, Paul Millar,
Page 1 Cloud Computing JYOTI GARG CSE 3 RD YEAR UIET KUK.
Architecture of a platform for innovation and research Erik Deumens – University of Florida SC15 – Austin – Nov 17, 2015.
Cofax Scalability Document Version Scaling Cofax in General The scalability of Cofax is directly related to the system software, hardware and network.
Sync n share and QoS in dCache | NEC 2015, Montenegro| Patrick Fuhrmann | 02 Oct 2015 | 1 Patrick Fuhrmann On behave of the dCache team by especially Tigran.
EGI-InSPIRE EGI-InSPIRE RI The European Grid Infrastructure Steven Newhouse Director, EGI.eu Project Director, EGI-InSPIRE 29/06/2016CoreGrid.
Scalable sync-and-share service with dCache
WP18, High-speed data recording Krzysztof Wrona, European XFEL
OCP: High Performance Computing Project
Patrick Fuhrmann (DESY) Benjamin Ertl (KIT) Maciej Brzezniak (PSNC)
Introduction to Cloud Computing
Management of Virtual Execution Environments 3 June 2008
Building continuously available systems with Hyper-V
Presentation transcript:

Supporting the Scientific Data Lifecycle | ISGC 2015, Taipei | Patrick Fuhrmann | 17 March 2015 | 1 Patrick Fuhrmann On behave of the project team Supporting the scientific data lifecycle

Supporting the Scientific Data Lifecycle | ISGC 2015, Taipei | Patrick Fuhrmann | 17 March 2015 | 2 Content How are software features selected. How are software features funded. Hardening new features. Exploring new communities. Responding on new technologies HW and SW Something about INDIGO-DataCloud Essentially a random walk focusing on things I thought might be interesting.

Supporting the Scientific Data Lifecycle | ISGC 2015, Taipei | Patrick Fuhrmann | 17 March 2015 | 3 Some words on why and when dCache does what it does.

Supporting the Scientific Data Lifecycle | ISGC 2015, Taipei | Patrick Fuhrmann | 17 March 2015 | 4 How are software features selected ? Scientific communities believe, that Open Source Software is growing on trees. Consequently they are not willing to contribute to the development and software management at all. They assume that complains are very valuable contributions. Next consequence is that Open Source teams mainly implement software features, which are required by the labs, where the core team members are hosted. In order to explore new communities and satisfy their software requirements, Open Source Projects need external money.

Supporting the Scientific Data Lifecycle | ISGC 2015, Taipei | Patrick Fuhrmann | 17 March 2015 | 5 How are new features funded ? This is where “National” and “European” projects come into play. For dCache, this : – was EMI – is the German national LSDMA project – and will be INDIGO-DataCloud The drawback: They tell you what they want to see in your code.

Supporting the Scientific Data Lifecycle | ISGC 2015, Taipei | Patrick Fuhrmann | 17 March 2015 | 6 Funded features are not necessary those you need ? However, dCache has some invariant objectives: – The master plan (last slide of this presentation) – Be up to date on new technologies, either software or hardware. – Attract new communities as their specific requirements, if they can be fulfilled, make dCache even better. It can be a bit tricky to tune the funding projects exactly into the direction of our objectives. So, let’s see how dCache managed/es that …….

Supporting the Scientific Data Lifecycle | ISGC 2015, Taipei | Patrick Fuhrmann | 17 March 2015 | 7 Funding influences dCache development topics Standardization NFS 4.1 / pNFS HTTP / WebDAV Contributing to the Dynamic Federation Data Life Cycle Multi Tier Storage Quality of Service Migration Archiving AAI Deploying new technologies into Production and exploring new communities

Supporting the Scientific Data Lifecycle | ISGC 2015, Taipei | Patrick Fuhrmann | 17 March 2015 | 8 From 2013 to now, we slowed down development between two very demanding development projects, EMI and INDIGO- DataCloud, to : Deploy newly implemented technologies into production. Explore new communities and learn about their needs.

Supporting the Scientific Data Lifecycle | ISGC 2015, Taipei | Patrick Fuhrmann | 17 March 2015 | 9 Deploying NFS into production CMS Grid DESY The Desy-Cloud FERMIlab (various Intensity Drontier) And the issues

Supporting the Scientific Data Lifecycle | ISGC 2015, Taipei | Patrick Fuhrmann | 17 March 2015 | 10 New Production Systems based on dCache NFS. NFS 4.1 / pNFS NFS/ pNFS Direct low latency access Workernodes HPC dCache Backend Storage Layer Wide Area FTS GLOBUS (ONLINE) Sync & Share Laptops Mobile Devices See Paul’s presentation on Thursday

Supporting the Scientific Data Lifecycle | ISGC 2015, Taipei | Patrick Fuhrmann | 17 March 2015 | 11 CMS Tier DESY Slowly migrating CMS Grid worker nodes to NFS4.1 data access. Good experience as long as the network is stable.

Supporting the Scientific Data Lifecycle | ISGC 2015, Taipei | Patrick Fuhrmann | 17 March 2015 | 12 NFS 4.1 pNFS dCap Execution Time (hours) Job Efficiency (CPU / Wall Time ) Job Efficiency (NFS – dCap)

Supporting the Scientific Data Lifecycle | ISGC 2015, Taipei | Patrick Fuhrmann | 17 March 2015 | 13 As with all news spec’s, there are issues Network problems cause the system to be behave unpredictable. Data Server behind firewalls Weak clients on VM’s Specification Violation – infinite state recovery with Linux kernel

Supporting the Scientific Data Lifecycle | ISGC 2015, Taipei | Patrick Fuhrmann | 17 March 2015 | 14 Exploring new communities.

Supporting the Scientific Data Lifecycle | ISGC 2015, Taipei | Patrick Fuhrmann | 17 March 2015 | 15 Exploring new communities. Jülich – Aachen Research Association, JADE – "Supercomputing and modeling for the Human Brain (SMHB)”, associated to the European Human Brain Project (Plenary by KH Meier) MoSGrid – Scientific Gateway for molecular simulation. VAVID – Data Gateway for analyzing wind energy infrastructures

Supporting the Scientific Data Lifecycle | ISGC 2015, Taipei | Patrick Fuhrmann | 17 March 2015 | 16 JADE Aachen Jülich

Supporting the Scientific Data Lifecycle | ISGC 2015, Taipei | Patrick Fuhrmann | 17 March 2015 | 17 Projects in HPC HPC jobs on supercomputer HPC jobs get access to dCache storage.

Supporting the Scientific Data Lifecycle | ISGC 2015, Taipei | Patrick Fuhrmann | 17 March 2015 | 18 With the start of INDIGO-DataCloud, its money and a larger team (8+3) we can continue to explore new horizons. (Back to development mode)

Supporting the Scientific Data Lifecycle | ISGC 2015, Taipei | Patrick Fuhrmann | 17 March 2015 | 19 New Disk Technologies – Open Ethernet Disks (HGST) New Object-Store Back-ends – CEPH New European Projects (INDIGO DC) – Focusing on Data Quality of Service and – Data Lifecycle Management Responding to new technologies

Supporting the Scientific Data Lifecycle | ISGC 2015, Taipei | Patrick Fuhrmann | 17 March 2015 | 20 HGST Open Ethernet Disks Small ARM CPU with Ethernet piggybacked on regular Disk. Spec: – Any Linux (Debian on demo) – CPU 32-bit ARM, 512 Level 2 – 2 GB DRAM DDR-3 Memory 1792 MB available – Block storage driver as SCSI sda – Ethernet network driver as eth0

Supporting the Scientific Data Lifecycle | ISGC 2015, Taipei | Patrick Fuhrmann | 17 March 2015 | 21 HGST Open Ethernet Disks (cont) Additional CPU is not used by disk itself and can run arbitrary customer OS. Disk is seen as regular block device. Not yet on the market. dCache got 5 disks and we are evaluating to run pool nodes on the disk itself. See talk on Thursday.

Supporting the Scientific Data Lifecycle | ISGC 2015, Taipei | Patrick Fuhrmann | 17 March 2015 | 22 Response to CEPH CEPH complements dCache perfectly. – Simplifies operating dCache disks. – dCache accesses data as object-store anyway already. dCache is evaluating a ‘two step approach’. – Each pools sees it own object space in CEPH – All pools have access to the entire space, which is a slight change of dCache pool semantics. Would merge CEPH and dCache advantages – Multi Tier (Tape, Disk, SSD) – Multi protocol support for a common namespace. All protocols see the same namespace – All the dCache AAI features Support for X509, Kerberos, username/password

Supporting the Scientific Data Lifecycle | ISGC 2015, Taipei | Patrick Fuhrmann | 17 March 2015 | 23 INDIGO-DataCloud Cheat-Sheet Horizon 2020 project starting April or May Budget 11.1 Million Euros ( for dCache) 26 Partners Duration 30 months The project aims for an Open Source Data and Computing platform targeted at scientific communities, deployable on multiple hardware, and provisioned over private and public e- infrastructures. See Ludek’s presentation on Wednesday

Supporting the Scientific Data Lifecycle | ISGC 2015, Taipei | Patrick Fuhrmann | 17 March 2015 | 24 INDIGO in a nutshell 1.Self-service, on-demand 2.Access through the network 3.Resource pooling 4.Elasticity (with infinite resources) 5.Pay as you go In the end, Applications Rule. Stolen from Davide Salomoni (Project Director)

Supporting the Scientific Data Lifecycle | ISGC 2015, Taipei | Patrick Fuhrmann | 17 March 2015 | 25 dCache involvement in INDIGO dCache is mostly involved in WP4, which is about Virtual Infrastructures. (IaaS) For storage systems, like dCache, this essentially means SDS (Software Defined Storage), which according to Wikipedia is: – Software-defined storage (SDS) is an evolving concept for computer data storage software to manage policy-based provisioning and management of data storage independent of hardware.

Supporting the Scientific Data Lifecycle | ISGC 2015, Taipei | Patrick Fuhrmann | 17 March 2015 | 26 SDS according to dCache User/PaaS defined “Quality of Service” management – User/PaaS defined “Access Latency” SSD or Tape depending from application requirements. – User/PaaS Defined “Data Protection” On one disk, two disks or tree tapes depending on how precious your data is. – User/PaaS Defined “Data Migration Policies” Like Amazon Glacier vers. S3 Automatic Storage-Tier migration – Based on access profile All this wouldn’t be needed if SSD’s would be cheap and 100 % reliable.

Supporting the Scientific Data Lifecycle | ISGC 2015, Taipei | Patrick Fuhrmann | 17 March 2015 | 27 dCache is well prepared Historically dCache supports multi-tier storage and the corresponding transition. SSDs Spinning Disks Tape, Blue Ray … Virtual File-system Layer NFS/pNFSgridFTPhttpWebDAVxRootd/dCap Automatic and Manual Media transitions

Supporting the Scientific Data Lifecycle | ISGC 2015, Taipei | Patrick Fuhrmann | 17 March 2015 | 28 Recently added We optimized the ‘small file’ problem with disk tape transitions. Tape System Containers

Supporting the Scientific Data Lifecycle | ISGC 2015, Taipei | Patrick Fuhrmann | 17 March 2015 | 29 What’s missing Mainly a common agreement (standard) on how to trigger transitions. (Protocol, API ?? ) We have some experience with SRM, however it seems not to be suitable for this purpose. Another candidate is CMDI (SNIA), which is an industry standard. Migration Policies are already discussed, documented and implemented within RDA (Practical Policy Working Group). Details will only be available after the INDIGO kick off meeting end of April ‘15.

Supporting the Scientific Data Lifecycle | ISGC 2015, Taipei | Patrick Fuhrmann | 17 March 2015 | 30 Summary Magically, up to now, at the right moment, there was always an EU or National Project, funding dCache exactly for those features or activites, dCache was planning to do anyway and with that they helped us following our master plan : The support of the Complete Scientific Big Data Life Cycle Management.

Supporting the Scientific Data Lifecycle | ISGC 2015, Taipei | Patrick Fuhrmann | 17 March 2015 | 31 Scientific Data Lifecycle High Speed Data Ingest Fast Analysis NFS 4.1/pNFS Wide Area Transfers (Globus Online, FTS) by GridFTP Visualization & Sharing by WebDAV, OwnCloud

Supporting the Scientific Data Lifecycle | ISGC 2015, Taipei | Patrick Fuhrmann | 17 March 2015 | 32 Don’t forget Upcoming dCache Workshop

Supporting the Scientific Data Lifecycle | ISGC 2015, Taipei | Patrick Fuhrmann | 17 March 2015 | 33 The END further reading

Supporting the Scientific Data Lifecycle | ISGC 2015, Taipei | Patrick Fuhrmann | 17 March 2015 | 34