Scalla Update
Andrew Hanushevsky
Stanford Linear Accelerator Center, Stanford University
25-June-2007, HPDC DMG Workshop

Slide 2: Outline
- Introduction
- Design points
- Architecture
- Clustering: the critical protocol element
- Capitalizing on Scalla features: solving some vexing grid-related problems
- Conclusion

Slide 3: What is Scalla?
- Structured Cluster Architecture for Low Latency Access
- Low latency access to data via xrootd servers
  - POSIX-style byte-level random access
  - Hierarchical directory-like name space of arbitrary files
  - Does not have full file system semantics; this is not a general purpose data management solution
  - Protocol includes high performance & scalability features
- Structured clustering provided by olbd servers
  - Exponentially scalable and self-organizing

Slide 4: General Design Points
- High speed access to experimental data
  - Write once, read many times processing mode
  - Small block sparse random access (e.g., ROOT files)
  - High transaction rate with rapid request dispersal (fast opens)
- Low setup cost
  - High efficiency data server (low CPU/byte overhead, small memory footprint)
  - Very simple configuration requirements
  - No 3rd party software needed (avoids messy dependencies)
- Low administration cost
  - Non-assisted fault-tolerance
  - Self-organizing servers remove the need for configuration changes
  - No database requirements (no backup/recovery issues)
- Wide usability
  - Full POSIX access
  - Server clustering for scalability
  - Plug-in architecture and event notification for applicability (HPSS, Castor, etc.)

Slide 5: xrootd Plugin Architecture
(Diagram: the layered plug-in stack, with events flowing between layers)
- Protocol driver (Xrd)
- Protocol, 1 of n (xrootd)
- File system (ofs, sfs, alice, etc.)
- Authentication (gsi, krb5, etc.) and authorization (name based)
- Storage system (oss, drm/srm, etc.) with lfn2pfn prefix encoding
- Clustering (olbd)
- Management (XMI: Castor, DPM)
- Many ways to accommodate other systems
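A minimal sketch of how such a layered plug-in stack composes; the class and method names below are illustrative stand-ins, not the real xrootd C++ interfaces.

```python
# Illustrative sketch of a layered plug-in stack like the one on slide 5.
# Class and method names are hypothetical, not the actual xrootd interfaces.

class StorageSystem:                      # "oss" layer: talks to local disk/MSS
    def read(self, pfn, offset, size):
        with open(pfn, "rb") as f:
            f.seek(offset)
            return f.read(size)

class FileSystem:                         # "ofs" layer: name handling + events
    def __init__(self, oss, lfn2pfn, notify):
        self.oss, self.lfn2pfn, self.notify = oss, lfn2pfn, notify
    def read(self, lfn, offset, size):
        self.notify(f"read {lfn}")        # event hook for an external manager
        return self.oss.read(self.lfn2pfn(lfn), offset, size)

class Protocol:                           # "xrootd" layer: decodes client requests
    def __init__(self, ofs):
        self.ofs = ofs
    def handle(self, request):
        op, lfn, offset, size = request
        assert op == "read"
        return self.ofs.read(lfn, offset, size)

# Compose the stack: each layer only knows the one below it, so any layer can be
# swapped out (e.g. a storage system that talks to Castor or HPSS instead of disk).
proto = Protocol(FileSystem(StorageSystem(),
                            lfn2pfn=lambda lfn: "/data" + lfn,
                            notify=print))
# A request would then flow top-down: proto.handle(("read", "/a/b/c", 0, 4096))
```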

Slide 6: Architectural Significance
- Plug-in architecture plus events
  - Easy to integrate other systems
- Orthogonal design
  - Uniform client view irrespective of server function
  - Easy to integrate distributed services
  - System scaling always done in the same way
- Plug-in multi-protocol security model
  - Permits real-time protocol conversion
- System can be engineered for scalability
  - Generic clustering plays a significant role

Slide 7: Quick Note on Clustering
- xrootd servers can be clustered
  - Increases access points and available data
  - Allows for automatic failover
- Structured point-to-point connections
  - Cluster overhead (human & non-human) scales linearly
  - Cluster size is not limited
  - I/O performance is not affected
- Always pairs xrootd & olbd servers
  - Data handled by xrootd, cluster management by olbd
  - Symmetric cookie-cutter arrangement (a no-brainer)
- Architecture can be used in very novel ways
  - E.g., cascaded caching for single point files (ask me)
- The redirection protocol is central

Slide 8: File Request Routing
(Diagram: client, manager (head node/redirector), data servers A, B, C forming a cluster)
1. Client issues open(/a/b/c) to the manager.
2. Manager asks the data servers "Who has /a/b/c?"; server C answers "I have".
3. Manager tells the client "go to C".
4. Client issues open(/a/b/c) directly to C.
- Client sees all servers as xrootd data servers.
- Managers cache the next hop to the file, so a 2nd open is redirected to C immediately.
- No external database.

Slide 9: Two Level Routing
(Diagram: client, manager (head node/redirector), data servers A, B, C, a supervisor (sub-redirector), and its data servers D, E, F)
1. Client issues open(/a/b/c) to the manager.
2. Manager asks its subscribers "Who has /a/b/c?"; the supervisor forwards the question to D, E, F, server F answers "I have", and the supervisor in turn answers "I have" to the manager.
3. Manager redirects the client to the supervisor, which redirects it again with "go to F".
4. Client issues open(/a/b/c) directly to F.
- Client sees all servers as xrootd data servers.
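A minimal sketch of the redirect-and-cache lookup on slides 8 and 9, using toy in-memory servers; the class names and structure are illustrative, not the actual xrootd/olbd protocol.

```python
# Sketch of the redirect-and-cache lookup on slides 8-9 (illustrative only).

class DataServer:
    def __init__(self, name, files):
        self.name, self.files = name, set(files)
    def who_has(self, path):
        return self.name if path in self.files else None

class Redirector:
    """Manager or supervisor: asks its subscribers, caches the next hop."""
    def __init__(self, name, subscribers):
        self.name, self.subscribers = name, subscribers
        self.next_hop = {}                      # path -> subscriber (cached)
    def who_has(self, path):
        return self.name if self.locate(path) else None
    def locate(self, path):
        if path not in self.next_hop:           # broadcast the query, cache the answer
            for s in self.subscribers:
                if s.who_has(path):
                    self.next_hop[path] = s
                    break
        return self.next_hop.get(path)
    def open(self, path):
        """Return the chain of redirects the client would follow."""
        hop = self.locate(path)
        if hop is None:
            return None                         # file not in this cluster
        if isinstance(hop, Redirector):         # two-level: supervisor redirects again
            return [hop.name] + hop.open(path)
        return [hop.name]                       # go directly to the data server

sup  = Redirector("supervisor", [DataServer(n, f) for n, f in
                                 [("D", []), ("E", []), ("F", ["/a/b/c"])]])
head = Redirector("manager", [DataServer("A", []), DataServer("B", []), sup])
print(head.open("/a/b/c"))                      # ['supervisor', 'F']; the 2nd open hits the cache
```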

Slide 10: Significance of This Approach
- Uniform redirection across all servers
  - Natural distributed request routing
  - No need for central control
  - Scales in much the same way as the internet: only the immediate path to the data is relevant, not its location
- Integration and distribution of disparate services
  - Client is unaware of the underlying model
  - Critical for distributed analysis using "stored" code
- Natural fit for the grid
  - Distributed resources in multiple administrative domains

Slide 11: Capitalizing on Scalla Features
- Addressing some vexing grid problems
  - GSI overhead
  - Data access through firewalls
  - SRM
  - Transfer overhead and network bookkeeping
- Scalla building blocks are fundamental elements
  - Many solutions are constructed in the "same" way

Slide 12: GSI Issues
- GSI authentication is resource intensive
  - Significant CPU & administrative resources
  - The process occurs on each server
- Well known solution
  - Perform authentication once and convert protocol
  - Example: GSI to Kerberos conversion
- Elementary feature of the Scalla design
  - Allows each site to choose its local mechanism

Slide 13: Speeding GSI Authentication
(Diagram: client, 1st point of contact, standard xrootd cluster)
1. Client logs in and authenticates at the 1st point of contact: a specialized xroot server with a GSI-to-SSI plug-in.
2. That server returns a signed cert and redirects the client to the xroot cluster.
3. Client logs in at subsequent points of contact (xrootd with SSI auth) using the signed cert.
- Client sees all servers as xrootd data servers.
- Client can be redirected back to the 1st point of contact when the signature expires.
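A sketch of the authenticate-once pattern behind slide 13: do the expensive (GSI-style) authentication at one server, hand back a signed, short-lived credential, and let every other server verify it with a cheap signature check. The token format and key handling are illustrative assumptions, not the actual GSI-to-SSI plug-in.

```python
# Authenticate once, then verify cheaply everywhere (illustrative sketch).
import hmac, hashlib, time

SHARED_KEY = b"cluster-signing-key"        # assumed to be shared across the cluster

def expensive_authenticate(user, proof):
    # Stand-in for the costly GSI handshake; assume it succeeds.
    return True

def issue_token(user, ttl=3600):
    expires = str(int(time.time()) + ttl)
    payload = f"{user}:{expires}".encode()
    sig = hmac.new(SHARED_KEY, payload, hashlib.sha256).hexdigest()
    return f"{user}:{expires}:{sig}"

def verify_token(token):
    user, expires, sig = token.rsplit(":", 2)
    payload = f"{user}:{expires}".encode()
    expected = hmac.new(SHARED_KEY, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(sig, expected) and int(expires) > time.time()

# 1st point of contact: expensive authentication once, then issue the signed credential.
if expensive_authenticate("alice", proof="gsi-handshake"):
    token = issue_token("alice")
# Any other server: cheap verification on every subsequent login.
assert verify_token(token)
```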

Slide 14: Firewall Issues
- Scalla is architected as a peer-to-peer model
  - A server can act as a client
- Provides built-in proxy support
  - Can bridge firewalls
  - Scalla clients also support the SOCKS4 protocol
- Elementary feature of the Scalla design
  - Allows each site to choose its own security policy

Slide 15: Vaulting Firewalls
(Diagram: client, 1st point of contact, standard xrootd cluster)
1. Client issues open(/a/b/c) to the 1st point of contact: a specialized xroot server with a proxy plug-in.
2. The proxy forwards the open and all subsequent data access to the standard xrootd cluster behind the firewall.
- Client sees all servers as xrootd data servers.
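A minimal TCP relay, sketching only the plumbing a proxy point of contact provides at the firewall boundary: accept the client connection on the outside and copy bytes to a data server on the inside. The real xrootd proxy plug-in is protocol-aware; the host and port values here are made up.

```python
# Bare TCP relay sketch for a firewall-vaulting proxy (illustrative only).
import asyncio

INSIDE_HOST, INSIDE_PORT = "data-server.internal", 1094   # assumed backend
LISTEN_PORT = 1094                                         # outside-facing port

async def pump(reader, writer):
    # Copy bytes one direction until the sender closes.
    while data := await reader.read(64 * 1024):
        writer.write(data)
        await writer.drain()
    writer.close()

async def handle_client(client_reader, client_writer):
    server_reader, server_writer = await asyncio.open_connection(INSIDE_HOST, INSIDE_PORT)
    # Relay in both directions concurrently.
    await asyncio.gather(pump(client_reader, server_writer),
                         pump(server_reader, client_writer))

async def main():
    server = await asyncio.start_server(handle_client, port=LISTEN_PORT)
    async with server:
        await server.serve_forever()

# asyncio.run(main())   # left commented: requires the assumed backend host to exist
```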

Slide 16: Grid FTP Issues
- Scalla integrates with other data transports using the POSIX preload library
  - Rich emulation avoids application modification
  - Example: GSIftp
- Elementary feature of the Scalla design
  - Allows fast and easy deployment
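The preload-library integration is usually just a matter of launching the unmodified transfer daemon with the library interposed; a hedged sketch of that wiring follows, where the library path and server command are placeholders, not verified names.

```python
# Sketch: start an unmodified transfer daemon with the POSIX preload library
# interposed via LD_PRELOAD so its open()/read()/write() calls go to xrootd.
# The library path and server command are placeholders, not verified names.
import os, subprocess

env = dict(os.environ)
env["LD_PRELOAD"] = "/opt/xrootd/lib/libXrdPosixPreload.so"   # assumed install path

# Launch the FTP daemon unchanged; the preload library intercepts its file I/O.
subprocess.run(["/usr/sbin/gridftp-server", "--config", "/etc/gridftp.conf"],
               env=env, check=True)
```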

Slide 17: Providing Grid FTP Access
(Diagram: client, GSIftp server, standard xrootd cluster)
1. Client issues open(/a/b/c) to the 1st point of contact: a standard GSIftp server running with the xrootd preload library.
2. The preload library routes the open and all subsequent data access to the standard xrootd cluster.
- FTP servers can be firewalled and replicated for scaling.

Slide 18: SRM Issues
- Data access via SRM falls out
  - Requires only a trivial SRM: a closed surl-to-turl rewriting mechanism
  - Thanks to Wei Yang for this insight
- Some caveats
  - Requires changes to existing SRMs; this would be simple if URL rewriting were a standard "plug-in"
  - Plan to have StoRM and LBL SRM versions available
  - Many SRM functions become no-ops; they are generally not needed for basic point-to-point transfers, which is typical for smaller sites (i.e., Tier 2 and smaller)
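A sketch of the "trivial SRM" rewrite on slide 18: a closed surl-to-turl mapping that swaps the SRM endpoint for the transfer endpoint without any catalog lookup. The host names are example values, not a real deployment.

```python
# Closed surl-to-turl rewriting (illustrative sketch).
from urllib.parse import urlparse

DOOR = "gsiftp://ftphost.example.org"        # assumed transfer endpoint

def surl_to_turl(surl: str) -> str:
    """srm://srmhost/a/b/c  ->  gsiftp://ftphost.example.org/a/b/c"""
    u = urlparse(surl)
    if u.scheme != "srm":
        raise ValueError(f"not an SRM URL: {surl}")
    return DOOR + u.path                     # keep the file path, swap the endpoint

assert surl_to_turl("srm://srmhost/a/b/c") == "gsiftp://ftphost.example.org/a/b/c"
```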

Slide 19: Providing SRM Access
(Diagram: client, SRM on srmhost, standard GSIftp server on ftphost, standard xrootd cluster)
1. Client asks the SRM on srmhost for srm://srmhost/a/b/c and gets back the transfer URL gsiftp://ftphost/a/b/c.
2. Client issues open(/a/b/c) to the standard GSIftp server on ftphost, running with the preload library, which routes the open and all subsequent data access to the standard xrootd cluster.
- SRM access is a simple interposed add-on.

Slide 20: A Scalla Storage Element (SE)
(Diagram: clients (Linux, MacOS, Solaris, Windows), an optional firewall, optional gridFTP, SRM and xrootd proxy front ends, managers, and data servers)
- All servers, including gridFTP, SRM and the proxy, can be replicated/clustered within the Scalla framework for scaling and fault tolerance.

Slide 21: Data Transport Issues
- Enormous effort is spent on bulk transfer
  - Requires significant SE resources near the CEs
  - Impossible to capitalize on opportunistic resources
  - Can result in large wasted network bandwidth, unless most of the data is used multiple times
  - Still have the "missing file" problem, which requires significant bookkeeping effort
  - Large job startup delays until all of the required data arrives
- Bulk transfer originated from a historical view of the WAN
  - Too high latency, unstable, and unpredictable for real-time access
  - Large, unused, relatively cheap network capacity for bulk transfers
- Much of this is no longer true; it's time to reconsider these beliefs

Slide 22: WAN Real Time Access?
- Real time WAN access becomes "equivalent" to LAN access when RTT/p <= CPU/event, where p is the number of pre-fetched events
- Some assumptions here
  - Pre-fetching is possible
  - Analysis framework is structured to be asynchronous
  - Firewall problems are addressed, for instance using proxy servers
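A worked example of the latency-hiding condition above: with p events pre-fetched, the round-trip time is amortized over p events, so the WAN stall disappears once the per-event CPU time covers RTT/p. The RTT and CPU numbers below are illustrative, not measurements.

```python
# Minimum pre-fetch depth that hides WAN latency (illustrative numbers).
import math

rtt_ms = 150.0           # assumed wide-area round-trip time
cpu_ms_per_event = 10.0  # assumed per-event processing time

p_min = math.ceil(rtt_ms / cpu_ms_per_event)
print(p_min)             # 15 events in flight make RTT/p <= CPU/event hold
```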

Slide 23: Workflow in a WAN Model
- Bulk transfer only long-lived, useful data
  - Need a way to identify this
- Start jobs the moment "enough" data is present
  - Any missing files can be found on the "net"
- LAN access to high use / high density files
- WAN access to everything else
  - Locally missing files
  - Low use or low density files
- Initiate background bulk transfer when appropriate
  - Switch to the local copy when it is finally present
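A sketch of the local-first, WAN-fallback open that this workflow implies: use the local copy when it exists, otherwise read over the WAN (optionally kicking off a background transfer) and switch to the local copy once it arrives. The remote URL and the transfer helper are placeholders.

```python
# Local-first open with WAN fallback (illustrative sketch).
import os

LOCAL_PREFIX = "/data"
REMOTE_PREFIX = "root://federation.example.org//data"   # assumed remote redirector

def schedule_transfer(lfn: str) -> None:
    # Placeholder: hand the file to whatever bulk-transfer machinery the site uses.
    pass

def pick_source(lfn: str) -> str:
    local = LOCAL_PREFIX + lfn
    if os.path.exists(local):          # high-use file already on the LAN
        return local
    schedule_transfer(lfn)             # optional background bulk transfer
    return REMOTE_PREFIX + lfn         # read over the WAN in the meantime

print(pick_source("/a/b/c"))
```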

Slide 24: Scalla Supports WAN Models
- Native latency reduction protocol elements
  - Asynchronous pre-fetch: maximizes overlap between client CPU and network transfers
  - Request pipelining: vastly reduces request/response latency
  - Vectored reads and writes: allow multi-file and multi-offset access with one request
  - Client scheduled parallel streams: removes the server from second-guessing the application
- Integrated proxy server clusters
  - Firewalls addressed in a scalable way
- Federated peer clusters
  - Allows real-time search for files on the WAN
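A sketch of asynchronous pre-fetch, the first latency-reduction element above: keep a fixed number of requests in flight so event N+p is being fetched while event N is being processed. The timings and the fetch/process stand-ins are simulated, not real I/O.

```python
# Asynchronous pre-fetch pipeline (illustrative simulation).
import asyncio

RTT = 0.15            # simulated WAN round trip, seconds (assumed)
CPU = 0.01            # simulated per-event processing time (assumed)
PREFETCH = 16         # events kept in flight

async def fetch(event_id):
    await asyncio.sleep(RTT)          # network latency stand-in
    return f"data-{event_id}"

async def process(data):
    await asyncio.sleep(CPU)          # CPU work stand-in

async def run(n_events=100):
    pending = [asyncio.create_task(fetch(i)) for i in range(min(PREFETCH, n_events))]
    for i in range(n_events):
        data = await pending[i]       # usually already complete: latency is hidden
        if i + PREFETCH < n_events:   # keep the pipeline full
            pending.append(asyncio.create_task(fetch(i + PREFETCH)))
        await process(data)

asyncio.run(run())                    # finishes in ~RTT + n_events*CPU, not n_events*RTT
```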

Slide 25: A WAN Data Access Model
(Diagram: sites A, B, C and D, each with a CE and an SE, federated as independent peer clusters)
- Independent tiered resource sites
  - Cross-share data when necessary
  - Used when the local SE is unavailable or a file is missing

Slide 26: Conclusion
- There are many ways to build a Grid Storage Element (SE)
  - The choice depends on what needs to be accomplished
  - Lightweight, simple solutions often work best, especially for smaller or highly distributed sites
- WAN-cognizant architectures should be considered
  - Effort needs to be spent on making analysis WAN compatible
  - This may be the best way to scale production LHC analysis
- Data analysis presents the most difficult challenge
  - The system must withstand thousands of simultaneous requests
  - It must be lightning fast within significant financial constraints

Slide 27: Acknowledgements
- Software collaborators
  - INFN/Padova: Fabrizio Furano (client-side), Alvise Dorigo
  - ROOT: Fons Rademakers, Gerri Ganis (security), Bertrand Bellenot (Windows)
  - ALICE: Derek Feichtinger, Guenter Kickinger, Andreas Peters
  - STAR/BNL: Pavel Jakl
  - Cornell: Gregory Sharp
  - SLAC: Jacek Becla, Tofigh Azemoon, Wilko Kroeger
- Operational collaborators
  - BNL, CNAF, FZK, INFN, IN2P3, RAL, SLAC
- Funding
  - US Department of Energy Contract DE-AC02-76SF00515 with Stanford University