Scalla Update
Andrew Hanushevsky
Stanford Linear Accelerator Center, Stanford University
25-June-2007, HPDC DMG Workshop
http://xrootd.slac.stanford.edu
Outline
  Introduction
  Design points
  Architecture
    Clustering: the critical protocol element
  Capitalizing on Scalla features
    Solving some vexing grid-related problems
  Conclusion
What is Scalla?
  Structured Cluster Architecture for Low Latency Access
  Low latency access to data via xrootd servers
    POSIX-style byte-level random access (see the sketch below)
    Hierarchical directory-like name space of arbitrary files
    Does not have full file system semantics
      This is not a general purpose data management solution
    Protocol includes high performance & scalability features
  Structured clustering provided by olbd servers
    Exponentially scalable and self-organizing
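To make "POSIX-style byte-level random access" concrete, here is a minimal sketch of the access pattern a client would use; it assumes the file is reachable through an ordinary POSIX path (e.g. via the POSIX preload library discussed later), and the path, offset, and size are illustrative, not taken from the talk.

```cpp
// Illustrative only: sparse, byte-level random access in POSIX style.
// The /xrootd/... path is a hypothetical name exposed through a preload
// library or similar wrapper; offsets and sizes are arbitrary.
#include <fcntl.h>
#include <unistd.h>
#include <cstdio>

int main() {
    const char *path = "/xrootd/data/run1234/events.root";   // hypothetical
    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    char buf[4096];
    // Small-block sparse random read: 4 KB at an arbitrary byte offset.
    ssize_t n = pread(fd, buf, sizeof(buf), 7 * 1024 * 1024);
    if (n < 0) perror("pread");
    else       std::printf("read %zd bytes\n", n);

    close(fd);
    return 0;
}
```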
General Design Points
  High speed access to experimental data
    Write once, read many times processing mode
    Small block sparse random access (e.g., root files)
    High transaction rate with rapid request dispersal (fast opens)
  Low setup cost
    High efficiency data server (low CPU/byte overhead, small memory footprint)
    Very simple configuration requirements
    No 3rd party software needed (avoids messy dependencies)
  Low administration cost
    Non-assisted fault-tolerance
    Self-organizing servers remove need for configuration changes
    No database requirements (no backup/recovery issues)
  Wide usability
    Full POSIX access
    Server clustering for scalability
    Plug-in architecture and event notification for applicability (HPSS, Castor, etc.)
xrootd Plugin Architecture
  [Diagram: layered plug-in stack with events flowing between layers]
  Protocol Driver (Xrd)
  Protocol, 1 of n (xrootd)
  File System (ofs, sfs, alice, etc.)
  Authentication (gsi, krb5, etc.) and authorization (name based)
  Clustering (olbd)
  Storage System (oss, drm/srm, etc.), with lfn2pfn prefix encoding
  Management (XMI: Castor, DPM)
  Many ways to accommodate other systems (a layering sketch follows)
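The slide shows each layer as a replaceable plug-in behind the protocol driver. The following is a hypothetical sketch of that layering idea only; the class names, the lfn2pfn prefix, and the event hook are illustrative assumptions, not the actual xrootd interfaces.

```cpp
// Hypothetical sketch of the layered plug-in idea; names are illustrative,
// not the real xrootd plug-in headers.
#include <fcntl.h>
#include <memory>
#include <string>

struct StorageSystem {                      // "oss" layer: local disk, HPSS, Castor, ...
    virtual int Open(const std::string &pfn) = 0;
    virtual ~StorageSystem() = default;
};

struct DiskOss : StorageSystem {            // trivial local-disk back end
    int Open(const std::string &pfn) override { return ::open(pfn.c_str(), O_RDONLY); }
};

class FileSystem {                          // "ofs" layer: name space plus events
public:
    explicit FileSystem(std::unique_ptr<StorageSystem> oss) : oss_(std::move(oss)) {}
    int Open(const std::string &lfn) {
        std::string pfn = "/data" + lfn;    // lfn2pfn prefix encoding (assumed prefix)
        NotifyEvent("open", lfn);           // event hook an external manager could consume
        return oss_->Open(pfn);
    }
private:
    void NotifyEvent(const std::string &, const std::string &) {}  // stub
    std::unique_ptr<StorageSystem> oss_;
};

int main() {
    FileSystem fs(std::make_unique<DiskOss>());
    return fs.Open("/a/b/c") >= 0 ? 0 : 1;  // resolves to /data/a/b/c on local disk
}
```

Swapping DiskOss for an HPSS- or Castor-backed implementation is the kind of substitution the plug-in architecture is meant to allow.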
Architectural Significance
  Plug-in architecture plus events
    Easy to integrate other systems
  Orthogonal design
    Uniform client view irrespective of server function
    Easy to integrate distributed services
    System scaling always done in the same way
  Plug-in multi-protocol security model
    Permits real-time protocol conversion
  System can be engineered for scalability
    Generic clustering plays a significant role
Quick Note on Clustering
  xrootd servers can be clustered
    Increase access points and available data
    Allows for automatic failover
  Structured point-to-point connections
    Cluster overhead (human & non-human) scales linearly
    Cluster size is not limited
    I/O performance is not affected
  Always pairs xrootd & olbd servers
    Data handled by xrootd, cluster management by olbd
    Symmetric cookie-cutter arrangement (a no-brainer)
  Architecture can be used in very novel ways
    E.g., cascaded caching for single point files (ask me)
  Redirection protocol is central
File Request Routing
  [Diagram: client, manager (head node/redirector), and clustered data servers A, B, C]
  1. Client issues open(/a/b/c) to the manager
  2. Manager asks the cluster: "Who has /a/b/c?"
  3. Server C answers "I have it"
  4. Manager redirects the client: "go to C"
  5. Client re-issues open(/a/b/c) directly to C
  Client sees all servers as xrootd data servers
  On a 2nd open the manager redirects to C immediately: managers cache the next hop to the file
  No external database (a client-side sketch of the redirect follows)
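From the client's point of view the redirect is just another protocol reply to open(). A minimal conceptual sketch of that loop follows; the types, the "manager"/"serverC" names, and SendOpen are hypothetical stand-ins, not the xrootd client API.

```cpp
// Conceptual sketch of redirect-following on open(); all names here are
// hypothetical stand-ins for protocol messages, not the real client API.
#include <string>
#include <iostream>

struct OpenReply { bool redirected; std::string target; int handle; };

// Stub: a manager answers "go to C", a data server answers with an open handle.
OpenReply SendOpen(const std::string &host, const std::string &path) {
    if (host == "manager") return {true, "serverC", -1};   // cached next hop
    return {false, "", 7};                                 // opened on the data server
}

int OpenAnywhere(std::string host, const std::string &path) {
    for (int hops = 0; hops < 8; ++hops) {                 // bounded redirect chain
        OpenReply r = SendOpen(host, path);
        if (!r.redirected) return r.handle;                // file is open here
        host = r.target;                                   // follow "go to <server>"
    }
    return -1;
}

int main() {
    std::cout << "handle=" << OpenAnywhere("manager", "/a/b/c") << "\n";
}
```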
Two Level Routing
  [Diagram: client, manager (head node/redirector), a supervisor (sub-redirector) C with its own data servers D, E, F, alongside data servers A and B]
  1. Client issues open(/a/b/c) to the manager
  2. Manager asks its cluster: "Who has /a/b/c?"
  3. The supervisor C answers "I have it"; the manager redirects the client: "go to C"
  4. Client issues open(/a/b/c) to the supervisor, which asks its servers D, E, F
  5. Server F answers "I have it"; the supervisor redirects the client: "go to F"
  6. Client re-issues open(/a/b/c) directly to F
  Client sees all servers as xrootd data servers
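The "exponentially scalable" claim follows from the fan-out of this routing tree. Assuming the 64-subscriber limit per olbd manager or supervisor that is commonly quoted for Scalla clusters of this era (an assumption here, not stated on the slide), a tree of depth h addresses up to

$$N \le 64^{h}:\qquad 64^{1}=64,\quad 64^{2}=4096,\quad 64^{3}=262{,}144,$$

so a single supervisor level already covers thousands of data servers without any change to the client protocol.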
Significance Of This Approach
  Uniform redirection across all servers
    Natural distributed request routing
    No need for central control
    Scales in much the same way as the internet
      Only immediate paths to the data are relevant, not the location
  Integration and distribution of disparate services
    Client is unaware of the underlying model
    Critical for distributed analysis using "stored" code
  Natural fit for the grid
    Distributed resources in multiple administrative domains
Capitalizing on Scalla Features
  Addressing some vexing grid problems
    GSI overhead
    Data access firewalls
    SRM
    Transfer overhead
    Network bookkeeping
  Scalla building blocks are fundamental elements
    Many solutions are constructed in the "same" way
GSI Issues
  GSI authentication is resource intensive
    Significant CPU & administrative resources
    Process occurs on each server
  Well known solution
    Perform authentication once and convert protocol
    Example: GSI to Kerberos conversion
  Elementary feature of the Scalla design
    Allows each site to choose the local mechanism
Speeding GSI Authentication
  [Diagram: client, 1st point of contact (specialized xroot server with a GSI-to-SSI plug-in), subsequent points of contact (xrootd with SSI auth) in a standard xrootd cluster]
  1. Client logs in and authenticates (GSI) at the 1st point of contact
  2. The server returns a signed cert and redirects the client to the xroot cluster
  3. Client logs in to subsequent points of contact using the signed cert
  Client sees all servers as xrootd data servers
  Client can be redirected back to the 1st point of contact when the signature expires
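A conceptual sketch of the "authenticate once, then reuse a signed, expiring credential" flow follows. The token layout, Sign(), and the one-hour lifetime are placeholders for illustration, not the actual GSI-to-SSI plug-in.

```cpp
// Conceptual sketch only: expensive GSI handshake once, cheap signature checks after.
#include <string>
#include <ctime>
#include <iostream>

struct Token { std::string user; std::time_t expires; std::string sig; };

std::string Sign(const std::string &msg) { return "sig(" + msg + ")"; }  // placeholder signer

// 1st point of contact: do the GSI handshake once, then issue a signed token.
Token IssueToken(const std::string &gsiIdentity) {
    std::time_t exp = std::time(nullptr) + 3600;               // assumed 1 hour lifetime
    return {gsiIdentity, exp, Sign(gsiIdentity + std::to_string(exp))};
}

// Subsequent points of contact: verify the signature instead of redoing GSI;
// if it has expired, redirect the client back to the 1st point of contact.
bool Accept(const Token &t) {
    return t.sig == Sign(t.user + std::to_string(t.expires)) &&
           std::time(nullptr) < t.expires;
}

int main() {
    Token t = IssueToken("/DC=org/CN=Some User");
    std::cout << (Accept(t) ? "login ok" : "re-authenticate") << "\n";
}
```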
Firewall Issues
  Scalla is architected as a peer-to-peer model
    A server can act as a client
  Provides built-in proxy support
    Can bridge firewalls
    Scalla clients also support the SOCKS4 protocol
  Elementary feature of the Scalla design
    Allows each site to choose its own security policy
Vaulting Firewalls
  [Diagram: client, 1st point of contact (specialized xroot server with a proxy plug-in), standard xrootd cluster behind the firewall]
  1. Client issues open(/a/b/c) to the 1st point of contact (the proxy)
  2. The proxy forwards open(/a/b/c) and all subsequent data access to the standard xrootd cluster
  Client sees all servers as xrootd data servers
Grid FTP Issues
  Scalla integrates with other data transports
    Using the POSIX preload library (a sketch of the interposition idea follows)
    Rich emulation avoids application modification
    Example: GSIftp
  Elementary feature of the Scalla design
    Allows fast and easy deployment
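The preload library works by interposing the POSIX calls of an unmodified application (here, a GSIftp server) and routing selected paths to remote data. A conceptual sketch of that interposition mechanism follows; the /xrootd/ prefix and the remote routing comment are assumptions, not the actual Scalla preload library.

```cpp
// Conceptual LD_PRELOAD interposition sketch; the /xrootd/ prefix and the
// remote routing are hypothetical, not the actual POSIX preload library.
// Build (illustrative): g++ -shared -fPIC -o preload_sketch.so interpose.cpp -ldl
#include <dlfcn.h>
#include <fcntl.h>
#include <cstdarg>
#include <cstring>

extern "C" int open(const char *path, int flags, ...) {
    using open_fn = int (*)(const char *, int, ...);
    static open_fn real_open = (open_fn) dlsym(RTLD_NEXT, "open");

    va_list ap;
    va_start(ap, flags);
    mode_t mode = (flags & O_CREAT) ? (mode_t) va_arg(ap, int) : 0;
    va_end(ap);

    if (std::strncmp(path, "/xrootd/", 8) == 0) {
        // Here the real library would hand the request to the xrootd client,
        // so the unmodified application transparently reads remote data.
    }
    return real_open(path, flags, mode);   // everything else goes to the real open()
}
```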
Providing Grid FTP Access
  [Diagram: client, 1st point of contact (standard GSIftp server with the POSIX preload library), standard xrootd cluster]
  1. Client issues open(/a/b/c) to the GSIftp server
  2. The preload library routes the open and all subsequent data access to the standard xrootd cluster
  FTP servers can be firewalled, and replicated for scaling
SRM Issues
  Data access via SRM falls out
    Requires only a trivial SRM
    Only need a closed surl-to-turl rewriting mechanism
    Thanks to Wei Yang for this insight
  Some caveats
    Requires changes to existing SRMs
      Simple if url rewriting were a standard "plug-in"
      Plan to have StoRM and LBL SRM versions available
    Many SRM functions become no-ops
      Generally not needed for basic point-to-point transfers
      Typical for smaller sites (i.e., tier 2 and smaller)
Providing SRM Access
  [Diagram: client, SRM (srmhost), GSIftp server (ftphost) with the POSIX preload library, standard xrootd cluster]
  1. Client asks the SRM for srm://srmhost/a/b/c
  2. The SRM rewrites the surl to the turl gsiftp://ftphost/a/b/c
  3. Client issues open(/a/b/c) to the GSIftp server (ftphost)
  4. The preload library routes the open and all subsequent data access to the standard xrootd cluster
  SRM access is a simple interposed add-on (the rewrite is sketched below)
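The "closed surl-to-turl rewriting" really is just a prefix swap. A minimal sketch, using the srmhost/ftphost names from the diagram; treating the mapping as a fixed string rewrite is the assumption here.

```cpp
// Minimal sketch of the closed surl-to-turl rewrite:
// srm://srmhost/a/b/c  ->  gsiftp://ftphost/a/b/c
#include <string>
#include <iostream>

std::string SurlToTurl(const std::string &surl) {
    const std::string prefix = "srm://srmhost";
    if (surl.compare(0, prefix.size(), prefix) != 0) return "";   // not ours
    return "gsiftp://ftphost" + surl.substr(prefix.size());
}

int main() {
    std::cout << SurlToTurl("srm://srmhost/a/b/c") << "\n";       // gsiftp://ftphost/a/b/c
}
```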
A Scalla Storage Element (SE)
  [Diagram: clients (Linux, MacOS, Solaris, Windows), an optional firewall, optional gridFTP, SRM, and xrootd proxy front ends, managers, and data servers]
  All servers, including gridFTP, SRM, and the proxy, can be replicated/clustered within the Scalla framework
    For scaling and fault tolerance
Data Transport Issues
  Enormous effort spent on bulk transfer
    Requires significant SE resources near the CEs
    Impossible to capitalize on opportunistic resources
    Can result in large wasted network bandwidth
      Unless most of the data is used multiple times
    Still have the "missing file" problem
      Requires significant bookkeeping effort
      Large job startup delays until all of the required data arrives
  Bulk transfer originated from a historical view of the WAN
    Too high latency, unstable, and unpredictable for real-time access
    Large unused, relatively cheap network capacity for bulk transfers
    Much of this is no longer true
      It's time to reconsider these beliefs
WAN Real Time Access?
  Latency is hidden when CPU/event >= RTT/p
    Where p is the number of pre-fetched events
    With enough pre-fetch, real time WAN access is "equivalent" to LAN
  Some assumptions here
    Pre-fetching is possible
    Analysis framework is structured to be asynchronous
    Firewall problems are addressed
      For instance, using proxy servers
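A worked example with illustrative numbers (not from the talk): with a 100 ms round-trip time and p = 10 outstanding pre-fetches,

$$\frac{\text{RTT}}{p} = \frac{100\ \text{ms}}{10} = 10\ \text{ms},$$

so any analysis spending at least 10 ms of CPU per event keeps the WAN round trip fully hidden behind computation.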
Workflow In a WAN Model
  Bulk transfer only long-lived useful data
    Need a way to identify this
  Start jobs the moment "enough" data is present
    Any missing files can be found on the "net"
  LAN access to high use / high density files
  WAN access to everything else
    Locally missing files
    Low use or low density files
    Initiate background bulk transfer when appropriate
      Switch to the local copy when it finally arrives
  (A sketch of this local-first fallback follows)
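A minimal sketch of the "LAN first, WAN fallback" open described above; the /local/se and /wan/federation paths are hypothetical names for the local SE and the federated WAN access point.

```cpp
// Sketch of local-first access with WAN fallback; paths are hypothetical.
#include <fcntl.h>
#include <unistd.h>
#include <string>
#include <cerrno>

int OpenDataset(const std::string &file) {
    // 1. Prefer the local SE copy (high use / high density files live here).
    int fd = open(("/local/se" + file).c_str(), O_RDONLY);
    if (fd >= 0 || errno != ENOENT) return fd;

    // 2. Missing locally: read it over the WAN via the federation/proxy path,
    //    and (not shown) queue a background bulk transfer if it is worth keeping.
    return open(("/wan/federation" + file).c_str(), O_RDONLY);
}

int main() { return OpenDataset("/a/b/c") >= 0 ? 0 : 1; }
```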
Scalla Supports WAN Models
  Native latency reduction protocol elements
    Asynchronous pre-fetch
      Maximizes overlap between client CPU and network transfers (see the sketch below)
    Request pipelining
      Vastly reduces request/response latency
    Vectored reads and writes
      Allow multi-file and multi-offset access with one request
    Client scheduled parallel streams
      Removes the server from second-guessing the application
  Integrated proxy server clusters
    Firewalls addressed in a scalable way
  Federated peer clusters
    Allow real-time search for files on the WAN
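The asynchronous pre-fetch and pipelining idea can be illustrated with a generic double-buffered fetch loop: keep p requests in flight so the WAN round trip overlaps with per-event CPU. This is a conceptual sketch only; fetch_event and process_event are stubs standing in for the remote read and the analysis code.

```cpp
// Conceptual pre-fetch pipeline: p requests stay in flight while the CPU works.
#include <future>
#include <vector>
#include <cstdio>

std::vector<char> fetch_event(int n) { return std::vector<char>(1024, char(n)); } // stub "remote read"
void process_event(const std::vector<char> &ev) { (void) ev; }                    // stub analysis

void Run(int nEvents, int p) {
    std::vector<std::future<std::vector<char>>> inflight(p);
    for (int i = 0; i < p && i < nEvents; ++i)                  // prime the pipeline
        inflight[i] = std::async(std::launch::async, fetch_event, i);

    for (int n = 0; n < nEvents; ++n) {
        std::vector<char> ev = inflight[n % p].get();           // should already have arrived
        if (n + p < nEvents)                                    // refill the slot p events ahead
            inflight[n % p] = std::async(std::launch::async, fetch_event, n + p);
        process_event(ev);                                      // CPU overlaps the fetches
    }
}

int main() { Run(100, 10); std::puts("done"); }
```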
A WAN Data Access Model
  [Diagram: Sites A, B, C, and D, each with a CE and an SE, cross-linked over the WAN]
  Independent tiered resource sites
    Cross-share data when necessary
      Local SE unavailable or file is missing
  Sites federated as independent peer clusters
Conclusion
  Many ways to build a Grid Storage Element (SE)
    Choice depends on what needs to be accomplished
    Lightweight, simple solutions often work best
      This is especially relevant to smaller or highly distributed sites
  WAN-cognizant architectures should be considered
    Effort needs to be spent on making analysis WAN compatible
    This may be the best way to scale production LHC analysis
  Data analysis presents the most difficult challenge
    The system must withstand thousands of simultaneous requests
    Must be lightning fast within significant financial constraints
Acknowledgements
  Software collaborators
    INFN/Padova: Fabrizio Furano (client-side), Alvise Dorigo
    Root: Fons Rademakers, Gerri Ganis (security), Bertrand Bellenet (Windows)
    Alice: Derek Feichtinger, Guenter Kickinger, Andreas Peters
    STAR/BNL: Pavel Jackl
    Cornell: Gregory Sharp
    SLAC: Jacek Becla, Tofigh Azemoon, Wilko Kroeger
  Operational collaborators
    BNL, CNAF, FZK, INFN, IN2P3, RAL, SLAC
  Funding
    US Department of Energy Contract DE-AC02-76SF00515 with Stanford University