AliE Pablo Saiz/CERN P.Buncic, J-E. Revsbech R.Piskac, V.Sego, L. Aphecetche ALICE Collaboration ALICE Environment on the GRID
Content: Alice at LHC; Alice Computing Model; Building AliEn; AliEn Components; Deploying AliEn; AliEn Roadmap; Conclusions
LHC
CERN - LHC
Construction
Problem: Typical next-generation HEP experiment. Large-scale simulation & reconstruction effort; heavily distributed processing and event storage; ~1000 scientists in ~100 institutions; complex analyses of distributed data; large files (one event up to 2 GB); 10^9 files/year (x n, n>2); 2 PB/year; experiment lifetime: years. GRID: widely accepted as a solution.
Alice Use Cases. Simulation, Data Challenges & Reconstruction: centrally managed production of background events; distributed processing and event storage. Event mixing: not necessarily centrally managed; once background events exist, subsequent requests for event mixing must be routed to the location that holds the required input. Analysis: using the AliEn API, PROOF will locate the optimal site(s) for macro execution, try to execute it in parallel, collect the output and return it to the user (or register it in the catalogue).
ALICE Computing Model
Software layers (figure): AliEn, AliROOT, ROOT. User: simulation, reconstruction, calibration, analysis. System: GUI, persistent IO, utility libs. World: interfaces & distributed computing environment. Languages: C++, anything. "Nice! I only have to learn C++"
Challenge: Can we, building on available public domain and open source components and standards, provide the community of ALICE users with a functional distributed computing infrastructure that will remain operational even if the underlying technologies keep changing?
Building AliEn
Open Source Components: SASL/OpenSSL/OpenCA as authentication protocol; Globus/GSS as an implementation of authentication compatible with other Grid projects; CONDOR ClassAds language for job description (compatible with EU DataGrid); OpenLDAP for configuration management; Apache for the Web Portal; MySQL as relational database backend; bbftp as file transfer protocol
Gluing it together… Perl5: existing pieces of code (NA49 file catalogue) were already in Perl5; good interface to different databases; easy Web integration. SOAP, the Simple Object Access Protocol (also known as Service Oriented Access Protocol): good Perl implementation (SOAP::Lite) on client and server side; possibility to provide client access from many different platforms and languages (Java, C, C++…); provides a standard means to invoke procedures (services) in a distributed environment
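To make the SOAP point concrete, here is a minimal sketch of what a remote procedure call looks like on the wire, written in Python rather than the Perl (SOAP::Lite) used by AliEn. The service namespace `urn:example:catalogue` and the method name `listDirectory` are hypothetical and are not AliEn's actual interface; only the SOAP 1.1 envelope namespace is standard.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace


def build_request(service_ns, method, params):
    """Wrap a method call and its named parameters in a SOAP 1.1 envelope."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{service_ns}}}{method}")
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def parse_request(xml_text):
    """Server side: recover the method name and parameters from the envelope."""
    envelope = ET.fromstring(xml_text)
    body = envelope.find(f"{{{SOAP_ENV}}}Body")
    call = body[0]
    method = call.tag.split("}", 1)[1]  # strip the "{namespace}" prefix
    return method, {child.tag: child.text for child in call}


# Hypothetical service namespace and method, for illustration only.
request = build_request(
    "urn:example:catalogue", "listDirectory", {"path": "/alice/simulation"}
)
method, params = parse_request(request)
```

Because the call is plain XML, any language with an XML parser can act as a client, which is the portability argument made on the slide.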
Components: AliEn Services; Modules & libraries
“Web of Services”
Statistics… SLOCCount is Open Source Software/Free Software, licensed under the FSF GPL. Please credit this data as "generated using 'SLOCCount' by David A. Wheeler."
AliEn vs OpenSource: the benefits of development based on open source components are more than obvious…
AliEn Components
AliEn SASL implementation. SASL is the Simple Authentication and Security Layer, a method for adding authentication support to connection-based protocols. AliEn now has a Perl module implementing GSSAPI. This allows us to use all SASL authentication schemes: old AliEn authentication (token, AFS password, SSH); X509 certificates; Globus/GSI (credential delegation). The AliEn distribution includes the necessary Globus/MDS/GSI software. This allows us to develop secure peer-to-peer file transfers based on machine/protocol/user certificates and LDAP-based configuration management.
Authentication (diagram). Components: Client, Proxy Server, Database, LDAP. Message flow: request methods; list of methods; SASL authentication; checking if user exists; data. Methods: X509 (AliEn/Globus), PKI/RSA (ssh), Token (AliEn), AFS password.
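The "request methods / list of methods" exchange can be sketched as a simple negotiation: the server advertises what it supports and the client picks the first entry from its own preference list that the server also offers. This is an assumed reading of the diagram, not AliEn's actual code, and the mechanism labels below are illustrative identifiers based on the slide.

```python
def choose_mechanism(client_preferences, server_offered):
    """Return the first mechanism the client prefers that the server also offers."""
    offered = set(server_offered)
    for mechanism in client_preferences:
        if mechanism in offered:
            return mechanism
    return None  # no common mechanism: authentication cannot proceed


# Illustrative labels based on the slide, not a list taken from AliEn itself.
server_offered = ["GSSAPI", "TOKEN", "SSH-KEY", "AFS-PASSWORD"]
client_preferences = ["X509", "GSSAPI", "TOKEN"]

selected = choose_mechanism(client_preferences, server_offered)
```

Here the client would fall through its first choice, which the server does not list, and settle on the second.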
File catalogue. Files, commands (job specification) as well as job input and output and metadata are stored in the catalogue. Labels in the figure: ALICE USERS, ALICE SIM, Tier1, ALICE LOCAL.
|--./
|  |--cern.ch/
|  |  |--user/
|  |  |  |--a/
|  |  |  |  |--admin/
|  |  |  |  |--aliprod/
|  |  |  |--f/
|  |  |  |  |--fca/
|  |  |  |--p/
|  |  |  |  |--psaiz/
|  |  |  |  |  |--as/
|  |  |  |  |  |--dos/
|  |  |  |  |  |--local/
|  |  |  |--b/
|  |  |  |  |--barbera/
|--simulation/
|  |-- /
|  |  |--V3.05/
|  |  |  |--Config.C
|  |  |  |--grun.C
|--36/
|  |--stderr
|  |--stdin
|  |--stdout
|--37/
|  |--stderr
|  |--stdin
|  |--stdout
|--38/
|  |--stderr
|  |--stdin
|  |--stdout
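A catalogue that looks like a file system but lives in a relational database can be sketched as a table mapping logical file names (LFNs) to physical locations, with `ls` implemented as a prefix query. This is only a sketch under stated assumptions: `sqlite3` stands in for the MySQL backend, and the table layout, paths and storage URLs are hypothetical rather than AliEn's real schema.

```python
import sqlite3

# sqlite3 stands in for the MySQL backend; the table layout is hypothetical.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE catalogue (lfn TEXT PRIMARY KEY, pfn TEXT NOT NULL)")


def register(lfn, pfn):
    """Map a logical file name (LFN) to a physical file name (PFN)."""
    db.execute("INSERT INTO catalogue (lfn, pfn) VALUES (?, ?)", (lfn, pfn))


def ls(directory):
    """List the immediate children of a directory, like `ls` on a file system."""
    prefix = directory.rstrip("/") + "/"
    rows = db.execute(
        "SELECT lfn FROM catalogue WHERE substr(lfn, 1, ?) = ?",
        (len(prefix), prefix),
    )
    children = set()
    for (lfn,) in rows:
        rest = lfn[len(prefix):]
        name, sep, _ = rest.partition("/")
        children.add(name + sep)  # keep a trailing "/" for subdirectories
    return sorted(children)


# Paths and physical locations are made up for the example.
register("/alice/simulation/demo/V3.05/Config.C", "file://se.example.org/data/0001")
register("/alice/simulation/demo/V3.05/grun.C", "file://se.example.org/data/0002")
register("/alice/simulation/demo/36/stdout", "file://se.example.org/data/0003")

listing = ls("/alice/simulation/demo")
```

Directories exist only implicitly here, as shared prefixes of registered names; a real catalogue would also need directory entries, permissions and metadata.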
Command Interface
GUI: AliEn Xfiles
Web Portal: generic Web portal. Virtual Organizations: Alice, Atlas, NA49, Demo, Mammogrid
Task Queue: “pull” rather than “push” architecture
Task Queue (diagram). Labels: AliEn, Tasks, CEs, Broker; command: alien job-submit job.jdl; Match? Yes: select; No: next. A CE contacts the CPUServer and presents its own ClassAd; the Resource Broker matches it against the job ClassAds and selects the most appropriate job to run on that CE.
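The pull model can be sketched as a central queue that never pushes work out: jobs wait until a computing element (CE) asks for one. The sketch below is a simplification under stated assumptions: the class and field names are hypothetical, matching is reduced to a check on installed packages, and the first suitable job is returned rather than the "most appropriate" one.

```python
from collections import deque


class TaskQueue:
    """Central queue: jobs wait here until a computing element (CE) asks for work."""

    def __init__(self):
        self.waiting = deque()

    def submit(self, job):
        self.waiting.append(job)

    def pull(self, ce):
        """Called by a CE: hand out the first waiting job this CE can run."""
        for job in self.waiting:
            if job["packages"] <= ce["packages"]:  # all required packages installed
                self.waiting.remove(job)
                return job
        return None  # nothing suitable; the CE simply asks again later


queue = TaskQueue()
queue.submit({"id": 1, "packages": {"AliRoot"}})
queue.submit({"id": 2, "packages": {"ROOT"}})

# A hypothetical CE that only has ROOT installed.
ce = {"name": "example-ce", "packages": {"ROOT"}}
first = queue.pull(ce)
second = queue.pull(ce)
```

A job that no CE can run just stays in the queue, so the central service needs no up-to-date picture of every site.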
Resource Broker (diagram). Components: Resource Broker, Optimizer
Class Ads & JDL. An example: JDL file to run an Alice simulation job:
Requirements = ( other.Type == "machine" ) && ( member(other.Packages,"AliRoot") );
Packages = "AliRoot";
Arguments = "--round run event version v –grun G+F";
Executable = "/Alice/bin/AliRoot.sh";
InputFile = { "LF:/alice/simulation/ /v /00071/Config.C", "LF:/alice/simulation/ /v /00071/grun.C" };
Type = "Job";
Class Ads & JDL. Class Ads of CE:
Requirements = ( other.Type == "Job" );
Type = "machine";
Host = "alienx.cern.ch";
CE = "Alice::CERN::LXBATCH";
Packages = { "AliRoot", "ROOT", "AliRoot:: " };
CloseSE = { "Alice::CERN::Castor", "Alice::CERN::File", "Alice::CERN::scratch" };
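The job and CE ads are matched symmetrically: each side's Requirements expression is evaluated against the other ad, and both must hold. The sketch below mimics this in Python with dictionaries and callables. It is not a ClassAd evaluator; the `member` helper follows the argument order used in the job ad, and the CE's Packages list omits the entry whose version is missing in the source.

```python
def member(collection, item):
    """Stand-in for the member() call used in the job's Requirements."""
    return item in collection


job_ad = {
    "Type": "Job",
    "Packages": "AliRoot",
    "Executable": "/Alice/bin/AliRoot.sh",
    "Requirements": lambda other: other["Type"] == "machine"
    and member(other["Packages"], "AliRoot"),
}

ce_ad = {
    "Type": "machine",
    "Host": "alienx.cern.ch",
    "CE": "Alice::CERN::LXBATCH",
    "Packages": ["AliRoot", "ROOT"],
    "Requirements": lambda other: other["Type"] == "Job",
}


def matches(ad_a, ad_b):
    """Two ads match only if each one's Requirements accepts the other."""
    return ad_a["Requirements"](ad_b) and ad_b["Requirements"](ad_a)


# A hypothetical CE without AliRoot installed, to show a failed match.
bare_ce_ad = dict(ce_ad, Packages=["ROOT"])

ok = matches(job_ad, ce_ad)
rejected = matches(job_ad, bare_ce_ad)
```

Keeping the requirements inside the ads means new constraints can be added on either side without changing the broker.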
Monitoring. Diagram levels: Computer, Local Center, GRID CENTER. To develop and deploy a more refined Resource Broker we need a monitoring framework. It must cope with frequent data updates and a large data volume for a large number of computers. The idea is to implement a hierarchy of clients and servers in which each client (child) maintains the history of its measurements and reports summary information to the upper layer (parent) using the SOAP protocol.
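The hierarchy can be sketched as a tree of nodes in which raw measurements never leave the level that recorded them and only a compact summary travels upward. The class below is a minimal illustration under stated assumptions: the names are hypothetical, the summary is just a count and a total, and a direct method call replaces the SOAP message.

```python
class MonitorNode:
    """One level of the hierarchy: keeps its own history, reports a summary upward."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.history = []    # every measurement recorded at this level
        self.summaries = {}  # latest summary received from each child

    def record(self, value):
        self.history.append(value)

    def summary(self):
        """Condense local history plus child summaries into one small record."""
        children = self.summaries.values()
        count = len(self.history) + sum(s["count"] for s in children)
        total = sum(self.history) + sum(s["total"] for s in children)
        return {"count": count, "total": total}

    def report(self):
        """Send only the summary to the parent (a direct call stands in for SOAP)."""
        if self.parent is not None:
            self.parent.summaries[self.name] = self.summary()


grid = MonitorNode("grid-center")
site = MonitorNode("local-center", parent=grid)
node1 = MonitorNode("computer-1", parent=site)
node2 = MonitorNode("computer-2", parent=site)

# Hypothetical CPU load samples, in percent.
node1.record(40)
node1.record(60)
node2.record(20)

for node in (node1, node2, site):
    node.report()

overall = grid.summary()
```

The top level sees three samples totalling 120 but holds no raw history of its own, which is what keeps the data volume manageable as the number of computers grows.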
Deploying AliEn
First implementation of Alice World Computing Model
Production Summary: 5682 events validated, 118 failed (2%). Up to 300 concurrently running jobs worldwide (5 weeks). 5 TB of data generated and stored at the sites with mass storage capability (CERN 73%, CCIN2P3 14%, LBL 14%, OSC 1%). In addition, GSI, Karlsruhe, Dubna, Nantes, Budapest, Bologna, Zagreb, Birmingham, Utrecht and Calcutta are ready by now. 13 clusters, 9 sites. 10^5 CPU hours.
AliEn Roadmap
AliEn as a meta-GRID. AliEn User Interface; AliEn stack, iVDGL stack, EDG stack
Roadmap… Optimization and test suite; PROOF interface & support for interactive jobs; EDG interface; GRID partitioning; queue optimization (based on AliEn monitoring); implementation of Web services: SOAP (Simple Object Access Protocol), WSDL (Web Services Description Language), UDDI (Universal Description Discovery & Integration); virtual datasets
Summary. The AliEn framework is a lightweight, simplified but functionally equivalent alternative to a full-blown GRID, based on standard components (SOAP, Web services). It has been tested in production and will be continuously developed with the aim of providing a long-term stable interface to GRID(s) for Alice users. AliEn will be used to provide the GRID component for MammoGRID, a 3-year, 2M Euro project funded by the EC, starting in September. Summary of AliEn features: authentication module which supports various authentication methods (Globus/GSI); distributed file catalogue built on top of an RDBMS, with a user interface that mimics the file system; secure file transport and replication service; task queue which holds commands to be executed in the system, and Resource Broker; configuration and information service; computing and storage elements; metadata catalogue; monitoring framework; C/C++/perl API; Web portal