EGEE-II INFSO-RI
Enabling Grids for E-sciencE

gLite Overview
Gang Chen
CC-IHEP, Chinese Academy of Sciences
The 6th Joint Training of OMII-Europe & CNGrid
Hong Kong, January, 2008

Outline
LCG & EGEE Projects
The gLite Middleware
  – Security
  – Information System
  – Workload management
  – Data management
  – …
Summary

LHC: Large Hadron Collider
LHC gets ready!

The LHC Computing Challenge
Data volume
  – High rate x large number of channels x 4 experiments
  – 15 PetaBytes of new data each year
Compute power
  – Event complexity x number of events x thousands of users
  – 100 k of today's fastest CPUs
Worldwide analysis & funding
  – Computing funding locally in major regions & countries
  – Efficient analysis everywhere
  – GRID technology (WLCG: Worldwide LHC Computing Grid)

WLCG Tier Structure

Centers around the world form a Supercomputer
The EGEE and OSG projects are the basis of WLCG

The EGEE project
EGEE
  – 1 April 2004 – 31 March 2006
  – 71 partners in 27 countries, federated in regional Grids
EGEE-II
  – 1 April 2006 – 31 March 2008
  – 91 partners in 32 countries
  – 13 Federations
Objectives
  – Large-scale, production-quality infrastructure for e-Science
  – Attracting new resources and users from industry as well as science
  – Improving and maintaining “gLite” Grid middleware
[Figure: middleware timeline with labels LCG-1, LCG-2, EGEE-1, EGEE-2; Globus 2 based; Web services based]

Applications on EGEE
Applications from an increasing number of domains
  – Astrophysics
  – Computational Chemistry
  – Earth Sciences
  – Financial Simulation
  – Fusion
  – Geophysics
  – High Energy Physics
  – Life Sciences
  – Multimedia
  – Material Sciences
  – …
  – Book of abstracts: http://doc.cern.ch//archive/electronic/egee/tr/egee-tr pdf

Related EU projects
[Figure: related EU projects; surviving label: EUGRID]

EGEE Middleware: gLite
gLite
  – Exploit experience and existing components from VDT (CondorG, Globus), EDG/LCG, AliEn, and others
  – Develop a lightweight stack of generic middleware useful to EGEE applications (HEP and Biomedics are pilot applications)
      – Should eventually deploy dynamically (e.g. as a Globus job)
      – Pluggable components – cater for different implementations
  – Focus is on re-engineering and hardening
  – Early prototype and fast feedback turnaround envisaged

The release of gLite 3.0
Convergence of LCG and gLite in spring 2006
  – Continuity on the production infrastructure ensured usability by applications
  – Initial focus on the new Job Management
  – Thorough testing and optimization together with the applications
Migration to the ETICS build system
  – ETICS project started in January
Reorganization of the work according to the new process
  – EGEE Technical Coordination Group and Task Forces
  – Start of the EGEE SA3 Activity for integration and certification
  – “Continuous release process”: no big-bang releases!
[Figure: release timeline with labels LCG-2, gLite, prototyping, product, 2006, gLite 3.1]

Middleware Structure
Applications have access both to Higher-Level Grid Services and to Foundation Grid Middleware
Higher-Level Grid Services are meant to help users build their computing infrastructure but should not be mandatory
Foundation Grid Middleware will be deployed on the EGEE infrastructure
  – Must be complete and robust
  – Should allow interoperation with other major grid infrastructures
  – Should not assume the use of Higher-Level Grid Services
[Figure: layered diagram]
  – Applications
  – Higher-Level Grid Services: Workload Management, Replica Management, Visualization, Workflow, Grid Economies, ...
  – Foundation Grid Middleware: Security model and infrastructure, Computing (CE) and Storage Elements (SE), Accounting, Information and Monitoring

gLite Services Decomposition
[Figure: service groups and their components]
  – Access: API, CLI
  – Security Services: Authentication, Authorization, Auditing
  – Information & Monitoring Services: Information & Monitoring, Service Discovery, Network Monitoring
  – Job Management Services: Computing Element, Workload Management, Accounting, Job Provenance, Package Manager
  – Data Services: Metadata Catalog, File & Replica Catalog, Storage Element, Data Movement

Main components
User Interface (UI): the place where users log on to the Grid
Computing Element (CE): a batch queue on a site’s computers where the user’s job is executed
Storage Element (SE): provides (large-scale) storage for files
Resource Broker (RB): matches the user requirements with the available resources on the Grid
Information System: characteristics and status of CE and SE (uses “GLUE schema”)

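The Resource Broker's matching role can be illustrated with a minimal Python sketch. It is not the broker's actual algorithm: the `ComputingElement` fields, the host names and the ranking rule (most free CPUs first) are assumptions chosen only to show requirements being checked against published CE status.

```python
from dataclasses import dataclass


@dataclass
class ComputingElement:
    # Hypothetical, simplified view of what an Information System might publish.
    name: str
    free_cpus: int
    max_wall_minutes: int


def match(ces, cpus_needed, wall_minutes_needed):
    """Return names of CEs that satisfy the requirements, most free CPUs first."""
    suitable = [
        ce for ce in ces
        if ce.free_cpus >= cpus_needed
        and ce.max_wall_minutes >= wall_minutes_needed
    ]
    suitable.sort(key=lambda ce: ce.free_cpus, reverse=True)
    return [ce.name for ce in suitable]


# Illustrative CE records (host names are made up).
ces = [
    ComputingElement("ce-a.example.org", free_cpus=4, max_wall_minutes=720),
    ComputingElement("ce-b.example.org", free_cpus=64, max_wall_minutes=2880),
    ComputingElement("ce-c.example.org", free_cpus=16, max_wall_minutes=60),
]
ranked = match(ces, cpus_needed=2, wall_minutes_needed=600)
```

Here `ce-c.example.org` is filtered out because its wall-time limit is too short, and the remaining two are ranked by free CPUs.
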
Current production middleware
[Figure: job flow between the “User interface”, Resource Broker, Replica Catalogue, Information Service, Logging & Book-keeping, Computing Element and Storage Element. Arrow labels: Job Submit Event, Job Query, Job Status, Input “sandbox”, Input “sandbox” + Broker Info, Output “sandbox”, DataSets info, Publish SE & CE info, Author. & Authen.]

Grid Foundation: Security
Authentication based on X.509 PKI
  – Certificate Authorities (CA) issue (long lived) certificates identifying individuals (much like a passport)
      – Commonly used in web browsers to authenticate to sites
  – Trust between CAs and sites is established (offline)
  – In order to reduce vulnerability, on the Grid user identification is done by using (short lived) proxies of their certificates
Proxies can
  – Be delegated to a service such that it can act on the user’s behalf
  – Include additional attributes (like VO information via the VO Membership Service VOMS)
  – Be stored in an external proxy store (MyProxy)
  – Be renewed (in case they are about to expire)

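The idea of renewing a short-lived proxy before it expires can be sketched in a few lines of Python. This is only an illustration of the timing logic: the 12-hour lifetime and the one-hour safety margin are assumptions, and no real proxy or MyProxy call is made.

```python
from datetime import datetime, timedelta


def needs_renewal(expires_at, now, margin=timedelta(hours=1)):
    """True when the remaining proxy lifetime is at or below the safety margin."""
    return expires_at - now <= margin


# Assumed values for illustration only.
issued_at = datetime(2008, 1, 30, 11, 35)
expires_at = issued_at + timedelta(hours=12)  # assumed proxy lifetime

early = needs_renewal(expires_at, now=issued_at + timedelta(hours=2))
late = needs_renewal(expires_at, now=issued_at + timedelta(hours=11, minutes=30))
```

A long-running job would run such a check periodically and trigger renewal only when it returns true.
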
AuthN and AuthZ: pre-VOMS
Authentication
  – User receives certificate signed by CA
  – Connects to “UI” by ssh
  – Downloads certificate
  – Single logon to Grid: create proxy, then Grid Security Infrastructure (GSI) identifies user to other machines
Authorisation
  – User joins Virtual Organisation (VO)
  – VO negotiates access to Grid nodes and resources
  – Authorisation tested by CE
  – gridmapfile maps user to local account
[Figure: registration and authorisation flow with labels CA, UI, AUP, VO mgr, VO service, VO database, GSI, grid-mapfiles on Grid services, Personal/once, Daily update]

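How a gridmapfile maps a user to a local account can be sketched as follows. The sketch assumes the common one-entry-per-line layout, a quoted certificate subject (DN) followed by a local account name; the DNs and account names are invented, and real files may carry additional conventions not handled here.

```python
import shlex


def parse_grid_mapfile(text):
    """Parse lines of the form "<quoted DN>" <local account> into a dict."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = shlex.split(line)  # honours the quotes around the DN
        if len(parts) != 2:
            continue  # ignore entries this sketch does not understand
        dn, account = parts
        mapping[dn] = account
    return mapping


# Hypothetical entries for illustration only.
sample = '''
# DN -> local account
"/C=IT/O=EXAMPLE/OU=Personal Certificate/CN=Some User" grid001
"/C=CN/O=EXAMPLE/CN=Another User" grid002
'''
accounts = parse_grid_mapfile(sample)
local = accounts.get("/C=CN/O=EXAMPLE/CN=Another User")
```

A certificate subject missing from the mapping would simply not be authorised on that service.
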
VOMS: concepts
Virtual Organization Membership Service:
  – Extends the proxy with info on VO membership, group, roles
  – Fully compatible with GSI
  – Each VO has a database containing group membership, role and capability information for each user
  – User contacts the VOMS server requesting their authorization info
  – Server sends authorization info to the client, which includes it in a proxy certificate

    [glite-tutor] /home/giorgio > voms-proxy-init --voms gilda
    Cannot find file or dir: /home/giorgio/.glite/vomses
    Your identity: /C=IT/O=GILDA/OU=Personal Certificate/L=INFN/CN=Emidio
    Enter GRID pass phrase:
    Your proxy is valid until Mon Jan 30 23:35:
    Creating temporary proxy Done
    Contacting voms.ct.infn.it:15001 [/C=IT/O=GILDA/OU=Host/L=INFN "gilda"
    Creating proxy Done
    Your proxy is valid until Mon Jan 30 23:35:

[Figure: VOMS exchange with labels Query, Authentication, Request, Auth DB, VOMS AC; example proxy subject C=IT/O=INFN /L=CNAF /CN=Pinco Palla /CN=proxy]

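The VO, group, role and capability information carried in the proxy can be pictured as a structured attribute string. The Python sketch below assumes the attribute is written as a path such as `/vo/group/Role=name/Capability=name`, with `NULL` meaning "not set"; the example value is hypothetical and the function is not part of any VOMS client library.

```python
def parse_fqan(fqan):
    """Split an attribute string into VO, group path, role and capability."""
    groups, role, capability = [], None, None
    for part in (p for p in fqan.split("/") if p):
        if part.startswith("Role="):
            role = part[len("Role="):]
        elif part.startswith("Capability="):
            capability = part[len("Capability="):]
        else:
            groups.append(part)
    if not groups:
        raise ValueError("attribute string has no VO component")
    return {
        "vo": groups[0],
        "group": "/" + "/".join(groups),
        # "NULL" is treated as "no value" (assumption of this sketch).
        "role": None if role in (None, "NULL") else role,
        "capability": None if capability in (None, "NULL") else capability,
    }


# Hypothetical attribute for illustration only.
attrs = parse_fqan("/gilda/tutors/Role=VO-Admin/Capability=NULL")
```

A service could then base authorisation decisions on `attrs["group"]` and `attrs["role"]` rather than on the user's identity alone.
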
Grid foundation: Information Systems
Generic Information Provider (GIP)
  – Provides LDIF information about a grid service in accordance with the GLUE Schema
BDII: Information system in gLite 3.0 (by LCG)
  – LDAP database that is updated by a process
  – More than one DB is used, to separate reads from writes
  – A port forwarder is used internally to select the correct DB
[Figure: BDII with a port forwarder on 2170 in front of LDAP databases on 2171, 2172 and 2173; steps “Update DB & Modify DB” and “Swap DBs”. GIP with labels Provider, Config File, LDIF File, Plugin, Cache]

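The read/write separation behind the port forwarder can be sketched as a small Python model. It assumes a simple round-robin rotation over the three database ports that appear in the slide's figure (2171 to 2173); the real update process and its rotation policy may differ.

```python
class SwappingStore:
    """Toy model: readers always hit the active DB, writers fill a standby DB."""

    def __init__(self, ports=(2171, 2172, 2173)):
        self.ports = list(ports)
        self.databases = {port: {} for port in self.ports}
        self.active = self.ports[0]  # where the forwarder currently points

    def query(self, key):
        # Reads are served only from the active database.
        return self.databases[self.active].get(key)

    def update(self, fresh_entries):
        # Write into the next database, then repoint the forwarder to it.
        index = self.ports.index(self.active)
        standby = self.ports[(index + 1) % len(self.ports)]
        self.databases[standby] = dict(fresh_entries)
        self.active = standby
        return standby


store = SwappingStore()
first_target = store.update({"ce-a.example.org": "Production"})
status = store.query("ce-a.example.org")
```

Because the swap happens only after the standby database is fully written, readers never observe a half-updated view.
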
Grid foundation: Information Systems
R-GMA: provides a uniform method to access and publish distributed information and monitoring data
  – Used for job and infrastructure monitoring in gLite 3.0
  – Working to add authorization
Service Discovery:
  – Provides a standard set of methods for locating Grid services
  – Currently supports R-GMA, BDII and XML files as backends
  – Will add local cache of information
  – Used by some DM and WMS components in gLite 3.0

Grid foundation: Computing Element
LCG-CE (GT2 GRAM)
  – In production now but will be phased out later
GLITE-CE (GSI-enabled Condor-C)
  – Already deployed but still needs thorough testing and tuning, which is being done now
CREAM (WS-I based interface)
  – Being deployed on the JRA1 preview testbed now. After a first testing phase it will be certified and deployed together with the gLite-CE
  – Our contribution to the OGF-BES group for a standard WS-I based CE interface
BLAH is the interface to the local resource manager (via plug-ins)
  – CREAM and gLite-CE
  – Information pass-through: pass parameters to the LRMS to help job scheduling
[Figure: Computing Element within a Grid site, with labels WMS, Clients; Information System; bdII, R-GMA, CEMon; glexec + LCAS/LCMAPS; BLAH; LRMS; WN; two “New!” markers]

Grid foundation: Storage Element
gLite 3.0 data access protocols:
  – File Transfer: GSIFTP (GridFTP)
  – File I/O (remote file access)
      – gsidcap
      – insecure RFIO
      – secured RFIO (gsirfio)
POSIX-like file access
  – Grid File Access Layer (GFAL)
  – Support for ACL in the SRM layer

SE types
Classic SE:
  – GridFTP server
  – Insecure RFIO daemon (rfiod): LAN-limited file access only
  – Single disk or disk array
  – No quota management
  – Does not support the SRM interface
Mass Storage Systems (Castor, dCache)
  – Files migrated between front-end disk and back-end tape storage hierarchies
  – GridFTP server
  – Insecure RFIO (Castor), secure gsidcap (dCache)
  – Provide an SRM interface with all the benefits
Disk pool managers (dCache, DPM, StoRM)
  – Manage distributed storage servers in a centralized way
  – Physical disks or arrays are combined into a common (virtual) file system
  – Disks can be dynamically added to the pool
  – GridFTP server
  – Secure remote access protocols (gsidcap for dCache, gsirfio for DPM)
  – SRM interface

Highlights: Disk Pool Manager
Light-weight disk-based Storage Element
  – Easy to install, configure and manage, and to add or remove resources
  – Integrated security (authentication/authorization) based on VOMS groups and roles
      – All control and I/O services have security built-in: GSI or Kerberos 5
      – Problem of ACLs propagation during replication between SEs will be addressed in the first half of 2007
  – SRMv1, SRMv2.1 and SRMv2.2
[Figure: DPM architecture. Grid Client: Gridftp Client, RFIO Client, SRM Client. Data Server: Gridftp Server, RFIO Daemon. SRM Server: SRM Daemon. Name Server: NS Daemon, NS Database. Disk Pool Manager: DPM Daemon, Request Daemon, DPM Database, RFIO Client. Disk System]

Grid foundation: Accounting
APEL: uses R-GMA to propagate and display job accounting information for infrastructure monitoring
  – Reads LRMS log files provided by gLite-CE and BLAH
  – Preparing an update for gLite 3.0 to use the files from BLAH
DGAS: collects, stores and transfers accounting data. Compliant with privacy requirements
  – Reads LRMS log files provided by LCG-CE and BLAH
  – Stores information in a site database (HLR) and optionally in a central HLR. Access granted to user, site and VO administrators
  – Not yet certified in gLite 3.0. Deployment plan:
      – certify and activate local sensors and site HLR in parallel with APEL
      – replace APEL sensors with DGAS (DGAS2APEL)
      – certify and activate central HLR; perform scalability tests

High Level Services: Workload mgmt.
WMS helps the user access computing resources
  – Resource brokering, management of job input/output, ...
lcg-RB: GT2 + Condor-G
  – To be replaced when the gLite WMS proves to be reliable
gLite WMS: Web service (WMProxy) + Condor-G
  – Management of complex workflows (DAGs) and compound jobs
      – bulk submission and shared input sandboxes
      – support for input files on different servers (scattered sandboxes)
  – Support for shallow resubmission of jobs
  – Job File Perusal: file peeking during job execution
  – Supports collection of information from CEMon, BDII, R-GMA and from DLI and StorageIndex data management interfaces
  – Support for parallel jobs (MPI) when the home dir is not shared
  – Deployed for the first time in gLite 3.0

High Level Services: Workflows
A Directed Acyclic Graph (DAG) is a set of jobs where the input, output, or execution of one or more jobs depends on one or more other jobs
A Collection is a group of jobs with no dependencies
  – basically a collection of JDLs
A Parametric job is a job having one or more attributes in the JDL that vary their values according to parameters
Using compound jobs it is possible to have one-shot submission of a (possibly very large, up to thousands) group of jobs
  – Submission time reduction
      – Single call to WMProxy server
      – Single Authentication and Authorization process
      – Sharing of files between jobs
  – Availability of both a single Job ID to manage the group as a whole and an ID for each single job in the group
[Figure: example DAG with nodes nodeA, nodeB, nodeC, nodeD, nodeE]

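The dependency rule that defines a DAG of jobs can be illustrated with Python's standard `graphlib` module (Python 3.9 or later). The node names are taken from the slide's figure, but the edges between them are not recoverable from the text, so the dependencies below are assumed purely for illustration; this is not how the WMS itself schedules DAGs.

```python
from graphlib import TopologicalSorter

# Each job maps to the set of jobs it depends on (edges are hypothetical).
dependencies = {
    "nodeA": set(),
    "nodeB": {"nodeA"},
    "nodeC": {"nodeA"},
    "nodeD": {"nodeB", "nodeC"},
    "nodeE": {"nodeD"},
}


def execution_order(deps):
    """Return one valid order in which every job runs after its dependencies."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
```

Several orders can be valid (nodeB and nodeC are interchangeable here), which is also why independent jobs in a DAG can run in parallel.
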
Highlights: FTS
Reliable and manageable File Transfer System for VOs
Transfers are treated as jobs
  – May be split onto multiple “channels”
  – Channels are point-to-point or “catch-all” (only one end fixed). More flexible channel definitions on the way...
New features that will be available in production soon:
  – Cleaner error reporting and service monitoring interfaces
  – Proxy renewal and delegation
  – SRMv2.2 support
Longer term development:
  – Optimized SRM interaction
      – split preparation from transfer
  – Better service management controls
  – Notification of finished jobs
  – Pre-staging tape support
  – Catalog & VO plug-ins framework
      – Allow catalog registration as part of transfer workflow

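The difference between point-to-point and “catch-all” channels can be sketched as a lookup that prefers the most specific match. The `"*"` wildcard notation, the preference rule and the site names are assumptions made for this Python illustration, not the FTS configuration syntax.

```python
def select_channel(channels, source, dest):
    """Pick the channel matching (source, dest), preferring fixed ends over "*"."""
    best, best_score = None, -1
    for ch_source, ch_dest in channels:
        if ch_source not in (source, "*") or ch_dest not in (dest, "*"):
            continue  # this channel does not cover the transfer
        # One point per fixed end that matches: point-to-point scores 2.
        score = (ch_source == source) + (ch_dest == dest)
        if score > best_score:
            best, best_score = (ch_source, ch_dest), score
    return best


# Illustrative channel definitions (site names used only as examples).
channels = [("CERN", "*"), ("CERN", "IHEP"), ("*", "IHEP")]
direct = select_channel(channels, "CERN", "IHEP")
fallback = select_channel(channels, "CERN", "RAL")
```

A transfer with no matching channel returns `None`, which a real service would report as an error rather than silently dropping the job.
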
File management in gLite
Files are write-once, read-many
  – If users edit files then they manage the consequences!
Middleware supporting
  – Replica files
  – Logical filenames
  – Catalogue: maps logical name to physical storage device/file
  – Virtual filesystems, POSIX-like I/O: GFAL
Services provided:
  – Storage: SE
  – Transfer: FTS
  – Catalogue that maps logical filenames to replicas: LFC

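The catalogue's job, mapping one logical filename to several physical replicas, can be sketched with a small in-memory Python class. This is a toy model, not the LFC API: the class, its method names, the logical filename and the storage URLs are all hypothetical.

```python
class ReplicaCatalogue:
    """Toy catalogue: one logical filename maps to a list of replica locations."""

    def __init__(self):
        self._replicas = {}

    def register(self, lfn, replica_url):
        # Add a replica location for the logical file; duplicates are ignored.
        locations = self._replicas.setdefault(lfn, [])
        if replica_url not in locations:
            locations.append(replica_url)

    def replicas(self, lfn):
        # Return a copy so callers cannot modify the catalogue by accident.
        return list(self._replicas.get(lfn, []))


catalogue = ReplicaCatalogue()
lfn = "lfn:/grid/myvo/data/run1.dat"  # hypothetical logical filename
catalogue.register(lfn, "srm://se1.example.org/myvo/run1.dat")
catalogue.register(lfn, "srm://se2.example.org/myvo/run1.dat")
catalogue.register(lfn, "srm://se1.example.org/myvo/run1.dat")  # duplicate
locations = catalogue.replicas(lfn)
```

Because files are write-once, replicas of the same logical file can be treated as interchangeable, so a client may read from whichever location is closest.
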
Name conventions
Users primarily access and manage files through “logical filenames”
  – Defined by the user
Mapping by the “LFC” catalogue server
LFC Namespace
  – LFC has a directory tree structure
  – /grid/ /

Summary
gLite 3.0 is an important milestone in the EGEE program
  – New components from gLite 1.X being deployed for the first time on the Production Infrastructure
      – Address requirements in terms of functionality and scalability
      – Components deployed for the first time need extensive testing!
  – New organization in EGEE II
      – New build and integration environment from ETICS
      – More controlled software process and certification
      – Development is client driven (TCG)
Development is continuing to provide increased robustness, usability and functionality
Collaboration with other projects for interoperability and definition/adoption of international standards

QUESTIONS?