1
Open Science Grid
The OSG Accounting System: GRATIA
by Philippe Canal (FNAL) & Matteo Melani (SLAC)
CHEP2006, Mumbai, India
2
What is Accounting? (in the Grid context)
Grid accounting is the process of maintaining a (consistent) Grid-wide view of VO members' resource utilization [1].
[1] Peter Gardfjäll, "Accounting in Grid Environments", Department of Computing Science, Umeå University, Sweden.
3
Why do we want an accounting system?
- Resource providers (SLAC, Fermilab…) want to perform cost-benefit analysis
- Resource providers want to improve planning
- Resource providers want better security
- Resource providers want to improve QoS (priorities, debugging…)
- Support a Grid "Economic Model"
4
What is the real problem (solution)?
- Nobody talked about "Grid economy"
- Do we really want an accounting system?
- Or maybe a monitoring system will do?
- Let's look at accounting and monitoring
5
Accounting vs. Monitoring
A monitoring system:
- Purpose: monitoring system health, debugging, system profiling
- Gathers state information about the system resources
- Collects system events
- Works like a DAQ system: as close as possible to the system, as unintrusive as possible
- Quasi-real-time to real-time
An accounting system:
- Keeps track of resource usage
- Links a user's service requests with the resources consumed to satisfy those requests
- Has accounts, banks, "currency" and supports an economic model (policies)
- Works "after the fact"
6
For Example: Monitoring at SLAC
What do we monitor:
- Network: switch and router status, Internet Mbytes/sec in/out
- Computer clusters: batch systems, NFS and AFS servers, database servers
- Storage space: disk usage, HPSS
Some metrics we use:
- CPU utilization, memory
- Disk usage, disk I/O
- Various networking metrics (Mbytes in/out of switches, routers, servers…)
- Some primitive job submission results (LSF)
We use a lot of monitoring tools and infrastructure: Ganglia, Nagios, OpenView, SNMP tools, MonALISA…
7
For Example: Accounting at SLAC?
- The monitoring system cannot link resource usage to users/groups
- Maybe by looking into the logs and correlating the events… but that is a lot of work
- Accounting infrastructures and tools à la Ganglia or Nagios do not exist
- Basically, we cannot (yet) fully link a user name with a precise set of computing resource usage metrics
8
What I think we should track
- Job submission: priority in the batch queue, CPU time, wall-clock time, memory usage
- Storage: disk usage, tape storage usage, storage class (to be defined)
- Network data transfer: network speed, quantity of data transferred
- Special software usage, operator/administrator services… maybe later
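The quantities listed on this slide could be carried in a single per-job record. The following is a minimal sketch in Python, chosen only for brevity (the slides describe a Java-based system). Every field name, unit, and the helper `cpu_seconds_per_user` are assumptions made for illustration, not part of the Gratia design.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class UsageRecord:
    """One job's resource usage. All field names and units are hypothetical."""
    user: str                # grid user the usage is attributed to
    vo: str                  # virtual organization of that user
    queue_priority: int      # priority in the batch queue
    cpu_seconds: float       # CPU time consumed
    wall_seconds: float      # wall-clock time
    peak_memory_mb: float    # memory usage
    disk_gb: float           # disk usage
    tape_gb: float           # tape storage usage
    bytes_transferred: int   # quantity of data moved over the network


def cpu_seconds_per_user(records: Iterable[UsageRecord]) -> Dict[str, float]:
    """Sum CPU time per user: the user-to-usage link that monitoring alone lacks."""
    totals: Dict[str, float] = defaultdict(float)
    for record in records:
        totals[record.user] += record.cpu_seconds
    return dict(totals)


records = [
    UsageRecord("alice", "cms", 10, 100.0, 120.0, 512.0, 1.0, 0.0, 1000),
    UsageRecord("alice", "cms", 10, 50.0, 60.0, 256.0, 0.5, 0.0, 2000),
    UsageRecord("bob", "atlas", 5, 30.0, 45.0, 128.0, 0.2, 1.0, 500),
]
totals = cpu_seconds_per_user(records)
```

Keeping the user and VO on every record is what lets usage be summed per person or per organization after the fact.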
9
Goals
- Track service and resource usage per grid user, after the fact
- Focus on quality, integrity and security of the information
- Make accounting information easily available to people (web interface) and to applications (Web Services)
- Build a system that is simple to manage (install, configure and upgrade) and to extend (well-defined APIs)
- Base it on well-proven, standard (industrial-strength) technologies
However, we do not cover (but keep in mind):
- User charging system, resource or service pricing
- Support for an economic model for resource allocation
10
System Properties
- Interoperability: the Accounting System should leverage existing standards to maximize interoperability with other Grids and Accounting Services.
- Fault tolerance: reduce and flag data loss; resilient to communication failures over LAN and WAN; resilient to the failure of one of its components.
- Security: guarantees integrity and non-repudiation of the accounting records at the site level; uses secure communication channels (mutual authentication, message integrity, confidentiality) and access control lists.
- Scalability and performance: not really an issue.
- Other: leverage existing tools and infrastructures to solve related problems.
11
Simple Domain Model
12
Design Direction
- We are currently focused on getting the infrastructure right, more than on the specific metrics used to measure resource usage
- Open: we provide APIs
- Distributed: Meters are distributed objects
- Based on open-source, standard technologies: Web Services, Java Platform, Tomcat, Axis, Hibernate
- Same idea as GUMS and JClarens: the service is an independent Tomcat application (JClarens for authentication)
- Ensure interoperability with OSG partners (LCG, TeraGrid…)
13
Architecture Overview
14
Meter
A Meter is responsible for:
- Gathering all the data about a Grid service's usage
- Gathering all the data about the resources used by that Grid service
- Assembling a Service Usage Record
Logically, there is one Meter entity per Grid Service.
Each Meter is composed of one or more Probes and one Assembler (plus some other components for management functions).
A Grid Service uses resources distributed across the Resource Provider's LAN; therefore the Meter is also distributed.
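The composition described on this slide (one or more Probes feeding a single Assembler that produces one Service Usage Record) might be sketched as follows. This is an illustration in Python, not the project's Java code. The class layout, the probe callables, and the record fields are all assumptions made for the example.

```python
from typing import Callable, Dict, List

# A probe is modelled as a callable returning the measurements it gathered
# for one job. This interface is hypothetical, not taken from the slides.
Probe = Callable[[str], Dict[str, float]]


class Assembler:
    """Merges per-probe measurements into one Service Usage Record."""

    def assemble(self, service: str, job_id: str,
                 readings: List[Dict[str, float]]) -> Dict[str, object]:
        record: Dict[str, object] = {"service": service, "job_id": job_id}
        for reading in readings:
            record.update(reading)
        return record


class Meter:
    """One Meter per Grid Service: one or more Probes plus one Assembler."""

    def __init__(self, service: str, probes: List[Probe]) -> None:
        if not probes:
            raise ValueError("a Meter needs at least one Probe")
        self.service = service
        self.probes = probes
        self.assembler = Assembler()

    def usage_record(self, job_id: str) -> Dict[str, object]:
        readings = [probe(job_id) for probe in self.probes]
        return self.assembler.assemble(self.service, job_id, readings)


def batch_probe(job_id: str) -> Dict[str, float]:
    # Made-up values standing in for what a batch-system probe would report.
    return {"cpu_seconds": 120.0, "wall_seconds": 150.0}


def storage_probe(job_id: str) -> Dict[str, float]:
    # Made-up value standing in for what a storage probe would report.
    return {"disk_gb": 2.5}


meter = Meter("batch-service", [batch_probe, storage_probe])
record = meter.usage_record("job-42")
```

In a real deployment the probes would run on different hosts of the LAN; here they are plain functions so that the sketch stays self-contained.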
15
Meter Logical View
16
Meter's Probe and Assembler
- Probes use a secure channel (mutual authentication, data integrity) to send usage information to the Assemblers.
- Usage information is packaged in ProbeEvents that are sent to the Assemblers through a Web Service interface.
- Each ProbeEvent object has a standard header and a payload in XML format.
- Probes use "at-least-once" delivery semantics to send ProbeEvents to the Assemblers (communication is resilient to failure).
- Assemblers can choose synchronous or asynchronous processing of ProbeEvents.
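At-least-once delivery means a Probe keeps resending a ProbeEvent until it is acknowledged, so the receiving side has to tolerate duplicates. A minimal sketch of that idea, in Python for brevity, is shown here. The header fields, the XML element names, the retry limit, and the in-memory `Assembler` are assumptions for illustration. The real system sends events over a secured Web Service interface, which is not modelled.

```python
import uuid
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class ProbeEvent:
    event_id: str      # header: unique id, used to detect duplicates
    probe_name: str    # header: which probe produced the event
    payload_xml: str   # payload: usage information serialized as XML


def make_event(probe_name: str, metrics: Dict[str, float]) -> ProbeEvent:
    """Build an event whose payload is a small XML document (made-up schema)."""
    root = ET.Element("usage")
    for name, value in metrics.items():
        ET.SubElement(root, name).text = str(value)
    payload = ET.tostring(root, encoding="unicode")
    return ProbeEvent(str(uuid.uuid4()), probe_name, payload)


class Assembler:
    """Receiver that stores each event once, however often it is delivered."""

    def __init__(self) -> None:
        self.seen: Dict[str, ProbeEvent] = {}

    def receive(self, event: ProbeEvent) -> bool:
        self.seen.setdefault(event.event_id, event)
        return True  # acknowledgement


def send_at_least_once(event: ProbeEvent,
                       deliver: Callable[[ProbeEvent], bool],
                       max_attempts: int = 5) -> int:
    """Retry until acknowledged; return the number of attempts used."""
    for attempt in range(1, max_attempts + 1):
        try:
            if deliver(event):
                return attempt
        except ConnectionError:
            pass  # transient failure: try again
    raise RuntimeError("event not acknowledged; keep it for a later resend")
```

Because a retry can arrive after the first copy already got through, the sketch makes `receive` idempotent by keying on `event_id`. Whether Gratia deduplicates this way is not stated in the slides.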
17
Collector
Main functionalities:
- Hosting the Meters' components (the Assemblers) that are responsible for assembling Service Usage Records
- Monitoring the Meters' components called Probes
- Communication between Probes and Assemblers: routing of ProbeEvents to the proper Assembler
- Communication between Assemblers and the Data Store
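The routing role of the Collector can be pictured as a lookup from an event's target service to the Assembler hosted for it. The sketch below is a simplified Python illustration under assumed names (`host`, `route`, the `service` key, and the `unroutable` list are all hypothetical). It leaves out Probe monitoring and the Data Store.

```python
from typing import Dict, List


class ListAssembler:
    """Stand-in Assembler that simply remembers the events routed to it."""

    def __init__(self) -> None:
        self.events: List[dict] = []

    def receive(self, event: dict) -> None:
        self.events.append(event)


class Collector:
    """Hosts Assemblers and routes each incoming event to the right one."""

    def __init__(self) -> None:
        self._assemblers: Dict[str, ListAssembler] = {}
        self.unroutable: List[dict] = []

    def host(self, service: str, assembler: ListAssembler) -> None:
        self._assemblers[service] = assembler

    def route(self, event: dict) -> bool:
        assembler = self._assemblers.get(event.get("service"))
        if assembler is None:
            # Flag the event instead of dropping it silently.
            self.unroutable.append(event)
            return False
        assembler.receive(event)
        return True


collector = Collector()
batch = ListAssembler()
collector.host("batch-service", batch)
routed = collector.route({"service": "batch-service", "cpu_seconds": 12.5})
missed = collector.route({"service": "unknown-service"})
```

Keeping unroutable events in a separate list reflects the "reduce and flag data loss" property stated earlier in the talk. The actual mechanism used by the system is not described in these slides.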
18
Collector Logical View
[Diagram label: Data Store Component]
19
Accountant
- This component is intended for future use.
- Main functionality: further process the Service Usage Records to apply economic policy (pricing & billing)
20
Deployment View
- Deployed as a Tomcat application: can take advantage of Tomcat clustering features for scalability and availability
- Collector and Publisher can run on two different Tomcat instances
- Can use the most popular database implementations; the database server can be on the same host as Tomcat or on a different host
- Probes can run anywhere on the LAN
21
Deployment Diagram
23
Conclusion
More information:
- Project Charter, Requirements and Design Documents
- OSG Accounting Twiki page
- Mailing list: osg-accounting@openscience.org
Any questions, comments, etc.?
24
SPARE SLIDES
26
Overview
[Diagram. Locations shown: Resource Provider Site, Grid Operation Center, VO Center. Components shown: Probe, Collector, Repository of Accounting Records, Data Store Access Layer, Web Presenter, Statistical Analyzer, WS API.]