Thinking about Accounting
Matteo Melani, SLAC
Open Science Grid

8/13/2004, Open Science Grid
What is Accounting? (in the Grid context)
Grid accounting is the process of maintaining a (consistent) Grid-wide view of VO members' resource utilization. [1]
[1] Accounting in Grid Environments, by Peter Gardfjäll, Department of Computing Science, Umeå University, Sweden

Why do we want an accounting system?
Resource providers (SLAC, Fermilab…) want to perform cost-benefit analysis
Resource providers want to improve planning
Resource providers want better security
Resource providers want to improve QoS (priorities, debugging…)
Support a Grid “Economic Model”

What is the real problem (solution)?
Nobody talked about “Grid economy”
Do we really want an Accounting system?
Or maybe a monitoring system will do?
Let's look at accounting and monitoring

Accounting vs. Monitoring (1)
A monitoring system:
Purpose: monitoring system health, debugging, system profiling
Gathers state information about the system resources
Collects system events
It works like a DAQ system: as close as possible to the system, as unintrusive as possible

Accounting vs. Monitoring (2)
An accounting system:
Purpose: keep track of resource usage, support a resource consumption model (economic model)
Accounting makes sense of the system events and links them to the users
It operates at a higher level (it depends on monitoring)
It has accounts, banks, “currency” and supports an economic model (policies)

For Example: Monitoring at SLAC (1)
What do we monitor:
Network
– Switch and router status
– Internet Mbytes/sec in/out
Computer Clusters
– Batch systems, NFS and AFS servers, database servers
Storage Space
– Disk usage, HPSS

For Example: Monitoring at SLAC (2)
Some metrics we use:
– CPU utilization, Memory
– Disk usage, Disk I/O
– Various Networking metrics (Mbytes in/out of switches, routers, servers…)
– Some primitive job submission results (LSF)
All monitoring tools are custom made except Ganglia for BaBar clusters

For Example: Accounting at SLAC?
The monitoring system cannot link resource usage to users/groups
Maybe it could be done by looking into the logs and correlating the events… but that is a lot of work
No Accounting at SLAC

The 3 parts of Accounting
Part 1: decide what needs to be accounted for
Part 2: how to map local accounts to Grid accounts
Part 3: rules and policies to regulate transactions, and definition of the economic model: “how much for a couple of GBytes of disk?”
I can only think about part 1 for now!

What I think we should track (in general)
Job submission:
– Priority in the batch queue
– CPU-time
– Wall clock time
– Memory usage
Storage
– Disk usage
– Tape storage usage
– Storage class (to be defined)
Network data transfer
– Network speed
– Quantity of data transferred
Special software usage, Operator/Administrator services… maybe later
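
The job-submission items listed above could be captured in a per-job usage record. The following is only a minimal sketch: the class name `JobUsageRecord`, its field names and units, and the helper `cpu_seconds_by_vo` are all hypothetical and are not taken from any existing accounting system.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class JobUsageRecord:
    """Hypothetical per-job record holding the job metrics from the slide."""
    user: str               # local or Grid user the job is charged to
    vo: str                 # virtual organization of the user
    queue_priority: int     # priority in the batch queue
    cpu_seconds: float      # CPU time consumed
    wall_seconds: float     # wall clock time from start to finish
    max_memory_mb: float    # peak memory usage

    def cpu_efficiency(self) -> float:
        """CPU time divided by wall clock time (0.0 if no wall time)."""
        if self.wall_seconds <= 0:
            return 0.0
        return self.cpu_seconds / self.wall_seconds


def cpu_seconds_by_vo(records: Iterable[JobUsageRecord]) -> Dict[str, float]:
    """Sum CPU seconds per VO, the kind of view an accounting system needs."""
    totals: Dict[str, float] = defaultdict(float)
    for rec in records:
        totals[rec.vo] += rec.cpu_seconds
    return dict(totals)


if __name__ == "__main__":
    # Made-up example data, for illustration only.
    jobs = [
        JobUsageRecord("alice", "vo_a", 10, 1800.0, 3600.0, 512.0),
        JobUsageRecord("bob", "vo_a", 5, 600.0, 600.0, 256.0),
        JobUsageRecord("carol", "vo_b", 1, 100.0, 400.0, 128.0),
    ]
    print(cpu_seconds_by_vo(jobs))
    print(jobs[0].cpu_efficiency())
```

Keeping the user and the VO on every record is what separates this from plain monitoring data: the same metrics are collected, but each one is tied to someone who can be charged for it.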

What _we_ think we should track for the storage service
Assuming the Storage Service has: Filename | owner | VO | size
Snapshot every t minutes: t | owner | VO | GB/days
Roll up every n (30?) days: n | VO | GB/day
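
The snapshot and roll-up scheme for the storage service can be sketched in a few lines. This is an illustration under stated assumptions, not a proposed implementation: the names `FileEntry`, `take_snapshot` and `roll_up` are hypothetical, a GB is taken as 10^9 bytes, and each snapshot is assumed to represent usage for one full snapshot interval, so the roll-up is expressed in GB-days per VO.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple

GB = 10**9  # assumption: decimal gigabytes


class FileEntry(NamedTuple):
    """One row of the (assumed) storage service catalog."""
    filename: str
    owner: str
    vo: str
    size_bytes: int


Snapshot = Dict[Tuple[str, str], float]  # (owner, VO) -> GB at snapshot time


def take_snapshot(files: Iterable[FileEntry]) -> Snapshot:
    """Sum the stored GB per (owner, VO) at one point in time."""
    usage: Snapshot = defaultdict(float)
    for f in files:
        usage[(f.owner, f.vo)] += f.size_bytes / GB
    return dict(usage)


def roll_up(snapshots: List[Snapshot], interval_minutes: float) -> Dict[str, float]:
    """Aggregate snapshots into GB-days per VO.

    Each snapshot is weighted by the snapshot interval, expressed in days.
    """
    days_per_snapshot = interval_minutes / (24 * 60)
    totals: Dict[str, float] = defaultdict(float)
    for snap in snapshots:
        for (_owner, vo), gigabytes in snap.items():
            totals[vo] += gigabytes * days_per_snapshot
    return dict(totals)


if __name__ == "__main__":
    # Made-up catalog contents, for illustration only.
    catalog = [
        FileEntry("/data/a.root", "alice", "vo_a", 2 * GB),
        FileEntry("/data/b.root", "bob", "vo_a", 1 * GB),
        FileEntry("/data/c.root", "carol", "vo_b", 4 * GB),
    ]
    snap = take_snapshot(catalog)
    # Two snapshots taken 720 minutes apart cover one day in total.
    print(roll_up([snap, snap], interval_minutes=720))
```

Weighting each snapshot by the interval length makes the roll-up independent of how often snapshots are taken, at the cost of missing files that are created and deleted between two snapshots.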

What’s next?
Read more
Work on some use cases
Add “meat” to the Accounting requirements document
Talk to people and get some help!

References
Distributed Accounting on the Grid, William Thigpen (NASA IPG), Thomas J. Hacker (University of Michigan), Laura F. McGinnis (Pittsburgh Supercomputer Center), Brian D. Athey (University of Michigan).
GridBank: A Grid Accounting Services Architecture (GASA) for Distributed Systems Sharing and Integration, Alexander Barmouta (University of Western Australia), Rajkumar Buyya (University of Melbourne).
An Economy-based Accounting Infrastructure for the DataGrid, Albert Werbrouck (INFN of Turin), Rosario Piro (INFN of Turin), Andrea Guarise (INFN of Turin).
An OGSA-Based Accounting System for Allocation Enforcement across HPC Centers, Erik Elmroth and Peter Gardfjäll (Umeå University, Sweden).