Grid Operations. Rob Quick, Grid Technologist, Indiana University, iVDGL Grid Operations Center. Open Science Grid, December 1, 2004.

Presentation transcript:

Grid Operations
Rob Quick, Grid Technologist, Indiana University
Open Science Grid Operations Workshop, December 1, 2004
Rob Quick - iVDGL Grid Operations Center

Agenda
- Introduction to the Operations Effort at IU and the iGOC
- Efforts, Accomplishments and Lessons Learned
- Community Care
- Future Directions

iVDGL iGOC
Mission:
- Deploy, maintain, and operate Grid3 as a NOC manages an inter-network, providing a single point of operations for configuration support, monitoring of status and usage (current and historical), problem management, support for users, developers and systems administrators, provision of grid services, security incident response, and maintenance of the Grid3 information repository.
Staffing:
- 2 FTE at Indiana University, plus effort from the University of Chicago (monitoring development), the University of Florida at Gainesville (Grid3 catalog, web site, site verify script, etc.), and leveraged resources of the 24x7 NOC at Indiana University
D. Pearson

iVDGL iGOC
Proposed Areas of Research:
- Access control and policy - Security
- Trouble Ticket System - Problem coordination
- Configuration and Information Services
- Health and Status Monitoring
- Experiment Scheduling
D. Pearson

iVDGL/Grid3 Operations Approach
The iVDGL Grid3 Operations group:
- Sets up and maintains a cooperative grid community
- Facilitates work to and among responsible agents
- Has no direct control: uses notification with follow-ups
- Tunes services to the capabilities of the sites
Cooperative and mentoring principles are employed:
- Identifies community vision, i.e. the Project Plan (anchor)
- Utilizes a participatory decision-making process -- Taskforce
- Makes clear agreements -- Service Descriptions and MOUs
- Makes clear communication and conflict resolution a priority
  o Weekly operations (problem solving) and management teleconferences
D. Pearson

Agenda
- Introduction to the Operations Effort at IU and the iGOC
- Efforts, Accomplishments and Lessons Learned
- Community Care
- Future Directions

iGOC – Service Desk Activities
- A common face to collaboratively-provided support
- Facilitate and support communications:
  o Direct with site administrators and Grid users
  o Web page resources
  o Status reporting to mailing list
- Monitor status of Grid resources
- Coordinate and track:
  o Problems
  o Changes (software updates, resource additions)
  o Security incidents
  o Requests for assistance
D. Pearson
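The "coordinate and track" duty above can be pictured as a small ticket model. This is only an illustrative sketch, not the iGOC's actual trouble ticket system: the four ticket types come from the slide, while the class names, fields, and site names are assumptions made for the example.

```python
from collections import Counter
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class TicketType(Enum):
    # The four kinds of items the slide says the service desk coordinates and tracks.
    PROBLEM = "problem"
    CHANGE = "change"
    SECURITY_INCIDENT = "security incident"
    ASSISTANCE_REQUEST = "request for assistance"


@dataclass
class Ticket:
    ticket_id: int
    ticket_type: TicketType
    site: str        # hypothetical site name, for illustration only
    summary: str
    is_open: bool = True


@dataclass
class ServiceDesk:
    tickets: List[Ticket] = field(default_factory=list)

    def create(self, ticket_type: TicketType, site: str, summary: str) -> Ticket:
        # Ticket ids are assigned sequentially, starting at 1.
        ticket = Ticket(len(self.tickets) + 1, ticket_type, site, summary)
        self.tickets.append(ticket)
        return ticket

    def close(self, ticket_id: int) -> Ticket:
        for ticket in self.tickets:
            if ticket.ticket_id == ticket_id:
                ticket.is_open = False
                return ticket
        raise KeyError(ticket_id)

    def open_counts_by_type(self) -> Counter:
        # Summary of the kind a "problem summaries" report could be built from.
        return Counter(t.ticket_type for t in self.tickets if t.is_open)


if __name__ == "__main__":
    desk = ServiceDesk()
    first = desk.create(TicketType.PROBLEM, "site-a", "job manager not responding")
    desk.create(TicketType.CHANGE, "site-b", "software update scheduled")
    desk.close(first.ticket_id)
    print(desk.open_counts_by_type())
```

Keeping the ticket type as an enumeration makes per-type reporting a one-line count.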

iGOC – Service Desk Activities (continued)
- Provide reports
  o Problem summaries, service desk activity
- Maintain the repository of support and process information
- User support, such as:
  o How to join a VO
  o How to get and maintain a cert
  o How to run an application
  o How to use monitoring tools
  o Troubleshooting application failures
  o Information about policies, etc.
D. Pearson

iGOC – Engineering Activities
- Maintain the grid-controlled software packages and cache
- Provide site software not supported through VDT
- Verify software compatibility
- Provide ease-of-installation tools
- Develop instructions on how to plug things together
- Provide site installation and configuration support
- End-to-end troubleshooting for resources
- Provide and maintain common Grid services such as VOMS, GIIS, RLS, archives, and monitoring systems
D. Pearson

Operations Enables Applications
Provide operational services that provide Applications with the “instruments” to:
- Publish site policies and environment
- Know the status of grid middleware on sites
- Know the job queue for compute resources
- Know the status and load of grid resources
- Access monitoring archives
- Manage VO services
- Keep apprised of security incidents in the collaborative
D. Pearson

Resource Monitoring
- Ganglia: Open source tool to collect cluster monitoring information such as CPU and network load, memory and disk usage
- MonALISA: Monitoring and archiving tool to support resource discovery, access to information, and a gateway to other information gathering systems
- ACDC Job Monitoring System: Application using grid-submitted jobs to query the job managers and collect information about jobs. This information is stored in a DB and is available for aggregated queries and browsing.
- Metrics Data Viewer (MDViewer): Analyzes and plots information collected by the different monitoring tools, such as the DBs at iGOC.
- Globus MDS: Grid3 schema for information services and index services
- GridCat: Graphical display of middleware testing results; also provides a site database repository with extended functions for storing and retrieving configuration and human contacts.
D. Pearson

Leveraging the NOC
Global NOC at Indiana University
- The Global NOC provides 24x7 network engineering and operations services for research and education networks and international interconnections, including Internet2 Abilene, National LambdaRail, TransPAC and AMPATH networks, the STAR TAP and MANLAN layer 3 international exchange points, and the STAR LIGHT optical exchange. In addition, the Global NOC supports activities of the iVDGL Grid Operations Center and the REN-ISAC cybersecurity Watch Desk. By virtue of the R&E network, grid, and cybersecurity activities, the Global NOC possesses a unique and embracing view of R&E cyberinfrastructure.
D. Pearson

Leveraging the NOC
- 24x7 front line
- Monitoring (watch for red indicators)
- Problem management
- Management overhead
D. Pearson

Analysis of Effort by Area
- Issues relating to resource owners and providers: 60%
- Special issues for Virtual Organizations (VOs): 20%
- Issues relating to developers of applications and workflow environments (portals): 10%
- Support to individuals using Grid resources: 10%
D. Pearson

- Provided 24x7 monitoring and problem discovery during Atlas DC2
- Successfully interoperated with BNL Tier1 Support Center
- Provided research advancements toward Grid to VO operations coordination
D. Pearson

iGOC Daily Use Case

Gridcat Tests
Tests are run every 5 hours:
- authentication (globusrun): ensures that the site is in the grid map file; the equivalent of doing a ping
- helloworld, via globus-job-run (through the fork job manager)
- GITS: submit a long job; see if the submit works; if yes, query for that job in the batch queuing system; then cancel the job
- gsiftp data transfer to and from
Test results are world viewable
D. Pearson
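The test sequence above can be sketched as a small runner. This is a hedged illustration only, not GridCat's implementation: no Globus command is invoked, each check is a placeholder callable supplied by the caller, and the "green" label, the function names, and the rule that a GITS run passes only when the submitted job is found in the queue are assumptions of the sketch. Only the test names, the GITS submit/query/cancel order, and the "red" status come from the slides.

```python
from typing import Callable, Dict, Optional

# Test labels in the order listed on the slide.
TEST_ORDER = ["authentication", "helloworld", "GITS", "gsiftp"]


def gits_test(
    submit: Callable[[], Optional[str]],
    query: Callable[[str], bool],
    cancel: Callable[[str], object],
) -> bool:
    """Submit a long job; if the submit works, look for it in the queue, then cancel it."""
    job_id = submit()
    if job_id is None:
        # The submit itself failed, so there is nothing to query or cancel.
        return False
    found = query(job_id)
    cancel(job_id)  # always clean up a job that was successfully submitted
    return found


def run_site_tests(checks: Dict[str, Callable[[], bool]]) -> Dict[str, str]:
    """Run every check and record "green" (assumed label) or "red" for each test."""
    return {name: ("green" if checks[name]() else "red") for name in TEST_ORDER}


def site_status(results: Dict[str, str]) -> str:
    """A site is shown as red if any single test is red."""
    return "green" if all(s == "green" for s in results.values()) else "red"


if __name__ == "__main__":
    # Placeholder checks standing in for the real probes.
    results = run_site_tests({
        "authentication": lambda: True,
        "helloworld": lambda: True,
        "GITS": lambda: gits_test(lambda: "job-1", lambda job: True, lambda job: None),
        "gsiftp": lambda: False,
    })
    print(results, site_status(results))
```

Passing the probes in as callables keeps the runner independent of how each test is actually executed at a site.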

Following up on a “Red” Status
[Figure labels: Test Time; GITS Test]

More than 800 tickets created since Jan
[Figure label: open tickets]

Ticket Creation since Nov
[Figure labels: CMS run; Atlas run]

Grid3 TT Handling by Type

Atlas DC2 TT Handling by Type

Catalog Site History Analysis
Grid3 status collected since 08/19/04
B. Kim et al.

Use of Grid3 – led by US LHC
7 scientific applications and 3 CS demonstrators
- A third HEP and two biology experiments also participated
Over 100 users authorized to run on Grid3
- Application execution performed by dedicated individuals
- Typically only a few users ran the applications from a particular experiment

Usage of the Grid3 (6 months)
[Figure: usage in CPUs, Mar 15 to Sep 10; annotations: cms dc04, atlas dc2]

Lessons Learned
- Configuration management and assistance efforts in development and deployment are rewarded many times over during production.
- Middleware updates can be painless.
- Certificates are a hassle (just like all security).
- Not all resource information should be public.
- A production monitoring infrastructure including people provides a significant problem-solving advantage, especially redundant monitoring.
- Resource providers and owners are more responsive and comfortable working with a central operations center.
- The GOC provides more than operations: it provides focus, continuity of effort, and community.
D. Pearson

Agenda
- Introduction to the Operations Effort at IU and the iGOC
- Efforts, Accomplishments and Lessons Learned
- Community Care
- Future Directions

iGOC a General Practitioner for Grid3
General Practitioners provide a complete spectrum of care within the local community: dealing with problems that often combine physical, psychological, and social components. They increasingly work in teams with other professions, helping patients to take responsibility for their own health. (

iGOC – Division of Problems
Physical
- The birth
- The checkup
- The accident
- The illness
- The specialist referral
- The death
Psychological
- Preventative Care/Corrective Action
- The hypochondriac
- The anti-hypochondriac
Social
- The disease
- Health reporting
- The community vision

The Birth
Addition of a Grid3 Site or VO
- Management Approval
- Software Installation
- Site Verify
- Monitoring Setup
- Announcement

The Routine Checkup
Monitoring Vital Signs
- GridCat
- MonALISA
- Ganglia
- ACDC

The Accident and the Illness
External Failure
- Network
- Hardware
- Power
Internal Failure
- Grid Software
- Grid Services
  o VOMS
  o Monitoring
  o Web Services

The Specialist Referral
When a problem is found and the iGOC does not have the proper access or knowledge to handle it, it can make a referral to the group that can fix the issue. This often happens at the site and software levels. The iGOC can also keep watch after fixes are made to be sure there are no negative aftereffects.
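The referral decision described above amounts to a simple routing rule. The sketch below is illustrative only: the level labels, the specialist group names, and the `igoc_can_fix` predicate are hypothetical and do not describe any actual iGOC tooling.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Problem:
    description: str
    level: str  # hypothetical labels such as "site" or "software"


# Hypothetical mapping from problem level to the group able to fix it.
SPECIALISTS: Dict[str, str] = {
    "site": "site administrators",
    "software": "software developers",
}


def handle(problem: Problem, igoc_can_fix: Callable[[Problem], bool]) -> Dict[str, object]:
    """Fix locally when possible; otherwise refer and flag the problem for follow-up."""
    if igoc_can_fix(problem):
        return {"handled_by": "iGOC", "follow_up": False}
    # The iGOC lacks the access or knowledge, so the problem is referred.
    # follow_up=True records that the iGOC keeps watching after the fix is made.
    specialist = SPECIALISTS.get(problem.level, "unassigned")
    return {"handled_by": specialist, "follow_up": True}


if __name__ == "__main__":
    print(handle(Problem("batch queue misconfigured", "site"), lambda p: False))
    print(handle(Problem("monitoring page stale", "grid service"), lambda p: True))
```

The follow-up flag mirrors the point that a referral does not end the operations center's involvement.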

The Death
Site Removal
- Removal from Monitoring
- Announcement

Psychological Problems
Preventative Care/Corrective Action
- Possibility of Upcoming Problems
- Are there alternative (better) steps to fix the problem?
- It will work better if you try this…
The Hypochondriac
- “The Grid is dying!”
- Usually finds problems before others
The Anti-Hypochondriac
- “Put a Band-Aid on it, it’ll be fine.”

Social Responsibilities
Outbreaks
- Security
- Software Problems
Health Notifications
- Heavy Job Loads
- Site-Affecting Bugs
Organizing Community Response
- Experts Lists
- Upgrade Notifications

Duties of the iGOC (List of General Practitioner's Duties)
- Diagnoses and treats a variety of illnesses
- Executes tests to provide information about a patient's condition
- Analyzes findings from tests
- Inoculates, vaccinates, and immunizes patients
- Advises on diet, hygiene, and disease prevention
- Provides care for mothers and newborns before, during, and after birth
- Reports statistics (birth, death, disease, etc.) to governmental agencies
- Refers patients to specialists
- Performs minor surgery
- Makes emergency house calls

Agenda
- Introduction to the Operations Effort at IU and the iGOC
- Efforts, Accomplishments and Lessons Learned
- Community Care
- Future Directions

OSG deployment landscape
[Diagram labels: Arch; Ops; MIS; Storage; Policy; Security; OSG deployment; site admins; TG Security; TG Policy; TG Mon&Info; TG Storage; VOs & apps; Activity; TG Support Centers; Chairs; Integration]
R. Gardner

The Support Centers Technical Group is responsible for discussing and coordinating the OSG activities that relate to support centers and services. These services include:
- definition of the support model for user, infrastructure, service and technology support.
- communication and publication of information for support helpdesk and trouble ticket infrastructures.
- communication and interoperation with other grid infrastructures, in particular the LCG/EGEE.

Challenges
- “OSG is a project with little central control or resources – almost everything has to be done by the sites or the VOs”
- The GOC has proven to be a valuable central entity, at a minimum to facilitate, coordinate, establish software caches, monitor, assist in site installation, etc.
- How to bring these two facts together?
