Progress on TeraGrid Stability for the LEAD project.

History
Reliability problems for LEAD
  – 2006 Unidata workshop
  – Spring 2007 Weather Challenge
  – Continued “heroics” needed by staff every time more than a handful of users used the gateway
10/25/08 ARCH call topic raised by Dane
  “I think it's time to raise this discussion in the broader venue of the ARCH meeting. We have been raising the profile of this investigation and trying to come to a persistent resolution of the problem. We're putting attention on this among the management and it should be reflected to the working teams. There also continue to be misunderstanding and different expectations and it would be good to set those clearly.”

Gateway-debug calls initiated 10/25/07
Goal
  – Stable systems for LEAD to conduct student Weather Challenge with 67 universities
      Runs start 1/28/08
  – Improve stability of grid services for all users at all TeraGrid sites
      Eliminate need for staff heroics

Get the right staff on the gateway-debug calls
Original request for
  – knowledgeable LEAD rep
  – knowledgeable Globus rep
  – knowledgeable NCSA RP rep
  – knowledgeable IU RP rep
  – knowledgeable GRAM rep
  – knowledgeable GridFTP rep
  – knowledgeable Inca rep
  – knowledgeable TG operations rep

gateway-debug activities
Understand the problems
  – Suresh creates

With some humor
Overloaded GridFTP servers
m/v/4wp3m1vg06Q&hl=en

Create testbed where we can implement solutions rapidly
  – Only at sites LEAD was trying to use
      ANL, NCSA, IU
Software and hardware configuration changes on the testbed
  – Non-striped GridFTP servers
  – Globus which includes GRAM scalability improvements
  – RFT improvements
Develop tests that simulate what LEAD does
  – GRAM, GridFTP, javaCOG

Inca
Use Inca to run LEAD tests
  – Inca run once per day on production sites
      Version tests, limited functionality tests
  – Frequency greatly increased for testbed
      Every 5 min. “are you alive” tests
      Once an hour “can I get a job into a queue” test
  – These can be tuned, back off when a service proves it is stable
  – Automatic admin notification
  – These last two were the key!!
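
The tuning rule on this slide (test often, back off once a service proves stable, notify an admin automatically on failure) can be sketched roughly as follows. This is an illustration only, not Inca's configuration format or API: the class `AdaptiveProbe`, the `check` and `notify` callbacks, and the test name are hypothetical. The 5-minute base interval comes from the slide; the hourly cap and the streak of 12 successes are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class AdaptiveProbe:
    """Hypothetical scheduling state for one test on one resource."""
    name: str
    check: Callable[[], bool]        # returns True when the service responds
    notify: Callable[[str], None]    # called on every failure, e.g. to send mail
    base_interval: int = 300         # seconds; "every 5 min" from the slide
    max_interval: int = 3600         # assumed cap: never slower than hourly
    stable_after: int = 12           # assumed: successes needed before backing off
    interval: int = field(default=0, init=False)
    streak: int = field(default=0, init=False)

    def __post_init__(self) -> None:
        self.interval = self.base_interval

    def run_once(self) -> int:
        """Run the check and return the seconds to wait before the next run."""
        if self.check():
            self.streak += 1
            if self.streak >= self.stable_after:
                # The service has proven stable: test it less often.
                self.interval = min(self.interval * 2, self.max_interval)
                self.streak = 0
        else:
            # Failure: alert the admin and return to frequent testing.
            self.notify(f"{self.name} failed")
            self.interval = self.base_interval
            self.streak = 0
        return self.interval


# Simulated run: twelve successes, one failure, one success.
alerts: List[str] = []
outcomes = iter([True] * 12 + [False, True])
probe = AdaptiveProbe("gridftp-alive", check=lambda: next(outcomes), notify=alerts.append)
waits = [probe.run_once() for _ in range(14)]
```

In the simulated run the wait doubles after the twelfth consecutive success and drops back to the base interval as soon as a failure is seen, at which point one alert is recorded.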

Inca results reviewed at each call
http://cuzco.sdsc.edu:8085/cgi-bin/lead.cgi
  – Still lots of errors this past week
Summary sent before gateway-debug
  – Issues addressed on the call
  – Follow-up on actions from previous week
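
A pre-call summary of this kind could be produced by tallying failures per site and test. The sketch below is illustrative only: the record layout `(site, test, passed)`, the function `summarize`, and the sample records are all assumptions made for the example, not Inca's report format or real results.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, str, bool]  # (site, test name, passed): hypothetical layout


def summarize(records: Iterable[Record]) -> List[Tuple[str, str, int, int]]:
    """Return (site, test, failures, runs) rows, worst failure rate first."""
    counts: Dict[Tuple[str, str], List[int]] = defaultdict(lambda: [0, 0])
    for site, test, passed in records:
        entry = counts[(site, test)]
        entry[1] += 1              # total runs
        if not passed:
            entry[0] += 1          # failed runs
    rows = [(site, test, fails, runs)
            for (site, test), (fails, runs) in counts.items()]
    # Sort by failure rate (descending), then by site and test for stable output.
    rows.sort(key=lambda r: (-r[2] / r[3], r[0], r[1]))
    return rows


# Made-up sample records, for illustration only.
sample = [
    ("NCSA", "gram-submit", False),
    ("NCSA", "gram-submit", True),
    ("IU", "gridftp-alive", True),
    ("ANL", "gridftp-alive", False),
]
for site, test, fails, runs in summarize(sample):
    print(f"{site:5} {test:15} {fails}/{runs} failed")
```

Sorting by failure rate puts the items most in need of discussion at the top of the summary.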

Gateway-debug work moving to ops-wg
Maintain testbed
  – For now, maintain as stable infrastructure for LEAD
      Having trouble today with testbed stability
  – In the future
      Use testbed and Inca structure to verify reliability of new versions of CTSS before it goes into production
      Improve simulated scalability tests and produce benchmark (before asking Users/Gateways to participate)
Turn focus on production systems
  – Increase testing frequency enough to be able to determine stability
      Once per day is not enough
  – Automatic notification of sys admins
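
One way to see why once per day is not enough is to ask how likely a short outage is to be observed at all at a given testing frequency. The arithmetic below is a simple illustration under stated assumptions: tests are evenly spaced, each test is an instantaneous sample, and the outage starts at a uniformly random time. The function names and the 30-minute outage are hypothetical.

```python
def runs_per_day(interval_minutes: int) -> int:
    """Number of evenly spaced test runs in 24 hours."""
    return (24 * 60) // interval_minutes


def chance_outage_observed(outage_minutes: float, interval_minutes: float) -> float:
    """Probability that at least one test falls inside the outage window.

    With tests every T minutes and an outage of length d starting at a
    uniformly random time, this is min(1, d / T).
    """
    return min(1.0, outage_minutes / interval_minutes)


for label, interval in [("once per day", 1440), ("once an hour", 60), ("every 5 min", 5)]:
    p = chance_outage_observed(30, interval)
    print(f"{label:13} runs/day={runs_per_day(interval):3}  "
          f"P(30-min outage seen)={p:.2f}")
```

Under these assumptions a daily test sees a 30-minute outage only about 2% of the time, while a test every 5 minutes always sees it.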

Let’s learn from this experience
  – Increased testing
  – Automatic sys admin notification
  – Having the right staff on the calls as needed
  – Weekly reviews of tests
The above items are what moved us along
We need to continue paying attention if we expect to have a stable environment for Gateways and users of grid services
Stay tuned for progress in ops-wg

Thank You
To lots of folks, but especially
  Suresh Marru
  Doru Marcusiu
  Kate Ericson, Shava Smallen
  Derek Simmel, Robert Budden
  Mike Lowe, Jenette Tillotson
  Stu Martin, Dan Fraser
  Raj Kettimuthu, John Bresnahan
  Ravi Madduri