TeraGrid-Wide Operations (DRAFT #2, Mar 31) Von Welch

Highlights
TeraGrid surpassed 1 petaflop of aggregate computing power.
–Aggregate compute power available grew 3.5x from 2007 to 2008, primarily a result of the Track 2 systems at TACC and NICS coming online.
–NUs used and allocated grew ~4x from 2007 to 2008.
Significant improvement in instrumentation, including tracking of grid usage and data transfers.
Inca now provides historical tracking of software and service reliability, along with a new interface for both users and administrators.
An international security incident touched TeraGrid, resulting in a very strong incident response as well as improved procedures for a new attack vector.
Improvements in authentication procedures and cross-resource single sign-on.

Big Picture Resource Changes
Sun Constellation Cluster (Ranger) at TACC, Feb ’08
–Initially 504 Tflops; upgraded in July 2008 to ~63,000 compute cores and 580 Tflops
Cray XT4 (Kraken) at NICS, Aug ’08
–166 Tflops and 18,000 compute cores
Additional resources that entered production in 2008:
–Two Dell PowerEdge 1950 clusters: the 668-node system at LONI (QueenBee) and the 893-node system at Purdue (Steele)
–PSC’s SGI Altix 4700 shared-memory NUMA system (Pople)
–FPGA-based resource at Purdue (Brutus)
–Remote visualization system at TACC (Spur)
Other improvements:
–The Condor pool at Purdue grew from 7,700 to more than 22,800 processor cores.
–Indiana integrated its Condor resources with the Purdue flock, simplifying use.
Decommissioned systems:
–NCSA’s Tungsten, PSC’s Rachel, Purdue’s Lear, SDSC’s DataStar and Blue Gene, and TACC’s Maverick.

TeraGrid HPC Usage
[Chart: TeraGrid HPC usage by quarter, 2007–2008; annotations mark Ranger entering production in Feb 2008, Kraken entering production in Aug 2008, and total NUs delivered in 2007 vs. Q4 2008.]
In 2008, aggregate HPC power increased by 3.5x, NUs requested and awarded quadrupled, and NUs delivered increased by 2.5x.

TeraGrid Operations Center
–Created 7,762 tickets; resolved 2,652 tickets (34%)
–Took 675 phone calls; resolved 454 (67%)
–Manages the TG ticket system and 24x7 toll-free call center
–Responds to all users and provides front-line resolution where possible (34% resolution rate; checked in the sketch below)
–Routes remaining tickets to RP sites and other second-tier resolution centers
–Maintains situational awareness across the TG project (upgrades, maintenance, etc.)
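The quoted resolution rates are simple ratios of items resolved at the front line to items received; a quick check in Python, using only the numbers from this slide:

```python
# Worked check of the front-line resolution rates quoted above.
tickets_created, tickets_resolved = 7762, 2652
calls_taken, calls_resolved = 675, 454

print(f"Ticket resolution rate: {tickets_resolved / tickets_created:.0%}")  # -> 34%
print(f"Phone call resolution rate: {calls_resolved / calls_taken:.0%}")    # -> 67%
```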

Instrumentation and Monitoring
Monitoring and statistics gathering for TG services
–E.g., the backbone network and grid services (GRAM, GridFTP)
Used for measuring adoption, detecting problems, and resource provisioning (see the sketch below).
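To illustrate the kind of instrumentation involved, here is a minimal sketch that aggregates bytes transferred per user from a GridFTP transfer log. This is not TeraGrid's actual collection pipeline; it assumes the whitespace-separated KEY=VALUE transfer-log format written by globus-gridftp-server (fields such as USER and NBYTES), which can vary by server version and configuration.

```python
import re
from collections import defaultdict

# Assumed KEY=VALUE transfer-log format, e.g.:
#   DATE=... USER=alice ... NBYTES=1048576 ... TYPE=RETR CODE=226
FIELD = re.compile(r"(\w+)=(\S+)")

def bytes_per_user(log_path):
    """Sum bytes transferred per user from a GridFTP transfer log."""
    totals = defaultdict(int)
    with open(log_path) as log:
        for line in log:
            fields = dict(FIELD.findall(line))
            if "USER" in fields and "NBYTES" in fields:
                totals[fields["USER"]] += int(fields["NBYTES"])
    return totals

if __name__ == "__main__":
    for user, nbytes in sorted(bytes_per_user("gridftp.log").items()):
        print(f"{user}: {nbytes / 1e9:.2f} GB")
```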

Inca Grid Monitoring System
Automated, user-level testing improves reliability by detecting grid infrastructure problems.
–Provides detailed information about tests and their execution to aid in debugging problems.
Originally designed for TeraGrid; also used in other large-scale projects including ARCS, DEISA, and NGS.
Improvements in 2008 include a new version of the Inca Web server, which provides custom views of the latest results.
–The TeraGrid User Portal uses a custom view of the SSH and batch-job tests in its resources viewer.
Added notification upon test failures.
New historical views were created to summarize overall data trends.
Developed a plug-in that allows Inca to recognize scheduled downtimes.
20 new tests were written and 77 TeraGrid tests were modified; 2,538 pieces of test data are being collected. (A simplified example of such a user-level test appears below.)
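For flavor, a simplified stand-in for the kind of user-level test Inca runs: attempt a non-interactive SSH login and report pass/fail with timing. This does not use the real Inca reporter API, and the host name is a placeholder.

```python
import subprocess
import time

def ssh_login_test(host, timeout=30):
    """Simplified Inca-style user-level test: can we SSH in non-interactively?"""
    start = time.time()
    try:
        result = subprocess.run(
            ["ssh", "-o", "BatchMode=yes", "-o", f"ConnectTimeout={timeout}",
             host, "true"],
            capture_output=True, timeout=timeout,
        )
        ok = result.returncode == 0
        detail = result.stderr.decode().strip()
    except subprocess.TimeoutExpired:
        ok, detail = False, "timed out"
    elapsed = time.time() - start
    print(f"sshLoginTest {host}: {'PASS' if ok else 'FAIL'} "
          f"({elapsed:.1f}s){'' if ok else ' - ' + detail}")
    return ok

if __name__ == "__main__":
    ssh_login_test("login.example.teragrid.org")  # hypothetical host
```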

TeraGrid Backbone Network
Provides a dedicated high-speed interconnect between TG high-end resources.
The TeraGrid 10 Gb/s backbone runs from Chicago to Denver to Los Angeles, contracted from NLR.
Dedicated 10 Gb/s link(s) run from each RP to one of the three core routers.
[Map image from Indiana University]

Security
Gateway Summit held to develop an understanding of the security needs of RPs and Gateways.
–Co-organized with the Science Gateways team
–30 attendees from RP sites and Gateways
User Portal password reset procedure
Risk assessments for Science Gateways and the User Portal
TAGPMA participation and leadership
Uncovered a large-scale attack in collaboration with EU Grid partners.
–Established secure communications: secure wiki, SELS

Single Sign-On
Java-based GSI-SSHTERM application added to the User Portal
–Consistently in the top 5 apps
–Augments command-line functionality already in place
Replicating the MyProxy CA at PSC to provide catastrophic failover for the server at NCSA.
–Implemented client changes on RPs and the User Portal for failover (a sketch follows below).
Developed a set of guidelines for management of grid identities (X.509 distinguished names) in the TeraGrid Central Database (TGCDB) and at RP sites.
–Tests written for TGCDB; Inca tests for RPs will follow.
Started technical implementation of Shibboleth support for the User Portal
–TeraGrid is now a member of InCommon (as a service provider)
–Will transfer to the new Internet Framework.
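A minimal sketch of what client-side failover between the NCSA and PSC MyProxy servers could look like, shelling out to the standard myproxy-logon client. The host names are placeholders, and the actual TeraGrid client changes may have been implemented differently.

```python
import subprocess

# Hypothetical primary/backup MyProxy hosts; placeholders, not the real ones.
MYPROXY_SERVERS = ["myproxy.ncsa.example.org", "myproxy.psc.example.org"]

def myproxy_logon(username, passphrase):
    """Try each MyProxy server in order until a proxy credential is retrieved."""
    for server in MYPROXY_SERVERS:
        result = subprocess.run(
            # -s server, -l username, -S read passphrase from stdin
            ["myproxy-logon", "-s", server, "-l", username, "-S"],
            input=passphrase.encode(), capture_output=True,
        )
        if result.returncode == 0:
            print(f"Retrieved proxy credential from {server}")
            return True
        print(f"{server} failed: {result.stderr.decode().strip()}")
    return False
```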

END OF PRESENTATION
Reference material and future-plans slides for Towns follow.

Allocation Statistics

New Resources for 2009
The NICS Kraken system was upgraded in February 2009 to a 66,048-core, 600-Tflops Cray XT5 system.
NCSA placed Lincoln, a 192-node GPU-accelerated Dell PowerEdge 1950 cluster, into production.
Further planned additions for 2009 include NCAR's Sun Ultra 40 system dedicated to data analysis and visualization.

Inca Plans for 2009
–Integrate Inca into the Internet Framework.
–Create an interface for RP administrators to execute tests on demand.
–Integrate with ticket systems to connect tickets to tests.
–Start work on a knowledge base of errors, causes, and solutions.
–Develop and maintain views based on the needs and output of the QA and CUE groups.

SSO Plans for 2009
–Complete PSC deployment of the backup MyProxy service.
–Complete integration of Shibboleth support into the Internet Framework.
–Develop a full trust model for TeraGrid/campuses.
–Start recruiting campuses and growing usage.
–Work on bridging authorization with OSG and EGEE to support other activities.

Other Continuing Tasks
TOC
–24x7x365 point of contact
–Trouble ticket creation and management
Helpdesk
–First-tier support at RP sites, integrated with the TOC
Instrumentation services
Backbone network and network coordination
Security coordination, TAGPMA, etc.