
Miron Livny, Computer Sciences Department, University of Wisconsin-Madison: The Role of Scientific Middleware in the Future of HEP Computing

This talk focuses on the how rather than on the what

Can we do it? Does the scientific community (scientists from different disciplines and the funding agencies) have the know-how, the resources and the will to develop, maintain, document, evolve and support a common (and shared) suite of production-quality middleware that meets the needs and expectations of the HEP community?

Why is Grid Middleware Such a Challenge? (the road from CHEP 2000 to CHEP 2003) [from my CHEP 03 talk]

Because developing good software is not easy and distributed computing is a hard problem! (Do not try it at home!) [from my CHEP 03 talk]

Since CHEP 03 we …
› Delivered distributed (grid) computing and data management functionality to HEP applications via Grid3, LCG-1, NorduGrid, SAM-Grid, …
› Secured funding for a middleware effort as part of EGEE, the OMII activity and the continuation of the PPDG work and the NMI GRIDS Center

Observations
› The Virtual Data Toolkit (VDT) provides a common foundation for most HEP deployments
› Recognition of the value and price of interdisciplinary collaboration
› No "silver bullets"
› Closer coordination in the development of interfaces (e.g. SRM) and end-to-end software stacks (e.g. gLite)
› Efforts to leverage, harmonize and/or share build and test infrastructure and capabilities
› Closer relationships with other scientific domains (sharing of capabilities and requirements)
› Sharing and coordination of experiences and requirements in the Authentication, Authorization and Accounting areas

VDT Growth (timeline chart of VDT releases, annotated with milestones: VDT 1.0, built on Globus 2.0b and Condor; the VDT 1.1.3 and later pre-SC 2002 releases; the switch to Globus 2.2; Grid2003; first real use by LCG)

The Build Process (diagram): sources from CVS are patched (note the patches) and turned into GPT source bundles, which are built, tested and packaged on the NMI Build & Test facility, a Condor pool of ~40 computers; builds from contributors (VDS, etc.) are added, with the hope that they will use the NMI processes soon, and the result is the VDT, distributed as a Pacman cache, RPMs and binaries, which is then tested.
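To make the slide's pipeline concrete, here is a minimal illustrative sketch of a checkout-patch-build-test-package driver. It is not the actual NMI/VDT tooling (which ran GPT source bundles through the NMI Build & Test facility); the CVS root, module names, patch locations and make targets are hypothetical placeholders.

#!/usr/bin/env python
"""Illustrative sketch of a checkout -> patch -> build -> test -> package loop.
The repository, modules and commands are hypothetical placeholders, not the
real VDT build, which used GPT source bundles on the NMI Build & Test pool."""

import subprocess
import sys

CVSROOT = ":pserver:anonymous@cvs.example.org:/cvsroot"   # placeholder repository
MODULES = ["condor", "globus", "myproxy"]                  # placeholder module names

def run(cmd, cwd=None):
    """Run one step, echo it, and stop the whole build on the first failure."""
    print("+", " ".join(cmd))
    result = subprocess.run(cmd, cwd=cwd)
    if result.returncode != 0:
        sys.exit(f"step failed: {' '.join(cmd)}")

for module in MODULES:
    run(["cvs", "-d", CVSROOT, "checkout", module])                          # sources from CVS
    run(["sh", "-c", f"patch -p1 < ../patches/{module}.patch"], cwd=module)  # apply local patches
    run(["make"], cwd=module)                                                # build
    run(["make", "test"], cwd=module)                                        # test before packaging
    run(["tar", "czf", f"{module}-bin.tar.gz", module])                      # package the binaries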

Tools in the VDT (the three original slides highlighted, in turn, the components built by NMI, the components built by contributors, and the components built by the VDT team)
› Condor Group: Condor/Condor-G, Fault Tolerant Shell, ClassAds
› Globus Alliance: Job submission (GRAM), Information service (MDS), Data transfer (GridFTP), Replica Location (RLS)
› EDG & LCG: Make Gridmap, Certificate Revocation List Updater, Glue Schema/Info provider
› ISI & UC: Chimera & Pegasus
› NCSA: MyProxy, GSI OpenSSH, UberFTP
› LBL: PyGlobus, NetLogger
› Caltech: MonALISA
› VDT: VDT System Profiler, Configuration software
› Others: KX509 (U. Mich.), DRM 1.2, Java, FBSng job manager
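To give a feel for how a few of these components fit together in practice, here is a minimal sketch that obtains a GSI proxy, stages an input file with GridFTP, and submits a Condor-G job to a GRAM gatekeeper. The host names, paths and job details are hypothetical; the commands (grid-proxy-init, globus-url-copy, condor_submit) and the Condor-G "globus" universe reflect common usage of the listed components in that era, not anything prescribed by the slide.

#!/usr/bin/env python
"""Illustrative sketch of tying VDT components together: a GSI proxy,
a GridFTP transfer, and a Condor-G submission to a GRAM gatekeeper.
Host names, paths and the job itself are hypothetical placeholders."""

import subprocess

GATEKEEPER = "gatekeeper.example.edu/jobmanager-condor"   # hypothetical GRAM gatekeeper
GRIDFTP_DEST = "gsiftp://se.example.edu/data/input.dat"   # hypothetical storage element

# 1. Obtain a short-lived proxy credential (GSI, from the Globus Toolkit in the VDT).
subprocess.run(["grid-proxy-init"], check=True)

# 2. Stage the input file to the remote site with GridFTP.
subprocess.run(["globus-url-copy", "file:///tmp/input.dat", GRIDFTP_DEST], check=True)

# 3. Write a Condor-G submit description (the "globus" universe of that era)
#    and hand it to condor_submit.
submit_description = f"""
universe        = globus
globusscheduler = {GATEKEEPER}
executable      = analyze
arguments       = input.dat
output          = analyze.out
error           = analyze.err
log             = analyze.log
queue
"""
with open("analyze.sub", "w") as f:
    f.write(submit_description)

subprocess.run(["condor_submit", "analyze.sub"], check=True)

On a VDT-installed host the same three steps could, of course, be run by hand from the shell; the script only illustrates how the pieces compose.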

(unique) Challenges › Name branding and distribution of credit and recognition › Building and maintaining teams of middleware developers (this includes support, documentation, testing, …) › Longevity and stability of funding

The Condor Experience (1986-2004)

Yearly Condor usage at UW-CS (growth chart; vertical scale from 2,000,000 to 10,000,000)

Slides used by UWCS with permission of Micron Technology, Inc.
Condor at Micron: Micron's Global Grid
› processors in 11 "pools": Linux, Solaris, Windows
› <50th Top500 rank, 3+ TeraFLOPS
› Centralized governance, distributed management
› 16+ applications, self developed

Slides used by UWCS with permission of Micron Technology, Inc.
Condor at Micron: The Chief Officer value proposition
■ Info Week 2004 IT Survey includes Grid questions!
  - Makes our CIO look good by letting him answer yes
  - Micron's 2003 rank: 23rd
■ Without Condor we only get about 25% of PC value today
  - Didn't tell our CFO a $1000 PC really costs $4000!
  - Doubling utilization to 50% doubles the CFO's return on capital
  - Micron's goal: 66% monthly average utilization
■ Providing a personal supercomputer to every engineer
  - CTO appreciates the cool factor
  - CTO really "gets it" when his engineers say: I don't know how I would have done that without the Grid

Slides used by UWCS with permission of Micron Technology, Inc.
Condor at Micron: Example Value
job hours / 24 / 30 = 103 Solaris boxes
103 * $10,000/box = $1,022,306
And that's just for one application, not considering decreased development time, increased uptime, etc.
Chances are, if you have Micron memory in your PC, it was processed by Condor!
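The slide's estimate is a straightforward conversion of harvested job hours into equivalent dedicated machines. A tiny sketch of that arithmetic follows; the 74,160 job-hours input is a round illustrative number of my own (the actual figure did not survive in this transcript), while the $10,000 per-box price is the one quoted on the slide.

def equivalent_box_value(job_hours, price_per_box=10_000.0):
    """Convert harvested job hours into equivalent dedicated machines and dollars.
    One machine running 24 hours/day for a 30-day month supplies 720 job hours."""
    boxes = job_hours / 24 / 30          # the slide's "job hours / 24 / 30"
    return boxes, boxes * price_per_box  # e.g. 103 boxes * $10,000/box

# Illustrative input only: roughly 74,160 job hours in a month is 103 Solaris-class boxes.
print(equivalent_box_value(74_160))      # -> (103.0, 1030000.0)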

Condor at Oracle
Condor is used within Oracle's Automated Integration Management Environment (AIME) to perform automated build and regression testing of multiple components of Oracle's flagship Database Server product. Each day, nearly 1,000 developers make contributions to the code base of Oracle Database Server. The compilation alone of these software modules would take over 11 hours on a capable workstation. But in addition to building, AIME must control repository labelling/tagging, configuration publishing, and, last but certainly not least, regression testing. Oracle is very serious about the stability and correctness of their products. Therefore, the AIME daily regression test suite currently covers 90,000 testable items divided into over 700 test packages. The entire process must complete within 12 hours to keep development moving forward. About five years ago, Oracle selected Condor as the resource manager underneath AIME because they liked the maturity of Condor's core components. In total, ~3,000 CPUs at Oracle are managed by Condor today.
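For a sense of how several hundred independent test packages can be fanned out through Condor, here is a minimal hypothetical sketch. The slide does not describe AIME's actual job setup; the runner script, file names and vanilla universe below are invented for illustration only.

#!/usr/bin/env python
"""Hypothetical sketch: fan 700 independent regression-test packages out as
one Condor job cluster. This is not Oracle's AIME; the runner script and
package numbering are invented for illustration."""

import os
import subprocess

NUM_PACKAGES = 700   # the slide: ~90,000 testable items in over 700 test packages
os.makedirs("results", exist_ok=True)

# One submit description, 700 queued jobs; $(Process) runs 0..699 and selects the package.
submit_description = f"""
universe    = vanilla
executable  = run_test_package.sh
arguments   = $(Process)
output      = results/pkg_$(Process).out
error       = results/pkg_$(Process).err
log         = regression.log
queue {NUM_PACKAGES}
"""

with open("regression.sub", "w") as f:
    f.write(submit_description)

subprocess.run(["condor_submit", "regression.sub"], check=True)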

Condor at Noregon
Noregon has entered into a partnership with Targacept Inc. to develop a system to efficiently perform molecular dynamics simulations. Targacept is a privately held pharmaceutical company located in Winston-Salem's Triad Research Park whose efforts are focused on creating drug therapies for neurological, psychiatric, and gastrointestinal diseases. … Using the Condor® grid middleware, Noregon is designing and implementing an ensemble Car-Parrinello simulation tool for Targacept that will allow a simulation to be distributed across a large grid of inexpensive Windows® PCs. Simulations can be completed in a fraction of the time without the use of high performance (expensive) hardware.
At 10:14 AM 7/15/ , xxx wrote:
> Dr. Livny: I wanted to update you on our progress with our grid computing project. We have about 300 nodes deployed presently with the ability to deploy up to 6,000 total nodes whenever we are ready. The project has been getting attention in the local press and has gained the full support of the public school system and generated a lot of excitement in the business community.

In other words, I believe that the answer is – “Yes, we can do it!”

Should we do it? Should the scientific community invest the resources needed to build and maintain the infrastructure for developing, maintaining and supporting production-quality middleware, or should we expect commercial entities to sell us the middleware and services we need?

Innovation
Access to production-quality middleware enables experimentation with new (out-of-the-box) ideas and paradigms with:
› "real" users,
› "real" applications and
› "realistic" scales (time, distribution, number of resources, number of files, number of jobs, …).

Trillium: a meeting point of two sciences (Physics and Computer Science)

The CS Perspective
› Application needs are instrumental in the formulation of new frameworks and Information Technologies (IT)
  - Scientific applications are an excellent indicator of future IT trends
  - The physics community is at the leading edge of IT
› Experimentation (quantitative evaluation) is fundamental to the scientific process
  - Requires robust software materialization of new technology
  - Requires an engaged community of consumers
› Multi-disciplinary teams hold the key to advances in IT
  - Collaboration across CS disciplines and projects (intra-CS)
  - Collaboration with domain scientists

The Scientific Method
› Deployment of end-to-end capabilities
  - Advance the computational and/or data management capabilities of a community
  - Based on coordinated design and implementation
› Teams of domain and computer scientists
  - May span multiple CS areas
› Mission focused
  - From design to deployment

Impact on Computer Science
Grid2003 has a profound impact on the members of the CS team because it
› encourages and facilitates collaboration across CS groups,
› requires work in closely coordinated CS-Physics teams,
› provides an engaged user community,
› enables real-life deployment and experimentation,
› facilitates a structured feedback process,
› exposes the constraints of the "real-life" fabric, and
› provides a long-term and evolving operational horizon.

Some (non-traditional) challenges
› Packaging, distribution and deployment of middleware and application software
› Troubleshooting of the deployed software stack
› Monitoring of hardware and software components
› Multi-VO allocation policies for processing and storage resources

Education and Training
Involvement in the development, maintenance and support of production-quality middleware provides an environment in which students learn "hands on" the dos and don'ts of software engineering and experience the life cycle of software.

A view on Education “The Condor project provides unique and irreplaceable educational opportunities that in turn are leveraged by commercial operations like Micron's. Also, the opportunity to hire graduates skilled in the tools and methods of HPC are unparalleled through efforts like the Condor project.” From a recent NSF review.

A view on Training “In terms of training, the organization of this effort (the Condor project) relies on students in CS who are working in the area of distributed computing related topics. They get the exposure to the research, but it comes with the price of participating in the production software engineering activities. The result is a well-rounded training experience and graduates whose skills are in much demand.” From a recent NSF review

Yes, we can and should do it!