Miron Livny Computer Sciences Department University of Wisconsin-Madison Condor- a Project and a System.



The Condor Project (Established ’85)
Distributed Computing research performed by a team of ~40 faculty, full-time staff and students who
• face software/middleware engineering challenges in a UNIX/Linux/Windows/OS X environment,
• are involved in national and international collaborations,
• interact with users in academia and industry,
• maintain and support a distributed production environment (more than 3300 CPUs at UW),
• and educate and train students.
Funding – DoE, NASA, NIH, NSF, EU, INTEL, Micron, Microsoft and the UW Graduate School

Functionality, Research, Support

our answer to High Throughput MW Computing on commodity resources

Novel

The Layers of Condor
Matchmaker
Submit (client): Customer Agent (schedD), Application, Application Agent
Execute (service): Owner Agent (startD), Remote Execution Agent, Local Resource Manager, Resource

Yearly Condor usage at UW-CS (chart; axis ticks from 2,000,000 to 10,000,000)

Yearly Condor CPUs at UW

Flexible

Diagram labels: PSE or User; SchedD (Condor G); G-app; C-app; Local; Remote; Condor; MM; Grid Tools; PBS; LSF; SchedD (Condor C); StartD (Glide-in)

Robust

Downloads per month (chart): X86/Linux, X86/Windows, Sparc/SunOS, PowerPC/OSX

Seeking the massive computing power needed to hedge a portion of its book of annuity business, Hartford Life, a subsidiary of The Hartford Financial Services Group (Hartford; $18.7 billion in 2003 revenues), has implemented a grid computing solution based on the University of Wisconsin's (Madison, Wis.) Condor open source software. Hartford Life's SVP and CIO Vittorio Severino notes that the move was a matter of necessity. "It was the necessity to hedge the book," owing in turn to a tight reinsurance market that is driving the need for an alternative risk management strategy, he says. The challenge was to support the risk generated by clients opting for income protection benefit riders on popular annuity products.

Resource: How did you complete this project, on your own or with a vendor's help?
Severino: We completed this project very much on our own. As a matter of fact, it is such a new technology in the insurance industry that others were calling us for assistance on how to do it. So it was interesting because we were breaking new ground and vendors really couldn’t help us. We eventually chose grid computing software from the University of Wisconsin called Condor; it is open source software. We chose the Condor software because it is one of the oldest grid computing software tools around, so it is mature. We have a tremendous amount of confidence in the Condor software.

Condor at Micron
Micron’s Global Grid
• 10,000+ processors in 12 “pools”
• Linux, Solaris, Windows
• <50th Top500 rank
• 3+ TeraFLOPS
• Centralized governance
• Distributed management
• 16+ applications
• Self developed

Condor at Oracle
Condor is used within Oracle's Automated Integration Management Environment (AIME) to perform automated build and regression testing of multiple components for Oracle's flagship Database Server product. Each day, nearly 1,000 developers make contributions to the code base of Oracle Database Server. Compiling these software modules alone would take over 11 hours on a capable workstation. In addition to building, AIME must control repository labelling/tagging, configuration publishing, and, last but certainly not least, regression testing. Oracle is very serious about the stability and correctness of its products, so the AIME daily regression test suite currently covers 90,000 testable items divided into over 700 test packages. The entire process must complete within 12 hours to keep development moving forward. About five years ago, Oracle selected Condor as the resource manager underneath AIME because it liked the maturity of Condor's core components. In total, over 3,500 machines at Oracle are managed by Condor.

Laboratory of Molecular and Computational Genomics University of Wisconsin-Madison Our research laboratory focuses on the chemistry, biology and physics of single DNA molecules as a means of genomic analysis.

Grid Laboratory Of Wisconsin (GLOW)
• 6 disciplines
• ~1000 CPUs
• ~80 TB of disk
(Diagram labels: Local, GLOW, CS)

Session 4: Reports from the Field, Part One
• Semiconductor Manufacturing (and other stuff) with Condor (Brooklin Gore, Micron Technology)
• Risk Modeling with Condor at The Hartford (Bob Nordlund, The Hartford)
• Large, Fast, and Out of Control: Tuning Condor for Film Production (Jason Stowe, C.O.R.E. Feature Animation)
• Optena: Enterprise Condor (Surendra Reddy, Optena Corporation)
• Introduction to gridMatrix and Condor (Gita Karipineni, Cadence Design Systems)

Session 5: Reports from the Field, Part Two
• The Use of Condor in the gLite Grid Middleware (Erwin Laure, EGEE)
• CMS Data Grid, Open Science Grid, and Condor-C (Ian Fisk, Fermi National Laboratory)
• Condor Usage at Brookhaven National Lab (Brookhaven National Laboratory)
• Data reprocessing for DZero on the SAM-Grid (Gabriele Garzoglio, Fermi National Laboratory)
• Using Condor for Large Scale Data Analysis within the LIGO Scientific Collaboration (Duncan Brown, LIGO)
• Using Condor for On-line Data Analysis within the LIGO Scientific Collaboration (Kipp Cannon, LIGO)

Powerful

Resource Allocation
A limited assignment of the “ownership” of a resource
• Owner is charged for allocation regardless of actual consumption
• Owner can allocate resource to others
• Owner has the right and means to revoke an allocation
• Allocation is governed by an “agreement” between the client and the owner
• Allocation is a “lease”
• Tree of allocations
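The allocation properties listed on this slide (lease, sub-allocation to others, revocation, charging by allocation rather than consumption, tree of allocations) can be illustrated with a small sketch. This is not Condor code: the `Allocation` class, its field names, and the flat `rate_per_unit` charging model are all assumptions made for illustration only.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass(eq=False)
class Allocation:
    """Hypothetical lease on some units of a resource (illustration only)."""
    holder: str
    units: int           # amount of the resource covered by this lease
    expires_at: float    # lease end time, on a caller-supplied clock
    parent: Allocation | None = field(default=None, repr=False)
    children: list[Allocation] = field(default_factory=list)
    revoked: bool = False

    def is_active(self, now: float) -> bool:
        # A lease is usable only if it and every ancestor are unexpired and unrevoked.
        if self.revoked or now >= self.expires_at:
            return False
        return self.parent is None or self.parent.is_active(now)

    def available(self, now: float) -> int:
        # Units not currently handed on to active sub-allocations.
        return self.units - sum(c.units for c in self.children if c.is_active(now))

    def allocate(self, holder: str, units: int, expires_at: float, now: float) -> Allocation:
        # The holder of a lease may allocate part of it to others.
        if not self.is_active(now):
            raise ValueError("cannot sub-allocate from an inactive lease")
        if units > self.available(now):
            raise ValueError("not enough unallocated units")
        # A sub-lease can never outlive the lease it was carved from.
        child = Allocation(holder, units, min(expires_at, self.expires_at), parent=self)
        self.children.append(child)
        return child

    def revoke(self) -> None:
        # Revoking a lease invalidates the whole subtree below it.
        self.revoked = True

    def charge(self, rate_per_unit: float) -> float:
        # Charged for what was allocated, regardless of actual consumption.
        return self.units * rate_per_unit


root = Allocation("owner", units=100, expires_at=1000.0)
lab = root.allocate("lab", 40, expires_at=500.0, now=0.0)
user = lab.allocate("user", 10, expires_at=900.0, now=0.0)  # clipped to the lab lease
lab.revoke()  # the "user" lease below it becomes inactive as well
```

Checking activity by walking up the tree keeps revocation to a single flag on the revoked node; a real system would also need to notify holders and reclaim the resource.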

“We present some principles that we believe should apply in any compute resource management system. The first, P1, speaks to the need to avoid “resource leaks” of all kinds, as might result, for example, from a monitoring system that consumes a nontrivial number of resources.
P1 - It must be possible to monitor and control all resources consumed by a CE, whether for “computation” or “management.”
Our second principle is a corollary of P1:
P2 - A system should incorporate circuit breakers to protect both the compute resource and clients.
For example, negotiating with a CE consumes resources. How do we prevent an eager client from turning into a denial of service attack?”
Ian Foster & Miron Livny, "Virtualization and Management of Compute Resources: Principles and Architecture", a working document (February 2005)
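One way to picture the circuit breaker named in P2 is a per-client request budget that trips when an eager client negotiates too often. The sketch below is an assumption-laden illustration, not the mechanism from the working document: the class name `NegotiationBreaker`, the sliding-window policy, and the cooldown parameter are all hypothetical.

```python
class NegotiationBreaker:
    """Hypothetical per-client circuit breaker in front of a compute element (CE)."""

    def __init__(self, max_requests, window, cooldown):
        self.max_requests = max_requests  # requests allowed per window
        self.window = window              # window length in seconds
        self.cooldown = cooldown          # how long a tripped breaker stays open
        self._recent = {}                 # client -> times of accepted requests
        self._open_until = {}             # client -> time at which the breaker closes

    def allow(self, client, now):
        # While the breaker is open, reject without doing any further work.
        if now < self._open_until.get(client, 0.0):
            return False
        # Keep only the accepted requests that fall inside the sliding window.
        recent = [t for t in self._recent.get(client, []) if now - t < self.window]
        if len(recent) >= self.max_requests:
            # Budget exhausted: trip the breaker for this client only.
            self._open_until[client] = now + self.cooldown
            self._recent[client] = []
            return False
        recent.append(now)
        self._recent[client] = recent
        return True


breaker = NegotiationBreaker(max_requests=2, window=10.0, cooldown=30.0)
decisions = [breaker.allow("eager", t) for t in (0.0, 1.0, 2.0, 20.0, 32.0)]
```

Tracking state per client means one misbehaving client is cut off while others continue to be served, which protects both the resource and the remaining clients.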

Work Delegation
A limited assignment of the responsibility to perform the work
• Delegation involves a definition of these “responsibilities”
• Responsibilities may be further delegated
• Delegation consumes resources
• Delegation is a “lease”
• Tree of delegations
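The delegation properties above can be sketched in the same spirit. The `Delegation` class, the party names, the responsibility labels, and the unit `overhead` cost per delegation are hypothetical, chosen only to show that responsibilities are defined explicitly, can be passed on only if held, expire like a lease, and add cost at every level of the tree.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass(eq=False)
class Delegation:
    """Hypothetical time-limited assignment of responsibility for some work."""
    delegate: str
    responsibilities: frozenset[str]  # the defined responsibilities
    expires_at: float                 # the delegation is a lease
    overhead: float = 1.0             # assumed cost of setting up this delegation
    children: list[Delegation] = field(default_factory=list)

    def delegate_to(self, delegate: str, responsibilities: set[str],
                    expires_at: float) -> Delegation:
        # Only responsibilities actually held can be delegated further.
        extra = set(responsibilities) - self.responsibilities
        if extra:
            raise ValueError(f"not held by {self.delegate}: {sorted(extra)}")
        # A sub-delegation cannot outlive the delegation it derives from.
        child = Delegation(delegate, frozenset(responsibilities),
                           min(expires_at, self.expires_at))
        self.children.append(child)
        return child

    def total_overhead(self) -> float:
        # Every delegation in the tree consumes resources of its own.
        return self.overhead + sum(c.total_overhead() for c in self.children)


portal = Delegation("portal", frozenset({"stage-in", "run", "stage-out"}), expires_at=3600.0)
site = portal.delegate_to("site", {"run"}, expires_at=7200.0)  # clipped to 3600.0
```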