Grid Computing: Technology and Sociology at Large Scales Douglas Thain University of Notre Dame 5 November 2004.

Presentation transcript:

Grid Computing: Technology and Sociology at Large Scales Douglas Thain University of Notre Dame 5 November 2004

Computing Needs of Big Science
Available Computing Power
Computing Research

Computing Power is Everywhere!

The Top 500 Supercomputers
1 - Earth Simulator: 5120 * NEC SX-6 (35860 GFLOPS)
2 - Thunder: 4096 * Itanium Tiger (19940 GFLOPS)
3 - ASCI Q: 8192 * Alpha (13880 GFLOPS)
4 - IBM BlueGene/L Prototype: 8192 * PowerPC (11680 GFLOPS)
5 - NCSA Tungsten: 2400 * Intel Xeon (9819 GFLOPS)
445 - Notre Dame BoB: 212 * Intel Xeon
“Retailer B”: 184 * PowerPC (684 GFLOPS)

The Bad News
Rag-Tag Computers are Hard to Use
  Differing shapes, sizes, reliability.
  Issues of machine-user trust.
  Have to re-write software to fit.
Big Supercomputers are Also Hard to Use
  For exactly the same reasons!

The Grid
Ian Foster, University of Chicago: Suppose that big computing facilities were as easy to use as electrical power!

The Grid = Internet + Facilities

Is the Grid Real?
THE GRID – not yet.
But many groups fairly claim to have built A GRID for a given purpose.

Grid Computing is not Easy!
Security
  Keeping out the bad guys, identifying the good guys.
Performance
  A problem of mapping the right jobs to the right resources.
Reliability
  The Internet is not known for its 24/7 reliability.
Accountability
  You used 100 hours of compute time at $1000/hour!
Debugging
  Who is to blame when a program crashes?
Social Effects
  At large scales, computers have human problems!

SETI@home

Users: 5,233,380
Results received: 1,622,392,472
Total CPU time: 2,113,893 years
Performance: 68520 GFLOPS
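The totals above can be turned into per-user and per-result averages with a little arithmetic. The sketch below is only a back-of-the-envelope check: it assumes a 365.25-day year, and it compares against the Earth Simulator figure from the Top 500 slide. The variable and function names are illustrative, not from the talk.

```python
# Back-of-the-envelope arithmetic on the totals quoted on the slide.
# Assumption: one year = 365.25 days.
HOURS_PER_YEAR = 365.25 * 24

users = 5_233_380
results_received = 1_622_392_472
total_cpu_years = 2_113_893
project_gflops = 68_520
earth_simulator_gflops = 35_860  # rank 1 on the Top 500 slide


def hours_per_result(cpu_years: float, results: int) -> float:
    """Average CPU hours spent per returned result."""
    return cpu_years * HOURS_PER_YEAR / results


avg_hours = hours_per_result(total_cpu_years, results_received)
results_per_user = results_received / users
speed_ratio = project_gflops / earth_simulator_gflops

print(f"average CPU time per result: {avg_hours:.1f} hours")
print(f"average results per user:    {results_per_user:.0f}")
print(f"aggregate speed vs. Earth Simulator: {speed_ratio:.2f}x")
```

The average comes out to a bit over eleven CPU hours per result and a little under twice the Earth Simulator's rating, which puts a collection of volunteer desktops ahead of the largest machine on the list.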

The Social Issues
As a scientist, can you trust a random user?
  So, you must duplicate work units.
What is the motivation to participate?
  Fame! (Not Fortune)
How do users maximize their enjoyment?
  Get on the leader board in any way possible!
  Virus that changes the identity of the sender.
  Hack the code to run faster. (Ollie, Microsoft)
Name | Results received | CPU time | Time/work unit
1) The Ministry of Serendipity | | years | 5 hr 52 min 55.8 sec
2) Sneezy | | years | 5 hr 40 min 06.0 sec
3) Pigalak | | years | 7 hr 13 min 29.1 sec
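Duplicating work units only helps if the copies are compared afterwards. A minimal sketch of one way to do that is a quorum vote: the same work unit goes to several users, and a result is accepted only when enough distinct users agree on it. This is an illustration under stated assumptions, not the project's actual validator; `accept_result` and the `quorum` parameter are hypothetical names.

```python
from collections import Counter
from typing import Hashable, Iterable, Optional, Tuple


def accept_result(
    reports: Iterable[Tuple[str, Hashable]], quorum: int = 2
) -> Optional[Hashable]:
    """Return the result that at least `quorum` distinct users agree on.

    `reports` is a sequence of (user, result) pairs for ONE work unit.
    Returns None when no result reaches the quorum or when the top
    two results are tied, so the work unit should be sent out again.
    """
    votes: Counter = Counter()
    seen_users = set()
    for user, result in reports:
        if user in seen_users:
            continue  # one vote per user, so resubmitting does not help
        seen_users.add(user)
        votes[result] += 1

    ranked = votes.most_common(2)
    if not ranked:
        return None
    best, best_count = ranked[0]
    tied = len(ranked) > 1 and ranked[1][1] == best_count
    if best_count < quorum or tied:
        return None
    return best


if __name__ == "__main__":
    reports = [("ann", "no signal"), ("bob", "no signal"), ("eve", "signal!")]
    print(accept_result(reports))
```

The one-vote-per-user rule is only as strong as the identity check behind it: a participant who can forge the sender's identity, as in the virus mentioned above, can also forge agreement.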

Auditing of Results
Work Unit: First, I checked Galaxy 1, and it only rated a 5. Then, I checked Galaxy 2, and it rated a 10, so I did the more detailed examination of the lower quadrant, but there was no signal there. No aliens here.

What if you are doing good science, but it doesn’t have a glamorous story?

AMANDA
A “Time Telescope”
Distant Cosmic Sources
Neutrinos Travel Far
Neutrino+Earth = Muon
Detector in Ice

Independent Simulation

How do you calibrate a new measuring device?

The Answer: Simulate!
x=123 y=456
x=123 y=457
x=223 y=450
x=305 y=904
x=123 y=456

Match Maker
Requests from users:
  I need some Windows machines in order to do my senior thesis!
  I need a LOT of small machines for AMANDA.
  I need TEN Linux machines for one week.
Policies from owners:
  Anyone can use these machines, but ND users have priority.
  These machines can be used only at night, and only by Jane and Betty.
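The match maker pairs user requests with owner policies. The sketch below is a toy version of that idea: each machine carries a policy (who may run, and when) and a rank (whom it prefers), and each job states the operating system it needs. It does not use Condor's own matchmaking language; the class names, fields, and the choice of 20:00 to 06:00 as "night" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Job:
    user: str
    site: str  # e.g. "ND" for a Notre Dame user
    os: str    # operating system the job requires


@dataclass
class Machine:
    name: str
    os: str
    policy: Callable[[Job, int], bool]  # (job, hour of day) -> may it run?
    rank: Callable[[Job], int]          # higher value = preferred job


def is_night(hour: int) -> bool:
    """Assumed definition of night: 20:00 up to 06:00."""
    return hour >= 20 or hour < 6


def match(machines: List[Machine], jobs: List[Job], hour: int) -> Dict[str, str]:
    """Give each machine at most one job; returns {machine name: user}."""
    waiting = list(jobs)
    assignments: Dict[str, str] = {}
    for machine in machines:
        candidates = [
            job for job in waiting
            if job.os == machine.os and machine.policy(job, hour)
        ]
        if not candidates:
            continue
        best = max(candidates, key=machine.rank)  # owner's preference decides
        assignments[machine.name] = best.user
        waiting.remove(best)
    return assignments


# "Anyone can use these machines, but ND users have priority."
open_machine = Machine(
    "open", "linux",
    policy=lambda job, hour: True,
    rank=lambda job: 1 if job.site == "ND" else 0,
)
# "Only at night, and only by Jane and Betty."
night_machine = Machine(
    "night", "linux",
    policy=lambda job, hour: job.user in {"Jane", "Betty"} and is_night(hour),
    rank=lambda job: 0,
)

if __name__ == "__main__":
    jobs = [Job("Alice", "elsewhere", "linux"), Job("Bob", "ND", "linux"),
            Job("Jane", "ND", "linux")]
    print(match([open_machine, night_machine], jobs, hour=23))
```

Keeping the policy on the machine rather than in the scheduler reflects the point of the slide: the owner decides, and the match maker only finds pairs that both sides accept.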

Condor
50,000 CPUs
1000 sites

Social Concerns
The Owner is BOSS!
  Solution: Submit lots of independent jobs.
  Solution: Save your work at short intervals.
Users compete for popular machines.
  Solution: Program for less common machines.
Unusual Requests may be Rejected!
  “I need a large, fast machine that is available for one full year and isn’t in the Western hemisphere...”
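"Save your work at short intervals" can be done inside the program itself. The sketch below is a minimal application-level checkpoint, assuming the job's state fits in a small JSON file: the loop writes its progress every few steps, and a restarted job resumes from the last saved state instead of from zero. The file layout, function names, and the `stop_after` parameter (which simulates the owner reclaiming the machine) are illustrative assumptions.

```python
import json
import os
import tempfile


def save_checkpoint(path: str, state: dict) -> None:
    """Write state to a temp file, then rename it into place.

    os.replace is atomic, so an eviction mid-write cannot leave a
    half-written checkpoint behind.
    """
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def load_checkpoint(path: str) -> dict:
    """Return the saved state, or a fresh one if no checkpoint exists."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {"next_i": 0, "total": 0}


def run(path, n_steps, checkpoint_every=100, stop_after=None):
    """Sum i*i for i < n_steps, checkpointing as it goes.

    Returns the total, or None if stopped early (simulated eviction).
    """
    state = load_checkpoint(path)
    steps_done = 0
    while state["next_i"] < n_steps:
        if stop_after is not None and steps_done >= stop_after:
            return None  # the owner took the machine back
        i = state["next_i"]
        state["total"] += i * i
        state["next_i"] = i + 1
        steps_done += 1
        if state["next_i"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state["total"]


if __name__ == "__main__":
    ckpt = os.path.join(tempfile.mkdtemp(), "job.json")
    print(run(ckpt, 1000, stop_after=250))  # evicted after 250 steps
    print(run(ckpt, 1000))                  # resumes from step 200
```

The interval is a trade-off: work done since the last checkpoint is lost on eviction (50 steps in the demo), while checkpointing more often costs more time writing state.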

A Fundamental Problem of Grid Computing: Why Don’t You Love Me?

But There is More!
Summary so far:
  The Grid: Computing Power on Demand
  Big Science has Big Computing Needs.
  Key Problems are Social Interaction
But there is more:
  The Grid: Bringing people and equipment together.
  The Grid: Bringing lots of people together!

NEESGrid
An Earth-Shaking Grid Application
Simulation of earthquakes:
  Flexible, repeatable, cheap.
  Accurate at large scales.
  Inaccurate for small objects.
Physical emulation of earthquakes:
  Fixed, one-time, expensive.
  Perfectly reproduce small items.

Modeling a Single Door!

Coordinator Interface Modeling a Single Door!

The Access Grid Experience

Take Home Message
Grid Computing is...
  ...harnessing many computers in order to attack scientific problems of enormous scale.
  ...bringing large numbers of people and resources together over long distances.
The Hardest Problem:
  As computing systems grow larger, social issues become more important than technical problems.