Grid Computing: Technology and Sociology at Large Scales Douglas Thain University of Notre Dame 5 November 2004.

1 Grid Computing: Technology and Sociology at Large Scales Douglas Thain University of Notre Dame 5 November 2004

2 Computing Needs of Big Science Available Computing Power Computing Research

3 Computing Power is Everywhere!

4 The Top 500 Supercomputers
1 - Earth Simulator: 5120 × NEC SX-6 (35860 GFLOPS)
2 - Thunder: 4096 × Itanium Tiger (19940 GFLOPS)
3 - ASCI Q: 8192 × Alpha (13880 GFLOPS)
4 - IBM BlueGene/L Prototype: 8192 × PowerPC (11680 GFLOPS)
5 - NCSA Tungsten: 2400 × Intel Xeon (9819 GFLOPS)
445 - Notre Dame BoB: 212 × Intel Xeon
500 - “Retailer B”: 184 × PowerPC (684 GFLOPS)
http://www.top500.org
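Dividing each listed figure by its processor count makes the machines' differing shapes concrete: ASCI Q and the BlueGene/L prototype have the most processors in the top five (8192 each), yet each of their processors delivers far less than an Earth Simulator processor. A minimal Python sketch using only the numbers on the slide (Notre Dame BoB is left out because no GFLOPS figure is listed for it):

```python
# (system, processor count, GFLOPS) exactly as listed on the slide.
SYSTEMS = [
    ("Earth Simulator", 5120, 35860),
    ("Thunder", 4096, 19940),
    ("ASCI Q", 8192, 13880),
    ("IBM BlueGene/L Prototype", 8192, 11680),
    ("NCSA Tungsten", 2400, 9819),
    ("Retailer B", 184, 684),
]


def gflops_per_processor(total_gflops: float, processors: int) -> float:
    """Average throughput per processor; ignores interconnect and memory effects."""
    return total_gflops / processors


if __name__ == "__main__":
    for name, cpus, gflops in SYSTEMS:
        per_cpu = gflops_per_processor(gflops, cpus)
        print(f"{name:<26}{cpus:>6} CPUs {per_cpu:6.2f} GFLOPS/CPU")
```

This is plain averaging: it says nothing about how well a given program would actually use each machine, which is exactly the difficulty the next slide raises.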

5 The Bad News
Rag-Tag Computers are Hard to Use
  - Differing shapes, sizes, reliability.
  - Issues of machine-user trust.
  - Have to re-write software to fit.
Big Supercomputers are Also Hard to Use
  - For exactly the same reasons!

6 The Grid
Ian Foster, University of Chicago: Suppose that big computing facilities were as easy to use as electrical power!
http://www.globus.org

7 The Grid = Internet + Facilities

8 Is the Grid Real?
THE GRID: not yet.
But many groups fairly claim to have built A GRID for a given purpose.

9 Grid Computing is not Easy!
  - Security: keeping out the bad guys, identifying the good guys.
  - Performance: a problem of mapping the right jobs to the right resources.
  - Reliability: the Internet is not known for its 24/7 reliability.
  - Accountability: “You used 100 hours of compute time at $1000/hour!”
  - Debugging: who is to blame when a program crashes?
  - Social Effects: at large scales, computers have human problems!

10 SETI@Home
http://setiathome.ssl.berkeley.edu/

11 SETI@Home
Users: 5,233,380
Results received: 1,622,392,472
Total CPU time: 2,113,893 years
Performance: 68,520 GFLOPS
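The totals on this slide hide two useful per-unit numbers: roughly 310 results per user, and roughly 11.4 hours of CPU time behind each returned result. A small Python sketch of the arithmetic, assuming a 365.25-day year when converting CPU-years to hours (the slide does not say which year length was used):

```python
HOURS_PER_YEAR = 365.25 * 24  # assumption: Julian year, 8,766 hours

USERS = 5_233_380
RESULTS = 1_622_392_472
CPU_YEARS = 2_113_893


def cpu_hours_per_result(cpu_years: float, results: int) -> float:
    """Average CPU time spent on each returned work unit, in hours."""
    return cpu_years * HOURS_PER_YEAR / results


def as_hms(hours: float) -> str:
    """Format fractional hours as whole hours, minutes, and seconds."""
    seconds = round(hours * 3600)
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{h} hr {m} min {s} sec"


if __name__ == "__main__":
    print("results per user:", round(RESULTS / USERS))
    print("CPU time per result:", as_hms(cpu_hours_per_result(CPU_YEARS, RESULTS)))
```

The same two functions reproduce the per-work-unit times that SETI@Home reports for individual users.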

12 The Social Issues
As a scientist, can you trust a random user?
  - So, you must duplicate work units.
What is the motivation to participate?
  - Fame! (Not Fortune)
How do users maximize their enjoyment?
  - Get on the leader board in any way possible!
  - Virus that changes the identity of the sender.
  - Hack the code to run faster. (Ollie, Microsoft)

Name                            Results received   CPU time      Time/work unit
1) The Ministry of Serendipity  6,444,222          4,327 years   5 hr 52 min 55.8 sec
2) Sneezy                       4,164,390          2,694 years   5 hr 40 min 06.0 sec
3) Pigalak                      3,182,980          2,625 years   7 hr 13 min 29.1 sec
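Duplicating work units only helps if there is a rule for deciding which copy to believe. A minimal Python sketch of one such rule, assuming each work unit is sent to several volunteers and a result is accepted only when at least `quorum` of them agree; the function names and the toy "analysis" are hypothetical illustrations, not SETI@Home's actual validator:

```python
from collections import Counter
from typing import Callable, Hashable, Optional, Sequence

WorkUnit = Sequence[int]
Volunteer = Callable[[WorkUnit], Hashable]


def validate_by_quorum(results: Sequence[Hashable], quorum: int) -> Optional[Hashable]:
    """Return the most common result if at least `quorum` copies agree, else None."""
    if not results:
        return None
    value, votes = Counter(results).most_common(1)[0]
    return value if votes >= quorum else None


def run_redundantly(unit: WorkUnit, volunteers: Sequence[Volunteer],
                    quorum: int) -> Optional[Hashable]:
    """Send the same work unit to every volunteer and vote on the answers."""
    return validate_by_quorum([volunteer(unit) for volunteer in volunteers], quorum)


def honest(unit: WorkUnit) -> int:
    # Hypothetical stand-in for the real signal analysis.
    return sum(unit) % 7


def cheater(unit: WorkUnit) -> int:
    # Skips the work entirely to climb the leader board faster.
    return 0


if __name__ == "__main__":
    print(run_redundantly([3, 5, 9], [honest, honest, cheater], quorum=2))
```

With two honest volunteers and one cheater, the honest answer (3) wins the vote; with a quorum of 3 no answer is accepted and the unit would have to be reissued. The price of this safety is that every unit is computed at least `quorum` times.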

13 Auditing of Results
Work Unit: “First, I checked Galaxy 1, and it only rated a 5. Then, I checked Galaxy 2, and it rated a 10, so I did the more detailed examination of the lower quadrant, but there was no signal there. No aliens here.”
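One way to make a result auditable, in the spirit of the narrated work unit on this slide, is to have each work unit return the trail of decisions it made along with its answer, so a scientist can spot-check the reasoning instead of trusting a bare "No aliens here." A minimal Python sketch; the rating threshold and the `detailed_check` callback are hypothetical:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

DETAIL_THRESHOLD = 8  # hypothetical: ratings at or above this get a closer look


@dataclass
class AuditedResult:
    answer: str
    trail: List[str] = field(default_factory=list)


def analyze(ratings: Dict[str, int], detailed_check: Callable[[str], bool]) -> AuditedResult:
    """Rate each region, examine promising ones in detail, and log every step."""
    result = AuditedResult(answer="No aliens here.")
    for region, rating in ratings.items():
        result.trail.append(f"Checked {region}: rated {rating}.")
        if rating >= DETAIL_THRESHOLD:
            found = detailed_check(region)
            outcome = "signal" if found else "no signal"
            result.trail.append(f"Examined {region} in detail: {outcome}.")
            if found:
                result.answer = f"Candidate signal in {region}."
    return result


if __name__ == "__main__":
    report = analyze({"Galaxy 1": 5, "Galaxy 2": 10}, detailed_check=lambda region: False)
    for line in report.trail:
        print(line)
    print(report.answer)
```

An auditor can compare trails from duplicated copies of the same unit, not just their final answers.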

14 What if you are doing good science, but it doesn’t have a glamorous story?

15 AMANDA
  - A “Time Telescope”
  - Distant Cosmic Sources
  - Neutrinos Travel Far
  - Neutrino + Earth = Muon
  - Detector in Ice
http://amanda.berkeley.edu

16 Independent Simulation

17 How do you calibrate a new measuring device?

18 The Answer: Simulate!
x=123 y=456
x=123 y=457
x=223 y=450
x=305 y=904
x=123 y=456
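Each simulated event is independent of the others, which is what makes this workload a good fit for borrowed machines: the parameters can be generated up front and each run shipped anywhere. A minimal Python sketch, assuming each job carries its own random seed so that any run can be repeated exactly on a different machine; the `simulate` formula is a placeholder, not AMANDA's detector model, and the parameter ranges are hypothetical:

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SimJob:
    job_id: int
    x: int
    y: int
    seed: int  # per-job seed makes each run reproducible wherever it executes


def make_jobs(n: int, seed: int = 0,
              x_range: Tuple[int, int] = (0, 999),
              y_range: Tuple[int, int] = (0, 999)) -> List[SimJob]:
    """Generate n independent job descriptions from one master seed."""
    rng = random.Random(seed)
    return [
        SimJob(i, rng.randint(*x_range), rng.randint(*y_range), rng.randrange(2**32))
        for i in range(n)
    ]


def simulate(job: SimJob) -> float:
    """Placeholder detector response at (x, y) plus seeded noise."""
    rng = random.Random(job.seed)
    return 0.01 * job.x + 0.02 * job.y + rng.gauss(0.0, 1.0)


if __name__ == "__main__":
    jobs = make_jobs(5)
    # Each call could run on a different machine; results are keyed by job_id.
    results = {job.job_id: simulate(job) for job in jobs}
    print(results)
```

Because no job depends on another, losing one machine costs only that one job, which can simply be resubmitted with the same seed.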

19 Match Maker
Users' requests:
  - “I need some Windows machines in order to do my senior thesis!”
  - “I need a LOT of small machines for AMANDA.”
  - “I need TEN Linux machines for one week.”
Owners' policies:
  - “Anyone can use these machines, but ND users have priority.”
  - “These machines can only be used at night, and only by Jane and Betty.”
http://www.cs.wisc.edu/condor
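Condor's matchmaker pairs requests like these with the policies owners attach to their machines; real Condor expresses both sides as ClassAds with Requirements and Rank expressions. The Python sketch below only imitates that idea with a greedy loop in which each machine takes the compatible job its owner ranks highest. The machine names, user names, and the `@nd.edu` priority rule are hypothetical illustrations of the slide's examples:

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Job:
    job_id: str
    user: str
    os: str


@dataclass
class Machine:
    name: str
    os: str
    allows: Callable[[str, int], bool]           # owner policy: (user, hour) -> permitted?
    rank: Callable[[str], int] = lambda user: 0  # owner preference among permitted users


def matchmake(jobs: List[Job], machines: List[Machine], hour: int) -> List[Tuple[str, str]]:
    """Greedily pair each machine with the permitted job its owner ranks highest."""
    waiting = list(jobs)
    matches = []
    for machine in machines:
        candidates = [j for j in waiting
                      if j.os == machine.os and machine.allows(j.user, hour)]
        if candidates:
            best = max(candidates, key=lambda j: machine.rank(j.user))  # ties keep queue order
            matches.append((best.job_id, machine.name))
            waiting.remove(best)
    return matches


def is_night(hour: int) -> bool:
    return hour >= 20 or hour < 6


MACHINES = [
    # "Anyone can use these machines, but ND users have priority."
    Machine("nd-pool-1", "linux", allows=lambda u, h: True,
            rank=lambda u: 1 if u.endswith("@nd.edu") else 0),
    # "These machines can only be used at night, and only by Jane and Betty."
    Machine("lab-desk", "linux", allows=lambda u, h: u in {"jane", "betty"} and is_night(h)),
    Machine("lab-win", "windows", allows=lambda u, h: True),
]

JOBS = [
    Job("j1", "amanda-sim", "linux"),
    Job("j2", "senior@nd.edu", "windows"),
    Job("j3", "grad@nd.edu", "linux"),
    Job("j4", "jane", "linux"),
]

if __name__ == "__main__":
    print(matchmake(JOBS, MACHINES, hour=23))
```

At hour 23 Jane's job lands on the night-only desk machine; at hour 14 the same desk refuses it. Either way the ND user wins the shared pool machine and the AMANDA job keeps waiting: the owners' policies, not the users, have the final say.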

20 Condor
50,000 CPUs, 1000 sites

21 Social Concerns
The Owner is BOSS!
  - Solution: Submit lots of independent jobs.
  - Solution: Save your work at short intervals.
Users compete for popular machines.
  - Solution: Program for less common machines.
Unusual Requests may be Rejected!
  - “I need a large, fast machine that is available for one full year and isn’t in the Western Hemisphere...”
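“Save your work at short intervals” is checkpointing: when the owner reclaims a machine, only the work done since the last checkpoint is lost. A minimal Python sketch, assuming the job's state fits in a small JSON file; the summing loop stands in for real computation, and `Evicted` is a hypothetical exception used to imitate the owner taking the machine back:

```python
import json
import os
import tempfile
from typing import Optional


class Evicted(Exception):
    """Imitates the machine's owner reclaiming it in the middle of a run."""


def load_checkpoint(path: str) -> dict:
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "total": 0}


def save_checkpoint(state: dict, path: str) -> None:
    # Write to a temporary file first, then rename over the old checkpoint,
    # so an interruption during the write never leaves a half-written file.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)


def run(total_steps: int, path: str, every: int = 100,
        evict_after: Optional[int] = None) -> int:
    state = load_checkpoint(path)
    done_this_session = 0
    while state["step"] < total_steps:
        if evict_after is not None and done_this_session == evict_after:
            raise Evicted(f"evicted at step {state['step']}")
        state["total"] += state["step"]  # stand-in for one unit of real work
        state["step"] += 1
        done_this_session += 1
        if state["step"] % every == 0:
            save_checkpoint(state, path)
    save_checkpoint(state, path)
    return state["total"]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        ckpt = os.path.join(workdir, "job.ckpt")
        try:
            run(1000, ckpt, every=100, evict_after=250)
        except Evicted as e:
            print(e, "-> resuming from step", load_checkpoint(ckpt)["step"])
        print("final total:", run(1000, ckpt, every=100))
```

Evicted at step 250, the job resumes from the step-200 checkpoint and redoes only 50 steps. Shorter intervals lose less work per eviction but spend more time writing checkpoints, so the interval is a tuning choice rather than a fixed rule.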

22 A Fundamental Problem of Grid Computing: Why Don’t You Love Me?

23 But There is More!
Summary so far:
  - The Grid: Computing Power on Demand.
  - Big Science has Big Computing Needs.
  - Key Problems are Social Interaction.
But there is more:
  - The Grid: Bringing people and equipment together.
  - The Grid: Bringing lots of people together!

24 NEESGrid: An Earth-Shaking Grid Application
Simulation of earthquakes:
  - Flexible, repeatable, cheap.
  - Accurate at large scales.
  - Inaccurate for small objects.
Physical emulation of earthquakes:
  - Fixed, one-time, expensive.
  - Perfectly reproduce small items.
http://www.neesgrid.org

25 Modeling a Single Door!

26 Coordinator Interface Modeling a Single Door!

27 http://www.accessgrid.org

28

29 The Access Grid Experience

30 Take Home Message
Grid Computing is...
  ...harnessing many computers in order to attack scientific problems of enormous scale.
  ...bringing large numbers of people and resources together over long distances.
The Hardest Problem:
  As computing systems grow larger, social issues become more important than technical problems.

