
1 Large-Scale Computing with Grids Jeff Templon KVI Seminar 17 February 2004

2 Information I intend to transfer
- Why are Grids interesting? Grids are solutions, so I will spend some time talking about the problem and show how Grids are relevant. Solutions should solve a problem.
- What are computational Grids?
- How are Grids being used, in HEP as well as in other fields?
- What are some examples of current Grid "research topics"?

3 Our Problem
- Place event info on a 3D map
- Trace trajectories through the hits
- Assign a type to each track
- Find the particles you want
- Needle in a haystack!
- This is a "relatively easy" case

4 More complex example

5 Data Handling and Computation for Physics Analysis
[Diagram: data flow linking the detector, the event filter (selection & reconstruction), raw data, event summary data, event reprocessing, event simulation, batch physics analysis, processed data, analysis objects (extracted by physics topic), and interactive physics analysis.]

6 Computational Aspects
- Reconstructing and analyzing one event takes about 90 seconds
- Maybe only a few out of a million are interesting. But we have to check them all!
- The analysis program needs lots of calibration, determined from inspecting the results of a first pass
- So each event will be analyzed several times!

7 One of the four LHC detectors: the online system's multi-level trigger filters out background and reduces the data volume
- 40 MHz (40 TB/sec) into level 1 (special hardware)
- 75 kHz (75 GB/sec) into level 2 (embedded processors)
- 5 kHz (5 GB/sec) into level 3 (PCs)
- 100 Hz (100 MB/sec) to data recording & offline analysis
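
To make the rejection factors concrete, here is a small illustrative Python sketch (not from the talk) that recomputes the reduction at each trigger level from the rates quoted above; the ~1 MB/event size is an inference from the "100 Hz (100 MB/sec)" figure, not something stated on the slide.

```python
# Rate reduction through the multi-level trigger, using the figures on the slide.
levels = [
    ("collisions into level 1 (special hardware)", 40e6),   # 40 MHz
    ("into level 2 (embedded processors)", 75e3),           # 75 kHz
    ("into level 3 (PCs)", 5e3),                             # 5 kHz
    ("recorded for offline analysis", 100.0),                # 100 Hz
]
event_size_bytes = 1e6  # ~1 MB/event, inferred from the quoted data rates

for (name, rate), (_, next_rate) in zip(levels, levels[1:]):
    print(f"{name}: {rate:,.0f} Hz -> keeps about 1 event in {rate / next_rate:,.0f}")

last_name, last_rate = levels[-1]
print(f"{last_name}: {last_rate:,.0f} Hz = "
      f"{last_rate * event_size_bytes / 1e6:,.0f} MB/s to data recording & offline analysis")
```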

8 Computational Implications (2)
- 90 seconds per event to reconstruct and analyze
- 100 incoming events per second
- To keep up, we need either:
  - a computer that is nine thousand times faster, or
  - nine thousand computers working together
- Moore's Law: wait 20 years and computers will be 9000 times faster (but we need them in 2007!)
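
The factor of nine thousand follows directly from the two numbers above; a one-line check (illustrative only):

```python
seconds_per_event = 90   # reconstruction + analysis time per event
events_per_second = 100  # events coming out of the trigger

cpus_needed = seconds_per_event * events_per_second
print(f"CPUs needed to keep up in real time: {cpus_needed}")  # 9000
```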

9 More Computational Issues
- Four LHC experiments – roughly 36k CPUs needed
- BUT: the accelerator is not always "on" – need fewer
- BUT: multiple passes per event – need more!
- BUT: we haven't accounted for Monte Carlo production – more!!
- AND: we haven't addressed the needs of "physics users" at all!
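
A rough way to see how these corrections move the estimate around. The base figure (four experiments at about 9000 CPUs each, roughly 36k) comes from the slide; the duty-cycle, reprocessing, and Monte Carlo factors below are purely assumed for illustration, since the slide only says "need fewer" and "need more".

```python
# From the slide: four experiments at roughly 9000 CPUs each -> ~36k CPUs.
# The correction factors below are illustrative assumptions, not slide numbers.
base_cpus = 4 * 9000          # four LHC experiments

duty_cycle = 0.5              # assumed fraction of the year the accelerator runs
passes_per_event = 2.0        # assumed number of full reconstruction passes
monte_carlo_factor = 1.5      # assumed extra load for simulated events

rough_estimate = base_cpus * duty_cycle * passes_per_event * monte_carlo_factor
print(f"very rough total: {rough_estimate:,.0f} CPUs")  # lands in the 50-100k range quoted next
```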

10 Large Dynamic Range Computing
- Clearly we have enough work to keep somewhere between 50k and 100k CPUs busy
- Most of the activities are "bursty", so doing them with dedicated facilities is inefficient
- We need some meta-facility that can serve all the activities:
  - reconstruction & reprocessing
  - analysis
  - Monte Carlo
  - all four LHC experiments
  - all the LHC users

11 LHC User Distribution
- Putting all the computers in one spot leads to traffic jams.
- Which spot is willing to pay for & maintain 100k CPUs?

12 Classic Motivation for Grids
- Large scales: 50k CPUs, petabytes of data
  (if we're only talking about ten machines, who cares?)
- Large dynamic range: bursty usage patterns
  Why buy 25k CPUs if 60% of the time you only need 900 CPUs?
- Multiple user groups on a single system
  You can't "hard-wire" the system for your own purposes
- Wide-area access requirements
  Users are not in the same lab, or even on the same continent
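
As a concrete illustration of the dynamic-range point: if we assume (the slide does not say this) that the full 25k CPUs really are needed during the busy 40% of the time, a dedicated facility sized for that peak sits less than half used on average.

```python
# Utilization of a facility sized for the peak, using the slide's numbers.
# Assumption (not on the slide): the busy 40% of the time needs all 25k CPUs.
facility_size = 25_000
quiet_need, quiet_fraction = 900, 0.60
busy_need, busy_fraction = facility_size, 0.40

average_demand = quiet_need * quiet_fraction + busy_need * busy_fraction
print(f"average demand: {average_demand:,.0f} CPUs")
print(f"utilization of a dedicated 25k-CPU facility: {average_demand / facility_size:.0%}")
```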

13 Solution using Grids
- Large scales: 50k CPUs, petabytes of data
  - Assemble 50k+ CPUs and petabytes of mass storage
  - They don't need to be in the same place!
- Large dynamic range: bursty usage patterns
  - When you need less than you have, others use your excess capacity
  - When you need more, you use others' excess capacities
- Multiple user groups on a single system
  - "Generic" grid software services (think web server here)
- Wide-area access requirements
  - Public Key Infrastructure for authentication & authorization

14 How do Grids Work? Public Key Infrastructure & Generic Grid Software Services

15 Public Key Infrastructure – Single Login
- Based on asymmetric cryptography: public/private key pairs
- Needs a network of trusted Certificate Authorities (e.g. Verisign)
- CAs issue certificates to users, computers, or services running on computers (typically valid for two years)
- Certificates carry:
  - an "identity", e.g. /O=dutchgrid/O=users/O=kvi/CN=Nasser Kalantar
  - a public key
  - the identity of the issuing CA
  - a digital signature (a transformation of the certificate text using the CA's private key)
- Need to tighten this part up; it takes too long
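
A minimal sketch of those ingredients using the Python `cryptography` package: a CA signs a user certificate and anyone with the CA's public key can check the signature. The CA name and validity handling are illustrative assumptions; a real grid CA (such as the DutchGrid CA implied by the DN above) follows its own policies.

```python
import datetime
from cryptography import x509
from cryptography.x509.oid import NameOID
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

ca_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
user_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)

# The "identity" carried by the certificate, mirroring the DN on the slide.
subject = x509.Name([
    x509.NameAttribute(NameOID.ORGANIZATION_NAME, "dutchgrid"),
    x509.NameAttribute(NameOID.ORGANIZATION_NAME, "users"),
    x509.NameAttribute(NameOID.ORGANIZATION_NAME, "kvi"),
    x509.NameAttribute(NameOID.COMMON_NAME, "Nasser Kalantar"),
])
issuer = x509.Name([x509.NameAttribute(NameOID.COMMON_NAME, "Example Grid CA")])  # illustrative CA name

now = datetime.datetime.utcnow()
cert = (
    x509.CertificateBuilder()
    .subject_name(subject)                # the identity
    .issuer_name(issuer)                  # identity of the issuing CA
    .public_key(user_key.public_key())    # the user's public key
    .serial_number(x509.random_serial_number())
    .not_valid_before(now)
    .not_valid_after(now + datetime.timedelta(days=2 * 365))  # "typically two years"
    .sign(ca_key, hashes.SHA256())        # the CA's digital signature
)

# Anyone holding the CA's public key can verify the signature on the certificate.
ca_key.public_key().verify(
    cert.signature,
    cert.tbs_certificate_bytes,
    padding.PKCS1v15(),
    cert.signature_hash_algorithm,
)
print("signature OK; subject:", cert.subject.rfc4514_string())
```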

16 What is a Grid? (what are these Generic Grid Software Services?)

17 [Diagram: compute clusters (C) and a data store (DS), each exposing a grid software service (like an http server), all connected to a central Information System.]
- The Information System is the central nervous system of the Grid.
- The info system defines the grid.
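
The "info system defines the grid" idea can be illustrated with a toy registry (class and field names are assumed for illustration, not the real Globus/EDG schema): a site is part of the grid exactly when it publishes itself here, and everything else queries this view.

```python
class InformationSystem:
    """Toy registry: a resource is 'in the grid' only if it is published here."""
    def __init__(self):
        self.resources = {}                      # site name -> published attributes

    def publish(self, site, **attributes):
        self.resources[site] = attributes

    def query(self, **requirements):
        """Sites meeting every requirement (>= for numbers, exact match otherwise)."""
        def satisfies(attrs):
            for key, wanted in requirements.items():
                have = attrs.get(key)
                if isinstance(wanted, (int, float)):
                    if have is None or have < wanted:
                        return False
                elif have != wanted:
                    return False
            return True
        return [site for site, attrs in self.resources.items() if satisfies(attrs)]

info_sys = InformationSystem()
info_sys.publish("nikhef-cluster", free_cpus=120, runtime_env="lhcb-sw")
info_sys.publish("cern-cluster", free_cpus=40, runtime_env="lhcb-sw")
print(info_sys.query(free_cpus=50, runtime_env="lhcb-sw"))   # -> ['nikhef-cluster']
```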

18 [Diagram: the same clusters (C) and data stores (DS), now with a Data Management System (D.M.S.) alongside the Information System (I.S.): a Data Grid.]

19 Computing Task Submission
[Diagram: the user sends a proxy + command (and possibly data) to the Workload Management System (W.M.S.); the W.M.S. passes the job's coarse requirements to the Information System, which returns candidate clusters; the W.M.S. then gets fresh, detailed info from those candidates.]
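
A compressed sketch of that submission flow, with made-up site names and attributes (the real EDG/LCG match-making uses a much richer schema): a coarse match against the information system first, then fresh, detailed info from the surviving candidates.

```python
published = {                        # coarse, possibly stale info in the I.S.
    "site-a": {"runtime_env": "lhcb-sw", "max_cpus": 200},
    "site-b": {"runtime_env": "lhcb-sw", "max_cpus": 500},
    "site-c": {"runtime_env": "atlas-sw", "max_cpus": 300},
}

def fresh_detailed_info(site):
    """Stand-in for contacting a candidate cluster directly for its current state."""
    live = {"site-a": {"free_cpus": 10, "queue_length": 3},
            "site-b": {"free_cpus": 150, "queue_length": 40}}
    return live[site]

job = {"proxy": "<user proxy>", "command": "reconstruct.sh",
       "requirements": {"runtime_env": "lhcb-sw"}}

# 1. coarse match against the information system -> candidate clusters
candidates = [site for site, attrs in published.items()
              if attrs["runtime_env"] == job["requirements"]["runtime_env"]]

# 2. get fresh, detailed info from the candidates and rank them
ranked = sorted(candidates, key=lambda s: -fresh_detailed_info(s)["free_cpus"])
print("candidate clusters:", candidates)
print("job dispatched to:", ranked[0])
```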

20 Computing Task Execution
[Diagram: the W.M.S. forwards the proxy + command (and data) to the chosen cluster and reports to the logger; the running job presents its proxy to find the D.M.S. and asks "Where is my data?", receiving a list of the best locations (data stores).]

21 Computing Task Execution (continued)
[Diagram: the finished job asks how to contact the O.D.S. and where to put the data, sends its proxy + data to that data store, registers the output with the D.M.S., and "done" is reported to the logger.]
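
The output-handling steps can be sketched the same way; the function and catalogue names below are invented for illustration and do not correspond to a specific grid API.

```python
replica_catalogue = {}                     # logical file name -> physical copies

def choose_storage(close_to_site):
    """Stand-in for asking the D.M.S. which storage element the output should go to."""
    return f"gsiftp://se.{close_to_site}.example.org/data"

def register_output(logical_name, physical_url):
    """Record the new copy so other grid jobs can find it."""
    replica_catalogue.setdefault(logical_name, []).append(physical_url)

# ... the job has finished on "site-b" and produced run1234.dst locally ...
storage_base = choose_storage("site-b")
physical_url = f"{storage_base}/run1234.dst"
# (the actual transfer would happen here, e.g. with a gridFTP client)
register_output("lfn:run1234.dst", physical_url)

print(replica_catalogue)                   # the output is now visible grid-wide
```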

22 Can do a bit more with data…

23 Computing Task Submission with data
[Diagram: as before, the user sends a proxy + command, now with a data file spec; the W.M.S. sends coarse requirements to the Information System for candidate clusters, asks the D.M.S. to find copies of the data, and gets fresh, detailed info from the candidates.]

24 Computing Task Execution with data
[Diagram: the W.M.S. forwards the proxy + command and data file spec to a cluster chosen from the list of best locations and reports to the logger; the job asks the D.M.S. "Where is my data?" and gets local access to a nearby data store.]
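
The data-aware part of the brokering reduces to ranking candidate clusters by whether a replica of the input file is already local, roughly like this (illustrative names only):

```python
replica_locations = {                      # what the D.M.S. would answer per file
    "lfn:run1234.raw": ["site-b", "site-d"],
}
candidates = ["site-a", "site-b", "site-c"]
input_file = "lfn:run1234.raw"

def rank(site):
    """Sites that already hold a replica of the input sort first."""
    return 0 if site in replica_locations.get(input_file, []) else 1

best = sorted(candidates, key=rank)[0]
print("run the job at:", best)             # site-b: the data is already local there
```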

25 Grids are working now
- See it in action

26 Applications
- D0 Run II data reprocessing
- Biomedical image comparison
- Ozone profile computation, storage, & access

27 Grid Research Topics
- "Push" vs. "pull" model of operation (you just saw "push")
  - We know how to do fine-grained priority & access control in "push"
  - We think "pull" is more robust operationally
- Software distribution
  - HEP has BIG programs with many versions … "pre" vs. "zero" install
- Data distribution
  - Similar issues (software is just executable data!)
  - A definite bottleneck in the D0 reprocessing (a back-of-the-envelope check follows below):
    - 1 Gbit/100/2/2 = modem speed!
    - Speedup limited to 92% of theoretical
    - About 1 hr of CPU lost per job
- Optimization in general is hard – and is it worth it?
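
A back-of-the-envelope check of those D0 numbers. The two factors of 2 are taken from the slide as written (their origin is not spelled out there), and the "implied job length" line assumes the lost hour is the only source of inefficiency.

```python
gigabit_per_s = 1e9                         # shared site uplink, bits per second
per_job_bps = gigabit_per_s / 100 / 2 / 2   # the slide's "1 Gbit/100/2/2"
print(f"bandwidth per job: {per_job_bps / 1e6:.1f} Mbit/s")   # 2.5 Mbit/s

lost_hours_per_job = 1.0                    # "about 1 hr CPU lost per job"
efficiency = 0.92                           # "speedup limited to 92% of theoretical"
implied_job_hours = lost_hours_per_job / (1 - efficiency)
print(f"implied total job length: ~{implied_job_hours:.1f} h")  # ~12.5 h
```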

29 Didn't talk about
- Grid projects:
  - European Data Grid – ends this month, 10 M€ / 3 years
  - VL-E – ongoing, 20 M€ / 5 years
  - EGEE – starts in April, 30 M€ / 2 years
- Gridifying HEP software & computing activities
- The transition from supercomputing to Grid computing (funding)
- Integration with other technologies
  - Submit, monitor, and control computing work from your PDA or mobile!

30 Conclusions
- Grids are working now for workload management in HEP
- Big developments in the next two years, with large-scale deployments by the LCG & EGEE projects
- Grids seem to be catching on (IBM / Globus alliance)
- Lots of interesting stuff to try out!

31 What's There Now?
- Job submission
  - A marriage of Globus, Condor-G, and the EDG Workload Manager
  - The latest version is reliable at the 99% level
- Information system
  - New system (R-GMA): very good information model, implementation still evolving
  - Old system (MDS): poor information model, poor architecture, but hacks allow 99% "uptime"
- Data management
  - Replica Location Service: a convenient and powerful system for locating and manipulating distributed data; mostly still user-driven (no heuristics)
- Data storage
  - "Bare gridFTP server": reliable, but mostly suited to disk-only mass store
  - SRM: no mature implementations

32 Grids are Information
[Diagram: two separate Information Systems, one defining the A-Grid and one defining the B-Grid.]

