An Introduction to The Grid
Mike Wilde, Mathematics and Computer Science Division, Argonne National Laboratory
Oak Park River Forest High School, 22 May 2002



2 Topics
- Grids in a nutshell
  - What are Grids?
  - Why are we building Grids?
- What Grids are made of
- The Globus Project and Toolkit
- How Grids are helping (big) Science

3 A Grid to Share Computing Resources

4 Grid Applications
- Authenticate once
- Submit a grid computation (code, resources, data, …)
- Locate resources
- Negotiate authorization, acceptable use, etc.
- Select and acquire resources
- Initiate data transfers, computation
- Monitor progress
- Steer computation
- Store and distribute results
- Account for usage

5 Natural Science drives Computer Science

6 Scientists write software to probe the nature of the universe

7 Data Grids for High Energy Physics
[Tiered data-grid diagram, image courtesy Harvey Newman, Caltech]
- Tier 0: Online System feeds the CERN Computer Centre and Offline Processor Farm (~20 TIPS) at ~100 MBytes/sec; the detector's physics data cache runs at ~PBytes/sec
- Tier 1: regional centres, e.g. FermiLab (~4 TIPS), France, Italy, Germany, connected at ~622 Mbits/sec (or Air Freight, deprecated)
- Tier 2: centres of ~1 TIPS each (e.g. Caltech), connected at ~622 Mbits/sec
- Tier 4: institutes (~0.25 TIPS) and physicist workstations, at ~1 MBytes/sec
- There is a "bunch crossing" every 25 nsecs; there are 100 "triggers" per second; each triggered event is ~1 MByte in size
- Physicists work on analysis "channels"; each institute will have ~10 physicists working on one or more channels, and data for these channels should be cached by the institute server
- 1 TIPS is approximately 25,000 SpecInt95 equivalents

8 The Grid
- Emerging computational and networking infrastructure
  - Pervasive, uniform, and reliable access to remote data, computational, sensor, and human resources
- Enable new approaches to applications and problem solving
  - Remote resources the rule, not the exception
- Challenges
  - Many different computers and operating systems
  - Failures are common: something is always broken
  - Different organizations have different rules for security and computer usage

9 Motivation Sharing the computing power of multiple organizations to help virtual organizations solve big problems

Elements of the Problem
- Resource sharing
  - Computers, storage, sensors, networks, …
  - Sharing always conditional: issues of trust, policy, negotiation, payment, …
- Coordinated problem solving
  - Beyond client-server: distributed data analysis, computation, collaboration, …
- Dynamic, multi-institutional virtual organizations
  - Community overlays on classic org structures
  - Large or small, static or dynamic

Size of the Problem
- Teraflops of compute power
  - Equal to n,000 1 GHz Pentiums
- Petabytes of data per year per experiment
  - 1 PB = 25,000 disks of ~40 GB each
- 40 Gb/sec of network bandwidth
  - Mb/sec LAN cables (stretched across the country and the Atlantic)

Sockets: the basic building block
[Diagram: Program A and Program B exchange send/recv calls across an IP network]
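As an illustration of the send/recv pattern in the diagram above, here is a minimal Python sketch of two programs talking over a TCP socket; the host, port, and message are arbitrary values chosen for this example, not anything from the original slides.

```python
import socket

HOST, PORT = "localhost", 9000  # arbitrary example values

def program_b():
    """Listen on a socket, receive a message, send a reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind((HOST, PORT))
        srv.listen(1)
        conn, _ = srv.accept()
        with conn:
            data = conn.recv(1024)           # recv
            conn.sendall(b"got: " + data)    # send

def program_a():
    """Connect to program B, send a message, receive the reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.connect((HOST, PORT))
        s.sendall(b"hello grid")             # send
        print(s.recv(1024))                  # recv
```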

Services are built on Sockets
[Diagram: a Web Browser client and a Web Server exchange send/recv over an IP network; protocol: http]
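To show how a service protocol such as http rides on the same socket primitives, here is a minimal Python sketch that speaks just enough HTTP/1.0 to fetch a page; example.org is simply a stand-in host.

```python
import socket

def http_get(host="example.org", path="/"):
    """Issue a bare-bones HTTP/1.0 GET request over a raw TCP socket."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.connect((host, 80))
        request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"
        s.sendall(request.encode("ascii"))   # send: the protocol text
        chunks = []
        while True:
            data = s.recv(4096)              # recv: the server's reply
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)

print(http_get()[:200])  # status line and first headers
```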

Client-Server Model
[Diagram: many Web Browser clients, each issuing send/recv, connect to one Web Server over an IP network; protocol: http]

Familiar Client-Server Apps
- E-mail
  - Protocols: POP, SMTP
- File copying
  - Protocol: FTP
- Logging in to remote computers
  - Protocol: Telnet

Peer-to-Peer Model
[Diagram: many limewire peers, each issuing send/recv to the others over an IP network; protocol: gnutella]

Familiar Peer-to-Peer Apps
- File (music) sharing
  - Protocols: Napster, Gnutella
- Chat (sort of)
  - Protocols: IRC, Instant Messenger
- Video conferencing
  - Protocols: H.323
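A peer differs from a pure client or server in that it both listens and connects. The Python sketch below is a toy peer built on that assumption; the real gnutella protocol is far richer, and the port and message format here are invented for illustration only.

```python
import socket
import threading
import time

def serve(port):
    """Accept queries from other peers and answer them."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind(("", port))
        srv.listen()
        while True:
            conn, _ = srv.accept()
            with conn:
                query = conn.recv(1024)                  # recv a search query
                conn.sendall(b"no match for " + query)   # send a (dummy) answer

def query_peer(host, port, filename):
    """Act as a client toward another peer."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.connect((host, port))
        s.sendall(filename.encode())                     # send a search query
        return s.recv(1024)                              # recv the answer

# Each peer runs both roles at once.
threading.Thread(target=serve, args=(9001,), daemon=True).start()
time.sleep(0.2)  # give the listener a moment to bind
print(query_peer("localhost", 9001, "some_song.mp3"))
```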

The Globus Project and The Globus Toolkit

The Globus Toolkit: Four Main Components
- Grid Security Infrastructure
  - A trustable digital ID for every user and computer
- Information Services
  - Find all the computers and file servers I can use
- Resource Management
  - Select computers and run programs on them
- Data Management
  - Fast and secure data transfer (parallel)
  - Making and tracking replicas (copies) of files
- …plus Common Software Infrastructure
  - Libraries for writing Grid software applications

Running Programs on the Grid
[GRAM architecture diagram]
- Client side: MDS client API calls locate resources and get resource info (MDS: Grid Index Info Server and Grid Resource Info Server); an RSL library creates the request; GRAM client API calls request resource allocation and process creation, and receive state change callbacks
- Site side (across the site boundary): the Gatekeeper, guarded by the Globus Security Infrastructure, parses the request and hands it to a Job Manager, which allocates and creates processes via the Local Resource Manager, monitors and controls them, and can be queried for the current status of the resource
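For concreteness, the resource request mentioned in the diagram is written in RSL. The fragment below is a minimal sketch of GT2-era RSL from memory; treat the exact attribute names and the globusrun invocation as assumptions rather than a definitive reference, and the host and file names as placeholders.

```
& (executable = "/bin/hostname")
  (count = 1)
  (stdout = "hostname.out")
```

A request like this would typically be handed to a site's gatekeeper with something on the order of `globusrun -r some.site.edu -f job.rsl`.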

The Grid Information Problem
- Large numbers of distributed "sensors" with different properties
- Need for different "views" of this information, depending on community membership, security constraints, intended purpose, sensor type

Grid Information Service
[Architecture diagram]

GridFTP: Ubiquitous, Secure, High Performance Data Access Protocol
- Common transfer protocol
  - All systems can exchange files with each other
- VERY fast
  - Send files faster than 1 Gigabit per second
- Secure
  - Makes important data hard to damage or intercept
- Applications can tailor it to their needs
  - Building in security or "on the fly" processing
- Interfaces to many storage systems
  - Disk farms, tape robots
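As a usage sketch, a GridFTP transfer is commonly driven with the globus-url-copy tool, with parallel data streams providing the speed the slide mentions. This is written from memory, so take the exact flag as an assumption; the hosts and paths are placeholders.

```
# Fetch a file over GridFTP using 8 parallel TCP streams (placeholder hosts/paths)
globus-url-copy -p 8 \
    gsiftp://storage.example.org/data/run42/hits.fz \
    file:///scratch/hits.fz
```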

Striped GridFTP Server
[Diagram: a GridFTP client talks over the GridFTP control channel (control socket) to a GridFTP server master, started with mpirun; the master coordinates parallel backend GridFTP server nodes over MPI (Comm_World and a sub-communicator); each backend has a plug-in control layer and uses MPI-IO over a parallel file system (e.g. PVFS, PFS, etc.); the backends drive GridFTP data channels to the client or to another striped GridFTP server]

Striped GridFTP Application: Video Server

Replica Catalog Structure
- Replica Catalog
  - Logical Collection: C02 measurements 1998
    - Logical Files: Jan 1998 (with parent logical file), Feb 1998 (with size), …
    - Location: jupiter.isi.edu
      - Filenames: Jan 1998, Feb 1998, …, Mar 1998, Jun 1998, Oct 1998
      - Protocol: GridFTP; UrlConstructor: GridFTP://jupiter.isi.edu/nfs/v6/climate
    - Location: sprite.llnl.gov
      - Filenames: Jan 1998, …, Dec 1998
      - Protocol: ftp; UrlConstructor: ftp://sprite.llnl.gov/pub/pcmdi
  - Logical Collection: C02 measurements 1999
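The catalog is essentially a mapping from logical names to physical copies. Here is a minimal Python sketch of that idea, with hostnames and paths taken from the slide above; the dictionary layout and the find_replicas helper are invented for illustration, not the actual catalog API.

```python
# Toy replica catalog: logical collection -> locations -> physical URL construction.
replica_catalog = {
    "C02 measurements 1998": {
        "locations": {
            "jupiter.isi.edu": {
                "url_prefix": "gsiftp://jupiter.isi.edu/nfs/v6/climate",
                "files": ["Jan 1998", "Feb 1998", "Mar 1998", "Jun 1998", "Oct 1998"],
            },
            "sprite.llnl.gov": {
                "url_prefix": "ftp://sprite.llnl.gov/pub/pcmdi",
                "files": ["Jan 1998", "Dec 1998"],
            },
        },
    },
}

def find_replicas(collection, logical_file):
    """Return every physical URL holding a copy of the given logical file."""
    locations = replica_catalog[collection]["locations"]
    return [
        f"{loc['url_prefix']}/{logical_file}"
        for loc in locations.values()
        if logical_file in loc["files"]
    ]

print(find_replicas("C02 measurements 1998", "Jan 1998"))
```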

Programming with Globus
- UNIX based (Windows coming soon)
  - Used by the rest of the Globus Toolkit
  - User can use for portability & convenience
  - Windows, UNIX, and Macintosh computers can all join the Grid
  - Portable programming very important
- Event-Driven Programming
  - A way of writing programs that handle many things at once
- Parallel Programs
  - Writing programs that can utilize many computers to solve a single problem
  - MPI: a popular Message Passing Interface developed at Argonne and other laboratories
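To make the MPI bullet concrete, here is a minimal message-passing sketch using mpi4py, a common Python binding to MPI; the original MPI libraries are C/Fortran, and mpi4py is used here only to keep the example short. The work being "farmed out" is a placeholder.

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()   # which process am I?
size = comm.Get_size()   # how many processes in total?

if rank == 0:
    # Rank 0 hands one work item to each worker, then collects the results.
    for worker in range(1, size):
        comm.send({"events_to_simulate": 500}, dest=worker, tag=0)
    results = [comm.recv(source=w, tag=1) for w in range(1, size)]
    print("total events:", sum(results))
else:
    task = comm.recv(source=0, tag=0)
    comm.send(task["events_to_simulate"], dest=0, tag=1)  # pretend we did the work
```

Run with, for example, `mpirun -n 4 python mpi_example.py`.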

Grids and Applications

Hunting for Gravity Waves

Grid Communities and Applications: Network for Earthquake Engineering Simulation
- NEESgrid: national infrastructure to couple earthquake engineers with experimental facilities, databases, computers, & each other
- On-demand access to experiments, data streams, computing, archives, collaboration
- NEESgrid: Argonne, Michigan, NCSA, UIUC, USC

The 13.6 TF TeraGrid: Computing at 40 Gb/s
[Diagram: sites connected by external networks, each with site resources and archival storage (HPSS, UniTree)]
- NCSA/PACI: 8 TF, 240 TB
- SDSC: 4.1 TF, 225 TB
- Caltech
- Argonne
- TeraGrid/DTF: NCSA, SDSC, Caltech, Argonne

iVDGL Map Circa
[World map legend: Tier0/1 facility, Tier2 facility, Tier3 facility; 10+ Gbps, 2.5 Gbps, 622 Mbps, and other links]

What's it like to Work on the Grid?
- A fascinating problem on the frontiers of computer science
- Work with people from around the world and many branches of science
- Local labs and universities at the forefront
  - Argonne, Fermilab
  - Illinois (UIC and UIUC), U of Chicago, Northwestern
  - Wisconsin also very active!

Access Grid
- Collaborative work among large groups
- ~50 sites worldwide
- Use Grid services for discovery, security
[Photo of an Access Grid node: ambient mic (tabletop), presenter mic, presenter camera, audience camera]
- Access Grid: Argonne, others

Come Visit and Explore
- Argonne and Fermilab are right in our own backyard!
  - Visits
  - Summer programs

Supplementary Material

Executor Example: Condor DAGMan
- Directed Acyclic Graph Manager
- Specify the dependencies between Condor jobs using a DAG data structure
- Manage dependencies automatically
  - (e.g., "Don't run job B until job A has completed successfully.")
- Each job is a "node" in the DAG
- Any number of parent or child nodes
- No loops
[Diagram: Job A is the parent of Job B and Job C, which are the parents of Job D]
(Slide courtesy Miron Livny, U. Wisconsin)
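The diamond-shaped DAG on this slide would be described to DAGMan with a small input file like the one below. The JOB/PARENT/CHILD syntax is standard DAGMan; the submit-file names are placeholders.

```
# diamond.dag -- placeholder submit-file names
JOB  A  a.submit
JOB  B  b.submit
JOB  C  c.submit
JOB  D  d.submit
PARENT A CHILD B C
PARENT B C CHILD D
```

It would be submitted with `condor_submit_dag diamond.dag`, after which DAGMan releases each node's job to Condor only when its parents have completed successfully.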

Executor Example: Condor DAGMan (Cont.)
- DAGMan acts as a "meta-scheduler"
  - Holds & submits jobs to the Condor queue at the appropriate times, based on DAG dependencies
- If a job fails, DAGMan continues until it can no longer make progress and then creates a "rescue" file with the current state of the DAG
  - When the failed job is ready to be re-run, the rescue file is used to restore the prior state of the DAG
[Diagram: DAGMan feeding jobs A, B, C, D into the Condor job queue]
(Slide courtesy Miron Livny, U. Wisconsin)

Virtual Data in CMS
(Virtual Data Long Term Vision of CMS: CMS Note 2001/047, GriPhyN)

CMS Data Analysis
[Diagram: for each event (Event 1, Event 2, Event 3), raw data (simulated or real), together with calibration data, is processed by a reconstruction algorithm into reconstructed data (produced by physics analysis jobs); jet finders (Jet finder 1, Jet finder 2) then reduce this to tags (Tag 1, Tag 2). Object sizes range from ~100-200 bytes for tags, through ~5-7 KB for jet-finder output, up to ~50-300 KB for reconstructed and raw data. The diagram distinguishes uploaded data, virtual data, and algorithms; the dominant use of virtual data lies in the future.]

Production Pipeline, GriPhyN-CMS Demo (SC2001 Demo Version; 1 run = 500 events)

  Stage        Output file   Data      CPU
  pythia       truth.ntpl    0.5 MB    2 min
  cmsim        hits.fz       175 MB    8 hours
  writeHits    hits.DB       275 MB    5 min
  writeDigis   digis.DB      105 MB    45 min

GriPhyN: Virtual Data Tracking Complex Dependencies
- Dependency graph is:
  - Files: 8 < (1,3,4,5,7), 7 < 6, (3,4,5,6) < 2
  - Programs: 8 < psearch, 7 < summarize, (3,4,5) < reformat, 6 < conv, (1,2) < simulate
[Diagram: simulate -t 10 … produces file1 and file2; reformat -f fz … turns file2 into files 3, 4, 5; conv -i esd -o aod turns file2 into file6; summarize -t 10 … turns file6 into file7; psearch -t 10 … combines files 1, 3, 4, 5, 7 into file8, the requested file]

Re-creating Virtual Data
- To re-create file 8: Step 1
  - simulate > file1, file2
[Same dependency diagram as on the previous slide]

Re-creating Virtual Data
- To re-create file 8: Step 2
  - Files 3, 4, 5, 6 derived from file 2
  - reformat > file3, file4, file5
  - conv > file6
[Same dependency diagram as on the previous slide]

Re-creating Virtual Data
- To re-create file 8: Step 3
  - File 7 depends on file 6
  - summarize > file7
[Same dependency diagram as on the previous slide]

Re-creating Virtual Data
- To re-create file 8: final step
  - File 8 depends on files 1, 3, 4, 5, 7
  - psearch > file8
[Same dependency diagram as on the previous slide]
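The four steps above amount to a depth-first walk of the derivation graph: to materialize a file, first materialize everything it depends on, then run its producing program. Here is a small Python sketch of that idea, using the file/program dependencies from the slides; the print call stands in for actually running the transformation on the Grid.

```python
# Derivations from the slides: program -> its input files and output files.
derivations = {
    "simulate":  {"inputs": [],        "outputs": ["file1", "file2"]},
    "reformat":  {"inputs": ["file2"], "outputs": ["file3", "file4", "file5"]},
    "conv":      {"inputs": ["file2"], "outputs": ["file6"]},
    "summarize": {"inputs": ["file6"], "outputs": ["file7"]},
    "psearch":   {"inputs": ["file1", "file3", "file4", "file5", "file7"],
                  "outputs": ["file8"]},
}

producer = {f: prog for prog, d in derivations.items() for f in d["outputs"]}
materialized = set()   # files that already exist (or have been re-created)

def materialize(target):
    """Depth-first re-creation of `target`: prerequisites first, then its producer."""
    if target in materialized:
        return
    prog = producer[target]
    d = derivations[prog]
    for dep in d["inputs"]:
        materialize(dep)
    print(f"{prog} > {', '.join(d['outputs'])}")   # placeholder for running the program
    materialized.update(d["outputs"])

materialize("file8")
# Prints the same sequence as the slides: simulate, reformat, conv, summarize, psearch.
```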

Virtual Data Catalog: Conceptual Data Structure
- TRANSFORMATION: /bin/physapp1, version 1.2.3b(2), created on 12 Oct 1998, owned by physbld.orca
- DERIVATION: points to a parameter list (^paramlist) and a transformation (^transformation)
- FILE: LFN=filename1, PFN1=/store1/, PFN2=/store9/, PFN3=/store4/, ^derivation
- FILE: LFN=filename2, PFN1=/store1/, PFN2=/store9/, ^derivation
- PARAMETER LIST: PARAMETER i filename1; PARAMETER O filename2; PARAMETER E PTYPE=muon; PARAMETER p -g
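A minimal Python sketch of the same conceptual structure, with field names chosen to mirror the slide; the class layout is an illustration under those assumptions, not the actual catalog schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Transformation:
    """An executable program registered in the catalog."""
    path: str          # e.g. /bin/physapp1
    version: str
    created: str
    owner: str

@dataclass
class Parameter:
    kind: str          # 'i' input file, 'O' output file, 'E' environment, 'p' plain argument
    value: str

@dataclass
class Derivation:
    """One invocation of a transformation with a concrete parameter list."""
    transformation: Transformation
    paramlist: List[Parameter]

@dataclass
class CatalogFile:
    """A logical file, its physical replicas, and the derivation that produced it."""
    lfn: str
    pfns: List[str] = field(default_factory=list)
    derivation: Optional[Derivation] = None

app = Transformation("/bin/physapp1", "1.2.3b(2)", "12 Oct 1998", "physbld.orca")
deriv = Derivation(app, [Parameter("i", "filename1"), Parameter("O", "filename2"),
                         Parameter("E", "PTYPE=muon"), Parameter("p", "-g")])
f2 = CatalogFile("filename2", ["/store1/", "/store9/"], deriv)
```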

CMS Pipeline in VDL
(Pipeline stages: pythia_input, pythia.exe, cmsim_input, cmsim.exe, writeHits, writeDigis)

begin v /usr/local/demo/scripts/cmkin_input.csh
  file i ntpl_file_path
  file i template_file
  file i num_events
  stdout cmkin_param_file
end

begin v /usr/local/demo/binaries/kine_make_ntpl_pyt_cms121.exe
  pre cms_env_var
  stdin cmkin_param_file
  stdout cmkin_log
  file o ntpl_file
end

begin v /usr/local/demo/scripts/cmsim_input.csh
  file i ntpl_file
  file i fz_file_path
  file i hbook_file_path
  file i num_trigs
  stdout cmsim_param_file
end

begin v /usr/local/demo/binaries/cms121.exe
  condor copy_to_spool=false
  condor getenv=true
  stdin cmsim_param_file
  stdout cmsim_log
  file o fz_file
  file o hbook_file
end

begin v /usr/local/demo/binaries/writeHits.sh
  condor getenv=true
  pre orca_hits
  file i fz_file
  file i detinput
  file i condor_writeHits_log
  file i oo_fd_boot
  file i datasetname
  stdout writeHits_log
  file o hits_db
end

begin v /usr/local/demo/binaries/writeDigis.sh
  pre orca_digis
  file i hits_db
  file i oo_fd_boot
  file i carf_input_dataset_name
  file i carf_output_dataset_name
  file i carf_input_owner
  file i carf_output_owner
  file i condor_writeDigis_log
  stdout writeDigis_log
  file o digis_db
end

Virtual Data for Real Science: A Prototype

Production DAG of simulated CMS data:
- Simulate Physics
- Simulate CMS Detector Response
- Copy flat-file to OODBMS
- Simulate Digitization of Electronic Signals

Architecture of the system:
- Virtual Data Catalog (PostgreSQL), Virtual Data Language, VDL Interpreter (VDLI), local file storage, GSI
- Job submission sites (ANL, SC, …): Condor-G Agent, Globus Client, GridFTP Server
- Job execution sites (U of Chicago, U of Wisconsin, U of Florida): GridFTP Client, Globus GRAM, Condor pool
- All connected over the Grid testbed

Early GriPhyN Challenge Problem: CMS Data Reconstruction
[Diagram components: Caltech workstation, master Condor job running at Caltech, secondary Condor job on the Wisconsin (WI) pool, NCSA Linux cluster, NCSA UniTree (a GridFTP-enabled FTP server)]
2) Launch secondary job on WI pool; input files via Globus GASS
3) 100 Monte Carlo jobs on Wisconsin Condor pool
4) 100 data files transferred via GridFTP, ~1 GB each
5) Secondary reports complete to master
6) Master starts reconstruction jobs via Globus jobmanager on cluster
7) GridFTP fetches data from UniTree
8) Processed Objectivity database stored to UniTree
9) Reconstruction job reports complete to master
(Scott Koranda, Miron Livny, others)

GriPhyN-LIGO SC2001 Demo

GriPhyN CMS SC2001 Demo
[Diagram: a request is served against a full event database of ~100,000 large objects, a full event database of ~40,000 large objects, and a "Tag" database of ~140,000 small objects, moved with parallel tuned GSI FTP]
- Bandwidth-greedy Grid-enabled object collection analysis for particle physics

iVDGL
- International Virtual-Data Grid Laboratory
  - A place to conduct Data Grid tests at scale
  - Concrete manifestation of world-wide grid activity
  - Continuing activity that will drive Grid awareness
- Scale of effort
  - For national, international scale Data Grid tests, operations
  - Computation & data intensive computing
- Who
  - Initially US-UK-Italy-EU; Japan, Australia
  - & Russia, China, Pakistan, India, South America?
  - StarLight and other international networks vital
- U.S. Co-PIs: Avery, Foster, Gardner, Newman, Szalay

iVDGL Map Circa
[World map legend: Tier0/1 facility, Tier2 facility, Tier3 facility; 10+ Gbps, 2.5 Gbps, 622 Mbps, and other links]

Summary
- "Grids": resource sharing & problem solving in dynamic virtual organizations
  - Many projects now working to develop, deploy, apply relevant technologies
- Common protocols and services are critical
  - Globus Toolkit a source of protocol and API definitions, reference implementations
- Rapid progress on definition, implementation, and application of Data Grid architecture
  - Harmonizing U.S. and E.U. efforts important