Experiment Applications: applying the power of the grid to real science. Rick Cavanaugh, University of Florida. GriPhyN/iVDGL External Advisory Committee.

Experiment Applications: applying the power of the grid to real science. Rick Cavanaugh, University of Florida. GriPhyN/iVDGL External Advisory Committee, 13 January 2002.

GriPhyN/iVDGL and ATLAS
Argonne, Boston, Brookhaven, Chicago, Indiana, Berkeley, Texas

ATLAS at SC2002
- Grappa: manages the overall grid experience
- Magda: distributed data management and replication
- Pacman: defines and produces software environments
- DC1 production with GRAT: data challenge simulations for ATLAS
- Instrumented Athena: grid monitoring of ATLAS analysis applications
- VO-gridmap: virtual organization management
- Gridview: monitoring U.S. ATLAS resources
- WorldGrid: world-wide US/EU grid infrastructure

Pacman at SC2002
- How did we install our software for this demo?
  % pacman -get iVDGL:WorldGrid ScienceGrid
- Pacman lets you define how a mixed tarball/rpm/gpt/native software environment is fetched, installed, set up, and updated.
- This can be figured out once and exported to the rest of the world via caches:
  % pacman -get atlas_testbed

Pacman at SC2002 (continued). The cache name before the colon (iVDGL:) identifies the caches you have decided to trust; an installation leaves the installed software plus a pointer to local documentation; dependencies are automatically resolved. (A conceptual sketch of this cache-based resolution follows.)
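To make the resolution idea concrete, here is a small conceptual sketch in Python (not Pacman's actual implementation or cache format) of how a package and its dependencies might be resolved from a trusted cache; the cache and package names are invented.

```python
# Conceptual sketch only: how a cache-based installer can resolve a package and
# its dependencies from a trusted cache, installing each dependency once.
# The cache contents and package names below are invented for illustration.

TRUSTED_CACHES = {
    "iVDGL": {
        "WorldGrid": {"depends": ["Globus", "Condor"], "fetch": "worldgrid.tar.gz"},
        "Globus":    {"depends": [], "fetch": "globus.rpm"},
        "Condor":    {"depends": [], "fetch": "condor.tar.gz"},
    },
}

def resolve(cache, name, installed=None):
    """Install `name` from `cache`, resolving dependencies depth-first."""
    if installed is None:
        installed = set()
    if name in installed:                      # already present: nothing to do
        return installed
    package = TRUSTED_CACHES[cache][name]      # look up the package description
    for dep in package["depends"]:             # dependencies are installed first
        resolve(cache, dep, installed)
    print(f"fetching and installing {package['fetch']} from {cache}")
    installed.add(name)
    return installed

resolve("iVDGL", "WorldGrid")                  # analogous to: pacman -get iVDGL:WorldGrid
```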

Grappa at SC2002
- Web-based interface for Athena job submission to Grid resources.
- Based on XCAT Science Portal technology developed at Indiana.
- EDG JDL backend to Grappa (an illustrative JDL sketch follows below).
- Common submission to US gatekeepers and the EDG resource broker (through an EDG "user interface" machine).
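To give a flavour of the JDL backend, the sketch below generates a classad-style job description of the kind the EDG resource broker accepts; the attribute names follow common EDG JDL usage, but the executable, sandbox files, and requirement are invented for illustration.

```python
# Illustrative only: builds a classad-style EDG JDL text for an Athena-like job.
# Attribute names follow standard EDG JDL; the values are made up.

def make_jdl(executable, arguments, sandbox_in, sandbox_out, vo="atlas"):
    quote = lambda items: ", ".join(f'"{i}"' for i in items)
    return "\n".join([
        "[",
        f'  Executable = "{executable}";',
        f'  Arguments = "{arguments}";',
        '  StdOutput = "job.out";',
        '  StdError  = "job.err";',
        f"  InputSandbox = {{{quote(sandbox_in)}}};",
        f"  OutputSandbox = {{{quote(sandbox_out)}}};",
        f'  VirtualOrganisation = "{vo}";',
        "  Requirements = other.GlueCEPolicyMaxWallClockTime > 120;",
        "]",
    ])

print(make_jdl("athena.sh", "jobOptions.txt",
               ["athena.sh", "jobOptions.txt"],
               ["job.out", "job.err", "ntuple.root"]))
```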

Grappa Communications Flow (diagram). A web-browsing machine (Netscape/Mozilla/Internet Explorer/PalmScape, using JavaScript over https/http) talks to the Grappa portal machine, an XCAT Tomcat server; script-based submission (interactive or as a cron job, e.g. via the Cactus framework) drives the same portal. The portal uses CoG for submission and monitoring on compute resources (Resource A ... Resource Z) and CoG data copy to data storage (data disk, HPSS). Magda registers file locations and file metadata, spiders the data storage for input files, and offers catalogue browsing.

Instrumented Athena at SC2002
- Part of the SuperComputing 2002 ATLAS demo.
- Prophesy: an infrastructure for analyzing and modeling the performance of parallel and distributed applications; normally a parse-and-auto-instrument approach (C & FORTRAN).
- NetLogger (didc.lbl.gov/NetLogger/): end-to-end monitoring and analysis of distributed systems; C, C++, Java, Python, Perl, Tcl APIs; web service activation (a minimal event-logging sketch follows).
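As an illustration of the kind of instrumentation this enables (not the actual Prophesy or NetLogger API), the sketch below emits timestamped key=value events around an analysis step, in the spirit of NetLogger's end-to-end event logs; the program and event names are invented.

```python
# Minimal sketch of NetLogger-style instrumentation: each interesting point in
# the application emits a timestamped key=value event that a monitoring system
# can correlate end-to-end.  Field and event names here are illustrative only.
import socket
import sys
import time

def log_event(event, **fields):
    stamp = time.strftime("%Y-%m-%dT%H:%M:%S", time.gmtime()) + "Z"
    extras = " ".join(f"{k.upper()}={v}" for k, v in fields.items())
    print(f"DATE={stamp} HOST={socket.gethostname()} PROG=athena "
          f"NL.EVNT={event} {extras}", file=sys.stderr)

log_event("reco.start", run=1234, events=5000)
time.sleep(0.1)                          # stand-in for the reconstruction step
log_event("reco.end", run=1234, status=0)
```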

GriPhyN/iVDGL and CMS
Caltech, Fermilab, Florida, San Diego, Wisconsin

Bandwidth Gluttony at SC2002
A "Grid-enabled" particle physics analysis application:
- issued remote database selection queries and prepared data object collections,
- moved the collections across the WAN using specially enhanced TCP/IP stacks (see the buffer-tuning sketch below),
- rendered the results in real time on the analysis client workstation in Baltimore.
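The "specially enhanced TCP/IP stacks" are largely about matching the TCP window to the bandwidth-delay product of the WAN path; the sketch below shows the generic application-side part of that tuning, with a target rate and round-trip time chosen purely for illustration (not the SC2002 configuration).

```python
# Generic illustration of WAN transfer tuning: for a path with round-trip time
# rtt and a target rate, the TCP window (socket buffer) must hold roughly
# rate * rtt bytes, far more than default buffers provide.  Numbers are
# illustrative, not the SC2002 configuration.
import socket

rate_bps = 1_000_000_000                    # 1 Gb/s target
rtt_s = 0.070                               # 70 ms round trip
window_bytes = int(rate_bps / 8 * rtt_s)    # ~8.75 MB bandwidth-delay product

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, window_bytes)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, window_bytes)
print("requested send/recv buffers of", window_bytes, "bytes")
# (The kernel may cap these values; system-wide limits also have to be raised.)
```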

MonaLisa at SC2002
- MonaLisa (Caltech):
  - deployed on the US-CMS testbed;
  - dynamic information/resource discovery mechanism using agents;
  - implemented in Java / Jini with interfaces to SNMP, MDS, and Ganglia, and in WSDL / SOAP with UDDI;
  - proved critical during live CMS production runs.
(Pictures taken from Iosif Legrand.)

MOP and Clarens at SC2002
- Simple, robust grid planner integrated with CMS production software.
- 1.5 million simulated CMS events produced over 2 months (~30 CPU years).
(Diagram: on the VDT client, MCRunJob and mop-submitter generate job scripts that DAGMan/Condor-G dispatches to Condor and GridFTP services on VDT servers 1..N, while a Clarens client communicates with Clarens servers. A DAG-generation sketch in the spirit of this planner follows.)
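To illustrate the planner's role (this is a hypothetical sketch, not the MOP code), the example below writes a DAGMan input file in which each Condor-G simulation job is followed by a stage-out job for its output; the submit-file and output names are invented.

```python
# Hypothetical sketch of a MOP-like planner step: emit a DAGMan DAG in which
# each simulation job is followed by a stage-out job for its output file.
# Submit-file names and job structure are invented for illustration.

def write_production_dag(n_jobs, dag_path="production.dag"):
    lines = []
    for i in range(n_jobs):
        lines.append(f"JOB  sim{i}   simulate.sub")          # Condor-G run job
        lines.append(f'VARS sim{i}   run="{i}"')
        lines.append(f"JOB  out{i}   stageout.sub")          # GridFTP transfer job
        lines.append(f'VARS out{i}   file="events_{i}.root"')
        lines.append(f"PARENT sim{i} CHILD out{i}")          # transfer only after the sim
    with open(dag_path, "w") as dag:
        dag.write("\n".join(lines) + "\n")
    return dag_path

write_production_dag(4)   # then: condor_submit_dag production.dag
```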

Chimera Production at SC2002
- Used VDL to describe virtual data products and their dependencies.
- Used the Chimera planners to map abstract workflows onto concrete grid resources.
- Implemented a WorkRunner to continuously schedule jobs across all grid sites.
(Diagram: an example CMS pipeline Generator → Simulator → Formator → Reconstructor → Ntuple → production analysis, each stage taking parameters, an executable, and data; in the concrete DAG each job becomes Stage File In → Execute Job → Stage File Out → Register File. A concretization sketch follows.)
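As a sketch of what mapping an abstract workflow onto concrete resources involves (not Chimera or its planners), the example below expands one abstract transformation into the four concrete steps shown in the DAG: stage file in, execute job, stage file out, register file. The site, storage, and file names are hypothetical.

```python
# Hypothetical expansion of an abstract workflow node into the concrete steps
# shown above: stage inputs in, run the job, stage outputs out, register them.

def concretize(job_name, inputs, outputs, site="ufl-cluster",
               storage="gridftp://se.example.org/store"):
    steps = []
    for f in inputs:
        steps.append(("stage_in",  f"copy {storage}/{f} to {site}:{f}"))
    steps.append(("execute",       f"run {job_name} on {site}"))
    for f in outputs:
        steps.append(("stage_out", f"copy {site}:{f} to {storage}/{f}"))
        steps.append(("register",  f"record {storage}/{f} in the replica catalog"))
    return steps

for kind, action in concretize("reconstructor", ["simhits.dat"], ["ntuple.root"]):
    print(f"{kind:10s} {action}")
```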

Data Provenance at SC2002
A virtual space of simulated data is created for future use by scientists: the diagram shows a tree of virtual data products keyed by parameter sets (mass = 200; decay = WW or ZZ; stability = 1 or 3; event = 8; plot = 1), each node derived from its parent.

Data Provenance at SC2002
Search for WW decays of the Higgs boson where only stable, final-state particles are recorded: the request mass = 200; decay = WW; stability = 1 selects the corresponding branch of the virtual data tree.

Data Provenance at SC2002
The scientist adds a new derived data branch (mass = 200; decay = WW; stability = 1; LowPt = 20; HighPt = ...) and continues to investigate! (A toy provenance catalog follows.)
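The idea behind these three slides can be captured in a toy catalog (a sketch, not Chimera's VDL or virtual data catalog): each virtual product is keyed by its parameter set and remembers the parameter set it was derived from, so any request can be traced back through its derivation chain. The parameter values follow the slides above.

```python
# Toy virtual-data catalog: products are keyed by their parameter sets and
# carry a link to the parameters of their parent derivation, so provenance can
# be walked back from any request.  Parameter values follow the slides above.

def key(params):
    return tuple(sorted(params.items()))

catalog = {}   # parameter-set key -> parent parameter set (None for raw generation)

def register(params, parent=None):
    catalog[key(params)] = parent

register({"mass": 200, "decay": "WW"})
register({"mass": 200, "decay": "WW", "stability": 1},
         parent={"mass": 200, "decay": "WW"})
register({"mass": 200, "decay": "WW", "stability": 1, "LowPt": 20},
         parent={"mass": 200, "decay": "WW", "stability": 1})

def provenance(params):
    """Return the chain of parameter sets from this product back to its origin."""
    chain, current = [], params
    while current is not None:
        chain.append(current)
        current = catalog[key(current)]
    return chain

for step in provenance({"mass": 200, "decay": "WW", "stability": 1, "LowPt": 20}):
    print(step)
```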

GriPhyN and LIGO (Laser Interferometer Gravitational-wave Observatory)
ISI, Caltech, Milwaukee

LIGO's Pulsar Search (pipeline diagram). The interferometer's raw channels are stored as long time frames and split into short (30-minute) time frames; for each single frame the channel of interest is extracted and a short Fourier transform applied; the frequency range of interest is extracted and transposed to construct a time-frequency image; candidate events are found in the image and archived in the event DB (a numerical sketch of the image construction follows).
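A compact numerical sketch of the heart of such a search (illustrative only, not the LDAS implementation): short Fourier transforms of consecutive data segments are stacked into a time-frequency image, a narrow frequency band is extracted, and the loudest frequency is reported as a candidate. The sample rate, segment length, band, and injected signal are all invented.

```python
# Illustrative time-frequency pulsar search: short Fourier transforms of
# consecutive segments are stacked into an image, a frequency band is kept,
# and the loudest frequency is reported as a candidate.  All numbers are made up.
import numpy as np

fs = 1024.0                       # sample rate (Hz)
segment = 4096                    # samples per short segment (4 s here, 30 min in LIGO)
t = np.arange(16 * segment) / fs
data = np.random.normal(size=t.size) + 0.5 * np.sin(2 * np.pi * 100.25 * t)

# Stack |SFT|^2 of each segment into rows of a time-frequency image.
segments = data.reshape(-1, segment)
image = np.abs(np.fft.rfft(segments, axis=1)) ** 2
freqs = np.fft.rfftfreq(segment, d=1.0 / fs)

# Extract the frequency range of interest (95-105 Hz) and find the candidate.
band = (freqs >= 95.0) & (freqs <= 105.0)
power = image[:, band].sum(axis=0)           # sum over time segments
candidate = freqs[band][np.argmax(power)]
print(f"candidate frequency: {candidate:.2f} Hz")
```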

Pegasus: Planning for Execution in Grids
- Developed at ISI as part of the GriPhyN project.
- Configurable system that can map and execute complex workflows on the Grid.
- Integrated with the GriPhyN Chimera system: it receives an abstract workflow (AW) description from Chimera, produces a concrete workflow (CW), and submits the CW to DAGMan for execution; optimizations of the CW are done from the point of view of virtual data (a reduction sketch follows below).
- Can perform AW planning based on application-level metadata attributes.
- Given attributes such as time interval, frequency of interest, location in the sky, etc., Pegasus is currently able to produce any virtual data products present in the LIGO pulsar search.
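One of the virtual-data optimizations can be pictured as follows (a simplified sketch, not Pegasus itself): if a node's outputs are already registered in the replica catalog, the planner schedules a transfer of the existing product instead of re-deriving it. The workflow, catalog contents, and file names are hypothetical.

```python
# Simplified view of a virtual-data optimization: jobs whose outputs already
# exist in the replica catalog are not re-executed; the planner only schedules
# a transfer of the existing product.  Workflow and catalog are invented.

replica_catalog = {"sfts_0950_1050.dat"}       # products that already exist

abstract_workflow = [
    {"job": "make_sfts",   "outputs": ["sfts_0950_1050.dat"]},
    {"job": "build_image", "outputs": ["tf_image.dat"]},
    {"job": "find_pulsar", "outputs": ["candidates.txt"]},
]

concrete_workflow = []
for node in abstract_workflow:
    if all(out in replica_catalog for out in node["outputs"]):
        concrete_workflow.append({"job": f"transfer_{node['job']}",
                                  "action": "stage existing outputs"})
    else:
        concrete_workflow.append({"job": node["job"],
                                  "action": "execute and register"})

for step in concrete_workflow:
    print(step["job"], "->", step["action"])
```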

Metadata Driven Configuration (figure).

LIGO's pulsar search at SC2002
The pulsar search conducted at SC2002:
- used LIGO data collected during the first scientific run of the instrument;
- targeted a set of 1000 locations of known pulsars as well as random locations in the sky;
- published the results of the analysis via LDAS (LIGO Data Analysis System) to the LIGO Scientific Collaboration;
- was performed using LDAS and compute and storage resources at Caltech, the University of Southern California, and the University of Wisconsin-Milwaukee.

Results
SC2002 demo:
- over 58 pulsar searches
- a total of 330 tasks, 469 data transfers, 330 output files
- total runtime 11:24:35
To date:
- 185 pulsar searches
- a total of 975 tasks, 1365 data transfers, 975 output files
- total runtime 96:49:47

Virtual Galaxy Cluster System: An Application of the GriPhyN Virtual Data Toolkit to Sloan Digital Sky Survey Data
Chicago, Argonne, Fermilab

The Brightest Cluster Galaxy Pipeline
Interesting intermediate data reuse is made possible by Chimera: maxBcg is a series of transformations.
- 1: extract galaxies from the full tsObj data set.
- 2: filter the field for bright red galaxies.
- 3: calculate the weighted BCG likelihood for each galaxy (the most expensive step).
- 4: is this galaxy the most likely galaxy in the neighborhood?
- 5: remove extraneous data and store in a compact format.
Cluster finding works well with 1 Mpc radius apertures; if one were instead looking for the sites of gravitational lensing, one would rather use a 1/4 Mpc radius, and the run would start at transformation 3 (see the sketch below).
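The reuse argument can be made concrete with a small sketch (not the actual maxBcg code): the products of stages 1 and 2 do not depend on the aperture, so a lensing-oriented run with a 1/4 Mpc aperture starts from the cached stage-2 output and only recomputes stages 3-5. The stage logic and product names are placeholders.

```python
# Sketch of maxBcg-style staging: stages 1-2 are aperture-independent, so their
# cached output is reused when the analysis is re-run with a different aperture
# starting at stage 3.  Stage logic and names are placeholders.

cache = {}   # product name -> computed value

def cached(name, compute):
    if name not in cache:
        print("computing", name)
        cache[name] = compute()
    else:
        print("reusing  ", name)
    return cache[name]

def run_pipeline(aperture_mpc):
    galaxies = cached("galaxies", lambda: "galaxies from tsObj")                    # stage 1
    brg      = cached("brg",      lambda: f"bright red galaxies of {galaxies}")     # stage 2
    # Stages 3-5 depend on the aperture, so their products are keyed by it.
    key = f"aperture={aperture_mpc}"
    likeli = cached(f"bcg_likelihood[{key}]", lambda: f"likelihoods({brg}, {key})")  # stage 3
    best   = cached(f"local_maxima[{key}]",   lambda: f"maxima({likeli})")           # stage 4
    return cached(f"catalog[{key}]",          lambda: f"compact({best})")            # stage 5

run_pipeline(1.0)    # cluster finding
run_pipeline(0.25)   # lensing sites: stages 1-2 are reused, 3-5 recomputed
```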

The DAG (figure): the maxBcg stages from BRG selection through core and cluster identification to the final catalog.

A DAG for 50 Fields: 744 files, 387 nodes, 40 minutes.

Example: Sloan Galaxy Cluster Analysis (with Jim Annis & Steve Kent, FNAL). Figure: Sloan data feeds the analysis DAG, producing the galaxy cluster size distribution.

Conclusion
- Built a virtual cluster system based on Chimera and SDSS cluster finding.
- Described the five stages and data dependencies in VDL.
- Tested the system on a virtual data grid.
- Conducting performance analysis.
- Helped improve Chimera.

Some CMS Issues/Challenges
- How to generate more buy-in from the experiments? A sociological trust problem, not a technical one.
- More exploitation of (virtual) collections of objects and further use of web services (work already well underway).
- What is required to store the complete provenance of data generated in a grid environment?
- Creation of collaborative peer-to-peer environments.
- Data Challenge: generate and analyze 5% of the expected data at startup (~1/2 year of continuous production).
- What is the relationship between WorldGrid and the LCG?
- Robust, portable applications! Virtual organization management and policy enforcement.

Some ATLAS Issues/Challenges
- How to generate more buy-in from the experiments? A sociological trust problem, not a technical one. Fleshing out the notion of Pacman "Projects" and prototyping them.
- What is the best integration path for Chimera infrastructure with international ATLAS catalog systems? Is a standardized virtual data API needed?
- Packaging and distribution of ATLAS software releases for each step in the production/analysis chain: generation, simulation, reconstruction, analysis.
- The LCG software application development environment is now SCRAM; ATLAS is evaluating a possible migration from CMT to SCRAM.

SDSS Challenges
- Cluster finding: distribution of clusters in the universe; evolution of the mass function; balanced I/O and compute.
- Power spectrum: distribution of galaxies in the universe; direct constraints on cosmological parameters; compute intensive, preferring MPI systems; premium on discovering similar results.
- Analyses based on pixel data: weak-lensing analysis of the SDSS coadded southern survey data; near-Earth asteroid searches; galaxy morphological properties (NVO Galaxy Morphology demo).
- All involve moving around terabytes of data, or choosing not to.

LIGO Challenges