Slide 1: Distributed Computing for LIGO Data Analysis
The Aspen Winter Conference on Gravitational Waves (GWADW), Isola d'Elba, Italy, 22 May 2002
Albert Lazzarini, LIGO Laboratory, Caltech
LIGO-G020232-00-E

Slide 2: LIGO Laboratory Data Analysis System (LDAS)
A distributed network of resources within LIGO Laboratory and its Collaboration: a geographically dispersed laboratory plus Collaboration institutional facilities.
Distributed computing has necessarily been part of the LIGO design from the beginning.

Slide 3: LIGO is in the GriPhyN Collaboration
GriPhyN = Science Applications + CS + Grids
GriPhyN = Grid Physics Network (NSF program)
» US-CMS: high energy physics
» US-ATLAS: high energy physics
» LIGO/LSC: gravitational wave research
» SDSS: Sloan Digital Sky Survey
» Strong partnership with computer scientists
Design and implement production-scale grids
» Develop common infrastructure, tools, and services
» Integrate into the 4 experiments
» Apply to other sciences via the Virtual Data Toolkit (VDT)
Multi-year project
» GriPhyN: R&D for grid architecture; 5 years, starting 2000
» iVDGL: implementation of initial Tier 2 centers for LIGO; 5 years, starting 2001
– Integrate grid infrastructure into experiments through VDT middleware software

Slide 4: Institutions working on LIGO grid research
LIGO Laboratory
» Tier 1 center (GriPhyN, iVDGL)
– Caltech: main archive, data center
– MIT: laboratory-operated Tier 2 center
– Observatories: data generation centers
LIGO Scientific Collaboration (LSC)
» Tier 2 centers
– University of Wisconsin at Milwaukee (GriPhyN, iVDGL)
– Pennsylvania State University (iVDGL)
» Tier 3 centers and outreach
– University of Texas at Brownsville (Hispanic minorities): GriPhyN, iVDGL
– Salish Kootenai College, Montana (Native American tribal college): iVDGL
Computer science
» University of Southern California/ISI (Kesselman et al.)

Slide 5: Tiered Grid Hierarchical Model for LIGO (Grid Physics Network project)
[Diagram: tiered grid hierarchy. The Tier 1 center at Caltech sits on an OC48 backbone; MIT and the Hanford and Livingston observatories connect at T1 today (OC3 planned). An LSC Tier 2 node comprises N compute nodes plus a database node and a network node, attached to the Internet2 Abilene WAN at OC12. The first two Tier 2 centers are UWM and PSU.]

Slide 6: LIGO and LSC Computing Resources Serve Multiple Uses
Computing resources within LIGO serve:
» Scientific and infrastructure software development
» Data archival and reduction
» Data analysis
» Grid R&D
» General computing

Slide 7: LIGO Data Analysis System Block Diagram
[Block diagram of LDAS.] LIGO software model: layered and modular; maps onto the distributed hardware architecture.

Slide 8: Preliminary GriPhyN Data Grid Architecture
[Diagram: an Application submits a request to a Planner, which emits an abstract DAG (aDAG); an Executor (DAGMan, Condor-G) converts it to a concrete DAG (cDAG) and drives compute resources (GRAM) and storage resources (GridFTP; GRAM; SRM). Supporting services: catalog services (MCAT; GriPhyN catalogs), replica management (GDMP), a reliable transfer service, information services (MDS), monitoring (MDS), and policy/security (GSI, CAS, MyProxy). Each component is either a standard Globus or Condor-G component or new/modified in the prototype.]
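The executor layer above is built on Condor DAGMan and Condor-G. As a minimal illustration (not the project's actual planner output), the Python sketch below writes a two-node DAGMan input file of the kind such an executor would submit; the job names and submit-file paths are hypothetical.

```python
# Minimal sketch: emit a two-node Condor DAGMan input file expressing an
# extract -> transform dependency. Job names and submit files are hypothetical.

dag_nodes = {
    "extract": "extract_channel.sub",   # hypothetical Condor submit file
    "transform": "make_sft.sub",        # hypothetical Condor submit file
}
dependencies = [("extract", "transform")]  # extract must finish before transform

with open("request.dag", "w") as dag:
    for name, submit_file in dag_nodes.items():
        dag.write(f"JOB {name} {submit_file}\n")       # DAGMan JOB statement
    for parent, child in dependencies:
        dag.write(f"PARENT {parent} CHILD {child}\n")  # DAGMan dependency edge

# The resulting DAG would be run with: condor_submit_dag request.dag
```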

Slide 9: LIGO data and processing needs that can be fulfilled by the grid: data replication
LIGO archive replica (a transfer sketch follows below)
» 40 TB today
» 300 TB by …
» Transposed, reduced data sets archived remotely for efficient access by collaboration users from a second source
– Tier 2 centers
– TeraGrid (Caltech/SDSC/ANL/NCSA)
» Geographic separation from the Tier 1 center at Caltech
– Redundant access
– Faster access for other U.S. regions
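The replica traffic described above would move over GridFTP. Below is a minimal sketch assuming the standard globus-url-copy client is installed and a valid grid proxy already exists; the hostnames and file paths are hypothetical.

```python
# Minimal sketch, not LIGO's actual replication machinery: copy one archive
# file from the Tier 1 center to a Tier 2 replica with the GridFTP
# command-line client. Hostnames and paths are hypothetical; a grid proxy
# (grid-proxy-init) is assumed to exist already.
import subprocess

src = "gsiftp://tier1.example.caltech.edu/archive/H-R-700000000-16.gwf"
dst = "gsiftp://tier2.example.uwm.edu/replica/H-R-700000000-16.gwf"

# -p 4: four parallel TCP streams, a common GridFTP tuning option
result = subprocess.run(["globus-url-copy", "-p", "4", src, dst])
if result.returncode != 0:
    raise RuntimeError("GridFTP transfer failed; check proxy and endpoints")
```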

Slide 10: LIGO data and processing needs that can be fulfilled by the grid: extended computational resources
Massively parallel processing of the GW channel
» Inspiral searches to low mass
– e.g., [5-50 Mflop/byte] for an inspiral search of the GW channel
– times [0.2 TB] of total cleaned GW channel for LIGO I
– Science analysis software maintained by the Collaboration as a vetted, validated body of scientific software: LAL, the LIGO Algorithm Library; dynamically loaded libraries (DLLs) and shared objects (DSOs), loaded at run time per script specification from the CVS archive
» Large-area search for unknown periodic sources
– Long (coherent) Fourier transforms for the weakest signals; e.g., 1 kHz for 10 days implies ~10^9-point FFTs (see the check below)
– Barycenter motion modulates the signal uniquely for every point in the sky
– The CW equivalent of the "filter bank"
Unlike other grid projects, LIGO has data NOW
» Strategic use of U.S. national computing resources to extend LIGO and Collaboration capabilities
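A quick back-of-envelope check of the slide's FFT-size estimate, using only the numbers quoted above (a ~1 kHz data rate over a 10-day coherent integration):

```python
# Verify the slide's ~10^9-point FFT estimate; values are the slide's own.
sample_rate_hz = 1_000               # ~1 kHz data rate
duration_s = 10 * 24 * 3600          # 10 days in seconds
n_points = sample_rate_hz * duration_s
print(f"{n_points:.1e} samples")     # 8.6e+08, i.e. of order 10^9 FFT points
```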

Slide 11: Grid research within LIGO, CY2001-CY2002
Developed LIGO Virtual Data requirements
LIGO/GriPhyN prototype
» Simple demonstration of Virtual Data concepts (SuperComputing 2001 conference)
– Data access transparency with respect to location
– Data access transparency with respect to materialization
» Provided a Globus interface to LDAS
– Basis for secure access to LIGO resources
» Designed the Transformation Catalog (see the sketch below)
– Needed to find appropriate executable binaries for a given hardware architecture
– Can be used in many systems
Basic infrastructure for the development of Virtual Data concepts; foundation for Year 2
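The Transformation Catalog's core job, as described above, is a lookup from a logical transformation plus a hardware architecture to a concrete binary. A minimal sketch of that idea follows; the entries, paths, and lookup function are illustrative, not the actual GriPhyN catalog contents or API.

```python
# Illustrative Transformation Catalog: (logical transformation, architecture)
# maps to a concrete executable. All entries are hypothetical.
TRANSFORMATION_CATALOG = {
    ("extract_channel", "sun4u-solaris8"): "/ldas/bin/sun4u/extract_channel",
    ("extract_channel", "i686-linux"):     "/ldas/bin/linux/extract_channel",
    ("make_sft",        "i686-linux"):     "/ldas/bin/linux/make_sft",
}

def lookup(transformation: str, architecture: str) -> str:
    """Return the executable implementing `transformation` on `architecture`."""
    try:
        return TRANSFORMATION_CATALOG[(transformation, architecture)]
    except KeyError:
        raise LookupError(
            f"no binary for {transformation!r} on {architecture!r}") from None

print(lookup("extract_channel", "i686-linux"))
```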

Slide 12: GriPhyN/LIGO prototype functionality
» Interpret an XML-specified request
» Acquire the user's proxy credentials
» Consult the replica catalog to find available data
» Construct a plan to produce data that are not available
» Execute the plan (see the sketch below)
» Return the requested data in Frame or XML format
Year 1 Virtual Data Product: single-channel frame extraction. Compute resources running the LIGO Data Analysis System at Caltech and UWM; storage resources at ISI, UWM, and Caltech.
[Diagram: an XML LIGO data request and specification enters the GriPhyN layer, which drives LIGO LDAS and returns a LIGO data product (Frame).]
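A minimal sketch of the request flow listed above, under stated assumptions: the in-memory catalog, the channel/interval names, and the pretend derivation step are hypothetical stand-ins for the real replica catalog, planner, and executor, not the prototype's actual API.

```python
# Minimal sketch of the prototype's request flow. Everything here is a
# hypothetical stand-in for the real GriPhyN/LDAS components.
import xml.etree.ElementTree as ET

REPLICA_CATALOG = {("H2:LSC-AS_Q", "700000000-700000016"): "gsiftp://uwm/f.gwf"}

def handle_request(request_xml: str) -> str:
    req = ET.fromstring(request_xml)                 # interpret the XML request
    key = (req.findtext("channel"), req.findtext("interval"))
    location = REPLICA_CATALOG.get(key)              # consult replica catalog
    if location is None:                             # data not materialized:
        location = f"gsiftp://caltech/derived/{key[0]}.gwf"  # pretend we ran
        REPLICA_CATALOG[key] = location              # the plan, then register it
    return location                                  # hand off for delivery

print(handle_request(
    "<request><channel>H2:LSC-AS_Q</channel>"
    "<interval>700000000-700000016</interval></request>"))
```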

Slide 13: SC2001 Prototype: Integration of LDAS + Grid Tools
[Diagram: a web-based CGI client interface talks to an HTTP front end; a MyProxy server supplies credentials. The Planner consults the Replica Catalog, Transformation Catalog, and Monitoring/Logs, and hands an XML G-DAG to the Executor (Condor-G/DAGMan), which drives storage resources via GridFTP and compute resources via GRAM/LDAS (LDAS, GridCVS).]

Slide 14: LIGO GriPhyN Planner for virtual data requests
[Decision diagram. A request names a collection, channel name, time interval, and desired location Y, resolving to Filename.F. The planner queries the replica catalog: if the file is found at location X = Y (the user-requested location), the local copy is used (scenario 1); if found at X != Y, the file is transferred (scenario 2); if found at both ISI and UWM, the copy matching the requested location Y = ISI or Y = UWM is chosen (scenarios 3.1, 3.2); if not found anywhere, the request fails with "Error: file does not exist".]
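The diagram's decision logic reduces to a small function. A sketch under stated assumptions follows; the catalog contents and site names are illustrative, and the single-replica transfer choice stands in for whatever selection policy the real planner used.

```python
# Illustrative planner decision logic for the scenarios in the diagram above.
def plan(filename: str, desired_loc: str, catalog: dict[str, list[str]]) -> str:
    replicas = catalog.get(filename, [])
    if not replicas:
        return "error: file does not exist"          # no replica anywhere
    if desired_loc in replicas:
        return f"use local copy at {desired_loc}"    # scenarios 1, 3.1, 3.2
    return f"transfer {filename} from {replicas[0]} to {desired_loc}"  # scenario 2

catalog = {"Filename.F": ["ISI", "UWM"]}
print(plan("Filename.F", "UWM", catalog))      # use local copy at UWM
print(plan("Filename.F", "Caltech", catalog))  # transfer from ISI
print(plan("Missing.F", "UWM", catalog))       # error: file does not exist
```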

Slide 15: Pulsar Search Mock Data Challenge: Extending the CY2001 Prototype
Extend the prototype beyond data access and requests
» Large-area GW pulsar search as a science focus
» Use of virtual data ("SFTs"; see the sketch below)
» Request planning and execution of analysis on distributed resources
Broaden the GRAM/LDAS interface
» Richer variability and functionality in data access methods:
– Short-time Fourier transforms as virtual data (SFTs)
– Concatenation, decimation, and resampling of frame data
Design a data discovery mechanism for discovery of data replicas
» Ability to interact with the LDAS diskcache resources
Implement the data discovery mechanism to support the pulsar search
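An SFT, the virtual-data product named above, is a short-time Fourier transform of the GW channel. The sketch below shows the core computation on synthetic data; the segment length and sample rate are illustrative, and a real search pipeline would add the windowing and calibration this plain FFT omits.

```python
# Minimal SFT sketch: split a time series into fixed-length segments and FFT
# each one. Parameters are illustrative, not a real pipeline configuration.
import numpy as np

def make_sfts(strain: np.ndarray, sample_rate: float, seg_seconds: float):
    """Return one short-time Fourier transform per data segment (rows)."""
    seg_len = int(seg_seconds * sample_rate)
    n_segs = len(strain) // seg_len
    segments = strain[: n_segs * seg_len].reshape(n_segs, seg_len)
    return np.fft.rfft(segments, axis=1)   # one SFT per row

rng = np.random.default_rng(0)
fake_strain = rng.normal(size=16 * 1024)   # synthetic stand-in for real data
sfts = make_sfts(fake_strain, sample_rate=1024.0, seg_seconds=4.0)
print(sfts.shape)                          # (4, 2049): 4 SFTs of 2049 bins
```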

Slide 16: Year 2 Research & Development
Explore bulk data operations
» Finding newly available data
» Registering data into catalogs
Deepen the understanding of Virtual Data naming
» How do you ask for what you want?
Planning and fault tolerance
» Need to specify a model
» Explore existing planning solutions
» Examine fault-tolerance issues at the system level
Scale the pulsar search to scientifically interesting levels of sensitivity at SC2002

Slide 17: THE CHALLENGE FOR LIGO: Integrating grid functionality within an existing framework
LIGO software is already on a production schedule, and working!
[Timeline: LDAS releases 0.2.0, 0.3.0, 0.4.0, and successors spaced month by month from January 2002 through 2003, interleaved with the E7 and E8 engineering runs and the S1 science run.]

Slide 18: A "Modest Proposal" for inter-project development of grid infrastructure
1. NDAS is a working set of Unix-level interfaces that effectively interleaves the data of 5 different international efforts TODAY.
2. Use the existing and growing interaction with NDAS as a first step toward developing a GW international grid (EUGrid + iVDGL).
» "Break" NDAS temporarily in order to migrate the infrastructure to grid-based utilities and tools:
– the Globus package for secure, authorized data access and transmission, with robust data transmission using the GridFTP protocol
– Condor-G for (eventually) submitting analysis jobs across collaborations
3. Add people to the NDAS team who know and can implement these technologies.

Slide 19: FINIS