STAR Collaboration, July 2004 Grid Collector Wei-Ming Zhang Kent State University John Wu, Alex Sim, Junmin Gu and Arie Shoshani Lawrence Berkeley National.

Presentation transcript:

STAR Collaboration, July 2004
Grid Collector
Wei-Ming Zhang, Kent State University
John Wu, Alex Sim, Junmin Gu and Arie Shoshani, Lawrence Berkeley National Lab
In collaboration with Jerome Lauret, Victor Perevoztchikov, Valeri Faine, Jeff Porter, Sasha Vanyashin, Brookhaven National Laboratory

A View of the Analysis Process
Users want to analyze some events of interest
Events are stored in millions of files
Files are distributed on many storage systems
To perform an analysis, a user needs to
1. Write the analysis code, run it
2. Specify the events of interest
3. Locate the files containing the events
4. Prepare disk space for the files
5. Transfer the files to the disks
6. Recover from any errors
7. Read the events of interest from files
8. Remove the files

Design Goals of Grid Collector
Make analysts more productive by
– Reading only events of interest
– Automating the management of distributed files and disks

Approaches of Grid Collector
Allow users to specify events of interest using meaningful physical quantities
– numberOfPrimaryTracks > 1000 AND vectorSumOfPt > 20
– Simplify step 2
Automate file management tasks
– Use File Catalogs to locate files
– Use Storage Resource Manager to manage the disk space and file transfers
– Remove steps
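As a rough illustration of what the example condition means, the sketch below applies it to a handful of per-event tag records. The tag names come from the slide; the event values, the `select_events` helper, and the in-memory list are assumptions made for this example, and a plain scan like this is not how Grid Collector evaluates queries.

```python
# Hypothetical per-event tags: names follow the slide, values are invented.
events = [
    {"id": 0, "numberOfPrimaryTracks": 1500, "vectorSumOfPt": 25.0},
    {"id": 1, "numberOfPrimaryTracks": 800, "vectorSumOfPt": 30.0},
    {"id": 2, "numberOfPrimaryTracks": 1200, "vectorSumOfPt": 10.0},
    {"id": 3, "numberOfPrimaryTracks": 2000, "vectorSumOfPt": 21.5},
]


def select_events(events, min_tracks=1000, min_pt=20.0):
    """Return IDs of events satisfying
    numberOfPrimaryTracks > min_tracks AND vectorSumOfPt > min_pt."""
    return [
        e["id"]
        for e in events
        if e["numberOfPrimaryTracks"] > min_tracks and e["vectorSumOfPt"] > min_pt
    ]


selected = select_events(events)
print(selected)
```

Only the events that pass both range conditions need to be read, which is the point of "reading only events of interest".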

Storage Access Coordination System
Strength
– Allow user to specify events as range conditions
– Automate file management tasks
Weakness
– Designed for Objectivity data
– Access only one HPSS
[Diagram labels: Query Estimator (QE); Cache Manager (CM); Query Monitor (QM); Query estimation / execution requests; file caching request; Caching Policy Module; File Catalog (FC); Bitmap index; file purging; Disk Cache; file caching; User's Application (open, read, close)]

Grid Collector: Architecture
[Architecture diagram, divided into Clients and Servers. Labels, in the order they appear:]
Analysis code; New query; Event iterator
Bitmap index (In: conditions; Out: logical files, object IDs)
File Locator (In: logical name; Out: physical location)
Grid Collector Coordinator
File Scheduler (In: physical file)
DRM
administrator; Fetch tag file; Load subset; Rollback; Commit
Index Builder (In: STAR tag file; Out: bitmap index)
NFS, local disk
File Catalog 1; File Catalog 2
HRM 1; HRM
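The architecture slide describes the bitmap index as taking conditions in and returning logical files and object IDs. The toy sketch below shows one way a binned bitmap index can answer a two-condition range query. It is an illustration only: the class name, bin edges, data values, and the `locations` table are all assumptions, and the production index is considerably more sophisticated (for example, it uses compressed bitmaps).

```python
from bisect import bisect_right


class BinnedBitmapIndex:
    """Toy bitmap index: one bitmap (stored as a Python int) per value bin."""

    def __init__(self, values, bin_edges):
        self.values = list(values)
        self.edges = sorted(bin_edges)
        self.bitmaps = [0] * (len(self.edges) + 1)
        for row, v in enumerate(self.values):
            # Bin k holds values with edges[k-1] <= v < edges[k].
            self.bitmaps[bisect_right(self.edges, v)] |= 1 << row

    def greater_than(self, threshold):
        """Return a bitmap of rows whose value is > threshold."""
        b = bisect_right(self.edges, threshold)
        result = 0
        for bitmap in self.bitmaps[b + 1:]:
            result |= bitmap  # every value in these bins exceeds the threshold
        # The boundary bin may mix hits and misses, so check raw values there.
        for row in rows_of(self.bitmaps[b]):
            if self.values[row] > threshold:
                result |= 1 << row
        return result


def rows_of(bitmap):
    """List the row numbers whose bits are set."""
    out, row = [], 0
    while bitmap:
        if bitmap & 1:
            out.append(row)
        bitmap >>= 1
        row += 1
    return out


# Invented tag values; row i describes the event stored at locations[i].
tracks = [1500, 800, 1200, 2000, 950, 1001]
pt = [25.0, 30.0, 10.0, 21.5, 40.0, 20.0]
locations = [("fileA", 0), ("fileA", 1), ("fileA", 2),
             ("fileB", 0), ("fileB", 1), ("fileC", 0)]

tracks_index = BinnedBitmapIndex(tracks, [500, 1000, 1500])
pt_index = BinnedBitmapIndex(pt, [10, 20, 30])

# AND of the two range conditions is a bitwise AND of their bitmaps.
matching = tracks_index.greater_than(1000) & pt_index.greater_than(20)

hits = {}
for row in rows_of(matching):
    logical_file, object_id = locations[row]
    hits.setdefault(logical_file, []).append(object_id)
print(hits)
```

In this toy data set, fileC contains no matching event, so a File Scheduler working from `hits` would never need to fetch it.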

GC vs. STAR Scheduler
GC
– Select events with range conditions
– Read only selected events
– Automate all file and space management tasks
Scheduler
– Specify a list of files on disk
– Read all events of the files
– Use Data Carousel for HPSS files
Both can split large jobs to multiple machines

GC vs. STACS
GC
Use multiple File Catalogs and multiple Storage Systems
Integrate index building functions into the server
– Improves index building speed
Make use of distributed disk caches, clients can have their own caches
STACS
Limit to only one File Catalog and one Storage System
Use a separate Index Feeder to digest tag files
– Has very low data transfer rate through CORBA
Make use of one disk cache, clients must access the disk cache
Both select events with range conditions
Both automatically manage files and disks

This Year vs. Last Year
This Year
Process all files, including MuDST
Build indices fast
– Use automated file management functions
– Indexing 15 million events took one week
Interact with multiple File Catalogs
Last Year
Process event files, but not MuDST
Build indices slowly
– Index feeder requires manual file transfer
– Indexing 5 million events took 10 weeks
Interact with only one File Catalog
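The two indexing figures quoted on this slide imply a throughput ratio that is easy to work out. The short calculation below uses only the numbers from the slide; the variable names are chosen for this example.

```python
# Indexing throughput in events per week, from the figures on the slide.
this_year = 15_000_000 / 1   # 15 million events in one week
last_year = 5_000_000 / 10   # 5 million events in 10 weeks

speedup = this_year / last_year
print(f"{this_year:,.0f} vs {last_year:,.0f} events/week, ratio {speedup:.0f}x")
```

That is roughly a thirtyfold improvement in indexing rate, assuming the quoted durations are comparable wall-clock times.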

What Can Grid Collector Do For You
If you gather statistics on lots of events
– Grid Collector allows you to work with files not already on disk
If you search for rare events, Grid Collector allows you to
– Specify the events with ease
– Access only relevant files
– Read only selected events
If you want to try some analysis ideas outside of the main computer centers,
– Grid Collector manages files and space for you

How To Use The Grid Collector
Must use StIOMaker
– StIOMaker can now handle all files including MuDST
Replace StFile with StGridCollector
– StIOMaker requires a StFileI object
– One currently uses “new StFile(…)” to create a StFileI object
– Grid Collector provides a new way, “StGridCollector::Create(SELECT geant, event WHERE …)”
Iterate through events as usual

How To Use -- More Details
External dependencies
– Globus, ROOT, STAR Software
– Storage Resource Manager (DRM, HRM)
– ORBACUS
Servers
– Main Grid Collector Coordinator
– DRM/HRM
– File Catalogs
Client library
– Users need to load this in their macros

How To Select Events
SELECT [MuDst|event|…] WHERE NV0>100 AND …
The WHERE clause consists of range conditions joined with logical operators AND, OR, NOT.
All tags and a few File Catalog key words can be used in the WHERE clause
Variables with multiple values can be addressed with index, e.g., scaAnalysisMatrix[7]
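To make the query form concrete, here is a small sketch of how a string of this shape could be parsed and evaluated against one event's tags. It is not the Grid Collector parser: the grammar (NOT binds tighter than AND, which binds tighter than OR), the supported comparison operators, and the names `parse_query` and `parse_where` are assumptions made for illustration, and File Catalog key words are not handled.

```python
import operator
import re

TOKEN = re.compile(r"\s*(\d+(?:\.\d+)?|[A-Za-z_]\w*|<=|>=|<|>|[()\[\]])")
OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge}


def tokenize(text):
    """Split a WHERE clause into names, numbers, operators and brackets."""
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise ValueError(f"unexpected input at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def parse_where(text):
    """Compile a WHERE clause into a function mapping an event's tags to a bool."""
    tokens, pos = tokenize(text), 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of query")
        pos += 1
        return tokens[pos - 1]

    def parse_or():
        left = parse_and()
        while peek() == "OR":
            take()
            left = (lambda a, b: lambda ev: a(ev) or b(ev))(left, parse_and())
        return left

    def parse_and():
        left = parse_not()
        while peek() == "AND":
            take()
            left = (lambda a, b: lambda ev: a(ev) and b(ev))(left, parse_not())
        return left

    def parse_not():
        if peek() == "NOT":
            take()
            inner = parse_not()
            return lambda ev: not inner(ev)
        if peek() == "(":
            take()
            inner = parse_or()
            if take() != ")":
                raise ValueError("expected ')'")
            return inner
        return parse_condition()

    def parse_condition():
        name, index = take(), None
        if peek() == "[":  # indexed variable such as scaAnalysisMatrix[7]
            take()
            index = int(take())
            if take() != "]":
                raise ValueError("expected ']'")
        compare, bound = OPS[take()], float(take())

        def test(ev):
            value = ev[name] if index is None else ev[name][index]
            return compare(value, bound)
        return test

    predicate = parse_or()
    if peek() is not None:
        raise ValueError(f"unexpected token {peek()!r}")
    return predicate


def parse_query(text):
    """Split 'SELECT a, b WHERE ...' into component names and a predicate."""
    m = re.fullmatch(r"\s*SELECT\s+(.+?)\s+WHERE\s+(.+)", text, re.S)
    if not m:
        raise ValueError("expected SELECT ... WHERE ...")
    return [c.strip() for c in m.group(1).split(",")], parse_where(m.group(2))


components, keep = parse_query(
    "SELECT MuDst WHERE NV0>100 AND NOT scaAnalysisMatrix[7] <= 0.5")
event_a = {"NV0": 150, "scaAnalysisMatrix": [0.0] * 7 + [0.9]}
event_b = {"NV0": 150, "scaAnalysisMatrix": [0.0] * 8}
print(components, keep(event_a), keep(event_b))
```

Here `event_a` passes both conditions while `event_b` fails the indexed one, so only `event_a` would be selected.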

Status Of Grid Collector
One version in production mode at BNL
An updated version in final testing stages
Brave early adopters still needed
Contact information: Wei-Ming Zhang, Jerome Lauret, John Wu