Grid Collector: Enabling File-Transparent Object Access For Analysis Wei-Ming Zhang Kent State University John Wu, Alex Sim, Junmin Gu and Arie Shoshani.

Presentation transcript:

Grid Collector: Enabling File-Transparent Object Access For Analysis
Wei-Ming Zhang, Kent State University
John Wu, Alex Sim, Junmin Gu and Arie Shoshani, Lawrence Berkeley National Lab
In collaboration with Jerome Lauret, Victor Perevoztchikov, Valeri Faine, Jeff Porter, Sasha Vanyashin, Brookhaven National Laboratory

June 2003, Grid Collector

Goals
Transparent object access
  – No need for analysts to manage files and disk space
  – No need for analysts to access remote mass storage systems
Select objects based on their attribute values
  – E.g., production=P03ia & numberOfPrimaryTracks>200
Improve analysis system's throughput by
  – Eliminating the need to read all objects in a file
  – Providing optimized disk space management and automatic garbage collection
  – Automating the retrieval of files from remote storage systems
Interactive analysis of data distributed on the Grid
  – Providing quick partial answers
  – Enabling users to transparently share files in disk caches

Previous Work
Storage Resource Access Coordination System (STACS) and the GCA client software for STAR
Strength
  – Transparent event access for one storage site
  – Efficient evaluation of selection conditions through bitmap index
  – Interactive estimation of the selection size
Weakness
  – GCA client software was designed for Objectivity data
      – Need to access ROOT files now
  – STACS only accesses one HPSS
      – STAR data is to be distributed on the Grid
[Diagram. Components: Query Estimator (QE), Query Monitor (QM), Cache Manager (CM), Caching Policy Module, File Catalog (FC), Bitmap index, Disk Cache, User's Application. Labeled flows: query estimation / execution requests; file caching request; file caching; file purging; open, read, close.]

Grid Collector
Grid Collector is a collection of modules that include the functionalities of STACS and the GCA client
New features
  – Integrate with STAR analysis framework to extract events from ROOT files
  – Use Storage Resource Manager for disk (DRM) and HPSS (HRM)
      – Grid enabled, capable of accessing multiple sites
  – More efficient implementation of bitmap index
[Diagram. Components: Analysis, Event Iterator, Logical Request, Bitmap Index, File Scheduler, File Catalog, DRM with Disk Cache, and an HRM with Disk Cache at each of BNL and LBNL.]

The Building Blocks
Bitmap Index
  – Indexes each event
  – Efficient for partial range queries
Storage Resource Manager
  – Manages disk cache
  – Automatic retrieval of needed files from the Grid
File Scheduler
  – Coordinates file accesses
File Catalog
  – Provides location information about files
Index Feeder
  – Digests ROOT files to extract information about events (tags)
Event Iterator
  – Feeds events to analysis code in a stream
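To make the bitmap index idea concrete, here is a minimal sketch of how one bitmap per attribute value can answer a selection such as "production=P03ia & numberOfPrimaryTracks>=200". It is not the Grid Collector implementation: the class `BitmapIndex`, the helper `rows`, the production name "P02gd", and the tag values are all hypothetical, and Python integers stand in for the compressed bitmaps a real index would use.

```python
from collections import defaultdict


class BitmapIndex:
    """Toy equality-encoded bitmap index (hypothetical, for illustration).

    One bitmap per distinct value; bit i is set when event i has that value.
    Python ints are used as arbitrarily long bit strings.
    """

    def __init__(self, values):
        self.bitmaps = defaultdict(int)
        for row, value in enumerate(values):
            self.bitmaps[value] |= 1 << row

    def equals(self, value):
        # .get() avoids creating an empty entry for unknown values.
        return self.bitmaps.get(value, 0)

    def at_least(self, low):
        # A range condition is the OR of the bitmaps of all qualifying values.
        result = 0
        for value, bitmap in self.bitmaps.items():
            if value >= low:
                result |= bitmap
        return result


def rows(bitmap):
    """Return the event numbers whose bits are set, in increasing order."""
    selected = []
    row = 0
    while bitmap:
        if bitmap & 1:
            selected.append(row)
        bitmap >>= 1
        row += 1
    return selected


# Hypothetical tags for five events.
production = ["P03ia", "P02gd", "P03ia", "P03ia", "P02gd"]
tracks = [250, 300, 120, 200, 90]

prod_index = BitmapIndex(production)
track_index = BitmapIndex(tracks)

# Conditions on different attributes are combined with a bitwise AND.
selected = prod_index.equals("P03ia") & track_index.at_least(200)
print(rows(selected))  # [0, 3]
```

Each attribute is indexed independently, so a query that constrains only some of the attributes touches only those attributes' bitmaps.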

Using Grid Collector
Existing practice
  – Specify a list of files or directories containing the desired events
  – Analyze all events in the files
      – Reading more events than needed
  – Files have to be on disk before analysis
      – User has to manage the files and space
      – All files have to be present at the same time
Using Grid Collector
  – Specify the conditions characterizing the desired events, such as "production=P03ia & numberOfPrimaryTracks>=200"
  – Analyze only events satisfying the conditions
      – By reading only the events selected using the bitmap index
  – Files are retrieved and managed by the Grid Collector
      – User does not have to know about the files
      – Files are retrieved in a stream, reducing the disk space required
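The disk-space benefit of retrieving files in a stream can be illustrated with a small generator. This is only a sketch under stated assumptions: `event_stream`, `fetch`, `release`, and the file names are hypothetical stand-ins, and the real file scheduler and Storage Resource Managers may keep several files cached at once.

```python
from itertools import groupby


def event_stream(selected, fetch, release):
    """Yield (file_name, event_id) pairs while holding one file at a time.

    `selected` must be sorted by file name so each file is fetched once.
    `fetch` and `release` are hypothetical callbacks standing in for the
    disk-cache requests a storage manager would handle.
    """
    for file_name, group in groupby(selected, key=lambda pair: pair[0]):
        fetch(file_name)
        try:
            for _, event_id in group:
                yield file_name, event_id
        finally:
            # Free the cache slot as soon as the file's events are consumed.
            release(file_name)


on_disk = set()
stats = {"peak": 0}


def fetch(name):
    on_disk.add(name)
    stats["peak"] = max(stats["peak"], len(on_disk))


def release(name):
    on_disk.discard(name)


# Hypothetical result of an index lookup: selected events grouped by file.
selected = [("a.root", 3), ("a.root", 17), ("b.root", 5), ("c.root", 1), ("c.root", 2)]

events = list(event_stream(selected, fetch, release))
print(stats["peak"])  # 1
```

The analysis code only sees a sequence of events; which files exist, and when they are on disk, stays hidden behind the iterator.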

Detailed Use Case
Using a sample analysis script called doEvents.C
Analyzing the first 100 events from production P03ia with 200 or more primary tracks
  – .x doEvents.C(100, "select production=P03ia & numberOfPrimaryTracks>=200")
To analyze all events, set the first argument to a negative integer
To try different conditions without analyzing them, a separate command is available
Creating your own script to use the Grid Collector
  – Load StGridCollector library
  – Create an object of type StGridCollector
  – Initialize the object with a select statement
  – Pass the object to StIOMaker just like a StFile object; the rest of the code is exactly the same as when using StFile
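The call semantics described above (an event limit, where a negative value means all events, plus a select statement) can be mimicked with a toy model. Everything here is an assumption made for illustration: the slides do not specify the select grammar, so `parse_select` handles only `attribute operator value` conditions joined by `&`, and `do_events` is a Python stand-in, not the doEvents.C macro or the StGridCollector API.

```python
import operator
import re
from itertools import islice

_OPS = {">=": operator.ge, "<=": operator.le, ">": operator.gt,
        "<": operator.lt, "=": operator.eq}
# Two-character operators come first in the alternation so that ">=" is
# not read as ">" followed by "=".
_CONDITION = re.compile(r"\s*(\w+)\s*(>=|<=|>|<|=)\s*(\w+)\s*$")


def parse_select(statement):
    """Split 'select a=x & b>=1' into (attribute, comparison, value) triples.

    Assumed grammar, not the real one.
    """
    body = statement.strip()
    if body.startswith("select "):
        body = body[len("select "):]
    conditions = []
    for part in body.split("&"):
        match = _CONDITION.match(part)
        if match is None:
            raise ValueError(f"cannot parse condition: {part!r}")
        name, op, raw = match.groups()
        value = int(raw) if raw.isdigit() else raw
        conditions.append((name, _OPS[op], value))
    return conditions


def do_events(n_events, statement, tags):
    """Return up to n_events matching tag records; negative means all."""
    conditions = parse_select(statement)
    matching = (t for t in tags
                if all(op(t[name], value) for name, op, value in conditions))
    if n_events < 0:
        return list(matching)
    return list(islice(matching, n_events))


# Hypothetical event tags.
tags = [
    {"production": "P03ia", "numberOfPrimaryTracks": 250},
    {"production": "P02gd", "numberOfPrimaryTracks": 300},
    {"production": "P03ia", "numberOfPrimaryTracks": 120},
    {"production": "P03ia", "numberOfPrimaryTracks": 200},
]
statement = "select production=P03ia & numberOfPrimaryTracks>=200"

first = do_events(1, statement, tags)
everything = do_events(-1, statement, tags)
print(len(first), len(everything))  # 1 2
```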

Status and Future Plans
Current state
  – Grid Collector is ready to be used
  – Currently (June, 2003), we are populating the bitmap index for a STAR user (John Amonett, Kent State University) to do flow analyses
Future plans
  – Speed up the index building process
  – Enable parallel and distributed analyses for large jobs
  – Provide capability for users to analyze events in a specified order
  – Make it into a Grid-enabled service
  – Collaborate with other experiments (?)
Contact information
  – John Wu
  – Wei-Ming Zhang
  – Jerome Lauret