1 CMS Virtual Data Overview
Koen Holtman, Caltech/CMS
GriPhyN all-hands meeting, Marina del Rey, April 9, 2001

2 Introduction
o CMS:
  o Like ATLAS
o Now: large-scale simulations of CMS physics
  o Order of 10 sites, 1,000 CPUs, 10 TB
  o Order of 10 production managers running simulations

3 Current CMS data situation
o CMS currently has seamless object-level navigation in the local area (ORCA/CARF)
  o Supports private views, deep and shallow copies
  o Uses Objectivity/DB capabilities that do not scale to the wide area
o Therefore we need the Grid to act as wide-area glue
o How?? → part of the research problem
[Diagram: Objectivity federation layout with UF.boot, MyFED.boot and a user collection; source: ORCA 4 tutorial, part II, October 2000]

4 CMS requirements status
o With respect to all the grid projects (GriPhyN, EU DataGrid, PPDG, ...)
o Requirements:
  o CMS internal milestones are already being defined to rely on Grid project deliverables
o Long-term requirements picture for the 2006 CMS Grid system:
  o CMS consensus emerging, documents nearing completion
  o (We reserve the right to change our mind later)
o Shorter-term needs:
  o Nothing much documented. Driving factors are:
    o The need for more automation as the scale of our computing grows -- production managers are already overloaded
    o The need to support data access by many more physicists
    → The need to grow de facto standard solutions for 2006
o But we can't decide on the technology roadmap on our own!

5 CMS grid requirements
o Specification strategy: describe a baseline for the 2006 production data grid system operated by CMS
o Emphasize interfaces between boxes 1 and 2
  → Specify size and placement of hardware, data volumes, ...
  o Specify numeric characteristics for interface calls
o From this it follows what the Grid projects should build/research
[Diagram: component classification by producer -- 1: CMS non-Grid software; 2: GriPhyN, EU DataGrid, PPDG + CMS; 3: GGF, Globus, ...; 4: Internet, OS, ...]

6 Data Model (simplified) 1
o Data processing chain, but under continuous development
o Some data products are uploaded
[Diagram: processing chain of uploaded data products, each labelled with a UID]

7 Data Model (simplified) 2
o Algorithms A and B are registered; these define new virtual data products
[Diagram: registered algorithms A and B with the virtual data products (UIDs) they define]

8 Data Model (simplified) 3
o Job X can now be run to analyze some virtual data products

9 Data Model (simplified) 4
o On the basis of the job X output, a new algorithm G is defined and registered; this creates new virtual data products

10 Data Model (simplified) 5
o Data analysis is a cyclic effort of defining and analyzing products
o Uploaded and virtual data products are read-only!
o Virtual data products are small (1 KB - 1 MB, << files)
o >> 10^15 defined, ~10^10 requested in a year
o …,000 jobs per day
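
The bookkeeping implied by slides 6-10 can be pictured with a small catalog: uploaded products get a UID, registering an algorithm defines new virtual products without computing anything, and every product is read-only once defined. The sketch below is only an illustration of that model; all class, method, and UID names are invented, not CMS or GriPhyN interfaces.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)              # frozen: products are read-only once defined
class ProductDef:
    uid: str                         # unique product identifier
    algorithm: Optional[str] = None  # None for uploaded products, else the defining algorithm
    inputs: Tuple[str, ...] = ()     # UIDs the defining algorithm reads

class VirtualDataCatalog:
    """Illustrative catalog of uploaded and virtual product definitions (not CMS code)."""
    def __init__(self):
        self.products = {}           # uid -> ProductDef
        self.algorithms = {}         # name -> (input uids, output uids)

    def upload(self, uid):
        self.products[uid] = ProductDef(uid)

    def register_algorithm(self, name, input_uids, output_uids):
        # Registering an algorithm *defines* its outputs as virtual products;
        # nothing is computed yet -- materialization only happens when a job asks.
        self.algorithms[name] = (tuple(input_uids), tuple(output_uids))
        for out in output_uids:
            self.products[out] = ProductDef(out, algorithm=name, inputs=tuple(input_uids))

# The walkthrough of slides 6-9, with made-up UIDs:
cat = VirtualDataCatalog()
cat.upload("raw/evt000000001")                                         # slide 6: uploaded product
cat.register_algorithm("A", ["raw/evt000000001"], ["A/evt000000001"])  # slide 7
cat.register_algorithm("B", ["A/evt000000001"], ["B/evt000000001"])    # slide 7
cat.register_algorithm("G", ["B/evt000000001"], ["G/evt000000001"])    # slide 9
print(cat.products["G/evt000000001"])
```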

11 Job Model (simplified)
o Jobs are the way by which physicists get work done
o Job = data product request set (by product name) + code
o The Grid needs to execute the code and feed the product values to it
o The code is a parallel, distributed program
o Job output goes outside the grid
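
One way to read this job model concretely: the grid receives the request set and the code, materializes any requested virtual product by running its defining algorithm (recursing back through the chain as needed), and then hands the product values to the code. The sketch below assumes a toy in-memory catalog and store; the names and the recursive resolution strategy are illustrative, not the actual CMS/GriPhyN design.

```python
# Illustrative only: 'catalog' maps a product UID to (algorithm name, input UIDs),
# with algorithm None for uploaded products; 'executors' maps algorithm names to
# plain callables standing in for the registered CMS code.

def run_job(catalog, executors, requested_uids, job_code, store):
    """Resolve each requested product, then feed the values to the job's code."""
    def materialize(uid):
        if uid in store:                               # already materialized on the grid
            return store[uid]
        algorithm, inputs = catalog[uid]
        if algorithm is None:
            raise KeyError(f"uploaded product {uid} is missing from storage")
        values = [materialize(i) for i in inputs]      # walk back through the chain
        store[uid] = executors[algorithm](uid, values)
        return store[uid]

    return job_code({uid: materialize(uid) for uid in requested_uids})

# Toy usage:
catalog = {
    "raw/evt001": (None, []),           # uploaded
    "A/evt001": ("A", ["raw/evt001"]),  # virtual, defined by registered algorithm A
    "G/evt001": ("G", ["A/evt001"]),    # virtual, defined by registered algorithm G
}
executors = {n: (lambda uid, ins, n=n: f"{n}({uid})") for n in ("A", "G")}
store = {"raw/evt001": "raw event data"}
print(run_job(catalog, executors, ["G/evt001"], lambda v: sorted(v), store))
```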

12 Main research challenges
o Main challenges in supporting the long-term 2006 CMS needs:
  o Scalability (baseline is 31 tier 0-2 sites with 10,000 CPUs)
  o Fault tolerance (we do not have much operator manpower!)
  o Supporting the granularity of our data model
    o >> 10^15 products defined, 10^10 requested in a year
    o A fixed, pre-determined mapping of 10^15 virtual products to 10^7(?) virtual files will not work!
    o Materialization and access requests will be too sparse with any conceivable fixed product-file mapping
    o So a flexible mapping from products to files is needed (SC2000 demo)
    o Every one of the >> 10^15 products needs to have an independent materialization status
o Overall research challenge: identify common grid components between experiments
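
The flexible product-to-file mapping argued for here amounts to an indirection layer: each product keeps its own materialization status, and a product is assigned to a file only when it is actually materialized, rather than by a fixed formula. The following sketch shows the idea with an intentionally trivial packing policy; the class and file names are invented, and this is not the SC2000 demo code.

```python
class ProductFileMap:
    """Sketch: pack materialized products into files chosen at materialization time."""
    def __init__(self, max_products_per_file=1000):
        self.location = {}        # product uid -> (file name, slot); absent = still virtual
        self.open_file = None
        self.open_count = 0
        self.max = max_products_per_file
        self.file_seq = 0

    def is_materialized(self, uid):
        return uid in self.location   # every product has independent status

    def record_materialization(self, uid):
        # Packing policy is deliberately trivial: append to the current file and
        # start a new one when it is "full". A real system would pack products
        # by expected co-access instead.
        if self.open_file is None or self.open_count >= self.max:
            self.file_seq += 1
            self.open_file, self.open_count = f"vdfile_{self.file_seq:07d}", 0
        self.location[uid] = (self.open_file, self.open_count)
        self.open_count += 1
        return self.location[uid]

# Toy usage:
pfm = ProductFileMap(max_products_per_file=2)
for uid in ("p1", "p2", "p3"):
    print(uid, "->", pfm.record_materialization(uid))
print("p4 materialized?", pfm.is_materialized("p4"))
```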

13 Shorter-term ideas 1
o To support shorter-term CMS needs and build towards 2006
o To automate CMS production more: GriPhyN involvement in building the top-level layers of the CMS production system in terms of virtual data
  o Production use in 1-2 years at a few US CMS sites
  o Virtual data product = set of ~20 files with total size ~1-10 GB
  o Production = requests to materialize these products and store them in an archive
[Diagram: several parallel CMSIM / ORCA Hits / ORCA Digis production chains]
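
At the scripting level, "production in terms of virtual data" could look roughly like the sketch below: a request names one production data product, and materializing it means running the CMSIM/ORCA chain for that dataset and archiving the resulting files. The commands, paths, and step names here are placeholders; the real production tools and invocations differ.

```python
import pathlib
import shutil
import subprocess

# Hypothetical chain for one production data product (a set of ~20 files, ~1-10 GB).
# The commands below are placeholders, not the real CMSIM/ORCA invocations.
CHAIN = [
    ("cmsim",      ["echo", "run CMSIM for dataset"]),
    ("orca_hits",  ["echo", "run ORCA Hits for dataset"]),
    ("orca_digis", ["echo", "run ORCA Digis for dataset"]),
]

def materialize_production_product(dataset, workdir, archive_dir):
    """Run the (placeholder) chain for one dataset and archive its output files."""
    work = pathlib.Path(workdir) / dataset
    work.mkdir(parents=True, exist_ok=True)
    for step, cmd in CHAIN:
        subprocess.run(cmd + [dataset], check=True)      # fail loudly: few operators to spare
        (work / f"{dataset}.{step}.out").write_text(f"{step} output for {dataset}\n")
    archive = pathlib.Path(archive_dir)
    archive.mkdir(parents=True, exist_ok=True)
    for f in work.iterdir():
        shutil.copy2(f, archive / f.name)                # 'archive' stands in for the long-term store
    return sorted(p.name for p in archive.iterdir())

print(materialize_production_product("mu_sample_01", "/tmp/vd_work", "/tmp/vd_archive"))
```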

14 Shorter-term ideas 2
o To support data access by many more physicists: GriPhyN involvement in building a fine-granularity data replication tool in terms of the CMS data model
  o 2-D data model (events x event-related objects)
  o Physicists select some objects for some event set and have these replicated to a more local installation for further interactive analysis
o This activity could drive the R&D on developing fine-granularity virtual data services (>> 10^15 virtual products!)
o We believe that this 2-D data model would also fit ATLAS, SDSS, and maybe even LIGO data
o SC2000 demo... We would like to collaborate with others in developing this as a production service
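
A replication request in this 2-D data model is essentially a rectangle: a set of event IDs crossed with a set of object types. The sketch below shows how such a selection could be expanded into a per-product transfer list, with products that are still virtual routed to materialization first. The naming scheme, host names, and function signature are all assumptions, not an existing tool's interface.

```python
from itertools import product as cartesian

def select_for_replication(event_ids, object_types, is_materialized, source, destination):
    """Build a transfer list for the (events x object types) rectangle.

    Products that are not yet materialized are returned separately so they can
    be requested from the virtual data service instead of simply copied.
    """
    to_copy, to_materialize = [], []
    for evt, obj in cartesian(event_ids, object_types):
        uid = f"{obj}/evt{evt:09d}"                     # illustrative naming scheme
        entry = {"product": uid, "from": source, "to": destination}
        (to_copy if is_materialized(uid) else to_materialize).append(entry)
    return to_copy, to_materialize

# Toy usage: a physicist wants tracks and jets for a small event set, replicated locally.
copy_list, mat_list = select_for_replication(
    event_ids=range(1, 4),
    object_types=["tracks", "jets"],
    is_materialized=lambda uid: uid.startswith("tracks"),  # pretend only tracks exist so far
    source="tier1.example.org",
    destination="local-cluster.example.edu",
)
print(len(copy_list), "to copy;", len(mat_list), "to materialize first")
```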

15 Conclusions
o Initial reaction to the architecture documents out of GriPhyN and EU DataGrid:
  o The need for compatibility and for dividing the work between the grid projects is obvious
  o Lots of work to be done in defining higher-level grid components that encapsulate common experiment needs
o The requirements process is ongoing
  → Current draft document 'CMS virtual data requirements' available from …
o Need to decide what the first prototype will do