Summary of the Persistency RTAG Report
(heavily based on David Malon's slides) - and some personal remarks
Dirk Düllmann, CCS Meeting, April 18th 2002

SC2 mandate to the RTAG
- Write the product specification for the Persistency Framework for Physics Applications at LHC.
- Construct a component breakdown for the management of all types of LHC data.
- Identify the responsibilities of experiment frameworks, existing products (such as ROOT) and yet-to-be-developed products.
- Develop requirements/use cases to specify (at least) the metadata/navigation component(s).
- Estimate the resources (manpower) needed to prototype missing components.

Guidance from the SC2
- The RTAG may decide to address all types of data, or may postpone some topics to other RTAGs once the components have been identified.
- The RTAG should develop a detailed description at least for event data management.
- Issues of schema evolution, dictionary construction and storage, and object and data models should be addressed.

RTAG Composition
- One member from each experiment, one from IT/DB, one from the ROOT team:
  - Fons Rademakers (ALICE)
  - David Malon (ATLAS)
  - Vincenzo Innocente (CMS)
  - Pere Mato (LHCb)
  - Dirk Düllmann (IT/DB)
  - Rene Brun (ROOT)
- Quoting Vincenzo's report at CMS Week (6 March 2002):
  - "Collaborative, friendly atmosphere"
  - "Real effort to define a common product"
- This is already an accomplishment.

Response of RTAG to mandate and guidance (excerpted from report)
- The intent of this RTAG is to assume an optimistic posture regarding the potential for commonality among the LHC experiments in all areas related to data management.
- The limited time available to the RTAG precludes treating all components of a data management architecture at equal depth.
  - It will propose areas in which further work, and perhaps additional RTAGs, will be needed.

Response of RTAG to mandate and guidance (excerpted from report)
- Consonant with SC2 guidance, the RTAG has chosen to focus its initial discussions on the architecture of a persistence management service based upon a common streaming layer, and on the associated services needed to support it.
  - Even if we cannot accomplish everything we aspire to, we want to ensure that we have provided a solid foundation for a near-term common project.
- While our aim is to define components and their interactions in terms of abstract interfaces that any implementation must respect, it is not our intention to produce a design that requires a clean-slate implementation.

Response of RTAG to mandate and guidance (excerpted from report)
- For the streaming layer and related services, we plan to provide a foundation for an initial common project that can be based upon the capabilities of existing implementations, and upon ROOT's I/O capabilities in particular.
- While the new capabilities required of an initial implementation should not be daunting, we do not wish at this point to underestimate the amount of repackaging and refactoring work required to support common project requirements.

Status
- Reasonable agreement on design criteria, e.g.:
  - Component oriented: communication through abstract interfaces, no back channels; components make no assumptions about the implementation technology of the components with which they communicate.
  - Persistence for C++ data models is the principal target, but our environments are already multilingual; we should avoid constructions that make language migration and multi-language support difficult.
  - The architecture should not preclude multiple persistence technologies.
  - Experiments' transient data models should not need compile-time/link-time dependencies on a persistence technology in order to use persistence services.

Status II
- Reasonable agreement on design criteria, e.g.:
  - Transient object types may have several persistent representations; the type of a transient object restored from a persistent one may differ from the type of the object that was saved; a persistent object cannot assume it "knows" what type of transient object will be built from it.
  - ...more...
- Component and requirement discussions have been uneven: extremely detailed and highly technical in some areas, with other areas neglected thus far for lack of time.
- The primary focus has been on the issues and components involved in defining a common persistence service:
  - cache manager, persistence manager, storage manager, streamer service, placement service, dictionary service(s), ...
  - object identification, navigation, ...

The proposed initial project
- The charge to the initial project is to deliver the components of a common file-based streaming layer sufficient to support persistence for all four experiments' event models, with management of the resulting files hosted in a relational layer.
  - An elaboration of what this means appears in the report.
  - Note that the persistence service is intended to support all kinds of data; it is not specific to event data.

Initial project components
- The specification in the report describes agreed-upon common project components, including:
  - persistence manager
  - placement service
  - streaming service
  - storage manager
  - externalizable, technology-independent references
  - services to support event collections
  - connections to grid-provided replica management and LFN/PFN services

Initial project implementation technologies
- The streaming layer should be implemented using the ROOT framework's I/O services.
- Components with relational implementations should make no deep assumptions about the underlying technology.
  - Nothing is intentionally proposed that precludes implementation using open-source products such as MySQL.
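For concreteness, a minimal sketch of the kind of ROOT I/O the common streaming layer would build on. The file name, tree name and branch are illustrative examples only, not part of the RTAG specification.

```cpp
// Minimal ROOT I/O sketch: write a simple value per "event" into a tree.
// "events.root", "Events" and the branch layout are example names.
#include "TFile.h"
#include "TTree.h"

void writeExample() {
  TFile* file = TFile::Open("events.root", "RECREATE");   // file-level storage
  TTree tree("Events", "example event stream");           // clustering unit
  double energy = 0.;
  tree.Branch("energy", &energy, "energy/D");              // ROOT handles the streaming
  for (int i = 0; i < 100; ++i) {
    energy = 0.1 * i;
    tree.Fill();
  }
  tree.Write();                                            // written into the open file
  file->Close();
}
```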

RTAG’s First Component Diagram (under discussion)

Persistence Manager
- The principal point of contact between experiment-specific frameworks and persistence services.
- Handles requests to store the state of an object, returning a token (e.g., a persistent address).
- Handles requests to retrieve the state of the object corresponding to a token.
- Similar to the Gaudi/Athena persistence service.
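A minimal C++ sketch of this contract. The names (IPersistencyMgr, Token, store, retrieve) are assumptions for illustration; the RTAG report does not fix concrete signatures.

```cpp
// Hypothetical persistence manager interface: store an object's state for a token,
// retrieve the state corresponding to a token. No ROOT or RDBMS types appear here,
// so the experiment framework takes no compile-time dependency on the technology.
#include <string>
#include <typeinfo>

struct Token {                         // externalizable persistent address (see Ref slide)
  std::string externalForm;
};

class IPersistencyMgr {
public:
  virtual ~IPersistencyMgr() = default;
  // Store the state of a transient object and return a token for it.
  virtual Token store(const void* transientObject, const std::type_info& type) = 0;
  // Retrieve (re-create) the object corresponding to a token; caller owns the result.
  virtual void* retrieve(const Token& token) = 0;
};
```

An implementation would delegate to the dictionary, streaming and storage services described on the following slides.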

Dictionary services
- Manage descriptions of transient (persistence-capable) classes and their persistent representations.
- Provide an interface to obtain reflection information about classes.
- Entries in the dictionary may be generated from disparate sources:
  - rootcint, ADL, LHCb XML, ...
- With automatic converter/streamer generation, it is likely that some persistent representations will be derivable from the transient representation, but the possibility of multiple persistent representations suggests separate dictionaries.
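A sketch, under assumed names, of what such a dictionary lookup could provide to a generic streamer; the actual LCG dictionary interface was still to be designed.

```cpp
// Illustrative reflection/dictionary lookup: enough class shape information to
// drive a generic converter. All names are assumptions, not the RTAG's API.
#include <cstddef>
#include <string>
#include <vector>

struct FieldDescription {
  std::string name;
  std::string typeName;
  std::size_t offset;          // byte offset inside the transient object
};

struct ClassDescription {
  std::string name;
  std::vector<FieldDescription> fields;
};

class IDictionarySvc {
public:
  virtual ~IDictionarySvc() = default;
  // Entries may have been generated by rootcint, ADL, LHCb XML, ...
  virtual const ClassDescription* describe(const std::string& className) const = 0;
};
```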

Streaming or conversion service
- Consults the transient and persistent representation dictionaries to produce a persistent (or "persistence-ready") representation of a transient object, or vice versa.
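An illustrative interface for this conversion step. ByteStream and the method names are assumptions, and ClassDescription refers to the dictionary sketch above.

```cpp
// Hypothetical streamer/converter interface: dictionary-driven transformation
// between a transient object and its persistence-ready byte representation.
#include <vector>

struct ClassDescription;                          // as sketched for the dictionary service
using ByteStream = std::vector<unsigned char>;    // "persistence-ready" form

class IStreamerSvc {
public:
  virtual ~IStreamerSvc() = default;
  // Serialize a transient object according to its described shape.
  virtual ByteStream toPersistent(const void* object,
                                  const ClassDescription& transientShape) = 0;
  // Inverse direction: build a transient object from its persistent form.
  virtual void* toTransient(const ByteStream& data,
                            const ClassDescription& persistentShape) = 0;
};
```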

Placement service
- Supports runtime control of physical placement, equivalent to "where" hints in ODMG.
- Intended to support physical clustering within an event, and to separate events written to different physics streams.
- The interface seen by the experiments' application frameworks is independent of the persistence technology.
- Insufficiently specified in the RTAG report.
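Since this service is acknowledged to be underspecified, the following is only one possible shape for a technology-neutral "where" hint; all names and attributes are assumptions.

```cpp
// Illustrative placement hint in the spirit of ODMG "where" hints.
#include <string>

struct Placement {
  std::string database;     // e.g. logical file or database name
  std::string container;    // clustering unit within the file (e.g. a tree or table)
  // Deliberately technology-neutral: no ROOT or RDBMS types are exposed here.
};

class IPlacementSvc {
public:
  virtual ~IPlacementSvc() = default;
  // Decide where the next object of a given type, for a given physics stream, should go.
  virtual Placement placementFor(const std::string& className,
                                 const std::string& physicsStream) = 0;
};
```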

References and reference services
- A common class for encapsulating persistent addresses.
- Externalizable.
- Independent of the storage technology, likely with an opaque payload that is technology-specific.
- In the hybrid model, the intent is that from a Ref one can determine what "file" is needed without consulting the particular storage technology.
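A hypothetical sketch of such a Ref: externalizable, carrying a file identifier plus an opaque technology-specific payload. The string encoding is purely illustrative.

```cpp
// Illustrative technology-neutral reference: a file id usable for catalogue lookups
// plus an opaque in-file locator interpreted only by the owning storage technology.
#include <string>

class Ref {
public:
  std::string fileId() const   { return m_fileId; }                 // input to Ref -> LFN
  std::string toString() const { return m_fileId + "|" + m_opaque; } // externalized form
  static Ref fromString(const std::string& s) {
    Ref r;
    const auto pos = s.find('|');
    r.m_fileId = s.substr(0, pos);
    r.m_opaque = (pos == std::string::npos) ? std::string() : s.substr(pos + 1);
    return r;
  }
private:
  std::string m_fileId;   // identifies the containing "file"
  std::string m_opaque;   // technology-specific payload, not interpreted here
};
```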

Store manager
- Stores and retrieves variable-length streams of bytes.
- Deals with issues at the file level.
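A minimal sketch of a byte-stream-level storage interface; the names and the locator type are assumptions.

```cpp
// Hypothetical storage manager contract: variable-length byte streams addressed
// within files. A ROOT-based implementation would map this onto keys/branches.
#include <cstdint>
#include <string>
#include <vector>

class IStorageMgr {
public:
  virtual ~IStorageMgr() = default;
  // Append a byte stream to the named file, returning an in-file locator.
  virtual std::uint64_t write(const std::string& fileName,
                              const std::vector<unsigned char>& bytes) = 0;
  // Read back the byte stream stored at a given locator.
  virtual std::vector<unsigned char> read(const std::string& fileName,
                                          std::uint64_t locator) = 0;
};
```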

Ref → LFN service
- Translates an object Reference into the logical filename of the file containing the referenced objects.
- It is expected that a Ref will carry some kind of file id that can be used to determine the logical file name in the grid sense.
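An illustrative interface for this lookup, assuming the Ref exposes its file identifier as sketched above.

```cpp
// Hypothetical Ref -> LFN lookup: map the file id carried inside a Ref to a
// grid logical file name. The name of the service is an assumption.
#include <string>

class IRefToLFNSvc {
public:
  virtual ~IRefToLFNSvc() = default;
  virtual std::string logicalFileName(const std::string& fileId) const = 0;
};
```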

LFN → {PFN →} POSIX name service
- Translates a logical filename into the POSIX name of one physical replica of the file.
- We expect to get this from the grid projects, though the common project may need to deliver a service that hides several possible paths behind a single interface.
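A sketch of the catalogue interface the common project might wrap around grid-provided services; the name IFileCatalogSvc and both methods are assumptions.

```cpp
// Hypothetical file catalogue wrapper: resolve an LFN to its replicas, or directly
// to one POSIX path chosen among them.
#include <string>
#include <vector>

class IFileCatalogSvc {
public:
  virtual ~IFileCatalogSvc() = default;
  // All known physical replicas of a logical file.
  virtual std::vector<std::string> physicalFileNames(const std::string& lfn) const = 0;
  // One POSIX path selected among the replicas (e.g. the "closest" one).
  virtual std::string posixName(const std::string& lfn) const = 0;
};

// Chained usage (illustrative): posixName(refToLfnSvc.logicalFileName(ref.fileId()))
```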

Event collection services
- Support for explicit event collections (not just collections by containment).
  - Support for collections of event references.
- Queryable collections: like a list of events, together with queryable tags.
  - Possibly indexed.
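One possible shape for an explicit, queryable event collection; the tag attributes and the string-based query are illustrative only.

```cpp
// Illustrative queryable event collection: a list of externalized event references
// together with small, queryable tag attributes.
#include <string>
#include <vector>

struct EventTag {                 // example tag attributes, purely illustrative
  double missingEt = 0.;
  int    nJets     = 0;
};

struct EventEntry {
  std::string eventRef;           // externalized Ref to the event's data
  EventTag    tag;
};

class IEventCollection {
public:
  virtual ~IEventCollection() = default;
  virtual void add(const EventEntry& entry) = 0;
  // A query such as "missingEt > 50 && nJets >= 2" returns the matching refs;
  // an implementation may use an index rather than a full scan.
  virtual std::vector<std::string> select(const std::string& query) const = 0;
};
```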

One possible mapping to a ROOT implementation (under discussion)
[Component diagram: PersistencyMgr, CacheMgr/CustomCacheMgr, StorageMgr, StreamerSvc, DictionarySvc, PlacementSvc, IteratorSvc/SelectorSvc, FileCatalog and MetaData Catalog (interfaces IPers, ICache, ICnv, IReadWrite, IReflection/IPReflection, IPlacement, IFCatalog, IMCatalog), mapped onto ROOT classes such as TFile, TDirectory, TTree, TChain, TBuffer, TMessage, TKey, TRef, TClass, TStreamerInfo, TEventList, TDSet, TSocket and TGrid.]

Clarify Resources, Responsibilities & Risks
- Expected resources at project start, and their evolution
  - Commitments from the experiments, the ROOT team and IT
- Who does what (and until when)?
  - Who develops which software component(s)?
  - Who maintains those components afterwards?
  - Who develops production services around them?
- What is the procedure for dropping any of those services?
  - Any "in house" development involves a significant maintenance commitment by somebody, and a risk for somebody else.
  - Need to agree on these commitments.

Components & Interfaces
- Need to rapidly agree on concrete component interfaces.
  - Could we classify/prioritize interfaces by risk, e.g. by the damage caused by a later interface change?
  - At least the major (aka external) interfaces need the official blessing of the Architects' Forum and cannot be modified without its agreement.
- No component infrastructure has been defined so far:
  - component inheritance hierarchy, component factories, mapping of components to shared libraries, etc. (one possible shape is sketched below).
- The LCG approach needs to be "compatible" with:
  - several LCG sub-projects
  - several different experiment frameworks
  - several existing HEP packages, e.g. ROOT I/O
  - several RDBMS implementations
- We may need to assume some instability until a solid foundation is accepted for the LCG applications area.
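As a purely illustrative example of the missing component infrastructure, a string-keyed factory registry that would let implementations live in separately loaded shared libraries; nothing like this was fixed by the RTAG, and all names are assumptions.

```cpp
// Hypothetical component factory: implementations register themselves under a name,
// clients create them without compile-time knowledge of the concrete class.
#include <functional>
#include <map>
#include <memory>
#include <string>

class IComponent {
public:
  virtual ~IComponent() = default;
};

class ComponentFactory {
public:
  using Creator = std::function<std::unique_ptr<IComponent>()>;
  static ComponentFactory& instance() {
    static ComponentFactory f;
    return f;
  }
  void registerComponent(const std::string& name, Creator c) {
    m_creators[name] = std::move(c);
  }
  std::unique_ptr<IComponent> create(const std::string& name) const {
    auto it = m_creators.find(name);
    return it != m_creators.end() ? it->second() : nullptr;
  }
private:
  std::map<std::string, Creator> m_creators;   // name -> creation function
};
```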

Components & Interfaces (cont.)
- Fast adoption of existing "interface" classes may be tempting, but it is also very risky.
  - We should not simply bless existing header files that were conceived as implementation headers.
  - We should take the opportunity to (re-)design a minimal but complete set of abstract interfaces,
  - and then implement them using existing technology.

Timescales, Functionality & Technologies of an Initial Prototype
- The experiments have somewhat differing timescales for their first use of LCG components.
  - Synchronizing the initial release schedule would definitely improve the chances of success.
- The experiments may favor different subsets of the full functionality for a first prototype.
  - Need to agree on the main requirements for the prototype software and associated services, to guide implementation and technology choices.
  - Synchronization of feature content and implementation technology is required.
- Which RDBMS backend? What are the deployment requirements?
  - A "lightweight" system (end-user managed), perhaps with reduced requirements on scalability, fault tolerance and even functionality.
  - A fully managed production system, based on established database services (including backup, recovery from hardware faults, ...).
  - We may need prototype implementations for both.

Summary
- The Persistency RTAG has delivered its final report to the SC2.
  - Significant agreement on requirements and a component breakdown has been achieved.
  - The report does not define all components and their interactions in full or equal depth.
  - A common project on object persistency is proposed.
- Possible next steps:
  - Clarify available resources and responsibilities.
  - Agree on the scope, timescale and deployment model of the first project prototype.
  - Rapidly agree on a concrete set of component interfaces and spawn work packages to implement them.
  - Continue to resolve the remaining open questions.