POOL Collections and their Relational Incarnations, with Notes on Performance David M. Malon LCG Applications Area Meeting Geneva, Switzerland 24 September 2003

David Malon, ANL, LCG Applications Area Meeting
Slide 2: Background
- Helmut Schmucker gave a nice introduction to POOL collections, MultiCollections, and ROOT collection implementations at an LCG Applications Area meeting last month
  - http://agenda.cern.ch/fullAgenda.php?ida=a032131
- This talk complements Helmut’s, with a discussion
  - of collections philosophy,
  - of how collections fit into an overall computing model,
  - of the status and future of relational collection implementations,
  - …with a few performance observations

Slide 3: Credits and disclaimers
- Many people have contributed to collections work in some way at some level
  - Steve Eckmann*, Chris Lain*, Kristo Karr, Sasha Vaniachine
  - Helmut Schmucker, Kuba Moscicki, Ioannis Papadopoulos
  - Julius Hrivnac* (Java prototyping)
  - Others at lower effort levels (apologies for any conspicuous omissions)
  - *not currently active in POOL
- I have contributed no code: I am talking today about other people’s work
  - …and speaking for Kristo, who could not be here

Slide 4: Purpose
- Provide a persistent locus for references to saved events
  - If one can save an event, one should be able to extract a reference to it, externalize it, and save it somewhere
  - Otherwise, only iteration through events in a file (or file list) is possible
  - Collections are (one place) where these references go
- Support direct navigational access to events scattered across many POOL files
- Support selection of a subset of events via query predicates on optional associated attribute lists (event-level metadata)
- Prototyping: since the POOL hybrid event store has a relational layer, let’s try to use that layer for what relational technologies do well, e.g., SQL query processing, indexing, …
N.B.: while event collections are the motivation, the POOL infrastructure (of course) knows nothing about events

Slide 5: Collection statistics from event metadata… (JAS3/AIDA interface courtesy of J. Hrivnac)

Slide 6: Collections and the POOL mainstream
- Collections are not the first things one needs when one is building a persistence layer
- Some useful work can be done without them: most production being done in LHC experiments today involves generating specific physics samples, then simulating, digitizing, and reconstructing them
  - A “Read files, write files” model is perfectly appropriate
  - Not much need for collections by reference, for “tag” databases to support event selection, or for extraction of events scattered over many files
- This changes (in an ATLAS computing model, anyway) when one considers analysis of data coming from the detector
  - Part of the 2004 ATLAS Data Challenge exercise: testing and validation of a reasonable prototype of an ATLAS computing model, including Tier 0 (prompt) reconstruction

Slide 7: Some ways to query collections
- In increasing order of desirability?
  - User code-based selection
  - Query export
    - Possible server-side execution
  - Pre-selection
    - Before the job runs, and, ideally, before the job is scheduled

Slide 8: User code-based selection

    // User iterates over all events
    Collection ::Iterator iter = srcCollection.select("");
    while (iter.next()) {
        // user selects by examining attribute list
        if (mySelection(iter.attributeList())) {
            // …process event here
        }
    }

- User processing of the attribute list is likely not the model of choice
  - Perhaps not so different from navigating into the event itself, and one must ask what was gained by “exporting” the event data into event-level metadata
  - Maybe clever physical clustering of event “tag” objects, part of the event but stored separately from event bulk data, would have achieved the same thing
- Nonetheless there may be good reasons for this

Slide 9: Exporting the selection predicate

    // User exports the selection predicate
    Collection ::Iterator iter = srcCollection.select("query predicate goes here");
    // User iterates over all qualifying events
    while (iter.next()) {
        // …process event here
    }

- Many potential advantages:
  - Server-side processing
  - Take advantage of tools optimized for query processing
  - Indexing of attributes is possible, and may dramatically improve performance
  - Less data transferred to executing program
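The contrast between user code-based selection and an exported predicate can be sketched outside POOL. This is a minimal illustration only, not POOL code: Python's built-in sqlite3 module stands in for the MySQL server, and the table layout, the attribute names (n_muons, missing_et), and the cut values are all hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for the relational collection server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE BigReconSample (token_string TEXT, n_muons INTEGER, missing_et REAL)"
)
conn.executemany(
    "INSERT INTO BigReconSample VALUES (?, ?, ?)",
    [(f"token-{i}", i % 4, float(i % 50)) for i in range(1000)],
)


def my_selection(n_muons, missing_et):
    """Hypothetical user-side cut, applied to each attribute list the client receives."""
    return n_muons >= 2 and missing_et > 40.0


# Model 1: user code-based selection. Every row is shipped to the client.
all_rows = conn.execute(
    "SELECT token_string, n_muons, missing_et FROM BigReconSample"
).fetchall()
client_selected = [tok for tok, n_mu, met in all_rows if my_selection(n_mu, met)]

# Model 2: exported predicate. Only qualifying rows leave the server.
predicate = "n_muons >= 2 AND missing_et > 40.0"
server_selected = [
    tok
    for (tok,) in conn.execute(
        f"SELECT token_string FROM BigReconSample WHERE {predicate}"
    )
]

print(len(all_rows), "rows fetched for client-side selection")
print(len(server_selected), "rows fetched with the exported predicate")
```

Both models select the same events; what differs is how many rows cross the connection and where the work is done.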

Slide 10: Exporting the selection predicate
- Of course, not every cut can be described as a boolean combination of attribute range specifications or another kind of SQL-expressible query
  - “Hybrid” selections may be appropriate, with first-order cuts done via exported query predicates, and second-order cuts via user computations
- Possible optimization: when attribute lists are not needed once selections are made (e.g., when a cut can be expressed as an appropriate query predicate), one could return only the token string (Ref)
  - SELECT token_string rather than SELECT *
  - Significantly reduced data transfer volumes
  - Would require a hint from the user, but it’s easy to implement
  - We have not given this any thought yet
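A hybrid selection and the token-only variant might look like the following. This is a sketch under stated assumptions, not the POOL interface: sqlite3 again stands in for the server, and the attributes (pt1, pt2, phi1, phi2), the thresholds, and the back_to_back cut are invented for illustration.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Coll (token_string TEXT, pt1 REAL, pt2 REAL, phi1 REAL, phi2 REAL)"
)
conn.executemany(
    "INSERT INTO Coll VALUES (?, ?, ?, ?, ?)",
    [(f"token-{i}", 10.0 + i % 40, 5.0 + i % 30, 0.01 * i, 0.02 * i) for i in range(500)],
)

FIRST_ORDER_CUT = "pt1 > 30 AND pt2 > 20"  # hypothetical exported predicate


def back_to_back(phi1, phi2, tolerance=0.5):
    """Hypothetical second-order cut done in user code: angles differ by about pi."""
    dphi = abs(phi1 - phi2) % (2 * math.pi)
    dphi = min(dphi, 2 * math.pi - dphi)
    return abs(dphi - math.pi) < tolerance


# Hybrid: the server applies the first-order cut and returns only the attributes
# that the user-side second-order cut still needs.
survivors = conn.execute(
    f"SELECT token_string, phi1, phi2 FROM Coll WHERE {FIRST_ORDER_CUT}"
)
hybrid = [tok for tok, phi1, phi2 in survivors if back_to_back(phi1, phi2)]

# Token-only: when the predicate expresses the whole cut, no attributes are returned.
tokens_only = [
    tok for (tok,) in conn.execute(f"SELECT token_string FROM Coll WHERE {FIRST_ORDER_CUT}")
]

print(len(tokens_only), "events pass the exported cut;", len(hybrid), "also pass the user cut")
```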

Slide 11: Pre-selection
- Process the query before the computation is run; ideally, before the job is scheduled
- Every grid project prefers this approach:
  - For optimization purposes (job placement, scheduling, …), resource brokers want to determine in advance which files a job will need
  - If the query can be executed in advance, the precise (reduced) file list can be obtained
  - Otherwise, resource brokers must assume that every file corresponding to any event in the collection is needed

Slide 12: Aside: determining the files needed
- One can today easily build the list of unique POOL fileids associated with a collection by extracting the appropriate substring of the stringified Refs (token_string)

    SELECT DISTINCT SUBSTRING(token_string,1,41) FROM CollectionName;

- This should be simpler/faster when link tables are implemented
  - (Much) smaller table scan, no substring operations
  - More on link tables later
- For a subsample,

    SELECT DISTINCT SUBSTRING(token_string,1,41) FROM CollectionName WHERE ;
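The file-list query can be tried on a toy collection. The sketch below assumes the token layout shown later in this talk, in which the leading "[DB=...]" field is 41 characters long (4 for "[DB=", 36 for the GUID, 1 for "]"). SQLite is used in place of MySQL, so the function is spelled substr; the second file GUID and the run attribute are made up.

```python
import sqlite3

FILE_A = "DAF83A9C-9DC5-D711-9201-000347F31C25"  # GUID from the token example in this talk
FILE_B = "11111111-2222-3333-4444-555555555555"  # made-up second file


def make_token(db_guid, entry):
    """Assemble a stringified Ref in the layout shown in this talk (OID values invented)."""
    return (
        f"[DB={db_guid}][CNT=helm_Root_1000000_100_Container]"
        "[CLID=4E1F4DBB-1973-1974-1999-204F37331A01]"
        f"[TECH=00000202][OID=00000003-{entry:08X}]"
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CollectionName (token_string TEXT, run INTEGER)")
conn.executemany(
    "INSERT INTO CollectionName VALUES (?, ?)",
    [
        (make_token(FILE_A if i < 60 else FILE_B, i), 1 if i < 60 else 2)
        for i in range(100)
    ],
)

FILE_IDS_SQL = "SELECT DISTINCT substr(token_string, 1, 41) FROM CollectionName"

# Files needed for the whole collection, then for a subsample (hypothetical predicate).
all_files = sorted(row[0] for row in conn.execute(FILE_IDS_SQL))
subsample_files = sorted(row[0] for row in conn.execute(FILE_IDS_SQL + " WHERE run = 2"))

print(all_files)
print(subsample_files)
```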

Slide 13: Notes on provenance
- Consider a job J that does analysis of reconstructed events.
- The job specification might look like (Gaudi/Athena-like notation)

    …input specification
    EventSelector.InputCollection="BigReconSample";
    EventSelector.SelectionCriteria=" ";
    //(equivalent to SQL-like "SELECT * from BigReconSample WHERE ")
    ------------------------------------------------------------
    …job options to describe physics analysis recipe R
    ------------------------------------------------------------
    …output specification
    OutStream.CollectionName = "SmallAnalysisSample";
    etc.

Slide 14: Notes on provenance
- While it might be correct to say that SmallAnalysisSample = J(BigReconSample), combining the selection and the physics analysis recipe applied by J makes provenance tracking less clear:
  - Which input events were used to produce SmallAnalysisSample?
- The situation is clearer when we partition J into the selection query Q and the analysis recipe R. Then

    SmallReconSample = Q(BigReconSample)
    SmallAnalysisSample = R(SmallReconSample)
    //…so SmallAnalysisSample = R(Q(BigReconSample))

Slide 15: Provenance and collections
- Separating the description of the selection from the description of the physics (and the process of selection from the process of physics computation) suggests that there may be value in incarnating the collection Q(BigReconSample), i.e., that a (the) principal use case for queries on collections is to produce new collections:

    CREATE TABLE SmallReconSample SELECT * FROM BigReconSample WHERE Q;

- Note that there is no need in principle to return any iterator or attribute lists to an application in order to create a subcollection.
- The answer to the question, “Which input events were used to produce SmallAnalysisSample?” is
  - SmallReconSample, or, equivalently,
  - Q(BigReconSample) (virtual data representation)
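Incarnating Q(BigReconSample) on the server, together with a record of how it was made, could be sketched as follows. This is an illustration under assumptions rather than a POOL utility: SQLite requires the AS keyword in CREATE TABLE ... AS SELECT (MySQL accepts the form without it), and the provenance table, the extract_subcollection helper, and the n_electrons attribute are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE BigReconSample (token_string TEXT, n_electrons INTEGER)")
conn.executemany(
    "INSERT INTO BigReconSample VALUES (?, ?)",
    [(f"token-{i}", i % 3) for i in range(300)],
)

# Hypothetical registry recording NewCollection = Q(OldCollection).
conn.execute("CREATE TABLE provenance (collection TEXT, parent TEXT, query TEXT)")


def extract_subcollection(conn, new_name, parent, predicate):
    """Build new_name = Q(parent) inside the database and record its provenance.

    Names and predicate are interpolated into SQL, so they must be trusted input.
    No iterator or attribute list is returned to the caller, only a row count.
    """
    conn.execute(f"CREATE TABLE {new_name} AS SELECT * FROM {parent} WHERE {predicate}")
    conn.execute("INSERT INTO provenance VALUES (?, ?, ?)", (new_name, parent, predicate))
    return conn.execute(f"SELECT COUNT(*) FROM {new_name}").fetchone()[0]


n_selected = extract_subcollection(
    conn, "SmallReconSample", "BigReconSample", "n_electrons = 2"
)
recorded = conn.execute("SELECT collection, parent, query FROM provenance").fetchall()

print(n_selected, "events in SmallReconSample")
print(recorded)
```

With the provenance row kept, the question of which input events produced a downstream sample can be answered either from the stored subcollection or by re-evaluating the recorded query.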

Slide 16: A model
1. User interacts with collection registry to discover which collections are available, and selects one (possibly via a query on collection-level attributes); maybe more
2. User issues selection query
  - Gets number of events satisfying query, perhaps number of distinct files needed, …
  - With this information, many secondary, auxiliary query estimates are possible:
    - Bytes to be transferred, estimated data delivery time, …
    - Some projects (e.g., U.S. HENP Grand Challenge project) have done this kind of estimation
  - This process may be iterative, settling on a query Q
3. The result set (itself an event collection) is extracted/persistified
  - And its provenance, i.e., the fact that NewCollection = Q(OldCollection), is recorded
  - More on extraction/persistification later…
4. File list associated with the specified sample is determined and added to input specification (JDL) for use by Resource Broker/scheduler
  - …the usual fileid/lfn/pfn machinery is invoked
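The estimates in step 2 can be derived from two aggregates: the number of qualifying events and the number of distinct files they live in. The sketch below is only an illustration: the uniform file size, the delivery rate, the abbreviated tokens (DB and OID fields only), and the n_jets attribute are all assumptions, and sqlite3 stands in for the relational back end.

```python
import sqlite3

GUIDS = [
    "DAF83A9C-9DC5-D711-9201-000347F31C25",
    "11111111-2222-3333-4444-555555555555",  # made up
    "66666666-7777-8888-9999-AAAAAAAAAAAA",  # made up
]
ASSUMED_FILE_BYTES = 2 * 10**9          # hypothetical uniform file size
ASSUMED_BYTES_PER_SECOND = 10 * 10**6   # hypothetical delivery rate

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE OldCollection (token_string TEXT, n_jets INTEGER)")
for i in range(300):
    guid = GUIDS[i // 100]              # 100 events per file
    n_jets = i % 5 if i < 200 else 0    # the third file holds no multi-jet events
    conn.execute(
        "INSERT INTO OldCollection VALUES (?, ?)",
        (f"[DB={guid}][OID=00000003-{i:08X}]", n_jets),
    )


def estimate(conn, collection, predicate):
    """Return event count, distinct file count, and rough transfer estimates."""
    n_events, n_files = conn.execute(
        f"SELECT COUNT(*), COUNT(DISTINCT substr(token_string, 1, 41)) "
        f"FROM {collection} WHERE {predicate}"
    ).fetchone()
    n_bytes = n_files * ASSUMED_FILE_BYTES
    return {
        "events": n_events,
        "files": n_files,
        "bytes": n_bytes,
        "seconds": n_bytes / ASSUMED_BYTES_PER_SECOND,
    }


summary = estimate(conn, "OldCollection", "n_jets >= 3")
print(summary)
```

A user could rerun estimate with tighter or looser predicates until the event count and delivery time are acceptable, which is the iterative settling on Q described in step 2.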

Slide 17: A model (continued)
5. When job finally runs (perhaps many hours later)
  - Files have been prestaged or are otherwise available
  - Program sees an iterator over events that satisfy the query
    - How? Either subcollection has been persistified/extracted, or query must be issued again
    - One plausible implementation might be to extract subcollection into a ROOT-based explicit collection, so that relational database access from the compute element is not needed
6. Long-term persistence of NewCollection may arguably be optional as long as its provenance (=Q(OldCollection)) is persistent
  - This is more an optimization and policy question than an architectural one
  - …usual comments about virtual data go here…

Slide 18: Some MySQL performance measurements
- Caveat: I am not a MySQL expert, and I am reporting results second-hand
  - Measurements are principally thanks to Kristo Karr
  - Plus previous measurements by Helmut
- Environment was uncontrolled: client execution on lxplus nodes, mysql server on lxshare070d, no exclusive access arrangements
- Performance measurements are observations (often single samples), not benchmarks: no attempt to alter/optimize the code to improve MySQL performance
- No data transfer or I/O measurements

Slide 19: Some MySQL performance-related observations
- Space overhead: InnoDB table types (needed for transaction support) seem to require >45% additional space compared to MyISAM (recommended for nontransactional access)
  - Example: 10**6 events, 100 attributes: inherent size (100x4 bytes + sizeof(token_string)) x 10**6 = 556,000,000 bytes
    - 569,000,000 MyISAM (type CHAR(156) for token_string)
    - 588,000,000 MyISAM (type TEXT for token_string)
    - 864,026,624 InnoDB (type TEXT for token_string)
    - (numbers cited are from “show table status;”)
- MySQL transfer protocol is character-based: ints and floats are converted to characters and back
  - Extra processing, and more data transferred
  - N.B.: MySQL 4.1.x will have a binary transfer protocol
- Have not experimented with any MySQL compression options for storage or transfer
- Server operations that require full table scans (no indexing) behave as though they read at a relatively consistent sustained effective rate of 30-50 MB/second
  - “Effective rate” depends upon whether one views the InnoDB table size as the “inherent” size, or as “inherent” size plus InnoDB space overhead
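The space figures can be checked with a few lines of arithmetic. The sketch below uses only numbers quoted on this slide; the implied full-scan times are an extrapolation from the quoted 30-50 MB/second effective rate, taking 1 MB as 10**6 bytes (an assumption), and are not measurements.

```python
N_EVENTS = 10**6
N_ATTRIBUTES = 100
BYTES_PER_ATTRIBUTE = 4
TOKEN_BYTES = 156  # from the CHAR(156) column type quoted on the slide

row_bytes = N_ATTRIBUTES * BYTES_PER_ATTRIBUTE + TOKEN_BYTES
inherent = row_bytes * N_EVENTS

# Table sizes reported by "show table status;" on the slide.
reported = {
    "MyISAM, CHAR(156) token": 569_000_000,
    "MyISAM, TEXT token": 588_000_000,
    "InnoDB, TEXT token": 864_026_624,
}

overhead_vs_inherent = {name: size / inherent - 1.0 for name, size in reported.items()}
innodb_vs_myisam = reported["InnoDB, TEXT token"] / reported["MyISAM, TEXT token"] - 1.0
token_fraction = TOKEN_BYTES / row_bytes


def scan_seconds(table_bytes, mb_per_second):
    """Time for a full table scan at a given effective rate (1 MB = 10**6 bytes assumed)."""
    return table_bytes / (mb_per_second * 10**6)


print(f"inherent size: {inherent:,} bytes")
for name, fraction in overhead_vs_inherent.items():
    print(f"{name}: {fraction:.1%} over inherent size")
print(f"InnoDB over MyISAM (both TEXT): {innodb_vs_myisam:.1%}")
print(f"token share of a row: {token_fraction:.1%}")
for rate in (30, 50):
    print(
        f"full scan at {rate} MB/s: {scan_seconds(inherent, rate):.1f} s (inherent size), "
        f"{scan_seconds(reported['InnoDB, TEXT token'], rate):.1f} s (InnoDB size)"
    )
```

The InnoDB-over-MyISAM ratio comes out just under 47%, consistent with the ">45%" quoted above, and the token accounts for about 28% of each row.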

Slide 20: Ref space requirements, and link tables
- POOL tokens are externalized as strings:

    [DB=DAF83A9C-9DC5-D711-9201-000347F31C25][CNT=helm_Root_1000000_100_Container][CLID=4E1F4DBB-1973-1974-1999-204F37331A01][TECH=00000202][OID=00000003-000CFA0F]

- MySQL storage type is TEXT
- More than 25% of the space in MySQL collections with 100 int/float attributes is occupied by the token string
  - (not using compression)
- Collections could use “short refs” as in the storage service by replacing DB and CNT by a key into a separate table that records distinct {DB,CNT} pairs
  - This is what is meant by a link table; it is the next task after performance tests
  - Substantial space savings should be possible
  - Note only CNT field has variable length
- Other optimizations may be possible:
  - For typesafe collections, can CLID be an attribute of the collection itself?
  - TECH is likely to be unique, or selected from a very small set
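The link-table idea can be illustrated in a few lines. This is a sketch of the concept only, not the POOL storage service's short-ref implementation: the LinkTable class, the tuple layout of the short ref, and the second OID value are hypothetical.

```python
import re

TOKEN = (
    "[DB=DAF83A9C-9DC5-D711-9201-000347F31C25][CNT=helm_Root_1000000_100_Container]"
    "[CLID=4E1F4DBB-1973-1974-1999-204F37331A01][TECH=00000202][OID=00000003-000CFA0F]"
)
FIELD = re.compile(r"\[([A-Z]+)=([^\]]*)\]")


def parse_token(token):
    """Split a stringified token into its named fields."""
    return dict(FIELD.findall(token))


class LinkTable:
    """Records each distinct {DB, CNT} pair once and hands out a small integer key."""

    def __init__(self):
        self._keys = {}
        self._pairs = []

    def key_for(self, db, cnt):
        pair = (db, cnt)
        if pair not in self._keys:
            self._keys[pair] = len(self._pairs)
            self._pairs.append(pair)
        return self._keys[pair]

    def pair_for(self, key):
        return self._pairs[key]

    def __len__(self):
        return len(self._pairs)


def to_short_ref(token, links):
    """Replace the DB and CNT fields by a key into the link table."""
    f = parse_token(token)
    return (links.key_for(f["DB"], f["CNT"]), f["CLID"], f["TECH"], f["OID"])


def to_token(short_ref, links):
    """Rebuild the full stringified token from a short ref."""
    key, clid, tech, oid = short_ref
    db, cnt = links.pair_for(key)
    return f"[DB={db}][CNT={cnt}][CLID={clid}][TECH={tech}][OID={oid}]"


links = LinkTable()
other = TOKEN.replace("000CFA0F", "000CFA10")  # another entry in the same file and container
short_refs = [to_short_ref(t, links) for t in (TOKEN, other)]
print(len(links), "link-table row serves", len(short_refs), "refs")
```

Every ref into the same file and container shares one link-table row, so the per-row cost of the DB and CNT fields shrinks to a single key.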

Slide 21: Helmut’s tests

Slide 22: Comments on relational measurements
- The FetchAll tests for 1000000 events presumably had difficulty due to caching: FetchAll tries to get the entire result set
  - Maybe well over 1 GB to be transferred and cached, depending on the amount of data expansion in the character-based transfer protocol
- The number that looks impressively bad is 1924 seconds to read 100 attributes for 1000000 events
  - Presumably the amount of data transferred is the same as for the type check (241 seconds), so the time to build the attribute list from a fetched row is ~2 ms(!): (1924 - 241) s / 1000000 rows is about 1.7 ms

Slide 23: Building the attribute list
- Some optimization is clearly possible in building the attribute list
  - Lots of string construction, string-based indexing, string conversion
  - No use of correspondence between i-th element of row and i-th element of attribute list specification
  - …
- Kristo inherited this code, and has not yet tried to improve it
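The positional correspondence mentioned above can be exploited by resolving names and conversions once per query instead of once per row. The sketch below is not the POOL code and makes no claim about how that code is structured; the specification format, the converter table, and both builder functions are hypothetical, and rows are modeled as tuples of strings because the transfer protocol is character-based.

```python
# Hypothetical attribute list specification: (name, type name) in column order.
SPEC = [("run", "int"), ("event", "int"), ("missing_et", "float")]
CONVERTERS = {"int": int, "float": float}


def build_by_name(row, column_names, spec):
    """String-heavy variant: every attribute of every row is located by name."""
    attributes = {}
    for name, type_name in spec:
        index = column_names.index(name)          # string search per attribute
        attributes[name] = CONVERTERS[type_name](row[index])
    return attributes


def make_positional_builder(spec):
    """Resolve names and converters once, then rely on the i-th column matching
    the i-th element of the specification."""
    names = [name for name, _ in spec]
    converters = [CONVERTERS[type_name] for _, type_name in spec]

    def build(row):
        return {n: c(v) for n, c, v in zip(names, converters, row)}

    return build


rows = [("1001", "42", "37.5"), ("1001", "43", "12.25")]
column_names = [name for name, _ in SPEC]

build = make_positional_builder(SPEC)
by_position = [build(row) for row in rows]
by_name = [build_by_name(row, column_names, SPEC) for row in rows]
print(by_position[0])
```

Both builders yield the same attribute lists; the positional one simply avoids repeating name lookups and type dispatch for each of the 100 attributes in each row.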

Slide 24: Additional measurements
- Following slides are from Kristo’s spreadsheets
- You’ll probably need to download them to read them
  - Due to my ineptitude with MS Office and getting Excel tables to look good in PowerPoint (sorry)
- I’ll try to explain some of the numbers anyway
- I hope Kristo is on the line (or in VRVS) to explain what I cannot

Slide 25: 1,000,000 events: observed times

Slide 26: 500,000 events: observed times

Slide 27: 100,000 events: observed times

Slide 28: Comments on timings
- Results are relatively consistent with Helmut’s
- Additional tests added
  - Empty loop (as baseline for measurements that read attributes, build refs, dereference refs, …)
  - 0.1% selection with indexes
  - Create new small collection by selection from a large one

Slide 29: 0.1% selection query
[Figure: time in seconds vs. number of events (x 10**5)]

Slide 30: Comments on timings
- Most times (unindexed, anyway) are quite linear in the number of events
- Indexing improves performance by about a factor of 100 in practice
  - C * N/logN?
- Building a sub-collection entirely on the server seems to be substantially faster than returning the qualifying events to a client that in turn adds them iteratively to a new collection
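Why an index changes the scaling can be seen from the query plan: without one the server must scan all N rows, while with one it searches the index for the qualifying entries. The sketch below shows this with SQLite's EXPLAIN QUERY PLAN on a 0.1% selection; it is an illustration on a different engine and does not reproduce the MySQL timings or the factor of 100. The table, the bucket attribute, and the index name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Events (token_string TEXT, bucket INTEGER)")
conn.executemany(
    "INSERT INTO Events VALUES (?, ?)",
    [(f"token-{i}", i % 1000) for i in range(20000)],
)

# One bucket value out of 1000, i.e. a 0.1% selection.
QUERY = "SELECT token_string FROM Events WHERE bucket = 7"


def query_plan(conn, sql):
    """Return SQLite's textual plan for a statement (detail is the fourth column)."""
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


plan_without_index = query_plan(conn, QUERY)
conn.execute("CREATE INDEX idx_bucket ON Events (bucket)")
plan_with_index = query_plan(conn, QUERY)
selected = conn.execute(QUERY).fetchall()

print("without index:", plan_without_index)
print("with index:   ", plan_with_index)
print(len(selected), "of 20000 events selected")
```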

Slide 31: Futures?
- After link tables and performance improvements…
- Utilities for collection aggregation, for building new collections directly from queries, for extracting samples, e.g., a ROOT (sub)collection on my laptop extracted from a master relational collection, the way we might extract a small XML file catalog from a collaboration-wide database-hosted catalog
- Collection registry, and collection management utilities?
- Wider agreement on requirements and direction for hierarchical (Multi-)collections
  - Everyone agrees we need them
- Utilities to extract file lists corresponding to samples, and perhaps to support (experiment-defined) content restriction
- Can we treat our collections like our files, giving them unique ids associated with logical instances that may have one or more physical incarnations on one or more servers, even when they reside in databases?
  - An important topic in ATLAS right now
- More… There is plenty of work to do.
