
1 Oracle – A Role in LHC Data Handling?
Jamie Shiers, IT-DB
Based on work with early releases of Oracle 9i by IT-DB + experiments

2 The Story So Far…
1992: CHEP – DB panel, CLHEP K/O, CVS …
1994: start of OO projects
1997: proposal of ODBMS+MSS; BaBar
2001: CMS changes baseline away from Objectivity
2003: POOL ready for production
–POOL production plans include use of the EDG Replica Location Service
–Deployment on Oracle 9iAS + 9iRAC/DB at Tier0/1?

3 ODBMS
BaBar (SLAC) claims probably the largest DB in the world
–681.8 TB stored in 473,205 files
CERN: 300 TB (COMPASS) + 30 TB (HARP) + 300 TB (CMS) + others
Recently migrated 300 TB at 120 MB/s out of ODBMS
–Oracle + ‘flat file’ solution
–Many high-level similarities to the LHC proposal
–Time pressure required a pragmatic solution – could not wait for POOL

4 Migration History – Data Rates
[Chart of migration data rates; monitoring at http://lxshare075d:8888/]

5 Data Processing Diagram
[Diagram: input disk pools (2 × 200 GB) feed processing nodes; Oracle holds the log; an output disk pool stages data to Castor (9940 / 9940B tape drives)]
~10 MB/s overall data throughput per node
Sustained rates of 120 MB/s over 24-hour periods

6 Oracle for LHC?
Numerous concrete examples for non-physics data:
–Machine construction / controls
–Detector construction / assembly
–Physics infrastructure (book-keeping, catalogues etc.)

7 Oracle & LHC – What is ~Clear
Will continue to be used as part of the EDMS service
Will continue to be used “à la LEP” for logging, monitoring and control of the LHC
Will continue to be used for detector construction / assembly / monitoring
Total data: ~10 TB
2nd Sun cluster for physics apps: ~300 GB disk
–Growing maybe to 10 TB by LHC startup

8 Oracle & LHC – What is Likely
Will continue to be used as part of the EDMS service
Will continue to be used “à la LEP” for logging, monitoring and control of the LHC
Will continue to be used for detector construction / assembly / monitoring
Total data: ~10 TB
2nd Sun cluster for physics apps: ~300 GB disk
CERN Engineering Data Management System: ~300,000 documents, many related to LHC construction

9 Oracle Usage: Some Examples
–Detector DB
–Conditions DB
–Run catalogues
–The Grid
(Details in hidden slides)

10 Detector DB Example: ALICE

11 ALICE Detector Construction

12 ALICE Detector DB
Wiktor Peryt & Tomasz Traczyk, in collaboration with Piotr Mazan, Dominik Tukendorf, Piotr Szarwas, Michal Janik, Dawid Jarosz, Bartek Pawlowski and Jacek Wojcieszuk (Warsaw University of Technology)
Satellite databases
–source data produced at laboratories, delivered by manufacturers
–working copies of data from the central repository
–partial copies of metadata (read-only)
Central database @ CERN
–central inventory of components
–copies of data from laboratories
–metadata, e.g. dictionaries

13 DBMS Choice http://cern.ch/hep-proj-database/db_compar.htm
Central database: Oracle RDBMS
–Advantages:
 –support for transactions
 –built-in procedural language
 –triggers
 –complex data types and BLOBs
 –support for VLDB, e.g. data partitioning
 –7 × 24 availability (on-line backup, etc.)
–Disadvantages:
 –quite expensive
 –complex and difficult to administer
Satellite databases: PostgreSQL
–Advantages:
 –free of charge
 –quite easy to administer
 –support for transaction processing
 –built-in procedural language
 –triggers
 –support for complex data types and BLOB objects
–Disadvantages:
 –not very fast
 –no support for data replication
 –no support for heterogeneous systems
 –no support for VLDB

14 Requirements
–Access to an Oracle server for testing and implementation of the file catalogue
–Portability and performance

15 Conditions DB Example: Re-implementation of the “ex-BaBar” Objectivity/DB-based Conditions DB on Oracle

16 Conditions DB Overview
Implementation of the agreed ConditionsDB interface; minimal changes for client code
–Only the connect string changes
DB features:
–Materialized views for frequently accessed data
–Views for folder paths and folder-set paths
–View for the intervals that form the current head
–Use of indexes
–PL/SQL stored procedures
–OCCI for the client – “a better OCI” (no object features; no object navigation)
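To make the client side concrete, here is a minimal OCCI sketch of fetching the condition valid at a given time in a given folder. The table and column names follow the ERD on the next two slides; the connect string, credentials, folder id and the use of plain integers for validity times are all placeholders, not the actual implementation.

```cpp
// Minimal sketch (hypothetical): read the condition objects valid at
// time T in one folder, via the relational design of slides 17/18.
#include <occi.h>
#include <iostream>

using namespace oracle::occi;

int main() {
    Environment* env = Environment::createEnvironment();
    // As the slide notes, only the connect string differs for clients.
    Connection* conn = env->createConnection("cond_reader", "secret", "condb");

    Statement* stmt = conn->createStatement(
        "SELECT o.object_id, o.since, o.till "
        "FROM condition_object o JOIN condition_data d "
        "  ON d.data_id = o.data_id "
        "WHERE o.folder_id = :1 AND o.since <= :2 AND o.till > :3");
    stmt->setInt(1, 42);       // folder_id  – placeholder value
    stmt->setInt(2, 1234567);  // validity time T – placeholder
    stmt->setInt(3, 1234567);  // same T for the upper bound

    ResultSet* rs = stmt->executeQuery();
    while (rs->next())
        std::cout << "object " << rs->getInt(1) << " valid ["
                  << rs->getInt(2) << "," << rs->getInt(3) << ")\n";

    stmt->closeResultSet(rs);
    conn->terminateStatement(stmt);
    env->terminateConnection(conn);
    Environment::terminateEnvironment(env);
}
```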

17 Relational Design (ERD) – Folders, Folder Sets and Condition Objects
Folder: # folder_id, * name, o description, o attributes, r parent_set_id
Folder_set: # folder_set_id, * name, o description, o attributes, r parent_set_id
Condition_object: # object_id, * since, * till, * insertion_time, * layer, o description, r data_id, r folder_id
Condition_data: # data_id, o data_value
Legend: # attribute is part of the primary key; * cannot be null; o null allowed; r foreign key; u part of a unique constraint. One-to-many relations link folders / folder sets via parent_set_id, and condition objects to their folder and data.

18 Relational Design (ERD) – Tags
Folder: # folder_id, * name, o description, o attributes, r parent_set_id
Folder_set: # folder_set_id, * name, o description, o attributes, r parent_set_id
Tag: # tag_id, u name, * creation_time, o description
Folder_tag: * assignment_time, #r tag_id, #r folder_id
Object_tag: * assignment_time, #r tag_id, #r object_id
Condition_object: # object_id, * since, * till, * insertion_time, * layer, o description, r folder_id, r data_id
(Same legend as the previous slide.)
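A hedged DDL sketch of the tag part of this design, issued through OCCI for consistency with the other examples here; the column types are illustrative, since the ERD slides do not specify them.

```cpp
// Hypothetical DDL for the tag tables above; types are guesses, the
// constraints (u, *, #r) follow the ERD legend.
#include <occi.h>
using namespace oracle::occi;

void createTagTables(Connection* conn) {
    Statement* stmt = conn->createStatement();
    stmt->setSQL(
        "CREATE TABLE tag ("
        "  tag_id        NUMBER PRIMARY KEY,"          // '#'
        "  name          VARCHAR2(255) UNIQUE,"        // 'u'
        "  creation_time TIMESTAMP NOT NULL,"          // '*'
        "  description   VARCHAR2(255))");             // 'o'
    stmt->executeUpdate();
    stmt->setSQL(
        "CREATE TABLE object_tag ("
        "  assignment_time TIMESTAMP NOT NULL,"
        "  tag_id    NUMBER REFERENCES tag(tag_id),"   // '#r'
        "  object_id NUMBER REFERENCES condition_object(object_id),"
        "  PRIMARY KEY (tag_id, object_id))");
    stmt->executeUpdate();
    conn->terminateStatement(stmt);
}
```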

19 Run Catalogue Examples: ALICE, LHCb

20 AliEn Architecture
AliEn in brief:
–File catalogue built on top of an SQL DBMS, with a user interface that mimics a file system
–Authentication module supporting various authentication methods
–Task queue holding commands to be executed in the system (commands, inputs and outputs are all registered in the catalogue)
–Metadata catalogue
–Services that support the above components
–C/C++/perl API
–DBD/DBI interface to the DBMS
–100% perl5 (95% reusable open-source modules)
A “super” file system, batch queue, … but with a simple and consistent user interface

21 File Catalogue
[Catalogue tree spanning the ALICE USERS, ALICE SIM and Tier1 ALICE LOCAL databases, e.g.:]
 ./cern.ch/user/a/{admin, aliprod}/, .../f/fca/, .../p/psaiz/{as, dos, local}/, .../b/barbera/
 ./simulation/2001-01/V3.05/{Config.C, grun.C}
 ./36/{stderr, stdin, stdout}, ./37/{stderr, stdin, stdout}, ./38/{stderr, stdin, stdout}
Files, commands (job specifications), job input and output, and tags are all stored in the catalogue

22 File Organization
[tbed0007d.cern.ch] /alice/simulation/2001-02/V3.06/00001/ > tree
 |--./
 |  |--00001/
 |  |  |--galice.root
 |  |--00002/
 |  |  |--galice.root
 |  …
 |  |--Config.C
 |  |--grun.C
[tbed0007d.cern.ch] /proc/33608/ > tree
 |--./
 |  |--stderr
 |  |--stdin
 |  |--stdout
Forgotten wisdom: by organizing files into a directory structure one can already tell a lot about file content, define cleanup and access policies, and optimize access performance

23 Tags
The file catalogue on its own does not know anything about file content
It is possible to add additional information describing file properties (metadata)
In the AliEn environment this is achieved by attaching an arbitrary number of TAG table(s) to the corresponding directory table (a sketch of the resulting two-phase search follows below)
Example directory contents: r3418_01-01.ds … r3418_15-15.ds
Example query:
 lfn://alien.cern.ch/alice/simulation/2001%/V3.05/%/galice.root?npart>1000#mytag
The search first selects all directory tables on the basis of the file-name pattern, then locates the tables that correspond to the “mytag” definition, applies the attribute selection, and finally returns only the list of files for which the attribute search succeeded
[Schema sketch: index table D0 (path, dir, hostIndex, entryId) points to per-directory tables such as T2526 / T2527 (type, dir, name, owner, ctime, comment, content, method, methodArg, gowner, size)]
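A hedged sketch of how such a tag query might be composed, in C++ for consistency with the other examples here. The D0 and T2526-style table names mirror the schema sketch above; the tag-table name and its join key are invented for illustration, and this is not AliEn's actual (perl/DBI-based) implementation.

```cpp
// Illustrative two-phase tag search:
// 1) find directory tables whose catalogue path matches the LFN pattern,
// 2) in each, join file rows against the "mytag" table and apply the cut.
#include <string>

std::string directoryLookupSQL() {
    // Phase 1: D0 maps catalogue paths to per-directory tables.
    return "SELECT hostIndex, entryId FROM D0 "
           "WHERE path LIKE '/alice/simulation/2001%/V3.05/%'";
}

std::string tagSearchSQL(const std::string& dirTable,    // e.g. "T2526"
                         const std::string& tagTable) {  // hypothetical name
    // Phase 2: per directory table, apply the metadata selection.
    return "SELECT f.name FROM " + dirTable + " f JOIN " + tagTable + " t "
           "ON t.file = f.name "                          // join key assumed
           "WHERE f.name = 'galice.root' AND t.npart > 1000";
}
```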

24 The Grid Example: POOL file catalogue (based on EDG-RLS)

25 POOL File Catalogue
Require ~10^6 entries / experiment now
Rising to ~10^8 / 10^9 (?) in 2008 / 2020
A few kB / entry; a few TB total
Implementation based on EDG-RLS
Deployed at Tier0/Tier1 on Oracle 9iAS / Oracle 9iRAC
Have to demonstrate it can meet requirements (number of concurrent users / transaction rate / manageability / cost of ownership)
Fall-back: 9iAS + non-RAC (Tomcat/MySQL at Tier2/3)
Open question about event-level metadata:
–COMPASS / HARP: 100–200 bytes / event
–LEP “collaboration Ntuple”: 200 columns = 1 kB / event
–Could result in 100 TB – 1 PB data volumes

26 Oracle for Physics Data
Focus on scalability issues:
–The current Very Large Database (VLDB) market is in the 1–50 TB range
–Can we really extend by 3 orders of magnitude?

27 Oracle for Physics Data – Key Issues
Complexity of data:
–Oracle’s support for objects?
–C++ binding (OCCI)
Volume of data:
–Several hundred PB
–Oracle 9i technologies: VLDB support, 9iRAC

28 Oracle for Physics Data – Key Issues
Complexity of data:
–Oracle’s support for objects?
–C++ binding: Oracle C++ Call Interface (OCCI), Object Type Translator (OTT)
Volume of data: Oracle 9i technologies (see below)

29 OCCI / OTT
Can handle HEP data models:
–Define the data model using SQL
–Generate C++ definitions & code using OTT
–Add user attributes & code in classes that inherit from the generated ones (sketched below)
Tested for a variety of non-trivial data models:
–Objects embedded by value and/or reference
–Arrays of the above
–Polymorphic tables
–Templated transient classes with multiple inheritance on the transient side
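A minimal sketch of that pattern, assuming a hypothetical DETMODULE object type already defined in SQL. PDetModule stands in for what OTT would generate (including the readSQL/writeSQL marshalling hooks, whose bodies are omitted here), and DetModule shows where user code goes; none of these names come from the actual work described.

```cpp
// Sketch of the OTT pattern (illustrative names, not real OTT output).
// Assume an object type defined in SQL, e.g.:
//   CREATE TYPE detmodule AS OBJECT (id NUMBER, x FLOAT, y FLOAT, z FLOAT);
#include <occi.h>
#include <cmath>

class PDetModule : public oracle::occi::PObject {  // stand-in for OTT output
public:
    oracle::occi::Number id;
    double x, y, z;
    // OTT emits these persistence hooks; bodies omitted in this sketch.
    void readSQL(oracle::occi::AnyData& streamer);
    void writeSQL(oracle::occi::AnyData& streamer);
};

// User attributes and code live in a class inheriting the generated one,
// so regenerating from the schema never overwrites hand-written code.
class DetModule : public PDetModule {
public:
    double quality;                             // transient, not persisted
    double radius() const { return std::sqrt(x * x + y * y + z * z); }
};
```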

30 Oracle for Physics Data – Key Issues
Complexity of data:
–Extensive use of Oracle’s support for objects
–C++ binding (OCCI)
Volume of data:
–Several hundred PB
–Oracle 9i technologies: VLDB support, 9iRAC

31 [Data pyramid: RAW 1 PB/yr (1 PB/s prior to reduction!) → ESD 100 TB/yr → AOD 10 TB/yr → TAG 1 TB/yr; data volume decreases and the number of users increases down the pyramid; access is sequential for RAW, random for TAG; RAW/ESD at Tier0, AOD/TAG at Tier1]

32 LHC Data Volumes
Data Category                    Annual        Total
RAW                              1–3 PB        10–30 PB
Event Summary Data – ESD         100–500 TB    1–5 PB
Analysis Object Data – AOD       10 TB         100 TB
TAG                              1 TB          10 TB
Total per experiment             ~4 PB         ~40 PB
Grand totals (15 years)          ~16 PB        ~250 PB

33 Divide & Conquer
Split data from different experiments
Split different data categories
–Different schema, users, access patterns, …
Focus on mainstream technologies & low-risk solutions
VLDB target: 100 TB databases
–How do we build 100 TB databases?
–How do we use 100 TB databases to solve a 100 PB problem?

34 Why 100 TB DBs?
Possible today
–Vendors must provide support
–(See also hidden slides)
Expected to be mainstream within a few years

35 Decision Support (2000)
Company                  DB Size* (TB)   DBMS Partner   Server Partner   Storage Partner
Telecom Italia (DA)       2.32           Informix       Siemens          TerraSystems
NetZero                   2.47           Oracle         Sun              EMC
SK C&C                    2.54           Oracle         HP               EMC
AT & T                    2.83                          NCR              LSI
Office Depot              3.08                          NCR              EMC
FedEx Services            3.70                          NCR              EMC
Telecom Italia (DWPT)     3.71                          IBM              Hitachi
Dialog                    4.25           Proprietary    Amdahl           EMC
First Union Nat. Bank     4.50           Informix       IBM              EMC
SBC                      10.50                          NCR              LSI
*Database size = sum of user data + summaries and aggregates + indexes

36 Size of the Largest RDBMS in Commercial Use for DSS
[Chart: ~3 TB in 1996, ~50 TB in 2000, ~100 TB projected by respondents for 2005]
Source: Database Scalability Program 2000

37 BT Visit – July 2001
Oracle VLDB site: enormous proof-of-concept test in 1999
–80 TB disk, 40 TB mirrored, 37 TB usable
–Performed using Oracle 8i, EMC storage
–“Single instance” – i.e. not a cluster
Same techniques as being used at CERN
Demonstrated > 3 years ago – no concerns about building 100 TB today!

38 CERN DB Deployment
Currently run 1–3 TB / server
–Dual-processor Intel/Linux
–Scaling to ~10 TB in a few years sounds plausible
10-node cluster: 100 TB
–~100 disks in 2005!
Can we achieve close to linear scalability?
–Fortunately, our data is write-once, read-many
–Should be a good match for 9iRAC!

39 Oracle for Physics Data – Key Issues
Complexity of data:
–Extensive use of Oracle’s support for objects
–C++ binding (OCCI)
Volume of data:
–Several hundred PB
–Oracle 9i technologies: 9iRAC, VLDB support

40 Potential Benefits of 9iRAC
Scalability
–Allows 100 TB databases to be supported using commodity hardware: Intel/Linux server nodes
Manageability
–A small number of RACs is manageable with foreseeable resources; tens to hundreds of smaller single instances are not
Better resource utilization
–Shared-disk architecture avoids hot-spots and idle / overworked nodes
–Shared cache improves performance for frequently accessed read-only data

41 LHC Data Volumes
Data Category                    Annual        Total
RAW                              1–3 PB        10–30 PB
Event Summary Data – ESD         100–500 TB    1–5 PB
Analysis Object Data – AOD       10 TB         100 TB
TAG                              1 TB          10 TB
Total per experiment             ~4 PB         ~40 PB
Grand totals (15 years)          ~16 PB        ~250 PB

42 100 TB DBs & LHC Data
Analysis data: 100 TB is OK for ~10 years
–One 9iRAC per experiment
Intermediate (ESD): 100 TB ≈ 1 year’s data
–~40 9iRACs
RAW data: 100 TB = 1 month’s data
–400 9iRACs to handle all RAW data: 10 RACs / year × 10 years × 4 experiments

43 RAW Data: a few PB / year
Access pattern: sequential
Access frequency: ~once per year
Use time partitioning + offline tablespaces (a hedged DDL sketch follows below):
–Historic data copied to “tape”
–Possibly dropped from the DB catalogue
–Restored on demand
100 TB = a 10-day time window
–Current data (1 RAC), historic data (a 2nd RAC)
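The sketch below illustrates that lifecycle with made-up table, column and tablespace names (raw_event, ts_2003_01_01, …); the CREATE/ALTER statements are standard Oracle DDL, issued through OCCI for consistency with the other examples.

```cpp
// Illustrative time-partitioned RAW table plus the lifecycle DDL the
// slide describes; all object names are hypothetical.
#include <occi.h>
using namespace oracle::occi;

void rawLifecycle(Connection* conn) {
    Statement* stmt = conn->createStatement();

    // One range partition per day, each in its own tablespace.
    stmt->setSQL(
        "CREATE TABLE raw_event (event_id NUMBER, taken DATE, payload BLOB) "
        "PARTITION BY RANGE (taken) ("
        "  PARTITION p2003_01_01 VALUES LESS THAN (DATE '2003-01-02') "
        "    TABLESPACE ts_2003_01_01,"
        "  PARTITION p2003_01_02 VALUES LESS THAN (DATE '2003-01-03') "
        "    TABLESPACE ts_2003_01_02)");
    stmt->executeUpdate();

    // Freeze a day once written (write-once, read-many), then take it
    // offline after it has been copied to tape; restore on demand.
    stmt->setSQL("ALTER TABLESPACE ts_2003_01_01 READ ONLY");
    stmt->executeUpdate();
    stmt->setSQL("ALTER TABLESPACE ts_2003_01_01 OFFLINE NORMAL");
    stmt->executeUpdate();

    conn->terminateStatement(stmt);
}
```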

44 Partitions & Files: Limits
Currently limited to 2^16 partitions
–179 years at 1 partition / day
–~500 TB DBs with ~10 GB files
Current practical limit is 38,000 files / DB
–Sufficient to build 100 TB DBs
–Needs to be raised at some stage in the future…
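For reference, the arithmetic behind those figures (the ~500 TB presumably reflects 2^16 files of ~10 GB, with overheads):

\[
2^{16}\ \text{days} = \frac{65536}{365.25} \approx 179\ \text{years}, \qquad
2^{16} \times 10\ \text{GB} \approx 655\ \text{TB}
\]
\[
38{,}000\ \text{files} \times 10\ \text{GB} \approx 380\ \text{TB} \gg 100\ \text{TB}
\]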

45 Event Summary Data (ESD)
~100–500 TB / experiment / year
Yottabyte DBs predicted by 2020!
–1,000,000,000 TB?
Can RAC capabilities grow fast enough to permit just 1 RAC / experiment?
–+500 TB / year
An open question…

46 Oracle Deployment
[Diagram: the DAQ cluster (current data, no history) exports tablespaces to the RAW cluster, which moves data to/from MSS; reconstruction feeds an ESD cluster (one per year?); analysis feeds AOD/TAG (one cluster in total?), with data flowing to/from the Regional Centres (RCs)]

47 VLDB Issues
Oracle is addressing the limits of the current architecture
–Already permits 2 EB databases, theoretically…
Limits on e.g. the number of files, partitions etc. are expected to be significantly increased beyond Oracle 9i
–Limited to 2^16 architecturally; 38K measured
An area of work, but not of concern…

48 Storage Issues
Oracle Number format (up to 22 bytes) provides greater precision than IEEE double (8 bytes)
Mapping 1000 classes with numeric data members to Oracle Number requires effort!
Solutions being investigated to allow efficient storage of floats / doubles / ints without the user specifying precision / range
Target: next major Oracle release?
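For illustration, a hedged OCCI fragment showing how a C++ double crosses that boundary today; the table and column names are hypothetical, and this is only a sketch of the conversion issue the slide raises, not an investigated solution.

```cpp
// A C++ double is converted to Oracle NUMBER on insert and narrowed
// back to IEEE double on read; table/column names are made up.
#include <occi.h>
using namespace oracle::occi;

void storeMeasurement(Connection* conn, int id, double value) {
    Statement* stmt = conn->createStatement(
        "INSERT INTO measurement (id, val) VALUES (:1, :2)");
    stmt->setInt(1, id);
    stmt->setDouble(2, value);   // converted to Oracle NUMBER on insert
    stmt->executeUpdate();
    conn->terminateStatement(stmt);
}

double readMeasurement(Connection* conn, int id) {
    Statement* stmt = conn->createStatement(
        "SELECT val FROM measurement WHERE id = :1");
    stmt->setInt(1, id);
    ResultSet* rs = stmt->executeQuery();
    double v = 0.0;
    if (rs->next())
        v = rs->getDouble(1);    // narrowed to IEEE double client-side
    // (oracle::occi::Number is available when full NUMBER precision
    //  must be preserved instead.)
    stmt->closeResultSet(rs);
    conn->terminateStatement(stmt);
    return v;
}
```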

49

50 Oracle & CERN

51 If You Want to Know More…
http://cern.ch/LCG/
http://cern.ch/db/
http://cern.ch/hep-proj-database/

52 Summary – Oracle for LHC
A clear & important role to play
Likely to be used for non-event data
The hybrid solution (POOL) is the baseline for physics data
An RDBMS back-end to POOL is in progress

