Dzero Data Handling and Databases

1 Dzero Data Handling and Databases
Lee Lueking, Dzero Computing Review, May 9, 2002

2 Outline
Overview
Data Handling Software
Mass Storage System
Databases
Conclusions

3 Dzero Data Handling and Processing Architecture
[Architecture diagram: the Online Data Logger, Reconstruction Farm station, Central Analysis station, CLuED0 Analysis Cluster station, miscellaneous analysis stations, and off-site analysis and processing stations are connected through a high-capacity switch to Enstore mover nodes, the STK silos, and the ADIC tape robot.]

4 The Anatomy of The D0 Data Handling, DataBase, Data Grid
Core (Lead: Lee Lueking): maintenance, production operation, user support for the D0 experiment, adding needed D0 features, coordinating remote analysis sites; shift help from the D0 experiment.
Physics Analysis Tools (Lead: Wyatt Merritt): luminosity, physics data streams, interfaces for the D0 framework.
Grid (Lead: Igor Terekhov): evaluate and implement standard Grid components, create and collect use cases, plan and design job control, monitoring, the information system, etc.; coordinate D0GRID and PPDG efforts.
Database (Lead: Ruth Pordes): support and coordination for D0 database applications, including SAM core, analysis tools, and Grid support.

5 Data Handling Software

6 Components of a SAM Station
[Station diagram: producers/consumers and project managers use the station's temp and cache disk; the Station & Cache Manager, File Storage Server, File Storage Clients, File Stager(s), and eworkers move files to and from the MSS or other stations; arrows indicate data flow and control.]

7 SAM as a Distributed System
[Diagram: shared globally: name server, database server(s) (central database), global resource manager(s), log server. Local to each site: station 1 through station n servers, with mass storage system(s) shared locally. Arrows indicate control and data flow.]

8 Data to and from Remote Sites
Station configuration: replica location, prefer, avoid.
Forwarding: file stores can be forwarded through other stations.
"Routing" (now): parasitic stagers on D0mino; working for Imperial, Lancaster, BU, Arizona, Wuppertal, and Columbia.
Extra-domain transfers currently use bbftp (a parallel transfer protocol).
[Diagram: a remote SAM station with its MSS exchanging data with SAM stations 1-6.]
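The prefer/avoid configuration above amounts to a simple ordering of candidate replica locations. The following Python sketch illustrates that ordering only; it is not SAM's actual routing code, and the station names are hypothetical.

```python
# Conceptual sketch (not SAM's routing code): pick a source location for a
# file replica, honoring a station's "prefer" and "avoid" configuration.

def choose_replica(locations, prefer=(), avoid=()):
    """locations: station/MSS names holding a replica.
    prefer/avoid: hypothetical station-configuration lists."""
    locations = list(locations)
    preferred = [loc for loc in locations if loc in prefer]
    neutral = [loc for loc in locations if loc not in prefer and loc not in avoid]
    avoided = [loc for loc in locations if loc in avoid]
    # Fall back to an "avoid" location only if nothing else holds the file.
    for group in (preferred, neutral, avoided):
        if group:
            return group[0]
    raise LookupError("no replica available")

# Example: a remote station that prefers a nearby station and avoids
# pulling directly from the FNAL mass storage system.
print(choose_replica(["fnal-mss", "station-2"],
                     prefer=["station-2"], avoid=["fnal-mss"]))
```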

9 Dzero SAM Deployment Map
[Deployment map; legend: processing center, analysis site.]

10 Future SAM and D0 Grid
The station cache manager and file storage server components will be merged to improve cache management and streamline routing behavior.
Transition toward less centralized operation in which stations have site autonomy: a station can lose contact with the central DB server and still function. A distributed naming service is planned as well.
There will be a natural progression to "standard grid middleware" components. Current Grid work involves understanding job scheduling with Condor-G and the use of Globus tools including the Grid Security Infrastructure (GSI), Grid Resource Allocation and Management (GRAM), the Meta Directory Service (MDS), and GridFTP.
Grid work is done in conjunction with the Particle Physics Data Grid (PPDG) collaboration.
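To make the Condor-G piece concrete, here is a hedged sketch of submitting a job through Condor-G to a Globus GRAM gatekeeper. The submit-description keywords ("universe = globus", "globusscheduler") follow Condor-G usage of this era; the gatekeeper host, jobmanager, and executable are hypothetical placeholders, not actual D0 endpoints.

```python
# Hedged sketch: hand a job to Condor-G, which forwards it to a Globus
# GRAM gatekeeper. A valid grid proxy (GSI authentication) is assumed.
import subprocess
import tempfile

submit_description = """\
universe        = globus
globusscheduler = gatekeeper.example.fnal.gov/jobmanager-condor
executable      = run_analysis.sh
transfer_executable = true
output          = job.out
error           = job.err
log             = job.log
queue
"""

# Write the submit description to a temporary file and call condor_submit.
with tempfile.NamedTemporaryFile("w", suffix=".submit", delete=False) as f:
    f.write(submit_description)
    submit_file = f.name

subprocess.run(["condor_submit", submit_file], check=True)
```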

11 SAM as Part of the Grid: Monitoring and Job Management
[Diagram components: Information Service (MDS, Condor ClassAds); Job Management (JDL job request, broker, Condor MMS, resource info, Condor-G job submission); site gatekeeper (GRAM); data handling (SAM, DH resource manager); local batch system (Condor).]

12 Data Handling, Data Grid, Analysis Tools Manpower
SAM Core: 4 FTE, maintenance and improvements.
Operations: 2-3 FTE, ongoing operations (including D0 SAM shifters).
Data Grid: 3-4 FTE, research and development.
Analysis Tools: 3 FTE, design and development.

13 Mass Storage System

14 Enstore System and ISD
Enstore is the principal MSS used with SAM.
The Integrated Systems Dept. (ISD) is responsible for maintenance and upgrades to the system. ISD is also responsible for FNAL MSS hardware, including robotics, drives, and tapes.
Security enhancements are planned at both the admin and file transfer levels.
Code will be modernized as needs and practices evolve.
We will continue to investigate replacing tape with disk as that technology becomes cost effective and maintainable.

15 dCache ISD is working with DESY to provide a disk cache buffering system called dCache. The system "front-ends" the current tape systems and is consistent with our computing model. Dzero will benefit from this in a couple of obvious ways:
An interface to a standard file transfer protocol (like GridFTP) will reduce the need to route data through FNAL stations to remote sites.
The intermediate cache will reduce tape access for "hot" data coming from the online system and/or the reco farms.
For a small cost, this should represent a significant performance improvement.
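The benefit of a "front-end" cache is easiest to see as read-through logic: serve a file from disk when it is already cached, and go to tape only on a miss. The sketch below is a conceptual illustration only, not dCache's interface or implementation; the paths and the fetch callback are hypothetical.

```python
# Conceptual sketch of a read-through disk cache in front of a tape system.
import os
import shutil

def read_through(filename, cache_dir, fetch_from_tape):
    """Return a local path for `filename`, staging from tape only on a miss.

    fetch_from_tape: callable(filename, dest_path) that copies the file out
    of the tape system; here it stands in for an Enstore/dCache transfer.
    """
    cached = os.path.join(cache_dir, filename)
    if os.path.exists(cached):          # cache hit: no tape mount needed
        return cached
    os.makedirs(cache_dir, exist_ok=True)
    fetch_from_tape(filename, cached)   # cache miss: one tape access
    return cached

# "Hot" files (e.g. fresh data from online or the reco farm) are fetched
# from tape once and then served repeatedly from disk. Example callback:
def fake_tape_copy(name, dest):
    shutil.copy(os.path.join("/tape-sim", name), dest)  # hypothetical path
```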

16 Tape Robot Enstore Data Rates
[Plot of Enstore data rates: ADIC robot with LTO drives, about 450 GB/day; STK robot with 9940 drives, about 1.6 TB/day.]

17 Summary of Tape Technologies
Type: number written, bad tapes, total stored, comment.
9940: 795 written, 1 bad (file), 43 TB stored; problem files recovered.
LTO: 198 written, 17 TB stored.
M2: 567 written, dozens bad, 26 TB stored; retired.
M1: 1050 written, 14 TB stored.

18 Data Stored in SAM last 12 Months
Number of files: 450,000
Data size: 100 TB
Number of events: 400M (~120M RAW)

19 Storage Roadmap
Current tape costs are about $1/GB, and Commodity Off The Shelf (COTS) IDE disk is about $2/GB. Price trends indicate that tape will drop in price per GB by a factor of 2 every two years, while COTS disk prices will drop by a factor of 2 every year. Operation and deployment costs may be high for large storage facilities.
Even if COTS disk is not the primary location for data, there will be very large disk caches.
We will continue to use and buy STK or ADIC robots as needed. Older data will be shelved or copied to newer, more dense media.
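A small worked projection of the quoted trends (tape starting at ~$1/GB and halving every two years, COTS disk starting at ~$2/GB and halving every year, media cost only) shows when disk would reach price parity with tape; the 2002 baseline year is an assumption.

```python
# Project media cost per GB from the trends stated above and report the
# first year COTS disk matches tape (operations/deployment costs excluded).
tape_cost, disk_cost = 1.0, 2.0   # $/GB at year 0 (assumed 2002 baseline)
for year in range(0, 9):
    print(f"year {2002 + year}: tape ${tape_cost:.2f}/GB, disk ${disk_cost:.2f}/GB")
    if disk_cost <= tape_cost:
        print("  -> COTS disk reaches price parity with tape here")
        break
    tape_cost = 1.0 * 0.5 ** ((year + 1) / 2)   # halves every two years
    disk_cost = 2.0 * 0.5 ** (year + 1)         # halves every year
```

Under these assumptions the two curves meet around 2004, which is consistent with the slide's expectation of very large disk caches even while tape remains the primary store.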

20 Cartridge Size and Media Cost
[Table: projected cartridge capacity (GB) and media cost ($/GB) for STK, LTO, and COTS disk across 2003, 2005, 2007, and 2009.]

21 Drive Rates and Cost
Projected per-drive cost for 2003, 2005, 2007, and 2009: STK roughly $40k, LTO roughly $10k, and COTS disk roughly $200 in each year.
STK drive rates: 20 MB/s in 2003, rising to 40 MB/s.

22 Robot Capacities
Per-unit figures (tape slots, drive slots, mounts per hour, k$ per unit):
STK Powderhorn: 5500 tape slots, 20 drive slots, mounts per hour N/A, about $75k per unit.
ADIC AML/2: 3500 tape slots, 150, 300.
[Chart: projected robot capacity (TB) and throughput (MB/s) for STK and ADIC AML/2, 2003-2009.]

23 Database

24 Database Approach We rely on a central database model, with a highly available Oracle server located at FNAL serving all offline on-site and off-site needs. An 8-processor Sun 4500 is employed with a 1.2 TB RAID disk array. This has been very reliable. A three-tier software architecture is employed, with the middle tier called the "DB server".

25 Database Applications
Calibration (Taka Yasuda): About half of the applications are in production. Taka is coordinating with the individual application developers on delivery, testing, etc. We expect work here as more data validation is done and as the access profile increases.
Offline Luminosity and Streams (Michael Begel, Jeremy Simmons, Greg Landsberg; Analysis Tools Group): The offline luminosity application is not yet written. Streaming has not started, nor has the application to access the streaming information. This is a fairly significant project that needs the experiment's requirements and design work to be completed.
L1, L2, L3 Trigger (Elizabeth Gallas): Development will likely continue for the next 9 months. Reporting and physicist-friendly interfaces will be needed after that time. A further round of work can be expected for the Run IIb upgrades.

26 Database Applications (cont.)
SAM File and Event Catalog (Lee Lueking): General maintenance and updates.
Run Summary and Configuration (Vladimir Sirotenko, Jeremy Simmons): An upgrade to this application is currently underway in support of data quality and validation. It is anticipated to take another few months to complete.
VLPC Calibration (Volker Buescher): This is now the responsibility of the online database group.
RCP (Steve White, Marc Paterno): The database side of this project is in abeyance. It is anticipated that no further work will be needed here.
Speakers Bureau (Elizabeth Gallas): Completed.
Release Request (Harry Melanson): Completed. May need some tweaking as release procedures change.

27 Database Application Storage Requirements
Estimated sizes for two years of Run IIa:
Offline Calibration Top Level: 40 MB
Offline Calibration: 90 GB
Offline Muon: 30 GB
Offline CFT: 14 GB
Offline CPS: 2 GB
Offline FPS: 8 GB
Offline FPD: unknown
Offline Luminosity and Streams: 200 GB
L1, L2, L3 Trigger
SAM File and Event: 700 GB
Speakers Bureau: 800 MB
VLPC Calibration: 7 GB
Run Configuration: 105 GB
Total: 1.15 TB

28 Dzero DB Server
The DB server represents the middle tier of a three-tier architecture. Reference material available at
[Diagram: multiple clients connect via CORBA to the DB server (Python), which connects via SQL*Net to the Oracle database.]
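A minimal sketch of the three-tier idea follows: the client calls named operations on a middle-tier server and never issues SQL itself, so schema changes stay behind the server. In the real system the client-to-server hop is CORBA, the server is Python, and the database is Oracle reached over SQL*Net; the class, method, and table below are purely illustrative, with SQLite standing in for Oracle.

```python
# Illustrative three-tier sketch: thin client -> middle-tier DB server -> DB.
import sqlite3   # stand-in for Oracle/SQL*Net on the server side

class CalibDbServer:
    """Middle tier: owns the database connection and all of the SQL."""
    def __init__(self, conn):
        self._conn = conn

    def get_gains(self, run_number):
        # Only the server knows the table layout; clients just see this call.
        cur = self._conn.execute(
            "SELECT channel, gain FROM gains WHERE run = ?", (run_number,))
        return dict(cur.fetchall())

class CalibClient:
    """Thin client: in the real system this proxy call goes over CORBA."""
    def __init__(self, server):
        self._server = server

    def gains_for_run(self, run_number):
        return self._server.gains_for_run if False else self._server.get_gains(run_number)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gains (run INTEGER, channel INTEGER, gain REAL)")
conn.execute("INSERT INTO gains VALUES (160123, 1, 1.02)")
print(CalibClient(CalibDbServer(conn)).gains_for_run(160123))
```

Because the client never holds an Oracle session, only the DB server consumes a database login, which is where the license savings and scalability listed on the next slide come from.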

29 What Are the Pros and Cons of the 3-Tier Architecture?
Advantages:
Significantly reduces the required number of concurrent client licenses (significant savings when using a commercial database).
Enables any internet-connected node to run client processes.
Insulates the client from changes in the DB schema, and vice versa.
Enables client applications to be coded in any language.
A much more scalable approach: 1) customized caching and other services can be built into the middle tier; 2) DB servers can be operated on "cheap" hardware.
Disadvantages:
A complex environment.
Debugging at all tiers is more complicated.
Adds an additional level of abstraction for developers and users.
Reduces the effectiveness of Oracle server caching and optimization.

30 Where Do We Use DB Servers?
SAM-specific servers:
SAM user (serves SAM users)
SAM prd (serves most SAM stations)
SAM web (dedicated to the web-based dataset editor)
DLSAM (dedicated to the online station)
Central-Analysis (dedicated to the CA station)
D0 Farm (dedicated to the D0 FNAL processing farm)
Offline Trigger DB
Offline Luminosity
Generalized servers:
Offline Calibration (many)
Offline Run Config

31 TNG DB Server Project
True multi-threading supports multiple simultaneous clients.
The number of active logins to the database is configurable (database pool); idle connections are dropped.
Memory (L1) and persistent (L2) server-side object caching.
Additional monitoring.
Dynamic configuration.
Simplification of the D0om interface: pass data only (no CORBA objects).
Provides user identification and tracking.
More object oriented and more easily maintainable.
A proxy feature will allow servers to be connected together to provide maximum reliability and scalability of the overall system.
Jim Kowalkowski and Steve White have done a full evaluation and design and are now working on the implementation. We expect to have a first working version May 31 (no L2 cache or proxy), with all features in July.
This fully functional DB server is extremely important to maintaining our central DB model!
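As an illustration of the "database pool" feature listed above (a configurable number of active logins, with idle connections dropped), here is a hedged Python sketch; it is not the TNG DB server code, and the names, pool size, and timeout are assumptions.

```python
# Conceptual connection pool: cap concurrent database logins and drop
# connections that have sat idle longer than a timeout.
import time
import threading

class ConnectionPool:
    def __init__(self, connect, max_active=4, idle_timeout=300.0):
        self._connect = connect          # callable that opens a real DB login
        self._max_active = max_active    # configurable pool size
        self._idle_timeout = idle_timeout
        self._idle = []                  # list of (connection, last_used)
        self._active = 0
        self._lock = threading.Lock()

    def acquire(self):
        with self._lock:
            self._drop_stale_idle()
            if self._idle:
                conn, _ = self._idle.pop()
            elif self._active < self._max_active:
                conn = self._connect()
            else:
                raise RuntimeError("pool exhausted; client should retry")
            self._active += 1
            return conn

    def release(self, conn):
        with self._lock:
            self._active -= 1
            self._idle.append((conn, time.time()))

    def _drop_stale_idle(self):
        now = time.time()
        keep = []
        for conn, last_used in self._idle:
            if now - last_used > self._idle_timeout:
                conn.close()             # drop the idle database login
            else:
                keep.append((conn, last_used))
        self._idle = keep
```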

32 Database Hardware and Software
Ongoing Oracle licensing and support; Oracle layered products (OEM, Designer, etc.).
Hardware:
Servers: the load on DB server machines will increase as Run IIa picks up, probably requiring ~$60k/yr in upgrades. Replace servers at the beginning of Run IIb for ~$300k, including software.
Disk: growth estimated as high as 1 TB/yr in Run IIb; probably ~$50k/yr.
Backup: included in the server upgrade.

33 Database Manpower
Database administration: 1.5 FTE. Increase to 2-3 FTE for several months for a significant upgrade in infrastructure.
Application infrastructure: 1 FTE. 2-3 FTE until the end of 2002, shared with the SAM and remote access projects.
Monitoring of the information: 0.2 FTE.

34 Conclusion
The Dzero data handling system is used heavily and has been reliable.
The distributed data handling model has proven sound, and we expect to scale it to meet most Run IIa needs.
SAM is being used to manage several kinds of compute hardware for various processing needs, including processing farms and analysis clusters. This will continue and be expanded in Run IIb.
The SAM DH system will become part of an overall computing grid that is being developed and will enable easy and efficient use of compute resources around the world. This system is being built with "standard Grid middleware" components for universal compatibility with other experiments.
The current tape technologies are working well, should be adequate for Run IIa, and will evolve to meet the needs of Run IIb.
We will continue to employ Oracle for critical and high-availability database applications. The database server technology we have developed is a key component in making this a scalable and robust solution for a system with worldwide deployment.

