Dzero Data Handling and Databases

Lee Lueking, Dzero Computing Review, May 9, 2002

Outline: Overview, Data Handling Software, Mass Storage System, Databases, Conclusions.

Dzero Data Handling and Processing Architecture (diagram): the Online Data Logger, Reconstruction Farm station, Central Analysis station, CLuED0 Analysis Cluster station, miscellaneous analysis stations, and off-site analysis and processing stations are connected through a high-capacity switch to Enstore mover nodes, STK silos, and an ADIC tape robot.

The Anatomy of the D0 Data Handling, Database, and Data Grid
Core (lead: Lee Lueking): maintenance, production operation, user support for the D0 experiment, adding needed D0 features, coordinating remote analysis sites; shift help from the D0 experiment.
Physics Analysis Tools (lead: Wyatt Merritt): luminosity, physics data streams, interfaces for the D0 framework.
Grid (lead: Igor Terekhov): evaluate and implement standard Grid components, create and collect use cases, plan and design job control, monitoring, information system, etc.; coordinate D0GRID and PPDG efforts.
Database (lead: Ruth Pordes): support and coordination for D0 database applications, including SAM core, analysis tools, and Grid support.

Data Handling Software

Components of a SAM Station (diagram): producers/consumers and project managers, temp disk and cache disk, the station and cache manager, file storage server, file storage clients, file stagers, and eworkers; data flows to and from the MSS or other stations, with separate data-flow and control paths.

SAM as a Distributed System (diagram): globally shared components (name server, database server(s) with the central database, global resource manager(s), log server) and locally shared components at each site (station 1..n servers, mass storage system(s)); arrows indicate control and data flow.

Data to and from Remote Sites
Station configuration: replica location preferences (prefer/avoid lists).
Forwarding: file stores can be forwarded through other stations.
"Routing" (now): parasitic stagers on D0mino; working for Imperial, Lancaster, BU, Arizona, Wuppertal, and Columbia.
Extra-domain transfers currently use bbftp (a parallel transfer protocol).
(Diagram: a remote SAM station and its MSS exchange data with SAM stations 1 through 6.)
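To make the replica prefer/avoid lists and station-to-station forwarding concrete, here is a minimal hypothetical sketch in Python; the station and file names, the choose_source helper, and the plan_transfer function are illustrative assumptions, not the actual SAM interfaces.

```python
# Hypothetical sketch of replica selection with prefer/avoid lists and
# forwarding through a routing station. Illustrative only, not the SAM API.
from typing import List, Optional


def choose_source(replica_sites: List[str],
                  prefer: List[str],
                  avoid: List[str]) -> Optional[str]:
    """Pick the station to copy a file from, honoring prefer/avoid lists."""
    preferred = [s for s in replica_sites if s in prefer]
    acceptable = [s for s in replica_sites if s not in avoid]
    if preferred:
        return preferred[0]
    if acceptable:
        return acceptable[0]
    return None


def plan_transfer(file_name: str, source: str, destination: str,
                  routing_station: Optional[str] = None) -> List[str]:
    """Return the hop sequence for a file; forward through a routing
    station (e.g. a stager on d0mino) when no direct path is used."""
    if routing_station and routing_station not in (source, destination):
        hops = [source, routing_station, destination]
    else:
        hops = [source, destination]
    print(f"{file_name}: " + " -> ".join(hops))
    return hops


# Example: a remote station pulls a file, preferring the central-analysis
# cache and routing through d0mino (file and station names are made up).
src = choose_source(["central-analysis", "fnal-farm"],
                    prefer=["central-analysis"], avoid=["fnal-farm"])
plan_transfer("raw_run123456_001.dat", src, "lancaster", routing_station="d0mino")
```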

Dzero SAM Deployment Map (figure): SAM stations deployed worldwide, marked as processing centers and analysis sites.

Future SAM and D0 Grid
The station cache manager and file storage server components will be merged to improve cache management and streamline routing behavior.
We are moving toward less centralized operation in which stations have site autonomy: a station can lose contact with the central database server and still function, and the naming service will also be distributed.
There will be a natural progression to "standard grid middleware" components. Current Grid work involves understanding job scheduling with Condor-G and the use of Globus tools including the Grid Security Infrastructure (GSI), Grid Resource Allocation and Management (GRAM), the Meta Directory Service (MDS), and GridFTP.
Grid work is done in conjunction with the Particle Physics Data Grid (PPDG) collaboration.
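As a rough illustration of the Condor-G/GRAM submission path, the sketch below generates a Globus-universe submit description and hands it to condor_submit; the gatekeeper host, job manager, and executable names are placeholders, and the real SAM-Grid/JIM submission machinery is considerably more involved.

```python
# Hypothetical sketch: submit a job to a remote site through Condor-G,
# which talks to the Globus GRAM gatekeeper at that site. The host,
# jobmanager, and executable names below are made-up placeholders.
import subprocess
import tempfile


def submit_via_condor_g(gatekeeper: str, executable: str) -> None:
    submit_description = f"""\
universe        = globus
globusscheduler = {gatekeeper}
executable      = {executable}
output          = job.out
error           = job.err
log             = job.log
queue
"""
    with tempfile.NamedTemporaryFile("w", suffix=".sub", delete=False) as f:
        f.write(submit_description)
        submit_file = f.name
    # Requires a local Condor-G installation; shown only to indicate the
    # shape of the submission step.
    subprocess.run(["condor_submit", submit_file], check=True)


if __name__ == "__main__":
    submit_via_condor_g("site.example.edu/jobmanager-pbs", "run_d0reco.sh")
```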

SAM as Part of the Grid (diagram): job management (a JDL job request goes to a request broker, with Condor MMS matchmaking and Condor-G job submission), an information service (resource information from MDS and Condor ClassAds), and at each site a gatekeeper (GRAM) in front of the data handling layer (SAM and the DH resource manager) and the local batch system.

Data Handling, Data Grid, and Analysis Tools Manpower
SAM Core: 4 FTE, maintenance and improvements.
Operations: 2-3 FTE, ongoing operations (including D0 SAM shifters).
Data Grid: 3-4 FTE, research and development.
Analysis Tools: 3 FTE, design and development.

Mass Storage System

Enstore System and ISD
Enstore is the principal MSS used with SAM. The Integrated Systems Department (ISD) is responsible for maintenance and upgrades of the system, and also for FNAL MSS hardware including robotics, drives, and tapes.
Security enhancements are planned at both the administrative and file-transfer levels, and the code will be modernized as needs and practices evolve.
We continue to investigate replacing tape with disk as the technology becomes cost effective and maintainable.

dCache
ISD is working with DESY to provide a disk cache buffering system called dCache. The system front-ends the current tape systems and is consistent with our computing model. Dzero will benefit in a couple of obvious ways:
An interface to a standard file transfer protocol (such as GridFTP) will reduce the need to route data through FNAL stations to remote sites.
The intermediate cache will reduce tape access for "hot" data coming from the online system and/or the reconstruction farms.
For a small cost, this should yield significant performance improvements.
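The benefit of a disk layer in front of tape can be illustrated with a small, hypothetical sketch; the class below is not the dCache interface, just a toy LRU cache that only falls back to a (slow) tape read on a miss.

```python
# Toy sketch of a disk cache front-ending a tape system: hot files are
# served from disk, and tape is only touched on a cache miss.
# Illustrative only; not the dCache API.
from collections import OrderedDict


class FrontEndCache:
    def __init__(self, capacity_files: int):
        self.capacity = capacity_files
        self.cache = OrderedDict()   # file name -> bytes (stand-in for disk)
        self.tape_reads = 0

    def _read_from_tape(self, name: str) -> bytes:
        self.tape_reads += 1         # expensive: mount, seek, stream
        return b"<data for %s>" % name.encode()

    def read(self, name: str) -> bytes:
        if name in self.cache:       # cache hit: no tape mount needed
            self.cache.move_to_end(name)
            return self.cache[name]
        data = self._read_from_tape(name)
        self.cache[name] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)   # evict least recently used
        return data


# "Hot" files read repeatedly (e.g. by the reco farm) hit tape only once.
cache = FrontEndCache(capacity_files=2)
for name in ["raw_001", "raw_002", "raw_001", "raw_001", "raw_002"]:
    cache.read(name)
print("tape reads:", cache.tape_reads)   # 2 instead of 5
```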

Enstore Data Rates (plots): daily tape-robot transfer rates for the ADIC robot with LTO drives and the STK robot with 9940 drives, at roughly 450 GB/day and 1.6 TB/day.

Summary of Tape Technologies
9940: 795 tapes written, 1 bad tape (a single file), 43 TB stored; problem files recovered.
LTO: 198 tapes written, 17 TB stored.
m2: 567 tapes written, dozens of bad tapes, 26 TB stored; retired.
m1: 1050 tapes written, 14 TB stored.

Data Stored in SAM over the Last 12 Months
Number of files: 450,000
Data size: 100 TB
Number of events: 400M (~120M raw)

Storage Roadmap
Current tape costs are about $1/GB and commodity off-the-shelf (COTS) IDE disk about $2/GB. Price trends indicate that tape will drop in price per GB by a factor of 2 every two years, while COTS disk prices will drop by a factor of 2 every year. Operation and deployment costs may be high for large storage facilities.
Even if COTS disk is not the primary location for data, there will be very large disk caches.
We will continue to use and buy STK or ADIC robots as needed. Older data will be shelved or copied to newer, denser media.

Cartridge Size and Media Cost (capacity and cost per GB by year)
STK:  2003: 120 GB, $0.65/GB; 2005: 250 GB, $0.30/GB; 2007: 500 GB, $0.15/GB; 2009: 1000 GB, $0.07/GB.
LTO:  2003: 200 GB, $0.50/GB; 2005: 400 GB, $0.25/GB; 2007: 800 GB, $0.12/GB; 2009: 1600 GB, $0.06/GB.
Disk (COTS): 2003: 200 GB, $1.00/GB; 2005: 800 GB, $0.25/GB; 2007: 3200 GB, $0.06/GB; 2009: 12800 GB, $0.015/GB.
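The cost projections in this table follow directly from the halving times quoted on the Storage Roadmap slide; the short sketch below reproduces that arithmetic (the starting prices and halving periods are taken from the slides, the function itself is just illustrative).

```python
# Reproduce the media-cost projections from the halving rules quoted on the
# Storage Roadmap slide: tape halves in $/GB every two years, COTS disk
# every year. Starting points are the 2003 figures from the table above.
def projected_cost_per_gb(start_cost: float, start_year: int,
                          target_year: int, halving_period_years: float) -> float:
    """Cost per GB after repeated halving over (target_year - start_year)."""
    halvings = (target_year - start_year) / halving_period_years
    return start_cost * 0.5 ** halvings


for year in (2003, 2005, 2007, 2009):
    tape = projected_cost_per_gb(0.65, 2003, year, halving_period_years=2)  # STK 9940
    disk = projected_cost_per_gb(1.00, 2003, year, halving_period_years=1)  # COTS IDE
    print(f"{year}: tape ~${tape:.2f}/GB, disk ~${disk:.3f}/GB")

# Output tracks the table: tape 0.65 -> ~0.33 -> ~0.16 -> ~0.08 $/GB,
# disk 1.00 -> 0.25 -> ~0.063 -> ~0.016 $/GB.
```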

Drive Rates and Cost (rate in MB/s and cost per drive, by year)
STK:  2003: 20 MB/s, $40k; 2005: 40 MB/s, $40k; 2007: 80 MB/s, $40k; 2009: 160 MB/s, $40k.
LTO:  2003: 20 MB/s, $10k; 2005: 40 MB/s, $10k; 2007: 80 MB/s, $10k; 2009: 160 MB/s, $10k.
Disk (COTS): 2003: 10 MB/s, $200; 2005: 40 MB/s, $200; 2007: 160 MB/s, $200; 2009: 640 MB/s, $200.

Robot Capacities
STK Powderhorn: 5500 tape slots per unit, 20 drive slots per unit, mounts/hour N/A, $75k per unit.
ADIC AML/2: 3500 tape slots per unit, 150 drive slots per unit, 300 mounts/hour.
Projected capacity and aggregate rate per robot:
STK:  2003: 660 TB, 400 MB/s; 2005: 1320 TB, 800 MB/s; 2007: 2640 TB, 1600 MB/s; 2009: 5280 TB, 3200 MB/s.
ADIC AML/2: 2003: 700 TB, 400 MB/s; 2005: 1400 TB, 800 MB/s; 2007: 2800 TB, 1600 MB/s; 2009: 5600 TB, 3200 MB/s.

Database

Database Approach
We rely on a central database model, with a highly available Oracle server located at FNAL serving all offline onsite and offsite needs.
An 8-processor Sun 4500 with a 1.2 TB RAID disk array is employed; it has been very reliable.
A three-tier software architecture is used, with the middle tier called the "DB server".

Database Applications
Calibration (Taka Yasuda): About half of the applications are in production. Taka is coordinating with the individual application developers on delivery, testing, etc. We expect more work here as more data validation is done and as the access profile increases.
Offline Luminosity and Streams (Michael Begal, Jeremy Simmons, Greg Landsberg, Analysis Tools Group): The offline luminosity application is not written. Streaming has not started, nor has the application to access the streaming information. This is a fairly significant project that needs the experiment requirements and design work to be completed.
L1, L2, L3 Trigger (Elizabeth Gallas): Development will likely continue for the next 9 months. Reporting and physicist-friendly interfaces will be needed after that time. A further round of work can be expected for the Run IIb upgrades.

Database Applications (continued)
SAM File and Event Catalog (Lee Lueking): General maintenance and updates.
Run Summary and Configuration (Vladimir Sirotenko, Jeremy Simmons): An upgrade to this application is currently underway in support of data quality and validation; it is anticipated to take another few months to complete.
VLPC Calibration (Volker Buescher): Now the responsibility of the online database group.
RCP (Steve White, Marc Paterno): The database side of this project is in abeyance; no further work is anticipated here.
Speakers Bureau (Elizabeth Gallas): Completed.
Release Request (Harry Melanson): Completed; may need some tweaking as release procedures change.

Database Application Storage Requirements (estimated size after 2 years of Run IIa)
Offline Calibration Top Level: 40 MB
Offline Calibration: 90 GB
Offline Muon: 30 GB
Offline CFT: 14 GB
Offline CPS: 2 GB
Offline FPS: 8 GB
Offline FPD: unknown
Offline Luminosity and Streams: 200 GB
L1, L2, L3 Trigger: (no estimate given)
SAM File and Event: 700 GB
Speakers Bureau: 800 MB
VLPC Calibration: 7 GB
Run Configuration: 105 GB
Total: 1.15 TB

Dzero DB Server (diagram): clients connect via CORBA to the DB server (written in Python), which talks to the Oracle database via SQLnet. The DB server represents the middle tier of a three-tier architecture. Reference material is available at http://d0db-dev.fnal.gov/db_server_gen
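For illustration only, here is a minimal three-tier sketch in Python: an XML-RPC service stands in for the CORBA DB server and SQLite stands in for Oracle/SQLnet, so the shape of the tiers is visible without reproducing the actual d0om or DB server interfaces.

```python
# Minimal three-tier sketch: client -> middle-tier server -> database.
# XML-RPC stands in for CORBA and SQLite for Oracle; this is NOT the
# actual D0 DB server code or API, just an illustration of the layering.
import sqlite3
from xmlrpc.server import SimpleXMLRPCServer

DB_FILE = "demo_calib.db"          # stand-in for the central Oracle instance


def setup_demo_database() -> None:
    con = sqlite3.connect(DB_FILE)
    con.execute("DROP TABLE IF EXISTS calib")
    con.execute("CREATE TABLE calib (channel INTEGER, gain REAL)")
    con.execute("INSERT INTO calib VALUES (1, 0.98), (2, 1.02)")
    con.commit()
    con.close()


class DbServer:
    """Middle tier: clients never hold a direct database connection."""

    def get_gains(self, channels):
        con = sqlite3.connect(DB_FILE)
        marks = ",".join("?" * len(channels))
        rows = con.execute(
            f"SELECT channel, gain FROM calib WHERE channel IN ({marks})",
            channels).fetchall()
        con.close()
        return rows                # plain data crosses the tier boundary


if __name__ == "__main__":
    setup_demo_database()
    server = SimpleXMLRPCServer(("localhost", 8000), allow_none=True)
    server.register_instance(DbServer())
    print("DB server (demo) listening on port 8000")
    server.serve_forever()
```

A client on any internet-connected node could then call, for example, xmlrpc.client.ServerProxy("http://localhost:8000").get_gains([1, 2]) without an Oracle client or license on that node, which is the main point of the middle tier.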

What Are the Pros and Cons of a 3-Tier Architecture?
Advantages:
Significantly reduces the number of concurrent client licenses required (a significant saving when using a commercial database).
Enables any internet-connected node to run client processes.
Insulates clients from changes in the DB schema, and vice versa.
Enables client applications to be coded in any language.
A much more scalable approach: (1) customized caching and other services can be built into the middle tier; (2) DB servers can be operated on "cheap" hardware.
Disadvantages:
A more complex environment; debugging at all tiers is more complicated.
Adds an additional level of abstraction for developers and users.
Reduces the effectiveness of Oracle server caching and optimization.

Where Do We Use DB Servers?
SAM-specific servers: SAM user (serves SAM users), SAM prd (serves most SAM stations), SAM web (dedicated to the web-based dataset editor), DLSAM (dedicated to the online station), Central-Analysis (dedicated to the CA station), D0 Farm (dedicated to the D0 FNAL processing farm).
Offline Trigger DB.
Offline Luminosity.
Generalized servers: Offline Calibration (many), Offline Run Config.

TNG DB Server Project
True multi-threading supports multiple simultaneous clients.
The number of active logins to the database is configurable (database connection pool); idle connections are dropped.
Memory (L1) and persistent (L2) server-side object caching.
Additional monitoring and dynamic configuration.
Simplification of the d0om interface: pass data only (no CORBA objects).
User identification and tracking.
More object oriented and more easily maintainable.
A proxy feature will allow servers to be connected together to provide maximum reliability and scalability of the overall system.
Jim Kowalkowski and Steve White have done a full evaluation and design and are now working on the implementation. We expect a first working version on May 31 (without the L2 cache or proxy), with all features in July.
This fully functional DB server is extremely important for maintaining our central DB model.
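The connection-pool and two-level cache ideas can be sketched as follows; the class names, sizes, and the pickle-file L2 cache are illustrative assumptions, not the TNG design.

```python
# Hypothetical sketch of two TNG-style features: a bounded database
# connection pool that drops idle connections, and a two-level (memory +
# persistent) result cache. Illustrative only; not the actual TNG code.
import os
import pickle
import queue
import sqlite3
import time


class ConnectionPool:
    def __init__(self, db_file: str, max_connections: int = 4,
                 idle_timeout_s: float = 300.0):
        self.db_file = db_file
        self.idle_timeout_s = idle_timeout_s
        self.pool = queue.Queue(maxsize=max_connections)

    def acquire(self) -> sqlite3.Connection:
        try:
            conn, last_used = self.pool.get_nowait()
            if time.time() - last_used > self.idle_timeout_s:
                conn.close()                       # drop idle connection
                raise queue.Empty
            return conn
        except queue.Empty:
            return sqlite3.connect(self.db_file)   # open a fresh login

    def release(self, conn: sqlite3.Connection) -> None:
        try:
            self.pool.put_nowait((conn, time.time()))
        except queue.Full:
            conn.close()                           # pool is bounded: close extras


class TwoLevelCache:
    """L1: in-memory dict. L2: persistent pickle file surviving restarts."""

    def __init__(self, l2_path: str = "dbserver_l2.cache"):
        self.l1 = {}
        self.l2_path = l2_path

    def get(self, key):
        if key in self.l1:
            return self.l1[key]
        if os.path.exists(self.l2_path):
            with open(self.l2_path, "rb") as f:
                l2 = pickle.load(f)
            if key in l2:
                self.l1[key] = l2[key]             # promote to L1
                return l2[key]
        return None

    def put(self, key, value) -> None:
        self.l1[key] = value
        l2 = {}
        if os.path.exists(self.l2_path):
            with open(self.l2_path, "rb") as f:
                l2 = pickle.load(f)
        l2[key] = value
        with open(self.l2_path, "wb") as f:
            pickle.dump(l2, f)
```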

Database Hardware and Software
Ongoing Oracle licensing and support, plus Oracle layered products (OEM, Designer, etc.).
Hardware:
Servers: load on the DB server machines will increase as Run IIa picks up, probably requiring ~$60k/yr in upgrades. Servers will be replaced at the beginning of Run IIb for ~$300k, including software.
Disk: growth estimated as high as 1 TB/yr in Run IIb, probably ~$50k/yr.
Backup: included in the server upgrade.

Database Manpower
Database administration: 1.5 FTE; will increase to 2-3 FTE for several months for a significant upgrade in infrastructure.
Application infrastructure: 1 FTE; 2-3 FTE until the end of 2002, shared with the SAM and remote-access projects.
Monitoring of the information: 0.2 FTE.

Conclusion
The Dzero data handling system is used heavily and has been reliable. The distributed data handling model has proven sound, and we expect it to scale to meet most Run IIa needs.
SAM is being used to manage several kinds of compute hardware for various processing needs, including processing farms and analysis clusters. This will continue and be expanded in Run IIb.
The SAM data handling system will become part of an overall computing grid that is being developed and will enable easy and efficient use of compute resources around the world. This system is being built with "standard Grid middleware" components for compatibility with other experiments.
The current tape technologies are working well, should be adequate for Run IIa, and will evolve to meet the needs of Run IIb.
We will continue to employ Oracle for critical and high-availability database applications. The database server technology we have developed is a key component in making this a scalable and robust solution for a system with worldwide deployment.