Access to HEP conditions data using FroNtier: A web-based database delivery system Lee Lueking Fermilab International Symposium on Grid Computing 2005.


ISGC - HEP Conditions DB Access, April 26, 2005

Credits
- Fermilab, Batavia, Illinois: Sergey Kosyakov, Jim Kowalkowski, Dmitri Litvintsev, Lee Lueking, Marc Paterno, Stephen White
- Johns Hopkins University, Baltimore, Maryland: Barry Blumenfeld, Petar Maksimovic

Outline
- Introduction to the HEP conditions DB environment
- FroNtier project details and experience at the CDF experiment
- Possible use of FroNtier for CMS conditions data

The HEP Database Environment
- Databases maintain information about the detector’s operation, and the details needed for calibration and alignment of its many sub-systems.
- This information is needed online, in real time, to operate the detector, and offline to understand the physics content of the “raw signal” data coming from the detector.
- The offline environment depends on Grid computing to provide the resources needed to process and analyze the complex signal data at processing centers worldwide.
- A highly distributed database delivery system is needed to accompany the Grid computing machinery.

What Are “Conditions” Data?
- Monitoring
  – Sensor channel values (HV, LV, temperature, pressure, ...) move independently of each other in time. They carry monitoring information about the detector.
  – The data are collected in real time and have a single “version”.
- Calibration
  – Needed to understand the response of detector channels to signal input.
  – “Algorithms” are used to create the data. More than one algorithm might be stored, and each might have multiple “versions”.
- Alignment
  – Precise alignment of the detector components used for “particle track recognition” is essential.
  – The many sub-systems comprising the detector must be aligned relative to each other.
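The contrast between single-version monitoring data and multi-algorithm, multi-version calibration data can be made concrete with a small data model. This is a hypothetical Python sketch: the class names, the `(algorithm, version)` key, the channel and algorithm names, and the "highest version wins" lookup rule are assumptions for illustration, not the CDF or CMS schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MonitoringReading:
    """One sensor value at one time; monitoring data has a single version."""
    channel: str      # e.g. "HV", "LV", "Temp" (illustrative channel names)
    timestamp: float  # time of the reading (epoch and units assumed)
    value: float


@dataclass
class CalibrationStore:
    """Hypothetical store: several algorithms, each with several versions."""
    # (algorithm name, version number) -> calibration payload
    payloads: dict = field(default_factory=dict)

    def add(self, algorithm: str, version: int, payload: bytes) -> None:
        self.payloads[(algorithm, version)] = payload

    def latest(self, algorithm: str) -> bytes:
        # Return the payload with the highest version for this algorithm.
        versions = [v for (a, v) in self.payloads if a == algorithm]
        if not versions:
            raise KeyError(f"no calibration stored for {algorithm!r}")
        return self.payloads[(algorithm, max(versions))]


if __name__ == "__main__":
    store = CalibrationStore()
    store.add("pedestal_fit", 1, b"first pass")
    store.add("pedestal_fit", 2, b"reprocessed")
    print(store.latest("pedestal_fit"))  # b'reprocessed'
```

A real conditions store would also key calibrations by the time range of signal data they apply to; that dimension is left out here to keep the versioning idea visible.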

Characteristics of Conditions Data
- For the CDF detector, conditions data objects vary in size from a few bytes to a few MB.
- For the CMS detector, conditions data objects vary in size from a few hundred bytes to a few hundred MB.
- In general:
  – The frequency of access for a data object depends on the kind of object and on what the requesting application is doing.
  – It is very likely that the same object will be accessed by multiple processing applications working on signal data taken at similar times.

Database Access Requirements
- Thousands of clients distributed at processing centers worldwide.
- A high likelihood that objects cached at each center will be reused by many clients.
- High availability for database access.
- Stateless servers are strongly preferred over database replicas, which carry higher administrative overhead.
- Security and access control that fit easily into the network: compute servers will be behind firewalls and on private networks.
- Decoupling the client API from the database schema is highly desirable, since it simplifies development and long-term maintenance of both.

How to Best Deliver Data Objects?
- A central database, replicated to one or two additional sites for redundancy if needed.
- Stateless “application” servers, configured for load balancing and failover, provide connection pooling to the DB.
- Stateless network components (proxy caching servers) at each Grid processing center provide access control and data caching.
- Grid jobs (clients) running on the Grid compute resources need outgoing access to the Internet through the proxy caching service.

The FroNtier Project
- Goal: assemble a toolkit, using standard web technologies, that provides high-performance, scalable database access through a stateless, multi-tier architecture.
- The pilot project, Ntier, tested the technology:
  – Tomcat, HTTP, Squid
  – Client monitoring with existing CDF tools (UDP messages)
- The FroNtier project was established to provide a production system for CDF and other interested users.

FroNtier Overview
[Figure: FroNtier architecture. Tiers: Client (FroNtier Client API Library), Caching (Squid proxy/caching server), FroNtier Server (FroNtier servlet running under Tomcat), and Database (or other persistency service), connected via HTTP and JDBC. Also shown: CDF Persistent Object Templates (Java), XML Server Descriptors, DDL for table descriptions, and C++ headers and stubs.]

The FroNtier Servlet
1. The client sends a request (URI).
2. The Command Parser translates the URI into commands plus values.
3. The Servicer Factory gets the XSD (XML Server Descriptor) from the database and
4. instantiates a Servicer.
5. The Servicer queries the database, and
6. the results are sent for encoding.
7. The Encoder marshals (serializes) the data to the requesting client.
[Figure: request flow among Client, Command Parser, Servicer Factory, Servicer, Encoder, the XSD database, and the calibration database.]
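The seven servlet steps can be sketched end to end in Python. This is a minimal illustration, not FroNtier's Java servlet: the URI layout, the dictionary standing in for the XSD database, the in-memory SQLite database standing in for the calibration database, the sample row values, and the JSON encoder are all assumptions made for the example.

```python
import json
import sqlite3
from urllib.parse import parse_qs, urlparse


def parse_command(uri: str):
    """Step 2: turn the request URI into an object type plus parameters.

    Assumes a hypothetical layout: ...?type=<ObjectName>&<key>=<value>
    """
    query = parse_qs(urlparse(uri).query)
    obj_type = query.pop("type")[0]
    params = {name: values[0] for name, values in query.items()}
    return obj_type, params


class Servicer:
    """Steps 4-5: executes the SQL described by one server descriptor."""

    def __init__(self, descriptor: dict, conn: sqlite3.Connection):
        self.descriptor = descriptor
        self.conn = conn

    def fetch(self, params: dict) -> list:
        d = self.descriptor
        # Table and column names come from the trusted descriptor; the
        # client-supplied value is bound as a parameter, never interpolated.
        sql = f"SELECT {d['select']} FROM {d['from']} WHERE {d['where']} = ?"
        return self.conn.execute(sql, (params[d["where"]],)).fetchall()


def servicer_factory(obj_type: str, xsd_store: dict, conn) -> Servicer:
    """Step 3: look up the descriptor for obj_type and build a Servicer."""
    return Servicer(xsd_store[obj_type], conn)


def encode(rows: list) -> bytes:
    """Step 7: serialize the rows (JSON here, purely for illustration)."""
    return json.dumps(rows).encode("utf-8")


def handle_request(uri: str, xsd_store: dict, conn) -> bytes:
    obj_type, params = parse_command(uri)                   # steps 1-2
    servicer = servicer_factory(obj_type, xsd_store, conn)  # steps 3-4
    rows = servicer.fetch(params)                           # steps 5-6
    return encode(rows)                                     # step 7


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE CalibRunLists (cid INTEGER, calib_run INTEGER, "
                 "calib_version INTEGER, data_status TEXT)")
    conn.execute("INSERT INTO CalibRunLists VALUES (7, 1234, 2, 'OK')")
    xsd_store = {"CalibRunLists": {"select": "calib_run, calib_version, data_status",
                                   "from": "CalibRunLists", "where": "cid"}}
    print(handle_request("http://server/Frontier?type=CalibRunLists&cid=7",
                         xsd_store, conn))
```

Keeping each stage in its own function mirrors the separation the slide describes: the parser knows nothing about SQL, and the encoder knows nothing about where the rows came from, so either can be swapped independently.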

FroNtier XML Server Descriptor (XSD)
- Object name and version information
- Response description
- The SQL mapping to the database:
  – Select statement
  – From statement
  – Where clause
  – Special modifiers (order by, etc.)
[Slide example: fields calib_run, calib_version, data_status; table CalibRunLists; key cid.]
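One way to picture how a descriptor drives the query is a function that reads the descriptor and assembles the SQL string. The XML element names below (`descriptor`, `select`, `from`, `where`, `modifiers`) are hypothetical, chosen to mirror the parts listed on the slide; the real XSD format is not reproduced here, and the response description is omitted. The table and column names are the ones from the slide example.

```python
import xml.etree.ElementTree as ET

# Hypothetical descriptor layout; element names are illustrative only.
EXAMPLE_XSD = """
<descriptor name="CalibRunLists" version="1">
  <select>calib_run, calib_version, data_status</select>
  <from>CalibRunLists</from>
  <where>cid = ?</where>
  <modifiers>ORDER BY calib_run</modifiers>
</descriptor>
"""


def descriptor_to_sql(xml_text: str):
    """Return (name, version, sql) assembled from a descriptor document."""
    root = ET.fromstring(xml_text.strip())
    parts = [
        "SELECT " + root.findtext("select").strip(),
        "FROM " + root.findtext("from").strip(),
    ]
    where = root.findtext("where")
    if where:  # the where clause is optional in this sketch
        parts.append("WHERE " + where.strip())
    modifiers = root.findtext("modifiers")
    if modifiers:  # e.g. ORDER BY
        parts.append(modifiers.strip())
    return root.get("name"), int(root.get("version")), " ".join(parts)


if __name__ == "__main__":
    print(descriptor_to_sql(EXAMPLE_XSD))
```

Because the SQL lives in the descriptor rather than in client code, the schema can change without recompiling clients, which is the decoupling listed among the access requirements.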

FroNtier Use of Squid Cache
- HTTP proxy caching server:
  – Well documented, with widespread operational experience
  – Easily installed and maintained
  – Highly configurable for access control, disk cache tuning, distributed cache peer relationships, and more
  – Built-in monitoring through an SNMP-2 interface
- Cache refresh options:
  – Servlet: expiration time sent in the HTTP header
  – Client: forced object refresh through the request
  – Administrative: delete each Squid's cache files and rebuild the cache
- However, the objects being delivered generally do not change, so a static cache meets most requirements.
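The servlet and client refresh options map naturally onto standard HTTP/1.1 caching headers. A minimal Python sketch, assuming the servlet sets `Cache-Control: max-age` plus `Expires` on its responses and the client sends `Cache-Control: no-cache` (with `Pragma: no-cache` for HTTP/1.0 caches) to force a refresh; the function names are hypothetical.

```python
from email.utils import formatdate


def servlet_cache_headers(max_age_s: int, now: float) -> dict:
    """Response headers telling proxies how long the object stays fresh.

    max_age_s is a deployment choice (assumed); now is a Unix timestamp.
    """
    return {
        "Cache-Control": f"max-age={max_age_s}",
        # Expires is the HTTP/1.0 equivalent, as an RFC 1123 GMT date.
        "Expires": formatdate(now + max_age_s, usegmt=True),
    }


def forced_refresh_headers() -> dict:
    """Request headers asking caches to revalidate with the origin server."""
    return {"Cache-Control": "no-cache", "Pragma": "no-cache"}


if __name__ == "__main__":
    import time
    print(servlet_cache_headers(3600, time.time()))
    print(forced_refresh_headers())
```

Since most conditions objects never change once written, a long lifetime is the natural default; the forced refresh covers the rare case where a stored object is corrected.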

FroNtier Client API Features
- Compatible with C and C++
- Portable
  – 32- and 64-bit systems tested
- Transparent object access
  – Type conversion detection
  – Preserves data integrity
- Multi-object requests
- Easy runtime configuration
- Extensive error reporting
  – Adjustable log levels
[Figure: User application → FroNtier API → FroNtier Service.]
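Runtime configuration, adjustable logging, and multi-object requests might look like the following on the client side. This Python sketch is hypothetical throughout: the environment variable names `CONDDB_SERVER_URL` and `CONDDB_LOG_LEVEL`, the logger name, the default URL, and the `typeN`/`keyN` query layout are invented for illustration and are not the FroNtier client's actual interface.

```python
import logging
import os
from urllib.parse import urlencode

log = logging.getLogger("conddb_client")  # hypothetical logger name


def load_config(env=os.environ) -> dict:
    """Read runtime settings from (hypothetical) environment variables."""
    level_name = env.get("CONDDB_LOG_LEVEL", "WARNING").upper()
    return {
        "server": env.get("CONDDB_SERVER_URL", "http://localhost:8000/Frontier"),
        # Unknown level names fall back to WARNING.
        "log_level": getattr(logging, level_name, logging.WARNING),
    }


def multi_object_url(server: str, requests: list) -> str:
    """Pack several (object_type, key) requests into one GET URL.

    One request for several objects means one round trip through the
    Squid cache instead of many. The query layout is an assumption.
    """
    params = []
    for i, (obj_type, key) in enumerate(requests):
        params.append((f"type{i}", obj_type))
        params.append((f"key{i}", str(key)))
    url = f"{server}?{urlencode(params)}"
    log.debug("built request URL %s", url)
    return url


if __name__ == "__main__":
    cfg = load_config()
    logging.basicConfig(level=cfg["log_level"])
    print(multi_object_url(cfg["server"],
                           [("SiChipPed", 42), ("SvxBeamPosition", 42)]))
```

One design consequence worth noting: Squid keys its cache on the full request URL, so each distinct combination of objects becomes its own cache entry. Batching pays off most when many clients ask for the same combination.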

CDF FroNtier Testing at FNAL/SDSC (San Diego Supercomputer Center)
[Figure: test setup: FNAL Launchpad, SDSC Squid, SDSC CAF.]
- SiChipPed (silicon chip pedestal) objects are usually about 0.5 MB, and up to 1.7 MB, in size.
- SvxBeamPosition (silicon tracker beam position) objects are 502 bytes.
- Beyond faster access, much of the real saving comes from the reduced load on the database.
[Figure: access times (s, log scale) for SiChipPed and SvxBeamPosition, direct Oracle vs. FroNtier.]

CDF “Launchpad” at FNAL
- Four general processing nodes:
  – CPU: dual 2.4 GHz
  – Memory: 2 GB
  – Disk: 100 GB
  – NIC: Gigabit Ethernet
- The main entry Squid distributes requests across the Tomcat servers in round-robin fashion.
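Round-robin dispatch hands each new request to the next backend in a fixed rotation, which spreads load evenly when requests cost roughly the same. On the Launchpad this is done by the entry Squid itself; the Python sketch below only illustrates the policy, and the backend names (one Tomcat per node) are hypothetical.

```python
from itertools import cycle


class RoundRobinDispatcher:
    """Give each incoming request to the next backend in a fixed rotation."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("need at least one backend")
        self._next = cycle(backends)  # endless iterator over the backends

    def pick(self) -> str:
        return next(self._next)


if __name__ == "__main__":
    d = RoundRobinDispatcher(["tomcat1:8080", "tomcat2:8080",
                              "tomcat3:8080", "tomcat4:8080"])
    for _ in range(5):
        print(d.pick())
```

Round robin keeps no per-server state, which fits the stateless design; its weakness is that it ignores how busy each server currently is.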

CDF FroNtier Status
- The client library is included in CDF production code.
- DB access includes calibration, trigger, and other conditions information.
- Extensive validation confirms that data obtained with direct Oracle access is the same as data obtained via FroNtier.
- Squid is deployed at CDF processing centers in San Diego (SDSC), Bologna (CNAF), Karlsruhe (GridKa), Toronto, Rutgers, and MIT.
- FroNtier is still being phased in, but activity is increasing rapidly.
[Figure: Launchpad activity for the last week: SNMP data for data throughput on the Fermilab Squid server (kB/s).]
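A validation of this kind could compare the rows returned by the two access paths. The sketch below is an assumed approach, not the procedure CDF used: it reduces each result set to a SHA-256 digest of its rows in sorted order, so results gathered in separate jobs can be compared by digest alone, and a change of type (for example an integer arriving as a float) shows up as a mismatch.

```python
import hashlib


def payload_digest(rows) -> str:
    """Digest of a result set, independent of row order."""
    h = hashlib.sha256()
    # repr() keeps type information, so 1 and 1.0 produce different digests.
    for row in sorted(repr(r) for r in rows):
        h.update(row.encode("utf-8"))
        h.update(b"\n")  # separator so row boundaries are unambiguous
    return h.hexdigest()


def same_payload(direct_rows, frontier_rows) -> bool:
    """True when both access paths returned identical rows (any order)."""
    return payload_digest(direct_rows) == payload_digest(frontier_rows)


if __name__ == "__main__":
    direct = [(1, "pedestal", 0.25), (2, "pedestal", 0.5)]
    via_cache = [(2, "pedestal", 0.5), (1, "pedestal", 0.25)]
    print(same_payload(direct, via_cache))
```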

CMS FroNtier
- CMS is interested in using the FroNtier approach for offline and possibly some online DB access.
- Offline requirements for DB access include delivering large (several hundred MB) data “objects” to computing resources distributed worldwide.
- Online needs include the High Level Trigger (HLT) farm, with large objects and high demands on the cache.

CMS HLT: Challenging Environment
- The High Level Trigger farm is a very interesting environment:
  – 1000 nodes, running ~4000 processes
  – Object sizes range up to several hundred MB
  – Near-real-time demands for caching new objects
- It has not yet been established that the FroNtier approach will be used; however, it is attractive.
- Concerns:
  – Will performance be sufficient for large data objects?
  – Is reliability sufficient under heavy load?
  – What are the hardware and configuration needs?
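A rough, illustrative calculation shows why the cache demand is high. Assume, hypothetically, that every one of the ~4000 HLT processes needs one new 500 MB object (a size within the “several hundred MB” range quoted above). Let $V$ be the total volume delivered to the farm, either with each process fetching its own copy or with the processes on a node sharing one copy:

```latex
\begin{align*}
\text{processes per node} &= 4000 / 1000 = 4, \\
V_{\text{per process}} &= 4000 \times 0.5\ \text{GB} = 2000\ \text{GB}, \\
V_{\text{per node}} &= 1000 \times 0.5\ \text{GB} = 500\ \text{GB}.
\end{align*}
```

In both cases the caching tier, not the database, has to carry this volume; with a Squid in front, the database ideally sees each object requested only once.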

Initial Squid Tests
- Attempting to use a large Squid memory cache fails miserably:
    cache_mem 256 MB
    maximum_object_size_in_memory 256 MB
- The memory cache is evidently not designed to work with big objects.
- Performance is much better when the Squid memory cache is NOT used. In this test, cache_dir was created on an XFS disk partition, and results were 7 to 10 MB/s better than with Ext2. With large RAM and good disk hardware, XFS can perform even better.

Evaluation Summary
- Squid performs very well with big objects, showing no decrease in performance.
- Attempts to improve performance by putting big objects into memory reduce performance dramatically.
- For big objects, Squid's performance is limited only by the I/O subsystem.
- Performance can be improved with good I/O hardware and software, e.g. fast SCSI RAID in striping mode and a non-journaling file system.

Using a Memory File System
- A memory-based file system could be a very good solution for the online HLT farm and other high-demand environments.
- It is fast, cheap, and virtually maintenance-free (memfs regenerates itself on each OS restart).
- Bigger data, if needed, could be handled with bigger or multiple memfs systems, but for sizes above 3 GB a 64-bit OS could be needed.
- Configuration:
  – The Squid cache_dir points to a memory-based file system of 1200 MB.
  – The memfs is large enough to hold two calibration objects of 512 MB each, plus bookkeeping data.
  – The hard drive is used for log files only.
[Figure: initial memory loading; Gbit network throughput.]
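The memfs sizing can be checked directly: two 512 MB calibration objects leave this much of the 1200 MB file system for Squid's bookkeeping data:

```latex
\[
1200\ \text{MB} - 2 \times 512\ \text{MB} = 176\ \text{MB}.
\]
```

The 3 GB threshold is consistent with the usual 32-bit limit: a 32-bit address space spans $2^{32}$ bytes $= 4$ GiB, of which a typical 32-bit Linux configuration leaves about 3 GiB to user processes.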

Summary
- HEP conditions databases are essential to the operation of the particle detectors and needed for understanding the physics data.
- FroNtier is a multi-tier architecture providing high-throughput, low-latency, scalable access to a persistent store such as a database.
- The CDF DB access framework has been adapted to use the FroNtier approach. It is in production, and users are enthusiastic about the advantages it provides.
- CMS is interested in using the FroNtier approach for offline, and possibly some online, DB access. Evaluations are underway to understand how the system will perform, and results are promising.
