FroNTier/CMS
Lee Lueking, WLCG Workshop DB BoF, 22 Jan. 2007

Outline
Overview, Features, Deployment, Test and performance, Operational experience.
Acknowledgements: Barry Blumenfeld (JHU), David Dykstra (FNAL), Eric Wicklund (FNAL), and the POOL/CORAL team.

Overview
Servlets run under Tomcat on central servers. The DB connection is through JDBC, and connection management is provided. Clients access the servers via HTTP requests. Proxy caching servers (Squids) are deployed at Tier-0, Tier-1, and Tier-N sites.
[Diagram: Tier-N Squids and Tier-1 Squids (sharing via ICP) forward HTTP requests to the Tier-0 Squid(s) and Tomcat(s), which reach the DB via JDBC.]
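Because the access pattern is plain HTTP, a client-side fetch can be pictured as nothing more than a proxy-aware GET. A minimal sketch follows; the server and proxy host names and the query payload are hypothetical placeholders, not the actual CMS endpoints:

```python
import urllib.request

# Hypothetical endpoints: a site-local Squid proxy and the central FroNTier servlet.
PROXY = "http://squid.example-site.org:3128"
SERVER = "http://frontier.example-cern.ch:8000/Frontier/Frontier"

# Route the request through the local Squid; a cache hit never reaches Tier-0.
opener = urllib.request.build_opener(
    urllib.request.ProxyHandler({"http": PROXY})
)

# FroNTier queries are encoded as URL parameters; this payload is illustrative only.
url = SERVER + "?type=frontier_request:1:DEFAULT&encoding=BLOB&p1=EXAMPLE_QUERY"

with opener.open(url, timeout=30) as resp:
    payload = resp.read()          # response body as served (possibly from the Squid cache)
    print(resp.status, len(payload), "bytes")
```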

Recent Features
Client can request that the data be zipped by the server (compression levels 0-9).
Keep-alive signals are sent to the client when the database is busy, avoiding timeouts.
Client can use the same HTTP connection for multiple requests.
Server can insert an expiration time in the HTTP header for objects.
Significantly improved client performance.
Ported to 64-bit Linux.
Parameters can come in a long parenthesized connect string instead of environment variables.
A logical name can be defined in the long string so the POOL file catalog can use a short name.
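As a rough illustration of the parenthesized connect-string idea, such a string can be split into key/value pairs plus a logical name with a few lines of Python. The keys and exact syntax below are assumptions for the sketch, not the definitive FroNTier grammar:

```python
import re

# Hypothetical long connect string: server/proxy parameters plus a logical name.
connect = ("frontier://(serverurl=http://frontier.example-cern.ch:8000/Frontier)"
           "(proxyurl=http://squid.example-site.org:3128)"
           "(retrieve-ziplevel=5)/ExampleLogicalName")

# Pull out every (key=value) group, then take the trailing component as the logical name.
params = dict(re.findall(r"\((\w[\w-]*)=([^)]*)\)", connect))
logical_name = connect.rsplit("/", 1)[-1]

print(params["serverurl"], params["proxyurl"], params["retrieve-ziplevel"])
print("logical name:", logical_name)   # short name usable by the POOL file catalog
```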

FroNTier Launchpad Setup
Three servers running FroNTier and Squid sit behind a CERN round-robin DNS, serving the Tier-0 farm and worker nodes over the WAN.
Backend: Oracle Database 10gR2 (4-node RAC), providing load balancing and failover.

FroNTier “Launchpad” software
Squid caching proxy: load shared with round-robin DNS, configured in “accelerator mode”, peer-to-peer caching, “wide open frontier”*.
Tomcat: standard FroNTier servlet, distributed as a “war” file; unpack it in the Tomcat webapps dir, change 2 files if the name is different, and one XML file describes the DB connection.
[Diagram: round-robin DNS in front of server1, server2, server3; each runs Squid, Tomcat, and the FroNTier servlet, all backed by the DB.]
*In the past, we required registration so we could add the IP/mask to our Access Control List (ACL) at CERN. Recently we decided to run in “wide-open” mode so installations can be tested without registration.
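Load sharing via round-robin DNS simply means the launchpad name resolves to several A records and clients spread themselves across them. A small sketch of that behaviour, assuming a hypothetical host name:

```python
import random
import socket

# A round-robin DNS name returns several A records, one per launchpad node.
addrs = {info[4][0] for info in
         socket.getaddrinfo("frontier-launchpad.example-cern.ch", 8000,
                            proto=socket.IPPROTO_TCP)}

# Each client picks one of the returned addresses; over many clients the
# requests spread roughly evenly across server1/server2/server3.
chosen = random.choice(sorted(addrs))
print("resolved:", sorted(addrs))
print("this client will use:", chosen)
```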

CMS Software Stack
POOL-ORA (Object Relational Access) is used to map C++ objects to a relational schema. A CORAL-FroNTier plugin provides read-only access to the POOL DB objects via FroNTier.
[Diagram: User Application -> EDM Framework / EDM EventSetup -> POOL-ORA -> CORAL -> Oracle or FroNTier (with FroNTier cache) -> Database.]

N-Tier Deployment
Redundant Tomcat + Squid servers are deployed at Tier 0. Squids are deployed at Tier-1 and Tier-N sites.
Configuration includes: Access Control List (ACL), cache management (memory and disk), and inter-cache sharing (if desired).
A tier hierarchy (dashed lines in the diagram) is a possible configuration change if needed.
Site-local-config provides the URLs of servers and proxies.
[Diagram: Tier-N Squids and Tier-1 Squids in front of the Tier-0 Squid(s), Tomcat(s), and DB.]
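Site-local-config is an XML file, so pulling the proxy and server URLs out of it can be sketched with ElementTree. The element and attribute names below follow the usual frontier-connect layout but should be treated as assumptions for this sketch:

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment in the spirit of a CMS site-local-config.xml.
SITE_LOCAL_CONFIG = """
<site-local-config>
  <site name="EXAMPLE_SITE">
    <calib-data>
      <frontier-connect>
        <proxy url="http://squid1.example-site.org:3128"/>
        <proxy url="http://squid2.example-site.org:3128"/>
        <server url="http://frontier.example-cern.ch:8000/Frontier"/>
      </frontier-connect>
    </calib-data>
  </site>
</site-local-config>
"""

root = ET.fromstring(SITE_LOCAL_CONFIG)
proxies = [p.get("url") for p in root.iter("proxy")]
servers = [s.get("url") for s in root.iter("server")]
print("proxies:", proxies)   # tried first; the local Squid(s) absorb most requests
print("servers:", servers)   # origin servers at Tier-0
```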

Site Squid Details
Hardware requirements: minimum specs are 1 GHz CPU, 1 GByte memory, GBit network, 100 GB disk. The machine needs to be well connected (network-wise) to the worker nodes, and needs access to the WAN as well as the LAN if on a private network. Having 2 machines for failover is a requirement for T0/T1 and a useful option for T2: inexpensive insurance for reliability.
Software installation: Squid server and configuration, site-local-config file.

Squid Deployment Status
Late 2005: 10 centers used for testing.
Additional installations from May through October were used for the Computing, Software, and Analysis challenge (a.k.a. CSA06).
Very few problems with the installation procedures CMS provides.

Launchpad Load: T0 Operation
One week of extensive testing during CSA06; the farm was ramped up to 1000 nodes.
Spikes occur when new activity starts and the caches are loaded with new objects.
The load balancing spreads requests to all three servers (monitoring for one shown).
Requests/minute and bytes/sec are monitored with the SNMP port on Squid; Lemon provides server stats.
[Plot peaks: ~45k requests/minute, ~660 kB/s, ~40% CPU.]
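Requests/minute and bytes/sec can also be derived offline from the Squid access log. A minimal sketch, assuming the default native access.log format (epoch timestamp in the first field, reply size in bytes in the fifth):

```python
from collections import Counter

def rates(path="access.log"):
    """Count requests per minute and bytes per second from a Squid native access log."""
    req_per_min, bytes_per_sec = Counter(), Counter()
    with open(path) as log:
        for line in log:
            fields = line.split()
            if len(fields) < 5:
                continue
            ts = float(fields[0])          # epoch seconds (with milliseconds)
            size = int(fields[4])          # reply size in bytes
            req_per_min[int(ts // 60)] += 1
            bytes_per_sec[int(ts)] += size
    return req_per_min, bytes_per_sec

if __name__ == "__main__":
    per_min, per_sec = rates()
    if per_min:
        print("peak requests/minute:", max(per_min.values()))
        print("peak bytes/second:  ", max(per_sec.values()))
```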

Throughput Limitations
We discovered one CMS “object” that translates into ~28k DB requests. Throughput was limited to less than 1 MB/s per server (3 servers), and Squid logs were being produced at a rate of 2 GB/hour.
Clients connect to the FroNTier Squid with TCP, one TCP connection per request. The I/O overhead of the connection is about half as big as the data payload itself (128 bytes compressed).
Solutions: “fixing” the object storage (in progress), rotating Squid logs as needed, and holding the HTTP connection open for multiple requests.
[Plot: output limited to ~1 MB/s vs. input.]
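The last item, holding an HTTP connection open across requests, is easy to picture on the client side: with a persistent (keep-alive) session the per-request TCP setup cost disappears. A sketch using the Python requests library; the URL and payloads are placeholders:

```python
import requests

BASE = "http://frontier.example-cern.ch:8000/Frontier"       # placeholder endpoint
queries = ["p1=EXAMPLE_A", "p1=EXAMPLE_B", "p1=EXAMPLE_C"]    # illustrative payloads

# One TCP connection per request: every call pays the full connection overhead,
# which matters when the payload is only ~128 bytes.
for q in queries:
    requests.get(f"{BASE}?{q}", timeout=30)

# Keep-alive: a Session reuses the same TCP connection for all requests,
# so the per-request overhead is reduced to the HTTP headers.
with requests.Session() as session:
    for q in queries:
        session.get(f"{BASE}?{q}", timeout=30)
```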

Recent Performance
Preparing for use with the HLT farm: a large number of Squids will be preloaded so the processor farm can quickly load from the cache.
Recent testing at FNAL delivers 35 MByte/s to clients, with 200 simultaneous clients and an object size of ~2 MB compressed. Squid runs on a quad 3 GHz Xeon with all-Gigabit network connections. The limitation is that Squid uses a single CPU, which saturates.
We have run 1000 simultaneous clients, just to test operation.
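For scale (simple arithmetic on the quoted figures): 35 MByte/s shared by 200 clients is roughly 175 kByte/s per client, i.e. about one ~2 MB compressed object per client every 11-12 seconds while the single Squid CPU is saturated.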

Cache Coherency
In most cases cache coherency is not a problem for conditions data; it is managed by policy. In some cases, e.g. during development and some online uses, it can be an issue.
Giving each cached object an expiration time is one solution that has been implemented. Several alternative approaches are being examined.

Cache Refresh @ 3 AM UTC
On Friday 27 Oct we started directing cached objects to expire at 3:00 UTC the next day (FrontierInt). This is a temporary “fix” for cache coherency.
[Plot: forced refreshes vs. expiration-driven refreshes.]
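One way to picture the temporary fix: each response carries an Expires header pointing at the next 03:00 UTC, so every cached copy goes stale at the same time each day. A sketch of computing that header value; this is illustrative, not the actual server code:

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

def next_expiry(now=None, hour=3):
    """Next occurrence of <hour>:00 UTC, formatted as an HTTP Expires header value."""
    now = now or datetime.now(timezone.utc)
    expiry = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if expiry <= now:
        expiry += timedelta(days=1)        # already past 03:00 UTC today, use tomorrow
    return format_datetime(expiry, usegmt=True)

print("Expires:", next_expiry())   # e.g. "Expires: Sat, 28 Oct 2006 03:00:00 GMT"
```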

Usage Analysis for December 2006 (stats for server one of three)

                        Pages          Hits           Bandwidth
  Max (07 Dec 2006)     3,148,109                     2.72 GB
  Min (27 Dec 2006)     330                           264.66 KB
  Average               653,634.90     653,634.94     572.40 MB
  Total                 20,262,682     20,262,683     17.33 GB

Cache Stats for December 2006 (stats for server one of three)

  Access Status                                         Hits          Bandwidth
  TCP_MEM_HIT:NONE (memory cache hit)                   16,484,727    13.94 GB
  TCP_CLIENT_REFRESH_MISS:DIRECT (forced refresh)       2,677,225     1.86 GB
  TCP_MISS:DIRECT (request goes to database)            1,003,958     1.39 GB
  TCP_MISS:SIBLING_HIT (found in “sibling” Squid)       55,621        68.39 MB
  UDP_MISS:NONE                                         20,961        5.80 MB
  TCP_HIT:NONE (disk cache hit)                         15,302        63.05 MB
  UDP_HIT:NONE                                          4,667         1.30 MB
  TCP_MISS:NONE                                         606           1.97 MB
  TCP_CLIENT_REFRESH_MISS:NONE                          12            21.66 KB
  TCP_DENIED:NONE                                       1             1.40 KB
  Total                                                 20,263,080    17.33 GB

Cache hit rate = 82%
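A rough check of the quoted rate from the table itself: counting memory and disk cache hits gives (16,484,727 + 15,302) / 20,263,080 ≈ 81.4%, or ≈81.7% if sibling hits are also counted, consistent with the 82% shown.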

But Wait… There’s More…
A typical local Squid has a similar or better cache hit rate; see, for example, Fermilab in December. Thus the cache hit rate for the global deployment is in the high-90% range, and only a few percent of requests must go all the way to the DB to be satisfied. This will improve further when the conditions data becomes more stable, making the daily expiration less frequent, et cetera.
[Plot: requests satisfied by the local Squid cache vs. requests forwarded to CERN.]

Conclusion
The FroNTier architecture is being used by CMS to distribute conditions data worldwide. It was deployed to nearly 30 sites for use in CSA06.
Throughput is limited when data objects are requested in many tiny pieces; recent measurements show 35 MByte/s throughput for reasonably sized requests.
The cache coherency concern has a temporary solution but needs more work. A high cache hit rate (high 90%) is observed.
Overall, experience with the system is positive: deployment, maintainability, reliability, and performance all look good.

Finish

Squid Requirements (backup)
Hardware minimum specs: 1 GHz CPU, 1 GByte memory, GBit network, 100 GB disk; closely networked to the worker nodes, with access to the WAN as well as the LAN if on a private network. Having 2 machines for failover is a requirement for T0/T1 and a useful option for T2: inexpensive insurance for reliability.
Software installation: Squid server and configuration, site-local-config file.
