Replication using Oracle Streams at CERN

Presentation transcript:

Replication using Oracle Streams at CERN Eva Dafonte Pérez

Outline
- Scope
- Oracle Streams Replication
- Downstream Capture
- Streams Setups at CERN
- Split & Merge Procedure
- Streams Optimizations
- Lessons Learned
- Streams Monitoring
- Examples and Numbers
- Summary

The LHC Computing Challenge
- Data volume: high rate x large number of channels x 4 experiments = 15 PetaBytes of new data stored each year; much more data is discarded during multi-level filtering before storage
- Compute power: event complexity x number of events x thousands of users = 100k of today's fastest CPUs
- Worldwide analysis & funding: computing funded locally in major regions & countries; efficient analysis everywhere; GRID technology
The European research centre CERN is currently building the Large Hadron Collider (LHC). When it begins operations, it will produce roughly 15 Petabytes of data annually, which thousands of scientists around the world will access and analyze. All data need to be available over the 15-year estimated lifetime of the LHC. The analysis of the data, including comparison with theoretical simulations, requires on the order of 100,000 CPUs of processing power. It uses a distributed model for data storage and analysis - a computing and data GRID - where the costs of maintaining and upgrading the necessary resources for such a computing challenge are more easily handled, because individual institutes and participating organizations can fund local computing resources while contributing to the global goal, and where the points of failure are reduced. Multiple copies of the data and automatic reassignment of computational tasks to available resources ensure load balancing and facilitate access to the data for all scientists involved, independently of their geographical location.

Distributed Service Architecture
The mission of the Worldwide LHC Computing Grid Project (WLCG) is to develop, build and maintain a computing infrastructure for storage and analysis of the LHC data. Data will be distributed around the globe according to a four-tiered model. A primary backup will be recorded at CERN, the "Tier-0" centre of WLCG. After initial processing, these data will be distributed to a series of Tier-1 centres, large computer centres with sufficient compute and storage capacities as well as round-the-clock support for Grid operations. The Tier-1 centres will perform recurrent reconstruction passes and make data available to Tier-2 centres, each consisting of one or several collaborating computing facilities, which can store sufficient data and provide adequate computing power for specific analysis tasks. Individual scientists will access these facilities through Tier-3 computing resources, which can consist of local clusters in a university department or even individual PCs.

Oracle Streams Replication
- Technology for sharing information between databases
- Database changes are captured from the redo log and propagated asynchronously as Logical Change Records (LCRs)
[Diagram: capture reads the redo logs on the source database, changes are propagated and then applied on the target database]
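As an illustration only (not CERN's production scripts), a minimal schema-level Streams configuration follows the capture/propagate/apply pattern shown on the slide. All queue, process, schema and database names below are assumptions:

```sql
-- Minimal sketch of a schema-level Streams replication setup (assumed names).
-- On the source database: create a queue, capture rules and a propagation.
BEGIN
  DBMS_STREAMS_ADM.SET_UP_QUEUE(
    queue_table => 'STRMADMIN.CAPTURE_QT',
    queue_name  => 'STRMADMIN.CAPTURE_Q');

  DBMS_STREAMS_ADM.ADD_SCHEMA_RULES(
    schema_name     => 'COOL_SCHEMA',            -- assumed replicated schema
    streams_type    => 'capture',
    streams_name    => 'STREAMS_CAPTURE',
    queue_name      => 'STRMADMIN.CAPTURE_Q',
    include_dml     => TRUE,
    include_ddl     => TRUE,
    source_database => 'SOURCEDB.CERN.CH');      -- assumed global name

  DBMS_STREAMS_ADM.ADD_SCHEMA_PROPAGATION_RULES(
    schema_name            => 'COOL_SCHEMA',
    streams_name           => 'STREAMS_PROP_T1',
    source_queue_name      => 'STRMADMIN.CAPTURE_Q',
    destination_queue_name => 'STRMADMIN.APPLY_Q@TIER1DB.EXAMPLE.ORG',
    include_dml            => TRUE,
    include_ddl            => TRUE,
    source_database        => 'SOURCEDB.CERN.CH');
END;
/

-- On the target database: create the apply queue and apply rules.
BEGIN
  DBMS_STREAMS_ADM.SET_UP_QUEUE(
    queue_table => 'STRMADMIN.APPLY_QT',
    queue_name  => 'STRMADMIN.APPLY_Q');

  DBMS_STREAMS_ADM.ADD_SCHEMA_RULES(
    schema_name     => 'COOL_SCHEMA',
    streams_type    => 'apply',
    streams_name    => 'STREAMS_APPLY',
    queue_name      => 'STRMADMIN.APPLY_Q',
    include_dml     => TRUE,
    include_ddl     => TRUE,
    source_database => 'SOURCEDB.CERN.CH');
END;
/
```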

Downstream Capture
- Downstream capture de-couples the Tier-0 production databases from destination or network problems; source database availability is the highest priority
- Redo log retention on the downstream database is sized to allow a sufficient re-synchronisation window; we use 5 days of retention to avoid tape access
- A fresh copy of the dictionary is dumped to the redo automatically
- 10.2 Streams recommendations (Metalink note 418755)
[Diagram: redo is shipped from the source database to the downstream database via redo transport; capture and propagation run on the downstream database, apply on the target database]
A configuration sketch follows.
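A rough sketch of a real-time downstream capture configuration, under assumed service, capture and database names (the real-time mine mode additionally requires standby redo logs on the downstream database, and a fresh dictionary build can be dumped to the redo on the source with DBMS_CAPTURE_ADM.BUILD):

```sql
-- On the source database: ship online redo to the downstream database.
ALTER SYSTEM SET log_archive_dest_2 =
  'SERVICE=downstrm ASYNC NOREGISTER VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE)';
ALTER SYSTEM SET log_archive_dest_state_2 = ENABLE;

-- On the downstream database: create the capture process for the remote source
-- and enable real-time mining of the shipped redo. Rules are then attached with
-- DBMS_STREAMS_ADM.ADD_SCHEMA_RULES as in the previous sketch.
BEGIN
  DBMS_CAPTURE_ADM.CREATE_CAPTURE(
    queue_name        => 'STRMADMIN.CAPTURE_Q',
    capture_name      => 'DOWNSTREAM_CAPTURE',
    source_database   => 'SOURCEDB.CERN.CH',    -- assumed global name
    use_database_link => TRUE);

  DBMS_CAPTURE_ADM.SET_PARAMETER(
    capture_name => 'DOWNSTREAM_CAPTURE',
    parameter    => 'downstream_real_time_mine',
    value        => 'Y');
END;
/
```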

Streams Setups: ATLAS

Streams Setups: LHCb (Conditions and LFC)

Split & Merge Procedure
If one site is down:
- LCRs are not removed from the queue
- the capture process might be paused by flow control, with an impact on replication performance
Objective: isolate the replicas from each other.
Split the capture process (see the sketch below):
- (original) real-time capture for the sites "in good shape"
- (new) normal capture for the unstable site(s), with a new capture queue and propagation job
- the original propagation job is dropped
- spilled LCRs are dropped from the original capture queue
Merge the capture processes: resynchronization.
Suggested by Patricia McElroy (Principal Product Manager, Distributed Systems/Replication).
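The 10.2 split is done with ordinary Streams administration calls; the outline below is a hypothetical sketch of the split step for a single unstable Tier-1 (queue, capture, propagation and rule-set names are all assumptions, and the real procedure also covers the later merge and resynchronisation):

```sql
-- Rough outline of the "split" step for one unstable destination site.
BEGIN
  -- New queue and a separate (non real-time) capture process for the unstable site.
  DBMS_STREAMS_ADM.SET_UP_QUEUE(
    queue_table => 'STRMADMIN.SPLIT_QT',
    queue_name  => 'STRMADMIN.SPLIT_Q');

  DBMS_CAPTURE_ADM.CREATE_CAPTURE(
    queue_name      => 'STRMADMIN.SPLIT_Q',
    capture_name    => 'SPLIT_CAPTURE',
    source_database => 'SOURCEDB.CERN.CH',
    rule_set_name   => 'STRMADMIN.RULESET$_T1_CAPTURE');  -- reuse the site's rules

  -- New propagation from the split queue to the unstable site.
  DBMS_PROPAGATION_ADM.CREATE_PROPAGATION(
    propagation_name   => 'SPLIT_PROP_T1',
    source_queue       => 'STRMADMIN.SPLIT_Q',
    destination_queue  => 'STRMADMIN.APPLY_Q',
    destination_dblink => 'TIER1DB.EXAMPLE.ORG',
    rule_set_name      => 'STRMADMIN.RULESET$_T1_PROP');

  -- Drop the original propagation to that site so its spilled LCRs no longer
  -- hold back the real-time capture queue used by the healthy sites.
  DBMS_PROPAGATION_ADM.DROP_PROPAGATION(
    propagation_name      => 'STREAMS_PROP_T1',
    drop_unused_rule_sets => FALSE);

  DBMS_CAPTURE_ADM.START_CAPTURE(capture_name => 'SPLIT_CAPTURE');
END;
/
```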

Streams Optimizations: TCP and Network Tuning
- Adjust the system maximum TCP buffer (/etc/sysctl.conf)
- Parameters to reinforce the TCP tuning: DEFAULT_SDU_SIZE=32767, RECV_BUF_SIZE and SEND_BUF_SIZE (optimal: 3 x Bandwidth Delay Product)
- Reduce the Oracle Streams acknowledgements: alter system set events '26749 trace name context forever, level 2';
A configuration sketch follows.
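As a rough illustration (the values are assumptions, not the CERN production settings), the corresponding Linux and Oracle Net entries might look like the following; for an assumed 1 Gbit/s link with 25 ms round-trip time the bandwidth-delay product is about 3 MB, so 3 x BDP is roughly 9 MB:

```
# /etc/sysctl.conf -- raise the system-wide maximum TCP buffer sizes (assumed values)
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216

# sqlnet.ora (or per-connection in tnsnames.ora / listener.ora) -- Oracle Net tuning
DEFAULT_SDU_SIZE = 32767
RECV_BUF_SIZE = 9437184      # ~3 x bandwidth-delay product for the assumed link
SEND_BUF_SIZE = 9437184
```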

Streams Optimizations: Rules
- ATLAS Streams replication: filter tables by prefix
- Rules on the capture side caused more overhead than on the propagation side
- Avoid complex Oracle Streams rules: rules with conditions that include LIKE or NOT clauses, or FUNCTIONS
Complex rule:
condition => '( SUBSTR(:ddl.get_object_name(),1,7) IN (''COMP200'', ''OFLP200'', ''CMCP200'', ''TMCP200'', ''TBDP200'', ''STRM200'') OR SUBSTR(:ddl.get_base_table_name(),1,7) IN (''COMP200'', ''OFLP200'', ''CMCP200'', ''TMCP200'', ''TBDP200'', ''STRM200'') )'
Simple rule:
condition => '(((:ddl.get_object_name() >= ''STRM200_A'' and :ddl.get_object_name() <= ''STRM200_Z'') OR (:ddl.get_base_table_name() >= ''STRM200_A'' and :ddl.get_base_table_name() <= ''STRM200_Z'')) OR ((:ddl.get_object_name() >= ''OFLP200_A'' and :ddl.get_object_name() <= ''OFLP200_Z'') OR (:ddl.get_base_table_name() >= ''OFLP200_A'' and :ddl.get_base_table_name() <= ''OFLP200_Z'')))'
An example of attaching such a simple rule follows.
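A hypothetical example of creating a "simple" prefix-range rule and attaching it to a capture rule set with DBMS_RULE_ADM (the rule, rule-set and schema names are assumptions; the DML variant of the condition is shown):

```sql
BEGIN
  -- Range comparisons instead of LIKE/NOT/function calls keep the rule "simple".
  DBMS_RULE_ADM.CREATE_RULE(
    rule_name => 'STRMADMIN.OFLP200_TABLES_DML',
    condition => ':dml.get_object_owner() = ''ATLAS_COOL'' AND ' ||
                 ':dml.get_object_name() >= ''OFLP200_A'' AND ' ||
                 ':dml.get_object_name() <= ''OFLP200_Z''');

  -- Add the rule to the (assumed) positive rule set of the capture process.
  DBMS_RULE_ADM.ADD_RULE(
    rule_name     => 'STRMADMIN.OFLP200_TABLES_DML',
    rule_set_name => 'STRMADMIN.RULESET$_CAPTURE');
END;
/
```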

Streams Optimizations: Rules
[Chart comparing the two rule styles; the slide shows the values 600 for simple rules and 80 for complex rules]

Streams Optimizations: Flow Control
- By default, flow control kicks in when the number of messages is larger than the threshold: buffered publisher 5000, capture publisher 15000
- 10.2 + patch 5093060 adds 2 new events: 10867 controls the threshold for any buffered message publisher; 10868 controls the threshold for the capture publisher
- These events can be used to manipulate the default behaviour (see the sketch below)
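A sketch of how such an event might be set, reusing the event-setting syntax shown on the earlier slide; the assumption (to be verified against the patch 5093060 documentation) is that the event level supplies the new threshold:

```sql
-- Assumed usage: raise the capture publisher flow-control threshold to 30000.
ALTER SYSTEM SET EVENTS '10868 trace name context forever, level 30000';

-- Corresponding event for any buffered message publisher (assumed level = threshold).
ALTER SYSTEM SET EVENTS '10867 trace name context forever, level 10000';
```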

Lessons Learned
- The apply side can be significantly less efficient: manipulating the database is slower than the redo generation, and with LCRs executed serially the apply cannot keep up with the redo generation rate
- SQL bulk operations (at the source database) may map to many elementary operations at the destination side; source rates need to be controlled to avoid overloading
- System-generated names: do not allow system-generated names for constraints and indexes, otherwise modifications will fail at the replicated site (see the example below)
- Storage clauses may also cause some issues if the target sites are not identical
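For instance (an illustrative table, not one of the replicated schemas), naming every constraint and index explicitly keeps DDL replayable on the replicas, because system-generated names (SYS_C...) differ from site to site:

```sql
-- Anti-pattern: the NOT NULL and primary key constraints get system-generated
-- names (SYS_Cnnnnnn) that differ between source and replicas, so later ALTERs
-- that reference those names fail at the replicated site.
CREATE TABLE conditions_iov_bad (
  iov_id   NUMBER PRIMARY KEY,
  channel  NUMBER NOT NULL,
  payload  VARCHAR2(4000)
);

-- Preferred: give every constraint and index an explicit name.
CREATE TABLE conditions_iov (
  iov_id   NUMBER,
  channel  NUMBER CONSTRAINT conditions_iov_channel_nn NOT NULL,
  payload  VARCHAR2(4000),
  CONSTRAINT conditions_iov_pk PRIMARY KEY (iov_id)
);
CREATE INDEX conditions_iov_channel_idx ON conditions_iov (channel);
```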

Streams Recommended Patches
- Metalink note 437838.1 - Recommended Patches for Streams
- MLR for Streams/LogMiner bugs: patches 6081550, 6081547 and 6267873
- Leak in perm allocations with library cache comments (ORA-4031 generated): fix patch 6043052
- ORA-00600 [KRVTADC] in capture parallel process (using compressed tables): fix patch 4061534
- Deadlock between 'library cache lock' and 'library cache pin': fix patch 5604698
- Bug 6163622: SQL apply degrades with larger transactions
- Bug 5093060: Streams 5000 LCR limit is causing unnecessary flow control at the apply site

Streams Monitoring
Features:
- Streams topology
- Status of Streams connections
- Error notifications
- Streams performance (latency, throughput, etc.)
- Other resources related to the Streams performance (Streams pool memory, redo generation)
Architecture:
- "strmmon" daemon written in Python
- End-user web application: http://oms3d.cern.ch:4889/streams/main
- 3D monitoring and alerting integrated with WLCG procedures and tools
Example queries against the Streams dictionary views follow.
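Independently of strmmon, the kind of information it collects can be sampled directly from the standard Streams dictionary views; a couple of illustrative queries (latency, throughput and apply errors), assuming process names as in the earlier sketches:

```sql
-- Capture progress and an approximate capture-side latency in seconds.
SELECT capture_name,
       state,
       total_messages_captured,
       total_messages_enqueued,
       (SYSDATE - capture_message_create_time) * 86400 AS latency_seconds
FROM   v$streams_capture;

-- Apply throughput and error counts per apply process.
SELECT apply_name, total_applied, total_errors
FROM   v$streams_apply_coordinator;

-- Transactions that failed to apply, for error notification.
SELECT apply_name, local_transaction_id, error_number, error_message
FROM   dba_apply_error;
```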

Streams Monitoring
[Screenshot of the Streams monitoring web application]

Examples: Streams setup for the ATLAS experiment
- Online → Offline → Tier-1 sites (10)
- Real-time downstream capture
- Oracle 10.2.0.3
- Use of rules
- Database size: 1.44 TB

Examples: ATLAS Conditions data
- 2 GB per day
- 600-800 LCRs per second

Examples: ATLAS PVSS tests
- 6 GB per day
- 2500-3000 LCRs per second

Summary
- The LCG Database Deployment Project (LCG 3D) has set up a world-wide distributed database infrastructure for the LHC: some 124 RAC nodes (450 CPU cores) at CERN plus several tens of nodes at 10 partner sites are in production now
- Large-scale tests have validated that the experiments' requirements are met by the RAC & Streams based set-up
- Backup & recovery tests have been performed to validate the operational procedures at all sites
- Monitoring of database & Streams performance has been implemented, building on the Grid Control and strmmon tools; this is key to maintaining and optimising any larger system
- The database infrastructure is ready for accelerator turn-on; the collaboration with Oracle as part of CERN openlab has been essential and beneficial for both partners

Questions?