Ruben Gaspar (CERN, speaker), Dawid Wojcik (CERN), Ignacio Coterillo (CERN), Daniel Gomez (CERN)
UKOUG Database Server SIG Meeting, 29th January 2013

Outline
- CERN and Databases
- Architecture
- Hardware overview
- Empowering users
- Functionality overview
- Conclusions

CERN
- European Organization for Nuclear Research founded in
- Member States, 7 Observer States + UNESCO and EU
- 60 Non-member States collaborate with CERN
- 2400 staff members work at CERN as personnel, more researchers from institutes world-wide

LHC and Experiments
- Large Hadron Collider (LHC): particle accelerator used to collide beams at very high energy
- 27 km long circular tunnel
- Located ~100 m underground
- Protons currently travel at % of the speed of light
- Collisions are analysed using special detectors and software in the experiments dedicated to the LHC
- New particle discovered, consistent with the Standard Model Higgs boson

WLCG
- The world’s largest computing grid
- More than 25 Petabytes of data stored and analysed every year
- Over physical CPUs
- Over logical CPUs
- 157 computer centres in 36 countries
- More than 8000 physicists with real-time access to LHC data

Oracle at CERN
- Relational DBs play a key role in the LHC production chains
  - Accelerator logging and monitoring systems
  - Online acquisition, offline: data (re)processing, data distribution, analysis
  - Grid infrastructure and operation services: monitoring, dashboards, etc.
  - Data management services: file catalogues, file transfers, etc.
  - Metadata and transaction processing for tape storage system
- Database on Demand service: Oracle & MySQL instances (more later)

Database as a Service: Rationale
- Empowering CERN IT and research community
- Users can request and manage different database instances (currently MySQL and Oracle single instance)
- Aimed at medium-size and long-term projects
- Users are provided with a self-service portal
  - Ease of administration
  - Integrated backup & recovery
  - Monitoring solution
  - One-click patching

Database as a Service principles
- Scalable
- Provide flexible and cost-effective Database as a Service
- Owners are grouped by a mailing group (access authorization)
- Owners receive full DBA privileges* on their instances
- Owners are responsible for ensuring that their systems, and the use of their systems, are fully compliant with the Rules of CERN Computing Facilities (including security)
- The "Database on Demand" (DBoD) service covers OS administration and support for the self-service portal functionality
- The DBoD service does not provide DBA or application support

DBaaS providers
[Figure: comparison of DBaaS providers. Legend: MySQL, Oracle, SQL Server. Labels: B&R, HA, Upgrades, Monitoring add-on]

Private cloud model
- Reuse existing virtualization infrastructure and know-how: cost efficient
- Improve operations
- Standardization
- Consolidation: migrate existing DBs to DBoD service
- Reuse tools and management frameworks
- HA via virtualization (live migration), Oracle clusterware, Master/Slave replication (just for MySQL)

Architecture
- Virtualization
  - Oracle VM (2.2, and 3.1.1) on Linux x86_64
  - Typical VM size: 2 cores, 8 or 16 GB RAM
  - Physical server: usually running several instances
- Storage
  - NFS over 10 Gigabit Ethernet
- Configuration management
  - Open-source Quattor Toolkit
  - CERN is currently adopting Puppet
- Management framework
  - Syscontrol, developed at CERN
  - Custom development, mainly Perl + Bash, about 10K lines of code
- Web self-service portal

+ Physical Servers

Hardware servers
- Dell blades PowerEdge M610: 2x Quad-Core Intel 2.53 GHz, 48 GB RAM
- Transtec database server: 2x Six-Core Intel 2.26 GHz, 128 GB RAM
- Diagram labels: NetApp cluster, Next release, 10GbE

Hardware storage
NetApp               FAS3240 (nowadays)          FAS6210 (future release)
CPU                  1x L5410 (Harpertown/FSB)   2x E5520 (Nehalem-EP, QPI)
Cores                4                           8
RAM                  8 GB (-1 GB NVRAM)          24 GB
NVRAM                1 GB                        4 GB
Max flash            512 GB                      1.5 TB
Max aggregate size   90 TB                       162 TB
- Moving from 7-mode Ontap to C-mode
- QPI = Intel QuickPath Interconnect (measured x3 memory bandwidth over FSB)

Hardware storage II
[Diagram labels: DBOD instances, Datafiles, Binary logs, Redo logs, Controlfile, diagnostics, Dual SAS loop, 2x3GbE, 10GbE, NetApp cluster]

Shared Instances
- DBoD supports more than one MySQL instance on one VM
- Sharing CPU
- Sharing MySQL/Oracle binaries
- Separate buffer pools (pre-allocated memory)
- Separate NFS volumes
- Independent backup and restore

MySQL
- Currently running MySQL Community Edition 5.5
- InnoDB as the preferred storage engine (backup & recovery)
- Binary logs enabled
- ACID (atomicity, consistency, isolation, durability): innodb_flush_log_at_trx_commit = 1, sync_binlog and innodb_flush_method = O_DIRECT
- Using innodb_buffer_pool_size of ~5 GB
- Using thread cache (big gain for some clients)
- Using query cache (query_cache_size = 768M)
- Performance schema is enabled by default
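
The settings on this slide can be collected into a my.cnf fragment. The sketch below is only an illustration: `render_mycnf` is a hypothetical helper, not part of DBoD, and the values for `sync_binlog` and `log_bin` are assumptions because the slide names those features without giving values.

```python
# Server variables named on the MySQL slide.
# Entries marked "assumed" are NOT stated in the source.
DURABILITY_SETTINGS = {
    "innodb_flush_log_at_trx_commit": "1",  # from the slide
    "sync_binlog": "1",                     # assumed value; the slide gives the name only
    "innodb_flush_method": "O_DIRECT",      # from the slide
    "innodb_buffer_pool_size": "5G",        # the slide says "~5GB"
    "query_cache_size": "768M",             # from the slide
    "log_bin": "mysql-bin",                 # assumed basename; the slide says binary logs are enabled
}


def render_mycnf(settings, section="mysqld"):
    """Render a dict of server variables as one my.cnf section."""
    lines = [f"[{section}]"]
    lines += [f"{name} = {value}" for name, value in settings.items()]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_mycnf(DURABILITY_SETTINGS))
```

Keeping the variables in one mapping makes it easy to diff the intended configuration against an uploaded my.cnf.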

Oracle 11gR2
- Archivelog mode
- Scheduler job for automatic archivelog clean-up
- COST to Restrict Instance Registration [ID ]
- SQLNET.CRYPTO_CHECKSUM_SERVER=required & SQLNET.ENCRYPTION_SERVER=required
- filesystemio_options='SETALL'

Empowering users
- Self-service portal
  - Instance administration (status, start, stop)
  - Manage configuration and logs
    - MySQL: download/upload my.cnf, download slow queries log
    - Oracle: download trace files
  - Set up backups (automatic or manual) or command a restore
  - One-button instance upgrade (coming: one-button system upgrade)
  - Access to monitoring information
- Behind the scenes: J2EE web application running on central web servers
  - ZK Framework (Ajax based)
  - SSO (Single Sign On) + SSL for authentication/authorisation
  - JDBC + Apache DBCP connection pooling via JNDI
  - JAX-WS 2.2 for SOAP Web Services
  - Webapp (Java, ZUML, Javascript, CSS, etc): ~ lines of code

Backup solution
- MySQL instances running in binary log mode (InnoDB recommended)
- Oracle instances running in archivelog mode
- Backups based on storage snapshots
  - Full online DB backups done in just a few seconds
  - Manual and/or automatic (scheduled)
  - Small storage overhead (depends on instance activity)
- Point-in-time recovery: easy with snapshots and binary/archive logs
- Snapshots can be configured to be sent to tape (DR)

Backup management
- Backup configuration panel
- Backup procedure:
  - mysql> FLUSH TABLES WITH READ LOCK; mysql> FLUSH LOGS; or Oracle> alter database begin backup;
  - mysql> UNLOCK TABLES; or Oracle> alter database end backup;
- Timeline labels: snapshot, resume, … some time later, new snapshot
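
The MySQL side of this procedure can be sketched as a short sequence: lock, flush logs, snapshot, unlock. This is a minimal illustration under stated assumptions, not the DBoD implementation. `RecordingCursor` stands in for a real DB-API cursor so the example runs without a server, and `take_storage_snapshot` is a hypothetical placeholder for the storage-side snapshot call.

```python
class RecordingCursor:
    """Stand-in for a DB-API cursor; it only records the statements it receives."""

    def __init__(self, log):
        self.log = log

    def execute(self, sql):
        self.log.append(f"sql: {sql}")


def take_storage_snapshot(volume, log):
    """Hypothetical placeholder for the storage snapshot request."""
    log.append(f"snapshot: {volume}")


def mysql_snapshot_backup(cursor, volume, log):
    """Quiesce MySQL, snapshot the volume, then release the lock.

    The read lock lives in the session that took it, so every statement
    must go through the same cursor/connection.
    """
    cursor.execute("FLUSH TABLES WITH READ LOCK")
    try:
        cursor.execute("FLUSH LOGS")          # start a fresh binary log
        take_storage_snapshot(volume, log)    # the few-seconds step from the slides
    finally:
        cursor.execute("UNLOCK TABLES")       # always resume writes


if __name__ == "__main__":
    events = []
    mysql_snapshot_backup(RecordingCursor(events), "dbod_vol", events)
    print("\n".join(events))
```

The `try`/`finally` is a design choice of this sketch: if the snapshot request fails, the instance still resumes accepting writes.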

~1 sec Flush tables with read lock

Instance restore
- Owners can request point-in-time recoveries
- Full restore takes just a few seconds
- Recovery time depends on number of binary logs/redo logs to replay/apply
- Warning: snapshots taken after the one used for recovery are lost
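
The choice of snapshot for a point-in-time recovery, and the warning about later snapshots, can be illustrated with a small planning function. It is a sketch only: `plan_point_in_time_recovery` is a hypothetical name, and snapshots are represented simply by their creation times.

```python
from datetime import datetime


def plan_point_in_time_recovery(snapshots, target):
    """Pick the newest snapshot taken at or before `target`.

    Returns (base, discarded): the snapshot to restore, and the later
    snapshots that are lost once that restore is performed. Logs are then
    replayed from `base` up to `target`.
    """
    earlier = [s for s in snapshots if s <= target]
    if not earlier:
        raise ValueError("no snapshot exists at or before the requested time")
    base = max(earlier)
    discarded = sorted(s for s in snapshots if s > base)
    return base, discarded


if __name__ == "__main__":
    snaps = [datetime(2013, 1, 27), datetime(2013, 1, 28), datetime(2013, 1, 29)]
    base, lost = plan_point_in_time_recovery(snaps, datetime(2013, 1, 28, 12, 0))
    print("restore from:", base)
    print("snapshots lost:", lost)
```

The further the target lies from the chosen snapshot, the more binary or redo logs have to be applied, which is what drives the recovery time.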

Instance restore
[Timeline diagram: data files and binary logs over time, with automatic snapshots, a manual snapshot, the point-in-time recovery target, and "now"]

Framework Monitoring
- Management server
  - Queries its jobs table regularly (based on Oracle dbms_scheduler)
  - Informs admins in case of:
    - Pending jobs not executed
    - Timed out jobs
    - Failed jobs
- Lines of code: SQL, PL/SQL ~ 1300
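
The three alert conditions can be sketched as a classification over rows of the jobs table. The real framework is written in SQL and PL/SQL; this Python version is only an illustration, and the row layout (`id`, `state`, `scheduled`, `started`, `timeout`) and the ten-minute grace period are assumptions.

```python
from datetime import datetime, timedelta


def jobs_needing_attention(jobs, now, pending_grace=timedelta(minutes=10)):
    """Group job ids by the alert conditions listed on the slide.

    `jobs` is a list of dicts with a hypothetical layout:
    FAILED rows need only id/state, RUNNING rows add started/timeout,
    PENDING rows add scheduled.
    """
    alerts = {"pending": [], "timed_out": [], "failed": []}
    for job in jobs:
        if job["state"] == "FAILED":
            alerts["failed"].append(job["id"])
        elif job["state"] == "RUNNING" and now - job["started"] > job["timeout"]:
            alerts["timed_out"].append(job["id"])
        elif job["state"] == "PENDING" and now - job["scheduled"] > pending_grace:
            alerts["pending"].append(job["id"])
    return alerts


if __name__ == "__main__":
    now = datetime(2013, 1, 29, 12, 0)
    sample = [
        {"id": 1, "state": "FAILED"},
        {"id": 2, "state": "RUNNING", "started": now - timedelta(hours=2),
         "timeout": timedelta(hours=1)},
        {"id": 3, "state": "PENDING", "scheduled": now - timedelta(minutes=30)},
    ]
    print(jobs_needing_attention(sample, now))
```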

Instance Monitoring
- Evaluated several monitoring products
  - OEM with Pythian plug-in
  - MySQL Enterprise Monitor
- Monitoring server runs RACMon (in-house development)
  - Implemented using Python (~13k lines) and PHP (~15k lines)
  - Availability and performance monitoring system for Oracle DBs, MySQL, NAS storage and VM infrastructure
  - ~30 MySQL metrics stored in monitoring DB (mysql> show status)
  - Selected AWR metrics stored for Oracle instances
- Admins are notified via e-mail (and SMS if needed) about
  - Availability problems
  - Performance issues (OS level and DB checks)
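
Many values returned by `show status` are cumulative counters, so a monitoring system usually stores rates derived from two samples. The sketch below shows that calculation only. It is not RACMon code, and `counter_rates` is a hypothetical helper.

```python
def counter_rates(previous, current, interval_seconds):
    """Turn two samples of cumulative status counters into per-second rates.

    Counters missing from the earlier sample, or lower than before (for
    example after a server restart), are skipped instead of producing a
    misleading negative rate.
    """
    rates = {}
    for name, value in current.items():
        if name in previous and value >= previous[name]:
            rates[name] = (value - previous[name]) / interval_seconds
    return rates


if __name__ == "__main__":
    earlier = {"Questions": 1000, "Com_select": 400}
    later = {"Questions": 1600, "Com_select": 700}
    print(counter_rates(earlier, later, 60))
```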

Monitoring interface

One button upgrades
- DBoD admins prepare upgrade scripts
  - Complete upgrade process is scripted and tested
  - Upgrades of one minor version or several minors possible
- Owners can decide to upgrade at their convenience: one button upgrade
  - Instance is stopped
  - Binaries are upgraded (shared instances must be upgraded at the same time)
  - Instance is restarted
  - All post-installation tasks executed
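
The upgrade sequence, including the constraint that instances sharing binaries are upgraded together, can be sketched as an ordered plan. This is an illustration only: `upgrade_plan` and the step strings are hypothetical and do not reflect the actual DBoD scripts.

```python
def upgrade_plan(instances, target_version):
    """Build the ordered steps for upgrading instances that share binaries.

    All instances on the shared binaries are stopped before the binaries
    are replaced, then restarted, then given their post-installation tasks.
    """
    if not instances:
        raise ValueError("at least one instance is required")
    steps = [f"stop {name}" for name in instances]
    steps.append(f"upgrade binaries to {target_version}")
    steps += [f"start {name}" for name in instances]
    steps += [f"post-install {name}" for name in instances]
    return steps


if __name__ == "__main__":
    for step in upgrade_plan(["db_a", "db_b"], "5.5.x"):
        print(step)
```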

HA: Oracle VM
- CERN has more than 2 years of experience of running Oracle VM in production
- Easy scale-out of VM pools
- 10 Gbit Ethernet: one network only (NAS over NFS)
- Production on OVM 2.2, currently testing OVM and Live migration for HW interventions and host OS upgrades

HA: Oracle CRS (being tested)
- Well-known clusterware in IT-DB group
- Requires modifying /etc/init.d/mysql
  - Special case: mysqld is suddenly killed → pid file not removed
  - Stop/start operations should be done via crsctl
  - Start-up after server's boot as well
- Tar files allow several binaries upgrades
  - mysqld_safe, mysqlbinlog, mysqladmin, … invoked from the right distribution and right environment: --basedir=$BASEDIR --bindir=$BINDIR --datadir=$DATADIR
- Two basic configurations:
  - 2 nodes cluster → critical applications
  - +2 nodes cluster → optimised resources utilization
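
The stale pid file case can be handled by testing whether the recorded process still exists before trusting the file. The sketch below assumes a POSIX system and is not the modified init script from the talk. The function names are hypothetical, as is the convention that 0 means running and 1 means not running.

```python
import os


def read_pid(pid_file):
    """Return the pid stored in `pid_file`, or None if missing or malformed."""
    try:
        with open(pid_file) as handle:
            return int(handle.read().strip())
    except (FileNotFoundError, ValueError):
        return None


def process_alive(pid):
    """POSIX-only existence test: signal 0 checks the pid without sending anything."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def check(pid_file):
    """Return 0 if the recorded process is alive, else 1 (assumed convention).

    A pid file left behind by a killed mysqld is removed so that a later
    start is not blocked by it.
    """
    pid = read_pid(pid_file)
    if pid is not None and process_alive(pid):
        return 0
    if os.path.exists(pid_file):
        os.remove(pid_file)
    return 1
```

A fuller check would also confirm that the live pid really belongs to mysqld, since pids can be reused after a crash.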

HA: Oracle CRS
- Check, Start, Stop, Clean
- VIP for MySQL instance
- Enough time for crash recovery
- HOSTING_MEMBERS + SERVER_POOLS to assign instances

HA: Replication
- MySQL master → slave replication
- Monitoring
  - Based on a "ping" table; show slave status is not reliable
  - Percona script pt-table-checksum (manual run)
- Change role "automated"
- my.cnf variables:
- Disaster recovery + Upgrades
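
Monitoring with a "ping" table works by writing a timestamp on the master at regular intervals and comparing it with the newest timestamp that has arrived on the slave. The sketch below shows the comparison only. The function names and the 30-second threshold are assumptions, not values from the talk.

```python
from datetime import datetime, timedelta


def replication_lag(master_ping, slave_ping):
    """Lag = newest heartbeat written on the master minus newest seen on the slave."""
    return max(master_ping - slave_ping, timedelta(0))


def slave_is_healthy(master_ping, slave_ping, max_lag=timedelta(seconds=30)):
    """True when the slave has replayed a heartbeat recent enough (assumed threshold)."""
    return replication_lag(master_ping, slave_ping) <= max_lag


if __name__ == "__main__":
    master = datetime(2013, 1, 29, 12, 0, 0)
    print(slave_is_healthy(master, master - timedelta(seconds=5)))
    print(slave_is_healthy(master, master - timedelta(minutes=5)))
```

Because the heartbeat travels through the replication stream itself, a stalled slave shows up as growing lag even when its reported status looks normal.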

HA comparison
Technology    Use case             Time to recover (seconds)
CRS           kill MySQL daemon    ~ 60
              cluster relocate     ~ 20
              server crashes       ~ 20
OracleVM      hypervisor crashes   ~ 360
              live migration       0
Replication   role change          N/A
In all cases sysbench was producing some OLTP load

Important clients
- Hosting ~45 instances at CERN for IT and experiments
  - Drupal content management system
  - BOINC – CERN document server
  - Audio video conferencing and webcasts service
  - HammerCloud for experiments
  - Piwik (open source web analytics software)
  - OpenStack Nova
  - Trac for subversion
  - PVSS, Fisheye, …

Summary
- Many lessons learned during the design and implementation of the DBoD service
- Building Database as a Service helped CERN DB group to
  - Gain experience with MySQL
  - Provide a solution for Oracle databases with special needs, e.g. Unicode character sets
  - Improve tools and operations
  - Standardize on tools and frameworks
  - Consolidate

Acknowledgements
- DBoD Team: Daniel Gomez, Dawid Wojcik, Ignacio Coterillo
- IT-DB group colleagues, especially our virtualization experts!