Oracle at CERN: Database technologies
Eric Grancher, Maaike Limper, CERN IT department
Image courtesy of Forschungszentrum Jülich / Seitenplan, with material from NASA, ESA and AURA/Caltech

Outline
Experience at CERN
– Current usage and deployment
– Replication
– Lessons from the last 15 years
– Oracle and MySQL
Trends
– Flash
– Compression
– Open source “relational” databases
– NoSQL databases

CERN IT-DB Services (slide by Luca Canali)

CERN databases in numbers (original slide by Luca Canali)
CERN database services:
– ~130 databases, most of them database clusters (Oracle RAC technology, 2 to 6 nodes)
– currently over 3000 disk spindles providing more than ~3 PB of raw disk space (NAS and SAN)
– MySQL service
Some notable databases at CERN:
– experiments’ databases: 14 production databases, currently between 1 and 12 TB in size, expected growth between 1 and 10 TB / year
– LHC accelerator logging database (ACCLOG): ~120 TB, > 5*10^12 rows, expected growth up to 70 TB / year
– ... several more DBs in the 1 to 10 TB range

[Chart] LHC logging service, > 5*10^12 rows; the plot shows roughly 50 TB of growth over one year

Key role for LHC physics data processing (original slide by Luca Canali)
– Online acquisition, offline production, data (re)processing, data distribution, analysis: SCADA, conditions, geometry, alignment, calibration, file bookkeeping, file transfers, etc.
– Grid infrastructure and operation services: monitoring, dashboards, user-role management, ...
– Data management services: file catalogues, file transfers and storage management, ...
– Metadata and transaction processing for the custom tape-based storage system of physics data
– Accelerator control and logging systems
– AMS as well: data/MC production bookkeeping and slow-control data

CERN openlab and Oracle Streams (slide by Eva Dafonte Pérez)
– Worldwide distribution of experimental physics data using Oracle Streams
– Huge effort, successful outcome

[Diagram] Streams replication topology for CMS, LHCb, ATLAS, COMPASS and ALICE (slide by Eva Dafonte Pérez)


Oracle is fully instrumented
All actions:
– network related
– IO related
– internals (cluster communication, space management, etc.)
– application related (transaction locks, etc.)
– etc.
Key for “scientific” performance understanding.

The Tuning Process
1. Run the workload, gather ASH/AWR information, 10046 traces, ...
2. Find the top event that slows down the processing.
3. Understand why time is spent on this event.
4. Modify client code, database schema, database code, hardware configuration.
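As a hedged illustration of step 2, a minimal JDBC sketch that lists the top wait events over the last few minutes; it assumes SELECT access to v$active_session_history, the Oracle JDBC driver on the classpath, and placeholder connection details:

// TopEvents.java: sketch of "find the top event" (step 2); connection
// string, user and password are placeholders, not CERN configuration.
import java.sql.*;

public class TopEvents {
    public static void main(String[] args) throws SQLException {
        try (Connection c = DriverManager.getConnection(
                 "jdbc:oracle:thin:@//dbhost:1521/service", "user", "password");
             PreparedStatement ps = c.prepareStatement(
                 "SELECT NVL(event, 'ON CPU') AS event, COUNT(*) AS samples " +
                 "FROM v$active_session_history " +
                 "WHERE sample_time > SYSTIMESTAMP - INTERVAL '15' MINUTE " +
                 "GROUP BY NVL(event, 'ON CPU') ORDER BY samples DESC");
             ResultSet rs = ps.executeQuery()) {
            // each ASH sample represents ~1 second of active session time,
            // so the counts approximate where database time is being spent
            while (rs.next()) {
                System.out.printf("%-45s %8d%n", rs.getString(1), rs.getLong(2));
            }
        }
    }
}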

Demo 1
– Single.java, observed with Tanel Poder’s snapper.sql
– disable autocommit, observe again with snapper.sql
– Batch.java

Program                    Time per row   Top wait event                                           Notable statistics
Single                     0.76 ms/row    log file sync 57.1%                                      user commits = user calls = 1.4k/s
Single, autocommit=false   0.31 ms/row    SQL*Net message from client 69.9%                        requests to/from client = execute count = 3.69k/s
Batch                      ... ms/row     log file sync 18.4%, SQL*Net message from client 19.0%
(speed-up versus Single: x2.45 with autocommit disabled, x267 with batching)
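The gist of the demo as a hedged sketch; Single.java and Batch.java themselves are not reproduced here, and the table name and row count are illustrative:

// InsertDemo.java: sketch of the three variants measured above, assuming a
// test table TESTTAB(ID NUMBER, VAL VARCHAR2(30)) and the Oracle JDBC driver.
import java.sql.*;

public class InsertDemo {
    static final String INS = "INSERT INTO testtab (id, val) VALUES (?, ?)";

    // Single.java behaviour: one execute per row; with autocommit enabled
    // the server waits on a "log file sync" for every single row
    static void single(Connection c, boolean autoCommit, int n) throws SQLException {
        c.setAutoCommit(autoCommit);
        try (PreparedStatement ps = c.prepareStatement(INS)) {
            for (int i = 0; i < n; i++) {
                ps.setInt(1, i);
                ps.setString(2, "row" + i);
                ps.executeUpdate();          // one network round trip per row
            }
        }
        if (!autoCommit) c.commit();         // one commit amortises the log file sync
    }

    // Batch.java behaviour: rows are buffered client-side and shipped together
    static void batch(Connection c, int n) throws SQLException {
        c.setAutoCommit(false);
        try (PreparedStatement ps = c.prepareStatement(INS)) {
            for (int i = 0; i < n; i++) {
                ps.setInt(1, i);
                ps.setString(2, "row" + i);
                ps.addBatch();
            }
            ps.executeBatch();               // one round trip for the whole batch
        }
        c.commit();
    }
}

Disabling autocommit removes the per-row log file sync (the commit happens once), and batching also removes the per-row network round trip, which matches the shift of the top wait events in the table above.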

[Screenshot] Oracle on Linux: “perf top” output

[Diagram] Oracle Real Application Clusters, from “Oracle9i Real Application Clusters Deployment and Performance”

PVSS Oracle scalability
– Target = 150 000 changes per second (tested with 160k), ... changes per client
– 5-node RAC
– NAS 3040, each with one aggregate of 13 disks (10k rpm FC)

PVSS Tuning (1/6)
– Shared resource: EVENTS_HISTORY (ELEMENT_ID, VALUE, ...)
– Each client “measures” input and registers history with a “merge” operation into the EVENTS_HISTORY table
– Performance: 100 “changes” per second
[Diagram] 150 clients -> DB servers -> storage: clients run “update eventlastval set ...” against table EVENT_LASTVAL; a trigger on update merges the change into table EVENTS_HISTORY

PVSS Tuning (2/6)
Initial state observation: the database is waiting on the clients, “SQL*Net message from client”
– Use of a generic C++/DB library
– Individual inserts (one statement per entry)
– Update of a table which keeps the “latest state”, through a trigger

PVSS Tuning (3/6)
Changes: bulk insert into a temporary table with OCCI, then call PL/SQL to load the data into the history table (sketched below)
Performance: 2000 changes per second
Now top event: “db file sequential read”
Top events (from awrrpt_1_5489_5490.html):
– db file sequential read (User I/O)
– enq: TX - contention (Other)
– CPU time
– log file parallel write (System I/O)
– db file parallel write (System I/O)
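The production code used C++/OCCI; a sketch of the same pattern in JDBC, where the temp table TEMP_EVENTS and the PL/SQL procedure LOAD_EVENTS_HISTORY are hypothetical names:

// BulkLoad.java: the "bulk insert into a temporary table, then PL/SQL"
// pattern; only the shape is shown, names are illustrative.
import java.sql.*;

public class BulkLoad {
    static void flush(Connection c, int[] elementIds, double[] values) throws SQLException {
        c.setAutoCommit(false);
        try (PreparedStatement ps = c.prepareStatement(
                "INSERT INTO temp_events (element_id, value) VALUES (?, ?)")) {
            for (int i = 0; i < elementIds.length; i++) {
                ps.setInt(1, elementIds[i]);
                ps.setDouble(2, values[i]);
                ps.addBatch();               // array insert instead of row-by-row
            }
            ps.executeBatch();
        }
        // set-based move into the history table, done inside the database
        try (CallableStatement cs = c.prepareCall("{call load_events_history}")) {
            cs.execute();
        }
        c.commit();
    }
}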

PVSS Tuning (4/6)
Changes: index usage analysis and reduction; table structure changes (IOT); replacement of merge by insert; use of “direct path load” with ETL
Performance: ... changes per second
Now top events: cluster-related wait events (from test5_rac_node1_8709_8710.html):
– gc buffer busy (Cluster)
– CPU time
– gc current block busy (Cluster)
– gc current grant busy (Cluster)
– gc current block 2-way (Cluster)

PVSS Tuning (5/6)
Changes: each “client” receives a unique number; partitioned table; use of “direct path load” into the client’s partition with ETL
Performance: 75 000 changes per second
Now top event: “freezes” once in a while (from rate75000_awrrpt_2_872_873.html):
– row cache lock (Concurrency)
– gc current multi block request (Cluster)
– CPU time
– log file parallel write (System I/O)
– undo segment extension (Configuration)

PVSS Tuning (6/6)
Problem investigation:
– link between foreground process and ASM processes
– difficult to interpret ASH report and traces
Problem identification: ASM space allocation is blocking some operations
Changes: space pre-allocation, background task
Result: stable 150 000 “changes” per second

PVSS Tuning Schema
[Diagram] 150 clients -> DB servers -> storage
Old path: clients update table EVENT_LASTVAL (“update eventlastval set ...”), a trigger on update merges into table EVENTS_HISTORY
New path: bulk insert into a temp table, then PL/SQL: insert /*+ APPEND */ into eventh (...) partition PARTITION (1) select ... from temp

PVSS Tuning Summary
Conclusion: from 100 changes per second to 150 000 “changes” per second
– 6-node RAC (dual CPU, 4 GB RAM), 32 SATA disks with an FCP link to the host
– 4 months of effort:
  – re-writing part of the application, with a changed interface (C++ code)
  – changes to the database code (PL/SQL)
  – schema change
  – numerous work sessions, joint work with other CERN IT groups

Overload at CPU level (1/3)
– Observed many times: “the storage is slow” (while storage administrators/specialists say “storage is fine / not loaded”)
– Typically, observed IO wait times (from the Oracle RDBMS point of view) are long when the CPU load is high
– Instrumentation: on/off CPU

Overload at CPU level (2/3), example
[Chart] Insertion time (ms, must stay below 1000 ms) versus insertion rate: as the load grows through 15k, 30k, 60k, 90k, ..., 135k and up to 150k insertions per second, the system hits its load limit

OS level / high load
[Diagram] Timeline of a process between t1 and t2, seen by Oracle / OS / IO: under acceptable load the interval is spent in the OS performing the IO; under high load the process additionally spends time off-CPU waiting to be rescheduled, which inflates the IO time observed by Oracle

Overload at CPU level (3/3), DTrace
DTrace (Solaris) can be used at OS level to get detailed information.

syscall::pread:entry
/pid == $target && self->traceme == 0/
{
  self->traceme = 1;
  self->on = timestamp;
  self->off = timestamp;
  self->io_start = timestamp;
}

syscall::pread:entry
/self->traceme == 1/
{
  self->io_start = timestamp;
}

syscall::pread:return
/self->traceme == 1/
{
  /* aggregation assignments reconstructed; names chosen to match the output below */
  @avg_io = avg(timestamp - self->io_start);
  @total_io = sum(timestamp - self->io_start);
  @count_io = count();
}

DTrace (continued)

sched:::on-cpu
/pid == $target && self->traceme == 1/
{
  self->on = timestamp;
  /* time spent off CPU ends when the process is scheduled again */
  @off_dist = quantize(self->on - self->off);
  @total_cpu_off = sum(self->on - self->off);
  @avg_cpu_off = avg(self->on - self->off);
  @count_cpu_off = count();
}

sched:::off-cpu
/self->traceme == 1/
{
  self->off = timestamp;
  /* time spent on CPU ends when the process is descheduled */
  @total_cpu_on = sum(self->off - self->on);
  @avg_cpu_on = avg(self->off - self->on);
  @on_dist = quantize(self->off - self->on);
  @count_cpu_on = count();
}

tick-1sec
/i++ >= 5/
{
  exit(0);
}

DTrace, “normal load”

-bash-3.00$ sudo ./cpu.d4 -p <pid>
dtrace: script './cpu.d4' matched 7 probes
CPU  ID  FUNCTION:NAME  :tick-1sec
avg_cpu_on ...  avg_cpu_off ...  avg_io ...
[...]
off-cpu time distribution (quantize): [...]
[...]
count_cpu_on 856
count_io 856
count_cpu_off 857
total_cpu_on ...
total_cpu_off ...

DTrace, “high load”

-bash-3.00$ sudo ./cpu.d4 -p <pid>
dtrace: script './cpu.d4' matched 7 probes
CPU  ID  FUNCTION:NAME  :tick-1sec
avg_cpu_on ...  avg_cpu_off ...  avg_io ...
[...]
off-cpu time distribution (quantize): buckets from 8192 upward [...]
[...]
count_io 486
count_cpu_on 503
count_cpu_off 504
total_cpu_on ...
total_cpu_off ...

Compared with the “normal load” run, fewer IOs complete per second (486 versus 856) and more time is spent off CPU.

Lessons learnt
– Aiming for high availability (often) adds complexity, and complexity is the enemy of availability
– Scalability can be achieved with Oracle Real Application Clusters (150k entries/s for PVSS)
– Database / application instrumentation is key for understanding and improving performance
– NFS/dNFS/pNFS are solutions to be considered for stability and scalability (very positive experience with NetApp: snapshots, scrubbing, etc.)
– Database independence is very complex if performance is required
– Hiding IO errors from the database lets the database handle what it is best at (transactions, query optimisation, coherency, etc.)

Incidents review [chart]

Evolution
– Flash
– Large memory systems, in-memory databases
– Compression
– Open source “relational” databases
– NoSQL databases

Flash, memory and compression
– Flash changes the picture for database IO:
  – sizing for IO operations per second
  – use of fast disks where high IOPS and low latency are needed
– Large amounts of memory:
  – enable consolidation and virtualisation (fewer nodes)
  – some databases fit fully in memory
– Compression is gaining momentum for databases:
  – for example Oracle’s hybrid columnar compression
  – tiering of storage

[Chart] Exadata Hybrid Columnar Compression on Oracle 11gR2: compression factors (slide by Svetozar Kapusta)

Local analysis of data
– “Big Data”: analysis of large amounts of data in a reasonable amount of time
– Google MapReduce, with Apache Hadoop as an open-source implementation (see the sketch below)
– Oracle Exadata: storage cells perform some of the operations locally (Smart Scans, storage indexes, column filtering, etc.)
– Greenplum: shared-nothing, massively parallel processing architecture for business intelligence and analytical processing
– An important direction, at least for the first level of selection
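As a flavour of the MapReduce programming model, the canonical word-count example against Hadoop’s mapreduce API; it assumes the hadoop-client libraries and is not CERN-specific code:

// WordCount.java: map emits (word, 1) locally to where the data lives,
// reduce aggregates the partial counts.
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    public static class TokenMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();
        @Override
        protected void map(Object key, Text value, Context ctx)
                throws IOException, InterruptedException {
            for (String tok : value.toString().split("\\s+")) {
                word.set(tok);
                ctx.write(word, ONE);                 // emit (word, 1)
            }
        }
    }

    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> vals, Context ctx)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : vals) sum += v.get();
            ctx.write(key, new IntWritable(sum));     // aggregate partial counts
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenMapper.class);
        job.setCombinerClass(SumReducer.class);       // pre-aggregate on the map side
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}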

Open source relational databases
– MySQL and PostgreSQL
– Some components are not “free”; replacements exist (for example Percona XtraBackup for hot backup)
– MySQL is the default for many applications (Drupal, etc.)
– PostgreSQL has a strong backend with Oracle-like features (stored procedures, write-ahead logging, “standby”, etc.)

NoSQL databases (1/2)
– Not about SQL limitations/issues:
  – about scalability
  – Big Data scale-out
– Feature-rich SQL engines are complex and lead to some unpredictability
– Workshop at CERN (needs and experience), ATLAS Computing Technical Interchange Meeting

NoSQL databases (2/2)
Lots of differences:
– data models (Key-Value, Column Family, Key-Document, Graph, etc.)
– schema-free storage
– query complexity
– mostly not ACID (Atomicity, Consistency, Isolation and Durability); data durability relaxed compared to traditional SQL engines
– performance through sharding (see the sketch below)
– compromise on consistency to keep availability and partition tolerance
-> the application is at the core; how will it evolve?
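To illustrate the sharding idea, a minimal consistent-hashing sketch in plain Java; it is not tied to any particular NoSQL engine and all names are illustrative:

// ShardRing.java: consistent hashing, the technique many NoSQL stores use to
// spread keys over nodes so that adding or removing a node only remaps a
// small fraction of the keys.
import java.util.SortedMap;
import java.util.TreeMap;

public class ShardRing {
    private final SortedMap<Integer, String> ring = new TreeMap<>();
    private final int virtualNodes;

    public ShardRing(int virtualNodes) { this.virtualNodes = virtualNodes; }

    private static int hash(String s) {
        // simple non-negative hash; real systems use stronger hash functions
        return s.hashCode() & 0x7fffffff;
    }

    public void addNode(String node) {
        // several virtual nodes per physical node smooth out the load
        for (int i = 0; i < virtualNodes; i++)
            ring.put(hash(node + "#" + i), node);
    }

    public String nodeFor(String key) {
        // walk clockwise on the ring: first virtual node at or after the key's hash
        SortedMap<Integer, String> tail = ring.tailMap(hash(key));
        return tail.isEmpty() ? ring.get(ring.firstKey()) : tail.get(tail.firstKey());
    }

    public static void main(String[] args) {
        ShardRing r = new ShardRing(64);
        r.addNode("nodeA"); r.addNode("nodeB"); r.addNode("nodeC");
        System.out.println(r.nodeFor("ELEMENT_ID:42"));  // -> one of the three nodes
    }
}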

Summary
– Oracle:
  – critical component for LHC accelerator and physics data processing
  – scalable and stable, including data replication
– CERN central services run on Oracle, for which we have the components and experience to build high availability, guaranteed data and scalability
– MySQL as a “Database On Demand” service:
  – nice features and light; lacks some scalability and high-availability features for some service requirements
– NoSQL / scale-out SQL is being considered:
  – the ecosystem is evolving rapidly (architectures, designs and interfaces subject to change!)

References
– NoSQL ecosystem: database workshop at CERN and ATLAS Computing Technical Interchange Meeting
– Eva Dafonte Pérez, UKOUG 2009, “Worldwide distribution of experimental physics data using Oracle Streams”
– Luca Canali, “CERN IT-DB Deployment, Status, Outlook”
– CERN openlab
– CAP theorem, ACID

Backup slides

Incidents review [chart]

Oracle EU director for research, Monica Marinucci
– Strong collaboration with CERN and universities
– Well-known solution: easy to find database administrators and developers, training available
– Support and licensing