ORACLE & VLDB
Nilo Segura IT/DB - CERN

VLDB
The real world is in the TB range (British Telecom: 80 TB using Sun + Oracle).
– Data is consolidated from different sources to build data warehouses, and data mining techniques are used to extract useful information.
– The data is always READ-only.
– Physics data has similar characteristics.

VLDB
Current size limits: Solaris 64-bit + Oracle 64-bit = 4 PB per database.
– That corresponds to a very large number of Sun StorEdge A1000 units (216 GB per unit today).
– Current technology does not allow us to keep all this data online in a manageable way.
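To put the 4 PB limit in perspective, the ceiling division below counts how many 216 GB units it would take to fill one database. It is a minimal Python sketch: whether the slide meant decimal (10^15 bytes) or binary (2^50 bytes) units is an assumption, so both readings are shown, and RAID redundancy is ignored (mirroring would roughly double the count).

```python
def units_needed(total_bytes: int, unit_bytes: int) -> int:
    """Smallest whole number of storage units that can hold total_bytes (ceiling division)."""
    return -(-total_bytes // unit_bytes)


# Decimal units: 1 PB = 10**15 bytes, 1 GB = 10**9 bytes.
print(units_needed(4 * 10**15, 216 * 10**9))  # 18519
# Binary units: 1 PiB = 2**50 bytes, 1 GiB = 2**30 bytes.
print(units_needed(4 * 2**50, 216 * 2**30))  # 19419
```

Either way, the answer is on the order of twenty thousand units, which is why keeping everything online is not yet manageable.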

VLDB
A typical LHC experiment will collect several petabytes of raw data.
– But the end user should not have to work with all of it: we need to process, group and summarize it according to certain criteria.
– This also means more disk space (if we want to keep everything online).
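As a toy illustration of processing, grouping and summarizing raw data, the Python sketch below reduces individual events to one summary entry per run. The `Event` record, its fields and `summarize_by_run` are hypothetical names chosen for illustration, not part of any experiment's software.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    # Hypothetical raw-event record; real experiment data is far richer.
    run: int
    energy_gev: float


def summarize_by_run(events):
    """Reduce raw events to one (event count, total energy) entry per run."""
    counts = defaultdict(int)
    totals = defaultdict(float)
    for event in events:
        counts[event.run] += 1
        totals[event.run] += event.energy_gev
    return {run: (counts[run], totals[run]) for run in counts}


raw = [Event(1, 10.0), Event(1, 5.5), Event(2, 7.0)]
print(summarize_by_run(raw))  # {1: (2, 15.5), 2: (1, 7.0)}
```

The summary is much smaller than the raw data, but it is still extra data to store alongside it.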

VLDB
Not to mention backup: we need to keep our data safe (have you devised your backup strategy?)
– RAID technology helps us increase performance and availability: RAID 0+1 (best) or RAID 5 (cheaper).
– Today, we should keep raw data on tape and reconstructed data online.
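The cost difference between the two RAID levels comes from how much raw space goes to redundancy. The hypothetical helper below is a back-of-the-envelope sketch (it ignores hot spares and file-system overhead): RAID 0+1 keeps a full mirror copy, while RAID 5 spends only one disk's worth of space on parity, at the price of extra work on every write to update that parity. The 12 × 18 GB disk set is an assumed example, not a CERN configuration.

```python
def usable_capacity_gb(level: str, disks: int, disk_gb: float) -> float:
    """Usable space of one RAID set, ignoring hot spares and file-system overhead."""
    if level == "0+1":
        # Stripes are mirrored: half of the raw space holds the second copy.
        if disks < 4 or disks % 2:
            raise ValueError("RAID 0+1 needs an even number of disks, at least 4")
        return disks * disk_gb / 2
    if level == "5":
        # Data and parity are striped over all disks: one disk's worth holds parity.
        if disks < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        return (disks - 1) * disk_gb
    raise ValueError(f"unsupported RAID level: {level!r}")


# Hypothetical set of 12 disks of 18 GB each.
print(usable_capacity_gb("0+1", 12, 18))  # 108.0
print(usable_capacity_gb("5", 12, 18))    # 198
```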

VLDB
Now some software tricks to deal with this amount of data:
– Partitioning
– Parallel DML
– Materialized views
– Parallel Server (cluster configuration)
– Bitmapped indexes?

VLDB
Partitioning
– A large amount of data can be divided into physically independent structures (partitions) according to certain criteria.
– However, the user continues to see the same logical structure.
– Partitions can be defined by ranges of the partition key or by a hash function applied to it.
– The system can discard partitions that cannot contain rows relevant to the user’s query, reducing the amount of data to be processed.
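The following Python sketch is a toy model of the idea, not Oracle's implementation: rows are routed to range partitions by key, a range query skips (prunes) partitions whose bounds cannot match, and a stable hash (here `zlib.crc32`, an arbitrary choice) assigns keys to hash partitions. The class and function names are hypothetical.

```python
import bisect
import zlib


class RangePartitionedTable:
    """Toy range-partitioned table: one logical table, several physical partitions."""

    def __init__(self, upper_bounds):
        # Exclusive upper bounds in ascending order, like "VALUES LESS THAN" limits:
        # partition i holds keys k with upper_bounds[i-1] <= k < upper_bounds[i].
        self.upper_bounds = list(upper_bounds)
        self.partitions = [[] for _ in self.upper_bounds]

    def insert(self, key, row):
        # Route the row to the first partition whose bound is greater than the key.
        i = bisect.bisect_right(self.upper_bounds, key)
        if i == len(self.upper_bounds):
            raise ValueError(f"key {key!r} is not below the highest partition bound")
        self.partitions[i].append((key, row))

    def query_range(self, lo, hi):
        """Return (rows with lo <= key < hi, number of partitions actually scanned)."""
        rows, scanned = [], 0
        lower = float("-inf")
        for upper, part in zip(self.upper_bounds, self.partitions):
            # Partition pruning: skip partitions whose [lower, upper) range
            # cannot overlap the requested [lo, hi) range.
            if lower < hi and lo < upper:
                scanned += 1
                rows.extend(row for key, row in part if lo <= key < hi)
            lower = upper
        return rows, scanned


def hash_partition(key: str, n_partitions: int) -> int:
    """Pick a partition with a stable hash (crc32), spreading keys roughly evenly."""
    return zlib.crc32(key.encode("utf-8")) % n_partitions


table = RangePartitionedTable([100, 200, 300])
table.insert(50, "a")
table.insert(150, "b")
table.insert(250, "c")
table.insert(120, "d")
print(table.query_range(100, 200))  # (['b', 'd'], 1)
```

Only one of the three partitions is read for the query, while the caller still sees a single table.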

VLDB
Parallel DML
– A single SELECT/INSERT/DELETE/UPDATE can be executed by several processes (slaves) coordinated by a master process.
– This naturally leads to better use of SMP machines.
– The degree of parallelism can be set by the user or automatically by the system (testing is a must).
– Parallel INSERT/UPDATE/DELETE requires partitioning.
– You need to plan your I/O system carefully.
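The master/slave structure can be sketched in Python with a thread pool standing in for the slave processes; `parallel_sum` and its `degree_of_parallelism` parameter are illustrative names, not an Oracle API. Because of CPython's global interpreter lock, threads here show the coordination pattern rather than real CPU speed-up; `ProcessPoolExecutor` would be the choice for CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_sum(rows, degree_of_parallelism: int):
    """Coordinator ('master') splits rows among slaves and merges their partial sums."""
    if degree_of_parallelism < 1:
        raise ValueError("degree_of_parallelism must be at least 1")
    # Split the rows into one chunk per slave (round-robin by position).
    chunks = [rows[i::degree_of_parallelism] for i in range(degree_of_parallelism)]

    def slave(chunk):
        # Each slave computes a partial aggregate over its own chunk only.
        return sum(chunk)

    with ThreadPoolExecutor(max_workers=degree_of_parallelism) as pool:
        partial_sums = list(pool.map(slave, chunks))
    # The coordinator combines the partial results into the final answer.
    return sum(partial_sums)


print(parallel_sum(list(range(100)), 4))  # 4950
```

The same answer comes back for any degree of parallelism; only the split of work changes, which is why tuning it is a matter of testing.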

VLDB
Materialized views
– Normal views are just a name for a SQL query, with no data stored (the query runs each time the view is used).
– This can be very costly if we run it regularly.
– A materialized view (MV) is a view whose result rows are already stored, like a normal table.
– It can be refreshed (manually or automatically) so that it reflects changes in the underlying tables.
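The difference between a normal view and a materialized view can be modelled in a few lines of Python. This is a minimal sketch with hypothetical class names: the view re-runs its query on every read, while the materialized view returns a stored result that stays stale until it is refreshed. Only a complete refresh is shown; Oracle can also refresh materialized views incrementally (fast refresh).

```python
class View:
    """A normal view: only a named query; the query runs on every read."""

    def __init__(self, query):
        self.query = query

    def read(self):
        return self.query()


class MaterializedView:
    """Stores the query result like a table; reads are cheap but may be stale."""

    def __init__(self, query):
        self.query = query
        self.refresh()

    def refresh(self):
        # Complete refresh: re-run the whole query and replace the stored result.
        self._result = self.query()

    def read(self):
        return self._result


base_table = [1, 2, 3]
total_view = View(lambda: sum(base_table))
total_mv = MaterializedView(lambda: sum(base_table))
base_table.append(4)
print(total_view.read(), total_mv.read())  # 10 6  (the MV is stale)
total_mv.refresh()
print(total_mv.read())  # 10
```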

Parallel Server
– Several nodes access the same database, which resides on a disk system shared by all the nodes in the cluster.
– Users can be assigned to different nodes (load balancing).
– Intra-query parallelism: a single query can be executed across the different nodes.
– At CERN there are 3 Sun clusters (2 for databases, 1 for the Web) and 1 Compaq cluster.
– There is no such thing for Linux (yet).
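Because every node sees the same shared database, any node can serve any user, so load balancing is just a matter of spreading sessions over the nodes. A minimal round-robin sketch in Python follows; the node names and `assign_sessions` are hypothetical, and a real cluster may balance on other criteria such as current node load.

```python
from itertools import cycle


def assign_sessions(users, nodes):
    """Spread user sessions over cluster nodes in round-robin order.

    Every node can serve any user, because all nodes share the same database.
    """
    if not nodes:
        raise ValueError("at least one node is required")
    next_node = cycle(nodes)
    return {user: next(next_node) for user in users}


print(assign_sessions(["alice", "bob", "carol"], ["node1", "node2"]))
# {'alice': 'node1', 'bob': 'node2', 'carol': 'node1'}
```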

Others
– Another point is how to distribute data among the different institutes: network, tapes + post…
– It would be nice to have a plug-in/plug-out mechanism in the database; this is called transportable tablespaces.
– We may also use the database replication option, but…
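Whether the network or tapes + post is the better way to ship data to an institute is largely arithmetic. The sketch below estimates network transfer time; the link speed and the `efficiency` factor are assumptions to be replaced with real values, and the function name is hypothetical. For example, 1 TB over a fully used 100 Mbit/s link takes about 22 hours, which can then be compared with the door-to-door time of shipping tapes.

```python
def network_transfer_hours(data_tb: float, link_mbit_s: float, efficiency: float = 1.0) -> float:
    """Hours needed to send data_tb terabytes over a link of link_mbit_s megabits/s.

    Assumes decimal units (1 TB = 10**12 bytes, 1 Mbit = 10**6 bits) and that
    `efficiency` (0 < efficiency <= 1) is the fraction of the link actually usable.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    bits = data_tb * 10**12 * 8
    seconds = bits / (link_mbit_s * 10**6 * efficiency)
    return seconds / 3600


print(round(network_transfer_hours(1, 100), 1))  # 22.2
```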

Conclusion
– Do not despair at the size of the problem.
– We are trying to solve tomorrow’s problem using today’s technology.
– So keep an eye open and be VERY flexible in your model, so that you can adapt quickly and painlessly.
– Never declare your model frozen; that is a mistake. Keep improving it and adopt new technologies (where they bring a benefit).