
1 Oracle, MySQL and PostgreSQL DBMS
Comparison of Oracle, MySQL and PostgreSQL DBMS in the context of ALICE needs Wiktor Peryt, Warsaw University of Technology, Faculty of Physics

2 Elementary features
We have taken the following approach: first of all we determined which features of a DBMS are important from the point of view of such a large experiment. We chose the following features:

Elementary features
- basic data types
- SQL language features
- declarative integrity constraints
- programming abstractions
- automatic generation of identifiers
- national character support

3 Transactions and multi-user access
- locks
- multi-user access

Programming in the database
- stored procedures
- triggers

Elements of database administration
- access control
- backup copies
- data migration

4 Portability and scalability
- portability of the DBMS
- scalability

Performance and VLDB (Very Large Databases)
- query optimization
- structures supporting query optimization
- support for analytical processing
- allocation of disk space
- data size limits
- VLDB implementations

5 Distributed databases
- access to multiple databases
- heterogeneous systems support

Special data types
- large objects in the database
- post-relational extensions
- support for special data types

Application development and interfaces
- embedded SQL
- standard interfaces, additional interfaces
- interoperability with Web technology
- XML, CASE

6 Reliability and commercial issues
Reliability
- failure recovery

Commercial issues
- technical support available
- market position

Having completed step one, we carried out the subsequent work in three subgroups, each of which dealt with only one DBMS. The members of each subgroup had their own practical experience with the DBMS being investigated by their subgroup. This procedure gave us the possibility of verifying the information contained in manuals and other available documentation (for instance, on the Internet). As a result, three extended documents devoted to Oracle, MySQL and PostgreSQL were created.

7 The team
Konrad Bohuszewicz, undergraduate student
Maciej Czyzowicz, undergraduate student
Michal Janik, Ph.D. student
Dawid Jarosz, undergraduate student
Piotr Mazan, undergraduate student
Marcin Mierzejewski, undergraduate student
Mikolaj Olszewski, undergraduate student
Wiktor S. Peryt
Sylwester Radomski, undergraduate student
Piotr Szarwas, Ph.D. student
Tomasz Traczyk
Dominik Tukendorf, undergraduate student
Jacek Wojcieszuk, undergraduate student

Faculty of Electronics and Information Technology
Faculty of Mathematics and Information Sciences
Faculty of Physics

8 About the “Comparison …” document
- it was discussed by all the people involved in this task
- the compilation was made by Dr. Tomasz Traczyk
- the compilation circulated within the whole group a few times to make sure we avoided omissions or mistakes
- this version of the document is accepted by all co-authors
- we consider it a quite comprehensive and objective comparison
- it also contains "weights", which we call "importance", differentiated for the Central database and for Lab-participants
- the Central database should be a kind of data warehouse at CERN, containing all the data, including data transferred periodically from Lab-participants
- the term "Lab-participants" denotes the smaller databases in the labs involved in the ALICE experiment preparation
- a few explanations of database terminology are also included, to make this document easy to comprehend for non-specialists


10 Preliminary conclusions
Our preliminary conclusions:
- for the Central Data Repository for ALICE at CERN: only Oracle can be taken into account seriously
- for Lab-participants (mainly for production-phase databases): Oracle is also the best, but using MySQL or PostgreSQL is possible; the choice between the two is not obvious at the moment

Some extended tests of MySQL and PostgreSQL performance, stability etc., with real data for the STAR SSD, are still in progress in Warsaw. They will be published in 1-2 weeks on the website, in the same place as the document “Comparison of Oracle, MySQL and PostgreSQL DBMS”.

11 Questions for ALICE
- How to start with databases for ALICE, and how to manage the project?
- General concept of the system architecture
- Databases in the production phase
- Recommended software technologies
- DBMS platform choice
- How to proceed?

12 Database types for ALICE
The following main categories of information should go into databases:
- production and assembly phase measurements and descriptive data → ProdPhase database
- calibration data → Calibration database
- configuration data → Configuration database
- detector condition data → Condition database
- run log data → RunLog database
- geometry data (?) → Geometry database, or part of the Calibration DB (?)
- some others? ... to be defined later, during "phase one" work

13 Database contents (1)
ProdPhase database
- all information coming from test-beds, from manufacturers, from assembly processes, object flow between manufacturers and labs, etc.

RunLog database
- stores the summary information describing the contents of an experimental run, and points to the locations where the detailed information associated with the run is stored

An example of a Web-based interface developed for STAR by Sylwester Radomski (undergraduate student, Faculty of Physics, WUT) can be seen under → Computing, the first item in the table "New".
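A minimal sketch of such a RunLog table, with in-memory SQLite standing in for the production DBMS; the table layout, run number and storage path are invented for illustration, not the actual STAR/ALICE schema:

```python
import sqlite3

# In-memory SQLite stands in for the production DBMS (Oracle/MySQL/PostgreSQL).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE runlog (
        run_number    INTEGER PRIMARY KEY,
        start_time    TEXT,      -- run start, ISO timestamp
        end_time      TEXT,
        n_events      INTEGER,   -- summary information about the run
        detector      TEXT,
        data_location TEXT       -- pointer to where the detailed data live
    )
""")
conn.execute(
    "INSERT INTO runlog VALUES (1001, '2002-03-01T10:00', '2002-03-01T12:30', "
    "250000, 'SSD', '/archive/run1001/')"
)

# Typical use: locate the detailed data for a run from its summary entry.
row = conn.execute(
    "SELECT n_events, data_location FROM runlog WHERE run_number = 1001"
).fetchone()
print(row)  # (250000, '/archive/run1001/')
```

The summary row stays small and queryable while the bulky detailed data remain wherever they were produced.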

14 Why a database in the production phase?
The environment in which the archive facility operates is composed of many sources of information. We have to deal with data:
- produced by various test-bench systems
- entered manually by operators
- submitted by collaborating institutes and companies

Usually there are a number of distinct data formats, and files are stored in many locations. Consequently, without a database it is hard not only to locate the right piece of information, but also to ensure the safety and good quality of the data.

15 Goals for the production phase database
- secure archiving of all test results in a repository
- easy availability of information on the location of objects (in the geographic sense: manufacturers, labs), which makes arranging the assembly easier
- the possibility of automatic assignment of quality attributes according to well-defined criteria
- statistical analysis of quality should be easy to perform at any time
- preparing data for future on-line use by slow control, DCS and DAQ
- easy access to all data during the production and assembly phase
- in the future: easy access to all data during the experiment run
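The automatic assignment of quality attributes mentioned above could be as simple as applying declarative criteria to measured values; a sketch in which the cut values, quantity names and attribute labels are all invented for illustration:

```python
# Hypothetical quality criteria for a detector module: each rule maps a
# measured quantity to the (low, high) range it must fall in.
CRITERIA = {
    "leakage_current_uA": (0.0, 2.0),
    "noise_level_adc":    (0.0, 5.0),
    "dead_channels":      (0, 10),
}

def quality_attribute(measurements):
    """Return 'GOOD' if every measurement passes its criterion, else 'BAD'."""
    for name, (lo, hi) in CRITERIA.items():
        if not (lo <= measurements[name] <= hi):
            return "BAD"
    return "GOOD"

module = {"leakage_current_uA": 1.2, "noise_level_adc": 3.4, "dead_channels": 2}
print(quality_attribute(module))  # GOOD
```

Keeping the criteria in data (here a dict; in practice a database table) means they can be tightened later and the attributes recomputed at any time.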

16 Production phase DB: basic requirements
- data should be stored in a central repository, to make management and maintenance easy and reliable
- access to the data should be assured for everybody who participates in tests during the production phase, i.e. software allowing the use of Web browsers is necessary
- object registration should be possible manually (by an operator with suitable privileges) as well as automatically (from a LabVIEW application, for example, or other software)
- the software should allow even inexperienced users to create (SQL) queries to the database
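One way to let inexperienced users create SQL queries is a form-style helper that turns field/value selections into a parameterized query; a minimal sketch, again with SQLite as a stand-in and an invented `modules` table:

```python
import sqlite3

def build_query(table, filters):
    """Turn {column: value} selections into a parameterized SELECT,
    so users never have to write SQL themselves."""
    where = " AND ".join(f"{col} = ?" for col in filters)
    sql = f"SELECT * FROM {table}"
    if where:
        sql += f" WHERE {where}"
    return sql, list(filters.values())

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE modules (id INTEGER, lab TEXT, status TEXT)")
conn.executemany("INSERT INTO modules VALUES (?, ?, ?)",
                 [(1, 'Warsaw', 'GOOD'), (2, 'CERN', 'BAD'), (3, 'Warsaw', 'GOOD')])

sql, params = build_query("modules", {"lab": "Warsaw", "status": "GOOD"})
rows = conn.execute(sql, params).fetchall()
print(len(rows))  # 2
```

Parameterized queries (`?` placeholders) also keep user-supplied values out of the SQL text itself, which matters once a Web form is exposed to the whole collaboration.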

17 From the point of view of domain experts ... (1)
- there is an ever-increasing demand for centralized storage of data and for consistent, easy-to-use search and retrieval facilities
- experts want to be able to retrieve and analyze the information in a user-friendly way, regardless of its origin
- they do not want to be forced to perform several queries just because the data in question were taken by different data acquisition systems
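A single query over data taken by different acquisition systems can be provided by a database view that unifies the per-system tables; a sketch in which the two source tables and their column names are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two hypothetical per-DAQ tables with slightly different layouts.
conn.execute("CREATE TABLE daq_a (module INTEGER, noise REAL)")
conn.execute("CREATE TABLE daq_b (mod_id INTEGER, noise_adc REAL)")
conn.execute("INSERT INTO daq_a VALUES (1, 3.2)")
conn.execute("INSERT INTO daq_b VALUES (2, 4.1)")

# The view hides the origin: experts query one logical table.
conn.execute("""
    CREATE VIEW all_noise AS
        SELECT module, noise, 'A' AS source FROM daq_a
        UNION ALL
        SELECT mod_id, noise_adc, 'B' FROM daq_b
""")
rows = conn.execute(
    "SELECT module, noise, source FROM all_noise ORDER BY module"
).fetchall()
print(rows)  # [(1, 3.2, 'A'), (2, 4.1, 'B')]
```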

18 From the point of view of domain experts ... (2)
- they wish to do statistics on data sets spanning months (and more) without having to browse tens of subdirectories on backup storage devices
- usually, they prefer to use industry-standard, versatile software tools to process and analyze data
- they certainly would not mind being able to automate their routine, everyday tasks
- their task is to look at the information, not to look for it

19 Requirements addressed to software developers
- we should address those issues by providing a modular framework for archiving and for platform-independent retrieval of data in a heterogeneous distributed computing environment
- our database system must be open enough to follow the inevitable evolution of the information-gathering systems related to the development of the particular detectors
- we should be able to cope with fast-evolving new Internet technologies, in order to take full advantage of the facilities they provide

20 DB for STAR: software technologies used
The use of:
- PHP4 software running on the server side
- C/C++ for the API
- Java + Swing & JDBC 1 for applications requiring more interactivity (JDBC = Java DataBase Connectivity)

seems to be the right choice of tools for client-side software development.

21 On-the-fly plot creation ...

22 From SQL query to plot ...
Generation of plots and histograms from the database and putting them on the Web. Attempt made by S. Radomski.

Data chain:
1. The HTTP server (Apache + Tomcat) calls a servlet (dbPlot) with an SQL query as parameter.
2. The servlet connects to a ROOT-based server through a socket and sends the query. "ROOT server" means a ROOT script which handles connections and drives the dbPlot class.
3. dbPlot::Init() reuses an existing connection to the database, or creates a new one if none exists.
4. dbPlot::TakeData() sends the query to the DB and fetches the data using the TSQLServer class.
5. dbPlot::TakeData() takes the data from the TSQLResult and puts them into a TNtuple. This function can recognise and parse the 'private' format of data stored in a BLOB.
6. dbPlot::PlotData() calls TTree->Draw() with the proper parameters.
7. dbPlot::Style() sets colors and labels.
8. The ROOT server saves the canvas as EPS and then calls convert to produce a PNG.
9. The ROOT server sends the servlet the identification of the created PNG file.
10. The servlet sends the PNG data to the user agent, i.e. the Web browser.

The tool can also be used in a normal ROOT session, without an HTTP server.
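The core of the chain (query the database, fetch the result, bin it into a histogram) can be sketched outside ROOT as well; here in Python, with SQLite standing in for the experiment database and a plain counting loop standing in for TTree::Draw(), the table and its contents being invented:

```python
import sqlite3

# SQLite stands in for the experiment database; the table is illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (channel INTEGER, adc REAL)")
conn.executemany("INSERT INTO measurements VALUES (?, ?)",
                 [(i, float(i % 5)) for i in range(100)])

def histogram(rows, nbins, lo, hi):
    """Bin single-column query results into nbins equal-width bins over [lo, hi)."""
    counts = [0] * nbins
    width = (hi - lo) / nbins
    for (value,) in rows:
        if lo <= value < hi:
            counts[int((value - lo) / width)] += 1
    return counts

rows = conn.execute("SELECT adc FROM measurements").fetchall()
print(histogram(rows, 5, 0.0, 5.0))  # [20, 20, 20, 20, 20]
```

Everything after this point (styling, EPS/PNG rendering, delivery to the browser) is presentation, which is why it can live in a separate server process.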

23 Performance and problems ...
- one histogram takes about 1-2 s; the slowest element in the chain is convert, which makes use of GhostScript
- creating PostScript and then converting it to PNG is overcomplicated: a rather simple TGraph with ~758 lines takes ~10 s
- ROOT cannot generate GIF in -b (batch) mode
- problems with memory deallocation in ROOT: after about 100 plots, ROOT crashes

Modification of Draw() in TTree:
For a 1-D histogram, Draw() always makes 100 bins. If the data have their own grid (measurement precision), the plots look terrible, especially when histogramming integers. A small modification in TTreePlayer makes it possible to recognize whether the data are "gridy" and to set the number of bins depending on the bounds, the grid and the number of entries. We suggest including this code in a ROOT release.
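The idea behind the TTreePlayer modification can be sketched as a bin-count heuristic: if the values lie on an integer grid, choose one bin per grid step instead of a fixed 100. The heuristic below is our illustration of the idea, not the actual ROOT patch:

```python
def choose_nbins(values, default=100):
    """Pick a histogram bin count: one bin per grid step for 'gridy'
    (e.g. integer-valued) data, otherwise the fixed default."""
    lo, hi = min(values), max(values)
    # Detect an integer grid: every value sits on a whole number.
    if all(float(v).is_integer() for v in values):
        return int(hi - lo) + 1   # one bin per integer step
    return default

print(choose_nbins([1, 2, 2, 3, 7]))    # 7  (integers spanning 1..7)
print(choose_nbins([0.1, 0.25, 0.7]))   # 100 (no obvious grid)
```

With one bin per integer, every measured value lands in the centre of its own bin, avoiding the aliasing artifacts that a fixed 100-bin grid produces on integer data.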

24 Typical architecture for a local site/lab, i.e. a Lab-participant
- measurements are performed on a dedicated computer
- data are transferred over the local Ethernet network to the database
- users can access the measurements by means of Java applets or PHP applications
- the graphical user interface makes the construction of complex queries easy, even for users with no database experience
- another capability of this applet is the visualisation of selected data
- clearly, using Java, JDBC and PHP allows access to the database over the Internet or the local network with the user's favourite browser

Diagram components: DB server (daemon), repository, Java applet, PHP applications, LabVIEW application, DUT, ROOT or AliROOT

25 Production phase database for ALICE
(Architecture diagram; labels only.)
Lab side (Lab 1 somewhere in Europe, Lab 2, Lab 3, Lab 4, ..., Lab n): MySQL repository, MySQL server (daemon), Java applet, PHP applications, LabVIEW application, DUT, ROOT or AliROOT.
CERN side: DATA repository, Data services, Data Archive Server Library, Oracle server (daemon), Application services, DBMS, AliROOT.

26 Central data warehouse at CERN: three-tier architecture
In line with present tendencies, one can easily distinguish the three logical tiers: the client layer, the application services layer and the data services layer. Each layer contains several components (not all are shown in the picture).
- the top level is the client layer, containing the client applications responsible for transferring data into the database and for visualisation: filters (a generic data loader and custom loaders) and interactive software (WWW browsers, Java applets, PHP, HTML, command-line utilities, etc.)
- the middle layer is composed of application services; this layer knows the logical structure and the physical locations of the data
- the bottom layer contains the data and the Database Management System (diagram labels: Data Archive Server Library, Oracle server, AliROOT, DATA repository, DBMS)
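The "generic data loader" in the client layer could be sketched as a small dispatcher that accepts records in several source formats and normalizes them before insertion; the formats, field names and table are invented for illustration:

```python
import csv
import io
import json
import sqlite3

def parse_record(fmt, payload):
    """Normalize one record from a known source format into a common dict.
    The two formats here (JSON and CSV) stand in for the many formats
    produced by test benches, operators and collaborating institutes."""
    if fmt == "json":
        return json.loads(payload)
    if fmt == "csv":
        row = next(csv.reader(io.StringIO(payload)))
        return {"module": int(row[0]), "value": float(row[1])}
    raise ValueError(f"unknown format: {fmt}")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loaded (module INTEGER, value REAL)")

# Records arriving in different formats all end up in the same table.
for fmt, payload in [("json", '{"module": 1, "value": 2.5}'),
                     ("csv", "2,3.5")]:
    rec = parse_record(fmt, payload)
    conn.execute("INSERT INTO loaded VALUES (?, ?)", (rec["module"], rec["value"]))

count = conn.execute("SELECT COUNT(*) FROM loaded").fetchone()[0]
print(count)  # 2
```

New source formats then require only a new branch in the parser, not changes to the database or to the custom loaders alongside it.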

27 How to start with databases for ALICE?
The project should be managed in a few phases. The project is large, so I strongly suggest applying a methodology proven in a "commercial environment".

28 How to manage the project? (1)
Phase 1: strategy (or planning) for the WHOLE project:
- determination of the scope of the project
- partitioning into subsystems (the natural way: by subdetector, but not only)
- formulation of general models which could be applied
- creation of a list of actors/participants
- approximate time schedule for the particular tasks
- initial choice of software technologies

29 How to manage the project? (2)
Successive phases should be performed in a "spiral cycle". This simply means that the particular subsystems are elaborated successively. Each subsystem must go through the following phases:
analysis/conceptual design → software design → development (programming) → implementation
- improvements and corrections in earlier completed subsystems must continue during the work on successive subsystems
- simultaneous work on several subsystems is good practice, and easy to apply, especially when more working teams are involved

30 How to manage the project? (3)
- work on a pilot project, in parallel to the main one and with the same software technology; it should contain the most urgent things
- efficiency tests; creation of "simulated data" with volumes similar to the expected ones
- creation of "conceptual models" during the analysis phase is necessary before the design of the subsystems; an appropriate formalism and CASE tools are needed for that; UML (Unified Modelling Language) is an appropriate option, also on Linux
- elaboration, for the whole project, of standards such as: system of keys, terminology, security, access rights, etc.

31 First steps ...
Start to formally organize a central database group for ALICE. After that: begin phase 1 of the project, i.e. the strategy for the WHOLE project/experiment.

Partial, highest-priority tasks for this group:
- determination of the scope of the project
- formulation of general models which could be applied
- creation of a list of actors/participants (including 1-2 representatives from each subdetector!)
- initial choice of software technologies which could be used
- partitioning into subsystems
- analysis/conceptual design

