File-Metadata Management System For The LHCb Experiment
Carmine Cioffi, Department of Physics, University of Oxford
CHEP04 Interlaken, 27 September 2004



CHEP04 Interlaken 27 September 2004 File-Metadata Management system
Outline
What are metadata and why do we need them in the LHCb experiment?
The File-Metadata Management System
– The two schema strategy
– XML and the warehousing database
– Services and specialised views
– Relationship between the warehousing database and views
– Web Services
ARDA and future planning

Metadata
Generally speaking, metadata are data that characterise data files.
The two facets of metadata:
– Job provenance: everything you ever wanted to know about how a data file was created.
– Bookkeeping: how do I identify the datasets I am interested in for my analysis?
Metadata are needed to get straight to the files of interest, avoiding unnecessary access to the data storage.

The two schema strategy
The two schema strategy consists of having a database (the Warehousing DB) and a View of it, each with its own schema.
– The Warehousing DataBase (WDB) is meant to store data in a simple way while being flexible enough to accept new data.
– The View is designed to be efficient for the service it is made for.
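As an illustration of why a key-value warehouse can accept new data without schema changes, here is a minimal sketch. It uses SQLite as a stand-in for the real database. The table names Jobs and JobParams and the attributes ConfigName, ConfigVersion, Date, Name and Value appear in the slides; the JobId column, the helper functions and all sample values are assumptions made for this example.

```python
import sqlite3


def create_wdb(conn):
    """Create a tiny key-value warehouse: one row per job, one row per parameter."""
    conn.executescript("""
        CREATE TABLE Jobs (
            JobId         INTEGER PRIMARY KEY,   -- hypothetical surrogate key
            ConfigName    TEXT NOT NULL,
            ConfigVersion TEXT NOT NULL,
            Date          TEXT NOT NULL
        );
        CREATE TABLE JobParams (
            JobId INTEGER NOT NULL REFERENCES Jobs(JobId),
            Name  TEXT NOT NULL,
            Value TEXT NOT NULL
        );
    """)


def add_job(conn, config_name, config_version, date, params):
    """Insert a job and its parameters; every parameter becomes one (Name, Value) row."""
    cur = conn.execute(
        "INSERT INTO Jobs (ConfigName, ConfigVersion, Date) VALUES (?, ?, ?)",
        (config_name, config_version, date),
    )
    job_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO JobParams (JobId, Name, Value) VALUES (?, ?, ?)",
        [(job_id, name, str(value)) for name, value in params.items()],
    )
    return job_id


conn = sqlite3.connect(":memory:")
create_wdb(conn)
job1 = add_job(conn, "DC04", "v1", "2004-05-01", {"Program": "reco"})
# A later job carries a parameter name never seen before; no ALTER TABLE is needed.
job2 = add_job(conn, "DC04", "v2", "2004-06-01", {"Program": "reco", "Luminosity": "high"})

names = sorted(row[0] for row in conn.execute("SELECT DISTINCT Name FROM JobParams"))
print(names)
```

The price of this flexibility is that the database itself cannot tell whether a parameter name or value makes sense, which is the problem the XML validation step addresses.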

Entity-Relationship model for WDB (figure)

XML and the insertion of data
Because of its key-value strategy, the WDB is liable to be corrupted:
– Data with any semantics can be inserted.
– Partial information can be inserted.
To prevent this, the data must be presented in XML format. The data can then be checked against a predefined DTD or XML Schema before insertion.

The DTD for the insertion of job-related metadata

<!ATTLIST Job ConfigName    CDATA #REQUIRED
              ConfigVersion CDATA #REQUIRED
              Date          CDATA #REQUIRED>
<!ATTLIST JobOption Recipient CDATA #REQUIRED
                    Name      CDATA #REQUIRED
                    Value     CDATA #REQUIRED>
<!ATTLIST TypedParameter Name  CDATA #REQUIRED
                         Value CDATA #REQUIRED
                         Type  (Info|Environment_Variable) #REQUIRED>

<!ATTLIST OutputFile Name        CDATA #REQUIRED
                     TypeName    CDATA #REQUIRED
                     TypeVersion CDATA #REQUIRED>
<!ATTLIST Parameter Name  CDATA #REQUIRED
                    Value CDATA #REQUIRED>
<!ATTLIST Quality Group CDATA #REQUIRED
                  Flag  CDATA #REQUIRED>
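The effect of these declarations can be sketched in a few lines of Python. This is not a DTD validator: the standard library parser does not validate, so the required attributes from the ATTLIST declarations above are transcribed into a table and checked by hand. The element nesting is not shown on the slide, so the sketch accepts the known elements at any depth; the function name find_problems and the sample documents are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Required attributes per element, transcribed from the ATTLIST declarations.
REQUIRED = {
    "Job": ("ConfigName", "ConfigVersion", "Date"),
    "JobOption": ("Recipient", "Name", "Value"),
    "TypedParameter": ("Name", "Value", "Type"),
    "OutputFile": ("Name", "TypeName", "TypeVersion"),
    "Parameter": ("Name", "Value"),
    "Quality": ("Group", "Flag"),
}
# Enumerated values allowed for TypedParameter/@Type.
TYPED_PARAMETER_TYPES = {"Info", "Environment_Variable"}


def find_problems(xml_text):
    """Return a list of human-readable problems; an empty list means the check passed."""
    problems = []
    root = ET.fromstring(xml_text)
    for elem in root.iter():  # root first, then descendants in document order
        if elem.tag not in REQUIRED:
            problems.append(f"unexpected element <{elem.tag}>")
            continue
        for attr in REQUIRED[elem.tag]:
            if attr not in elem.attrib:
                problems.append(f"<{elem.tag}> is missing {attr}")
        if elem.tag == "TypedParameter":
            type_value = elem.get("Type")
            if type_value is not None and type_value not in TYPED_PARAMETER_TYPES:
                problems.append(f"<TypedParameter> has invalid Type '{type_value}'")
    return problems


# Hypothetical documents: one complete, one with partial information.
good = (
    '<Job ConfigName="DC04" ConfigVersion="v1" Date="2004-05-01">'
    '<TypedParameter Name="Host" Value="node01" Type="Info"/>'
    '<OutputFile Name="out.dst" TypeName="DST" TypeVersion="1">'
    '<Quality Group="Production" Flag="OK"/>'
    "</OutputFile></Job>"
)
partial = (
    '<Job ConfigName="DC04" Date="2004-05-01">'
    '<TypedParameter Name="Host" Value="node01" Type="Guess"/>'
    "</Job>"
)

print(find_problems(good))
print(find_problems(partial))
```

A validating parser driven by the DTD itself would be the more faithful choice where one is available; the hand-written table only mirrors the attribute rules.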

Services and the specialised views
Complex SQL queries sometimes perform poorly for bulk lookups.
– However, the WDB contains all the information about each file, and this can be used to generate specialised views for a specific service.
Because the service is known in advance, the views can be optimised to give the best performance.

Example of view with service and applications
This example shows the specialised view that sits behind the XML-RPC and servlet services. These services are used by GANGA and the web browser.
Specialised view schema:
– Replica: FILE_ID, REPLICA, LOCATION
– DT_JobSummary: JOB_ID, CONFIG, DBVERSION, EVENTTYPE, JOBDATE, LABORATORY, PROGRAM0, INPUTFILE0, PROGRAM1, INPUTFILE1, PROGRAM2, INPUTFILE2
– DT_FileSummary: FILE_ID, JOB_ID, EVENTTYPE, EVENTDESCRIPTION, NBEVENTS, FILETYPE, FILENAME, FILESIZE
Components in the diagram: specialised view schema, Jython web server, SERVLETS, XMLRPC, web browser, GANGA application.

Generation of the specialised View
An SQL script generates the Specialised View from the Warehouse DB. This is done periodically or on demand, based on the needs of the experiment (every night for LHCb), and it is fast even though the WDB contains many GB.
Warehouse DB tables: Jobs, JobParams, Files, FileParams, TypeParams, QualityParams (attributes shown in the diagram: ConfigName, ConfigVersion, Date, Name, Value, Type, LogName).
Specialised View tables:
– Replica: FILE_ID, REPLICA, LOCATION
– DT_JobSummary: JOB_ID, CONFIG, DBVERSION, EVENTTYPE, JOBDATE, LABORATORY, PROGRAM0, INPUTFILE0, PROGRAM1, INPUTFILE1, PROGRAM2, INPUTFILE2
– DT_FileSummary: FILE_ID, JOB_ID, EVENTTYPE, EVENTDESCRIPTION, NBEVENTS, FILETYPE, FILENAME, FILESIZE
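The kind of SQL script involved can be sketched as a pivot from key-value rows into one wide row per file. The example below uses SQLite rather than the production database and is only an illustration. The table names Files, FileParams and DT_FileSummary and the view columns come from the slides; the WDB columns FileId and JobId, the parameter names EventStat and Size, and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A toy warehouse: files plus their key-value parameters (sample data is made up).
conn.executescript("""
    CREATE TABLE Files (FileId INTEGER PRIMARY KEY, JobId INTEGER, LogName TEXT);
    CREATE TABLE FileParams (FileId INTEGER, Name TEXT, Value TEXT);
    INSERT INTO Files VALUES (1, 10, 'a.dst'), (2, 10, 'b.dst');
    INSERT INTO FileParams VALUES
        (1, 'EventStat', '500'), (1, 'Size', '1200'),
        (2, 'EventStat', '250');
""")

# Rebuild script: drop the old summary table and regenerate it from the warehouse.
# Each CASE picks out one parameter name, turning rows into columns.
REBUILD_VIEW = """
    DROP TABLE IF EXISTS DT_FileSummary;
    CREATE TABLE DT_FileSummary AS
    SELECT f.FileId  AS FILE_ID,
           f.JobId   AS JOB_ID,
           f.LogName AS FILENAME,
           CAST(MAX(CASE WHEN p.Name = 'EventStat' THEN p.Value END) AS INTEGER) AS NBEVENTS,
           CAST(MAX(CASE WHEN p.Name = 'Size' THEN p.Value END) AS INTEGER) AS FILESIZE
    FROM Files f
    LEFT JOIN FileParams p ON p.FileId = f.FileId
    GROUP BY f.FileId, f.JobId, f.LogName;
    CREATE INDEX idx_filesummary_job ON DT_FileSummary (JOB_ID);
"""

conn.executescript(REBUILD_VIEW)   # first build
conn.executescript(REBUILD_VIEW)   # safe to rerun, e.g. from a nightly job

rows = conn.execute(
    "SELECT FILE_ID, FILENAME, NBEVENTS, FILESIZE FROM DT_FileSummary ORDER BY FILE_ID"
).fetchall()
print(rows)
```

Queries against the summary table then need no joins on parameter names, and the table can be indexed for whatever lookups the service performs. A file with no Size parameter simply gets a NULL in that column.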

Some Numbers
LHCb is using ORACLE 9i technology for its DB:
– It is hosted on a cluster of two Sun Fire 280R machines
– Each has two 750 MHz processors
– 2 GB RAM
– 600 GB HD
The DB contains ~20 GB of data:
– Shared between real data and indexing tables
– ~2M job rows
– ~5.5M file rows
– ~57M rows in parameters

LHCb services
LHCb currently uses two services to access the information in the databases:
– Servlet service: allows datasets to be selected from the web browser based on their history (job provenance).
– XML-RPC service: provides access to, and modification of, the WDB data, and allows GANGA to access Bookkeeping data.
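To make the XML-RPC access pattern concrete, the following sketch runs a toy server and client in one process using Python's standard xmlrpc modules. It does not reproduce the LHCb service (which the slides show running on a Jython web server); the method name files_by_event_type, the in-memory table and its rows are all hypothetical.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

# Stand-in for the specialised view; a real service would query the database.
FILE_SUMMARY = [
    {"FILE_ID": 1, "JOB_ID": 10, "EVENTTYPE": "signal", "FILENAME": "a.dst"},
    {"FILE_ID": 2, "JOB_ID": 10, "EVENTTYPE": "background", "FILENAME": "b.dst"},
    {"FILE_ID": 3, "JOB_ID": 11, "EVENTTYPE": "signal", "FILENAME": "c.dst"},
]


def files_by_event_type(event_type):
    """Hypothetical bookkeeping lookup: file names for one event type."""
    return [row["FILENAME"] for row in FILE_SUMMARY if row["EVENTTYPE"] == event_type]


# Port 0 asks the operating system for any free port on the loopback interface.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(files_by_event_type)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client side: this is roughly what an application such as GANGA would do remotely.
host, port = server.server_address
client = ServerProxy(f"http://{host}:{port}/")
result = client.files_by_event_type("signal")
print(result)

server.shutdown()
server.server_close()
```

Because the call travels as XML over HTTP, the client needs no database driver and no knowledge of the schema behind the service.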

Collaboration with ARDA
LHCb has entered into a collaboration with ARDA:
– Definition of metadata and understanding of LHCb requirements
– Elaboration of a new interface for the manipulation of file metadata
– Possible technology (WSDL)
– See how this will fit with the existing LHCb system
Stress-test the Bookkeeping services, analysing various behaviours:
– Different numbers of clients
– Different queries
– Comparison with direct RPC calls
Implement the newly defined interface:
– Using the current LHCb File-Metadata DB as back-end
– Using the technology developed with ARDA
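A stress test of the kind listed above, varying the number of clients against the same query, could be organised along the following lines. This is only a sketch of the measurement loop: the names run_load and fake_query are invented for the example, and fake_query merely sleeps where a real test would call the Bookkeeping service.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_load(query, n_clients, calls_per_client):
    """Run `query` from n_clients concurrent workers; return (total calls, elapsed seconds)."""

    def client(client_index):
        for _ in range(calls_per_client):
            query()
        return calls_per_client

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_clients) as pool:
        done = sum(pool.map(client, range(n_clients)))
    return done, time.perf_counter() - start


def fake_query():
    """Placeholder for a remote call; sleeps briefly to imitate latency."""
    time.sleep(0.001)


for n_clients in (1, 4, 16):
    calls, elapsed = run_load(fake_query, n_clients, calls_per_client=20)
    print(f"{n_clients:2d} clients: {calls} calls in {elapsed:.3f} s")
```

Passing a different callable as query covers the "different queries" case, and pointing it at a direct call instead of the web service gives the comparison baseline.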

CONCLUSIONS
The two schema strategy works well for LHCb. Its flexibility was well proven during DC04: no changes to the WDB were required even though new data were stored.
Because of the key-value nature of the WDB, it can easily be adapted for warehousing any data, including that of other experiments.