Alice Off-line Week, February 24th, 2005

Slides:



Advertisements
Similar presentations
Connecting to Databases. relational databases tables and relations accessed using SQL database -specific functionality –transaction processing commit.
Advertisements

The AMGA metadata catalog Riccardo Bruno - INFN Madrid, 07-11/05/2007.
Asterios Katsifodimos Saturday, May 23, 2015 High Performance Computing systems Lab University of Cyprus The AMGA metadata catalog – An Overview Slides.
1 Grid services based architectures Growing consensus that Grid services is the right concept for building the computing grids; Recent ARDA work has provoked.
15 Chapter 15 Web Database Development Database Systems: Design, Implementation, and Management, Fifth Edition, Rob and Coronel.
CS490T Advanced Tablet Platform Applications Network Programming Evolution.
Databases and the Internet. Lecture Objectives Databases and the Internet Characteristics and Benefits of Internet Server-Side vs. Client-Side Special.
Week 7 Lecture Web Database Development Samuel Conn, Asst. Professor
The LCG File Catalog (LFC) Jean-Philippe Baud – Sophie Lemaitre IT-GD, CERN May 2005.
FP6−2004−Infrastructures−6-SSA E-infrastructure shared between Europe and Latin America The AMGA metadata catalog with use cases.
NoSQL Databases Oracle - Berkeley DB. Content A brief intro to NoSQL About Berkeley Db About our application.
ALMA Integrated Computing Team Coordination & Planning Meeting #1 Santiago, April 2013 Relational APDM & Relational ASDM models effort done in online.
LHCb week, 27 May 2004, CERN1 Using services in DIRAC A.Tsaregorodtsev, CPPM, Marseille 2 nd ARDA Workshop, June 2004, CERN.
INFSO-RI Enabling Grids for E-sciencE Distributed Metadata with the AMGA Metadata Catalog Nuno Santos, Birger Koblitz 20 June 2006.
INFSO-RI Enabling Grids for E-sciencE AMGA Metadata Server - Metadata Services in gLite (+ ARDA DB Deployment Plans with Experiments)
Enabling Grids for E-sciencE EGEE-III INFSO-RI I. AMGA Overview What is AMGA Metadata Catalogue of EGEE’s gLite 3.1 Middleware Main Feature of.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks AMGA PHP API Claudio Cherubino INFN - Catania.
Mike Jackson EPCC OGSA-DAI Architecture + Extensibility OGSA-DAI Tutorial GGF17, Tokyo.
Metadata Mòrag Burgon-Lyon University of Glasgow.
EGEE User Forum Data Management session Development of gLite Web Service Based Security Components for the ATLAS Metadata Interface Thomas Doherty GridPP.
Kemal Baykal Rasim Ismayilov
Benjamin Post Cole Kelleher.  Availability  Data must maintain a specified level of availability to the users  Performance  Database requests must.
Lattice QCD Data Grid Middleware: Meta Data Catalog (MDC) -- CCS ( tsukuba) proposal -- M. Sato, for ILDG Middleware WG ILDG Workshop, May 2004.
Web Technologies Lecture 8 Server side web. Client Side vs. Server Side Web Client-side code executes on the end-user's computer, usually within a web.
INFSO-RI Enabling Grids for E-sciencE ARDA Experiment Dashboard Ricardo Rocha (ARDA – CERN) on behalf of the Dashboard Team.
ESG-CET Meeting, Boulder, CO, April 2008 Gateway Implementation 4/30/2008.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Evaluating Metadata access strategies with.
AJAX and REST. Slide 2 What is AJAX? It’s an acronym for Asynchronous JavaScript and XML Although requests need not be asynchronous It’s not really a.
Summary of Metadata Workshop Peter Hristov 28 February 2005 Alice Computing Day.
Matthew Farrellee Computer Sciences Department University of Wisconsin-Madison Condor and Web Services.
FP6−2004−Infrastructures−6-SSA Enabling Grids for E-sciencE The AMGA Metadata Catalog Introduction and hands-on exercises Nuno Santos.
AMGA-Bookkeeping Carmine Cioffi Department of Physics, Oxford University UK Metadata Workshop Oxford, 05 July 2006.
EGEE is a project funded by the European Union under contract IST Feedback on the gLite middleware Dietrich Liko / IT - LCG ARDA Workshop,
Aleksandar Drašković Enterprise Architect deroso Solutions GmbH Data shredding: a deep dive into SharePoint 2013 storage architecture.
Preservation Data Services Persistent Archive Research Group Reagan W. Moore October 1, 2003.
National College of Science & Information Technology.
Java Web Services Orca Knowledge Center – Web Service key concepts.
XML Parsers Overview Types of parsers Using XML parsers SAX DOM
ODBC, OCCI and JDBC overview
AMGA Metadata Service Vladimir Dimitrov IPP-BAS “gLite middleware Application Developers Course”, Sofia, Bulgaria,
Security and Replication of Metadata with AMGA
WEB SERVICES.
Cosc 5/4730 REST services.
Metadata Services on the GRID
BDII Performance Tests
AJAX and REST.
Unit – 5 JAVA Web Services
 .NET CORE
POOL persistency framework for LHC
PHP / MySQL Introduction
LCG middleware and LHC experiments ARDA project
New developments on the LHCb Bookkeeping
PostgreSQL Database and C++ Interface (and Midterm Topics)
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 2 Database System Concepts and Architecture.
DUCKS – Distributed User-mode Chirp-Knowledgeable Server
WEB API.
1 Demand of your DB is changing Presented By: Ashwani Kumar
POOL/RLS Experience Current CMS Data Challenges shows clear problems wrt to the use of RLS Partially due to the normal “learning curve” on all sides in.
XML Parsers Overview Types of parsers Using XML parsers SAX DOM
Grid Data Integration In the CMS Experiment
Triple Stores.
AMGA Metadata Service Vladimir Dimitrov IPP-BAS “gLite middleware Application Developers Course”, Plovdiv, Bulgaria,
67th IETF meeting netconf WG
AMGA Metadata Service Vladimir Dimitrov IPP-BAS “gLite middleware Application Developers Course”, Sofia, Bulgaria,
The AMGA metadata catalog
SLAC monitoring Web Services
Database Management Systems
Metadata Services on the GRID
Graduation Project #1 University Internet Student Registration System
Presentation transcript:

Alice Off-line Week, February 24th, 2005 ARDA Experiences with Metadata Alice Off-line Week, February 24th, 2005 Nuno Santos Metadata Solutions of the experiments A Generic Interface to Metadata An ARDA Metadata Service Prototype SOAP: A good way to Access Metadata? Conclusions

Experience with Metadata ARDA tested several Metadata solutions from the experiments: LHCb Bookkeeping – XML-RPC with Oracle backend CMS: RefDB - PHP in front of MySQL, giving back XML tables Atlas: AMI - SOAP-Server in Java in front of MySQL gLite (Alien Metadata) - Perl in front of MySQL parsing command, streaming back text Tested performance, scalability and features.

Results taken on November 2004 AMI & RefDB (SOAP) Both AMI & RefDB ship responses in single XML package Results taken on November 2004 Poor scalability, both in number of concurrent clients and size of response. Problems with memory consumption at server – Requests too large to fit in memory. 20 40 60 80 100 120 140 time-to-complete [s] #connections Average response time 19.5Mb/conn. 5.5Mb/conn. 1.2Mb/conn. 200 300 400 500 600 700 800 512MB limit 10 20 30 40 50 60 80 100 120 140 Time to completion [s] Number of rows selected 150 30 Clients 5 1 Results on RefDB:J. Andreeva, J. Herrala

LHCb(XML-RPC) LHCb uses XML-RPC (predecessor of SOAP): Being also “one shot” query based, the solutions suffers from the same problems as the two based on SOAP W.-L. Ueng

gLite/Alien(Streaming) gLite/Alien streams responses to the perl implemented shell Errors Time to completion # Clients Time to Completion [s] Selecting 2.5k files of 10k 5 10 15 20 25 30 35 40 60 80 100 Streaming offers good scalability

Synthesis Generally, the scalability is poor Sending big responses in a single packet limits scalability Use of SOAP and XML-RPC worsens the problem Schema evolution not really supported RefDB and Alien don't do schema evolution at all. AMI, LHCb-Bookkeeping via admins adjusting tables. No common Metadata interface Experience with existing Software Propose Generic Interface & Prototype as Proof-Of-Concept

Design decisions Metadata organized as a hierarchy - Collect objects with shared attributes into collections Collections stored as a table - allows queries on SQL tables Analogy to file system: Collection <-> Directory Entry <-> File Abstract the backend – Allows supporting several backends. PostgreSQL, MySQL, Oracle, filesystem… Values restricted to ASCII strings Backend is unknown Scalability: Transfer large responses in chunks, streaming from DB Decreases server memory requirements, no need to store full response in memory Implement also a non-SOAP protocol Compare with SOAP. What is the performance price of SOAP?

Terminology Define common terms, first Metadata: Key-Value pairs Any data necessary to work on the grid not living in the files Entry: Entities to which metadata is attached Denoted by a string, format like file-path in Unix, wild-cards allowed Collection: Set of entries Collections are themselves entries, think of Directories Attribute: Name or key of piece of metadata Alphanumerical string with starting letter Value: Value of an entry's attribute Printable ASCII string Schema: Set of attributes of an entry Classifies types of entries: Collection's schema inherited by its entries Storage Type: How back end stores a value Back end may have different (SQL) datatypes than application Transfer Type: How values are transferred Values are transported as printeable ASCII

Interface I Interface designed in discussions with gLite-team, GridPP metadata working group and GAG. Accepted by PTF. Presented here as a C style API. Entry management: int addEntry(string entry, string type) int addEntries(list<string> entries, list<string> types) int removeEntries(string pattern) Schema management and metadata reading/writing: int addAttr(string collection, list<string> keys, list<string> types) int removeAttr(string collection, list<string> keys) int setAttr(string pattern, list<string> keys, list<string> values) int clearAttr(string pattern, string key) With R. Rocha(gLite), V. Pose, B. Koblitz

Interface: Retrieving data Bulk transfers to client - session handlers and iterators on the backend: int getAttr(string pattern, list<string>keys, Handler handler) int listAttr(string entry, Handler handler) int getNext(handle_t handle, DataChunk dc) int abort(handle_t handle) Searching for entries: int find(string pattern, string query, Handler &handle) int listEntries(string pattern, Handler &handle) Example query:'tracks > 10 and sin(p_angle) <0.5 and trigger & 2' With R. Rocha(gLite), V. Pose, B. Koblitz

Protocols implemented Two implementations: SOAP - Fulfil formal requirements on EGEE Server – C++ (gSOAP). Clients – Tested WSDL with C++ (gSOAP), Python (ZSI), Java (AXIS) TCP/IP streaming – For efficiency Text based protocol (like SMTP, POP3, and many others) Server – C++ Clients - C++, Java, Python, Perl, Ruby SSL for authentication and encryption implemented Client: getattr file(s) key1 key2… Server: 0 file value1 value2 …

TCP/IP Streaming Good experiences with ARDA prototype using streaming: Results from November 2004 Now very stable after tuning. Collaboration with LHCb - large scale test Database with 40M entries GANGA back end gLite(Alien) Time to Completion [s] # Clients 20 40 60 80 100 5 25 10 15 30 35 Crashes Selecting 2.5k entries of 10k ARDA Entries / s [1/s] Entries per directory/collection 5 10 15 20 25 30 1000 10000 100000 Attach MD (Alien) Create Entry (Alien) Create Entry (ARDA) Attach MD (ARDA) With V.Pose

TCP Streaming Streaming operations: Significant performance gains Open a transaction on the database (cursor) Stream the whole response to the client in a single TCP connection Significant performance gains Server throughput around 8 times higher

SOAP Toolkits What is the state of current SOAP toolkits? C++ - gSOAP Python – ZSI Java – Axis 1.2RC2 Ease of development? WSDL generation: From a .h file using gSOAP. After two days – a WSDL working fine with a gSOAP client. But not with other toolkits. From a Java interface using AXIS Resulting WSDL generated warnings when validated by other toolkits In the end - WSDL written manually Only known way of creating a WSDL that is accepted without warnings by toolkits being used. Ease of development? Client side: Not so bad with proper WSDLs and good toolkits 1 hour from WSDL to a working Java application using Axis 1.2RC2. What about performance? N. Santos

SOAP Toolkits performance Performance varies significantly gSOAP is fast AXIS is more than10 times slower ZSI is more than 20 times slower!! SOAP serialization performance is not too important for the client But essential for scalable servers. gSOAP is an interesting option.

SOAP vs TCP What is the overhead of a SOAP call compared to a TCP request? Around 6 times if using HTTP keepalive to improve SOAP. More than 10 times if no HTTP keepalive used. TCP connections are kept open between requets.

SOAP Iterators – Retrieving large results Iterators are used in SOAP as an alternative to streaming. Server throughput improved by 3 times But still far from TCP streaming

Conclusions on SOAP Tests show: SOAP performance is generally poor when compared to TCP streaming gSOAP is significantly faster than other toolkits. Iterators (with stateful server) helps considerably – results returned in small chunks. SOAP puts an additional load on server SOAP interoperability very problematic Writing an interoperable WSDL is hard Not all toolkits are mature

(discussed with gLite, GridPP, GAG, accepted by PTF) Conclusions Many problems understood studying metadata implementations of experiments Common requirements exist ARDA proposes generic interface to metadata on Grid: Retrieving/Updating of data Hierarchical view Schema discovery and management (discussed with gLite, GridPP, GAG, accepted by PTF) Prototype with SOAP & streaming front end built SOAP can be as fast as streaming in many cases SOAP toolkits still immature http://project-arda-dev.web.cern.ch/project-arda-dev/metadata/