First results of the feasibility study of the Cassandra DB application in Panda Job Monitoring
ATLAS S&C Week at CERN, April 5th, 2011
Maxim Potekhin for:
BNL Physics Applications Software Group
BNL RHIC and ATLAS Computing Facility (M. Ernst, S. Misawa, H. Ito et al.)

2 Looking at noSQL solutions from the Panda perspective
noSQL is not a replacement for the Oracle DB in the core part of the Panda server functionality. In addition to the RDBMS vs noSQL comparison presented in Vincent's talk, we observe that:
- there are classes of complex queries, including those with time and date ranges, where Oracle typically performs rather poorly
- storing large amounts of historical data on an expensive, transaction-oriented RDBMS does not seem optimal
- the option to offload significant amounts of archival-type reference data from Oracle to a highly performing, scalable system based on commodity hardware appears attractive

3 Objectives of the project
Ultimately, we aim to create a noSQL-based data store for archive-type reference Panda data. In the validation stage, it would run continuously in parallel with the Oracle DB, in order to ascertain the performance and characteristics of such a system under real-life data volumes and query loads. To this end, we are pursuing multiple objectives in the context of Cassandra DB:
- to create an optimal design for the data generated by the Panda server, as well as indexes
- to perform a broad performance comparison of the recently built cluster at BNL (March 2011) against the existing Oracle facility at CERN
- to determine the optimal hardware configuration of that system and its tuning parameters

4 Stages of the project
In recent months, we accomplished the following:
- In Jan-Feb 2011, using the 4-VM Cassandra R&D cluster at CERN (generously shared by the DDM team), we gained experience in the data design and indexing of Panda data, as well as operational experience with Cassandra
- Demonstrated the usefulness of the map-reduce approach to generating date-range indexes into Cassandra data, which results in a marked improvement over Oracle
- In March 2011, with support from RACF, commissioned a 3-node real-hardware cluster at BNL and loaded a year's worth of real Panda data in a modified format (based on the initial CERN experience)
- Ran a series of performance tests aimed at a "worst-case performance" estimate, to set the baseline for further optimization

5 The Cassandra Cluster at BNL
Currently we have 3 nodes, with a possible expansion to 5. Each node has:
- two Xeon X5650 processors, 6 cores each; due to hyper-threading, effectively 24 cores
- a total of eight 500GB 2.5in 7200RPM SATA disks; out of the 8 spindles, 6 are configured as RAID0 for the Cassandra data
- 48GB RAM on a 1333MHz bus
Such a configuration is highly optimal for write performance, which in Cassandra is CPU and memory bound. What we needed to do was gauge the read performance under a variety of queries and loads.

6 Loading Data / Write Performance
To stress a decent Cassandra installation in write mode, it is necessary to have multiple clients operating simultaneously, and ideally to have the data pre-staged on local drives on the client nodes. We created a multithreaded Python application that can load data either directly from the Amazon S3 data store created by T. Wenaus, or from local data files previously downloaded from the network. The data comes in CSV format and is parsed using a standard Python module. We observe rates of 500 rows/s on the CERN VM cluster and more than 10k rows/s on the BNL cluster. Therefore the network bandwidth of the connection to the external data source, not Cassandra performance, will be the limiting factor. A sketch of such a loader is shown below.
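As an illustration, here is a minimal sketch of such a multithreaded loader, assuming the pycassa client library (a common Python client for Cassandra at the time) and pre-staged local files; the keyspace ('PandaArchive'), column family ('jobs'), node address and file names are all hypothetical:

    import csv
    import threading
    import pycassa

    # Hypothetical cluster address, keyspace and column family names
    SERVERS = ['cassandra01.bnl.gov:9160']

    def load_file(path):
        # Each client thread owns its own connection pool and loads
        # one pre-staged local CSV file
        pool = pycassa.ConnectionPool('PandaArchive', server_list=SERVERS)
        jobs = pycassa.ColumnFamily(pool, 'jobs')
        f = open(path)
        reader = csv.reader(f)
        header = next(reader)
        batch = jobs.batch(queue_size=200)  # queue mutations, send in batches
        for row in reader:
            pandaid = row[0]  # assume the first CSV column is the job ID
            batch.insert(pandaid, dict(zip(header, row)))
        batch.send()  # flush the remainder of the queue
        f.close()
        pool.dispose()

    def run(paths):
        # One client thread per local file
        threads = [threading.Thread(target=load_file, args=(p,)) for p in paths]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

    run(['day_2010-12-01.csv', 'day_2010-12-02.csv'])  # hypothetical file names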

7 Patterns in the Data
In Panda, jobs are often submitted not individually but in batches, which can range anywhere from 2 to hundreds of jobs. This is demonstrated by a plot showing the distribution of the lengths of contiguous ID ranges as they are found in the database.

8 Patterns in the Data
Practically, this means that many queries will be wasteful: when looking at the activity of a user on a particular cloud, or at his/her job sets or tasks, we will typically make ~20 trips to the database, each extracting a very small amount of data, instead of one read. In the second generation of the Cassandra schema, we opted to store data in bunches of 100 Panda entries, in order to take advantage of this pattern in the data. We also decided to store entries in CSV form, as opposed to pre-parsed columns, in order to conserve space and the resources spent on serialization/deserialization of the data in the client and the server. A sketch of the bucketing scheme is shown below.
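A minimal sketch of this bucketing idea, assuming pycassa-style column family operations and rows keyed by the integer quotient pandaid // 100 (all names are illustrative):

    BUCKET = 100

    def bucket_key(pandaid):
        # Jobs with IDs N*100 .. N*100+99 share a single Cassandra row
        return str(int(pandaid) // BUCKET)

    def store_job(cf, pandaid, csv_line):
        # Column name = job ID, column value = the raw, unparsed CSV payload
        cf.insert(bucket_key(pandaid), {str(pandaid): csv_line})

    def fetch_neighbours(cf, pandaid):
        # A single read returns up to 100 adjacent jobs, matching the
        # batch-submission pattern, instead of ~20 separate round trips
        return cf.get(bucket_key(pandaid), column_count=BUCKET)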

9 Main results of testing at CERN
A note on indexing: the native ability to index individual columns is a relatively new feature in Cassandra, introduced in mid-2010. Since our data is mostly write-once, read-many, we can do a better job by bucketing the data in map-reduce fashion, and we do not need to create extraneous columns to implement an equivalent of composite indexes (so we save space).
Example: a query for a batch of jobs (extracting all data pertaining to these jobs, not just a count) takes 5.8 seconds (or 2.7 s with a hot key cache) on the CERN Cassandra cluster vs 1 min on the Oracle RDBMS.

10 Main results of testing at CERN
Using asynchronously built indexes allows us to do statistical analysis of the workflow that is currently not done, due to the poor performance of date-range queries. We used the "natural" time binning of the data into 1-day bins, as it exists in our S3 backup facility, as the basis for building indexes (i.e. all indexes in this exercise were built on a daily basis).
Example: find the number of failed jobs in the NL cloud, for user "borut", over a period of 5 days in December 2010: 0.005 s on Cassandra vs ~1 min on Oracle.
A few indexes, such as user::cloud::jobstatus, cloud::jobstatus etc., take only 1 to 3% of the disk space, which means we still have headroom to compute multiple such indexes to accommodate users' needs. A sketch of building such a daily index is shown below.
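A minimal sketch of how such a daily index could be built in a map-reduce-like pass over one day of data; the index layout (one row per user::cloud::jobstatus::day key, one column per job ID) follows the naming above, while the field names are assumptions:

    from collections import defaultdict

    def build_daily_index(jobs_for_day, day):
        # Map phase: bucket job IDs by the composite index key
        buckets = defaultdict(list)
        for job in jobs_for_day:  # each job is a dict parsed from the CSV
            key = '%s::%s::%s::%s' % (job['user'], job['cloud'],
                                      job['jobstatus'], day)
            buckets[key].append(job['pandaid'])
        return buckets

    def write_index(index_cf, buckets):
        # Reduce phase: one index row per key, one column per job ID.
        # A 5-day date-range query then reads just 5 small index rows
        # instead of scanning the main data.
        for key, pandaids in buckets.items():
            index_cf.insert(key, dict((str(p), '') for p in pandaids))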

11 Contents of the BNL test
We based our selection of queries to be run against the Cassandra cluster on actual Oracle usage statistics. One caveat in using this as a benchmark is that it contains rather few date-range queries. We believe the sample is biased, because users are discouraged from running such queries both by their own experience and by policies which aim to reduce the load on the Oracle server (e.g. by limiting the number of days covered by a query issued from the Monitor).
We were aiming to obtain lower limits on Cassandra performance, i.e. the worst-case scenario. In most of the testing, the key and row caches were disabled, which puts Cassandra at a disadvantage vs Oracle (whose caching capabilities we were not in a position to control).

12 Contents of the BNL test
Disclaimer: this work started recently and is an effort in progress! Tuning Cassandra for a particular application, and doing proper data design according to the query pattern, takes some time.

13 Main results of testing at BNL
Data load: 1 year up to now (practically a majority of all data generated so far, due to the explosive growth in 2010).
Case #1: large query load, primary-key queries. We look at the throughput of a random query of 10k items over the past year, vs the number of concurrent clients. The numbers are seconds to query completion. Note the fluctuation in Oracle's response from day to day (typical).
(Table: seconds to query completion for Cassandra vs Oracle at various numbers of concurrent clients.)
A sketch of this read benchmark is shown below.
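A minimal sketch of the kind of concurrent primary-key read benchmark described here, again assuming pycassa; the job ID range, names and node address are illustrative:

    import random
    import threading
    import time
    import pycassa

    def read_worker(keys):
        pool = pycassa.ConnectionPool('PandaArchive',
                                      server_list=['cassandra01.bnl.gov:9160'])
        jobs = pycassa.ColumnFamily(pool, 'jobs')
        for i in range(0, len(keys), 100):
            jobs.multiget(keys[i:i+100])  # fetch rows in chunks of 100 keys
        pool.dispose()

    def run_benchmark(n_clients, n_keys=10000):
        # Random job IDs drawn from a hypothetical range covering the past year
        keys = [str(random.randint(10**9, 11*10**8)) for _ in range(n_keys)]
        chunk = n_keys // n_clients
        threads = [threading.Thread(target=read_worker,
                                    args=(keys[i*chunk:(i+1)*chunk],))
                   for i in range(n_clients)]
        start = time.time()
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        print('%d clients: %.1f s to completion' % (n_clients, time.time() - start))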

14 Main results of testing at BNL
Apparently our Cassandra cluster did better at a 20 Hz query rate, was roughly equal at 100 Hz, and fell behind at 500 Hz.
Comments: iostat monitoring shows that the disks in the Cassandra setup are saturated at 100% load with 10 clients or so. We have 18 spindles vs 108 at the ADCR Oracle (and those are presumably better drives). Also, we deliberately used NO caching in Cassandra, vs a working cache in Oracle.
Key question: what is the realistic load? Answering this is work in progress, in collaboration with Gancho.

15 Main results of testing at BNL
Case #2: 52 randomly chosen jobsets for user Johannes Elmsheuser (an actual popular query). Cassandra came out ahead this time, despite obvious stress conditions for both Oracle and Cassandra. The original hints and bind variables were used in the optimized Oracle queries.
(Table: seconds to completion, Cassandra vs Oracle.)

16 Main results of testing at BNL
Case #3 (1 random job picked from the past year, 30 clients): Cassandra 0.051 s, Oracle 30 s.
Case #4 (100 tasks queried with 1 client): Cassandra 27 s, Oracle 50 s.

17 Summary (1)
We conducted preliminary studies of the feasibility of Cassandra as the archival database for Panda job status and other data. In addition to the VM-based platform at CERN, we built an initial version of a 3-machine Cassandra cluster at BNL. We demonstrated the following:
- Good write performance of the cluster (of the order of 10k entries per second written to the DB), which makes it a good data sink for Oracle or any other data source
- By using techniques similar to map-reduce, we can index the data in a way that enables a variety of queries over date ranges, a piece of functionality currently under-utilized in Oracle due to performance issues

18 Summary (2)
On a 3-node cluster, we observe superior performance in important classes of queries, even with caching in Cassandra disabled. In another case, under significant load, this particular configuration has worse overall throughput than the Oracle RDBMS deployed at CERN, likely due to the disparity in the number and quality of spindles between the two systems.
Based on what we know, we could scale horizontally, but in order to make such a decision we need to better understand the current and projected query load requirements (i.e. whether we really need to do this to provide an adequate service). Work in progress.
So far the indications are that Cassandra can indeed serve as an archival DB for Panda, which would reduce the load on the central Oracle service and provide better performance in a variety of cases. We need to define the policy: which clients will access the Cassandra storage, and how? Will the data be spliced into applications like the Panda Monitor?

19 Acknowledgements
Thanks to Vincent, Donal and others for the nice teamwork and help with the CERN VM-based cluster.
Thanks to the RACF team, M. Ernst, S. Misawa, H. Ito and others for the installation of a new cluster and continued support of this effort.