Summary of Alma-OSF’s Evaluation of MongoDB for Monitoring Data Heiko Sommer June 13, 2013 Heavily based on the presentation by Tzu-Chiang Shen, Leonel.


Summary of Alma-OSF’s Evaluation of MongoDB for Monitoring Data
Heiko Sommer, June 13, 2013
Heavily based on the presentation by Tzu-Chiang Shen, Leonel Peña
ALMA Integrated Computing Team Coordination & Planning Meeting #1, Santiago, April 2013

ICT-CPM April 2013
Monitoring Storage Requirement
- Expected data rate with 66 antennas:
  - 150,000 monitor points (“MP”s) total.
  - MPs get archived once per minute; ~1 minute of MP data is bucketed into a “clob”.
  - ~7000 clobs/s, ~ GB/day, ~10 TB/year (2500 clobs/s + dependent MP demultiplexing + fluctuations)
  - Roughly equivalent to 310 KByte/s or 2.485 Mbit/s
- Monitoring data characteristics:
  - Simple data structure: [ID, timestamp, value]
  - But huge amount of data
  - Read-only data
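The rates on this slide can be cross-checked with a few lines of arithmetic. The sketch below is only a sanity check, not part of the original presentation: it assumes decimal units (1 KByte = 1000 bytes) and derives the per-day and per-year volumes from the quoted 310 KByte/s, so the results are estimates rather than the slide's own figures. It gives 2500 clobs/s as the baseline, about 2.48 Mbit/s, about 26.8 GB/day and about 9.8 TB/year, which agrees with the quoted ~10 TB/year.

```javascript
// Back-of-the-envelope check of the monitoring data rate quoted on the slide.
const monitorPoints = 150000;      // total MPs with 66 antennas (from the slide)
const archivePeriodSeconds = 60;   // each MP is archived once per minute (from the slide)
const kiloBytesPerSecond = 310;    // quoted rate; 1 KByte = 1000 bytes is an assumption

// One clob per MP per minute gives the baseline clob rate.
const baseClobsPerSecond = monitorPoints / archivePeriodSeconds;

// Convert the byte rate into Mbit/s, GB/day and TB/year (decimal units).
const mbitPerSecond = (kiloBytesPerSecond * 8) / 1000;
const gigaBytesPerDay = (kiloBytesPerSecond * 1000 * 86400) / 1e9;
const teraBytesPerYear = (gigaBytesPerDay * 365) / 1000;

console.log(
  baseClobsPerSecond,
  mbitPerSecond,
  gigaBytesPerDay.toFixed(1),
  teraBytesPerYear.toFixed(1)
);
```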

Prior DB Investigations
- Oracle: See Alisdair’s slides.
- MySQL
  - Query problems, similar to Oracle DB
- HBase ( )
  - Got stuck with Java client problems
  - Poor support from the community
- Cassandra ( )
  - Keyspace / replicator issue resolved
  - Poor insert performance: only 270 inserts / minute (unclear what size)
  - Clients froze
- These experiments were done “only” with some help from archive operators, not in the scope of a student’s thesis, as was later the case with MongoDB.
- “Administrational complexity” was also mentioned, without details.

Very Brief Introduction of MongoDB
- NoSQL and document-oriented.
- The storage format is BSON, a binary variation of JSON.
- Documents within a collection can differ in structure.
  - For monitor data we don’t really need this freedom.
- Other features: Sharding, Replication, Aggregation (Map/Reduce)

SQL        mongoDB
Database   Database
Table      Collection
Row        Document
Field      Field
Index      Index

Very Brief Introduction of MongoDB …
A document in mongoDB:

    {
      _id: ObjectId("509a8fb2f3f4948bd2f983a0"),
      user_id: "abc123",
      age: 55,
      status: 'A'
    }

Schema Alternatives 1.) One MP value per doc
- One MP value per doc.
- One MongoDB collection total, or one per antenna.

Schema Alternatives 2.) MP clob per doc
- One clob (~1 minute of flattened MP data) per doc.
- Collection per antenna / other device.

Schema Alternatives 3.) Structured MP /day/doc
- One monitor point data structure per day
- Monthly database
- Shard key = antenna + MP, keeps matching docs on the same node.
- Updates of pre-allocated documents.
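To make variant 3 more concrete, here is a minimal sketch of what a pre-allocated daily document and an in-place update could look like. The layout (`metadata` plus `hourly.<hour>.<minute>.<second>`) is inferred from the queries shown elsewhere in the deck; the one-slot-per-second granularity, the function names and the date string format are assumptions made for illustration, not the authors' implementation.

```javascript
// Hypothetical layout for schema variant 3: one document per monitor point
// per day, with one pre-allocated slot per second under hourly.<HH>.<MM>.<SS>.
function preallocateDailyDocument(antenna, component, monitorPoint, date) {
  const hourly = {};
  for (let h = 0; h < 24; h++) {
    hourly[h] = {};
    for (let m = 0; m < 60; m++) {
      hourly[h][m] = {};
      for (let s = 0; s < 60; s++) {
        hourly[h][m][s] = null; // placeholder, overwritten in place later
      }
    }
  }
  return { metadata: { antenna, component, monitorPoint, date }, hourly };
}

// Build the update document that writes one sample into its slot.
// The dotted path addresses a nested field; $set is the MongoDB update operator.
function buildSampleUpdate(hour, minute, second, value) {
  const path = `hourly.${hour}.${minute}.${second}`;
  return { $set: { [path]: value } };
}

// Example with made-up values (the date format is an assumption).
const exampleDoc = preallocateDailyDocument(
  "DV10", "FrontEnd/Cryostat", "GATE_VALVE_STATE", "2013-02-01"
);
console.log(Object.keys(exampleDoc.hourly).length, buildSampleUpdate(15, 29, 18, 1));
```

Because every slot exists from the start, later writes do not grow the document, which is the reason pre-allocation avoids fragmentation.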

Analysis
- Advantages of variant 3.):
  - Fewer documents within a collection
    - There will be ~150,000 documents per day
    - The number of index entries will be lower as well.
  - No data fragmentation problem
  - Once a specific document is identified (an O(log n) index lookup), the access to a specific range or a single value can be done in O(1)
  - Smaller ratio of metadata / data
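The "fewer documents" argument can be quantified with the numbers already given in the deck. The sketch below compares variant 2 (one clob document per MP per minute) with variant 3 (one document per MP per day). It assumes exactly one clob per MP per minute and ignores the extra load from dependent MP demultiplexing, so it is a lower bound for variant 2.

```javascript
// Documents written per day under schema variants 2 and 3.
const monitorPointCount = 150000;  // total MPs (from the requirements slide)
const minutesPerDay = 24 * 60;

// Variant 2: one clob document per MP per minute (baseline assumption).
const clobDocsPerDay = monitorPointCount * minutesPerDay;

// Variant 3: one pre-allocated document per MP per day.
const dailyDocsPerDay = monitorPointCount;

// How many times fewer documents (and index entries) variant 3 needs.
const reductionFactor = clobDocsPerDay / dailyDocsPerDay;

console.log(clobDocsPerDay, dailyDocsPerDay, reductionFactor);
```

Under these assumptions variant 2 produces 216,000,000 documents per day against 150,000 for variant 3, a factor of 1440.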

What would a query look like?
- Query to retrieve a value with seconds-level granularity:
  - E.g.: To get the value of FrontEnd/Cryostat/GATE_VALVE_STATE at T15:29:18.

    db.monitorData_[MONTH].findOne(
      { "metadata.date": " ",
        "metadata.monitorPoint": "GATE_VALVE_STATE",
        "metadata.antenna": "DV10",
        "metadata.component": "FrontEnd/Cryostat" },
      { 'hourly ': 1 }
    );

What would a query look like …
- Query to retrieve a range of values
  - E.g.: To get the values of FrontEnd/Cryostat/GATE_VALVE_STATE at minute 29 (at T15:29)

    db.monitorData_[MONTH].findOne(
      { "metadata.date": " ",
        "metadata.monitorPoint": "GATE_VALVE_STATE",
        "metadata.antenna": "DV10",
        "metadata.component": "FrontEnd/Cryostat" },
      { 'hourly.15.29': 1 }
    );
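The two queries differ only in the projection path, so the filter and projection can be generated from the same inputs. The following sketch is a hypothetical helper, not code from the evaluation: `buildMonitorQuery` and `readPath` are invented names, the date string format is an assumption, and the seconds-level path `hourly.<hour>.<minute>.<second>` is extrapolated from the minute-level path `hourly.15.29` used for the range query.

```javascript
// Build the filter and projection for a findOne() on the daily document.
// Pass `second` to project a single value, omit it to project a whole minute.
function buildMonitorQuery(antenna, component, monitorPoint, date, hour, minute, second) {
  const filter = {
    "metadata.date": date,
    "metadata.monitorPoint": monitorPoint,
    "metadata.antenna": antenna,
    "metadata.component": component,
  };
  let path = `hourly.${hour}.${minute}`;
  if (second !== undefined) {
    path += `.${second}`;
  }
  return { filter, projection: { [path]: 1 } };
}

// Walk a dotted path through a returned document; yields undefined if a
// level is missing instead of throwing.
function readPath(doc, path) {
  return path
    .split(".")
    .reduce((node, key) => (node == null ? undefined : node[key]), doc);
}

// Example: minute-level query for 15:29 on an assumed date string.
const minuteQuery = buildMonitorQuery(
  "DV10", "FrontEnd/Cryostat", "GATE_VALVE_STATE", "2013-02-01", 15, 29
);
console.log(minuteQuery.projection);
```

With a driver or the shell, `filter` and `projection` would be passed as the two arguments of `findOne`, and `readPath` applied to the returned document.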

Indexes
- A typical query is restricted by:
  - Antenna name
  - Component name
  - Monitor point
  - Date

    db.monitorData_[MONTH].ensureIndex(
      { "metadata.antenna": 1,
        "metadata.component": 1,
        "metadata.monitorPoint": 1,
        "metadata.date": 1 }
    );
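A compound index in MongoDB can serve queries that constrain a leading prefix of its fields, so the field order above matters. The sketch below is a deliberately simplified illustration of that prefix rule, assuming plain equality filters; it is not how the real query planner decides, and `indexPrefixLength` is a made-up helper name.

```javascript
// Field order of the compound index from the slide.
const monitorIndexFields = [
  "metadata.antenna",
  "metadata.component",
  "metadata.monitorPoint",
  "metadata.date",
];

// Count how many leading index fields are constrained by the filter.
// 0 means the filter does not start at the index prefix at all.
function indexPrefixLength(indexFields, filter) {
  let matched = 0;
  for (const field of indexFields) {
    if (!(field in filter)) {
      break;
    }
    matched++;
  }
  return matched;
}

// A typical query constrains all four fields and uses the full index.
const typicalFilter = {
  "metadata.antenna": "DV10",
  "metadata.component": "FrontEnd/Cryostat",
  "metadata.monitorPoint": "GATE_VALVE_STATE",
  "metadata.date": "2013-02-01", // assumed date format
};
console.log(indexPrefixLength(monitorIndexFields, typicalFilter));
```

A filter on the date alone yields a prefix length of 0 here, which suggests that "everything for one day across all antennas" would need a different index.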

Testing Hardware / Software
- A cluster of two nodes was created
  - CPU: Intel Xeon Quad core X5410.
  - RAM: 16 GByte
  - SWAP: 16 GByte
- OS:
  - RHEL 6.0
  - el6.x86_64
- MongoDB
  - V2.2.1

Testing Data
- Real data from Sep-Nov of 2012 was used initially, but:
- A tool to generate random data was implemented:
  - Month: 1 (February)
  - Number of days: 11
  - Number of antennas: 70
  - Number of components per antenna: 41
  - Monitoring points per component: 35
  - Total daily documents:
  - Total of documents:
  - Average weight per document: 1.3 MB
  - Size of the collection: 1,375.23 GB
  - Total index size: 193 MB
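The document counts are missing from the transcript, but they can be estimated from the generator parameters. The sketch below assumes one document per monitor point per day (schema variant 3) and 1 GB = 1024 MB; the resulting numbers are derived estimates, not the values shown on the original slide.

```javascript
// Estimate the size of the generated test data set from its parameters.
const antennas = 70;
const componentsPerAntenna = 41;
const monitorPointsPerComponent = 35;
const days = 11;
const collectionSizeGB = 1375.23; // from the slide

// Assumption: one document per monitor point per day.
const dailyDocuments = antennas * componentsPerAntenna * monitorPointsPerComponent;
const totalDocuments = dailyDocuments * days;

// Average document size implied by the collection size (1 GB = 1024 MB assumed).
const averageDocumentMB = (collectionSizeGB * 1024) / totalDocuments;

console.log(dailyDocuments, totalDocuments, averageDocumentMB.toFixed(2));
```

This gives 100,450 documents per day and 1,104,950 in total, with an implied average of about 1.27 MB per document, which is consistent with the quoted 1.3 MB.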

Database Statistics

Data Sets

Data Sets …

Data Sets

Schema 1: One Sample of Monitoring Data per Document

Proposed Schema:

More tests
- For more tests, see meDataTestingUsingMongoDB

TODO
- Test performance of aggregations/combined queries
- Use Map/Reduce to create statistics (max, min, avg, etc.) over ranges of data to improve performance of queries like:
  - E.g.: Search for monitoring points whose values are >= 10
- Test performance with a year's worth of data
- Stress tests with a large number of concurrent queries
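The Map/Reduce item can be illustrated without a database. The sketch below summarizes per-minute buckets (map) and merges the partial summaries (reduce) in plain JavaScript. It shows the general idea only, assuming the `hourly.<hour>.<minute>.<second>` layout with `null` for empty slots; it is not MongoDB's `mapReduce` API, and the function names are invented.

```javascript
// Map step: summarize one minute bucket { second: value } into partial statistics.
// Empty (null) slots are skipped; returns null if the bucket holds no samples.
function mapMinute(secondsToValues) {
  const values = Object.values(secondsToValues).filter((v) => typeof v === "number");
  if (values.length === 0) {
    return null;
  }
  return {
    min: Math.min(...values),
    max: Math.max(...values),
    sum: values.reduce((a, b) => a + b, 0),
    count: values.length,
  };
}

// Reduce step: merge partial summaries and derive the average.
function reduceSummaries(summaries) {
  const parts = summaries.filter((s) => s !== null);
  if (parts.length === 0) {
    return null;
  }
  const merged = parts.reduce((acc, s) => ({
    min: Math.min(acc.min, s.min),
    max: Math.max(acc.max, s.max),
    sum: acc.sum + s.sum,
    count: acc.count + s.count,
  }));
  return { ...merged, avg: merged.sum / merged.count };
}

// Example with made-up samples from two minute buckets and one empty bucket.
const stats = reduceSummaries([
  mapMinute({ 0: 1, 1: 5, 2: null }),
  mapMinute({ 0: 12 }),
  mapMinute({ 0: null }),
]);
console.log(stats);
```

With such precomputed summaries, a search like "values >= 10" only needs to compare against the stored `max` instead of scanning every sample.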

OSF
- MongoDB is suitable as an alternative for permanent storage of monitoring data.
  - Reported 25,000 clobs/s ingestion rate in the tests.
- The schema + indexes are fundamental to achieving millisecond-level response times.

Comments
- What are the requirements going to be like?
  - Only extraction by time interval and offline processing?
  - Or also “data mining” running on the DB?
  - All queries ad-hoc and responsive, or also batch jobs?
  - Repair / flagging of bad data? Later reduction of redundancies?
- Can we hide the MP-to-document mapping from upserts/queries?
  - Currently queries have to patch together results at the 24 hour and monthly breaks.
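One way to hide the MP-to-document mapping is a thin layer that splits a requested time interval at the daily and monthly boundaries, so callers never see the breaks. The sketch below shows that splitting step only. The collection naming pattern `monitorData_YYYY_MM`, the `YYYY-MM-DD` date string and the use of UTC are all assumptions made for illustration; the deck only shows `monitorData_[MONTH]`.

```javascript
// Split the half-open interval [start, end) into per-day pieces, each tagged
// with the (hypothetical) monthly collection and daily document it falls into.
function splitByDay(start, end) {
  const pieces = [];
  let cursor = new Date(start.getTime());
  while (cursor < end) {
    // Midnight (UTC) after the cursor; Date.UTC rolls over month and year ends.
    const nextMidnight = new Date(
      Date.UTC(cursor.getUTCFullYear(), cursor.getUTCMonth(), cursor.getUTCDate() + 1)
    );
    const pieceEnd = nextMidnight < end ? nextMidnight : end;
    const month = String(cursor.getUTCMonth() + 1).padStart(2, "0");
    pieces.push({
      collection: `monitorData_${cursor.getUTCFullYear()}_${month}`, // assumed naming
      date: cursor.toISOString().slice(0, 10),                       // assumed format
      from: cursor,
      to: pieceEnd,
    });
    cursor = pieceEnd;
  }
  return pieces;
}

// Example: an interval that crosses both a day break and a month break.
const pieces = splitByDay(
  new Date("2013-02-28T23:00:00Z"),
  new Date("2013-03-01T02:00:00Z")
);
console.log(pieces.map((p) => `${p.collection} ${p.date}`));
```

Each piece would then be turned into one per-document query, and the results concatenated in order before being returned to the caller.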