ALMA Integrated Computing Team Coordination & Planning Meeting #1 Santiago, 17-19 April 2013 Evaluation of mongoDB for Persistent Storage of Monitoring Data.

Presentation transcript:

ALMA Integrated Computing Team Coordination & Planning Meeting #1, Santiago, April 2013
Evaluation of mongoDB for Persistent Storage of Monitoring Data
Tzu-Chiang Shen, Leonel Peña

ICT-CPM April 2013
Monitoring Storage Requirement
- Expected data rate with 66 antennas:
  - ~ clobs/s, ~ GB/day
  - ~ equivalent to 310 KByte/s or 2.485 Mbit/s
  - ~ 130, ,000 monitor points
- Monitoring data characteristics:
  - Simple data structure: [timestamp, value]
  - But a huge amount of data
  - Read-only data
  - Data is sorted at the moment of insertion
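A quick sanity check of the quoted rate, assuming decimal units (1 KByte = 1,000 bytes, 1 Mbit = 10^6 bits): 310 KByte/s converts to about 2.48 Mbit/s, and 2.485 Mbit/s converts back to about 310.6 KByte/s, so the two figures agree up to rounding. The function names are illustrative only.

```javascript
// Unit conversion for the quoted monitoring data rate.
// Assumption: decimal (SI) prefixes, 1 KByte = 1000 bytes, 1 Mbit = 1e6 bits.
const BITS_PER_BYTE = 8;

function kbytePerSecToMbitPerSec(kbytePerSec) {
  return (kbytePerSec * 1000 * BITS_PER_BYTE) / 1e6;
}

function mbitPerSecToKbytePerSec(mbitPerSec) {
  return (mbitPerSec * 1e6) / BITS_PER_BYTE / 1000;
}

console.log(kbytePerSecToMbitPerSec(310).toFixed(3));
console.log(mbitPerSecToKbytePerSec(2.485).toFixed(3));
```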

Very Brief Introduction of MongoDB
- NoSQL and document-oriented.
- The storage format is BSON, a binary-encoded variant of JSON.
- Documents within a collection are not required to have the same fields.
- Other features: sharding, replication, aggregation (Map/Reduce).

SQL        mongoDB
Database   Database
Table      Collection
Row        Document
Field      Field
Index      Index

Very Brief Introduction of MongoDB …
A document in mongoDB:

    {
      _id: ObjectId("509a8fb2f3f4948bd2f983a0"),
      user_id: "abc123",
      age: 55,
      status: 'A'
    }
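To illustrate that documents in one collection need not share the same fields, here is a plain-JavaScript stand-in: an array of objects instead of a real collection, simplified integer `_id` values, and a hypothetical `findEqual` helper. It mirrors the MongoDB behaviour that an equality filter on a (non-null) value does not match documents lacking that field.

```javascript
// Stand-in for a collection: two documents with different sets of fields.
// _id values are simplified to integers (real documents use ObjectId).
const users = [
  { _id: 1, user_id: "abc123", age: 55, status: "A" },
  { _id: 2, user_id: "xyz789", status: "D", tags: ["observer"] },
];

// Hypothetical equality matcher on top-level fields; like a MongoDB equality
// filter on a non-null value, it never matches a document lacking the field.
function findEqual(docs, filter) {
  return docs.filter((doc) =>
    Object.entries(filter).every(
      ([key, value]) => key in doc && doc[key] === value
    )
  );
}

console.log(findEqual(users, { age: 55 }).map((d) => d.user_id));
console.log(findEqual(users, { status: "D" }).map((d) => d.user_id));
```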

Alternative Schemas for Monitoring Data
- One monitoring point (sample) per document
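The slide's diagram for this alternative is not recoverable. As a minimal sketch, assuming field names modelled on the `metadata.*` fields used by the later queries plus a hypothetical `timestamp`/`value` pair, each sample becomes its own document. The helper also shows why this layout grows quickly: at an assumed 1 s sampling period, a single monitor point produces 86,400 documents per day, each repeating the same metadata.

```javascript
// Hypothetical "one sample per document" shape; field names are assumptions
// modelled on the metadata.* fields used by the queries in this deck.
function sampleDocument(antenna, component, monitorPoint, timestamp, value) {
  return {
    metadata: { antenna, component, monitorPoint },
    timestamp, // e.g. an ISO-8601 string or a Date
    value,
  };
}

// Documents produced per monitor point per day, assuming a regular
// sampling period given in seconds.
function documentsPerDay(samplingPeriodSeconds) {
  return Math.floor(86400 / samplingPeriodSeconds);
}

// Illustrative values only (the date is made up).
const doc = sampleDocument("DV10", "FrontEnd/Cryostat", "GATE_VALVE_STATE",
  "2013-02-01T15:29:18Z", 1);
console.log(doc.metadata.monitorPoint, documentsPerDay(1));
```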

Alternative Schemas …
- One clob per document
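A sketch of the clob-per-document alternative, assuming a made-up `<timestamp>|<value>` line format (the actual clob layout is not shown in the slides). What it illustrates is that reading a single sample means scanning and parsing the text, so lookup cost grows with the number of samples packed into the clob.

```javascript
// Hypothetical "clob per document" layout: many samples packed into one text
// field. The "<timestamp>|<value>" line format is an assumption for
// illustration, not the format used by the ALMA archive.
function buildClobDocument(metadata, samples) {
  const clob = samples.map(([ts, value]) => `${ts}|${value}`).join("\n");
  return { metadata, clob };
}

// Finding one sample requires a linear scan: O(k) in the number of
// samples k stored in the clob.
function lookupInClob(doc, timestamp) {
  for (const line of doc.clob.split("\n")) {
    const [ts, value] = line.split("|");
    if (ts === timestamp) return Number(value);
  }
  return undefined;
}

const clobDoc = buildClobDocument(
  { antenna: "DV10", monitorPoint: "GATE_VALVE_STATE" },
  [["15:29:17", 0], ["15:29:18", 1], ["15:29:19", 1]]
);
console.log(lookupInClob(clobDoc, "15:29:18"));
```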

Alternative Schemas …
- One monitor point per day per document
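A sketch of the per-day document, assuming the `hourly.<hour>.<minute>.<second>` nesting implied by the `'hourly.15.29'` projection used in the query examples. Pre-allocating every slot with `null` and the `slotPath` helper are illustrative assumptions, not necessarily how the evaluation tool built its documents.

```javascript
// Hypothetical per-day document: hourly[hour][minute][second] holds one
// sample; every slot starts as null ("no sample yet").
function emptyDayDocument(metadata) {
  const hourly = {};
  for (let h = 0; h < 24; h++) {
    hourly[h] = {};
    for (let m = 0; m < 60; m++) {
      hourly[h][m] = {};
      for (let s = 0; s < 60; s++) {
        hourly[h][m][s] = null;
      }
    }
  }
  return { metadata, hourly };
}

// Dotted path an update would target, e.g. in the mongo shell:
// db.monitorData_[MONTH].update(filter, { $set: { "hourly.15.29.18": 1 } })
function slotPath(h, m, s) {
  return `hourly.${h}.${m}.${s}`;
}

const day = emptyDayDocument({
  date: "2013-02-01", // illustrative date
  antenna: "DV10",
  component: "FrontEnd/Cryostat",
  monitorPoint: "GATE_VALVE_STATE",
});
day.hourly[15][29][18] = 1; // in-memory equivalent of the $set above
console.log(slotPath(15, 29, 18));
```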

Analysis
- Advantages:
  - The number of documents within a collection is bounded: there will be ~150,000 documents per day, so the index size will be bounded as well.
  - No data fragmentation problem.
  - Once a specific document is identified (O(log n) through the index), a specific range or a single value within it can be accessed in O(1).
  - Smaller ratio of metadata to data.
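The O(1) claim can be made concrete: once the per-day document has been located through the index (a B-tree lookup, O(log n) in the number of documents), reading one sample is a fixed number of nested key lookups. The sketch assumes the same hypothetical `hourly[h][m][s]` layout, with keys stored without leading zeros.

```javascript
// Read one sample from a per-day document in three fixed-depth key lookups,
// independent of how many samples the day holds.
// time is "HH:MM:SS"; keys are assumed to be stored without leading zeros.
function valueAt(dayDoc, time) {
  const [h, m, s] = time.split(":").map(Number);
  const hour = dayDoc.hourly[h];
  const minute = hour === undefined ? undefined : hour[m];
  return minute === undefined ? undefined : minute[s];
}

// Minimal stand-in for a document fetched from monitorData_[MONTH].
const dayDoc = { hourly: { 15: { 29: { 17: 0, 18: 1, 19: 1 } } } };
console.log(valueAt(dayDoc, "15:29:18"));
```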

What would a query look like?
- Query to retrieve a value with seconds-level granularity:
  - E.g.: to get the value of FrontEnd/Cryostat/GATE_VALVE_STATE at T15:29:18:

    db.monitorData_[MONTH].findOne(
      { "metadata.date": " ",
        "metadata.monitorPoint": "GATE_VALVE_STATE",
        "metadata.antenna": "DV10",
        "metadata.component": "FrontEnd/Cryostat" },
      { 'hourly.15.29.18': 1 }
    );
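The same query can be assembled programmatically. `pointQuery` is a hypothetical helper, and the date string is an assumed format, since the slide's date value did not survive. Projecting one dotted path returns only that second's value (plus `_id`, which MongoDB includes unless it is explicitly excluded).

```javascript
// Hypothetical helper building the filter and projection for findOne() on a
// single second; field names follow the query on this slide.
function pointQuery({ date, antenna, component, monitorPoint }, h, m, s) {
  const filter = {
    "metadata.date": date,
    "metadata.monitorPoint": monitorPoint,
    "metadata.antenna": antenna,
    "metadata.component": component,
  };
  // One dotted path: only that second's value (and _id) is returned.
  const projection = { [`hourly.${h}.${m}.${s}`]: 1 };
  return { filter, projection };
}

const q = pointQuery({
  date: "2013-02-01", // assumed date format, illustrative value
  antenna: "DV10",
  component: "FrontEnd/Cryostat",
  monitorPoint: "GATE_VALVE_STATE",
}, 15, 29, 18);
console.log(JSON.stringify(q.projection));
// In the mongo shell: db.monitorData_[MONTH].findOne(q.filter, q.projection)
```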

What would a query look like? …
- Query to retrieve a range of values:
  - E.g.: to get the values of FrontEnd/Cryostat/GATE_VALVE_STATE during minute 29 (at T15:29):

    db.monitorData_[MONTH].findOne(
      { "metadata.date": " ",
        "metadata.monitorPoint": "GATE_VALVE_STATE",
        "metadata.antenna": "DV10",
        "metadata.component": "FrontEnd/Cryostat" },
      { 'hourly.15.29': 1 }
    );
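A range spanning several minutes, possibly across an hour boundary, can be requested by listing one `hourly.<h>.<m>` path per minute. The sketch below (hypothetical helper names, same assumed layout) builds such a projection and flattens the returned fragment into sorted `[secondOfDay, value]` pairs, matching the `[timestamp, value]` structure described earlier.

```javascript
// Projection covering whole minutes from fromH:fromM to toH:toM inclusive,
// walking minute-of-day indices so hour boundaries are handled.
function minuteRangeProjection(fromH, fromM, toH, toM) {
  const projection = {};
  for (let t = fromH * 60 + fromM; t <= toH * 60 + toM; t++) {
    projection[`hourly.${Math.floor(t / 60)}.${t % 60}`] = 1;
  }
  return projection;
}

// Flatten a { hourly: { h: { m: { s: value } } } } fragment into sorted
// [secondOfDay, value] pairs, skipping empty (null) slots.
function toSamples(doc) {
  const samples = [];
  for (const [h, minutes] of Object.entries(doc.hourly || {})) {
    for (const [m, seconds] of Object.entries(minutes)) {
      for (const [s, value] of Object.entries(seconds)) {
        if (value === null) continue;
        samples.push([Number(h) * 3600 + Number(m) * 60 + Number(s), value]);
      }
    }
  }
  return samples.sort((a, b) => a[0] - b[0]);
}

console.log(Object.keys(minuteRangeProjection(15, 58, 16, 1)));
```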

Indexes
- A typical query is restricted by:
  - Antenna name
  - Component name
  - Monitor point
  - Date

    db.monitorData_[MONTH].ensureIndex( {
      "metadata.antenna": 1,
      "metadata.component": 1,
      "metadata.monitorPoint": 1,
      "metadata.date": 1
    } );
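The order of fields inside a query filter does not matter; what matters is which keys of the compound index the query constrains, starting from the first one. The function below is a deliberately simplified model of that prefix rule (it ignores range predicates, sorts and the real query planner) applied to the index defined on this slide; `usablePrefix` is a hypothetical name.

```javascript
// Key order of the compound index created above.
const INDEX_KEYS = [
  "metadata.antenna",
  "metadata.component",
  "metadata.monitorPoint",
  "metadata.date",
];

// Simplified prefix rule: the index is usable on the leading run of keys
// that the filter constrains; the first unconstrained key ends the prefix.
function usablePrefix(indexKeys, filter) {
  const prefix = [];
  for (const key of indexKeys) {
    if (!(key in filter)) break;
    prefix.push(key);
  }
  return prefix;
}

console.log(usablePrefix(INDEX_KEYS, {
  "metadata.component": "FrontEnd/Cryostat",
  "metadata.antenna": "DV10",
}));
```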

Testing Hardware / Software
- A cluster of two nodes was created:
  - CPU: Intel Xeon quad-core X5410
  - RAM: 16 GByte
  - Swap: 16 GByte
- OS:
  - RHEL 6.0
  - el6.x86_64
- MongoDB:
  - v2.2.1

Testing Data
- Real data from Sep-Nov 2012 was used initially, but:
- A tool to generate random data was implemented:
  - Month: 1 (February)
  - Number of days: 11
  - Number of antennas: 70
  - Number of components per antenna: 41
  - Monitoring points per component: 35
  - Total daily documents:
  - Total documents:
  - Average size per document: 1.3 MB
  - Size of the collection: 1,375.23 GB
  - Total index size: 193 MB

Database Statistics

Data Sets

Data Sets …

Data Sets

Schema 1: One Sample of Monitoring Data per Document

Proposed Schema:

More tests
- For more tests, see meDataTestingUsingMongoDB

Pending
- Test the performance of aggregations/combined queries.
- Use Map/Reduce to create statistics (max, min, avg, etc.) over ranges of data to improve the performance of queries such as:
  - Search for monitoring points whose values are >= 10
- Test performance with a year's worth of data.
- Stress tests with a large number of concurrent queries.
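A minimal sketch of the pending idea, written in plain JavaScript rather than as an actual Map/Reduce job: compute count, min, max and average for each per-day document, store them in a hypothetical `stats` sub-document, and let a query such as "values >= 10" first select candidate documents with `{ "stats.max": { $gte: 10 } }` before looking at individual samples. The `hourly` layout is the same assumption as in the earlier sketches.

```javascript
// Summary statistics over all numeric samples of one per-day document;
// null slots (no sample) are skipped. Returns null if there are no samples.
function dayStats(dayDoc) {
  let count = 0, sum = 0, min = Infinity, max = -Infinity;
  for (const minutes of Object.values(dayDoc.hourly)) {
    for (const seconds of Object.values(minutes)) {
      for (const value of Object.values(seconds)) {
        if (typeof value !== "number") continue;
        count++;
        sum += value;
        if (value < min) min = value;
        if (value > max) max = value;
      }
    }
  }
  return count === 0 ? null : { count, min, max, avg: sum / count };
}

// Filter selecting documents that may contain a value >= threshold,
// assuming the stats were stored under a hypothetical "stats" field.
function candidatesFilter(threshold) {
  return { "stats.max": { $gte: threshold } };
}

const stats = dayStats({ hourly: { 0: { 0: { 0: 4, 1: 12, 2: null } } } });
console.log(stats, JSON.stringify(candidatesFilter(10)));
```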

Conclusion
- MongoDB is suitable as an alternative for permanent storage of monitoring data.
- The schema + indexes are fundamental to achieving millisecond-level response times.