another noSql customization for the HDB++ archiving system


HDB@ELK: another noSql customization for the HDB++ archiving system
M. Di Carlo* (a), M. Canzari (a), M. Dolci (a), R. Smareglia (b)
(a) INAF Osservatorio Astronomico d’Abruzzo, Teramo, Italy; (b) INAF Osservatorio Astronomico di Trieste, Trieste, Italy

Introduction
- Study how to extend HDB++
- Study the archiving in Elasticsearch
- Use Kibana to visualise

HDB++ Module View
[Diagram: the Event Subscriber and the Configuration Manager use the Database Abstraction Layer (a C++ module); the MySql and Cassandra modules inherit from it. Key: uses, inherit, C++ module.]

HDB++ Runtime View (from the tango docs camp)

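HDB++ Data Model
[Diagram: the tables AttributeConfigurationHistory, AttributeParameter and AttributeEventData, linked by associations with 1 and * multiplicities; the Value of AttributeEventData can be Double, Long, String, ... Key: association, inherit, table.]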

Elasticsearch
- Real-time distributed search and analytics engine
- “Real-time” refers to the ability to search (and sometimes create) data as soon as they are produced
- Distributed because its indices are divided into shards, each with zero or more replicas
- Analytics engine because it allows the discovery, interpretation, and communication of meaningful patterns in data
- Based on Apache Lucene, a free and open-source information retrieval software library
- Developed alongside Logstash, a data-collection and log-parsing engine, and Kibana, an analytics and visualization platform
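The shard and replica counts mentioned above are set per index. A minimal sketch in Python of the request body used when creating an index; the helper name `index_settings` and the chosen counts are illustrative assumptions, not taken from the presentation:

```python
import json


def index_settings(shards, replicas):
    """Build the body for creating an index with the given shard layout."""
    if shards < 1:
        raise ValueError("an index needs at least one primary shard")
    if replicas < 0:
        raise ValueError("the number of replicas cannot be negative")
    return {
        "settings": {
            "number_of_shards": shards,      # how the index is split
            "number_of_replicas": replicas,  # copies of each shard (zero or more)
        }
    }


# Illustrative values: three primary shards, one replica of each.
body = json.dumps(index_settings(shards=3, replicas=1))
```

The resulting JSON would be sent as the body of the index-creation request.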

Elasticsearch main features
- No transactions: transactions are not supported
- Flexible schema: there is no need to specify the schema upfront
- Relations: denormalization, parent-child relations and nested objects
- Robustness: to work properly, Elasticsearch requires abundant memory
- Distributed: it is a CP system in terms of the CAP (Consistency, Availability, Partition tolerance) theorem

Elasticsearch and relations
- Everything is flat: every document is independent, and therefore every document should contain all of the information required to decide whether it matches a query
- This helps indexing, searching and scalability, since documents can be spread across multiple nodes
- Relations are not managed in the same way as in an RDBMS

Implementation
- The selected development language was C++
- The new library had to be able to work with REST and with JSON data
- “REST client for C++”: https://github.com/mrtazz/restclient-cpp
- “Json for modern C++”: https://github.com/nlohmann/json
- Implementing the “AbstractDB” took around 4 weeks in total
- Testing and studying took around two months
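The REST-plus-JSON interaction can be sketched in a few lines. The example below uses Python's standard library rather than the C++ libraries listed above, and it only builds the request without sending it. The host, the index name `hdb-events`, the document fields and the `_doc` endpoint of recent Elasticsearch versions are assumptions made for illustration:

```python
import json
from urllib.request import Request


def build_index_request(base_url, index, document):
    """Build (but do not send) the HTTP request that indexes one JSON document."""
    body = json.dumps(document).encode("utf-8")
    return Request(
        url=f"{base_url}/{index}/_doc",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical archived event; the field names are illustrative only.
request = build_index_request(
    "http://localhost:9200",
    "hdb-events",
    {
        "attribute": "sys/tg_test/1/double_scalar",
        "value": 1.5,
        "timestamp": "2018-01-01T00:00:00Z",
    },
)
```

Passing `request` to `urllib.request.urlopen` would send it to a running server.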

Implementation: class diagram
To add a new entity in the DB:
- Add an entity that represents the information to store, with the four main operations (DBEntity);
- Implement the Get and Save operations in the DAL.
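The two steps above can be sketched as follows. This is a Python illustration, not the C++ code of the library: the slide does not name the four operations, so save, get, update and delete are assumed here, and `InMemoryDAL` is a hypothetical stand-in for a data access layer that would talk to Elasticsearch:

```python
from dataclasses import dataclass, asdict


class InMemoryDAL:
    """Hypothetical data access layer; a real one would call the database."""

    def __init__(self):
        self._store = {}

    def save(self, kind, key, document):
        self._store[(kind, key)] = dict(document)

    def get(self, kind, key):
        return self._store.get((kind, key))

    def delete(self, kind, key):
        # Assumed extra operation, not listed on the slide.
        self._store.pop((kind, key), None)


@dataclass
class AttributeParameter:
    """Hypothetical entity exposing four assumed operations."""

    attribute: str
    unit: str

    def save(self, dal):
        dal.save("parameter", self.attribute, asdict(self))

    @classmethod
    def get(cls, dal, attribute):
        document = dal.get("parameter", attribute)
        return cls(**document) if document is not None else None

    def update(self, dal, **changes):
        for name, value in changes.items():
            setattr(self, name, value)
        self.save(dal)

    def delete(self, dal):
        dal.delete("parameter", self.attribute)
```

Keeping the storage calls inside the DAL means a new entity only has to describe its own fields and key.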

Tests
- Dolci et al., AMICA at Dome C: results from the first year of automatic operation tests in Antarctica
- Global Centroid-Moment-Tensor (CMT) Project, www.globalcmt.org

Kibana: time series

Archiving data for the CMT project
Event  Attribute Name  Value double  Value string               Value Date
t1     Latitude        -28.39        -                          -
t1     Longitude       -176.79       -                          -
t1     Location        -             “KERMADEC ISLANDS REGION”  -
t1     Date            -             -                          1976/02/15 21:23:22.6
t2     Latitude        -14.74        -                          -
t2     Longitude       167.10        -                          -
t2     Location        -             “VANUATU ISLANDS”          -
t2     Date            -             -                          1976/03/04 02:50:00.5
…
tN     ...
How do we plot them?

Questions
- How do we archive structured data? Json? Array?
- How do we aggregate unstructured data?

Elasticsearch and relations: four possibilities to bridge the gap
- Application-side join: there are no relations in the data, and the only possibility is to make more than one query to filter and emulate a join
- Data denormalization: read performance is increased by adding redundant copies of the data, at a cost in terms of concurrency and index size
- Nested objects: it is possible to relate a document to a nested document that is indexed together with it; there are a number of special operators to deal with those objects
- Parent-child relationship: a document can be the parent of another one, and a child can have only one parent; the documents are completely separated
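The first option, the application-side join, can be illustrated with a small Python sketch. The `search` function is a hypothetical stand-in for a query against an index (here a plain list of documents), and the two indices and their fields are assumptions built from the CMT sample values:

```python
def search(index, **criteria):
    """Stand-in for a query: exact match on every given field."""
    return [
        doc for doc in index
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


# Two independent, flat "indices" with no relation between them.
events = [
    {"event": "t1", "region": "KERMADEC ISLANDS REGION"},
    {"event": "t2", "region": "VANUATU ISLANDS"},
]
values = [
    {"event": "t1", "attribute": "Latitude", "value": -28.39},
    {"event": "t1", "attribute": "Longitude", "value": -176.79},
    {"event": "t2", "attribute": "Latitude", "value": -14.74},
]


def latitudes_for_region(region):
    """Emulate a join with two queries, combined on the application side."""
    # First query: which events belong to the region?
    event_ids = {doc["event"] for doc in search(events, region=region)}
    # Second query: fetch the latitude values, then keep the matching events.
    return [
        doc["value"]
        for doc in search(values, attribute="Latitude")
        if doc["event"] in event_ids
    ]
```

The cost of this approach is visible in the code: every join needs at least two round trips, and the filtering happens in the client.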

Transformation into a (usable) table
From one row per (event, attribute):
Event  Attribute Name  Value double  Value string  ...
t1     a1
t1     ...
t1     aM
t2     ...
tN     ...
to one row per event, (M+1) x N cells in total:
Event  a1            a2            ...  aM
t1     Value double  Value string  ...
t2     ...
tN     ...

From:
Event  Attribute Name  Value double  Value string               Value Date
t1     Latitude        -28.39        -                          -
t1     Longitude       -176.79       -                          -
t1     Location        -             “KERMADEC ISLANDS REGION”  -
t1     Date            -             -                          1976/02/15 21:23:22.6
t2     Latitude        -14.74        -                          -
t2     Longitude       167.10        -                          -
t2     Location        -             “VANUATU ISLANDS”          -
t2     Date            -             -                          1976/03/04 02:50:00.5
…
tN     ...
To:
Event  Latitude  Longitude  Location                   Date
t1     -28.39    -176.79    “KERMADEC ISLANDS REGION”  1976/02/15 21:23:22.6
t2     -14.74    167.10     “VANUATU ISLANDS”          1976/03/04 02:50:00.5
…
tN     ...
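The transformation from one row per (event, attribute) to one row per event is a pivot. A minimal Python sketch using the sample values shown in the tables; the function name `pivot` and the tuple layout are assumptions for illustration:

```python
def pivot(rows):
    """Turn (event, attribute, value) rows into one dict per event."""
    table = {}
    for event, attribute, value in rows:
        # Create the event's row on first sight, then add the attribute column.
        table.setdefault(event, {"Event": event})[attribute] = value
    return list(table.values())


rows = [
    ("t1", "Latitude", -28.39),
    ("t1", "Longitude", -176.79),
    ("t1", "Location", "KERMADEC ISLANDS REGION"),
    ("t1", "Date", "1976/02/15 21:23:22.6"),
    ("t2", "Latitude", -14.74),
    ("t2", "Longitude", 167.10),
    ("t2", "Location", "VANUATU ISLANDS"),
    ("t2", "Date", "1976/03/04 02:50:00.5"),
]

wide = pivot(rows)
```

Each element of `wide` now holds everything known about one event, which matches the flat, self-contained document model that Elasticsearch expects.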

Kibana: Geopoint
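To be plotted on a Kibana map, latitude and longitude have to be stored together in a field of the Elasticsearch type `geo_point`. A hedged Python sketch of a mapping and of the conversion of one per-event row; the field name `location`, the field name `region` and the helper `to_geo_document` are assumptions, and the mapping layout follows recent Elasticsearch versions:

```python
# Index mapping declaring one geo_point field (field name is illustrative).
mapping = {"mappings": {"properties": {"location": {"type": "geo_point"}}}}


def to_geo_document(row):
    """Merge the Latitude and Longitude columns into one geo_point value."""
    lat, lon = row["Latitude"], row["Longitude"]
    if not -90 <= lat <= 90 or not -180 <= lon <= 180:
        raise ValueError(f"coordinates out of range: {lat}, {lon}")
    return {
        "event": row["Event"],
        "region": row["Location"],
        "location": {"lat": lat, "lon": lon},  # object form of a geo_point
    }


document = to_geo_document(
    {
        "Event": "t1",
        "Latitude": -28.39,
        "Longitude": -176.79,
        "Location": "KERMADEC ISLANDS REGION",
    }
)
```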

Transformation - general
[Diagram: the table with one row per (event, attribute) is transformed into a structure of (M+1) x N x Z cells, with events t1 ... tN, attributes a1 ... aM and groups g1 ... gZ.]
The dimension depends on the grouping one wants to do (for instance, time-device-attribute).
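The general case can be sketched as nesting the records by an ordered list of keys, so that the chosen grouping decides the shape of the result. The record fields (`time`, `device`, `attribute`, `value`) and the function name `group` are assumptions for illustration:

```python
def group(records, keys):
    """Nest records by the given keys, in order; the leaves hold the values."""
    root = {}
    for record in records:
        node = root
        # Walk (and create) one nesting level per key, except the last.
        for key in keys[:-1]:
            node = node.setdefault(record[key], {})
        node[record[keys[-1]]] = record["value"]
    return root


records = [
    {"time": "t1", "device": "g1", "attribute": "a1", "value": 1.0},
    {"time": "t1", "device": "g1", "attribute": "a2", "value": 2.0},
    {"time": "t1", "device": "g2", "attribute": "a1", "value": 3.0},
    {"time": "t2", "device": "g1", "attribute": "a1", "value": 4.0},
]

by_time = group(records, ("time", "device", "attribute"))
by_device = group(records, ("device", "attribute", "time"))
```

The same records produce different structures depending only on the order of the keys, which is the tradeoff the slide points at.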

Conclusion
- A JSON device attribute (with the needed changes in the source code to be able to archive another type) was introduced whenever aggregation was needed (nested objects relations)
- The system appears to be designed for archiving time series (only)
- New developments of the TANGO core model can help to reduce the aggregation tradeoff
- A specific JSON data type and the scheduling of custom archiving scripts can be beneficial too

Thank you for your attention
For any questions you can write to: Matteo Di Carlo, matteo.dicarlo@inaf.it