HDB@ELK: another noSql customization for the HDB++ archiving system
M. Di Carlo*(a), M. Canzari(a), M. Dolci(a), R. Smareglia(b)
(a) INAF Osservatorio Astronomico d’Abruzzo, Teramo, Italy; (b) INAF Osservatorio Astronomico di Trieste, Trieste, Italy

Introduction
- Study how to extend HDB++
- Study archiving in Elasticsearch
- Use Kibana to visualise the data

HDB++ Module View
[Diagram: the Event Subscriber and the Configuration Manager use the Database Abstraction Layer (a C++ module); the MySql and Cassandra modules inherit from it.]

HDB++ Runtime View (from the TANGO documentation)

HDB++ Data Model
[Class diagram: AttributeConfigurationHistory, AttributeParameter and AttributeEventData, connected by associations with 1 and * multiplicities; Value is specialised by inheritance into Double, Long, String, ...]

Elasticsearch
Real-time distributed search and analytics engine
- “Real-time” refers to the ability to search (and sometimes create) data as soon as they are produced.
- “Distributed” because its indices are divided into shards, each with zero or more replicas.
- The analytics engine allows the discovery, interpretation, and communication of meaningful patterns in data.
- It is based on Apache Lucene, a free and open-source information retrieval software library.
- It is developed alongside Logstash, a data-collection and log-parsing engine, and Kibana, an analytics and visualization platform.

Elasticsearch main features
- No transactions: transactions are not supported.
- Flexible schema: there is no need to specify the schema upfront.
- Relations: handled through denormalization, parent-child relations and nested objects.
- Robustness: to work properly, Elasticsearch requires abundant memory.
- Distributed: it is a CP system in terms of the CAP (Consistency, Availability, Partition tolerance) theorem.

Elasticsearch and relations
- Everything is flat: every document is independent, so every document should contain all of the information required to decide whether it matches a query.
- This helps indexing, searching and scalability, since documents can be spread across multiple nodes.
- Relations are not managed in the same way as in an RDBMS.

Implementation
- The selected development language was C++.
- The new library had to be able to work with REST and with JSON data, using “REST client for C++” and “Json for modern C++”.
- Implementing the “AbstractDB” took around 4 weeks.
- Testing and studying took around two months.

Implementation: class diagram
To add a new entity to the DB:
- Add an entity that represents the information to store, with the four main operations (DBEntity).
- Implement the Get and Save operations in the DAL.
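The two steps above can be sketched as follows. This is a minimal illustration in Python, not the C++ code of the presentation: the class names `AttributeEvent` and `InMemoryDAL`, the field names and the document id format are all assumptions, and the store is an in-memory dictionary where a real DAL would send JSON over REST. The slide does not list the four entity operations, so only the Get and Save path is shown.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class AttributeEvent:
    """Hypothetical entity; the field names are assumptions for illustration."""
    attribute_name: str
    event_time: str
    value_double: float

    def doc_id(self):
        # Hypothetical id scheme: attribute name plus event time.
        return f"{self.attribute_name}@{self.event_time}"


class InMemoryDAL:
    """Stand-in for the DAL. A real one would send JSON over REST instead."""

    def __init__(self):
        self._docs = {}

    def save(self, entity):
        # Serialise to JSON text, as a REST request body would be.
        self._docs[entity.doc_id()] = json.dumps(asdict(entity))

    def get(self, doc_id):
        # Return the stored entity, or None when the id is unknown.
        raw = self._docs.get(doc_id)
        return None if raw is None else AttributeEvent(**json.loads(raw))


dal = InMemoryDAL()
event = AttributeEvent("Latitude", "1976-02-15T21:23:22.6", -28.39)
dal.save(event)
print(dal.get(event.doc_id()))
```

Keeping serialisation inside the DAL means a new entity only has to describe its fields, which is the separation the two steps suggest.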

Tests
- Dolci et al., AMICA at Dome C: results from the first year of automatic operation tests in Antarctica
- Global Centroid-Moment-Tensor (CMT) Project, www.globalcmt.org

Kibana: time series

Archiving data for the CMT project
Event  Attribute Name  Value double  Value string               Value Date
t1     Latitude        -28.39
t1     Longitude
t1     Location                      “KERMADEC ISLANDS REGION”
t1     Date                                                     1976/02/15 21:23:22.6
t2     Latitude        -14.74
t2     Longitude       167.10
t2     Location                      “VANUATU ISLANDS”
t2     Date                                                     1976/03/04 02:50:00.5
…
tN

How do we plot them?

Questions
- How do we archive structured data? JSON? Array?
- How do we aggregate unstructured data?

Elasticsearch and relations: four possibilities to bridge the gap
- Application-side join: there are no relations in the data, so the only possibility is to issue more than one query and emulate the join in the application.
- Data denormalization: read performance is increased by adding redundant copies of the data, at a cost in terms of concurrency and index size.
- Nested objects: a document can be related to a nested document that is indexed together with it; a number of special operators exist to deal with these objects.
- Parent-child relationship: a document can be the parent of another one, and a child can have only one parent; the documents are completely separate.
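The first option, the application-side join, can be illustrated with a small Python sketch. The two lists stand in for the results of two separate queries; the field names (`attr_id`, `name`, `time`, `value`) are hypothetical and are not taken from the HDB@ELK schema, while the values are borrowed from the CMT table.

```python
# Results of two independent queries, held as plain lists of documents.
# All field names here are hypothetical.
attributes = [
    {"attr_id": 1, "name": "Latitude"},
    {"attr_id": 2, "name": "Location"},
]
events = [
    {"attr_id": 1, "time": "t1", "value": -28.39},
    {"attr_id": 2, "time": "t1", "value": "KERMADEC ISLANDS REGION"},
    {"attr_id": 1, "time": "t2", "value": -14.74},
]


def application_side_join(attributes, events):
    """Emulate a join in the client: look up each event's attribute by id."""
    names = {a["attr_id"]: a["name"] for a in attributes}
    return [
        dict(e, name=names[e["attr_id"]])
        for e in events
        if e["attr_id"] in names
    ]


joined = application_side_join(attributes, events)
for doc in joined:
    print(doc["time"], doc["name"], doc["value"])
```

Each joined record is also what a denormalized document would look like if the attribute name were stored redundantly with every event, which is the trade-off the second option describes.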

Transformation into a (usable) table
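The archived rows (Event, Attribute Name, Value double, Value string, ...) hold M attributes a1 ... aM for each event t1 ... tN.
They are transformed into a table with one row per event and one column per attribute (Event, a1, a2, ..., aM), each cell holding the Value double or Value string of that attribute.
Dimension of the resulting table: (M+1) x N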

Event  Attribute Name  Value double  Value string               Value Date
t1     Latitude        -28.39
t1     Longitude
t1     Location                      “KERMADEC ISLANDS REGION”
t1     Date                                                     1976/02/15 21:23:22.6
t2     Latitude        -14.74
t2     Longitude       167.10
t2     Location                      “VANUATU ISLANDS”
t2     Date                                                     1976/03/04 02:50:00.5
…
tN

Event  Latitude  Longitude  Location                   Date
t1     -28.39               “KERMADEC ISLANDS REGION”  1976/02/15 21:23:22.6
t2     -14.74    167.10     “VANUATU ISLANDS”          1976/03/04 02:50:00.5
…
tN     ...
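The transformation shown above is a pivot from one row per (event, attribute) to one row per event. A minimal Python sketch, assuming the rows have already been read back as tuples; the tuple layout and the function name `pivot` are illustrative only. The t1 longitude does not appear in the table, so it is left out rather than guessed.

```python
from collections import defaultdict

# Long-format rows: (event, attribute name, value), using the values shown above.
# The t1 longitude does not appear in the table, so it is left out here.
rows = [
    ("t1", "Latitude", -28.39),
    ("t1", "Location", "KERMADEC ISLANDS REGION"),
    ("t1", "Date", "1976/02/15 21:23:22.6"),
    ("t2", "Latitude", -14.74),
    ("t2", "Longitude", 167.10),
    ("t2", "Location", "VANUATU ISLANDS"),
    ("t2", "Date", "1976/03/04 02:50:00.5"),
]


def pivot(rows):
    """Group long-format rows into one dict per event, keyed by attribute name."""
    table = defaultdict(dict)
    for event, name, value in rows:
        table[event][name] = value
    return dict(table)


wide = pivot(rows)
for event, record in wide.items():
    # Missing attributes print as None instead of raising an error.
    print(event, record.get("Latitude"), record.get("Longitude"),
          record.get("Location"), record.get("Date"))
```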

Kibana: Geopoint
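Plotting on a map in Kibana relies on a field of the Elasticsearch type `geo_point`, which accepts an object with `lat` and `lon` keys. The Python sketch below shows one way the separate Latitude and Longitude values could be merged into such a field. The field name `location_point` and the helper `to_geo_document` are assumptions; the presentation does not say how its documents were built.

```python
import json

# Mapping fragment declaring a geo_point field. The field name "location_point"
# is an assumption; only the "geo_point" type is Elasticsearch's own.
mapping = {"properties": {"location_point": {"type": "geo_point"}}}


def to_geo_document(record):
    """Merge separate Latitude/Longitude values into one geo_point-style field."""
    doc = {k: v for k, v in record.items() if k not in ("Latitude", "Longitude")}
    # Only build the point when both coordinates are present.
    if "Latitude" in record and "Longitude" in record:
        doc["location_point"] = {
            "lat": record["Latitude"],
            "lon": record["Longitude"],
        }
    return doc


record = {"Latitude": -14.74, "Longitude": 167.10, "Location": "VANUATU ISLANDS"}
geo_doc = to_geo_document(record)
print(json.dumps(mapping))
print(json.dumps(geo_doc))
```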

Transformation - general
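In the general case, the archived rows (Event, Attribute Name, Value double, Value string, ...) for attributes a1 ... aM and events t1 ... tN are also arranged along a grouping axis g1 ... gZ.
Dimension of the resulting structure: (M+1) x N x Z
The dimension depends on the grouping one wants to do (for instance, time-device-attribute).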
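The general case adds a grouping axis on top of the pivot. A minimal Python sketch, assuming a time-device-attribute grouping: the device names, attribute names and values are invented for illustration, and the nested dictionary is just one possible representation of the (M+1) x N x Z structure.

```python
from collections import defaultdict

# Hypothetical rows: (group, event time, attribute name, value).
# Here the group is a device name, matching the time-device-attribute example.
rows = [
    ("dev1", "t1", "a1", 1.0),
    ("dev1", "t1", "a2", 2.0),
    ("dev1", "t2", "a1", 3.0),
    ("dev2", "t1", "a1", 4.0),
]


def group_rows(rows):
    """Build a nested mapping group -> event -> attribute -> value."""
    cube = defaultdict(lambda: defaultdict(dict))
    for group, event, name, value in rows:
        cube[group][event][name] = value
    # Convert the nested defaultdicts to plain dicts before returning.
    return {g: dict(events) for g, events in cube.items()}


cube = group_rows(rows)
print(cube["dev1"]["t1"])
print(sorted(cube))
```

Changing the order of the keys in the loop changes the grouping, which is why the dimension depends on the grouping chosen.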

Conclusion
- A JSON device attribute was introduced wherever aggregation was needed (nested-object relations); this required changes in the source code to be able to archive another type.
- The system appears to be designed for archiving time series only.
- New developments in the TANGO core model could help to reduce the aggregation trade-off.
- A specific JSON data type and the scheduling of custom archiving scripts could be beneficial too.

Thank you for your attention
For any questions, you can write to Matteo Di Carlo.
