Published by Meghan Ryan. Modified over 5 years ago.
1
ONAP DataLake
Guobiao Mo (China Mobile), Xin Miao (Huawei), Zhaoxing Meng (ZTE)
October 29, 2018
2
Project Goals
Collect and permanently store the data flowing around the ONAP system in several Big Data stores of different categories.
Serve as a common, easily accessible data store for all ONAP components.
Provide APIs and other means for ONAP components and external systems (e.g. BSS/OSS) to consume the data.
Provide sophisticated, ready-to-use data analytics tools built on the data.
3
Architecture
[Diagram components: data sources (OSS/BSS, ONAP Components, DMaaP/Kafka, Other Sources) supplying JSON/XML/YAML data; DataLake Dispatcher; DL Admin (UI); OLAP Store (Druid); Document Store (Couchbase); Other Stores; Superset; Query/UI; Spark.]
4
Data Sources
DataLake monitors all or selected DMaaP topics, reads their data in real time via the Kafka API, and persists it.
Other ONAP components can use DataLake to store application-specific data, through DMaaP or the DataLake REST APIs.
Other data sources will be supported if needed.
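The topic monitoring and persistence described above can be sketched as follows. This is a minimal, stdlib-only illustration, not DataLake's actual dispatcher: the `Record` tuple stands in for messages a real Kafka client would deliver, and `select_topics`, `InMemoryStore`, and `dispatch` are hypothetical names. It also assumes one partition per topic, since Kafka offsets are really tracked per partition.

```python
import fnmatch
import json
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical stand-in for a consumed Kafka message: (topic, offset, raw value).
# Assumes one partition per topic, so a single offset per topic is enough.
Record = Tuple[str, int, bytes]


def select_topics(available: Iterable[str], patterns: Optional[List[str]] = None) -> List[str]:
    """Monitor every topic when no patterns are given, else only the matching ones."""
    topics = sorted(available)
    if not patterns:
        return topics
    return [t for t in topics if any(fnmatch.fnmatchcase(t, p) for p in patterns)]


class InMemoryStore:
    """Hypothetical store holding one 'table' (a list of documents) per topic."""

    def __init__(self) -> None:
        self.tables: Dict[str, List[dict]] = defaultdict(list)

    def insert(self, table: str, document: dict) -> None:
        self.tables[table].append(document)


def dispatch(records: Iterable[Record], topics: List[str], store: InMemoryStore,
             committed: Dict[str, int]) -> None:
    """Persist records from monitored topics, skipping offsets already stored."""
    wanted = set(topics)
    for topic, offset, value in records:
        if topic not in wanted or offset <= committed.get(topic, -1):
            continue  # unmonitored topic, or a replayed message
        store.insert(topic, json.loads(value))  # table name == topic name
        committed[topic] = offset  # remember progress so a restart can resume
```

Tracking the last persisted offset per topic lets a restarted dispatcher replay its input without writing duplicates; the topic names used with it (for example `ves-fault`) are illustrative only.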
5
Document Store
The proof of concept (POC) uses MongoDB, which supports flexible database schemas and powerful ad hoc queries. Due to MongoDB licensing issues, we plan to replace it with Couchbase, a distributed, document-oriented database.
DataLake pulls the data in real time and inserts it into the store, one table per topic, with each table named after its topic.
Data in JSON, XML, and YAML formats is automatically converted into the store's native schema.
DL provides a REST API for data queries, and applications can also access the data through the store's native API.
Couchbase supports Spark, which reads the documents into DataFrames for easy processing. This is suitable for complex analytics models.
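The automatic format conversion mentioned above might look roughly like this for JSON and XML. It is a sketch under assumptions, not the DataLake converter: `to_document` and `xml_to_value` are hypothetical helpers, and the `@attr` / `#text` keys are one common XML-to-document convention, not necessarily the one DL uses. YAML is left out so the example stays stdlib-only; a parser such as PyYAML's `yaml.safe_load` would fill that branch.

```python
import json
import xml.etree.ElementTree as ET
from typing import Any, Dict


def xml_to_value(elem: ET.Element) -> Any:
    """Convert an XML element into nested dicts/lists/strings (hypothetical mapping)."""
    children = list(elem)
    if not children and not elem.attrib:
        return (elem.text or "").strip()  # leaf element becomes a plain string
    result: Dict[str, Any] = {f"@{k}": v for k, v in elem.attrib.items()}
    for child in children:
        value = xml_to_value(child)
        if child.tag in result:
            # Repeated tags are collected into a list.
            existing = result[child.tag]
            if not isinstance(existing, list):
                result[child.tag] = [existing]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    text = (elem.text or "").strip()
    if text:
        result["#text"] = text  # keep mixed text content alongside children
    return result


def to_document(raw: str, fmt: str) -> Dict[str, Any]:
    """Turn a raw message into a document ready for a document store."""
    if fmt == "json":
        doc = json.loads(raw)
    elif fmt == "xml":
        root = ET.fromstring(raw)
        doc = {root.tag: xml_to_value(root)}
    else:
        # YAML (and anything else) is not handled in this stdlib-only sketch.
        raise ValueError(f"unsupported format: {fmt}")
    if not isinstance(doc, dict):
        raise ValueError("top-level value must be an object")
    return doc
```

For example, `<event id="7"><tag>a</tag><tag>b</tag></event>` becomes `{"event": {"@id": "7", "tag": ["a", "b"]}}`, which can then be inserted into the table named after its topic.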
6
OLAP Store
Apache Druid is a popular large-scale OLAP data store.
Superset is a UI tool for interactive analytics. Both are actively developed on GitHub.
DL extracts the dimensions and metrics from JSON files and pre-configures Druid settings for each topic.
DL pre-builds interactive Superset dashboards.
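One way to picture the dimension and metric extraction described above is the heuristic below: strings and booleans become dimensions, numbers become metrics, and nested keys are flattened with dots. This is an assumption for illustration, not DL's documented rule. The output is loosely modeled on the `dataSchema` part of a Druid ingestion spec (`timestampSpec`, `dimensionsSpec`, `metricsSpec` with `count` and `doubleSum` aggregators); a real Kafka ingestion spec would also need `ioConfig`, `tuningConfig`, and a `flattenSpec` for nested fields. `flatten`, `classify`, and `druid_data_schema` are hypothetical names.

```python
from typing import Any, Dict, List, Tuple


def flatten(obj: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested objects into dotted keys, e.g. {"a": {"b": 1}} -> {"a.b": 1}."""
    flat: Dict[str, Any] = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, name + "."))
        else:
            flat[name] = value
    return flat


def classify(sample: Dict[str, Any]) -> Tuple[List[str], List[str]]:
    """Guess dimensions (strings, booleans) and metrics (numbers) from one sample."""
    dims: List[str] = []
    metrics: List[str] = []
    for name, value in sorted(flatten(sample).items()):
        # bool is checked before int/float because bool is a subclass of int.
        if isinstance(value, (bool, str)):
            dims.append(name)
        elif isinstance(value, (int, float)):
            metrics.append(name)
        # Lists and nulls are skipped in this sketch.
    return dims, metrics


def druid_data_schema(topic: str, sample: Dict[str, Any], timestamp_column: str) -> Dict[str, Any]:
    """Build a Druid-style dataSchema for one topic from a sample message."""
    dims, metrics = classify(sample)
    dims = [d for d in dims if d != timestamp_column]
    metrics = [m for m in metrics if m != timestamp_column]
    return {
        "dataSource": topic,
        "timestampSpec": {"column": timestamp_column, "format": "auto"},
        "dimensionsSpec": {"dimensions": dims},
        "metricsSpec": [{"type": "count", "name": "count"}]
        + [{"type": "doubleSum", "name": m, "fieldName": m} for m in metrics],
    }
```

Inferring the schema from a sample message is what makes per-topic pre-configuration possible without manual setup; in practice an operator would likely review the guessed dimensions and metrics, for example through DL Admin.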
7
Other Storages
Other storage systems may be supported based on future requirements. Some candidates:
Search engines: Elasticsearch or Apache Solr.
OpenTSDB, a distributed, scalable time series database.
8
Summary
Storage                       Target
Document Store (Couchbase)    Document Storage and Retrieval
Document Store + Spark        Customized Computation
OLAP (Druid and Superset)     Interactive Analytics
Others                        Based on future requirements
9
In Relation to Other Components
DCAE
DCAE focuses on being part of the automated closed control loop for VNFs; storing collected data for archiving is not within DCAE's scope (see ONAP wiki forum).
We envision that some DCAE analytics applications may use the data in DataLake.
PNDA
PNDA is an infrastructure that bundles a wide variety of big data technologies for data processing. Applications are developed on the technologies provided by PNDA.
The goal of DataLake is to store DMaaP and other data and to build ready-to-use applications around it, using suitable technologies whether or not they are provided by PNDA. Currently, Couchbase, Druid, and Superset are not included in PNDA.
10
Thank You
DataLake proposal on ONAP wiki.
Contact: Guobiao Mo