Time Series Data Repository (TSDR) Project Proposal
TSDR Functional Objectives To capture ODL data into a persistent time series data repository This includes: Statistics counters Performance data Health status information Operational configuration data To facilitate various applications built on top of TSDR Applications include: Operational configuration optimization Traffic engineering Network analytics with automated intelligence Security risk detection Performance analysis Major functions Data Collection Data Storage Data Queries Data Aggregation Data Purge Lithium Focus TSDR functionalities on OpenFlow Statistics data
TSDR Design Objectives Generic and Extensible architectural framework Generic and extensible TSDR Data Model. Abstract and generic TSDR Persistence Layer with TSDR Persistence APIs Allow implementation of various data store plugins under TSDR Persistence Layer with HBase Plugin as an example TSDR Data Store implementation. Scalable with high performance Providing both integrated and distributed architectures to handle different scales of time series data Fully utilizing MD-SAL’s clustering capability to handle performance and scalability in large scale deployment scenarios
TSDR Integrated Architecture TSDR Data Services including Data Collection, Data Storage, Data Query, Data Purging, and Data Aggregation are MD-SAL services. Data Collection service receives time series data published on MD-SAL messaging bus from MD-SAL southbound plugins. Data Collection service communicates with Data Storage service to store the data into TSDR. TSDR data services access TSDR Data Store such as HBase through generic TSDR Data Persistence Layer. Needs MD-SAL notification subsystem support.
TSDR Distributed Architecture In large data center deployment scenarios, TSDR Distributed Architecture would be needed to handle the performance and scalability. In distributed architecture, TSDR data services are deployed in a separate MD-SAL instance. The data pushed onto MD-SAL messaging bus by ODL southbound plugin are propagated to the other MD-SAL instance for TSDR data services to process into TSDR data repository. Needs ODL clustering support.
TSDR Data Flow with multiple data models TSDR Data Flow involves multiple data models including source data model ( OpenFlow statistics), TSDR data model, and TSDR plugin ( HBase) data model. Data Collection Service subscribes to receive OpenFlow Statistics data from MD-SAL Notification Subsystem and passes the data to Data Storage Service. Data Storage Service converts OpenFlow Statistics data model to TSDR data model. HBase TSDR Plugin converts TSDR data model to HBase specific data model based on HBase TSDR schema design.
Unstructured or Semi-Structured data consideration – for future release For unstructured or semi-structured data such as syslog data, MD-SAL receives the data in the format of syslog specifica data model. Data Filtering and Preprocessing can be added to filter out the data noise and optionally extract structured information from the semi-structured data. Third party specific TSDR plugin such as Splunk Plugin could be added under TSDR Data Persistence Layer to work with proprietary data stores. Data Aggregation Service is not needed when handling unstructured data. Third party tools such as Splunk could leverage Data Query Service to obtain the unstructured data from TSDR and add application specific processing on top of it.
TSDR Data Model The goal of the TSDR data model design: Generic Extensible Scalable Performance Optimized The data model captures: Statistics data Log type of data Note: To add a new group, extend TSDRBaseRecord DataCategory contains: Flow Stats Interface Stats Queue Stats Flow Group Stats Flow Meter Stats Log Records Note: More categories can be added to the above list. RecordKeys contains: A list of composite keys Different categories contain different set of keys Key set validation is needed based on different data categories
TSDR Persistence APIs Interface Name Description/comments Extends from ODL Common APIs? Specific to TSDR Persistence API? Will be implemented in HBase plugin in Lithium? save() Including saving one or a list of objects Yes No find() Including query based on a list of IDs, with specified criteria, and paging support count() delete() Including delete with one or a list of IDs, and delete the entire table exists() Including query based on one or a list of IDs min(), max(), avg() For Data Aggregation purpose
HBase TSDR Schema – Raw Data TableName RowKey Column Family: Column Qualifier = Cell Value FlowMetrics MetricID_NodeID_TableID(_FlowID)_timestamp ‘raw’ = metric_value InterfaceMetrics MetricID_NodeID_TableID(_PortID)_timestamp QueueMetrics MetricID_NodeID_TableID_PortID_QueueID_timestamp GroupMetrics MetricID_NodeID_GroupID(_GroupBucketID)_timestamp MeterMetrics MetricID_NodeID_GroupID(_MeterID)_timestamp Schema Design considerations: General HBase Schema Design Rules applied: Keep RowKey, Column Family Key, Column Qualifier as short as possible. Design the RowKey properly so as to keep rows evenly distributed in multiple data nodes. Keep the number of column family low Other performance considerations: Multiple tables are created based on the data categories in the TSDR data model. Data storage and query operations run much faster on smaller data sets stored in HBase tables with structured keys.
HBase TSDR Schema – Aggregated Data TableName RowKey Column Family: Column Qualifier = Cell Value HourlyFlowMetrics MetricID_NodeID_TableID(_FlowID)_timestamp ‘min = metric_value ‘max’ = metric_value ‘avg’ = metric_value HourlyInterfaceMetrics MetricID_NodeID_TableID(_InterfaceID)_timestamp HourlyQueueMetrics MetricID_NodeID_TableID_PortID_QueueID_timestamp HourlyGroupMetrics MetricID_NodeID_GroupID(_GroupBucketID)_timestamp HourlyMeterMetrics MetricID_NodeID_GroupID(_MeterID)_timestamp For performance consideration, we design multiple aggregation tables with different granularity. Aggregation tables with different granularity will have similar schema as displayed above
HBase TSDR Data Model TSDR HBase Plugin converts the generic TSDR data model into HBase specific data model based on HBase schema design. TSDR HBase Plugin leverages this HBase specific data model to implement the generic TSDR Persistence APIs including storage, query, purging, and aggregation to complete the TSDR data services in HBase.
TSDR Scope in Lithium In the Lithium release, we will focus on the following deliverables: Architectural framework Data Type Support as specified in the architectural design OpenFlow Statistics Deployment scenarios support Data Collection mechanisms TSDR Integrated Architecture HBase on Hadoop single node deployment scenario Implement Pub/Sub collection mechanism Data Persistence Layer Functionality implementation Complete TSDR Persistence APIs with interface definition Data Collection Data Storage TSDR Plugin Data Model implementation HBase plugin as an example implementation Focus on the storage API implementation in HBase plugin to support Data Storage Service in Lithium TSDR Data Model to support OpenFlow Statistics HBase Data Model for HBase Plugin implementation