Time Series Data Repository (TSDR)

Time Series Data Repository (TSDR) Project Proposal

TSDR Functional Objectives
- To capture ODL data into a persistent time series data repository. This includes:
  - Statistics counters
  - Performance data
  - Health status information
  - Operational configuration data
- To facilitate various applications built on top of TSDR. Applications include:
  - Operational configuration optimization
  - Traffic engineering
  - Network analytics with automated intelligence
  - Security risk detection
  - Performance analysis
- Major functions:
  - Data Collection
  - Data Storage
  - Data Queries
  - Data Aggregation
  - Data Purge
- Lithium focus: TSDR functionality for OpenFlow Statistics data
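
The Data Purge function can be pictured as a retention window: records older than "now minus the retention period" are removed. The Java sketch below shows only that rule, assuming an in-memory list in place of a real data store; TimedValue, RetentionPurge, and the retention-period parameter are hypothetical illustrations, not TSDR classes or settings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal record: a metric name, a timestamp in epoch millis, and a value.
record TimedValue(String metric, long timestampMillis, double value) {}

final class RetentionPurge {
    private RetentionPurge() {}

    /**
     * Removes, in place, every record older than (nowMillis - retentionMillis)
     * and returns how many records were purged.
     */
    static int purgeOlderThan(List<TimedValue> records, long nowMillis, long retentionMillis) {
        long cutoff = nowMillis - retentionMillis;
        int before = records.size();
        records.removeIf(r -> r.timestampMillis() < cutoff);
        return before - records.size();
    }

    public static void main(String[] args) {
        List<TimedValue> data = new ArrayList<>(List.of(
                new TimedValue("PacketCount", 1_000L, 10),
                new TimedValue("PacketCount", 5_000L, 12),
                new TimedValue("PacketCount", 9_000L, 15)));
        // Cutoff = 10_000 - 6_000 = 4_000, so only the first record is purged.
        int purged = purgeOlderThan(data, 10_000L, 6_000L);
        System.out.println(purged + " purged, " + data.size() + " kept");
    }
}
```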

TSDR Design Objectives
- Generic and extensible architectural framework
  - Generic and extensible TSDR Data Model
  - Abstract and generic TSDR Persistence Layer with TSDR Persistence APIs
  - Allow various data store plugins to be implemented under the TSDR Persistence Layer, with an HBase plugin as the example TSDR Data Store implementation
- Scalable with high performance
  - Provide both integrated and distributed architectures to handle different scales of time series data
  - Fully utilize MD-SAL's clustering capability to handle performance and scalability in large-scale deployment scenarios

TSDR Integrated Architecture
- TSDR Data Services, including Data Collection, Data Storage, Data Query, Data Purging, and Data Aggregation, are MD-SAL services.
- The Data Collection service receives time series data published on the MD-SAL messaging bus by MD-SAL southbound plugins.
- The Data Collection service communicates with the Data Storage service to store the data in TSDR.
- TSDR data services access the TSDR Data Store (such as HBase) through the generic TSDR Data Persistence Layer.
- Requires MD-SAL notification subsystem support.
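
The collection-to-storage path in the integrated architecture follows a publish/subscribe pattern. The Java sketch below illustrates only that pattern with a tiny in-process bus; SimpleBus, CollectionService, StorageService, and the topic name are hypothetical stand-ins, and none of this is the MD-SAL notification API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical in-process message bus standing in for the MD-SAL notification
// subsystem; this is NOT the MD-SAL API.
final class SimpleBus {
    private final Map<String, List<Consumer<Object>>> subscribers = new ConcurrentHashMap<>();

    void subscribe(String topic, Consumer<Object> handler) {
        subscribers.computeIfAbsent(topic, t -> new CopyOnWriteArrayList<>()).add(handler);
    }

    void publish(String topic, Object payload) {
        // Deliver synchronously to every subscriber of this topic; other topics are untouched.
        subscribers.getOrDefault(topic, List.of()).forEach(h -> h.accept(payload));
    }
}

// Hypothetical Data Storage service: keeps records in memory.
final class StorageService {
    final List<Object> stored = new ArrayList<>();

    void store(Object record) {
        stored.add(record);
    }
}

// Hypothetical Data Collection service: subscribes to the bus and hands each item to storage.
final class CollectionService {
    CollectionService(SimpleBus bus, String topic, StorageService storage) {
        bus.subscribe(topic, storage::store);
    }
}

final class PubSubDemo {
    public static void main(String[] args) {
        SimpleBus bus = new SimpleBus();
        StorageService storage = new StorageService();
        new CollectionService(bus, "openflow-stats", storage);
        bus.publish("openflow-stats", "flow-stats-sample"); // collected and stored
        bus.publish("other-topic", "ignored");              // no subscriber, dropped
        System.out.println(storage.stored);
    }
}
```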

TSDR Distributed Architecture
- In large data center deployment scenarios, the TSDR Distributed Architecture would be needed to handle performance and scalability.
- In the distributed architecture, TSDR data services are deployed in a separate MD-SAL instance.
- Data pushed onto the MD-SAL messaging bus by ODL southbound plugins is propagated to the other MD-SAL instance, where TSDR data services process it into the TSDR data repository.
- Requires ODL clustering support.

TSDR Data Flow with multiple data models
- The TSDR data flow involves multiple data models: the source data model (OpenFlow statistics), the TSDR data model, and the TSDR plugin (HBase) data model.
- The Data Collection Service subscribes to OpenFlow Statistics data from the MD-SAL Notification Subsystem and passes the data to the Data Storage Service.
- The Data Storage Service converts the OpenFlow Statistics data model to the TSDR data model.
- The HBase TSDR Plugin converts the TSDR data model to the HBase-specific data model based on the HBase TSDR schema design.
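
The source-to-TSDR conversion step can be sketched as a fan-out: one OpenFlow flow-statistics sample becomes one generic TSDR record per metric, all sharing the same record keys. This is a minimal Java sketch under assumptions: FlowStatsSample is a heavily simplified, hypothetical stand-in for the OpenFlow statistics model, TsdrMetricRecord and the FLOW_STATS label are illustrative names, and the later HBase conversion step is not shown.

```java
import java.util.List;

// Hypothetical, simplified source model for one flow's statistics; the real
// OpenFlow statistics model in OpenDaylight has many more fields.
record FlowStatsSample(String nodeId, short tableId, String flowId,
                       long packetCount, long byteCount, long timestampMillis) {}

// Hypothetical generic TSDR record: one metric value plus its identifying keys.
record TsdrMetricRecord(String category, String metricName, List<String> recordKeys,
                        long timestampMillis, long value) {}

final class FlowStatsConverter {
    private FlowStatsConverter() {}

    // One source sample fans out into one TSDR record per metric.
    static List<TsdrMetricRecord> toTsdr(FlowStatsSample s) {
        List<String> keys = List.of(
                "NodeID=" + s.nodeId(),
                "TableID=" + s.tableId(),
                "FlowID=" + s.flowId());
        return List.of(
                new TsdrMetricRecord("FLOW_STATS", "PacketCount", keys, s.timestampMillis(), s.packetCount()),
                new TsdrMetricRecord("FLOW_STATS", "ByteCount", keys, s.timestampMillis(), s.byteCount()));
    }

    public static void main(String[] args) {
        FlowStatsSample sample = new FlowStatsSample(
                "openflow:1", (short) 0, "flow-42", 120, 9600, 1_700_000_000_000L);
        toTsdr(sample).forEach(System.out::println);
    }
}
```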

Unstructured or Semi-Structured data consideration (for a future release)
- For unstructured or semi-structured data such as syslog data, MD-SAL receives the data in a syslog-specific data model.
- Data Filtering and Preprocessing can be added to filter out noise and, optionally, extract structured information from the semi-structured data.
- A third-party TSDR plugin, such as a Splunk plugin, could be added under the TSDR Data Persistence Layer to work with proprietary data stores.
- The Data Aggregation Service is not needed when handling unstructured data.
- Third-party tools such as Splunk could use the Data Query Service to obtain the unstructured data from TSDR and add application-specific processing on top of it.
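
As an illustration of the Data Filtering and Preprocessing idea, the Java sketch below parses the <PRI> prefix that syslog messages begin with, splits it into facility and severity (PRI = facility * 8 + severity, as defined by the syslog RFCs), and drops lines that are malformed or less important than a threshold. The class names and the severity threshold are hypothetical; the proposal does not specify which noise filters TSDR would apply.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical structured view extracted from a semi-structured syslog line.
record SyslogEvent(int facility, int severity, String message) {}

final class SyslogPreprocessor {
    // "<PRI>" prefix followed by the rest of the line.
    private static final Pattern PRI_PREFIX = Pattern.compile("^<(\\d{1,3})>(.*)$");

    // Keep events whose severity number is <= this value (lower number = more severe).
    private final int maxSeverity;

    SyslogPreprocessor(int maxSeverity) {
        this.maxSeverity = maxSeverity;
    }

    /** Returns a structured event, or empty if the line is malformed or filtered out as noise. */
    Optional<SyslogEvent> process(String line) {
        Matcher m = PRI_PREFIX.matcher(line);
        if (!m.matches()) {
            return Optional.empty(); // no syslog priority prefix: treat as noise
        }
        int pri = Integer.parseInt(m.group(1));
        if (pri > 191) {
            return Optional.empty(); // outside the valid PRI range (facility 0..23)
        }
        int facility = pri / 8; // PRI = facility * 8 + severity
        int severity = pri % 8;
        if (severity > maxSeverity) {
            return Optional.empty(); // e.g. drop debug (7) chatter
        }
        return Optional.of(new SyslogEvent(facility, severity, m.group(2).trim()));
    }

    public static void main(String[] args) {
        SyslogPreprocessor p = new SyslogPreprocessor(6); // keep severities 0..6, drop debug
        System.out.println(p.process("<34>Oct 11 22:14:15 switch1 link down on port 3"));
        System.out.println(p.process("<31>Oct 11 22:14:16 switch1 debug trace"));
        System.out.println(p.process("garbage line"));
    }
}
```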

TSDR Data Model
- Goals of the TSDR data model design:
  - Generic
  - Extensible
  - Scalable
  - Performance optimized
- The data model captures:
  - Statistics data
  - Log data
  - Note: to add a new group, extend TSDRBaseRecord.
- DataCategory contains:
  - Flow Stats
  - Interface Stats
  - Queue Stats
  - Flow Group Stats
  - Flow Meter Stats
  - Log Records
  - Note: more categories can be added to this list.
- RecordKeys contains:
  - A list of composite keys
  - Different categories contain different sets of keys
  - Key set validation is needed based on the data category
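
The category-dependent key validation described above can be sketched as follows. This is a minimal Java sketch under stated assumptions: the required key names per category are borrowed from the non-optional parts of the HBase row keys in the raw-data schema, Log Records get no required keys because the proposal gives none, and TsdrBaseRecord and MetricRecord are hypothetical stand-ins for the TSDRBaseRecord idea rather than the actual TSDR classes.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Categories from the proposal; the required key sets are an assumption for illustration.
enum DataCategory {
    FLOW_STATS("NodeID", "TableID"),
    INTERFACE_STATS("NodeID", "TableID"),
    QUEUE_STATS("NodeID", "TableID", "PortID", "QueueID"),
    FLOW_GROUP_STATS("NodeID", "GroupID"),
    FLOW_METER_STATS("NodeID", "GroupID"),
    LOG_RECORDS(); // no key layout is given for logs, so nothing is required here

    final Set<String> requiredKeys;

    DataCategory(String... keys) {
        this.requiredKeys = Set.of(keys);
    }
}

// Hypothetical base record: fields shared by every kind of TSDR data.
abstract class TsdrBaseRecord {
    final DataCategory category;
    final Map<String, String> recordKeys; // composite key: name -> value
    final long timestampMillis;

    TsdrBaseRecord(DataCategory category, Map<String, String> recordKeys, long timestampMillis) {
        // Key set validation based on the data category.
        Set<String> missing = new HashSet<>(category.requiredKeys);
        missing.removeAll(recordKeys.keySet());
        if (!missing.isEmpty()) {
            throw new IllegalArgumentException(category + " record is missing keys " + missing);
        }
        this.category = category;
        this.recordKeys = new LinkedHashMap<>(recordKeys);
        this.timestampMillis = timestampMillis;
    }
}

// A new group of data is added by extending the base record.
final class MetricRecord extends TsdrBaseRecord {
    final String metricName;
    final double value;

    MetricRecord(DataCategory c, Map<String, String> keys, long ts, String metricName, double value) {
        super(c, keys, ts);
        this.metricName = metricName;
        this.value = value;
    }
}

final class DataModelDemo {
    public static void main(String[] args) {
        Map<String, String> keys = new LinkedHashMap<>();
        keys.put("NodeID", "openflow:1");
        keys.put("TableID", "0");
        MetricRecord ok = new MetricRecord(DataCategory.FLOW_STATS, keys, 1L, "PacketCount", 10);
        System.out.println(ok.category + " " + ok.recordKeys);
        try {
            new MetricRecord(DataCategory.QUEUE_STATS, keys, 1L, "TransmittedBytes", 0);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // PortID and QueueID are missing
        }
    }
}
```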

TSDR Persistence APIs

Interface Name | Description/Comments | Extends from ODL Common APIs? | Specific to TSDR Persistence API? | Will be implemented in HBase plugin in Lithium?
save() | Saves one object or a list of objects | Yes | No |
find() | Queries by a list of IDs, with specified criteria and paging support | | |
count() | | | |
delete() | Deletes by one ID or a list of IDs, or deletes the entire table | | |
exists() | Queries by one ID or a list of IDs | | |
min(), max(), avg() | For Data Aggregation | | |
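
The proposal names the persistence operations but not their signatures. The Java sketch below is one hypothetical way to express them as a generic interface, with an in-memory implementation that shows the expected behavior: ID-based lookup with criteria and offset/limit paging, single and bulk delete, and min/max/avg over a numeric field. All names and signatures are assumptions for illustration.

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.OptionalDouble;
import java.util.function.Predicate;
import java.util.function.ToDoubleFunction;
import java.util.stream.Collectors;

// Hypothetical signatures: the proposal names these operations but does not define them.
interface TsdrPersistence<T> {
    void save(String id, T record);              // save one object
    void saveAll(Map<String, T> records);        // save a list of objects
    List<T> find(Collection<String> ids, Predicate<T> criteria, int offset, int limit);
    long count();
    void delete(Collection<String> ids);         // delete one or many IDs
    void deleteAll();                            // delete the entire table
    boolean exists(String id);
    OptionalDouble min(ToDoubleFunction<T> field);
    OptionalDouble max(ToDoubleFunction<T> field);
    OptionalDouble avg(ToDoubleFunction<T> field);
}

// In-memory "plugin" that only demonstrates the contract; a real plugin targets a data store.
final class InMemoryPersistence<T> implements TsdrPersistence<T> {
    private final Map<String, T> rows = new LinkedHashMap<>();

    @Override public void save(String id, T record) { rows.put(id, record); }
    @Override public void saveAll(Map<String, T> records) { rows.putAll(records); }

    @Override
    public List<T> find(Collection<String> ids, Predicate<T> criteria, int offset, int limit) {
        return ids.stream()
                .map(rows::get)                             // look up each requested ID
                .filter(r -> r != null && criteria.test(r)) // skip unknown IDs, apply criteria
                .skip(offset)                               // paging: skip earlier results
                .limit(limit)                               // paging: page size
                .collect(Collectors.toList());
    }

    @Override public long count() { return rows.size(); }
    @Override public void delete(Collection<String> ids) { rows.keySet().removeAll(ids); }
    @Override public void deleteAll() { rows.clear(); }
    @Override public boolean exists(String id) { return rows.containsKey(id); }

    @Override public OptionalDouble min(ToDoubleFunction<T> f) { return rows.values().stream().mapToDouble(f).min(); }
    @Override public OptionalDouble max(ToDoubleFunction<T> f) { return rows.values().stream().mapToDouble(f).max(); }
    @Override public OptionalDouble avg(ToDoubleFunction<T> f) { return rows.values().stream().mapToDouble(f).average(); }
}

final class PersistenceDemo {
    public static void main(String[] args) {
        InMemoryPersistence<Double> store = new InMemoryPersistence<>();
        store.save("r1", 10.0);
        store.saveAll(Map.of("r2", 30.0, "r3", 20.0));
        System.out.println(store.count());                                             // 3
        System.out.println(store.avg(Double::doubleValue));                            // OptionalDouble[20.0]
        System.out.println(store.find(List.of("r1", "r2", "r3"), v -> v > 15, 0, 10)); // [30.0, 20.0]
        store.delete(List.of("r1"));
        System.out.println(store.exists("r1"));                                        // false
    }
}
```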

HBase TSDR Schema – Raw Data

TableName | RowKey | Column Family: Column Qualifier = Cell Value
FlowMetrics | MetricID_NodeID_TableID(_FlowID)_timestamp | ‘raw’ = metric_value
InterfaceMetrics | MetricID_NodeID_TableID(_PortID)_timestamp |
QueueMetrics | MetricID_NodeID_TableID_PortID_QueueID_timestamp |
GroupMetrics | MetricID_NodeID_GroupID(_GroupBucketID)_timestamp |
MeterMetrics | MetricID_NodeID_GroupID(_MeterID)_timestamp |

Schema design considerations:
- General HBase schema design rules applied:
  - Keep the RowKey, Column Family key, and Column Qualifier as short as possible.
  - Design the RowKey so that rows are evenly distributed across multiple data nodes.
  - Keep the number of column families low.
- Other performance considerations:
  - Multiple tables are created, based on the data categories in the TSDR data model.
  - Data storage and query operations run much faster on smaller data sets stored in HBase tables with structured keys.
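
A row key in the layout above can be assembled by joining the segments with underscores and skipping optional segments that are absent. The Java sketch below does this for the FlowMetrics layout. One assumption goes beyond the proposal: the timestamp is zero-padded to a fixed 13 digits, because HBase sorts row keys lexicographically and unpadded numbers would not sort in time order. TsdrRowKey and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class TsdrRowKey {
    private static final String SEP = "_";

    private TsdrRowKey() {}

    /**
     * Builds MetricID_<mandatory segments>(_<optional segments>)_timestamp.
     * Null optional segments are skipped, mirroring the "(...)" parts of the schema.
     */
    static String build(String metricId, List<String> mandatory, List<String> optional, long timestampMillis) {
        List<String> parts = new ArrayList<>();
        parts.add(metricId);
        parts.addAll(mandatory);
        for (String segment : optional) {
            if (segment != null) {
                parts.add(segment);
            }
        }
        // Assumption: zero-pad so lexicographic key order matches numeric time order.
        parts.add(String.format("%013d", timestampMillis));
        return String.join(SEP, parts);
    }

    // FlowMetrics layout: MetricID_NodeID_TableID(_FlowID)_timestamp; flowId may be null.
    static String flowMetrics(String metricId, String nodeId, String tableId, String flowId, long ts) {
        return build(metricId, List.of(nodeId, tableId), Collections.singletonList(flowId), ts);
    }

    public static void main(String[] args) {
        System.out.println(flowMetrics("PacketCount", "openflow:1", "0", "flow42", 1_700_000_000_000L));
        System.out.println(flowMetrics("PacketCount", "openflow:1", "0", null, 42L));
    }
}
```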

HBase TSDR Schema – Aggregated Data

TableName | RowKey | Column Family: Column Qualifier = Cell Value
HourlyFlowMetrics | MetricID_NodeID_TableID(_FlowID)_timestamp | ‘min’ = metric_value, ‘max’ = metric_value, ‘avg’ = metric_value
HourlyInterfaceMetrics | MetricID_NodeID_TableID(_InterfaceID)_timestamp |
HourlyQueueMetrics | MetricID_NodeID_TableID_PortID_QueueID_timestamp |
HourlyGroupMetrics | MetricID_NodeID_GroupID(_GroupBucketID)_timestamp |
HourlyMeterMetrics | MetricID_NodeID_GroupID(_MeterID)_timestamp |

For performance, we design multiple aggregation tables with different granularities. Aggregation tables of other granularities will have a schema similar to the one shown above.
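
The min/max/avg columns of an hourly table can be computed by bucketing raw samples by hour and summarizing each bucket. The Java sketch below does that in memory with java.util.DoubleSummaryStatistics; it assumes each hourly row is stamped with the start of its hour, which the proposal does not state, and RawPoint, HourlyAggregate, and HourlyRollup are hypothetical names.

```java
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

record RawPoint(long timestampMillis, double value) {}

record HourlyAggregate(long hourStartMillis, double min, double max, double avg) {}

final class HourlyRollup {
    static final long HOUR_MILLIS = 3_600_000L;

    private HourlyRollup() {}

    static List<HourlyAggregate> rollUp(List<RawPoint> points) {
        // TreeMap keeps the buckets ordered by hour start.
        Map<Long, DoubleSummaryStatistics> buckets = new TreeMap<>();
        for (RawPoint p : points) {
            // Assumption: each bucket is keyed by the start of its hour.
            long hourStart = Math.floorDiv(p.timestampMillis(), HOUR_MILLIS) * HOUR_MILLIS;
            buckets.computeIfAbsent(hourStart, h -> new DoubleSummaryStatistics()).accept(p.value());
        }
        return buckets.entrySet().stream()
                .map(e -> new HourlyAggregate(e.getKey(),
                        e.getValue().getMin(), e.getValue().getMax(), e.getValue().getAverage()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<RawPoint> raw = List.of(
                new RawPoint(0L, 4), new RawPoint(1_000L, 8), new RawPoint(3_600_000L, 5));
        rollUp(raw).forEach(System.out::println);
    }
}
```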

HBase TSDR Data Model
- The TSDR HBase Plugin converts the generic TSDR data model into an HBase-specific data model based on the HBase schema design.
- The TSDR HBase Plugin uses this HBase-specific data model to implement the generic TSDR Persistence APIs, including storage, query, purging, and aggregation, completing the TSDR data services in HBase.
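
To see why the HBase-specific row-key design matters for the query side of the persistence APIs, the Java sketch below uses a sorted in-memory map as a stand-in for an HBase table. Because HBase keeps rows sorted by row key, a query such as "all PacketCount samples for one node and table in a time window" becomes a single start/stop range scan. SortedRowStore and its key format are hypothetical, and this is not the HBase client API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// A sorted map stands in for an HBase table: rows are ordered by row key,
// so contiguous key ranges can be scanned cheaply.
final class SortedRowStore {
    private final NavigableMap<String, Double> rows = new TreeMap<>();

    void put(String rowKey, double rawValue) {
        rows.put(rowKey, rawValue);
    }

    // Scan rows with startKey <= key < stopKey, like a start/stop-row scan.
    List<Double> scan(String startKey, String stopKey) {
        return new ArrayList<>(rows.subMap(startKey, true, stopKey, false).values());
    }

    // Hypothetical key format: MetricID_NodeID_TableID_timestamp, timestamp zero-padded.
    static String key(String metric, String node, String table, long ts) {
        return String.join("_", metric, node, table, String.format("%013d", ts));
    }

    public static void main(String[] args) {
        SortedRowStore store = new SortedRowStore();
        store.put(key("PacketCount", "openflow:1", "0", 1_000L), 10);
        store.put(key("PacketCount", "openflow:1", "0", 2_000L), 12);
        store.put(key("PacketCount", "openflow:1", "0", 9_000L), 30);
        store.put(key("PacketCount", "openflow:2", "0", 1_500L), 99);
        // All PacketCount samples for node openflow:1, table 0, with 0 <= ts < 5000.
        List<Double> hits = store.scan(key("PacketCount", "openflow:1", "0", 0L),
                                       key("PacketCount", "openflow:1", "0", 5_000L));
        System.out.println(hits); // [10.0, 12.0]
    }
}
```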

TSDR Scope in Lithium
In the Lithium release, we will focus on the following deliverables:
- Architectural framework
  - Data type support (as specified in the architectural design): OpenFlow Statistics
  - Deployment scenario support: TSDR Integrated Architecture; HBase on Hadoop, single-node deployment scenario
  - Data collection mechanism: implement the Pub/Sub collection mechanism
  - Data Persistence Layer: complete the TSDR Persistence APIs with interface definitions
- Functionality implementation
  - Data Collection
  - Data Storage
- TSDR Plugin
  - HBase plugin as an example implementation
  - Focus on the storage API implementation in the HBase plugin to support the Data Storage Service in Lithium
- Data Model implementation
  - TSDR Data Model to support OpenFlow Statistics
  - HBase Data Model for the HBase Plugin implementation