Blaze - An IoT Analytics Engine

Slides:



Advertisements
Similar presentations
EHarmony in Cloud Subtitle Brian Ko. eHarmony Online subscription-based matchmaking service Available in United States, Canada, Australia and United Kingdom.
Advertisements

ASCR Data Science Centers Infrastructure Demonstration S. Canon, N. Desai, M. Ernst, K. Kleese-Van Dam, G. Shipman, B. Tierney.
Running Hadoop-as-a-Service in the Cloud
Hive: A data warehouse on Hadoop
Multiple Tiers in Action
Raghav Ayyamani. Copyright Ellis Horowitz, Why Another Data Warehousing System? Problem : Data, data and more data Several TBs of data everyday.
SERVER Betül ŞAHİN What is this? Betül ŞAHİN
Hive: A data warehouse on Hadoop Based on Facebook Team’s paperon Facebook Team’s paper 8/18/20151.
Committed to Deliver….  We are Leaders in Hadoop Ecosystem.  We support, maintain, monitor and provide services over Hadoop whether you run apache Hadoop,
Presented by CH.Anusha.  Apache Hadoop framework  HDFS and MapReduce  Hadoop distributed file system  JobTracker and TaskTracker  Apache Hadoop NextGen.
Introduction to Hadoop and HDFS
Distributed Systems Fall 2014 Zubair Amjad. Outline Motivation What is Sqoop? How Sqoop works? Sqoop Architecture Import Export Sqoop Connectors Sqoop.
Hive Facebook 2009.
CSE 548 Advanced Computer Network Security Document Search in MobiCloud using Hadoop Framework Sayan Cole Jaya Chakladar Group No: 1.
An Introduction to HDInsight June 27 th,
Performance Evaluation of Image Conversion Module Based on MapReduce for Transcoding and Transmoding in SMCCSE Speaker : 吳靖緯 MA0G IEEE.
Large scale IP filtering using Apache Pig and case study Kaushik Chandrasekaran Nabeel Akheel.
ABSTRACT The JDBC (Java Database Connectivity) API is the industry standard for database- independent connectivity between the Java programming language.
 Frequent Word Combinations Mining and Indexing on HBase Hemanth Gokavarapu Santhosh Kumar Saminathan.
Big traffic data processing framework for intelligent monitoring and recording systems 學生 : 賴弘偉 教授 : 許毅然 作者 : Yingjie Xia a, JinlongChen a,b,n, XindaiLu.
Impala. Impala: Goals General-purpose SQL query engine for Hadoop High performance – C++ implementation – runtime code generation (using LLVM) – direct.
Google Map Engine Can export images to Map Engine from Earth Engine
CISC 849 : Applications in Fintech Namami Shukla Dept of Computer & Information Sciences University of Delaware iCARE : A Framework for Big Data Based.
GROUP PresentsPresents. WEB CRAWLER A visualization of links in the World Wide Web Software Engineering C Semester Two Massey University - Palmerston.
Beyond Hadoop The leading open source system for processing big data continues to evolve, but new approaches with added features are on the rise. Ibrahim.
Copyright © 2016 Pearson Education, Inc. Modern Database Management 12 th Edition Jeff Hoffer, Ramesh Venkataraman, Heikki Topi CHAPTER 11: BIG DATA AND.
Apache Hadoop on Windows Azure Avkash Chauhan
Leverage Big Data With Hadoop Analytics Presentation by Ravi Namboori Visit
Big Data-An Analysis. Big Data: A definition Big data is a collection of data sets so large and complex that it becomes difficult.
The Holmes Platform and Applications
CNIT131 Internet Basics & Beginning HTML
Connected Infrastructure
Web Technologies Computing Science Thompson Rivers University
Big Data Enterprise Patterns
Data Platform and Analytics Foundational Training
Connected Maintenance Solution
Big Data A Quick Review on Analytical Tools
Integrating QlikView with MPP data sources
INTRODUCTION TO PIG, HIVE, HBASE and ZOOKEEPER
Chapter 10 Data Analytics for IoT
Spark Presentation.
Hadoopla: Microsoft and the Hadoop Ecosystem
Connected Maintenance Solution
Connected Infrastructure
Enabling Scalable and HA Ingestion and Real-Time Big Data Insights for the Enterprise OCJUG, 2014.
SQOOP.
Central Florida Business Intelligence User Group
Add intelligence to Dynamics AX with Cortana Intelligence suite
07 | Analyzing Big Data with Excel
Ministry of Higher Education
Gen-Tao Chiang Data and Analytic Engineer
Introduction to PIG, HIVE, HBASE & ZOOKEEPER
Web Browser server client 3-Tier Architecture Apache web server PHP
Tools for Processing Big Data Jinan Al Aridhee and Christian Bach
Data science and machine learning at scale, powered by Jupyter
Introduction to Apache
Overview of big data tools
Setup Sqoop.
Data Warehousing in the age of Big Data (1)
Charles Tappert Seidenberg School of CSIS, Pace University
Web Technologies Computing Science Thompson Rivers University
Storing and Processing Sensor Networks Data in Public Clouds
IBM C IBM Big Data Engineer. You want to train yourself to do better in exam or you want to test your preparation in either situation Dumpspedia’s.
Big-Data Analytics with Azure HDInsight
TN19-TCI: Integration and API management using TIBCO Cloud™ Integration
Introduction to Azure Data Lake
SQL Server 2019 Bringing Apache Spark to SQL Server
Pig Hive HBase Zookeeper
Big Data.
Presentation transcript:

Blaze - An IoT Analytics Engine Project Slides Sri Ranga Teja Kolli (31152056) Prashanth Reddy Billa (60804572) Sudeep Meduri (51884271)

Motivation and Goals Extremely rapid expansion of Internet of Things Major service providers in the industry have or are developing custom platforms to cater to IoT We wanted analytics framework to which any IoT system can be added to. This would enable multi variant queries or analytics to be done. Goal of the project is to build an end-to-end system, and illustrate using sample use cases. Image credit: http://www.i-scoop.eu/internet-of-things/

Related Work MapReduce Processing on IoT Clouds, Ichiro Satoh Published in: 2013 IEEE 5th International Conference on Cloud Computing Technology and Science (CloudCom) MapReduce based Data Processing on IoT, Ichiro Satoh 2014 IEEE International Conference on Internet of Things (iThings) IEEE Green Computing and Communications (GreenCom) IEEE Cyber, Physical and Social Computing(CPSCom) A Data Processing Framework for Distributed Embedded Systems, Ichiro Satoh 9th International Symposium on Intelligent Distributed Computing – IDC'2015, Guimarães, Portugal, October 2015

System Flow Diagram MQTT Broker User Interface System Flow Diagram Visual Analytics and Text Results. JSON data tree view to analyze sensor information Query Interface Hive SQL MQTT Broker Generate Intermediate Files such as CSV and Store user query, results and other information Hive JDBC Driver JSON data HiveServer2 - Hive Thrift Server, Hive Driver, Metastore, Apache Derby DB Query Result MapReduce v2 JSON data Yarn Cluster Resource Management Sensor devices Hadoop Distributed File System

System Design Specifics Prototype of complete system which we have developed Sensor Data Publishing - Sensor data published to the MQTT broker Sensor Data Ingestion - The data is pulled into our Hadoop Distributed File System Hadoop and HiveServer2 - For processing the data and getting relevant information out of the huge data sets Web UI and Analytics (JSP, HTML5, CSS3, Javascript, MySQL) Login and Registration Basic Mode - We can select a specific sensor and select a specific action to be executed Advanced Mode - We can upload our own sensor data file (JSON) and perform custom Hive Query Executed Tasks shown - System keeps tracks of user’s queries and all the tasks executed along with results Hadoop and Hive Status - Status indicating whether the environment is ready or is it yet to be started

Testing of our system Verify that the IoT endpoint is connected and stays live. Verify all events being generated by IoT endpoint is being received downstream. Verify that sensor data for each sensor is created properly into a separate file and pushed into HDFS. Verify if continuous ingestion of data into the JSON file works simultaneously with user queries in the User Interface and does not lead to any locking issues. Verify that during queries, a hive table with sensor schema and current data in sensor data file is created. Verify that the maps and graphs rendered and consistent with the data set and also scaling is done properly.

Evaluation Plan Comparison of response time: With the same sensor dataset pushed into relational database such as MySQL Average Execution Time: (Tmap/M) + (Treduce/N) Maximum Execution Time: Max(Tmap(i)) + Max(Treduce(j)) Minimum Execution Time: Min(Tmap(i)) + Min(Treduce(j)) Make Span: FinishTime(ReduceTask) = Time span from start to finish Delay Time: Start Time(Map Task) + Start Time(Reduce Task) -­ Finish Time(Map task) Network Cost: Delay Time * Network Cost per Unit Time

Conclusion and Future Scope of our project Successfully demonstrated an end-to-end system to which any IoT environment could be connected, and different analytics could be run on the data generated by each. Incorporate additional complex learning algorithms in the analytics part. Thank You! :)