Metis Data Science Meetup:

Metis Data Science Meetup: Analyzing IBM Watson IoT Events Using Spark & Jupyter Notebooks
Saswata Sengupta
Nov 30, 2016

Agenda
+ Overview
  - Introduction to IBM Watson IoT Platform
  - Introduction to Node-RED
+ IBM and Spark
  - Spark as a Service
  - IBM Data Science Experience
  - Python, Pandas and DataFrame
  - Jupyter notebook and visualization
+ Recorded Demo
+ Walkthrough

IBM Watson IoT Platform and Node-Red

What is IBM Watson IoT Platform?
The IBM Internet of Things service lets your apps communicate with and consume data collected by your connected devices, sensors, and gateways. IBM recipes make it easy to get devices connected to the IBM Internet of Things cloud. Apps can then use the real-time and REST APIs to communicate with your devices and consume the data you've set them up to collect. This is available as part of the IBM Bluemix Cloud offering.
IBM | Spark

Connect your devices securely to the cloud
Before your apps can get to work, you need to get your devices connected. IBM has a set of verified instructions, or 'recipes', for connecting devices, sensors and gateways from a variety of partners and individuals.

Build an app that talks to your devices
Communication between your devices and the cloud happens via the open, lightweight MQTT protocol. For example, you might have a sensor that collects and sends humidity readings every minute. IBM REST and real-time APIs allow you to quickly pull that device data into your apps for further analysis.
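As a rough illustration of the humidity example, the sketch below builds the MQTT client ID, topic, and JSON payload a device might use when publishing one reading. It assumes the Watson IoT Platform conventions of a `d:<org>:<type>:<id>` device client ID, an `iot-2/evt/<event>/fmt/<format>` event topic, and a payload wrapped in a `d` object; these are recalled from the platform documentation and should be checked against the current docs. The organization, device type, and device names are hypothetical, and no MQTT connection is made.

```python
import json


def device_client_id(org_id, device_type, device_id):
    # Assumed Watson IoT Platform device client ID convention.
    return f"d:{org_id}:{device_type}:{device_id}"


def event_topic(event_id, fmt="json"):
    # Assumed topic on which a device publishes an event.
    return f"iot-2/evt/{event_id}/fmt/{fmt}"


def humidity_payload(humidity_percent):
    # Assumed payload convention: readings nested under a top-level "d" key.
    return json.dumps({"d": {"humidity": humidity_percent}})


# Hypothetical identifiers, for illustration only.
client_id = device_client_id("myorg", "humidity-sensor", "sensor-1")
topic = event_topic("humidity")
payload = humidity_payload(41.5)

print(client_id)
print(topic)
print(payload)
```

An MQTT client library would then publish `payload` to `topic` once a minute; the connection and authentication details are left out here because they depend on the account setup.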

What is Node-RED?
Node-RED provides a browser-based flow editor that makes it easy to wire together flows using the wide range of nodes in the palette. Flows can then be deployed to the runtime in a single click. JavaScript functions can be created within the editor using a rich text editor. A built-in library allows you to save useful functions, templates or flows for re-use. The Node-RED Starter pack is available as part of the IBM Bluemix Cloud offering.

IBM and Spark

Spark Ecosystem
[Diagram: Business Applications and Business Intelligence sit on top of Apache Spark (Spark SQL, Streaming, MLlib (machine learning), GraphX), which draws on Hadoop, Database, Mainframe, and Data warehouse sources.]

Spark is complementary to Hadoop, but much faster, with in-memory performance

IBM's 3 pillars for Apache Spark
1. Spark as a component of BigInsights
2. Spark-as-a-Service on Bluemix
3. IBM products on Spark: SPSS, Commerce, Security, Power, Healthcare, and many more

Why does Spark matter to a business?
Data Science, Design, Development
1. Spark makes it easier to access and work with all data
2. Spark lets you develop line-of-business applications faster
3. Spark learns from data and delivers in real time

- Enables new data-based use cases
- All data: Internal/External, Structured/Unstructured
- Real-time insights, from all data sources
- Automates analytics with machine learning
- Clients that lead in data, lead in their industry

IBM has the largest investment in Spark of any company in the world
- IBM Spark Technology Center
- Top committer/contributor
- 300+ inventors
- Commitment to educate 1 million data scientists
- Contributed SystemML
- Founding member of AMPLab
- Partnerships in the ecosystem

IBM Data Science Experience: Python, Pandas and Jupyter Notebooks

IBM Data Science Experience
Now you can create value faster using the best of open source and IBM together. Built for data scientists by data scientists, the IBM Data Science Experience is a new cloud-based, social workspace that helps data professionals consolidate, create, and collaborate across multiple open source tools such as R and Python, Pandas and DataFrame. Data Science Experience uses Apache Spark as a service for distributed analytic workload management.

Python, Pandas and DataFrame
Python is a widely used high-level, general-purpose, interpreted, dynamic programming language. Its design philosophy emphasizes code readability, and its syntax allows programmers to express concepts in fewer lines of code than is possible in languages such as C++ or Java. Data engineers and data scientists have been using Python with Pandas, NumPy and SciPy to build analytical models. It is a viable alternative to the R programming language. Pandas is a Python library for data manipulation and analysis. A DataFrame is a 2-dimensional labeled data structure with columns of potentially different types. It is similar to a spreadsheet or SQL table, or a dict of Series objects.
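To make the "dict of Series" description concrete, here is a minimal sketch that builds a small DataFrame whose columns hold different types and then filters and aggregates it. The column names and values are made up for illustration.

```python
import pandas as pd

# Each column is a Series with its own type: text, float, and boolean.
readings = pd.DataFrame({
    "device": pd.Series(["sensor-1", "sensor-2", "sensor-3"]),
    "humidity": pd.Series([41.5, 55.0, 47.5]),
    "online": pd.Series([True, False, True]),
})

# Column types differ, much like columns in a SQL table.
print(readings.dtypes)

# Select rows with a boolean column, then average one numeric column.
online_mean = readings.loc[readings["online"], "humidity"].mean()
print(online_mean)
```

The same select-then-aggregate pattern carries over to larger data sets loaded from files or databases.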

What Is A Jupyter Notebook?
In this case, "notebook" or "notebook documents" denote documents that contain both code and rich text elements, such as figures, links, equations, ... Because of the mix of code and text elements, these documents are the ideal place to bring together an analysis description and its results, and they can be executed to perform the data analysis in real time.

IBM Watson IoT Event Processing and Analysis Using Spark and Notebook

Architecture Components
IBM's Watson IoT Platform provides an integrated cloud-and-edge analytics programming model that allows control and optimization over the data flowing between edge devices and the cloud.

Apache Spark, through Spark Streaming (an extension of the core Spark API), enables scalable, high-throughput, fault-tolerant processing of live data streams.

The Jupyter Notebook is a web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text.

Node-RED is a visual tool for wiring the Internet of Things. It will be used to connect to the IoT Platform and store the data in Cloudant.

Cloudant is a non-relational, distributed database service based on the Apache-backed CouchDB project and the open source BigCouch project. Cloudant will be used as the datastore.

In this demo we will learn ways to explore historical events captured on the IBM Watson IoT Platform.
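The kind of exploration a notebook might run on stored events can be sketched as follows. This is not the demo notebook: it uses pandas rather than Spark so that it runs without a cluster, and it reads from an inline JSON string instead of Cloudant. The document shape and field names (`deviceId`, `timestamp`, `humidity`) are assumptions chosen for illustration.

```python
import json

import pandas as pd

# Hypothetical event documents, shaped like what a Node-RED flow might
# store in Cloudant. Field names are assumptions, not the demo's schema.
raw_docs = """
[
  {"deviceId": "sensor-1", "timestamp": "2016-11-30T10:00:00Z", "humidity": 41.0},
  {"deviceId": "sensor-1", "timestamp": "2016-11-30T10:01:00Z", "humidity": 43.0},
  {"deviceId": "sensor-2", "timestamp": "2016-11-30T10:00:00Z", "humidity": 55.0},
  {"deviceId": "sensor-2", "timestamp": "2016-11-30T10:01:00Z", "humidity": 57.0}
]
"""

# Load the JSON documents into a DataFrame, one row per event.
events = pd.DataFrame(json.loads(raw_docs))
events["timestamp"] = pd.to_datetime(events["timestamp"])

# Summarize the historical readings per device.
summary = events.groupby("deviceId")["humidity"].agg(["count", "mean", "max"])
print(summary)
```

In the architecture described above, the inline string would be replaced by documents read from the Cloudant database, and a Spark DataFrame could take the place of the pandas one for larger volumes.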

Demo steps available here:
Shared Notebook
You will need to create a datascience.ibm.com account and a Bluemix account (it is free).

Thank You
Please feel free to contact me at Saswata.sengupta@ibm.com
