Machine Learning Library for Apache Ignite


1 Machine Learning Library for Apache Ignite
Ignite-ML: Machine Learning Library for Apache Ignite
Corey Pentasuglia
Master's Project, 5/11/2016
Examiners: Dr. Scott Spetka, Dr. Bruno Andriamanalimanana, Dr. Roger Cavallo

2 Master's Project Objectives
Research DML (Distributed Machine Learning): preparation for doctoral studies; compare current DML frameworks
Develop a library built over Apache Ignite for machine learning: currently, nothing is available to perform machine learning in the Ignite framework, while Spark (a sister project of Ignite) provides MLlib
Compare Ignite and Spark: Apache Ignite might be a better framework for DML (especially in a practical sense)
Develop Ignite-ML as a practical library for attempting ML on transactional data as well as analytical processing (many others focus only on analytics)
Develop the idea of TDML (Transactional DML)
Comparison paper: summarize findings in a short paper

3 What is Apache Ignite?
In-Memory Data Fabric
An open source Apache Incubator project
Started and still mostly maintained by a company named GridGain
Ignite contains several key components for high-performance computing within a distributed architecture

4 Compute Grid
Designed for high performance, low latency, and scalability
Availability: jobs will execute as long as there is at least one node
Failover: a load balancer is included to orchestrate jobs that have failed

5 Compute Grid (Key Benefits)
Fault Tolerance: if a node fails, its jobs are automatically transferred to another node (if available)
Load Balancing: automatic load balancing distributes work efficiently among the available nodes
Job Scheduling: priority can be set for tasks that run on the grid; by default, tasks are worked off in random order
Direct MapReduce API
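The fault-tolerance and MapReduce points above can be illustrated with a small plain-Java simulation. This is only a sketch under stated assumptions: it does not use the Ignite API, and `SketchNode`, `FailoverMapReduceSketch`, `runWithFailover`, and `mapReduce` are hypothetical names invented for illustration. Jobs are assigned round-robin, and a job whose node is down is retried on the next node.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

/** Hypothetical stand-in for a grid node; not an Ignite class. */
final class SketchNode {
    final String name;
    final boolean healthy;

    SketchNode(String name, boolean healthy) {
        this.name = name;
        this.healthy = healthy;
    }

    /** Runs one job, or throws if this simulated node is down. */
    int run(Function<String, Integer> job, String input) {
        if (!healthy) {
            throw new IllegalStateException("node down: " + name);
        }
        return job.apply(input);
    }
}

final class FailoverMapReduceSketch {
    /** Tries the preferred node first, then fails over to each remaining node in turn. */
    static int runWithFailover(List<SketchNode> nodes, int preferred,
                               Function<String, Integer> job, String input) {
        for (int attempt = 0; attempt < nodes.size(); attempt++) {
            SketchNode node = nodes.get((preferred + attempt) % nodes.size());
            try {
                return node.run(job, input);
            } catch (IllegalStateException e) {
                // Simulated node failure: fall through and try the next node.
            }
        }
        throw new IllegalStateException("no healthy node available");
    }

    /** Map step: one job per input, assigned round-robin. Reduce step: sum the results. */
    static int mapReduce(List<SketchNode> nodes, List<String> inputs,
                         Function<String, Integer> mapper) {
        int total = 0;
        for (int i = 0; i < inputs.size(); i++) {
            total += runWithFailover(nodes, i, mapper, inputs.get(i));
        }
        return total;
    }

    public static void main(String[] args) {
        List<SketchNode> nodes = Arrays.asList(
                new SketchNode("a", true), new SketchNode("b", false), new SketchNode("c", true));
        // Counts characters across all words even though node "b" is down.
        System.out.println(mapReduce(nodes, Arrays.asList("machine", "learning", "grid"), String::length));
    }
}
```

As long as at least one node is healthy, every job completes, which mirrors the availability claim on the previous slide.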

6 Ignite vs. Spark

7 OLTP vs. OLAP

8 Ignite vs. Spark (Cont.)
Apache Ignite (Hybrid OLTP & OLAP)
In-Memory: treats memory as primary storage; better indexing; avoids (de)serialization; reduced latency
RDD (Resilient Distributed Datasets)
Real streaming (no delays)
Utilizes off-heap memory: avoids garbage collection pauses
In-Memory SQL indexes: avoids full scans of datasets
MapReduce: fully compatible with Hadoop MR APIs; support for legacy MR code
Spark (OLAP)
In-Memory: used only for processing
RDD (Resilient Distributed Datasets): created beforehand; is immutable
Language support: Scala, Java, Python, and R (Ignite: Scala and Java)

9 Grid Configuration
The lab machines selected can be seen below:
More machines could easily be added; however, I have been utilizing these four lab machines

10 Project Background
Initially just developed some code to perform KNN in Apache Ignite
JavaML & Apache Ignite: utilized JavaML to perform KNN on the compute grid; determined that an extensible library would be more useful to others
Switched to Weka and Apache Ignite: Weka is a more widely adopted ML library
Developed the start of an extensible architecture: allows others to plug in additional ML algorithms; attempts to auto-scale based upon cluster size
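One way KNN can be spread over a compute grid is sketched below in plain Java. It does not use JavaML, Weka, or the Ignite API, and it is not the Ignite-ML implementation; `KnnSketch`, `Sample`, `Neighbor`, `localNearest`, and `classify` are hypothetical names. The assumption is that each node holds one partition of the training data, returns its own k nearest neighbors (the map step), and a coordinator merges those candidates and takes a majority vote (the reduce step).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class KnnSketch {
    /** A labeled training point. */
    static final class Sample {
        final double[] features;
        final String label;

        Sample(double[] features, String label) {
            this.features = features;
            this.label = label;
        }
    }

    /** A candidate neighbor: its distance to the query and its label. */
    static final class Neighbor {
        final double distance;
        final String label;

        Neighbor(double distance, String label) {
            this.distance = distance;
            this.label = label;
        }
    }

    /** Euclidean distance between two vectors of equal length. */
    static double distance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Map step: the k nearest neighbors found within a single partition. */
    static List<Neighbor> localNearest(List<Sample> partition, double[] query, int k) {
        List<Neighbor> all = new ArrayList<>();
        for (Sample s : partition) {
            all.add(new Neighbor(distance(s.features, query), s.label));
        }
        all.sort(Comparator.comparingDouble(n -> n.distance));
        return new ArrayList<>(all.subList(0, Math.min(k, all.size())));
    }

    /** Reduce step: merge per-partition candidates, keep the global k nearest, majority vote. */
    static String classify(List<List<Sample>> partitions, double[] query, int k) {
        List<Neighbor> merged = new ArrayList<>();
        for (List<Sample> partition : partitions) {
            merged.addAll(localNearest(partition, query, k));
        }
        merged.sort(Comparator.comparingDouble(n -> n.distance));
        Map<String, Integer> votes = new HashMap<>();
        String best = null;
        int bestVotes = 0;
        for (Neighbor n : merged.subList(0, Math.min(k, merged.size()))) {
            int count = votes.merge(n.label, 1, Integer::sum);
            if (count > bestVotes) { // ties go to the label that reached the count first
                bestVotes = count;
                best = n.label;
            }
        }
        return best;
    }
}
```

Because the global k nearest neighbors are always contained in the union of the per-partition k nearest, each node only needs to send k candidates back rather than its whole partition.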

11 Ignite-ML
Ignite-ML is my own project built on top of the Apache Ignite In-Memory Data Fabric ( grid/tree/master/ignite-ml)
The library consists of API and Executor submodules
The idea is that the library provides an extensible entry point for plugging machine learning algorithms into Ignite
The API contains custom-defined exceptions, request objects, response objects, and handlers
Many other distributed ML frameworks focus on data analysis after data has been stored
With Ignite being a hybrid OLTP/OLAP platform, I’d like to focus on performing ML algorithms with transactional data

12 Ignite-ML Use Case
[Diagram labels: In, Ignite-ML, OLTP, Feedback, OLAP, HDFS, Other Storage]
Possible feedback from unsupervised learning
Normalize and classify incoming data using supervised learning and give feedback
Normalize and classify incoming signals
Suppose we currently know of two classes, say class 1 and class 2, but we receive data we cannot classify with confidence. We can start to perform unsupervised learning, which may eventually lead to a new class
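The two-class scenario above can be sketched as follows. This is an illustrative assumption, not the Ignite-ML design: a nearest-centroid classifier stands in for the supervised model, and the confidence measure (one minus the ratio of the nearest to the second-nearest centroid distance) and the names `ConfidenceRouterSketch`, `classify`, and `pending` are all hypothetical. Samples below the confidence threshold are set aside for later unsupervised learning instead of being forced into a known class.

```java
import java.util.ArrayList;
import java.util.List;

final class ConfidenceRouterSketch {
    private final double[][] centroids;       // one centroid per known class
    private final double threshold;           // minimum confidence to accept a label
    private final List<double[]> unclassified = new ArrayList<>();

    ConfidenceRouterSketch(double[][] centroids, double threshold) {
        this.centroids = centroids;
        this.threshold = threshold;
    }

    private static double distance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Returns the index of the predicted class, or -1 if the sample was set aside. */
    int classify(double[] x) {
        int best = -1;
        double bestDist = Double.MAX_VALUE;
        double secondDist = Double.MAX_VALUE;
        for (int c = 0; c < centroids.length; c++) {
            double d = distance(centroids[c], x);
            if (d < bestDist) {
                secondDist = bestDist;
                bestDist = d;
                best = c;
            } else if (d < secondDist) {
                secondDist = d;
            }
        }
        // Assumed confidence measure: 1 when x sits on the nearest centroid,
        // 0 when x is equidistant from the two nearest centroids.
        double confidence = secondDist == 0 ? 0 : 1.0 - bestDist / secondDist;
        if (confidence < threshold) {
            unclassified.add(x); // candidate input for later unsupervised learning
            return -1;
        }
        return best;
    }

    /** Samples that could not be classified with confidence. */
    List<double[]> pending() {
        return unclassified;
    }
}
```

Once enough samples accumulate in `pending()`, a clustering pass over them could suggest whether a new class is emerging.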

13 MLLib K-means Clustering

14 Ignite-ML Knn Clustering
NOTE: It should be obvious that the K-Means and KNN algorithms are very different. However, these slides are meant to portray the difference in syntax and semantics.
This is an example of my KNN app that utilizes my Ignite-ML library (available on GitHub)

15 Ignite-ML Extensions
Register custom or new machine learning algorithms by adding request, response, and handler classes
[Diagram labels: App, Ignite Framework, Ignite-ML Default, Ignite-ML Extensions, Ignite-ML-API, Ignite-ML-Executor, Custom, Requests, Responses, Exceptions, Handlers, Executors, Executor Interfaces]
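The request/response/handler extension idea can be sketched in plain Java as a small registry. The names below (`MlRequest`, `MlResponse`, `MlHandler`, `HandlerRegistrySketch`, and the `Mean*` example classes) are hypothetical and are not taken from the Ignite-ML source; the sketch only assumes that a new algorithm is added by supplying one class of each kind and registering the handler against its request type.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical marker types; the real Ignite-ML API may look different. */
interface MlRequest {}

interface MlResponse {}

interface MlHandler<Q extends MlRequest, R extends MlResponse> {
    R handle(Q request);
}

final class HandlerRegistrySketch {
    private final Map<Class<?>, MlHandler<?, ?>> handlers = new HashMap<>();

    /** Registers the handler responsible for one request type. */
    <Q extends MlRequest, R extends MlResponse> void register(Class<Q> type, MlHandler<Q, R> handler) {
        handlers.put(type, handler);
    }

    /** Dispatches a request to the handler registered for its exact class. */
    @SuppressWarnings("unchecked")
    <Q extends MlRequest> MlResponse execute(Q request) {
        MlHandler<Q, ?> handler = (MlHandler<Q, ?>) handlers.get(request.getClass());
        if (handler == null) {
            throw new IllegalArgumentException(
                    "no handler registered for " + request.getClass().getName());
        }
        return handler.handle(request);
    }
}

/** Example extension: a trivial "algorithm" that averages its input. */
final class MeanRequest implements MlRequest {
    final double[] values;

    MeanRequest(double[] values) {
        this.values = values;
    }
}

final class MeanResponse implements MlResponse {
    final double mean;

    MeanResponse(double mean) {
        this.mean = mean;
    }
}

final class MeanHandler implements MlHandler<MeanRequest, MeanResponse> {
    @Override
    public MeanResponse handle(MeanRequest request) {
        double sum = 0;
        for (double v : request.values) {
            sum += v;
        }
        return new MeanResponse(sum / request.values.length);
    }
}
```

An application would call `registry.register(MeanRequest.class, new MeanHandler())` once at startup and then submit `MeanRequest` objects through `execute`, so the core library never needs to know about the custom algorithm.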

16 Further Work
Continue to develop Ignite-ML
Hopefully get more integrated with the Apache Ignite project (become a contributor)
Lay the foundation for doctoral studies
Explore the idea of TDML (Transactional Distributed Machine Learning)
Needs some way of providing additional configuration
Integrate more of the caching features of Ignite
Better plan for the integration of supervised and unsupervised learning

17 Community

18 Citation https://ignite.apache.org/
2015/slides/DmitriySetrakyan_BeyondTheDataGridFastDataProcessingWithApacheIgniteincubating.pdf

