1
Machine Learning Library for Apache Ignite
Ignite-ML
Machine Learning Library for Apache Ignite
Corey Pentasuglia
Master's Project
5/11/2016
Examiners: Dr. Scott Spetka, Dr. Bruno Andriamanalimanana, Dr. Roger Cavallo
2
Master's Project Objectives
Research DML (Distributed Machine Learning)
  Preparation for doctoral studies
  Compare current DML frameworks
Develop a library built over Apache Ignite for machine learning
  Currently, nothing is available to perform machine learning in the Ignite framework
  Spark (a sister project of Ignite) provides MLlib
Compare Ignite and Spark
  Apache Ignite might be a better framework for DML (especially in a practical sense)
Develop Ignite-ML as a practical library for attempting ML on transactional data as well as analytical processing (many others focus only on analytics)
  Develop the idea of TDML (Transactional DML)
Comparison paper
  Summarize findings in a short paper
3
What is Apache Ignite?
In-Memory Data Fabric
An open source Apache Incubator project
Started and still mostly maintained by a company named GridGain
Ignite contains several key components for high-performance computing within a distributed architecture
4
Compute Grid
Designed for high performance, low latency, and scalability
Availability is definitely considered
  Jobs will execute as long as there is at least one node
Failover
  Includes a load balancer to orchestrate jobs that have failed
5
Compute Grid (Key Benefits)
Fault Tolerance
  If a node fails, jobs are automatically transferred to another node (if available)
Load Balancing
  Automatic load balancing distributes work efficiently among the available nodes
Job Scheduling
  Priority can be set for tasks that run on the grid; by default, however, tasks are worked off in random order
Direct MapReduce API
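The failover and load-balancing behavior described above can be illustrated with a small simulation. This is only a sketch in plain Python, not the Ignite compute grid API: the `Node` class, the `NodeDown` exception, and the `map_reduce` function are hypothetical names introduced for illustration, and round-robin is an assumed balancing policy.

```python
from itertools import cycle


class NodeDown(Exception):
    """Raised by a simulated node that cannot run a job (hypothetical)."""


class Node:
    """A simulated grid node; not an Ignite class."""

    def __init__(self, name, alive=True):
        self.name = name
        self.alive = alive
        self.completed = 0

    def run(self, job, arg):
        if not self.alive:
            raise NodeDown(self.name)
        self.completed += 1
        return job(arg)


def map_reduce(nodes, job, args, reducer):
    """Spread jobs over nodes round-robin; fail over to the next node on error."""
    ring = cycle(nodes)
    results = []
    for arg in args:
        for _ in range(len(nodes)):  # try each node at most once per job
            node = next(ring)
            try:
                results.append(node.run(job, arg))
                break
            except NodeDown:
                continue  # failover: hand the job to the next node
        else:
            raise RuntimeError("no live node available")
    return reducer(results)


nodes = [Node("a"), Node("b", alive=False), Node("c")]
total = map_reduce(nodes, len, ["distributed", "machine", "learning"], sum)
```

Here node "b" is down, so the job it would have received is picked up by node "c", and the run still completes as long as at least one node is alive.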
6
Ignite vs. Spark
7
OLTP vs. OLAP
8
Ignite vs. Spark (Cont.)
Apache Ignite (Hybrid OLTP & OLAP)
  In-Memory
    Treats memory as primary storage
    Better indexing
    Avoids (de)serialization
    Reduced latency
  RDD (Resilient Distributed Datasets)
  Real streaming (no delays)
  Utilizes off-heap memory
    Avoids garbage collection pauses
  In-memory SQL indexes
    Avoids full scans of datasets
  MapReduce
    Fully compatible with Hadoop MR APIs
    Support for legacy MR code
Spark (OLAP)
  In-Memory
    Used only for processing
  RDD (Resilient Distributed Datasets)
    Created beforehand
    Is immutable
  Language support
    Scala, Java, Python, and R (Ignite: Scala and Java)
9
Grid Configuration
The lab machines selected can be seen below:
More machines could easily be added; however, I have been using these four lab machines
10
Project Background
Initially just developed some code to perform KNN in Apache Ignite
JavaML & Apache Ignite
  Utilized JavaML to perform KNN on the compute grid
  Determined that an extensible library would be more useful to others
Switched to Weka and Apache Ignite
  Weka is a more widely adopted ML library
  Developed the start of an extensible architecture
    Allows others to plug in additional ML algorithms
    Attempts to auto-scale based upon cluster size
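One way KNN can be spread over a compute grid is sketched below: each node finds its own k nearest candidates (map), and the candidates are merged for a majority vote (reduce). This is an illustrative plain-Python sketch, not the JavaML, Weka, or Ignite-ML code; the function names are hypothetical, and splitting the data into one partition per node is an assumption about how scaling with cluster size might look.

```python
import heapq
from collections import Counter


def split_by_nodes(data, n_nodes):
    """Assumed scaling rule: one partition per node in the cluster."""
    return [data[i::n_nodes] for i in range(n_nodes)]


def local_neighbors(partition, query, k):
    """Map step: the k nearest (squared distance, label) pairs in one partition."""
    scored = [
        (sum((a - b) ** 2 for a, b in zip(point, query)), label)
        for point, label in partition
    ]
    return heapq.nsmallest(k, scored)


def knn_classify(partitions, query, k):
    """Reduce step: merge per-partition candidates and take a majority vote."""
    candidates = [
        pair for part in partitions for pair in local_neighbors(part, query, k)
    ]
    nearest = heapq.nsmallest(k, candidates)
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


data = [
    ((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((0.2, 0.1), "a"),
    ((5.0, 5.0), "b"), ((5.1, 4.9), "b"), ((4.8, 5.2), "b"),
]
partitions = split_by_nodes(data, n_nodes=2)
label = knn_classify(partitions, (0.3, 0.3), k=3)
```

Because every partition returns at most k candidates, the merged list always contains the true k nearest neighbors, so the result matches a single-machine KNN.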
11
Ignite-ML
Ignite-ML is my own project built on top of the Apache Ignite In-Memory data fabric ( grid/tree/master/ignite-ml)
The library consists of API and Executor submodules
The idea is that the library provides an extensible entry point for plugging machine learning algorithms into Ignite
The API contains custom-defined exceptions, request objects, response objects, and handlers
Many other distributed ML frameworks focus on data analysis after data has been stored
With Ignite being a hybrid OLTP/OLAP framework, I’d like to focus on performing ML algorithms with transactional data
12
Ignite-ML Use Case
[Diagram; labels: In, OLTP, Ignite-ML, Feedback, OLAP, HDFS, Other Storage]
Normalize and classify incoming data using supervised learning and give feedback
Possible feedback from unsupervised learning
Normalize and classify incoming signals
Suppose we currently know of two classes, say class 1 and class 2, but we receive data we cannot classify with confidence. We can start to perform unsupervised learning, which may eventually lead to a new class.
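The two-class scenario above might look roughly like the following. This is a plain-Python sketch under stated assumptions, not Ignite-ML code: a nearest-centroid rule with a distance threshold stands in for "classify with confidence", averaging the buffered points stands in for real unsupervised learning, and every name and number here is hypothetical.

```python
import math


def classify_or_defer(point, centroids, max_distance, unknown):
    """Return the nearest known class, or None and buffer the point if too far."""
    label, dist = min(
        ((name, math.dist(point, center)) for name, center in centroids.items()),
        key=lambda pair: pair[1],
    )
    if dist <= max_distance:
        return label
    unknown.append(point)  # kept for a later unsupervised pass
    return None


def propose_class(unknown, min_points):
    """Rough stand-in for clustering: average the buffer once it is large enough."""
    if len(unknown) < min_points:
        return None
    dims = len(unknown[0])
    return tuple(sum(p[i] for p in unknown) / len(unknown) for i in range(dims))


centroids = {"class 1": (0.0, 0.0), "class 2": (10.0, 10.0)}
unknown = []
incoming = [(0.5, 0.5), (9.0, 10.0), (5.0, -5.0), (5.5, -4.5), (4.5, -5.5)]
labels = [classify_or_defer(p, centroids, 2.0, unknown) for p in incoming]

new_center = propose_class(unknown, min_points=3)
if new_center is not None:
    centroids["class 3"] = new_center  # feedback: a new class becomes known
```

The first two points are classified immediately; the last three are too far from either known class, accumulate in the buffer, and eventually yield a candidate third class.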
13
MLlib K-means Clustering
14
Ignite-ML Knn Clustering
NOTE: It should be obvious that the K-means and KNN algorithms are very different. However, these slides are meant to portray the difference in syntax and semantics.
This is an example of my KNN app that utilizes my Ignite-ML library (available on GitHub)
15
Ignite-ML Extensions
Register custom or new machine learning algorithms by adding request, response, and handler classes
[Diagram; labels: App, Ignite Framework, Ignite-ML, Default, Custom, Ignite-ML-API, Ignite-ML-Executor, Requests, Responses, Exceptions, Handlers, Executors, Executor Interfaces]
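The request, response, and handler extension point can be pictured with a small registry. This is a plain-Python sketch of the general pattern, not the actual Ignite-ML API (which is a Java library); `HandlerRegistry`, `MeanRequest`, `MeanResponse`, and `UnsupportedRequestError` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class MeanRequest:
    """Hypothetical request object carrying the input data."""
    values: list


@dataclass
class MeanResponse:
    """Hypothetical response object carrying the result."""
    mean: float


class UnsupportedRequestError(Exception):
    """Raised when no handler is registered for a request type."""


class HandlerRegistry:
    """Maps each request type to the handler that knows how to execute it."""

    def __init__(self):
        self._handlers = {}

    def register(self, request_type, handler):
        self._handlers[request_type] = handler

    def execute(self, request):
        handler = self._handlers.get(type(request))
        if handler is None:
            raise UnsupportedRequestError(type(request).__name__)
        return handler(request)


def handle_mean(request):
    """A trivial custom algorithm: the arithmetic mean of the values."""
    return MeanResponse(mean=sum(request.values) / len(request.values))


registry = HandlerRegistry()
registry.register(MeanRequest, handle_mean)
response = registry.execute(MeanRequest(values=[1.0, 2.0, 3.0]))
```

Adding a new algorithm in this sketch means defining one request class, one response class, and one handler, then registering the pair; the calling code does not change.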
16
Further Work
Continue to develop Ignite-ML
Hopefully get more integrated with the Apache Ignite project (become a contributor)
Lay the foundation for doctoral studies
Explore the idea of TDML (Transactional Distributed Machine Learning)
Needs some way of providing additional configuration
Integrate more of the caching features of Ignite
Better plan for the integration of supervised and unsupervised learning
17
Community
18
Citation https://ignite.apache.org/
2015/slides/DmitriySetrakyan_BeyondTheDataGridFastDataProcessingWithApacheIgniteincubating.pdf