Apache Ignite Compute Grid Research
Corey Pentasuglia

What is Apache Ignite?
- An in-memory data fabric
- An open source Apache Incubator project
- Started, and still largely maintained, by a company named GridGain
- Contains several key components for high-performance computing within a distributed architecture

Compute Grid
- Designed for high performance, low latency, and scalability
- Availability is a core consideration: jobs will continue to execute as long as at least one node remains
- Failover: includes a load balancer to redistribute jobs that have failed

Compute Grid (Key Benefits)
- Fault Tolerance: if a node fails, its jobs are automatically transferred to another node (if one is available)
- Load Balancing: work is automatically balanced for an efficient distribution among the available nodes
- Job Scheduling: a priority can be set for tasks that run on the grid; by default, however, tasks will be worked off randomly (a configuration sketch follows this list)
- Direct MapReduce API
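
Priority-based ordering is not active out of the box; a collision SPI has to be configured on each node. The snippet below is a minimal sketch of what that might look like, assuming the PriorityQueueCollisionSpi that ships with Ignite; class and property names should be checked against the Ignite version actually in use.

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.spi.collision.priorityqueue.PriorityQueueCollisionSpi;

public class PriorityGridStartup {
    public static void main(String[] args) {
        // The collision SPI decides how queued jobs are ordered on a node.
        PriorityQueueCollisionSpi colSpi = new PriorityQueueCollisionSpi();
        colSpi.setParallelJobsNumber(4); // limit concurrently executing jobs per node

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setCollisionSpi(colSpi);

        // Start a node that orders queued jobs by task priority.
        Ignite ignite = Ignition.start(cfg);
    }
}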

Ignite vs. MPI
Apache Ignite Grid:
- Any node can be an orchestrator
- Automatic network association (nodes discover each other automatically)
- Highly portable: really just requires Java to execute
- Runs in virtualized environments (support has improved)
MPI (Message Passing Interface):
- Commonly deployed as a Beowulf cluster
- Has a master node
- Requires network configuration
- Claims portability, but may be subject to C library dependencies
- No overhead of running virtualized

Grid Configuration
- Four of the Linux lab machines were selected for the grid
- While the plain Ignite install can be started and used as-is, custom JAR files containing my code were created
- These JARs can be run on any machine that has Java installed
- Machines that will not act as the orchestrator can use the plain install of Apache Ignite
- Code is delivered to the remote nodes to be executed (see the sketch below)
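
Delivering code to remote nodes relies on Ignite's peer class loading. The sketch below shows one way an orchestrator node might enable it programmatically; this is an illustrative assumption about the setup, not the project's exact configuration.

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;

public class OrchestratorStartup {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();

        // Allow closures and task classes to be shipped to remote nodes,
        // so the plain Ignite installs do not need the custom JAR.
        cfg.setPeerClassLoadingEnabled(true);

        Ignite ignite = Ignition.start(cfg);
        System.out.println("Nodes in cluster: " + ignite.cluster().nodes().size());
    }
}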

Closure and Runnable/Callable
Closure
- Essentially a lambda function: a block of code that encloses its body and any outside variables it references
- Ex. ignite.compute().broadcast(() -> System.out.println("Hello World!"));
Runnable/Callable
- Extends either the Java Runnable or Callable interface
- Runnable does not return results; Callable does
- These can be defined to enclose the logic to be executed and simply passed to Ignite (a Callable sketch follows)
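
As a hedged illustration of the Callable side, the sketch below broadcasts an IgniteCallable that returns each node's hostname and collects the results on the calling node; the closure body is invented here purely for demonstration.

import java.net.InetAddress;
import java.util.Collection;
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.lang.IgniteCallable;

public class CallableExample {
    public static void main(String[] args) throws Exception {
        Ignite ignite = Ignition.start();

        // A Callable returns a value; Ignite gathers one result per node.
        IgniteCallable<String> whoAmI = () -> InetAddress.getLocalHost().getHostName();

        Collection<String> hosts = ignite.compute().broadcast(whoAmI);
        hosts.forEach(h -> System.out.println("Executed on: " + h));
    }
}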

Ignite MapReduce
- Apache Ignite comes with a simplified in-memory MapReduce
- Ignite can optimize the MapReduce paradigm by keeping the data it works on in memory
- Personally found most of the Ignite APIs easy to work with and well developed
- Configurable result policies (a minimal task sketch follows):
  WAIT – waits for the remaining jobs to complete
  REDUCE – immediately proceeds to the reduce() step
  FAILOVER – fails the job over to another node
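
A minimal sketch of the MapReduce-style API, assuming the ComputeTaskSplitAdapter / ComputeJobAdapter classes from the public Ignite compute API; the character-count task itself is invented here purely to show the split/reduce shape.

import java.util.ArrayList;
import java.util.List;
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.compute.ComputeJob;
import org.apache.ignite.compute.ComputeJobAdapter;
import org.apache.ignite.compute.ComputeJobResult;
import org.apache.ignite.compute.ComputeTaskSplitAdapter;

public class CharCountTask extends ComputeTaskSplitAdapter<String, Integer> {
    // Map phase: one job per word in the input sentence.
    @Override
    protected List<ComputeJob> split(int gridSize, String sentence) {
        List<ComputeJob> jobs = new ArrayList<>();
        for (String word : sentence.split(" ")) {
            jobs.add(new ComputeJobAdapter() {
                @Override
                public Object execute() {
                    return word.length();
                }
            });
        }
        return jobs;
    }

    // Reduce phase: sum the per-word results on the orchestrating node.
    @Override
    public Integer reduce(List<ComputeJobResult> results) {
        int sum = 0;
        for (ComputeJobResult res : results)
            sum += res.<Integer>getData();
        return sum;
    }

    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start()) {
            Integer total = ignite.compute().execute(CharCountTask.class, "Count characters with Ignite");
            System.out.println("Total non-space characters: " + total);
        }
    }
}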

Node Sharing
- Similar to typical shared state between local threads
- Keeps state local to a given node
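
The slide does not name the API, but this behavior matches Ignite's node-local map; the sketch below assumes that is what was meant and shows a per-node invocation counter.

import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;

public class NodeLocalCounter {
    public static void main(String[] args) {
        Ignite ignite = Ignition.start();

        // Broadcast a closure that bumps a counter kept only on the executing node.
        ignite.compute().broadcast(() -> {
            Ignite local = Ignition.localIgnite();
            ConcurrentMap<String, AtomicInteger> nodeLocal = local.cluster().nodeLocalMap();

            AtomicInteger counter = nodeLocal.computeIfAbsent("invocations", k -> new AtomicInteger());
            System.out.println("Invocations on this node: " + counter.incrementAndGet());
        });
    }
}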

Collocated Computing (Data Locality)
- Ignite provides the ability to configure jobs to run on the nodes where their data is local
- Reduces the need for network I/O
- Uses the notion of affinity to identify the node to execute on
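
A minimal sketch of affinity-collocated execution, assuming a cache named "wordCache" created here only for illustration; the closure runs on whichever node owns the key, so the cache read there is purely local.

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;

public class AffinityRunExample {
    public static void main(String[] args) {
        Ignite ignite = Ignition.start();

        // Hypothetical cache used only for this illustration.
        IgniteCache<Integer, String> cache = ignite.getOrCreateCache("wordCache");
        cache.put(42, "collocated");

        // Run the closure on the node that owns key 42; the get() is a local read there.
        ignite.compute().affinityRun("wordCache", 42, () -> {
            IgniteCache<Integer, String> local = Ignition.localIgnite().cache("wordCache");
            System.out.println("Value read locally: " + local.get(42));
        });
    }
}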

Checkpointing
- Apache Ignite also provides the ability to "checkpoint" the state of a running job
- Protects against failures; allows work to resume after failed nodes are restarted
- ComputeTaskSession (interface):
  loadCheckpoint(String)
  removeCheckpoint(String)
  saveCheckpoint(String, Object)
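
A hedged sketch of how a job might save and reload a checkpoint through the injected task session. It assumes the @TaskSessionResource injection and the @ComputeTaskSessionFullSupport annotation on the task, which checkpointing requires in the Ignite versions I am aware of; check against the version in use. The summation work is invented for illustration.

import java.util.ArrayList;
import java.util.List;
import org.apache.ignite.compute.ComputeJob;
import org.apache.ignite.compute.ComputeJobAdapter;
import org.apache.ignite.compute.ComputeJobResult;
import org.apache.ignite.compute.ComputeTaskSession;
import org.apache.ignite.compute.ComputeTaskSessionFullSupport;
import org.apache.ignite.compute.ComputeTaskSplitAdapter;
import org.apache.ignite.resources.TaskSessionResource;

// Checkpointing requires full session support to be enabled on the task.
@ComputeTaskSessionFullSupport
public class CheckpointedTask extends ComputeTaskSplitAdapter<Integer, Integer> {
    @Override
    protected List<ComputeJob> split(int gridSize, Integer iterations) {
        List<ComputeJob> jobs = new ArrayList<>();
        jobs.add(new ComputeJobAdapter() {
            // Ignite injects the session shared by the task and its jobs.
            @TaskSessionResource
            private ComputeTaskSession ses;

            @Override
            public Object execute() {
                // Resume from a previous checkpoint if one exists, e.g. after failover.
                int[] state = ses.loadCheckpoint("progress");
                int start = state == null ? 0 : state[0];
                int sum = state == null ? 0 : state[1];

                for (int i = start; i < iterations; i++) {
                    sum += i;
                    // Persist both the next index and the partial sum.
                    ses.saveCheckpoint("progress", new int[] {i + 1, sum});
                }

                ses.removeCheckpoint("progress"); // clean up once finished
                return sum;
            }
        });
        return jobs;
    }

    @Override
    public Integer reduce(List<ComputeJobResult> results) {
        return results.get(0).getData();
    }
}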

Example 1 (Hello World)
- Utilizes four of the Linux lab machines to run Hello World in the Ignite Compute Grid
- A Java application has been written to broadcast "Hello World" code to each of the nodes
- By utilizing the following code, one could broadcast only to the remote nodes:
  ClusterGroup rmts = ignite.cluster().forRemotes();
  ignite.compute(rmts).broadcast(() -> System.out.println("Hello World!"));
- Notice the use of the cluster group: this is a method of defining the particular nodes to execute on

Example 2 (Word Count)
- Utilizes four of the Linux lab machines to run an application in the Ignite Compute Grid
- A Java application has been written to broadcast a word-counting closure to each of the nodes
- Each node will receive a word to be counted
- The results will be aggregated at the orchestrating node (a sketch of this shape follows)
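
This is not the project's actual source, just a hedged sketch of the shape the slide describes: each word of a sentence is sent to the grid as a closure argument, its length is computed remotely, and the per-word results are summed back on the orchestrating node.

import java.util.Arrays;
import java.util.Collection;
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.lang.IgniteClosure;

public class WordCountExample {
    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start()) {
            // The closure each node runs for the word it receives.
            IgniteClosure<String, Integer> countChars = word -> {
                System.out.println("Counting characters in: " + word);
                return word.length();
            };

            // One closure invocation per word; Ignite load-balances them across the nodes.
            Collection<Integer> lengths = ignite.compute().apply(
                countChars,
                Arrays.asList("Apache Ignite Compute Grid Research".split(" "))
            );

            // Aggregate the per-word results on the orchestrating node.
            int total = lengths.stream().mapToInt(Integer::intValue).sum();
            System.out.println("Total characters: " + total);
        }
    }
}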

Master's Project Work
- Utilize four of the Linux lab machines to run an application in the Ignite Compute Grid
- Researching distributed machine learning in preparation for doctoral work
- Develop a distributed classification application to run in Apache Ignite
- The application will take a dataset to be used as training data
- Subsequent datasets can then be classified against the training set using the k-Nearest Neighbors algorithm
- Results will be aggregated at the acting master node

Further Work
- I'd like to explore more examples with the Apache Ignite Compute Grid
- It would be interesting to compare its latency against MPI
- Working on the Master's Project utilizing Apache Ignite

Community

Citation (Entire website, documentation, images, and linked videos) GitHub -