ICCCN14 CPS Cloud Panel Jun Wang


Scaling up Performance and Scaling out Storage for Cyber-Physical Systems: Research and Opportunities
ICCCN14 CPS Cloud Panel
Jun Wang, University of Central Florida, Orlando
In this talk, I will present our …
11/28/2018 Dr. Jun Wang @ University of Central Florida

A sample small-scale CPS: an energy management cloud system by Shenzhen KINGSINE ELECTRIC AUTOMATION CO., LTD.
[System architecture diagram: analysis tools, real-time DB, model interface, data end users, data sources]

Overall system design: energy monitoring, management, and analysis
[Communication network diagram: energy data center, onsite data sampling, data collector server, LAN, wireless communication, switches, smart power meters]

Motivation
What is needed, therefore, is software and hardware infrastructure that can support the needs of next-generation large-scale CPS. These new large-scale CPS require high QoS fidelity and a high degree of asset sharing, and must support a large number of system components.

Challenges
Workloads: it is hard to predict the exact load because CPS are influenced by a wide array of physical phenomena for which science has not developed accurate or fast predictive models.
Different datasets exhibit distinct characteristics (such as data from traffic, power-grid fluctuations, human movement, and changing weather patterns).

Limitations of current cloud solutions: existing cloud computing techniques cannot be directly applied to CPS.
Flexibility in the cloud is orthogonal to real-time QoS needs.
Auto-scaling of elastic compute and storage needs to take each dataset's own characteristics into consideration.
Virtualization techniques are not helpful in a real-time QoS system (delay and jitter).
MapReduce and Hadoop are offline solutions by default.

Opportunities
Perform in-situ data processing in the original data collector(s):
Data are too heavy to move.
Much raw data is useless and does not need to be saved permanently or transferred to the cloud server.
Preprocess raw data locally to extract useful information, e.g., from traffic surveillance video or security monitoring data.

Opportunities – cont…
Consolidate all available smart resources near an emergency area to constitute an ad-hoc regional cloud and storage infrastructure.
Handle spiky traffic near emergency areas.
This may be feasible because smart buildings, smart vehicles, and smart phones could be quickly organized through unified mechanisms, protocols, and interfaces, contributing their computing and storage to enable in-situ data processing.

Opportunities – cont…
MapReduce and Hadoop are inappropriate for real-time analytics and interactive visualization. A potential candidate is the Spark framework developed by the Berkeley AMP Lab.
Spark Streaming is designed to provide near-real-time processing, with approximately one-second latency, for such programs. It enables in-memory analytics.
Some of the most common applications include web-site statistics/analytics, intrusion detection systems, and spam filters.

Opportunities – cont…
Spark Streaming seeks to support these queries while maintaining fault tolerance similar to batch systems, recovering from both outright failures and stragglers.
Spark Streaming allows high-throughput, fault-tolerant stream processing of live data streams.
More desirable features: alternative machine learning algorithms, distributed block management, and memory management.

Opportunities – cont…
Scalable data locality-aware scheduling

Data Locality-aware Scheduling
Collect the information about how data are distributed across compute nodes.
Compute proper tasks for parallel processes such that they achieve the maximum degree of data-locality computation.
Send task assignments to the parallel processes.
We propose SLAM, a scalable locality-aware middleware for these scientific applications. SLAM allows the applications to benefit from locality-aware computation while using HDFS.
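The three steps above can be sketched as a greedy assignment loop. This is not SLAM's actual algorithm, only a minimal illustration of the idea: given which processes hold each task's data fragments, prefer a local, least-loaded process, and fall back to any process otherwise. All names here (assign_tasks, task_locations) are hypothetical.

```python
# Hypothetical sketch of locality-aware task assignment (not SLAM itself):
# prefer a process that already holds the task's data fragments locally.
def assign_tasks(task_locations, processes):
    """task_locations: {task_id: set of process ids holding its data}."""
    assignments = {}                    # task_id -> chosen process
    load = {p: 0 for p in processes}    # tasks assigned so far, per process
    for task, holders in task_locations.items():
        local = [p for p in holders if p in load]
        # Least-loaded local holder if one exists; otherwise any process.
        candidates = local if local else list(processes)
        best = min(candidates, key=lambda p: load[p])
        assignments[task] = best
        load[best] += 1
    return assignments

tasks = {0: {"p0"}, 1: {"p0", "p1"}, 2: {"p1"}, 3: {"p2"}}
print(assign_tasks(tasks, ["p0", "p1", "p2"]))
```

With this toy input, every task lands on a process that holds its data, which is the "maximum degree of data locality" the slide refers to.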

Long Latency Scheduling Problem for Interactive Applications
Scheduling a task usually incurs hundreds of milliseconds of latency:
The scheduler needs to know the local data on each compute node.
Tens of workers issue task requests simultaneously.
These requests are handled sequentially.
However, for interactive data processing, as in a real-time QoS context, a job should finish in seconds: interactive gene search, visualization, log analysis/error detection.
In this scheduling workflow, to assign a task to a worker the scheduler usually needs to collect the locality metadata, and the idle messages from workers are put into a message queue. As the problem size grows, such a scheduling delay becomes unacceptable for interactive data processing.

A Scalable Scheduling Architecture
We develop a scalable scheduling architecture for interactive analysis programs, which could potentially be applied to a CPS cloud.
Multiple schedulers: keep track of the data processing tasks.
Each worker process: uses a novel modulo-based priority method to schedule its own local tasks (as long as local data exists).
We present scalable scheduling to solve the scheduling-delay problem.

Interactive workflow between workers and schedulers
If there are unprocessed tasks with local data on the worker's hard drive, the worker tries to pick a local task for itself:
The worker process selects a candidate task using the modulo-based priority method and sends a local-task execution request to the proper scheduler.
The scheduler replies with an approval message to the worker.
The worker decodes the reply message and either executes the approved task or tries to schedule a different task if the request is not approved.
In our ScalScheduling architecture, we use multiple schedulers to track the data processing tasks, and we allow each worker to use a modulo-based priority method to schedule its local tasks.
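The request/approve/execute handshake above can be sketched with simplified stand-ins. This is an illustrative model, not the real ScalScheduling code: the scheduler grants a task only if no other worker has claimed it, and a rejected worker simply tries its next candidate.

```python
# Hypothetical sketch of the local-task approval handshake.
class Scheduler:
    def __init__(self, tasks):
        self.unassigned = set(tasks)    # tasks not yet granted to any worker

    def request_local(self, task):
        """Approve the request iff the task is still unclaimed."""
        if task in self.unassigned:
            self.unassigned.discard(task)
            return True
        return False

def run_local_tasks(worker_tasks, scheduler):
    """Try local tasks in priority order; return the ones actually executed."""
    executed = []
    for task in worker_tasks:               # assumed sorted by priority
        if scheduler.request_local(task):   # send request, wait for approval
            executed.append(task)           # approved: execute the task
        # rejected: fall through and try the next candidate task
    return executed

sched = Scheduler(tasks=[1, 2, 3])
print(run_local_tasks([2, 1, 9], sched))    # task 9 cannot be granted
```

The design point is that approval is a single round trip per task, so the scheduler only arbitrates conflicts instead of computing every assignment itself.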

Modulo-based method for Self-scheduling
The modulo-based method associates the local tasks on each worker process with different priorities, calculated from information about the task, worker, and scheduler processes. The priority of the same task varies across worker processes; in this way, we reduce conflicts when different workers try to schedule the same task.
Let n denote the number of workers, f the total number of tasks, and m the number of schedulers. The priority of a task on a worker is computed as a function of n, f, and m.
For instance, the task on the right has higher priority; s1 and s2 are the two schedulers, each controlling 6 tasks.
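The slide's exact formula is not reproduced in this transcript, so the function below is only one plausible instantiation of the stated idea (and uses only the worker count n, not f or m): shift the task id by the worker id modulo n, so the same task ranks differently on different workers and conflicting requests for it become less likely.

```python
# Hypothetical modulo-based priority; NOT the formula from the slide.
def task_priority(task_id, worker_id, n_workers):
    """Smaller value = higher priority; varies per worker for the same task."""
    return (task_id - worker_id) % n_workers

# With 3 workers, task 4 gets a different rank on each worker:
for w in range(3):
    print("worker", w, "priority", task_priority(4, w, 3))
```

Because the ranks are distinct across workers, each worker tends to request a different task first, which is the conflict-reduction property the slide describes.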

Interactive workflow between schedulers and workers
If there are no remaining local tasks, the worker enters remote-execution mode:
The worker sends a remote task-assignment request to a scheduler, chosen by a probabilistic decision based on the number of unassigned tasks controlled by each scheduler.
The scheduler schedules a task for the worker, considering system load balance, and sends an assignment message to the worker.
The worker receives the message and executes the assigned remote task, or sends another remote-task request to a different scheduler.
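The probabilistic scheduler choice in the first step can be sketched as weighted sampling. This is an assumption about the mechanism, not the paper's code: a scheduler is picked with probability proportional to how many unassigned tasks it still controls.

```python
import random

# Hypothetical sketch: pick a scheduler with probability proportional
# to its number of remaining unassigned tasks.
def pick_scheduler(unassigned_counts, rng=random):
    total = sum(unassigned_counts.values())
    r = rng.uniform(0, total)
    acc = 0.0
    for scheduler, count in unassigned_counts.items():
        acc += count
        if r <= acc:
            return scheduler
    return scheduler  # numerical edge case: fall back to the last scheduler

counts = {"s1": 30, "s2": 10}   # s1 should be chosen ~3x as often as s2
```

Weighting by unassigned tasks steers remote requests toward schedulers that still have work to hand out, which helps balance load without any global coordination.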

Experimental Setup
Platform: 128 nodes / 256 cores; each node has dual 1.6 GHz AMD Opteron processors, 16 GB of memory, Gigabit Ethernet, and a 2 TB Western Digital SATA disk drive.
Linux version: CentOS 5.5, 64-bit, with kernel 2.6.
Co-located storage: Hadoop Distributed File System.
Parallel programming model: MPICH 1.4.1.
In our tests, we select the 'nt' database, which has a size of 45 GB. The network file systems used for comparison are PVFS and NFS; the distributed file system is HDFS.

Locality testing (data distribution)
Three data distributions with standard deviation values of σ = 6.05, 7.42, and 8.82; the average number of data fragments local to each process is 30. A smaller σ implies the data are more evenly distributed.

Degree of Locality Comparison
In all three cases, ScalScheduling achieved slightly higher degrees of data-locality computation.

Longest Waiting-Assignment Time
In general, as the longest wait time increases, so does the parallel execution time. The figure shows that for MonolithicScheduling, the longest waiting-assignment time grows rapidly as the number of processes increases.

Average Delay Time Comparison
Delay times are much shorter for ScalScheduling than for MonolithicScheduling once the size of the data locality information reaches 40 KB (the data to be processed amount to around 300 GB).
X-axis: size of data locality information (KB).

Thank you!