Assoc. Prof. Marc FRÎNCU, PhD. Habil.


Big Data Technologies
Lecture 4: Scalability: Algorithm + Data + Hardware
Assoc. Prof. Marc FRÎNCU, PhD. Habil.
marc.frincu@e-uvt.ro

Scalability
The ability of a system to manage an increasing volume of work.
The capacity of a system to grow in order to process larger data.
Ideally, doubling the processing power also doubles the volume that can be processed.
λ: slope

Scalability
Horizontal (in/out):
  Adding processing nodes to existing ones.
  Commodity clusters: groups of machines networked over Gigabit, InfiniBand, Myrinet, …
  Requires data replication and synchronization mechanisms.
Vertical (up/down):
  Adding more resources to existing nodes.
  Virtualization: adding more cores, RAM, disk, etc. to a VM.
  Cloud computing (on demand).
  Limited by the physical capacity of a node.

Virtualization
Creates a virtual version of an OS, server, storage device, network, …
Allows physical resources to be shared among multiple VMs (multi-tenancy).
Enables the installation of hardware-independent software.
Enables the configuration of images usable on a wide range of devices.
VMs are managed by a hypervisor (VMM):
  Hardware abstraction.
  The OS takes control of the hardware through the VMM.

Virtualization
[Figure: classic software stack vs. virtualized software stack]

Containers
Lightweight VMs.
Emulate the OS interface through the native interface.
No VMM: the OS offers all the required support.
Examples: Linux containers, Solaris containers, BSD jails.
Advantages:
  Fast allocation.
  Performance similar to running directly on the OS.
  Lightweight.

Containers

Docker
Extension of Linux containers (LXC).
Previously named dotCloud.
Namespaces: restrict what a container can see.
cgroups: restrict how much of a resource a container can use.

Scalability
Strong:
  Measure execution time while keeping the data volume constant and increasing the no. of processors.
  Expectation: execution time drops k times if k processors are used.
Weak:
  Measure execution time while increasing the no. of processors and keeping the work volume per processor constant.
  Expectation: execution time stays constant.
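
A minimal sketch of how the two measurements could be turned into numbers, assuming the wall-clock times have already been measured. The function names and the sample timings are hypothetical and are not taken from the lecture.

```python
def strong_scaling(t1, tk, k):
    """Return (speedup, efficiency) for a fixed total data volume.

    t1: time on 1 processor, tk: time on k processors.
    Ideal strong scaling gives speedup == k and efficiency == 1.0.
    """
    speedup = t1 / tk
    return speedup, speedup / k


def weak_scaling_efficiency(t1, tk):
    """Return efficiency when the work per processor is kept constant.

    t1: time on 1 processor, tk: time on k processors with k times the data.
    Ideal weak scaling keeps the time constant, so efficiency == 1.0.
    """
    return t1 / tk


# Hypothetical timings in seconds, for illustration only.
s, e = strong_scaling(t1=100.0, tk=30.0, k=4)
w = weak_scaling_efficiency(t1=100.0, tk=125.0)
print(f"strong: speedup={s:.2f}, efficiency={e:.2f}")
print(f"weak: efficiency={w:.2f}")
```

An efficiency below 1.0 in either test indicates overhead (serial code, communication, I/O) that grows with the number of processors.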

Scalability
Myth:
  The more we parallelize code, the faster it runs.
  Ideally, 2x resources = 2x faster.
In reality:
  Code is not 100% parallelizable.
  Communication & I/O.
  Resources are limited.
Adding resources does bring an improvement, but a limited one.
σ: percentage of code that is not parallelizable
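
The limit can be illustrated with Amdahl's law, S(N) = 1 / (σ + (1 − σ) / N), assuming this is the relation the slide has in mind for σ. The sketch below treats σ as a fraction in [0, 1]; the function names and the value σ = 0.1 are hypothetical.

```python
def amdahl_speedup(sigma, n):
    """Speedup on n processors when a fraction sigma of the code is serial."""
    if not 0.0 <= sigma <= 1.0:
        raise ValueError("sigma must be in [0, 1]")
    if n < 1:
        raise ValueError("n must be >= 1")
    return 1.0 / (sigma + (1.0 - sigma) / n)


def amdahl_limit(sigma):
    """Upper bound on the speedup as n grows without bound (needs sigma > 0)."""
    return 1.0 / sigma


# Hypothetical serial fraction of 10%, for illustration only.
for n in (1, 2, 4, 16, 256):
    print(n, round(amdahl_speedup(0.1, n), 2))
print("limit:", amdahl_limit(0.1))
```

With 10% serial code the speedup can never reach 10, no matter how many processors are added.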

Universal scalability law
Beyond a certain point, the more load the system receives, the less work it performs.
k: communication penalty coefficient
Sweet spot: there is no point in adding more resources beyond it.
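
A minimal sketch of the sweet spot, assuming the commonly quoted form of the law, C(N) = N / (1 + σ(N − 1) + k·N(N − 1)), whose maximum lies at N* = sqrt((1 − σ) / k). The slide's own formula is not in the transcript, so this form, the function names, and the coefficient values are assumptions.

```python
import math


def usl_capacity(n, sigma, k):
    """Relative capacity at load/size n (assumed USL form, see lead-in).

    sigma: contention (serial fraction), k: communication penalty coefficient.
    """
    return n / (1.0 + sigma * (n - 1) + k * n * (n - 1))


def usl_peak(sigma, k):
    """Value of n at which the assumed USL curve reaches its maximum."""
    return math.sqrt((1.0 - sigma) / k)


# Hypothetical coefficients, for illustration only.
sigma, k = 0.05, 0.001
n_star = usl_peak(sigma, k)
print(f"sweet spot near n = {n_star:.1f}")
for n in (1, 10, 30, 60, 120):
    print(n, round(usl_capacity(n, sigma, k), 2))
```

Past the sweet spot the capacity falls as n grows, which is the retrograde behaviour the slide describes. Setting k = 0 reduces the curve to Amdahl's law.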

Examples
Community detection in social networks.
Weather forecast.

Communication price
Communication → low speedup.
Communication price: more processors → drop in speedup.
Advantage of the hybrid approach.

Communication advantage
Example: matrix multiplication
OpenMP:
  For small dimensions: advantage of shared memory.
  For large dimensions: the application does not scale.
MPI:
  For small dimensions: communication cost.
  For large dimensions: scalability (throughput, speedup).

Impact of data & algorithm
For the same algorithm, different data can impact its scalability.
Example: graph processing
Platform: Amazon EC2 m3.large (2 Intel Xeon E5-2670 cores, 7.5 GB RAM, 100 GB SSD, 1 GB Ethernet)
2 data sets: CARN, WIKI
No. of nodes: 3, 6, 9
3 algorithms:
  Hashtag Aggregation: at each step, compute a statistic about a given tag in the graph.
  Meme Tracking: analyze meme spread in a graph.
  TDSP (Time Dependent Shortest Path): used in routing; recompute the shortest path at each step.

Impact of data & algorithm
CARN: large diameter; node distribution: uniform.
WIKI: small diameter; node distribution: power law.
Idea:
  Partition the graph across many processors.
  The number of interprocessor edges impacts communication.
  Increasing the no. of partitions reduces scalability due to interprocessor communication (TDSP, MEME).
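
A minimal sketch of how the number of interprocessor edges can be counted for a given partitioning. The two toy graphs (a path as a stand-in for a large-diameter graph, a star as a stand-in for a small-diameter graph with one hub), the contiguous block partitioning, and all names are hypothetical; they are not the CARN and WIKI data sets or the partitioner used in the experiments.

```python
def edge_cut(edges, part_of):
    """Count edges whose endpoints live in different partitions."""
    return sum(1 for u, v in edges if part_of[u] != part_of[v])


def block_partition(num_nodes, k):
    """Assign nodes 0..num_nodes-1 to k contiguous, near-equal blocks."""
    return {node: node * k // num_nodes for node in range(num_nodes)}


num_nodes = 12
# Toy large-diameter graph: a path 0-1-2-...-11.
path_edges = [(i, i + 1) for i in range(num_nodes - 1)]
# Toy small-diameter graph: a star with hub 0.
star_edges = [(0, i) for i in range(1, num_nodes)]

for k in (1, 2, 3, 6):
    parts = block_partition(num_nodes, k)
    print(f"k={k}: path cut={edge_cut(path_edges, parts)}, "
          f"star cut={edge_cut(star_edges, parts)}")
```

In both toy graphs the cut grows with the number of partitions, and every cut edge stands for a message that must cross processors at each step. The hub-dominated graph is cut far more heavily under this naive partitioning.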

Impact of data & algorithm
Phases: Setup, I/O, Processing (% parallel), Shutdown.
Example: detect influence spread in parallel on large graphs.
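
A minimal sketch of why the phase breakdown matters, assuming that setup, I/O and shutdown do not speed up at all and that only the parallel share of the processing phase is divided among the processors. The model, the function name, and the phase durations are hypothetical.

```python
def run_time(setup, io, processing, shutdown, parallel_fraction, n):
    """Estimated wall-clock time on n processors.

    Assumption: setup, I/O and shutdown are fixed costs; only
    parallel_fraction of the processing phase is divided by n.
    """
    serial_part = processing * (1.0 - parallel_fraction)
    parallel_part = processing * parallel_fraction / n
    return setup + io + serial_part + parallel_part + shutdown


# Hypothetical phase durations in seconds, 90% of processing is parallel.
phases = dict(setup=5.0, io=20.0, processing=100.0, shutdown=5.0,
              parallel_fraction=0.9)
t1 = run_time(n=1, **phases)
for n in (1, 2, 10, 100):
    tn = run_time(n=n, **phases)
    print(f"n={n}: time={tn:.1f}s, speedup={t1 / tn:.2f}")
```

Even with a highly parallel processing phase, the fixed phases bound the overall speedup, so reporting only the processing-phase speedup overstates scalability.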

Impact of parallel APIs Various MPI implementations

Impact of hardware platform
Example: weather forecast (WRF)
Blue Gene scales well.
No. procs/speedup ratio

Lecture sources
https://www.slideshare.net/vividcortex/quantifying-scalability-with-the-usl
http://www1.chapman.edu/~radenski/research/papers/mergesort-pdpta11.pdf
https://arxiv.org/pdf/1012.2273.pdf
http://serc.iisc.ernet.in/~simmhan/pubs/simmhan-ipdps-2015.pdf
https://books.google.ro/books?id=Jtha3wRWCkQC&pg=PA485&lpg=PA485
http://lass.cs.umass.edu/~shenoy/courses/spring16/lectures/Lec06.pdf
https://robinsystems.com/blog/containers-deep-dive-lxc-vs-docker-comparison/

Next lecture Data analysis Heterogeneous vs. homogeneous data Independent Dependent Graphs BSP model Data flows Heterogeneous vs. homogeneous data Processing platforms MapReduce Spark Streaming Apache Giraph