Assoc. Prof. Marc FRÎNCU, PhD. Habil.

1 Big Data Technologies
Lecture 4: Scalability: Algorithm + Data + Hardware. Assoc. Prof. Marc FRÎNCU, PhD. Habil. marc.frincu@e-uvt.ro

2 Scalability
The ability of a system to manage an increasing volume of work; equivalently, the capacity of a system to grow in order to process larger data.
Ideally, doubling the processing power doubles the volume that can be processed.
λ: slope of the scaling line.

3 Scalability
Horizontal (in/out): adding processing nodes to existing ones. Typical of commodity clusters, i.e. groups of machines networked through Gigabit Ethernet, InfiniBand, Myrinet, … Requires data replication and synchronization mechanisms.
Vertical (up/down): adding more resources to existing nodes. Typical of virtualization (adding more cores, RAM, disk, etc. to a VM) and of on-demand cloud computing. Limited by the physical capacity of a node.

4 Virtualization
Creates a virtual version of an OS, server, storage device, network, …
Allows sharing physical resources among multiple VMs (multi-tenancy).
Enables the installation of hardware-independent software.
Enables the configuration of images usable on a wide range of devices.
VMs are managed by a hypervisor (VMM), which provides hardware abstraction: the OS takes control of the hardware through the VMM.

5 Virtualization
[Figure: classic software stack vs. virtualized software stack]

6 Containers
Lightweight VMs that emulate the OS interface through the native interface. There is no VMM; the OS offers all the required support.
Examples: Linux containers, Solaris containers, BSD jails.
Advantages: fast allocation, performance similar to running directly on the OS, lightweight.

7 Containers

8 Docker
Extension of Linux containers (LXC). The company behind it was previously named dotCloud.
namespaces: restrict what a container can see.
cgroups: restrict how much of a resource a container can use.

9 Scalability
Strong: measure execution time while keeping the data volume constant and increasing the number of processors. Expectation: execution time drops k times if k processors are used.
Weak: measure execution time while increasing the number of processors and keeping the work volume per processor constant. Expectation: execution time stays constant.
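The two expectations above can be turned into simple efficiency measures. The sketch below is only an illustration: the function names and the timings are hypothetical, not taken from the lecture.

```python
def strong_scaling(t1, tp, p):
    """Return (speedup, efficiency) for a fixed total problem size.

    t1: time on 1 processor, tp: time on p processors.
    Ideal strong scaling gives speedup == p and efficiency == 1.0.
    """
    speedup = t1 / tp
    return speedup, speedup / p


def weak_scaling_efficiency(t1, tp):
    """Efficiency when the work per processor is kept constant.

    Ideal weak scaling keeps the time constant, so the ratio is 1.0.
    """
    return t1 / tp


# Hypothetical timings in seconds
print(strong_scaling(t1=120.0, tp=20.0, p=8))      # (6.0, 0.75)
print(weak_scaling_efficiency(t1=30.0, tp=40.0))   # 0.75
```

An efficiency below 1.0 in either measure indicates that part of the added processing power is lost to serial work or communication.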

10 Scalability
Myth: the more we parallelize the code, the faster it runs. Ideally, 2x resources = 2x faster.
In reality: code is not 100% parallelizable, communication and I/O take time, and resources are limited.
Adding resources does bring an improvement, but the improvement is bounded.
σ: the fraction of the code that is not parallelizable.
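The bound on the improvement can be made concrete with the formula usually known as Amdahl's law. The slide's own formula is not part of this transcript, so the form used here, speedup = 1 / (σ + (1 − σ) / n), is an assumption; the function names are hypothetical.

```python
def amdahl_speedup(sigma, n):
    """Speedup on n processors when a fraction sigma of the code is serial."""
    return 1.0 / (sigma + (1.0 - sigma) / n)


def amdahl_limit(sigma):
    """Upper bound on the speedup as n grows without limit (sigma > 0)."""
    return 1.0 / sigma


sigma = 0.1  # hypothetical: 10% of the code cannot be parallelized
for n in (2, 10, 100, 1000):
    print(n, round(amdahl_speedup(sigma, n), 2))
print("limit:", amdahl_limit(sigma))
```

With σ = 0.1 the speedup never exceeds 10, no matter how many processors are added.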

11 Universal scalability law
Past a certain point, the more load the system receives, the less work it performs.
k: communication penalty coefficient.
Sweet spot: there is no point in adding more resources beyond it.
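A minimal sketch of the sweet spot, assuming the commonly cited form of the law, C(n) = n / (1 + σ(n − 1) + k·n(n − 1)), where σ is the non-parallelizable fraction and k the communication penalty coefficient. The formula itself is not in this transcript, and the coefficients below are hypothetical.

```python
import math


def usl_capacity(n, sigma, k):
    """Relative capacity at n processors under the assumed USL form."""
    return n / (1.0 + sigma * (n - 1) + k * n * (n - 1))


def usl_sweet_spot(sigma, k):
    """Processor count at which usl_capacity peaks (requires k > 0)."""
    return math.sqrt((1.0 - sigma) / k)


sigma, k = 0.05, 0.001  # hypothetical coefficients
peak = usl_sweet_spot(sigma, k)
print("sweet spot near n =", round(peak))
for n in (8, round(peak), 128):
    print(n, round(usl_capacity(n, sigma, k), 2))
```

Setting k = 0 removes the communication penalty and the curve flattens instead of declining; with k > 0 the capacity drops once n passes the sweet spot.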

12 Examples Community detection in social networks Weather forecast

13 Communication price
Communication → low speedup.
More processors → drop in speedup.
Advantage of the hybrid approach.

14 Communication advantage
Example: matrix multiplication.
OpenMP: for small dimensions, shared memory is an advantage; for large dimensions, the application does not scale.
MPI: for small dimensions, the communication cost dominates; for large dimensions, it scales (throughput, speedup).
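The MPI side of this trade-off can be illustrated with a toy cost model. Everything in it is an assumption made for illustration: a row-block distribution in which each process receives its rows of A and all of B and sends back its rows of C, plus made-up machine constants. It is not a measurement and not the lecture's benchmark.

```python
def matmul_times(n, p, t_flop=1e-9, latency=1e-4, t_word=1e-8):
    """Estimated (compute, communication) seconds per process.

    n: matrix dimension, p: number of processes.
    t_flop, latency, t_word are hypothetical machine constants.
    """
    compute = 2 * n**3 / p * t_flop
    if p == 1:
        return compute, 0.0
    # rows of A in (n*n/p), all of B in (n*n), rows of C out (n*n/p)
    words = n * n * (1 + 2 / p)
    communication = 3 * latency + words * t_word
    return compute, communication


for n in (100, 1000, 4000):
    compute, communication = matmul_times(n, p=4)
    print(n, "communication/compute =", communication / compute)
```

Computation grows with n³ while the data exchanged grows with n², so in this model the communication share shrinks as the matrices get larger.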

15 Impact of data & algorithm
For the same algorithm, different data can impact its scalability. Example: graph processing.
Platform: Amazon EC2 m3.large (2 Intel Xeon E cores, 7.5 GB RAM, 100GB SSD, 1 GB Ethernet).
2 data sets: CARN, WIKI.
No. of nodes: 3, 6, 9.
3 algorithms:
Hashtag Aggregation: at each step, compute a statistic about a given tag in the graph.
Meme Tracking: analyze meme spread in a graph.
TDSP (Time Dependent Shortest Path): used in routing; recompute the shortest path at each step.

16 Impact of data & algorithm
CARN: large diameter; node distribution: uniform.
WIKI: small diameter; node distribution: power law.
Idea: partition the graph across many processors. The number of interprocessor edges determines the communication volume, so increasing the number of partitions reduces scalability due to interprocessor communication (TDSP, MEME).
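The number of interprocessor edges is often called the edge cut of a partitioning. The sketch below counts it for a tiny hand-made graph; the graph, the partition assignments, and the function name are hypothetical and unrelated to the CARN and WIKI data sets.

```python
def edge_cut(edges, partition):
    """Count edges whose endpoints are assigned to different partitions.

    edges: iterable of (u, v) pairs.
    partition: dict mapping each vertex to its partition id.
    """
    return sum(1 for u, v in edges if partition[u] != partition[v])


# A ring of six vertices
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (0, 5)]
two_parts = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
three_parts = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2, 5: 2}

print(edge_cut(edges, two_parts))    # 2
print(edge_cut(edges, three_parts))  # 3
```

Even on this small ring, going from two to three partitions raises the number of edges that require interprocessor communication.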

17 Impact of data & algorithm
Phases of a run: setup, I/O, processing (% parallel), shutdown.
Example: detecting influence spread in parallel on large graphs.
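A minimal sketch of how these phases limit the overall speedup, under the assumption that setup, I/O, and shutdown do not benefit from extra processors and only the parallel share of the processing phase does. The function name and all timings are hypothetical.

```python
def job_time(setup, io, processing, shutdown, parallel_fraction, p):
    """Total run time on p processors.

    Only parallel_fraction of the processing phase is divided by p;
    every other phase is assumed to take the same time regardless of p.
    """
    parallel = processing * parallel_fraction
    serial = processing - parallel
    return setup + io + serial + parallel / p + shutdown


# Hypothetical phase durations in seconds, 90% of processing parallel
phases = dict(setup=5.0, io=15.0, processing=100.0, shutdown=5.0,
              parallel_fraction=0.9)
t1 = job_time(p=1, **phases)
for p in (2, 10, 100):
    tp = job_time(p=p, **phases)
    print(p, "speedup =", round(t1 / tp, 2))
```

Because the fixed phases stay constant, the end-to-end speedup is noticeably lower than the speedup of the processing phase alone.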

18 Impact of parallel APIs
Various MPI implementations

19 Impact of hardware platform
Example: weather forecast (WRF). Blue Gene scales well (no. of processors/speedup ratio).

21 Next lecture
Data analysis: independent, dependent, graphs, BSP model, data flows.
Heterogeneous vs. homogeneous data.
Processing platforms: MapReduce, Spark Streaming, Apache Giraph.

