By Christos Giavroudis. Dissertation submitted in partial fulfilment for the degree of Master of Science in Communication & Information Systems, Department of Informatics & Communications, TEI of Central Macedonia

 Suppose you need to place small objects into large containers of fixed size so that the wasted space is minimal, the fewest containers are used, and the whole process is completed as quickly as possible.  It is an optimization problem and belongs to the class of combinatorial NP-hard problems.

 There is a list of numbers called “weights”  These numbers represent objects that need to be packed into “bins” with a particular capacity  The goal is to pack the weights into the smallest number of bins possible

They have many applications, such as:  Objects coming down a conveyor belt need to be packed for shipping  A construction plan calls for small boards of various lengths, and you need to know how many long boards to order  Tour groups of various sizes need to be assigned to buses so that the groups are not split up  Placing computer files with specified sizes into memory blocks of fixed size  The recording of a composer’s music, where the lengths of the pieces to be recorded are the weights and the bin capacity is the amount of time that can be stored on an audio CD (80 minutes), and so on.

 There are many variations of this problem, such as 1-D, 2-D, 3-D, linear programming, packing by weight, packing by cost and so on. Because of this high diversity, the Bin Packing Problem touches many areas of our lives.

 In the literature, many heuristic algorithms have been developed to solve the Bin Packing Problem. Briefly, we mention that there are two major categories: the classified (sorted) methods and the unclassified (unsorted) methods.  The unclassified methods are Next Fit (NF), First Fit (FF) and Best Fit (BF), while the classified methods are Next Fit Decreasing (NFD), First Fit Decreasing (FFD) and Best Fit Decreasing (BFD). Best Fit Decreasing (BFD) was selected to be presented in this thesis.

 Pack the weights 5, 7, 3, 5, 6, 2, 4, 4, 7, and 4 into bins with capacity 10

 There are many possible solutions

 Pack the weights 5, 7, 3, 5, 6, 2, 4, 4, 7, and 4 into bins with capacity 10  There are many possible solutions

 We saw a solution with 5 bins  Is that the best possible solution?  If we add up all the weights, we get 5 + 7 + 3 + 5 + 6 + 2 + 4 + 4 + 7 + 4 = 47  The lower bound is 47/10 = 4.7, where 10 is the capacity and 47 is the sum of all weights.  So, the best we can hope for is 5 bins.
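Written generally (in our notation), for weights w_i and bin capacity C the lower bound used here is

\[
\mathrm{LB} = \left\lceil \frac{\sum_i w_i}{C} \right\rceil = \left\lceil \frac{47}{10} \right\rceil = 5 .
\]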

 One heuristic method for packing weights is to look for these “best fits”  Consider all of the bins and find the bin that can hold the weight and would have the least room left over after packing it  It is a one-at-a-time algorithm  We have to decide what to do with each weight, in order, before moving on to the next one

 Let’s start over and use the best fit decreasing algorithm  Sort the list of weights from the biggest to the smallest
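Before walking through the example step by step, here is a minimal MATLAB sketch of the serial best fit decreasing idea (an illustration only, not the thesis code shown later; the names weights, capacity and remaining are ours):

    weights  = [5 7 3 5 6 2 4 4 7 4];   % the example weights from the slides
    capacity = 10;                       % bin capacity

    w = sort(weights, 'descend');        % "decreasing" step: biggest weight first
    remaining = [];                      % remaining(k) = free room left in bin k

    for i = 1:numel(w)
        fits = find(remaining >= w(i));              % bins that can still hold w(i)
        if isempty(fits)
            remaining(end+1) = capacity - w(i);      % nothing fits: open a new bin
        else
            [~, idx] = min(remaining(fits) - w(i));  % best fit: least room left over
            remaining(fits(idx)) = remaining(fits(idx)) - w(i);
        end
    end

    fprintf('Bins used: %d (lower bound %.1f)\n', numel(remaining), sum(w)/capacity);

For the example weights this sketch ends up with 5 bins, matching the walkthrough that follows.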

 This time we need to keep track of how much room is left in each bin. When we consider a weight, we look at all the bins that have room for it…

 …and put it into the bin that will have the least room left over  In this case, we only have one bin, so the 7 goes in there.

 For the next weight, we don’t have a bin that has room for it, so we make a new bin

 So, we make a new bin

 None of our bins have room for our next weight  So we make a new bin

 None of our bins have room for the next weight  So we make a new bin.

 Bin #4 has the least room left over, so that’s where we put our next weight

 Bin #3 has the least room left, so that’s where we put our next weight

 The next weight doesn’t fit into any of our bins, so we need to make a fifth bin

 The next weight fits into Bin #5  The next weight can go into either Bin #1 or Bin #2.

 The last weight fits exactly into Bin #5.  The lower bound is 47/10 = 4.7  We got the best solution. This is called an optimal solution.

 If the list of weights is very long, or if the bin capacity is very large, this can be impractical.  The weights are tasks that need to be completed  The bins are “processors,” which are the agents (people, machines, teams, etc.) that will actually perform the tasks

 Let’s start over and use the best fit decreasing algorithm in a parallel implementation, using two cores and the same example.  Each core works with its own bins.  The data set is divided into two parts with a cyclic data partition (sketched below).  Each part of the data is mapped to a core.
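For illustration only (not the thesis code; the thesis' own data_partition function appears later, and the exact assignment drawn in the slides' figures may differ slightly), a simple round-robin split of the sorted example weights over two cores could look like this in MATLAB:

    w = sort([5 7 3 5 6 2 4 4 7 4], 'descend');  % sorted weights: 7 7 6 5 5 4 4 4 3 2
    part1 = w(1:2:end);   % core 1 takes elements 1, 3, 5, ... -> 7 6 5 4 3
    part2 = w(2:2:end);   % core 2 takes elements 2, 4, 6, ... -> 7 5 4 4 2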

 We have the same example  Sort the list of weights from the biggest to the smallest.

 This time we need to separate the data set into two parts.

 When we consider a weight, we look at all bins that have room for it, and put it into the bin that will have the least room left.

 Now, we look at the next weight in each of our data parts separately. None of our bins has room for the next weight.

 So we make a new bin in both data partitions.

 Now, in each core we put the weight into the new bin, which will have the least room left.

 We examine the next weights separately and simultaneously.

 We put them into Bin #3 and Bin #4 respectively, because these bins will have the least room left.

 After that, we again examine the next weights separately and simultaneously.

 So we make new bins and put the weights into them.

 Only the last weights remain. Bin #1 has the least room left over for the first part, so that’s where we put its weight. Likewise, Bin #2 has the least room left over for the second part, so that’s where we put its weight.

 We have the final result below

 This is a different result from the serial best fit decreasing algorithm, but it is also an optimal solution. The lower bound for the first part is 24/10 = 2.4. Likewise, the lower bound for the second part is 23/10 = 2.3.  We notice that the parallel implementation of the algorithm uses one more bin than the serial one, but it is executed in fewer steps.

 Will the serial execution of the algorithm or its parallelization be more beneficial to us? From the previous examples, we can conclude that:  For small datasets it is preferable to use the serial algorithm.  For huge datasets it is preferable to use the parallel algorithm.

 But to be sure, the algorithm should be run serially and then in parallel on different datasets.  In the parallel execution, we will run each dataset on a varying number of processors in order to compare our results.  Only then can we say whether parallel processing is beneficial or not.

 Parallel computing is the use of multiple processors to execute different parts of the same program concurrently.  A parallel computer is a collection of processing elements that can solve big problems quickly by means of well-coordinated collaboration.

 The implementation and the execution of the algorithms took place in the "Parallel and Distributed Processing" laboratory at the premises of the Department of Computer Engineering of the Technological Educational Institute of Central Macedonia.  Eight computers (32 processors in total) were used to run the algorithm; each processor runs at 2.4 GHz. We used the software package MATLAB (version 6.2).

To run applications on the MATLAB Distributed Computing Server, the following steps must be carried out:  Installation of the MDCE  Starting the MDCE  Setting up the network with Admin Center

 First, we set the current directory to \MATLAB\R2010a\toolbox\distcomp\bin  Afterwards, we run the following command in the MATLAB command window: !mdce install  Finally, we start the MDCE from the MATLAB command window with the command: !mdce start

 We run the command admincenter outside of the MATLAB environment.  This command is located in the directory \MATLAB\R2010a\toolbox\distcomp\bin

 We select Add or Find from the menu to define the hosts of the network.

 We create a job manager, called “BinPackingMan”, from the Start menu in the Job Manager section. In this job manager, we set up eight hostnames with 4 cores per hostname (PC).

 We close the Admin Center window and select the job manager from the Parallel menu in MATLAB.

 Parallel Computing Toolbox™ lets us solve computationally and data-intensive problems using multicore processors, GPUs, and computer clusters.  In total, we had 32 cores, so we could run the Best Fit Decreasing algorithm in parallel. We drew useful conclusions about the execution time, the total wastage, the total number of bins used, the speedup and the efficiency on datasets of different sizes.
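For reference, speedup and efficiency are used here in the usual sense (our notation): with T_1 the execution time on one core and T_p the time on p cores,

\[
S_p = \frac{T_1}{T_p}, \qquad E_p = \frac{S_p}{p} = \frac{T_1}{p\,T_p} .
\]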

function bfd(numberOfPackages,binSize,totalBins,package,totalWastage)
tic
i=1;
while i<=numberOfPackages
    currentPackage=0;                % room already used in the bin currently being filled
    while (currentPackage<binSize && i<=numberOfPackages)
        min=300;
        position=0;                  % position of minimum wastage
        for j=totalBins:-1:1         % search the already closed bins for the best fit
            if wastage(j)-package(i)>=0 && wastage(j)-package(i)<min
                min=wastage(j)-package(i);
                position=j;
            end
        end
        a=binSize-(currentPackage+package(i));
        if a>=0 && a<min             % the bin currently being filled is a better fit
            position=0;
        end
        if position~=0 && wastage(position)~=0
            wastage(position)=wastage(position)-package(i);   % place into the best closed bin
            i=i+1;
        else
            if currentPackage+package(i)<=binSize
                currentPackage=currentPackage+package(i);     % place into the current bin
                i=i+1;
            else
                break                                          % nothing fits: close the current bin
            end
        end
    end
    totalBins=totalBins+1;
    wastage(totalBins)=binSize-currentPackage;                 % record the room left in the closed bin
end

for i=1:totalBins
    totalWastage=totalWastage+wastage(i);
end
timerValue=toc;
str=sprintf('Total wastage: %d', totalWastage);
disp(str)
str=sprintf('Total bins used: %d', totalBins);
disp(str)
str=sprintf('Total time elapsed: %f', timerValue);
disp(str)
str=sprintf('Average values: %f', mean(package));
disp(str)
str=sprintf('Standard deviation: %f\n', std(package));
disp(str)
end
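As a quick usage example, calling the function above on the sorted example weights from the earlier slides, e.g. bfd(10,10,0,sort([5 7 3 5 6 2 4 4 7 4],'descend'),0), should report 5 bins used and a total wastage of 3.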

clear;
clc;
binSize=100;                                        % fixed size of bin
for i=[2^10 2^12 2^14 2^16]
    numberOfPackages=i;                             % number of packages for packing
    packarray=randi(binSize-1,numberOfPackages,1);  % table of packages of random size
    package=sort(packarray);                        % sort data table
    str=sprintf('dataset_%d',i);
    save(str);
end
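Running this script should produce the files dataset_1024.mat, dataset_4096.mat, dataset_16384.mat and dataset_65536.mat; the driver script shown below loads the 2^10 case (dataset_1024).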

function a=data_partition(labs,package)
n=length(package);
for i=1:labs
    for j=1:n/labs
        k=j+1;
        a(i,j)=package(labs*(k-1)-mod(i,labs));   % row i receives every labs-th package (cyclic assignment)
    end
end
end

function c=calc_wastage(binSize,data)
% Note: as listed, this function also uses labs, i and sub_package,
% which are not among its input arguments.
for j=1:length(data)/labs
    k=j+1;
    min=300;
    c(i,j)=min+sub_package(labs*(k-1)-mod(i,labs));
end
end

str=sprintf('-Best Fit Decreasing Algorithm-\n \n');
disp(str);
str1=sprintf('dataset_%d',2^10);
load(str1);
totalBins=0;      % total number of bins used
totalWastage=0;   % total wastage
bfd(numberOfPackages,binSize,totalBins,package,totalWastage)
n=length(package);
for tloop=1:10
    for labs=[1,2,4,8,16,32]
        matlabpool('open', 'local', labs);
        tic
        totalBins=0;      % total number of bins used
        totalWastage=0;   % total wastage
        sub_package=data_partition(labs,package);
        str=sprintf('----Distributed Best Fit Decreasing Algorithm for clones:---%d',labs);
        disp(str)
        part=n/labs;

        spmd(labs)
            for p=1:labs
                temp=sub_package(p,:);
                if (labindex==p)
                    bfd(part,100,0,temp,0);   % each lab packs only its own part of the data
                end
            end
        end
        tt(tloop,labs)=toc;
        matlabpool('close');
    end
end
for labs=[1,2,4,8,16,32]
    at(labs)=(sum(tt(:,labs))-max(tt(:,labs))-min(tt(:,labs)))/8   % average of the 10 runs without the best and worst time
end
save results_2_10

The command matlabpool enables the parallel language features in the MATLAB language by starting a parallel job that connects the MATLAB client with a number of labs. matlabpool('open', 'local', labs); starts a worker pool using the local parallel configuration, with labs workers. matlabpool('close'); stops the worker pool, destroys the parallel job, and makes all parallel language features revert to using the MATLAB client for computing their results.

Total wastage, by dataset (2^10, 2^12, 2^14, 2^16) and configuration (serial and 1, 2, 4, 8, 16 cores).

Total number of bins used, by dataset (2^10, 2^12, 2^14, 2^16) and configuration (serial and 1, 2, 4, 8, 16 cores).

Total execution time (seconds)
Dataset          core 1     core 2     core 4     core 8     core 16    core 32
2^10 dataset     0.1082     0.093      0.105      0.1238     0.1584     0.2307
2^12 dataset     0.2883     0.1418     0.1257     0.1453     0.1953     0.2763
2^14 dataset     3.1        0.86056    0.3377     0.257      0.3459     0.5208
2^16 dataset     47.7559    12.1252    3.405      1.2479     1.0573     1.5077
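As an example of reading the speedup and efficiency off this table, taking the core-1 time of the 2^16 dataset as the baseline T_1:

\[
S_{32} = \frac{T_1}{T_{32}} = \frac{47.7559}{1.5077} \approx 31.7, \qquad E_{32} = \frac{S_{32}}{32} \approx 0.99 .
\]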

Thank you!