0/1 Knapsack Problem
Alex Bolsoy, Jonathan Suggs, Casey Wenner
(Presenter: Casey)
Abstract
The intent of this project is to examine several non-trivial algorithms for the 0/1 knapsack problem. The project tests the effectiveness of simulated annealing, dynamic programming, and genetic algorithms. The algorithms are compared by the value of the solutions they find and by their time and memory consumption.
(Presenter: Casey)
Introduction
The knapsack problem is an optimization problem. We are given a set of items and a knapsack with a weight limit, and each item has a weight and a dollar value. The goal is to place items in the knapsack so that their total dollar value is as large as possible, while the combined weight of the chosen items stays less than or equal to the knapsack's weight limit.
(Presenter: Casey)
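Formal Problem Statement
Given n items numbered 1 to n, each with a weight $w_i$ and a value $v_i$, and a maximum weight capacity $W$, let $x_i$ be the number of instances of item i placed in the knapsack (in the 0/1 problem, $x_i \in \{0, 1\}$):

$$\text{maximize } V = \sum_{i=1}^{n} v_i x_i \quad \text{subject to} \quad \sum_{i=1}^{n} w_i x_i \le W, \qquad x_i \in \{0, 1\}$$

Source: Wikipedia.org
(Presenter: Johnny)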
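The formal statement can be checked directly by enumerating every selection vector x and keeping the best feasible V. This is a minimal sketch for illustration only: it takes $2^n$ steps, so it is usable only for tiny instances, and the function names and the four-item instance are hypothetical, not part of the project's test data.

```python
from itertools import product


def total(items, x):
    """Return (total weight, total value V) of selection vector x, where x[i] is 0 or 1."""
    weight = sum(w for (w, _), xi in zip(items, x) if xi)
    value = sum(v for (_, v), xi in zip(items, x) if xi)
    return weight, value


def brute_force(items, capacity):
    """Try all 2^n selections and keep the best one whose weight is <= capacity."""
    best_value, best_x = 0, (0,) * len(items)
    for x in product((0, 1), repeat=len(items)):
        weight, value = total(items, x)
        if weight <= capacity and value > best_value:
            best_value, best_x = value, x
    return best_value, best_x


# Hypothetical toy instance: (weight, value) pairs with capacity W = 10.
items = [(5, 10), (4, 40), (6, 30), (3, 50)]
print(brute_force(items, 10))  # (90, (0, 1, 0, 1))
```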
Test Set Data
Problem instances from Pisinger's website:
- Uncorrelated instance: 100 items, no correlation between the weight and value of items, knapsack capacity 995
- Correlated instance: 100 items, weight and value of items correlated, knapsack capacity 900
(Presenter: Alex)
Context
- Combinatorial optimization problem, with applications in resource allocation across a variety of industries.
- Extensively studied; early work dates back to 1897. The name "knapsack problem" comes from early work by the mathematician Tobias Dantzig.
- NP-hard: no known algorithm is both correct and polynomial in time.
- David Pisinger: Danish computer science researcher (University of Copenhagen) who has done extensive research on the problem, including the 2004 paper "Where are the hard knapsack problems?"
Source: https://en.wikipedia.org/wiki/Knapsack_problem
(Presenter: Alex)
Experimental Procedure
Simulated Annealing
Used to approximate the global maximum in a fixed amount of time. It builds on hill climbing, but sometimes accepts a worse total value (V), which lets the algorithm explore other options instead of settling on a local maximum.
Simulated Annealing
While T > Tmin:
  1. Randomly select an item to put in the knapsack.
  2. While the new knapsack weight > W, randomly remove items until the knapsack weight <= W.
  3. Compare the new V with the previous V.
     - If the new V is better, accept the new combination.
     - Otherwise (worse V), accept or reject the new combination with a probability determined by the temperature and the difference between the previous V and the current V.
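The loop above can be sketched as follows. This is a minimal sketch under stated assumptions, not the project's code: items are passed as parallel `weights`/`values` lists, the logarithmic cooling schedule is the one given on the next slide, and the defaults `t0=10.0`, `t_min=1.0` and the `seed` parameter are hypothetical choices. It also keeps a best-so-far copy, since the current combination may drift to a worse V.

```python
import math
import random


def simulated_annealing(weights, values, capacity, t0=10.0, t_min=1.0, seed=None):
    """Simulated annealing for 0/1 knapsack; t0, t_min and seed are assumed defaults."""
    rng = random.Random(seed)
    n = len(weights)                      # assumes n >= 1 and capacity >= 0
    current = [False] * n                 # current selection (starts empty)
    cur_w, cur_v = 0, 0                   # its total weight and total value V
    best, best_v = current[:], 0          # best feasible selection seen so far
    k, t = 0, t0
    while t > t_min:
        cand = current[:]                 # O(n) copy per iteration
        w, v = cur_w, cur_v
        i = rng.randrange(n)              # randomly select an item to put in
        if not cand[i]:
            cand[i] = True
            w += weights[i]
            v += values[i]
        while w > capacity:               # randomly remove items until weight <= W
            j = rng.choice([idx for idx in range(n) if cand[idx]])
            cand[j] = False
            w -= weights[j]
            v -= values[j]
        # Accept a better V; accept a worse V with probability p = e^((V_new - V_prev) / T_k).
        if v >= cur_v or rng.random() < math.exp((v - cur_v) / t):
            current, cur_w, cur_v = cand, w, v
            if cur_v > best_v:
                best, best_v = current[:], cur_v
        k += 1
        t = t0 / (1 + math.log(1 + k))    # logarithmic cooling: T_k = T0 / (1 + ln(1 + k))
    return best_v, best
```

For example, `simulated_annealing([5, 4, 6, 3], [10, 40, 30, 50], 10, seed=1)` runs about 8,100 iterations with these defaults, and every selection it returns satisfies the weight limit because the removal loop repairs each candidate before it is compared.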
Simulated Annealing
The temperature is set by a cooling schedule: start at T0 and stop at Tmin, with T decreasing logarithmically as the iteration count k grows.

$$T_k = \frac{T_0}{1 + \ln(1 + k)}, \qquad p = e^{(V_{current} - V_{previous}) / T_k}$$

If chance < p, accept the worse value.
T = temperature, T0 = initial temperature, Tmin = final temperature, k = iteration, p = acceptance probability, chance = random float with 0 <= chance <= 1
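The cooling schedule also fixes how long the search runs. Solving $T_k \le T_{\min}$ for $k$ uses only the symbols defined above; the numeric values below are illustrative and are not the parameters used in the experiments.

$$\frac{T_0}{1 + \ln(1 + k)} \le T_{\min} \iff 1 + \ln(1 + k) \ge \frac{T_0}{T_{\min}} \iff k \ge e^{T_0/T_{\min} - 1} - 1$$

For example, $T_0 = 10$ and $T_{\min} = 2$ give $k \ge e^{4} - 1 \approx 53.6$, so the loop stops after 54 iterations, while $T_0 = 10$ and $T_{\min} = 1$ give $k \ge e^{9} - 1 \approx 8102.1$: the iteration count grows exponentially in $T_0 / T_{\min}$. Similarly, a move that lowers V by 20 at $T_k = 10$ is accepted with probability $p = e^{-20/10} = e^{-2} \approx 0.135$.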
Simulated Annealing Analysis
- Time: O(n·k), polynomial, where n = number of items (each iteration copies the item list with deepcopy()) and k = number of iterations
- Memory: 2 arrays of length n and 3 lists
- Solution quality: on the uncorrelated data, about 95% of the best V at maximum W and maximum n; on the correlated data, about 99.7% of the best V at maximum W and maximum n
Simulated Annealing Results (charts: Uncorrelated Data, Correlated Data)
Genetic Algorithm
1. Create a population of viable solutions.
2. Crossover between viable solutions: favor higher-V solutions, but include some lower-V solutions for diversity.
3. Mutation: a relatively small chance to change random values in a solution, which prevents stagnation.
4. Sort solutions by fitness value.
5. Repeat.
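The steps above can be sketched as a small genetic algorithm. This is a hedged sketch, not the project's implementation: the fitness penalty (an overweight selection scores 0), the random-fill initialization, single-point crossover, and all parameter defaults (`pop_size`, `generations`, `elite_frac`, `lucky_frac`, `mutation_rate`) are assumptions chosen for illustration.

```python
import random


def fitness(ind, weights, values, capacity):
    """Total value V of a 0/1 selection, or 0 if it exceeds the capacity (assumed penalty)."""
    weight = sum(w for w, bit in zip(weights, ind) if bit)
    if weight > capacity:
        return 0
    return sum(v for v, bit in zip(values, ind) if bit)


def random_viable(weights, capacity, rng):
    """A viable (feasible) individual: add items in random order while they still fit."""
    n = len(weights)
    ind, weight = [0] * n, 0
    for i in rng.sample(range(n), n):
        if weight + weights[i] <= capacity:
            ind[i] = 1
            weight += weights[i]
    return ind


def genetic(weights, values, capacity, pop_size=50, generations=200,
            elite_frac=0.3, lucky_frac=0.1, mutation_rate=0.01, seed=None):
    """GA sketch; every parameter default here is a hypothetical choice."""
    rng = random.Random(seed)
    n = len(weights)

    def fit(ind):
        return fitness(ind, weights, values, capacity)

    pop = [random_viable(weights, capacity, rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fit, reverse=True)                  # sort by fitness value
        n_elite = max(2, int(elite_frac * pop_size))
        parents = pop[:n_elite]                          # favor higher-V solutions
        rest = pop[n_elite:]
        n_lucky = min(len(rest), int(lucky_frac * pop_size))
        parents += rng.sample(rest, n_lucky)             # some lower-V ones for diversity
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n) if n > 1 else 0    # single-point crossover
            child = a[:cut] + b[cut:]
            for i in range(n):                           # mutation: rare bit flips
                if rng.random() < mutation_rate:
                    child[i] ^= 1
            children.append(child)
        pop = parents + children
    best = max(pop, key=fit)
    return fit(best), best
```

Because the top-ranked parents survive unchanged into the next generation, the best fitness never decreases; the fixed `generations` count is the simple termination rule that the Future Work slide suggests making more dynamic.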
Uncorrelated Data Set, Static Iteration Count
List Length   Value      Time
10            2287       0.653883
20            4155.26    0.797712
30            5493.738   0.936894
40            7667.735   1.060108
50            8294.397   1.143656
60            8492.328   1.229237
70            8840.333   1.359726
80            8721.483   1.368402
90            8542.775   1.509574
100           8258.172   1.595548
Uncorrelated Data Set, Static Iteration Count
Capacity   Value      Time
100        1794.702   1.282
200        3113.007   1.346841
300        3991.588   1.373803
400        4775.997   1.403519
500        5464.831   1.427031
600        6078.303   1.454958
700        6681.269   1.480598
800        7227.573   1.49989
900        7733.83    1.529559
995        8185.625   1.544789
Correlated Data Set, Static Iteration Count
Capacity   Value   Time
100        380     1.299699
200        760     1.326531
300        1330    1.297973
400        1710    1.328441
500        2280    1.346761
600        2660    1.376863
700        3230    1.340922
800        3610    1.36154
900        3990    1.398523
Correlated Data Set, Static Iteration Count
List Size   Value     Time
10          3800      0.682154
20          3922.74   0.783603
30          3990      0.810822
40          -         0.901923
50          -         0.955656
60          -         1.028064
70          -         1.113356
80          -         1.228072
90          -         1.286974
100         -         1.406888
Uncorrelated Data Set, Proportional Iteration Count
List Size   Value      Time
10          2287       1.912397
20          4156       2.340415
30          5500       2.772088
40          7714.038   3.277923
50          8355.862   3.630583
60          8635.286   3.823392
70          9053.4     4.063441
80          9019.122   4.278554
90          8976.315   4.497939
100         8841.299   4.651179
Uncorrelated Data Set, Proportional Iteration Count
Capacity   Value      Time
100        1861.599   3.69363
200        3281.717   3.893114
300        4255.155   4.032618
400        5066.483   4.177177
500        5775.471   4.344271
600        6548.893   4.443439
700        7251.734   4.543662
800        7815.813   4.605462
900        8400.526   4.690472
995        8854.226   4.86759
Genetic Algorithm Analysis
- Time: O(F · (C + M)), where
  - F, the fitness method, is O(n), with n the number of items in a list
  - C, the crossover method, is O(n log n), with n the number of items in the data set
  - M, the mutation method, is O(1)
- Simplified Big-O: O(n log n), polynomial time
- Memory: a list of size 2·P·n, where P is the population size and n is the number of items
- Solution quality: varies significantly with running time and type of data
Dynamic Programming
- Build a matrix of the best total values achievable for each number of items considered and each capacity from 0 to W, filling it bottom-up.
- Backtrack over the matrix to determine which items make up the optimal solution.
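A bottom-up table with backtracking can be sketched as below. This is the textbook formulation under the assumption that weights and the capacity are non-negative integers; the function name and the toy instance are hypothetical, not taken from the project's code.

```python
def knapsack_dp(weights, values, capacity):
    """Bottom-up 0/1 knapsack: returns (best value, sorted indices of chosen items)."""
    n = len(weights)
    # table[i][w] = best value using only the first i items with capacity w
    table = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        wi, vi = weights[i - 1], values[i - 1]
        for w in range(capacity + 1):
            table[i][w] = table[i - 1][w]                    # option 1: skip item i
            if wi <= w and table[i - 1][w - wi] + vi > table[i][w]:
                table[i][w] = table[i - 1][w - wi] + vi      # option 2: take item i
    # Backtrack: if the value changed from row i-1 to row i, item i was taken.
    chosen, w = [], capacity
    for i in range(n, 0, -1):
        if table[i][w] != table[i - 1][w]:
            chosen.append(i - 1)
            w -= weights[i - 1]
    return table[n][capacity], sorted(chosen)


print(knapsack_dp([5, 4, 6, 3], [10, 40, 30, 50], 10))  # (90, [1, 3])
```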
Dynamic Programming
Analysis
- Time: O(n·W) to build the matrix, where n = number of items and W = knapsack capacity; this is pseudo-polynomial time
- Memory: a multidimensional integer list of size n·W
- Solution quality: finds the optimal solution every time
- Difficulty: relatively easy to understand and implement in code
Pseudo-polynomial time: in computational complexity theory, a numeric algorithm runs in pseudo-polynomial time if its running time is polynomial in the numeric value of the input (the largest integer present in the input), but not necessarily in the length of the input (the number of bits required to represent it), as it would be for a polynomial-time algorithm. Because the numeric value of the input is in general exponential in the input length, a pseudo-polynomial-time algorithm does not necessarily run in polynomial time with respect to the input length.
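A short size estimate makes the pseudo-polynomial point concrete; b, the number of bits used to write W, is notation introduced here. Since $W < 2^{b}$,

$$O(n \cdot W) \subseteq O(n \cdot 2^{b})$$

which is exponential in the length of the input. For the uncorrelated test instance, n = 100 and W = 995 (b = 10 bits), and a table with an extra zero row and column holds $(n + 1)(W + 1) = 101 \times 996 = 100{,}596$ cells; each extra bit in W can roughly double that count.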
Uncorrelated Data Set
Input   Time       Maximum
-       0.000248   -
10      0.011598   2287
20      0.020618   4156
30      0.031015   5500
40      0.044474   7758
50      0.054495   8373
60      0.064771   8709
70      0.075074   9147
80      0.085088   -
90      0.096835   -
100     0.113236   -
* Input = (Number of Items)

Input   Time       Maximum
-       0.006473   -
100     0.008405   2156
200     0.01211    3544
300     0.020438   4452
400     0.027101   5252
500     0.040692   5978
600     0.04709    6826
700     0.056098   7552
800     0.07202    8150
900     0.090205   8719
995     0.109026   9147
* Input = (Knapsack Capacity)
Uncorrelated Data Set
Input   Input2   Count   Time       Maximum
-       -        1       7.21E-06   -
100     10       2       0.000205   914
200     20       3       0.001252   1915
300     30       4       0.00378    2822
400     40       5       0.010348   4513
500     50       6       0.017388   5447
600     60       7       0.027871   6624
700     70       8       0.04275    7552
800     80       9       0.063355   8150
900     90       10      0.094119   8719
995     100      11      0.1102     9147
* Input = (Knapsack Capacity), Input2 = (Number of Items)
Correlated Data Set
Input   Time       Maximum
-       0.000692   -
10      0.014425   3800
20      0.01984    3990
30      0.030385   -
40      0.040161   -
50      0.05156    -
60      0.061502   -
70      0.072645   -
80      0.082975   -
90      0.093334   -
100     0.105231   -
* Input = (Number of Items)

Input   Time       Maximum
-       0.006303   -
100     0.008214   380
200     0.012702   760
300     0.020595   1330
400     0.031122   1710
500     0.045495   2280
600     0.058799   2660
700     0.070583   3230
800     0.085894   3610
900     0.105579   3990
* Input = (Knapsack Capacity)
Correlated Data Set
Input   Input2   Count   Time         Maximum
-       10       1       7.81E-05     -
100     20       2       0.00078581   380
200     30       3       0.00320459   760
300     40       4       0.00689188   1330
400     50       5       0.0138355    1710
500     60       6       0.02534489   2280
600     70       7       0.03827623   2660
700     80       8       0.05996146   3230
800     90       9       0.09006863   3610
900     100      10      0.11723887   3990
* Input = (Knapsack Capacity), Input2 = (Number of Items)
Comparisons
Interpretation/Conclusions
Genetic algorithm
- Slowest
- Moderately difficult to implement
- Not very good for the knapsack problem
Simulated annealing
- Fastest algorithm, but does not always find the very best answer
- Difficult to implement
- Best option for large data sets
Dynamic programming
- Most accurate algorithm: finds the optimal answer every time
- Runs out of memory on extremely large data sets
- Relatively easy to implement
- Best option for most data sets
(Presenter: Alex)
As long as the knapsack capacity is smaller than the population size, dynamic programming outperforms the genetic algorithm. Once the capacity exceeds the population size, however, dynamic programming requires far more operations and memory than the genetic algorithm.
Source: http://www.micsymposium.org/mics_2005/papers/paper102.pdf
Future Work
- Modified simulated annealing algorithm: try restarts and an exponential cooling schedule.
- Modified dynamic programming algorithm: reduce the size of the matrix by computing only the necessary elements.
- Modified genetic algorithm: develop a more dynamic termination method, and optimize the population size and mutation chance.
- Combinations of different algorithms.
- Apply the algorithms to the TSP.
(Presenter: Johnny)
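One way to compute only the necessary matrix elements is top-down memoization: starting from (n, W), only the (items, remaining capacity) states that the recursion actually reaches are evaluated and cached. This is a hedged sketch of that idea using Python's `functools.lru_cache`; it is one possible approach rather than the modification the project planned, and the function name and toy instance are hypothetical.

```python
from functools import lru_cache


def knapsack_memo(weights, values, capacity):
    """Top-down 0/1 knapsack: returns (best value, number of (i, w) states computed)."""

    @lru_cache(maxsize=None)
    def best(i, w):
        # Best value using the first i items with remaining capacity w.
        if i == 0:
            return 0
        skip = best(i - 1, w)
        wi = weights[i - 1]
        if wi > w:
            return skip
        return max(skip, best(i - 1, w - wi) + values[i - 1])

    result = best(len(weights), capacity)
    states = best.cache_info().currsize   # cells actually computed
    return result, states


print(knapsack_memo([5, 4, 6, 3], [10, 40, 30, 50], 10))
```

On this toy instance the recursion evaluates 23 of the 55 cells that the full (n + 1) × (W + 1) table would hold. The recursion depth equals n, so very large item lists may need an iterative variant instead.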
Five Questions
1. What type of problem is the knapsack problem? An optimization problem.
2. What is the computational complexity of the optimization form of the knapsack problem? NP-hard.
3. What is the Big-O of a dynamic programming algorithm for the knapsack problem? O(n·W), where n is the number of items and W is the knapsack capacity.
4. What is the Big-O of the genetic algorithm for the knapsack problem? O(n log n).
5. Which of the algorithms we studied is least effective for the knapsack problem? The genetic algorithm.
Works Cited
https://en.wikipedia.org/wiki/Knapsack_problem
https://www.sciencedirect.com/science/article/pii/S0362546X01006587?via%3Dihub
http://www.dcs.gla.ac.uk/~pat/cpM/jchoco/knapsack/papers/hardInstances.pdf
http://artemisa.unicauca.edu.co/~johnyortega/instances_01_KP/
http://hjemmesider.diku.dk/~pisinger/codes.html
http://www.micsymposium.org/mics_2005/papers/paper102.pdf
https://www.dataminingapps.com/2017/03/solving-the-knapsack-problem-with-a-simple-genetic-algorithm/
https://en.wikipedia.org/wiki/Simulated_annealing
http://what-when-how.com/artificial-intelligence/a-comparison-of-cooling-schedules-for-simulated-annealing-artificial-intelligence/
https://www.ida.liu.se/~zebpe83/heuristic/lectures/SA_lecture.pdf