1 Sometimes it Pays to be Greedy: Greedy Algorithms in Economic Epidemiology Fred Roberts, DIMACS.


2 Optimization Problems in Economic Epidemiology Many problems in Economic Epi can be formulated as optimization problems: Find a solution that maximizes or minimizes some value. Find the optimal location for a hospital. Find the optimal assignment of health care workers to jobs. Optimize investment in health care supplies. Minimize the total cost of a series of medical tests or public health interventions. Control an outbreak with as small an investment in vaccines as possible.

3 Greedy Algorithms Often, the simplest approach to an optimization problem is a greedy algorithm: choose the best (cheapest, highest-rated, …) available alternative at each step. In general, greedy algorithms find locally optimal solutions, which are not necessarily globally optimal. [Figure labels: local optimum, global optimum]

4 Greedy Algorithms We give examples from Economic Epi: – Some where a greedy solution achieves a global optimum – Others where it doesn’t, but we can either make modifications or get a bound on how far from optimal we are.

5 Outline 1. Four Applications of Classical Operations Research Methods 2. Vaccination Strategies for Control of a Highly Infectious Disease Spreading through a Social Network 3. Algorithms for Sequential Public Health or Medical Decision Making

6 Classic Example I: Assigning Health Care Workers to Jobs There are n workers $W_1, W_2, \dots, W_n$ and m jobs $J_1, J_2, \dots, J_m$. We know which workers are qualified to do which jobs and the cost of using each worker. Goal: assign workers to jobs they are qualified for, each to at most one job, filling as many jobs as possible, and among all ways of filling as many jobs as possible, find the one with minimum total cost. This is known as the Minimum Cost Assignment Problem.

7 Assigning Health Care Workers to Jobs Greedy Algorithm : At each stage, add the least expensive worker to those getting job assignments if there is an acceptable (feasible) assignment using that worker and all those who have previously been assigned jobs, switching job assignments if necessary. The greedy algorithm always gives an optimal job assignment.
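The greedy rule on this slide can be sketched in Python. This is only an illustration under stated assumptions: the function name `greedy_assignment`, the dictionary layout, and the worker and job labels are hypothetical, and the "switching job assignments if necessary" step is implemented here with a standard augmenting-path search from bipartite matching, which the slide does not spell out.

```python
def greedy_assignment(cost, qualified):
    """Greedy sketch: consider workers from cheapest to most expensive and keep
    a worker whenever a feasible assignment still exists for that worker plus
    everyone already kept (re-assigning earlier workers if needed).

    cost:      dict worker -> cost of using that worker (hypothetical layout)
    qualified: dict worker -> list of jobs the worker is qualified for
    Returns a dict worker -> job.
    """
    job_of = {}     # worker -> job currently assigned
    worker_of = {}  # job -> worker currently assigned

    def try_assign(worker, seen):
        # Augmenting-path search: give `worker` a free job, or take a job
        # from an earlier worker who can be switched to another job.
        for job in qualified[worker]:
            if job in seen:
                continue
            seen.add(job)
            if job not in worker_of or try_assign(worker_of[job], seen):
                worker_of[job] = worker
                job_of[worker] = job
                return True
        return False

    for worker in sorted(cost, key=cost.get):  # least expensive first
        try_assign(worker, set())
    return job_of


# Hypothetical example: A is cheapest and is switched from J1 to J2 so that B fits.
example = greedy_assignment({"A": 1, "B": 2}, {"A": ["J1", "J2"], "B": ["J1"]})
```

In the example, worker A first takes J1 and is then moved to J2 so that B can also be assigned, which is the switching behavior the slide describes.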

8 Classic Example II: Investing in Health Care Options Suppose we are faced with a selection of health care options in which to invest. Option i has an estimated cost $c_i$ and an estimated value $v_i$. – Alternative health care facilities – Alternative supplies for a clinic – Alternative research programs Problem: Determine which ones to invest in so that the total cost is within budget and the total value is as large as possible. Example: AIDS Prevention Options. Option 1: Condoms. Option 2: Educational Posters. Option 3: Clean Needles to Distribute. Option 4: Testing. Option 5: Funded Researchers.

9 Investing in Health Care Options Knapsack Problem: Maximize $\sum_i v_i x_i$ subject to $\sum_i c_i x_i \le B$, where $x_i$ = number of copies of item $i$ chosen. Variants: $x_i \in \{0, 1\}$ (0-1 Knapsack Problem); $x_i \in \{0, 1, \dots, b_i\}$ (Bounded Knapsack Problem); $x_i$ is any nonnegative integer (Unbounded Knapsack Problem).

10 Investing in Health Care Options Greedy Algorithm Due to George Dantzig (1957). Sort items in decreasing order of value per unit cost, $v_i / c_i$. Take as many copies of the first item as are allowed without violating $\sum_i c_i x_i \le B$. Continue in the same way with the second item, then the third, etc. For the unbounded knapsack problem, this algorithm always achieves at least half of the value obtained by the optimal solution. Is this acceptable? It depends on the application: do you need a fast decision?
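A minimal Python sketch of the greedy rule for the unbounded variant follows. The function name, the tuple layout, and the item names and numbers are hypothetical, chosen so that the greedy answer is visibly short of the optimum while still above half of it.

```python
def greedy_unbounded_knapsack(items, budget):
    """Greedy sketch for the unbounded knapsack problem.

    items:  list of (name, value, cost) tuples with cost > 0 (hypothetical layout)
    budget: the bound B on total cost
    Returns (counts, total_value), where counts maps item name -> copies taken.
    """
    counts = {}
    remaining = budget
    total_value = 0
    # Decreasing order of value per unit cost, v_i / c_i.
    for name, value, cost in sorted(items, key=lambda it: it[1] / it[2], reverse=True):
        copies = int(remaining // cost)  # as many copies as still fit
        if copies > 0:
            counts[name] = copies
            remaining -= copies * cost
            total_value += copies * value
    return counts, total_value


# Hypothetical data: x has the better ratio (11/6), so greedy takes one x for value 11,
# while two copies of y (cost 10, value 18) would have been better.
counts, value = greedy_unbounded_knapsack([("x", 11, 6), ("y", 9, 5)], budget=10)
```

Here the greedy value of 11 is below the best possible 18 but above half of it, in line with the guarantee stated on the slide.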

11 Classic Example III: Locating Health Care Facilities We have a number of users of a planned set of health care facilities. Where do we put the facilities and how do we assign a user to a facility?

12 Locating Health Care Facilities There are two costs: $f_i$ = cost of opening a facility at $i$; $c_{ij}$ = cost of sending the user at $j$ to the facility at $i$. Let F = sum of $f_i$ over all opened facilities. Let C = sum of the costs $c_{ij}$ over all users $j$, where $i$ is the facility to which user $j$ is assigned. We want to minimize F + C. Assume that there is no limit to the number of facilities we might open. However, there is a tradeoff between the increased cost of more facilities and the decreased cost of getting to a nearby facility. This is the Uncapacitated Facility Location Problem. Uncapacitated since we have no limit on the number of facilities.

13 Locating Health Care Facilities Given users at the red circled locations, where do we locate facilities to minimize F + C? Numbers on edges are costs of moving along the edge. [Figure: network on nodes a, b, c, d, e, f; labels in the figure read 1, 1, 1, 1, 1, 1 and Cost 0.5, Cost 1, Cost 3, Cost 4, Cost 5, Cost 6.]

14 Locating Health Care Facilities Greedy Algorithm Due to Charikar and Guha (2004) First find a preliminary solution S. Order the nodes of the network in order of increasing cost of locating a facility at the node. Choose p so that if S is the set of the first p facilities, then the cost F + C associated with S is as small as possible. Modify the preliminary solution in a series of steps by randomly selecting nodes to add to S and subsets of nodes to remove from S.

15 Locating Health Care Facilities Charikar and Guha show that, given $\varepsilon > 0$, the algorithm is guaranteed to achieve a cost F + C that is at most $2F^* + 3C^* + \varepsilon (F^* + C^*)$ in at most $O(n \log(n/\varepsilon))$ steps, where $F^*$ and $C^*$ are the costs associated with an arbitrary optimal solution.
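The preliminary step of the algorithm described on slide 14 (order sites by opening cost, then pick the best prefix) can be sketched in Python. This covers only that first step, not the randomized add/remove phase; the function names, dictionary layout, and the small data set are hypothetical.

```python
def total_cost(open_sites, f, c, users):
    """F + C for a set of open sites, with each user sent to the cheapest open site."""
    F = sum(f[i] for i in open_sites)
    C = sum(min(c[i][j] for i in open_sites) for j in users)
    return F + C


def preliminary_solution(f, c, users):
    """Order sites by increasing opening cost and return the prefix of length p
    (p >= 1) that minimizes F + C. Sketch of the preliminary step only.

    f: dict site -> opening cost f_i
    c: dict site -> dict user -> cost c_ij of sending user j to site i
    """
    order = sorted(f, key=f.get)
    best_p = min(range(1, len(order) + 1),
                 key=lambda p: total_cost(order[:p], f, c, users))
    return order[:best_p]


# Hypothetical data: two cheap sites each close to one user, one expensive central site.
f = {"a": 1, "b": 2, "c": 10}
c = {"a": {"u1": 1, "u2": 8}, "b": {"u1": 8, "u2": 1}, "c": {"u1": 0, "u2": 0}}
users = ["u1", "u2"]
S = preliminary_solution(f, c, users)
```

With these numbers the prefixes of length 1, 2 and 3 cost 10, 5 and 13, so the sketch returns the first two sites.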

16 Classic Example IV: Rerouting Emergency Vehicles in Case of Floods New initiative in Climate and Health at DIMACS.

17 Extreme Events due to Global Warming We anticipate an increase in number and severity of extreme events due to global warming. More heat waves. More floods, hurricanes.

18 Extreme Events due to Global Warming Areas of Emphasis in the DIMACS Climate & Health Initiative: evacuations during extreme heat events; rolling power blackouts during extreme heat events; pesticide applications after floods; emergency vehicle rerouting after floods.

19 Minimum Spanning Tree Problem A spanning tree is a tree using the edges of the graph and containing all of the nodes. It is minimum if the sum of the numbers on the edges used is as small as possible. Red edges define a minimum spanning tree. [Figure: graph with edge weights 2, 8, 10, 14, 15, 16, 20, 22, 26, 28.]

20 Minimum Spanning Tree Problem Minimum spanning trees arise in many applications. One example: Given a road network, find usable roads that allow you to go from any node to any other node, minimizing the lengths of the roads used. This problem arises in the DIMACS Climate and Health project: Find a usable road network for emergency vehicles in case extreme events leave flooded roads.

21 Minimum Spanning Tree Problem Kruskal’s algorithm (greedy algorithm): –List the edges in order of increasing weight. –For each edge, greedily include it if it does not form a cycle with edges already chosen. –Stop when no more edges can be included. Kruskal’s algorithm gives an optimal solution.
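Kruskal's steps can be sketched in Python. The cycle test is done with a union-find structure, which is a common implementation choice rather than something stated on the slide; the function name, edge layout, and the small road network are hypothetical.

```python
def kruskal(nodes, edges):
    """Kruskal's greedy algorithm (sketch).

    nodes: iterable of node labels
    edges: list of (weight, u, v) tuples
    Returns the chosen edges as (u, v, weight) tuples.
    """
    parent = {v: v for v in nodes}

    def find(v):
        # Root of v's component, with path halving to keep lookups fast.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree = []
    for weight, u, v in sorted(edges):  # increasing weight
        root_u, root_v = find(u), find(v)
        if root_u != root_v:            # the edge does not close a cycle
            parent[root_u] = root_v
            tree.append((u, v, weight))
    return tree


# Hypothetical road network: the edge a-c would close a cycle and is skipped.
roads = [(1, "a", "b"), (2, "b", "c"), (3, "a", "c"), (4, "c", "d")]
mst = kruskal("abcd", roads)
```

In the flooded-roads setting, one would presumably pass only the edges that remain usable.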

22 Vaccination Strategies for Control of a Highly Infectious Disease Spreading through a Social Network Work with Paul Dreyer and Stephen Hartke

23 The Model: Moving From State to State Social network = graph; nodes = people; edges = contact. SI model: each person is either susceptible (S) or infected (I). Once in the infected state, you stay there. Times are discrete: t = 0, 1, 2, …

24 Disease Process Highly Infectious Disease: You change your state from susceptible to infected at time t+1 if at least one of your neighbors is infected at time t. You never leave the infected state.
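The disease process can be simulated in a few lines of Python. This is a sketch with no vaccination; the function name, the adjacency-dictionary layout, and the four-person path network are hypothetical.

```python
def spread(neighbors, initially_infected, steps):
    """Simulate the highly infectious SI process (sketch, no vaccination).

    neighbors: dict person -> list of contacts (undirected graph, hypothetical layout)
    Returns a list with the infected set at times t = 0, 1, ..., steps.
    """
    infected = set(initially_infected)
    history = [set(infected)]
    for _ in range(steps):
        # Anyone with at least one infected neighbor at time t is infected at t + 1.
        newly_infected = {v for u in infected for v in neighbors[u]} - infected
        infected |= newly_infected  # infected people never recover
        history.append(set(infected))
    return history


# Hypothetical contact network: a path a - b - c - d, outbreak at a.
contacts = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
history = spread(contacts, {"a"}, steps=2)
```

After two time steps the infection has reached everyone within distance two of the initial case.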

25 Vaccination Strategies Let’s say you have a limited amount of vaccine available each time period, say v doses. Whom should you vaccinate?

26 Vaccination Strategies More precisely: What vaccination strategy minimizes number of people ultimately infected if a disease breaks out with one infection? Sometimes called the firefighter problem: alternate fire spread and firefighter placement.

27 Some Results on the Firefighter Problem Thanks to Kah Loon Ng (DIMACS) for some of the following slides, slightly modified by me.

28 Three doses of vaccine per time period (v = 3)

29 to 35 v = 3 [Sequence of figures continuing the v = 3 example; images not reproduced in this transcript.]

36 Some questions that can be asked (but not necessarily answered!) Can the fire be contained? How many time steps are required before fire is contained? How many firefighters per time step are necessary? What fraction of all nodes will be saved (burnt)? Does where the fire breaks out matter? Fire starting at more than 1 node? Consider different graphs. Construction of (connected) graphs to minimize damage. Complexity/Algorithmic issues

37 Containing Fires in Infinite Grids $L_d$ Fire starts at only one node. d = 1: Trivial. d = 2: Impossible to contain the fire with 1 firefighter per time step.

38 Containing Fires in Infinite Grids $L_d$ d = 2: Two firefighters per time step are needed to contain the fire. [Figure caption: 8 time steps, 18 burnt nodes.]

39 Containing Fires in Infinite Grids $L_d$ Wang and Moeller (2002): If $d \ge 3$, $2d - 1$ firefighters per time step are sufficient to contain any outbreak starting at a single node. Hartke (2004): If $d \ge 3$, $2d - 2$ firefighters per time step are not enough to contain an outbreak in $L_d$. Thus, $2d - 1$ firefighters per time step is the minimum number required to contain an outbreak in $L_d$, and containment can be attained in 2 time steps.

40 Firefighting on Trees Epidemic starts at the root. Number of doses of vaccine: v = 1

41 Firefighting on Trees Greedy algorithm: For each node x, define weight(x) = (number of descendants of x) + 1. Algorithm: At each time step, place a firefighter at a node that has not been saved such that weight(x) is maximized.
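A Python sketch of this greedy rule on a rooted tree follows. It assumes the fire starts at the root and moves from parent to child, and it restricts the choice to the unprotected children of nodes that just caught fire; any unburnt, unsaved node of maximum weight must be one of these, since its parent would otherwise be a heavier candidate. The function names, the child-list layout, and the small tree are hypothetical.

```python
def subtree_weights(children, root):
    """weight(x) = (number of descendants of x) + 1, computed without recursion."""
    weight = {}
    stack = [(root, False)]
    while stack:
        node, children_done = stack.pop()
        if children_done:
            weight[node] = 1 + sum(weight[ch] for ch in children.get(node, []))
        else:
            stack.append((node, True))
            for ch in children.get(node, []):
                stack.append((ch, False))
    return weight


def greedy_firefighter(children, root):
    """One firefighter per time step, fire starting at the root (sketch).

    children: dict node -> list of child nodes (hypothetical layout)
    Returns (protected_nodes, number_of_saved_nodes).
    """
    weight = subtree_weights(children, root)
    burning = [root]  # nodes that caught fire in the latest step
    protected, saved = [], 0
    while True:
        threatened = [ch for b in burning for ch in children.get(b, [])]
        if not threatened:
            break
        best = max(threatened, key=lambda x: weight[x])
        protected.append(best)
        saved += weight[best]  # the protected node and all its descendants are saved
        burning = [ch for ch in threatened if ch != best]
    return protected, saved


# Hypothetical tree: root 0 with children 1 and 2; node 1 has two leaves, node 2 has one.
tree = {0: [1, 2], 1: [3, 4], 2: [5]}
protected, saved = greedy_firefighter(tree, 0)
```

On this tree the sketch protects node 1 first (weight 3) and then node 5, saving 4 of the 6 nodes.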

42 Firefighting on Trees [Figure: tree with node weights; the numeric labels are not recoverable from this transcript.]

43 Firefighting on Trees Greedy = 7, Optimal = 9

44 Firefighting on Trees Theorem (Hartnell and Li, 2000): For any tree with one fire starting at the root and one firefighter to be deployed per time step, the greedy algorithm always saves more than ½ of the nodes that any algorithm saves.

45 Algorithms for Sequential Public Health or Medical Decision Making A patient presents with certain symptoms. Which test do we do first? On the basis of the outcome of the first test, which test do we do next? Tests are expensive. So are false positive and false negative results. “Cost” is a combination of cost of testing and cost of false results. In what order should we do tests in order to minimize total “cost”?

46 Algorithms for Sequential Public Health or Medical Decision Making We have several potential interventions for a public health crisis. Assume funds limit us to one intervention at a time. Which intervention do we invest in first? On the basis of the outcome of the first intervention, which do we launch next? Interventions are expensive. So are false positive and false negative assessments of the outcome of our interventions. “Cost” is a combination of cost of the intervention and cost of false results. In what order should we launch the interventions in order to minimize total “cost”?

47 Sequential Diagnosis Problem Such sequential diagnosis problems arise in many areas: –Communication networks (testing connectivity, paging cellular customers, sequencing tasks, …) –Manufacturing (testing machines, fault diagnosis, routing customer service calls, …) –Inspecting containers at ports

48 Sequential Decision Making Problem A physician is looking to determine if a patient has disease x. The doctor has a variety of tests to choose from. In the end, the patient is to be classified into one of several categories. Simple case: 0 = “doesn’t have the disease”, 1 = “does have the disease”. Testing scheme: specifies which tests are to be made based on previous observations. [Pictured tests: blood test, endoscopy, MRI, stress test.]

49 Sequential Decision Making Problem We are looking to determine if an epidemic can be controlled. We have a variety of interventions to choose from. In the end, the epidemic is to be classified into one of several categories. Simple case: 0 = controllable, 1 = not controllable Intervention scheme: specifies which interventions are to be made based on assessments of previous interventions. H1N1 Virus. Intervention 1: Close Schools if 15% absenteeism Intervention 2: Close Airports Intervention 3: Tamiflu to health care workers Intervention 4: Invest in vaccine.

50 Sequential Decision Making Problem 0’s and 1’s suggest binary digits (bits). Bit String: A sequence of bits: 0001, 1101, … Boolean Function: A function that assigns to each bit string a 0 or a 1. Example:
Bit string x | B(x)
00 | 1
01 | 0
10 | 0
11 | 1
So B(00) = 1, B(10) = 0.

51 Sequential Decision Making Problem What follows is in the language of medical testing. Patients have attributes related to the disease being tested for, each in a number of states. Sample attributes: –White blood cell count –PSA –Creatinine clearance –Fever > 40 degrees Centigrade –Severe cough –Severe fatigue

52 Sequential Decision Making Problem Simplest Case: Attributes are in state 0 or 1 (absent or present, higher than threshold or not). Then: Patient corresponds to a bit string like 011001. So: Classification is a decision function F that assigns each bit string to a category: 011001 → F(011001). If attributes 2, 3, and 6 are present, assign the patient to category F(011001).

53 Sequential Decision Making Problem If there are two categories, 0 and 1 (“doesn’t have disease” or “has disease”), the decision function F is a Boolean function. Example: F(000) = F(111) = 1, F(abc) = 0 otherwise. This classifies a patient as positive (sick with the disease) iff he has none of the attributes or all of them.

54 Binary Decision Tree Approach Tests measure presence/absence of attributes: so 0 or 1. Use two categories: 0, 1 (doesn’t have the disease or has it). Binary Decision Tree: –Nodes are tests or categories –Two arcs exit from each test node, labeled left and right. –Take the right arc when the test says the attribute is present, left arc otherwise

55 Binary Decision Tree Approach Reach category 1 from the root by: $a_0$ L, $a_1$ R, $a_2$ R, 1; or $a_0$ R, $a_2$ R, 1. The patient is classified in category 1 iff he has $a_1$ and $a_2$ and not $a_0$, or has $a_0$ and $a_2$ and possibly $a_1$. Corresponding Boolean function: F(111) = F(101) = F(011) = 1, F(abc) = 0 otherwise. Figure 1

56 Binary Decision Tree Approach This binary decision tree corresponds to the same Boolean function F(111) = F(101) = F(011) = 1, F(abc) = 0 otherwise. However, it has one fewer test node $a_i$. So, it is more efficient if all tests are equally costly and equally likely. Figure 2
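Since Figures 1 and 2 are not reproduced in this transcript, the Python sketch below uses two hypothetical trees that are merely consistent with the text: the first follows the two root-to-1 paths described on slide 55 (with every unspecified branch assumed to end in category 0), and the second is one possible tree for the same Boolean function with one fewer test node. The tuple encoding and function names are also assumptions.

```python
from itertools import product

# A tree is either a leaf category (0 or 1) or a tuple
# (attribute_index, left_subtree, right_subtree); go right when the attribute is present.
TREE_A = (0, (1, 0, (2, 0, 1)), (2, 0, 1))  # assumed shape for Figure 1: tests a0, a1, a2, a2
TREE_B = (2, 0, (0, (1, 0, 1), 1))          # assumed shape for Figure 2: tests a2, a0, a1


def classify(tree, bits):
    """Follow the tree from the root to a leaf for a patient with attribute bits."""
    while not isinstance(tree, int):
        attribute, left, right = tree
        tree = right if bits[attribute] == 1 else left
    return tree


def count_test_nodes(tree):
    """Number of test (non-leaf) nodes in the tree."""
    if isinstance(tree, int):
        return 0
    _, left, right = tree
    return 1 + count_test_nodes(left) + count_test_nodes(right)


def truth_table(tree, n=3):
    """The Boolean function computed by the tree, as a dict bit-tuple -> category."""
    return {bits: classify(tree, bits) for bits in product((0, 1), repeat=n)}


positives = {bits for bits, category in truth_table(TREE_A).items() if category == 1}
same_function = truth_table(TREE_A) == truth_table(TREE_B)
```

Both assumed trees classify exactly 111, 101 and 011 as category 1, and the second uses three test nodes instead of four.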

57 Binary Decision Tree Approach The problem of finding the “least cost” binary decision tree for a given Boolean function is very hard (NP-complete). For small n = number of attributes, can try to solve it by trying all possible binary decision trees corresponding to the Boolean function F. Even for n = 4, not practical.

58 Binary Decision Tree Approach Promising Approach: Special Assumptions about the Boolean Function F. Stroud and Saeger (Los Alamos National Lab) enumerate all “complete, monotone” Boolean functions and calculate the least expensive corresponding binary decision trees. Their method is practical for n up to 4, but not for n = 5.

59 Binary Decision Tree Approach Monotone Boolean Functions: Given two bit strings $x_1 x_2 \dots x_n$ and $y_1 y_2 \dots y_n$, suppose that $x_i \le y_i$ for all $i$ implies that $F(x_1 x_2 \dots x_n) \le F(y_1 y_2 \dots y_n)$. Then we say that F is monotone. Incomplete Boolean Functions: A Boolean function F is incomplete if F can be calculated by finding at most $n - 1$ attributes and knowing the value of the input string on those attributes.

60 Complete, Monotone Boolean Functions
No. of attributes | No. CM Boolean functions | No. BDTs from CM Boolean functions | No. BDTs
2 | 2 | 4 | (not listed)
3 | 9 | 60 | (not listed)
4 | 114 | 11,808 | 1,079,779,602
5 | 6,894 | 63,515,920 | 5 × 10^18
Combinatorial Explosion!
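The definitions on slide 59 can be checked by brute force for small n. The Python sketch below enumerates every Boolean function on n bits as a truth table and counts those that are monotone and complete, reading "complete" as "depends on every attribute"; the function name and the bitmask encoding are assumptions made for illustration.

```python
def count_complete_monotone(n):
    """Count Boolean functions on n bits that are monotone and depend on all n attributes.

    Brute force over all 2**(2**n) truth tables, so only sensible for small n.
    Input x is encoded as an integer whose bit i is the state of attribute i.
    """
    size = 1 << n
    count = 0
    for table in range(1 << size):
        f = [(table >> x) & 1 for x in range(size)]
        # Monotone: switching any attribute from 0 to 1 never lowers F.
        monotone = all(f[x] <= f[x | (1 << i)] for x in range(size) for i in range(n))
        if not monotone:
            continue
        # Complete: for every attribute there is an input where flipping it changes F.
        complete = all(any(f[x] != f[x ^ (1 << i)] for x in range(size)) for i in range(n))
        if complete:
            count += 1
    return count


small_counts = [count_complete_monotone(n) for n in (2, 3)]
```

For n = 2 and n = 3 this gives 2 and 9, matching the counts of complete, monotone Boolean functions reported on the slide. The doubly exponential loop is also a direct illustration of why the enumeration stops being practical so quickly.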

61 Cost Functions Stroud-Saeger method applies to more sophisticated cost models, not just cost = number of tests in the BDT. Cost Complication: How many nodes of the decision tree are actually visited during average procedure toward diagnosis? Depends on “distribution” of the disease. Answer can also depend on probability of test errors and probability a patient has the disease.

62 Cost Functions: Unit Costs, Tree Utilization Assume we are given the probability of test errors for the different tests and the a priori probability that a patient has the disease. This allows us to calculate the “expected” cost of utilization of the tree, $C_{util}$. It also allows us to calculate the probability of a false positive and the probability of a false negative.

63 Cost Functions OTHER COSTS: Cost of false positive: Cost of additional tests. –If it means beginning a series of treatments, it could be expensive, not to mention psychological cost to patient. Cost of false negative: –Complex issue. –What is cost of patient going untreated?

64 Cost Function Used for Evaluating the Decision Trees $C_{Tot} = C_{FalsePositive} \cdot P_{FalsePositive} + C_{FalseNegative} \cdot P_{FalseNegative} + C_{util}$, where $C_{FalsePositive}$ is the cost of a false positive (Type I error), $C_{FalseNegative}$ is the cost of a false negative (Type II error), $P_{FalsePositive}$ is the probability of a false positive occurring, $P_{FalseNegative}$ is the probability of a false negative occurring, and $C_{util}$ is the expected cost of utilization of the tree. $P_{FalsePositive}$ and $P_{FalseNegative}$ are calculated from the tree. $C_{util}$ is calculated from the tree, the probability of disease, and the probability of test errors. $C_{FalsePositive}$ and $C_{FalseNegative}$ are input (given information).
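As a worked illustration of the total cost formula, the Python sketch below evaluates it for two candidate trees and ranks them. All numbers, tree names, and function names are made up for the example; in the method described here the two probabilities and the utilization cost would be computed from each tree rather than supplied by hand.

```python
def total_cost(c_false_positive, p_false_positive,
               c_false_negative, p_false_negative, c_util):
    """Total cost: C_FP * P_FP + C_FN * P_FN + C_util."""
    return (c_false_positive * p_false_positive
            + c_false_negative * p_false_negative
            + c_util)


def rank_trees(candidates, c_false_positive, c_false_negative):
    """Sort candidate trees by total cost, cheapest first.

    candidates: dict tree name -> (P_FP, P_FN, C_util), a hypothetical layout.
    """
    return sorted(
        candidates,
        key=lambda name: total_cost(c_false_positive, candidates[name][0],
                                    c_false_negative, candidates[name][1],
                                    candidates[name][2]),
    )


# Made-up inputs: a false negative is assumed ten times as costly as a false positive.
candidates = {"tree_1": (0.10, 0.01, 5.0), "tree_2": (0.02, 0.03, 4.0)}
ranking = rank_trees(candidates, c_false_positive=100, c_false_negative=1000)
```

With these inputs tree_1 costs about 25 and tree_2 about 36, so a tree with cheaper tests can still rank lower once the cost of its false negatives is counted.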

65 Stroud-Saeger Results Using this cost function $C_{Tot}$, Stroud and Saeger found an algorithm for enumerating all BDTs coming from complete, monotone Boolean functions and then ranking all trees in terms of their total costs. The method is feasible for n = 3 or 4 types of tests, but not for n > 4. A new approach (D. Madigan, S. Mittal, F. Roberts): Searching through a Generalized Tree Space. Idea: Sometimes adding more possibilities makes more efficient searches possible. We expand the space of trees from those corresponding to Stroud and Saeger’s “Complete and Monotonic” Boolean Functions to “Complete and Monotonic” BDTs.

66 CM Trees Monotonic Decision Trees –A binary decision tree will be called monotonic if all the left leaves are class “0” and all the right leaves are class “1”. Complete Decision Trees –A binary decision tree will be called complete if every type of test occurs at least once in the tree and, at any non-leaf node in the tree, its left and right sub-trees are not identical. CM Tree = complete, monotonic BDT

67 The CM Tree Space
No. of attributes | Distinct BDTs | Trees from CM Boolean functions | Complete, monotonic BDTs
2 | 74 | 4 | 4
3 | 16,430 | 60 | 114
4 | 1,079,779,602 | 11,808 | 66,600

68 Tree Neighborhood and Tree Space Define the tree neighborhood by giving four operations for moving from one tree in CM Tree Space to another. We have developed an algorithm for finding low-cost BDTs by searching through CM Tree Space from a tree to one of its neighbors.

69 Tree Space Traversal Naïve Idea: Greedy Search 1. Randomly start at any tree in the CM tree space 2. Find its neighboring trees using the four operations 3. Move to the neighbor with the lowest cost 4. Iterate until we find a minimum –Problem: The CM Tree space is highly multi-modal (more than one local minimum)! –Therefore, we implement a stochastic search algorithm with simulated annealing to find the best tree, a variant of the greedy algorithm.
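The contrast between the naive greedy search and a stochastic search can be sketched generically in Python. This is not the authors' implementation: the four tree operations are not given in this transcript, so the sketch takes an arbitrary `neighbors` function and `cost` function, and the cooling schedule, parameter values, and toy landscape are assumptions.

```python
import math
import random


def greedy_descent(start, neighbors, cost):
    """Naive greedy search: move to the cheapest neighbor until none is cheaper."""
    current = start
    while True:
        best = min(neighbors(current), key=cost, default=current)
        if cost(best) >= cost(current):
            return current  # a local minimum, not necessarily a global one
        current = best


def simulated_annealing(start, neighbors, cost, steps=1000, t0=1.0, cooling=0.995, seed=0):
    """Stochastic search sketch: sometimes accept a worse neighbor, less often as it cools."""
    rng = random.Random(seed)
    current = best = start
    temperature = t0
    for _ in range(steps):
        options = list(neighbors(current))
        if not options:
            break
        candidate = rng.choice(options)
        delta = cost(candidate) - cost(current)
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current = candidate
            if cost(current) < cost(best):
                best = current
        temperature *= cooling
    return best


# Toy multi-modal landscape standing in for tree space: states 0..9 with these costs.
costs = [5, 3, 4, 6, 2, 1, 7, 8, 0, 9]
line_neighbors = lambda x: [y for y in (x - 1, x + 1) if 0 <= y < len(costs)]
stuck_at = greedy_descent(0, line_neighbors, costs.__getitem__)
annealed = simulated_annealing(0, line_neighbors, costs.__getitem__)
```

Started from state 0, the greedy descent stops at state 1, a local minimum of cost 3, which is the failure mode the slide points to; the annealing variant keeps the best state it has seen and is never worse than its starting point.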

70 Results: Searching CM Tree Space We successfully performed experiments for 3, 4 and 5 tests; the search was significantly faster than existing methods of searching through BDTs obtained from complete, monotonic Boolean functions. Results show improvement compared to existing extensive search methods. –The searches found the optimal tree almost half the time –They often found a less costly tree than the best tree arising from a complete, monotone Boolean function.

71 Conclusion: Sometimes it Pays to be Greedy
