Efficient and Effective Practical Algorithms for the Set-Covering Problem Qi Yang, Jamie McPeek, Adam Nofsinger Department of Computer Science and Software.

Presentation transcript:

Efficient and Effective Practical Algorithms for the Set-Covering Problem Qi Yang, Jamie McPeek, Adam Nofsinger Department of Computer Science and Software Engineering University of Wisconsin at Platteville

The Set-Covering Problem
Given N sets, let X be the union of all the sets. A cover of X is a group of sets drawn from the N sets such that every element of X belongs to at least one set in the group. The set-covering problem is to find a cover of X of minimum size.

Matrix Representation of the Set-Covering Problem
[Matrix: rows S1 to S4, columns a to f; the cell values were not preserved in this transcript.]
Number of sets: N = 4
Number of elements: M = 6
One cover: S1, S3, S4
One minimal cover: S1, S3
Not a cover: S1, S2, S4 (a is not covered)

NP-Hard Problem
Reference: Introduction to Algorithms by T. H. Cormen, C. E. Leiserson, R. L. Rivest
The set-covering problem has been proved to be NP-hard
A greedy algorithm

Algorithm Greedy
ResultCover: the cover being built, intended to approximate the minimum cover.
Uncovered: the set of elements not covered yet.
1. Set ResultCover to the empty set
2. Set Uncovered to the union of all sets
3. While Uncovered is not empty
   a. select a set S that is not in ResultCover and covers the most elements of Uncovered
   b. add S to ResultCover
   c. remove all elements of S from Uncovered
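The steps above can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function name `greedy_cover` and the sets in `EXAMPLE` are hypothetical (the matrix entries from the slides are not preserved), and ties are broken by dictionary order, which the slides do not specify.

```python
def greedy_cover(sets):
    """Return the names of the chosen sets in the order they were picked.

    `sets` maps a set name to a Python set of elements (assumed input format).
    """
    # Step 2: Uncovered starts as the union of all sets.
    uncovered = set().union(*sets.values())
    # Step 1: ResultCover starts empty.
    result_cover = []
    # Step 3: repeat until every element is covered.
    while uncovered:
        # 3a: among sets not yet chosen, take the one covering the most
        # uncovered elements (first such set wins on a tie).
        best = max(
            (name for name in sets if name not in result_cover),
            key=lambda name: len(sets[name] & uncovered),
        )
        result_cover.append(best)   # 3b
        uncovered -= sets[best]     # 3c
    return result_cover


# Hypothetical data: four sets over the elements a..f.
EXAMPLE = {
    "S1": {"b", "c", "e"},
    "S2": {"b", "d"},
    "S3": {"a", "d", "f"},
    "S4": {"c", "e"},
}

print(greedy_cover(EXAMPLE))
```

Step 3a rescans every candidate set on each iteration, which is where the extra min(M, N) factor in the Greedy running time comes from.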

Algorithm Check And Remove (CAR)
Source: "Identifying Redundant Search Engines in a Very Large Scale Metasearch Engine Context", 8th ACM International Workshop on Web Information and Data Management
The set-covering problem is equivalent to the problem of identifying redundant search engines on the Web
Algorithm CAR is much faster than Algorithm Greedy

Algorithm CAR (Check And Remove)
1. Set ResultCover to the empty set
2. For each set S
   a. determine if S has an element that is not covered by ResultCover
   b. add S to ResultCover if S has such an element
   c. exit the for loop if ResultCover is a cover of X
3. For each set S in ResultCover
   a. determine if S has an element that is not covered by any other set of ResultCover
   b. remove S from ResultCover if S has no such element
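A minimal Python sketch of the check phase and the remove phase follows. The per-element coverage counter is an assumption made here to keep both phases cheap; the slides do not say how the "covered by any other set" test is implemented. The name `car_cover` and the data in `EXAMPLE` are hypothetical.

```python
from collections import Counter


def car_cover(sets):
    """Check And Remove: one pass to build a cover, one pass to prune it.

    `sets` maps a set name to a Python set of elements (assumed input format).
    """
    universe = set().union(*sets.values())
    result_cover = []
    # cover_count[e] = number of sets in result_cover that contain e.
    cover_count = Counter()

    # Check phase (step 2): keep a set only if it covers something new.
    for name, elements in sets.items():
        if any(cover_count[e] == 0 for e in elements):
            result_cover.append(name)
            cover_count.update(elements)
        if len(cover_count) == len(universe):
            break  # step 2c: result_cover already covers X

    # Remove phase (step 3): drop a set if every one of its elements
    # is also covered by some other set still in the cover.
    for name in list(result_cover):
        if all(cover_count[e] >= 2 for e in sets[name]):
            result_cover.remove(name)
            cover_count.subtract(sets[name])
    return result_cover


# Hypothetical data, processed in the order S1, S2, S3, S4.
EXAMPLE = {
    "S1": {"b", "c", "e"},
    "S2": {"b", "d"},
    "S3": {"a", "d", "f"},
    "S4": {"c", "e"},
}

print(car_cover(EXAMPLE))
```

With this data the check phase keeps S1, S2 and S3, and the remove phase then discards S2 because S1 and S3 already cover b and d.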

Example
[Matrix: rows S1 to S4, columns a to f; the cell values were not preserved in this transcript.]

Set            ResultCover      Uncovered
(start)        {}               {a, b, c, d, e, f}
S1             {S1}             {a, d, f}
S2             {S1, S2}         {a, f}
S3             {S1, S2, S3}     {}
Removing S2    {S1, S3}         {}

Time Complexity
Algorithm Greedy: O(M * N * min(M, N))
Algorithm CAR: O(M * N)
N: number of sets
M: number of elements of the union X

CPU Time
[Figure: CPU time (sec) of Greedy and CAR against actual cover size; the data were not preserved in this transcript.]

Cover Size
[Figure: cover sizes of the two algorithms; series: Actual, Greedy, CAR; the data were not preserved in this transcript.]

Implementation Details
Read data
Binary search tree
BitMap indicating which sets cover an element
Convert the tree to an array of BitMaps
Matrix representation of the set-cover problem
Find a cover

Binary Search Tree and BitMap
[Figure: a tree node holding an element and its BitMap.]
Number of sets (N) is known
Number of elements of each set is known
The total number of elements is unknown
Elements are read one set at a time
BitMap of size N: records which sets cover the element; it is a column of the matrix
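The reading step can be sketched as below. Two substitutions are made for brevity and are assumptions, not the authors' code: a Python `dict` stands in for the binary search tree (both map an element to its BitMap while the total number of elements is still unknown), and a Python `int` stands in for a BitMap of size N, with bit i set when set i contains the element. The function name is hypothetical.

```python
def build_column_bitmaps(sets_in_order):
    """Read the sets one at a time and build one column BitMap per element.

    Returns (elements, bitmaps): `elements` is the sorted list of distinct
    elements and `bitmaps[j]` is the column BitMap of `elements[j]`.
    """
    columns = {}  # element -> BitMap; stand-in for the binary search tree
    for set_index, elements in enumerate(sets_in_order):
        for element in elements:
            # Set bit `set_index` in this element's column.
            columns[element] = columns.get(element, 0) | (1 << set_index)
    # "Convert the tree to an array of BitMaps": an in-order walk of a BST
    # yields sorted keys, so sorting the dict keys gives the same order.
    ordered = sorted(columns)
    return ordered, [columns[e] for e in ordered]


# Hypothetical data: S1..S4 are bits 0..3.
elements, bitmaps = build_column_bitmaps(
    [{"b", "c", "e"}, {"b", "d"}, {"a", "d", "f"}, {"c", "e"}]
)
for element, bitmap in zip(elements, bitmaps):
    print(element, format(bitmap, "04b"))
```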

Array of Column BitMaps
[Figure: array of column BitMaps, one per element: e1, e2, e3, e4, ..., em-1, em.]
Row operations:
Find the number of elements in a set that are not covered by the result cover
Determine if a set contains an element that is not covered by the result cover
Determine if a set in the result cover has an element that is not covered by any other set in the result cover
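To illustrate why row operations are awkward on column BitMaps, here is a hedged sketch of two of them. Each call has to visit every column, because the bits belonging to one set are spread across all M BitMaps. The function names, the use of Python `int` values as BitMaps, and the sample `COLUMNS` are assumptions for illustration.

```python
def uncovered_count(columns, set_index, cover_mask):
    """Number of elements of set `set_index` not covered by the result cover.

    `columns[j]` is the column BitMap of element j; `cover_mask` has bit i
    set when set i is in the result cover.
    """
    bit = 1 << set_index
    return sum(
        1 for col in columns
        if (col & bit) and not (col & cover_mask)
    )


def has_unique_element(columns, set_index, cover_mask):
    """True if set `set_index` (assumed to be in the cover) holds an element
    that no other set in the result cover contains."""
    bit = 1 << set_index
    others = cover_mask & ~bit
    return any((col & bit) and not (col & others) for col in columns)


# Hypothetical column BitMaps for elements a..f; bits 0..3 are S1..S4.
COLUMNS = [0b0100, 0b0011, 0b1001, 0b0110, 0b1001, 0b0100]

print(uncovered_count(COLUMNS, 2, 0b0001))     # S3 against the cover {S1}
print(has_unique_element(COLUMNS, 1, 0b0111))  # S2 within {S1, S2, S3}
```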

Array of Row BitMaps
It takes some time to convert column BitMaps to row BitMaps, but all row operations are then performed within a single row BitMap.
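A sketch of the conversion and of one row operation after it, under the same assumptions as before (Python `int` values as BitMaps, hypothetical names and data). Once each set has its own row BitMap, counting its uncovered elements becomes a single AND followed by a bit count.

```python
def columns_to_rows(columns, num_sets):
    """Transpose column BitMaps (one per element) into row BitMaps (one per set)."""
    rows = [0] * num_sets
    for element_index, col in enumerate(columns):
        for set_index in range(num_sets):
            if (col >> set_index) & 1:
                rows[set_index] |= 1 << element_index
    return rows


def uncovered_count_row(rows, set_index, uncovered_mask):
    """Elements of set `set_index` that are still uncovered.

    `uncovered_mask` has bit j set when element j is not yet covered.
    """
    return bin(rows[set_index] & uncovered_mask).count("1")


# Hypothetical column BitMaps for elements a..f; bits 0..3 are S1..S4.
COLUMNS = [0b0100, 0b0011, 0b1001, 0b0110, 0b1001, 0b0100]
ROWS = columns_to_rows(COLUMNS, 4)

all_elements = (1 << len(COLUMNS)) - 1
after_s1 = all_elements & ~ROWS[0]  # elements left uncovered once S1 is chosen
print(uncovered_count_row(ROWS, 2, after_s1))
```

The transpose costs O(M * N) once, which is the conversion time the slide mentions.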

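CPU Time
[Table: running times (seconds) of the Greedy algorithm, Col vs. Row; the values were not preserved in this transcript.]
[Table: running times (seconds) of the CAR algorithm, Col vs. Row; the values were not preserved in this transcript.]
The CPU time includes the time to convert column BitMaps to row BitMaps, but not the time to build the tree.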

CPU Time (Row BitMap)
[Table: running times (seconds) of the two algorithms, Greedy and CAR; the values were not preserved in this transcript.]

Algorithm Greedy
1. Set ResultCover to the empty set
2. Set Uncovered to the union of all sets
3. While Uncovered is not empty
   a. select a set S that is not in ResultCover and covers the most elements of Uncovered
   b. add S to ResultCover
   c. remove all elements of S from Uncovered

Algorithm Greedy Update
UncoveredCount: the number of elements of a set not covered by ResultCover
1. Set ResultCover to the empty set
2. Set Uncovered to the union of all sets
3. For each set, set the UncoveredCount to the size of the set
4. While Uncovered is not empty
   a. select a set that has the largest value of UncoveredCount among all sets not in ResultCover
   b. add the set to ResultCover
   c. remove all elements of the set from Uncovered
   d. update the value of UncoveredCount for each set not in ResultCover

Update Uncovered Count
For each element in the set to be added to ResultCover
    If the result cover does not cover it
        For each set not in the result cover
            If the set contains the element
                its uncovered count is decremented by one
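Greedy Update and its count-maintenance step can be sketched together as follows. The function name, the input format, and the sample data are hypothetical, and ties are broken by dictionary order, which the slides do not specify.

```python
def greedy_update_cover(sets):
    """Greedy selection driven by a maintained UncoveredCount per set.

    `sets` maps a set name to a Python set of elements (assumed input format).
    """
    uncovered = set().union(*sets.values())
    # Step 3: every element of every set starts out uncovered.
    uncovered_count = {name: len(elements) for name, elements in sets.items()}
    result_cover = []
    in_cover = set()
    while uncovered:
        # 4a: largest UncoveredCount among the sets not yet chosen.
        best = max(
            (name for name in sets if name not in in_cover),
            key=uncovered_count.get,
        )
        result_cover.append(best)  # 4b
        in_cover.add(best)
        for element in sets[best]:
            if element in uncovered:          # not covered until now
                uncovered.remove(element)     # 4c
                # 4d: every remaining set containing it loses one from its count.
                for name, elements in sets.items():
                    if name not in in_cover and element in elements:
                        uncovered_count[name] -= 1
    return result_cover


# Hypothetical data: four sets over the elements a..f.
EXAMPLE = {
    "S1": {"b", "c", "e"},
    "S2": {"b", "d"},
    "S3": {"a", "d", "f"},
    "S4": {"c", "e"},
}

print(greedy_update_cover(EXAMPLE))
```

Each element triggers the inner update loop at most once, when it first becomes covered, so the counts are never recomputed from scratch as they are in the plain Greedy loop.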

Time Complexity
Algorithm Greedy: O(M * N * min(M, N))
Algorithm CAR: O(M * N)
Algorithm Greedy Update: O(M * N)

CPU Time
[Table: running times (seconds) of the two algorithms, Update and CAR; the values were not preserved in this transcript.]

Algorithm List And Remove (LAR)
Implements the matrix using linked lists instead of an array of BitMaps
Combines Algorithm Greedy Update with the remove phase from Algorithm CAR
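A hedged sketch of how the two pieces might fit together. Python lists of neighbours stand in for the linked-list matrix (each element keeps a list of the sets containing it), and a per-element coverage counter drives the remove phase. These representation choices, the function name `lar_cover`, and the sample data are assumptions for illustration, not the authors' implementation.

```python
from collections import defaultdict


def lar_cover(sets):
    """List And Remove: Greedy Update selection followed by CAR's remove phase.

    `sets` maps a set name to a Python set of elements (assumed input format).
    """
    # Sparse matrix as lists: for each element, the sets that contain it.
    element_to_sets = defaultdict(list)
    for name, elements in sets.items():
        for element in elements:
            element_to_sets[element].append(name)

    uncovered_count = {name: len(elements) for name, elements in sets.items()}
    cover_count = dict.fromkeys(element_to_sets, 0)  # sets in cover per element
    remaining = len(element_to_sets)                 # elements still uncovered
    result_cover = []
    in_cover = set()

    # Greedy Update phase: only the sets listed under a newly covered
    # element have their counts touched.
    while remaining:
        best = max(
            (name for name in sets if name not in in_cover),
            key=uncovered_count.get,
        )
        result_cover.append(best)
        in_cover.add(best)
        for element in sets[best]:
            if cover_count[element] == 0:
                remaining -= 1
                for name in element_to_sets[element]:
                    uncovered_count[name] -= 1
            cover_count[element] += 1

    # Remove phase from CAR: drop sets whose elements are all covered elsewhere.
    for name in list(result_cover):
        if all(cover_count[e] >= 2 for e in sets[name]):
            result_cover.remove(name)
            for e in sets[name]:
                cover_count[e] -= 1
    return result_cover


# Hypothetical data where the greedy phase over-selects: A is picked first,
# but B and C together make it redundant.
DATA = {"A": {2, 3, 4, 5}, "B": {1, 2, 3}, "C": {4, 5, 6}}
print(lar_cover(DATA))
```

In this example the greedy phase selects A, B and C, and the remove phase then drops A, which shows why appending the remove phase can shrink the cover.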

Linked List for Matrix
[Figure: linked-list matrix with columns e1 to e7 and rows S1 to S5; the links were not preserved in this transcript.]

CPU Time
[Table: running times (seconds) of the two algorithms, LAR and CAR; the values were not preserved in this transcript.]

Cover Size
[Table: cover sizes of the two algorithms, LAR and CAR; the values were not preserved in this transcript.]

Cover Size (Different Data Sets)
[Table: cover sizes of the two algorithms; columns: Actual, LAR, CAR; the values were not preserved in this transcript.]

Summary
Algorithm LAR runs faster than Algorithm CAR
Algorithm LAR generates smaller covers than Algorithm CAR
Algorithm: updating vs. searching every time
Data structure: linked list vs. array of BitMaps

Questions?