The computation of hitting sets: Review and new algorithms

Li Lin, Yunfei Jiang
Department of Mathematics, Jinan University, Guangzhou, PR China
Institute of Computer Software, Sun Yat-sen University, Guangzhou, PR China
From Information Processing Letters 86 (2003)
Presenter: 張家榮

Overview
Introduction
Backgrounds
BHS-tree
Boolean algorithm
Empirical results
Some ideas

Minimal hitting set
First used by R. Reiter (1987) in "A theory of diagnosis from first principles", Artificial Intelligence.
Can be used to solve the minimum set cover problem, the diagnosis problem, and the teachers and courses problem.

Minimal hitting set
Given a collection C = {Si | i ∈ N} of sets of elements from some universe U, a hitting set is a set S ⊆ U such that S ∩ Si ≠ ∅ for all i.
"Minimal" means that no element of S can be deleted without S ceasing to be a hitting set.
For example, C = {{M1, M2, A1}, {M1, A1, A2, M3}}. The minimal hitting sets are {M1}, {A1}, {M2, A2}, {M2, M3}.
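The definition can be checked on the example with a small brute-force sketch. It is only an illustration for tiny inputs, not one of the algorithms reviewed in the talk; the function name `minimal_hitting_sets` is hypothetical.

```python
from itertools import combinations

def minimal_hitting_sets(collection):
    """Enumerate all minimal hitting sets by brute force (exponential; tiny inputs only)."""
    sets = [frozenset(s) for s in collection]
    universe = sorted(set().union(*sets))
    found = []
    # Candidates are visited by increasing size, so a candidate that contains
    # an already-found hitting set cannot be minimal and is skipped.
    for size in range(len(universe) + 1):
        for candidate in map(frozenset, combinations(universe, size)):
            if any(h <= candidate for h in found):
                continue
            if all(candidate & s for s in sets):  # non-empty intersection with every Si
                found.append(candidate)
    return found

C = [{"M1", "M2", "A1"}, {"M1", "A1", "A2", "M3"}]
for hs in minimal_hitting_sets(C):
    print(sorted(hs))
```

On the slide's example this enumerates the same four sets, {M1}, {A1}, {M2, A2} and {M2, M3}.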

Minimum set cover problem
Definition: a set cover is a set of sets whose union contains every member of the union of all the sets. The minimum set cover problem is to find a set cover of minimum size.
Formal definition: given a set S of sets, choose c ⊆ S such that ∪c = ∪S, with c as small as possible.

Reduced to MHS
[Figure: a set cover instance with sets a, b, c, d, e over the elements 1 to 5, reduced to a minimal hitting set instance.]
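A minimal sketch of the reduction, under the usual reading: each element of the universe becomes the group of set names that contain it, and a selection of set names that hits every group is a set cover. The instance below is hypothetical (the slide's figure could not be recovered), and the brute-force search is only for illustration.

```python
from itertools import combinations

def cover_to_hitting_instance(named_sets):
    """For each element, collect the names of the sets that contain it."""
    universe = set().union(*named_sets.values())
    return [frozenset(name for name, s in named_sets.items() if x in s)
            for x in sorted(universe)]

def minimum_cover(named_sets):
    """Smallest selection of set names that hits every element's group (brute force)."""
    groups = cover_to_hitting_instance(named_sets)
    names = sorted(named_sets)
    for size in range(len(names) + 1):
        for choice in combinations(names, size):
            if all(set(choice) & g for g in groups):
                return set(choice)
    return None

# Hypothetical instance, not the one from the slide.
instance = {"a": {1, 2, 3}, "b": {2, 4}, "c": {3, 4}, "d": {4, 5}}
print(cover_to_hitting_instance(instance))
print(minimum_cover(instance))
```

In this instance element 1 appears only in a and element 5 only in d, so every cover must contain both, and a together with d already covers 1 to 5.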

Diagnosis problem
Some components of a system may cease to operate as designed and cause a discrepancy between the expected behavior of the system and the observed behavior.

Diagnosis problem (cont.)
(1) Compute the collection of all minimal conflict sets.
(2) Transform the conflict sets into diagnoses.
A minimal conflict set is a minimal set of components such that the assumption that each of these components is behaving correctly is inconsistent with the system description and the observation.

How?
(1) Conflict sets: not covered in this talk.
(2) Diagnoses: computed using minimal hitting sets.

Backgrounds (R. Reiter,1987)
Hitting sets can be computed by HS-trees.
Problem: the size of the tree grows exponentially.
Minimal hitting sets can be efficiently obtained by a pruned HS-tree.
Problem: if the construction of the tree starts with an unfavorable set, some minimal hitting sets may be deleted by pruning.

Backgrounds (cont.)
Greiner (1989) revised the HS-tree into an HS-DAG in which the minimal hitting sets are not deleted.
Separately, Reggia (1983) found the relationship between the set-covering problem and hitting sets.
Haenni (1997) found that the inversions of the hypergraph are the minimal hitting sets.

Backgrounds (cont.)
Vinterbo (2000) presented approximate hitting sets; he also used genetic algorithms.
Wotawa (2001) presented the HST-tree algorithm. The algorithm can be implemented in a straightforward way, so an efficient implementation is possible.

Notations
Minimal set cluster MCS = {C1, C2, …, Cn}.
Each node is a tuple (C, H), where C and H are set clusters.
The root node is (MCS, {}).
The left and right children of a node are denoted by (Cl, Hl) and (Cr, Hr), respectively.
The function μ is used for the deletion of non-minimal conflict/hitting sets.

BHS-tree
The tree is defined recursively as follows.
If C = {}, then the BHS-tree is empty.
Otherwise, select any element a ∈ ∪Ci; the children are (Cl = {Ci − {a} | a ∈ Ci}, Hl = {a}) and (Cr = {Ci | a ∉ Ci}, Hr = {}).

Example

Algorithm
Step 1. If a node is a leaf node, then the MHS of this node is H; otherwise run Steps 2 and 3 recursively.
Step 2. Replace the H of every parent node with {H, {ml ∪ mr | ml ∈ Hl, mr ∈ Hr}}.
Step 3. Minimize H at the root node with the function μ until it comprises all minimal hitting sets.
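The following is a sketch of one reading of the BHS-tree steps, not the authors' ANSI C implementation. It assumes that H is a cluster of sets, that an empty cluster of conflict sets is hit by the empty set, and that `minimize` plays the role of μ; the names `bhs` and `minimize` are hypothetical, and the tree is traversed by recursion rather than built explicitly.

```python
def minimize(cluster):
    """Role of μ: drop every set that strictly contains another set of the cluster."""
    sets = set(cluster)
    return {s for s in sets if not any(t < s for t in sets)}

def bhs(conflicts):
    """Minimal hitting sets of a cluster of conflict sets, by binary splitting."""
    conflicts = {frozenset(c) for c in conflicts}
    if not conflicts:
        return {frozenset()}            # nothing to hit (assumed convention)
    if frozenset() in conflicts:
        return set()                    # an empty conflict set cannot be hit
    a = min(set().union(*conflicts))    # "select any element"; min() keeps runs repeatable
    c_left = {c - {a} for c in conflicts if a in c}     # Cl: sets containing a, with a removed
    c_right = {c for c in conflicts if a not in c}      # Cr: sets not containing a
    # The sets containing a are hit either by {a} itself or by a hitting set of Cl.
    h_left = {frozenset({a})} | bhs(c_left)
    h_right = bhs(c_right)
    # Step 2: combine ml ∪ mr over both children; Step 3: minimize.
    return minimize({ml | mr for ml in h_left for mr in h_right})

print(bhs([{"M1", "M2", "A1"}, {"M1", "A1", "A2", "M3"}]))
```

Minimizing at every level, rather than only at the root as in Step 3, is a design choice of this sketch that keeps the intermediate clusters small.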

Online?
When a new measurement adds to the conflict sets, the old conflict sets need not be computed again; it is only necessary to add a new branch to the BHS-tree.

Add conflict set <5, 6>

Boolean algorithm
The conflict sets CS are presented as CNFs where each atom is negative.
Example: suppose CS = {C1, C2, …, Cm} is a conflict set cluster, where Ci = {ei1, ei2, …, eini}.
Conflict-set Boolean formula (CSF) = ~e11 ~e12 … ~e1n1 + ~e21 ~e22 … ~e2n2 + … + ~em1 ~em2 … ~emnm

Boolean algorithm(cont.)
A hitting set H is presented as a disjunction of its elements.
Example: H = {h1, h2, …, hn}
Hitting-set Boolean formula (HF) = h1 h2 … hn
Why? Theorem 1: if H is a HS of CS, then CSF · HF = 0.
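Theorem 1 can be sanity-checked with a truth table. The sketch below assumes that + is read as OR, that juxtaposition is read as AND, and that HF is the product h1 h2 … hn as written in the formula; the function names are hypothetical.

```python
from itertools import product

def csf(conflicts, assignment):
    """CSF: OR over the conflict sets of the AND of their negated atoms."""
    return any(all(not assignment[e] for e in c) for c in conflicts)

def hf(hitting_set, assignment):
    """HF: AND of the elements of the candidate hitting set."""
    return all(assignment[h] for h in hitting_set)

def product_is_zero(conflicts, hitting_set):
    """True if CSF · HF evaluates to 0 under every truth assignment."""
    atoms = sorted(set().union(*conflicts) | set(hitting_set))
    for values in product([False, True], repeat=len(atoms)):
        assignment = dict(zip(atoms, values))
        if csf(conflicts, assignment) and hf(hitting_set, assignment):
            return False
    return True

CS = [{"M1", "M2", "A1"}, {"M1", "A1", "A2", "M3"}]
print(product_is_zero(CS, {"M2", "A2"}))  # {M2, A2} hits both conflict sets
print(product_is_zero(CS, {"M2"}))        # {M2} misses the second conflict set
```

The intuition is that a hitting set shares some element e with every conflict set, so every term of the product contains both ~e and e.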

Definition
Let C be a Boolean formula. The function H(C) is defined recursively:
1) H(0) = 1, H(1) = 0
2) H(~e) = e
3) H(~e · C) = e + H(C)
4) H(~e + C) = e · H(C)
5) otherwise, H(C) = e · H(C1) + H(C2), where C1 ⊆ C and ~e ∉ C1, and C2 = {c | c ∪ {~e} ∈ C} ∪ C1
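A minimal sketch of the recursion, under stated assumptions: the formula is stored as a set of terms, each term a frozenset of atom names standing for a product of negated atoms; C1 is read as the terms that do not contain ~e; and only rules 1) and 5) are coded, since rules 2) to 4) fall out of them as special cases. The names `H` and `absorb` and the representation are choices of this sketch, not taken from the paper.

```python
def absorb(terms):
    """Boolean absorption (x + x·y = x): drop every term that contains another term."""
    terms = set(terms)
    return {t for t in terms if not any(u < t for u in terms)}

def H(csf):
    """csf: iterable of sets of atoms, each a product of negated atoms.
    Returns a sum of products of positive atoms, as a set of frozensets."""
    csf = absorb(frozenset(c) for c in csf)
    if not csf:
        return {frozenset()}        # H(0) = 1: the empty product
    if frozenset() in csf:
        return set()                # H(1) = 0: the empty sum
    e = min(set().union(*csf))      # any atom works; min() keeps runs repeatable
    c1 = {c for c in csf if e not in c}             # terms without ~e
    c2 = {c - {e} for c in csf if e in c} | c1      # terms with ~e stripped, plus C1
    with_e = {h | {e} for h in H(c1)}               # e · H(C1)
    return absorb(with_e | H(c2))                   # e · H(C1) + H(C2)

CSF = [{"S3", "S4"}, {"S1", "S3"}, {"S1", "S2", "S4"},
       {"S1", "S4"}, {"S1", "S2", "S3"}, {"S2", "S3", "S4"}]
for term in H(CSF):
    print(" · ".join(sorted(term)))
```

Applying absorption to the result at every level is what keeps only minimal hitting sets; without it, e · H(C1) can contribute terms that contain a term of H(C2).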

Theorem 2
Suppose CSF is a Boolean formula of CS; then H(CSF) is a Boolean formula of the HS of CS.
The proof is by mathematical induction over the size of CS, k = |CS|.

Proof of theorem 2
Cases 1) to 4) are straightforward.
Suppose that case 5) is proved when k ≤ n. Now take k = n + 1. For an arbitrary element e ∈ ∪{S | S ∈ CS}, H(CSF) = e · H(CSF1) + H(CSF2), where CSF1 and CSF2 are as defined.

Proof of theorem 2(cont.)
By assumption, H(CSF1) is a HS of CS1. CS1 ⊆ CS and ~e ∉ CS1, so e · H(CSF1) must be a HS of CS (e hits every set that contains it, and H(CSF1) hits the rest).
Obviously, H(CSF2) is a HS of CS.
So H(CSF) must be a HS of CS.
The minimality property can be proved by the Boolean absorption properties.

Empirical results
Implemented in ANSI C (UNIX).
(SGI 2200 Origin, CPU 4400 MHz, MIPS R12000 (IP27) processors, main memory 2 GB, OS IRIX 64 Release 6.5.)
Plot legend: HS-tree (□), BHS-tree (+), Boolean algebraic (O)

Empirical results (cont.)
The Boolean algebraic algorithm needs less memory and running time.
It is not sensitive to the order in which elements are selected.
Only |CS| and ∪{C | C ∈ CS} influence the efficiency.

Difference between BHS-tree and Boolean algebraic
(1) BHS-tree: binary tree structures. Boolean algebraic: list structures.
(2) BHS-tree: two steps, constructing the binary tree and then using it to compute the hitting sets recursively. Boolean algebraic: one step.

Approximation Algorithms for the Selection of Robust Tag SNPs
Yao-Ting Huang, Kun-Mao Chao: Dept. Computer Science & Information Engineering, National Taiwan University
Kui Zhang: Dept. Biostatistics, University of Alabama at Birmingham, USA
Ting Chen: Dept. Biological Sciences, University of Southern California, USA
2019/2/24
Speaker notes: This talk is about how to handle SNP genotyping with missing data. My name is Yao-Ting Huang and my advisor is Kun-Mao Chao. We have two coauthors who are not here today: Prof. Zhang and Prof. Chen.

Transformation
Each SNP can distinguish some of the pairs of patterns.
S1 can distinguish (P1, P3), (P1, P4), (P2, P3), and (P2, P4).
S2 can distinguish (P1, P4), (P2, P4), and (P3, P4).
[Figure: bipartite graph with the SNPs S1, S2, S3, S4 as top nodes and the pairs of patterns (1,2), (1,3), (1,4), (2,3), (2,4), (3,4) as bottom nodes.]
Speaker notes: To solve this problem, we first take a closer look at the function of each SNP. If we pick SNP 1, we can be sure that we can distinguish patterns 1 and 3, because they are in different colors at this SNP locus. We formulate this relation as a bipartite graph. SNP 1 can also distinguish patterns 1 and 4, and so on.

Observation 1: Tag SNPs
The SNPs can form a set of tag SNPs iff each pair of patterns is covered by at least one edge from the SNPs.
e.g., S1 and S3 can form a set of tag SNPs.
e.g., S1 and S2 cannot be tag SNPs.
[Figure: the bipartite graph for S1, S2, S3 over the pairs (1,2), (1,3), (1,4), (2,3), (2,4), (3,4). Each pair of patterns is covered by at least one edge.]
Speaker notes: One unanswered question is what kind of SNPs can be tag SNPs. We can easily answer this question by checking whether the bottom nodes in the graph are all covered by edges from them. For example, SNPs 1 and 3 are tag SNPs, and SNPs 1 and 2 are not tag SNPs, because the pair of patterns 1 and 2 is not covered, so we cannot distinguish patterns 1 and 2.
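The covering test in this observation can be sketched in a few lines. The rows for S1 and S2 are the ones stated on the Transformation slide; the rows for S3 and S4 are an assumption, inferred from the terms of the CSF given later in the deck. The names `DISTINGUISHES`, `uncovered_pairs` and `is_tag_set` are hypothetical.

```python
from itertools import combinations

# Pairs of patterns distinguished by each SNP (the edges of the bipartite graph).
# S1 and S2 come from the slides; S3 and S4 are inferred from the CSF (assumption).
DISTINGUISHES = {
    "S1": {(1, 3), (1, 4), (2, 3), (2, 4)},
    "S2": {(1, 4), (2, 4), (3, 4)},
    "S3": {(1, 2), (1, 3), (2, 4), (3, 4)},
    "S4": {(1, 2), (1, 4), (2, 3), (3, 4)},
}
ALL_PAIRS = set(combinations(range(1, 5), 2))

def uncovered_pairs(snps):
    """Pairs of patterns that no selected SNP can distinguish."""
    covered = set().union(*(DISTINGUISHES[s] for s in snps))
    return ALL_PAIRS - covered

def is_tag_set(snps):
    """A set of SNPs is a tag set iff every pair is covered by at least one edge."""
    return not uncovered_pairs(snps)

print(is_tag_set({"S1", "S3"}))
print(uncovered_pairs({"S1", "S2"}))
```

With this table, S1 and S3 cover all six pairs, while S1 and S2 leave only the pair (1, 2) uncovered, which matches the two examples on the slide.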

Observation
This is the same as the minimum set cover problem.
The minimum set cover problem can be transformed to the minimal hitting set problem.

Observation (cont.)
CSF = ~S3 ~S4 + ~S1 ~S3 + ~S1 ~S2 ~S4 + ~S1 ~S4 + ~S1 ~S2 ~S3 + ~S2 ~S3 ~S4
    = ~S3 ~S4 + ~S1 ~S3 + ~S1 ~S4
H(CSF) = S3 · H(~S1 ~S4) + H(~S4 + ~S1 + ~S1 ~S4)
       = S3 · S1 + S3 · S4 + S4 · S1
So S1 and S3 can form a set of tag SNPs; so can S3 and S4, and S1 and S4.

Observation 2: Missing Data
If a SNP is genotyped as missing data, it is the same as the removal of its node and edges.
[Figure: patterns P1 to P4 and the bipartite graph for S1, S2, S3, S4 over the pairs (1,2), (1,3), (1,4), (2,3), (2,4), (3,4), shown with and without S4. Suppose S4 is genotyped as missing data.]
Speaker notes: Another important question is: what is the effect of missing data? It is easy to tell from this graph, because it is just like removing the node and its edges from the graph.

Problem Reformulation
To tolerate m missing tag SNPs, we need to find a set of SNPs such that each pair of patterns is covered by at least (m+1) edges.
e.g., we wish to find a set of robust tag SNPs that tolerates 1 missing tag SNP.
[Figure: the bipartite graph for S1, S3, S4 over the pairs (1,2), (1,3), (1,4), (2,3), (2,4), (3,4). Each pair of patterns is covered by at least two edges.]
Speaker notes: From the above two observations, we claim that if we want to tolerate m missing data, we have to guarantee that each bottom node is covered by at least m + 1 edges. For example, if we want to tolerate one missing data, SNPs 1, 3, and 4 can be robust tag SNPs, because each node is covered by at least two edges.
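The edge-counting criterion can be compared with a direct check that removes every possible group of m SNPs. This is a sketch under the same assumption as before: the S3 and S4 rows of the table are inferred from the CSF, and the function names are hypothetical.

```python
from itertools import combinations

# Edges of the bipartite graph; the S3 and S4 rows are inferred from the CSF (assumption).
DISTINGUISHES = {
    "S1": {(1, 3), (1, 4), (2, 3), (2, 4)},
    "S2": {(1, 4), (2, 4), (3, 4)},
    "S3": {(1, 2), (1, 3), (2, 4), (3, 4)},
    "S4": {(1, 2), (1, 4), (2, 3), (3, 4)},
}
PAIRS = list(combinations(range(1, 5), 2))

def min_coverage(snps):
    """Smallest number of selected SNPs that distinguish any one pair."""
    return min(sum(pair in DISTINGUISHES[s] for s in snps) for pair in PAIRS)

def tolerates(snps, m):
    """Edge-counting criterion: every pair is covered by at least m + 1 edges."""
    return min_coverage(snps) >= m + 1

def survives_every_loss(snps, m):
    """Direct check: drop each possible group of m SNPs and re-test the coverage."""
    snps = set(snps)
    return all(min_coverage(snps - set(lost)) >= 1
               for lost in combinations(sorted(snps), m))

print(tolerates({"S1", "S3", "S4"}, 1), survives_every_loss({"S1", "S3", "S4"}, 1))
print(tolerates({"S1", "S3"}, 1), survives_every_loss({"S1", "S3"}, 1))
```

Both checks agree on these inputs: S1, S3 and S4 tolerate one missing SNP, while S1 and S3 alone do not, because losing S1 leaves the pair (1, 4) uncovered.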

Modification of boolean algorithm
H1(0) = 1, H1(1) = 0, H0(C) = 1
H1(~e) = e
Hi(~e · C) = e · Hi-1(C) + Hi(C)
Hi(C) = 0 if there exists a CS ∈ C which has fewer than i elements
Hi(~CS + C) = CS · Hi(C) if CS has exactly i elements
Hi(C) = e · Hi-1(C1) + Hi(C2), where C1 ⊆ C and ~e ∉ C1, and C2 = {c | c ∪ {~e} ∈ C} ∪ C1

Modification of boolean algorithm (cont.)
To tolerate m missing tag SNPs, we can use Hm+1(CSF).
Example:
H2(CSF) = S3 S4 · H2(~S1 ~S3 + ~S1 ~S4)
        = S1 S3 S4 · H2(~S1 ~S4)
        = S1 S3 S4
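The result of the example can be cross-checked by brute force. The sketch below does not implement the recursive Hi; it simply enumerates the minimal sets that contain at least i elements of every conflict set, which is the property Hi is meant to capture. The function name is hypothetical.

```python
from itertools import combinations

def minimal_multi_hitting_sets(conflicts, i):
    """Minimal sets containing at least i elements of every conflict set (brute force)."""
    conflicts = [frozenset(c) for c in conflicts]
    atoms = sorted(set().union(*conflicts))
    found = []
    # Supersets of an i-fold hitting set are also i-fold hitting sets, so
    # visiting candidates by increasing size yields exactly the minimal ones.
    for size in range(len(atoms) + 1):
        for cand in map(frozenset, combinations(atoms, size)):
            if any(h <= cand for h in found):
                continue
            if all(len(cand & c) >= i for c in conflicts):
                found.append(cand)
    return found

CSF = [{"S3", "S4"}, {"S1", "S3"}, {"S1", "S2", "S4"},
       {"S1", "S4"}, {"S1", "S2", "S3"}, {"S2", "S3", "S4"}]
print([sorted(h) for h in minimal_multi_hitting_sets(CSF, 2)])
print([sorted(h) for h in minimal_multi_hitting_sets(CSF, 1)])
```

For i = 2 the only minimal set is {S1, S3, S4}, in agreement with H2(CSF) above; for i = 1 the enumeration gives the three pairs obtained from H(CSF).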
