Published by Μυρίνα Λιάπης. Modified over 5 years ago.
1
Rotate! Base Clique Motifs for bipartite graph G9.1
Investors (1,2,3,4,5) recommending Stocks (A,B,C,D,E).

[Figure: adjacency matrix; SI-raster edge table (traditional) and edge map; IS-raster edge table and edge map; Base and Expanded Base (ExpBase) SI and IS cTrees; 3-level, stride=5 NPZ SI and IS pTrees (Lev=2, Lev=1, Lev=0).]

All bipartite graph BcTrees are induced subgraphs (and cliques). All bipartite graph EBcTrees are maximal cliques. How do we mine for other motifs? Is motif mining even useful in the Investor-Stock case? (It might be useful to know that the 3-3 motif, 3 investors recommending 3 stocks, occurs many times.) Motifs seem to be of greatest interest in Gene-Gene or Protein-Protein interaction (PPI) graphs, in which the two label sets are the same. There is then just one Base cTreeSet and one EBcTreeSet to create (easier), and the h-k motifs are not distinct from the k-h motifs. A key question is: in PPI graphs, would the counts of Expanded Base Clique Motifs provide important information?

Create Expanded Base cTrees. The number of isomorphic copies of an Expanded Base Clique Motif (EBCM) can be counted by analyzing cTree counts. Thus for this bipartite graph there are:
2 × (3,3) SI EBC Motifs
2 × (4,2) SI EBC Motifs
1 × (2,4) SI EBC Motif
1 × (5,1) SI EBC Motif
In addition, we have:
2 × (1,4) SI BC Motifs
2 × (4,1) SI BC Motifs
2 × (1,3) SI BC Motifs
1 × (3,1) SI BC Motif
1 × (1,2) SI BC Motif
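The expansion and counting step described on this slide can be sketched as follows. This is only a minimal illustration under stated assumptions: the `edges` map is a made-up stock-to-investors graph (not the graph drawn on the slide), and the function names `expand_base_clique` and `ebc_motif_counts` are hypothetical. A base clique is taken to be one stock with all of its investors; expansion adds every other stock recommended by all of those investors.

```python
from collections import Counter

# Hypothetical stock -> investors edge map (NOT the graph from the slide).
edges = {
    "A": {1, 2, 3},
    "B": {1, 2, 3},
    "C": {2, 3, 4},
    "D": {4, 5},
    "E": {5},
}

def expand_base_clique(stock, edges):
    """Expand the base clique ({stock}, investors(stock)) by adding every
    stock that is recommended by all of those investors."""
    investors = edges[stock]
    stocks = {s for s, inv in edges.items() if investors <= inv}
    return frozenset(stocks), frozenset(investors)

def ebc_motif_counts(edges):
    """Count distinct expanded base cliques by (#stocks, #investors)."""
    cliques = {expand_base_clique(s, edges) for s in edges}
    return Counter((len(s), len(i)) for s, i in cliques)

counts = ebc_motif_counts(edges)
for (n_stocks, n_investors), n in sorted(counts.items()):
    print(f"{n} x ({n_stocks},{n_investors}) expanded base clique(s)")
```

Each expanded clique produced this way cannot be extended: any extra investor would have to recommend the starting stock (so it is already included), and any extra stock would already have passed the subset test.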
2
Base Clique Motifs for unipartite graph G1.1
Graph G1.1: Proteins (1,2,3,4,5) and their interactions.

[Figure: adjacency matrix; edge table; edge map; Base cTrees; EB cTrees (oa); 3-level, stride=5 NPZ pTrees (Lev=2, Lev=1, Lev=0).]

Create Expanded Base cTrees. The number of isomorphic copies of an EBC Motif can be counted by analyzing cTree counts. Thus for this unipartite graph there are:
1 × (2,2) EBC Motif
1 × (4,1) EBC Motif
In addition: ,3 BC Motifs and 11 × (1,2) BC Motifs.
3
Base CliqueTrees for 3HG2

The TriEdgeTable (S,D,I,R) has 6 key sort orders: SDI, DSI, SID, ISD, DIS, IDS. The adjacency matrix (data cube) has a 1 for each existing TriEdge (that Investor recommended that Stock on that Day). There are 6 Base cTreeSets and 1 operator, aoa, to generate Expanded Base cliqueTrees.

[Figure: Stock-Day-Investor cTrees (S is the 1st sort dimension, D the 2nd, I the 3rd; leaf counts CtI), Day-Stock-Investor cTrees (CtI) and Stock-Investor-Day cTrees (CtD), each with its 4-level, stride=5 raster NPZ pTrees (rasterSDI, rasterDSI, rasterSID; Lev=3, Lev=2, Lev=1, Lev=0).]
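As a rough illustration of the six key sort orders and the base cTree built from each, here is a small sketch. The rows, the day labels `d1`/`d2`, and the helper names `sort_orders` and `base_ctrees` are all assumptions made for the example; the slide's actual TriEdgeTable is not recoverable from the text.

```python
from itertools import permutations

# Column positions in a (Stock, Day, Investor, R) row.
COL = {"S": 0, "D": 1, "I": 2}

# Hypothetical TriEdge rows; R=1 marks an existing recommendation.
rows = [
    ("A", "d1", 1, 1),
    ("A", "d1", 2, 1),
    ("A", "d2", 3, 1),
    ("B", "d2", 1, 1),
]

def sort_orders(rows):
    """Return the table sorted under each of the 6 key orders (SDI, DSI, ...)."""
    orders = {}
    for perm in permutations("SDI"):
        idx = [COL[c] for c in perm]
        orders["".join(perm)] = sorted(rows, key=lambda r: tuple(r[i] for i in idx))
    return orders

def base_ctrees(rows, order):
    """Group the 3rd-dimension values (leaves) under each (1st, 2nd) pair."""
    a, b, c = (COL[x] for x in order)
    tree = {}
    for r in rows:
        tree.setdefault((r[a], r[b]), set()).add(r[c])
    return tree

orders = sort_orders(rows)
sdi_tree = base_ctrees(rows, "SDI")   # leaves are investors
dis_tree = base_ctrees(rows, "DIS")   # leaves are stocks
```

Here each key of `sdi_tree` is a (stock, day) pair and its value is the investor leaf set, which corresponds to one base clique in the Stock-Day-Investor ordering.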
4
Base CliqueTrees for 3HG2: the last 3 key sort orders of the TriEdgeTable (S,D,I,R), namely ISD, DIS, IDS.

[Figure: Investor-Stock-Day cTrees (CtD) with 4-level, stride=5 rasterISD NPZ pTrees (same as SID on the previous slide); Day-Investor-Stock cTrees (CtS) with rasterDIS NPZ pTrees; Investor-Day-Stock cTrees (CtS) with rasterIDS NPZ pTrees; Lev=3, Lev=2, Lev=1, Lev=0.]
5
Maximal Base CliqueTrees for 3HG2

[Figure: Stock-Day-Investor Base cTrees (CtI), expanded by aoa then oaa (all of these will be Max Cliques); Day-Stock-Investor Base cTrees (CtI), expanded by aoa then oaa (all of these are Max Cliques, only 3 new ones); Stock-Investor-Day Base cTrees (CtD), expanded by aoa then oaa (all of these are Max Cliques, only 3 new ones).]

Can we count the S=1, D=1, I=4 motifs? COMBO(5,4)=5 = 11. 113? 10+6C(4,3)+C(5,3) = 54. 112? C3,2+6C4,2+C5,2 = 83.
6
Base CliqueTrees for 3HG2, last 3.

[Figure: Investor-Stock-Day cTrees, Day-Investor-Stock cTrees and Investor-Day-Stock cTrees, each expanded by aoa then oaa (all of these will be Max Cliques).]
7
Maximal Base CliqueTrees for 3HG2
Apply aoa then oaa to the 6 cTrees, removing duplicates (there are no covers, since aoa then oaa gives Maximal Cliques only). We get the 34 MCs below.

Theorem: These 34 MCs are the only Maximal Cliques.

Proof: Let C be a MaxClique with v1∈Part1(C), w1∈Part2(C), and {z1..zn}=Part3(C). Apply aoa to that BaseClique, B. Then aoa(B)={v1,w1..wm,z1..zn} is a clique, and W={w1..wm}=Part2(C), else C is not max. Next, oaa(aoa(B))={v1..vk,W,Z} is a clique, and V={v1..vk}=Part1(C), else C is not max. Thus {V,W,Z} is a MaxClique containing C, and therefore {V,W,Z}=C. Thus C is one of the Expanded Base Cliques under aoa then oaa.

General thm: {a..ao(a..oa(…oa..a(B)…)) | B=BaseClique} is the MaxCliqueSet. Thus, for a bipartite graph, ao(B) is the MCS.

[Figure: the 34 maximal cliques.]
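The two-step expansion used in the proof can be sketched on a toy 3-part hypergraph. This is an interpretation, not the slide's definition: it assumes that aoa grows the middle part (days) while holding the first and third parts fixed, and that oaa then grows the first part (stocks). The triples in `E` and the names `leaves`, `expand` and `is_clique` are hypothetical.

```python
# Hypothetical TriEdges (stock, day, investor); not the data from the slides.
E = {
    ("A", "x", 1), ("A", "x", 2), ("A", "y", 1), ("A", "y", 2),
    ("B", "x", 1), ("B", "x", 2), ("B", "y", 1), ("B", "y", 2),
    ("B", "y", 3),
}

def leaves(E, s, d):
    """Investor leaf set of the base clique for the (stock, day) pair."""
    return frozenset(i for (s2, d2, i) in E if s2 == s and d2 == d)

def expand(E, s, d):
    """Assumed 'aoa then oaa': grow the day part, then the stock part."""
    I = leaves(E, s, d)
    days = {d2 for (_, d2, _) in E}
    stocks = {s2 for (s2, _, _) in E}
    # Step 1 (assumed aoa): every day on which s is recommended by all of I.
    D = frozenset(d2 for d2 in days if all((s, d2, i) in E for i in I))
    # Step 2 (assumed oaa): every stock recommended by all of I on all of D.
    S = frozenset(s2 for s2 in stocks
                  if all((s2, d2, i) in E for d2 in D for i in I))
    return S, D, I

def is_clique(E, S, D, I):
    """A 3-part hyperclique contains every (stock, day, investor) triple."""
    return all((s, d, i) in E for s in S for d in D for i in I)

S, D, I = expand(E, "A", "x")
```

Running `expand` over every (stock, day) pair and discarding duplicates gives the expanded base cliques for the Stock-Day-Investor ordering; the slide repeats this over all six orderings.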
8
Stock-Day-Investor BaseCliqueTrees (leaves Inv)
Base CliqueTrees for 3PART HyperGraph, 3PHG2: {1,2,3,4,5}=Investors recommending Stocks={A,B,C,D,E} on Days={,,,,}; 74 recommendations.

[Figure: Stock-Day-Investor BaseCliqueTrees (leaves Inv), Stock-Investor-Day BaseCTrees (leaves Days) and Inv-Day-Stock BaseClTrees (leaves Stocks), each with its aoa, oaa and aao results and the counts CtS, CtD, CtI.]
9
Edge Count Clique Thms: Graph C is a clique iff |EC| = |PUC| = COMB(|VC|,2) = |VC|!/((|VC|-2)! 2!)
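The edge-count test can be checked directly: a vertex set is a clique exactly when the number of edges among its vertices equals COMB(|V|,2). The sketch below uses plain Python sets rather than pTrees, and the function name `is_clique_by_edge_count` is an assumption made for illustration.

```python
from math import comb

def is_clique_by_edge_count(vertices, edges):
    """True iff the subgraph induced on `vertices` has COMB(|V|, 2) edges."""
    vs = set(vertices)
    # Keep each undirected edge once, ignoring self-loops and outside edges.
    induced = {frozenset(e) for e in edges
               if len(set(e)) == 2 and set(e) <= vs}
    return len(induced) == comb(len(vs), 2)

triangle = [(1, 2), (2, 3), (1, 3)]
path = [(1, 2), (2, 3)]
print(is_clique_by_edge_count({1, 2, 3}, triangle))  # True
print(is_clique_by_edge_count({1, 2, 3}, path))      # False
```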
(VC,EC) is a k-clique iff every induced (k-1)-subgraph (VD,ED) is a (k-1)-clique.

Apriori Clique Mining Alg: uses an ARM-Apriori-like downward closure property. CSk ≡ kCliqueSet; CCSk+1 ≡ Candidate (k+1)CliqueSet. By SGE, CCSk+1 = all unions of CSk pairs with k-1 common vertices. Let C∈CCSk+1 be a union of 2 k-cliques with k-1 common vertices, and let v,w be the kth (different) vertices of the 2 k-cliques: C∈CSk+1 iff (PE)(v,w)=1.

Breadth-1st Clique Alg: CLQK = all Kcliques. Find CLQ3 with CS0. A Kclique and a 3clique sharing an edge form a (K+1)clique iff all K-2 edges from the non-shared Kclique vertices to the non-shared 3clique vertex exist. Next find CLQ4, then CLQ5, …

Depth-1st Clique Alg: Find a largest MaxClique containing v. If (x,y)∈E, let NewPtSet(v,w,x,y) ≡ CLQ3pTree(v,w) & CLQ3pTree(x,y). If Count(NewPtSet(v,w,x,y)) is:
0, the 4 vertices form a max 4Clique (i.e., v,w,x,y).
1, the 5 vertices form a max 5Clique (i.e., v,w,x,y,NewPt).
2, the 6 vertices form a max 6Clique if NewPair∈E, else they form 2 max 5Cliques.
3, the 7 vertices form a max 7Clique if each NewPair∈E; else if 1 or 2 NewPairs∈E, each 6VertexSet (vwxy + 2 EdgeEndpts) forms a max 6Clique; else if 0 NewPairs∈E, each 5VertexSet (vwxy + 1 NewVertex) forms a maximal 5Clique. …
Theorem: for every hClique ⊆ NewPtSet, those h vertices together with v,w,x,y form a maximal (h+4)Clique, where NPS(v,w,x,y)=CLQ3(v,w)&CLQ3(x,y).

Definitions:
GRAPH: linear edges (2 vertices).
kHYPERGRAPH: edges = k vertices.
kPARTITE GRAPH: V is the disjoint union of the Vi, i=1..k; (x,y)∈E ⇒ x,y are not in the same Vi.
kPARTITE HYPERGRAPH: V is the disjoint union of the Vi, i=1..k; (x1..xk)∈E ⇒ no two xi,xj are in the same Vi.
2graph = 2hypergraph.

Bipartite Clique Mining finds MaxCliques at the cost of pairwise &s. Each LETpTree is a MCLQ unless some pairwise & has the same count: A&B, for any B with Ct(A&B)=Ct(A), is a MCLQ. There is potential for a k-plex [k-core] mining alg here. Instead of Ct(A&B)=Ct(A), consider, e.g., Ct(A&B)=Ct(A)-1. Each such pTree, C, would be missing just 1 vertex (1 edge). Taking any MCLQ as above, ANDing in such a C pTree would produce a 1-plex. ANDing in k such C's would produce a k-plex.
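The Apriori-style step described above (join two k-cliques that share k-1 vertices, then test the single missing edge) can be sketched as follows. Plain Python sets stand in for the pTree edge lookup (PE)(v,w), and the function name `apriori_cliques` and the example graph are assumptions for illustration.

```python
from itertools import combinations

def apriori_cliques(edges):
    """Return {k: set of k-cliques} for k >= 2, built level by level."""
    E = {frozenset(e) for e in edges}
    level = set(E)              # the 2-cliques are the edges themselves
    result = {}
    k = 2
    while level:
        result[k] = level
        nxt = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) == k + 1:          # a and b share k-1 vertices
                (v,) = a - b                 # the vertex only in a
                (w,) = b - a                 # the vertex only in b
                if frozenset((v, w)) in E:   # the one edge not yet verified
                    nxt.add(union)
        level = nxt
        k += 1
    return result

# Hypothetical graph: a 4-clique on {1,2,3,4} plus a pendant edge (4,5).
graph = [(1, 2), (1, 3), (2, 3), (1, 4), (2, 4), (3, 4), (4, 5)]
cliques = apriori_cliques(graph)
```

Only the edge between the two non-shared vertices needs checking, because every other pair in the union already lies inside one of the two k-cliques.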
In fact, suppose we have produced a k-plex in such a manner; then ANDing in any C with Ct(C)=Ct(A)-h would produce a (k+h)-plex. &i=1..nAi is a [i=1..nCt(Ai)]-Core.

Tripartite Clique Mining Algorithm? In a Tripartite Graph, edges must start and end in different vertex parts. E.g., PART1=tweeters; PART2=hashtags; PART3=tweets. Is tweeters-to-hashtags many-to-many? Is tweeters-to-tweets many-to-many (incl. retweets)? Is hashtags-to-tweets many-to-many?

Multipartite Graphs: Bipartite, Tripartite (have 2, 3 PARTs resp.), … The rule is that no edge can start and end in the same PART.

HyperClique Mining: A 3hyperGraph has 3 vertex PARTs and each edge is a planar triangle (vertex triple), one vertex from each PART. The Stock recommender is a 3PARThyperGraph (Investors, Stocks, Days): a triangular "edge" connects Investor #k, Stock X and Day n if k recommended X on day n. A 3PARThyperClique is a community s.t. all the investors in the clique recommend all the stocks in the clique on each of the days in the clique (a strong signal?). Tweet example: PART1=tweeters; PART2=hashtags; PART3=tweets.

Conjecture: KmultiCliques and KhyperCliques are in 1-1 correspondence (K vertex set)? So, only one of the mining processes is needed? Represent these common objects with cliqueTrees (cTrees).

Cliques, Kplexes and Kcores are subgraphs (communities) defined using internal edge count. A Motif is a subgraph defined using external counting of "isomorphisms in the graph": a motif must occur (isomorphically) in the graph more times than "expected".

Criticism: Some authors argue[62] that motif structure does not necessarily determine function. Recent research[64] shows that the connections of a motif to the rest of the network are too important to draw inferences about function from local structure alone.[65] Research also shows that certain topological features of biological networks naturally give rise to canonical motifs.[66] Are Stock-Inv or Stock-Inv-Day Motifs useful?
Some questions/theorems/thoughts:
All K-Paths are isomorphic (thus, there's always a Kpath motif).
A ShortestKPath is an induced subgraph.
What does the sequence FG(1PathMotif)=|V|, FG(2PathMotif), … tell us? The sequence FG(Shortest1Path), FG(Shortest2Path), …? The sequence FG(MaxShortest1Path), FG(MaxShortest2Path), …, where a MaxS2P is not part of a S3P?
Extend to HyperEdges? What is a path in, e.g., a 3HyperGraph? Both? 2HGInterface3HyperGraphPath. 1HGI3HGP. (In general, hHGIkHGP, where 0<h<k.)
At the other extreme (all SPs are length=1): Or?
I'll bet most important motifs M(V',E') in G are "Shortest Path Motifs": ∀x,y∈V', ∃ a G-ShortestPath in M running from x to y. I.e., M is made up of G-SPs. A Clique is a SPMotif (made up entirely of Shortest1Paths).
10
MOTIFs: Cliques, k-plexes, k-cores and other communities are subgraphs defined by internal edge count. A Motif is a subgraph defined by (external) isomorphism count. Wikipedia: motifs are recurrent and statistically significant sub-graphs or patterns. They may reflect functional properties. Motif detection is computationally challenging. Most algorithms find induced Motifs.

A graph G′ is a subgraph of G (G′⊆G) if V′⊆V and E′⊆E∩(V′×V′). If G′⊆G and G′ contains all ‹u,v›∈E with u,v∈V′, G′ is an induced sub-graph. G′ and G are isomorphic (G′↔G) if there is a bijection f:V′→V with ‹u,v›∈E′⇔‹f(u),f(v)›∈E for all u,v∈V′. When G″⊂G and there is an isomorphism between G″ and G′, G′ appears in G. The number of appearances of G′ in G is the frequency of G′ in G, FG(G′). G′ is recurrent or frequent in G when FG(G′)>threshold (pattern=frequent subgraph).

Motif discovery methods include exact counting, sampling and pattern growth. Motif discovery has 2 steps: calculating the number of occurrences, then evaluating the significance. Mfinder implements full enumeration and sampling. Brute-force exact counting (Milo et al.[3]) was computationally feasible only for small motifs of size < 5 vertices. The Kashtan et al.[9] edge sampling NM alg estimates concentrations of induced subgraphs for directed or undirected networks: it starts from an edge (subgraph size 2), then continues choosing random nbr edges until subgraph size=n. Finally the subgraph is expanded to include all of the edges that exist in the network between these n nodes. It finds motifs up to size=6 and thus most significant motifs.

mfinderSampling: Es = set of picked edges. Vs = set of all nodes that are touched by the edges in Es. Initialize Vs and Es to be empty.
1. Pick a random edge e1=(vi,vj). Update Es={e1}, Vs={vi,vj}.
2. Make a list L of all nbr edges of Es. Omit from L all edges between vertices in Vs.
3. Pick a random edge e={vk,vl} from L. Update Es=Es⋃{e}, Vs=Vs⋃{vk,vl}.
4. Repeat 2-3 until |Vs|=n.
5. Calculate the probability to sample the picked n-node subgraph.
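Steps 1-4 of the sampling procedure above, followed by the final expansion to the induced subgraph, can be sketched for an undirected graph as below. This is not the mfinder implementation: the function name `sample_subgraph`, the example graph and the use of Python's `random.Random` are assumptions, and step 5 (the sampling probability) is left out.

```python
import random

def sample_subgraph(edges, n, rng):
    """Grow a connected n-node sample by random neighbour edges.

    Returns (Vs, induced_edges), or None if the sample cannot reach n nodes.
    """
    E = {frozenset(e) for e in edges}
    # Step 1: pick a random starting edge (sorted for reproducibility).
    e1 = rng.choice(sorted(E, key=sorted))
    Es, Vs = {e1}, set(e1)
    while len(Vs) < n:                                   # step 4: repeat
        # Step 2: neighbour edges with exactly one endpoint inside Vs,
        # which omits every edge between two vertices already in Vs.
        L = [e for e in E if len(e & Vs) == 1]
        if not L:
            return None
        # Step 3: pick one at random and absorb its new endpoint.
        e = rng.choice(sorted(L, key=sorted))
        Es.add(e)
        Vs |= e
    # Final expansion: all network edges among the n sampled nodes.
    induced = {e for e in E if e <= Vs}
    return Vs, induced

# Hypothetical graph: a triangle 1-2-3 with a tail 3-4-5.
graph = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5)]
sample = sample_subgraph(graph, 3, random.Random(0))
```

Passing an explicit `random.Random` instance keeps the sample reproducible, which is convenient when comparing estimated subgraph concentrations across runs.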
Apply to G9 below:

[Figure: bipartite graph G9 with vertex parts A..N and 1..9,a..i; each vertex entry reads label, 1, count.]
Counts for A..N: 3 3 6 4 8 8 a e c 5 4 6 3 3
Counts for 1..i: 8 7 8 7 4 4 4 3 4 4 4 6 7 8 5 2 2 2