Rotate! Base Clique Motifs for bipartite graph G9.1

Presentation transcript:

Rotate! Base Clique Motifs. Bipartite graph G9.1: Inv(1,2,3,4,5) recommend Stock(A,B,C,D,E). Unipartite graph G1.1: Proteins(1,2,3,4,5) interactions.

[Figure: for each graph, the traditional structures (SI-Raster and IS-Raster Edge Table, Edge Map, Adjacency Matrix) and the new ones (3-level, stride=5 NPZ SI and IS pTrees at Lev=2, Lev=1, Lev=0; Base SI and IS cTrees; Expanded Base SI and IS cTrees).]

Create Expanded Base cTrees (EBcTrees). The number of isomorphic copies of an EBC Motif can be counted by analyzing cTree counts (Rotate!).
SI BC Motifs: two 1,4; two 4,1; two 1,3; one 3,1; one 1,2.
SI EBC Motifs: two 3,3; two 4,2; one 2,4; one 5,1.
Thus for the unipartite graph there are: one 2,2 EBC Motif and one 4,1 EBC Motif. In addition: five 1,3 BC Motifs and eleven 1,2 BC Motifs.

Bipartite BcTrees are induced subgraphs (also cliques); EBcTrees are max cliques. Mine for other motifs? Is motif mining even useful in the Investor-Stock case? (Maybe it would be useful to know that the 3-3 motif, 3 investors recommending 3 stocks, occurs many times.) Motifs seem to be of greatest interest in the context of Protein-Protein interaction (PPI) graphs, in which the two label sets are the same; there is therefore just one Base cTreeSet and one EBcTreeSet to create (easier), and the h-k motifs are not distinct from the k-h motifs. Question: in PPI graphs, would the counts of Expanded Base Clique Motifs provide important information?

cliqueTrees. Bipartite graph G11: Inv(12345) rec Stk(ABCDE). H1: on Day(…), I(123) recommend S(ABC).

[Figure: G11 shown with the traditional data structures (Graph, EdgeMap, EdgeTbl, Adj Matrix) and the new data structures (NPZ pTrees, stride=5, L=2, L=1, L=0; Stock BCTs; Investor BCTs; Stock EBCTs and Inv EBCTs produced with the oa operator). H1 shown with its NPZ pTree (stride=3, L=3 to L=0) and its DI StockBaseCliqueTrees expanded with aoa and oaa.]

… = C a MaxClique. Then one of … must be a BC, say …. Expanding it gives C. Thus, for bipartite graphs, every MaxClique is an EBCT.

… = C a MaxClique. Then one of … must be a BC, say …. Expanding it gives C. Thus, for tripartite graphs, every MaxClique is an EBCT.

Base CliqueTrees for 3HG2. TriEdgeTable (S,D,I,R) has 6 key sort orders: SDI, DSI, SID, ISD, DIS, IDS. The Adjacency Matrix (data cube) has a 1 for each existing TriEdge (that Investor recommended that Stock on that Day). There are 6 Base cTreeSets and 1 operator, aoa, to generate Expanded Base cliqueTrees.

[Figure: Stock Day Investor cTrees (S = 1st sort dim, D = 2nd, I = 3rd; leaf counts CtI) with 4-level stride=5 rasterSDI NPZ pTrees (Lev=3 to Lev=0); Day Stock Investor cTrees (CtI) with rasterDSI NPZ pTrees; Stock Investor Day cTrees (CtD) with rasterSID NPZ pTrees.]
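As a minimal sketch of the structures named on this slide, the following builds the data cube and the six key sort orders from a tiny TriEdgeTable. The rows, the day labels `d1`/`d2`, and the helper names (`sorted_by`, `base_ctrees`) are hypothetical; reading a Base cTree as "group by the first two sort dimensions, leaves from the third" is an assumption, and the NPZ pTree encoding is not modelled.

```python
from itertools import permutations

# Hypothetical TriEdgeTable rows (Stock, Day, Investor); NOT the 3HG2 data.
tri_edges = [
    ("A", "d1", 1), ("A", "d1", 2), ("B", "d1", 1),
    ("B", "d2", 3), ("C", "d2", 3),
]
stocks = sorted({s for s, _, _ in tri_edges})
days = sorted({d for _, d, _ in tri_edges})
investors = sorted({i for _, _, i in tri_edges})

# Adjacency "data cube": cube[s][d][i] == 1 iff investor i recommended stock s on day d.
edge_set = set(tri_edges)
cube = {s: {d: {i: int((s, d, i) in edge_set) for i in investors}
            for d in days} for s in stocks}

# Column index of each dimension initial in a table row.
COLUMN = {"S": 0, "D": 1, "I": 2}

def sorted_by(order):
    """Return the table sorted by the given key order, e.g. 'SDI'."""
    return sorted(tri_edges,
                  key=lambda row: tuple(str(row[COLUMN[c]]) for c in order))

# The six key sort orders: SDI, SID, DSI, DIS, ISD, IDS.
sort_orders = {"".join(p): sorted_by(p) for p in permutations("SDI")}

def base_ctrees(order):
    """Assumed reading of a Base cTreeSet: one tree per pair of values of the
    first two sort dimensions, with the third-dimension values as leaves."""
    a, b, c = (COLUMN[ch] for ch in order)
    trees = {}
    for row in sort_orders[order]:
        trees.setdefault((row[a], row[b]), []).append(row[c])
    return trees

if __name__ == "__main__":
    for key, leaves in base_ctrees("SDI").items():
        print(key, leaves, "count =", len(leaves))
```

The leaf count of each tree plays the role of the CtI, CtD or CtS column in the figures.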

Base CliqueTrees for 3HG2: the last 3 key sort orders of TriEdgeTable (S,D,I,R), namely ISD, DIS, IDS.

[Figure: Investor Stock Day cTrees (CtD) with 4-level stride=5 rasterISD NPZ pTrees (same as SID on the previous slide); Day Investor Stock cTrees (CtS) with rasterDIS NPZ pTrees; Investor Day Stock cTrees (CtS) with rasterIDS NPZ pTrees.]

Maximal Base CliqueTrees for 3HG2.

[Figure: Stock Day Investor Base cTrees (CtI) expanded with aoa then oaa (all of these will be Max Cliques); Day Stock Investor Base cTrees (CtI) expanded with aoa then oaa (all of these are Max Cliques, only 3 new ones); Stock Investor Day Base cTrees (CtD) expanded with aoa then oaa (all of these are Max Cliques, only 3 new ones).]

Can we count the motifs?
S=1, D=1, I=4 (1,1,4 motifs): 6 + COMB(5,4) = 6 + 5 = 11.
1,1,3 motifs: 10 + 6·COMB(4,3) + COMB(5,3) = 10 + 24 + 10 = 44.
1,1,2 motifs: 7 + 10·COMB(3,2) + 6·COMB(4,2) + COMB(5,2) = 7 + 30 + 36 + 10 = 83.
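The three sums on this slide follow one pattern: a base cTree with m investor leaves contains COMB(m,k) copies of the 1,1,k motif. A small sketch of that count is below. The histogram of leaf counts (7 trees with 2 leaves, 10 with 3, 6 with 4, 1 with 5) is read off the coefficients of the slide's sums and is an assumption about the underlying figure; `count_11k_motifs` is a hypothetical helper name. Evaluated term by term, the 1,1,3 sum comes to 44.

```python
from math import comb

# Number of (stock, day) base cTrees by investor-leaf count, inferred from
# the coefficients in the slide's sums (an assumption about the figure).
leaf_histogram = {2: 7, 3: 10, 4: 6, 5: 1}

def count_11k_motifs(k, histogram):
    """Count 1,1,k motifs: every tree with m >= k leaves contributes C(m, k)."""
    return sum(n_trees * comb(m, k)
               for m, n_trees in histogram.items() if m >= k)

if __name__ == "__main__":
    for k in (4, 3, 2):
        print(f"1,1,{k} motifs:", count_11k_motifs(k, leaf_histogram))
```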

Base CliqueTrees for 3HG2, last 3 sort orders.

[Figure: Investor Stock Day cTrees, Day Investor Stock cTrees and Investor Day Stock cTrees, each expanded with aoa then oaa (all of these will be Max Cliques).]

Maximal Base CliqueTrees for 3HG2. Apply aoa then oaa on the 6 cTrees (removing duplicates; there are no covers, since aoa then oaa gives Maximal Cliques only). We get 34 MCs below.

Theorem: These 34 MCs are the only Maximal Cliques.
Proof: Let C be a MaxClique, v1∈Part1(C), w1∈Part2(C), {z1..zn}=Part3(C). Apply aoa to that BaseClique, B. aoa(B)={v1,w1..wm,z1..zn} is a clique, and W={w1..wm}⊇Part2(C), else C is not max. oaa(aoa(B))={v1..vk,W,Z} is a clique, and V={v1..vk}⊇Part1(C), else C is not max. Thus {V,W,Z} is a MaxClique ⊇ C and therefore {V,W,Z}=C. Thus C is one of the Expanded Base Cliques under aoa then oaa.

General thm: {a..ao(a..oa(…oa..a(B))) | B=BaseClique} is the MaxCliqueSet. Thus, for a bipartite graph, the MCS is {ao(B) | B a BaseClique}. (This seems to say that only one of the 6 cTrees will generate all of the MCS?)

[Figure: the 34 maximal cliques drawn as expanded cliqueTrees.]
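The expansion used in the proof can be sketched as follows. This is an illustration under assumptions, not the slides' pTree implementation: edges are plain (part1, part2, part3) triples, a base clique is taken to be one (v, w) pair with all of its part-3 leaves, aoa is read as "hold parts 1 and 3, widen part 2", and oaa as "hold parts 2 and 3, widen part 1". The sample edges and function names are hypothetical, and the sketch only checks that each result is a clique; it does not test the theorem itself.

```python
from itertools import product

def is_clique(V, W, Z, edges):
    """True iff every triple (v, w, z) with v in V, w in W, z in Z is an edge."""
    return all((v, w, z) in edges for v, w, z in product(V, W, Z))

def base_clique(v, w, edges, part3):
    """Assumed base clique: the pair (v, w) with all part-3 leaves z
    such that (v, w, z) is an edge."""
    Z = frozenset(z for z in part3 if (v, w, z) in edges)
    return frozenset([v]), frozenset([w]), Z

def aoa(clique, edges, part2):
    """Hold V and Z; widen part 2 to every w compatible with all of V x Z."""
    V, _, Z = clique
    return V, frozenset(w for w in part2 if is_clique(V, [w], Z, edges)), Z

def oaa(clique, edges, part1):
    """Hold W and Z; widen part 1 to every v compatible with all of W x Z."""
    _, W, Z = clique
    return frozenset(v for v in part1 if is_clique([v], W, Z, edges)), W, Z

# Hypothetical data: stocks A, B; days d1, d2; investors 1, 2.
part1, part2, part3 = ["A", "B"], ["d1", "d2"], [1, 2]
edges = {("A", "d1", 1), ("A", "d1", 2), ("A", "d2", 1), ("A", "d2", 2),
         ("B", "d1", 1), ("B", "d1", 2), ("B", "d2", 1)}

if __name__ == "__main__":
    for v, w in product(part1, part2):
        b = base_clique(v, w, edges, part3)
        expanded = oaa(aoa(b, edges, part2), edges, part1)
        print((v, w), "->", [sorted(p) for p in expanded])
```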

Thm: The 34 MCs are the only MaxCliques.
Pf: Let C=MaxClique={V,W,Z}. aoa({v,w,Z})={v,W′,Z}, W⊆W′. oaa(aoa({v,w,Z}))={V′,W′,Z}, V′⊆V. If w′∈W′−W then ∃v′∈V−V′ (w′∉C, v′∈C), and then aoa({v′,w′,Z})={v′,W″,Z}, {w′,W} …

Base CliqueTrees for the 3PART HyperGraph 3PHG2: Investors={12345} recommending Stocks={ABCDE} on five Days={α,…}, 74 recommendations.

[Figure: Stock-Day-Investor BaseCliqueTrees (leaves Inv), Stock-Investor-Day BaseCTrees (leaves Days) and Inv-Day-Stock BaseClTrees (leaves Stocks), with leaf counts CtS, CtD and CtI, together with the expanded cliques produced by applying aoa, oaa and aao to them. Each expanded clique combines a stock set, a day set and an investor set, for example stocks ABCDE with investors 1234, or stock B with investors 12345.]

Edge Count Clique Thms. A graph C is a clique iff |E_C| = |PU_C| = COMB(|V_C|,2) = |V_C|!/((|V_C|−2)!·2!). SubGraph Existence thm (SGE): (V_C,E_C) is a k-clique iff every induced (k−1)-subgraph (V_D,E_D) is a (k−1)-clique.

Apriori Clique Mining Alg: uses an ARM-Apriori-like downward closure property. CS_k ≡ kCliqueSet, CCS_k+1 ≡ Candidate (k+1)CliqueSet. By SGE, CCS_k+1 = all unions of CS_k pairs with k−1 common vertices. Let C∈CCS_k+1 be a union of two k-cliques with k−1 common vertices, and let v,w be the kth (different) vertices of the two k-cliques: C∈CS_k+1 iff (PE)(v,w)=1.

Breadth-1st Clique Alg: CLQ_K = all K-cliques. Find CLQ3 with CS0. A Kclique and a 3clique sharing an edge form a (K+1)clique iff all K−2 edges from the non-shared Kclique vertices to the non-shared 3clique vertex exist. Next find CLQ4, then CLQ5, …

Depth-1st Clique Alg: Find a Largest MaxClique v. If (x,y)∈E and Count(NewPtSet(v,w,x,y) ≡ CLQ3pTree(v,w)&CLQ3pTree(x,y)) is:
0, the 4 vertices form a max 4Clique (i.e., v,w,x,y).
1, the 5 vertices form a max 5Clique (i.e., v,w,x,y,NewPt).
2, the 6 vertices form a max 6Clique if NewPair∈E, else they form 2 max 5Cliques.
3, the 7 vertices form a max 7Clique if each NewPair∈E; elseif 1 or 2 NewPairs∈E, each 6VertexSet (vwxy + 2 EdgeEndpts) forms a Max 6Clique; elseif 0 NewPairs∈E, each 5VertexSet (vwxy + 1 NewVertex) forms a maximal 5Clique. …
Theorem: ∀ hClique⊆NewPtSet, those h vertices together with v,w,x,y form a maximal (h+4)Clique, where NPS(v,w,x,y)=CLQ3(v,w)&CLQ3(x,y).

GRAPH (linear edges, 2 vertices); kHYPERGRAPH (edges = k vertices); kPARTITE GRAPH (V=∪Vi, i=1..k; (x,y)∈E ⇒ x,y ∉ same Vi); kPARTITE HYPERGRAPH (V=∪Vi, i=1..k; (x1..xk)∈E ⇒ xi,xj ∉ same Vi). 2graph = 2hypergraph.

Bipartite Clique Mining finds MaxCliques at the cost of pairwise &s. Each LETpTree is a MCLQ unless ∃ a pairwise & with the same count. A&B, ∀B with Ct(A&B)=Ct(A), is a MCLQ. There is potential for a k-plex [k-core] mining alg here. Instead of Ct(A&B)=Ct(A), consider, e.g., Ct(A&B)=Ct(A)−1. Each such pTree, C, would be missing just 1 vertex (1 edge). Taking any MCLQ as above, ANDing in the pTree C would produce a 1-plex. ANDing in k such C's would produce a k-plex.
In fact, suppose we have produced a k-plex in such a manner; then ANDing in any C with Ct(C)=Ct(A)−h would produce a (k+h)-plex. &i=1..n Ai is a [… i=1..n Ct(Ai)]-Core.

Tripartite Clique Mining Algorithm? In a Tripartite Graph, edges must start and end in different vertex parts. E.g., PART1=tweeters; PART2=hashtags; PART3=tweets. Tweeters-to-hashtags is many-to-many? Tweeters-to-tweets is many-to-many (incl. retweets)? Hashtags-to-tweets is many-to-many?

Multipartite Graphs: Bipartite, Tripartite (have 2, 3 PARTs resp.), … The rule is that no edge can start and end in the same PART.

HyperClique Mining: A 3hyperGraph has 3 vertex PARTS and each edge is a planar triangle (vertex triple), one vertex from each PART. The Stock recommender is a 3PARThyperGraph (Investors, Stocks, Days). A triangular "edge" connects Investor #k, Stock X, and Day n if k recommended X on day n. A 3PARThyperClique is a community s.t. all the investors in the clique recommend all the stocks in the clique on each of the days in the clique (a strong signal?). Tweet example: PART1=tweeters; PART2=hashtags; PART3=tweets. Conjecture: KmultiCliques and KhyperCliques are in 1-1 correspondence (K vertex sets)? So, is only one of the mining processes needed? Represent these common objects with cliqueTrees (cTrees).

Cliques, Kplexes and Kcores are subgraphs (communities) defined using internal edge count. A Motif is a subgraph defined using external "isomorphisms in the graph" counting. A motif must occur (isomorphically) in the graph more times than "expected". Criticism: Some authors argue[62] that motif structure does not necessarily determine function. Recent research[64] shows that the connections of a motif to the rest of the network are too important for function to be inferred from local structure alone.[65] Research also shows that certain topological features of biological networks naturally give rise to canonical motifs.[66] Are Stock-Inv or Stock-Inv-Day Motifs useful?

Some questions/theorems/thoughts: All K-Paths are isomorphic (thus, there's always a Kpath motif).
A ShortestKPath is an induced subgraph. What does the sequence FG(1PathMotif)=|V|, FG(2PathMotif), … tell us? The sequence FG(Shortest1Path), FG(Shortest2Path), …? What does the sequence FG(MaxShortest1Path), FG(MaxShortest2Path), … tell us, where a MaxS2P is not part of a S3P? Extend to HyperEdges? What is a path in, e.g., a 3HyperGraph? Both? 2HGInterface3HyperGraphPath. 1HGI3HGP. (In general, hHGIkHGP, where 0<h<k.) At the other extreme (all SPs are length=1): Or? I'll bet the most important motifs, M(V′,E′) in G, are "Shortest Path Motifs": ∀x,y∈V′, ∃ a G-ShortestPath in M running from x to y. I.e., M is made up of G-SPs. A Clique is a SPMotif (made up entirely of Shortest1Paths).
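The edge-count test and the Apriori-style clique step from this slide can be sketched in a few lines. This is a set-based illustration, not the pTree implementation: edges are `frozenset` pairs, the membership test `frozenset({v, w}) in edges` stands in for (PE)(v,w)=1, and the function names and the sample graph are hypothetical.

```python
from itertools import combinations
from math import comb

def is_clique_by_edge_count(vertices, edges):
    """Edge-count test: C is a clique iff |E_C| == COMB(|V_C|, 2)."""
    vs = set(vertices)
    inner_edges = {e for e in edges if e <= vs}
    return len(inner_edges) == comb(len(vs), 2)

def next_cliques(k_cliques, edges):
    """Join pairs of k-cliques sharing k-1 vertices; the union is a
    (k+1)-clique iff the two non-shared vertices are adjacent."""
    found = set()
    for c1, c2 in combinations(k_cliques, 2):
        differing = c1 ^ c2              # the two "kth" vertices, if k-1 are shared
        if len(differing) == 2 and differing in edges:
            found.add(c1 | c2)
    return found

def all_cliques(edges):
    """Level-wise mining: start from the 2-cliques (edges) and grow until empty."""
    levels = {2: set(edges)}
    k = 2
    while True:
        nxt = next_cliques(levels[k], edges)
        if not nxt:
            return levels
        k += 1
        levels[k] = nxt

# Hypothetical graph: a complete graph on {1,2,3,4} plus the pendant edge 4-5.
edges = {frozenset(p) for p in
         [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5)]}

if __name__ == "__main__":
    for k, cliques in all_cliques(edges).items():
        print(k, sorted(sorted(c) for c in cliques))
```

Only pairs of cliques from the previous level are joined, which is the downward-closure pruning: a vertex set is never examined unless all of its smaller subsets already passed.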

MOTIFs: Cliques, k-plexes, k-cores and other communities are subgraphs defined by internal edge count. A Motif is a subgraph defined by (external) isomorphism count. Wikipedia: motifs are recurrent and statistically significant sub-graphs or patterns. They may reflect functional properties. Motif detection is computationally challenging. Most algorithms find induced Motifs.

A graph G′ is a subgraph of G (G′⊆G) if V′⊆V and E′⊆E∩(V′×V′). If G′⊆G and G′ contains all ‹u,v›∈E with u,v∈V′, G′ is an induced sub-graph. G′ and G are isomorphic (G′↔G) if ∃ a bijection f:V′→V with ‹u,v›∈E′⇔‹f(u),f(v)›∈E ∀u,v∈V′. If G″⊂G and ∃ an isomorphism between G″ and G′, then G′ appears in G. The number of appearances of G′ in G is the frequency FG of G′ in G, FG(G′). G′ is recurrent or frequent in G when FG(G′)>threshold (pattern=frequent subgraph).

Motif discovery includes exact counting, sampling and pattern growth. Motif discovery has 2 steps: calculating the # of occurrences; evaluating the significance. Mfinder implements full enumeration and sampling. Brute force exact counting (Milo et al.[3]) was computationally feasible only for small motifs of size < 5 vertices. The Kashtan et al.[9] edge sampling NM alg estimates concentrations of induced subgraphs for directed or undirected networks: it starts from an edge (subgraph size 2) and then continues choosing random nbr edges until subgraph size=n. Finally the subgraph is expanded to include all of the edges that exist in the network between these n nodes. It finds motifs up to size=6 and thus most significant motifs.

mfinderSampling: Es = set of picked edges. Vs = set of all nodes that are touched by the edges in Es. Initialize Vs and Es = ∅.
1. Pick a random edge, e1=(vi,vj). Update Es={e1}, Vs={vi,vj}.
2. Make a list L of all nbr edges of Es. Omit from L all edges between vertices in Vs.
3. Pick a random edge e={vk,vl} from L. Update Es=Es⋃{e}, Vs=Vs⋃{vk,vl}.
4. Repeat 2-3 until |Vs|=n.
5. Calculate the probability to sample the picked n-node subgraph.
Apply to G9 below:

[Figure: graph G9 on vertex labels 1 2 3 4 5 6 7 8 9 a b c d e f g h i and A B C D E F G H I J K L M N, with the cTree entries
A 1 3, B 1 3, C 1 6, D 1 4, E 1 8, F 1 8, G 1 a, H 1 e, I 1 c, J 1 5, K 1 4, L 1 6, M 1 3, N 1 3
1 1 8, 2 1 7, 3 1 8, 4 1 7, 5 1 4, 6 1 4, 7 1 4, 8 1 3, 9 1 4, a 1 4, b 1 4, c 1 6, d 1 7, e 1 8, f 1 5, g 1 2, h 1 2, i 1 2]
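Steps 1 to 4 of the mfinderSampling procedure, followed by the expansion to the induced subgraph, can be sketched as below. This is a hedged illustration for undirected graphs only: `sample_subgraph` is a hypothetical name, the sample edge list is made up (it is not G9), and step 5 (the sampling probability used to correct the estimate) is deliberately left out.

```python
import random

def sample_subgraph(edge_list, n, rng):
    """Edge-sample one connected n-node subgraph, then return its node set
    and the induced edge set. Returns None if growth gets stuck before n nodes."""
    edges = [frozenset(e) for e in edge_list]
    # Step 1: pick a random starting edge.
    e1 = rng.choice(edges)
    Es, Vs = {e1}, set(e1)
    # Step 4: repeat steps 2-3 until |Vs| = n.
    while len(Vs) < n:
        # Step 2: neighbour edges touch Vs; omit edges with both ends in Vs.
        L = [e for e in edges if (e & Vs) and not (e <= Vs)]
        if not L:
            return None
        # Step 3: pick a random neighbour edge and update Es, Vs.
        e = rng.choice(L)
        Es.add(e)
        Vs |= e
    # Expansion: include every network edge between the n sampled nodes.
    induced = {e for e in edges if e <= Vs}
    return Vs, induced

if __name__ == "__main__":
    # Hypothetical undirected graph, not the G9 data.
    demo_edges = [(1, 2), (2, 3), (3, 4), (4, 5), (1, 3)]
    nodes, induced = sample_subgraph(demo_edges, 3, random.Random(0))
    print(sorted(nodes), sorted(sorted(e) for e in induced))
```

Each picked edge in step 3 has exactly one endpoint outside Vs, so the node set grows by one per iteration and stops at exactly n nodes.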