CIS 4930/6930 – Recent Advances in Bioinformatics, Spring 2014. Network construction from RNAi data. Tamer Kahveci
Signaling Networks. Figure: MAPK network.
Signal reachability. Figure labels: Receptor, Reporter, Luciferase.
Signaling and RNA Interference. Figure labels: Receptor, Reporter, Luciferase; genes marked X are labeled Not critical or Critical.
Signaling Network Reconstruction from RNAi data. Figure labels: Receptor, Reporter; genes labeled Not critical or Critical.
RNAi data and Reference Network. Figure labels: Receptor, Reporter; genes labeled Not critical or Critical. Reference network: Not consistent! After Insert and Delete edits: Consistent!
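The consistency idea pictured on these slides can be sketched in code. This is a minimal sketch under an assumed definition: a gene is critical when knocking it down stops the signal from reaching the reporter, and a network is consistent with the RNAi data when this holds for exactly the genes labeled critical. The graph encoding (a dict of successor sets), the function names `reaches` and `is_consistent`, and the toy genes are illustrative assumptions, not taken from the slides.

```python
from collections import deque

def reaches(graph, source, target, removed=frozenset()):
    """Return True if target is reachable from source, ignoring removed nodes."""
    if source in removed or target in removed:
        return False
    seen, queue = {source}, deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in graph.get(node, ()):
            if nxt not in seen and nxt not in removed:
                seen.add(nxt)
                queue.append(nxt)
    return False

def is_consistent(graph, receptor, reporter, critical, noncritical):
    """Assumed definition: knocking down a critical gene must block the signal,
    knocking down a noncritical gene must not."""
    for gene in critical:
        if reaches(graph, receptor, reporter, removed={gene}):
            return False
    for gene in noncritical:
        if not reaches(graph, receptor, reporter, removed={gene}):
            return False
    return True

# Toy network: receptor -> a -> b -> reporter, plus a detour a -> c -> reporter.
toy = {"receptor": {"a"}, "a": {"b", "c"}, "b": {"reporter"}, "c": {"reporter"}}
print(is_consistent(toy, "receptor", "reporter", critical={"a"}, noncritical={"b", "c"}))  # True
print(is_consistent(toy, "receptor", "reporter", critical={"a", "b"}, noncritical={"c"}))  # False
```

In the second call the detour through c lets the signal bypass b, so labeling b as critical makes the toy network inconsistent.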
Overview.
Given: a reference network G_R = (V_R, E_R) and constraints.
Find: a target network G_T = (V_T, E_T).
Goal: minimize the number of edit operations needed to make the reference consistent. The problem is NP-complete.
Methods: SiNeC (Signal Network Constructor) and S-SiNeC (Scalable Signal Network Constructor).
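The quantity being minimized can be made concrete with a short sketch. It assumes that the edit operations are edge insertions and edge deletions only, and that both networks are given as sets of directed edges; the function name `edit_operations` and the toy edges are hypothetical.

```python
def edit_operations(reference_edges, target_edges):
    """Split the difference between two edge sets into insertions and deletions."""
    reference_edges, target_edges = set(reference_edges), set(target_edges)
    insertions = target_edges - reference_edges  # edges to add to the reference
    deletions = reference_edges - target_edges   # edges to remove from the reference
    return insertions, deletions

reference = {("r", "a"), ("a", "b"), ("a", "t")}
target = {("r", "a"), ("a", "b"), ("b", "t")}
inserted, deleted = edit_operations(reference, target)
print(len(inserted) + len(deleted))  # 2
```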
SiNeC algorithm. Three steps:
1. Order the critical genes left to right based on the topology of G_R [Sloan, 1986]: v_1, v_2, …, v_c.
2. Edge deletion phase.
3. Edge insertion phase.
Step 1: Order critical genes. Prioritize based on distance to the reporter + degree. Figure labels: Receptor, Reporter.
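One way to picture this ordering is the sketch below. It is a simplified, static reading of the priority and not the ordering algorithm cited on the previous slide: critical genes are sorted by hop distance to the reporter (farthest first, so the list runs left to right), with ties broken by lower degree. The tie-breaking direction, the dict-of-sets graph encoding, and all names are assumptions; the sketch also assumes every critical gene can reach the reporter.

```python
from collections import deque

def distances_to(graph, reporter):
    """BFS over reversed edges: hop count from each node to the reporter."""
    reverse = {}
    for u, nbrs in graph.items():
        for v in nbrs:
            reverse.setdefault(v, set()).add(u)
    dist, queue = {reporter: 0}, deque([reporter])
    while queue:
        node = queue.popleft()
        for prev in reverse.get(node, ()):
            if prev not in dist:
                dist[prev] = dist[node] + 1
                queue.append(prev)
    return dist

def degree(graph, node):
    """In-degree plus out-degree of a node."""
    out_deg = len(graph.get(node, ()))
    in_deg = sum(node in nbrs for nbrs in graph.values())
    return in_deg + out_deg

def order_critical(graph, reporter, critical):
    """Farthest from the reporter first, then lower degree, then name."""
    dist = distances_to(graph, reporter)
    return sorted(critical, key=lambda g: (-dist[g], degree(graph, g), g))

toy = {"receptor": {"a"}, "a": {"b", "c"}, "b": {"d"}, "c": {"d"}, "d": {"reporter"}}
print(order_critical(toy, "reporter", {"d", "a"}))  # ['a', 'd']
```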
Step 2: Edge deletion. Purpose: eliminate detours around critical genes.
– Find all (undesirable) paths between non-consecutive critical genes, i.e., paths that go through only noncritical genes.
– Weight each edge by the number of such paths it belongs to.
– Remove edges greedily, starting from the largest weight, until all paths are disrupted.
Figure labels: Receptor, Reporter, v_i, v_k, v_j; annotation on the detour: "Bypassed!!!".
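The deletion phase described on this slide can be sketched as follows. The sketch makes several assumptions that the slide does not state: the ordered list includes the receptor and the reporter as its first and last entries, only forward detours (from an earlier to a later gene in the order) are searched, the graph is a dict of successor sets, and ties between equally weighted edges are broken by name. The function names are hypothetical.

```python
from collections import Counter

def detour_paths(graph, order, noncritical):
    """Simple paths order[i] -> order[j] with j > i + 1 whose inner nodes are all noncritical."""
    index = {gene: k for k, gene in enumerate(order)}
    paths = []

    def extend(path, start_idx):
        for nxt in graph.get(path[-1], ()):
            if nxt in path:
                continue  # keep paths simple
            if nxt in index:
                if index[nxt] > start_idx + 1:
                    paths.append(path + [nxt])  # skips at least one critical gene
            elif nxt in noncritical:
                extend(path + [nxt], start_idx)

    for i, gene in enumerate(order):
        extend([gene], i)
    return paths

def greedy_delete(graph, order, noncritical):
    """Repeatedly remove the edge lying on the most remaining detours."""
    paths = [list(zip(p, p[1:])) for p in detour_paths(graph, order, noncritical)]
    removed = []
    while paths:
        weights = Counter(edge for p in paths for edge in p)
        edge = max(sorted(weights), key=weights.get)  # heaviest edge, ties by name
        removed.append(edge)
        paths = [p for p in paths if edge not in p]
    return removed

order = ["receptor", "v1", "v2", "reporter"]
toy = {"receptor": {"v1", "x"}, "v1": {"v2"}, "v2": {"reporter"}, "x": {"v2", "reporter"}}
print(greedy_delete(toy, order, noncritical={"x"}))  # [('receptor', 'x')]
```

In the toy network the edge receptor -> x lies on both detours (one bypassing v1, one bypassing v1 and v2), so a single deletion disrupts them all. Enumerating every path is what makes this step expensive on large networks.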
Step 3: Edge insertion. Purpose: make sure that critical genes are connected and noncritical genes are consistent.
Insert an edge from v_{i-1} to v_i if either condition holds:
1. There is no path from v_{i-1} to v_i.
2. There is a noncritical gene on every path from v_{i-1} to v_i.
Figure labels: Receptor, Reporter, v_{i-1}, v_i, v_{i+1}.
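Both insertion conditions reduce to one test: no path from v_{i-1} to v_i survives once all noncritical genes are ignored. The sketch below applies that test to each consecutive pair. It assumes the ordered list includes the receptor and the reporter, treats a path through another critical gene as a valid connection, and uses hypothetical names throughout.

```python
from collections import deque

def reaches(graph, source, target, removed=frozenset()):
    """BFS reachability that ignores every node in removed."""
    seen, queue = {source}, deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in graph.get(node, ()):
            if nxt not in seen and nxt not in removed:
                seen.add(nxt)
                queue.append(nxt)
    return False

def insert_edges(graph, order, noncritical):
    """Connect consecutive genes that have no path avoiding all noncritical genes."""
    new_graph = {u: set(vs) for u, vs in graph.items()}  # leave the input untouched
    inserted = []
    for prev, cur in zip(order, order[1:]):
        if not reaches(new_graph, prev, cur, removed=noncritical):
            new_graph.setdefault(prev, set()).add(cur)
            inserted.append((prev, cur))
    return new_graph, inserted

order = ["receptor", "v1", "v2", "reporter"]
toy = {"receptor": {"v1"}, "v1": {"n"}, "n": {"v2"}}
new_graph, inserted = insert_edges(toy, order, noncritical={"n"})
print(inserted)  # [('v1', 'v2'), ('v2', 'reporter')]
```

Here v1 reaches v2 only through the noncritical gene n (condition 2), and v2 has no path to the reporter at all (condition 1), so both edges are inserted.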
Overview (continued). Finding all the paths, as SiNeC does, can be too time-consuming for large networks. S-SiNeC (Scalable Signal Network Constructor) targets that case.
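S-SiNeC algorithm. Each gene v_i of the reference network (drawn between v_s and v_t) is described by three flags, and the flags select the operations to apply.
Critical  Left reachable  Right reachable  Operations
0         0               0                None / A1
1         0               0                A2 + A3 + A4
1         0               1                A2 + A4
1         1               0                A3 + A4
1         1               1                A4
A1 to A3 are edge insertions; A4 is edge deletion.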
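The rows for critical genes (100, 101, 110, 111) follow a simple rule, sketched below: apply A2 when the gene is not left reachable, A3 when it is not right reachable, and A4 in every case. The function name is hypothetical, and the sketch deliberately leaves out the noncritical row.

```python
def operations_for_critical(left_reachable, right_reachable):
    """Operations selected for a critical gene from its two reachability flags."""
    ops = []
    if not left_reachable:
        ops.append("A2")  # edge insertion: make the gene left reachable
    if not right_reachable:
        ops.append("A3")  # edge insertion: make the gene right reachable
    ops.append("A4")      # edge deletion: remove detours around the gene
    return ops

for left, right in [(False, False), (False, True), (True, False), (True, True)]:
    print(int(left), int(right), " + ".join(operations_for_critical(left, right)))
```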
S-SiNeC: Edge insertion (A1). Purpose: make sure that noncritical genes are consistent. Figure labels: reference network with v_s, v_t, v_i and the regions L and R.
S-SiNeC: Edge insertion (A2). Purpose: make sure that critical genes are left reachable. Figure labels: reference network with v_s, v_t, v_i and the regions L and R.
S-SiNeC: Edge insertion (A3). Purpose: make sure that critical genes are right reachable. Figure labels: reference network with v_s, v_t, v_i and the regions L and R.
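The slides do not define left and right reachability. A plausible reading, used in the sketch below purely as an assumption, is that a gene is left reachable when v_s can reach it and right reachable when it can reach v_t. Under that reading both flags come from two breadth-first searches, one forward from v_s and one backward from v_t. All names are hypothetical.

```python
from collections import deque

def reachable_from(graph, start):
    """Set of nodes reachable from start by following edges forward (BFS)."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def reverse(graph):
    """Copy of the graph with every edge direction flipped."""
    flipped = {}
    for u, nbrs in graph.items():
        for v in nbrs:
            flipped.setdefault(v, set()).add(u)
    return flipped

def reachability_flags(graph, v_s, v_t, genes):
    """Map each gene to (left reachable, right reachable) under the assumed definition."""
    from_source = reachable_from(graph, v_s)
    to_target = reachable_from(reverse(graph), v_t)
    return {g: (g in from_source, g in to_target) for g in genes}

toy = {"v_s": {"a"}, "a": {"v_t"}, "b": {"v_t"}, "c": set()}
flags = reachability_flags(toy, "v_s", "v_t", ["a", "b", "c"])
print(flags)
```

Two searches cover every gene at once, which avoids enumerating individual paths.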
S-SiNeC: Edge insertion. Table of L/R cases 1 to 8 against edges e_1, e_2, e_3: cases 1, 7, and 8 carry one X each; cases 2 to 6 carry two X each.
S-SiNeC: Edge deletion (A4). Purpose: make sure that no detours exist around critical genes. Solve minimum cut between L and R. Figure labels: reference network with v_s, v_t, v_i and the regions L and R.
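A minimum cut between L and R can be computed with a standard max-flow routine. The sketch below uses Edmonds-Karp with unit edge capacities, a super source attached to L, and a super sink attached to R; it is a generic textbook construction, not necessarily the one used in S-SiNeC. It assumes L and R are disjoint node sets supplied by the caller and that the critical gene (`skip`) is ignored so that only detours around it remain. All names are hypothetical.

```python
from collections import deque

def min_cut_edges(graph, left, right, skip):
    """Smallest edge set separating left from right once node skip is ignored."""
    INF = float("inf")
    S, T = object(), object()  # super source and super sink
    cap = {S: {}, T: {}}

    def add(u, v, c):
        cap.setdefault(u, {})
        cap.setdefault(v, {})
        cap[u][v] = cap[u].get(v, 0) + c
        cap[v].setdefault(u, 0)  # residual (reverse) arc

    for u, nbrs in graph.items():
        for v in nbrs:
            if skip not in (u, v):
                add(u, v, 1)
    for u in left:
        add(S, u, INF)
    for v in right:
        add(v, T, INF)

    while True:
        # BFS for a shortest augmenting path in the residual graph
        parent, queue = {S: None}, deque([S])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if T not in parent:
            break  # parent now holds the source side of a minimum cut
        bottleneck, v = INF, T
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = T
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u

    return sorted((u, v) for u, nbrs in graph.items() for v in nbrs
                  if skip not in (u, v) and u in parent and v not in parent)

toy = {"s": {"a"}, "a": {"vi", "x", "y"}, "vi": {"b"}, "x": {"b"}, "y": {"b"}, "b": {"t"}}
print(min_cut_edges(toy, left={"s", "a"}, right={"b", "t"}, skip="vi"))
```

In the toy network two detours (through x and through y) bypass the critical gene vi, and the cut returns one edge per detour. Unlike the greedy deletion in SiNeC, this approach never lists individual paths.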
Dataset. Reference networks are obtained by random edge shuffling at 5% to 40% mutation rates, with 200 references per target network and per mutation rate.
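The slide does not spell out the shuffling procedure. One possible reading, sketched below as an assumption, is that a mutation rate r removes a fraction r of the target's edges at random and inserts the same number of random edges that were not in the target. The function name and parameters are hypothetical, and self-loops are excluded by choice.

```python
import random

def shuffle_edges(nodes, edges, rate, seed=None):
    """Replace round(rate * |E|) randomly chosen edges with random non-edges."""
    rng = random.Random(seed)
    edges = set(edges)
    k = round(rate * len(edges))
    removed = set(rng.sample(sorted(edges), k))
    candidates = sorted((u, v) for u in nodes for v in nodes
                        if u != v and (u, v) not in edges)
    added = set(rng.sample(candidates, k))  # raises ValueError if too few non-edges
    return (edges - removed) | added

nodes = ["a", "b", "c", "d", "e"]
target = {("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("a", "e")}
reference = shuffle_edges(nodes, target, rate=0.4, seed=1)
print(len(reference), len(reference ^ target))  # 5 4
```

The edge count is preserved, and the reference differs from the target by exactly twice the number of shuffled edges.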
Average distance to the true network (figure).
Accuracy based on edge class. Figure labels: v_s, v_t; edge classes Hot and Cold.
Running time results. SiNeC takes more than 1 hour per reference network.
Success rate on constraints (figure).
Accuracy (figure).
Functional Enrichment of the Pathway (figure).
Last Remarks.
Constructing very large signaling networks from RNAi data is possible in practical running time.
Both SiNeC and S-SiNeC are robust to errors in the reference network.
We recommend
– S-SiNeC for very large OR dense networks.
– SiNeC otherwise.
Acknowledgements. CCF, IIS