Graph models and efficient exact algorithms in studying cancer signaling pathways. Songjian Lu, Lujia Chen, Chunhui Cai, Department of Biomedical Informatics, University of Pittsburgh. 6/11/2016


1 Graph models and efficient exact algorithms in studying cancer signaling pathways. Songjian Lu, Lujia Chen, Chunhui Cai. Department of Biomedical Informatics, University of Pittsburgh. 6/11/2016

2 TCGA data. TCGA began as a three-year pilot from NCI and NHGRI in 2006. Number of tumors: more than 7,000. Types of tumors: 26. Data types: gene expression, somatic mutations, SNP, CNV, etc.

3 Intuition of our model. Mutations in cancer cells disturb signaling pathway systems. Mutated genes change the functions of proteins in the signaling pathways. Differential expression of downstream genes reflects the changed state (perturbation) of a signaling pathway. [Figure: a tumor sample with its mutated genes and its differentially expressed genes.]

4 Challenges in the research. Cancer cells usually have many mutations that disturb multiple signaling pathways. We therefore observe mixed signals, i.e., the differentially expressed genes belong to different functional modules. How can we group differentially expressed genes into functional modules such that each module is regulated by one signaling pathway? How can we assign mutations to the different pathways? [Figure: tumor samples 1 and 2, each with mutated genes and differentially expressed genes, mapped to Modules 1, 2 and 3 and to Pathways 1, 2 and 3.]

5 Basic idea of our model. Use differentially expressed gene modules as the readouts of signaling pathway perturbations. Find functional modules among the differentially expressed genes by using Gene Ontology and expression patterns. Find the tumor samples in which a module is differentially expressed. Use a statistical test to set weights for mutated genes with respect to each functional module. Use graph models to search for networks of mutated proteins in order to reverse engineer the pathway.

6 Model detail: Step_1. Find tumors that share a common expression module. For each expression module, build a sample-gene relation graph and find a maximum-density subgraph (bi-clustering); this problem is NP-hard. Refine the genes in the downstream modules. Find the tumors that change the expression levels of the downstream module.

7 Model detail: Step_2. Find mutations that carry strong information with respect to an expression module. For each module, take the union of the mutated genes from the tumors (tumor samples → mutated genes), then decide the weights of the mutated genes. Use Fisher's exact test to assess the impact of a mutation on the downstream genes.
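The weighting step can be illustrated with a small sketch. The slide only states that Fisher's exact test is used; the 2x2 table layout (mutated or not versus module perturbed or not), the one-sided alternative, and the choice of `-log10(p)` as the weight are all assumptions made here for illustration, and the names `fisher_exact_greater` and `mutation_weights` are hypothetical.

```python
from math import comb, log10


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    a: tumors with the mutation, module perturbed
    b: tumors with the mutation, module not perturbed
    c: tumors without the mutation, module perturbed
    d: tumors without the mutation, module not perturbed
    Returns P(X >= a) under the hypergeometric null of no association.
    """
    mutated = a + b
    perturbed = a + c
    n = a + b + c + d
    upper = min(mutated, perturbed)
    tail = sum(
        comb(mutated, x) * comb(n - mutated, perturbed - x)
        for x in range(a, upper + 1)
    )
    return tail / comb(n, perturbed)


def mutation_weights(mutations_by_gene, perturbed, all_tumors):
    """Assumed weighting: -log10 of the one-sided p-value for each gene."""
    weights = {}
    for gene, mutated in mutations_by_gene.items():
        a = len(mutated & perturbed)
        b = len(mutated - perturbed)
        c = len(perturbed - mutated)
        d = len(all_tumors) - a - b - c
        weights[gene] = -log10(fisher_exact_greater(a, b, c, d))
    return weights


# Hypothetical toy data: 8 tumors, the module is perturbed in tumors 0-3.
tumors = set(range(8))
perturbed_tumors = {0, 1, 2, 3}
toy_mutations = {"GENE_A": {0, 1, 2, 4}, "GENE_B": {0, 1, 2, 3}}
toy_weights = mutation_weights(toy_mutations, perturbed_tumors, tumors)
```

In this toy data GENE_B is mutated in exactly the tumors where the module is perturbed, so it receives the larger weight.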

8 Model detail: Step_3. Construct a network consisting of informative mutations. Create an instance of the PPI network in which mutated genes are assigned weights. Find the top-weighted short simple paths that end at a transcription factor; this problem is NP-hard. Reconstruct a network from the top-weighted paths.

9 Model for finding signaling pathway. Find the simple path of length k with minimum weight in the graph (weighted k-path problem).

10 A simple way to solve the k-path problem. $G = (V, E)$ is a graph. We want to find a simple path of length $k$ in $G$. Try every subset $V' = \{v_1, v_2, \ldots, v_k\}$ of size $k$ from $V$. Test every ordering of the elements of $V'$.

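12 A simple way to solve the k-path problem. $G = (V, E)$ is a graph. We want to find a simple path of length $k$ in $G$. Try every subset $V' = \{v_1, v_2, \ldots, v_k\}$ of size $k$ from $V$. Test every ordering of the elements of $V'$. Example on a graph with vertices 1 to 5: the orderings 1-3-4-5-2, 1-2-5-4-3, 1-5-4-3-2, 2-5-1-3-4 and 3-4-1-5-2 are not paths (No); the ordering 1-2-3-4-5 is a path (Yes).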
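The exhaustive search described on this slide can be sketched in a few lines. The edge list below is hypothetical: the transcript does not preserve the figure's edges, so they were chosen only to agree with the Yes/No outcomes listed above. As on the slide, "length k" is read as a path on k vertices.

```python
from itertools import combinations, permutations


def build_adjacency(vertices, edges):
    """Undirected adjacency sets for the given vertex and edge lists."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def is_path(adj, order):
    """True if every pair of consecutive vertices in `order` is an edge."""
    return all(b in adj[a] for a, b in zip(order, order[1:]))


def brute_force_k_path(vertices, edges, k):
    """Try every k-subset and every ordering of it; return the first path found."""
    adj = build_adjacency(vertices, edges)
    for subset in combinations(vertices, k):
        for order in permutations(subset):
            if is_path(adj, order):
                return list(order)
    return None


# Hypothetical edges, consistent with the example orderings on the slide.
EXAMPLE_VERTICES = [1, 2, 3, 4, 5]
EXAMPLE_EDGES = [(1, 2), (2, 3), (3, 4), (4, 5), (1, 3), (1, 4)]
example_path = brute_force_k_path(EXAMPLE_VERTICES, EXAMPLE_EDGES, 5)
```

The two nested loops are exactly the "every subset, every ordering" enumeration, which is why the running time grows as n(n-1)...(n-k+1).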

13 The time complexity is a problem. The time to try every subset $V' = \{v_1, v_2, \ldots, v_k\}$ of size $k$ from $V$ is $O\left(\binom{n}{k}\right) = O\left(\frac{n!}{k!\,(n-k)!}\right)$. The time to test every ordering of the elements of $V'$ is $O(k!)$. The total time is $O(n(n-1)(n-2)\cdots(n-k+1))$. If $n = 5{,}000$ and $k = 8$, the time is larger than $O(4096^8) = O(2^{96})$. The current best supercomputer, IBM Roadrunner, which has 129,600 CPUs, can do $2^{49.83}$ computations per second. Computations it can perform: $2^{61.65}$ in 1 hour, $2^{66.23}$ in 1 day, $2^{74.74}$ in 1 year, $2^{81.39}$ in 100 years, $2^{94.68}$ in 1 million years.
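The figures on this slide can be rechecked with a short calculation. This is only a sanity check under stated assumptions: the machine speed is the $2^{49.83}$ operations per second quoted on the slide, a year is taken as 365 days, and the function names are hypothetical.

```python
from math import log2

ROADRUNNER_LOG2_OPS_PER_SEC = 49.83  # value quoted on the slide


def log2_brute_force_steps(n, k):
    """log2 of n(n-1)...(n-k+1), the number of ordered k-tuples tried."""
    return sum(log2(n - i) for i in range(k))


def log2_ops_within(seconds):
    """log2 of the operations the machine performs in the given time."""
    return ROADRUNNER_LOG2_OPS_PER_SEC + log2(seconds)


HOUR = 3600
DAY = 24 * HOUR
YEAR = 365 * DAY  # assumption: 365-day year

budgets = {
    "1 hour": log2_ops_within(HOUR),
    "1 day": log2_ops_within(DAY),
    "1 year": log2_ops_within(YEAR),
    "100 years": log2_ops_within(100 * YEAR),
    "1 million years": log2_ops_within(1_000_000 * YEAR),
}
needed = log2_brute_force_steps(5000, 8)
feasible = {label: needed <= budget for label, budget in budgets.items()}
```

Every entry of `feasible` comes out false: even the million-year budget falls short of the brute-force step count for n = 5,000 and k = 8.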

14 Our k-path algorithm: intuition. Randomly split $G$ into two subgraphs $G_1$ and $G_2$. Suppose that $u_1, u_2, \ldots, u_k$ is a simple path of $k$ vertices in $G$. With probability $1/2^k$, the random partition places the first half of the path, $u_1, \ldots, u_{k/2}$, in $G_1$ and the second half, $u_{k/2+1}, \ldots, u_k$, in $G_2$. Then we can recursively construct the two shorter paths. [Figure: the path $u_1, u_2, \ldots, u_k$ before the split and after it, with $u_1, \ldots, u_{k/2}$ in $G_1$ and $u_{k/2+1}, \ldots, u_k$ in $G_2$.]
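The random-split idea can be sketched as follows. This is a minimal, unweighted illustration of the intuition on this slide, not the authors' implementation: it ignores vertex weights, keeps one path per (start, end) pair, and repeats each split `trials_factor * 2**k` times, where `trials_factor` is a hypothetical tuning parameter. It is a Monte Carlo procedure: any path it returns is valid, but it can miss a path that exists.

```python
import random


def find_k_paths(adj, vertices, k, rng, trials_factor=3):
    """Return {(start, end): path} for simple k-vertex paths inside `vertices`."""
    if k == 1:
        return {(v, v): [v] for v in vertices}
    k_left = (k + 1) // 2
    k_right = k - k_left
    found = {}
    for _ in range(trials_factor * 2 ** k):
        # Random partition: each vertex goes to either side with probability 1/2.
        left_part, right_part = [], []
        for v in vertices:
            (left_part if rng.random() < 0.5 else right_part).append(v)
        if len(left_part) < k_left or len(right_part) < k_right:
            continue
        left = find_k_paths(adj, left_part, k_left, rng, trials_factor)
        if not left:
            continue
        right = find_k_paths(adj, right_part, k_right, rng, trials_factor)
        # Join a left path to a right path when an edge connects them.
        # The two sides are disjoint, so the joined path is still simple.
        for (s1, e1), p1 in left.items():
            for (s2, e2), p2 in right.items():
                if s2 in adj[e1]:
                    found.setdefault((s1, e2), p1 + p2)
    return found


def find_k_path(edges, k, seed=0):
    """Build the adjacency sets and return one k-vertex path, or None."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    found = find_k_paths(adj, list(adj), k, random.Random(seed))
    return next(iter(found.values()), None)


# Usage on a small chain 0-1-2-3-4-5, looking for a path on 4 vertices.
chain = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
maybe_path = find_k_path(chain, 4)
```

Repeating the split about $2^k$ times at each level is what compensates for the $1/2^k$ chance that a single split separates the path correctly.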

15 Efficiency of our algorithm. Using the recurrence relation $T(k) = c\,2^k\,(T(k/2) + T(k/2))$, we get the time complexity $O(4^k k^2 m)$, where $m$ is the number of edges in the graph ($m < n^2/2$ for a simple undirected graph). If $n = 5000$ and $k = 8$, then $O(4^k k^2 m) < O(2^{45.6})$. A current PC with a 1.6 GHz CPU can do $2^{30.6}$ computations per second. Hence a PC can finish the calculation in about 9 hours. (The old simple algorithm cannot finish in millions of years, even on a supercomputer.) So we can use a PC to solve this computational problem.
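The 9-hour estimate can be reproduced with a short calculation. It assumes the bound $4^k k^2 m$ with constant factors ignored, an edge count of at most $n(n-1)/2$ (a simple undirected graph), and the $2^{30.6}$ operations per second quoted on the slide; the function names are hypothetical.

```python
from math import log2

PC_LOG2_OPS_PER_SEC = 30.6  # value quoted on the slide for a 1.6 GHz CPU


def log2_kpath_bound(n, k):
    """log2 of 4^k * k^2 * m, taking m = n(n-1)/2 as the largest edge count."""
    m = n * (n - 1) // 2
    return 2 * k + 2 * log2(k) + log2(m)


def hours_on_pc(n, k):
    """Estimated running time in hours, ignoring constant factors."""
    seconds = 2 ** (log2_kpath_bound(n, k) - PC_LOG2_OPS_PER_SEC)
    return seconds / 3600


bound_exponent = log2_kpath_bound(5000, 8)
estimated_hours = hours_on_pc(5000, 8)
```

For n = 5000 and k = 8 the exponent stays below 45.6 and the estimate lands close to the 9 hours stated on the slide.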

16 Result_1. Examples of downstream modules: Expression levels of genes in GO term GO:0008285 (definition: any process that stops, prevents or reduces the rate or extent of cell proliferation) are suppressed in tumor cells. Expression levels of genes in GO term GO:0030335 (definition: any process that activates or increases the frequency, rate or extent of cell migration) are enhanced in tumor cells.

17 Result_2. Example of the most enriched known cancer pathway (Prostate Cancer Signaling Pathway) that overlaps with our pathway structure (corresponding to downstream module GO:0008285).

18 Summary. Formulate the biological problem as a computational problem. Design very efficient algorithms to solve the hard computational problems in the models.

19 Questions? Thank you very much.

