Uncovering Signaling Transduction Networks from PPI network by Inductive Logic Programming Woo-Hyuk Jang 2009. 3. 20.

Presentation transcript:

Uncovering Signaling Transduction Networks from PPI network by Inductive Logic Programming Woo-Hyuk Jang

Contents
Introduction
Method
–ILP system (ALEPH)
–ILP modeling example (Marriage Case)
ILP modeling of STP
Challenges and Future Work

Introduction
Most Signaling Transduction Network (STN) prediction methods follow the same sequence: 1) integrating PPI data, 2) finding rules from known STNs, 3) discovering STN components from the PPIs. These methods also generally adopt a probabilistic model at each step. However:
–Even small amounts of noise accumulate across the steps and may lead to large prediction inaccuracy.
–A probabilistic model cannot provide a biological explanation of the results.

Related Work
Steffen et al. (2002)
–Integrating PPI and microarray data
–Netsearch algorithm
Yin Liu and Hongyu Zhao (2004)
–Ordering proteins when all components of the STN are already known
Jacob Scott et al. (2006)
–A variant of the color coding algorithm
–Yeast PPI
Gurkan Bebek and Jiong Yang (2007)
–Extract functional patterns from STP
–PathFinder
Xing-Ming Zhao et al. (2008)
–PPI + gene expression profile
–Integer linear programming

New Approach
[Figure: overview of the approach. Labels: ILP; STP; PPI Network; PreSPI; Functional Patterns; Corrects True Negative, False Positive path; Features; Reference; Induced Rules]

Method
Inductive Logic Programming (ILP)
–Programs that “generalize”
–Programs that follow the Specific → General idea
Example: given the molecular structure of toxic and non-toxic chemicals, other props …, induce a rule such as “Chemical is toxic if it has a ring connected to … and a C atom in …”

Method
Inductive Logic Programming (ILP)
–A powerful representation language: complex relationships are easy to express
–Easy to provide background information, including the results of other analysis methods such as regression
–Diverse features, and the relations among them, that may affect PPIs in an STP can be integrated easily

A Learning Engine for Proposing Hypotheses (ALEPH)
An ILP system that follows a very simple procedure that can be described in 4 steps:
–1. Select an example
–2. Build the most-specific clause
–3. Search
–4. Remove redundant examples
Input files: background knowledge (*.b), positive examples (*.f), negative examples (*.n)
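The four steps above can be sketched as a small covering loop. This is a toy, propositional illustration in Python, not ALEPH itself: the attribute-value representation, the greedy search, and all function names (`covers`, `most_specific_clause`, `search`, `induce`) are assumptions made for this sketch. The toxicity data is hypothetical.

```python
# Toy, propositional sketch of a four-step covering loop (NOT ALEPH's implementation).
# An example is a dict of attribute -> value; a clause is a set of (attribute, value) tests.

def covers(clause, example):
    """A clause covers an example when every test in the clause holds for it."""
    return all(example.get(attr) == value for attr, value in clause)

def most_specific_clause(example):
    """Step 2: the clause that tests every attribute of the seed example."""
    return frozenset(example.items())

def search(bottom, negatives):
    """Step 3: greedily drop tests from the most-specific clause
    as long as the shorter clause still covers no negative example."""
    clause = set(bottom)
    for test in sorted(bottom):
        candidate = clause - {test}
        if candidate and not any(covers(candidate, n) for n in negatives):
            clause = candidate
    return frozenset(clause)

def induce(positives, negatives):
    theory, remaining = [], list(positives)
    while remaining:
        seed = remaining[0]                    # Step 1: select an example
        bottom = most_specific_clause(seed)    # Step 2: build most-specific clause
        clause = search(bottom, negatives)     # Step 3: search for a more general clause
        theory.append(clause)
        # Step 4: remove positives made redundant by the new clause
        remaining = [p for p in remaining if not covers(clause, p)]
    return theory

# Hypothetical data: "toxic if it has a ring".
toxic = [{"ring": "yes", "atom": "C"}, {"ring": "yes", "atom": "N"}]
non_toxic = [{"ring": "no", "atom": "C"}]
theory = induce(toxic, non_toxic)
```

A real ILP system searches over first-order clauses constrained by mode declarations and scores candidates, so this sketch only conveys the control flow of select, saturate, search, and remove.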

ILP modeling example
Case 1 (Marriage)
Some features have led each couple to fall in love:
–Property
–Occupation
–Position among siblings
–Personality
–...

Case 1 (Marriage): Mode Declarations
Person    Property (×100 million)  Occupation   Position among siblings  Personality
Male1     10                       Doctor       Youngest                 Very good
Male2     100                      Businessman  Eldest                   Good
Male3     0                        Doctor       Youngest                 Very bad
Male4     0                        None         Eldest                   Very good
Female1   5                        Doctor       Youngest                 Very good
Female2   1                        Student      Middle                   Very good
Female3   10                       Doctor       Youngest                 Very bad
Female4   0                        None         Middle                   Very good

Background Knowledge

Positive & Negative Example
Positive examples (Person, Person): (Male1, Female1), (Male2, Female2), (Male3, Female3), (Male4, Female4)
Negative examples (Person, Person): (Male1, Female4), (Male2, Female3), (Male3, Female2), (Male4, Female1)
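To make the role of the positive and negative examples concrete, the sketch below counts how many of each a candidate rule covers. The attribute values are transcribed from the marriage example (property in units of 100 million, occupations translated to English). The candidate rule `same_occupation` is hypothetical and chosen only for illustration. The slides do not state which rule was actually induced.

```python
# Attribute values transcribed from the marriage example.
# "property" is in units of 100 million; key names are assumptions of this sketch.
people = {
    "male1":   {"property": 10,  "occupation": "doctor",      "sibling_position": "youngest", "personality": "very good"},
    "male2":   {"property": 100, "occupation": "businessman", "sibling_position": "eldest",   "personality": "good"},
    "male3":   {"property": 0,   "occupation": "doctor",      "sibling_position": "youngest", "personality": "very bad"},
    "male4":   {"property": 0,   "occupation": "none",        "sibling_position": "eldest",   "personality": "very good"},
    "female1": {"property": 5,   "occupation": "doctor",      "sibling_position": "youngest", "personality": "very good"},
    "female2": {"property": 1,   "occupation": "student",     "sibling_position": "middle",   "personality": "very good"},
    "female3": {"property": 10,  "occupation": "doctor",      "sibling_position": "youngest", "personality": "very bad"},
    "female4": {"property": 0,   "occupation": "none",        "sibling_position": "middle",   "personality": "very good"},
}

positives = [("male1", "female1"), ("male2", "female2"), ("male3", "female3"), ("male4", "female4")]
negatives = [("male1", "female4"), ("male2", "female3"), ("male3", "female2"), ("male4", "female1")]

def same_occupation(m, f):
    """Hypothetical candidate rule: couple(M, F) if M and F share an occupation."""
    return people[m]["occupation"] == people[f]["occupation"]

def coverage(rule, pairs):
    """Number of pairs for which the rule holds."""
    return sum(1 for m, f in pairs if rule(m, f))

pos_covered = coverage(same_occupation, positives)
neg_covered = coverage(same_occupation, negatives)
print(f"covers {pos_covered}/{len(positives)} positives, {neg_covered}/{len(negatives)} negatives")
```

An ILP search would compare many such candidates, preferring clauses that cover many positives and few negatives.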

Approach Again
[Figure: overview of the approach, repeated. Labels: ILP; STP; PPI Network; PreSPI; Functional Patterns; Corrects True Negative, False Positive path; Features; Reference; Induced Rules; segments; evaluation]

ILP modeling on STP
We can rewrite an STP as a sequence of protein pairs.
Pheromone response in MAPK STP: STE2/3 → Gpa1 → Ste4/18 → Cdc42 → Ste20
Interaction facts:
Interaction(STE2/3, GPA1). Interaction(GPA1, STE4/18). Interaction(STE4/18, CDC42).
For comparison, the marriage case used: Couple(person, person), W_property(male1,10)
GO annotation facts:
GO_of(STE2/3, GOXXXXX). GO_of(GPA1, GOXXXXX). GO_of(STE4/18, GOXXXXX).
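Rewriting a pathway as a sequence of protein pairs amounts to pairing each protein with its successor. A minimal Python sketch follows. The function name `pathway_to_facts` is an assumption, and the output strings simply mirror the notation on the slide. An actual Prolog or ALEPH background file would need lowercase predicate names and quoted atoms for identifiers such as STE2/3.

```python
def pathway_to_facts(pathway):
    """Turn an ordered list of proteins into consecutive-pair interaction facts."""
    return [f"Interaction({a}, {b})." for a, b in zip(pathway, pathway[1:])]

# Pheromone response pathway as listed on the slide.
pheromone_path = ["STE2/3", "GPA1", "STE4/18", "CDC42", "STE20"]
facts = pathway_to_facts(pheromone_path)

for fact in facts:
    print(fact)
```

A five-protein path yields four facts. The slide lists the first three, and the sketch also emits the final pair ending in STE20.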

Feature Selection T. P. Nguyen and T. B. Ho, “Discovering Signal Transduction Networks Using Signaling Domain-Domain Interaction”, 2006, Genome Informatics.

Challenges and Future Work
Refined feature selection
Build a parser for each biological DB
Mode declaration
–Build determination predicates
Evaluation problem
–Induced rule from MAPK → extracting segments from PPI → compared to MAPK ???
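One possible way to compare extracted segments against a reference pathway is edge-level precision and recall. The slides leave the evaluation method open, so this Python sketch is only an assumption about how such a comparison could look. The predicted edges, including the placeholder PROTEIN_X, are hypothetical.

```python
def edge_set(pathway):
    """Consecutive protein pairs of an ordered pathway, as a set of edges."""
    return set(zip(pathway, pathway[1:]))

def precision_recall(predicted_edges, reference_edges):
    """Edge-level precision and recall of predicted segments against a reference."""
    true_positives = len(predicted_edges & reference_edges)
    precision = true_positives / len(predicted_edges) if predicted_edges else 0.0
    recall = true_positives / len(reference_edges) if reference_edges else 0.0
    return precision, recall

reference = edge_set(["STE2/3", "GPA1", "STE4/18", "CDC42", "STE20"])

# Hypothetical prediction: two correct edges and one spurious edge.
predicted = {("STE2/3", "GPA1"), ("GPA1", "STE4/18"), ("STE4/18", "PROTEIN_X")}

precision, recall = precision_recall(predicted, reference)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

This does not address the circularity the slide flags, where rules induced from MAPK are evaluated against MAPK itself.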