PINALOG Protein Interaction Network Alignment and its implication in function prediction and complex detection Hang Phan Prof. Michael J.E. Sternberg.

Presentation transcript:

PINALOG Protein Interaction Network Alignment and its implication in function prediction and complex detection Hang Phan Prof. Michael J.E. Sternberg Division of Molecular Biosciences Imperial College London PhD Research Day April 1st 2011

Comparison in biology. Comparison of sequences and structures has had a central role in bioinformatics. Protein interaction network (PIN)

Network alignment methods. Analogous to sequence alignment methods. Global alignment methods: Graemlin, IsoRank. Local alignment methods: PathBLAST, MaWISh. Pairwise alignment and multiple alignment.

PINALOG Principles. Global alignment. Large equivalenced subgraphs. Equivalence includes network structure, sequence similarity, and function similarity. Modules/complexes in PINs are likely to be conserved across species. Detect possible modules in the input networks and align these first, then expand.

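PINALOG - Method. Steps: community detection, community mapping, extension mapping. [Figure: communities of PIN A and PIN B are mapped to give core pairs (mapped core: A-N, B-P, C-M, D-Q, E-R, F-S, G-T); the first neighbours of the core in PIN A and the first neighbours of the core in PIN B are then mapped.]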

Protein similarity measures. Sequence similarity: BLAST score. Function similarity: estimated by the similarity of the GO terms associated with the proteins. Combination of sequence and function similarity: Θ is calculated automatically as Θ = 1 - C / (M + N), where C is the number of reciprocal best BLAST hits between species A and B, M is the number of proteins in species A, and N is the number of proteins in species B. The closer the two species, the larger C and the smaller Θ, so less weight is placed on sequence similarity.
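The weighting described on this slide can be sketched in a few lines. The formula for Θ is taken from the slide; the way Θ combines the two scores is not shown in the transcript, so the convex combination below (and the assumption that both scores are already scaled to [0, 1]) is a guess that merely agrees with the statement that a smaller Θ puts less weight on sequence similarity. The function names are hypothetical.

```python
def theta_weight(n_rbh: int, n_proteins_a: int, n_proteins_b: int) -> float:
    """Theta = 1 - C / (M + N), as given on the slide.

    n_rbh        -- C, number of reciprocal best BLAST hits between the species
    n_proteins_a -- M, number of proteins in species A
    n_proteins_b -- N, number of proteins in species B
    """
    return 1.0 - n_rbh / (n_proteins_a + n_proteins_b)


def combined_similarity(seq_sim: float, func_sim: float, theta: float) -> float:
    # ASSUMPTION: a convex combination in which theta weights the sequence
    # score; the slide does not show the actual combination formula.
    return theta * seq_sim + (1.0 - theta) * func_sim


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
theta = theta_weight(n_rbh=2000, n_proteins_a=6000, n_proteins_b=14000)
score = combined_similarity(seq_sim=0.8, func_sim=0.4, theta=theta)
```

With these made-up counts, C / (M + N) is 0.1, so Θ is 0.9 and the combined score leans almost entirely on sequence similarity.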

Protein similarity measures. Topological similarity: implicitly included in the extension process by rewarding protein pairs whose neighbours are already equivalenced.

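PINALOG – Method details. Candidates for extension mapping are the first neighbours of proteins in the core. [Figure: communities in PIN A and PIN B, with candidate proteins I, H, J in PIN A and X, Y, U in PIN B; candidate pair scores shown: 2.1, 1.3, 0.8, 2.5, 0.9, 2.3.] Score(I,X) = s(I,X) + ½ s(A,N). Extension mapping of candidates: add the mapped pairs to the core and repeat.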
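A minimal sketch of the extension step, assuming a greedy one-to-one assignment: candidates are unmapped first neighbours of core pairs, each candidate pair (I, X) is scored as s(I,X) plus half the similarity of the mapped core pair it neighbours, and the best-scoring pairs are added to the core before repeating. The greedy assignment, the summing of bonuses when a candidate pair neighbours several core pairs, and all names below are assumptions for illustration, not the published implementation.

```python
from typing import Dict, Set, Tuple

Graph = Dict[str, Set[str]]            # protein -> set of interacting proteins
Sim = Dict[Tuple[str, str], float]     # (protein in A, protein in B) -> s(a, b)


def extend_once(pin_a: Graph, pin_b: Graph, sim: Sim,
                core: Dict[str, str], bonus: float = 0.5) -> Dict[str, str]:
    """Map unmapped first neighbours of the current core (one round)."""
    mapped_a, mapped_b = set(core), set(core.values())
    scores: Dict[Tuple[str, str], float] = {}
    for a, b in core.items():
        for i in pin_a.get(a, set()) - mapped_a:
            for x in pin_b.get(b, set()) - mapped_b:
                if (i, x) not in sim:
                    continue
                # Score(I,X) = s(I,X) + 1/2 s(A,N); bonuses from several
                # neighbouring core pairs are summed (an assumption).
                scores.setdefault((i, x), sim[(i, x)])
                scores[(i, x)] += bonus * sim.get((a, b), 0.0)
    new_pairs: Dict[str, str] = {}
    # Greedy one-to-one assignment, best score first (ties broken by name).
    for (i, x), _score in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0])):
        if i not in mapped_a and x not in mapped_b:
            new_pairs[i] = x
            mapped_a.add(i)
            mapped_b.add(x)
    return new_pairs


def extend_alignment(pin_a: Graph, pin_b: Graph, sim: Sim,
                     core: Dict[str, str]) -> Dict[str, str]:
    """Add newly mapped pairs to the core and repeat until nothing is added."""
    alignment = dict(core)
    while True:
        new_pairs = extend_once(pin_a, pin_b, sim, alignment)
        if not new_pairs:
            return alignment
        alignment.update(new_pairs)
```

The loop terminates because every round either maps at least one previously unmapped protein or stops.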

Alignment result assessment. No gold standard exists for alignment quality. Assessment methods: conserved interactions (number, conserved ratio); number of mapped protein pairs belonging to homologous clusters.
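Counting conserved interactions can be sketched as follows, assuming an interaction in network A counts as conserved when both of its proteins are mapped and their images interact in network B. The slide does not spell out this definition or the denominator of the conserved ratio, so only the count is shown and the function name is hypothetical.

```python
from typing import Dict, Iterable, Tuple

Edge = Tuple[str, str]


def count_conserved_interactions(edges_a: Iterable[Edge],
                                 edges_b: Iterable[Edge],
                                 alignment: Dict[str, str]) -> int:
    """Number of interactions in A whose mapped endpoints interact in B."""
    # Interactions are undirected, so store each edge as an unordered pair.
    edges_b_set = {frozenset(edge) for edge in edges_b}
    conserved = set()
    for u, v in edges_a:
        if u in alignment and v in alignment:
            if frozenset((alignment[u], alignment[v])) in edges_b_set:
                conserved.add(frozenset((u, v)))
    return len(conserved)
```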

Alignment results: HUMAN vs. YEAST PIN

Method         N pairs   N conserved interactions   N Homologene pairs   N Inparanoid pairs
PINALOG_1      3,949     3,388                      770                  497
PINALOG auto   5,223     3,319                      697                  454
IsoRank        5,674     717                        227                  165

PINALOG_1: PINALOG using sequence and network topology. PINALOG auto: PINALOG also using function in alignment. IsoRank: Singh et al. Proc. Natl. Acad. Sci. USA, 105:12763-12768. Automatically detected ortholog groups: Homologene: http://www.ncbi.nlm.nih.gov/homologene Inparanoid: http://inparanoid.sbc.su.se/cgi-bin/index.cgi

Function similarity of mapped protein pairs. [Figure: two graphs, theta = auto and theta = 1.]

Conserved graphs. IsoRank conserved graph: 717 conserved interactions. PINALOG conserved graph: 3,388 conserved interactions. No large networks equivalenced.

Function prediction by PINALOG. Comparison with PSI-BLAST prediction for GO Biological Process. PINALOG prediction is from the yeast interactome; PSI-BLAST prediction is from the entire UniProtKB. Better recall at a similar level of precision:

            PINALOG   PSI-BLAST
Recall      0.14      0.07
Precision   0.28      0.29
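For reference, precision and recall for a set of predicted GO terms can be computed as below. This uses the standard definitions; how the presentation aggregated the values over proteins is not stated, and the GO identifiers in the example are placeholders.

```python
from typing import Set, Tuple


def precision_recall(predicted: Set[str], known: Set[str]) -> Tuple[float, float]:
    """Standard precision and recall of predicted annotations against known ones."""
    true_positives = len(predicted & known)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(known) if known else 0.0
    return precision, recall


# Placeholder identifiers: one of two predictions is correct,
# and one of four known annotations is recovered.
precision, recall = precision_recall({"GO:a", "GO:b"},
                                     {"GO:a", "GO:c", "GO:d", "GO:e"})
```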

Conserved network analysis (1). Cluster the conserved network of the human PIN by protein function. Assess the overlap of clusters with known protein complexes in the CORUM database.

Human CORUM core complexes

Alignment (all complexes)   Number of complexes   Proteins in clusters   Proteins in complexes   Coverage rate
PINALOG auto                251                   1,179                  1,471                   0.80
PINALOG_1                   223                   914                    1,131                   0.81

Map clusters to the yeast PIN and check the overlap with known complexes. Assess functional correspondence of
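The coverage rates reported for the CORUM comparison are consistent with dividing the number of proteins in clusters by the number of proteins in complexes. That reading is inferred from the numbers rather than stated on the slide; a quick check:

```python
def coverage_rate(proteins_in_clusters: int, proteins_in_complexes: int) -> float:
    # ASSUMPTION: coverage = proteins in clusters / proteins in complexes.
    return proteins_in_clusters / proteins_in_complexes


pinalog_auto = round(coverage_rate(1179, 1471), 2)  # reported as 0.80
pinalog_1 = round(coverage_rate(914, 1131), 2)      # reported as 0.81
```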

Conserved network analysis (2). [Figure: HUMAN, cluster 12, and YEAST, map of cluster 12; labelled complexes: 19/22S Regulator PA700 and 20S proteasome.]

Conclusions. PINALOG is a novel network alignment method focusing on functional equivalence. It is superior to IsoRank in quality of network alignment in the human vs. yeast comparison, can predict components of protein complexes, and provides enhanced functional annotation in the absence of homology. It offers an alternative to existing network alignment methods for the bioinformatics community.

Acknowledgement I would like to thank the Wellcome Trust for generous funding

Function similarity by GO term semantic similarity. Semantic similarity (1): based on the information content (IC) of terms. IC of term c: IC(c) = -log p(c), where p(c) is the frequency of c in the corpus. Similarity measure: Relevance, simRel(c1, c2) = [2 · IC(cA) / (IC(c1) + IC(c2))] · (1 - p(cA)), where cA is the most informative common ancestor of c1 and c2.

Semantic similarity examples. Total: 500 proteins annotated. [Figure: GO graph with annotation counts GO0: 500, GO1: 49, GO2: 98, GO3: 5, GO4: 12.]

Term pair   cA    IC(cA)   simRel
GO3 - GO4   GO1   1.009    0.503
GO3 - GO3   GO3   2        0.990
GO1 - GO4   GO1   1.009    0.692
GO1 - GO2   GO0   0        0
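The example values can be reproduced with a short script, assuming IC(c) = -log10 p(c) and the Relevance measure simRel(c1, c2) = [2 · IC(cA) / (IC(c1) + IC(c2))] · (1 - p(cA)). The base-10 logarithm and the toy hierarchy (GO1 and GO2 under GO0; GO3 and GO4 under GO1) are inferred from the numbers on the slide rather than stated, so treat both as assumptions.

```python
import math

TOTAL = 500  # proteins annotated in the toy corpus
COUNTS = {"GO0": 500, "GO1": 49, "GO2": 98, "GO3": 5, "GO4": 12}
# ASSUMPTION: hierarchy inferred from the slide's graph residue.
PARENT = {"GO1": "GO0", "GO2": "GO0", "GO3": "GO1", "GO4": "GO1"}


def p(term: str) -> float:
    """Frequency of a term in the corpus."""
    return COUNTS[term] / TOTAL


def ic(term: str) -> float:
    """Information content, IC(c) = -log10 p(c)."""
    return math.log10(TOTAL / COUNTS[term])


def ancestors(term: str) -> list:
    """The term itself followed by its ancestors up to the root."""
    chain = [term]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def most_informative_common_ancestor(c1: str, c2: str) -> str:
    common = set(ancestors(c1)) & set(ancestors(c2))
    return max(common, key=ic)


def sim_rel(c1: str, c2: str) -> float:
    ca = most_informative_common_ancestor(c1, c2)
    denominator = ic(c1) + ic(c2)
    if denominator == 0:  # both terms are the root
        return 0.0
    return 2 * ic(ca) / denominator * (1 - p(ca))
```

Here `sim_rel("GO3", "GO4")` rounds to 0.503, `sim_rel("GO3", "GO3")` to 0.990 and `sim_rel("GO1", "GO4")` to 0.692, matching the slide; `sim_rel("GO1", "GO2")` is 0 because the only common ancestor is the root.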

Function similarity. Schlicker's* similarity of two proteins. Protein A: annotated with terms a1, a2, ..., an. Protein B: annotated with terms b1, b2, ..., bm. In the m × n matrix of term-pair similarities (rows b1 to bm, columns a1 to an), yi is the maximum of row i and xj is the maximum of column j. rowScore = 1/m ∑ yi; columnScore = 1/n ∑ xj. Function similarity = max {rowScore, columnScore}. *Schlicker et al. 2006, BMC Bioinformatics 7:302, doi:10.1186/1471-2105-7-302.
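The row and column scores can be sketched as below. The term-pair similarity is passed in as a function, so any GO semantic similarity can be plugged in; the function name and the toy similarity values in the example are made up for illustration.

```python
from typing import Callable, Sequence


def function_similarity(terms_a: Sequence[str], terms_b: Sequence[str],
                        term_sim: Callable[[str, str], float]) -> float:
    """max{rowScore, columnScore} over the term-pair similarity matrix."""
    if not terms_a or not terms_b:
        return 0.0
    # Rows are the terms of protein B, columns the terms of protein A.
    row_max = [max(term_sim(a, b) for a in terms_a) for b in terms_b]  # y_i
    col_max = [max(term_sim(a, b) for b in terms_b) for a in terms_a]  # x_j
    row_score = sum(row_max) / len(row_max)   # 1/m * sum(y_i)
    col_score = sum(col_max) / len(col_max)   # 1/n * sum(x_j)
    return max(row_score, col_score)


# Toy similarities for a protein A with three terms and a protein B with two.
toy = {("a1", "b1"): 0.9, ("a1", "b2"): 0.2,
       ("a2", "b1"): 0.4, ("a2", "b2"): 0.6,
       ("a3", "b1"): 0.1, ("a3", "b2"): 0.3}
result = function_similarity(["a1", "a2", "a3"], ["b1", "b2"],
                             lambda a, b: toy[(a, b)])
```

In this toy case the row maxima are 0.9 and 0.6 (rowScore 0.75) and the column maxima are 0.9, 0.6 and 0.3 (columnScore 0.6), so the function similarity is 0.75.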