Social behavior of proteins? Rui Alves

Organization of the talk Social behavior of the protein?!?!?!? Using meta text analysis Using phylogenetic profiling Using pathway homology Using protein docking Using microarray data Using protein interaction data

Proteins do not work alone!

Networks of “interactions” predict global function. Knowing the network of proteins/genes in which your protein/gene is embedded provides predictive information: –Which cellular pathways or processes is your protein/gene likely to be involved in?

Organization of the talk Social behavior of the protein?!?!?!? Using meta text analysis Using phylogenetic profiling Using pathway homology Using protein docking Using microarray data Using protein interaction data

Publication databases are a source of information

Meta text databases create social models from publication analysis

iHOP is a sophisticated context analysis engine

How does meta-text analysis create networks?
[Diagram: scripts process three databases: a literature database (entries), a gene names database (gene list, e.g. Ste20) and a language rules database (rule list, e.g. activate, inhibit, rescue). You submit your genes to the server/program, which returns the list of entries mentioning your gene.]
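The flow on this slide can be sketched in a few lines of Python. This is only a toy illustration and not how iHOP or any real server works: the function name `find_relations`, the sentences and the gene names other than Ste20 are hypothetical, and the "language rules" are reduced to matching a verb stem inside a sentence that mentions two known genes.

```python
import re
from itertools import combinations


def find_relations(sentences, gene_names, rule_verbs):
    """Return (gene, verb, gene) triples for sentences that mention
    at least two known genes together with a rule verb.

    Assumption: co-occurrence in one sentence stands in for a real
    language rule; no grammar or direction of the effect is parsed.
    """
    relations = []
    for sentence in sentences:
        words = re.findall(r"[A-Za-z0-9]+", sentence)
        # Known genes in order of first appearance, without repeats.
        genes = []
        for word in words:
            if word in gene_names and word not in genes:
                genes.append(word)
        lowered = [w.lower() for w in words]
        # A verb matches if any word starts with its stem (activate -> activates).
        verbs = [v for v in rule_verbs if any(w.startswith(v) for w in lowered)]
        for gene_a, gene_b in combinations(genes, 2):
            for verb in verbs:
                relations.append((gene_a, verb, gene_b))
    return relations


# Made-up literature entries, for illustration only.
entries = [
    "Ste20 activates Ste11 during the response.",
    "Ste20 was cloned and sequenced.",
]
print(find_relations(entries, {"Ste20", "Ste11"}, ["activate", "inhibit", "rescue"]))
```

Each triple becomes an edge in the network; entries that mention only one gene or no rule verb are ignored.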

Organization of the talk Social behavior of the protein?!?!?!? Meta text analysis Evolutionary based protein interaction prediction Using pathway homology Using protein docking Using microarray data Using protein interaction data

Proteins that have coevolved are likely to share a function. If protein A has co-evolved with protein B, they are likely to be involved in the same process. Looking for proteins that coevolved helps predict the social networks of proteins. There are many methods to look for co-evolution of proteins: –Phylogenetic profiling, gene neighbourhoods, gene fusion events, phylogenetic trees…

Creating phylogenetic profiles
[Diagram: take the sequence of each protein from the database of proteins in one (target) genome and run a homology search against each genome in the database of proteins in fully sequenced genomes. The results fill a database of profiles for each protein in each organism.]
Target genome | Homologue in Genome 1? | Homologue in Genome 2? | …
Protein A | Y | N | …
… | … | … | …
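A minimal sketch of the profile-building step, assuming the homology searches have already been run and summarised. The function name `build_profiles` and the `hits` structure (genome name mapped to the set of target proteins with a homologue there) are hypothetical choices for this example.

```python
def build_profiles(target_proteins, genomes, hits):
    """Build a Y/N presence profile for each target protein.

    hits maps a genome name to the set of target proteins for which
    the homology search found a homologue in that genome (assumed to
    be prepared beforehand, e.g. from filtered search results).
    """
    profiles = {}
    for protein in target_proteins:
        profiles[protein] = "".join(
            "Y" if protein in hits.get(genome, set()) else "N"
            for genome in genomes
        )
    return profiles


genomes = ["Genome 1", "Genome 2"]
hits = {"Genome 1": {"A", "C"}, "Genome 2": {"B"}}
print(build_profiles(["A", "B", "C"], genomes, hits))
```

The genome order is fixed by the `genomes` list, so position k of every profile string refers to the same genome.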

Using phylogenetic profiles to predict protein interactions
[Diagram: 1. Your sequence (A) is submitted to a server/program that uses the database of proteins in fully sequenced genomes and the database of profiles for each protein in each organism. 2. Calculate coincidence index.]
Protein id | Homologue in Genome 1? | Homologue in Genome 2? | …
A | Y | N | …
B | N | Y | …
C | Y | N | …
… | … | … | …
[Coincidence index, computed as i/number of genomes. Example values for A: A 1, C 0.9, …, B 0.11, …]
–Proteins (A and C) that are present and absent in the same set of genomes are likely to be involved in the same process and therefore interact
–Similarly, if protein A is absent in all genomes in which protein B is present, there is a likelihood that they perform the same function!
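The coincidence index can be sketched as follows. The slide's exact formula did not survive extraction, so this example assumes the index is the fraction of genomes in which two proteins are either both present or both absent; the names `coincidence_index` and `rank_partners` and the four-genome profiles are hypothetical.

```python
def coincidence_index(profile_a, profile_b):
    """Fraction of genomes where both profiles agree (Y/Y or N/N).

    Assumption: this is one plausible reading of "i/number of genomes",
    with i the number of genomes in which the two profiles coincide.
    """
    if len(profile_a) != len(profile_b) or not profile_a:
        raise ValueError("profiles must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(profile_a, profile_b))
    return matches / len(profile_a)


def rank_partners(query, profiles):
    """Rank all other proteins by their coincidence index with the query."""
    scores = {
        other: coincidence_index(profiles[query], profile)
        for other, profile in profiles.items()
        if other != query
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical profiles over four genomes.
profiles = {"A": "YNYY", "B": "NYNN", "C": "YNYN"}
print(rank_partners("A", profiles))
```

Under this definition, values close to 1 flag proteins with the same presence/absence pattern, while values close to 0 flag complementary patterns, which is the second case described on the slide.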

How to do it? –Download genomes –Use BLAST for homology –Use Perl for homology processing and coincidence index calculations

Synteny/Conservation of gene neighborhoods
[Diagram: relative positions of Protein A, Protein B, Protein C and Protein D along Genome 1, Genome 2, Genome 3, Genome …]
Which of these proteins “interact”? Proteins A and B are in a conserved relative position in most genomes, which is an indication that they are likely to interact
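One simple way to look for conserved neighborhoods is to count in how many genomes two genes sit next to each other. The sketch below assumes each genome is given as an ordered list of ortholog labels on a linear chromosome; the gene orders, the labels X and Y, the function names and the `min_fraction` cutoff are all hypothetical.

```python
from collections import Counter


def adjacent_pair_counts(gene_orders):
    """Count, for every unordered gene pair, the genomes where they are adjacent."""
    counts = Counter()
    for order in gene_orders.values():
        # A set, so a pair is counted at most once per genome.
        pairs = {frozenset((a, b)) for a, b in zip(order, order[1:]) if a != b}
        counts.update(pairs)
    return counts


def conserved_neighbours(gene_orders, min_fraction=0.75):
    """Return pairs that are adjacent in at least min_fraction of the genomes."""
    counts = adjacent_pair_counts(gene_orders)
    total = len(gene_orders)
    return sorted(
        tuple(sorted(pair))
        for pair, count in counts.items()
        if count / total >= min_fraction
    )


# Hypothetical gene orders; X and Y are unrelated genes.
gene_orders = {
    "Genome 1": ["A", "B", "C", "D"],
    "Genome 2": ["A", "B", "X", "C", "Y", "D"],
    "Genome 3": ["C", "X", "A", "B", "D"],
    "Genome 4": ["D", "Y", "B", "A", "C"],
}
print(conserved_neighbours(gene_orders))
```

Strand, circular chromosomes and neighborhoods wider than direct adjacency are ignored here to keep the idea visible.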

How to do it? –Download genomes –Use Perl for analysis

Gene fusion events
[Diagram: Protein A, Protein B, Protein C and Protein D along Genome 1, Genome 2, Genome 3, Genome …; in some genomes A and B are encoded as a single fused gene.]
Which of these proteins interact? Proteins A and B have undergone gene fusion events in at least some genomes, which is an indication that they are likely to interact
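A toy sketch of fusion detection, assuming each gene has already been annotated with the protein families it contains, so that a fused gene is simply a gene carrying two labels. The data layout, the genome contents and the function name `fusion_candidates` are hypothetical.

```python
from itertools import combinations


def fusion_candidates(genomes):
    """Find family pairs that are fused in some genomes and separate in others.

    genomes maps a genome name to a list of genes; each gene is a tuple
    of family labels, e.g. ("A",) for a single gene, ("A", "B") for a fusion.
    Returns {pair: (genomes where fused, genomes where separate)}.
    """
    fused, separate = {}, {}
    for name, genes in genomes.items():
        for gene in genes:
            for pair in combinations(sorted(set(gene)), 2):
                fused.setdefault(pair, set()).add(name)
        singles = sorted({gene[0] for gene in genes if len(gene) == 1})
        for pair in combinations(singles, 2):
            separate.setdefault(pair, set()).add(name)
    return {
        pair: (sorted(fused[pair]), sorted(separate[pair]))
        for pair in fused
        if pair in separate
    }


# Hypothetical annotation: A and B are fused only in Genome 2.
genomes = {
    "Genome 1": [("A",), ("B",), ("C",), ("D",)],
    "Genome 2": [("A", "B"), ("C",), ("D",)],
    "Genome 3": [("A",), ("B",), ("C",), ("D",)],
}
print(fusion_candidates(genomes))
```

In practice the family labels would come from homology searches in which one long protein matches two separate proteins of another genome.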

How to do it? –Download genomes –Use Perl for analysis

Building phylogenetic trees of proteins
[Diagram: Protein A, Protein B, Protein C and Protein D along Genome 1, Genome 2, Genome 3, Genome …]
Get the sequences of all homologues, align them and build a phylogenetic tree. Phylogenetic trees represent the evolutionary history of homologous genes/proteins based on their sequences

Distance based phylogenetic trees
A1 ACTDEEGGGGSRGHI…
A2 A-TEEDGGAASRGHI…
A3 ACFDDEGGGGSRGHL…
…
Pairwise distances: A1 vs A2, 5 substitutions; A1 vs A3, 3 substitutions; A2 vs A3, 8 substitutions
[Diagram: resulting tree of A1, A2 and A3, with branch lengths 3 and 5.]
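The substitution counts on this slide can be reproduced directly from the aligned fragments. The sketch below uses only the 15 residues shown (the rest of each sequence is elided on the slide), counts a gap against a residue as one difference, and then applies the standard three-point formula to turn the distances into branch lengths. The function names are hypothetical.

```python
from itertools import combinations

# Only the residues visible on the slide; the real sequences continue.
ALIGNMENT = {
    "A1": "ACTDEEGGGGSRGHI",
    "A2": "A-TEEDGGAASRGHI",
    "A3": "ACFDDEGGGGSRGHL",
}


def count_differences(seq_a, seq_b):
    """Number of aligned positions that differ (a gap counts as a difference)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def distance_matrix(alignment):
    """Pairwise difference counts for every pair of aligned sequences."""
    return {
        (a, b): count_differences(alignment[a], alignment[b])
        for a, b in combinations(sorted(alignment), 2)
    }


def three_taxon_branch_lengths(d12, d13, d23):
    """Branch lengths of the unique three-leaf tree that fits the distances."""
    return (
        (d12 + d13 - d23) / 2,
        (d12 + d23 - d13) / 2,
        (d13 + d23 - d12) / 2,
    )


distances = distance_matrix(ALIGNMENT)
print(distances)
print(three_taxon_branch_lengths(
    distances[("A1", "A2")], distances[("A1", "A3")], distances[("A2", "A3")]
))
```

With the distances 5, 3 and 8, the formula gives branch lengths 0, 5 and 3 for A1, A2 and A3, which is consistent with the 3 and 5 drawn on the slide. Real distance methods normally correct the raw counts for multiple substitutions and use algorithms such as neighbor joining for more than three sequences.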

Maximum likelihood phylogenetic trees
Alignment:
ACTDEEGGGGSRGHI…
A-TEEDGGAASRGHI…
ACFDDEGGGGSRGHL…
…
[Table: probability of amino acid substitution, with rows and columns A, -, E, D, …]

Maximum likelihood phylogenetic trees
Alignment:
A1 ACTDEEGGGGSRGHI…
A2 A-TEEDGGAASRGHI…
A3 ACFDDEGGGGSRGHL…
…
Pairwise comparison: A1 vs A2, 5 substitutions, p(1,2); A1 vs A3, 3 substitutions, p(1,3); A2 vs A3, 8 substitutions, p(2,3)
p(2,3)<p(1,2)<p(1,3)
[Diagram: resulting tree of A1, A2 and A3.]

Similarity of phylogenetic trees indicates “interaction” between proteins
[Diagram: one tree per protein family: A (A1, A2, A3, …), B (B1, B2, B3, …), C (C1, C2, C3, …) and D (D1, D2, D3, …).]
Proteins A and B have similar evolutionary trees and thus are likely to “interact”
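The slide does not say how tree similarity is measured. One commonly used shortcut, assumed here, is to skip the trees themselves and correlate the pairwise distance matrices of the two protein families over the same set of species (the idea behind "mirror tree" style methods). The distances and function names below are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long number lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant distances")
    return cov / sqrt(var_x * var_y)


def tree_similarity(dist_a, dist_b):
    """Correlate two distance matrices over the species pairs they share.

    Each matrix maps a (species, species) pair to an evolutionary distance.
    """
    shared = sorted(set(dist_a) & set(dist_b))
    return pearson([dist_a[p] for p in shared], [dist_b[p] for p in shared])


# Hypothetical distances between the homologues in species 1, 2 and 3.
family_a = {(1, 2): 5, (1, 3): 3, (2, 3): 8}
family_b = {(1, 2): 10, (1, 3): 6, (2, 3): 16}
family_d = {(1, 2): 8, (1, 3): 5, (2, 3): 3}
print(tree_similarity(family_a, family_b))
print(tree_similarity(family_a, family_d))
```

A correlation near 1 means the two families diverged in parallel across species. Because all proteins share the underlying species tree, a real analysis needs a background correction before a high value is read as co-evolution.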

How to do it? –Download genomes –Use BLAST, … for analysis –Use Clustal, PHYLIP, PAUP, … for tree building

Organization of the talk Social behavior of the protein?!?!?!? Using meta text analysis Using phylogenetic profiling Using pathway homology Using protein docking Using microarray data Using protein interaction data

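Pathway homology
[Diagram: your sequence is submitted to a server/program that draws on a database of protein sequences in genomes, a database of pathways in genomes and a database of interactions in genomes. It identifies the homologue(s) of your sequence and returns the output.]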

Pathway homology complements protein homology

Organization of the talk Social behavior of the protein?!?!?!? Using meta text analysis Using phylogenetic profiling Using pathway homology Using protein docking Using microarray data Using protein interaction data

What is protein docking?
[Diagram: Protein A and Protein B are placed in several relative orientations around the same area of interaction. Candidate arrangements are labelled positive or negative, and the best docking is selected.]

Caveats of using protein docking to predict interaction
[Diagram: Protein A, Protein B, Protein C; Glycolysis, DNA synthesis.]
–Proteins may not come into contact in the cell, although they could interact if they did
–Computationally very expensive

When should we use protein docking to predict network structure? When we have a group of proteins that are known to be involved in the same function and we want to predict how the different proteins interact with each other

How to do it? –Download structures or create structure predictions –Use GRAMM, HEX, …

Organization of the talk Social behavior of the protein?!?!?!? Using meta text analysis Using phylogenetic profiling Using pathway homology Using protein docking Using microarray data Using protein interaction data

Predicting protein interactions using microarray data
[Diagram: cells with and without stimulus; purify cDNA; compare cDNA levels of corresponding genes in the different populations.]
–Genes overexpressed as a result of the stimulus
–Genes underexpressed as a result of the stimulus
–Genes with expression independent of the stimulus
Group of proteins involved in the response to the stimulus
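The comparison step on this slide can be sketched as a log2 fold change with a cutoff. The expression values, the gene names, the function name `classify_genes` and the two-fold threshold are assumptions made for this example; a real analysis would also need normalisation, replicates and a statistical test.

```python
from math import log2


def classify_genes(control, stimulated, threshold=1.0):
    """Group genes by log2 fold change between two conditions.

    control and stimulated map gene name to a positive expression level.
    threshold=1.0 means a two-fold change (an assumed, common cutoff).
    """
    groups = {"overexpressed": [], "underexpressed": [], "independent": []}
    for gene in sorted(control):
        change = log2(stimulated[gene] / control[gene])
        if change >= threshold:
            groups["overexpressed"].append(gene)
        elif change <= -threshold:
            groups["underexpressed"].append(gene)
        else:
            groups["independent"].append(gene)
    return groups


# Hypothetical expression levels.
control = {"geneX": 100.0, "geneY": 200.0, "geneZ": 100.0}
stimulated = {"geneX": 400.0, "geneY": 50.0, "geneZ": 110.0}
print(classify_genes(control, stimulated))
```

Genes that land in the over- or underexpressed groups under the same stimulus are the candidates for belonging to one response network.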

Organization of the talk Social behavior of the protein?!?!?!? Using meta text analysis Using phylogenetic profiling Using pathway homology Using protein docking Using microarray data Using protein interaction data

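Predicting protein networks using protein interaction data
[Diagram: your sequence (A) is submitted to a server/program that queries a database of protein interactions; the interaction partners of A are retrieved, then their partners, growing a network of A, B, C, D, E, F.]
Continue until you are satisfied or have completed the network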
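The "continue until you are satisfied" loop is a breadth-first expansion from the query protein. In the sketch below the interaction list is hypothetical (only the protein letters come from the slide), and `max_depth` stands in for "until you are satisfied".

```python
from collections import deque


def expand_network(interactions, seed, max_depth=2):
    """Grow a network outwards from seed, one layer of partners at a time.

    interactions is a list of (protein, protein) pairs from an interaction
    database. Returns (depth of each reached protein, sorted list of edges).
    """
    neighbours = {}
    for a, b in interactions:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    depth = {seed: 0}
    edges = set()
    queue = deque([seed])
    while queue:
        protein = queue.popleft()
        if depth[protein] == max_depth:
            continue  # do not look up partners beyond the chosen depth
        for partner in sorted(neighbours.get(protein, ())):
            edges.add(tuple(sorted((protein, partner))))
            if partner not in depth:
                depth[partner] = depth[protein] + 1
                queue.append(partner)
    return depth, sorted(edges)


# Hypothetical interaction pairs.
pairs = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "E"), ("E", "F")]
print(expand_network(pairs, "A", max_depth=2))
```

Raising `max_depth` until no new proteins appear corresponds to completing the network.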

Summary Social behavior of the protein?!?!?!? Using meta text analysis Using phylogenetic profiling Using pathway homology Using protein docking Using microarray data Using protein interaction data