y-sa/2.0/. Integrating the Data Prof:Rui Alves 973702406 Dept Ciencies Mediques Basiques, 1st.

Presentation transcript:

Integrating the Data Prof:Rui Alves Dept Ciencies Mediques Basiques, 1st Floor, Room 1.08 Website of the Course: Course:

Outline Methods for reconstruction of functional protein networks –Why is it important? Methods for reconstruction of physical protein interactions

Proteins do not work alone!

Finding the social environment of a protein Finding out what a protein does is not enough –Reductase, ok, but of what? (super-mouse) There is an incredible amount of information available regarding the biology of many organisms –Sequences, omics, pathways, etc…

Integrating the information is important for network reconstruction If we can integrate all the information available for a given protein/gene, then we are likely to be able to predict its social network From here to reconstructing the causal set of interactions in the network, there is only a step –Who does what to whom

Methods for network reconstruction Mapping Gene onto known pathways –If a gene is orthologous to genes in other organisms for which we know the pathways and circuits, then we can assume that it works in the same circuit in the new organism

Find a gene in a new genome …Sequenced … Genome… Sequence of ste20 Orthologue gene

Reconstruct same pathway in new organism Ste20 new organism
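
The mapping step above can be sketched in a few lines of Python. This is only an illustration, assuming that an ortholog table has already been produced by some sequence comparison; the function name `transfer_pathway`, the table format and the gene identifiers of the new organism are hypothetical. The reference pathway used in the example is the yeast pheromone MAP kinase cascade that Ste20 belongs to.

```python
def transfer_pathway(pathway, orthologs):
    """Map each gene of a known pathway onto its ortholog in a new organism.

    pathway   -- ordered list of gene names in the reference organism
    orthologs -- dict: reference gene -> ortholog in the new organism
                 (assumed to come from a prior orthology search)
    Returns the mapped genes and the pathway steps with no ortholog found.
    """
    mapped = {gene: orthologs[gene] for gene in pathway if gene in orthologs}
    missing = [gene for gene in pathway if gene not in orthologs]
    return mapped, missing


if __name__ == "__main__":
    # Reference pathway: part of the yeast pheromone response cascade.
    reference = ["STE20", "STE11", "STE7", "FUS3"]
    # Hypothetical ortholog table for the newly sequenced genome.
    table = {"STE20": "new_0412", "STE11": "new_1187", "FUS3": "new_0031"}
    mapped, missing = transfer_pathway(reference, table)
    print("Reconstructed steps:", mapped)
    print("Steps without an ortholog:", missing)
```

Steps reported as missing are the places where the assumption "same circuit in the new organism" cannot be checked from orthology alone.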

Methods for network reconstruction Mapping Gene onto known pathways Using text analysis –Scientific literature has accumulated over centuries now. –No one can know everything and read everything. –However, information is buried in there –Mining that information can assist in network reconstruction

Publication databases are a source of information

Meta text databases create network models from publication analysis

iHOP is a sophisticated context analysis engine

How does meta-text analysis create networks? [Diagram: Your genes (e.g. Ste20) → Server/Program, whose scripts combine a Literature database (Entry), a Gene names database (Gene list) and a Language rules database (Rule list, e.g. activate, inhibit, rescue) → List of entries mentioning your gene]
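
A minimal sketch of this pipeline, assuming the literature entries are available as plain sentences. The function name `mine_interactions` and the matching rules are illustrative assumptions, not the way iHOP or any other server actually works: a sentence yields a candidate link when it mentions the query gene, another gene from the gene list, and a word starting with one of the action words from the rule list.

```python
import re


def mine_interactions(sentences, gene_names, action_words, query):
    """Return candidate (query, action, partner) links found in sentences.

    Matching is deliberately crude: case-insensitive whole-word gene names,
    and action words matched as prefixes ("activate" matches "activates").
    The order of the tuple does NOT establish who acts on whom.
    """
    links = []
    query_lc = query.lower()
    for sentence in sentences:
        tokens = set(re.findall(r"[a-z0-9]+", sentence.lower()))
        if query_lc not in tokens:
            continue
        partners = [g for g in gene_names
                    if g.lower() in tokens and g.lower() != query_lc]
        actions = [w for w in action_words
                   if any(t.startswith(w) for t in tokens)]
        for partner in partners:
            for action in actions:
                links.append((query, action, partner))
    return links


if __name__ == "__main__":
    # Toy "literature database"; the sentences are made up for illustration.
    entries = [
        "Ste20 activates Ste11 in the pheromone pathway.",
        "Cdc42 binds many effectors.",
    ]
    genes = ["Ste20", "Ste11", "Cdc42"]
    rules = ["activate", "inhibit", "rescue"]
    for link in mine_interactions(entries, genes, rules, "Ste20"):
        print(link)
```

Co-occurrence in a sentence is weak evidence, which is why the later slides ask for statistical controls and a selector for the interactions to be trusted.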

Problems with this set up Delay with respect to available information Disregards a lot of information available over the web

Text Miner will address this

Text Miner

Things to do Statistical Significance –Internal controls –Overall controls Sentence Mining –Definition of action words ontology to help automated function mining Graphical Drawing –Allowing for mouse drag and drop Selector for interactions that are to be trusted and included in the model

Problems with this set up Slow: analysis and document retrieval are done live –In the future there will be an option so that, if a search has been done by someone before, the user will be able to reuse it instead of doing a live search There is more “junk info” –However, you can control that by selecting the sources of information you want to use

Methods for network reconstruction Mapping Gene onto known pathways Meta text analysis Evolutionary based protein interaction prediction –Proteins that work together (i.e. belong to the same close social network) evolve together –Ergo, proteins that show co-evolution are likely to work together

Proteins that have coevolved share a function If protein A has co-evolved with protein B, they are likely to be involved in the same process Looking for proteins that coevolved will help predict the social networks of proteins There are many methods to look for co-evolution of proteins –Phylogenetic profiling, gene neighbourhoods, gene fusion events, phylogenetic trees…

Using phylogenetic profiles to predict protein interactions [Diagram: Your Sequence (A) → Server/Program, which queries a Database of proteins in fully sequenced genomes and a Database of profiles for each protein in each organism] Profile table for the target genome, one column per protein id (A, B, C, …): Homologue in Genome 1? Y, N, Y, …; Homologue in Genome 2? N, Y, N, …; … 2 Calculate coincidence index: for each pair of proteins, a count divided by the number of genomes (example values on the slide range from 0 to 1) Proteins (A and C) that are present and absent in the same set of genomes are likely to be involved in the same process and therefore interact Similarly, if protein A is absent in all genomes in which protein B is present there is a likelihood that they perform the same function!
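
The slide does not give the exact formula for the coincidence index, so the sketch below assumes the simplest reading: the fraction of genomes in which two proteins are either both present or both absent. The function names and the toy profiles are hypothetical.

```python
def coincidence_index(profile_a, profile_b):
    """Fraction of genomes where both proteins share the same presence state.

    Profiles are equal-length strings of 'Y' (homologue found) and
    'N' (no homologue), one character per fully sequenced genome.
    """
    if len(profile_a) != len(profile_b) or not profile_a:
        raise ValueError("profiles must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(profile_a, profile_b))
    return matches / len(profile_a)


def rank_partners(query, profiles):
    """Rank all other proteins by their coincidence index with the query."""
    scores = [(coincidence_index(profiles[query], profile), name)
              for name, profile in profiles.items() if name != query]
    return sorted(scores, reverse=True)


if __name__ == "__main__":
    # Toy profiles over four genomes (made-up data).
    profiles = {"A": "YNYY", "B": "NYNN", "C": "YNYY", "D": "YYNY"}
    for score, name in rank_partners("A", profiles):
        print(name, score)
```

Under this definition a value near 1 suggests that two proteins take part in the same process, while a value near 0 (complementary profiles, as for A and B) points to proteins that may replace each other functionally.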

Phylogenetic coincidence server We have one that will be up in a few months for yeast, coli, man, chimp, candida and xanthus.

Synteny/Conservation of gene neighborhoods [Diagram: positions of the genes for Protein A, Protein B, Protein C and Protein D along Genome 1, Genome 2, Genome 3, Genome …] Which of these proteins interact? Proteins A and B are in a conserved relative position in most genomes, which is an indication that they are likely to interact
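
One simple way to quantify "conserved relative position" is to count in how many genomes two genes sit next to each other. The sketch below assumes each genome is given as an ordered list of gene names on a single linear chromosome, with names already unified across genomes through orthology; both the representation and the function name are assumptions made for illustration.

```python
def neighbourhood_conservation(genomes):
    """Fraction of genomes in which each pair of genes is adjacent.

    genomes -- list of gene orders, e.g. [["A", "B", "C"], ["B", "A", "C"]]
    Returns a dict mapping frozenset({gene1, gene2}) to a value in (0, 1].
    """
    counts = {}
    for gene_order in genomes:
        # A set, so that a pair is counted once per genome at most.
        adjacent = {frozenset(pair) for pair in zip(gene_order, gene_order[1:])}
        for pair in adjacent:
            counts[pair] = counts.get(pair, 0) + 1
    return {pair: n / len(genomes) for pair, n in counts.items()}


if __name__ == "__main__":
    # Toy gene orders for four genomes (made-up data).
    genomes = [
        ["A", "B", "C", "D"],
        ["A", "B", "D", "C"],
        ["C", "A", "B", "D"],
        ["B", "A", "D", "C"],
    ]
    scores = neighbourhood_conservation(genomes)
    for pair, score in sorted(scores.items(), key=lambda item: -item[1]):
        print(sorted(pair), score)
```

Pairs that stay adjacent in most genomes, such as A and B in the toy data, are the candidates for functional interaction.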

Gene fusion events [Diagram: genes for Protein A, Protein B, Protein C and Protein D along Genome 1, Genome 2, Genome 3, Genome …] Which of these proteins interact? Proteins A and B have undergone gene fusion events in at least some genomes, which is an indication that they are likely to interact

Building phylogenetic trees of proteins [Diagram: genes for Protein A, Protein B, Protein C and Protein D along Genome 1, Genome 2, Genome 3, Genome …] Get the sequences of all homologues, align them and build a phylogenetic tree Phylogenetic trees represent the evolutionary history of homologous genes/proteins based on their sequence

Similarity of phylogenetic trees indicates interaction between proteins [Diagram: phylogenetic trees for proteins A, B, C and D, with leaves A1, A2, A3, …, B1, B2, B3, …, C1, C2, C3, …, D1, D2, D3, …] Proteins A and B have similar evolutionary trees and thus are likely to interact

Protein/Gene interactions Often, people use these methods to say that genes or proteins interact. The methods described previously cannot be used to accurately describe PHYSICAL interaction When people say “interact” in this context, one is forced to assume FUNCTIONAL (not necessarily physical) interaction, unless more info is available

Methods for network reconstruction Mapping Gene onto known pathways Using meta text analysis Using phylogenetic profiling Using omics data –If two proteins/genes have evolved to perform a function in the same process, it is likely that their activity and gene expression are co-regulated –Conversely, if proteins/genes are co-regulated, then they are likely to participate in the same process

Predicting gene functional interactions using microarray data cells → Stimulus → Purify cDNA → Compare cDNA levels of corresponding genes in the different populations –Genes overexpressed as a result of stimulus –Genes underexpressed as a result of stimulus –Genes with expression independent of stimulus Group of genes/proteins involved in response to the stimulus
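
The comparison step can be illustrated with a log-ratio classification. The sketch assumes one positive expression value per gene in each population and uses a twofold change (|log2 ratio| ≥ 1) as the cut-off; that threshold, the function name and the data are assumptions, and real analyses add normalization and replicate-based statistics.

```python
import math


def classify_genes(control, stimulated, threshold=1.0):
    """Split genes into over-, under- and unchanged expression groups.

    control, stimulated -- dicts: gene -> positive expression level
    threshold           -- minimum |log2(stimulated/control)| to call a change
                           (1.0, i.e. twofold, is an assumed default)
    """
    groups = {"over": [], "under": [], "unchanged": []}
    for gene in sorted(control):
        log_ratio = math.log2(stimulated[gene] / control[gene])
        if log_ratio >= threshold:
            groups["over"].append(gene)
        elif log_ratio <= -threshold:
            groups["under"].append(gene)
        else:
            groups["unchanged"].append(gene)
    return groups


if __name__ == "__main__":
    # Made-up expression levels for three genes.
    control = {"g1": 100.0, "g2": 100.0, "g3": 100.0}
    stimulated = {"g1": 400.0, "g2": 25.0, "g3": 110.0}
    print(classify_genes(control, stimulated))
```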

Gene network reconstruction Reconstruction of gene networks based on micro-array data is a very difficult endeavor It is an inverse problem, meaning that there is usually more than one solution that fits the data Pioneer groups used either Petri nets (e.g. Somogyi, Finland) or mathematical models (Okamoto, Japan)

Predicting protein functional interactions using mass spec data cells → Stimulus → Purify proteins → Identify Proteins and compare Protein profiles/levels in the different populations –Proteins present as a result of stimulus –Proteins absent as a result of stimulus –Proteins present in both conditions Group of proteins involved in response to the stimulus

Protein network reconstruction Reconstruction of protein networks based on mass spec proteomics data is still very immature. To my knowledge no paradigmatic, large scale example of it has yet been done

Regulation of gene expression Predicting which transcription factors (TFs) regulate gene expression is an important part of reconstructing biological circuits of interest Omics data and bioinformatics can also be used to do this

Predicting regulatory modules with ChIP-chip experiments cells → Crosslink Protein/DNA → Break DNA → two samples: (1) Reverse cross link & Purify DNA Pieces; (2) Affinity Purification of Transcription factor, then Reverse cross link & Purify DNA Pieces bound to TF → Compare in Microarray → Derive consensus sequences for TF binding sites → Scan new genomes for TF regulatory modules
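
The last two steps (deriving a consensus and scanning a genome) can be sketched as follows. The example assumes the bound DNA pieces have already been trimmed to aligned binding sites of equal length, and it uses a plain majority consensus with a mismatch tolerance; real tools usually rely on position weight matrices instead. Function names and sequences are made up.

```python
from collections import Counter


def consensus(sites):
    """Majority base at each position of equal-length, aligned binding sites."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*sites))


def scan(sequence, motif, max_mismatches=1):
    """Start positions where the sequence matches the motif within tolerance."""
    hits = []
    for start in range(len(sequence) - len(motif) + 1):
        window = sequence[start:start + len(motif)]
        mismatches = sum(a != b for a, b in zip(window, motif))
        if mismatches <= max_mismatches:
            hits.append(start)
    return hits


if __name__ == "__main__":
    # Made-up aligned sites recovered from the TF-bound fraction.
    sites = ["TGACTC", "TGACTA", "TGAGTC", "TGACTC"]
    motif = consensus(sites)
    print("consensus:", motif)
    # Made-up stretch of a new genome to scan.
    print("hits at:", scan("AATGACTCGGTGAGTCAA", motif))
```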

Predicting protein activity modulation with NMR/IR/MS Metabolomics cells → Stimulus → Measuring Metabolites; cells → Measuring Metabolites Compare changes in metabolic levels to infer changes in protein activity

Incorporating metabolomics information These changes can be incorporated into mathematical models and these models can then be used predictively

Methods for network reconstruction Mapping Gene onto known pathways Using meta text analysis Using phylogenetic profiling Using omics data Using protein interaction data –Large scale protein interaction data sets are available –If proteins physically interact, it is likely that they work together in the same network

Predicting protein networks using protein interaction data [Diagram: Your Sequence (A) → Server/Program, which queries a Database of protein interactions; the network grows from A to proteins B, C, D, E, F] Continue until you are satisfied or have completed the network
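
The "continue until you are satisfied" loop is essentially a breadth-first expansion from the query protein. A minimal sketch, assuming the interaction database can be reduced to a list of undirected protein pairs; the function name, the depth limit used as a stopping rule and the toy data are assumptions.

```python
from collections import deque


def expand_network(interactions, seed, max_depth):
    """Breadth-first expansion of the interaction network around a seed protein.

    interactions -- iterable of (protein1, protein2) pairs, taken as undirected
    seed         -- the query protein (your sequence)
    max_depth    -- number of interaction steps to follow before stopping
    Returns a dict: protein -> number of steps separating it from the seed.
    """
    neighbours = {}
    for a, b in interactions:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    depth = {seed: 0}
    queue = deque([seed])
    while queue:
        protein = queue.popleft()
        if depth[protein] == max_depth:
            continue  # do not look further than max_depth steps
        for partner in sorted(neighbours.get(protein, ())):
            if partner not in depth:
                depth[partner] = depth[protein] + 1
                queue.append(partner)
    return depth


if __name__ == "__main__":
    # Made-up interaction database.
    database = [("A", "B"), ("A", "C"), ("B", "D"), ("D", "E"), ("E", "F")]
    print(expand_network(database, "A", max_depth=2))
```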

Outline Methods for reconstruction of functional protein networks Methods for reconstruction of physical protein interactions

How do proteins work within the network? Assume we now have the network our protein is involved in. How do we further analyze the role of the protein?

Proteins work by binding [Diagram: a protein binds DNA, producing an Effect] Proteins work by binding!

So what? So, if we can predict how proteins DOCK to their ligands, then we will be able to: –Understand how the binding allows them to work systemically –Design drugs to overcome mutations in binding sites –Design proteins to prevent/enhance other interactions

What is in silico protein docking? Given two molecules, find their correct association using a computer: Receptor + Ligand = Complex [the diagram also carries the label T]

What types of in silico docking exist? Sequence Based Docking:

In silico two hybrid docking Protein A alignment: E. coli AGGMEYW…, S. typhi AA–CDWY…, …, Y. pestis AGG–DYW Protein B alignment: E. coli VCHPRIIE…, S. typhi VCH-KIIE…, …, Y. pestis VCH–KIIE… [Diagram: matrix of the positions of Protein B (V C H P K I I E…) against the positions of Protein A (A G G … D …), filled in by Pearson Correlation] D/K or E/R may be involved in a salt bridge
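
The idea of correlating alignment columns of two proteins across the same set of species can be sketched as below. As a stand-in for the residue similarity scores used by published correlated-mutation methods, each residue is encoded only by its charge (D/E = -1, K/R = +1, anything else 0); this encoding, the threshold and all names are assumptions for illustration. Rows of both alignments must list the species in the same order.

```python
import math

# Assumed, very reduced residue encoding: formal charge only.
CHARGE = {"D": -1, "E": -1, "K": 1, "R": 1}


def column(alignment, index):
    """Charge-encoded column of an alignment (one value per species)."""
    return [CHARGE.get(sequence[index], 0) for sequence in alignment]


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either column does not vary."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def correlated_positions(alignment_a, alignment_b, threshold=0.9):
    """Position pairs (i in A, j in B) whose columns are strongly correlated."""
    pairs = []
    for i in range(len(alignment_a[0])):
        for j in range(len(alignment_b[0])):
            r = pearson(column(alignment_a, i), column(alignment_b, j))
            if abs(r) >= threshold:
                pairs.append((i, j, r))
    return pairs


if __name__ == "__main__":
    # Made-up alignments, four species in the same order in both proteins.
    protein_a = ["AGE", "AGD", "AGK", "AGR"]
    protein_b = ["VKH", "VRH", "VEH", "VDH"]
    print(correlated_positions(protein_a, protein_b))
```

With this encoding a strongly negative correlation means that the two positions swap charge together and keep opposite signs, which is the pattern expected for a D/K or E/R salt bridge.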

What types of in silico docking exist? Sequence Based Docking In silico structural protein docking

Structure based docking Protein-Protein docking –Rigid (usually) Protein-Ligand docking –Rigid protein, flexible ligand Very demanding on computational resources

Structural docking in a nutshell Scan molecular surfaces of protein for best surface fit –First steric, then energetics –Can (and should) include biologically relevant information (e.g. residue X is known from mutation experiments to be involved in the docking → discard any docking not involving this residue)

Atom based docking First, a surface representation is needed –Van der Waals Surface –Accessible (Connolly) Surface –Solvent accessible Surface

Calculating the best docking Scan molecular surfaces of protein for best surface fit –Calculate the position where the largest number of atoms fit together, factor in energy + biology, and rank solutions accordingly

Grid-based techniques –Alternative to calculating protein atom / ligand atom interactions –More efficient (number of grid points < number of atoms)

Grid based docking Place grid over protein Calculate inter-molecular forces for each grid point [Diagram: grid points labelled Score 1, Score 2, Score 3, Score 4]
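
A minimal sketch of the grid idea: the protein's contribution is computed once at every grid point, and each ligand pose is then scored by table lookup instead of looping over all protein atoms. The example stores a Coulomb-like potential (charge divided by distance, in arbitrary units) rather than forces; the functional form, the distance clamp, the nearest-point lookup and all names are assumptions.

```python
import math


def build_grid(protein_atoms, spacing, size):
    """Precompute the protein potential on a cubic grid.

    protein_atoms -- list of ((x, y, z), charge)
    spacing       -- distance between neighbouring grid points
    size          -- number of grid points along each axis
    Returns a dict: (i, j, k) grid index -> potential at that point.
    """
    grid = {}
    for i in range(size):
        for j in range(size):
            for k in range(size):
                point = (i * spacing, j * spacing, k * spacing)
                potential = 0.0
                for position, charge in protein_atoms:
                    # Clamp the distance to avoid the singularity at r = 0.
                    r = max(math.dist(point, position), 1.0)
                    potential += charge / r
                grid[(i, j, k)] = potential
    return grid


def score_ligand(grid, ligand_atoms, spacing):
    """Score a ligand pose by looking up the nearest grid point per atom."""
    total = 0.0
    for position, charge in ligand_atoms:
        index = tuple(round(c / spacing) for c in position)
        total += charge * grid.get(index, 0.0)  # outside the grid: no contribution
    return total


if __name__ == "__main__":
    # Made-up protein with a single negative charge at the origin.
    grid = build_grid([((0.0, 0.0, 0.0), -1.0)], spacing=1.0, size=4)
    # Made-up ligand: one positive charge, two grid units away.
    print(score_ligand(grid, [((2.0, 0.0, 0.0), 1.0)], spacing=1.0))
```

The expensive loop over protein atoms runs once in `build_grid`; every additional ligand pose only costs one lookup per ligand atom.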

The docking function There are many, and none is the best for all cases Scores will depend on the exact docking function you use

A docking function for surface matching Molecules a, b placed on l × m × n grid Match surfaces Fourier transform makes calculation faster Tabulate and rank all possible conformations
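
The formulas of this slide were not preserved in the transcript. The sketch below shows one common form of grid-based surface matching, under stated assumptions: receptor surface cells score +1, receptor interior cells carry a large negative penalty (the value -15 is an assumed choice), ligand cells score +1, and the score of a translation is the sum of the products of overlapping cells. For clarity it sums directly over cells instead of using a Fourier transform, and it scans translations only, not rotations.

```python
from itertools import product

RHO = -15  # assumed penalty for a ligand cell entering the receptor interior


def make_receptor(interior, surface):
    """Grid of the receptor: RHO for interior cells, 1 for surface cells."""
    grid = {cell: RHO for cell in interior}
    grid.update({cell: 1 for cell in surface})
    return grid


def correlation(receptor, ligand, shift):
    """Surface-matching score of the ligand translated by `shift`."""
    dx, dy, dz = shift
    return sum(value * receptor.get((x + dx, y + dy, z + dz), 0)
               for (x, y, z), value in ligand.items())


def rank_translations(receptor, ligand, max_shift):
    """Tabulate and rank every translation within +/- max_shift cells."""
    shifts = product(range(-max_shift, max_shift + 1), repeat=3)
    scored = [(correlation(receptor, ligand, shift), shift) for shift in shifts]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    # Made-up receptor: one interior cell with two surface cells next to it.
    receptor = make_receptor({(0, 0, 0)}, {(1, 0, 0), (2, 0, 0)})
    # Made-up ligand occupying two cells.
    ligand = {(0, 0, 0): 1, (1, 0, 0): 1}
    for score, shift in rank_translations(receptor, ligand, max_shift=2)[:3]:
        print(score, shift)
```

The direct sum costs one pass over the ligand cells per translation. Because the score is a correlation of two grids, a Fourier transform can evaluate all translations at once, which is the speed-up the slide refers to.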

A docking function for electrostatics There are many; they use different force field approximations to calculate the energy of electrostatic interactions. The basics: Charge distributions for proteins Potential for proteins

The full docking function Calculates a relative binding energy that integrates electrostatic and shape matching factors. For example:

Overall process of docking

Mol 1, Mol 2 → Rigid Body energy calculation → List of Complexes → Re-rank using statistics of residue contact, H/bond, biological information, etc → Re-rank using rotamers, flexibility in protein backbone angles, Molecular dynamics, etc. → Final list of solutions

Summary Methods for reconstruction of functional protein networks –Bibliomics –Genomics –Phenomics, etc Methods for reconstruction of protein interactions –Sequence based –Structure based

The overall picture

Grid-based techniques –Notes: Grids spaced <1 Å –Results show very little change in error for grid spacings between 0.25 and 1 Å

Problem Importance Computer aided drug design – a new drug should fit the active site of a specific receptor. Many reactions in the cell occur through interactions between the molecules. No efficient techniques for crystallizing large complexes and finding their structure.