Modeling Functional Genomics Datasets, CVM Lesson 6, 11 July 2007, Bindu Nanduri
Lesson 6: Functional genomics modeling II: a pathway analysis example.
Introduction to protein interaction networks
Cancer Proliferation Differentiation Quiescence Programmed Cell Death Cell Differentiation
Proliferation Differentiation Quiescence Programmed Cell Death Anergy Activation CD4+ T "helper" Lymphocyte Lymphoma
AgBase protein annotation process: protein identifiers or FASTA format -> GORetriever -> annotated proteins; proteins with no annotations -> GOanna; annotations summarized with GOSlimViewer
Figure: Potential CD4+ T lymphocyte biological processes (percentage breakdown per process: Proliferation, Angiogenesis, Apoptosis, Migration, Quiescence, Differentiation, Anergy, Activation, Senescence, Cell Cycle)
AP-1 dependent gene expression Metastasis Tumor invasion AP-1 Integrin Signaling Pathway
Hypothesis-driven data analysis: exploration of data to identify pathways of interacting proteins. Protein-protein interaction (PPI) networks.
Why study PPIs? Proteins do not function alone. PPIs are inherent to the function of multiprotein complexes. PPIs can help infer function when functional information is available for one partner. Changes in normal PPIs can result in disease.
Types of PPI
PPI categories are based on composition, affinity, and timescale of interaction. Homo- and hetero-oligomeric complexes: interactions between identical or non-identical chains. Obligate PPI: the protomers do not exist as stable structures on their own in vivo; these complexes are functionally obligate. Example: the Arc repressor dimer, which is necessary for DNA binding. Non-obligate PPI: the protomers can exist as stable structures on their own and may co-localize for function or are already co-localized. Example of a non-obligate homodimer: sperm lysin.
PPI based on the lifetime of the complex: transient or permanent. Permanent interactions are stable, and the proteins exist only as a complex. Transient interactions are marked by association/dissociation cycles in vivo. Weak interactions (sperm lysin) associate and dissociate. Strong transient interactions require a molecular trigger: the heterotrimeric G protein dissociates into G-alpha and G-beta/G-gamma when it binds GTP, whereas the GDP-bound form is a trimer.
Control of protein oligomerization: PPIs form a continuum between obligate and non-obligate states. Complex formation is driven by concentration and by the free energy of the complex relative to alternate states.
Take-home message of PPI types: PPIs form a continuum between obligate and non-obligate states. Complex formation is driven by concentration and by the free energy of the complex relative to alternate states.
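The dependence on concentration and free energy can be made concrete with a simple two-state binding model. The sketch below assumes a 1:1 complex A + B <-> AB with partner B in large excess, so the fraction of A bound is [B] / (K_d + [B]), with K_d = exp(dG / RT) relative to a 1 M standard state. The function names and the example free energies are illustrative assumptions, not values from the lesson.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def kd_from_delta_g(delta_g_kcal, temp_k=298.15):
    """Dissociation constant (M) from the free energy of association.

    delta_g_kcal is negative for a favorable complex, so
    K_d = exp(dG / RT) relative to a 1 M standard state.
    """
    return math.exp(delta_g_kcal / (R_KCAL * temp_k))


def fraction_bound(partner_conc_m, kd_m):
    """Fraction of A in complex when partner B is in excess at partner_conc_m."""
    return partner_conc_m / (kd_m + partner_conc_m)


# Illustrative values only: a weaker and a stronger complex at rising concentration.
for delta_g in (-6.0, -12.0):
    kd = kd_from_delta_g(delta_g)
    for conc in (1e-9, 1e-6, 1e-3):
        bound = fraction_bound(conc, kd)
        print(f"dG={delta_g} kcal/mol  [B]={conc:.0e} M  fraction bound={bound:.3f}")
```

In this toy model the same pair of proteins moves from mostly free to mostly bound as the local concentration rises, which is one way to read the continuum between non-obligate and obligate behaviour.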
How to identify PPI. Experimental: yeast two-hybrid (Y2H), TAP assays, gene coexpression, protein arrays. Computational: sequence coevolution, phylogenetic profile, gene cluster, Rosetta Stone method, text mining.
PLoS Computational Biology March 2007, Volume 3 e42 Y2H Assay: Eukaryotic transcription factors have a DNA-binding domain (BD) and an activation domain (AD). Physical association of these domains activates transcription. Create chimeric proteins fused to either the BD or the AD and transfect yeast. Gal4/LexA-based reporters. An in vivo method that can detect transient PPI.
TAP Assay: The TAP tag consists of two IgG-binding domains of Staphylococcus protein A and a calmodulin-binding peptide, separated by a tobacco etch virus protease cleavage site. TAP provides direct information on protein complexes. (O. Puig et al., Methods, 2001)
Gene Coexpression: Expression profile similarity is measured as the correlation coefficient between the relative expression levels of two genes/proteins, or as the normalized difference between their absolute expression levels. The distribution for target proteins is compared with the distributions for random noninteracting protein pairs. Expression levels of physically interacting proteins coevolve, and coevolution of gene expression is a better predictor of protein interactions than coevolution of amino acid sequences. Good for studying permanent complexes such as the ribosome and the proteasome.
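As a minimal sketch of the coexpression idea, the code below scores a candidate pair by the Pearson correlation of its expression profiles and compares that score with correlations for randomly drawn pairs. The gene names, the expression values, and the use of random pairs as a stand-in for known noninteracting pairs are all assumptions made for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a flat profile carries no correlation signal
    return cov / (sd_x * sd_y)


def random_pair_correlations(profiles, n_pairs, seed=0):
    """Correlations for randomly drawn gene pairs (background distribution)."""
    rng = random.Random(seed)
    genes = sorted(profiles)
    background = []
    for _ in range(n_pairs):
        a, b = rng.sample(genes, 2)
        background.append(pearson(profiles[a], profiles[b]))
    return background


def empirical_p(value, background):
    """Fraction of background correlations at least as high as value."""
    return sum(1 for r in background if r >= value) / len(background)


# Hypothetical relative expression levels across five conditions.
profiles = {
    "geneA": [1.0, 2.1, 3.0, 4.2, 5.1],
    "geneB": [0.9, 2.0, 3.2, 3.9, 5.0],
    "geneC": [5.0, 1.0, 4.0, 2.0, 3.0],
    "geneD": [2.0, 2.5, 1.0, 3.0, 0.5],
    "geneE": [3.0, 3.1, 2.9, 3.0, 3.2],
}
score = pearson(profiles["geneA"], profiles["geneB"])
background = random_pair_correlations(profiles, n_pairs=200)
print("candidate correlation:", round(score, 3))
print("empirical p-value:", empirical_p(score, background))
```

With real data the background would be built from pairs believed not to interact, and many more conditions would be needed for the correlation to be meaningful.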
Protein microarrays/chips: Protein chips are disposable arrays of microwells in silicone elastomer sheets placed on top of microscope slides. Target proteins are overexpressed, immobilized, and probed with fluorescently labeled proteins. Can detect PPI between actual proteins. Reference: H. Zhu et al. (2000) "Analysis of yeast protein kinases using protein chips", Nature Genetics 26
PPI databases
Database      Type    URL/FTP
BIND          E,C,S   http://bind.ca
MPact/MIPS    E,C,F   http://mips.gsf.de/services/ppi
CBM           S       ftp://ftp.ncbi.nlm.nih.gov/pub/cbm
Predictome    F       http://predictome.bu.edu/
Also listed: DIP, STRING, MINT, IntAct, BioGRID, HPRD, ProtCom, 3did, Interprets, Pibase, Modbase, SCOPPI, iPfam, InterDom, DIMA, Prolinks
Type of data: high-throughput experimental data (E), structural data (S), manual curation (C), functional predictions (F), and interface homology modeling (H). Unit of interaction: P is protein.
PPI database comparisons Proteins: Structure, Function and Bioinformatics 63:
Overlap between experimental PPI datasets is small. The false-positive (FP) rate in high-throughput experiments is high, so interactions are difficult to confirm from multiple sources.
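One simple way to quantify how much two interaction datasets agree is to count shared pairs and compute the Jaccard index. The sketch below assumes each dataset is a list of protein-identifier pairs that already use the same identifier scheme; the identifiers shown are hypothetical.

```python
def normalize(pairs):
    """Store each interaction as an unordered pair so A-B equals B-A."""
    return {frozenset(pair) for pair in pairs}


def overlap_stats(dataset_a, dataset_b):
    """Return (number of shared interactions, Jaccard index) for two datasets."""
    a, b = normalize(dataset_a), normalize(dataset_b)
    shared = a & b
    union = a | b
    jaccard = len(shared) / len(union) if union else 0.0
    return len(shared), jaccard


# Hypothetical interaction lists from two screens.
screen_1 = [("P1", "P2"), ("P3", "P4"), ("P5", "P6")]
screen_2 = [("P2", "P1"), ("P7", "P8")]
n_shared, jaccard = overlap_stats(screen_1, screen_2)
print("shared interactions:", n_shared)
print("Jaccard index:", round(jaccard, 3))
```

In practice most of the effort goes into mapping identifiers between databases before a comparison like this is meaningful.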
How to identify PPI. Experimental: yeast two-hybrid (Y2H), TAP assays, gene coexpression, protein arrays. Computational: sequence coevolution, phylogenetic profile, gene cluster/neighborhood, Rosetta Stone method, text mining.
PLoS Computational Biology March 2007, Volume 3 e43 Phylogenetic profile (PP). Hypothesis: functionally linked and potentially interacting nonhomologous proteins co-evolve and have orthologs in the same subset of fully sequenced organisms.
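The phylogenetic profile idea can be sketched as a presence/absence vector per protein, with pairs flagged when their vectors match. The genome and protein labels below are hypothetical, and matching by Hamming distance with a mismatch threshold is one simple choice among several possible similarity measures.

```python
def profile(genomes_with_ortholog, genomes):
    """Binary presence/absence vector of a protein across an ordered genome list."""
    return tuple(1 if g in genomes_with_ortholog else 0 for g in genomes)


def hamming(p, q):
    """Number of genomes in which two profiles disagree."""
    return sum(a != b for a, b in zip(p, q))


def is_informative(p):
    """Profiles present everywhere or nowhere cannot discriminate between pairs."""
    return 0 < sum(p) < len(p)


def linked_pairs(presence, genomes, max_mismatch=0):
    """Protein pairs whose informative profiles differ in at most max_mismatch genomes."""
    profiles = {name: profile(found, genomes) for name, found in presence.items()}
    names = sorted(n for n in profiles if is_informative(profiles[n]))
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if hamming(profiles[a], profiles[b]) <= max_mismatch:
                pairs.append((a, b))
    return pairs


# Hypothetical ortholog presence data across four sequenced genomes.
genomes = ["g1", "g2", "g3", "g4"]
presence = {
    "P1": {"g1", "g3"},
    "P2": {"g1", "g3"},
    "P3": {"g2"},
    "P4": {"g1", "g2", "g3", "g4"},
}
print(linked_pairs(presence, genomes))
```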
Gene Cluster, Gene Neighborhood: Genes in a gene cluster/operon are co-regulated and participate in the same biological function.
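A rough sketch of the gene neighborhood idea is to count in how many genomes two gene families sit next to each other and keep the pairs whose adjacency is conserved. The genome layouts and family labels below are hypothetical, and the sketch treats each genome as a single linear gene order, ignoring strand and circular chromosomes.

```python
from collections import Counter


def neighbor_pairs(gene_order, window=1):
    """Unordered family pairs lying within `window` positions of each other."""
    pairs = set()
    for i, gene in enumerate(gene_order):
        for other in gene_order[i + 1:i + 1 + window]:
            if gene != other:
                pairs.add(frozenset((gene, other)))
    return pairs


def conserved_neighbors(genomes, min_genomes=2, window=1):
    """Family pairs that are neighbors in at least min_genomes genomes."""
    counts = Counter()
    for gene_order in genomes.values():
        counts.update(neighbor_pairs(gene_order, window))
    return {tuple(sorted(p)): n for p, n in counts.items() if n >= min_genomes}


# Hypothetical gene orders, written as gene-family labels.
genomes = {
    "genome1": ["famA", "famB", "famX"],
    "genome2": ["famY", "famB", "famA"],
    "genome3": ["famY", "famX", "famA"],
}
print(conserved_neighbors(genomes))
```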
Sequence Co-evolution: Interacting proteins very often co-evolve: changes in one protein (loss of function or interaction) are compensated by correlated changes in another protein. The orthologs of co-evolving proteins tend to interact, making it possible to infer unknown interactions in other genomes. Co-evolution can be reflected in the similarity between the phylogenetic trees of two non-homologous interacting protein families.
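One common way to score tree similarity without comparing tree topologies directly, often called the mirror-tree approach, is to correlate the pairwise evolutionary distances of the two families over the same set of species. The sketch below uses that approach as a stand-in for the tree comparison described here; the species labels and distance values are invented for illustration.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def mirror_tree_score(dist_a, dist_b, species):
    """Correlate inter-species distances of family A with those of family B.

    dist_a and dist_b map frozenset({species1, species2}) to the distance
    between the orthologs of that family in the two species.
    """
    pairs = [frozenset(p) for p in combinations(species, 2)]
    return pearson([dist_a[p] for p in pairs], [dist_b[p] for p in pairs])


# Hypothetical distances for two protein families over three species.
species = ["sp1", "sp2", "sp3"]
family_a = {frozenset(("sp1", "sp2")): 0.10,
            frozenset(("sp1", "sp3")): 0.40,
            frozenset(("sp2", "sp3")): 0.50}
family_b = {frozenset(("sp1", "sp2")): 0.22,
            frozenset(("sp1", "sp3")): 0.75,
            frozenset(("sp2", "sp3")): 0.98}
print("mirror-tree correlation:", round(mirror_tree_score(family_a, family_b, species), 3))
```

A high correlation alone is weak evidence, because all proteins share the underlying species phylogeny; real analyses typically correct for that background signal.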
Rosetta Stone method: Interacting proteins/domains have homologs in other genomes that are fused into one protein chain, a Rosetta Stone protein. Gene fusion occurs to optimize co-expression of the genes encoding interacting proteins.
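The Rosetta Stone inference can be sketched as a lookup: two separate proteins in a query genome are linked if their families co-occur in a single protein of some reference genome. The sketch assumes family assignments (for example from a homology search) are already available; all identifiers are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def rosetta_stone_links(query_families, reference_proteins):
    """Link query proteins whose families are fused in one reference protein.

    query_families: query protein -> family label (one family per protein).
    reference_proteins: reference protein -> set of family labels it contains.
    Returns {(protein_a, protein_b): [supporting reference proteins]}.
    """
    by_family = defaultdict(list)
    for protein, family in query_families.items():
        by_family[family].append(protein)

    links = defaultdict(list)
    for ref_id, families in reference_proteins.items():
        for fam_a, fam_b in combinations(sorted(families), 2):
            for prot_a in by_family.get(fam_a, []):
                for prot_b in by_family.get(fam_b, []):
                    links[tuple(sorted((prot_a, prot_b)))].append(ref_id)
    return dict(links)


# Hypothetical family assignments.
query = {"qA": "F1", "qB": "F2", "qC": "F3"}
reference = {"fusedX": {"F1", "F2"}, "singleY": {"F3"}}
print(rosetta_stone_links(query, reference))
```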
Text Mining: Utilizes the wealth of publicly available data by searching Medline or PubMed for words or word combinations. Co-occurrence of words is a simple metric but is prone to high false-positive rates. Natural Language Processing (NLP) methods are specific, looking for statements such as "A binds to B", "A interacts with B", or "A associates with B"; such statements are difficult to detect, so NLP has a higher false-negative rate. Normally requires a list of known gene names or protein names for a given organism.
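The trade-off between co-occurrence and pattern-based extraction can be illustrated with a toy example. The sketch below uses a fixed regular expression over the three phrases quoted above as a crude stand-in for a real NLP method; the protein names and sentences are invented.

```python
import re
from itertools import combinations

VERB_PHRASES = ("binds to", "interacts with", "associates with")


def cooccurrence_pairs(sentences, names):
    """Pairs of known names that appear in the same sentence."""
    pairs = set()
    for sentence in sentences:
        found = sorted(n for n in names
                       if re.search(rf"\b{re.escape(n)}\b", sentence))
        pairs.update(combinations(found, 2))
    return pairs


def pattern_pairs(sentences, names):
    """Pairs matching '<name> <verb phrase> <name>' exactly."""
    name_alt = "|".join(re.escape(n) for n in sorted(names, key=len, reverse=True))
    verb_alt = "|".join(re.escape(v) for v in VERB_PHRASES)
    pattern = re.compile(rf"\b({name_alt})\s+(?:{verb_alt})\s+({name_alt})\b")
    pairs = set()
    for sentence in sentences:
        for a, b in pattern.findall(sentence):
            if a != b:
                pairs.add(tuple(sorted((a, b))))
    return pairs


# Hypothetical protein name list and abstract sentences.
names = {"ProtA", "ProtB", "ProtC", "ProtD"}
sentences = [
    "ProtA binds to ProtB in vitro.",
    "ProtA and ProtC were both measured in the same sample.",
    "ProtB, once phosphorylated, forms a complex with ProtD.",
]
print("co-occurrence:", sorted(cooccurrence_pairs(sentences, names)))
print("pattern-based:", sorted(pattern_pairs(sentences, names)))
```

In this toy set, co-occurrence also reports ProtA with ProtC, a sentence that asserts no interaction, while the pattern misses ProtB with ProtD because the sentence is phrased differently.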
GO ToolBox Genome Biol. 2004;5(12):R101.
ProtQuant tool