The STRING Database: What it does and how it interfaces to other resources
Christian von Mering, University of Zurich & SIB
bigDATA Workshop

STRING
Evidence types: Genomic Neighborhood, Genes/Species Co-occurrence, Gene Fusions, Database Imports, Exp. Interaction Data, Co-expression, Literature Co-occurrence
- viewers for all types of evidence
- focus on usability and speed
- integrated scoring scheme
- information transfer between species

Numbers:
- 630 organisms
- 2.6 million proteins
- 88 million interactions
- server footprint: 320 GB

Interaction prediction from genome information (“genomic context”)
Phylogenetic Profiles, Conserved Neighborhood, Gene Fusions: quantify … integrate … networks

Other Interaction Sources
- Interaction Databases
- Pathway Databases (Reactome)
- Automated Textmining
- Interolog Transfer

The scoring system
Final interaction score for protein A – protein B: between 0 and 1, a pseudoprobability (“likelihood of functional association”):
$$\text{final score} = 1 - (1 - \text{nscore})(1 - \text{fscore})(1 - \text{pscore})(1 - \text{cscore})(1 - \text{escore})(1 - \text{tscore})$$
where nscore = neighborhood, fscore = fusion, pscore = cooccurrence, cscore = coexpression, escore = experimental, tscore = textmining.
Evidence transfer between species:
$$\text{nscore} = 1 - (1 - \text{nscore}_{\text{query species}})(1 - \text{nscore}_{\text{transf.}})$$
(analogous for cscore, escore, tscore, ...)
Information transfer between species: either via orthologs (COG database) or via homology.
Benchmarking: raw score versus KEGG performance (fraction on same map).
Raw score: each predictor has its own raw-score regime.
Example - Neighborhood: gene A, gene B; intergenic distances 100 bp, 6 bp, 20 bp; raw score: sum of intergenic distances.
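The combination rule on this slide can be sketched in a few lines of Python. This is a minimal illustration of the formula as stated, not STRING's actual code; the function names `combine`, `channel_score` and `final_score` are hypothetical, and each input is assumed to be an already-calibrated score in [0, 1].

```python
from math import prod


def combine(scores):
    """Combine independent scores in [0, 1] as 1 - prod(1 - s)."""
    scores = list(scores)
    for s in scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"score out of range: {s}")
    return 1.0 - prod(1.0 - s for s in scores)


def channel_score(query_species_score, transferred_score):
    """Per-channel score: evidence in the query species combined with
    evidence transferred from other species (same rule as above)."""
    return combine([query_species_score, transferred_score])


def final_score(channels):
    """Final score from a mapping such as
    {"nscore": ..., "fscore": ..., "pscore": ...,
     "cscore": ..., "escore": ..., "tscore": ...}."""
    return combine(channels.values())


if __name__ == "__main__":
    # Illustrative numbers only; they are not taken from STRING.
    nscore = channel_score(0.4, 0.2)
    print(final_score({"nscore": nscore, "tscore": 0.5}))
```

Because every factor (1 - s) lies between 0 and 1, adding a channel can only raise the combined score, and a channel with score 0 leaves it unchanged.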

The raw score regimes
Neighborhood
- gene A, gene B; intergenic distances 100 bp, 6 bp, 20 bp
- raw score: sum of intergenic distances
Phylogenetic profiles
- “similarity profiles”
- singular value decomposition
- raw score: Euclidean distance
- filter: down-weight scores for homologous pairs
Fusion
- raw score: constant (0.99)
Experimental interactions
- two-hybrid, TAP, annotated complexes, …
- topology-based analysis: who with whom, how many other partners?
- raw score: various (usually ‘uniqueness’ of interaction)
Co-expression
- download all microarray datasets for a given species
- data normalization (spatial correction)
- raw score: pairwise Pearson correlation coefficient
Textmining
- download all PubMed abstracts
- identify proteins in the abstracts
- search for co-mentioned pairs
- raw score: log-odds score
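Three of these raw scores are simple enough to sketch directly. The code below is a hedged illustration, not STRING's implementation: the function names are hypothetical, and the log-odds definition (observed co-mention frequency over the frequency expected under independence) is one common choice assumed here, since the slide does not spell out the exact formula.

```python
import math


def neighborhood_raw(intergenic_distances_bp):
    """Neighborhood raw score: sum of intergenic distances (in bp)."""
    return sum(intergenic_distances_bp)


def pearson(x, y):
    """Pairwise Pearson correlation of two expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("constant profile: correlation undefined")
    return cov / math.sqrt(var_x * var_y)


def comention_log_odds(n_both, n_a, n_b, n_abstracts):
    """Assumed log-odds form: log(observed / expected co-mention rate),
    with the expectation taken under independence of A and B."""
    if min(n_both, n_a, n_b, n_abstracts) <= 0:
        raise ValueError("all counts must be positive")
    observed = n_both / n_abstracts
    expected = (n_a / n_abstracts) * (n_b / n_abstracts)
    return math.log(observed / expected)


if __name__ == "__main__":
    # Distances from the slide's neighborhood example.
    print(neighborhood_raw([100, 6, 20]))
    print(pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.5]))
    print(comention_log_odds(n_both=30, n_a=100, n_b=100, n_abstracts=1000))
```

Each function returns a value on its own scale, which is why the slides benchmark every raw score against KEGG before the scores are combined.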

User-Experience: Aiming to be Visual and Intuitive

- 1’000 visits / day
- 800 users / day
- 9’000 pageviews / day
- > 10’000 DB-queries / day

Citations
- Snel et al., NAR 2000: 80 citations
- von Mering et al., NAR: 215 citations
- von Mering et al., NAR: 183 citations
- von Mering et al., NAR: 189 citations
- Jensen et al., NAR: 47 citations
total: 714 citations

Cross-links
- SMART: protein domain information
- GENECARDS: info and products on human genes
- SWISS-MODEL-REPOSITORY: homology models
- CYTOSCAPE: access via plug-in architecture
- SWISSPROT / UNIPROT: expert protein annotation

Cross-link example launch SwissModel

Reciprocal View popup: launch STRING

Example #1: A missing chaperone for Cytochrome C oxidase
Question: who inserts the copper atom into CcO?

Example #1: The missing chaperone for Cytochrome C oxidase
Initial observation:

Example #1: The missing chaperone for Cytochrome C oxidase
- gene expressed
- structure solved
- it binds copper!
- likely function: copper delivery

Example #2 Simplify discovery in genome-wide association screens ? Christian von Mering – UZH MolBio – SIB

In-House Use of STRING
a) download data in relational database schema
b) download data as compact flat-files
c) in-house installation of webserver
d) cross-link to server (version controlled, to network, protein, link, ...)
e) PSI-MI export
f) [ SOAP / webservices ]
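As an illustration of option b), a flat-file download can be filtered with a few lines of Python. The file layout used here (one link per line: two protein identifiers and a score between 0 and 1, whitespace-separated, with optional `#` comment lines) is an assumption made for this sketch and is not taken from the slides; the function name `read_links` and the sample identifiers are likewise hypothetical.

```python
import io


def read_links(handle, min_score=0.0):
    """Read 'proteinA proteinB score' lines (assumed layout) and keep
    links whose score is at least min_score."""
    links = []
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        protein_a, protein_b, score = line.split()
        score = float(score)
        if score >= min_score:
            links.append((protein_a, protein_b, score))
    return links


# Made-up sample data in the assumed layout.
SAMPLE = """# proteinA proteinB score
P1 P2 0.95
P1 P3 0.40
P2 P3 0.72
"""

if __name__ == "__main__":
    for link in read_links(io.StringIO(SAMPLE), min_score=0.7):
        print(link)
```

Passing an open file handle instead of the `io.StringIO` wrapper would let the same function stream a large download line by line without loading it fully into memory.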

Core organisms:
- include all model organisms (annotated knowledge)
- non-redundant, each genus is covered
- include organisms with functional genomics data
Irrelevant Organisms [future category]
Version 9.0 – exceeding 1000 genomes

More details & new features

“Payload Display” - Your Own STRING Server
=> “branding” STRING
- via remote-control:
- a call-back API

Acknowledgements
The STRING team: Samuel Chaffron, Manuel Weiss, Michael Kuhn, Lars Juhl Jensen, Sean Hooper, Berend Snel, Martijn Huynen, Peer Bork
The STRING institutions: SIB – Swiss Institute of Bioinformatics; University of Zurich; TU-Dresden; University of Copenhagen; European Molecular Biology Laboratory

“MySTRING”
- users can register / login
- using OpenID or similar for authentication
- persistence of search results (“history”)
- store lists / items of interest (“bag of genes”)
- users can customize the interface
- generate revenue (?)

Feature #2 (Finding Relevant Texts)

Example #2: The missing enzymes for uric acid degradation
Question: why can’t humans degrade uric acid?

Example #2 The missing enzymes for uric acid degradation ? ?

Example #2: The missing enzymes for uric acid degradation
Initial observation:

Example #2: The missing enzymes for uric acid degradation
- genes cloned, expressed
- enzymatic activity demonstrated
- candidate short-term therapeutics!