Protein association networks with STRING


Protein association networks with STRING Lars Juhl Jensen

interaction networks

association networks

guilt by association

protein networks

STRING

9.6 million proteins

common foundation

Exercise 1: Go to http://string-db.org/. Query for the human insulin receptor (INSR) using the search-by-name functionality. Make sure you are in evidence view (check the buttons below the network). Why are there multiple lines connecting the same two proteins?

curated knowledge

(what we know)

protein complexes

3D structures

pathways

metabolic pathways

Letunic & Bork, Trends in Biochemical Sciences, 2008

signaling pathways

very incomplete

experimental data

(what we measured)

physical interactions

Jensen & Bork, Science, 2008

genetic interactions

Beyer et al., Nature Reviews Genetics, 2007

gene coexpression

microarrays

RNAseq

Exercise 2 (continue from where exercise 1 ended): Which types of evidence support the interaction between INSR and IRS1? Click on the interaction to view the popup, which has buttons linking to full details. Which types of experimental assays support the INSR–IRS1 interaction?

predictions

(what we infer)

genomic context

evolution

gene fusion

Korbel et al., Nature Biotechnology, 2004

gene neighborhood

Korbel et al., Nature Biotechnology, 2004

phylogenetic profiles

Korbel et al., Nature Biotechnology, 2004
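A minimal sketch of the phylogenetic-profile idea: genes that are present and absent in the same genomes are candidates for a functional association. The gene names, the eight-genome profiles, and the choice of Jaccard similarity are assumptions made for illustration; the slides do not say which similarity measure STRING uses.

```python
# Toy presence/absence profiles: one entry per genome (1 = gene present).
# Gene names and profiles are invented for illustration.
profiles = {
    "geneA": [1, 1, 0, 0, 1, 0, 1, 1],
    "geneB": [1, 1, 0, 0, 1, 0, 1, 0],
    "geneC": [0, 0, 1, 1, 0, 1, 0, 0],
}


def jaccard(p, q):
    """Jaccard similarity of two presence/absence profiles."""
    both = sum(1 for a, b in zip(p, q) if a and b)
    either = sum(1 for a, b in zip(p, q) if a or b)
    return both / either if either else 0.0


def rank_pairs(profiles):
    """Return (similarity, gene1, gene2) tuples, most similar pair first."""
    names = sorted(profiles)
    pairs = [
        (jaccard(profiles[a], profiles[b]), a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
    ]
    return sorted(pairs, reverse=True)


if __name__ == "__main__":
    for score, a, b in rank_pairs(profiles):
        print(f"{a} - {b}: {score:.2f}")
```

In this toy data geneA and geneB co-occur in four of the five genomes where either is present, so they rank first; geneC never co-occurs with either.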

a real example

Cell Cellulosomes Cellulose

complications

many databases

different formats

different identifiers

variable quality

not comparable

not same species

hard work

parsers

mapping files

quality scores

affinity purification

von Mering et al., Nucleic Acids Research, 2005

phylogenetic profiles

score calibration

gold standard

von Mering et al., Nucleic Acids Research, 2005
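Score calibration against a gold standard can be sketched as follows: bin the raw scores from one evidence source and, for each bin, measure the fraction of scored pairs that the gold standard confirms. The pairs, raw scores, and bin edges below are invented, and plain binning is a simplification; a real pipeline would more likely fit a smooth curve through such points.

```python
def calibrate(scored_pairs, gold_positive, bin_edges):
    """Per raw-score bin, return the fraction of pairs found in the gold standard.

    scored_pairs:  list of (pair, raw_score)
    gold_positive: set of pairs known to be functionally associated
    bin_edges:     ascending edges; bin i covers [edges[i], edges[i + 1])
    Bins without any pairs yield None.
    """
    n_bins = len(bin_edges) - 1
    hits = [0] * n_bins
    totals = [0] * n_bins
    for pair, raw in scored_pairs:
        for i in range(n_bins):
            if bin_edges[i] <= raw < bin_edges[i + 1]:
                totals[i] += 1
                hits[i] += pair in gold_positive
                break
    return [h / t if t else None for h, t in zip(hits, totals)]


# Invented example data: raw scores from one hypothetical evidence source.
scored = [
    (("A", "B"), 0.2), (("A", "C"), 0.3),
    (("B", "C"), 1.5), (("A", "D"), 1.7),
    (("B", "D"), 2.5), (("C", "D"), 2.8),
]
gold = {("B", "C"), ("B", "D"), ("C", "D")}

if __name__ == "__main__":
    print(calibrate(scored, gold, [0, 1, 2, 3]))
```

The resulting per-bin fractions put every evidence source on the same probability-like scale, regardless of what the raw score originally measured.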

implicit weighting by quality

common scale
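Once every evidence channel is expressed on a common probabilistic scale, the channels can be merged into one score. The slides do not give the formula; the sketch below assumes the channels are independent and combines them as 1 minus the product of (1 minus each score). It omits any correction for the prior probability of a random pair being associated, so treat it as an illustration rather than STRING's exact procedure.

```python
from functools import reduce


def combine(scores):
    """Combine independent channel probabilities: 1 - prod(1 - s)."""
    return 1.0 - reduce(lambda acc, s: acc * (1.0 - s), scores, 1.0)


if __name__ == "__main__":
    # Two moderately confident channels support each other.
    print(combine([0.5, 0.5]))
    # A single channel is returned essentially unchanged.
    print(combine([0.9]))
```

Under this scheme a low-quality channel with scores near zero barely moves the result, which is one way to read "implicit weighting by quality".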

homology-based transfer

orthologous groups

Franceschini et al., Nucleic Acids Research, 2013

missing most of the data

Exercise 3 (continue from where exercise 2 ended): Change the network to the confidence view. Change the confidence cutoff to 0.9; are there any changes in the proteins or interactions shown? Turn off all but experiments; what changes? Increase the number of interactors shown to 50; how many proteins do you get? Why?
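What the exercise does in the web interface can be mimicked offline. The sketch below keeps an edge only if the score combined over the active evidence channels reaches the cutoff. All scores are invented, PROT_X is a placeholder name, and the independence-based combination is an assumption rather than something stated on the slides.

```python
import math

# Invented per-channel scores; PROT_X is a hypothetical placeholder protein.
edges = [
    {"a": "INSR", "b": "IRS1", "experiments": 0.95, "textmining": 0.80},
    {"a": "INSR", "b": "INS", "experiments": 0.60, "textmining": 0.90},
    {"a": "INSR", "b": "PROT_X", "textmining": 0.50},
]


def combined(edge, channels):
    """Combine the selected channels, treating a missing channel as 0."""
    return 1.0 - math.prod(1.0 - edge.get(ch, 0.0) for ch in channels)


def visible_edges(edges, channels, cutoff):
    """Return the (a, b) pairs whose combined score reaches the cutoff."""
    return [(e["a"], e["b"]) for e in edges if combined(e, channels) >= cutoff]


if __name__ == "__main__":
    print(visible_edges(edges, ["experiments", "textmining"], 0.9))
    print(visible_edges(edges, ["experiments"], 0.9))
```

With both channels active two edges pass the 0.9 cutoff; with experiments alone only the INSR and IRS1 edge remains, because the other edge relied on text mining to get over the threshold.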

text mining

>10 km

too much to read

exponential growth

~40 seconds per paper

named entity recognition

comprehensive lexicon

cyclin dependent kinase 1

CDC2

orthographic variation

prefixes and suffixes

CDC2

hCdc2

“black list”

SDS
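The ingredients above (a lexicon, tolerance for orthographic variation, species prefixes, and a black list) fit into a small dictionary-based matcher. This is a sketch only: the three-entry lexicon, the single "h" prefix, and the normalization rule are assumptions chosen to match the examples on the slides, not the tagger actually used.

```python
import re

# Tiny illustrative lexicon: name or synonym -> canonical identifier.
LEXICON = {
    "CDC2": "CDK1",
    "cyclin dependent kinase 1": "CDK1",
    "SDS": "SDS",
}
# Names that are usually something else in text (SDS is also a detergent).
BLACKLIST = {"sds"}
# Species prefixes to strip, e.g. "h" for human as in "hCdc2".
PREFIXES = ("h",)


def normalize(token):
    """Lowercase and drop spaces and hyphens so 'Cdc-2' matches 'CDC2'."""
    return re.sub(r"[\s\-]", "", token.lower())


INDEX = {normalize(name): ident for name, ident in LEXICON.items()}


def lookup(token):
    """Return the canonical identifier for a token, or None if no match."""
    key = normalize(token)
    if key in BLACKLIST:
        return None
    if key in INDEX:
        return INDEX[key]
    for prefix in PREFIXES:
        stem = key[len(prefix):]
        if key.startswith(prefix) and stem in INDEX and stem not in BLACKLIST:
            return INDEX[stem]
    return None


if __name__ == "__main__":
    for token in ["CDC2", "hCdc2", "Cdc-2", "cyclin-dependent kinase 1", "SDS"]:
        print(token, "->", lookup(token))
```

Normalizing both the lexicon and the text means variants do not have to be enumerated one by one, while the black list removes names that are ambiguous too often to be useful.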

information extraction

co-mentioning

counting

within documents

within paragraphs

within sentences

scoring scheme
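Counting co-mentions at the three levels can be sketched as a weighted sum. The nested input format (documents as lists of paragraphs, paragraphs as lists of sentences, sentences as sets of already-recognized names) and the weights are hypothetical; the slides say only that co-mentions are counted within documents, paragraphs, and sentences and then scored.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical weights: closer co-mentions count for more.
WEIGHTS = {"document": 1.0, "paragraph": 2.0, "sentence": 3.0}


def comention_scores(documents):
    """Sum weighted co-mention counts for every pair of entities."""
    scores = defaultdict(float)
    for doc in documents:
        doc_entities = set()
        for paragraph in doc:
            par_entities = set()
            for sentence in paragraph:
                for pair in combinations(sorted(sentence), 2):
                    scores[pair] += WEIGHTS["sentence"]
                par_entities |= sentence
            for pair in combinations(sorted(par_entities), 2):
                scores[pair] += WEIGHTS["paragraph"]
            doc_entities |= par_entities
        for pair in combinations(sorted(doc_entities), 2):
            scores[pair] += WEIGHTS["document"]
    return dict(scores)


# One toy document with two paragraphs.
docs = [
    [
        [{"INSR", "IRS1"}, {"INSR"}],  # paragraph 1: two sentences
        [{"INS"}],                     # paragraph 2: one sentence
    ],
]

if __name__ == "__main__":
    for pair, score in sorted(comention_scores(docs).items()):
        print(pair, score)
```

Here INSR and IRS1 share a sentence, a paragraph, and the document, so they collect all three weights, while INS is linked to both only at the document level. Raw sums like these would still need calibration before they are comparable with other evidence.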

score calibration

Natural Language Processing (NLP)

part-of-speech tagging

what you learned in school (pronoun, pronoun, verb, preposition, noun)

semantic tagging

grammatical analysis

Legend: gene and protein names; cue words for entity recognition; verbs for relation extraction
[nxexpr The expression of [nxgene the cytochrome genes [nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]
Saric et al., Proceedings of ACL, 2004

type and direction

complex sentences

anaphoric references

it

related resources

association networks

heterogeneous data

common identifiers

quality scores

protein networks

string-db.org Szklarczyk et al., Nucleic Acids Research, 2015

STITCH

chemical networks

stitch-db.org Kuhn et al., Nucleic Acids Research, 2014

COMPARTMENTS

subcellular localization

compartments.jensenlab.org Binder et al., Database, 2014

TISSUES

tissue expression

tissues.jensenlab.org Santos et al., PeerJ, 2015

DISEASES

disease associations

diseases.jensenlab.org Frankild et al., Methods, 2015

Exercise 4: Open http://tissues.jensenlab.org and look up tissue associations for insulin (INS). Open http://diseases.jensenlab.org and search for insulin receptor (INSR). What is the most strongly associated disease? Inspect the underlying text-mining evidence.