The STRING database. Michael Kuhn, EMBL Heidelberg.

protein interactions

example: tryptophan synthase beta chain, E. coli K12

many sources

genomic context

curated knowledge

experimental evidence

literature

Jensen et al., Drug Discovery Today: Targets, 2004

373 genomes (only completely sequenced genomes)

1.5 million genes (not proteins)

Genome Reviews

RefSeq

Ensembl

model organism databases

data integration

genomic context methods

gene fusion

gene neighborhood

phylogenetic profiles

[figure labels: Cell, Cellulosomes, Cellulose]

automatic inference of interactions

correct interactions

wrong associations

gene fusion score: sequence similarity

gene neighborhood score: sum of intergenic distances

phylogenetic profiles

SVD singular value decomposition (removes redundancy)

score: Euclidean distance
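A minimal sketch of the distance step, assuming each gene is described by a presence/absence vector over genomes. The SVD step from the previous slide is left out here, and the gene names and profiles are made-up toy data, not STRING's actual input.

```python
import math

# Toy presence/absence profiles: one entry per genome (1 = gene present).
# Gene names and values are hypothetical.
profiles = {
    "geneA": [1, 1, 0, 0, 1, 1],
    "geneB": [1, 1, 0, 0, 1, 1],  # same pattern as geneA
    "geneC": [0, 0, 1, 1, 0, 0],  # opposite pattern
}

def profile_distance(p, q):
    """Euclidean distance between two phylogenetic profiles."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same genomes")
    return math.dist(p, q)

d_ab = profile_distance(profiles["geneA"], profiles["geneB"])
d_ac = profile_distance(profiles["geneA"], profiles["geneC"])
print(d_ab, d_ac)
```

A small distance means the two genes tend to be present and absent in the same genomes, which is the raw evidence for an association.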

all scores are “raw scores”

not comparable: sequence similarity, sum of intergenic distances, Euclidean distance

benchmarking: calibrate against “gold standard” (KEGG)

raw scores

probabilistic scores, e.g. “70% chance for an association”
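One way to picture the calibration is a simple binning scheme: group pairs by raw score and, per bin, measure the fraction that the gold standard confirms. This is only an illustrative sketch; the function name, bin edges and labelled pairs are assumptions, and the slides do not say how STRING actually fits the calibration.

```python
from bisect import bisect_right

def calibrate(labelled_scores, bin_edges):
    """Per-bin fraction of pairs confirmed by the gold standard.

    labelled_scores: iterable of (raw_score, is_true) tuples, where is_true
    says whether the pair shares a gold-standard (e.g. KEGG) pathway.
    bin_edges: ascending list of bin boundaries.
    Returns one fraction per bin, or None for an empty bin.
    """
    n_bins = len(bin_edges) - 1
    hits = [0] * n_bins
    totals = [0] * n_bins
    for raw, is_true in labelled_scores:
        i = bisect_right(bin_edges, raw) - 1
        i = min(max(i, 0), n_bins - 1)  # clamp scores outside the edges
        totals[i] += 1
        hits[i] += int(is_true)
    return [hits[i] / totals[i] if totals[i] else None for i in range(n_bins)]

# Hypothetical raw scores with gold-standard labels.
labelled = [(0.1, False), (0.2, False), (0.3, True),
            (0.6, True), (0.7, False), (0.8, True), (0.9, True)]
curve = calibrate(labelled, [0.0, 0.5, 1.0])
print(curve)
```

Each bin's fraction can then be read as the probability attached to a raw score in that range, which puts scores from different methods on a common scale.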

curated knowledge

KEGG Kyoto Encyclopedia of Genes and Genomes

Reactome

MIPS Munich Information center for Protein Sequences

STKE Signal Transduction Knowledge Environment

GO Gene Ontology

primary experimental data

many sources

many parsers

physical protein interactions

BIND Biomolecular Interaction Network Database

GRID General Repository for Interaction Datasets

MINT Molecular Interactions Database

DIP Database of Interacting Proteins

HPRD Human Protein Reference Database

large sets are scored separately

co-expression microarray data

GEO Gene Expression Omnibus

correlation coefficient
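A short sketch of scoring co-expression, assuming the Pearson correlation coefficient (the slide says only "correlation coefficient"). The expression values are invented for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equally long expression vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("constant vector has no defined correlation")
    return cov / (norm_x * norm_y)

# Hypothetical expression levels of three genes across four arrays.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [2.0, 4.0, 6.0, 8.0]   # rises with gene_a
gene_c = [8.0, 6.0, 4.0, 2.0]   # falls as gene_a rises

r_ab = pearson(gene_a, gene_b)
r_ac = pearson(gene_a, gene_c)
print(r_ab, r_ac)
```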

literature mining

different gene identifiers

synonyms list

Medline

SGD Saccharomyces Genome Database

The Interactive Fly

OMIM Online Mendelian Inheritance in Man

simple scheme

co-mentioning
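Co-mentioning can be sketched as counting how many abstracts name both genes, after mapping synonyms to one identifier. The synonym table and the second abstract below are toy assumptions, and the tokenizer is deliberately naive.

```python
import re
from collections import Counter
from itertools import combinations

def co_mention_counts(abstracts, synonyms):
    """Count, per gene pair, the abstracts that mention both genes.

    synonyms maps a canonical gene identifier to its list of aliases.
    """
    lookup = {alias.lower(): gene
              for gene, aliases in synonyms.items()
              for alias in aliases}
    counts = Counter()
    for text in abstracts:
        tokens = re.findall(r"[A-Za-z0-9]+", text.lower())
        genes = {lookup[t] for t in tokens if t in lookup}
        for pair in combinations(sorted(genes), 2):
            counts[pair] += 1
    return counts

# Illustrative synonym list (treat the aliases as hypothetical).
synonyms = {"CYC1": ["CYC1"], "CYC7": ["CYC7"], "HAP1": ["HAP1", "CYP1"]}
abstracts = [
    "The expression of the cytochrome genes CYC1 and CYC7 is controlled by HAP1",
    "CYP1 regulates CYC1 transcription",  # toy sentence
]
counts = co_mention_counts(abstracts, synonyms)
print(counts)
```

Raw counts like these would still need calibration against a gold standard before they can be compared with other evidence types.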

more advanced

NLP Natural Language Processing

Gene and protein names; cue words for entity recognition; verbs for relation extraction

Sentence: The expression of the cytochrome genes CYC1 and CYC7 is controlled by HAP1

Annotated: [nxexpr The expression of [nxgene the cytochrome genes [nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]

Annotated: [nxgene The GAL4 gene]

calibrate against gold standard

combine all evidence

Bayesian scoring scheme

e.g.: two scores of 0.7, combined probability: 1 - (1 - 0.7)^2 = 0.91
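The worked example generalizes to any number of evidence channels if they are treated as independent: the combined score is one minus the probability that every channel is wrong. The sketch below implements only that formula; the function name is an assumption, and any further corrections a production scoring scheme might apply are not shown.

```python
def combine_scores(scores):
    """Combine independent probabilistic scores: 1 - product of (1 - s)."""
    p_all_wrong = 1.0
    for s in scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError("scores must be probabilities in [0, 1]")
        p_all_wrong *= 1.0 - s
    return 1.0 - p_all_wrong

combined = combine_scores([0.7, 0.7])
print(round(combined, 2))
```

Adding evidence can only raise the combined score, and a single score passes through unchanged.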

evidence transfer

evidence spread over many species

transfer by orthology (or “fuzzy orthology”)

von Mering et al., Nucleic Acids Research, 2005

two modes

COG mode

higher coverage
lower specificity
includes all available evidence
some orthologous groups are too large to be meaningful

proteins mode

maximum specificity
lower coverage
information will be relevant for selected species

outlook

take home message
STRING integrates information and predicts interactions
You can always go to the sources
Proteins mode: specific species
COG mode: more coverage, especially for prokaryotic genes

Acknowledgements
The STRING team: Lars Jensen, Peer Bork, Christian von Mering & group in Zurich, Berend Snel, Martijn Huynen

Thank you for your attention

Exercises: tinyurl.com/36twzq (or via course wiki)
Alternative server: xi.embl.de

Bork et al., Current Opinion in Structural Biology, 2004
