Center for Biologisk Sekvensanalyse Nikolaj Blom Center for Biological Sequence Analysis BioCentrum-DTU Technical University of Denmark

"Resources of Biomolecular Data: Sequences, Structures and Functionality"
PhD course #27803

Outline
- Magnitudes and Scales
- Resources: Data Sources & Tools
- Primary DNA sources
- Sequence Repositories
- Structure Repositories
- Functional Categorization
- Integration of Databases
- The Human Genome
- Genome Browsers
- Prediction Tools
- Evaluation of Prediction Servers
- Starting points
- Link collections

Learning Objectives
The student should be able to:
- Describe differences between sequence repositories and curated databases
- Describe the challenges of maintaining genome-wide biological databases
- List two entry points for getting an overview of "my gene of interest"
- Describe how prediction servers may be evaluated

Resources: Sources & Tools
- There are A LOT OF biomolecular databases/sources
- A LOT OF overlap of information/redundancy
- A LOT OF TOOLS
- Personal picks/preferences
  - User-friendliness
  - Update intervals
  - Curation efforts / error correction
  - Linkage to other DBs

Faster than Moore's law...

Human Genome Published
- International Human Genome Sequencing Consortium (public project): Nature, 15 Feb 2001
- Celera: Science, 16 Feb 2001

Magnitudes and Scales
- Human genome: 3,200,000,000 bp
- Single base pair → full genome is 9 orders of magnitude
- Genome = football field: ~3 billion leaves of grass
- Single base A T G C (or SNP) = 1 leaf of grass
- Genome browsing: zooming from whole stadium to single leaf
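The scale comparison can be checked with a little arithmetic. This is a minimal sketch that takes the slide's 3,200,000,000 bp figure as given; the function name is illustrative only. The result is about 9.5, which the slide rounds to "9 orders of magnitude".

```python
import math

GENOME_BP = 3_200_000_000  # human genome size as quoted on the slide


def orders_of_magnitude(small: float, large: float) -> float:
    """Return log10 of the ratio between two scales."""
    return math.log10(large / small)


# Zooming from a single base pair to the whole genome.
span = orders_of_magnitude(1, GENOME_BP)
print(f"1 bp to full genome: {span:.1f} orders of magnitude")
```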

How we got the sequence
- Sanger chain termination method

Primary DNA sources
- Trace file repositories
- Single read: bp (~golf ball size / jigsaw puzzle)
- Variable quality
- WashU-Merck Human EST Project / Trace files
- "Base-calling" is non-trivial: G, C or nothing?

Assembly is Non-trivial!
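To give a feel for what assembly involves, here is a toy sketch that greedily merges reads by their longest suffix/prefix overlap. It is an illustration under strong assumptions (error-free reads, one strand, no repeats), not the method used for the human genome; the function names and the example reads are made up. Real data breaks every one of those assumptions, which is why assembly is hard.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(reads)
    while True:
        best_n, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Three made-up reads that tile a 12-base sequence.
toy_reads = ["ATGCGT", "CGTTAC", "TACGGA"]
toy_contigs = greedy_assemble(toy_reads)
print(toy_contigs)
```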

Sequence repositories - GenBank et al.
- GenBank / EMBL / DDBJ
- Highly redundant (many versions of same gene)
- Cross-updated daily
- Version history is recorded
  - Previous sequence records can be retrieved
- Contigs/HTGS ( kb) finishing at different stages
  - Draft → Finished
- Includes genomic DNA, cDNA, ESTs, translated peptides
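The version history point can be illustrated with identifiers written as ACCESSION.VERSION, the form GenBank uses for sequence versions. This is a minimal sketch: the identifiers below are made up, the helper names are hypothetical, and no database is contacted.

```python
from typing import NamedTuple


class VersionedId(NamedTuple):
    accession: str
    version: int


def parse_versioned_id(identifier: str) -> VersionedId:
    """Split 'ACCESSION.VERSION' into its two parts."""
    accession, sep, version = identifier.rpartition(".")
    if not sep or not accession or not version.isdigit():
        raise ValueError(f"not an accession.version identifier: {identifier!r}")
    return VersionedId(accession, int(version))


def latest_versions(identifiers: list[str]) -> dict[str, int]:
    """Keep only the highest version number seen for each accession."""
    latest: dict[str, int] = {}
    for identifier in identifiers:
        parsed = parse_versioned_id(identifier)
        latest[parsed.accession] = max(parsed.version, latest.get(parsed.accession, 0))
    return latest


# Made-up identifiers: two versions of one record and one of another.
example_ids = ["X00001.1", "X00001.3", "Y00002.2"]
print(latest_versions(example_ids))
```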

Non-redundant and Curated databases
- Non-redundant
- Manual or automatic curation
- DNA
  - RefSeq (NCBI; semi-automated)
  - Ensembl gene index (automated)
- Protein
  - RefSeq (NCBI; semi-automated)
  - TrEMBL (EMBL; automated)
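The simplest form of redundancy removal is collapsing records that carry exactly the same sequence. The sketch below shows only that step, with made-up record IDs and a hypothetical function name; curated resources go much further (merging near-identical entries, reviewing annotation), which this toy does not attempt.

```python
from collections import defaultdict


def collapse_redundant(records: dict[str, str]) -> dict[str, list[str]]:
    """Group record IDs by identical sequence (case-insensitive)."""
    groups: dict[str, list[str]] = defaultdict(list)
    for record_id, sequence in records.items():
        groups[sequence.upper()].append(record_id)
    return dict(groups)


# Made-up records: id1 and id2 are the same sequence submitted twice.
example_records = {"id1": "ATGC", "id2": "atgc", "id3": "GGCC"}
non_redundant = collapse_redundant(example_records)
print(non_redundant)
```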

Curated database: UniProt/SwissProt
- SIB - Swiss Institute of Bioinformatics
- Protein Knowledgebase / Sequence Database
- Highly curated
  - Experimental evidence evaluated (e.g. modifications)
  - All 80,000 entries checked by Amos Bairoch himself ;-)
- ExPASy - Expert Protein Analysis System
  - Proteomics tools: links + local servers

Structure databases / Protein Data Bank (PDB)
- X-ray, NMR biomolecular structures
- Protein Data Bank (PDB)

Functional Categorization
- Gene Ontology (GO)
  - Hierarchical
  - Controlled vocabulary

Functional Categorization
- Gene Ontology (GO)
  - Molecular Function: the tasks performed by individual gene products; examples are transcription factor and DNA helicase
  - Biological Process: broad biological goals, such as mitosis or purine metabolism, that are accomplished by ordered assemblies of molecular functions
  - Cellular Component: subcellular structures, locations, and macromolecular complexes; examples include nucleus, telomere, and origin recognition complex
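A hierarchical controlled vocabulary means that annotating a gene product with a specific term implies all of that term's more general ancestors. The sketch below walks a tiny hand-written hierarchy; the term names and parent links are illustrative assumptions, not copied from the actual GO files, and a term is allowed more than one parent.

```python
# Toy hierarchy: each term maps to its parent terms (illustrative only).
PARENTS: dict[str, list[str]] = {
    "molecular function": [],
    "catalytic activity": ["molecular function"],
    "DNA binding": ["molecular function"],
    "helicase": ["catalytic activity"],
    "DNA helicase": ["helicase", "DNA binding"],
}


def ancestors(term: str, parents: dict[str, list[str]] = PARENTS) -> set[str]:
    """Return every term reachable by following parent links upward."""
    seen: set[str] = set()
    stack = list(parents[term])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


# Annotating with the specific term implies all of these broader terms.
print(sorted(ancestors("DNA helicase")))
```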

Integration of databases - Webs of websites
- Links, links, links...
- SRS = Sequence Retrieval System
  - Powerful, complex query language
- BioDAS – Distributed Annotation System

For 'my gene', how do I:
- Get an overview of the sequence information known? (GeneCards+OMIM)
- Examine the 'Genome Neighbourhood'? (Genome Browsers)
- Predict protein post-translational modifications (PTMs)? (Prediction servers)
- (Evaluate the value of predicted features)

GeneCards

GeneCards-II

GeneCards-III

GeneCards-IV

GeneCards-V

Genetic/Medical Information
- OMIM, Online Mendelian Inheritance in Man (NCBI)
- The OMIM database is a catalog of human genes and genetic disorders
- >16,000 entries (April, 2006)
- Examples: cystic fibrosis, prions, amyloid precursor protein
- Condensed, highly curated descriptions of genetics/disease/animal models/references

OMIM-I

OMIM-II

OMIM-III

For 'my gene', how do I:
- Get an overview of the sequence information known? (GeneCards+OMIM)
- Examine the 'Genome Neighbourhood'? (Genome Browsers)
- Predict protein post-translational modifications (PTMs)? (Prediction servers)
- (Evaluate the value of predicted features)

Genome Browsing
- Three public (open access; use same genome build/assembly)
  - NCBI (U.S.)
  - UCSC (Santa Cruz, U.S.)
  - EnsEmbl (EBI, EU)
- (One private) (restricted, commercial; closed 2005)

Celera Discovery System & Database

Genome Browsers - Portals to the Genomic World
- UCSC – Univ. California – Santa Cruz (U.S.)
- NCBI – National Center for Biotechnology Information (U.S.)
- EnsEmbl – European Molecular Biology Laboratory (E.U.)

UCSC – Genome Browser

UCSC – Genome Browser II

NCBI

EnsEmbl – Genome Browser

For 'my gene', how do I:
- Get an overview of the sequence information known? (GeneCards)
- Examine the 'Genome Neighbourhood'? (Genome Browsers)
- Predict protein post-translational modifications (PTMs) or Gene Structure? (Prediction servers)
- ...and evaluate the reliability of prediction methods

CBS Services/Toolbox

NetPhos – a prediction server

Evaluating Prediction Servers
- Performance on independent/cross-validated data presented?
- Published in peer-reviewed journal?
- Cited by others?
  - Science Citation Index
- Linked to from credible web sites?
  - Google Page-rank
  - "link:URL" search
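Performance on independent data is commonly summarised with a few standard measures. The sketch below computes sensitivity, specificity and the Matthews correlation coefficient from true and predicted labels; the labels are made up for illustration (1 = modified site, 0 = not modified) and do not come from any real prediction server.

```python
import math


def confusion_counts(y_true: list[int], y_pred: list[int]) -> tuple[int, int, int, int]:
    """Return (tp, tn, fp, fn) for binary labels given as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true sites that were predicted."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of non-sites that were correctly rejected."""
    return tn / (tn + fp)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Made-up held-out labels and predictions.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(sensitivity(tp, fn), specificity(tn, fp), mcc(tp, tn, fp, fn))
```

A server that reports such figures only on its own training data gives little indication of how it will behave on a new protein.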

2can Bioinformatics Education
- At EBI – European Bioinformatics Institute
- can/index.html
- Tutorials, resource links, etc.

EnsEMBL Bioinformatics Education

Starting Points
- General Bioinformatics
  - NCBI, National Center for Biotechnology Information, U.S.
  - EBI, European Bioinformatics Institute
- Prediction Tools
  - CBS, DK
  - Expasy (Protein analysis), Switzerland

Dynamic Resources
- Pros
  - Includes most recent developments
  - Updated regularly
  - User interface improves (usually)
- Cons
  - Difficult to keep pace
  - Tutorials and lectures hard to recycle ;-(
  - Difficult to use at irregular intervals

Genome Browsers - Portals to the Genomic World
- Three main entry points: NCBI, UCSC, EnsEmbl
  - Essentially contain same information
  - High degree of linking to secondary databases
- Advisable to become familiar with only one genome browser
  - Learn to navigate and make queries
- GeneCards and OMIM well suited for getting a quick overview of a gene of interest

Prediction Servers
- Evaluate scientific 'soundness'
- Look for indications of quality (citations, etc.)
- Remember that prediction servers provide...well, predictions!

Learning Objectives
The student should be able to:
- Describe differences between sequence repositories and curated databases
- Describe the challenges of maintaining genome-wide biological databases
- List two entry points for getting an overview of "my gene of interest"
- Describe how prediction servers may be evaluated

Immediate Feedback
Title: "Resources of Biomolecular Data: Sequences, Structures and Functionality"
- Did the lecture live up to your expectations?
- Did you expect to learn about resources that were not covered during this lecture?
- NB! You can also provide input at the general course evaluation

The End
25,000?