Center for Biologisk Sekvensanalyse. Nikolaj Blom, Center for Biological Sequence Analysis, BioCentrum-DTU, Technical University of Denmark

1 Nikolaj Blom, Center for Biological Sequence Analysis, BioCentrum-DTU, Technical University of Denmark (nikob@cbs.dtu.dk). "Resources of Biomolecular Data: Sequences, Structures and Functionality", PhD course #27803

2 Outline: Magnitudes and Scales; Resources: Data Sources & Tools; Primary DNA sources; Sequence Repositories; Structure Repositories; Functional Categorization; Integration of Databases; The Human Genome; Genome Browsers; Prediction Tools; Evaluation of Prediction Servers; Starting points; Link collections

3 Learning Objectives. The student should be able to: describe the differences between sequence repositories and curated databases; describe the challenges of maintaining genome-wide biological databases; list two entry points for getting an overview of "my gene of interest"; describe how prediction servers may be evaluated.

4 Resources: Sources & Tools. There are a lot of biomolecular databases/sources, a lot of overlap of information (redundancy), and a lot of tools. Which ones to use comes down to personal picks/preferences: user-friendliness, update intervals, curation efforts / error correction, and linkage to other DBs.

5 Faster than Moore’s law...

6 Faster than Moore’s law...

7 Human Genome Published. HUGO: Nature, 15 Feb 2001; Celera: Science, 16 Feb 2001

8 Magnitudes and Scales. Human genome: 3,200,000,000 bp. Going from a single base pair to the full genome spans about 9 orders of magnitude. If the genome were a football field, it would hold ~3 billion blades of grass, and a single base (A, T, G, C, or a SNP) would be one blade of grass. Genome browsing means zooming from the whole stadium down to a single blade.
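
The "9 orders of magnitude" figure can be checked with a base-10 logarithm. A minimal sketch, using only the genome size quoted on the slide:

```python
import math

HUMAN_GENOME_BP = 3_200_000_000  # genome size quoted on the slide
SINGLE_BASE_BP = 1

def orders_of_magnitude(large: float, small: float) -> float:
    """Number of factors of ten separating two sizes."""
    return math.log10(large / small)

span = orders_of_magnitude(HUMAN_GENOME_BP, SINGLE_BASE_BP)
print(f"{span:.2f}")  # 9.51
```

The whole part of the result is 9; the fractional 0.5 reflects the factor 3.2 on top of 10^9.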

9 How we got the sequence: the Sanger chain-termination method
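
The principle of chain termination can be illustrated with a toy simulation: the polymerase copies the template, a dideoxynucleotide (ddNTP) occasionally stops the chain, and sorting the terminated fragments by length reveals the sequence one base at a time. This is an idealised sketch, assuming every position is terminated in some copy of the reaction and that the template is written 3'->5'; the function names are hypothetical.

```python
import random

# Base pairing used by the polymerase when it copies the template.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def synthesize(template_3_to_5: str) -> str:
    """New strand (5'->3') built on a template written 3'->5'."""
    return template_3_to_5.translate(COMPLEMENT)

def terminated_fragments(new_strand: str) -> list[tuple[int, str]]:
    """(length, terminating ddNTP base) for every chain-terminated fragment."""
    return [(i + 1, base) for i, base in enumerate(new_strand)]

def read_gel(fragments: list[tuple[int, str]]) -> str:
    """Order fragments by length, as electrophoresis does, and read the end bases."""
    return "".join(base for _, base in sorted(fragments))

fragments = terminated_fragments(synthesize("TACGGATC"))
random.shuffle(fragments)  # fragments come out of the reaction in no particular order
print(read_gel(fragments))  # ATGCCTAG
```

Real reads are noisy and limited in length, which is why base-calling and assembly (next slides) are not trivial.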

10 Primary DNA sources. Trace file repositories: a single read is 500-1000 bp (~golf ball size / a jigsaw puzzle piece) and of variable quality (example: WashU-Merck Human EST Project trace files). "Base-calling" is non-trivial: G, C or nothing?
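
Why base-calling is hard ("G, C or nothing?") can be shown with a naive caller that picks the strongest of the four dye channels at each peak and falls back to "N" when the signal is too weak or two channels compete. This is a simplified sketch, not how production base-callers work; the thresholds `min_signal` and `min_ratio` and the peak values below are hypothetical.

```python
def call_base(peak: dict[str, float], min_signal: float = 100.0, min_ratio: float = 2.0) -> str:
    """Naive base call from the four channel intensities at one peak.

    min_signal and min_ratio are arbitrary illustrative thresholds.
    """
    ranked = sorted(peak.items(), key=lambda kv: kv[1], reverse=True)
    (best_base, best), (_, second) = ranked[0], ranked[1]
    if best < min_signal:
        return "N"  # too weak: "nothing?"
    if second > 0 and best / second < min_ratio:
        return "N"  # two channels of similar height: "G or C?"
    return best_base

def call_trace(peaks: list[dict[str, float]]) -> str:
    return "".join(call_base(p) for p in peaks)

peaks = [
    {"A": 900, "C": 50, "G": 40, "T": 30},   # clear A
    {"A": 20, "C": 400, "G": 380, "T": 10},  # C and G compete
    {"A": 5, "C": 10, "G": 20, "T": 15},     # no real signal
    {"A": 10, "C": 20, "G": 600, "T": 30},   # clear G
]
print(call_trace(peaks))  # ANNG
```

Real base-callers such as Phred also attach a quality value to every call, which is how "variable quality" is recorded per base.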

11 Assembly is Non-trivial!
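
A toy greedy assembler shows the basic idea of putting reads back together: repeatedly merge the pair of reads with the longest suffix/prefix overlap. This is a teaching sketch under simplifying assumptions (error-free reads, one strand, no repeats); real assemblers use far more elaborate methods precisely because repeats and sequencing errors make greedy merging fail.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a matching a prefix of b (>= min_len), else 0."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1

def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge the best-overlapping pair until no overlaps remain; return the contigs."""
    reads = list(reads)
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return reads
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)] + [merged]

print(greedy_assemble(["ATGGCGT", "GCGTACG", "TACGTTA"]))  # ['ATGGCGTACGTTA']
```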

12 Sequence repositories: GenBank et al. GenBank / EMBL / DDBJ: highly redundant (many versions of the same gene); cross-updated daily; version history is recorded, so previous sequence records can be retrieved; contigs/HTGS (100-200 kb) at different stages of finishing, from draft to finished; includes genomic DNA, cDNA, ESTs and translated peptides.
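
Version history in GenBank/EMBL/DDBJ is visible in the identifier itself: records are cited as ACCESSION.VERSION, and the version number increases when the sequence changes. A minimal sketch that splits such identifiers and keeps the newest version seen for each accession; the identifiers in the example are hypothetical.

```python
from typing import NamedTuple

class VersionedAccession(NamedTuple):
    accession: str
    version: int

def parse_accession_version(identifier: str) -> VersionedAccession:
    """Split 'ACCESSION.VERSION' at the last dot."""
    accession, sep, version = identifier.rpartition(".")
    if not sep or not accession or not version.isdigit():
        raise ValueError(f"expected ACCESSION.VERSION, got {identifier!r}")
    return VersionedAccession(accession, int(version))

def latest_versions(identifiers: list[str]) -> dict[str, int]:
    """Highest version seen for each accession."""
    latest: dict[str, int] = {}
    for ident in identifiers:
        acc, ver = parse_accession_version(ident)
        latest[acc] = max(ver, latest.get(acc, 0))
    return latest

# Hypothetical identifiers, for illustration only.
print(latest_versions(["AB000001.1", "AB000001.3", "AB000001.2", "CD000002.1"]))
```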

13 Non-redundant and Curated databases. Non-redundant, with manual or automatic curation. DNA: RefSeq (NCBI; semi-automated), Ensembl gene index (automated). Protein: RefSeq (NCBI; semi-automated), TrEMBL (EMBL; automated).
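
The simplest form of "non-redundant" is collapsing records whose sequences are identical into a single entry that remembers all the original identifiers. A minimal sketch of that idea only; curated resources such as RefSeq go well beyond exact-identity merging. The record names are hypothetical.

```python
def make_nonredundant(records: list[tuple[str, str]]) -> list[tuple[list[str], str]]:
    """Group (identifier, sequence) records by identical (case-insensitive) sequence.

    Returns one (identifiers, sequence) entry per distinct sequence, in first-seen order.
    """
    groups: dict[str, list[str]] = {}
    for ident, seq in records:
        groups.setdefault(seq.upper(), []).append(ident)
    return [(ids, seq) for seq, ids in groups.items()]

records = [("seq1", "ATGC"), ("seq2", "atgc"), ("seq3", "GGCC")]
print(make_nonredundant(records))  # [(['seq1', 'seq2'], 'ATGC'), (['seq3'], 'GGCC')]
```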

14 Curated database: UniProt/SwissProt. SIB (Swiss Institute of Bioinformatics) Protein Knowledgebase / Sequence Database. Highly curated; experimental evidence is evaluated (e.g. modifications). All 80,000 entries checked by Amos Bairoch himself ;-) ExPASy (Expert Protein Analysis System): proteomics tools, links + local servers.

15 Structure databases / Protein Data Bank (PDB). X-ray and NMR biomolecular structures. Protein Data Bank (PDB): http://www.rcsb.org/pdb/
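
Structures in the classic PDB file format store one atom per ATOM/HETATM line in fixed character columns, so coordinates can be read by slicing. A minimal sketch covering only a few fields of the legacy format (the archive today also uses PDBx/mmCIF, which this does not handle); the `Atom` class and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Atom:
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float

def parse_atom_line(line: str) -> Atom:
    """Parse one ATOM/HETATM record by the fixed columns of the legacy PDB format."""
    return Atom(
        serial=int(line[6:11]),       # columns 7-11
        name=line[12:16].strip(),     # columns 13-16
        res_name=line[17:20].strip(), # columns 18-20
        chain=line[21],               # column 22
        res_seq=int(line[22:26]),     # columns 23-26
        x=float(line[30:38]),         # columns 31-38
        y=float(line[38:46]),         # columns 39-46
        z=float(line[46:54]),         # columns 47-54
    )

def read_atoms(lines: list[str]) -> list[Atom]:
    """Keep only coordinate records and parse them."""
    return [parse_atom_line(l) for l in lines if l.startswith(("ATOM  ", "HETATM"))]
```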

16 Structure databases / Protein Data Bank (PDB)

17 Functional Categorization: Gene Ontology (GO). A hierarchical, controlled vocabulary.

18 Functional Categorization: Gene Ontology (GO), http://www.geneontology.org/. Molecular Function: the tasks performed by individual gene products; examples are transcription factor and DNA helicase. Biological Process: broad biological goals, such as mitosis or purine metabolism, that are accomplished by ordered assemblies of molecular functions. Cellular Component: subcellular structures, locations, and macromolecular complexes; examples include nucleus, telomere, and origin recognition complex.
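
Because GO is hierarchical, a gene product annotated to a specific term is implicitly annotated to all of that term's more general ancestors, and a term may have more than one parent (the hierarchy is a graph, not a tree). A minimal sketch of the ancestor lookup; the ontology fragment below is a hypothetical, simplified toy and does not reproduce real GO term IDs or relations.

```python
# Hypothetical toy fragment of a GO-like hierarchy: child term -> parent terms.
TOY_ONTOLOGY: dict[str, list[str]] = {
    "molecular_function": [],
    "catalytic activity": ["molecular_function"],
    "binding": ["molecular_function"],
    "helicase activity": ["catalytic activity"],
    "DNA binding": ["binding"],
    "DNA helicase activity": ["helicase activity", "DNA binding"],
}

def ancestors(term: str, ontology: dict[str, list[str]]) -> set[str]:
    """All terms reachable by following parent links upwards from `term`."""
    seen: set[str] = set()
    stack = list(ontology.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(ontology.get(parent, []))
    return seen

print(sorted(ancestors("DNA helicase activity", TOY_ONTOLOGY)))
```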

19 Integration of databases: webs of web sites. Links, links, links... SRS (Sequence Retrieval System, http://srs.ebi.ac.uk/): a powerful, complex query language. BioDAS: Distributed Annotation System.

20 For "my gene", how do I: get an overview of the known sequence information? (GeneCards + OMIM); examine the "genome neighbourhood"? (genome browsers); predict protein post-translational modifications (PTMs)? (prediction servers); (evaluate the value of predicted features)

21 GeneCards: http://nciarray.nci.nih.gov/cards/

22 GeneCards-II

23 GeneCards-III

24 GeneCards-IV

25 GeneCards-V

26 Genetic/Medical Information: OMIM, Online Mendelian Inheritance in Man (NCBI). The OMIM database is a catalog of human genes and genetic disorders: >16,000 entries (April 2006). Examples: cystic fibrosis, prions, amyloid precursor protein. Condensed, highly curated descriptions of genetics, disease, animal models and references.

27 OMIM-I (http://www3.ncbi.nlm.nih.gov/Omim/)

28 OMIM-II

29 OMIM-III

30 For "my gene", how do I: get an overview of the known sequence information? (GeneCards + OMIM); examine the "genome neighbourhood"? (genome browsers); predict protein post-translational modifications (PTMs)? (prediction servers); (evaluate the value of predicted features)

31 Genome Browsing. Three public browsers, all open access and using the same genome build/assembly: NCBI (U.S.), UCSC (Santa Cruz, U.S.) and EnsEmbl (EBI, EU). (One private browser: restricted, commercial; closed 2005.)
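
Because the public browsers use the same genome build, a gene's coordinates refer to the same place in each of them; coordinates from different builds cannot be mixed. Examining the "genome neighbourhood" then amounts to an interval query around the gene of interest. A minimal sketch, assuming 1-based inclusive coordinates; the `Feature` type, the `flank` parameter and the genes in the example are hypothetical.

```python
from typing import NamedTuple

class Feature(NamedTuple):
    name: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive

def neighbourhood(target: Feature, features: list[Feature], flank: int) -> list[Feature]:
    """Features on the same chromosome overlapping [start - flank, end + flank]."""
    lo, hi = target.start - flank, target.end + flank
    return [
        f for f in features
        if f.chrom == target.chrom and f.name != target.name
        and f.start <= hi and f.end >= lo
    ]

# Hypothetical genes, all assumed to come from the same assembly.
genes = [
    Feature("geneA", "chr1", 1000, 2000),
    Feature("geneB", "chr1", 2500, 3000),
    Feature("geneC", "chr1", 10000, 12000),
    Feature("geneD", "chr2", 1500, 1800),
]
print([g.name for g in neighbourhood(genes[0], genes, flank=1000)])  # ['geneB']
```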

32 Celera Discovery System & Database

33 Genome Browsers: Portals to the Genomic World. UCSC, University of California, Santa Cruz (U.S.): http://genome.ucsc.edu/; NCBI, National Center for Biotechnology Information (U.S.): http://www.ncbi.nlm.nih.gov/Genomes/index.html; EnsEmbl, European Molecular Biology Laboratory (E.U.): http://www.ensembl.org/
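
Moving a region between browsers and files is a common source of off-by-one errors: browser position boxes generally show "chrom:start-end" with 1-based, inclusive coordinates, while formats such as BED use 0-based, half-open intervals. A minimal sketch of parsing such a position string and converting it; the regular expression and function names are hypothetical.

```python
import re

POSITION = re.compile(r"^(?P<chrom>[^:\s]+):(?P<start>[\d,]+)-(?P<end>[\d,]+)$")

def parse_position(text: str) -> tuple[str, int, int]:
    """Parse 'chrom:start-end' (1-based, inclusive; thousands separators allowed)."""
    m = POSITION.match(text.strip())
    if m is None:
        raise ValueError(f"not a chrom:start-end position: {text!r}")
    start = int(m["start"].replace(",", ""))
    end = int(m["end"].replace(",", ""))
    if start < 1 or end < start:
        raise ValueError(f"invalid interval: {text!r}")
    return m["chrom"], start, end

def to_zero_based_half_open(chrom: str, start: int, end: int) -> tuple[str, int, int]:
    """1-based closed [start, end] -> 0-based half-open [start - 1, end)."""
    return chrom, start - 1, end

print(to_zero_based_half_open(*parse_position("chr7:1,000-2,000")))  # ('chr7', 999, 2000)
```

Both forms describe the same 1,001 bases; only the counting convention differs.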

34 UCSC – Genome Browser

35 UCSC – Genome Browser II

36 NCBI

37 NCBI

38

39 EnsEmbl – Genome Browser

40 EnsEmbl – Genome Browser

41 EnsEmbl – Genome Browser

42 EnsEmbl – Genome Browser

43 EnsEmbl – Genome Browser

44 EnsEmbl – Genome Browser

45 For "my gene", how do I: get an overview of the known sequence information? (GeneCards); examine the "genome neighbourhood"? (genome browsers); predict protein post-translational modifications (PTMs) or gene structure? (prediction servers) ...and evaluate the reliability of prediction methods?

46 CBS Services/Toolbox: http://www.cbs.dtu.dk/services/

47

48

49 NetPhos – a prediction server http://www.cbs.dtu.dk/services/NetPhos/
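
NetPhos predicts phosphorylation sites on serine, threonine and tyrosine, scoring each candidate residue from the sequence window around it. The sketch below shows only that input-preparation step (finding S/T/Y and cutting a centred window, padded at the termini); the window size and padding character are hypothetical choices, not NetPhos's actual settings, and the scoring model itself is not reproduced.

```python
from typing import Iterator

def candidate_windows(
    seq: str, half_width: int = 4, residues: str = "STY", pad: str = "-"
) -> Iterator[tuple[int, str, str]]:
    """Yield (1-based position, residue, window) for each candidate residue.

    The window spans half_width residues on each side; termini are padded.
    half_width and pad are illustrative assumptions.
    """
    seq = seq.upper()
    padded = pad * half_width + seq + pad * half_width
    for i, aa in enumerate(seq):
        if aa in residues:
            # seq[i] sits at padded[i + half_width], so this slice is centred on it.
            yield i + 1, aa, padded[i : i + 2 * half_width + 1]

for pos, aa, window in candidate_windows("MKSPTY", half_width=2):
    print(pos, aa, window)
```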

50 NetPhos – a prediction server

51 Evaluating Prediction Servers. Is performance on independent or cross-validated data presented? Is the method published in a peer-reviewed journal? Is it cited by others (Science Citation Index)? Is it linked to from credible web sites (Google PageRank, "link:URL" search)?
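
"Performance on cross-validated data" usually means: split the data into k folds, test on each fold with a model trained on the others, and summarise the pooled predictions with measures such as sensitivity, specificity and the Matthews correlation coefficient (MCC). A minimal sketch of the fold split and the metrics; the training step is left out, and for sequence data the folds would in practice also need to keep similar sequences together so that test data are truly independent.

```python
import math
import random

def k_fold_indices(n: int, k: int, seed: int = 0) -> list[list[int]]:
    """Shuffle 0..n-1 and deal the indices into k disjoint test folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def confusion_counts(predicted: list[bool], actual: list[bool]) -> tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN) for binary predictions."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    tp = fp = tn = fn = 0
    for p, a in zip(predicted, actual):
        if p and a:
            tp += 1
        elif p:
            fp += 1
        elif not a:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def performance(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Sensitivity, specificity and Matthews correlation coefficient."""
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity, "mcc": mcc}
```

An MCC of 1 means perfect prediction, 0 means no better than random; unlike plain accuracy, it stays informative when positive sites are rare.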

52 Evaluating Prediction Servers

53 2can Bioinformatics Education at EBI (European Bioinformatics Institute): http://www.ebi.ac.uk/2can/index.html. Tutorials, resource links, etc.

54 EnsEMBL Bioinformatics Education

55 Starting Points. General bioinformatics: NCBI (National Center for Biotechnology Information, U.S.); EBI (European Bioinformatics Institute). Prediction tools: CBS (DK); ExPASy (protein analysis, Switzerland).

56 Dynamic Resources. Pros: include the most recent developments; updated regularly; the user interface improves (usually). Cons: difficult to keep pace; tutorials and lectures are hard to recycle ;-( ; difficult to use at irregular intervals.

57 Genome Browsers: Portals to the Genomic World. Three main entry points: NCBI, UCSC, EnsEmbl. They essentially contain the same information, with a high degree of linking to secondary databases. It is advisable to become thoroughly familiar with one genome browser: learn to navigate it and make queries. GeneCards and OMIM are well suited for getting a quick overview of a gene of interest.

58 Prediction Servers. Evaluate scientific "soundness". Look for indications of quality (citations, etc.). Remember that prediction servers provide... well, predictions!

59 Learning Objectives. The student should be able to: describe the differences between sequence repositories and curated databases; describe the challenges of maintaining genome-wide biological databases; list two entry points for getting an overview of "my gene of interest"; describe how prediction servers may be evaluated.

60 Immediate Feedback. Title: "Resources of Biomolecular Data: Sequences, Structures and Functionality". Did the lecture live up to your expectations? Did you expect to learn about resources that were not covered during this lecture? NB! You can also provide input at the general course evaluation.

61 The End. 25,000?
