Biological databases: Collection, storage and maintenance


1 Biological databases: Collection, storage and maintenance
A biological database is a collection of data that is structured, searchable, periodically updated, and cross-referenced.

2 Biological databases: Collection, storage and maintenance
Heterogeneous content: complex data types (text-based sequences, BLOBs, images of cells and tissues, 3-D molecular structures, biochemical pathways, model data, scalar and vector fields)
Hierarchical data organization
Dynamic nature
Accessibility
Quality

3 The first database was of proteins
Atlas of Protein Sequence and Structure (1965), edited by Margaret Dayhoff, contained the protein sequences published up to that time (foundation of PIR).
The first nucleotide sequence data was a yeast tRNA of 77 bases.
The first protein structure database was constructed in 1972 with 10 entries.
The first genome, that of Haemophilus influenzae, was published in 1995.

4

5 ~100 GB

6 162,886,727 loci, 150,141,354,858 bases, from 162,886,727 sequences as of 15 February 2013

7 Categories of Databases
Data type (data heterogeneity)
Maintainer status
Technical design
Data source
Data access
And/or other parameters

8 1. Categories of Databases: Data Type
Taxonomy database
Genome database
Sequence database
Structure database
Proteomic database
Microarray database
Enzyme database
Disease database
Pathway database
Literature database
… and many more

9 Nucleotide Databases
dbEST, PopSet, dbGSS, Probe, dbSNP, RefSeq, dbSTS, TPA, Nucleotide, Trace Archive, GenBank, UniGene, HomoloGene, UniSTS, MGC

10 Protein Databases: 3D Domains, PROW, Proteins, RefSeq, Protein Clusters
Structure Databases: Conserved Domains, Structure (MMDB), 3D Domains
Taxonomy Databases: Taxonomy
Genome Databases: Cancer Chromosomes, Genome Project, COGs, Genomes, Gene

11 Expression Databases: GEO Profiles, SAGE, GEO Datasets
Chemical Databases: PubChem BioAssay, PubChem Compound, PubChem Substance

12 2. Categories of Databases: Maintainer Status
NCBI (federal government agency of the USA)
EBI/EMBL (non-profit academic organization)
SIB (quasi-academic non-profit foundation)

13

14 3. Categories of Databases: Technical Design
Flat file (information stored in text files)
XML (Extensible Markup Language; hierarchical, semi-structured model)
Relational model (highly structured model; tables with rows (tuples or records) and columns (fields), supported by SQL-based RDBMSs such as Oracle and DB2)
Object-oriented database management system
ASN.1 (Abstract Syntax Notation One)
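As a rough illustration of the flat-file and XML designs listed above, the sketch below stores one record both ways and reads each with Python's standard library. The record layout, the two-letter tags (ID, OS, SQ) and the XML element names are assumptions made for this example, not the format of any particular database.

```python
import xml.etree.ElementTree as ET

# Hypothetical flat-file record: one "TAG   value" pair per line, "//" ends the record.
FLAT_RECORD = """\
ID   EXAMPLE1
OS   E. coli
SQ   ATGCCGG
//
"""

# The same hypothetical record as hierarchical, semi-structured XML.
XML_RECORD = """\
<entry id="EXAMPLE1">
  <organism>E. coli</organism>
  <sequence>ATGCCGG</sequence>
</entry>
"""


def parse_flat(text):
    """Return a dict of tag -> value for one flat-file record."""
    record = {}
    for line in text.splitlines():
        if line.startswith("//"):  # end-of-record marker
            break
        tag, value = line[:2], line[5:].strip()
        record[tag] = value
    return record


def parse_xml(text):
    """Return the same dict, read from the XML form of the record."""
    root = ET.fromstring(text)
    return {
        "ID": root.get("id"),
        "OS": root.findtext("organism"),
        "SQ": root.findtext("sequence"),
    }


flat = parse_flat(FLAT_RECORD)
xml = parse_xml(XML_RECORD)
print(flat == xml)  # both designs carry the same information
```

The flat-file parser depends on fixed column positions, while the XML parser addresses fields by name; that difference is one reason a change to a flat-file layout tends to require changes to the reading program.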

15

16

17 This information is organised in tabular form, as is usual in a relational DB. The number of columns (fields) in such a DB is much larger than in the table below. An index of these fields can be built, which allows very fast searching of the DB on one or a few fields simultaneously. The information in one DB can be cross-referenced to that in another DB. For instance, DNA, protein and reference DBs have all been cross-referenced so that moving between them is readily accomplished.
Accession No.  Organism    Reference  Name                     Keywords                   Sequence
               E. coli     Medline1   LexA protein             SOS regulon, repressor,…   ATGCCGG…
               H. sapiens  Medline2   glucocorticoid receptor  transcriptional regulator  CCGATAAC
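The ideas on this slide (a table of fields, an index on a field, and a cross-reference to another DB) can be sketched with Python's built-in sqlite3 module. The table and column names, the accession numbers ACC0001 and ACC0002, and the placeholder citations are all invented for illustration; only the organism, name, keyword and sequence values come from the table above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical schema: a sequence table cross-referenced to a reference table.
cur.execute("CREATE TABLE reference (ref_id TEXT PRIMARY KEY, citation TEXT)")
cur.execute(
    "CREATE TABLE sequence ("
    " accession TEXT PRIMARY KEY, organism TEXT, name TEXT,"
    " keywords TEXT, seq TEXT,"
    " ref_id TEXT REFERENCES reference(ref_id))"
)

cur.executemany(
    "INSERT INTO reference VALUES (?, ?)",
    [("Medline1", "placeholder citation 1"),
     ("Medline2", "placeholder citation 2")],
)
cur.executemany(
    "INSERT INTO sequence VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("ACC0001", "E. coli", "LexA protein",
         "SOS regulon, repressor", "ATGCCGG", "Medline1"),
        ("ACC0002", "H. sapiens", "glucocorticoid receptor",
         "transcriptional regulator", "CCGATAAC", "Medline2"),
    ],
)

# Index on one field so that searches on it can avoid a full table scan.
cur.execute("CREATE INDEX idx_sequence_organism ON sequence (organism)")

# Follow the cross-reference from a sequence entry to its literature entry.
rows = cur.execute(
    "SELECT s.accession, s.name, r.citation "
    "FROM sequence AS s JOIN reference AS r ON r.ref_id = s.ref_id "
    "WHERE s.organism = ?",
    ("E. coli",),
).fetchall()
print(rows)
conn.close()
```

The join on ref_id plays the role of the cross-reference between the sequence DB and the reference DB; in practice the two would usually be separate databases linked by shared identifiers rather than two tables in one file.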

18 Example of object-oriented DB

19 Comparison
Flat file
Advantages: fast data retrieval; simple structure; easy programming
Disadvantages: difficult to process multiple values; adding new data requires reprogramming; slow without the key
Hierarchical
Advantages: easy addition and deletion; fast retrieval through higher-level records; multiple associations with like records
Disadvantages: pointers require large computer storage; pointer path restricts access; each association requires repetitive data
Relational
Advantages: easy access; minimal training for users; flexible for unforeseen enquiries; easy modification; physical storage of data can be changed without affecting the relationships
Disadvantages: sequential access is slow; prone to logical mistakes; method of storage impacts processing time; new relations require considerable processing
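The "slow without the key" entry for flat files in the comparison above can be made concrete with a small sketch. It contrasts a sequential scan on a non-key field with a lookup through an index built on the key. The tab-separated layout and the keys K1 and K2 are assumptions for this example only.

```python
import io

# Hypothetical flat file: one tab-separated record per line (key, organism, sequence).
FLAT_FILE = "K1\tE. coli\tATGCCGG\nK2\tH. sapiens\tCCGATAAC\n"


def scan_by_organism(handle, organism):
    """Sequential scan: lines are read one by one until a match is found."""
    for line in handle:
        key, org, seq = line.rstrip("\n").split("\t")
        if org == organism:
            return key, seq
    return None


def build_key_index(handle):
    """Read the file once and map each key to its record."""
    index = {}
    for line in handle:
        key, org, seq = line.rstrip("\n").split("\t")
        index[key] = (org, seq)
    return index


# Search on a non-key field: cost grows with the number of records.
scanned = scan_by_organism(io.StringIO(FLAT_FILE), "H. sapiens")

# Search on the key: a single dictionary lookup once the index exists.
index = build_key_index(io.StringIO(FLAT_FILE))
print(scanned, index["K2"])
```

The index here lives in memory and must be rebuilt whenever the file changes, which is one of the maintenance costs a database management system takes over.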

20 Database Data Data format Data type GenBank OMIM DNA/RNA seq, phenotype, genotype Text file/ASN.1 Text, Numeric Text file GDB AceDB Genetic map Relational/MySQL Object oriented Medline NCBI Literature Seq, str, literature ASN.1 Text PDB BLAST ClustalW KEGG Microarray Structure Seq, Analysis Metabolic path Microarray data Oracle Fasta HTML text, binary RDBMS, Excel 3D Image Images, text

21 4. Categories of Databases: Data Source
Type 1
Primary (from experimental sources): nucleic acid sequence, protein sequence, protein structure
Secondary (derived from existing primary databases): genomic (TIGR Human Gene Index), proteomic (PROSITE, CATH)
Type 2
Nucleic acids
Literature (PubMed)
Biomacromolecules
Pathways

22 DNA Sequence Database
National Center for Biotechnology Information (NCBI)
DNA Data Bank of Japan (DDBJ)
European Molecular Biology Laboratory (EMBL)

23 Protein sequence Database

24 European Bioinformatics Institute
Swiss Institute of Bioinformatics
Georgetown University

25 Exchange data on an hourly basis
International Nucleotide Sequence Database Collaboration (INSDC)
Mirroring
Data backup

26 Protein structure Database http://www.rcsb.org/pdb/index.html

27 PDB

28 PDB

29 Secondary database

30

31

32

33 5. Categories of Databases: Data Access
Publicly available
Available with copyright
Browsing but not downloadable
Academic but not free
Commercial access with payment

34 6. Categories of Databases: Others
Completeness
Curation (annotation)
…

35 ENTREZ: databases of different kinds are merged together and become global hubs of knowledge.

36 1. Nucleotide Sequence Databases
2. RNA sequence databases
3. Protein sequence databases
4. Structure Databases
5. Genomics Databases (non-human)
6. Metabolic Enzymes and Pathways; Signaling Pathways
7. Human and other Vertebrate Genomes
8. Human Genes and Diseases
9. Microarray Data and other Gene Expression Databases
10. Proteomics Resources
11. Other Molecular Biology Databases

37 For a detailed list and full coverage see

38

39

40 NCBI resources
Databases
Online analysis tools

41

42

43

44

45

46

47

48

49

50

51

52 Sequence Retrieval System (http://srs.ebi.ac.uk)

53

