
SUBMITTED BY: DEEPTI SHARMA BIOLOGICAL DATABASE AND SEQUENCE ANALYSIS.


1 SUBMITTED BY: DEEPTI SHARMA BIOLOGICAL DATABASE AND SEQUENCE ANALYSIS

2 Biological databases: Biological databases are stores of biological information. Based on their contents, they can be roughly divided into three categories: primary databases, secondary databases, and specialized databases.

3 Primary Databases: Primary databases contain experimentally derived data. They usually store information on the sequences or structures of biological molecules. They can be divided into protein and nucleotide databases, each of which can be further divided into sequence and structure databases. The most commonly used primary databases are the DNA Data Bank of Japan (DDBJ), the European Molecular Biology Laboratory (EMBL) Nucleotide Sequence Database, GenBank, and the Protein Data Bank (PDB).

4 Database of Nucleotide sequences
1. GenBank: A public sequence database that can be accessed at http://www.ncbi.nlm.nih.gov/genbank/. GenBank is the most complete collection of annotated nucleic acid sequence data for almost every organism. Its content includes genomic DNA, mRNA, cDNA, ESTs, high-throughput raw sequence data, and sequence polymorphisms.
2. Entrez: The Entrez system is used to search all NCBI-associated databases. It is a powerful tool for performing simple or complex searches by combining keywords with logical operators (AND, OR, NOT). For example, a search for human protein kinase sequences can use the following syntax: Homo sapiens [ORGN] AND protein kinase.
3. EMBL and DDBJ: EMBL is the nucleotide sequence database hosted at the European Bioinformatics Institute, whereas DDBJ is the DNA sequence database hosted at the Center for Information Biology, Japan. EMBL can be accessed at http://www.embl.de/, and DDBJ can be accessed at http://www.ddbj.nig.ac.jp/.
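The Entrez search syntax on this slide can also be assembled programmatically. The sketch below is one possible way to do it, not an official client: it builds the slide's example term and turns it into a request URL for the NCBI E-utilities `esearch` endpoint. The helper names `build_term` and `build_esearch_url`, the choice of the `protein` database, and the `retmax` value are assumptions made for illustration. No network request is sent.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities search endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_term(organism: str, keyword: str) -> str:
    """Combine an organism filter and a keyword with the AND operator."""
    # [ORGN] restricts the preceding text to the organism field.
    return f"{organism} [ORGN] AND {keyword}"


def build_esearch_url(db: str, term: str, retmax: int = 5) -> str:
    """Return an esearch URL for the given database and query term."""
    # urlencode escapes spaces and brackets so the term is URL-safe.
    params = {"db": db, "term": term, "retmax": retmax}
    return f"{ESEARCH_URL}?{urlencode(params)}"


# The example from the slide: protein kinase sequences in human.
term = build_term("Homo sapiens", "protein kinase")
url = build_esearch_url("protein", term)  # "protein" is an assumed choice of database
print(term)
print(url)
```

Opening the printed URL in a browser should return an XML list of matching record identifiers. Swapping AND for OR or NOT in `build_term` would broaden or narrow the search in the way the slide describes.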

5 Database of protein sequences
Swiss-Prot: A collection of annotated protein sequences maintained by the Swiss Institute of Bioinformatics (SIB). Swiss-Prot can be accessed at http://web.expasy.org/groups/swissprot/. Each protein sequence entry in Swiss-Prot is manually curated and, where required, compared with the available literature. Swiss-Prot is part of the UniProt database, which is collectively known as the UniProt Knowledgebase.
NCBI protein database: A compilation of protein sequences present in other databases. It contains entries from Swiss-Prot, the PIR database, the PDB, and other known databases.
UniProt: Contains information about the 3-D structure, expression profile, secondary structure, and biochemical function of proteins.

6 Database of protein structures: The Protein Data Bank (PDB) The Protein Data Bank (PDB) is the main repository for protein structural information and stores three-dimensional structures. It is the single worldwide archive of structural data derived by X-ray crystallography, nuclear magnetic resonance spectroscopy, and other techniques, as well as structural models. The database is maintained by the Research Collaboratory for Structural Bioinformatics (RCSB) at Rutgers University.
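To make "three-dimensional structures" concrete: in the legacy PDB text format, each atom is stored as a fixed-width ATOM record that carries its x, y, z coordinates. The following is a minimal sketch of reading one such record, assuming the standard column layout of that format. The `Atom` type, the `parse_atom_line` helper, and the sample record are made up for illustration and are not taken from a specific PDB entry.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    serial: int     # atom serial number
    name: str       # atom name, e.g. "N" or "CA"
    res_name: str   # residue name, e.g. "THR"
    chain: str      # chain identifier
    res_seq: int    # residue sequence number
    x: float        # coordinates in angstroms
    y: float
    z: float


def parse_atom_line(line: str) -> Atom:
    """Parse one fixed-width ATOM record (0-based slices of 1-based columns)."""
    return Atom(
        serial=int(line[6:11]),        # columns 7-11
        name=line[12:16].strip(),      # columns 13-16
        res_name=line[17:20].strip(),  # columns 18-20
        chain=line[21],                # column 22
        res_seq=int(line[22:26]),      # columns 23-26
        x=float(line[30:38]),          # columns 31-38
        y=float(line[38:46]),          # columns 39-46
        z=float(line[46:54]),          # columns 47-54
    )


# Illustrative record only; the values are invented for this example.
sample = "ATOM      1  N   THR A   1      17.047  14.099   3.625  1.00 13.79"
atom = parse_atom_line(sample)
print(atom)
```

Because the format is column-based rather than whitespace-delimited, slicing by position is safer than calling `split()`, which can fail when adjacent fields run together.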

7 Database of protein structures: Molecular Modeling Database (MMDB) Other structural databases also exist, such as NCBI's Molecular Modeling Database (MMDB), which aims to provide information on sequence and structure neighbors, links between the scientific literature and 3D structures, and sequence and structure visualization.

8 Secondary Databases: Secondary databases contain data obtained through the analysis or processing of data held in primary databases. For instance, they can contain conserved protein sequences, signature sequences, and active-site residues of protein families, all derived from multiple sequence alignments of related proteins. These databases can be further classified as metabolic pathway databases, protein family databases, etc. The most common examples are Class, Architecture, Topology, Homologous superfamily (CATH), Kyoto Encyclopedia of Genes and Genomes (KEGG), Protein Families (Pfam), and Structural Classification of Proteins (SCOP).
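As a rough illustration of how secondary data can be derived from primary data, the sketch below scans a multiple sequence alignment and reports the columns where every sequence carries the same residue. The alignment is a toy example invented here, and `conserved_columns` is a hypothetical helper; real secondary databases use far more sophisticated methods, such as profiles and hidden Markov models.

```python
def conserved_columns(alignment: list[str]) -> list[tuple[int, str]]:
    """Return (position, residue) for every fully conserved, gap-free column."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    conserved = []
    for i in range(length):
        # Collect the distinct characters seen in column i.
        column = {seq[i] for seq in alignment}
        # Conserved means exactly one residue and no gap character.
        if len(column) == 1 and "-" not in column:
            conserved.append((i, alignment[0][i]))
    return conserved


# Toy alignment of three related sequences ("-" marks a gap).
toy_alignment = [
    "GKSGLVK",
    "GKTGLIR",
    "GKSG-VK",
]
result = conserved_columns(toy_alignment)
print(result)  # [(0, 'G'), (1, 'K'), (3, 'G')]
```

The conserved positions found this way are the kind of derived information, such as signature sequences or active-site residues, that a secondary database would store.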


