Online DNA and Amino Acid Sequence Information
Lecture 7
Bioinformatics Databases
Biological data generated by various labs is submitted to, and stored in, specific databases. The data can be:
– Nucleotide sequences: DNA and mRNA (cDNA)
– Protein sequences
The main nucleotide sequence databases are:
– United States: GenBank (NCBI)
– Europe: Nucleotide Sequence Database (EMBL)
– Japan: DNA Data Bank of Japan (DDBJ)
These databases also contain sequences related to:
– Expressed sequence tags (ESTs): short (about 800 bp) fragments of mRNA that can be used to see which genes are expressed…
Protein Databases
The main protein database is UniProt, which contains data from three related database sites:
– Swiss-Prot (the most up-to-date information)
– TrEMBL (translations of coding sequences)
– PIR (Protein Information Resource)
Both the nucleotide and protein databases contain much more detail than just sequences; the data generated is referred to as annotated gene data.
The Annotation of Genes
Once gene sequences have been determined, the data must be annotated. The basic annotation includes (Klug 2010):
– Regulatory regions
– Coding sequences (CDS), with the exon/intron structure if the sequence is eukaryotic
– The amino acid sequence encoded by the gene
– Other organisms in which the DNA or amino acid sequence is found
– Journal references for where the data came from
– Links to other databases that contain information about the gene
Bioinformatics Database Searching
To facilitate finding annotated data about genes and proteins, a number of sites provide dedicated search engines:
– NCBI has Entrez
– EMBL has the EBI search page (previously the SRS engine)
– The SIB has the ExPASy search engine (more focused on protein-related information)
Consider the following query:
– What are the DNA and amino acid sequences of the human BTEB gene?
– Type the following into the search text box:
– Human[organism] AND BTEB[title]
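The same Entrez query can also be sent programmatically through NCBI's E-utilities service. The sketch below only builds the `esearch` request URL for the BTEB query, assuming the standard `esearch.fcgi` endpoint with its `db` and `term` parameters; the function name `build_esearch_url` is hypothetical, and NCBI's usage guidelines (request rate, API keys) should be checked before sending real requests.

```python
from urllib.parse import urlencode

# NCBI E-utilities esearch endpoint: the programmatic counterpart of
# typing a query into the Entrez search box.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term: str, db: str = "nucleotide") -> str:
    """Return an esearch URL for an Entrez query string (hypothetical helper).

    `db` selects the Entrez database, e.g. "nucleotide", "protein" or "pubmed".
    urlencode() escapes the field tags: '[' -> '%5B', ']' -> '%5D', ' ' -> '+'.
    """
    params = {"db": db, "term": term}
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # The same query as typed into the Entrez search box.
    print(build_esearch_url("Human[organism] AND BTEB[title]"))
```

Opening the printed URL (in a browser, or with `urllib.request.urlopen`) returns an XML list of the matching record IDs rather than the records themselves.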
NCBI Entrez search page
BTEB NCBI Nucleotide Record
Coding Section of the Gene
The exon/intron structure is also available in graphic form.
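GenBank and EMBL records describe the exon structure of a coding sequence with a location string such as `join(12..80,150..231)`, using 1-based, inclusive coordinates. As a minimal sketch (the helper `extract_cds` and the toy sequence are hypothetical), the exons can be spliced together like this; `complement()` locations and partial-end markers are rejected rather than handled.

```python
import re


def extract_cds(sequence: str, location: str) -> str:
    """Splice together the exons listed in a GenBank/EMBL-style location.

    Handles "start..end" and "join(start..end,start..end,...)" with 1-based,
    inclusive coordinates. complement() and partial ends (<, >) are rejected.
    """
    if "complement" in location or "<" in location or ">" in location:
        raise ValueError(f"location not handled by this sketch: {location!r}")
    exons = re.findall(r"(\d+)\.\.(\d+)", location)
    if not exons:
        raise ValueError(f"no start..end ranges found in {location!r}")
    # 1-based inclusive [start, end] corresponds to the Python slice [start-1:end].
    return "".join(sequence[int(start) - 1:int(end)] for start, end in exons)


# Hypothetical toy gene: exon 1 (1..6), intron (7..11, lower case), exon 2 (12..17).
toy_gene = "ATGAAAcccccGGGTAA"
print(extract_cds(toy_gene, "join(1..6,12..17)"))  # ATGAAAGGGTAA
```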
Further Information
The right-hand column contains links to online analysis resources, e.g. BLAST (including PSI-BLAST), a tool for searching the database for similar sequences. The record also gives the amino acid sequence obtained from the CDS of the gene, and the text box provides a link to information on the protein in the UniProt database.
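Translating the CDS into the amino acid sequence shown in the record is a codon-by-codon lookup. The sketch below assumes the standard genetic code and reading frame 1; the names `CODON_TABLE` and `translate` are illustrative, not taken from any NCBI tool. Some organisms and organelles use alternative genetic codes, which this table does not cover.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order (first codon base varies slowest);
# '*' marks the stop codons TAA, TAG and TGA.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str, stop_at_stop: bool = True) -> str:
    """Translate a coding sequence in frame 1 using the standard code."""
    cds = cds.upper().replace("U", "T")  # accept mRNA as well as DNA
    protein = []
    # Step through complete codons only; a trailing partial codon is ignored.
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")  # 'X' for codons not in the table (e.g. containing N)
        if aa == "*" and stop_at_stop:
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGGCCTAA"))  # MA
```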
An EMBL Nucleotide Record
Annotated data can also be found in the EMBL database: the BTEB EMBL record shows the main record. Clicking the “text” link in the top right-hand corner gives the essential features of the gene: BTEB-EMBL-EBI_text_record.
An ExPASy database search also gives information on this gene: type BTEB, and then BTEB and Human.
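Each line of an EMBL text record begins with a two-letter line code (for example ID, AC, DE, OS, FT, SQ), the sequence follows the SQ line, and `//` ends the entry. A minimal parser under those assumptions is sketched below; `parse_embl` and the sample entry are hypothetical, and a full-featured library (e.g. Biopython's `SeqIO` with the "embl" format) would be the better choice for real records.

```python
from collections import defaultdict


def parse_embl(text: str):
    """Split one EMBL flat-file entry into line-code fields and its sequence.

    Each line starts with a two-letter code followed by three spaces, so the
    content begins at column 6. After the SQ line, sequence lines start with
    blanks and end with a base count; "//" terminates the entry.
    """
    fields = defaultdict(list)
    seq_parts = []
    in_sequence = False
    for line in text.splitlines():
        if line.startswith("//"):
            break
        if in_sequence:
            # Keep only the bases: drop spacing and the position numbers.
            seq_parts.append("".join(ch for ch in line if ch.isalpha()))
            continue
        code, content = line[:2], line[5:].rstrip()
        if code.strip():
            fields[code].append(content)
        if code == "SQ":
            in_sequence = True
    return dict(fields), "".join(seq_parts)


# Hypothetical, heavily shortened entry (not the real BTEB record).
sample_entry = "\n".join([
    "ID   XX000001; SV 1; linear; mRNA; STD; HUM; 12 BP.",
    "XX",
    "DE   Hypothetical example entry.",
    "OS   Homo sapiens (human)",
    "SQ   Sequence 12 BP;",
    "     atggcctaag ct                                                        12",
    "//",
])
fields, sequence = parse_embl(sample_entry)
print(fields["OS"], sequence)  # ['Homo sapiens (human)'] atggcctaagct
```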
The BTEB Protein Record
A graphic representation of the protein and the relevant annotated data can be found at: BTEB Human Protein
Other Databases
The nucleotide (GenBank and EMBL) and protein (UniProt) databases contain the “raw data” and are referred to as “primary databases”.
– More specific databases derive their data from these and are referred to as secondary databases; examples include protein family and sequence similarity databases such as PROSITE and PRINTS.
– Some databases contain information about specific organisms, such as E. coli, e.g. the Genomes OnLine Database (GOLD).
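PROSITE describes many protein families by sequence patterns written in its own notation: elements are separated by `-`, `x` is any residue, `[..]` lists allowed residues, `{..}` lists forbidden residues, `(n)` or `(n,m)` gives a repeat count, and `<`/`>` anchor the N- or C-terminus. The sketch below translates simple patterns of this kind into Python regular expressions; the function names and the example pattern are hypothetical, and PROSITE's own scanning tools handle cases this version skips.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Convert a simple PROSITE pattern into a Python regular expression.

    Covers x, [..], {..}, (n), (n,m) and the < / > terminal anchors.
    Not handled: '>' inside brackets (e.g. [G>]).
    """
    regex_parts = []
    for element in pattern.rstrip(".").split("-"):
        element = element.replace("<", "^").replace(">", "$")   # N-/C-terminal anchors
        element = element.replace("{", "[^").replace("}", "]")  # forbidden residues
        element = element.replace("(", "{").replace(")", "}")  # repeat counts
        element = element.replace("x", ".")                    # any residue
        regex_parts.append(element)
    return "".join(regex_parts)


def find_motifs(pattern: str, protein: str):
    """Return (1-based start, matched text) for each non-overlapping match."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.group()) for m in regex.finditer(protein)]


print(prosite_to_regex("C-x(2)-[DE]-{P}-H."))  # C.{2}[DE][^P]H
```

For example, `find_motifs("C-x(2)-[DE]-{P}-H.", "MCAADGHK")` returns `[(2, 'CAADGH')]`, while the same search on `"MCAADPHK"` finds nothing because P is forbidden at that position.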
Other Databases
– Databases for specific types of sequences, such as ESTs (dbEST) or sequences associated with promoters and other regulatory elements.
– The Homologous Structure Alignment Database.
– Structural data from the Protein Data Bank (PDB).
– Online Mendelian Inheritance in Man (OMIM), which contains information on human genes and genetic disorders.
The January issue of the journal Nucleic Acids Research provides an up-to-date review of current online bioinformatics databases: Nucleic Acids Research database issue.
Other Important Information Sources
PubMed: literature search covering journal articles, conference proceedings, books, etc.
– Search under many fields: keyword, author, …
– Returns: journal articles/abstracts
– Two types of article: general/review
– The BTEB PubMed search can be found at: tailsSearch
Users can register for an NCBI account to manage their activity and store the results of gene searches, PubMed searches, …; this information can be downloaded…
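When a PubMed search is run through E-utilities (`esearch` with `db=pubmed`) instead of the web page, the reply is XML whose `<Count>` element gives the total number of hits and whose `<IdList>` holds the PubMed IDs returned. A small sketch of reading that reply with Python's standard library follows; `parse_esearch` and the sample reply are hypothetical, and the IDs in it are made up.

```python
import xml.etree.ElementTree as ET


def parse_esearch(xml_text: str) -> tuple:
    """Return (total hit count, list of returned IDs) from an esearch reply."""
    root = ET.fromstring(xml_text)
    # Direct children of <eSearchResult> only, so nested counts are ignored.
    count = int(root.findtext("Count", default="0"))
    ids = [elem.text for elem in root.findall("IdList/Id")]
    return count, ids


# Hypothetical reply shaped like an esearch result (made-up IDs).
sample_reply = (
    "<eSearchResult><Count>3</Count><RetMax>2</RetMax><RetStart>0</RetStart>"
    "<IdList><Id>111</Id><Id>222</Id></IdList></eSearchResult>"
)
print(parse_esearch(sample_reply))  # (3, ['111', '222'])
```

The total count can exceed the number of IDs listed, because esearch returns only one page of IDs (controlled by its `retmax` parameter) at a time.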
BTEB PubMed search result
Exercise
The EMBL-EBI record: BTEB_”text”_record
The NCBI record: BTEB NCBI Nucleotide Record
The DDBJ record: BTEB flatfile Record
Exercise: write a brief report comparing and contrasting the core elements of these records; refer to pages 8-16 of Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins, 3rd edition (available in the library).
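As a starting point for the comparison, the core fields of the GenBank/DDBJ flat file correspond roughly to EMBL line codes as listed below (DDBJ uses the GenBank layout). The mapping is an approximate study aid, not an official crosswalk; check each entry against the three BTEB records.

```python
# Approximate correspondence between GenBank/DDBJ flat-file keywords and
# EMBL line codes; a study aid only, not an official crosswalk.
GENBANK_TO_EMBL = {
    "LOCUS": "ID",
    "DEFINITION": "DE",
    "ACCESSION": "AC",
    "KEYWORDS": "KW",
    "SOURCE": "OS",
    "ORGANISM": "OS/OC",
    "REFERENCE": "RN",
    "AUTHORS": "RA",
    "TITLE": "RT",
    "JOURNAL": "RL",
    "FEATURES": "FH/FT",
    "ORIGIN": "SQ",
}


def comparison_table(mapping: dict) -> str:
    """Format the keyword mapping as a two-column plain-text table."""
    header = "GenBank/DDBJ"
    width = max(len(key) for key in [header, *mapping])
    rows = [f"{header.ljust(width)}  EMBL"]
    rows += [f"{key.ljust(width)}  {code}" for key, code in mapping.items()]
    return "\n".join(rows)


print(comparison_table(GENBANK_TO_EMBL))
```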
Exercise
Search for the DNA sequence of the following gene:
– Human leukocyte elastase gene, linear DNA [hint: it should be 5292 bp long].
– Retrieve the record, then download and save the FASTA file.
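Once the FASTA file has been saved, its length can be checked against the hint. The minimal reader below (`read_fasta` is a hypothetical helper) assumes the usual FASTA layout: a `>` header line followed by one or more sequence lines.

```python
def read_fasta(text: str) -> dict:
    """Parse FASTA text into {header: sequence}.

    A header line starts with '>'; the sequence lines that follow it are
    joined. Blank lines, and any lines before the first header, are skipped.
    """
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


# Tiny hypothetical input showing the multi-line layout.
demo = read_fasta(">seq1 example\nATGC\nGG\n\n>seq2\nTT\n")
print({name: len(seq) for name, seq in demo.items()})  # {'seq1 example': 6, 'seq2': 2}
```

For the exercise, `read_fasta(open("elastase.fasta").read())` (using whatever file name the download was saved under) should give a single record whose sequence length matches the 5292 bp in the hint.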