Online (DNA and Amino Acid) Sequence Information: Lecture 7



Bioinformatics Databases
Biological data generated by various labs is submitted to and stored in specific databases. The data can be:
– Nucleotide sequences: DNA and mRNA (cDNA)
– Protein sequences
The main nucleotide sequence databases are:
– United States: GenBank (NCBI)
– Europe: Nucleotide Sequence Database (EMBL)
– Japan: DNA Data Bank of Japan (DDBJ)
These databases also contain sequences related to:
– Expressed sequence tags (ESTs): small (800 bp) stretches of mRNA that can be used to see what genes are expressed…

Protein Databases
The main protein database is UniProt, which contains data from three related databases:
– SWISS-PROT (most up-to-date information)
– TrEMBL (translations of coding sequences)
– PIR (Protein Information Resource)
Both the nucleotide and protein databases contain much more detail than just sequences. The data generated is referred to as annotated gene data.

The Annotation of Genes
Once the gene sequences have been determined, the data must be annotated. This basic annotated data includes (Klug 2010):
– Regulatory regions
– Coding sequences (CDS): the exons/introns (if the sequence is eukaryotic)…
– The amino acid sequence for the gene
– Other organisms in which the DNA/amino acid sequence is found
– Journals/references to where the data came from
– Links to other databases that contain information about the gene
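As a rough illustration of how an annotated CDS links the exon/intron structure to the amino acid sequence, the sketch below joins exon intervals and translates the result. The toy sequence, the exon coordinates and the function names are invented for illustration and are not taken from any real record. It assumes 1-based inclusive coordinates, a gene on the forward strand, and the standard genetic code.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def splice_exons(dna, exons):
    """Join exon intervals given as 1-based, inclusive (start, end) pairs."""
    return "".join(dna[start - 1:end] for start, end in exons)


def translate(cds):
    """Translate a coding sequence codon by codon; '*' marks a stop codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return "".join(
        CODON_TABLE.get(cds[i:i + 3], "X")  # 'X' for codons with unknown bases
        for i in range(0, usable, 3)
    )


# Hypothetical toy gene: exon 1, a 12-base intron, exon 2.
toy_gene = "ATGGCC" + "GTAAGTTTTCAG" + "AAATGA"
toy_exons = [(1, 6), (19, 24)]  # assumed annotation, for illustration only

cds = splice_exons(toy_gene, toy_exons)
protein = translate(cds)
print(cds)      # spliced coding sequence
print(protein)  # translated amino acid sequence
```

In a real record the exon intervals would come from the annotated CDS feature rather than being typed in by hand.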

Bioinformatics Database
To facilitate finding annotated data about genes and proteins, a number of sites provide specific search engines:
– NCBI has Entrez
– EMBL has the EBI search page (previously the SRS engine)
– SIB has the ExPASy search engine (this is more focused on protein-related information)
Consider the following query:
– What are the DNA and amino acid sequences for the following gene: human BTEB?
– Type the following into the search text box:
– Human[organism] AND BTEB[title]
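The same fielded query can also be sent to Entrez programmatically through NCBI's E-utilities. The sketch below only builds the ESearch URL and does not contact the server. The helper name `build_esearch_url`, the choice of the `nucleotide` database and the `retmax` value are assumptions made for illustration.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities ESearch service.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term, db="nucleotide", retmax=20):
    """Return an ESearch URL for a fielded Entrez query (hypothetical helper)."""
    params = {"db": db, "term": term, "retmax": retmax}
    # urlencode percent-encodes the square brackets and turns spaces into '+'.
    return EUTILS_ESEARCH + "?" + urlencode(params)


query = "Human[organism] AND BTEB[title]"
url = build_esearch_url(query)
print(url)
```

Opening the printed URL in a browser should return an XML list of matching record identifiers, which can then be used to retrieve the records themselves.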

NCBI Entrez search page

BTEB NCBI Nucleotide Record

Coding section of gene
The exon/intron structure is also available in graphic form.

Further information
In the right-hand column you will find links to online analytical resources, e.g. BLAST (PSI-BLAST), a tool to search for similar sequences contained in the database.
The record also gives information on the amino acid sequence obtained from the CDS of the gene. The text box also provides a link to information on the protein in the UniProt database.

An EMBL nucleotide record
Annotated data can also be found in the EMBL database:
BTEB EMBL record: shows the main record.
Clicking on the “text” link at the top right-hand corner will give the essential features of the gene: BTEB-EMBL-EBI_text_record
An ExPASy database search gives the following information for this gene: type BTEB and then BTEB and Human.

The BTEB Protein Record
A link to a graphic representation of the protein and the relevant annotated data can be found at: BTEB Human Protein

Other databases
The nucleotide (GenBank and EMBL) and protein (UniProt) databases contain the “raw data” and are referred to as “primary databases”.
– More specific databases derive data from these and are referred to as secondary databases; examples include protein family and sequence similarity databases such as PROSITE and PRINTS.
– There are databases which contain information about specific organisms such as E. coli, e.g. the Genomes OnLine Database (GOLD).

Other databases
– Databases for specific types of sequences, such as those associated with promoters and other regulatory elements: dbEST; Homologous Structure Alignment Database.
– Structural databases from the Protein Data Bank.
– Online Mendelian Inheritance in Man (OMIM), which contains information on human genes and genetic disorders.
The January edition of the journal Nucleic Acids Research provides an up-to-date analysis of current online bioinformatics databases: Nucleic Acids Research database edition

Other important information sources
PubMed: literature research (journal articles, conference proceedings, books, etc.)
– Search under many fields: keyword, author…
– Returns: journal articles/abstracts
– Two types: general/review
– BTEB PubMed search found at: tailsSearch
The user can register an NCBI account to manage their activity and store the findings of gene searches, PubMed searches…. This information can be downloaded, edited….

BTEB PubMed search result

Exercise
The EMBL-EBI record: BTEB_“text”_record
The NCBI record: BTEB NCBI Nucleotide Record
The DDBJ record: BTEB flatfile Record
Exercise: write a brief report comparing and contrasting the core elements of these records; refer to pages 8-16 in Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins, 3rd edition. The book can be found in the library.

Exercise
Search for the following gene DNA sequence:
– Human leukocyte elastase gene, linear DNA [hint: it should be 5292 bp long].
– Retrieve the record, then download and save the FASTA file.
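One way to check the length hint once the FASTA file has been saved is to parse it and count the bases. The sketch below is a minimal FASTA reader under simple assumptions (plain text, header lines starting with ">"). The function names and the toy record are hypothetical; the toy record is not the elastase sequence.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a {header: sequence} dictionary."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]  # drop the leading '>'
            records[header] = []
        elif header is not None:
            records[header].append(line)  # sequence may span several lines
    return {name: "".join(chunks) for name, chunks in records.items()}


def check_length(sequence, expected_bp):
    """Return True if the sequence has the expected number of bases."""
    return len(sequence) == expected_bp


# Toy input standing in for a downloaded file (hypothetical data).
toy_fasta = """>toy_record hypothetical example
ACGTACGTAC
GTACGT
"""

records = read_fasta(toy_fasta)
for name, seq in records.items():
    print(name, len(seq), "bp")
```

For the exercise, the text would instead be read from the saved file and the sequence checked with `check_length(seq, 5292)`.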