Sequence Analysis

Today How to retrieve a DNA sequence? How to search for other related DNA sequences? How to search for its protein sequence? How to determine what is known about this sequence biologically?

Gene structure Genes contain introns and exons. Introns are transcribed into RNA but are then removed, i.e. they are non-coding regions. Exons are the coding regions and are present in the mRNA. [Figure: gene and mRNA diagram labeled Exon1, Intron1, E2, I2, E3]
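As a rough illustration of the intron/exon idea above, the following Python sketch joins exon segments and drops the introns between them. The transcript string and the exon coordinates are made up for the example (exons in upper case, introns in lower case); real coordinates would come from an annotation.

```python
def splice(pre_mrna, exons):
    """Join exon segments, dropping the introns between them.

    exons is a list of (start, end) pairs, 0-based and half-open,
    listed in transcript order.
    """
    return "".join(pre_mrna[start:end] for start, end in exons)


# Hypothetical primary transcript: Exon1, Intron1, Exon2, Intron2, Exon3.
pre_mrna = "AUGGCC" + "guaaguuuucag" + "UUUGGA" + "guacgucag" + "UAA"

# Hypothetical exon coordinates matching the string above.
exons = [(0, 6), (18, 24), (33, 36)]

mature_mrna = splice(pre_mrna, exons)
print(mature_mrna)
```

Only the upper-case exon segments survive in `mature_mrna`, which mirrors what a cDNA made from that mRNA would contain.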

Types of DNA sequence Genomic – Contains both genes and non-genic regions. – Genes have both introns and exons. cDNA (complementary DNA) – Sequence corresponds to genes that are expressed. – Sequence contains only the

What could you do with genomic sequence? What about with cDNA sequence?

What is an EST? Expressed sequence tag. Part or all of a cDNA that has been sequenced.

What is NCBI? National Center for Biotechnology Information, part of the National Library of Medicine, NIH. Created in 1988 to develop information systems for molecular biology. Provides data retrieval systems and computational resources.

Database Resources Database retrieval tools; BLAST family of sequence-similarity search programs; resources for gene-level sequences; resources for genome-scale analysis.

Database Retrieval Tools Entrez – for DNA and protein sequences. PubMed Central – for literature. Taxonomy – organisms and associated sequences. LocusLink – provides links from sequence information to map and other information.

BLAST family Basic Local Alignment Search Tool. Sequence similarity search against various databases in GenBank.

BLAST Pairwise alignment. Each alignment has a statistical significance (E-value). Accounts for amino acid sequence. Outputs a list of matches including start, stop, score, and E-value.
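The E-value reported for each alignment is commonly derived from Karlin-Altschul statistics, E = K * m * n * exp(-lambda * S), where S is the raw alignment score and m and n are the query and database lengths. The Python sketch below only illustrates that formula. The default `lam` and `k` values are assumed illustrative constants, not values taken from the slides, and real BLAST also applies effective-length corrections that are omitted here.

```python
import math


def e_value(raw_score, query_len, db_len, lam=0.318, k=0.134):
    """Expected number of chance alignments scoring at least raw_score.

    lam and k are scoring-system constants; the defaults here are
    illustrative assumptions only.
    """
    return k * query_len * db_len * math.exp(-lam * raw_score)


# Hypothetical search: a 250-residue query against a database of 1e6 residues.
weak_hit = e_value(raw_score=40, query_len=250, db_len=1_000_000)
strong_hit = e_value(raw_score=120, query_len=250, db_len=1_000_000)
print(f"weak hit E = {weak_hit:.3g}, strong hit E = {strong_hit:.3g}")
```

A higher score gives a smaller E-value, and searching a larger database raises the E-value for the same score, which is why the same hit can look less significant against a bigger database.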

5 BLAST Programs (query vs. database) BLASTN – nucleotide vs. nucleotide. BLASTP – protein vs. protein. BLASTX – translated nucleotide vs. protein. TBLASTN – protein vs. translated nucleotide. TBLASTX – translated nucleotide vs. translated nucleotide.
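One way to remember which program fits which search is to key the choice on the type of the query and the type of the database. The Python sketch below is a hypothetical helper written for this purpose (the function and table names are not part of any BLAST distribution). It assumes the usual convention that BLASTX translates a nucleotide query and TBLASTN translates a nucleotide database.

```python
# (query type, database type) -> program name
BLAST_PROGRAMS = {
    ("nucleotide", "nucleotide"): "blastn",
    ("protein", "protein"): "blastp",
    ("nucleotide", "protein"): "blastx",   # query is translated
    ("protein", "nucleotide"): "tblastn",  # database is translated
}


def choose_program(query_type, db_type, translate_both=False):
    """Pick a BLAST program from the query and database sequence types."""
    if translate_both:
        # tblastx compares translations of both nucleotide query and database.
        if (query_type, db_type) != ("nucleotide", "nucleotide"):
            raise ValueError("tblastx needs a nucleotide query and database")
        return "tblastx"
    return BLAST_PROGRAMS[(query_type, db_type)]


print(choose_program("nucleotide", "protein"))
print(choose_program("nucleotide", "nucleotide", translate_both=True))
```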

Genome-Scale Analysis Entrez Genomes – taxonomic, genome or chromosome view of the current sequence data for an organism. COGs – List of orthologous protein groups from completely sequenced organisms. Retroviral genotyping tools – Important in viral genetic diversity, tracking outbreaks, and vaccine development.

Genome-Scale Analysis Eukaryotic Genomic Resources – location of Plant Genomes Central with information from various plant genome projects. Map Viewer – Displays genome assemblies using chromosome map views.

Genome-Scale Analysis Human-Mouse Homology Maps – List of genes in homologous segments. Cancer Chromosome Aberration Project – List of recurrent chromosome aberrations associated with cancer.

Gene Expression/Phenotype OMIM – Catalog of human genes and genetic disorders including phenotypes and polymorphism information. Gene Expression Omnibus (GEO) – Data repository and retrieval system for expression data from all sources.

MMDB, CDDB, CDART Molecular Modeling Database; Conserved Domain Database; Conserved Domain Architecture Retrieval Tool – Identifies conserved domains and displays their structure.

What can you do with the sequence? Gene prediction; motif identification; promoter identification; survey gene expression across tissues; full-length gene isolation; identify mutations (SNP, InDel).

InDel Insertions/Deletions. Usually small in size. Can use the same protocols and equipment as for SSR analysis, or can run the separation on a capillary system using fluorescently labeled material.
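When sequence data are available instead of fragment sizes, InDels show up as runs of gap characters in a pairwise alignment. The Python sketch below scans two already-aligned sequences for such runs. The function name and the example alignment are hypothetical, and it assumes gaps are written as "-".

```python
def find_indels(ref_aln, alt_aln):
    """Return (start_column, length, kind) for each run of gaps.

    A gap in the reference is reported as an insertion in the other
    sequence; a gap in the other sequence is reported as a deletion.
    """
    if len(ref_aln) != len(alt_aln):
        raise ValueError("aligned sequences must have equal length")
    indels = []
    run_start, run_kind = None, None
    for i, (r, a) in enumerate(zip(ref_aln, alt_aln)):
        kind = "insertion" if r == "-" else "deletion" if a == "-" else None
        if kind != run_kind:
            # Close the previous gap run, if any, before starting a new state.
            if run_kind is not None:
                indels.append((run_start, i - run_start, run_kind))
            run_start, run_kind = i, kind
    if run_kind is not None:
        indels.append((run_start, len(ref_aln) - run_start, run_kind))
    return indels


# Hypothetical alignment of two alleles.
indels = find_indels("ACGT--ACGT", "ACGTTTAC-T")
print(indels)
```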

Single Nucleotide Polymorphism SNP. Single base-pair difference in the DNA sequence of two alleles. Best identified with high-quality sequence and confirmed in multiple lines or multiple experiments.
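A minimal Python sketch of the idea: given two aligned allele sequences of equal length, report each column where the bases differ. The function name and sequences are hypothetical, and gap columns are skipped on the assumption that they represent InDels, not SNPs. This finds candidate SNPs only; confirmation across lines or experiments is a separate step.

```python
def find_snps(allele_a, allele_b):
    """Return (position, base_a, base_b) for each single-base difference."""
    if len(allele_a) != len(allele_b):
        raise ValueError("aligned alleles must have equal length")
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(allele_a, allele_b))
        if a != b and "-" not in (a, b)  # skip gap columns
    ]


# Hypothetical alleles differing at one position (0-based index 4).
snps = find_snps("ACGTACGT", "ACGTTCGT")
print(snps)
```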

SNP popularity Difficult to identify human disease loci by other methods. Most abundant class of polymorphisms in many species. Ease of use for genotyping, i.e. they can be automated easily.

What can you do with ESTs? Gene expression analysis; colinearity studies; protein prediction; SNP identification; genetic mapping.

Today How to retrieve a DNA sequence? How to search for other related DNA sequences? How to search for its protein sequence? How to determine what is known about this sequence biologically?

Using adh as an example Find adh1 sequence in corn. Find related sequences. Determine its function in corn. Find adh in human. Find related sequences. Determine its function in human.
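The retrieval steps in this example can also be scripted against NCBI's E-utilities web interface as an alternative to the Entrez web pages. The Python sketch below only builds the request URLs and does not contact NCBI. The helper names are hypothetical, the `[Gene Name]` and `[Organism]` field tags are one reasonable way to phrase the query, and the IDs passed to `efetch_url` are placeholders, not real records.

```python
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def entrez_term(gene, organism):
    """Build an Entrez query restricted to a gene name and an organism."""
    return f"{gene}[Gene Name] AND {organism}[Organism]"


def esearch_url(db, term, retmax=20):
    """URL that searches a database and returns matching record IDs."""
    return f"{EUTILS}/esearch.fcgi?" + urlencode(
        {"db": db, "term": term, "retmax": retmax}
    )


def efetch_url(db, ids, rettype="fasta", retmode="text"):
    """URL that downloads the records with the given IDs."""
    return f"{EUTILS}/efetch.fcgi?" + urlencode(
        {"db": db, "id": ",".join(ids), "rettype": rettype, "retmode": retmode}
    )


# Search for adh1 in corn (Zea mays).
corn_search = esearch_url("nucleotide", entrez_term("adh1", "Zea mays"))
# Placeholder IDs; in practice use the IDs returned by the search.
fetch = efetch_url("nucleotide", ["12345", "67890"])
print(corn_search)
print(fetch)
```

The same helpers would serve for the human search by passing "Homo sapiens" as the organism together with the appropriate gene name.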