Sequence Analysis MUPGRET June workshops. Today What can you do with the sequence? What can you do with the ESTs? The case of SNP and Indel.

Presentation transcript:

Sequence Analysis MUPGRET June workshops

Today What can you do with the sequence? What can you do with the ESTs? The case of SNP and Indel

What can you do with the sequence? Gene prediction. Motif identification. Promoter identification. Survey gene expression across tissues. Full-length gene isolation.

NCBI Tools National Center for Biotechnology Information (NCBI), National Library of Medicine, NIH. Created in 1988 to develop information systems for molecular biology. Provides data retrieval systems and computational resources.

Database Resources Database retrieval tools. BLAST family of sequence-similarity search programs. Resources for gene-level sequences. Resources for genome-scale analysis.

Database Resources Resources for analyzing gene expression patterns and phenotypes. Molecular Modeling Database, Conserved Domain Database, Conserved Domain Architecture Retrieval Tool.

Database Retrieval Tools Entrez – DNA and protein sequences. PubMed Central – literature. Taxonomy – organisms and associated sequences. LocusLink – links from sequence information to map and other information.
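As a rough illustration of Entrez retrieval done from a script rather than the web page, the sketch below only builds a request URL for NCBI's E-utilities `efetch` service and does not contact the server. The helper name `efetch_fasta_url` and the accession `X12345` are placeholders invented for this example, not something from the slides.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities efetch service.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession, db="nuccore"):
    """Build (but do not send) a request for one record in FASTA format.

    db is the Entrez database name, e.g. "nuccore" for nucleotide
    records or "protein" for protein records.
    """
    params = {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"}
    return f"{EFETCH}?{urlencode(params)}"


if __name__ == "__main__":
    # "X12345" is a placeholder accession used only to show the URL shape.
    print(efetch_fasta_url("X12345"))
```

Opening the printed URL in a browser, or passing it to `urllib.request.urlopen`, would return the FASTA text if the accession exists.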

BLAST family Basic Local Alignment Search Tool. Sequence similarity search against various databases in GenBank. Gapped alignments with links to various other databases such as UniGene or LocusLink.

BLAST Pairwise alignment, but can display multiple alignments with the “query-anchored” feature. Each alignment has a statistical significance (E-value). Accounts for amino acid sequence. Outputs a list of matches including start, stop, score, and E-value.
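To make the E-value concrete, here is a minimal sketch of the Karlin-Altschul relation E = K·m·n·e^(−λS), where S is the raw alignment score, m the query length and n the database length. The values of `LAMBDA` and `K` below are illustrative assumptions (roughly those quoted for ungapped BLOSUM62 scoring); real BLAST reports the parameters it actually used in its output.

```python
import math

# Illustrative Karlin-Altschul parameters (assumed, not from the slides).
LAMBDA = 0.318
K = 0.134


def bit_score(raw_score, lam=LAMBDA, k=K):
    """Convert a raw alignment score into a normalized bit score."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value(raw_score, query_len, db_len, lam=LAMBDA, k=K):
    """Expected number of chance alignments scoring at least raw_score."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


if __name__ == "__main__":
    # Higher scores give smaller E-values for the same search space.
    for score in (40, 60, 80):
        print(score, round(bit_score(score), 1),
              f"{e_value(score, 250, 5_000_000):.2e}")
```

The smaller the E-value, the less likely the match arose by chance; the same score becomes less significant as the database grows.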

5 BLAST Programs (query vs. database) BLASTN – Nucleotide vs. nucleotide. BLASTP – Protein vs. protein. BLASTX – Translated nucleotide vs. protein. TBLASTN – Protein vs. translated nucleotide. TBLASTX – Translated nucleotide vs. translated nucleotide.
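The choice among the five programs depends only on the type of the query and the type of the database. The sketch below encodes that choice using the query-versus-database convention (BLASTX translates a nucleotide query and searches protein; TBLASTN searches a protein query against a translated nucleotide database). The function name `choose_blast_program` and the `translate_both` flag are hypothetical, invented for this illustration.

```python
# (query type, database type) -> BLAST program, query listed first.
BLAST_PROGRAMS = {
    ("nucleotide", "nucleotide"): "blastn",
    ("protein", "protein"): "blastp",
    ("nucleotide", "protein"): "blastx",    # query is translated
    ("protein", "nucleotide"): "tblastn",   # database is translated
}


def choose_blast_program(query_type, db_type, translate_both=False):
    """Return the BLAST program name for a query/database combination.

    translate_both=True selects tblastx, which compares the six-frame
    translations of a nucleotide query and a nucleotide database.
    """
    if translate_both:
        if (query_type, db_type) != ("nucleotide", "nucleotide"):
            raise ValueError("tblastx needs nucleotide query and database")
        return "tblastx"
    return BLAST_PROGRAMS[(query_type, db_type)]


if __name__ == "__main__":
    # An EST (nucleotide) searched against a protein database.
    print(choose_blast_program("nucleotide", "protein"))
```

For example, an EST compared against a protein database calls for BLASTX, while a known protein used to find unannotated genes in genomic sequence calls for TBLASTN.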

BLAST family BLAST2Sequences – dot plot of alignment. MegaBLAST – nearly exact matches. PSI-BLAST – match to protein that reduces false positive hits. BLink – Allows display of alignments by taxonomic criteria, database origin, relation to a complete genome, relation to a 3D protein structure or conserved domain.
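A dot plot of the kind mentioned for BLAST2Sequences can be sketched in a few lines. This toy version marks every position where a short word from one sequence exactly matches a word from the other; it is not how BLAST2Sequences itself computes its plot, and the `window` size is an arbitrary assumption.

```python
def dot_plot(seq_a, seq_b, window=3):
    """Return the rows of a text dot plot.

    Row i, column j holds '*' when the window-length word starting at
    seq_a[i] equals the word starting at seq_b[j], otherwise '.'.
    """
    rows = []
    for i in range(len(seq_a) - window + 1):
        word_a = seq_a[i:i + window]
        row = ""
        for j in range(len(seq_b) - window + 1):
            row += "*" if word_a == seq_b[j:j + window] else "."
        rows.append(row)
    return rows


if __name__ == "__main__":
    # Similar regions show up as diagonal runs of '*'.
    for line in dot_plot("GATTACAGATTACA", "TTACAGGATTAC"):
        print(line)
```

Diagonal runs indicate aligned regions; a shift from one diagonal to another suggests an insertion or deletion between the two sequences.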

Gene-Level Sequences UniGene – Identifies a non-redundant set of ESTs based on GenBank sequences. ProtEST – Displays pre-computed BLAST alignments between protein sequences from model organisms and the 6-frame translation of the UniGene nucleotide sequences.

Gene-Level Sequences HomoloGene – Curated and calculated gene orthologs and homologs for 14 organisms. RefSeq – Curated reference sequences for mRNAs, genomic sequences, etc. ORF Finder – 6-frame translation with graph of ORF position. e-PCR – Locates sequence-tagged sites. dbSNP – Contains SNPs and indels.
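The six-frame scan behind a tool like ORF Finder can be sketched as follows. This is a simplified illustration, not NCBI's implementation: it reports only ATG-to-stop ORFs using the standard stop codons, and minus-strand coordinates are given on the reverse-complemented sequence. The `min_codons` cutoff is an assumed parameter.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_frame(seq, frame, min_codons=2):
    """Yield (start, end) for ATG..stop ORFs in one frame; end is exclusive."""
    start = None
    for pos in range(frame, len(seq) - 2, 3):
        codon = seq[pos:pos + 3]
        if start is None and codon == "ATG":
            start = pos
        elif start is not None and codon in STOP_CODONS:
            # Length is counted in codons, stop codon included.
            if (pos + 3 - start) // 3 >= min_codons:
                yield start, pos + 3
            start = None


def six_frame_orfs(seq, min_codons=2):
    """Scan three frames on each strand; return (strand, frame, start, end, orf)."""
    seq = seq.upper()
    results = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            for start, end in orfs_in_frame(strand_seq, frame, min_codons):
                results.append((strand, frame, start, end, strand_seq[start:end]))
    return results


if __name__ == "__main__":
    for orf in six_frame_orfs("CCATGAAATAGCC"):
        print(orf)
```

Plotting the reported start and end positions per frame gives the kind of ORF position graph the slide refers to.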

Genome-Scale Analysis Entrez Genomes – Taxonomic, genome or chromosome view of the current sequence data for an organism. COGs – List of orthologous protein groups from completely sequenced organisms. Retroviral genotyping tools – Important in viral genetic diversity, tracking outbreaks, and vaccine development.

Genome-Scale Analysis Eukaryotic Genomic Resources – location of Plant Genomes Central with information from various plant genome projects. Map Viewer – Displays genome assemblies using chromosome map views. Model Maker (MM) – Generates transcript models using exon data from prediction or from GenBank alignments.

Genome-Scale Analysis Evidence Viewer – Graphical summary of alignments relative to contigs including insertion/deletion or mismatches. Human-Mouse Homology Maps – List of genes in homologous segments. Cancer Chromosome Aberration Project – List of recurrent chromosome aberrations associated with cancer.
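The mismatches and insertions/deletions that a viewer summarizes, and that underlie SNP and indel calls, can be read directly off a gapped pairwise alignment. The sketch below is a minimal illustration under the assumption that both sequences are already aligned to equal length with `-` as the gap character; the function name `alignment_differences` is hypothetical.

```python
def alignment_differences(ref_aln, qry_aln):
    """Classify the differing columns of a gapped pairwise alignment.

    Returns a list of (kind, ref_position, detail) tuples, where
    ref_position is the 1-based position of the last reference base
    consumed, so an insertion is reported after that reference base.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have equal length")
    diffs = []
    ref_pos = 0
    for r, q in zip(ref_aln.upper(), qry_aln.upper()):
        if r != "-":
            ref_pos += 1
        if r == q:
            continue  # identical column (or gap in both rows)
        if r == "-":
            diffs.append(("insertion", ref_pos, q))
        elif q == "-":
            diffs.append(("deletion", ref_pos, r))
        else:
            diffs.append(("mismatch", ref_pos, f"{r}>{q}"))
    return diffs


if __name__ == "__main__":
    for diff in alignment_differences("ACG-TA", "ATGCT-"):
        print(diff)
```

A single-base mismatch seen consistently across many ESTs from different individuals is a candidate SNP, while a consistent gap is a candidate indel; one sequence alone cannot distinguish these from sequencing errors.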

Gene Expression/Phenotype SAGEmap – A way to look at SAGE data including two-way mapping between SAGE tag and UniGene. Gene Expression Omnibus (GEO) – Data repository and retrieval system for expression data from all sources. OMIM – Catalog of human genes and genetic disorders including phenotypes and polymorphism information.

MMDB, CDD, CDART Molecular Modeling Database – Based on the Protein Data Bank. Conserved Domain Database – PSI-BLAST-derived scores indicating domains in the Protein Data Bank. Conserved Domain Architecture Retrieval Tool – Identifies conserved domains and displays their structure.

Sequence Analysis References Korf, Yandell, and Bedell. An Essential Guide to the Basic Local Alignment Search Tool: BLAST. O’Reilly & Associates, Sebastopol, CA. Markel and Leon. Sequence Analysis in a Nutshell: A Guide to Common Tools and Databases. O’Reilly & Associates, Sebastopol, CA.

Sequence Analysis References Baxevanis and Ouellette. Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins. Wiley Interscience, New York. Mount. Bioinformatics: Sequence and Genome Analysis. Cold Spring Harbor Laboratory, New York.