1
Sequence Analysis
MUPGRET June Workshops
2
Today
What can you do with the sequence?
What can you do with the ESTs?
The case of SNPs and indels
3
What can you do with the sequence?
Gene prediction
Motif identification
Promoter identification
Survey of gene expression across tissues
Full-length gene isolation
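Motif and promoter identification often start from a consensus written in IUPAC ambiguity codes. Below is a minimal sketch that scans a sequence for such a consensus with a regular expression. The function names are hypothetical, `TATAWAW` (a commonly cited TATA-box consensus) is only an illustration, and real promoter prediction also weighs mismatches and position rather than requiring exact matches.

```python
import re

# IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def motif_to_regex(motif: str) -> str:
    """Translate an IUPAC consensus motif into a regular expression."""
    return "".join(IUPAC[base] for base in motif.upper())

def find_motif(seq: str, motif: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched text) for every match, overlapping ones included."""
    # A capturing group inside a lookahead lets finditer report overlapping hits.
    pattern = re.compile(f"(?=({motif_to_regex(motif)}))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq.upper())]
```

For example, `find_motif("GGCTATAAAAGGCTATATAAG", "TATAWAW")` reports hits at positions 3 and 13.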
4
NCBI Tools
National Center for Biotechnology Information (NCBI), part of the National Library of Medicine, NIH
Created in 1988 to develop information systems for molecular biology
Provides data retrieval systems and computational resources
5
Database Resources
Database retrieval tools
BLAST family of sequence-similarity search programs
Resources for gene-level sequences
Resources for genome-scale analysis
6
Database Resources
Resources for analyzing gene expression patterns and phenotypes
Molecular modeling database, conserved domain database, and conserved domain architecture retrieval tool
7
Database Retrieval Tools
Entrez – DNA and protein sequences
PubMed Central – literature
Taxonomy – organisms and their associated sequences
LocusLink – links from sequence information to maps and other information
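Entrez records can also be retrieved programmatically through NCBI's E-utilities. The sketch below builds an EFetch request using the `efetch.fcgi` endpoint and its `db`, `id`, `rettype` and `retmode` parameters. The helper names are hypothetical, the accession in the usage line is only an example, and NCBI limits how often clients may send requests.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

def efetch_url(db: str, ids: list[str], rettype: str = "fasta") -> str:
    """Build an EFetch URL that returns the requested records as plain text."""
    params = {"db": db, "id": ",".join(ids), "rettype": rettype, "retmode": "text"}
    return EUTILS_BASE + "efetch.fcgi?" + urlencode(params)

def fetch_records(db: str, ids: list[str], rettype: str = "fasta") -> str:
    """Download the records (network call; keep request rates low)."""
    with urlopen(efetch_url(db, ids, rettype)) as response:
        return response.read().decode("utf-8")
```

Usage: `fetch_records("nucleotide", ["NM_000546"])` returns the FASTA text for that accession.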
8
BLAST family
Basic Local Alignment Search Tool
Sequence-similarity searches against various GenBank databases
Gapped alignments, with links to other databases such as UniGene and LocusLink
9
BLAST
Produces pairwise alignments, but the “query-anchored” view can display many hits aligned to the query, like a multiple alignment
Each alignment has a statistical significance (E-value)
Works with amino acid as well as nucleotide sequences
Outputs a list of matches, including start and stop positions, score, and E-value
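The BLAST+ command-line programs can write the list of matches as tab-separated text (`-outfmt 6`), with the twelve default columns listed in `FIELDS` below. This minimal sketch assumes that default column order; a custom `-outfmt "6 ..."` column list would need a different `FIELDS`. It parses the rows and keeps hits at or below an E-value cutoff. The 1e-5 default is an arbitrary illustration, not a recommended threshold.

```python
from dataclasses import dataclass

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

@dataclass
class Hit:
    query: str
    subject: str
    pident: float
    qstart: int
    qend: int
    sstart: int
    send: int
    evalue: float
    bitscore: float

def parse_tabular(text: str) -> list[Hit]:
    """Turn tabular BLAST output into Hit records."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines (as in -outfmt 7)
        row = dict(zip(FIELDS, line.split("\t")))
        hits.append(Hit(row["qseqid"], row["sseqid"], float(row["pident"]),
                        int(row["qstart"]), int(row["qend"]),
                        int(row["sstart"]), int(row["send"]),
                        float(row["evalue"]), float(row["bitscore"])))
    return hits

def significant(hits: list[Hit], max_evalue: float = 1e-5) -> list[Hit]:
    """Keep hits at or below the E-value cutoff, lowest E-value first."""
    return sorted((h for h in hits if h.evalue <= max_evalue), key=lambda h: h.evalue)
```

In nucleotide searches, a hit with `send < sstart` lies on the minus strand of the subject.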
10
Five BLAST Programs
BLASTN – nucleotide query vs. nucleotide database
BLASTP – protein query vs. protein database
BLASTX – translated nucleotide query vs. protein database
TBLASTN – protein query vs. translated nucleotide database
TBLASTX – translated nucleotide query vs. translated nucleotide database
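Which of the five programs to use depends only on the molecule types of the query and the database. The hypothetical helper below encodes that table. Its lowercase return values match the BLAST+ executable names.

```python
def choose_blast_program(query: str, database: str, translate_both: bool = False) -> str:
    """Pick a BLAST program from the query and database molecule types.

    query, database: "nucleotide" or "protein".
    translate_both: for nucleotide vs. nucleotide, compare translations
    (TBLASTX) instead of the DNA itself (BLASTN).
    """
    table = {
        ("nucleotide", "nucleotide"): "tblastx" if translate_both else "blastn",
        ("protein", "protein"): "blastp",
        ("nucleotide", "protein"): "blastx",   # translated query vs. protein database
        ("protein", "nucleotide"): "tblastn",  # protein query vs. translated database
    }
    try:
        return table[(query, database)]
    except KeyError:
        raise ValueError(f"unsupported combination: {query} vs. {database}") from None
```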
11
BLAST family
BLAST 2 Sequences – aligns two sequences and shows a dot plot of the alignment
MegaBLAST – fast search for nearly exact matches
PSI-BLAST – Position-Specific Iterated BLAST; refines a position-specific scoring matrix over repeated searches to find distantly related proteins
BLink – displays alignments grouped by taxonomic criteria, database origin, relation to a complete genome, or relation to a 3D protein structure or conserved domain
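A dot plot like the one BLAST 2 Sequences draws puts one sequence on each axis and marks positions where they match. The sketch below assumes a simple exact match over a fixed window. It is not NCBI's implementation. A diagonal run of `*` marks a shared segment.

```python
def dot_plot(seq_a: str, seq_b: str, window: int = 3) -> list[list[bool]]:
    """Cell (i, j) is True when the windows starting at seq_a[i] and seq_b[j] match exactly."""
    rows = max(len(seq_a) - window + 1, 0)
    cols = max(len(seq_b) - window + 1, 0)
    return [[seq_a[i:i + window] == seq_b[j:j + window] for j in range(cols)]
            for i in range(rows)]

def render(grid: list[list[bool]]) -> str:
    """Draw the grid as text: '*' for a match, '.' otherwise."""
    return "\n".join("".join("*" if cell else "." for cell in row) for row in grid)
```

Comparing a sequence with itself gives the main diagonal. Internal repeats appear as runs off that diagonal.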
12
Gene-Level Sequences
UniGene – partitions GenBank sequences, mostly ESTs, into a non-redundant set of gene-oriented clusters
ProtEST – displays precomputed BLAST alignments between protein sequences from model organisms and six-frame translations of UniGene nucleotide sequences
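ProtEST compares proteins against six-frame translations: three reading frames on the given strand and three on its reverse complement. Below is a minimal sketch using the standard genetic code. The function names are illustrative, and any codon containing an ambiguous base is translated as `X`.

```python
BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

def reverse_complement(seq: str) -> str:
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]

def translate(seq: str) -> str:
    """Translate complete codons; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))

def six_frames(seq: str) -> dict[str, str]:
    """Translations of frames +1..+3 (given strand) and -1..-3 (reverse complement)."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {f"+{f + 1}": translate(seq[f:]) for f in range(3)}
    frames.update({f"-{f + 1}": translate(rc[f:]) for f in range(3)})
    return frames
```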
13
Gene-Level Sequences
HomoloGene – curated and calculated orthologs and homologs for 14 organisms
RefSeq – curated reference sequences for mRNAs, genomic sequences, etc.
ORF Finder – six-frame translation with a graph of ORF positions
ePCR (electronic PCR) – locates sequence-tagged sites (STSs) in DNA sequences
dbSNP – SNPs and short insertions/deletions (indels)
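ORF Finder reports open reading frames found in the six frames. The simplified sketch below assumes an ORF runs from ATG to the first in-frame stop codon. It also assumes `min_codons`, a hypothetical parameter, sets the minimum length. NCBI's tool has additional options, such as other genetic codes, that are not modeled here.

```python
BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON = {a + b + c: AMINO[16 * i + 4 * j + k]
         for i, a in enumerate(BASES) for j, b in enumerate(BASES) for k, c in enumerate(BASES)}

def revcomp(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_orfs(seq: str, min_codons: int = 30) -> list[tuple[str, int, int, str]]:
    """Return (strand, start, end, protein) for ATG...stop ORFs in all six frames.

    start/end are 0-based, end-exclusive forward-strand coordinates that include
    the stop codon; the protein string excludes the stop.
    """
    seq = seq.upper()
    n = len(seq)
    orfs = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for frame in range(3):
            start = None  # position of the currently open ATG, if any
            for i in range(frame, n - 2, 3):
                aa = CODON.get(s[i:i + 3], "X")
                if start is None and aa == "M":
                    start = i
                elif start is not None and aa == "*":
                    if (i - start) // 3 >= min_codons:
                        protein = "".join(CODON.get(s[p:p + 3], "X") for p in range(start, i, 3))
                        # Map reverse-strand coordinates back onto the forward strand.
                        lo, hi = (start, i + 3) if strand == "+" else (n - (i + 3), n - start)
                        orfs.append((strand, lo, hi, protein))
                    start = None
    return orfs
```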
14
Genome-Scale Analysis
Entrez Genomes – taxonomic, genome, or chromosome view of the current sequence data for an organism
COGs (Clusters of Orthologous Groups) – orthologous protein groups from completely sequenced genomes
Retroviral genotyping tools – used to study viral genetic diversity, track outbreaks, and support vaccine development
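A common building block for calling orthologs is the reciprocal best hit: protein a in genome A and protein b in genome B are each other's highest-scoring match. The published COG procedure is more involved, since it groups best hits across three or more genomes. This sketch shows only the two-genome step. The score dictionaries (for example, bit scores keyed by (query, subject)) are an assumed input format.

```python
def best_hits(scores: dict[tuple[str, str], float]) -> dict[str, str]:
    """For each query, the subject with the highest score."""
    best: dict[str, tuple[str, float]] = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {q: s for q, (s, _) in best.items()}

def reciprocal_best_hits(a_vs_b: dict[tuple[str, str], float],
                         b_vs_a: dict[tuple[str, str], float]) -> set[tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    ab = best_hits(a_vs_b)
    ba = best_hits(b_vs_a)
    return {(a, b) for a, b in ab.items() if ba.get(b) == a}
```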
15
Genome-Scale Analysis
Eukaryotic Genomic Resources – home of Plant Genomes Central, with information from various plant genome projects
Map Viewer – displays genome assemblies as chromosome maps
Model Maker (MM) – builds transcript models from predicted exons or from exons in GenBank alignments
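At its core, a transcript model is the genomic sequence of its exons joined in order. The hypothetical helper below shows that step. It is not Model Maker's code. It assumes 0-based, end-exclusive exon coordinates on the forward strand, and for a minus-strand gene it returns the reverse complement of the spliced sequence.

```python
def build_transcript(genome: str, exons: list[tuple[int, int]], strand: str = "+") -> str:
    """Concatenate exon sequences into a transcript model.

    exons: 0-based, end-exclusive (start, end) intervals on the forward strand.
    """
    ordered = sorted(exons)
    # Exons of a single transcript model should not overlap.
    for (s1, e1), (s2, e2) in zip(ordered, ordered[1:]):
        if s2 < e1:
            raise ValueError(f"overlapping exons: {(s1, e1)} and {(s2, e2)}")
    spliced = "".join(genome[s:e] for s, e in ordered)
    if strand == "-":
        spliced = spliced.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]
    return spliced
```

With exons (0, 6) and (12, 17) on `"ATGCCCGTAAGTTTTGA"`, the intron `GTAAGT` is dropped and the model is `ATGCCCTTTGA`.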
16
Genome-Scale Analysis
Evidence Viewer – graphical summary of alignments to contigs, showing insertions/deletions and mismatches
Human-Mouse Homology Maps – lists genes in homologous chromosome segments
Cancer Chromosome Aberration Project – lists recurrent chromosome aberrations associated with cancer
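Evidence Viewer summarizes where an alignment has mismatches and insertions/deletions. These are also the raw material for candidate SNPs and indels. The minimal sketch below labels each column of a gapped pairwise alignment. It assumes `-` is the gap character and the first row is the reference. "Insertion" and "deletion" are named relative to that reference, which is itself an assumption.

```python
from collections import Counter

def classify_columns(ref: str, query: str) -> list[tuple[int, str]]:
    """Label each alignment column as match, mismatch, insertion, or deletion.

    Returns (reference_position, label); reference_position is the 0-based index
    of the most recent reference base (-1 before the first one).
    """
    if len(ref) != len(query):
        raise ValueError("aligned sequences must have equal length")
    labels = []
    pos = -1
    for r, q in zip(ref.upper(), query.upper()):
        if r == "-" and q == "-":
            continue  # a column gapped in both rows carries no information
        if r != "-":
            pos += 1
        if r == "-":
            labels.append((pos, "insertion"))   # extra base in the query
        elif q == "-":
            labels.append((pos, "deletion"))    # reference base missing from the query
        else:
            labels.append((pos, "match" if r == q else "mismatch"))
    return labels

def summarize(labels: list[tuple[int, str]]) -> Counter:
    return Counter(label for _, label in labels)
```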
17
Gene Expression/Phenotype
SAGEmap – views SAGE data, including two-way mapping between SAGE tags and UniGene clusters
Gene Expression Omnibus (GEO) – data repository and retrieval system for expression data from all sources
OMIM – catalog of human genes and genetic disorders, including phenotypes and polymorphism information
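In classic SAGE, the anchoring enzyme NlaIII cuts at CATG, and each tag is the 10 bases immediately 3' of that site. Mapping tags to clusters then amounts to predicting the tag from each cluster's sequence. The sketch assumes that protocol and uses the 3'-most CATG in a sequence already oriented 5' to 3'. The cluster IDs are made up. A real mapping also has to handle sequence orientation and errors, which this sketch ignores.

```python
from typing import Optional

def sage_tag(transcript: str, anchor: str = "CATG", tag_length: int = 10) -> Optional[str]:
    """Return the tag immediately 3' of the last anchor site, or None."""
    seq = transcript.upper()
    site = seq.rfind(anchor)
    if site == -1:
        return None
    tag = seq[site + len(anchor): site + len(anchor) + tag_length]
    return tag if len(tag) == tag_length else None  # too close to the end: no full tag

def build_tag_maps(clusters: dict[str, str]) -> tuple[dict[str, set[str]], dict[str, str]]:
    """Two-way maps: tag -> cluster IDs, and cluster ID -> tag."""
    tag_to_clusters: dict[str, set[str]] = {}
    cluster_to_tag: dict[str, str] = {}
    for cluster_id, seq in clusters.items():
        tag = sage_tag(seq)
        if tag is None:
            continue
        cluster_to_tag[cluster_id] = tag
        tag_to_clusters.setdefault(tag, set()).add(cluster_id)
    return tag_to_clusters, cluster_to_tag
```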
18
MMDB, CDD, CDART
Molecular Modeling Database (MMDB) – based on the Protein Data Bank
Conserved Domain Database (CDD) – PSI-BLAST-derived scoring matrices used to identify conserved domains, with links to structures in the Protein Data Bank
Conserved Domain Architecture Retrieval Tool (CDART) – identifies conserved domains and displays a protein's domain architecture
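PSI-BLAST, and the CDD models derived from it, score sequences with a position-specific scoring matrix (PSSM), which holds one log-odds score per residue per alignment column. The minimal sketch below assumes uniform background frequencies and a flat pseudocount. PSI-BLAST itself uses sequence weighting and more elaborate pseudocounts, so its numbers will differ.

```python
import math

AA = "ACDEFGHIKLMNPQRSTVWY"

def pssm(alignment: list[str], pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Log-odds PSSM: log2(observed frequency / background) per residue per column.

    Assumes a uniform 1/20 background; gaps ('-') are skipped when counting.
    """
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("sequences must be aligned to equal length")
    background = 1 / len(AA)
    matrix = []
    for col in range(length):
        residues = [s[col] for s in alignment if s[col] in AA]
        total = len(residues) + pseudocount * len(AA)
        matrix.append({aa: math.log2((residues.count(aa) + pseudocount) / total / background)
                       for aa in AA})
    return matrix

def score(matrix: list[dict[str, float]], segment: str) -> float:
    """Sum the column scores for a segment of the same length as the matrix."""
    return sum(col[aa] for col, aa in zip(matrix, segment))
```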