MCB 371/372 Student Projects Databanks 3/16/05 Peter Gogarten Office: BSP 404 phone: 860 486-4061,

Slides:



Advertisements
Similar presentations
© Wiley Publishing All Rights Reserved. Using Nucleotide Sequence Databases.
Advertisements

Databanks (A) NCBINCBI (National Center for Biotechnology Information) is a home for many public biological databases (see an older diagram below). All.
1 Genome information GenBank (Entrez nucleotide) Species-specific databases Protein sequence GenBank (Entrez protein) UniProtKB (SwissProt) Protein structure.
Creating NCBI The late Senator Claude Pepper recognized the importance of computerized information processing methods for the conduct of biomedical research.
Genome databases and webtools for genome analysis Become familiar with microbial genome databases Use some of the tools useful for analyzing genome Visit.
On line (DNA and amino acid) Sequence Information Lecture 7.
HCS806 “Methods in Horticulture and Crop Science” Introduction to methods in Bioinformatics for plant science. David Francis (Coordinator) Ian Holford.
1 Welcome to the Protein Database Tutorial This tutorial will describe how to navigate the section of Gramene that provides collective information on proteins.
Computer lab exercises #8. Comments on projects worth sharing: 1.Use BLINK whenever possible. It can save a lot of waiting and greatly accelerates explorations.
BIOINFORMATICS Ency Lee.
How to use the web for bioinformatics Molecular Technologies Ethan Strauss X 1171
Bioinformatics for biomedicine Summary and conclusions. Further analysis of a favorite gene Lecture 8, Per Kraulis
MCB 5472 Student Projects Databanks, Blast possibly unix Perl J. Peter Gogarten Office: BPB 404 phone: ,
Archives and Information Retrieval
Sequence Analysis MUPGRET June workshops. Today What can you do with the sequence? What can you do with the ESTs? The case of SNP and Indel.
Sequence and structure databanks can be divided into many different categories. One of the most important is Supervised databanks with gatekeeper. Examples:
Bioinformatics and Phylogenetic Analysis
Lecture 2.21 Retrieving Information: Using Entrez.
How to use the web for bioinformatics Molecular Technologies February 11, 2005 Ethan Strauss X 1373
The Protein Data Bank (PDB)
The Poor Beginners’ Guide to Bioinformatics. What we have – and don’t have... a computer connected to the Internet (incl. Web browser) a text editor (Notepad.
We are developing a web database for plant comparative genomics, named Phytome, that, when complete, will integrate organismal phylogenies, genetic maps.
MCB 372 Student Projects Databanks, Blast possibly unix Perl J. Peter Gogarten Office: BPB 404 phone: ,
Bioinformatics Unit 1: Data Bases and Alignments Lecture 3: “Homology” Searches and Sequence Alignments (cont.) The Mechanics of Alignments.
How to use the web for bioinformatics Ethan Strauss X 1171
ExPASy - Expert Protein Analysis System The bioinformatics resource portal and other resources An Overview.
Making Sense of DNA and protein sequence analysis tools (course #2) Dave Baumler Genome Center of Wisconsin,
On line (DNA and amino acid) Sequence Information
Bioinformatics.
NCBI’s Bioinformatics Resources Michele R. Tennant, Ph.D., M.L.I.S. Health Science Center Libraries U.F. Genetics Institute January 2015.
Introduction to Bioinformatics CPSC 265. Interface of biology and computer science Analysis of proteins, genes and genomes using computer algorithms and.
CANDID: A candidate gene identification tool Janna Hutz March 19, 2007.
Copyright OpenHelix. No use or reproduction without express written consent 2 Overview of Genome Browsers Materials prepared by Warren C. Lathe, Ph.D.
1 Orthology and paralogy A practical approach Searching the primaries Searching the secondaries Significance of database matches DB Web addresses Software.
Regulatory Genomics Lab Saurabh Sinha Regulatory Genomics Lab v1 | Saurabh Sinha1 Powerpoint by Casey Hanson.
Genome databases and webtools for genome analysis Become familiar with microbial genome databases Use some of the tools useful for analyzing genome Visit.
Copyright © 2010 Pearson Education Inc. Lecture 01 – Genetics & Genomics: An Introduction Based on Chapter 1 – Genetics: An introduction.
Sequence-based Similarity Module (BLAST & CDD only ) & Horizontal Gene Transfer Module (Ortholog Neighborhood & GC content only)
Browsing the Genome Using Genome Browsers to Visualize and Mine Data.
Introduction to Bioinformatics Biostatistics & Medical Informatics 576 Computer Sciences 576 Fall 2008 Colin Dewey Dept. of Biostatistics & Medical Informatics.
Copyright OpenHelix. No use or reproduction without express written consent1.
Function preserves sequences
AdvancedBioinformatics Biostatistics & Medical Informatics 776 Computer Sciences 776 Spring 2002 Mark Craven Dept. of Biostatistics & Medical Informatics.
P HYLO P AT : AN UPDATED VERSION OF THE PHYLOGENETIC PATTERN DATABASE CONTAINS GENE NEIGHBORHOOD Presenter: Reihaneh Rabbany Presented in Bioinformatics.
Using blast to study gene evolution – an example.
Algorithms for Biological Sequence Analysis Kun-Mao Chao ( 趙坤茂 ) Department of Computer Science and Information Engineering National Taiwan University,
Bioinformatics and Computational Biology
Exploring and Exploiting the Biological Maze Zoé Lacroix Arizona State University.
Primary vs. Secondary Databases Primary databases are repositories of “raw” data. These are also referred to as archival databases. -This is one of the.
EBI is an Outstation of the European Molecular Biology Laboratory. UniProtKB Sandra Orchard.
Copyright OpenHelix. No use or reproduction without express written consent1.
Protein sequence databases Petri Törönen Shamelessly copied from material done by Eija Korpelainen This also includes old material from my thesis
An Introduction to NCBI & BLAST National Center for Biotechnology Information Richard Johnston Pasadena City College.
Copyright OpenHelix. No use or reproduction without express written consent1.
Summer Bioinformatics Workshop 2008 BLAST Chi-Cheng Lin, Ph.D., Professor Department of Computer Science Winona State University – Rochester Center
NCBI: something old, something new. What is NCBI? Create automated systems for knowledge about molecular biology, biochemistry, and genetics. Perform.
Welcome to the Protein Database Tutorial. This tutorial will describe how to navigate the section of Gramene that provides collective information on proteins.
Computer Applications and Bioinformatics
Basics of BLAST Basic BLAST Search - What is BLAST?
Archives and Information Retrieval
Pipelines for Computational Analysis (Bioinformatics)
What is Bioinformatics?
Genome Annotation Continued
Genome Center of Wisconsin, UW-Madison
Genomes and Their Evolution
There are four levels of structure in proteins
Student Projects Databanks, Blast possibly unix Perl
Regulatory Genomics Lab
Unit Genomic sequencing
Regulatory Genomics Lab
Presentation transcript:

MCB 371/372 Student Projects Databanks 3/16/05 Peter Gogarten Office: BSP 404 phone: ,

Student Projects Should be related to your interests Examples for possible projects:

Example 1: Evolution of a gene family When in the evolution of evolution of the interferon (or what ever you are interested in) gene family did gene duplications occur? Which of the resulting subfamilies have a acquired a new function? What is the phylogenetic distribution of this subfamily? (Would you expect members of this subfamily to be present in insects, fish, chicken, fungi, archaea?) Can you detect episodes of positive selection? Is there anything that would suggest gene conversion events? The “ to-do-list” would include: gather data (note for some of the questions mentioned above you’ll need aa and nucleotide sequences), align sequences build phylogenies analyze sequences assess reliability of branches INTERPRET WHAT YOU GOT!

Example 2: Can one detect a distinct second peak in the divergence of putatively chimeric genomes? Genome fusions are the latest rage in evolutionary biology: For example: Koonin EV, Mushegian AR, Galperin MY, Walker DR. Comparison of archaeal and bacterial genomes: computer analysis of protein sequences predicts novel functions and suggests a chimeric origin for the archaea. Mol Microbiol Aug;25(4): The Eukaryotes are a chimera of at least an archaeal like host cell and a bacterium that evolved into a mitochondrium (+ in some cases a cyanobacterium that evolved into a plastid) The Haloarchaea contain many bacterial genes The Thermotogales contain many archaeal genes In most of these instances it is not clear that the transfer really occurred in a single massive event, or if the transfers occurred on a gene by gene basis.

Example 2: Chimera? continued In case of a chimera formed in a single historic event one would expect A)Two distinct types of phylogenetic affinity. E.g.: Genes in Thermotoga maritima should either group with the sistergroup of the bacterial partner, or with the sistergroup of the archaeal donor

(a) concordant genes, (b) according to 16S (and other conserved genes) (c) according to phylogenetically discordant genes Gophna, U., Doolittle, W.F. & Charlebois, R.L.: Weighted genome trees: refinements and applications. J. Bacteriol. (in press) The Phylogenetic position of Thermotoga maritima

Example 2: Chimera? continued In case of a chimera formed in a single historic event one would expect A)Two distinct types of phylogenetic affinity. E.g.: Genes in Thermotoga maritima should either group with the sistergroup of the bacterial partner, or with the sistergroup of the archaeal donor B) Two distinct peaks in a divergence histogram. E.g.: If one measures the divergence Thermotoga – Archaea for all the individual genes, under the assumption of a chimera formation one should obtain a bimodal distribution in a histogram of the different genes.

Histogram of divergence to archaeal genes for two bacterial genomes For each encoded protein BLAST searches were performed against the proteins in 5 archaeal genomes (Pyrococcus abysii, P. furiosus, Archaeoglobus fulgidus, Methanocaldococcus janaschii, and Methanothermobacter thermautotrophicus). The highest (bitscore divided by the alignment lengths) was utilized as a measure of sequence similarity. Relative sequence divergence between two sequences was calculates as (1-similarity(b_a)/similarity(b_b)), where similarity(b_a) is the similarity score for a bacterial sequence with the most similar archaeal one, and similarity(b_b) is the similarity score of the bacterial sequence compared with itself.

Example 2, continued The “ to-do-list” would include: Formulate the question you want to address Find a computer where you can run blastall (this might take a couple of hours) Download and analyze the required genomes Analyze the results in an Excel spreadsheet Selected some genes (e.g., the ones that are most archaeal), assemble gene families and reconstruct their phylogenies. INTERPRET YOUR RESULTS! What does it all mean?

Example 3: Gene versus Genome Duplications The same approach as suggested for the chimera formation can be applied to the question was the whole genome or a large segment of an organism’s genome duplicated, or did the duplications occur in a piecemeal fashion? Frequency distributions of d S in human and mouse between the members of two-member gene families located on the same and different chromosomes From: Robert Friedman and Austin L. Hughes: Two Patterns of Genome Organization in Mammals: the Chromosomal Distribution of Duplicate Genes in Human and Mouse. Mol. Biol. Evol. 21(6):1008– Bob Friedman will be the new Bioinformatics services specialist at UConn’s Bioinformatics services facility !

Assignments: Pick a topic for your student project! Please, don’t hesitate to send me an in case you have a question. Let me know what you are interested in ( ). What we will do in this course will in part depend on your interests. Reading for Monday: Read through the NCBI's BLAST tutorialBLAST tutorial

Databanks (A) NCBINCBI (National Center for Biotechnology Information) is a home for many public biological databases (see an older diagram below). All of the databases are interlinked, and they all have common search and retrieval system - Entrez.Entrez Another more complete representation with an interactive display of the number of the connections between the different databases in ENTRZ is here.here.

Entrez / Pubmed, continued An interactive Pubmed tutorial click here.here An Entrez tutorial (non interactive) is herehere Use Boolean operators (AND, OR, NOT) to perform advanced searches. Here is an explanation of the Boolean operators from the Library of Congress Help Page. Here Explore features of Entrez interface: Limits, Index, History, and Clipboard.Entrez Search Field Tags- Listed here.here Demonstrate Link>Books

Other Literature databanks and Services While Pubmed is incorporating more and more non-medical literature, there might still be gaps in the coverage. Alternatives are local services offered at the UConn libraries. Especially Current Contents and Agricola nicely complement PubMed. The best way to access them is the use of "SilverPlatter" database. "SilverPlatter" Also, the "Web of Science" database gives access to the Science Citation Index: a database that tracks cited references in journals. Note that these resources are restricted to UConn domain, so you either need to access it from a campus computer or through the proxy account."Web of Science"through the proxy account.

Search Robots PubCrawlerPubCrawler allows to run predefined literature searches. Results are written into a database and you are send an , if there were new results. NCBI now offers a similar service (see My NCBI (Chubby), check the tutorial). Swiss-ShopSwiss-Shop is offering the same service for proteins

Sequence and structure databanks can be divided into many different categories. One of the most important is Supervised databanks with gatekeeper. Examples: Swissprot Refseq (at NCBI) Entries are checked for accuracy. + more reliable annotations -- frequently out of date Repositories without gatekeeper. Examples: GenBank EMBL TrEMBL Everything is accepted + everything is availabel -- many duplicates -- poor reliability of annotations

Other web pages besides the NCBI Nucleic Acid Research Database Issue Every year, the first issue of Nucleic Acid Research is devoted to updates on biological databases.Nucleic Acid Research Database Issue The European homolog/analog to NCBI. The US ribosomal databank projecthttp://rdp.cme.msu.edu/ Several organism specific resources: Yeast and Arabidopsis genome projectshttp://genome- Database of Drosophila Genomehttp:// TAIR - The Arabidopsis Information Resourcehttp:// The European ribosomal databank projecthttp:// The Joint Genome Institutehttp://genome.jgi-psf.org/mic_home.html A recent (well hidden) addition is the integrated microbial genomes site at the coolest feature is the selected gene neighborhoods. Most up to date information on ongoing and completed genome projects – free for academic users.