Genomic and comparative genomic analysis. BIO520 Bioinformatics, Jim Lund.



Comparative genomics delivers: clues as to human disease genes and evolutionary history; evidence of general trends in genome evolution; previously unknown regulatory strategies; the “natural history” of species as apparent in genome records; surprises.

Difference is in scale and direction.
Other “omics”: one or several genes compared against all other known genes. Use the genome to inform us about the entire organism.
Comparative: entire genome compared to other entire genomes. Use information from many genomes to learn more about the individual genes.

What are some questions that comparative genomics can address? How has the organism evolved? What differentiates species? Which non-coding regions are important? Which genes are required for organisms to survive in a certain environment? (prokaryotes)

Genomic characteristics observed in recently diverged species. [Figure: phylogenetic tree of taxa A to F against time (My).] Organism-specific differences in gene regulation are more apparent than differences in genome sequence or structure. Relatively small amount of neutral drift. Apparent positive selection. Some chromosomal rearrangement. Minimal species-specific gene innovation.

Genomic characteristics observed in species that have diverged ~80 MYA. [Figure: phylogenetic tree of taxa A to F against time (My).] Chromosomal rearrangements dominate organizational change. Changes in chromosome number are likely. Syntenic regions are conserved within rearrangements. Highly conserved features indicate purifying selection against a background of drift, and therefore mark important genomic features held in common. Protein domain arrangements are largely conserved among orthologs. Species-specific gene duplication, divergence, and/or loss.

Genomic characteristics observed between species that have diverged ~1 BYA. [Figure: phylogenetic tree of the compared taxa against time (My).] Genome structure has no resolvable large- or small-scale homology. Cis-regulatory regions do not correspond. Greatest conservation is at the functional level, in some protein domains and functional RNA. Different strategies in gene organization and regulation. Apparent homology in shared-ancestral systems, such as energy processing and storage.

Different Questions Require Different Comparisons. From: Hardison. PLoS Biology. Vol 1 (2):

What is compared?
Gene location
Gene structure
–Exon number
–Exon lengths
–Intron lengths
–Sequence similarity
Gene characteristics
–Splice sites
–Codon usage
–Conserved synteny

From: Miller et al. Annu. Rev. Genom. Human. Genet : [Figure; time axis in millions of years.]

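Reminder: Orthologues & Paralogues. [Figure: globin gene tree along time t. An early globin gene undergoes a first duplication event, giving an alpha chain and a beta chain. A second event (speciation) splits each chain into frog and human copies: frog alpha, human alpha, frog beta, human beta. Orthologues are separated by the speciation event (for example frog alpha and human alpha); paralogues are separated by the duplication event (for example human alpha and human beta).]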
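The globin reminder can be turned into a small sketch: two genes are called orthologues when the event at their last common ancestor is a speciation, and paralogues when it is a duplication. The nested-tuple tree encoding and the function names below are assumptions made for illustration, not part of the lecture.

```python
# Hypothetical encoding of the globin tree from the slide.
# An internal node is (event, left_child, right_child); a leaf is a gene name.
GLOBIN_TREE = (
    "duplication",
    ("speciation", "frog_alpha", "human_alpha"),
    ("speciation", "frog_beta", "human_beta"),
)


def leaves(node):
    """Return the set of gene names below a node."""
    if isinstance(node, str):
        return {node}
    _, left, right = node
    return leaves(left) | leaves(right)


def relationship(node, gene_a, gene_b):
    """Classify two distinct genes by the event at their last common ancestor."""
    event, left, right = node
    for child in (left, right):
        # Descend while both genes still sit under the same internal child.
        if not isinstance(child, str) and {gene_a, gene_b} <= leaves(child):
            return relationship(child, gene_a, gene_b)
    return "orthologues" if event == "speciation" else "paralogues"


print(relationship(GLOBIN_TREE, "frog_alpha", "human_alpha"))  # orthologues
print(relationship(GLOBIN_TREE, "human_alpha", "human_beta"))  # paralogues
```

Note that frog alpha and human beta also come out as paralogues under this rule, because their last common ancestor is the first duplication event.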

Figure 1 Regions of the human and mouse homologous genes: Coding exons (white), noncoding exons (gray), introns (dark gray), and intergenic regions (black). Corresponding strong (white) and weak (gray) alignment regions of GLASS are shown connected with arrows. Dark lines connecting the alignment regions denote very weak or no alignment. The predicted coding regions of ROSETTA in human, and the corresponding regions in mouse, are shown (white) between the genes and the alignment regions.

Example Functional elements: Gene regulation? Chromatin structure?

Terminologies (Cont’d)
–Synteny: two or more genes that are located on the same chromosome. Relevant within a species.
–Conserved synteny: orthologs of genes that are syntenic in one species are also located on a single chromosome in a second species. Gene order is irrelevant.
–Conserved segments/linkages: in a segment of DNA, the order of multiple orthologous genes is the same in two species.
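The difference between conserved synteny and a conserved segment can be made concrete with a minimal sketch. The gene names, chromosome labels and function names are hypothetical, and orthologous genes are assumed to carry the same identifier in both species.

```python
def conserved_synteny(genes, chrom_in_a, chrom_in_b):
    """True if the genes share one chromosome in species A and their
    orthologs share one chromosome in species B. Gene order is ignored."""
    one_chrom_a = len({chrom_in_a[g] for g in genes}) == 1
    one_chrom_b = len({chrom_in_b[g] for g in genes}) == 1
    return one_chrom_a and one_chrom_b


def conserved_segment(order_in_a, order_in_b):
    """True if the orthologous genes occur in the same order in both species."""
    return list(order_in_a) == list(order_in_b)


# Hypothetical example: three orthologous genes, labelled g1..g3 in both species.
chrom_in_a = {"g1": "chr2", "g2": "chr2", "g3": "chr2"}
chrom_in_b = {"g1": "chr7", "g2": "chr7", "g3": "chr7"}
order_in_a = ["g1", "g2", "g3"]
order_in_b = ["g2", "g1", "g3"]

print(conserved_synteny(order_in_a, chrom_in_a, chrom_in_b))  # True
print(conserved_segment(order_in_a, order_in_b))              # False
```

Here the three genes show conserved synteny (one chromosome in each species) but do not form a conserved segment, because their order differs.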

Image credit: U.S. Department of Energy Human Genome Program From:

Q: Why do gene pairs in syntenic regions have more significant E scores?

VISTA: a genomic alignment and visualization program. VISTA automatically finds an orthologue for your input sequence and produces a VISTA similarity plot. Example: Rat BAC: gj (AC097115). For alignment, uses the AVID or LAGAN programs. Quickly aligns 100s of kb. Can handle sequence in draft format. Uses an HMM-like algorithm to find strong anchors from a collection of maximal matches. Uses the VISTA browser, a sequence alignment visualization tool. Allows easy visualization of areas with high similarity. Visualization is scalable: allows you to zoom in/out.
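A similarity plot of this kind is, in essence, percent identity computed in a window that slides along a pairwise alignment. The sketch below is a generic illustration of that idea, not VISTA's own code; the function name, the window and step defaults, and the toy sequences are assumptions.

```python
def window_identity(aligned_a, aligned_b, window=100, step=1):
    """Percent identity in sliding windows over two aligned (gapped)
    sequences of equal length. A column counts as a match only when both
    sequences carry the same base; gap columns never match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = [
        int(x == y and x != "-")
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
    ]
    profile = []
    for start in range(0, len(matches) - window + 1, step):
        identical = sum(matches[start:start + window])
        profile.append((start, 100.0 * identical / window))
    return profile


# Toy alignment with one gap and one mismatch in the second half.
seq_a = "ACGTAC-TTC"
seq_b = "ACGTACGTAC"
print(window_identity(seq_a, seq_b, window=5, step=5))
# [(0, 100.0), (5, 60.0)]
```

Plotting the second element of each tuple against the first gives a curve in which conserved regions stand out as peaks.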

Gene: CARP – cardiac ankyrin repeat protein

There are many genomic alignment and visualization tools: BLASTZ/PipMaker, AVID/VISTA, LAGAN/Multi-LAGAN, AVID, BLAT, SSAHA, CONREAL, MUMmer.

Example output from PipMaker

Genomic view of simple sequence categories. Q: What general patterns can be seen? Q: Why do some of the factors correlate with gene density?

Multi-species conservation

Conserved Non-Coding Sequences

What are those MCS (multi-species conserved sequences)?
Regulatory
–Transcription factor binding sites
–miRNAs or miRNA target sites
–Chromosome structure
–Insulator sequences
Structural
–Replication
–Recombination
–Chromosome structure

Between-proteome comparisons. Used to identify orthologs. Protein alignments involve searching a protein from species A against the proteome of species B. Several different bioinformatic approaches have been used to make the comparison: high-scoring reciprocal best hits; COGs (and KOGs); genome-wide phylogenetic analysis.

Using high-scoring reciprocal best hits
High-scoring reciprocal best hits with the same domain structure are most likely orthologs
–Share common ancestry
–Likely to have the same function
–Function likely to be more essential (replication, etc.)
–Genes are not unique to either organism
–E-value should be <0.01 and alignment should stretch over >60% of each protein
High-scoring hits with slightly different domain structures may be orthologous, but it is difficult to tell due to common, conserved domains that have complicated histories
Cluster analysis can help sort this out
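The reciprocal-best-hit rule can be sketched as follows, assuming the search results have already been parsed into records. The `Hit` record, its field names and the toy yeast/worm identifiers are hypothetical. The filter keeps hits whose E-value is below 0.01 (a lower E-value is more significant) and whose alignment covers more than 60% of each protein.

```python
from collections import namedtuple

# Hypothetical record for one parsed search hit; qcov and scov are the
# fractions of the query and subject proteins covered by the alignment.
Hit = namedtuple("Hit", "query subject evalue bitscore qcov scov")


def best_hits(hits, max_evalue=0.01, min_cov=0.60):
    """Map each query to its highest-scoring subject among hits passing the filters."""
    best = {}
    for hit in hits:
        if hit.evalue >= max_evalue or hit.qcov <= min_cov or hit.scov <= min_cov:
            continue
        if hit.query not in best or hit.bitscore > best[hit.query].bitscore:
            best[hit.query] = hit
    return {query: hit.subject for query, hit in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a, **filters):
    """Pairs (a, b) where b is the best hit of a and a is the best hit of b."""
    a_to_b = best_hits(a_vs_b, **filters)
    b_to_a = best_hits(b_vs_a, **filters)
    return sorted((a, b) for a, b in a_to_b.items() if b_to_a.get(b) == a)


# Toy data: yeast proteins searched against worm, and the reverse search.
yeast_vs_worm = [
    Hit("yA", "wA", 1e-50, 200, 0.9, 0.9),
    Hit("yA", "wB", 1e-20, 90, 0.8, 0.8),
    Hit("yB", "wB", 1e-30, 120, 0.7, 0.7),
    Hit("yC", "wC", 1e-5, 50, 0.4, 0.9),   # fails the coverage filter
]
worm_vs_yeast = [
    Hit("wA", "yA", 1e-50, 200, 0.9, 0.9),
    Hit("wB", "yA", 1e-20, 90, 0.8, 0.8),
    Hit("wB", "yB", 1e-30, 120, 0.7, 0.7),
    Hit("wC", "yC", 1e-5, 50, 0.9, 0.4),   # fails the coverage filter
]

print(reciprocal_best_hits(yeast_vs_worm, worm_vs_yeast))
# [('yA', 'wA'), ('yB', 'wB')]
```

This sketch does not check domain structure; that comparison would need domain annotations for each protein.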

Worm v. yeast sequences

Cut-off p-value                                   <e-10      <e-20      <e-50     <e-100
Total num seq groups
Num groups w/ > 2 members
Num (%) of all (6217) yeast proteins in groups    2697 (40)  1848 (30)  888 (14)  330 (5)
Num (%) of all worm proteins in groups            3653 (19)  2497 (13)  1094 (6)  370 (2)

What is COG? The database of Clusters of Orthologous Groups of proteins (COGs) represents an attempt at a phylogenetic classification of the proteins encoded in complete genomes. Each COG consists of individual orthologous proteins or orthologous sets of paralogs from at least 3 lineages and thus corresponds to an ancient conserved domain.

A shortcut for identifying orthologs: the genome-specific best hit (BeT). Given a gene from one genome, the gene from another genome with the highest sequence similarity (the BeT) is taken to be the ortholog.

Algorithm of clustering orthologous groups (overview). Flowchart steps, in order: input protein sequences; all-against-all sequence comparison (gapped BLAST); paralogs; graph of BeTs; ortholog triangle; merge triangles; quality control; COG database.

The ortholog triangle. [Figure: triangle linking proteins a, b, c from genomes A, B, C; multiple alignment.] From the pairwise alignments of AC and AB, we deduce the alignment of BC. We then compare the calculated and deduced alignments of BC: if the two are consistent, the triangle of BeTs is a triangle of orthologs and can initiate a new COG.

Algorithm: merging triangles. Triangles that have a common side are merged until no new ones can be joined. This detects the candidate orthologous sets. [Figure: a simple COG with two yeast paralogs, isoleucyl-tRNA synthetase.]
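The triangle steps can be sketched on a toy graph of BeTs. This is a simplified illustration, not the COG pipeline itself: BeT relationships are treated as undirected edges, the alignment-consistency check and quality control are skipped, and all protein names and function names are hypothetical.

```python
from itertools import combinations


def find_triangles(bet_edges, genome_of):
    """Triangles of BeT edges whose three proteins come from three different genomes."""
    neighbours = {}
    for a, b in bet_edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    triangles = set()
    for a, b, c in combinations(sorted(neighbours), 3):
        linked = b in neighbours[a] and c in neighbours[a] and c in neighbours[b]
        if linked and len({genome_of[a], genome_of[b], genome_of[c]}) == 3:
            triangles.add(frozenset((a, b, c)))
    return triangles


def merge_triangles(triangles):
    """Join triangles that share a side until no more can be joined (union-find)."""
    triangles = sorted(triangles, key=sorted)
    parent = list(range(len(triangles)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_with_side = {}
    for i, triangle in enumerate(triangles):
        for side in combinations(sorted(triangle), 2):
            if side in first_with_side:
                parent[find(i)] = find(first_with_side[side])
            else:
                first_with_side[side] = i
    groups = {}
    for i, triangle in enumerate(triangles):
        groups.setdefault(find(i), set()).update(triangle)
    return sorted(sorted(group) for group in groups.values())


# Toy input: genomes E, H, M, Y; yeast (Y) carries two paralogs, Y1 and Y2.
genome_of = {"E1": "E", "H1": "H", "Y1": "Y", "Y2": "Y",
             "E2": "E", "H2": "H", "M2": "M", "E3": "E", "H3": "H"}
bet_edges = [("E1", "H1"), ("E1", "Y1"), ("H1", "Y1"), ("E1", "Y2"), ("H1", "Y2"),
             ("E2", "H2"), ("E2", "M2"), ("H2", "M2"), ("E3", "H3")]

print(merge_triangles(find_triangles(bet_edges, genome_of)))
# [['E1', 'H1', 'Y1', 'Y2'], ['E2', 'H2', 'M2']]
```

The two triangles that share the side E1, H1 merge into one group containing both yeast paralogs, while the lone edge E3, H3 forms no triangle and is left out.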

Functional and phylogenetic patterns. Species codes: E, E. coli; H, H. influenzae; G, M. genitalium; P, M. pneumoniae; C, Synechocystis sp.; M, M. jannaschii; Y, S. cerevisiae.
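A phyletic pattern is simply a record of which genomes are represented in a COG. As a minimal sketch using the one-letter species codes from the legend, a pattern can be written as a string with a dash for each absent genome; the COG names and memberships below are invented for illustration.

```python
from collections import Counter

# One-letter species codes, in the order given in the figure legend.
SPECIES = "EHGPCMY"


def phyletic_pattern(present, species=SPECIES):
    """Write the set of genomes present in a COG as a fixed-width string,
    with '-' marking each genome that has no member."""
    return "".join(code if code in present else "-" for code in species)


# Hypothetical COG memberships (species codes only).
cogs = {
    "COG_a": set("EHGPCMY"),   # represented in every genome
    "COG_b": {"E", "H", "Y"},
    "COG_c": {"E", "H", "Y"},
}

patterns = {name: phyletic_pattern(members) for name, members in cogs.items()}
print(patterns["COG_b"])           # EH----Y
print(Counter(patterns.values()))  # how many COGs share each pattern
```

Counting identical patterns in this way is one simple route to seeing how many COGs are universal and how many are patchily distributed.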

Phyletic patterns of COGs (2003). 74% of COGs show a scattered distribution, which reflects frequent lineage-specific gene loss and horizontal gene transfer in prokaryotic evolution. [Figure label: ~500 COGs.]

Representation of the 7 analyzed eukaryotic species in KOGs. KOG: eukaryotic orthologous groups.

Phylogenetic patterns of KOGs. All: includes representatives from each of the 7 analyzed species; All-Ec: includes representatives from each of the 6 species other than Encephalitozoon cuniculi; All animals: includes representatives from the three animal genomes only; All fungi: includes representatives from the two fungal genomes only.