28-Way vertebrate alignment and conservation track in the UCSC Genome Browser Journal club Dec. 7, 2007.

Vertebrate genome sequencing
the Broad Institute of MIT (Massachusetts Institute of Technology) and Harvard
the Human Genome Sequencing Center at the Baylor College of Medicine
the Genome Sequencing Center at Washington University
the Sanger Center
the Department of Energy's (DOE's) Joint Genome Institute
the National Institute of Genetics in Japan

Alignment: similarities & differences between genome sequences:
1. functional noncoding regions
2. protein-coding genes
3. non-coding RNA genes

Aims
1. To more reliably identify functional elements via sequence alignment
2. To enhance the effectiveness of disease-model species for experiments
3. To determine the course of evolution and reconstruct the ancestral genome sequence

April 2007: 17 old species data, 6 updated old species, 11 new species

>79% Heterogeneous mix

Coverage:
2 – >99%
16 – 5.1% ~ 8.5%
10 – ~2x (2x – 87.5%, 5x – 99.4%)
Cloning bias …

Applications
Application 1: indels in protein-coding regions
Application 2: conservation of start and stop codons
Application 3: phylogenetic extent of alignment of functional regions

Application 1
Did indels accumulate at a uniform rate during evolution?
What are the phenotypic consequences of human-specific protein indels?
Have the positions of potentially disease-associated indels resisted substitution over evolutionary time (interspecies conservation)?

6-bp indel near the start of PRNP (primates & glires)

Total indels: 209
Number of indels / number per MY (million years)
Parametric bootstrap test: the rates differ significantly from the uniform-rate hypothesis
4/MY, 2/MY
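The uniform-rate test on this slide can be illustrated with a small parametric bootstrap. This is a minimal sketch and not the procedure used in the study: the slides do not state the test statistic, so a chi-square-style statistic is assumed here, and the function names, branch durations, and per-branch counts are hypothetical.

```python
import random
from collections import Counter


def chi_square_stat(counts, expected):
    """Sum of (observed - expected)^2 / expected over branches."""
    return sum((c - e) ** 2 / e for c, e in zip(counts, expected))


def uniform_rate_bootstrap(counts, branch_my, n_sim=2000, seed=0):
    """Parametric bootstrap test of a single indel rate on all branches.

    counts    -- observed indel count per branch (hypothetical input)
    branch_my -- branch duration in millions of years (hypothetical input)
    Returns (observed statistic, bootstrap p-value).
    """
    total = sum(counts)
    total_my = sum(branch_my)
    # Under the uniform-rate hypothesis, each branch expects a share of
    # the indels proportional to its duration.
    expected = [total * t / total_my for t in branch_my]
    observed = chi_square_stat(counts, expected)

    rng = random.Random(seed)
    branches = range(len(counts))
    hits = 0
    for _ in range(n_sim):
        # Simulate: place each indel on a branch with probability
        # proportional to branch duration (assumed null model).
        draw = Counter(rng.choices(branches, weights=branch_my, k=total))
        simulated = [draw.get(i, 0) for i in branches]
        if chi_square_stat(simulated, expected) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value above zero.
    return observed, (hits + 1) / (n_sim + 1)
```

For example, `uniform_rate_bootstrap([120, 89], [30.0, 45.0])` splits 209 indels over two hypothetical branches of 30 and 45 MY; a small p-value would argue against a single uniform rate.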

Human-specific protein indels

SULF1: human-specific 3-bp insertion in exon 11
Replication slippage
1. Fixed in humans
2. Very conserved region (retains 4 Es over 2 billion years)
3. No 3D data

GFM2: human-specific 6-bp insertion
1. Not a conserved region
2. This insertion occurs only in some human individuals
3. 3D data from a similar protein implied no phenotypic consequence

Human disease-associated amino acid replacement mutations are overabundant and occur predominantly in positions essential to the structure and function of the proteins (Subramanian and Kumar, BMC Genomics 2006, 7:306)

Disease-associated deletions
Considering more species
Data from PhenCode Locus Variants (PAH)
Simplified distance: number of distinct amino acids
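The simplified distance can be read as a count of distinct amino acids in one column of the multi-species protein alignment. The sketch below assumes that reading; the function name and the choice to ignore gap characters are assumptions, not details given on the slide.

```python
def distinct_amino_acids(column, gap_chars="-"):
    """Count distinct amino acids in one alignment column.

    column    -- string with one residue (or gap) per species
    gap_chars -- characters treated as gaps and ignored (assumption)
    """
    residues = {aa.upper() for aa in column if aa not in gap_chars}
    return len(residues)
```

A fully conserved column such as `"RRRRRR"` gives 1, so lower values indicate stronger interspecies conservation at that position.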

>79% < Hard to identify precise gene boundaries based on comparative genomics data Drift away

Hypothesis 1: the CpG islands that are common near gene starts are more difficult to sequence

Hypothesis 2: Selection at the start codon might be more relaxed in genes with multiple promoters (alternate promoters)
4%, 1.65%

Hypothesis 3: the program may not have enough surrounding conserved sequence to reliably align the small initial coding exon around the start codon

Conclusion
A bias against CpG islands in the draft sequence, combined with difficulty in aligning small initial coding exons, does explain a great deal of the observed unalignability of start codons compared with stop codons.
Gene models based on multiple genomic alignments must take the start codon into account.

Background: finding functional elements
Conservation in noncoding regions is much more subject to evolutionary turnover than in protein-coding regions.
Evolutionary (conservation) turnover: "Most studies tacitly equate homology of functional elements with sequence homology. This assumption is violated by the phenomenon of turnover, in which functionally equivalent elements reside at locations that are nonorthologous at the sequence level." (Frith et al., Genome Research 2006)
Genomic data from more species: higher resolution

coding exons of RefSeq genes
481 ultraconserved elements
predicted regulatory regions (PRPs)
3900 putative transcriptional regulatory regions (pTRRs)

Alignability: the fraction that aligns with a designated comparison species
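The alignability definition above can be sketched for a single element as a per-base fraction. This is an illustrative reading only: whether the fraction is taken over bases or over whole elements is not stated on the slide, and the function name and the `-` gap convention are assumptions.

```python
def alignability(human_row, other_row, gap="-"):
    """Fraction of human bases aligned to a base in the comparison species.

    human_row, other_row -- equal-length rows from one alignment block
    gap                  -- gap character (assumed convention)
    """
    if len(human_row) != len(other_row):
        raise ValueError("alignment rows must have equal length")
    human_bases = 0
    aligned = 0
    for h, o in zip(human_row, other_row):
        if h == gap:
            continue  # column has no human base, so it is not counted
        human_bases += 1
        if o != gap:
            aligned += 1
    return aligned / human_bases if human_bases else 0.0
```

Computing this value for each class of functional region against each comparison species would give one alignability figure per class and species.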
