[Bejerano Aut07/08] 1 MW 11:00-12:15 in Redwood G19 Profs: Serafim Batzoglou, Gill Bejerano TA: Cory McLean.

[Bejerano Aut07/08] 2 Lecture 6: Vertebrate Comparative Genomics. Sequence Conservation and Function. Chains & Nets.

[Bejerano Aut07/08] 3 Meet Your Genome contd. [Human Molecular Genetics, 3rd Edition]

[Bejerano Aut07/08] 4 Comparative Genomics. [Figure: the same species (human, chimp, macaque, mouse, rat, cow, dog, opossum, platypus, chicken, fugu, tetra, zfish) shown twice: once as separate creations of an "Intelligent Designer", once related by evolution over time t. Adam Siepel, Cornell] “Nothing in Biology Makes Sense Except in the Light of Evolution” (Theodosius Dobzhansky)

[Bejerano Aut07/08] 5 DNA: Functional and Non-Functional. DNA is a linear molecule that carries the instructions for making living organisms: roughly, long string(s) over a small alphabet of four letters, {A,C,G,T}. Strings of length …, e.g. ACGTACGACTGACTAGCATCGACTACGACTAGCAC... Parts of the string are genetic instructions (how to..., when to..., where to...); the rest is “junk” DNA.

[Bejerano Aut07/08] 6 One Cell, One Genome, One Replication. Every cell holds a copy (actually two) of all its DNA, i.e. its genome. The genome is replicated at every cell division. The human body is made of ~10^14 cells, all originating from a single cell through repeated cell divisions. [Figure: cell; genome = all DNA; the chicken's DNA ≈ copies of the egg's DNA, made by cell division; DNA string]
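To put a rough number on "repeated cell divisions", assume (an idealization, not something the slide claims) that every division doubles the cell count. The minimum number of rounds needed to go from one cell to ~10^14 cells is then:

```latex
2^{n} \ge 10^{14}
\quad\Longrightarrow\quad
n \ge \log_2 10^{14} = 14 \log_2 10 \approx 14 \times 3.32 \approx 46.5
```

So some cell lineages must pass through at least 47 successive genome replications. In reality they pass through more, since cells die and divide at different rates.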

[Bejerano Aut07/08] 7 DNA Replication is Imperfect. On a small scale, single letters are substituted, erased or added. [Figure: egg ...ACGTACGACTGACTAGCATCGACTACGA... copied to chicken ...ACGTACGACTGACTAGCATCGACTACGA..., with example edits (TT, CAT).] In junk DNA “anything goes”, but in functional DNA many changes are not tolerated. Thus, sequence conservation over generations implies function!
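The replication-plus-selection argument can be made concrete with a toy simulation. This is a minimal sketch under loudly simplified assumptions, none of them from the lecture. It allows only substitutions (no insertions or deletions) at an arbitrary per-generation rate. It also marks a single hypothetical functional region in which every change is rejected.

```python
import random

ALPHABET = "ACGT"


def replicate(genome: str, functional: range, rate: float, rng: random.Random) -> str:
    """Return one imperfect copy of `genome`.

    Toy model (assumption): each base is substituted independently with
    probability `rate`. Changes inside `functional` are "not tolerated", so we
    crudely model purifying selection by keeping the parent's base there.
    """
    copy = []
    for i, base in enumerate(genome):
        if rng.random() < rate and i not in functional:
            # outside the functional region "anything goes"
            base = rng.choice([b for b in ALPHABET if b != base])
        copy.append(base)
    return "".join(copy)


def identity(a: str, b: str, positions: range) -> float:
    """Fraction of `positions` at which equal-length strings a and b agree."""
    return sum(a[i] == b[i] for i in positions) / len(positions)


rng = random.Random(0)  # fixed seed so the run is reproducible
ancestor = "".join(rng.choice(ALPHABET) for _ in range(300))
functional = range(100, 200)  # hypothetical functional region
descendant = ancestor
for _ in range(1000):  # 1000 generations of copying
    descendant = replicate(descendant, functional, rate=0.001, rng=rng)

print(identity(ancestor, descendant, functional))     # 1.0
print(identity(ancestor, descendant, range(0, 100)))  # well below 1.0
```

After enough generations only the region under selection still matches the ancestor. Reading that match back as "function" is exactly the inference the slide draws.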

[Bejerano Aut07/08] 8 Sequence Conservation implies Function (but which function/s?...). [Figure: human and another species descend from a common ancestor; aligned stretches such as ...CTTTGCGA-TGAGTAGCATCTACTATTT... and ...ACGTGGGACTGACTA-CATCGACTACGA...; a conserved stretch is labeled “functional region!”] This is comparative genomics of distantly related species. Note: the inverse, “no conservation ⇒ no function”, is a much weaker statement given current knowledge.
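In practice, conservation is read off an alignment by scanning for windows of high identity. The sketch below assumes a plain pairwise alignment given as two equal-length strings, with `-` for gaps. It counts gap columns as mismatches, which is one simple convention among several. The rows, window size and threshold are all made up for illustration.

```python
def windowed_identity(a: str, b: str, window: int) -> list[float]:
    """Fraction of identical, ungapped columns in each window of an alignment."""
    if len(a) != len(b):
        raise ValueError("alignment rows must have equal length")
    matches = [x == y and x != "-" for x, y in zip(a, b)]
    return [sum(matches[i:i + window]) / window
            for i in range(len(matches) - window + 1)]


def conserved_windows(a: str, b: str, window: int, threshold: float) -> list[int]:
    """Start columns of windows whose identity reaches `threshold`."""
    return [i for i, f in enumerate(windowed_identity(a, b, window)) if f >= threshold]


# Made-up alignment rows: diverged on the left, identical on the right.
row_a = "ACGTGGGACT" + "GAGTAGCATC"
row_b = "CTTTGCGA-T" + "GAGTAGCATC"
print(conserved_windows(row_a, row_b, window=5, threshold=1.0))
# [9, 10, 11, 12, 13, 14, 15]
```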

[Bejerano Aut07/08] 9 Our Place in the Tree of Life [Human Molecular Genetics, 3rd Edition] (you are here). Which species to compare to? Too close, and purifying selection will be largely indistinguishable from the neutral rate, because too little neutral change has accumulated for conserved regions to stand out. Too far, and many functional orthologs will diverge beyond our ability to accurately align them.

[Bejerano Aut07/08] 10 Metazoans (multicellular animals) [Human Molecular Genetics, 3rd Edition] (you are here)

[Bejerano Aut07/08] 11 Vertebrates: what to sequence? [Human Molecular Genetics, 3rd Edition] [Figure: vertebrate tree (you are here) with Opossum, Lizard and Stickleback added; species marked “too far”, “sweet spot” and “too close”.]

[Bejerano Aut07/08] 12 The Dawn of Whole Genome Comparative Genomics [figure labels: “% DNA alignable”, “95% coding genes shared”]

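[Bejerano Aut07/08] 13 Phylogenetic Shadowing [Human Molecular Genetics, 3rd Edition] [Figure: vertebrate tree (you are here) with Opossum, Lizard and Stickleback; the “too close” group marked.] “Too close” can actually be a boon if you have enough closely related genomes: together their branches add up to enough neutral divergence for conserved regions to stand out.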

[Bejerano Aut07/08] 14 More Species Have Joined Since. Are you sure it’s all orthologous DNA?

[Bejerano Aut07/08] 15 Paralogy & Orthology Chains & Nets

16 Chaining Alignments. Chaining bridges the gulf between syntenic blocks and base-by-base alignments. Local alignments tend to break at transposon insertions, inversions, duplications, etc. Global alignments tend to force non-homologous bases to align. Chaining is a rigorous way of joining together local alignments into larger structures. [Jim Kent’s slides]

17 Chains join together related local alignments [figure: Protease Regulatory Subunit 3]

[Bejerano Aut07/08] 18 Chains. A chain is a sequence of gapless aligned blocks, where there must be no overlaps of blocks' target or query coords within the chain. Within a chain, target and query coords are monotonically non-decreasing (i.e. always increasing or flat). Double-sided gaps are a new capability (blastz can't do that) that allow extremely long chains to be constructed. Not just orthologs, but paralogs too, can result in good chains; but that's useful! Chains should be symmetrical: e.g. swap human-mouse chains to mouse-human, and you should get approximately the same chains as if you chained swapped mouse-human blastz alignments. Chained blastz alignments are not single-coverage in either target or query unless some subsequent filtering (like netting) is done. Chain tracks can contain massive pileups when a piece of the target aligns well to many places in the query. Common causes of this include insufficient masking of repeats and high-copy-number genes (or paralogs). [Angie Hinrichs, UCSC wiki]
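The chain rules above can be checked mechanically: gapless blocks, no overlaps, and coordinates that never decrease. In this sketch the `Block` tuple (target start, query start, size) is a hypothetical in-memory representation, not UCSC's chain file format, and all coordinates are invented. A gap that is positive on both sides is the double-sided gap the slide mentions.

```python
from typing import NamedTuple


class Block(NamedTuple):
    """One gapless aligned block (hypothetical representation, 0-based)."""
    t_start: int  # start in the target (e.g. human)
    q_start: int  # start in the query (e.g. mouse)
    size: int     # same length in both sequences, since the block has no gaps


def is_valid_chain(blocks: list[Block]) -> bool:
    """No overlapping blocks in target or query; coords never go backwards."""
    if any(b.size <= 0 for b in blocks):
        return False
    for prev, nxt in zip(blocks, blocks[1:]):
        if nxt.t_start < prev.t_start + prev.size:  # overlap/backwards in target
            return False
        if nxt.q_start < prev.q_start + prev.size:  # overlap/backwards in query
            return False
    return True


def chain_gaps(blocks: list[Block]) -> list[tuple[int, int]]:
    """(target_gap, query_gap) between consecutive blocks; both > 0 = double-sided."""
    return [(n.t_start - (p.t_start + p.size), n.q_start - (p.q_start + p.size))
            for p, n in zip(blocks, blocks[1:])]


chain = [Block(100, 500, 50), Block(160, 550, 30), Block(400, 900, 20)]
print(is_valid_chain(chain))  # True
print(chain_gaps(chain))      # [(10, 0), (210, 320)]; the second gap is double-sided
print(is_valid_chain([Block(100, 500, 50), Block(140, 560, 10)]))  # False: target overlap
```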

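19 Affine penalties are too harsh for long gaps. [Figure: log count of gaps vs. size of gaps in the mouse/human alignment, correlated with sizes of transposon relics.] Affine gap scores model the red/blue plots as straight lines, i.e. they assume gap frequency falls off exponentially with gap length, whereas long gaps are far more common than that.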
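The contrast between an affine gap cost and one that grows more slowly for long gaps shows up clearly in numbers. This sketch assumes arbitrary affine parameters and a made-up table of (gap length, cost) knots interpolated piecewise-linearly. That shape is similar in spirit to the table-driven gap costs used in chaining (e.g. by UCSC's axtChain), but these particular numbers are hypothetical.

```python
import bisect


def affine_gap(length: int, open_cost: float = 400.0, extend_cost: float = 30.0) -> float:
    """Affine gap cost: grows linearly with gap length (parameters are arbitrary)."""
    return open_cost + extend_cost * length


# Hypothetical (gap length, cost) knots: cost keeps rising, but ever more slowly.
KNOTS = [(1, 400.0), (10, 700.0), (100, 1500.0), (1_000, 3000.0),
         (10_000, 5000.0), (100_000, 7000.0)]


def table_gap(length: int, knots=KNOTS) -> float:
    """Piecewise-linear interpolation between knots; flat below the first knot,
    extrapolated with the last segment's slope beyond the last knot."""
    xs = [x for x, _ in knots]
    if length <= xs[0]:
        return knots[0][1]
    if length >= xs[-1]:
        (x0, y0), (x1, y1) = knots[-2], knots[-1]
        return y1 + (y1 - y0) * (length - x1) / (x1 - x0)
    i = bisect.bisect_right(xs, length)  # first knot strictly to the right
    (x0, y0), (x1, y1) = knots[i - 1], knots[i]
    return y0 + (y1 - y0) * (length - x0) / (x1 - x0)


for length in (10, 1_000, 10_000):
    print(length, affine_gap(length), table_gap(length))
# 10 700.0 700.0
# 1000 30400.0 3000.0
# 10000 300400.0 5000.0
```

Both costs agree for a 10-base gap. The affine cost for a 10 kb gap (e.g. a transposon relic) is, however, two orders of magnitude larger. That is enough to make an aligner or chainer break an otherwise good alignment.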

20 Before and After Chaining

21 Chaining Algorithm. Input: blocks of gapless alignments from blastz. A dynamic program based on the recurrence relationship $\mathrm{score}(B_i) = \max_{j<i} \left[ \mathrm{score}(B_j) + \mathrm{match}(B_i) - \mathrm{gap}(B_i, B_j) \right]$. Uses Miller’s KD-tree algorithm to minimize which parts of the dynamic programming graph to traverse. Timing is $O(N \log N)$, where N is the number of blocks (which is in the hundreds of thousands).
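The recurrence can be implemented directly as an O(N^2) dynamic program. The sketch below leaves out Miller's KD-tree optimization that gives the slide's O(N log N). It makes four assumptions:

- blocks are sorted by target start;
- B_j may precede B_i only if it ends before B_i starts in both sequences;
- a chain may also start fresh at B_i, so its score is at least match(B_i);
- `simple_gap`, a hypothetical cost of the target and query gap lengths, stands in for gap(B_i, B_j).

```python
from typing import Callable, NamedTuple, Optional


class Block(NamedTuple):
    t_start: int
    q_start: int
    size: int
    match: int  # alignment score of this gapless block (hypothetical units)


def chain_blocks(blocks: list[Block],
                 gap: Callable[[int, int], int]) -> tuple[int, list[Block]]:
    """Best-scoring chain by the plain O(N^2) dynamic program."""
    if not blocks:
        return 0, []
    order = sorted(blocks, key=lambda b: (b.t_start, b.q_start))
    score = [0] * len(order)
    back: list[Optional[int]] = [None] * len(order)
    for i, bi in enumerate(order):
        score[i] = bi.match  # option: start a new chain at B_i
        for j in range(i):
            bj = order[j]
            dt = bi.t_start - (bj.t_start + bj.size)  # target gap
            dq = bi.q_start - (bj.q_start + bj.size)  # query gap
            if dt < 0 or dq < 0:
                continue  # B_j does not end before B_i in both sequences
            cand = score[j] + bi.match - gap(dt, dq)
            if cand > score[i]:
                score[i], back[i] = cand, j
    best = max(range(len(order)), key=score.__getitem__)
    chain, k = [], best
    while k is not None:  # trace back through predecessor links
        chain.append(order[k])
        k = back[k]
    return score[best], chain[::-1]


def simple_gap(dt: int, dq: int) -> int:
    """Hypothetical gap cost: 0 for adjacent blocks, else 5 + (dt + dq) // 10."""
    return 0 if dt == dq == 0 else 5 + (dt + dq) // 10


a = Block(0, 0, 100, 100)
b = Block(150, 160, 80, 80)
c = Block(120, 20, 50, 50)  # overlaps a in the query, so it cannot follow a
d = Block(300, 320, 60, 60)
best_score, best_chain = chain_blocks([a, b, c, d], simple_gap)
print(best_score, best_chain)  # score 204, chain a -> b -> d
```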

22 Netting Alignments. Commonly, multiple mouse alignments can be found for a particular human region, particularly for coding regions. Net finds the best mouse match for each human region. Highest-scoring chains are used first. Lower-scoring chains fill in gaps within chains, inducing a natural hierarchy.

23 Net Focuses on Ortholog

[Bejerano Aut07/08] 24 Nets. A net is a hierarchical collection of chains, with the highest-scoring non-overlapping chains on top, and their gaps filled in where possible by lower-scoring chains, for several levels. A net is single-coverage for the target but not for the query. Because it's single-coverage in the target, it's no longer symmetrical. The netter has two outputs, one of which we usually ignore: the target-centric net in query coordinates. The reciprocal-best process uses that output: the query-referenced (but target-centric / target single-coverage) net is turned back into component chains, and then those are netted to get single coverage in the query too; the two outputs of that netting are reciprocal-best in query and target coords. Reciprocal-best nets are symmetrical again. Nets do a good job of filtering out massive pileups by collapsing them down to (usually) a single level. [Angie Hinrichs, UCSC wiki]
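A greedy, base-level caricature of netting takes chains in decreasing score order. Each target base goes to the first chain that covers it, so the result is single-coverage in the target. This is a hypothetical simplification with three limits:

- it does not build the level hierarchy;
- it does not produce the query-side output used for reciprocal best;
- it works per base rather than per gap.

The chain IDs, scores and intervals are invented.

```python
from collections import Counter


def greedy_net(chains, target_len):
    """Assign each target base to at most one chain, highest-scoring chains first.

    chains: list of (chain_id, score, blocks), where blocks are half-open
    (start, end) target intervals covered by that chain's aligned blocks.
    Returns the owning chain_id per target base (None = not covered).
    """
    owner = [None] * target_len
    for chain_id, _score, blocks in sorted(chains, key=lambda c: c[1], reverse=True):
        for start, end in blocks:
            for pos in range(start, end):
                if owner[pos] is None:  # only fill what better chains left open
                    owner[pos] = chain_id
    return owner


chains = [
    ("A", 1000, [(0, 40), (60, 100)]),  # long, high-scoring (ortholog-like) chain
    ("B", 300, [(10, 30)]),             # pileup onto A's region (e.g. a paralog)
    ("C", 200, [(40, 60)]),             # lower-scoring chain filling A's gap
]
owner = greedy_net(chains, 100)
print(Counter(owner))  # Counter({'A': 80, 'C': 20})
```

Chain B, which piles up on bases A already owns, disappears entirely, as in "nets do a good job of filtering out massive pileups". Chain C survives because it fills a gap inside A.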

[Bejerano Aut07/08] 25 "LiftOver chains" are actually chains extracted from nets, or chains filtered by the netting process. [Angie Hinrichs, UCSC wiki]
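Because liftOver chains are ordinary chains, mapping a single position amounts to two steps: find the gapless block that contains it, then apply that block's offset. This is a minimal same-strand sketch with hypothetical block tuples. The real liftOver tool also handles intervals, strands and partial matches, none of which is modeled here.

```python
import bisect


def lift_position(blocks, pos):
    """Map a 0-based target position to the query through one chain.

    blocks: (t_start, q_start, size) gapless blocks of a valid chain, sorted
    by t_start. Returns None if `pos` falls in a gap or outside the chain.
    Same-strand only: reverse-strand chains are ignored in this sketch.
    """
    starts = [t for t, _q, _size in blocks]
    i = bisect.bisect_right(starts, pos) - 1  # last block starting at or before pos
    if i < 0:
        return None
    t_start, q_start, size = blocks[i]
    if pos < t_start + size:
        return q_start + (pos - t_start)
    return None


chain = [(100, 500, 50), (160, 550, 30), (400, 900, 20)]
print([lift_position(chain, p) for p in (120, 155, 165, 410, 50)])
# [520, None, 555, 910, None]
```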

26 Before and After Netting

27 Net highlights rearrangements. A large gap in the top level of the net is filled by an inversion containing two genes. Numerous smaller gaps are filled in by local duplications and processed pseudogenes.

28 Useful in finding pseudogenes. Ensembl and Fgenesh++ automatic gene predictions are confounded by numerous processed pseudogenes. The domain structure of the resulting predicted protein must be interesting!

29 Mouse/Human Rearrangement Statistics. [Table: number of rearrangements of each type per megabase, excluding known transposons.]

30 A Rearrangement Hot Spot. Rearrangements are not evenly distributed: roughly 5% of the genome lies in rearrangement hot spots such as this one. This 350,000-base region lies between two very long chains on chromosome 7.

[Bejerano Aut07/08] 31 Cautionary Note 1

[Bejerano Aut07/08] 32 Cautionary Note 2

[Bejerano Aut07/08] 33 Same Region… same in all the other fish

[Bejerano Aut07/08] 34 Orthology vs. Paralogy

[Bejerano Aut07/08] 35 Conservation Track Documentation

[Bejerano Aut07/08] 36 What People Largely Expected to Find: a gene (how to) and its control region (when & where), proximal in the DNA, within ~10^3 letters. [Figure: genome.ucsc.edu view, 3kb scale]

[Bejerano Aut07/08] 37 Human Genome: 3×10^9 letters. What They Found [Science 2004 Breakthrough of the Year, 5th runner-up]: 1.5% has known function; >50% is junk. Comparing to other species shows that >5% of the human genome is functional: 3x more functional DNA than known! ~10^6 such substrings do not code for protein. What do they do then?
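The slide's "3x" follows directly from its own percentages. Here is a quick worked check, taking the genome as 3×10^9 letters:

```latex
\begin{aligned}
\text{known function:}\quad & 0.015 \times 3\times10^{9} = 4.5\times10^{7}\ \text{letters}\\
\text{functional (by comparison):}\quad & > 0.05 \times 3\times10^{9} = 1.5\times10^{8}\ \text{letters}\\
\text{ratio:}\quad & \frac{1.5\times10^{8}}{4.5\times10^{7}} = \frac{0.05}{0.015} \approx 3.3
\end{aligned}
```

The slide rounds this to "3x more functional DNA than known".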