Lecture 6 : More trees 9/21/09
Biology retreat 2009
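Homework - nucleotide blast ACTGCGTTAC ACTGCCCTACT
Tblastn translates the nucleotide sequence in every reading frame before comparing it to the protein query.
Top strand ACTGCCCTACC (complement TGACGGGATGG): frame 1 = T A L, frame 2 = L P Y, frame 3 = C P T
Other strand read 5' to 3' GGTAGGGCAGT (complement CCATCCCGTCA): frame 1 = G R A, frame 2 = V G Q, frame 3 = . G S (. = stop codon)
Tblastn = 3 + 3 = 6 comparisons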
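The six translations on the slide can be reproduced with a short sketch. This is only an illustration of six-frame translation, not how tblastn itself is implemented; the function names are made up for this example, and the codon table is the standard genetic code.

```python
# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def reverse_complement(dna):
    """Return the other strand, read 5' to 3'."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(dna))

def translate(dna, frame):
    """Translate from offset `frame` (0, 1 or 2); '*' marks a stop codon."""
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)

def six_frames(dna):
    """Three frames on the given strand, then three on the other strand."""
    strands = (dna, reverse_complement(dna))
    return [translate(strand, frame) for strand in strands for frame in range(3)]

print(six_frames("ACTGCCCTACC"))
```

The first three strings are the frames of the top strand and the last three are the frames of the other strand, which is where 3 + 3 = 6 comes from.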
Nature paper on retinal gene therapy. Fig. 1a shows the construct they made in the virus to inject in the eye. It includes the LCR = locus control region, PP = proximal promoter, RHLOPS = recombinant human long wavelength opsin. Fig. 1b shows the light they illuminate the eye with to test for response to red wavelengths. Fig. 1c shows the response in the retina. The small inset shows the response 16 weeks after injection. The larger image is 40 weeks after injection, so it takes a lot of time for the vector to turn on.
Dichromats
Trichromats
Squirrel monkey tested to see if it can discriminate colors from gray. A color is presented amongst a gray background, and the monkey gets a juice reward. When the threshold gets very high, the monkey cannot discriminate colors in those regions: they appear gray and so cannot be distinguished from the gray dot background. The color appearance and the white point shift depending on whether the monkey is missing the long wavelength gene (top) or the medium wavelength gene (bottom left). If the monkey has all three visual pigments, it can discriminate colors across the full spectrum. Panels: color appearance for dichromats, color appearance for trichromat.
Squirrel monkey before and after treatment with virus containing human LWS gene. It takes about 20 weeks for the virus with the new gene to build up such that the monkey gets cones sensitive to the longer wavelengths and can distinguish the green colors (circles) from the gray background. The enclosed triangle and square are an untreated dichromat, which does not improve. Each monkey is tested before treatment (circles) and again after treatment (blue dots). The color confusion region around 490 nm goes away and the monkey can distinguish colors across the full spectrum. Note: pink points are from trichromatic females.
Big ideas How do genomes evolve? How does that impact an organism? What forces drive this evolution? Small details How do we get gene sequences? How do we compare them?
Questions for today Making trees What can you learn from a tree? How do we root trees? Programs for making trees
Last time: Parsimony and Distance. Parsimony: Count # changes in sequence for a given tree. Tree with smallest # of changes is best = most parsimonious. Distance: Calculate % difference = distance between sequences. Sequences that are most similar are closest on tree.
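As a reminder of the distance idea, here is a minimal sketch that computes the % difference between aligned sequences and picks the most similar pair. The three sequences and their names are made up for illustration, and the sketch assumes the sequences are already aligned to the same length.

```python
from itertools import combinations

def percent_difference(seq_a, seq_b):
    """Percent of aligned positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return 100.0 * mismatches / len(seq_a)

def closest_pair(seqs):
    """Return the pair of names with the smallest percent difference."""
    return min(
        combinations(sorted(seqs), 2),
        key=lambda pair: percent_difference(seqs[pair[0]], seqs[pair[1]]),
    )

# Toy alignment (hypothetical sequences).
toy = {"seq1": "ACTGCCCTAC", "seq2": "ACTGCCCTAT", "seq3": "ACTGCGTTAC"}

for a, b in combinations(sorted(toy), 2):
    print(a, b, percent_difference(toy[a], toy[b]))
print("most similar pair:", closest_pair(toy))
```

A distance method would place the most similar pair closest together on the tree and then repeat with the remaining sequences.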
Maximum likelihood methods Assume explicit model for how DNA evolves = rates of nucleotide change Many models of different complexity for how DNA can change Fit model to ALL data Shortest tree is best Branch lengths = time Rates of DNA change
DNA models (bases A, T, G, C). Transition: A <-> G, C <-> T. Transversion: A <-> C, A <-> T, G <-> C, G <-> T.
DNA bases. Purines: Adenine, Guanine. Pyrimidines: Cytosine, Thymine. Transitions: A-G, C-T. Mutations are easy between purines or between pyrimidines. It is easy for a methylated C to lose its NH2 group (deamination) and become a T, so C->T mutation rates are pretty high. This causes a G->A mutation on the other strand.
DNA models (bases A, T, G, C). Transition: A <-> G, C <-> T. Transversion: A <-> C, A <-> T, G <-> C, G <-> T. Different models of different complexity. Many models will include a transition/transversion ratio.
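To make the transition / transversion distinction concrete, the sketch below classifies each difference between two aligned sequences. The two sequences are a toy example and the function names are invented for this illustration.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def classify(a, b):
    """Label a pair of aligned bases as same, transition or transversion."""
    if a == b:
        return "same"
    if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
        return "transition"      # purine <-> purine or pyrimidine <-> pyrimidine
    return "transversion"        # purine <-> pyrimidine

def count_changes(seq_a, seq_b):
    """Count each kind of site across two aligned sequences."""
    counts = {"same": 0, "transition": 0, "transversion": 0}
    for a, b in zip(seq_a, seq_b):
        counts[classify(a, b)] += 1
    return counts

# Toy aligned pair (hypothetical).
counts = count_changes("ACTGCGTTAC", "ACTGCCCTAC")
print(counts)
```

Dividing the transition count by the transversion count gives the observed ratio for that pair of sequences.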
Matrix: gives the probability of change from one base to another = symmetric. This is the change matrix.

Starting base   to A      to G      to C      to T
A               Stay A    A->G      A->C      A->T
G               G->A      Stay G    G->C      G->T
C               C->A      C->G      Stay C    C->T
T               T->A      T->G      T->C      Stay T
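Jukes Cantor, 1969. Transitions = transversions, no difference. Equal base frequencies.

$$
\begin{array}{c|cccc}
 & A & G & C & T \\ \hline
A & 1-3\alpha & \alpha & \alpha & \alpha \\
G & \alpha & 1-3\alpha & \alpha & \alpha \\
C & \alpha & \alpha & 1-3\alpha & \alpha \\
T & \alpha & \alpha & \alpha & 1-3\alpha
\end{array}
$$

If $\alpha$ is the rate or probability that you change to each one of the three other bases, then $1 - 3\alpha$ is the probability that you stay the same.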
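A small sketch of the Jukes Cantor change matrix, assuming an arbitrary value alpha = 0.01 per step (the value is made up). It checks that each row sums to 1 and shows that, after many steps, a site that started as A is equally likely to be any base.

```python
BASES = "AGCT"

def jukes_cantor(alpha):
    """One-step change matrix: alpha off the diagonal, 1 - 3*alpha on it."""
    return [[1 - 3 * alpha if i == j else alpha for j in range(4)]
            for i in range(4)]

def step(dist, matrix):
    """Advance a probability distribution over bases by one step."""
    return [sum(dist[i] * matrix[i][j] for i in range(4)) for j in range(4)]

jc_matrix = jukes_cantor(0.01)          # alpha = 0.01 is an assumed value
dist = [1.0, 0.0, 0.0, 0.0]             # the site starts as A
for _ in range(1000):
    dist = step(dist, jc_matrix)

print([sum(row) for row in jc_matrix])  # every row should be (close to) 1
print(dict(zip(BASES, dist)))           # approaches 0.25 for each base
```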
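Kimura 2 parameter, 1980. Transitions and transversions differ. Equal frequencies.

$$
\begin{array}{c|cccc}
 & A & G & C & T \\ \hline
A & 1-\alpha-2\beta & \alpha & \beta & \beta \\
G & \alpha & 1-\alpha-2\beta & \beta & \beta \\
C & \beta & \beta & 1-\alpha-2\beta & \alpha \\
T & \beta & \beta & \alpha & 1-\alpha-2\beta
\end{array}
$$

Here we account for the fact that transitions and transversions occur at different rates: $\alpha$ is the probability of the transition and $\beta$ is the probability of each of the two transversions, so $1-\alpha-2\beta$ is the probability that you stay the same.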
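Felsenstein 1981. Transitions = transversions. Unequal frequencies: the probability of changing to a base is proportional to that base's frequency ($f_A$, $f_G$, $f_C$, $f_T$).

$$
\begin{array}{c|cccc}
 & A & G & C & T \\ \hline
A & 1-\alpha(1-f_A) & \alpha f_G & \alpha f_C & \alpha f_T \\
G & \alpha f_A & 1-\alpha(1-f_G) & \alpha f_C & \alpha f_T \\
C & \alpha f_A & \alpha f_G & 1-\alpha(1-f_C) & \alpha f_T \\
T & \alpha f_A & \alpha f_G & \alpha f_C & 1-\alpha(1-f_T)
\end{array}
$$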
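Hasegawa Kishino Yano HKY 1985. Transitions and transversions differ. Unequal frequencies. Transitions to base $j$ have probability $\alpha f_j$ and transversions to base $j$ have probability $\beta f_j$.

$$
\begin{array}{c|cccc}
 & A & G & C & T \\ \hline
A & 1-\alpha f_G-\beta(f_C+f_T) & \alpha f_G & \beta f_C & \beta f_T \\
G & \alpha f_A & 1-\alpha f_A-\beta(f_C+f_T) & \beta f_C & \beta f_T \\
C & \beta f_A & \beta f_G & 1-\alpha f_T-\beta(f_A+f_G) & \alpha f_T \\
T & \beta f_A & \beta f_G & \alpha f_C & 1-\alpha f_C-\beta(f_A+f_G)
\end{array}
$$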
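The four models so far can be viewed as special cases of one matrix. The sketch below builds a one-step HKY-style matrix in which a change to base j has probability alpha * f_j for a transition and beta * f_j for a transversion. This parameterization, the function names and all the numbers are assumptions for illustration. With equal frequencies and alpha = beta, every change has probability alpha / 4, which is the Jukes Cantor matrix with a rescaled alpha.

```python
BASES = "AGCT"
PURINES = {"A", "G"}

def is_transition(a, b):
    """True for A<->G and C<->T changes."""
    return a != b and ((a in PURINES) == (b in PURINES))

def hky_matrix(freqs, alpha, beta):
    """Rows are starting bases; each row sums to 1."""
    matrix = {}
    for a in BASES:
        row = {b: (alpha if is_transition(a, b) else beta) * freqs[b]
               for b in BASES if b != a}
        row[a] = 1.0 - sum(row.values())   # probability of staying the same
        matrix[a] = row
    return matrix

equal = dict.fromkeys(BASES, 0.25)
unequal = {"A": 0.4, "G": 0.1, "C": 0.2, "T": 0.3}   # made-up frequencies

jc = hky_matrix(equal, 0.04, 0.04)      # Jukes Cantor: one rate, equal freqs
k2p = hky_matrix(equal, 0.08, 0.04)     # Kimura: two rates, equal freqs
f81 = hky_matrix(unequal, 0.04, 0.04)   # Felsenstein: one rate, unequal freqs
hky = hky_matrix(unequal, 0.08, 0.04)   # HKY: two rates, unequal freqs

for name, m in [("JC", jc), ("K2P", k2p), ("F81", f81), ("HKY", hky)]:
    print(name, {a: round(m["A"][a], 4) for a in BASES})
```

Each printed line is the row for a site that starts as A, so the models can be compared side by side.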
Tamura Nei 1993 Transitions and transversions differ Unequal frequencies Different sites can vary at different rates Gamma distribution to describe rate differences
Maximum likelihood Find most likely explanation for the data given the model The tree topology is one of the variables to be fit Rate matrix is another Best approach Fits all the data Requires A LOT of computer time
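The smallest possible version of a maximum likelihood fit has no tree topology at all: just two aligned sequences and one parameter, the distance d between them. The sketch below assumes the Jukes Cantor model in its continuous-time form, where a site is unchanged with probability 1/4 + 3/4 exp(-4d/3), and tries many values of d to find the one that makes the data most likely. The site counts are made up, and a real program fits the tree and the rate matrix as well, which is why it needs so much computer time.

```python
import math

def jc_log_likelihood(d, n_sites, n_diff):
    """Log-likelihood of seeing n_diff differing sites at distance d."""
    decay = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * decay   # site shows the same base
    p_diff = 0.25 - 0.25 * decay   # site shows one particular other base
    return (n_sites - n_diff) * math.log(p_same) + n_diff * math.log(p_diff)

def ml_distance(n_sites, n_diff, max_d=2.0, n_steps=2000):
    """Grid search for the distance with the highest likelihood."""
    grid = [max_d * i / n_steps for i in range(1, n_steps + 1)]
    return max(grid, key=lambda d: jc_log_likelihood(d, n_sites, n_diff))

def jc_closed_form(n_sites, n_diff):
    """Known analytic answer for this one-parameter case."""
    p = n_diff / n_sites
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

# Hypothetical data: 100 aligned sites, 10 of them different.
best = ml_distance(100, 10)
print(best, jc_closed_form(100, 10))
```

The grid search and the analytic formula should agree to within the grid spacing. The fitted distance is slightly larger than the raw 10% difference because the model allows for multiple changes at the same site.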
Can also build ML trees for proteins Use matrix which describes rate of change from one AA to another
Bayesian methods. Also a likelihood-based approach. Takes what people know about parameter values and uses it as a prior probability. Does ML fit. Gives confidence levels of fit. Unfortunately, people often don't know what values to use. All the rage - but may overestimate how well it does.
Q2. What can you learn from a tree?
Phylogenetic time travel
Phylogenetics. Compare sequences and determine the relatedness of things. Calculate % similarity of DNA or AA sequences. Draw relatedness as a tree (Human, Mouse, Bird).
Vertebrates Mammals Amphibians Marsupials Bird Reptiles Cartilaginous fish Bony fish Jawless fish Primates
Vertebrate divergence times. Mammals, 100 MY. Fish, 450 MY. The time scales show the times to most recent common ancestor. Cartilaginous fish = sharks, rays. Agnatha = jawless fish = lamprey, hagfish. Vertebrates are 500-600 MY old. Mammals are 100 MY. Tetrapods are 400 MY. (Kumar and Hedges 1998)
Trees can tell you about genes. Which organisms have the gene? Where did the gene come from? What happens to the gene once it's there? Duplicated (tandem, or mRNA can be inserted) or lost.
Default expectation - if gene arose early in vertebrates, all species will have a copy and gene will be related in same way as organisms. Tree labels: Dog Gene A, Opossum Gene A, Chicken Gene A, Frog Gene A, Zebrafish Gene A.
Examine whether a gene exists in all organisms. Tree labels: Dog Gene A, Opossum Gene A, Chicken Gene A, Frog Gene A, Zebrafish no A; branch marked Gained. In sequencing gene A, we find out that zebrafish does not have any copy of gene A. However, all the other species have one copy. This suggests the gene arose after fish separated from the rest of the vertebrates. Amphibians, reptiles/birds, marsupials and mammals are called tetrapods as they have 4 limbs. This means this gene arose after fish and tetrapods diverged.
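The reasoning on this slide can be written as a small search: find the smallest group on the species tree that contains every species with the gene. This sketch assumes the gene was gained once and never lost, and it encodes the species tree as nested tuples in the order the slide lists the species. The function names are invented for this illustration.

```python
def leaves(tree):
    """Set of species names under a node (a name or a tuple of subtrees)."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))

def clade_of_gain(tree, has_gene):
    """Smallest clade that contains every species carrying the gene."""
    children = () if isinstance(tree, str) else tree
    for child in children:
        if has_gene <= leaves(child):
            return clade_of_gain(child, has_gene)
    return tree

# Species tree assumed from the slide: fish branch off first.
species_tree = (((("Dog", "Opossum"), "Chicken"), "Frog"), "Zebrafish")
has_gene = {"Dog", "Opossum", "Chicken", "Frog"}

clade = clade_of_gain(species_tree, has_gene)
print(sorted(leaves(clade)))
```

The result is the tetrapod group, so the gain is placed on the branch leading to tetrapods, after they diverged from fish. A gene that was really present in the ancestor and then lost in zebrafish would give the same pattern, so this is the simplest explanation rather than the only one.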
Examine whether a gene exists in all organisms Mouse Platypus Finch Frog Pufferfish Gene loss
What is happening? Tree labels: Dog Gene A, Human Gene A, Chicken Gene A, Frog Gene A, Zebrafish Gene A1, Zebrafish Gene A2; branch marked Gene duplication. In this tree, zebrafish has two copies of the gene, but all the other organisms have only one copy. This gene duplication is specific to fish: only they had the gene duplicate.
Tree labels: Dog, Human, Chicken, Frog and Zebrafish each with GeneA1 and GeneA2, Lamprey with Gene A; branch marked Gene duplication. In this tree, lamprey has only one copy of gene A. However, all the other vertebrates have two copies. These copies are very similar, but they are distinct from each other. One example of this could be the red opsin gene and the blue opsin gene. This gene duplication occurred early in the history of vertebrates. Lamprey are vertebrates that do not have jaws. The rest of the vertebrates shown have jaws. Therefore, this figure shows that the gene duplicated after the jawed lineage split from the jawless vertebrates and before the jawed vertebrates diverged from each other.
Dog Human Frog Chicken Frog Zebrafish Lamprey
Gene duplication and then losses Human Chicken Frog Zebrafish Dog Lamprey
What does this tree tell us? LWS ZebrafishR RH2 RH1 SWS2 SWS1 Gaustralis = lamprey
LWS RH2 RH1 SWS2 SWS1 Gaustralis = lamprey ZebrafishR Opsin classes evolved early in history of vertebrates Rods evolved from cones Gene loss has occurred Gene duplications are everywhere SWS2 SWS1
Conclusions from opsin tree #1: 5 opsin classes arose very early in vertebrates. Cones: SWS1 - very short wavelength sensitive, SWS2 - short wavelength sensitive, RH2 - like rhodopsin but in cones, LWS - long wavelength sensitive. Rods: RH1 - rhodopsin.
Range of visual pigment λmax (SWS1, SWS2, RH2, LWS). This slide shows that each of the cone opsin classes occurs in a particular part of the spectrum. The line above each pigment curve is meant to show that the λmax can be tuned over the range corresponding to the line length. Any one opsin sequence will correspond to just one λmax. However, when you consider all of the SWS1 genes that occur throughout vertebrates, they produce pigments that range from about 360 to 400 nm. The SWS1 gene for any given animal will have a particular set of amino acids that are directed into the retinal binding pocket. A species with a short SWS1 (say 360 nm) will have a few key changes from a longer SWS1 gene (whose λmax is around 400 nm). The same is true of the other cone opsin classes and also the rod opsin class.
Conclusion #2: Rod opsins evolved from cone opsins (tree labels: RH1, RH2, SWS2, SWS1, LWS). This tree summarizes the relationships of the major opsin groups and shows that the cone opsins occur earliest in the tree and the rod opsin, RH1, evolved from cone opsins. Rhodopsin is Greek for rose + vision. The name refers to the color of the pigment when you look at a dissected retina.
Conclusion #3 Mammals lost two of the opsin classes Mammals have LWS, SWS1 and RH1 Only 2 cone opsins (dichromat) Dogs, cats, mice, rats, horses, goats, pigs … Mammals went through “nocturnal period” during reign of dinosaurs
Q3. How do you root a tree?
Rooting the tree Phylogenetics tells you how things are related Doesn’t tell you anything about what came first Human Chimp Gorilla Orangutan
Define outgroup Human Gorilla Orangutan Chimp Human Chimp Gives you an ordered tree of relationships Gorilla Orangutan
What if an in-group is selected as the outgroup? Human Orangutan Chimp Gorilla Gorilla Human Orangutan Chimp Everything is inverted
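Rooting can be sketched as choosing which branch of an unrooted tree the root sits on. Below, the unrooted primate tree is stored as a table of neighbors, with two unnamed internal nodes called "x" and "y" (names invented here). The function places the root on the branch leading to the chosen outgroup and reads off the nested grouping.

```python
def root_with_outgroup(adjacency, outgroup):
    """Place the root on the branch leading to `outgroup`; return nested tuples."""
    def subtree(node, parent):
        children = [n for n in adjacency[node] if n != parent]
        if not children:
            return node                     # a leaf (species)
        return tuple(subtree(child, node) for child in children)

    (neighbor,) = adjacency[outgroup]       # a leaf has exactly one neighbor
    return (outgroup, subtree(neighbor, outgroup))

# Unrooted tree: Human and Chimp join at x, Gorilla and Orangutan join at y.
unrooted = {
    "Human": ["x"], "Chimp": ["x"], "Gorilla": ["y"], "Orangutan": ["y"],
    "x": ["Human", "Chimp", "y"],
    "y": ["x", "Gorilla", "Orangutan"],
}

print(root_with_outgroup(unrooted, "Orangutan"))  # sensible outgroup
print(root_with_outgroup(unrooted, "Human"))      # in-group used by mistake
```

With Orangutan as the outgroup, Human and Chimp come out as the most recently diverged pair. With Human chosen by mistake, the same unrooted tree reads as if Gorilla and Orangutan were the most recent pair, which is the inversion shown on the slide.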
How to pick the root. Use a more primitive organism: for vertebrates -> invertebrate, for jawed vertebrates -> lamprey, for mammals -> marsupial, for primates -> another mammal. If studying gene families: for group A -> use gene from group B.
Cladogram vs phylogram. Cladogram: only shows relationships, no information on sequence difference. Phylogram: branch lengths are proportional to difference. Figure: Human, Chimp, Mouse, Chick drawn as a cladogram and as a phylogram.
Line lengths are proportional to how different sequences are (Human, Chimp, Dog). Humans and chimps are 5 MY since their common ancestor, so genes will be very similar. Dogs and other mammals are about 100 MY, so genes will be about 20x more different from human as compared to human-chimp (100 MY / 5 MY = 20).
Gene can evolve quickly Phylogram Rat Human Chicken Salamander Shark
Phylogram (Human, Chimp, Gorilla, Orangutan). Horizontal: distances matter. Vertical: distances don't count!! The human - chimp distance is the sum of the horizontal lines on the path between Human and Chimp. When we draw the phylogram, we are allowed to add whatever vertical lines we need to space things out and make them easy to look at. Only the horizontal line lengths matter. These are proportional to the distance (sequence difference) between two genes.
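Reading a distance off a phylogram means adding up the horizontal branch lengths along the path between two tips. The sketch below does that for a small primate tree. The branch lengths and the internal node names (HC, HCG, root) are made up for illustration.

```python
# child -> (parent, horizontal branch length); all values are hypothetical
PARENT = {
    "Human": ("HC", 0.5), "Chimp": ("HC", 0.5),
    "HC": ("HCG", 0.25), "Gorilla": ("HCG", 0.75),
    "HCG": ("root", 0.5), "Orangutan": ("root", 1.25),
}

def path_to_root(leaf):
    """Map the leaf and each of its ancestors to the distance from the leaf."""
    dist, node, out = 0.0, leaf, {leaf: 0.0}
    while node in PARENT:
        parent, length = PARENT[node]
        dist += length
        out[parent] = dist
        node = parent
    return out

def tree_distance(a, b):
    """Sum of horizontal branch lengths on the path between two tips."""
    up_a, up_b = path_to_root(a), path_to_root(b)
    # The most recent common ancestor gives the shortest combined path.
    return min(up_a[n] + up_b[n] for n in up_a if n in up_b)

for other in ("Chimp", "Gorilla", "Orangutan"):
    print("Human -", other, tree_distance("Human", other))
```

The vertical lines never enter the calculation, so the way the tree is spaced out on the page has no effect on the result.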