TREES

Presentation transcript:

TREES

Trees [Figure: tree diagrams over Human, Chimp and Gorilla. Some drawings are marked as equal (=) to each other and others as different (≠).]

Same thing… [Figure: two drawings of a tree over s1, s2, s3, s4 and s5, marked as equal (=).]

The maximum parsimony principle Evaluation of the tree topology

Genes: 0 = absent, 1 = present. [Table: one row per species (s1 to s5) and one column per gene (g1 to g6).]

Evaluate this tree… [Figure: tree with leaves s1, s4, s3, s2, s5. The following slides show, for each gene, one or more options for placing the changes on this tree.]

Gene number 1: Number of changes for g1 = 1

Gene number 2: Number of changes for g2 = 2

Gene number 3: Number of changes for g3 = 1

Gene number 4: Number of changes for g4 = 2

Gene number 5 is the same as Gene number 4: Number of changes for g5 = 2

Gene number 6, 1 option only: Number of changes for g6 = 1

Sum of changes
Number of changes for g1 = 1
Number of changes for g2 = 2
Number of changes for g3 = 1
Number of changes for g4 = 2
Number of changes for g5 = 2
Number of changes for g6 = 1
Sum of changes for this tree topology = 9
Can we do better?

The MP (most parsimonious) tree: Sum of changes for this tree topology = 8 [Figure: tree with leaves s1, s4, s3, s2, s5.]

How many rooted trees? TR = “TREE ROOTED”
N=2, TR(2) = 1
N=3, TR(3) = 3
N=4, TR(4) = 15
[Figure: all rooted trees drawn for the leaves a, b (N=2), a, b, c (N=3) and a, b, c, d (N=4).]

How many rooted trees
2 sequences: 1 tree
3 sequences: 3 trees
4 sequences: 3*5 = 15 trees
5 sequences: 3*5*7 = 105 trees
…
TR(n) = 1*3*5*7*…*(2n-3)
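The product formula above can be checked with a few lines of Python. This is a minimal sketch; the function name `count_rooted_trees` is only an illustrative choice and does not come from the slides.

```python
def count_rooted_trees(n):
    """Number of rooted binary tree topologies for n labelled sequences.

    Implements TR(n) = 1 * 3 * 5 * ... * (2n - 3) from the slide.
    """
    if n < 2:
        raise ValueError("need at least 2 sequences")
    count = 1
    # Multiply the odd factors 3, 5, ..., 2n - 3 (empty product when n == 2).
    for factor in range(3, 2 * n - 2, 2):
        count *= factor
    return count


for n in range(2, 6):
    print(n, count_rooted_trees(n))
```

The count grows very quickly with n, which is why evaluating every possible tree is only feasible for a handful of sequences.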

Rooting the tree

Rooted vs. unrooted trees

Rooted vs. Unrooted: The position of the root does not affect the MP score.

Intuition why rooting doesn’t change the score: The change will always be on the same branch, no matter where the root is positioned… [Figure: the tree with leaves s1, s4, s3, s2, s5 for gene number 1, with 1 change marked.]

How can we root the tree? We want rooted trees!

Gorilla gorilla (Gorilla) Homo sapiens (human) Pan troglodytes (Chimpanzee) Gallus gallus (chicken)

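Evaluate all 3 possible UNROOTED trees:
1. Human with Chimp, Chicken with Gorilla
2. Human with Gorilla, Chimp with Chicken
3. Human with Chicken, Chimp with Gorilla
[Figure: the three unrooted trees; one of them is labelled as the MP tree.]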

Rooting based on a priori knowledge: [Figure: the unrooted tree of Human, Chimp, Chicken and Gorilla next to the rooted tree derived from it.]

Ingroup / Outgroup: [Figure: rooted tree of Human, Chimp, Gorilla and Chicken, with the labels INGROUP and OUTGROUP.]

Monophyletic groups: Gorilla + Human + Chimp are monophyletic. [Figure: rooted tree of Human, Chimp, Gorilla and Chicken.]

How to efficiently compute the MP score of a tree

The Fitch algorithm (1971): Post-order tree scan. In each node, if the intersection between the child-nodes is empty, we apply a union operator. Otherwise, an intersection. [Figure: tree with leaves Human, Chimp, Chicken, Gorilla and Duck, leaf characters A, G, C, C, A, and internal sets {A,G}, {A,C,G}, {A,C}.]

Number of changes: Total number of changes = number of union operators. [Figure: the same tree with leaves Human, Chimp, Chicken, Gorilla and Duck and internal sets {A,G}, {A,C,G}, {A,C}.]
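The post-order scan and the union-counting rule can be sketched in Python as follows. The tree shape and the assignment of characters to species used here are an assumed reading of the slide figure (Human = A, Chimp = G, Gorilla = C, Chicken = C, Duck = A, with Chicken and Duck grouped together); the function name `fitch` and the nested-tuple tree encoding are illustrative choices, not taken from the slides.

```python
def fitch(tree, states):
    """Fitch scan of a rooted binary tree for one character.

    tree   : a leaf name (str) or a pair (left_subtree, right_subtree)
    states : dict mapping each leaf name to its observed character
    Returns (state set at this node, number of changes below it).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    # Post-order: handle both children before the node itself.
    left_set, left_changes = fitch(tree[0], states)
    right_set, right_changes = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        # Non-empty intersection: keep it, no change is counted.
        return common, left_changes + right_changes
    # Empty intersection: take the union and count one change.
    return left_set | right_set, left_changes + right_changes + 1


# Assumed topology and leaf characters (see the note above).
tree = ((("Human", "Chimp"), "Gorilla"), ("Chicken", "Duck"))
states = {"Human": "A", "Chimp": "G", "Gorilla": "C", "Chicken": "C", "Duck": "A"}

root_set, score = fitch(tree, states)
print(sorted(root_set), score)
```

Under these assumptions the scan performs three unions ({A,G}, {A,C,G} and {A,C}) and one intersection at the root, so the score for this character is 3. Summing the score over all characters gives the MP score of the tree.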

Parsimony has many shortcomings. To name a few:
(1) All changes are counted the same, which is not true for biological systems (Leu->Ile is much more likely than Leu->His).
(2) It cannot take biological context into account (secondary structures, dependencies among sites, evolutionary distances between the analyzed organisms, etc.).
(3) Its statistical basis is questionable.

Alternative: MAXIMUM-LIKELIHOOD METHOD

Maximum likelihood uses a probabilistic model of evolution. Each amino acid has a certain probability to change, and this probability depends on the evolutionary distance. Evolutionary distances are inferred from the entire set of sequences.

Evolutionary distances: Positions in an alignment can be conserved for two reasons: either because of functional constraints, or because only a short evolutionary time has elapsed since the divergence of the organisms. 5 replacements in 10 positions between 2 chimps is considered very variable. 5 replacements in 10 positions between human and cucumber is not considered too variable… Maximum likelihood takes this information into account.
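To make the point about distances concrete, here is a small sketch that scores the same pair of sequences (5 differences in 10 positions) under a short and a long evolutionary distance. The slides do not name a substitution model; the Jukes-Cantor (JC69) nucleotide model is used here purely as an assumption because it is the simplest one, and the sequences, the distances 0.01 and 0.8, and the function names are all hypothetical.

```python
import math


def jc69_prob(same, t):
    """JC69 probability that a nucleotide ends as the same base (same=True)
    or as one specific different base (same=False) after distance t,
    where t is the expected number of substitutions per site."""
    decay = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * decay if same else 0.25 - 0.25 * decay


def pair_log_likelihood(seq_a, seq_b, t):
    """Log-likelihood of two aligned sequences separated by distance t."""
    total = 0.0
    for x, y in zip(seq_a, seq_b):
        # 0.25 is the assumed equilibrium frequency of the first base.
        total += math.log(0.25 * jc69_prob(x == y, t))
    return total


# Hypothetical sequences with 5 differences in 10 positions.
seq_a = "ACGTACGTAC"
seq_b = "ACGTATTGCA"

close = pair_log_likelihood(seq_a, seq_b, 0.01)  # e.g. two very close relatives
far = pair_log_likelihood(seq_a, seq_b, 0.8)     # e.g. two distant organisms
print(close, far)
```

The log-likelihood is much lower for the short distance: five differences in ten positions are surprising between close relatives but unremarkable between distant organisms, which is the information parsimony ignores.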

Maximum Parsimony | Maximum Likelihood
All changes are considered the same | Different probabilities to different types of substitutions
Statistically questionable | Statistically robust
Ignores biological context | Accounts for biological context

The likelihood computations [Figure: tree with node labels X, Y, Z, C, K, M, A and branch lengths t1 to t6.]
With likelihood models we can:
1. Infer the most likely phylogenetic tree
2. Compute conservation for each site

Maximum likelihood tree reconstruction: This is computationally very demanding, but efficient algorithms that find approximate solutions have been developed.

Tree reconstruction using distance-based methods
Two steps:
1. Compute a distance D(i,j) between any two sequences i and j.
2. Find the tree that agrees most with the distance table.
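Step 1 can be sketched as follows. The slides do not say which distance to use at this point, so the simplest choice, the fraction of differing positions (often called the p-distance), is assumed here; the sequences and the function names are hypothetical.

```python
def p_distance(seq_a, seq_b):
    """Fraction of aligned positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sum(1 for x, y in zip(seq_a, seq_b) if x != y)
    return diffs / len(seq_a)


def distance_table(seqs):
    """Return D as a dict keyed by (name_i, name_j) for all pairs."""
    names = list(seqs)
    return {(i, j): p_distance(seqs[i], seqs[j]) for i in names for j in names}


# Hypothetical aligned sequences.
seqs = {
    "s1": "ACGTACGT",
    "s2": "ACGTACGA",
    "s3": "TCGAACGA",
}

D = distance_table(seqs)
for (i, j), d in sorted(D.items()):
    print(i, j, d)
```

The resulting table is symmetric with zeros on the diagonal, and it is the only input that step 2 (the tree search) needs.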

Neighbor-joining is based on star decomposition. In each step we cluster a pair so that the sum of branches is minimal. [Figure: a star tree over A, B, C, D, E that is resolved step by step, first grouping (C,B), then ((C,B),E); the best pair to group together is shown in red.]
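One clustering step can be sketched as below. The slide only states that the chosen pair minimizes the sum of branches; the code uses the standard neighbor-joining selection criterion Q(i,j) = (n-2)·d(i,j) − Σ_k d(i,k) − Σ_k d(j,k), which is an assumption about the exact formula intended. The 5×5 distance matrix is a hypothetical example and is not the data behind the slide figure.

```python
def nj_pick_pair(dist):
    """Pick the pair of taxa that neighbor-joining would cluster next.

    dist : symmetric matrix (list of lists) with zeros on the diagonal.
    Returns ((i, j), q) for the pair with the smallest Q value.
    """
    n = len(dist)
    row_sums = [sum(row) for row in dist]
    best_pair, best_q = None, None
    for i in range(n):
        for j in range(i + 1, n):
            # Q(i, j) = (n - 2) * d(i, j) - sum_k d(i, k) - sum_k d(j, k)
            q = (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]
            if best_q is None or q < best_q:
                best_pair, best_q = (i, j), q
    return best_pair, best_q


# Hypothetical distances between five taxa (indices 0 to 4).
dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]

pair, q = nj_pick_pair(dist)
print(pair, q)
```

Note that the pair with the smallest raw distance (taxa 3 and 4, distance 3) is not the one selected here; the criterion also accounts for how far each taxon is from all the others. A full implementation would then replace the chosen pair by a new node, recompute the distances to it, and repeat until the tree is resolved.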

A few words on Human Immunodeficiency Virus (HIV). The virus = HIV. The disease/syndrome = Acquired Immunodeficiency Syndrome (AIDS). First recognized clinically in 1981. By 1992, it had become the major cause of death in individuals of … years of age in the U.S.

HIV: By December 2002, 20 million people had died of AIDS. Infected in 2002: 5 million. Number of currently infected: ~42 million, 1 out of every 100 adults of age … in the world population.

HIV: HIV is the leading cause of death in sub-Saharan Africa. In some parts of this region 25-30% of the population is infected. 1 out of 3 children in these areas has lost at least one parent.

Sub-Saharan Africa refers to the territories south of the Sahara. In the past the term ‘Black Africa’ was also used for the same region, but today it is obsolete because it is considered politically incorrect. ‘Tropical Africa’ might be taken as an alternative label for the same region, but it excludes South Africa, which lies outside the tropics.

HIV is a lentivirus. Species = HIV. Genus = Lentivirus. Family = Retroviridae. Lentiviruses have a long incubation time and are thus called “slow viruses”.

HIV-1 and HIV-2: In 1986, a distinct type of HIV, prevalent in certain regions of West Africa, was discovered and termed HIV type 2. Individuals infected with type 2 also had AIDS, but had a longer incubation time and lower morbidity.

Morbidity vs. Mortality. Morbidity: the prevalence of a disease (the morbidity rate). The probability that a randomly selected person out of the entire population is ill at time t.

Morbidity vs. Mortality. Mortality: deaths from a disease or in general. Mortality rate = death rate.

Origin of HIV-1 in the chimpanzee Pan troglodytes troglodytes Nature Vol Pages:

Five lines of evidence have been used to substantiate zoonotic transmission of primate lentivirus: 1. Similarities in viral genome organization; 2. Phylogenetic relatedness; 3. Geographic coincidence; 4. Plausible routes of transmission; 5. Prevalence in the natural host.

For HIV-2, a virus (SIVsm) that is genomically indistinguishable and phylogenetically closely related was found in substantial numbers of wild-living sooty mangabeys, whose natural habitat coincides with the epicenter of the HIV-2 epidemic.

Mangabey: a long-tailed monkey of the genus Cercocebus, found in the forest regions of Africa.

Close contact between sooty mangabeys and humans is common because these monkeys are hunted for food and kept as pets. No fewer than six independent transmissions of SIVsm to humans have been proposed. The origin of HIV-1 is much less certain.

HIV-1 is most similar in sequence and genomic organization to viruses found in chimpanzees (SIVcpz).

BUT several observations cast doubt on the theory that chimpanzees are the natural host and reservoir for HIV-1:
1. There is a wide spectrum of diversity between HIV-1 and SIVcpz.
2. An apparent low prevalence of SIVcpz infection in wild-living animals.
3. The presence of chimpanzees in geographic regions of Africa where AIDS was not initially recognized.

Rather, it has been suggested that another, yet unidentified, primate species could be the natural host for SIVcpz and HIV-1.

Marilyn: “We recently identified a fourth chimpanzee with natural SIVcpz infection…” This animal (Marilyn) was wild-caught in Africa (country of origin unknown), exported to the United States as an infant, and used as a breeding female in a primate facility until her death at age 26.

How was the SIV found? During a serosurvey in 1985, Marilyn was the only chimpanzee of 98 tested who had antibodies strongly reactive against HIV-1 by enzyme-linked immunosorbent assay (ELISA) and western immunoblot.

Maybe Marilyn was infected with HIV during her stay in the U.S.? “She has never been used in AIDS research and had not received human blood products after … She died in 1985 after giving birth to still-born twins.”

Endometritis: inflammation of the lining of the uterus. Sepsis: blood poisoning. “An autopsy revealed endometritis, retained placental elements and sepsis as the final cause of death. Depletion of lymphoid tissues was not noted.” This is to show that she did not have AIDS…

“PCR was used to amplify HIV- or SIV-related DNA sequences directly from uncultured (frozen) spleen and lymph-node tissue obtained at the autopsy in order to characterize the infection responsible for Marilyn’s HIV-1 seropositivity.”

Amplification and sequence analysis of subgenomic gag (508 base pairs (bp)) and pol (766 bp) fragments revealed the presence of a virus related to, but distinct from, known SIVcpz and HIV-1 strains.

PCR was used to amplify and sequence four overlapping subgenomic fragments that together comprised a complete proviral genome. The genome was termed SIVcpzUS.

Provirus The "provirus" is the form of the virus which is capable of being integrated into the host genome. In the case of HIV it means the DNA "copy" of the HIV genome (HIV normally carries its genes around in RNA form).

Provirus As far as the host cell's cellular machinery is concerned, this extra DNA is not different from the self DNA.

Only three other SIVcpz strains have been reported: two from animals wild-caught in Gabon (SIVcpzGAB1 and SIVcpzGAB2), and one from a chimpanzee exported to Belgium from Zaire (SIVcpzANT).

SIVcpzGAB1 and SIVcpzANT have been sequenced completely, but only 280 bp of the pol sequence are available for SIVcpzGAB2.

To determine the evolutionary relationships of SIVcpzUS to these and other HIV and SIV sequences:
1. Sequences from the HIV sequence database were downloaded.
2. Neighbour-joining was used to construct the tree, based on the full-length Pol sequences.
3. Maximum likelihood was also used and “yielded very similar topologies”.

The neighbour-joining method was applied to protein-sequence distances calculated by the method of Kimura. Clade support values were computed with 1,000 bootstrap replicates. NJ computations were performed using the CLUSTAL_X program.
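The two ingredients named above can be sketched in Python. This is not the CLUSTAL_X implementation: the distance uses the commonly cited form of Kimura's protein correction, d = −ln(1 − p − 0.2·p²), where p is the fraction of differing sites, and the bootstrap step shows only the column resampling, not the tree building and clade counting that follow it. The alignment, the seed and the function names are hypothetical.

```python
import math
import random


def kimura_protein_distance(p):
    """Kimura's corrected protein distance for an observed fraction p
    of differing sites: d = -ln(1 - p - 0.2 * p**2)."""
    inner = 1.0 - p - 0.2 * p * p
    if inner <= 0.0:
        raise ValueError("sequences too divergent for this correction")
    return -math.log(inner)


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement (one bootstrap replicate)."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in columns)
            for name, seq in alignment.items()}


# Hypothetical protein alignment; real Pol sequences are far longer.
alignment = {
    "seqA": "MKVLAAGIVG",
    "seqB": "MKVLSAGIVG",
    "seqC": "MRVLSAGLVG",
}

rng = random.Random(0)  # fixed seed so the sketch is reproducible
replicates = [bootstrap_alignment(alignment, rng) for _ in range(1000)]
print(len(replicates), kimura_protein_distance(0.1))
```

In a full analysis a tree would be built from each of the 1,000 replicates, and the support value of a clade would be the fraction of replicate trees in which that clade appears.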

These analyses identified SIVcpzUS unambiguously as a new member of the HIV-1/SIVcpz group of viruses.