Published by Regina Reeves
1
PHYLOGENY
Dilvan Moreira (based on Prof. André Carvalho's presentation)
2
Reading
Introduction to Computational Genomics: A Case Studies Approach, Chapter 7
3
Topics
SARS
Origin and Evolution of the Epidemic
Phylogenetic Analysis
Phylogenetic Tree Construction
Neighbor-Joining Algorithm
Case Study
André de Carvalho - ICMC/USP 15/05/2018
4
Epidemic of SARS
Severe Acute Respiratory Syndrome (SARS)
A severe respiratory disease that hit the world in 2003.
Caused by the SARS coronavirus (SARS-CoV).
The term "corona" comes from the crown that appears when the virus is observed under an electron microscope.
It is not the bird flu.
5
Epidemic of SARS: Coronavirus
Coronaviruses are pathogens that cause a variety of diseases in animals.
A pathogen is any organism capable of causing infectious disease.
They may mutate frequently and thus infect other species.
Other coronaviruses have been identified as causing hepatitis in rats and gastroenteritis in pigs.
They are the most common viruses in veterinary pathology.
6
Epidemic of SARS: February 2003
The French Hospital in Hanoi, Vietnam, called the WHO to report a flu-like infection.
It was highly contagious.
A WHO expert in infectious diseases, Dr. Carlo Urbani, concluded that it was a new and unusual pathogen.
7
Epidemic of SARS: February 2003
During his stay, Dr. Urbani collected samples, examined hospital records and organized the quarantine of patients.
He was the first to identify and describe the new disease, SARS.
Symptoms: fever, dry cough, asthma, progressive worsening of the respiratory system, death by respiratory failure.
Within three weeks, Dr. Urbani and five other hospital health workers had died of SARS.
8
Epidemic of SARS: March 2003
The WHO issued a global alert, warning that SARS was a risk to global health.
9
Epidemic of SARS: Hanoi's Hospital, March 2003
10
Origin of SARS Epidemic
The first cases occurred in November 2002 in Guangdong province, China.
106 people got sick in a hospital in the city of Guangzhou.
The rest of the world did not know about it.
A doctor from this hospital visited Hong Kong on February 21, 2003.
He stayed on the 9th floor of a city hotel.
He also got sick and died, diagnosed with pneumonia.
Several people who were on the 9th floor became transmitters of the disease.
12
Origin of SARS Epidemic
One of the visitors to the 9th floor was an American executive.
He was the first patient treated at the French Hospital in Hanoi.
He infected 80 people before dying.
Other visitors to the 9th floor brought the disease to Canada, Singapore and the US.
By April 2003, 4,300 cases had been reported, with 250 deaths, in 25 countries.
13
Origin of SARS Epidemic: March 2003
Early in the month, the WHO coordinated international research.
At the end of the month, a new virus that caused SARS was identified independently in Germany, Canada, the USA and Hong Kong.
The SARS coronavirus is an RNA virus (like HIV).
Common in humans and animals, coronaviruses cause ~25% of all upper respiratory infections, e.g. the common cold.
14
SARS Number of Reported Cases
15
SARS Statistics
16
SARS Coronavirus
Source: BBC
17
SARS Coronavirus
18
Origin of SARS Epidemic: April 2003
A Canadian laboratory sequenced the RNA of SARS-CoV.
Phylogenetic analysis showed that the closest coronavirus is the one from the civet.
Civet is a popular food in Guangdong.
19
Origin of SARS Epidemic: May 2003
Two articles in Science presented the complete genome of SARS-CoV.
The genome contains 29,751 bp.
It is substantially different from all human CoVs.
It is also different from bird CoVs: there is no relation to avian flu.
By the end of 2003, SARS had spread throughout the world.
20
Phylogenetic Analysis of SARS
Phylogenetic analysis can answer questions like:
What kind of virus caused the original infection?
What is the source of the infection?
When and where did the virus cross the boundary between species?
What are the key changes that made this crossing possible?
What trajectory did the spread of the virus follow?
21
Phylogenetic Analysis of SARS
To answer the previous questions, we:
Examine some key algorithms of phylogenetic analysis.
Apply these algorithms to SARS data, available in GenBank and on the book's site.
22
Trees and Evolution
The path of the SARS advance can be represented by a tree.
All SARS viruses that appeared in the world originated from the virus found in China.
New branches appear when the virus spreads.
Traditionally, the evolutionary history linking groups of species has been represented by a tree.
A tree is the only figure in Darwin's book "On the Origin of Species".
23
Phylogenetic Trees
[Figure: tree relating orangutan, gorilla, chimpanzee and human. Source: Tree of Life website, University of Arizona]
24
Phylogenetic Trees
[Figure: tree of DNA sequences, from the mother sequence down to the current species]
Mother DNA: tctgcctc
Descendants: tctgcctc, gatgcctc, tctgcctcggg
Descendants: gatgcatc, gacgcctc, gctgcctcggg
Current species: gatgaatc, gccgcctc, gctaagcctcggg
25
Phylogeny
The study of the evolutionary relationships between different groups of organisms (species, populations, etc.).
Represented by a tree-shaped diagram (phylogenetic tree).
Cladistic analysis is usually based on morphological data.
26
Cladistics
28
Phylogenetic Trees
Show the evolutionary relationships between different species or individuals believed to have a common ancestor.
Drawn in cladogram form.
Each node with descendants represents their most recent common ancestor.
The length of the edges corresponds to time estimates.
29
Phylogenetic Trees
Each node is called a taxonomic unit (taxon, plural taxa).
Internal nodes are hypothetical taxonomic units: they cannot be directly observed.
More complex relationships may take the form of networks.
30
Structure of Phylogenetic Trees
Trees have two or more taxa (species or individuals).
External nodes represent existing taxa.
Internal nodes represent their ancestors (usually extinct).
Trees may be:
Bifurcating: each internal node has at most 2 children.
Multifurcating: each internal node can have more than 2 children.
Trees may or may not have a root.
31
Rooted Phylogenetic Trees
A special internal node, called the root, is defined.
It is the common ancestor of all other nodes.
All evolutionary paths lead to the root.
Branches are oriented from the root to the external nodes.
32
Rooted Phylogenetic Trees
[Figure: rooted tree with a time axis, labeling a branch or edge, an internal node and an external node (leaf); taxa: Drosophila, human, puffer fish, rat]
33
Rooted Phylogenetic Trees
34
Unrooted Phylogenetic Trees
Branches have no orientation.
They show the topological relationships between taxa without identifying a common ancestor.
There are methods to define a root for an unrooted tree: choose an edge on which to place a root node.
This requires external biological information, or at least a guess, about where to insert the root.
35
Unrooted Phylogenetic Trees
[Figure: unrooted tree labeling an internal node, an external node (leaf) and a branch or edge; taxa: frog, human, puffer fish, rat]
36
Unrooted Phylogenetic Trees
The root is generally defined by including in the data set one or more taxa known to result from an older split, i.e. with the most distant relationship to every other taxon.
This external taxon is called the outgroup.
The branch where the outgroup joins the other taxa contains the root node.
37
Unrooted Phylogenetic Trees
[Figure: the same tree rooted with Drosophila as outgroup, labeling the root, an internal node, an external node and a branch or edge; taxa: Drosophila, frog, human, puffer fish, rat]
39
Structure of Phylogenetic Trees
Rotating the branches of an internal node does not alter the relationships between taxa.
[Figure: the tree with leaves drawn in the order A, B, C equals the tree drawn in the order B, A, C]
Trees are invariant to rotation.
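A minimal Python sketch of rotation invariance, assuming trees are stored as nested tuples (a representation chosen here for illustration, not taken from the slides). Sorting the children of every internal node gives a canonical form, so two trees that differ only by rotations compare equal.

```python
def canonical(tree):
    """Return a rotation-independent form of a nested-tuple tree."""
    if isinstance(tree, tuple):
        # Canonicalize each child, then order the children deterministically.
        return tuple(sorted((canonical(child) for child in tree), key=repr))
    return tree  # a leaf label


def same_topology(t1, t2):
    """True if the two rooted trees differ only by rotating branches."""
    return canonical(t1) == canonical(t2)


# Swapping A and B under their common parent is only a rotation.
print(same_topology((("A", "B"), "C"), (("B", "A"), "C")))  # True
# Grouping A with C instead of B is a different tree.
print(same_topology((("A", "B"), "C"), (("A", "C"), "B")))  # False
```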
40
Number of Possible Trees
Reconstructing a phylogenetic tree from DNA sequences is complicated by the large number of possible trees.
Possible unrooted bifurcating trees (n ≥ 3): (2n-5)! / (2^(n-3) (n-3)!)
Possible rooted bifurcating trees (n ≥ 2): (2n-3)! / (2^(n-2) (n-2)!)
n: number of taxa
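To get a feeling for how fast the number of trees grows, here is a small Python sketch. It assumes the standard counts for bifurcating trees with labeled leaves: (2n-5)!! unrooted trees for n ≥ 3 and (2n-3)!! rooted trees for n ≥ 2, where k!! is the product of the odd integers up to k. The function names are illustrative.

```python
def odd_double_factorial(k: int) -> int:
    """Product 1 * 3 * 5 * ... * k for odd k (returns 1 when k <= 1)."""
    result = 1
    for m in range(3, k + 1, 2):
        result *= m
    return result


def count_unrooted_trees(n: int) -> int:
    """Number of unrooted bifurcating trees with n labeled leaves."""
    if n < 3:
        raise ValueError("need at least 3 taxa")
    return odd_double_factorial(2 * n - 5)


def count_rooted_trees(n: int) -> int:
    """Number of rooted bifurcating trees with n labeled leaves."""
    if n < 2:
        raise ValueError("need at least 2 taxa")
    return odd_double_factorial(2 * n - 3)


for n in (4, 5, 10):
    print(n, count_unrooted_trees(n), count_rooted_trees(n))
# 4 3 15
# 5 15 105
# 10 2027025 34459425
```

With only 10 taxa there are already more than two million unrooted trees, which is why exhaustive search quickly becomes impractical.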
41
Tree Representation
There are several non-graphical ways to represent a tree.
[Figure: tree with leaves 1 to 5 and internal nodes 6 to 9]
A popular standard format is Newick: (((1,2),3),(4,5))
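As an illustration of the Newick notation, the sketch below writes a tree stored as nested Python tuples in Newick form. The nested-tuple representation and the function name are assumptions made for this example.

```python
def to_newick(tree) -> str:
    """Serialize a nested-tuple tree as a Newick string (topology only)."""

    def render(node) -> str:
        if isinstance(node, tuple):
            # An internal node: its children, comma-separated, in parentheses.
            return "(" + ",".join(render(child) for child in node) + ")"
        return str(node)  # a leaf label

    return render(tree) + ";"


tree = (((1, 2), 3), (4, 5))
print(to_newick(tree))  # (((1,2),3),(4,5));
```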
42
Inferring Trees
Until recently, the relationships between taxa were inferred from morphological characteristics.
DNA sequences are currently used, thanks to sequencing technology.
Mutations leave a trail.
Trees can be inferred from the similarity between homologous sequences.
43
Inferring Trees
Tree branches can have different lengths.
The greater the number of mutations, the longer the branch.
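A simple way to count mutations between two aligned sequences of equal length is the Hamming distance. The sketch below applies it to sequences from the earlier DNA tree slide; treating these counts as branch lengths is a simplification for illustration (it ignores insertions, deletions and repeated substitutions at the same site).

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


mother = "tctgcctc"
print(hamming(mother, "tctgcctc"))      # 0
print(hamming(mother, "gatgcctc"))      # 2
print(hamming("gatgcctc", "gatgcatc"))  # 1
```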
44
Inferring Trees
Given homologous sequences from a group of taxa, there are several methods to reconstruct their phylogenetic relationships.
Methods can be divided into two groups:
Those that rank all possible trees by some criterion to find the best one.
Those that build the tree directly from the data (without defining an optimization function).
45
Inferring Trees: Methods that rank possible trees
The criteria (in general) seek the tree with the fewest changes.
Because of the huge number of possible trees, it may take a long time to find the best tree.
Approaches that accelerate the search may fail to find the best tree.
46
Inferring Trees: Methods that build from the data
A phylogenetic tree is constructed by methods that use algorithms and statistics.
They are often based on computing the distance between pairs of taxa.
They are very popular because they are usually fast.
47
Methods that build from the data
The most popular distance-based method is the Neighbor-Joining (NJ) algorithm.
Although not necessarily as well supported statistically as other methods, it is robust and accurate.
It is guaranteed to infer the true tree if the distances used reflect the actual distances among sequences.
This result is not guaranteed by other, statistically more sophisticated methods.
48
Distance-Based Methods
Given n taxa, build a distance matrix among the taxa.
If the branches of the tree have specific lengths, the distance between any two nodes can be easily computed as the total length of the unique path linking them.
This defines the distance between the leaf nodes of the tree as an additive distance.
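The path-length definition can be sketched in Python as follows. The tree, its node names (X and Y are internal nodes) and its branch lengths are hypothetical values chosen for illustration, not data from the slides.

```python
def build_adjacency(edges):
    """Turn a list of (node, node, length) edges into an adjacency map."""
    adj = {}
    for u, v, length in edges:
        adj.setdefault(u, {})[v] = length
        adj.setdefault(v, {})[u] = length
    return adj


def path_length(adj, start, goal):
    """Total branch length on the unique path between two nodes of a tree."""
    stack = [(start, None, 0.0)]
    while stack:
        node, parent, dist = stack.pop()
        if node == goal:
            return dist
        for neighbor, length in adj[node].items():
            if neighbor != parent:  # a tree has no cycles, so never go back
                stack.append((neighbor, node, dist + length))
    raise ValueError("nodes are not connected")


def distance_matrix(edges, leaves):
    """Additive distance matrix between the given leaves."""
    adj = build_adjacency(edges)
    return [[path_length(adj, a, b) for b in leaves] for a in leaves]


# Hypothetical unrooted tree: (L1, L2) join at X, (L3, L4) join at Y.
EDGES = [("L1", "X", 1.0), ("L2", "X", 2.0), ("X", "Y", 3.0),
         ("L3", "Y", 4.0), ("L4", "Y", 5.0)]
LEAVES = ["L1", "L2", "L3", "L4"]

for row in distance_matrix(EDGES, LEAVES):
    print(row)
# [0.0, 3.0, 8.0, 9.0]
# [3.0, 0.0, 9.0, 10.0]
# [8.0, 9.0, 0.0, 9.0]
# [9.0, 10.0, 9.0, 0.0]
```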
49
Distance-Based Methods: Additive distance
Biologically, additivity is an important property for a distance matrix.
The number of substitutions separating 2 taxa from their last common ancestor forms an additive distance: the distance on the path from node i to node j.
The Jukes-Cantor model is frequently used as a substitution model.
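A minimal sketch of the Jukes-Cantor distance, assuming the usual form of the correction, d = -(3/4) ln(1 - (4/3) p), where p is the fraction of aligned sites that differ. The function name and the example sequences (taken from the earlier DNA tree slide) are used only for illustration.

```python
import math


def jukes_cantor_distance(a: str, b: str) -> float:
    """Jukes-Cantor corrected distance between two aligned DNA sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    # p: observed proportion of differing sites.
    p = sum(x != y for x, y in zip(a, b)) / len(a)
    if p >= 0.75:
        raise ValueError("sequences too divergent for the correction")
    # The correction accounts for multiple substitutions at the same site.
    return -0.75 * math.log(1 - 4 * p / 3)


d = jukes_cantor_distance("tctgcctc", "gatgcctc")  # p = 2/8 = 0.25
print(round(d, 4))  # 0.3041
```

The corrected distance (about 0.30) is larger than the observed proportion of differences (0.25), since some substitutions are hidden by later changes at the same site.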
50
Distance-Based Methods: Distance Matrix
[Figure: tree with leaves L1 to L5 and internal nodes 6 to 9, next to the 5 x 5 distance matrix with rows and columns L1 to L5]
51
Neighbor-Joining Algorithm
A greedy algorithm.
It begins with a star-shaped phylogeny: all taxa are directly connected to a single root node.
It iteratively combines pairs of nodes.
52
Neighbor-Joining Algorithm
Key to the success of the algorithm: the criterion that defines which nodes are selected to be combined at each iteration.
It identifies nodes that are topological neighbors in the tree.
The selected taxa are combined into a single taxon.
A new distance matrix is then calculated.
The process is repeated until all taxa are combined.
The generated tree is unrooted.
53
Neighbor-Joining Algorithm: Calculating the branch lengths
Assume 3 taxa A, B and C in an unrooted tree, joined at a central node by branches of lengths Lx, Ly and Lz.
3-point formula:
Lx + Ly = dAB
Lx + Lz = dAC
Ly + Lz = dBC
Solving for the branch lengths:
Lx = (dAB + dAC - dBC)/2
Ly = (dAB + dBC - dAC)/2
Lz = (dAC + dBC - dAB)/2
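The 3-point formula translates directly into code. The sketch below is a straightforward transcription; the distances in the example are made-up numbers.

```python
def three_point(d_ab: float, d_ac: float, d_bc: float):
    """Branch lengths (Lx, Ly, Lz) from A, B and C to the central node."""
    lx = (d_ab + d_ac - d_bc) / 2
    ly = (d_ab + d_bc - d_ac) / 2
    lz = (d_ac + d_bc - d_ab) / 2
    return lx, ly, lz


# Illustrative distances: dAB = 5, dAC = 6, dBC = 7.
lx, ly, lz = three_point(5, 6, 7)
print(lx, ly, lz)  # 2.0 3.0 4.0

# The branch lengths reproduce the pairwise distances.
assert lx + ly == 5 and lx + lz == 6 and ly + lz == 7
```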
54
Neighbor-Joining Algorithm
Uses the 4-point condition to select the neighboring nodes to be combined.
Assume that 1 and 2 are neighbors: d(1,2) + d(i,j) < d(i,1) + d(2,j)
Ri = ∑j d(i,j)
Neighborhood degree: M(i,j) = (n-2) d(i,j) - Ri - Rj
Nodes i and j are selected when M(i,j) < M(i,k) for all k ≠ j
[Figure: nodes 1, 2, i and j joined through central nodes by branches Lx, Ly, Lz and Lq]
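The selection criterion can be sketched as follows: compute Ri for every taxon, then M(i,j) for every pair, and pick the pair with the lowest value. The 5 x 5 matrix is an illustrative example, not SARS data, and the function names are assumptions.

```python
def neighbor_scores(D):
    """M(i,j) = (n-2) d(i,j) - Ri - Rj for every pair i < j."""
    n = len(D)
    R = [sum(row) for row in D]  # Ri: sum of distances from taxon i
    return {(i, j): (n - 2) * D[i][j] - R[i] - R[j]
            for i in range(n) for j in range(i + 1, n)}


def closest_pair(D):
    """Indices of the pair with the lowest neighborhood degree M."""
    M = neighbor_scores(D)
    return min(M, key=M.get)


# Illustrative distance matrix for taxa a, b, c, d, e.
D = [[0, 5, 9, 9, 8],
     [5, 0, 10, 10, 9],
     [9, 10, 0, 8, 7],
     [9, 10, 8, 0, 3],
     [8, 9, 7, 3, 0]]

print(closest_pair(D))             # (0, 1)
print(neighbor_scores(D)[(0, 1)])  # -50
print(neighbor_scores(D)[(3, 4)])  # -48
```

Note that d and e have the smallest raw distance (3), yet a and b are selected: M also accounts for how far each taxon is from all the others.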
55
Neighbor-Joining Algorithm
Input: n x n distance matrix D and one outgroup
Output: rooted phylogenetic tree T
1. Update table M using D and choose the lowest value of M to select the two taxa to join.
2. Combine the two taxa ti and tj into a new node V and use the 3-point formula to update the distance matrix D', in which ti and tj are replaced by V.
3. Calculate the lengths of the branches from ti and tj to V using the 3-point formula: T(V,1) = ti and T(V,2) = tj, TD(ti) = L(ti,V) and TD(tj) = L(tj,V).
4. The distance matrix D' now has n-1 taxa. If more than 2 taxa remain, go to step 1; if two taxa remain, join them by a branch of length d(ti,tj).
5. Set the root node on the branch connecting the outgroup to the rest of the tree.
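The steps above can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the implementation used in the course: it uses the standard NJ branch-length formula L(ti,V) = d(ti,tj)/2 + (Ri - Rj)/(2(n-2)), returns an unrooted list of edges, and leaves out the final outgroup rooting step. The node names V0, V1, ... and the example matrix are illustrative.

```python
def neighbor_joining(names, D):
    """Return the unrooted NJ tree as a list of (node, node, length) edges."""
    nodes = list(names)
    dist = {a: {b: float(D[i][j]) for j, b in enumerate(names)}
            for i, a in enumerate(names)}
    edges, next_id = [], 0
    while len(nodes) > 2:
        n = len(nodes)
        R = {a: sum(dist[a][b] for b in nodes) for a in nodes}
        # Step 1: pick the pair with the lowest M(i,j).
        best = None
        for x in range(n):
            for y in range(x + 1, n):
                a, b = nodes[x], nodes[y]
                m = (n - 2) * dist[a][b] - R[a] - R[b]
                if best is None or m < best[0]:
                    best = (m, a, b)
        _, a, b = best
        # Steps 2 and 3: join a and b in a new node v, compute branch lengths.
        v = f"V{next_id}"
        next_id += 1
        la = dist[a][b] / 2 + (R[a] - R[b]) / (2 * (n - 2))
        lb = dist[a][b] - la
        edges += [(a, v, la), (b, v, lb)]
        # Distances from v to every remaining taxon k.
        dist[v] = {v: 0.0}
        for k in nodes:
            if k not in (a, b):
                d = (dist[a][k] + dist[b][k] - dist[a][b]) / 2
                dist[v][k] = dist[k][v] = d
        # Step 4: the matrix now has one taxon fewer.
        nodes = [k for k in nodes if k not in (a, b)] + [v]
    a, b = nodes
    edges.append((a, b, dist[a][b]))  # join the last two nodes
    return edges


NAMES = ["a", "b", "c", "d", "e"]
D = [[0, 5, 9, 9, 8],
     [5, 0, 10, 10, 9],
     [9, 10, 0, 8, 7],
     [9, 10, 8, 0, 3],
     [8, 9, 7, 3, 0]]

for edge in neighbor_joining(NAMES, D):
    print(edge)
```

Because the example matrix is additive, the path lengths in the resulting tree reproduce the input distances; for instance the path from a to e has length 2 + 3 + 2 + 1 = 8.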
56
UPGMA
The NJ algorithm reduces to a simpler method, UPGMA, when M = D.
UPGMA: Unweighted Pair Group Method with Arithmetic Averages.
One of the first distance-based methods.
It assumes that the distance from an external node to the root is the same for all external nodes (ultrametricity).
This is almost never valid for DNA sequences and can lead to incorrect inference of a tree.
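A compact UPGMA sketch in Python, assuming the usual formulation: repeatedly merge the two closest clusters, place the new node at height d/2, and average distances weighted by cluster size. The function name, the Newick output and the 3-taxon matrix are illustrative choices.

```python
def upgma(names, D):
    """Build a rooted UPGMA tree and return it as a Newick string."""
    n = len(names)
    # cluster id -> (newick text, number of leaves, height of the node)
    clusters = {i: (names[i], 1, 0.0) for i in range(n)}
    dist = {(i, j): float(D[i][j]) for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        i, j = min(dist, key=dist.get)       # closest pair of clusters
        height = dist.pop((i, j)) / 2        # molecular clock: equal depth
        ti, si, hi = clusters.pop(i)
        tj, sj, hj = clusters.pop(j)
        merged = f"({ti}:{height - hi:g},{tj}:{height - hj:g})"
        for k in clusters:
            dik = dist.pop((min(i, k), max(i, k)))
            djk = dist.pop((min(j, k), max(j, k)))
            # Arithmetic average, weighted by the number of leaves.
            dist[(k, next_id)] = (si * dik + sj * djk) / (si + sj)
        clusters[next_id] = (merged, si + sj, height)
        next_id += 1
    (tree, _, _), = clusters.values()
    return tree + ";"


# Illustrative ultrametric distances between A, B and C.
print(upgma(["A", "B", "C"], [[0, 4, 6],
                              [4, 0, 6],
                              [6, 6, 0]]))
# (C:3,(A:2,B:2):1);
```

Every leaf ends up at the same distance (3) from the root, which is exactly the ultrametric assumption that real DNA data rarely satisfy.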
57
UPGMA: Unweighted Pair Group Method with Arithmetic Averages
One of the first distance-based methods.
Branch lengths assume a "molecular clock" (ultrametricity).
Each child is at the same distance from its parent.
With ultrametricity, it is extremely simple to find the branch lengths.
[Figure: UPGMA tree for A, B and C, with A and B joined by branches of length 2]
58
UPGMA
Example (Source: Mona Singh, Princeton University)
59
UPGMA Unweighted Pair Group Method with Arithmetic Averages
60
UPGMA Unweighted Pair Group Method with Arithmetic Averages
61
UPGMA: Unweighted Pair Group Method with Arithmetic Averages
What if a tree with exact values exists? Example:
62
UPGMA
Ultrametricity is almost never valid for DNA sequences.
It can lead to incorrect inference.
63
SARS: Case Study
Phylogenetic Analysis of the SARS Epidemic
The SARS coronavirus genome has 6 genes.
Host: civet.
What do we know?
Epidemiological tree
Date of origin
Source area
64
SARS: How do we know that it came from the civet?
65
SARS: Distance Matrix
The distance matrix was obtained by calculating the genetic distance from the global sequence alignment of the spike gene, using the Jukes-Cantor correction.
66
SARS: Epidemiological tree
How did it spread?
Outgroup: civet
67
SARS: When did it start?
With 95% confidence: September 16, 2002.
[Figure: genetic distance from civet SARS vs. sample collection date]
68
SARS: How do we know that it started in Guangdong?
The epidemiological tree points to Guangdong.
Can we verify it in another way?
There is lower variation outside Guangdong province.
69
Questions?
71
Newick Format
[Figure: tree with internal nodes 1, 2 and 3 and leaves A, B, C and D, also shown in matrix format]
Newick format: ((A, B), (C, D));
Branch lengths can be represented: ((A:1.0, B:1.0):2, (C:1, D:1):2);
Names can be assigned to internal nodes: ((A, B)2, (C, D)3)1;
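To show how the three Newick variants are read, here is a small recursive parser sketch in Python. It handles only the features shown on this slide (nesting, optional node names, optional branch lengths) and assumes names contain no spaces or quoted characters; the dictionary layout is an illustrative choice.

```python
def parse_newick(text):
    """Parse a simple Newick string into nested dicts."""
    s = "".join(text.split()).rstrip(";")  # drop whitespace and final ';'
    pos = 0

    def parse_node():
        nonlocal pos
        children = []
        if pos < len(s) and s[pos] == "(":
            pos += 1  # consume "("
            while True:
                children.append(parse_node())
                if pos >= len(s):
                    raise ValueError("unbalanced parentheses")
                if s[pos] == ",":
                    pos += 1
                elif s[pos] == ")":
                    pos += 1
                    break
                else:
                    raise ValueError(f"unexpected character at position {pos}")
        # Optional node name (leaf label or internal node name).
        start = pos
        while pos < len(s) and s[pos] not in "(),:":
            pos += 1
        name = s[start:pos]
        # Optional branch length after ":".
        length = None
        if pos < len(s) and s[pos] == ":":
            pos += 1
            start = pos
            while pos < len(s) and s[pos] not in "(),":
                pos += 1
            length = float(s[start:pos])
        return {"name": name, "length": length, "children": children}

    root = parse_node()
    if pos != len(s):
        raise ValueError("trailing characters after the tree")
    return root


def leaf_names(node):
    """Leaf labels in left-to-right order."""
    if not node["children"]:
        return [node["name"]]
    return [name for child in node["children"] for name in leaf_names(child)]


tree = parse_newick("((A:1.0, B:1.0):2, (C:1, D:1):2);")
print(leaf_names(tree))               # ['A', 'B', 'C', 'D']
print(tree["children"][0]["length"])  # 2.0
print(parse_newick("((A, B)2, (C, D)3)1;")["name"])  # 1
```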