Dilvan Moreira (based on Prof. André Carvalho presentation)


1 PHYLOGENY
Dilvan Moreira (based on Prof. André Carvalho presentation)

2 Reading
Introduction to Computational Genomics: A Case Studies Approach, Chapter 7

3 Topics
SARS; origin and evolution of the epidemic; phylogenetic analysis; phylogenetic tree construction; Neighbor-Joining algorithm; case study.
André de Carvalho - ICMC/USP 15/05/2018

4 Epidemic of SARS
Severe Acute Respiratory Syndrome (SARS): a severe respiratory disease that hit the world in 2003. Caused by the SARS coronavirus (SARS-CoV). The term "corona" comes from the crown that appears when the virus is observed under an electron microscope. It is not the bird flu.

5 Epidemic of SARS
Coronaviruses are pathogens (organisms capable of causing infectious disease) that cause a variety of diseases in animals. They may mutate frequently and thus infect other species. Other coronaviruses have been identified as causing hepatitis in rats and gastroenteritis in pigs. They are the most common viruses in veterinary pathology.

6 Epidemic of SARS
February 2003: the French Hospital in Hanoi, Vietnam, called the WHO to report a highly contagious flu-like infection. The WHO expert in infectious diseases, Dr. Carlo Urbani, concluded that it was a new and unusual pathogen.

7 Epidemic of SARS
February 2003: during his stay, Dr. Urbani collected samples, examined hospital records and organized the quarantine of patients. He was the first to identify and describe the new disease, SARS. Symptoms: fever, dry cough, asthma, progressive worsening of the respiratory system, and death by respiratory failure. Within three weeks, Dr. Urbani and five other hospital health workers had died of SARS.

8 Epidemic of SARS
March 2003: the WHO issued a global alert warning that SARS was a risk to global health.

9 Epidemic of SARS
Hanoi's hospital, March 2003

10 Origin of SARS Epidemic
The first cases occurred in November 2002 in the province of Guangdong, China. 106 people got sick in a hospital in Guangzhou city, but the rest of the world did not know about it. A doctor from this hospital visited Hong Kong on February 21, 2003, and stayed on the 9th floor of a city hotel. He also got sick and died, diagnosed with pneumonia. Several people who were on the 9th floor became transmitters of the disease.

12 Origin of SARS Epidemic
One of the visitors to the 9th floor was an American executive. He was the first patient treated at the French Hospital in Hanoi and infected 80 people before dying. Other visitors to the 9th floor brought the disease to Canada, Singapore and the US. In April 2003, 4,300 cases had been reported, with 250 deaths, in 25 countries.

13 Origin of SARS Epidemic
March 2003: early in the month, the WHO coordinated an international research effort. At the end of the month, a new virus that caused SARS was identified independently in Germany, Canada, the USA and Hong Kong. The SARS coronavirus is an RNA virus (like HIV). Common in humans and animals, coronaviruses cause ~25% of all upper respiratory infections, e.g. the common cold.

14 SARS Number of Reported Cases

15 SARS Statistics

16 Coronavirus SARS
Source: BBC

17 Coronavirus SARS

18 Origin of SARS Epidemic
April 2003: a Canadian laboratory sequenced the RNA of the SARS-CoV. Phylogenetic analysis of the virus showed that the closest coronavirus is the one from the civet, a popular food in Guangdong.

19 Origin of SARS Epidemic
May 2003: two articles in Science presented the complete genome of the SARS-CoV. The genome contains 29,751 bp. It is substantially different from all human CoVs, and also different from bird CoVs (no relation to avian flu). By the end of 2003, SARS had spread throughout the world.

20 Phylogenetic Analysis of SARS
Phylogenetic analysis can answer questions like: What kind of virus caused the original infection? What is the source of the infection? When and where did the virus cross the boundary between species? What are the key changes that made this crossing possible? What trajectory did the spread of the virus follow?

21 Phylogenetic Analysis of SARS
To answer the previous questions, we examine some key algorithms of phylogenetic analysis and apply them to SARS data, available in GenBank and on the book's site.

22 Trees and Evolution
The path of SARS's advance can be represented by a tree. All SARS viruses that appeared in the world originated from the virus found in China. New branches appear when the virus spreads. Traditionally, the evolutionary history linking groups of species has been represented by a tree: it is the only figure in Darwin's book "On the Origin of Species".

23 Phylogenetic Trees
[Figure: tree relating Orangutan, Gorilla, Chimpanzee and Human. Source: Tree of Life website, University of Arizona]

24 Phylogenetic Trees
[Figure: tree of DNA sequences descending from the mother DNA tctgcctc. Sequences shown, from ancestor towards the current species: tctgcctc, gatgcctc, tctgcctcggg, gatgcatc, gacgcctc, gctgcctcggg, gatgaatc, gccgcctc, gctaagcctcggg]

25 Phylogeny
The study of the evolutionary relationships between different groups of organisms (species, populations, etc.). Represented by a tree-shaped diagram (phylogenetic tree). Cladistic analysis is usually based on morphological data.

26 Cladistics

28 Phylogenetic Trees
Show the evolutionary relationships between different species or individuals believed to have a common ancestor. Drawn in the form of a cladogram. Each node with descendants represents their most recent common ancestor. The length of the edges corresponds to time estimates.

29 Phylogenetic Trees
Each node is called a taxonomic unit (taxon, plural taxa). Internal nodes are hypothetical taxonomic units: they cannot be directly observed. More complex relationships may take the form of networks.

30 Structure of Phylogenetic Trees
Trees have two or more taxa (species or individuals). External nodes represent existing taxa; internal nodes represent their ancestors (usually extinct). Trees may be bifurcating (each internal node has at most 2 children) or multifurcating (each internal node can have more than 2 children). Trees may or may not have a root.

31 Rooted Phylogenetic Trees
A special internal node, called the root, is defined: it is the common ancestor of all other nodes. All evolutionary paths lead to the root. Branches are oriented from the root to the external nodes.

32 Rooted Phylogenetic Trees
[Figure: rooted tree with leaves Drosophila, human, puffer fish and rat, a time axis, and labels for branch (edge), internal node and external node (leaf)]

33 Rooted Phylogenetic Trees

34 Non Rooted Phylogenetic Trees
Branches have no orientation. Unrooted trees show the topological relationships between taxa without identifying a common ancestor. There are methods to define a root for an unrooted tree: choose an edge on which to place a root node. This requires external biological information, or at least a guess, about where to insert the root.

35 Non Rooted Phylogenetic Trees
[Figure: unrooted tree with leaves frog, human, puffer fish and rat, and labels for internal node, external node (leaf) and branch (edge)]

36 Non Rooted Phylogenetic Trees
The root is generally defined by including in the data set one or more taxa known to result from an older split, i.e. those with the most distant relationship to every other taxon. This external taxon is called the outgroup. The branch where the outgroup joins the other taxa contains the root node.

37 Non Rooted Phylogenetic Trees
[Figure: the tree of frog, human, puffer fish and rat with Drosophila added and the root marked; labels for root, internal node, external node and branch (edge)]

39 Structure of Phylogenetic Trees
Rotating the branches of an internal node does not alter the relationships between taxa: the tree drawn with leaves in the order A, B, C is the same as the one drawn with leaves in the order B, A, C (invariance to rotation).

40 Number of Possible Trees
Phylogenetic tree reconstruction from DNA sequences is complicated by the large number of possible trees. Possible unrooted bifurcating trees (n ≥ 3): (2n - 5)! / (2^(n-3) (n - 3)!). Possible rooted bifurcating trees (n ≥ 2): (2n - 3)! / (2^(n-2) (n - 2)!). n: number of taxa.
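
The standard counts for bifurcating trees are (2n - 5)!! unrooted trees and (2n - 3)!! rooted trees, where k!! is the product of every second integer from k down to 1. The sketch below, with function names chosen here purely for illustration, shows how fast these counts grow:

```python
def double_factorial(k: int) -> int:
    """Return k * (k - 2) * (k - 4) * ... down to 1 (1 if k <= 1)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def count_unrooted_trees(n: int) -> int:
    """Number of unrooted bifurcating trees on n labelled taxa (n >= 3)."""
    if n < 3:
        raise ValueError("an unrooted bifurcating tree needs at least 3 taxa")
    return double_factorial(2 * n - 5)


def count_rooted_trees(n: int) -> int:
    """Number of rooted bifurcating trees on n labelled taxa (n >= 2)."""
    if n < 2:
        raise ValueError("a rooted bifurcating tree needs at least 2 taxa")
    return double_factorial(2 * n - 3)


print(count_unrooted_trees(10))  # 2027025
print(count_rooted_trees(10))    # 34459425
```

With only 10 taxa there are already more than two million unrooted trees, which is why exhaustive search is rarely an option.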

41 Tree Representation
There are several non-graphical ways to represent a tree. [Figure: tree with leaves 1 to 5 and internal nodes 6 to 9.] A popular standard format is Newick: (((1,2),3),(4,5))
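
As a small illustration of how a Newick string mirrors the tree's nesting, the sketch below serializes a tree stored as nested Python tuples. The tuple representation and the name `to_newick` are assumptions made for this example, not part of the slides.

```python
def to_newick(node) -> str:
    """Serialize a tree given as nested tuples (leaves are any non-tuple label)."""
    if isinstance(node, tuple):
        # An internal node becomes a parenthesized, comma-separated list of children.
        return "(" + ",".join(to_newick(child) for child in node) + ")"
    return str(node)


tree = (((1, 2), 3), (4, 5))
print(to_newick(tree) + ";")  # (((1,2),3),(4,5));
```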

42 Inferring Trees
Until recently, the relationships between taxa were inferred from morphological characteristics. DNA sequences are currently used, thanks to sequencing technology. Mutations leave a trail: trees can be inferred from the similarity between homologous sequences.

43 Inferring Trees
Tree branches can have different lengths: the greater the number of mutations, the longer the branch.

44 Inferring Trees
Given homologous sequences from a group of taxa, there are several methods to reconstruct their phylogenetic relationships. The methods can be divided into two groups: those that rank all possible trees by some criterion to find the best one, and those that build the tree directly from the data (without defining an optimization function).

45 Inferring Trees
Methods that rank possible trees use criteria that (in general) seek the tree with the fewest changes. Because of the huge number of possible trees, it may take a long time to find the best tree, and approaches that speed up the search may fail to find the best tree.

46 Inferring Trees
Methods that build from the data construct a phylogenetic tree using algorithms and statistics, often based on computing the distance between pairs of taxa. They are very popular because they are usually quick.

47 Methods that build from the data
The most popular distance-based method is the Neighbor-Joining (NJ) algorithm. Although not necessarily as well supported statistically as other methods, it is robust and accurate. It is guaranteed to infer the true tree if the distances used reflect the actual distances among the sequences, a guarantee not offered by other, statistically more sophisticated methods.

48 Distance-Based Methods
Given n taxa, build a distance matrix among the taxa. If the branches of the tree have specific lengths, the distance between any two nodes can be easily computed as the total length of the unique path linking them. This specifies the distance among the leaf nodes of the tree as an additive distance.
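
A minimal sketch of this path-length computation, assuming the tree is stored as a child-to-(parent, branch length) dictionary. The representation, the internal node names X, Y and R, and the function names are illustrative choices, not from the slides.

```python
def distances_to_ancestors(node, parent):
    """Map node and each of its ancestors to the path length from node."""
    reached = {node: 0.0}
    total = 0.0
    while node in parent:  # the root has no entry in `parent`
        node, length = parent[node]
        total += length
        reached[node] = total
    return reached


def tree_distance(a, b, parent):
    """Length of the unique path between nodes a and b."""
    from_a = distances_to_ancestors(a, parent)
    from_b = distances_to_ancestors(b, parent)
    # The path turns around at the most recent common ancestor, which is the
    # shared ancestor minimizing the summed distance.
    return min(from_a[x] + from_b[x] for x in from_a if x in from_b)


# Tree ((A:1,B:1):2,(C:1,D:1):2) with hypothetical internal names X, Y and root R.
parent = {
    "A": ("X", 1.0), "B": ("X", 1.0),
    "C": ("Y", 1.0), "D": ("Y", 1.0),
    "X": ("R", 2.0), "Y": ("R", 2.0),
}
print(tree_distance("A", "B", parent))  # 2.0
print(tree_distance("A", "C", parent))  # 6.0
```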

49 Distance-Based Methods
Additive distance: the distance along the path from node i to node j. Biologically, additivity is an important property for a distance matrix: the number of substitutions separating 2 taxa from their last common ancestor forms an additive distance. The Jukes-Cantor model is frequently used as a substitution model.
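
Under the Jukes-Cantor model, the observed proportion p of differing sites between two aligned sequences is corrected to d = -(3/4) ln(1 - 4p/3), which accounts for multiple substitutions at the same site. The sketch below assumes the two sequences are already aligned and skips gap positions; the function names are illustrative.

```python
import math


def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of differing sites between two aligned sequences, ignoring gaps."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped positions to compare")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor_distance(p: float) -> float:
    """Jukes-Cantor corrected distance for an observed proportion p of differences."""
    if not 0 <= p < 0.75:
        raise ValueError("the correction is only defined for 0 <= p < 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


p = p_distance("tctgcctc", "gatgcctc")  # 2 differences over 8 sites
print(p)                                 # 0.25
print(round(jukes_cantor_distance(p), 4))  # 0.3041
```

The corrected distance is always at least as large as p, and the gap between them widens as the sequences diverge.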

50 Distance-Based Methods
[Figure: distance matrix with rows and columns L1 to L5, next to a tree with leaves L1 to L5 and internal nodes 6 to 9]

51 Neighbor-Joining Algorithm
A greedy algorithm. It begins with a star-shaped phylogeny, in which all taxa are directly connected to a single central node, and iteratively combines pairs of nodes.

52 Neighbor-Joining Algorithm
Key to the success of the algorithm: the criterion that defines which nodes are selected to be combined at each iteration. It identifies nodes that are topological neighbors on the tree. The selected taxa are combined into a single taxon, and a new distance matrix is then calculated. The process is repeated until all taxa are combined. The generated tree is unrooted.

53 Neighbor-Joining Algorithm
Calculation of the branch lengths. Assume 3 taxa A, B and C in an unrooted tree, joined at a center node by branches of lengths Lx, Ly and Lz. 3-point formula:
Lx + Ly = dAB
Lx + Lz = dAC
Ly + Lz = dBC
which gives:
Lx = (dAB + dAC - dBC)/2
Ly = (dAB + dBC - dAC)/2
Lz = (dAC + dBC - dAB)/2
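
The 3-point formula translates directly into code. In this sketch the function name and the example distances are illustrative; the assertions check that the recovered branch lengths add back up to the input distances.

```python
def three_point(d_ab: float, d_ac: float, d_bc: float):
    """Branch lengths (Lx, Ly, Lz) from the center to A, B and C."""
    lx = (d_ab + d_ac - d_bc) / 2
    ly = (d_ab + d_bc - d_ac) / 2
    lz = (d_ac + d_bc - d_ab) / 2
    return lx, ly, lz


lx, ly, lz = three_point(5, 6, 7)
print(lx, ly, lz)  # 2.0 3.0 4.0
assert lx + ly == 5 and lx + lz == 6 and ly + lz == 7
```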

54 Neighbor-Joining Algorithm
Uses the 4-point condition to select the neighboring nodes to be combined. Assume that 1 and 2 are neighbors:
d(1,2) + d(i,j) < d(i,1) + d(2,j)
Ri = ∑j d(i,j)
M(i,j) = (n-2) d(i,j) - Ri - Rj
M(i,j) < M(i,k) for all k ≠ j
M(i,j) is the neighborhood degree. [Figure: taxa 1, 2, i and j joined through a center, with branch lengths Lx, Ly, Lz and Lq]
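
The selection criterion M(i,j) = (n-2) d(i,j) - Ri - Rj can be computed for every pair as sketched below. The 5-taxon matrix is a toy example chosen for illustration and is not SARS data; the function names are likewise assumptions.

```python
from itertools import combinations


def neighbor_criterion(D):
    """Return the row sums R and the table M for a symmetric distance matrix D."""
    n = len(D)
    R = [sum(row) for row in D]
    M = {(i, j): (n - 2) * D[i][j] - R[i] - R[j]
         for i, j in combinations(range(n), 2)}
    return R, M


def closest_neighbors(D):
    """Indices (i, j) of the pair with the lowest M value."""
    _, M = neighbor_criterion(D)
    return min(M, key=M.get)


# Toy distance matrix for 5 taxa (illustrative values only).
D = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
R, M = neighbor_criterion(D)
print(M[(0, 1)], M[(3, 4)])   # -50 -48
print(closest_neighbors(D))   # (0, 1)
```

Note that taxa 3 and 4 have the smallest raw distance (3), yet the criterion selects taxa 0 and 1: M corrects each distance for how far the two taxa are from everything else.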

55 Neighbor-Joining Algorithm
Input: an n x n distance matrix D and one outgroup. Output: a rooted phylogenetic tree T.
1. Update table M using D, and choose the lowest value of M to select the two taxa to join.
2. Combine the two taxa ti and tj into a new node V, and use the 3-point formula to update the distance matrix to D', in which ti and tj are replaced by V.
3. Calculate the lengths of the branches from ti and tj to V using the 3-point formula: T(V,1) = ti and T(V,2) = tj, TD(ti) = L(ti,V) and TD(tj) = L(tj,V).
4. The distance matrix D' now has n-1 taxa. If more than 2 taxa remain, go to step 1. If only two taxa remain, join them by a branch of length d(ti,tj).
5. Set the root node on the branch connecting the outgroup to the rest of the tree.
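
The loop in steps 1 to 4 can be sketched as follows. This is a minimal illustration, not the book's implementation. It returns the unrooted tree as a list of edges and leaves out the final rooting on the outgroup (step 5). Branch lengths use the usual NJ rule L(ti,V) = d(ti,tj)/2 + (Ri - Rj) / (2(n-2)), which is the 3-point formula applied with the average distance to the remaining taxa. The node names V0, V1, ... and the toy matrix are assumptions made for this example.

```python
from itertools import combinations


def neighbor_joining(D, names):
    """Return the edges (node, node, length) of the unrooted NJ tree."""
    # Distances are kept in a dict of dicts so nodes can be added by label.
    dist = {a: {b: float(D[i][j]) for j, b in enumerate(names)}
            for i, a in enumerate(names)}
    active = list(names)
    edges = []
    next_id = 0
    while len(active) > 2:
        n = len(active)
        R = {a: sum(dist[a][b] for b in active) for a in active}
        # Step 1: pick the pair with the lowest M(i,j) = (n-2) d(i,j) - Ri - Rj.
        i, j = min(combinations(active, 2),
                   key=lambda p: (n - 2) * dist[p[0]][p[1]] - R[p[0]] - R[p[1]])
        # Steps 2 and 3: join i and j under a new node and compute branch lengths.
        v = f"V{next_id}"
        next_id += 1
        length_i = dist[i][j] / 2 + (R[i] - R[j]) / (2 * (n - 2))
        length_j = dist[i][j] - length_i
        edges += [(i, v, length_i), (j, v, length_j)]
        # Distance from the new node to every remaining taxon.
        dist[v] = {v: 0.0}
        for k in active:
            if k not in (i, j):
                d = (dist[i][k] + dist[j][k] - dist[i][j]) / 2
                dist[v][k] = dist[k][v] = d
        # Step 4: the matrix now has one taxon fewer.
        active = [k for k in active if k not in (i, j)] + [v]
    a, b = active
    edges.append((a, b, dist[a][b]))  # join the last two nodes
    return edges


# Toy additive distance matrix for 5 taxa (illustrative values only).
D = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
for edge in neighbor_joining(D, ["a", "b", "c", "d", "e"]):
    print(edge)
```

Because the toy matrix is additive, summing the branch lengths along the path between any two leaves of the result reproduces the corresponding entry of D.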

56 UPGMA
Unweighted Pair Group Method with Arithmetic Averages. The NJ algorithm reduces to a simpler method, UPGMA, when M = D. One of the first distance-based methods. It assumes that the distance from the root to every external node is the same (ultrametricity), which is almost never valid for DNA sequences and can lead to incorrect inference of a tree.

57 UPGMA
Unweighted Pair Group Method with Arithmetic Averages, one of the first distance-based methods. Branch lengths assume a "molecular clock" (ultrametricity): the children of each node are at equal distance from their parent. With ultrametricity, the branch lengths are extremely simple to find. [Figure: UPGMA tree with leaves A, B and C and branch lengths of 2]
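
A minimal UPGMA sketch, under the assumption that clusters are merged by the smallest raw distance (the M = D case) and that the distance to a merged cluster is the size-weighted average of the distances to its two parts. Each merge is placed at a height of half the distance between the merged clusters, which is what the molecular clock assumption implies. Names and the toy matrix are illustrative.

```python
from itertools import combinations


def upgma(D, names):
    """Return (tree label, merges), each merge being (cluster, cluster, height)."""
    dist = {a: {b: float(D[i][j]) for j, b in enumerate(names)}
            for i, a in enumerate(names)}
    size = {a: 1 for a in names}
    active = list(names)
    merges = []
    while len(active) > 1:
        # Merge the two closest clusters.
        i, j = min(combinations(active, 2), key=lambda p: dist[p[0]][p[1]])
        new = f"({i},{j})"
        merges.append((i, j, dist[i][j] / 2))  # both children sit at equal depth
        dist[new] = {}
        for k in active:
            if k not in (i, j):
                # Arithmetic average over all members of the two merged clusters.
                d = (size[i] * dist[i][k] + size[j] * dist[j][k]) / (size[i] + size[j])
                dist[new][k] = dist[k][new] = d
        size[new] = size[i] + size[j]
        active = [k for k in active if k not in (i, j)] + [new]
    return active[0], merges


# Ultrametric toy matrix derived from the tree ((A:1,B:1):2,(C:1,D:1):2).
D = [
    [0, 2, 6, 6],
    [2, 0, 6, 6],
    [6, 6, 0, 2],
    [6, 6, 2, 0],
]
tree, merges = upgma(D, ["A", "B", "C", "D"])
print(tree)  # ((A,B),(C,D))
```

On ultrametric input like this the method recovers the tree exactly; on data that violate the molecular clock it can group the wrong taxa.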

58 UPGMA
Example (Source: Mona Singh – U. Princeton)

59 UPGMA
Unweighted Pair Group Method with Arithmetic Averages

60 UPGMA
Unweighted Pair Group Method with Arithmetic Averages

61 UPGMA
Unweighted Pair Group Method with Arithmetic Averages. What if a tree with exact values exists? Example:

62 UPGMA
Ultrametricity is almost never valid for DNA sequences, so UPGMA can lead to incorrect inference.

63 SARS: Case Study
Phylogenetic analysis of the SARS epidemic. The SARS coronavirus genome has 6 genes. Host: civet. What do we know? Epidemiological tree, date of origin, source area.

64 SARS: How do we know that it came from the civet?

65 SARS: Distance Matrix
The distance matrix was obtained by calculating the genetic distance from the global sequence alignment of the spike gene, using the Jukes-Cantor correction.

66 SARS: Epidemiological tree
How does it spread? Outgroup: civet.

67 SARS: When did it start?
With 95% confidence, September 16, 2002. [Figure: genetic distance from civet SARS vs. sample collection date]

68 SARS: How do we know that it started in Guangdong?
The epidemiological tree points to Guangdong. Can we verify it in another way? There is lower variation outside Guangdong Province.

69 Questions?

71 Newick Format
[Figure: tree with internal nodes 1, 2 and 3 and leaves A, B, C and D]
Newick format: ((A, B), (C, D));
Branch lengths can be represented: ((A:1.0, B:1.0):2, (C:1, D:1):2);
Names can be assigned to internal nodes: ((A, B)2, (C, D)3)1;
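
To make the three variants concrete, here is a small recursive parser for this subset of Newick (names, branch lengths, internal node labels). It is a sketch with no error handling for malformed input, and the dictionary layout it returns is an assumption made for this example.

```python
def parse_newick(text: str) -> dict:
    """Parse a Newick string into nested dicts with name, length and children."""
    text = text.replace(" ", "")
    pos = 0

    def parse_node() -> dict:
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1  # skip "("
            children.append(parse_node())
            while text[pos] == ",":
                pos += 1  # skip ","
                children.append(parse_node())
            pos += 1  # skip ")"
        # Optional node name (leaf name or internal label).
        start = pos
        while pos < len(text) and text[pos] not in ",():;":
            pos += 1
        name = text[start:pos]
        # Optional branch length after ":".
        length = None
        if pos < len(text) and text[pos] == ":":
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",();":
                pos += 1
            length = float(text[start:pos])
        return {"name": name, "length": length, "children": children}

    return parse_node()


tree = parse_newick("((A:1.0, B:1.0):2, (C:1, D:1):2);")
print([child["length"] for child in tree["children"]])  # [2.0, 2.0]
print(parse_newick("((A, B)2, (C, D)3)1;")["name"])     # 1
```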

