

DNA variation in Ecology and Evolution
IV - Clustering methods and Phylogenetic reconstruction
Maria Eugenia D’Amato
BCB 705: Biodiversity

Organization of the presentation
Phylogenetic reconstruction: Distance, ML, MP
Networks
Multivariate analysis

Characters: independent and homologous
Continuous
Discrete: Binary or Multistate

DNA sequence characters
Alignment = a hypothesis of homology for each site
Sequence comparison: BLAST search against GenBank
Coding sequence: blastn, blastx
Non-coding DNA: blastn

Blast search results
Sequences producing significant alignments (columns: Score in bits, E value):
gi| |dbj|AB | Mantella baroni mitochondrial ND  e-18
gi|343991|dbj|D |FRGMTURF2 Rana catesbeiana mitochondri  e-17
gi| |gb|AF |AF Rana sylvatica NADH dehydr  e-16
Each hit gives the GenBank accession numbers for the sequence and the species that matches the query.
The lower the E-value, the better the alignment.

Blast search results
>gi| |dbj|AB | Mantella baroni mitochondrial ND5, ND1, ND2 genes for NADH dehydrogenase subunit 5, NADH dehydrogenase subunit 1, NADH dehydrogenase subunit 2, complete cds
Length=10814
Score = 101 bits (51), Expect = 3e-18
Identities = 99/115 (86%), Gaps = 0/115 (0%)
Strand=Plus/Minus
Query 451  TTAGTTGAGGATTAAATTTTAGGATAATAACTATTCAGCCGAGGTGGCTGATGGAAGAAA 510
           |||||||||||||||||||||  ||||||| ||||||||| ||||| |  |||||||| |
Sbjct      TTAGTTGAGGATTAAATTTTAAAATAATAAGTATTCAGCCCAGGTGACCAATGGAAGAGA
Query 511  AAGCTAAAATTTTACGTAGTTGTGTTTGGCTAATGCCGCCTCATCCGCCTACAAG 565
           | |||| ||||||||||||||| |||||| |||| || ||||| || ||||||||
Sbjct      AGGCTATAATTTTACGTAGTTGAGTTTGGTTAATACCCCCTCAACCTCCTACAAG
Annotations on the slide: description of the genes contained in the sequence with this accession number; strands aligned; 5' end; alignment.
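As a quick check of the Identities line, the sketch below recounts the matching positions in the two aligned blocks shown above, joined end to end. The function name `percent_identity` is an illustrative choice and is not part of BLAST.

```python
def percent_identity(query: str, subject: str) -> tuple[int, int, float]:
    """Return (identical positions, alignment length, percent identity)."""
    if len(query) != len(subject):
        raise ValueError("aligned strings must have the same length")
    identical = sum(q == s for q, s in zip(query, subject))
    return identical, len(query), 100.0 * identical / len(query)


# The two aligned blocks from the slide, concatenated in order.
QUERY = (
    "TTAGTTGAGGATTAAATTTTAGGATAATAACTATTCAGCCGAGGTGGCTGATGGAAGAAA"
    "AAGCTAAAATTTTACGTAGTTGTGTTTGGCTAATGCCGCCTCATCCGCCTACAAG"
)
SUBJECT = (
    "TTAGTTGAGGATTAAATTTTAAAATAATAAGTATTCAGCCCAGGTGACCAATGGAAGAGA"
    "AGGCTATAATTTTACGTAGTTGAGTTTGGTTAATACCCCCTCAACCTCCTACAAG"
)

identical, length, pct = percent_identity(QUERY, SUBJECT)
print(f"Identities = {identical}/{length} ({pct:.0f}%)")
```

The count reproduces the 99/115 (86%) reported by BLAST for this hit.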

Phylogenetic reconstruction: Distance methods
A 5 x 7 data matrix (5 samples scored for the characters C1, C2, ...) is converted by a distance criterion (similarity / dissimilarity criterion) into a 5 x 5 matrix, which is then drawn as a dendrogram.

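Distance criteria for binary data
Jaccard's coefficient: $J = \frac{a}{a + b + c}$, where a = bands common to both samples, b = bands exclusive to the first sample, c = bands exclusive to the second sample. The corresponding Jaccard distance is $1 - J$.
For two points $P_1 = (x_1, y_1)$ and $P_2 = (x_2, y_2)$:
Manhattan distance: $M = |x_1 - x_2| + |y_1 - y_2|$
Euclidean distance: $E = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$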
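The three criteria for binary or coordinate data can be sketched in a few lines. The band profiles and points below are made-up illustrations, and the function names are assumptions of this sketch. Note that a / (a + b + c) is a similarity, so the distance is taken as 1 minus that value.

```python
import math


def jaccard_similarity(x, y):
    """J = a / (a + b + c) for two 0/1 band profiles of equal length."""
    a = sum(1 for i, j in zip(x, y) if i == 1 and j == 1)  # bands in both
    b = sum(1 for i, j in zip(x, y) if i == 1 and j == 0)  # bands only in x
    c = sum(1 for i, j in zip(x, y) if i == 0 and j == 1)  # bands only in y
    if a + b + c == 0:
        raise ValueError("no bands present in either profile")
    return a / (a + b + c)


def manhattan(p1, p2):
    """Sum of absolute coordinate differences."""
    return sum(abs(u - v) for u, v in zip(p1, p2))


def euclidean(p1, p2):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(p1, p2)))


# Hypothetical band presence/absence profiles for two samples.
sample_x = [1, 1, 0, 1, 0]
sample_y = [1, 0, 0, 1, 1]

j = jaccard_similarity(sample_x, sample_y)
print("Jaccard similarity:", j, "distance:", 1 - j)
print("Manhattan:", manhattan((0, 0), (3, 4)))
print("Euclidean:", euclidean((0, 0), (3, 4)))
```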

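Distance criterion for DNA data: Models of DNA substitution
p = number of different nucleotides / total number of nucleotides
Divergence matrix for sequences x and y (base order A, C, G, T):
$F_{xy} = \begin{pmatrix} f_{AA} & f_{AC} & f_{AG} & f_{AT} \\ f_{CA} & f_{CC} & f_{CG} & f_{CT} \\ f_{GA} & f_{GC} & f_{GG} & f_{GT} \\ f_{TA} & f_{TC} & f_{TG} & f_{TT} \end{pmatrix} = \begin{pmatrix} a & b & c & d \\ e & f & g & h \\ i & j & k & l \\ m & n & o & p \end{pmatrix}$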
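A minimal sketch of how the divergence matrix and the p-distance could be computed from two aligned sequences. The two short sequences are invented for illustration, and the choice to skip gaps and ambiguity codes is an assumption of this sketch rather than something stated on the slide.

```python
BASES = "ACGT"


def divergence_matrix(seq_x, seq_y):
    """4 x 4 matrix of relative frequencies of each base pair (x base, y base)."""
    if len(seq_x) != len(seq_y):
        raise ValueError("sequences must be aligned to the same length")
    counts = {(r, c): 0 for r in BASES for c in BASES}
    n = 0
    for bx, by in zip(seq_x.upper(), seq_y.upper()):
        if bx in BASES and by in BASES:  # assumption: ignore gaps and ambiguity codes
            counts[(bx, by)] += 1
            n += 1
    if n == 0:
        raise ValueError("no comparable sites")
    return [[counts[(r, c)] / n for c in BASES] for r in BASES]


def p_distance(F):
    """Proportion of differing sites: 1 minus the diagonal of F."""
    return 1.0 - sum(F[i][i] for i in range(4))


# Hypothetical aligned sequences with two differences in ten sites.
F = divergence_matrix("ACGTACGTAC", "ACGTATGTAA")
print("p =", round(p_distance(F), 3))
```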

Models of DNA substitution
Jukes and Cantor: $D = 1 - (a + f + k + p)$, $d_{xy} = -\frac{3}{4} \ln\left(1 - \frac{4}{3} D\right)$
F81 (equal rate, unequal base frequencies): $B = 1 - (\pi_A^2 + \pi_C^2 + \pi_G^2 + \pi_T^2)$, $d_{xy} = -B \ln\left(1 - \frac{D}{B}\right)$
K2P: $P = c + h + i + n$ (transitions), $Q = b + d + e + g + j + l + m + o$ (transversions), $d_{xy} = \frac{1}{2} \ln \frac{1}{1 - 2P - Q} + \frac{1}{4} \ln \frac{1}{1 - 2Q}$
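The three corrections can be written directly from the formulas above. This is a sketch with illustrative function names; the inputs D, P, Q and the base frequencies are assumed to have been read off the divergence matrix beforehand. When the observed divergence is too large for the model, the logarithm is undefined and `math.log` raises `ValueError`.

```python
import math


def jc69(D):
    """Jukes-Cantor distance from the proportion of differing sites D."""
    return -0.75 * math.log(1.0 - 4.0 * D / 3.0)


def f81(D, freqs):
    """F81 distance; freqs are the four base frequencies (pi_A, pi_C, pi_G, pi_T)."""
    B = 1.0 - sum(f * f for f in freqs)
    return -B * math.log(1.0 - D / B)


def k2p(P, Q):
    """Kimura 2-parameter distance from transition (P) and transversion (Q) proportions."""
    return (0.5 * math.log(1.0 / (1.0 - 2.0 * P - Q))
            + 0.25 * math.log(1.0 / (1.0 - 2.0 * Q)))


D = 0.15  # hypothetical proportion of differing sites
print("JC69:", round(jc69(D), 4))
print("F81 :", round(f81(D, (0.3, 0.2, 0.2, 0.3)), 4))
print("K2P :", round(k2p(0.10, 0.05), 4))
```

Two sanity checks follow from the formulas: F81 with all base frequencies equal to 1/4 gives B = 3/4 and reduces to Jukes and Cantor, and K2P with P = D/3 and Q = 2D/3 does the same.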

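Distance criteria for diploid data
Nei 1972: $D_n = -\ln I$, with $I = \frac{J_{xy}}{\sqrt{J_x J_y}}$, $J_x = \sum x_i^2$, $J_y = \sum y_i^2$, $J_{xy} = \sum x_i y_i$
Cavalli-Sforza 1967: $D_{arc} = \sqrt{\frac{1}{L} \sum \left(\frac{2\theta}{\pi}\right)^2}$, with $\theta = \cos^{-1} \sum \sqrt{x_i y_i}$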
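A sketch of both allele-frequency distances, assuming each population is given as a list of loci and each locus as a list of allele frequencies in the same allele order. The data layout, function names and example frequencies are assumptions of this sketch.

```python
import math


def nei_1972(x, y):
    """Nei's standard genetic distance D = -ln(Jxy / sqrt(Jx * Jy))."""
    jx = sum(f * f for locus in x for f in locus)
    jy = sum(f * f for locus in y for f in locus)
    jxy = sum(fx * fy for lx, ly in zip(x, y) for fx, fy in zip(lx, ly))
    identity = jxy / math.sqrt(jx * jy)
    return -math.log(identity)


def cavalli_sforza_arc(x, y):
    """Arc distance: sqrt of the mean over loci of (2 * theta / pi) ** 2."""
    total = 0.0
    for lx, ly in zip(x, y):
        # Clamp to 1.0 so rounding error cannot push acos out of its domain.
        cos_theta = min(1.0, sum(math.sqrt(fx * fy) for fx, fy in zip(lx, ly)))
        theta = math.acos(cos_theta)
        total += (2.0 * theta / math.pi) ** 2
    return math.sqrt(total / len(x))


# Hypothetical single-locus example with two alleles.
pop_x = [[0.5, 0.5]]
pop_y = [[1.0, 0.0]]
print("Nei 1972:", round(nei_1972(pop_x, pop_y), 4))
print("D_arc   :", round(cavalli_sforza_arc(pop_x, pop_y), 4))
```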

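Phylogenetic reconstruction criterion for distance data
Additive tree (NJ): taxa A, B, C, D; A and B hang from one internal node by branches v1 and v2, C and D from the other by v4 and v5, and v3 joins the two internal nodes.
Properties: $d_{AB} = v_1 + v_2$, $d_{AC} = v_1 + v_3 + v_4$, $d_{AD} = v_1 + v_3 + v_5$, $d_{BC} = v_2 + v_3 + v_4$, $d_{BD} = v_2 + v_3 + v_5$, $d_{CD} = v_4 + v_5$
Ultrametric tree (UPGMA): taxa A, B, C; A joins the root by v1, and v2 leads from the root to the node from which B and C hang by v3 and v4.
Properties: $d_{AB} = v_1 + v_2 + v_3$, $d_{AC} = v_1 + v_2 + v_4$, $d_{BC} = v_3 + v_4$, with $v_3 = v_4$ and $v_1 = v_2 + v_3 = v_2 + v_4$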
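The two tree types can be told apart from the distance matrix alone. The sketch below uses the standard four-point condition for additivity and the three-point condition for ultrametricity; the branch lengths are hypothetical values chosen only to fit the topologies described above, and the helper names are illustrative.

```python
from itertools import combinations


def symmetric(pairs):
    """Turn {("A", "B"): 3.0, ...} into a nested dict with d[a][b] == d[b][a]."""
    d = {}
    for (a, b), value in pairs.items():
        d.setdefault(a, {})[b] = value
        d.setdefault(b, {})[a] = value
    return d


def is_ultrametric(d, taxa, tol=1e-9):
    """Three-point condition: in every triple the two largest distances are equal."""
    for a, b, c in combinations(taxa, 3):
        ordered = sorted([d[a][b], d[a][c], d[b][c]])
        if abs(ordered[2] - ordered[1]) > tol:
            return False
    return True


def is_additive(d, taxa, tol=1e-9):
    """Four-point condition: of the three pairwise sums, the two largest are equal."""
    for a, b, c, e in combinations(taxa, 4):
        sums = sorted([d[a][b] + d[c][e], d[a][c] + d[b][e], d[a][e] + d[b][c]])
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True


# Additive tree with hypothetical branch lengths v1..v5 = 1, 2, 3, 4, 5.
v1, v2, v3, v4, v5 = 1.0, 2.0, 3.0, 4.0, 5.0
additive = symmetric({
    ("A", "B"): v1 + v2, ("A", "C"): v1 + v3 + v4, ("A", "D"): v1 + v3 + v5,
    ("B", "C"): v2 + v3 + v4, ("B", "D"): v2 + v3 + v5, ("C", "D"): v4 + v5,
})

# Ultrametric tree with hypothetical v2 = 1, v3 = v4 = 2, v1 = v2 + v3 = 3.
u1, u2, u3, u4 = 3.0, 1.0, 2.0, 2.0
ultra = symmetric({
    ("A", "B"): u1 + u2 + u3, ("A", "C"): u1 + u2 + u4, ("B", "C"): u3 + u4,
})

print(is_additive(additive, "ABCD"), is_ultrametric(additive, "ABCD"))
print(is_ultrametric(ultra, "ABC"))
```

The first example is additive but not ultrametric, which is the situation in which NJ is preferable to UPGMA.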

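Maximum Likelihood
Aligned sequences (sites 1 ... j ... N):
(1) C….GGACACGTTTA….C
(2) C….AGACACCTCTA….C
(3) C….GGATAAGTTAA….C
(4) C….GGATAGCCTAG….C
Figure: the unrooted tree for the states observed at site j, and the same tree after rooting at an internal node.
The likelihood at site j is the sum, over every possible assignment of states to the internal nodes, of the probability of that labelled tree: $L_j = \text{Prob}(\ldots) + \text{Prob}(\ldots) + \ldots$
$L = L_1 \times L_2 \times L_3 \times \ldots \times L_N = \prod_j L_j$
$\ln L = \ln L_1 + \ln L_2 + \ldots + \ln L_N = \sum_j \ln L_j$
$L_D = \Pr(D \mid H)$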
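To make the sum over internal states concrete, the sketch below evaluates the site likelihood by brute force for the four-taxon tree ((1,2),(3,4)) rooted at the internal node joining sequences 1 and 2. Several things here are assumptions and not taken from the slide: the Jukes-Cantor substitution model, the branch lengths of 0.1, and the use of only the visible middle stretch of each sequence (the elided flanks are left out).

```python
import math
from itertools import product

BASES = "ACGT"


def jc69_prob(i, j, t):
    """Jukes-Cantor probability of base i becoming base j along a branch of length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e


def site_likelihood(tips, branches):
    """Sum over the states u, w of the two internal nodes of ((1,2),(3,4))."""
    s1, s2, s3, s4 = tips
    t1, t2, t3, t4, t_internal = branches
    total = 0.0
    for u, w in product(BASES, repeat=2):
        total += (0.25                      # equal base frequencies at the root u
                  * jc69_prob(u, s1, t1) * jc69_prob(u, s2, t2)
                  * jc69_prob(u, w, t_internal)
                  * jc69_prob(w, s3, t3) * jc69_prob(w, s4, t4))
    return total


def log_likelihood(seqs, branches):
    """ln L = sum over sites of ln L_j."""
    return sum(math.log(site_likelihood(col, branches)) for col in zip(*seqs))


# Visible middle stretch of the four sequences on the slide.
seqs = ["GGACACGTTTA", "AGACACCTCTA", "GGATAAGTTAA", "GGATAGCCTAG"]
branches = (0.1, 0.1, 0.1, 0.1, 0.1)  # hypothetical branch lengths
print("ln L =", round(log_likelihood(seqs, branches), 3))
```

Brute force is only practical for very small trees; real programs use Felsenstein's pruning algorithm to obtain the same sum efficiently.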

Hypothesis testing
Likelihood ratio test: $\Delta = \log L_1 - \log L_0$; $2\Delta$ follows a $\chi^2$ distribution.
Uses: rate variation; choice of an appropriate substitution model.
d.f. = number of sequences in the tree - 2, or d.f. = difference in the number of parameters between H1 and H0.
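A sketch of the test using only the standard library. The chi-square tail probability is computed with the closed-form series for integer degrees of freedom; the log-likelihood values in the example are hypothetical, and the function names are illustrative.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df >= 1."""
    if df < 1 or x < 0:
        raise ValueError("need df >= 1 and x >= 0")
    if x == 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
        term = math.exp(-half)
        total = term
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return total
    # Odd df: normal tail plus a finite series.
    chi = math.sqrt(x)
    total = math.erfc(chi / math.sqrt(2.0))
    term = math.sqrt(2.0 / math.pi) * math.exp(-half) * chi
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return total


def likelihood_ratio_test(lnL0, lnL1, df):
    """Return (2 * delta, p-value) for nested hypotheses H0 and H1."""
    stat = 2.0 * (lnL1 - lnL0)
    return stat, chi2_sf(max(stat, 0.0), df)


# Hypothetical log-likelihoods: H1 has one extra parameter.
stat, p_value = likelihood_ratio_test(lnL0=-1234.5, lnL1=-1230.0, df=1)
print(f"2*delta = {stat:.2f}, p = {p_value:.4f}")
```

A small p-value means the simpler hypothesis H0 is rejected in favour of H1.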

Bootstrapping
How well supported are the groups?
(Figure: trumpet fish.)
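Bootstrapping resamples the alignment columns with replacement, rebuilds the tree for each pseudo-replicate, and reports how often a group reappears. The sketch below assumes four taxa and uses parsimony as the tree-building step purely to keep the example short; the sequences, the replicate count and the tie-splitting rule are assumptions of this sketch.

```python
import random

# The three unrooted topologies for four taxa, as pairs of tip indices.
TOPOLOGIES = [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))]


def quartet_score(column, pairing):
    """Fitch count of changes at one site on the tree ((a,b),(c,d))."""
    changes = 0
    node_sets = []
    for i, j in pairing:
        left, right = {column[i]}, {column[j]}
        common = left & right
        if common:
            node_sets.append(common)
        else:
            node_sets.append(left | right)
            changes += 1
    if not (node_sets[0] & node_sets[1]):
        changes += 1
    return changes


def best_topologies(columns):
    """All topologies with the minimum total number of changes."""
    lengths = [sum(quartet_score(col, t) for col in columns) for t in TOPOLOGIES]
    shortest = min(lengths)
    return [t for t, length in zip(TOPOLOGIES, lengths) if length == shortest]


def bootstrap_support(seqs, topology, replicates=1000, seed=0):
    """Fraction of pseudo-replicates in which `topology` is (one of) the best."""
    rng = random.Random(seed)
    columns = list(zip(*seqs))
    hits = 0.0
    for _ in range(replicates):
        sample = [rng.choice(columns) for _ in columns]  # sites with replacement
        best = best_topologies(sample)
        if topology in best:
            hits += 1.0 / len(best)  # split credit among tied trees
    return hits / replicates


SEQS = ["ATATT", "ATCGT", "GCAGT", "GCCGT"]  # hypothetical 4-taxon alignment
for topo in TOPOLOGIES:
    print(topo, round(bootstrap_support(SEQS, topo), 3))
```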

Maximum Parsimony
Minimize tree length.
To obtain rooted trees (and character polarity) use an outgroup. The ingroup is monophyletic.
Example sequences:
1 ATATT
2 ATCGT
3 GCAGT
4 GCCGT
Tree (first site: A, A, G, G): one assignment of ancestral states needs 1 change, another needs 5 changes.

Maximum Parsimony: example
Figure: reconstructions of the changes on the tree for Site 2 (states T and C), Site 3 (states A and C), Site 4 (states T and G) and Site 5 (all T, no changes).
Tree length: $L = \sum_{i=1}^{k} l_i$

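Maximum parsimony: example
Number of changes per site for each of the three possible trees:
Tree             Site 1  Site 2  Site 3  Site 4  Site 5  Total
((1,2),(3,4))    1       1       2       1       0       5
((1,3),(2,4))    2       2       1       1       0       6
((1,4),(2,3))    2       2       2       1       0       7
Phylogenetically informative sites: 1, 2 and 3.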
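The per-site counts for the four example sequences (ATATT, ATCGT, GCAGT, GCCGT) can be obtained with Fitch's algorithm. In this sketch trees are written as nested tuples of tip indices, which is an illustrative representation and not something the slides prescribe.

```python
def fitch(tree, column):
    """Return (state set, number of changes) for a nested-tuple tree of tip indices."""
    if isinstance(tree, int):
        return {column[tree]}, 0
    left_set, left_changes = fitch(tree[0], column)
    right_set, right_changes = fitch(tree[1], column)
    common = left_set & right_set
    if common:
        return common, left_changes + right_changes
    return left_set | right_set, left_changes + right_changes + 1


def is_informative(column):
    """A site is informative if at least two states each occur at least twice."""
    counts = {s: column.count(s) for s in set(column)}
    return sum(1 for n in counts.values() if n >= 2) >= 2


SEQS = ["ATATT", "ATCGT", "GCAGT", "GCCGT"]
TREES = {
    "((1,2),(3,4))": ((0, 1), (2, 3)),
    "((1,3),(2,4))": ((0, 2), (1, 3)),
    "((1,4),(2,3))": ((0, 3), (1, 2)),
}

columns = list(zip(*SEQS))
table = {name: [fitch(tree, col)[1] for col in columns] for name, tree in TREES.items()}
lengths = {name: sum(row) for name, row in table.items()}
informative = [i + 1 for i, col in enumerate(columns) if is_informative(col)]

for name, row in table.items():
    print(name, row, "total", lengths[name])
print("informative sites:", informative)
```

The tree with the smallest total, ((1,2),(3,4)), is the most parsimonious one, and only the informative sites distinguish the three trees.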

Networks
Phylogenetic representation allowing reticulation.
More appropriate for intraspecific data.
The ancestor is alive.
Sources of reticulation: hybridization, recombination, horizontal transfer, polyploidization.
(Figure: network connecting the haplotypes agct, acct and acat.)

Multivariate clustering
A data matrix with 7 character columns (C1, C2, ...) is converted by a similarity criterion (correlations) into a 7 x 7 matrix.
Calculate the eigenvectors with the highest eigenvalues.
Project the data onto the new axes (eigenvectors): X = 1st axis, Y = 2nd axis, Z = 3rd axis.
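The steps above amount to a principal component analysis on the correlation matrix. The sketch below keeps to the standard library and finds the leading eigenvectors by power iteration with deflation, which is one simple option among several; the data values are invented, and a numerical library would normally be used instead.

```python
import math


def correlation_matrix(data):
    """Return (correlation matrix, standardized data) for rows = samples."""
    n, m = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(m)]
    sds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in data) / (n - 1))
           for j in range(m)]
    z = [[(row[j] - means[j]) / sds[j] for j in range(m)] for row in data]
    corr = [[sum(zr[i] * zr[j] for zr in z) / (n - 1) for j in range(m)]
            for i in range(m)]
    return corr, z


def matvec(mat, v):
    return [sum(a * b for a, b in zip(row, v)) for row in mat]


def leading_eigenpair(mat, iterations=1000):
    """Power iteration: largest eigenvalue and its unit eigenvector."""
    v = [float(i + 1) for i in range(len(mat))]
    for _ in range(iterations):
        w = matvec(mat, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:  # nothing left to extract
            return 0.0, v
        v = [x / norm for x in w]
    return sum(a * b for a, b in zip(v, matvec(mat, v))), v


def principal_axes(corr, k):
    """Top k (eigenvalue, eigenvector) pairs, removing each found axis in turn."""
    mat = [row[:] for row in corr]
    axes = []
    for _ in range(k):
        lam, v = leading_eigenpair(mat)
        axes.append((lam, v))
        mat = [[mat[i][j] - lam * v[i] * v[j] for j in range(len(v))]
               for i in range(len(v))]
    return axes


def project(z, axes):
    """Coordinates of each sample on the new axes."""
    return [[sum(a * b for a, b in zip(row, v)) for _, v in axes] for row in z]


# Hypothetical data: 5 samples scored for 3 continuous characters.
data = [[2.0, 4.1, 1.0], [3.0, 6.2, 0.5], [4.0, 7.9, 2.0],
        [5.0, 10.1, 1.5], [6.0, 12.0, 0.7]]
corr, z = correlation_matrix(data)
axes = principal_axes(corr, 2)
scores = project(z, axes)
print("eigenvalues:", [round(lam, 3) for lam, _ in axes])
```

Each eigenvalue equals the variance of the samples along its axis, so the axes with the highest eigenvalues summarize most of the variation.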