Published by Patrick Hawkins. Modified over 9 years ago.
1
Homology analysis and molecular phylogeny
Alexis Dereeper
CIBA courses – Brasil 2011
2
4 steps for a phylogenetic analysis
[Flowchart, stages numbered 1 to 4] Data selection → Sequence alignment → Method selection → Calculate or estimate the tree that best fits the data → Test the reliability of the obtained tree
Method selection branches:
● Distance methods: calculate distance
● Parsimony
● Probabilistic methods: maximum likelihood, Bayesian
Other flowchart labels: Model?, Optimization
3
Phylogeny.fr
“The Phylogeny.fr platform transparently chains programs to automatically perform phylogenetic analysis tasks”
4
Homology analysis
What is sequence homology?
● Homology is not a quantitative concept, unlike similarity or identity (e.g. 28% identity): genes are either homologous or not.
● Homologs: genes descended from a common ancestor
● Paralogs: homologs arising from a duplication event
● Orthologs: homologs arising from a speciation event
Homology and function:
● Homology does not necessarily imply the same function.
● The closest orthologs may have the same function, but more distant orthologs rarely play the same phenotypic role (although they may play the same role in a specific metabolic pathway).
● Paralogs, on the other hand, rapidly acquire different functions.
5
Homology analysis
How similar are homologous sequences?
● From 100% identity down to a few nt/aa in common: no rule, no limit.
● Estimation is based on the probability that 2 sequences are similar by chance (e-value):
o DNA: e-value 70%
o Protein: e-value 25%
● Sequences without noticeable resemblance can still be homologous (similarity found at the 3D structure level).
● Conversely, a strong resemblance is generally interpreted as homology, and not as convergent evolution.
6
Homology analysis
How to detect homology? By sequence comparison, i.e. sequence alignment.
1- Local alignment (e.g. Blast)
Designed to search for similar regions. Alignment of a particular sequence against a bank of sequences (Smith & Waterman)
2- Global alignment (e.g. ClustalW)
Designed to compare homologous sequences over their full length (Needleman & Wunsch)
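The difference between the two alignment modes can be sketched with one small dynamic-programming function. This is only an illustration: the scoring values (`match`, `mismatch`, `gap`) and the function name `align_score` are assumptions chosen for the example, and real tools such as Blast or ClustalW use heuristics, substitution matrices and more elaborate gap penalties.

```python
def align_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Return the best alignment score of sequences a and b.

    local=False: global alignment (Needleman & Wunsch), both sequences
                 are aligned over their full length.
    local=True:  local alignment (Smith & Waterman), only the best
                 matching region is scored.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    if not local:
        # Global mode: leading gaps are penalised.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap

    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diagonal, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if local:
                # Local mode: a negative score restarts the alignment at 0.
                cell = max(cell, 0)
                best = max(best, cell)
            score[i][j] = cell

    return best if local else score[-1][-1]
```

With two sequences that share only a short region, the local score stays positive while the global score is dragged down by the mismatching flanks.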
7
Homology analysis
Classical Blast output
Different Blast programs:
● BlastN (Query: DNA / Subject: DNA)
● BlastP (Query: protein / Subject: protein)
● BlastX (Query: DNA / Subject: protein)
● TBlastN (Query: protein / Subject: DNA)
● TBlastX (Query: translated DNA / Subject: translated DNA)
Annotations on the output: score; E-value (indicates how significant the score is)
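The program list above amounts to a lookup on the query and subject types. A minimal sketch follows; the function name `choose_blast_program` and the `translated` flag are hypothetical helpers for illustration, and the returned strings are simply the lower-case program names.

```python
# (query type, subject type, translated search?) -> program name
BLAST_PROGRAMS = {
    ("dna", "dna", False): "blastn",
    ("protein", "protein", False): "blastp",
    ("dna", "protein", True): "blastx",    # query DNA is translated
    ("protein", "dna", True): "tblastn",   # subject DNA is translated
    ("dna", "dna", True): "tblastx",       # both sides are translated
}


def choose_blast_program(query_type, subject_type, translated=False):
    """Pick a Blast program from the molecule types of query and subject."""
    # Comparing DNA with protein always implies a translation step.
    if query_type != subject_type:
        translated = True
    key = (query_type, subject_type, translated)
    if key not in BLAST_PROGRAMS:
        raise ValueError(f"no Blast program for {key}")
    return BLAST_PROGRAMS[key]
```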
8
Blast Explorer
Enables an assisted selection of homologous sequences using various criteria
Post-processing of Blast results:
● Guide tree (similarity tree), with possible selection on branches and leaves
● Score / e-value distribution
● Taxonomic tree of hits
9
Homology analysis
BBMH method (Best Blast Mutual Hits) or RBH (Reciprocal Best Hit)
[Figure: Proteome Species1, Proteome Species2]
Ortholog databases/banks:
● Inparanoid (eukaryotes)
● HomoloGene (eukaryotes)
● OrthoMCL DB
● COG (Clusters of Orthologous Groups of proteins) (prokaryotes and eukaryotes)
● GreenPhyl (plants)
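The reciprocal best hit idea can be sketched as follows, assuming the Blast results of each proteome against the other have already been reduced to `(query, subject, score)` tuples. That input layout and the function names are assumptions made for this example, not the output format of any particular tool.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject."""
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(hits_1_vs_2, hits_2_vs_1):
    """Return pairs (a, b) where b is the best hit of a AND a is the best hit of b."""
    best_12 = best_hits(hits_1_vs_2)
    best_21 = best_hits(hits_2_vs_1)
    return sorted((a, b) for a, b in best_12.items() if best_21.get(b) == a)
```

A gene whose best hit points back to a different gene (for example because of a recent duplication) is left out of the result, which is what makes the method a conservative way to call candidate orthologs.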
10
Phylogenetic analysis
Step 1: Multiple alignment (global alignment)
Alignment software (shown on the slide along a fast-to-slow scale):
● ClustalW
● Muscle
● T-Coffee
● 3DCoffee (optimizes the alignment with 3D structure)
● MAFFT
Alignment formats: Fasta, Clustal, Phylip, Nexus
Alignment visualization/editing software:
● SeaView
● Jalview
● BioEdit
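Moving between these formats is mostly a matter of re-laying out the same names and aligned sequences. The sketch below reads a Fasta alignment and writes a simple sequential Phylip layout; it assumes names are truncated and padded to 10 characters, as in the strict Phylip convention, and does not handle interleaved files. The function names are illustrative.

```python
def parse_fasta(text):
    """Parse Fasta text into a dict {name: sequence}, keeping input order."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Keep only the first word of the header as the sequence name.
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def to_phylip(records):
    """Write aligned sequences in a simple sequential Phylip layout."""
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    lines = [f"{len(records)} {lengths.pop()}"]
    for name, seq in records.items():
        # Assumption: names limited to 10 characters (strict Phylip).
        lines.append(f"{name[:10]:<10}{seq}")
    return "\n".join(lines)
```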
11
Phylogenetic analysis
Step 2: Alignment cleaning
● Removal of divergent regions showing a low phylogenetic signal (not very informative)
● These regions may not be homologous, or may have been saturated by substitutions (e.g. synonymous sites in coding regions)
=> The cleaned alignment is more suitable for a phylogenetic analysis
Alignment curation software: Gblocks
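As a rough illustration of what cleaning means, the sketch below drops alignment columns that contain too many gaps. This is a deliberately simplified stand-in, not the Gblocks algorithm (which also considers conservation and block length); the `max_gap_fraction` threshold and the function name are assumptions for the example.

```python
def remove_gappy_columns(alignment, max_gap_fraction=0.5, gap="-"):
    """Drop columns whose proportion of gaps exceeds max_gap_fraction.

    alignment: dict {name: aligned sequence}, all of the same length.
    """
    names = list(alignment)
    seqs = [alignment[name] for name in names]
    length = len(seqs[0])
    if any(len(seq) != length for seq in seqs):
        raise ValueError("all aligned sequences must have the same length")

    # Indices of the columns that are kept.
    keep = [
        i for i in range(length)
        if sum(seq[i] == gap for seq in seqs) / len(seqs) <= max_gap_fraction
    ]
    return {name: "".join(alignment[name][i] for i in keep) for name in names}
```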
12
Phylogenetic analysis
Step 3: Phylogenetic reconstruction
Step 3a: Choose a method for phylogenetic reconstruction
4 main methods/algorithms:
● Pairwise distance methods (UPGMA, Neighbor Joining)
o FastDist, BIONJ, Neighbor
● Maximum parsimony
o DNAPars, TNT
● Maximum likelihood
o PhyML, PAML
● Bayesian inference
o MrBayes, BEAST
Output format: distance matrix, Newick format
Choose the right trade-off between speed and performance
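Distance methods start from a matrix of pairwise distances between the aligned sequences. A minimal sketch of that first stage is shown below, using the uncorrected p-distance (the proportion of differing sites); skipping columns where either sequence has a gap is an assumption of this example, and the tree-building stage itself (UPGMA, Neighbor Joining) is not shown.

```python
from itertools import combinations


def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing sites, ignoring columns with a gap in either sequence."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned (same length)")
    pairs = [(x, y) for x, y in zip(seq_a, seq_b) if x != gap and y != gap]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(alignment):
    """Build a symmetric matrix {name: {name: distance}} from an alignment dict."""
    names = list(alignment)
    matrix = {name: {name: 0.0} for name in names}
    for a, b in combinations(names, 2):
        d = p_distance(alignment[a], alignment[b])
        matrix[a][b] = matrix[b][a] = d
    return matrix
```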
13
Phylogenetic analysis
Step 3: Phylogenetic reconstruction
Step 3b: Choose parameters and evolution models
Different evolution models indicating the substitution rate for aa or nt:
● DNA
o Jukes-Cantor, Kimura, F81, HKY85, GTR
● Protein
o JTT, WAG, Dayhoff
Model selection software: tests and selects the substitution model (and parameters) best adapted to the dataset (the one with the maximum likelihood)
● ProtTest, ModelTest (based on PhyML)
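To make the role of a substitution model concrete, here is the simplest one from the list, Jukes-Cantor, applied as a distance correction: an observed proportion of differing sites p is converted into an estimated number of substitutions per site, d = -(3/4) ln(1 - 4p/3). The function name is illustrative; the other models listed above have more parameters and are not shown.

```python
import math


def jukes_cantor_distance(p):
    """Convert an observed proportion of differing nucleotide sites p
    into a Jukes-Cantor corrected distance (substitutions per site)."""
    if not 0 <= p < 0.75:
        # At p >= 0.75 the sequences are saturated and the formula is undefined.
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)
```

The corrected distance is always at least as large as p, because the model accounts for multiple substitutions hitting the same site.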
14
Phylogenetic analysis
Step 3: Phylogenetic reconstruction
Step 3c: Estimate the branch robustness
Bootstrap procedure
1- Re-sampling of the alignment columns: a pseudo-alignment is created by drawing sites at random (with replacement), and the tree is computed again.
2- Repeat the process N times.
3- For each branch of the initial tree, count the number of times it is observed in the bootstrap trees. The higher this number, the better supported the branch.
aLRT test (approximate Likelihood Ratio Test) (Anisimova & Gascuel, Syst Biol, 2006)
● Integrated in PhyML
● Much faster (PhyML is launched only once)
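The bootstrap procedure above can be sketched in two small functions: one that draws a pseudo-alignment, and one that counts how often each branch of the initial tree reappears. The tree-building step is left out; here each tree is assumed to be already summarised as a set of clades (each clade a `frozenset` of leaf names), which is a simplification chosen for this example.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Create a pseudo-alignment by sampling columns with replacement."""
    names = list(alignment)
    length = len(alignment[names[0]])
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(alignment[name][i] for i in columns) for name in names}


def branch_support(reference_clades, replicate_clades):
    """Fraction of bootstrap trees in which each reference clade is found.

    reference_clades: iterable of frozensets (clades of the initial tree).
    replicate_clades: list with one set of frozensets per bootstrap tree.
    """
    n = len(replicate_clades)
    return {
        clade: sum(clade in clades for clades in replicate_clades) / n
        for clade in reference_clades
    }
```

In practice each pseudo-alignment would be passed to the same reconstruction program as the original alignment, and the resulting trees compared with the initial one.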
15
Phylogenetic analysis
Step 4: Visualization and editing of the phylogenetic tree
Graphical tools available to display trees from Newick format:
● TreeDyn
● DrawGram, DrawTree
● ATV
● NJPlot
Graphical output formats: PNG, SVG, PDF…
Step 5: Interpretation of the tree
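Newick is a plain-text format in which nested parentheses encode the tree and `name:length` pairs encode leaves and branch lengths. As a small illustration, the sketch below lists the leaf names of a Newick string with a regular expression; it assumes unquoted names without spaces and is not a full Newick parser.

```python
import re


def leaf_names(newick):
    """Return leaf names in order of appearance in a Newick string.

    A leaf name directly follows '(' or ','; labels of internal nodes
    follow ')' and are therefore not matched.
    """
    return re.findall(r"[(,]\s*([^\s(),:;]+)", newick)
```

For example, `leaf_names("((A:0.1,B:0.2):0.05,C:0.3);")` lists the three leaves A, B and C.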