Multiple alignment: progress and perspectives in the assessment and exploitation of algorithms and data (Marseille, 17 November 2005)




1 Multiple alignment: progress and perspectives in the assessment and exploitation of algorithms and data. Marseille, 17 November 2005.
Laboratory of Integrative Bioinformatics and Genomics, Department of Structural Biology and Genomics, Dino Moras, Institut de Génétique et de Biologie Moléculaire et Cellulaire, Illkirch-Graffenstaden, Strasbourg, France. Olivier Poch and Jean Claude Thierry. Collège de France

2 Laboratory of Integrative Bioinformatics and Genomics
BioInformatics and BioAnalysis/Genomics:
- Algorithms for interconnected analysis of high-throughput sequence data (automatic and high-quality collection, validation, curation, analysis and maintenance)
- Specialized benchmark databases and software development
- Specialized developments scientifically driven by biological projects (genome annotation, transcriptomics data from cancers, retinal disease…)
Major themes: integration and analysis of high-quality sequence/structure/evolution information in systems biology informational systems (replication, transcription, translation)
Backup lab of the BioInformatics platforms (RIO Génopole Alsace Lorraine):
- Centre of open biocomputing resources: 750 user accounts, web servers; software and database maintenance and implementation (generalist and specialized); training and courses
- Centre of competence and know-how: hotline (bioinformatics analysis, program implementation and use…); solution design (high throughput, specialized projects)
http://bips.u-strasbg.fr/

3 MACS (Multiple Alignment of Complete Sequences) plays a central role in:
- structure comparison and modelling
- interaction networks
- hierarchical function annotation: homologs, domains, motifs
- phylogenetic studies
- human genetics, SNPs
- therapeutics: drug discovery and drug design
- gene identification and validation
- RNA sequence, structure and function
- comparative genomics
[Figure labels: DBD, LBD, insertion domain, binding sites / mutations]

4 Biology: a new context, « from data-poor to data-rich science »
Structures:
- structural genomics: fill in the protein fold universe (~1000 folds?)
- “complete”: the proteome
- “specialized”: all members of a family (kinase, helicase, nuclear receptor…)
Sequences:
- “complete” genomes of viruses, prokaryotes, eukaryotes, organelles…
- “specialized” sequencing: partial genomes, ESTs…
Both are GENERATED BY HIGH-THROUGHPUT PROCESSES.
[Figure: growth of the PDB]

5 MACS: a new landscape (high volume and heterogeneity of sequence data)
- Length: from tens of amino acids or nucleotides to thousands or millions (genomes)
- Number: from tens up to thousands of sequences
- Variability: from low percent identity to almost identical
- Complexity of the sequences to be aligned:
  - families with a linear or highly irregular distribution of sequence variability
  - heterogeneity of length, structure or composition (large insertions or extensions, repeats, circular permutations, transmembrane regions…)
- Fidelity: 15-30% error rates (sequencing, eukaryotic gene prediction, annotation…)

6 MACS: new concepts
Distinct objectives imply distinct needs and strategies:
- Overview of one sequence family, to quickly infer and integrate information from a limited number of closely related, well-annotated sequences (reliable and efficient)
- Exhaustive analysis of one sequence family (very high quality) for:
  - homology modelling
  - phylogenetic studies
  - subfamily-specific features (differentially conserved domains, regions or residues)
- Massive analysis of sets of sequences (reliable/high quality and efficient) for:
  - phylogenetic distribution, co-presence and co-absence, structural complexes
  - genome annotation
  - target characterisation for functional genomics studies (transcriptomics…)

7 MACS: new questions
- Can a single algorithm handle all types of sequence alignment?
- What pertinent information is available within a sequence alignment?
- What are the strengths and weaknesses of the different algorithms?
- How can we evaluate the quality of highly heterogeneous alignments?
- How can we identify and exploit the pertinent information?
Approaches:
- construction of a benchmark database to evaluate algorithms: BAliBASE (1999)
- definition of an objective function to evaluate sequence alignments: NorMD (2001)
- development of cooperative algorithms: PipeAlign (2003)
- construction of an ontology to integrate and exploit the information: MAO (2005)

8 BAliBASE: objective evaluation of MACS programs
- High-quality alignments based on 3D structural superpositions and manually verified
- Alignments compared only in reliable ‘core blocks’, excluding non-superposable regions
- Separate reference sets specifically designed to address distinct alignment problems:
  1: small number of sequences, grouped by divergence and length
  2: a family with one to three orphans
  3: several sub-families
  4: long N/C-terminal extensions
  5: long insertions
  6: repeats
  7: transmembrane regions
  8: circular permutations
BAliBASE 1: Thompson et al. 1999 Bioinformatics. BAliBASE 2: Bahr et al. 2001 Nucl. Acids Res.

9 Example of a BAliBASE alignment: CORE BLOCKS are defined which exclude non-superposable regions, e.g. the borders of helices and loops (up to 30%!).
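Comparing a test alignment with a reference only in its core blocks can be sketched as follows. This is a minimal illustration, assuming alignments are lists of equal-length gapped strings with '-' for gaps and core blocks are given as a list of reference column indices; it follows the usual sum-of-pairs (SP) and total-column (TC) definitions but is not BAliBASE's own scoring program, and the function names are hypothetical.

```python
from itertools import combinations


def residue_index_by_column(row):
    """For each alignment column, the residue index it holds (None for a gap)."""
    mapping, idx = [], 0
    for ch in row:
        if ch == "-":
            mapping.append(None)
        else:
            mapping.append(idx)
            idx += 1
    return mapping


def column_by_residue_index(row):
    """For each residue index in an aligned row, the column it occupies."""
    return [col for col, ch in enumerate(row) if ch != "-"]


def sp_tc_scores(reference, test, core):
    """SP and TC scores of `test` against `reference`, counted only in `core` columns."""
    ref_maps = [residue_index_by_column(r) for r in reference]
    test_cols = [column_by_residue_index(t) for t in test]
    pairs_total = pairs_ok = cols_ok = 0
    for col in core:
        # (sequence, residue index) for every non-gap cell of this reference column
        cells = [(s, m[col]) for s, m in enumerate(ref_maps) if m[col] is not None]
        col_good = True
        for (s1, r1), (s2, r2) in combinations(cells, 2):
            pairs_total += 1
            # the pair is reproduced if both residues share a column in the test alignment
            if test_cols[s1][r1] == test_cols[s2][r2]:
                pairs_ok += 1
            else:
                col_good = False
        cols_ok += col_good
    sp = pairs_ok / pairs_total if pairs_total else 0.0
    tc = cols_ok / len(core) if core else 0.0
    return sp, tc
```

For example, with reference ["AC-GT", "ACAGT"], test ["ACG-T", "ACAGT"] and core columns [0, 1, 3, 4], three of the four reference residue pairs are reproduced, so both SP and TC are 0.75.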

10 Comparison of multiple alignment methods (Thompson et al. 1999 Nucl. Acids Res.)
[Chart: scores of multal, multalign, pileup, clustalx, prrp, saga, hmmt, MLpima, dialign and SBpima, labelled P (progressive) or I (iterative), on Reference 1 (< 6 sequences: all, < 100 residues, > 400 residues), Reference 2 (a family with an orphan), Reference 3 (several sub-families), Reference 4 (long N/C-terminal extensions) and Reference 5 (long insertions); some entries N/A]
- Global algorithms work well when sequences are homologous over their full lengths; local algorithms are better for non-colinear sequences.
- Iterative algorithms can improve alignment quality, but are too slow for most applications.
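To make the global/local distinction concrete, here is a minimal sketch of the two classic pairwise dynamic-programming recurrences, Needleman-Wunsch (global) and Smith-Waterman (local), computing scores only. The scoring values (match +2, mismatch -1, linear gap -2) are arbitrary assumptions chosen for illustration, not the parameters of any program in the comparison.

```python
def global_score(a, b, match=2, mismatch=-1, gap=-2):
    """Needleman-Wunsch: best score aligning a and b over their full lengths."""
    prev = [j * gap for j in range(len(b) + 1)]  # first row: all-gap prefixes
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[-1]


def local_score(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman: best score of any pair of substrings of a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # the 0 lets a local alignment restart anywhere
            cur[j] = max(0, prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best
```

For a = "AAAAWYHKLMAAAA" and b = "PPWYHKLMPPPPPPPP", which share only the segment WYHKLM, the local score is 12 while the global score, dragged down by the unrelated flanks, is -6 with these parameters.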

11 More recent developments: cooperative algorithms, local and global
- DbClustal (Thompson et al. 2000) http://www-igbmc.u-strasbg.fr/BioInfo/ : integrates local motifs mined by a database search into a ClustalW global alignment
- T-COFFEE (Notredame et al. 2000) http://igs-server.cnrs-mrs.fr/Tcoffee/ : uses dynamic programming (DP) to compute ALL local and global alignments for each pair of sequences
- MAFFT (Katoh et al. 2002) http://www.biophys.kyoto-u.ac.jp/~katoh/programs/align/mafft/ : detects locally conserved segments using a Fast Fourier Transform
- MUSCLE (Edgar 2004) http://www.drive5.com/muscle : k-mer distances and log-expectation scores, progressive alignment and iterative refinement
- PROBCONS (Do et al. 2005) http://probcons.stanford.edu/ : pairwise-consistency-based objective function
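MUSCLE's first stage estimates distances from shared k-mers, which avoids aligning every pair of sequences. The sketch below computes a fractional common k-mer distance of that kind: count the k-mers two sequences share (taking the minimum count of each), divide by the number of k-mers in the shorter sequence, and subtract from 1. It is a simplified illustration under assumptions: the k-mer length is arbitrary and no compressed amino-acid alphabet is used, so it should not be read as MUSCLE's exact implementation.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Multiset of all overlapping k-mers in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(a, b, k=3):
    """1 - (shared k-mers / k-mers in the shorter sequence)."""
    denom = min(len(a), len(b)) - k + 1
    if denom <= 0:
        raise ValueError("sequences must be at least k residues long")
    # Counter '&' keeps the minimum count of each k-mer present in both
    shared = sum((kmer_counts(a, k) & kmer_counts(b, k)).values())
    return 1.0 - shared / denom
```

For "ACDEFG" and "ACDEFH" with k = 3, three of the four 3-mers are shared, giving a distance of 0.25. A matrix of such distances is what a progressive aligner would use to build its guide tree.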

12 BAliBASE 3.0 http://www-bio3d-igbmc.u-strasbg.fr/balibase/ (Thompson et al. 2005 Proteins)
- More difficult test cases
- More divergent sequences (V3 excluded)
- More multi-domain proteins
- More protein folds (SCOP coverage)
- More sequences (total number of proteins increased from 1444 to 6255)
- Full-length sequences for all test cases
- Semi-automatic update protocol
- Annotations in XML format
- Redesigned web site

13 BAliBASE 3.0 Reference 3 (subfamilies): protein kinase
[Figure: domain organisation of the family (pkinase, pkinase_c, polo_box, focal_at, fha, sh2, sh3, sam, pb1), highlighting the conserved domain and its core blocks]

14 BAliBASE 3.0
[Figure: annotated alignment; legend: ATP binding, active site, alpha helix, beta strand, core block]

15 Multiple alignment quality (truncated alignments)

Program        Ref1 V1     Ref1 V2     Ref2        Ref3        Ref4        Ref5        Time
               (<20%)      (20-40%)    orphans     subgroups   extensions  insertions  (sec)
ClustalW 1.83  0.42        0.78        0.42        0.52        0.41        0.38        902
Dialign 2.2.1  0.31        0.71        0.37        0.39        0.45        0.43        5993
Mafft 5.32     0.44        0.78        0.49        0.53        0.47        0.48        96
Maffti 5.32    0.54        0.83        0.56        0.60        0.49        0.57        327
Muscle 3.51    0.52        0.82        0.50        0.58        0.46        0.54        523
Muscle_fast    0.40        0.77        0.43        0.44        0.35        0.49        34
Muscle_med     0.45        0.80        0.50        0.59        0.44        0.51        219
Tcoffee 2.66   0.47        0.84        0.50        0.64        0.54        0.58        216133
Probcons 1.1   0.63        0.87        0.60        0.65        0.54        0.63        19035

Muscle_fast: muscle -maxiters 1 -diags1 -sv -distance1 kbit20_3
Muscle_med: muscle -maxiters 2

1. Significant improvement in accuracy/efficiency since 2000.
2. The twilight zone still exists.
3. Probcons scores best in all tests, but is MUCH slower than MAFFT or MUSCLE.
4. MAFFT-i (Maffti) scores slightly better than MUSCLE in all tests, and is more efficient.

16 Multiple alignment quality: truncated (T) versus full-length (FL) sequences
Each cell gives T/FL scores; times are totals for all references.

Program        Ref1 V1     Ref1 V2     Ref2        Ref3        Time (sec)
               (<20%)      (20-40%)    orphans     subgroups
ClustalW 1.83  0.42/0.24   0.78/0.72   0.42/0.20   0.52/0.27   902/2227
Dialign 2.2.1  0.31/0.26   0.71/0.70   0.37/0.29   0.39/0.31   5993/12595
Mafft 5.32     0.44/0.25   0.78/0.75   0.49/0.35   0.53/0.38   96/312
Maffti 5.32    0.54/0.35   0.83/0.80   0.56/0.40   0.60/0.50   327/1409
Muscle 3.51    0.52/0.34   0.82/0.79   0.50/0.36   0.58/0.39   523/3608
Muscle_fast    0.40/0.28   0.77/0.72   0.43/0.29   0.44/0.33   34/132
Muscle_med     0.45/0.29   0.80/0.74   0.50/0.34   0.59/0.38   219/1601
Tcoffee 2.66   0.47/0.35   0.84/0.82   0.50/0.40   0.64/0.49   216133/341578
Probcons 1.1   0.63/0.43   0.87/0.86   0.60/0.41   0.65/0.54   19035/58488

1. Loss of accuracy is greater in the twilight zone (Ref1 V1, orphans and subgroups).
2. Probcons still scores best in all tests.
3. MAFFT-i (Maffti) still scores better than MUSCLE in all tests.

17 Evaluation of alignment quality
Objective function:
- estimation of the quality of a multiple alignment of complete sequences
- detection of badly aligned or unrelated sequences
- detection of badly aligned or non-superposable regions
- use of MACS in automatic high-throughput genome analysis projects

18 MD: Mean Distance
- Each amino acid of a sequence in a column is given coordinates in a multidimensional space, one axis per amino acid (Ala, Cys, Tyr, Gly, …), according to the substitution values in the Gonnet matrix.
- For each column, the distance D_ij between each pair of sequences i and j is calculated.
- MD is the exponential of the negative weighted mean distance Q, with sequence weights incorporated:
$$\mathrm{MD} = e^{-Q}$$
- The range of values is the same for all columns: from 0 to 1, where 1 corresponds to a completely conserved column.
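The column score described above can be sketched as follows. This is an illustrative reading of the slide, not the published NorMD code: each residue's coordinates are its row of a substitution matrix, distances are Euclidean, the weighted mean uses the product of the two sequence weights, and gapped positions are skipped. All of these choices are assumptions, and the three-letter matrix is a hypothetical toy stand-in for the Gonnet matrix.

```python
import math
from itertools import combinations

# Hypothetical toy substitution matrix over a 3-letter alphabet
# (a stand-in for the Gonnet matrix; the values are made up for illustration).
TOY_MATRIX = {
    "A": {"A": 4.0, "G": 0.0, "W": -2.0},
    "G": {"A": 0.0, "G": 6.0, "W": -3.0},
    "W": {"A": -2.0, "G": -3.0, "W": 11.0},
}
AXES = sorted(TOY_MATRIX)  # one coordinate axis per residue type


def coordinates(residue):
    """Position of a residue: its substitution scores against every axis residue."""
    return [TOY_MATRIX[residue][axis] for axis in AXES]


def column_md(column, weights):
    """exp(-Q), where Q is the weighted mean pairwise distance in one column."""
    cells = [(res, w) for res, w in zip(column, weights) if res != "-"]  # skip gaps
    total = weight_sum = 0.0
    for (r1, w1), (r2, w2) in combinations(cells, 2):
        pair_weight = w1 * w2  # assumed weighting of the pair (i, j)
        total += pair_weight * math.dist(coordinates(r1), coordinates(r2))
        weight_sum += pair_weight
    q = total / weight_sum if weight_sum else 0.0
    return math.exp(-q)
```

A fully conserved column gives MD = 1, and columns mixing dissimilar residues fall toward 0.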

19 NorMD: Normalised Mean Distance
$$\mathrm{NorMD} = \frac{\mathrm{MD} - \mathrm{GAPCOST}}{\mathrm{MaxMD} \times \mathrm{LQRID}}$$
- MaxMD: the maximum score if the set of studied sequences were all identical
- LQRID: the lower quartile (25%) of the distribution of pairwise sequence hash scores, representative of potential orphan sequences
- GAPCOST: introduces a GOP (Gap Opening Penalty) and a GEP (Gap Extension Penalty)
[Figure: distribution of pairwise sequence hash scores (number of pairs vs pairwise score) with the 25% LQRID cutoff; worked example of gap penalties]
Thompson J.D. et al. (2001) J. Mol. Biol., 314 (4): 937-951.
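Putting the pieces of the formula together gives a sketch like the one below. It assumes a gap cost of GOP per gap opening plus GEP per extension with arbitrary placeholder penalties, takes MD as the sum of the per-column scores, treats MaxMD as a value supplied by the caller, and uses Python's `statistics.quantiles` to obtain the lower quartile of the pairwise scores. These are illustrative assumptions rather than the exact NorMD definitions.

```python
import statistics


def gap_cost(n_openings, n_extensions, gop=1.0, gep=0.1):
    """Gap cost from opening and extension counts; the default penalties are placeholders."""
    return gop * n_openings + gep * n_extensions


def normd(column_mds, n_openings, n_extensions, max_md, pairwise_scores,
          gop=1.0, gep=0.1):
    """(MD - GAPCOST) / (MaxMD * LQRID), LQRID being the lower quartile
    of the pairwise sequence scores (at least two scores are needed)."""
    md = sum(column_mds)
    lqrid = statistics.quantiles(pairwise_scores, n=4)[0]  # 25th percentile
    return (md - gap_cost(n_openings, n_extensions, gop, gep)) / (max_md * lqrid)
```

Intuitively, dividing by LQRID scales the score by how divergent the sequence set is, so that sets containing distant sequences are not judged on the same absolute scale as near-identical ones.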

20 Evaluation of objective functions using BAliBASE: sum of pairs, relative entropy, mean distances
[Chart: scores for the test categories: small number of sequences (short, medium, long), orphans, sub-families, extensions/insertions]

21 NorMD: Normalised Mean Distance
[Chart: NorMD scores for the same test categories (small number of sequences: short, medium, long; orphans; sub-families; extensions/insertions), with the value 0.5 marked]

22 Major observations and perspectives
- Above 30-35% identity: all algorithms perform reliably.
- 15-30% identity: performance depends on the algorithm and the sequence family.
- More information is needed:
  - coupling of local and global strategies
  - structural data (when available), e.g. 3D-COFFEE
- Iteration can improve quality (but can be time-consuming).
To improve quality in the ‘twilight zone’, MORE PERTINENT INFORMATION IS NEEDED:
- better understanding of the local information: more robust statistics
- better understanding of the initial heterogeneity of the data, e.g. composition (MUSCLE), length…
- integration of “non-sequence” information: fragments, phylogenetic position, fold plasticity
- analysis and post-processing to eliminate alignment incongruities: RASCAL (horizontal, vertical clustering) and LEON, incorporated in PipeAlign

23 PipeAlign: high-quality MACS production (http://bips.u-strasbg.fr/PipeAlign)
- Query sequence → BlastP search (E < 10) → LMS (local maximum segments) used as anchors by Ballast (Plewniak et al. 2000 Bioinformatics; Plewniak et al. 2003 Nucl. Acids Res.)
- DbClustal alignment of the query and its homologues using the anchors (Thompson et al. 2000 Nucl. Acids Res.)
- Homologous regions (Thompson et al. 2001 J. Mol. Biol.)
- RASCALED MACS: Multiple Alignment of Complete Sequences (Thompson et al. 2003 Bioinformatics; Thompson et al. 2004 Nucl. Acids Res.)

24 PipeAlign: high-quality MACS production (http://bips.u-strasbg.fr/PipeAlign)
- Query sequence → BlastP search (E < 10) → LMS (local maximum segments) used as anchors by Ballast (Plewniak et al. 2000 Bioinformatics)
- DbClustal alignment of the query and its homologues using the anchors (Thompson et al. 2000 Nucl. Acids Res.)
- Homologous regions (Thompson et al. 2001 J. Mol. Biol.)
- RASCALED MACS: Multiple Alignment of Complete Sequences (Thompson et al. 2003 Bioinformatics; Thompson et al. 2004 Nucl. Acids Res.)
- Secator/DPC: automatic clustering algorithms (Wicker et al. 2001 Mol. Biol. Evol.; Wicker et al. 2002 Nucl. Acids Res.)
- OrdAli: Ordered Alignment analysis of differentially conserved residues, with automatic visualisation on the structure. Colour code: strictly/mostly conserved (black, grey); conserved within a group (red, yellow, blue); conserved between groups (red + yellow = orange)
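The conservation categories behind the OrdAli colour code can be approximated with a simple per-column classification, given group labels (for example from Secator/DPC clustering). The sketch below is hypothetical and is not OrdAli's algorithm: it calls a column "strict" when every sequence has the same residue, "mostly" when the commonest residue reaches an arbitrary 80% threshold, and otherwise lists the groups within which the column is invariant.

```python
from collections import Counter


def classify_columns(rows, groups, mostly=0.8):
    """Label each column 'strict', 'mostly', 'within:<groups>' or 'none'.
    rows: equal-length aligned strings; groups: one group label per row."""
    labels = []
    for col in range(len(rows[0])):
        residues = [row[col] for row in rows]
        top, count = Counter(residues).most_common(1)[0]
        if top != "-" and count == len(residues):
            labels.append("strict")
        elif top != "-" and count / len(residues) >= mostly:
            labels.append("mostly")  # threshold is an arbitrary assumption
        else:
            conserved_in = []
            for group in sorted(set(groups)):
                members = {res for res, g in zip(residues, groups) if g == group}
                if len(members) == 1 and "-" not in members:
                    conserved_in.append(group)  # invariant inside this group
            labels.append("within:" + "+".join(conserved_in) if conserved_in else "none")
    return labels
```

For rows ["GKAL", "GKCL", "GRWL", "GRWL", "GKWI"] split into groups A, A, B, B, B, the columns come out as strict, conserved within A, conserved within B, and mostly conserved. A column conserved in both groups with different residues would be labelled "within:A+B", the analogue of the "conserved between groups" category.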

25 Developments: exploiting the informational content of MACS
- Database searches: extended mining (text, structures, OMIM…); statistics: hyper-local p-value, correlation… (Daedalus…)
- Higher-quality alignment: post-processing approaches; clustering algorithms on sequence and evolution (RASCAL, LEON…)
- Information cross-validation and analysis: clustering algorithms (hierarchised information); correlation and combinatorial algorithms; analysis and propagation of mined information (MACSIM & MAO…)

26 MACSIM: integration of structural/functional information in the context of the multiple sequence alignment
- Integration of mined structural/functional information (Daedalus/SRS)
- Cross-validation, analysis and propagation
- Graphical interface to access the information
[Figure: example alignment annotated with domains SH3, SH2, PI-PLC-X, PI-PLC-Y, PH, C2, CH, rhoGEF, DAG_PE-bind]

27 MACSIM: cross-validation and propagation
[Figure: BAliBASE reference 3, aldehyde dehydrogenase-like NAD-binding region; aligned segments GSVPTG, GSTKVG, GETRTG, GSTEVG, GSVSAG, GSRDVG, GSTNVF, GSTAVF, with secondary structure (E, C), active site and UniProt annotation]
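At its core, propagating an annotation through an alignment means mapping residue numbers via alignment columns. The sketch below, with hypothetical function names and toy sequences built around two of the motifs shown above, transfers 1-based positions annotated in one sequence to the equivalent residues of another, returning None where the target has a gap. The cross-validation step of the slide is not modelled here.

```python
def residue_columns(row):
    """Column of each residue: residue n (1-based) sits at index n - 1 of the list."""
    return [col for col, ch in enumerate(row) if ch != "-"]


def column_residues(row):
    """Map each non-gap column to its 1-based residue number."""
    return {col: n for n, col in enumerate(residue_columns(row), start=1)}


def propagate(alignment, source, target, positions):
    """Transfer 1-based residue positions of `source` onto `target` via shared columns."""
    src_cols = residue_columns(alignment[source])
    tgt_map = column_residues(alignment[target])
    return {pos: tgt_map.get(src_cols[pos - 1]) for pos in positions}
```

With alignment {"seq1": "MKGSVPTGLL", "seq2": "M-GSTKVGLL"}, propagating seq1 positions 2, 3 and 8 gives {2: None, 3: 2, 8: 7}: position 2 faces a gap, while the first and last glycines of the GSVPTG motif map to residues 2 and 7 of seq2.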

28 Application: target characterisation for SPINE (Structural Proteomics IN Europe)
Detection of structural homologs for a training set of 510 potential targets (number of targets with at least one PDB neighbour):
- BlastP (E < 10^-4): 166 (33%)
- BlastP (E < 10^-7): 142 (28%)
- PipeAlign (BlastP, E < 10): 196 (38%)
- PipeAlign (PDB-Blast): 223 (44%)
Domain organisation:

Source          Targets with at least 1 domain    Total no. of domains
                Pfam A        Pfam A/B            Pfam A      Pfam A/B
Pfam database   288 (56%)     336 (67%)           505         1013
propagated      414 (81%)     477 (94%)           772         1614

29 SPINE target identity cards (in collaboration with EBI)
Target: nuclear receptor coactivator 2 (NCOA-2)
[Figure: domain maps of NCoA-2 compared with Clock, HIF-1, Single-minded, BMAL and NCoA-3, showing SWISSPROT and Pfam domains (HLH / DNA-binding, PAS, PAC, ODD, NTAD, CTAD), the receptor-interacting domain with LXXLL motifs, CREBBP interaction and acetyltransferase (AT) activity regions, a poly-Gln stretch, and modification sites (acetylation by CREBBP, S-nitrosylation, hydroxylation)]

30 Data integration issues: Multiple Alignment Ontology (MAO)
Integration of data from different domains poses a number of problems:
- Syntactic differences: file formats
  - sequence databases: GenBank, TrEMBL, Swiss-Prot, PDB
  - indexing applications: Entrez, SRS
  - standard data exchange formats: XML
- Semantic differences: terminology, naming conventions
  - sequence ‘names’: GenBank GI ≠ EMBL ID
  - function definitions, e.g. Glycyl-tRNA synthetase alpha chain, Glycyl-tRNA synthetase alpha subunit, Glycine-tRNA synthetase, Glycine--tRNA ligase, GlyQ
  - natural language parsing: text mining, statistical approaches
  - constrained vocabularies and ontologies explicitly define concepts

31 What is an ontology?
An ontology is a formal specification of a shared conceptualisation of a domain of interest (Gruber, 1993).
Ontologies provide:
- a community reference: knowledge is authored in a single language
- a standardised vocabulary that facilitates data integration
- concepts structured in a hierarchy that represents knowledge and allows computational reasoning
[Figure: example part_of hierarchy linking cell, cell wall, nucleus, nuclear membrane, inner membrane, outer membrane and nucleoplasm]
Applications:
- data exchange formats for program communication, reuse of software
- integration of information from different databases (e.g. transcriptomics, proteomics, metabolomics, toxicology)
- more efficient database queries: exact terms can be used for searching, e.g. when searching for ‘mitochondrial double stranded DNA binding proteins’, all and only those proteins will be found
- computational reasoning: automatic, large-scale analyses
- presentation of relevant information to the biologist
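The computational reasoning mentioned above often amounts to following relations transitively. This sketch encodes the cell-component terms from the figure as a small part_of graph; the particular edges are an illustrative assumption, not taken from GO or MAO, and the functions are hypothetical. Because part_of is treated as transitive, a query for the parts of the nucleus also retrieves the inner membrane, even though no direct edge states it.

```python
# Hypothetical part_of edges: child term -> set of direct parent terms.
PART_OF = {
    "nucleus": {"cell"},
    "cell wall": {"cell"},
    "nuclear membrane": {"nucleus"},
    "nucleoplasm": {"nucleus"},
    "inner membrane": {"nuclear membrane"},
    "outer membrane": {"nuclear membrane"},
}


def ancestors(term, relation=PART_OF):
    """All terms reachable from `term` by following the relation (transitive closure)."""
    seen, stack = set(), [term]
    while stack:
        for parent in relation.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def parts_of(whole, relation=PART_OF):
    """Every term that is directly or indirectly part_of `whole`, sorted."""
    return sorted(t for t in relation if whole in ancestors(t, relation))
```

Here parts_of("nucleus") returns the inner membrane, nuclear membrane, nucleoplasm and outer membrane, and correctly excludes the cell wall: the "all and only" behaviour described on the slide.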

32 MAO: Multiple Alignment Ontology
http://www-igbmc.u-strasbg.fr/BioInfo/MAO/mao.html, also available from the OBO web site: http://obo.sourceforge.net
MAO consortium:
- RNA analysis (Steve HOLBROOK, Berkeley)
- MACS algorithm (Kazutaka KATOH, Kyoto)
- Protein 3D analysis (Patrice KOEHL, Davis)
- Protein 3D structure (Dino MORAS, Strasbourg)
- 3D RNA structure (Eric WESTHOF, Strasbourg)
Thompson et al. (2005) Nucleic Acids Res.

33 Scope and structure: hierarchical organisation and characterisation
Most of the features associated with multiple alignments are defined as MAO concepts, ranging from a single residue to sub-families of sequences and/or 3D structures. Concepts are organised in a DAG (directed acyclic graph). Links are provided to OBO ontologies and external databases.
[Figure: MAO concept graph linking multiple_sequence_alignment, sub_alignment, alignment_sequence, alignment_column, residue (amino_acid, nucleotide), column_conservation, sequence_feature, sequence_feature_type, domain and motif through part_of, is_a and is_attribute relations]
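Because MAO concepts form a DAG, a basic consistency check is that the relations contain no cycle. The sketch uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+), which raises `CycleError` when a cycle exists. The edge list is a hypothetical subset loosely based on the concept names in the figure, not the published MAO relations, and `check_dag` is an illustrative name.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical (child, relation, parent) triples, loosely modelled on the figure.
EDGES = [
    ("sub_alignment", "part_of", "multiple_sequence_alignment"),
    ("alignment_sequence", "part_of", "sub_alignment"),
    ("alignment_column", "part_of", "sub_alignment"),
    ("residue", "part_of", "alignment_sequence"),
    ("amino_acid", "is_a", "residue"),
    ("nucleotide", "is_a", "residue"),
    ("column_conservation", "is_attribute", "alignment_column"),
    ("sequence_feature", "part_of", "alignment_sequence"),
    ("domain", "is_a", "sequence_feature"),
    ("motif", "is_a", "sequence_feature"),
]


def check_dag(edges):
    """Return the concepts in parent-before-child order, or None if there is a cycle."""
    graph = {}
    for child, _relation, parent in edges:
        graph.setdefault(child, set()).add(parent)  # parents are predecessors
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError:
        return None
```

The ordering returned for a valid graph places general concepts such as multiple_sequence_alignment before the residues they contain, which is also a convenient order for propagating annotations downwards.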

34 Sequence-structure relationships: either link to an existing PDB entry or enter 3D coordinates for atoms.
[Figure: concept graph linking multiple_sequence_alignment, sub_alignment, alignment_sequence, alignment_column, residue (amino_acid, nucleotide), atom and 3d_atomic_point through part_of, is_a and is_attribute relations, with pdb_name and ndb_name attributes]

35 OBO interactions
[Figure: MAO concepts (alignment_column, alignment_sequence, residue, feature, domain) linked to external OBO and database resources: residue_function, structural_location, binding_site, ptm, mutation, structural_bond (MI), column_conservation, GO, TAXID, accession, EC, enzyme_active_site (CSA_catalytic_site), phenotype, InterPro]

36 MAO knowledge base: a knowledge base of annotated protein family alignments
Example: interleukin-1 (IL1Fx, IL1A, IL1B, IL1RN; interleukin-1 and interleukin-1 propeptide)
- IL-1 proteins (C-terminal mature form) are involved in the inflammatory response and immunity.
- Differential effects of IL-1 on tumour development: IL1A reduces tumorigenicity; IL1B promotes invasiveness (Song et al. 2003).
- Within the nucleus, the IL1A propeptide may interact with elements of RNA processing, affecting alternative splicing of genes involved in the regulation of apoptosis (Pollock et al. 2003).
[Figure: IL1A propeptide processing and comparison with IL1B: NLS, RNA interaction, myristoylation, phosphorylation and cleavage sites; mutations from SeattleSNPs, R>Q «damaging» and A>S «benign»]

37 Perspectives for algorithm developments
Rationale for objective sequence/structure/evolution analysis:
- role of sequence conservation in structure (fold, plasticity, oligomerisation…)
- impact of sequence changes (evolutionary, “indel”, mutation…)
- spatial relation between conservation and physico-chemical properties…
Automatic correction and integration of high-throughput data, from sequence/structure/evolution data to systems biology:
- DbW: automatic daily update of protein alignments (Prigent et al. 2005 Bioinformatics)
- vALId: validation of predicted protein sequences (Bianchetti et al. 2005 JBCB)
- GOAnno: GO annotation based on multiple alignment (Chalmel et al. 2005 Bioinformatics)
Promoter analysis: multiple alignment algorithms under test
- phylogenetic footprinting coupled to MACS, promoter site prediction and statistical estimation

38 Bioanalysis and biological projects
Implications in cancer and inherited disease: target characterisation and high-throughput data analysis and integration (transcriptomics, interactomics…)
- Cancer targets in Structural Proteomics IN Europe (SPINE, European Integrated Project 2000)
- Head and Neck Squamous Cell Carcinomas (HNSCC, European I.P. 2000)
- Prostate cancer (ProCure BioPharm, European I.P. 2001)
- Prostate cancer (Prima, European I.P. 2003)
- Retinal disease (RetNet, European Research Training Network 2003), WP5
- Functional Genomics of the Retina (EVI-GENORET, European I.P. 2005), WP14 & 16
- Annotation of 10 000 full-length cDNAs from the thermotolerant metazoan Alvinella pompejana (Alvinella consortium, Genoscope)
Implications in integrated and automated processes:
- Construction of a « GRID version » of PipeAlign (IBM/AFM/CNRS)
- Transcriptomic and Bioinformatic platforms (CRP Santé, Luxembourg)
- MS2PH project: from Structural Mutation to Phenotype of Human Pathology (Decrypthon program, IBM/AFM/CNRS)
- Integrative approach of start codon prediction (ACI Protéomique et génie des protéines)

39 Laboratory of Integrative Genomics and Bioinformatics IGBMC, Strasbourg

40 Integrated analyses
- Sequence validation: ~44% of predicted proteins from whole-genome shotgun sequencing projects and ~30% of high-throughput cDNAs (HTC) may contain errors (Bianchetti et al. 2005)
- Structural characterisation: ~50% of sequences in GenBank can be assigned to known structures (PSSH, Schafferhans et al. 2003)
- Functional characterisation: 20-30% of ORFs are ‘hypothetical proteins’ (Siew 2004)
Multiple alignments of complete sequences (MACS) provide an ideal environment for:
- cross-validation of experimental and predicted data
- propagation of information from known to unknown proteins
→ Integrated processes for automatic information collection, validation and analysis; high-quality, automatic multiple alignments

41 Multiple alignment methods: a brief history
[Figure: methods arranged along local/global and progressive/iterative axes: SBpima, multal, multalign, pileup, clustalx, MLpima, prrp, dialign (SEGMENT), saga (GA), hmmer (HMM)]

42 Progressive multiple alignment methods
[Figure: progressive programs (SBpima, multal, multalign, pileup, clustalx, MLpima) arranged along a local/global axis and by guide-tree method]
SB: sequential branching; UPGMA: Unweighted Pair Group Method with Arithmetic mean; ML: maximum likelihood; NJ: neighbour-joining
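The UPGMA guide-tree construction named above can be sketched in a few lines: repeatedly merge the two closest clusters, then set the distance from the merged cluster to every other cluster to the size-weighted average of the two old distances. This minimal version returns only the tree topology as a Newick string (no branch lengths) and breaks ties by label order; both simplifications are assumptions for illustration, and none of the listed programs is claimed to work exactly this way.

```python
from itertools import combinations


def upgma(labels, matrix):
    """UPGMA topology as a Newick string; matrix is a symmetric distance matrix."""
    size = {label: 1 for label in labels}  # cluster (as Newick text) -> number of leaves

    def key(x, y):
        return tuple(sorted((x, y)))  # order-independent pair key

    dist = {key(labels[i], labels[j]): matrix[i][j]
            for i, j in combinations(range(len(labels)), 2)}

    while len(size) > 1:
        # closest pair of clusters; ties broken by the sorted pair of labels
        a, b = min(dist, key=lambda pair: (dist[pair], pair))
        merged = f"({a},{b})"
        for other in size:
            if other not in (a, b):
                # size-weighted (arithmetic mean) distance to the new cluster
                dist[key(merged, other)] = (
                    size[a] * dist[key(a, other)] + size[b] * dist[key(b, other)]
                ) / (size[a] + size[b])
        dist = {pair: d for pair, d in dist.items() if a not in pair and b not in pair}
        size[merged] = size.pop(a) + size.pop(b)
    return next(iter(size)) + ";"
```

For four sequences where A and B are closest, C is next and D is the outlier, the result is "(((A,B),C),D);", which a progressive aligner would follow by aligning A with B first, then adding C, then D.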

