Multiple Sequence Alignment
Urmila Kulkarni-Kale
Bioinformatics Centre, University of Pune
urmila@bioinfo.ernet.in
October 2k5
Approaches: MSA
  Dynamic programming
  Progressive alignment: ClustalW
  Genetic algorithms: SAGA
Progressive alignment approach
  Align the most closely related sequences first
  Add less related sequences to the initial alignment
  Perform pairwise alignments of all sequences
  Use the alignment scores to produce a phylogenetic tree
  Align sequences sequentially, guided by the tree
  Gaps are added to an existing profile in progressive methods
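
The first two steps can be sketched as follows. This is a minimal illustration, not ClustalW's own code: the function names `nw_score` and `most_related_pair`, the scoring values (match +1, mismatch -1, linear gap -2) and the toy DNA sequences are all assumptions chosen for readability.

```python
from itertools import combinations


def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment score (Needleman-Wunsch) with a linear gap penalty.

    Scoring values are illustrative assumptions, not ClustalW defaults.
    """
    # prev[j] holds the best score for aligning a[:i-1] with b[:j]
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def most_related_pair(seqs):
    """Score every pair and return the best-scoring pair plus all scores."""
    scores = {
        (i, j): nw_score(seqs[i], seqs[j])
        for i, j in combinations(range(len(seqs)), 2)
    }
    best = max(scores, key=scores.get)
    return best, scores


seqs = ["ACGTACGT", "ACGTACGA", "TTTTGGGG"]
best, scores = most_related_pair(seqs)
print(best, scores[best])
```

A progressive method would align the pair returned here first and add the remaining sequences afterwards.
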
Number of pairwise alignments for N sequences: N*(N-1)/2
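
A quick check of the formula, as a small sketch (the helper name `n_pairwise` is an assumption used only for illustration). It compares the closed form with an explicit enumeration of pairs and shows how quickly the count grows.

```python
from itertools import combinations


def n_pairwise(n):
    """Number of distinct pairs among n sequences: n*(n-1)/2."""
    return n * (n - 1) // 2


for n in (6, 10, 100):
    # Enumerate the pairs explicitly to confirm the closed form
    enumerated = sum(1 for _ in combinations(range(n), 2))
    print(n, n_pairwise(n), enumerated)
```
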
Steps in ClustalW Algorithm
  Pairwise alignment: calculate the distance matrix
  Unrooted Neighbor-joining tree
  Rooted NJ tree
  Sequence weights
  Progressive alignment using guide tree
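
The distance-matrix step can be sketched as below. This assumes the pairwise alignments already exist and that distance is taken as 1 minus the fraction of identical residues, with gapped columns excluded from the comparison. The function names and the input layout are assumptions for illustration, not ClustalW's interface.

```python
def identity_distance(aligned_a, aligned_b):
    """Distance = 1 - (identical residues / compared residues).

    Columns where either sequence has a gap ('-') are not compared.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            continue
        compared += 1
        identical += x == y
    if compared == 0:
        return 1.0  # nothing to compare: treat as maximally distant
    return 1.0 - identical / compared


def distance_matrix(pairwise_alignments, n):
    """Build a symmetric n x n matrix.

    pairwise_alignments maps (i, j) with i < j to a tuple of the two
    aligned strings (hypothetical input format).
    """
    d = [[0.0] * n for _ in range(n)]
    for (i, j), (a, b) in pairwise_alignments.items():
        d[i][j] = d[j][i] = identity_distance(a, b)
    return d


alignments = {
    (0, 1): ("ACGT", "ACGA"),
    (0, 2): ("AC-T", "ACGT"),
    (1, 2): ("ACGA", "TCGT"),
}
matrix = distance_matrix(alignments, 3)
for row in matrix:
    print(row)
```

A matrix like this is the input to the neighbor-joining step that builds the guide tree.
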
ClustalW: sequence weights
  Groups of related sequences receive lower weights
  Highly divergent sequences without any close relatives receive high weights
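
One way to obtain weights with this behaviour is to walk the rooted tree and share each branch length equally among the sequences below it. The sketch below follows that idea. The tree encoding (nested tuples), the function name `sequence_weights` and the branch lengths are assumptions made for the example.

```python
def sequence_weights(tree):
    """Weight of a sequence = sum, over branches from root to leaf, of
    branch length divided by the number of leaves sharing that branch.

    Tree encoding (hypothetical): a leaf is (name, branch_length); an
    internal node is ([child, ...], branch_length).
    """
    weights = {}

    def leaves(node):
        label, _ = node
        if isinstance(label, str):
            return [label]
        return [name for child in label for name in leaves(child)]

    def walk(node, inherited):
        label, length = node
        total = inherited + length / len(leaves(node))
        if isinstance(label, str):
            weights[label] = total
        else:
            for child in label:
                walk(child, total)

    walk(tree, 0.0)
    return weights


# A and B are close relatives; C is divergent with no close relative.
tree = ([([("A", 0.1), ("B", 0.1)], 0.2), ("C", 0.5)], 0.0)
weights = sequence_weights(tree)
print(weights)
```

In this toy tree the related pair A and B each end up with a lower weight than the divergent sequence C.
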
ClustalW: affine gap penalty
  GOP: Gap Opening Penalty
  GEP: Gap Extension Penalty
  Heuristics in calculating gap penalty
    Position-specific penalty
    Gap at position?
      yes: lower GOP and GEP
      no, but gap within 8 residues: increase GOP
    Stretch of hydrophilic residues?
      yes: lower GOP
      no: use residue-specific gap propensities
  Once a gap, always a gap
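
The decision rules above can be sketched in code. Everything numeric here is a placeholder: the convention that a gap of length n costs GOP + n*GEP, the multipliers `LOWER` and `RAISE`, and the function names are assumptions for illustration and are not ClustalW's actual formulas.

```python
LOWER = 0.5  # hypothetical factor for "lower the penalty"
RAISE = 2.0  # hypothetical factor for "increase the penalty"


def affine_gap_cost(length, gop, gep):
    """Cost of one gap of the given length under an affine scheme.

    Assumed convention: opening cost once, extension cost per gap position.
    """
    if length <= 0:
        return 0.0
    return gop + gep * length


def local_penalties(gop, gep, gap_at_position, gap_within_8,
                    hydrophilic_stretch, residue_factor=1.0):
    """Return position-specific (GOP, GEP) following the slide's rules."""
    if gap_at_position:
        return gop * LOWER, gep * LOWER   # existing gap: lower GOP and GEP
    if gap_within_8:
        return gop * RAISE, gep           # gap nearby: increase GOP
    if hydrophilic_stretch:
        return gop * LOWER, gep           # hydrophilic stretch: lower GOP
    return gop * residue_factor, gep      # else residue-specific propensity


# One long gap is cheaper than two short gaps of the same total length.
print(affine_gap_cost(4, 10.0, 1.0), 2 * affine_gap_cost(2, 10.0, 1.0))
print(local_penalties(10.0, 1.0, True, False, False))
```

The first print shows why affine penalties favour a few long gaps over many short ones.
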
Variation in local GOP
  Lowest GOP in hydrophilic regions
  Highest GOP in ‘Gapped regions’
  Initial GOP
MSA: helps detect similarity
  Hemoglobin: human, chimpanzee, goat, pig, horse and mouse
Sample MSA
Applications of MSA
  Detecting diagnostic patterns
  Phylogenetic analysis
  Primer design
  Prediction of protein secondary structure
  Finding novel relationships between genes
  Similar genes conserved across organisms
    Same or similar function
  Simultaneous alignment of similar genes yields:
    regions subject to mutation
    regions of conservation
    mutations or rearrangements causing change in conformation or function
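
As a rough illustration of how conserved and variable regions can be read off an alignment, the sketch below scores each column by the fraction of sequences carrying its most common residue. The function name `column_conservation`, the scoring rule and the toy alignment are assumptions. Real tools typically use substitution-matrix or entropy-based measures.

```python
from collections import Counter


def column_conservation(msa):
    """Per-column fraction of rows that carry the most common residue.

    Gaps ('-') never count as the common residue, so gapped columns
    score lower. All rows must have the same length.
    """
    if len({len(row) for row in msa}) != 1:
        raise ValueError("all aligned rows must have equal length")
    scores = []
    for column in zip(*msa):
        residues = [c for c in column if c != "-"]
        if not residues:
            scores.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(column))
    return scores


msa = ["ACGT", "ACGA", "AC-T"]
scores = column_conservation(msa)
print(scores)
```

Columns scoring 1.0 are fully conserved. Lower scores mark positions that vary or contain gaps.
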
Limitations of Progressive alignment approach
  Greedy nature
    Any errors in the initial alignment are carried through
  More efficient for closely related sequences than for divergent sequences