Developing a reliable methodology to align a sequence of known structure with a sequence of low homology, in order to model it
Nadia Léonard, Unité de Recherche en Biologie Moléculaire, F.U.N.D.P.

Introduction
3D structure: information needed to understand function and to plan directed mutagenesis.
The number of known structures (8000) is smaller than the number of known sequences (500000).
Experimental techniques are long and expensive.
Alternative: modeling.
Homology modeling: two homologues adopt the same structure.

[Figure: scale of percentage identity (%id.) with ticks at 100, 50, 40, 30, 25, 20 and 0. Labels along the scale: homology modeling (reliable); pairwise alignment: most features well predicted; multiple alignment; Twilight zone; consensus of alignments: some features well predicted; Midnight zone; fold recognition (not very reliable); not homologous.]
BUT proteins of different sequences can adopt the same structure.

Sequence alignment is the critical step for homology modeling.
Below 30% identity, no automatic method allows reliable protein modeling.

Aim of our work: to propose a reliable alignment method for proteins sharing a small percentage of identity with their template (<30%).

General strategy for homology modeling
[Flowchart: search databanks (PSI-BLAST); multiple alignment of sequences; target-template alignment (critical step), using a PDB template; modeling; theoretical model evaluation; comparison of the model to the real structure.]

Our methodology
1. Target selection: PDB proteins whose template shares between 10 and 30% identity (ALIGN).
2. Improvement of the sequence-structure alignment. Three alignments are built: two from our method (consensus 1 and 2) and one pairwise alignment from PSI-BLAST (the best alignment method for Twilight Zone proteins).
3. Homology modeling from each target-template alignment.
4. Evaluation: geometrical features of the models.
5. Comparison of each model to the real structure.
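Step 1 relies on the percentage of identity between target and template. The sketch below shows one common way to compute it from a gapped pairwise alignment and to sort the result into the zones used in this talk. It is an illustration only: the function names are hypothetical, the denominator (columns where both sequences have a residue) is an assumption and may differ from what ALIGN reports, and treating exactly 30% as Twilight Zone is also an assumption.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of identical residues over columns without a gap.

    Assumption: identity is counted over aligned (non-gap) columns only;
    other tools may divide by the alignment or sequence length instead.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    aligned = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns where either sequence has a gap
        aligned += 1
        if a == b:
            identical += 1
    return 100.0 * identical / aligned if aligned else 0.0


def identity_zone(pid: float) -> str:
    """Classify a percentage of identity with the thresholds of the talk."""
    if pid < 20.0:
        return "midnight zone"
    if pid <= 30.0:  # assumption: 30% itself counts as twilight zone
        return "twilight zone"
    return "above twilight zone"


if __name__ == "__main__":
    pid = percent_identity("PTGKVWTCDE", "PTGAVYPKEV")
    print(pid, identity_zone(pid))
```

The two ten-residue fragments in the usage example are taken from the 1d2f alignment shown near the end of the talk; a single short fragment says nothing about the identity of the whole protein.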

Our approach consists in building a consensus of several alignment programs
[Diagram: several programs each produce a multiple alignment containing the target and the template; each multiple alignment yields a target-template pairwise alignment; a consensus is built from the several pairwise alignments.]
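The slides do not say how the consensus is computed. As a minimal sketch, one possible scheme is a per-position majority vote: for every template residue, each pairwise alignment votes for the target residue it pairs with it (or for a gap), and the score is the number of alignments that agree with the winner. Everything here is an assumption made for illustration: the function names, the voting rule, and the tie-breaking (the first alignment in the list wins a tie).

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

GAP = "-"


def alignment_to_map(template_aln: str, target_aln: str) -> Dict[int, Optional[int]]:
    """Map each template residue index to the target residue index it is
    aligned with, or to None when it faces a gap."""
    if len(template_aln) != len(target_aln):
        raise ValueError("aligned sequences must have the same length")
    mapping: Dict[int, Optional[int]] = {}
    t_idx = 0  # index of the next template residue
    q_idx = 0  # index of the next target residue
    for t_char, q_char in zip(template_aln, target_aln):
        t_is_res = t_char != GAP
        q_is_res = q_char != GAP
        if t_is_res:
            mapping[t_idx] = q_idx if q_is_res else None
            t_idx += 1
        if q_is_res:
            q_idx += 1
    return mapping


def vote_consensus(
    alignments: List[Tuple[str, str]],
) -> List[Tuple[Optional[int], int]]:
    """For each template position, return (winning target index, votes)."""
    if not alignments:
        raise ValueError("at least one alignment is required")
    maps = [alignment_to_map(t, q) for t, q in alignments]
    n_template = len(maps[0])
    if any(len(m) != n_template for m in maps):
        raise ValueError("all alignments must cover the same template")
    consensus = []
    for pos in range(n_template):
        votes = Counter(m[pos] for m in maps)
        partner, score = votes.most_common(1)[0]
        consensus.append((partner, score))
    return consensus


if __name__ == "__main__":
    # Three toy (template, target) alignments of "ABCD" against "WXYZ".
    toy = [("ABCD", "WXYZ"), ("ABCD", "WXYZ"), ("ABCD-", "-WXYZ")]
    print(vote_consensus(toy))
```

A vote taken column by column does not guarantee that the winning pairs form a valid alignment (the target indices may not increase along the template), so a real procedure would need an additional consistency step.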

1) Alignment and modeling
[Diagram: databank searching with PSI-BLAST feeds three routes. 8 multiple alignments give 8 pairwise alignments, then Consensus 1, then Model 1. 12 multiple alignments give 13 pairwise alignments, then Consensus 2, then Model 2. The PSI-BLAST pairwise alignment gives Model PSI-BLAST.]

2) Comparison of models to the real structure
Global RMSD between model and structure after superposition.
Local RMSD: percentage of well-predicted residues.
The lower the distance, the closer the model is to the real structure.
A wrongly modeled region can dramatically increase the global RMSD.
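The two measures above can be sketched as follows, assuming the model and the real structure are already superposed and given as matching lists of coordinates (for example one C-alpha atom per residue). The function names are hypothetical, and the 3.0 Å cutoff used to call a residue "well predicted" is an assumption: the slides do not state the threshold.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def residue_distances(model: Sequence[Coord], reference: Sequence[Coord]) -> List[float]:
    """Distance between equivalent atoms of two superposed structures."""
    if len(model) != len(reference):
        raise ValueError("both structures must have the same number of atoms")
    if not model:
        raise ValueError("structures must not be empty")
    return [math.dist(m, r) for m, r in zip(model, reference)]


def global_rmsd(model: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """Root mean square deviation over all equivalent atoms."""
    distances = residue_distances(model, reference)
    return math.sqrt(sum(d * d for d in distances) / len(distances))


def percent_well_predicted(
    model: Sequence[Coord], reference: Sequence[Coord], cutoff: float = 3.0
) -> float:
    """Percentage of residues closer than `cutoff` to their real position.

    Assumption: the cutoff value is illustrative, not taken from the slides.
    """
    distances = residue_distances(model, reference)
    return 100.0 * sum(1 for d in distances if d <= cutoff) / len(distances)


if __name__ == "__main__":
    # Ten residues: nine placed exactly, one misplaced by 10 angstroms.
    reference = [(float(i), 0.0, 0.0) for i in range(10)]
    model = list(reference)
    model[9] = (9.0, 10.0, 0.0)
    print(global_rmsd(model, reference))
    print(percent_well_predicted(model, reference))
```

In the usage example a single misplaced residue out of ten raises the global RMSD above 3 Å although 90% of the residues are placed exactly, which is why a per-residue measure is reported alongside the global one.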

3pte: D-alanyl-D-alanine carboxypeptidase from Streptomyces sp. R61
[Figure: model 2 and the PSI-BLAST model compared with the real structure.]

Results
9 proteins have been modeled. We can distinguish:
3 proteins of the midnight zone (<20% id.)
6 proteins of the twilight zone (20-30% id.)

Comparison of models to the real structure
Midnight Zone proteins (<20% id.)
All methods (models 1, 2 and PSI-BLAST) give very bad results: most of the residues have been badly modeled.
No reliable alignment method exists below 20% identity, and our method (models 1 and 2) cannot lower this threshold.
Modeling of these 3 proteins confirms the limits of alignment methods below 20%.

Twilight Zone proteins (20-30% id.)
Global and local RMSD: the most accurate models (4/6 and 5/6) come from our method (consensus 1 and 2).
In general, model 2 gives better results than model 1 and the PSI-BLAST model: it is better to use many alignment programs.
Models built with our methodology seem to be better than PSI-BLAST models.

Comparison to CASP (Critical Assessment of techniques for protein Structure Prediction)
CASP: entrants model proteins whose structure is unknown to them (it is revealed after the competition).
Models are compared to the real structure (global RMSD).
The best CASP models are taken as reference.

Conclusions
The limits of our method are reached below 20% identity.
Our alignment method appears to be better than PSI-BLAST (above 20% id.).
Our results are comparable to the best CASP performances (cf. graph).
Consensus building for sequence alignment has a future for homology modeling of Twilight Zone proteins.

Perspectives (1)
Test our approach on a large set of proteins.
Improve our method: give more weight to the better alignment programs, increase the number of alignment programs, use several templates, use SSP and fold recognition.

Perspectives (2)
Evaluate the confidence of regions predicted by many programs.
Take part in the CASP competition.
Automate the method: expert system (PhD thesis).

61
1d2f        MHGVFGYSRW KNDE-FLAAI AHWFSTQHYT AIDSQTVVYG PSVIYMVSEL IRQWSETGEG
consensus1  AQGKTKYAPP AGIPELREAL AEKFRRENGL SVTEEETIVT VGGKQALFNL FQAILDPGDE
score1      4444444464 4444466666 6666666666 4444446666 6666666666 6688888888
consensus2  AQGKTKYAPP AGIPELREAL AEKFRRENGL SVTPEETIVT VGGKQALFNL FQAILDPGDE
score2      5555555555 5434354455 4445444444 4443356666 6666666666 6688888888

121
1d2f        VVIHTPAYDA FYKAIEGNQR TVMPVALEKQ ADGWFCDMGK LEAVLAKPEC KIMLLCSPQN
consensus1  VIVLSPYWVS YPEMVRFAGG VVVEVETL-- ---------- --R----R-T KALVVNSPNN
score1      8888888888 8888888666 66686664-- ---------- --4----4-4 4888888888
consensus2  VIVLSPYWVS YPEMVRFAGG VVVEVETL-P EEGFVPD-PE RVRRAITPRT KALVVNSPNN
score2      8888888888 8888877777 77777553-2 1222222-33 3333444445 5888888888

181
1d2f        PTGKVWTCDE LEIMADLCER HGVRVISDEI HMDMVWGEQP HIPWSNVARG DWALLTSGSK
consensus1  PTGAVYPKEV LEALARLAVE HDFYLVSDEI YEHLLYEG-E HFSPGRVAPE HTLTVNGAAK
score1      8888888888 8888888888 8888888888 8888888824 4666444466 4446446668
consensus2  PTGAVYPKEV LEALARLAVE HDFYLVSDEI YEHLLYEGEH FSPGRVA-PE HTLTVNGAAK
score2      8888888888 8888888888 8888888888 8888888833 4444443-44 4445556668

[Figure: 1d2f, model 2]
