Aligning Kinases: Applying MSA Analysis to the CDK Family
Building A Multiple Sequence Alignment
chite ---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD
wheat --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE
trybr KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP
mouse -----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP
            ***. :::.:... :.. *. *: *

chite AATAKQNYIRALQEYERNGG-
wheat ANKLKGEYNKAIAAYNKGESA
trybr AEKDKERYKREM
mouse AKDDRIRYDNEMKSWEEQMAE
      * :.*. :

Potential Uses of a Multiple Sequence Alignment?
– Extrapolation
– Motifs/Patterns
– Phylogeny
– Profiles
– Struc. Prediction
Multiple Alignments Are CENTRAL to MOST Bioinformatics Techniques.
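As a minimal sketch of how the starred positions under an alignment block can be found, the Python below takes the first block of the four-sequence alignment above and lists the columns where every sequence carries the same residue. The function name `identical_columns` is illustrative only, and treating any column with a gap as non-conserved is an assumption. It does not reproduce the `:` and `.` symbols, which depend on a substitution scheme the slide does not specify.

```python
# First alignment block from the slide (sequence names and gapped rows).
BLOCK = {
    "chite": "---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD",
    "wheat": "--DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE",
    "trybr": "KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP",
    "mouse": "-----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP",
}


def identical_columns(aligned):
    """Return 0-based indices of columns where all residues are identical.

    Columns containing a gap are never reported (an assumption of this sketch).
    """
    seqs = list(aligned.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences must have equal length")
    hits = []
    for i in range(length):
        column = {s[i] for s in seqs}
        if len(column) == 1 and "-" not in column:
            hits.append(i)
    return hits


if __name__ == "__main__":
    cols = identical_columns(BLOCK)
    print("identical columns (0-based):", cols)
    print("residues:", "".join(BLOCK["mouse"][i] for i in cols))
```

The six columns it reports correspond to the six `*` symbols in the conservation line of the first block.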
1 Organizing a Family Gathering: The CDK Example
Choosing the Right Sequences
– Swiss-Prot
– Literature
– Other Databases
Organizing the Data SRS Public Data IGS Data Aventis CDK Genecard Manual Automatic
Accessing the Data: The Fischer Server
Fischer will contain
– A collection of flat files
– A secure SRS server
– File formats
The server is a technology pipeline
– Can be adapted in real time
– Can be transferred
Our CDK Data
CDKs and CDK-like
– Protein Information: Functional Features, Structural Information
– Genomic Information: Genes, Variant, SNPs
Our MSA Dataset
29 amino acid sequences (CDK and Aurora families, derived from primary transcripts)
– 2 isoforms of a CDK member
4 PDB structures:
– 1MUO (AUR A)
– 1BLX (CDK 6)
– 1B38 (CDK 2)
– 1H4L (CDK 5)
T-Coffee release 1.78 was used, integrating the structural information contained in the PDB files
2 Aligning The Sequences
Building A Multiple Sequence Alignment ClustalW T-Coffee Muscle Hand Editing Combination Comparison
Using Structural Information 3D-Coffee Struct Vs Struct Seq Vs Struct Thread Superpose Seq Vs Seq Local Global
Method
Accessing the Methods: Fischer
Public 3D-Coffee server
– igs-server.cnrs-mrs.fr/TCoffee/
Fischer
– Latest version of T-Coffee
– Customised parameters
– Cocktails of MSA methods
3 Dressing Up a Multiple Sequence Alignment
Feature Dressing
-25 Binding site
-20 Phospho
-40 nsSNP
-50 Splice Site
…
ESPript
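Dressing an alignment with features requires converting residue numbers, which count only real residues, into alignment columns, which also count gaps. The sketch below shows that conversion and builds a one-line annotation track. The function names, the single-letter symbols and the feature positions are all hypothetical. This is not the input format of ESPript or of any T-Coffee tool.

```python
def residue_to_column(gapped):
    """Map 1-based residue numbers of a gapped row to 0-based alignment columns."""
    mapping = {}
    residue = 0
    for col, char in enumerate(gapped):
        if char != "-":
            residue += 1
            mapping[residue] = col
    return mapping


def dress(gapped, features):
    """Build an annotation track as long as the alignment row.

    `features` maps a 1-based residue number to a one-character symbol
    (hypothetical convention: B = binding site, P = phosphorylation site).
    """
    mapping = residue_to_column(gapped)
    track = [" "] * len(gapped)
    for residue, symbol in features.items():
        track[mapping[residue]] = symbol
    return "".join(track)


if __name__ == "__main__":
    row = "--MK-TAY"              # toy aligned row, not a real kinase
    toy_features = {1: "B", 4: "P"}  # invented positions for illustration
    print(row)
    print(dress(row, toy_features))
```

Printing the track under the row places each symbol beneath its residue, whatever gaps the aligner inserted.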
4 How Good Is the Alignment?
T-Coffee CORE Evaluation
CORE Index: Specificity and Sensitivity
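A minimal sketch of how sensitivity and specificity could be measured for a per-residue reliability score such as the CORE index: residues scoring at or above a threshold are predicted to be correctly aligned, and the prediction is checked against a reference. The slide does not state which definitions were used, so the standard ones are assumed here (sensitivity = TP / (TP + FN), specificity = TN / (TN + FP)). The scores, the reference labels and the threshold are invented for illustration.

```python
def sensitivity_specificity(scores, truly_correct, threshold):
    """Return (sensitivity, specificity) of the rule `score >= threshold`.

    scores        -- per-residue reliability scores
    truly_correct -- per-residue booleans from a reference alignment
    """
    tp = fp = tn = fn = 0
    for score, correct in zip(scores, truly_correct):
        predicted = score >= threshold
        if predicted and correct:
            tp += 1
        elif predicted and not correct:
            fp += 1
        elif not predicted and correct:
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


if __name__ == "__main__":
    toy_scores = [9, 8, 7, 3, 2, 1, 6, 4]  # invented values
    toy_truth = [True, True, False, False, False, True, True, False]
    print(sensitivity_specificity(toy_scores, toy_truth, threshold=5))
```

Sweeping the threshold over the score range gives the trade-off between the two measures.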
Feature Based Evaluation
Feature Mapping on the Multiple Alignment
[Figure: features mapped onto the T-Coffee and ClustalW alignments; annotated features: ATP binding site, glycine loop, non-synonymous SNP]
Structure Based Evaluation APDB
Include Sequences with Known Structures
– Do not use structural information: Score 1
– Use structural information: Score 2
If Score 1 ~ Score 2
– Structural information does not help much
– The alignment is of reasonable quality
Evaluating a Multiple Sequence Alignment T-Coffee CORE index Feature Based Library APDB
Manipulating and Comparing Alignments
Reformatting/Processing
– seq_reformat
– extract_from_pdb
Coloring
– seq_reformat
– ESPript
Comparing
– aln_compare
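One common way to compare two alignments of the same sequences is to ask what fraction of the residue pairs aligned in a reference are also aligned in the test alignment. The sketch below illustrates that idea in plain Python. It is not the implementation of aln_compare, and the function names and toy sequences are hypothetical.

```python
from itertools import combinations


def aligned_pairs(aligned):
    """Return the set of residue pairs that share a column.

    Each residue is identified as (sequence name, 0-based residue index),
    so the pairs are independent of where gaps were placed.
    """
    names = sorted(aligned)
    length = len(aligned[names[0]])
    if any(len(aligned[n]) != length for n in names):
        raise ValueError("aligned sequences must have equal length")
    counters = {n: 0 for n in names}
    pairs = set()
    for col in range(length):
        present = []
        for n in names:
            if aligned[n][col] != "-":
                present.append((n, counters[n]))
                counters[n] += 1
        pairs.update(combinations(present, 2))
    return pairs


def pair_agreement(reference, test):
    """Fraction of reference residue pairs that are also aligned in `test`."""
    ref = aligned_pairs(reference)
    if not ref:
        return 0.0
    return len(ref & aligned_pairs(test)) / len(ref)


if __name__ == "__main__":
    reference = {"a": "AC-GT", "b": "ACAGT"}  # toy data
    test = {"a": "ACG-T", "b": "ACAGT"}
    print(pair_agreement(reference, test))
```

Because the measure is asymmetric, swapping the reference and the test alignment can change the score.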
5 Thinking Large?
T-Coffee_dpa
T-Coffee is limited to a small number of sequences
T-Coffee_dpa: Double Progressive Algorithm
– Able to handle large datasets
– 1000 sequences and more
– Able to use structural information
Using A Multiple Sequence Alignment
1 Exploring The Alignment
CDK signature
CDK T-loop (orange) and Aurora activation loop
Substrate recognition motif
2 Using the Alignment: Does My Sequence Make Sense?
Identifying Abnormalities within an MSA: Insertion within the Nuc Binding Site…
Activation loop (orange)
Identifying Abnormalities within an MSA: Retinoblastoma
2 Using the Alignment: Analysing the Structure with the Alignment
The Evolutionary Trace
3 Using the Alignment: Spotting Differences
What Makes a CDK Not an Aurora?
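One simple way to spot positions that separate two families in a shared alignment is to look for columns that are invariant within each family but differ between them. The sketch below does this for two groups of aligned rows. The function name and the toy sequences are hypothetical and are not real CDK or Aurora sequences. Requiring strict identity within each group is an assumption, and a real analysis would probably tolerate conservative substitutions.

```python
def discriminating_columns(group_a, group_b):
    """Return (column, residue_a, residue_b) for columns that are identical
    within each group, differ between the groups, and contain no gap."""
    length = len(group_a[0])
    if any(len(s) != length for s in group_a + group_b):
        raise ValueError("all rows must come from the same alignment")
    hits = []
    for i in range(length):
        col_a = {s[i] for s in group_a}
        col_b = {s[i] for s in group_b}
        if len(col_a) != 1 or len(col_b) != 1:
            continue  # variable inside at least one group
        res_a, res_b = next(iter(col_a)), next(iter(col_b))
        if res_a != res_b and "-" not in (res_a, res_b):
            hits.append((i, res_a, res_b))
    return hits


if __name__ == "__main__":
    toy_group_a = ["GEGTYGV", "GEGTYGK"]  # invented rows
    toy_group_b = ["GKGKFGN", "GKGKFGS"]
    for col, a, b in discriminating_columns(toy_group_a, toy_group_b):
        print(f"column {col}: group A has {a}, group B has {b}")
```

Columns conserved across both groups, such as the glycines in the toy data, are skipped because they cannot distinguish the families.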
4 Clustering and Correlating
Function Trees vs Lead Trees
1- Select functionally important positions
2- Make a tree based on these positions
3- Compare the tree with the lead tree
PROBLEMS:
– Choosing the right positions
– Describing the leads with the right determinants
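The three steps above can be sketched in a simplified form. Instead of building and comparing actual trees, this example computes pairwise distances on the selected columns only, then correlates them with a lead-based distance matrix. Swapping tree comparison for distance-matrix correlation is a deliberate simplification and not the method used in the talk. The sequences, the selected positions and the lead distances are invented for illustration.

```python
from itertools import combinations
from math import sqrt


def p_distance(seq_a, seq_b, positions):
    """Fraction of the selected columns at which two aligned rows differ."""
    return sum(seq_a[i] != seq_b[i] for i in positions) / len(positions)


def function_distances(aligned, positions):
    """Steps 1 and 2: pairwise distances restricted to the chosen columns."""
    names = sorted(aligned)
    return {
        (a, b): p_distance(aligned[a], aligned[b], positions)
        for a, b in combinations(names, 2)
    }


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant distances")
    return cov / sqrt(var_x * var_y)


def compare(function_d, lead_d):
    """Step 3 (simplified): correlate the two sets of pairwise distances."""
    pairs = sorted(function_d)
    return pearson([function_d[p] for p in pairs], [lead_d[p] for p in pairs])


# Hypothetical toy data: three aligned kinases and invented lead distances.
ALIGNED = {"k1": "ACDEF", "k2": "ACDEY", "k3": "GCHKY"}
POSITIONS = [0, 2, 4]
LEAD_D = {("k1", "k2"): 0.2, ("k1", "k3"): 0.9, ("k2", "k3"): 0.7}

if __name__ == "__main__":
    fd = function_distances(ALIGNED, POSITIONS)
    print(fd)
    print("correlation with lead distances:", round(compare(fd, LEAD_D), 3))
```

Both problems listed on the slide show up directly here: the result depends on which columns go into `POSITIONS` and on how the lead distances are defined.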