Global Annotation of the Protein Kinase Family. Michael Gribskov, University of California, San Diego
Signaling Cascades
Statistics: Arabidopsis: 1028 putative kinases; 58 potentially alternatively spliced; 82% confirmed by full-length cDNA; fewer than 100 experimentally investigated. Rice: 1565 putative kinases. What are the functions of each protein kinase? Functional groupings; Substrate prediction; Pathway analysis and modeling
Targets: Protein kinase; Protein phosphatase; Membrane transporters; Proteasome complex
Some Receptor Kinases: Class I (EGF receptor); Class II (Insulin receptor); Class III (FGF receptor)
Requirements for Functional Clustering: Must handle a very large number of objects (over 1200 for plants, over 9000 for all species); Must deal sensibly with paralogs from a functional point of view; Must be based on the entire sequence, not just the kinase catalytic domain; Must be tolerant of sequence errors and omissions
Orthology vs Paralogy: Relationships between genes in multigene families are complex; Multiple genes may exist before speciation; Genes may be lost and replaced along lineages; “Function space” must be filled. [Figure: Species A, Species B]
Clustering
Clustering/Classification: Maximum linkage
Clustering/Classification: Pairwise distances; All-against-all BLAST; Uses entire sequence; Alignments not required; Longer matches, i.e. more domains, give better score
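The pairwise-distance step can be sketched in Python as follows. This is only an illustration: the slides do not give the actual distance formula, so the normalization used here (one minus the pair's best bit score divided by the larger of the two self-hit scores) is an assumption, as are the function names `parse_hits` and `score_distance`. The input is assumed to be standard BLAST tabular output (`-outfmt 6`), where the bit score is the twelfth column.

```python
def parse_hits(lines):
    """Collect the best bit score per unordered sequence pair
    from BLAST tabular (outfmt 6) lines."""
    best = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, subject, bits = fields[0], fields[1], float(fields[11])
        key = (query, subject) if query <= subject else (subject, query)
        if bits > best.get(key, 0.0):
            best[key] = bits
    return best


def score_distance(best, a, b):
    """Hypothetical score-based distance in [0, 1].

    Assumption: normalize by the larger self-hit score, so a match that
    covers only part of the longer protein stays relatively distant.
    Pairs with no reported hit get the maximum distance 1.0.
    """
    key = (a, b) if a <= b else (b, a)
    pair = best.get(key, 0.0)
    norm = max(best.get((a, a), 0.0), best.get((b, b), 0.0))
    if norm <= 0.0:
        return 1.0
    return max(0.0, 1.0 - pair / norm)
```

Because the score grows with the length of the matched region, two proteins that share extra domains beyond the kinase catalytic domain end up closer together than two that share the catalytic domain alone, which is the behavior the slide asks for.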
Basic Approach: Maximum linkage clustering up to “natural” limit; Recalculate average distances between groups; Repeat until tree is complete
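A minimal Python sketch of this loop is given below. It is not the authors' implementation. The slides do not say how the “natural” limit is chosen, so it is represented by a hypothetical list of increasing cutoffs, and the names `build_tree`, `max_linkage_round`, and `avg_distance` are illustrative. Distances are assumed to be supplied as a dictionary keyed by `frozenset` pairs of leaf names, and leaves are assumed to be strings.

```python
from itertools import combinations


def leaves(node):
    """Flatten a nested-tuple tree into its list of leaf names."""
    if isinstance(node, tuple):
        return [x for child in node for x in leaves(child)]
    return [node]


def avg_distance(a, b, dist):
    """Average leaf-to-leaf distance between two groups."""
    la, lb = leaves(a), leaves(b)
    total = sum(dist[frozenset((x, y))] for x in la for y in lb)
    return total / (len(la) * len(lb))


def max_linkage_round(nodes, node_dist, cutoff):
    """Maximum (complete) linkage: keep merging the closest pair of
    clusters while their farthest members are within the cutoff."""
    clusters = [[n] for n in nodes]

    def link(i, j):
        return max(node_dist(x, y) for x in clusters[i] for y in clusters[j])

    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: link(*ij))
        if link(i, j) > cutoff:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe: combinations guarantees i < j
    return clusters


def build_tree(items, dist, cutoffs):
    """Cluster, recompute average distances between groups, repeat.

    `cutoffs` stands in for the "natural" limit at each round
    (an assumption; the source does not define it).
    """
    nodes = list(items)
    for cutoff in cutoffs:
        groups = max_linkage_round(
            nodes, lambda a, b: avg_distance(a, b, dist), cutoff)
        nodes = [g[0] if len(g) == 1 else tuple(g) for g in groups]
        if len(nodes) == 1:
            break
    # Join anything still unmerged under a single root.
    return nodes[0] if len(nodes) == 1 else tuple(nodes)
```

For example, with four sequences where A/B and C/D are close (0.1 and 0.2) and every other pair is far (0.9), `build_tree(["A", "B", "C", "D"], dist, [0.3, 1.0])` groups A with B and C with D in the first round, then joins the two groups in the second round using their average distance.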
Complete Kinase Clustering
Statistics: Class 1: RLKs (transmembrane) and RLCKs; Class 2: “Raf-like”; Class 3: Casein Kinase and CLK; Class 4: Non-TM, Non-Receptor
BLAST Distance: Entire Sequence
BLAST Distance: Non-Kinase Domain
Yeast Signaling (MAPK)
Validating Transgenomic Predictions
SnRK: At AKIN10 and AKIN11; Rescue yeast SNF1 deletion; Functional homolog
MAPK
MEME PSSM
PPC4.2.6 MEME Motifs
Summary: Functional groups by clustering; Functional assignment by transgenomic comparison; Directed search for functional motifs by motif comparison; Construction of public data resources
Bioinformatics Group: Michael Gribskov, Fariba Fana, Degeng Wang, Sheila Podell, Tobey Tam *, Jason Tchieu *, Hannes Niedner, Douglas Smith, Guangfa Zhang *, Jeff Harper. Major Contributors: Catherine Chan, Alice Harmon, Estelle Hrabak, David Kerk, Shinhan Shiu