Large-scale Prediction of Yeast Gene Function
Introduction to Bio-Informatics, Winter
Roi Adadi, Naama Kraus
Main Question
Predict the function of hypothetical proteins that are inferred from genome sequencing
Annotate proteins at one of three possible levels:
◦ Function
◦ Biological process
◦ Cellular localization
Process
Cluster the gene expression profiles using EPCLUST
Two possible directions:
Direction 1
◦ Choose a "nice" cluster (e.g. a tight cluster)
◦ Identify a common function F using GO
◦ Search for hypothetical proteins in the cluster
◦ Predict their function as F
◦ Validate the prediction using other methods
    ◦ Use BLAST to search for homologous proteins: do they have function F?
    ◦ Use MEME/Pfam to identify a common motif/domain: does it relate to F?
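A minimal sketch of the Direction 1 steps "identify a common function F" and "predict their function as F". The slides do not say how "common" is measured; the one-sided hypergeometric enrichment test used here is an assumption (a frequent choice for GO term enrichment), and the gene names, annotation terms, and the `alpha` cutoff are made up for illustration. No multiple-testing correction is applied.

```python
from math import comb

def hypergeom_pvalue(k, n, K, N):
    """P(X >= k): chance of seeing at least k term-carrying genes when
    drawing n genes from N annotated genes, K of which carry the term."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

def predict_cluster_function(cluster, annotations, alpha=0.05):
    """Return (best_term, p_value, hypothetical_genes) for one cluster.

    annotations maps gene -> set of terms; a gene with an empty set is
    treated as a hypothetical (unannotated) protein.
    """
    annotated = {g: t for g, t in annotations.items() if t}
    N = len(annotated)
    in_cluster = [g for g in cluster if g in annotated]
    n = len(in_cluster)
    hypothetical = [g for g in cluster if not annotations.get(g)]

    terms = set()
    for g in in_cluster:
        terms |= annotated[g]

    best_term, best_p = None, 1.0
    for term in sorted(terms):
        K = sum(1 for t in annotated.values() if term in t)
        k = sum(1 for g in in_cluster if term in annotated[g])
        p = hypergeom_pvalue(k, n, K, N)
        if p < best_p:
            best_term, best_p = term, p

    # Only propose F when the enrichment passes the (assumed) cutoff.
    if best_term is None or best_p > alpha:
        return None, best_p, hypothetical
    return best_term, best_p, hypothetical

# Toy data (hypothetical names, not real yeast annotations).
annotations = {
    "geneA": {"ribosome biogenesis"}, "geneB": {"ribosome biogenesis"},
    "geneC": {"ribosome biogenesis"}, "geneD": {"ribosome biogenesis"},
    "geneE": {"DNA repair"}, "geneF": {"DNA repair"},
    "geneG": {"transport"}, "geneH": {"transport"},
    "geneI": {"cell cycle"}, "geneJ": {"cell cycle"},
    "hypo1": set(),
}
cluster = ["geneA", "geneB", "geneC", "geneD", "hypo1"]

term, p, hypo = predict_cluster_function(cluster, annotations)
print(term, p, hypo)
```

In this toy case all four annotated cluster members share one term, so `hypo1` would be predicted to have that function; the prediction would still need the BLAST and MEME/Pfam validation listed on the slide.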
Process – cont’d
Direction 2
◦ Decide on a function of interest and search for a cluster where this function is common
    ◦ Identify a cluster with a significant localization function
    ◦ Look for a significant motif/domain in the mRNA UTRs of the sequences in the cluster using MEME/Pfam
    ◦ Search for the motif/domain in other proteins: do they localize to the same location?
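A rough sketch of two Direction 2 steps: picking the cluster where a chosen localization is most common, and then checking which other genes carry a given motif in their UTR. It does not perform motif discovery (that is MEME's job on the slide); it only scans for a motif that is assumed to be already known. The cluster IDs, gene names, sequences, and the motif `TGTANATA` are all invented for illustration, and "most common" is taken here as the highest fraction of cluster members, which is an assumption.

```python
import re

# Subset of IUPAC nucleotide codes, mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}

def cluster_for_function(clusters, annotations, function):
    """Return (cluster_id, fraction) for the cluster in which `function`
    is carried by the largest fraction of member genes."""
    best_id, best_frac = None, 0.0
    for cid, genes in clusters.items():
        if not genes:
            continue
        hits = sum(1 for g in genes if function in annotations.get(g, set()))
        frac = hits / len(genes)
        if frac > best_frac:
            best_id, best_frac = cid, frac
    return best_id, best_frac

def motif_to_regex(motif):
    """Compile an IUPAC consensus string into a regular expression."""
    return re.compile("".join(IUPAC[c] for c in motif.upper()))

def genes_with_motif(motif, utrs):
    """Return the sorted gene IDs whose UTR sequence contains the motif.
    RNA sequences (with U) are normalised to DNA letters before matching."""
    pattern = motif_to_regex(motif)
    return sorted(
        g for g, seq in utrs.items()
        if pattern.search(seq.upper().replace("U", "T"))
    )

# Toy data (hypothetical, for illustration only).
clusters = {"c1": ["g1", "g2", "g3"], "c2": ["g4", "g5", "g6"]}
annotations = {
    "g1": {"mitochondrion"}, "g2": {"mitochondrion"}, "g3": {"nucleus"},
    "g4": {"mitochondrion"}, "g5": {"cytoplasm"}, "g6": {"cytoplasm"},
}
other_utrs = {
    "g7": "AAATGTACATAGG",
    "g8": "CCCCGGGG",
    "g9": "UUUGUAUAUACC",
}

best_cluster, fraction = cluster_for_function(clusters, annotations, "mitochondrion")
candidates = genes_with_motif("TGTANATA", other_utrs)
print(best_cluster, fraction, candidates)
```

The genes returned by `genes_with_motif` are only candidates; the slide's final question (do they localize to the same location?) still has to be answered from their annotations or from experiment.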