1 Using Gene Ontology
2 Assigning (or Hypothesizing About) Biological Meaning to Clusters
What do you want to be able to do?
– Identify over-represented functional categories in the clusters (i.e., a cluster contains many more genes known to be involved in a specific biological process than would be expected by chance)
Why do you want to be able to do this?
– If you find a gene of unknown function in a cluster of genes in which a known function is over-represented, your gene of unknown function may have the same or a related function!
Requirements for systematic analysis:
– Standard assignment of genes into functional categories: the Gene Ontology (GO) project of the Gene Ontology Consortium
– Controlled vocabulary for describing biological processes, so that synonyms map to a single term (e.g., protein biosynthesis/translation, apoptosis/programmed cell death)
3 Gene Ontology (GO) project
Purpose:
1) Establish a unified framework for organism-independent gene annotation
2) Define controlled terms (ontologies) that describe gene products from 3 aspects:
– Biological process (DNA repair, mitosis)
– Molecular function (protein serine/threonine kinase activity, transcription factor activity)
– Cellular component (nucleus, ribosome)
Characteristics:
1) A gene can have multiple associations in each ontology
2) GO terms are organized in hierarchical structures called directed acyclic graphs (DAGs)
– The most general classifications are at the top levels of the graph
– More specialized classifications are at lower levels
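To make the DAG idea concrete, here is a minimal Python sketch. The term names and parent links form a toy graph chosen for illustration, not the actual GO relationships, and `geneA` and the helper functions `ancestors` and `implied_terms` are hypothetical. It shows two points from the slide: a term may have more than one parent (so the structure is a graph, not a tree), and a gene annotated to a specialized term is implicitly associated with every more general term above it.

```python
# Toy DAG: each term maps to its more general parent terms.
# Illustrative only; these are not the real GO parent/child links.
PARENTS = {
    "biological process": [],
    "cellular process": ["biological process"],
    "cell cycle": ["cellular process"],
    "cell division": ["cellular process"],
    "mitosis": ["cell cycle", "cell division"],  # two parents: a DAG, not a tree
}

# Hypothetical direct annotation of one gene.
DIRECT_ANNOTATIONS = {"geneA": {"mitosis"}}


def ancestors(term):
    """Return every more general term reachable from `term`."""
    seen = set()
    stack = list(PARENTS[term])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def implied_terms(gene):
    """Direct annotations plus all of their ancestors."""
    terms = set(DIRECT_ANNOTATIONS[gene])
    for term in DIRECT_ANNOTATIONS[gene]:
        terms |= ancestors(term)
    return terms


print(sorted(implied_terms("geneA")))
```

Counting genes per category for an enrichment analysis usually relies on this kind of upward propagation, so that a gene annotated only to a specialized term is still counted in the general categories.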
4 Hierarchical classification scheme for proteins that function in M-phase of mitosis Any one gene can be a member of more than one GO classification
5 Example: Cluster 3, 95 genes
6 Identifying enriched GO categories in clusters
In the previous example:
– Total number of annotated genes on the chip = 5,000 (remember, some genes are of unknown function and are not annotated)
– Total number of genes on the chip associated with the metabolism GO category = 3,600 (72%)
– Number of annotated genes in cluster 3 = 73
– Number of metabolic genes in cluster 3 = 50 (68%)
Is it reasonable to assume that genes in Cluster 3 are enriched for metabolic function?
Statistical tests are essential to determine whether the enrichment of a certain class of proteins in a cluster is significant.
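One common choice for such a test is the hypergeometric test (equivalent to a one-sided Fisher's exact test); the slides do not name a specific test, so treat this as an assumption. The sketch below uses only the numbers from the example and the Python standard library. The function name `hypergeom_upper_tail` is ours. It computes the probability of seeing at least 50 metabolic genes when 73 annotated genes are drawn at random, without replacement, from 5,000 annotated genes of which 3,600 are metabolic.

```python
from math import comb


def hypergeom_upper_tail(N, K, n, k):
    """P(X >= k): draw n genes without replacement from N genes,
    K of which belong to the category; X = category genes drawn."""
    total = comb(N, n)
    favorable = sum(
        comb(K, i) * comb(N - K, n - i)
        for i in range(k, min(n, K) + 1)
    )
    return favorable / total


N = 5000   # annotated genes on the chip
K = 3600   # annotated genes in the metabolism category
n = 73     # annotated genes in cluster 3
k = 50     # metabolic genes in cluster 3

expected = n * K / N                      # genes expected by chance
p_value = hypergeom_upper_tail(N, K, n, k)

print(f"expected by chance: {expected:.2f}, observed: {k}")
print(f"P(X >= {k}) = {p_value:.3f}")
```

About 52.6 metabolic genes are expected by chance, which is more than the 50 observed, so the p-value is far above any usual significance threshold and cluster 3 is not enriched for metabolic function. A large percentage on its own (68%) says little when the category is also large on the chip as a whole (72%). When many GO categories are tested for one cluster, the resulting p-values would also need a multiple-testing correction.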
7 Online Databases that annotate genes by GO
– Human: Entrez, GOA
– Mouse: Mouse Genome Informatics (MGI)
– Rat: Rat Genome Database
– Fly: FlyBase
– Arabidopsis: TAIR
– Yeast: Saccharomyces Genome Database
– Affymetrix chips: NetAffx