
Gene Annotation & Gene Ontology May 24, 2016. Gene lists from RNAseq analysis What do you do with a list of 100s of genes that contain only the following.


1 Gene Annotation & Gene Ontology May 24, 2016

2 Gene lists from RNAseq analysis What do you do with a list of 100s of genes that contain only the following information? Gene name or symbol Ratio between groups (UP or DOWN) One or more database IDs (accession numbers) How do you figure out the role of the genes in the model you are studying?

3 Gene annotation Process of assigning descriptions to a transcript or gene product. Includes: –Official gene symbol & name –Protein features: domains, functional elements such as nuclear localization signals –Predicted molecular function, biological process and cellular location –Experimentally derived information on function, process and cellular location –References –....

4 Who does the gene annotation? RefSeq & Gene databases –NCBI staff Ensembl databases –http://useast.ensembl.org –EMBL-EBI & Wellcome Trust Sanger Institute UniProt –Staff at European Bioinformatics Institute (EBI), Swiss Institute of Bioinformatics (SIB) and the Protein Information Resource (PIR) Yeast DB, FlyBase, Mouse Genome Informatics (MGI) & other organism-specific databases

5 Gene record for BEST1

6 Ensembl Gene record for BEST1

7 Uniprot record for BEST1

8 Gene, Ensembl or Uniprot? What information are you looking for? Comfort level with the interface All have a little to LOTS of information Use as a starting point

9 Dealing with gene lists How can you efficiently categorize the genes in some biologically meaningful way? Batch download data from Gene or Uniprot and do a lot of reading? PubMed? One approach is to use metadata in the form of terms assigned to each gene that describe its molecular function, its participation in a biological process and its location in a cellular component

10 Gene Ontology Set of standard biological phrases (terms) which are applied to genes/proteins: –protein kinase –apoptosis –membrane Attempt to standardize the representation of genes and gene product attributes across species and databases Maintained by Gene Ontology consortium –http://geneontology.org/ –Individual groups contribute taxon-specific terms

11 Cellular Component Where a gene product acts Mitochondria

12 Cellular Component Cellular components of a virus different than a cell

13 Cellular Component Enzyme complexes in the component ontology refer to places, not activities.

14 Molecular Function Activities or “jobs” of a gene product glucose-6-phosphate isomerase activity

15 Molecular Function insulin binding insulin receptor activity

16 Molecular Function A gene product may have several functions Sets of functions make up a biological process.

17 Biological Process a commonly recognized series of events cell division

18 Biological Process transcription

19 Biological Process regulation of gluconeogenesis

20 Biological Process limb development

21 Why use gene ontology? Allows biologists to make queries across large numbers of genes without researching each one individually Can find all the PI3 kinases in a given genome or find all proteins involved in oxidative stress response without prior knowledge of every gene

22 MAPK14 GO biological process: –3’UTR mediated mRNA stabilization –DNA damage checkpoint –Ras protein signal transduction GO molecular function: –ATP binding –MAP kinase activity –MAP kinase kinase activity GO cellular component –Cytoplasm –Extracellular exosome –Nucleoplasm

23 Gene Ontology for analysis Generally, biological process terms are more useful for putting gene lists into a context There are more GO terms assigned to process than to function or component Fewest terms assigned to component Function in the absence of any process information can imply a biological role – i.e. you are looking for transcription factors responsible for some response

24

25 Ontology Structure Terms are linked by two relationships –is-a –part-of

26 Ontology Structure [Diagram: the terms cell, membrane, chloroplast, mitochondrial membrane and chloroplast membrane, linked by is-a and part-of relationships]

27 GO structure GO isn’t just a flat list of biological terms; terms are related within a hierarchy is_a: DNA binding is a type of nucleic acid binding. Nucleic acid binding is a type of binding.

28 GO structure gene A A single gene associated with a particular term is automatically annotated to all of the parent terms

29 GO structure This means genes can be grouped according to user-defined levels Allows broad overview of gene set or genome You can use the level of granularity that makes most sense
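The parent-term rule on slides 27 to 29 can be illustrated with a minimal Python sketch. The three-term is_a chain comes from slide 27; the dictionary layout and the function names `ancestors` and `propagate` are assumptions made for this illustration, not part of any GO tool.

```python
# Toy is_a hierarchy from slide 27: child term -> list of parent terms.
# A real ontology is far larger and also has part_of links.
PARENTS = {
    "DNA binding": ["nucleic acid binding"],
    "nucleic acid binding": ["binding"],
    "binding": [],
}


def ancestors(term, parents=PARENTS):
    """Return the term itself plus every parent reachable from it."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def propagate(direct_annotations, parents=PARENTS):
    """Expand each gene's direct terms to include all parent terms."""
    expanded = {}
    for gene, terms in direct_annotations.items():
        all_terms = set()
        for term in terms:
            all_terms |= ancestors(term, parents)
        expanded[gene] = all_terms
    return expanded


# "gene A" is annotated directly to the most specific term only.
direct = {"gene A": {"DNA binding"}}
full = propagate(direct)
print(sorted(full["gene A"]))
```

Because every gene ends up annotated at each level of the hierarchy, a gene list can be summarized at a broad term such as "binding" or at a specific one such as "DNA binding", whichever level of granularity suits the question.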

30 GO terms Each concept has: a name (term: transcription initiation) a definition (definition: Processes involved in the assembly of the RNA polymerase complex at the promoter region of a DNA template resulting in the subsequent synthesis of RNA from that promoter.) an ID number (id: GO:0006352)

31 GO terms assigned to MAPK14

32 Types of evidence codes Experimental:

33 Types of evidence codes Computational:

34 Types of evidence codes Other evidence codes

35 Manual annotation In this study, we report the isolation and molecular characterization of the B. napus PERK1 cDNA, that is predicted to encode a novel receptor-like kinase. We have shown that like other plant RLKs, the kinase domain of PERK1 has serine/threonine kinase activity. In addition, the location of a PERK1-GFP fusion protein to the plasma membrane supports the prediction that PERK1 is an integral membrane protein…these kinases have been implicated in early stages of wound response… Molecular function Cellular component Biological process

36 Electronic Annotation Annotation derived without human validation –mappings file e.g. interpro2go, ec2go. –Blast search ‘hits’ Lower ‘quality’ than manual codes Used in non-model organisms
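Since electronic annotations are considered lower quality than manual ones, an analysis may want to separate the two. The sketch below assumes annotations are available as (gene, term, evidence code) tuples; the gene names and the annotation records are made up for illustration. IEA (Inferred from Electronic Annotation) is the GO evidence code for annotations assigned without curator review, while IDA and IMP are experimental codes.

```python
# Evidence codes treated as electronic (no human validation).
ELECTRONIC_CODES = {"IEA"}

# Hypothetical annotation records: (gene, GO term name, evidence code).
annotations = [
    ("geneA", "protein kinase", "IDA"),  # direct assay, manual
    ("geneA", "membrane", "IEA"),        # electronic
    ("geneB", "apoptosis", "IMP"),       # mutant phenotype, manual
    ("geneB", "protein kinase", "IEA"),  # electronic
]


def split_by_evidence(records, electronic_codes=ELECTRONIC_CODES):
    """Split records into (manual, electronic) lists by evidence code."""
    manual = [r for r in records if r[2] not in electronic_codes]
    electronic = [r for r in records if r[2] in electronic_codes]
    return manual, electronic


manual, electronic = split_by_evidence(annotations)
print(len(manual), "manual;", len(electronic), "electronic")
```

For non-model organisms, where most annotations are electronic, dropping them entirely may leave too little to analyze, so keeping both lists and comparing results is often the more practical choice.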

37 GO & analysis of gene lists www.geneontology.org –Maintains the databases of GO terms, serves as a clearing house for terms as they are assigned in new organisms Tools for exploring gene lists using GO: –WebGestalt, gProfiler, Onto-Express, and GSEA to name a few –DAVID is a suite of tools for gene enrichment analysis that also includes GO. –We’ll use both DAVID and WebGestalt to explore our gene list

38 Gene Ontology tools input a gene list shows which GO categories have most genes associated with them or are “enriched” provides a statistical measure to determine whether enrichment is significant

39 Using GO in practice statistical measure –how likely it is that your differentially regulated genes fall into that category by chance microarray: 1000 genes experiment: 100 genes differentially regulated mitosis – 80/100 apoptosis – 40/100 cell proliferation – 30/100 glucose transport – 20/100

40 Using GO in practice However, when you look at the distribution of all genes on the microarray:
Process | Genes on array | # genes expected (out of 100) | # genes observed
Mitosis | 800/1000 | 80 | 80
Apoptosis | 400/1000 | 40 | 40
Cell proliferation | 100/1000 | 10 | 30
Glucose transport | 50/1000 | 5 | 20
Proportions analysis –Chi-squared or Fisher’s exact test
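The numbers on slides 39 and 40 can be checked with a short Python sketch. It computes the expected count for each category and a one-sided hypergeometric p-value, which is equivalent to a one-sided Fisher's exact test on the 2x2 table. The function name `enrichment_p` is an assumption for this illustration, and real tools such as DAVID use their own variants of the statistic.

```python
from math import comb

ARRAY_SIZE = 1000  # genes on the microarray
LIST_SIZE = 100    # differentially regulated genes

# Category -> (genes on array in category, genes observed in the list),
# taken from the table on slide 40.
categories = {
    "Mitosis": (800, 80),
    "Apoptosis": (400, 40),
    "Cell proliferation": (100, 30),
    "Glucose transport": (50, 20),
}


def enrichment_p(observed, in_category, array_size, list_size):
    """P(X >= observed) when list_size genes are drawn at random,
    without replacement, from an array with in_category members."""
    total = comb(array_size, list_size)
    upper = min(in_category, list_size)
    tail = sum(
        comb(in_category, i) * comb(array_size - in_category, list_size - i)
        for i in range(observed, upper + 1)
    )
    return tail / total


results = {}
for name, (on_array, observed) in categories.items():
    expected = LIST_SIZE * on_array / ARRAY_SIZE
    p_value = enrichment_p(observed, on_array, ARRAY_SIZE, LIST_SIZE)
    results[name] = (expected, observed, p_value)
    print(f"{name}: expected {expected:.0f}, observed {observed}, p = {p_value:.3g}")
```

Mitosis and apoptosis have the most genes in the list, but they match what chance alone would give. Cell proliferation and glucose transport have fewer genes yet far more than expected, so those are the enriched categories.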

41 Other sources of annotation Uniprot (Swiss-Prot) keywords Protein domain databases –PFAM, Panther, PDB, PROSITE, etc. GeneDB summaries from NCBI Protein-protein interaction databases Pathway databases –KEGG, BioCarta, BBID, Reactome DAVID incorporates annotation from all of these and clusters the redundant terms

42 Today in computer lab Tutorial on using DAVID Tutorial on using WebGestalt Analysis of gene lists using DAVID and at least one other GO term enrichment tool

