Canadian Bioinformatics Workshops www.bioinformatics.ca.

Slides:



Advertisements
Similar presentations
Annotation of Gene Function …and how thats useful to you.
Advertisements

Applications of GO. Goals of Gene Ontology Project.
25th June 2007 Jane Lomax Using the Gene Ontology (GO) for analysis of expression data Jane Lomax EMBL-EBI.
European Bioinformatics Institute The Gene Ontology Annotation (GOA) Database and enhancement of GO annotations through InterPro2GO Nicky Mulder
Gene Ontology John Pinney
Gene function analysis Stem Cell Network Microarray Course, Unit 5 May 2007.
CACAO - Remote training Gene Function and Gene Ontology Fall 2011
Bioinformatics master course DNA/Protein structure-function analysis and prediction Lecture 13: Protein Function Centre for Integrative Bioinformatics.
CACAO - Remote training Gene Function and Gene Ontology Fall 2011
Sequence-Structure-Function Sequence Structure Function Threading Ab initio BLAST Folding: impossible but for the smallest structures Function prediction.
Modeling Functional Genomics Datasets CVM Lesson 1 13 June 2007Bindu Nanduri.
CACAO - Penn State Gene Function and Gene Ontology January 2011
Daniel Rico, PhD. Daniel Rico, PhD. ::: Introduction to Functional Analysis Course on Functional Analysis Bioinformatics Unit.
The Ensembl Gene set The “Genebuild” 21 April 2008.
Databases in Bioinformatics and Systems Biology Carsten O. Daub Omics Science Center RIKEN, Japan May 2008.
Using The Gene Ontology: Gene Product Annotation.
Gene Ontology (GO) Project
GO and OBO: an introduction. Jane Lomax EMBL-EBI What is the Gene Ontology? What is OBO? OBO-Edit demo & practical What is the Gene Ontology? What is.
CACAO Training Fall Community Assessment of Community Annotation with Ontologies (CACAO)
Annotating Gene Products to the GO Harold J Drabkin Senior Scientific Curator The Jackson Laboratory Mouse.
The aims of the Gene Ontology project are threefold: - to compile vocabularies to describe components, functions and processes - to produce tools to query.
Ontologies, data standards and controlled vocabularies.
Tutorial session 2 Network annotation Exploring PPI networks using Cytoscape EMBO Practical Course Session 8 Nadezhda Doncheva and Piet Molenaar.
The Gene Ontology: a real-life ontology, progress and future. Jane Lomax EMBL-EBI.
The Gene Ontology project Jane Lomax. Ontology (for our purposes) “an explicit specification of some topic” – Stanford Knowledge Systems Lab Includes:
Gene Ontology Project
Gene Ontology TM (GO) Consortium Jennifer I Clark EMBL Outstation - European Bioinformatics Institute (EBI), Hinxton, Cambridge CB10 1SD, UK Objectives:
1 of 38 Data Mining in Ensembl with BioMart. 2 of 38 Simple Text-based Search Engine.
EBI is an Outstation of the European Molecular Biology Laboratory. GOA: Looking after GO annotations Emily Dimmer Gene Ontology Annotation (GOA) Database.
BIOINFORMATIK I UEBUNG 2 mRNA processing.
Monday, November 8, 2:30:07 PM  Ontology is the philosophical study of the nature of being, existence or reality as such, as well as the basic categories.
From Functional Genomics to Physiological Model: Using the Gene Ontology Fiona McCarthy, Shane Burgess, Susan Bridges The AgBase Databases, Institute of.
Web Databases for Drosophila Introduction to FlyBase and Ensembl Database Wilson Leung6/06.
Introduction to the GO: a user’s guide Iowa State Workshop 11 June 2009.
24th Feb 2006 Jane Lomax GO Further. 24th Feb 2006 Jane Lomax GO annotations Where do the links between genes and GO terms come from?
Alastair Kerr, Ph.D. WTCCB Bioinformatics Core An introduction to DNA and Protein Sequence Databases.
Getting Started: a user’s guide to the GO GO Workshop 3-6 August 2010.
Functional Annotation and Functional Enrichment. Annotation Structural Annotation – defining the boundaries of features of interest (coding regions, regulatory.
1 Gene function annotation. 2 Outline  Functional annotation  Controlled vocabularies  Functional annotation at TAIR  Resources and tools at TAIR.
Other biological databases and ontologies. Biological systems Taxonomic data Literature Protein folding and 3D structure Small molecules Pathways and.
Getting Started: a user’s guide to the GO TAMU GO Workshop 17 May 2010.
Rice Proteins Data acquisition Curation Resources Development and integration of controlled vocabulary Gene Ontology Trait Ontology Plant Ontology
CACAO Training Fall Community Assessment of Community Annotation with Ontologies (CACAO)
Gene Ontology Consortium
Central dogma: the story of life RNA DNA Protein.
Data Mining in Ensembl with BioMart Giulietta Spudich.
Introduction to the GO: a user’s guide NCSU GO Workshop 29 October 2009.
EBI is an Outstation of the European Molecular Biology Laboratory. UniProtKB Sandra Orchard.
1 Annotation EPP 245/298 Statistical Analysis of Laboratory Data.
Bioinformatics Workshops 1 & 2 1. use of public database/search sites - range of data and access methods - interpretation of search results - understanding.
1 of 28 Evaluating Genes and Transcripts (“Genebuild”)
Tools in Bioinformatics Ontologies and pathways. Why are ontologies needed? A free text is the best way to describe what a protein does to a human reader.
Gene Ontology TM (GO) Consortium
Joined up ontologies: incorporating the Gene Ontology into the UMLS.
 What is MSA (Multiple Sequence Alignment)? What is it good for? How do I use it?  Software and algorithms The programs How they work? Which to use?
Canadian Bioinformatics Workshops
Module 1: Gene List/Network Intro Canadian Bioinformatics Workshops
Module 1: Gene Lists 1 Canadian Bioinformatics Workshops
Gene Annotation & Gene Ontology May 24, Gene lists from RNAseq analysis What do you do with a list of 100s of genes that contain only the following.
Gene Ontology (and Pathway) Analysis Bioinformatics for Biomedical Practitioners Statistics and Bioinformatics Research Group Statistics department, Universitat.
Sequence-Structure-Function Sequence Structure Function Threading Ab initio BLAST Folding: impossible but for the smallest structures Function prediction.
Gene Annotation & Gene Ontology
Canadian Bioinformatics Workshops
Annotating with GO: an overview
Introduction to the Gene Ontology
Using the Gene Ontology (GO) for analysis of expression data Jane Lomax EMBL-EBI 25th June 2007 Jane Lomax.
What is an Ontology An ontology is a set of terms, relationships and definitions that capture the knowledge of a certain domain. (common ontology ≠ common.
Ensembl Genome Repository.
Gene expression analysis
Insight into GO and GOA Angelica Tulipano , INFN Bari CNR
Presentation transcript:

Canadian Bioinformatics Workshops

2Module #: Title of Module

Module 1 Introduction to Gene Lists

Module 1: Introduction to Gene Lists bioinformatics.ca Gene Lists Overview Interpreting gene lists Gene attributes – Gene Ontology Ontology Structure Annotation – BioMart + other gene attribute sources Gene identifiers and mapping

Module 1: Introduction to Gene Lists bioinformatics.ca Interpreting Gene Lists My cool new screen worked and produced 1000 hits! …Now what? Genome-Scale Analysis (Omics) – Genomics, Proteomics Tell me what’s interesting about these genes GenMAPP.org ? Ranking or clustering

Module 1: Introduction to Gene Lists bioinformatics.ca Interpreting Gene Lists Ranking or clustering Eureka! New heart disease gene! Prior knowledge about cellular processes Analysis tools My cool new screen worked and produced 1000 hits! …Now what? Genome-Scale Analysis (Omics) – Genomics, Proteomics Tell me what’s interesting about these genes – Are they enriched in known pathways, complexes, functions

Module 1: Introduction to Gene Lists bioinformatics.ca Where Do Gene Lists Come From? Molecular profiling e.g. mRNA, protein – Identification  Gene list – Quantification  Gene list + values – Ranking, Clustering (biostatistics) Interactions: Protein interactions, microRNA targets, transcription factor binding sites (ChIP) Genetic screen e.g. of knock out library Association studies (Genome-wide) – Single nucleotide polymorphisms (SNPs) – Copy number variants (CNVs) Other examples?

Module 1: Introduction to Gene Lists bioinformatics.ca What Do Gene Lists Mean? Biological system: complex, pathway, physical interactors Similar gene function e.g. protein kinase Similar cell or tissue location Chromosomal location (linkage, CNVs) Data

Module 1: Introduction to Gene Lists bioinformatics.ca Biological Questions Step 1: What do you want to accomplish with your list (hopefully part of experiment design! ) – Summarize biological processes or other aspects of gene function – Find a controller for a process (TF, miRNA) – Find new pathways or new pathway members – Discover new gene function – Correlate with a disease or phenotype (candidate gene prioritization) – Perform differential analysis – what’s different between samples?

Module 1: Introduction to Gene Lists bioinformatics.ca Biological Answers Computational analysis methods – Gene set analysis: summarize, hypothesis generating – Pathway and network analysis – Gene function prediction But first! Gene list basics…

Module 1: Introduction to Gene Lists bioinformatics.ca Gene Lists Overview Interpreting gene lists Gene attributes – Gene Ontology Ontology Structure Annotation – BioMart + other sources Gene identifiers and mapping

Module 1: Introduction to Gene Lists bioinformatics.ca Gene Attributes Available in databases Function annotation – Biological process, molecular function, cell location Chromosome position Disease association DNA properties – TF binding sites, gene structure (intron/exon), SNPs Transcript properties – Splicing, 3’ UTR, microRNA binding sites Protein properties – Domains, secondary and tertiary structure, PTM sites Interactions with other genes

Module 1: Introduction to Gene Lists bioinformatics.ca Gene Attributes Available in databases Function annotation – Biological process, molecular function, cell location Chromosome position Disease association DNA properties – TF binding sites, gene structure (intron/exon), SNPs Transcript properties – Splicing, 3’ UTR, microRNA binding sites Protein properties – Domains, secondary and tertiary structure, PTM sites Interactions with other genes

Module 1: Introduction to Gene Lists bioinformatics.ca What is the Gene Ontology (GO)? Set of biological phrases (terms) which are applied to genes: – protein kinase – apoptosis – membrane Dictionary: term definitions Ontology: A formal system for describing knowledge Jane EBI

Module 1: Introduction to Gene Lists bioinformatics.ca GO Structure Terms are related within a hierarchy – is-a – part-of Describes multiple levels of detail of gene function Terms can have more than one parent or child

Module 1: Introduction to Gene Lists bioinformatics.ca GO Structure cell membrane chloroplast mitochondrial chloroplast membrane is-a part-of Species independent. Some lower-level terms are specific to a group, but higher level terms are not

Module 1: Introduction to Gene Lists bioinformatics.ca What GO Covers? GO terms divided into three aspects: – cellular component – molecular function – biological process glucose-6-phosphate isomerase activity Cell division

Module 1: Introduction to Gene Lists bioinformatics.ca Terms Where do GO terms come from? – GO terms are added by editors at EBI and gene annotation database groups – Terms added by request – Experts help with major development – terms, >99% with definitions biological_process 2859 cellular_component 9531 molecular_function As of July 15, 2010

Module 1: Introduction to Gene Lists bioinformatics.ca Genes are linked, or associated, with GO terms by trained curators at genome databases – Known as ‘gene associations’ or GO annotations – Multiple annotations per gene Some GO annotations created automatically (without human review) Annotations

Module 1: Introduction to Gene Lists bioinformatics.ca Annotation Sources Manual annotation – Curated by scientists High quality Small number (time-consuming to create) – Reviewed computational analysis Electronic annotation – Annotation derived without human validation Computational predictions (accuracy varies) Lower ‘quality’ than manual codes Key point: be aware of annotation origin

Module 1: Introduction to Gene Lists bioinformatics.ca Evidence Types Experimental Evidence Codes EXP: Inferred from Experiment IDA: Inferred from Direct Assay IPI: Inferred from Physical Interaction IMP: Inferred from Mutant Phenotype IGI: Inferred from Genetic Interaction IEP: Inferred from Expression Pattern IEA: Inferred from electronic annotation For your information Author Statement Evidence Codes TAS: Traceable Author Statement NAS: Non-traceable Author Statement Curator Statement Evidence Codes IC: Inferred by Curator ND: No biological Data available Computational Analysis Evidence Codes ISS: Inferred from Sequence or Structural Similarity ISO: Inferred from Sequence Orthology ISA: Inferred from Sequence Alignment ISM: Inferred from Sequence Model IGC: Inferred from Genomic Context RCA: inferred from Reviewed Computational Analysis

Module 1: Introduction to Gene Lists bioinformatics.ca Species Coverage All major eukaryotic model organism species Human via GOA group at UniProt Several bacterial and parasite species through TIGR and GeneDB at Sanger New species annotations in development

Variable Coverage Lomax J. Get ready to GO! A biologist's guide to the Gene Ontology. Brief Bioinform Sep;6(3):

Module 1: Introduction to Gene Lists bioinformatics.ca Contributing Databases – Berkeley Drosophila Genome Project (BDGP) Berkeley Drosophila Genome Project (BDGP – dictyBase (Dictyostelium discoideum) dictyBase – FlyBase (Drosophila melanogaster) FlyBase – GeneDB (Schizosaccharomyces pombe, Plasmodium falciparum, Leishmania major and Trypanosoma brucei) GeneDBSchizosaccharomyces pombe – UniProt Knowledgebase (Swiss-Prot/TrEMBL/PIR-PSD) and InterPro databases UniProt KnowledgebaseInterPro – Gramene (grains, including rice, Oryza) Gramene – Mouse Genome Database (MGD) and Gene Expression Database (GXD) (Mus musculus) Mouse Genome Database (MGD) and Gene Expression Database (GXD) – Rat Genome Database (RGD) (Rattus norvegicus) – Reactome Reactome – Saccharomyces Genome Database (SGD) (Saccharomyces cerevisiae) Saccharomyces Genome Database (SGD) – The Arabidopsis Information Resource (TAIR) (Arabidopsis thaliana) The Arabidopsis Information Resource (TAIR) – The Institute for Genomic Research (TIGR): databases on several bacterial species The Institute for Genomic Research (TIGR) – WormBase (Caenorhabditis elegans) WormBase – Zebrafish Information Network (ZFIN): (Danio rerio) Zebrafish Information Network (ZFIN) For your information

Module 1: Introduction to Gene Lists bioinformatics.ca GO Slim Sets GO has too many terms for some uses – Summaries (e.g. Pie charts) GO Slim is an official reduced set of GO terms – Generic, plant, yeast Crockett DK et al. Lab Invest Nov;85(11):

Module 1: Introduction to Gene Lists bioinformatics.ca GO Software Tools GO resources are freely available to anyone without restriction – Includes the ontologies, gene associations and tools developed by GO Other groups have used GO to create tools for many purposes –

Module 1: Introduction to Gene Lists bioinformatics.ca Accessing GO: QuickGO

Module 1: Introduction to Gene Lists bioinformatics.ca Other Ontologies

Module 1: Introduction to Gene Lists bioinformatics.ca Gene Attributes Function annotation – Biological process, molecular function, cell location Chromosome position Disease association DNA properties – TF binding sites, gene structure (intron/exon), SNPs Transcript properties – Splicing, 3’ UTR, microRNA binding sites Protein properties – Domains, secondary and tertiary structure, PTM sites Interactions with other genes

Module 1: Introduction to Gene Lists bioinformatics.ca Sources of Gene Attributes Ensembl BioMart (eukaryotes) – Entrez Gene (general) – Model organism databases – E.g. SGD: Many others: discuss during lab

Ensembl BioMart Convenient access to gene list annotation Select genome Select attributes to download Select filters

Module 1: Introduction to Gene Lists bioinformatics.ca What Have We Learned? Many gene attributes in databases – Gene Ontology (GO) provides gene function annotation GO is a classification system and dictionary for biological concepts Annotations are contributed by many groups More than one annotation term allowed per gene Some genomes are annotated more than others Annotation comes from manual and electronic sources GO can be simplified for certain uses (GO Slim) Many gene attributes available from Ensembl and Entrez Gene

Module 1: Introduction to Gene Lists bioinformatics.ca Gene Lists Overview Interpreting gene lists Gene function attributes – Gene Ontology Ontology Structure Annotation – BioMart + other sources Gene identifiers and mapping

Module 1: Introduction to Gene Lists bioinformatics.ca Gene and Protein Identifiers Identifiers (IDs) are ideally unique, stable names or numbers that help track database records – E.g. Social Insurance Number, Entrez Gene ID Gene and protein information stored in many databases –  Genes have many IDs Records for: Gene, DNA, RNA, Protein – Important to recognize the correct record type – E.g. Entrez Gene records don’t store sequence. They link to DNA regions, RNA transcripts and proteins e.g. in RefSeq, which stores sequence.

Module 1: Introduction to Gene Lists bioinformatics.ca NCBI Database Links NCBI: U.S. National Center for Biotechnology Information Part of National Library of Medicine (NLM) For your information

Module 1: Introduction to Gene Lists bioinformatics.ca Common Identifiers Species-specific HUGO HGNC BRCA2 MGI MGI: RGD 2219 ZFIN ZDB-GENE FlyBase CG9097 WormBase WBGene or ZK SGD S or YDL029W Annotations InterPro IPR OMIM Pfam PF09104 Gene Ontology GO: SNPs rs Experimental Platform Affymetrix _3p_s_at Agilent A_23_P99452 CodeLink GE60169 Illumina GI_ S Gene Ensembl ENSG Entrez Gene 675 Unigene Hs RNA transcript GenBank BC RefSeq NM_ Ensembl ENST Protein Ensembl ENSP RefSeq NP_ UniProt BRCA2_HUMAN or A1YBP1_HUMAN IPI IPI EMBL AF PDB 1MIU Red = Recommended For your information

Module 1: Introduction to Gene Lists bioinformatics.ca Identifier Mapping So many IDs! – Mapping (conversion) is a headache Four main uses – Searching for a favorite gene name – Link to related resources – Identifier translation E.g. Genes to proteins, Entrez Gene to Affy – Unification during dataset merging Equivalent records

Module 1: Introduction to Gene Lists bioinformatics.ca ID Mapping Services Synergizer – gizer/translate/ Ensembl BioMart – PICR (proteins only) –

Module 1: Introduction to Gene Lists bioinformatics.ca ID Mapping Challenges Avoid errors: map IDs correctly Gene name ambiguity – not a good ID – e.g. FLJ92943, LFS1, TRP53, p53 – Better to use the standard gene symbol: TP53 Excel error-introduction – OCT4 is changed to October-4 Problems reaching 100% coverage – E.g. due to version issues – Use multiple sources to increase coverage Zeeberg BR et al. Mistaken identifiers: gene name errors can be introduced inadvertently when using Excel in bioinformatics BMC Bioinformatics Jun 23;5:80

Module 1: Introduction to Gene Lists bioinformatics.ca Recommendations For proteins and genes – (doesn’t consider splice forms) Map everything to Entrez Gene IDs using a spreadsheet If 100% coverage desired, manually curate missing mappings Be careful of Excel auto conversions – especially when pasting large gene lists!

Module 1: Introduction to Gene Lists bioinformatics.ca What Have We Learned? Genes and their products and attributes have many identifiers (IDs) Genomics requirement to convert or map IDs from one type to another ID mapping services are available Use standard, commonly used IDs to reduce ID mapping challenges

Module 1: Introduction to Gene Lists bioinformatics.ca Lab: Gene IDs, Attributes and Networks Use yeast demo gene list – Learn about gene identifiers – Learn to use Synergizer and BioMart Do it again with your own gene list – If compatible with covered tools, run the analysis. If not, instructors will recommend tools for you.

Module 1: Introduction to Gene Lists bioinformatics.ca Lab – Until 12:30pm Use yeast demo dataset Convert Gene IDs to Entrez Gene – Use Synergizer Get GO annotation + evidence codes – Use Ensembl BioMart – Summarize terms & evidence codes in a table

Module 1: Introduction to Gene Lists bioinformatics.ca Lunch On your own E.g. Cafeteria Downstairs Back at 1:30pm