Gene ontology & hypergeometric test Simon Rasmussen CBS - DTU.



The DNA Microarray Analysis Pipeline
[Pipeline diagram] Boxes shown:
–Question/hypothesis
–Experimental Design
–Array design, Probe design
–Buy standard Chip / Array
–Sample Preparation
–Hybridization
–Image analysis
–Normalization
–Expression Index Calculation
–Comparable Gene Expression Data
–Statistical Analysis
–Fit to Model (time series)
–Advanced Data Analysis: Clustering, PCA, Gene Annotation Analysis, Promoter Analysis, Classification, Meta analysis, Survival analysis, Regulatory Network

Gene Ontology
Gene Ontology (GO) is a collection of controlled vocabularies describing the biology of a gene product in any organism
Very useful for interpreting the biological function of microarray data
Organized in 3 independent ontologies, each with a tree-like structure
–Molecular function (MF)
–Biological process (BP)
–Cellular component (CC)

Tree structure
Controlled, networked terms (~25,000 in total)
–Parent / child network organized like a tree (strictly a directed acyclic graph: a term can have more than one parent)
–Terms get more detailed as you move down the network

Relationship
A gene can be
–present in any of the ontologies (MF / BP / CC)
–a member of several GO terms
True path rule
–If a gene is a member of a term, it is also a member of that term's parents
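The true path rule can be sketched in a few lines of Python. This is a minimal illustration, not the way any particular GO tool does it: the term graph `PARENTS`, the gene name `geneA`, and the helper names `ancestors` and `apply_true_path_rule` are all hypothetical, and the toy graph is not a faithful copy of GO. The sketch follows parent links all the way up, so a gene inherits every ancestor term, not only the immediate parents.

```python
# Hypothetical toy fragment of a GO-like graph: term -> set of parent terms.
# The term names are for illustration only and are not taken from GO itself.
PARENTS = {
    "DNA replication": {"DNA metabolic process"},
    "DNA metabolic process": {"metabolic process"},
    "metabolic process": {"biological_process"},
    "biological_process": set(),
}


def ancestors(term, parents):
    """Return every term reachable from `term` by following parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def apply_true_path_rule(direct, parents):
    """Expand each gene's direct annotations with all ancestor terms."""
    expanded = {}
    for gene, terms in direct.items():
        full = set(terms)
        for term in terms:
            full |= ancestors(term, parents)
        expanded[gene] = full
    return expanded


# Hypothetical gene annotated directly to a single term.
direct_annotations = {"geneA": {"DNA replication"}}
expanded = apply_true_path_rule(direct_annotations, PARENTS)
print(sorted(expanded["geneA"]))
```

Because a term may have several parents, the traversal keeps a `seen` set so that shared ancestors are visited only once.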

GO tree example
Visit www.geneontology.org for more information

KEGG
KEGG PATHWAYS:
–Manually drawn pathway maps representing our knowledge of the molecular interaction and reaction networks, for a large selection of organisms
1. Metabolism
2. Genetic Information Processing
3. Environmental Information Processing
4. Cellular Processes
5. Human Diseases
6. Drug Development
Other pathway database: Reactome

KEGG example

Using Gene Ontology
Input: any list of genes, e.g. from a microarray experiment
–Cluster of genes with similar expression
–Up/down regulated genes
Question we ask:
–Are any GO terms overrepresented in the gene list, compared to what would happen by chance?
Method
–Hypergeometric testing

Hypergeometric test
The hypergeometric distribution arises from sampling without replacement from a fixed population.
Urn: 20 white balls out of 100 balls
Draw: 10 balls
We want to calculate the probability of drawing 7 or more white balls out of 10 balls, given the distribution of balls in the urn
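Written out, the urn question is an upper-tail probability of the hypergeometric distribution. The symbols below are assumptions introduced here, not taken from the slides: N is the number of balls in the urn, K the number of white balls, n the number of balls drawn, and X the number of white balls among those drawn.

```latex
P(X \ge k) \;=\; \sum_{i=k}^{\min(K,\,n)}
  \frac{\binom{K}{i}\,\binom{N-K}{\,n-i\,}}{\binom{N}{n}}

% Urn from the slide: N = 100, K = 20, n = 10, k = 7
P(X \ge 7) \;=\; \sum_{i=7}^{10}
  \frac{\binom{20}{i}\,\binom{80}{10-i}}{\binom{100}{10}}
  \;\approx\; 3.9 \times 10^{-4}
```

Each term counts the ways to pick i white balls and n - i non-white balls, divided by the number of ways to pick any n balls.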

Example
List of 80 significant genes from a microarray experiment of yeast (~6000 genes)
10 of the 80 genes are in the BP GO term DNA replication
–Total number of yeast genes in the GO term is 100
What is the probability of this occurring by chance?
Urn: 100 white balls out of 6000 balls
Draw: 80 balls in total, 10 white and 70 non-white
p = 6.2 × 10^-8
The GO term DNA replication is overrepresented in our list
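The yeast example can be reproduced with a short Python sketch that uses only the standard library. The function name `hypergeom_tail` and the variable names are assumptions made for this illustration; the slides do not say which software or which tail convention produced the quoted p-value. The sketch therefore prints both P(X >= 10) and P(X > 10), since an off-by-one in the tail is easy to make and changes the result noticeably.

```python
from math import comb


def hypergeom_tail(k, N, K, n):
    """Return P(X >= k) when drawing n balls without replacement
    from an urn of N balls, K of which are white."""
    upper = min(K, n)
    # Count the draws containing i white balls, for every i from k upward.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


# Numbers from the slide: ~6000 yeast genes, 100 annotated to the GO term,
# 80 genes in the significant list, 10 of which carry the annotation.
N, K, n, k = 6000, 100, 80, 10

p_at_least = hypergeom_tail(k, N, K, n)       # P(X >= 10)
p_more_than = hypergeom_tail(k + 1, N, K, n)  # P(X > 10)

print(f"P(X >= {k}) = {p_at_least:.2e}")
print(f"P(X >  {k}) = {p_more_than:.2e}")
```

For an enrichment question ("10 or more annotated genes"), P(X >= k) is the tail that matches the wording. When many GO terms are tested against the same gene list, the resulting p-values are usually corrected for multiple testing, a step not covered on this slide.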