Inferring Function From Known Genes Naomi Altman Nov. 06.


Objective
There are 3 major objectives for microarray studies:
1) Understand the function of genes.
2) Understand a biological process.
3) Classify samples.
Objective 3 is "easy" since it is purely observational. Objectives 1 and 2 start from genetic methods (i.e. one gene at a time).

Genetic Methods
Tools:
- "knock-out" genotypes, in which the gene is mutated to be non-functional
- "tagged" genotypes, in which a fluorescent tag is activated when the gene is activated
- "in situ" hybridization, in which the gene product is labeled in thin sections which can be viewed under a microscope
- "genetically modified" strains, in which foreign genes have been added
- "genetically modified" strains, in which selected genes are over- or under-expressed

Using Known Genes
There are several ways in which known genes can be used to infer the function of unknown genes in a microarray experiment.
1) Seeded clustering: assume that the nearest neighbors (according to some expression metric) of the known gene have similar function.
2) Unsupervised clustering with majority rule: in a cluster consisting of both known and unknown genes, assume that the cluster has the majority function.

Using Known Genes (continued)
3) Pathway analysis: if the genes are sufficiently well understood, they may be assembled into networks showing which genes regulate other genes. Unknown genes that have expression patterns similar to those in the network can be placed in the network.
BioPixie (for yeast) will be demonstrated by 2 project groups.
PathAssist ?
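Methods 1 and 2 can be sketched in base R. This is only an illustration under stated assumptions: the expression values, the gene names g1 to g6, the function labels, the choice of 1 minus Pearson correlation as the expression metric, and the helper nearest_neighbors are all invented for the example and do not come from the slides.

```r
# Toy expression matrix: 6 genes (rows) x 5 arrays (columns). All values are invented.
base <- c(1, 2, 3, 4, 5)
alt  <- c(5, 1, 4, 2, 3)
expr <- rbind(
  g1 = base,
  g2 = 2 * base + 0.1,                 # same shape as g1
  g3 = rev(base),                      # opposite pattern to g1
  g4 = alt,
  g5 = alt + 10,                       # same shape as g4
  g6 = base + c(0, 0.2, 0, -0.2, 0)    # close to g1
)

# Known functions (NA = unknown gene). Labels are illustrative only.
known <- c(g1 = "DNA repair", g2 = NA, g3 = NA,
           g4 = "ribosome", g5 = NA, g6 = "DNA repair")

# Assumed expression metric: 1 - Pearson correlation between gene profiles.
d <- 1 - cor(t(expr))

# 1) Seeded clustering: the k genes closest to a known "seed" gene.
nearest_neighbors <- function(dist_mat, seed, k) {
  dists <- dist_mat[seed, colnames(dist_mat) != seed]
  names(sort(dists))[seq_len(k)]
}
nn <- nearest_neighbors(d, "g1", 2)

# 2) Unsupervised clustering with majority rule: cluster all genes, then give
#    each unknown gene the most common known function in its cluster.
cl <- cutree(hclust(as.dist(d), method = "average"), k = 2)
majority <- tapply(known, cl, function(x) names(which.max(table(x))))
inferred <- ifelse(is.na(known), as.vector(majority[as.character(cl)]), known)

print(nn)
print(inferred)
```

In real data the result depends heavily on the metric, the linkage, and the number of clusters, so the inferred labels are best read as hypotheses to follow up, not as annotations.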

Understanding the Biological Process
We also use the known genes to infer the biological processes underlying our experimental treatments. The primary tools are gene classification methods based on function and/or sequence. The most used tool is probably the Gene Ontology Project.

Gene Ontology Project
Try entering "DNA repair", "ribosome", "protease", or the term of your choice. Under gene or protein, try "ap2".

How does GO "work"?
The terms are in a tree-like structure, but some nodes have multiple "parents".
The annotations are assigned by a team of collaborators and may also be submitted by other biologists (i.e. somewhat like Wikipedia, but with a more formalized central group).
Each annotation also has an "evidence" annotation, which can be used to assess reliability.
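A small base R sketch may make the structure concrete. The term names (root, A, B, C, D), the gene names, and the helper ancestors are placeholders invented for the example, not real GO terms or any package's API. The evidence codes IDA, IEA, and TAS are real GO codes, used here only to show filtering.

```r
# Toy ontology: each term lists its direct parents. Names are placeholders.
parents <- list(
  root = character(0),
  A = "root",
  B = "root",
  C = c("A", "B"),   # two parents, so the structure is not a strict tree
  D = "C"
)

# Collect every ancestor of a term by walking up through all parents.
ancestors <- function(term, parents) {
  seen <- character(0)
  todo <- parents[[term]]
  while (length(todo) > 0) {
    cur <- todo[1]
    todo <- todo[-1]
    if (!(cur %in% seen)) {
      seen <- c(seen, cur)
      todo <- c(todo, parents[[cur]])
    }
  }
  seen
}
anc_D <- ancestors("D", parents)

# Toy annotations with evidence codes (invented genes, real code names).
annotations <- data.frame(
  gene = c("geneX", "geneY", "geneZ"),
  term = c("D", "C", "D"),
  evidence = c("IDA", "IEA", "TAS"),
  stringsAsFactors = FALSE
)

# One possible reliability filter: drop electronically inferred annotations.
curated <- annotations[annotations$evidence != "IEA", ]

print(anc_D)
print(curated)
```

Because a gene annotated to a term is also implicitly annotated to all of that term's ancestors, categories higher in the structure contain the genes of the categories below them. This is why GO categories are nested.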

Using GO with Microarray Data
1) Compile a list of differentially expressed (DE) genes.
2) Obtain the GO annotations of all genes on the array.
3) Extract the GO annotations of the DE genes.
4) Determine which GO annotations are over- or under-represented among the DE genes.

Descriptive Use of GO

Testing For Enrichment

              In        Out       Total
    DE        N_{11}    N_{12}    N_{1.}
    not DE    N_{21}    N_{22}    N_{2.}
    Total     N_{.1}    N_{.2}    N

H_0: The proportion of DE genes in the GO category equals the proportion of genes on the array in the category.
proportion on array: N_{.1} / N
expected: E = N_{1.} N_{.1} / N
observed: O = N_{11}
test: (O - E)^2 / E, summed over the four cells of the table (Chi-squared test)
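The calculation can be checked in base R. The counts below are hypothetical, chosen only so that the arithmetic is easy to follow (expected count 500 x 200 / 10000 = 10). The sketch also computes the hypergeometric tail probability, which is the exact counterpart of the chi-squared approximation for a 2x2 table.

```r
# Hypothetical counts, for illustration only.
n_total <- 10000   # N: genes on the array
n_de    <- 500     # N_{1.}: DE genes
n_in    <- 200     # N_{.1}: genes on the array in the GO category
n11     <- 25      # N_{11}: DE genes in the GO category

# 2x2 table laid out as on the slide: rows DE / not DE, columns In / Out.
tab <- matrix(c(n11,        n_de - n11,
                n_in - n11, n_total - n_de - n_in + n11),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("DE", "notDE"), c("In", "Out")))

# Expected counts under H_0 and the Pearson statistic, computed by hand.
expected  <- outer(rowSums(tab), colSums(tab)) / sum(tab)
x2_manual <- sum((tab - expected)^2 / expected)

# The same statistic from chisq.test (no continuity correction).
chi <- chisq.test(tab, correct = FALSE)

# Exact one-sided test for over-representation: P(N_{11} >= observed).
p_hyper  <- phyper(n11 - 1, n_in, n_total - n_in, n_de, lower.tail = FALSE)
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

print(expected)
print(c(x2_manual = x2_manual, p_chisq = chi$p.value, p_hyper = p_hyper))
```

The exact test is usually the safer choice when the expected count in the category is small, since the chi-squared approximation is then unreliable.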

Testing For Enrichment (continued)
Problems:
1) Multiple testing: e.g. the ontology for humans contains over 1000 terms.
2) The ontology categories are nested.
3) The test statistic assumes that the genes are selected independently (but they are generally dependent).
4) Large N can lead to spuriously small p-values.
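For problem 1, a common first step is to adjust the per-term p-values. The sketch below uses base R's p.adjust on simulated p-values. The numbers (1000 terms, 10 of them given very small p-values) are assumptions made up for the illustration, not results from any data set.

```r
# Simulated raw p-values for 1000 GO terms: 990 null terms (uniform p-values)
# plus 10 terms with very small p-values. Purely illustrative.
set.seed(42)
p_raw <- c(runif(990), rep(1e-6, 10))

# Benjamini-Hochberg controls the false discovery rate;
# Bonferroni controls the family-wise error rate and is more conservative.
p_bh   <- p.adjust(p_raw, method = "BH")
p_bonf <- p.adjust(p_raw, method = "bonferroni")

n_raw  <- sum(p_raw  < 0.05)   # includes null terms that pass by chance
n_bh   <- sum(p_bh   < 0.05)
n_bonf <- sum(p_bonf < 0.05)

print(c(raw = n_raw, BH = n_bh, bonferroni = n_bonf))
```

These adjustments do not address problems 2 and 3. Nested categories make the tests dependent, so adjusted p-values should still be read with caution. GOstats offers a conditional test (the conditional argument of GOHyperGParams) that is intended to take the nesting of GO terms into account.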

GO in R
The GOstats package in R will:
- take a gene list with Entrez IDs
- take a reference gene list with Entrez IDs
- take an annotation package (you can make your own)
- find all the "significantly enriched" or "significantly depleted" GO categories
(The documentation was not very readable, but it was simple once I found an example.)

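Using GOstats with the Human vs Chimp Brain Data
The limma output was saved in efit.contrast. I selected all genes with highly significant differential expression among the treatments.

    # The p-value cutoff is missing from this transcript;
    # p.cutoff is a placeholder, not the original value.
    select = efit.contrast$F.p.value < p.cutoff
    genes = efit.contrast$genes[select,]   # Affy probe IDs of the selected genes
    library(hgu95av2)                      # annotation package for the array
    w1 <- as.list(hgu95av2ENTREZID)        # Entrez ID for every probe set on the array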

Using GOstats with the Human vs Chimp Brain Data (continued)

    library(GOstats)
    entrez = w1[genes]                 # Entrez IDs of the selected probe sets
    entrezID = unlist(entrez)
    params <- new("GOHyperGParams",
                  geneIds = entrezID,            # DE genes
                  universeGeneIds = unlist(w1),  # all genes on the array
                  annotation = "hgu95av2",
                  ontology = "BP",               # biological process
                  pvalueCutoff = .001,
                  conditional = FALSE,
                  testDirection = "over")        # test for over-representation
    HGOver = hyperGTest(params)        # tests all nodes
    htmlReport(HGOver, "h.html")       # report on significant nodes
    summary(HGOver)                    # report in R

Results
Gene to GO BP test for over-representation
Columns: GOBPID, Pvalue, OddsRatio, ExpCount, Count, Size, Term
Terms reported (the GO IDs and numeric columns are not preserved in this transcript):
- establishment of blood-nerve barrier
- nervous system development
- system development
- transmission of nerve impulse
- central nervous system development
- brain development
- synaptic transmission
- neurophysiological process
- cell communication
- peptide hormone secretion
- insulin secretion
- glutamate signaling pathway
- cell-cell signaling
- potassium ion transport
- cyclic nucleotide metabolism