Microarray statistical validation and functional annotation

Microarrays DNA microarray technology is a high-throughput method for gaining information on gene function. It relies on gene sequences arrayed on a solid surface and allows the expression of thousands of genes to be analysed in parallel.

Microarrays Microarrays can be a valuable tool for defining the transcriptional signatures associated with a pathological condition and for working out molecular mechanisms tightly linked to transcription. Since our current knowledge of gene function in higher eukaryotes is quite limited, microarray analysis frequently does not give a final answer to a biological problem, but it opens new research paths that allow the problem to be explored from a different perspective.

Microarrays A gold standard methodology to identify, with high sensitivity and precision, “biologically meaningful” differentially expressed genes is not yet available. Therefore, various approaches are under development to optimize the extraction of data linked to the “biology” of the problem under study.

Microarrays The principal steps of a microarray analysis are: (1) gene intensity measurement and data normalization; (2) statistical validation of differential expression; (3) functional data mining.

Microarrays Statistical validation usually requires the user to choose statistical significance parameters. For example, SAM (Significance Analysis of Microarrays) requires a “delta” value, which sets the threshold for false positives in the validated dataset. If the validation is too stringent, biologically meaningful genes can be lost, making it harder to work out functional correlations between the differentially expressed genes. If the validation is too loose, the increase in false positives creates background noise from which it is difficult to extract reliable functional correlations between the differentially expressed genes.
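The stringency trade-off can be illustrated with a minimal sketch. It does not implement SAM or its delta parameter. As a stand-in, it applies a plain p-value cutoff to a hypothetical list of per-gene p-values and reports, for each cutoff, how many genes are selected and how many false positives would be expected if no gene were truly changed. The function name `selection_summary` and the example p-values are assumptions made for illustration.

```python
def selection_summary(p_values, thresholds):
    """For each threshold, count the selected genes and the number of false
    positives expected if every gene were truly unchanged."""
    n_genes = len(p_values)
    rows = []
    for alpha in thresholds:
        selected = sum(1 for p in p_values if p <= alpha)
        # Under the global null hypothesis, alpha * n_genes genes pass by chance.
        expected_false_positives = alpha * n_genes
        rows.append((alpha, selected, expected_false_positives))
    return rows


# Hypothetical p-values for ten genes (illustration only, not real data).
p_values = [0.0004, 0.003, 0.008, 0.02, 0.04, 0.07, 0.2, 0.45, 0.6, 0.9]

for alpha, selected, expected_fp in selection_summary(p_values, [0.001, 0.01, 0.05]):
    print(f"alpha={alpha}: {selected} genes selected, "
          f"about {expected_fp:.2f} false positives expected by chance")
```

A stricter cutoff keeps fewer genes and fewer expected false positives; a looser one keeps more of both, which is the compromise the user has to make when choosing the parameter.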

Microarrays Statistical validation requires the user to choose statistical significance parameters. For example, SAM (Significance Analysis of Microarrays) requires a “delta” value, which sets the threshold for false positives in the validated dataset. When Fisher’s test is used, defining a threshold value is even harder.

Microarrays It is important to note that statistical validation does not always select the most “biologically” meaningful dataset. We are therefore trying to integrate “biologically” important parameters, such as Gene Ontology, into the statistical validation.

Microarrays Gene Ontology (GO) is a dynamic controlled vocabulary that can be applied to all organisms even as knowledge of gene and protein roles in cells is accumulating and changing. GO might help to link differentially expressed genes to specific functional classes.

Microarrays Molecular Function: the tasks performed by individual gene products; examples are transcription factor and DNA helicase.

Microarrays Biological Process: broad biological goals, such as mitosis or purine metabolism, that are accomplished by ordered assemblies of molecular functions

Microarrays Cellular Component: subcellular structures, locations, and macromolecular complexes; examples include nucleus, telomere, and origin recognition complex

Microarrays It has recently been shown that the size and overlap of the gene lists produced by different gene selection methods are highly unstable. (Hosack et al, Genome Biology 2003, 4:P4)

Microarrays The percentage of genes overlapping in any two lists was highly variable, and ranged from 7% to 60%. (Hosack et al, Genome Biology 2003, 4:P4)
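As a minimal sketch of how such an overlap percentage might be computed: the definition used here (shared genes divided by the size of the smaller list) is an assumption, since the slide does not state how the overlap was measured. The function name `percent_overlap` and the gene identifiers are hypothetical.

```python
def percent_overlap(list_a, list_b):
    """Percentage of genes shared by two gene lists, relative to the smaller
    list (an assumed definition, chosen for illustration)."""
    genes_a, genes_b = set(list_a), set(list_b)
    if not genes_a or not genes_b:
        return 0.0
    shared = genes_a & genes_b
    return 100.0 * len(shared) / min(len(genes_a), len(genes_b))


# Hypothetical gene lists produced by two different selection methods.
method_1 = ["g1", "g2", "g3", "g4", "g5"]
method_2 = ["g4", "g5", "g6", "g7", "g8", "g9"]

print(percent_overlap(method_1, method_2))
```

Other denominators (the larger list, or the union of both lists) are equally defensible and give lower percentages for the same pair of lists.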

Microarrays In spite of this striking variation, the top five biological themes linked to the data sets are the same. This evidence suggests that converting genes to themes allows the "biological result" of the experiment to be determined despite substantial differences in gene list content resulting from the use of various normalization, gene intensity and statistical selection methods. (Hosack et al, Genome Biology 2003, 4:P4)

Microarrays (Hosack et al, Genome Biology 2003, 4:P4)

Microarrays Integrating GO in statistical validation:
1. The occurrences of each GO class are counted in the data set under statistical validation.
2. SAM analyses are performed using various delta parameters.
3. The occurrences of each GO class are counted in each statistically validated subset.
4. Enrichment of GO classes in the SAM-validated subsets is evaluated using a binomial test corrected for Type I errors.
5. A score is generated for each GO class as log2(p-value * % hits).
6. The SAM subset showing the best compromise between the number of enriched GO classes and the number of hits for each class is selected for further studies.
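The scoring steps above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation. The slide does not name the Type I error correction, so a Bonferroni correction is assumed. "% hits" is interpreted as the percentage of a class's genes in the starting data set that are retained in the validated subset, which is also an assumption. The function names and the example counts are hypothetical.

```python
from math import comb, log2


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def go_scores(background_counts, background_size, subset_counts, subset_size):
    """Score each GO class found in a validated subset.

    Returns {go_class: (adjusted p-value, % hits, log2(p * % hits))}.
    """
    n_tests = len(subset_counts)
    scores = {}
    for go_class, hits in subset_counts.items():
        # Probability that a randomly drawn gene belongs to this class.
        p_class = background_counts[go_class] / background_size
        p_value = binomial_upper_tail(hits, subset_size, p_class)
        # Assumed correction: Bonferroni over the classes tested.
        p_adjusted = min(1.0, p_value * n_tests)
        # Assumed meaning of "% hits": share of the class retained in the subset.
        pct_hits = 100.0 * hits / background_counts[go_class]
        scores[go_class] = (p_adjusted, pct_hits, log2(p_adjusted * pct_hits))
    return scores


# Hypothetical counts: 1000 genes in the starting data set, 40 in the subset.
background = {"immune response": 50, "cell cycle": 100}
subset = {"immune response": 10, "cell cycle": 4}

for go_class, (p_adj, pct, score) in go_scores(background, 1000, subset, 40).items():
    print(f"{go_class}: adjusted p={p_adj:.3g}, hits={pct:.1f}%, score={score:.2f}")
```

With this formula a strongly enriched class gets a markedly negative score because its p-value is small, while a class present at the expected frequency scores near or above zero. Repeating the calculation for each delta value gives the per-subset scores that the final selection step compares.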

CONCORDANT MORPHOLOGIC AND GENE EXPRESSION DATA SHOW THAT A VACCINE FREEZES HER-2/neu PRENEOPLASTIC LESIONS. Figure panels: atypical hyperplasia and in situ carcinomas (10 wks); cured mammary gland (22 wks); lobular carcinoma (22 wks). (Quaglino et al submitted)

Microarrays log2(p-value * %HITs)

Microarrays We observed that simple statistical validation and GO-mediated statistical validation overlap strongly. However, some interesting differentially expressed genes can only be detected using GO-mediated statistical validation.

Figure (panels b to e; expression scale -3.0 to 3.0, centred on 1:1): Ig-linked immune response common to simple statistical analysis and GO-mediated statistical validation; cell-linked immune response specific to GO-mediated statistical validation.

We also observed that the approach described above can be used to improve data mining of the transcriptional signature present in co-regulated genes. Flowchart, steps (a) to (n): Starting dataset (SD); SAM; Run SAM with at least 3 different thresholds?; Subsets of SAM-validated genes (SSVG); Consensus program; Alignment matrices (AMs); Patser; Is any AM over-represented in the SSVG? (No: discard; Yes: selected SSVG); Filtering by AM-specific p-value; min(AM-specific p-value).