Microarray Analysis Software Maximiliano Corredor Institute of Biology, Leiden University.



Steps of a Microarray Experiment
[Flow diagram: RNA, RT, cDNA, cDNA-Cy3 / -Cy5 labeling, hybridization, Image Processing, Statistical Analysis; Genomic sequence / EST library sequence, Annotation, Probe design]

Bioinformatic steps of MA experiments
- Probe design
- Image processing (with QC)
- Normalisation (with QC)
- Statistical analysis and data mining
- Database management

Probe design software
- Array Designer (Premier Biosoft): software that can design hundreds of primers for DNA or oligonucleotide microarrays.
- OligoArray2: free software that computes gene-specific oligonucleotides for genome-scale oligonucleotide microarray construction.
- OligoWiz2 Server: server for designing oligonucleotide probes for microarrays.
- ProbeWiz Server: the CBS ProbeWiz WWW server predicts optimal PCR primer pairs for generation of probes for cDNA arrays.
- Primer3: commonly used software for designing primers for microarray construction.

Image processing
- Addressing: estimate location of spot centers
- Segmentation: classify pixels as foreground or background
- Information extraction: for each spot on the array and each channel
  – Foreground intensities
  – Background intensities
  – Quality measures
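The segmentation and information-extraction steps above can be sketched for a single spot as follows. This is only an illustration using fixed-circle segmentation, one of several possible segmentation strategies; it is not how any of the packages named in these slides is known to work. The function name `quantify_spot`, the toy image, and the assumption that addressing has already supplied the spot center and radius are all hypothetical.

```python
from statistics import mean, median

def quantify_spot(image, center_row, center_col, radius):
    """Fixed-circle segmentation of one spot in one channel.

    `image` is assumed to be a small window (list of pixel rows) cropped
    around a single spot; the center and radius are assumed to come from
    a previous addressing step.
    """
    foreground, background = [], []
    for r, row in enumerate(image):
        for c, pixel in enumerate(row):
            # Pixels inside the circle are foreground, all others background.
            inside = (r - center_row) ** 2 + (c - center_col) ** 2 <= radius ** 2
            (foreground if inside else background).append(pixel)
    return {
        "foreground_mean": mean(foreground),
        "background_median": median(background),
        "n_foreground_pixels": len(foreground),
    }

# Toy 5x5 window with a bright spot in the middle (made-up values).
window = [
    [10, 12, 11, 10, 12],
    [11, 90, 95, 92, 10],
    [12, 94, 99, 96, 11],
    [10, 91, 97, 93, 12],
    [11, 10, 12, 11, 10],
]
print(quantify_spot(window, center_row=2, center_col=2, radius=1))
```

Note that with a radius of 1 the bright corner pixels of the spot are counted as background; the median keeps the background estimate robust to them, which is one reason a median is often preferred over a mean here.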

Image processing software
- GenePix Pro (Axon Instruments), for Windows: spot identification, scatter plot, histogram, normalization, quality control.
- ScanArray (PerkinElmer), for Windows: quantitation, spot quality measures and normalization.
- ScanAlyze (Eisen's lab, Lawrence Berkeley National Lab (LBNL)), for Windows: processes fluorescent images of microarrays; semi-automatic definition of grids and complex pixel and spot analyses. Free for academic use.
- TIGR Spotfinder (TIGR), for Windows: spot identification; microarray image processing. Free.

Image processing with GenePix

QC: Background subtraction
- Background arises from glass autofluorescence, dust particles or washing defects.
- Background (BG) and specific hybridisation are assumed to be additive (but look at the image!!).
- Low background can be subtracted from the average intensity of the spot.
- High-background features should be removed from the analysis: artificial saturation may occur, so the maximum measured value is no longer the sum of the background and the real specific intensity.
- Features with strongly negative intensities after background subtraction (like those in the image) should also be removed.
- Features whose background is similar to the spot intensity give a normal distribution centered at 0 intensity and can therefore be considered absent.

Background correction
- Different types of background subtraction
- Possibility of flagging features that don't match our QC criteria:
  – high background intensity
  – % of pixels above background
  – background higher than foreground
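As a rough illustration of the three flagging criteria above, the following sketch subtracts the background from one spot and attaches QC flags. The `Spot` record, the function name and both threshold values are assumptions made for this example only; real cut-offs depend on the scanner and on the experiment.

```python
from dataclasses import dataclass

@dataclass
class Spot:
    name: str
    foreground: float     # foreground intensity of the spot
    background: float     # local background intensity
    pct_above_bg: float   # % of spot pixels above background

def correct_and_flag(spot, max_background=500.0, min_pct_above_bg=60.0):
    """Return (background-subtracted intensity, list of QC flags).

    Both thresholds are hypothetical defaults, not recommended values.
    """
    flags = []
    if spot.background > max_background:
        flags.append("high_background")
    if spot.pct_above_bg < min_pct_above_bg:
        flags.append("few_pixels_above_background")
    if spot.background >= spot.foreground:
        flags.append("background_above_foreground")
    return spot.foreground - spot.background, flags

# Made-up spots: one clean, one absent, one with high background.
spots = [
    Spot("gene_a", 2400.0, 150.0, 95.0),
    Spot("gene_b", 300.0, 320.0, 20.0),
    Spot("gene_c", 5000.0, 900.0, 88.0),
]
for spot in spots:
    corrected, flags = correct_and_flag(spot)
    print(spot.name, corrected, flags or "ok")
```

Flagging rather than deleting keeps the raw values available, so the QC criteria can be changed later without re-reading the image.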

QC: Histogram and scatterplot
The intensities should follow a normal distribution with:
– Natural lower limit: only positive intensities exist (minimum RNA concentration is 0)
– Long tail towards the higher intensities
– Artificial upper limit: saturation of the detector and/or TIFF file. This can cause an accumulation of points at the highest intensity.
This effect can also be observed in the scatterplot.
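The accumulation of points at the upper limit can be checked numerically as well as by eye. The sketch below assumes 16-bit TIFF images, so that the ceiling is 65535; that value, the function names and the sample intensities are assumptions for illustration.

```python
import math
from collections import Counter

def saturated_fraction(intensities, max_value=65535, tolerance=0):
    """Fraction of spots at (or within `tolerance` of) the upper limit.

    max_value=65535 assumes a 16-bit image; adjust for other formats.
    """
    if not intensities:
        raise ValueError("no intensities given")
    n_saturated = sum(1 for x in intensities if x >= max_value - tolerance)
    return n_saturated / len(intensities)

def log2_histogram(intensities):
    """Count spots per integer log2 bin, ignoring non-positive values."""
    return Counter(math.floor(math.log2(x)) for x in intensities if x > 0)

# Made-up intensities for one channel.
channel = [100, 65535, 65535, 2000]
print(saturated_fraction(channel))
print(sorted(log2_histogram(channel).items()))
```

A pile-up in the top bin of the histogram together with a non-trivial saturated fraction suggests rescanning at a lower gain or excluding those spots.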

QC: Std. Dev. vs. Avg
- Good spots should be homogeneous: low standard deviation
- Linear correlation of std. dev. vs. average
- Higher std. dev. = variability within spot
- Lower std. dev. = uniformity within spot (saturation)

Sources of technical variability
- SYSTEMATIC (calibration can correct for them):
  – Chip production
  – Efficiencies of RNA extraction, reverse transcription, labeling, photodetection
- STOCHASTIC (error model, normalization):
  – PCR yield
  – DNA quality
  – Spotting efficiency, spot size
  – Cross-/unspecific hybridization
  – Stray signal

Normalisation
- Several assumptions:
  – Normal distribution of intensities
  – All channels behave equally
- Centering and scaling:
  – Intensities are transformed so that the averages and ranges are the same (and therefore comparable)
- Within-hyb normalisation:
  – In two-channel data, both channels are centered and scaled.
  – More complex normalisations may be needed to ensure linearity across the whole intensity range.
- Between-hybs normalisation:
  – Whenever two or more different chips are to be compared, all of them must be centered and scaled.
  – Normalisation should take the experimental design into account; the error model must distinguish between experimental units, biological replicates and technical replicates.
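A minimal sketch of centering and scaling follows. It uses the median for centering and the median absolute deviation (MAD) for scaling; this is one common robust choice and an assumption of this example, not necessarily what the packages mentioned in these slides implement. The function name and the two toy hybridisations are hypothetical.

```python
import statistics

def center_and_scale(values):
    """Shift values to median 0, then divide by the median absolute deviation."""
    med = statistics.median(values)
    centered = [v - med for v in values]
    mad = statistics.median(abs(c) for c in centered)
    if mad == 0:
        raise ValueError("cannot scale: values have zero spread")
    return [c / mad for c in centered]

# Two made-up hybridisations (log ratios) with different location and spread.
hyb_1 = [0.5, 1.0, 1.5, 2.0, 2.5]
hyb_2 = [-3, -1, 1, 3, 5]

print(center_and_scale(hyb_1))  # [-2.0, -1.0, 0.0, 1.0, 2.0]
print(center_and_scale(hyb_2))  # [-2.0, -1.0, 0.0, 1.0, 2.0]
```

After the transformation both hybridisations share the same center and spread, which is what makes them comparable. Intensity-dependent effects are not removed by this step and would need something like lowess.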

Normalisation software
- Basic within-hybridisation normalisation is possible in GenePix.
- Acuity includes more advanced normalization algorithms (Lowess, etc.).
- Rosetta implements several pipelines for normalization:
  – Within hybs, when data are uploaded to the database, using the manufacturers' indications to develop its error models (and therefore providing p-values)
  – Between hybs, when they are compared to each other (centering and scaling)

QC: M vs A
- M stands for Log(Ratio); A is the average of the Log(Intensity) of both channels.
- If the two channels behave symmetrically, everything is OK. Otherwise, we may have dye bias.
- It is very common to find such deviations in the tails of the distribution (lowess normalisation can help here).
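For reference, the usual definitions are M = log2(R/G) and A = (log2 R + log2 G) / 2, i.e. A is the mean of the two log intensities. The sketch below computes both for one spot; naming the channels `red` and `green` and using base-2 logarithms follow the common convention and are assumptions here.

```python
import math

def ma_values(red, green):
    """Return (M, A) for one spot from two background-corrected intensities.

    Both intensities must be strictly positive.
    """
    m = math.log2(red / green)                       # log ratio
    a = 0.5 * (math.log2(red) + math.log2(green))    # mean log intensity
    return m, a

print(ma_values(1024, 256))  # (2.0, 9.0)
print(ma_values(512, 512))   # (0.0, 9.0)
```

A spot with equal signal in both channels has M = 0 whatever its brightness, so a symmetric cloud around M = 0 along the whole A axis is the expected picture when there is no dye bias.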

QC: M vs A
- Before normalisation (left), the average log ratio was higher than 0.
- Intensity saturation of one channel produces a skewed tail. This effect is not removed by normalisation; it requires calibration of the image acquisition (or elimination of saturated spots from the analysis).

QC and basic statistics software
- Some image processing packages, like GenePix, include basic statistics functions.
- There are numerous stand-alone programs, as well as plug-ins or scripts for more general statistical packages like R/Bioconductor, Matlab, SPSS, MS Excel…
- All microarray analysis packages include these functions and many more.

Database systems
- Acuity (Axon Instruments)
  – Runs on Windows 2000/XP client; Windows 2000 server (recommended)
  – Stores data in a relational database, Microsoft SQL or Oracle
  – Various visualization tools; normalization; hierarchical, k-means and k-medians clustering with many different similarity metrics; SOM, PCA, gene shaving
  – Scripting engine for customizable analysis
- ArrayDB (NHGRI)
  – HTML / Linux or Unix
  – Analyzed expression data stored in a relational database
  – A software suite that provides an interactive user interface for the mining and analysis of microarray gene expression data

Database systems
- BASE (BioArray Software Environment) (Department of Oncology, Lund University)
  – Linux server, MySQL, web client
  – Manages biomaterial information, raw data and images, and provides integrated and "plug-in"-able normalization, data viewing and analysis tools
  – The system also has array production LIMS features; supports MIAME and MAGE-ML
- Rosetta Resolver (Rosetta Biosoftware)
  – Java / UNIX with Oracle relational database
  – The Rosetta Resolver system combines advanced analysis software, a high-capacity database and a high-performance server framework in one enterprise-wide tool

Database systems
- Stanford Microarray Database (SMD) package (Stanford University)
  – Oracle server; web server; UNIX with Perl support
  – SMD stores raw and normalized data from microarray experiments, as well as their corresponding image files. In addition, SMD provides interfaces for data retrieval, analysis and visualization.
- Longhorn Array Database (Institute for Cellular and Molecular Biology, University of Texas at Austin)
  – Linux and PostgreSQL
  – The Longhorn Array Database (LAD) is a MIAME-compliant microarray database. It is a fully open source version of the Stanford Microarray Database (SMD).

Rosetta Resolver
- Excellent database
  – But requires dedicated staff to maintain
- Ideal for institutions and big companies
  – Who are the only ones able to afford it
- Includes a good set of statistical tools
  – But it isn't very transparent
- GUI user-friendly(ish)
- Flexible advanced statistics available as visual scripts and R implementation
  – However, this requires deep knowledge of the DB structure and some programming skills
- Compatible with a multitude of data formats
  – But hard to get info out of the system (no MIAME yet)

Statistical Analysis and Data Mining
- The basic output of a microarray experiment is a list of differentially transcribed genes. This can be obtained easily (Excel) from the image processing.
- However, the list is arbitrary: fold-change thresholds are arbitrarily chosen and there is no measure of the significance of the observed difference. To do science we need statistics.
- Many packages like Acuity, BASE and Rosetta Resolver combine database and statistical analysis tools, but there are also many other programs exclusively devoted to the statistical analysis of microarray experiments: e.html
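To illustrate why a fold change alone is not enough, the sketch below compares two made-up genes with the same average log2 fold change but very different replicate variability. Significance is estimated with an exact two-sided permutation test on the difference of means. That test is a choice made for this example only, since it needs nothing beyond the standard library; it is not claimed to be the method used by any package named above. All names and data are hypothetical.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test on the difference of group means."""
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    observed = abs(mean(group_a) - mean(group_b))
    total = sum(pooled)
    hits = 0
    count = 0
    # Enumerate every way of relabelling n_a of the pooled values as group A.
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            hits += 1
        count += 1
    return hits / count

# log2 intensities, four replicates per condition (made-up values).
consistent = ([8.0, 8.1, 7.9, 8.0], [9.0, 9.1, 8.9, 9.0])
noisy = ([7.0, 9.5, 8.0, 7.5], [9.0, 8.5, 10.5, 8.0])

for label, (control, treated) in (("consistent", consistent), ("noisy", noisy)):
    fold_change = mean(treated) - mean(control)
    p = permutation_p_value(control, treated)
    print(label, "log2 fold change:", round(fold_change, 2), "p:", round(p, 3))
```

Both genes pass the same fold-change threshold, yet only the consistent one reaches the smallest p-value attainable with four replicates per group (2 out of 70 relabellings). With thousands of genes tested at once, a multiple-testing correction would also be needed.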

Statistical analysis and Data mining software
- GeneSpring (Silicon Genetics): analyzes various array types; scatter plot, cluster analysis, PCA, SOM, statistical tools, 2D and 3D plotting.
- J-Express (MolMine): hierarchical clustering, K-means partitional clustering, principal component analysis, self-organizing maps, profile similarity search, normalization and filtering, raw data import, project organization. Free for academics.
- BioConductor: an open source software project providing infrastructure, in terms of design and software, for analysing genomic data, with some form of graphical user interface for selected libraries. For other microarray related R packages:
- SpotFire (Spotfire): hierarchical, bi-directional hierarchical and K-means cluster analysis, PCA, profile search, coincidence testing, normalization, a number of interactive plots for visualization of data, access to GATC databases.

Basic plots and tables

Classification tasks for microarrays
- Classification of SAMPLES: generate gene expression profiles that can
  – (i) discriminate between different known cell types or conditions, e.g. between tumor and normal tissue
  – (ii) identify different and previously unknown cell types or conditions, e.g. new subclasses of an existing class of tumors
- Classification of GENES:
  – (i) assign an unknown cDNA sequence to one of a set of known gene classes
  – (ii) partition a set of genes into new (unknown) functional classes on the basis of their expression patterns across a number of samples
- Discriminant analysis: CLASSES KNOWN
- Cluster analysis: CLASSES NOT KNOWN

Cluster analysis
- Grouping a collection of objects into subsets or "clusters", such that those within each cluster are more closely related to one another than objects assigned to different clusters.
- Two ingredients are needed to group objects:
  – Distance measurement
  – Clustering algorithm
- Clustering columns: grouping similar samples
- Clustering rows: grouping similarly expressed genes
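The two ingredients can be made concrete with a small sketch: a correlation-based distance as the distance measurement and average-linkage agglomerative clustering as the algorithm. Both are common choices for expression profiles but are assumptions of this example, as are the function names and the toy profiles. The code clusters rows (genes); clustering columns (samples) would use the transposed data.

```python
from itertools import combinations
from statistics import mean

def correlation_distance(x, y):
    """1 - Pearson correlation: small when two profiles rise and fall together.

    Fails with a division by zero for a constant profile.
    """
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return 1.0 - cov / (var_x * var_y) ** 0.5

def average_linkage(profiles, n_clusters):
    """Merge the two closest clusters until only n_clusters remain."""
    def linkage(pair):
        # Average distance over all cross-cluster pairs of genes.
        c1, c2 = pair
        return mean(correlation_distance(profiles[a], profiles[b])
                    for a in c1 for b in c2)

    clusters = [[name] for name in profiles]
    while len(clusters) > n_clusters:
        c1, c2 = min(combinations(clusters, 2), key=linkage)
        clusters = [c for c in clusters if c is not c1 and c is not c2]
        clusters.append(c1 + c2)
    return clusters

# Made-up expression profiles across four samples.
profiles = {
    "gene_1": [1.0, 2.0, 3.0, 4.0],
    "gene_2": [2.0, 4.1, 6.0, 8.2],
    "gene_3": [4.0, 3.0, 2.0, 1.0],
    "gene_4": [8.0, 6.1, 4.0, 2.2],
}
print(average_linkage(profiles, n_clusters=2))
```

Because the distance is correlation-based, gene_1 and gene_2 group together even though their absolute levels differ by a factor of about two; a Euclidean distance would weigh that difference in magnitude instead.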

Clustering of genes
- Genes with similar patterns of expression (synexpression groups) cluster together.
- Synexpression groups may be functional groups (this is a hypothesis that always has to be tested).
Iyer et al., Science 1999

Clustering of samples
- Given a sufficient number of samples, functional relationships might be found.
Golub et al.

Discriminant analysis

Useful links
- Comprehensive compilation of information on microarray software
- Catalogue of microarray analysis software: https://
- Stanford Microarray Database Software and Tools: http://genome-www5.stanford.edu/resources/restech.shtml
- The Institute for Genomic Research Microarray Software: http://