Analysis and Management of Microarray Data Dr G. P. S. Raghava.



Major Applications
• Identification of differentially expressed genes in diseased tissues (in the presence of a drug)
• Classification of differentially expressed genes, or clustering/grouping of genes that behave similarly under different conditions
• Use of the expression profile of a known disease to diagnose and classify unknown genes

Management of Microarray Data
• Magnitude of Data
  – Experiments
    • genes in human
    • 320 cell types
    • 2000 compounds
    • 3 time points
    • 2 concentrations
    • 2 replicates
  – Data Volume
    • 4 × 10^11 data points
    • = 1 petabyte of data
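The magnitude estimate above is a product of the experimental factors. The transcript does not preserve the human gene count, so the following sketch only solves for the count implied by the quoted total of 4 × 10^11 data points. The variable names are illustrative, and the resulting gene count is a derived value, not a figure from the slide.

```python
# Experimental factors quoted on the slide.
cell_types = 320
compounds = 2000
time_points = 3
concentrations = 2
replicates = 2

# Total number of data points quoted on the slide.
total_data_points = 4e11

# Number of measurements made for each gene across the whole design.
measurements_per_gene = (
    cell_types * compounds * time_points * concentrations * replicates
)

# The gene count is missing from the transcript; this is the value implied
# by the quoted total, not a number taken from the slide.
implied_genes = total_data_points / measurements_per_gene

print(measurements_per_gene)
print(round(implied_genes))
```

Under these assumptions each gene is measured several million times across the design, which is why even a modest per-measurement storage cost grows into a very large archive.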

Gene expression database – a conceptual view
[Figure: a gene expression matrix of genes by samples holding the gene expression levels, accompanied by sample annotations and gene annotations]
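The conceptual view can be sketched as a small data structure: one matrix of expression levels plus two annotation tables keyed by gene and by sample. The class name, field names, and the choice of genes as rows are assumptions made for illustration; the slides do not prescribe a layout.

```python
from dataclasses import dataclass, field


@dataclass
class ExpressionDataset:
    """Hypothetical container mirroring the conceptual view."""

    genes: list      # gene identifiers, one per matrix row (assumed layout)
    samples: list    # sample identifiers, one per matrix column
    matrix: list     # matrix[i][j] = expression level of gene i in sample j
    gene_annotations: dict = field(default_factory=dict)
    sample_annotations: dict = field(default_factory=dict)

    def level(self, gene, sample):
        """Look up one expression level by gene and sample name."""
        i = self.genes.index(gene)
        j = self.samples.index(sample)
        return self.matrix[i][j]

    def profile(self, gene):
        """Return the expression profile of one gene across all samples."""
        row = self.matrix[self.genes.index(gene)]
        return dict(zip(self.samples, row))


# Toy values; the annotation entries are placeholders, not real annotations.
dataset = ExpressionDataset(
    genes=["geneA", "geneB"],
    samples=["s1", "s2", "s3"],
    matrix=[[1.2, 0.4, -0.7], [0.0, 2.1, 1.9]],
    gene_annotations={"geneA": {"function": "placeholder"}},
    sample_annotations={"s1": {"tissue": "placeholder"}},
)

print(dataset.level("geneB", "s2"))
print(dataset.profile("geneA"))
```

Keeping annotations separate from the matrix means the numeric data can be analysed on its own while the annotations remain available for interpreting the results.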

Management of Microarray Data: Major Issues
• Large volume of microarray data produced in the last few years
  – Storage and efficient access
  – Comparison and integration of data
• Problem of data access and exchange
  – Data scattered around the Internet
  – Supplementary material of publications
  – Difficult for users to access relevant data
• Problems with existing databases
  – Diverse purposes
  – Developed for a specific purpose

Management of Microarray Data
• Specific databases
  – Platform (e.g. Stanford Microarray Database, SMD)
  – Organism (Yeast Microarray Global Viewer)
  – Project (life cycle database of Drosophila)
• Problems with supplementary material and microarray databases
  – Lack of direct access
  – Quality not checked
  – No standard format
  – Incomplete data

• Comprehensive database server to manage massive amounts of microarray data
  – Biomaterial information
  – Raw data and images
  – Web tools (normalization, data viewing, analysis)
• Runs on local servers, which allows full management of the data and of permissions to add and view it
• Minimum Information About a Microarray Experiment (MIAME)
• BASE

Public Databases
• Gene expression data is an essential aspect of annotating the genome
• Publication and data exchange for microarray experiments
• Data mining and meta-studies
• Common data format: XML
• MIAME (Minimum Information About a Microarray Experiment)

GEO at the NCBI

Microarray Data Mining Challenges
• Too few records (samples), usually < 100
• Too many columns (genes), usually > 1,000
• Too many columns are likely to lead to false positives
• For exploration, a large set of all relevant genes is desired
• For diagnostics or identification of therapeutic targets, the smallest set of genes is needed
• The model needs to be explainable to biologists
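The link between many columns and false positives can be made concrete with a short calculation. The significance threshold of 0.05 is an assumed value, and the Bonferroni correction is named here only as one standard remedy; neither appears in the slides.

```python
n_genes = 1000   # slide: usually more than 1,000 genes (columns)
alpha = 0.05     # assumed per-gene significance threshold

# If no gene is truly differentially expressed, each test still has
# probability alpha of being called significant by chance alone.
expected_false_positives = n_genes * alpha

# Bonferroni correction (an illustrative choice): a conservative per-gene
# threshold that keeps the chance of any false positive at or below alpha.
bonferroni_threshold = alpha / n_genes

print(expected_false_positives)
print(bonferroni_threshold)
```

With fewer than 100 samples, such a strict per-gene threshold is hard to reach, which is one reason small gene sets are preferred for diagnostics.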

Analysis of Microarray Data
• Analysis of images
• Preprocessing of gene expression data
• Normalization of data
  – Subtraction of background noise
  – Global/local normalization
  – Housekeeping genes (or the same gene)
  – Expression as a log ratio (test/reference)
• Differential gene expression
  – Use replicates and calculate significance (t-test)
  – Assess the significance of the fold change with a statistical method
• Clustering
  – Supervised/unsupervised (hierarchical, K-means, SOM)
• Prediction or supervised machine learning (SVM)
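The differential expression step can be sketched for a single gene: convert replicate intensities to log ratios (test/reference) and compute a t-statistic against the null hypothesis of no change. The log base 2, the one-sample form of the t-test, the function names, and the intensity values are all assumptions for illustration.

```python
import math
from statistics import mean, stdev


def log2_ratios(test, reference):
    """Log2 of test/reference intensity for each replicate."""
    return [math.log2(t / r) for t, r in zip(test, reference)]


def one_sample_t(values):
    """t-statistic for the null hypothesis that the mean log ratio is zero."""
    n = len(values)
    return mean(values) / (stdev(values) / math.sqrt(n))


# Made-up replicate intensities for one gene (four replicate arrays).
test_intensity = [820.0, 905.0, 760.0, 880.0]
reference_intensity = [410.0, 430.0, 400.0, 415.0]

ratios = log2_ratios(test_intensity, reference_intensity)
t_stat = one_sample_t(ratios)

print(ratios)
print(t_stat)
```

The p-value would come from a t-distribution with n - 1 degrees of freedom; that lookup is left out to keep the sketch dependency-free.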

Low Level Analysis or Preprocessing of Gene Expression Data
• Scale transformation
• Normalization and scaling
• Replicate handling
• Missing value handling
• Flat pattern filtering
• Pattern standardization
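Several of the preprocessing steps above can be sketched in one pass over the data. The slides name the steps but not the methods, so every concrete choice here is an assumption: log2 as the scale transformation, averaging for replicates, gene-mean imputation for missing values, a range threshold for flat patterns, and z-scores for standardization. Normalization is omitted from this sketch.

```python
import math
from statistics import mean, pstdev


def preprocess(rows, min_range=1.0):
    """rows maps a gene name to a list of conditions; each condition is a
    list of replicate intensities, with None marking a missing value."""
    result = {}
    for gene, conditions in rows.items():
        # Scale transformation and replicate handling:
        # average the log2 values of the available replicates.
        profile = []
        for replicates in conditions:
            logs = [math.log2(v) for v in replicates if v is not None]
            profile.append(mean(logs) if logs else None)
        # Missing value handling: fill gaps with the gene's own mean.
        observed = [v for v in profile if v is not None]
        if not observed:
            continue
        fill = mean(observed)
        profile = [fill if v is None else v for v in profile]
        # Flat pattern filtering: drop genes that barely change.
        if max(profile) - min(profile) < min_range:
            continue
        # Pattern standardization: zero mean, unit standard deviation.
        mu, sd = mean(profile), pstdev(profile)
        if sd == 0:
            continue
        result[gene] = [(v - mu) / sd for v in profile]
    return result


# Made-up intensities: three conditions, two replicates each.
raw = {
    "geneA": [[100.0, 120.0], [400.0, 380.0], [None, 1600.0]],
    "geneB": [[200.0, 210.0], [205.0, 195.0], [200.0, None]],
    "geneC": [[50.0, 55.0], [None, None], [800.0, 820.0]],
}
clean = preprocess(raw)
print(sorted(clean))
```

In this toy input, geneB is nearly constant and is removed by the flat pattern filter, while geneC has one condition imputed before standardization.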

Normalization Techniques
• Global normalization
  – Divide channel values by the channel mean
• Control spots
  – Common spots in both channels
  – Housekeeping genes
  – Ratio of the intensities of the same gene in the two channels is used for correction
• Iterative linear regression
• Parametric nonlinear normalization
  – log(CY3/CY5) vs log(CY5)
  – Fitted log ratio – observed log ratio
• General nonlinear normalization
  – LOESS
  – Curve of log(R/G) against log(sqrt(R·G))
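Two of the quantities above are easy to compute directly: global normalization by the channel mean, and the per-spot values log(R/G) and log(sqrt(R·G)) on which a LOESS curve would be fitted. The log base 2, the function names, and the intensities are assumptions; the LOESS fit itself is not implemented here.

```python
import math
from statistics import mean


def global_normalize(channel):
    """Divide every intensity in a channel by the channel mean."""
    m = mean(channel)
    return [v / m for v in channel]


def ma_values(red, green):
    """Per-spot log ratio M = log2(R/G) and average A = log2(sqrt(R*G))."""
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [math.log2(math.sqrt(r * g)) for r, g in zip(red, green)]
    return m, a


# Made-up spot intensities for the two channels.
red = [1200.0, 300.0, 4500.0, 800.0]
green = [600.0, 310.0, 2100.0, 790.0]

red_n = global_normalize(red)
green_n = global_normalize(green)
m, a = ma_values(red_n, green_n)

print(m)
print(a)
```

A LOESS-style correction would fit a smooth curve of M against A and subtract the fitted value from each spot, removing intensity-dependent dye bias that a single global factor cannot capture.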

Classification
• Task: assign objects to classes (groups) on the basis of measurements made on the objects
• Unsupervised: classes are unknown; the aim is to discover them from the data (cluster analysis)
• Supervised: classes are predefined; the aim is to use a (training or learning) set of labeled objects to form a classifier for classification of future observations

Cluster Analysis
• Used to find groups of objects when the groups are not already known
• “Unsupervised learning”
• Associated with each object is a set of measurements (the feature vector)
• Aim is to identify groups of similar objects on the basis of the observed measurements
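Deciding which objects are similar requires a distance between feature vectors. The slides do not name a measure, so the two below (Euclidean distance and one minus the Pearson correlation) are illustrative choices, and the profiles are made up.

```python
import math
from statistics import mean


def euclidean(x, y):
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def correlation_distance(x, y):
    """One minus the Pearson correlation: 0 for identical shape, 2 for opposite."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)


g1 = [1.0, 2.0, 3.0, 4.0]
g2 = [2.0, 4.0, 6.0, 8.0]   # same shape as g1, twice the amplitude
g3 = [4.0, 3.0, 2.0, 1.0]   # opposite trend to g1

print(euclidean(g1, g2), correlation_distance(g1, g2))
print(euclidean(g1, g3), correlation_distance(g1, g3))
```

The two measures answer different questions: Euclidean distance is sensitive to amplitude, while correlation distance groups genes whose profiles rise and fall together regardless of scale.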

Unsupervised Learning
• Hierarchical clustering: merges two branches at a time until all variables (genes) are in one tree. [It does not answer the question “how many gene clusters are there?”]
• K-means clustering: assumes there are K clusters. [What if this assumption is incorrect?]
• Self-Organizing Maps (SOM)
  – Split all genes into similar sub-groups
  – Finds its own groups (machine learning)
• Principal component analysis
  – Every gene is a dimension (vector); find a single dimension that best represents the differences in the data
• Model-based clustering: the number of clusters is determined dynamically [could be one of the most promising methods]
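The hierarchical approach, merging two branches at a time until everything is in one tree, can be sketched in a few lines. This is a minimal, unoptimized version that assumes average linkage and Euclidean distance; the function names and the toy profiles are illustrative.

```python
import math


def euclidean(x, y):
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def average_linkage(profiles):
    """Agglomerative clustering; returns the merges in the order they occur.

    profiles maps a gene name to its feature vector. Each merge is a tuple
    (members_of_first_cluster, members_of_second_cluster, distance).
    """
    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average of all pairwise distances between the two clusters.
                pairs = [
                    euclidean(profiles[p], profiles[q])
                    for p in clusters[i]
                    for q in clusters[j]
                ]
                d = sum(pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Toy profiles: two tight pairs that lie far apart from each other.
profiles = {
    "g1": [0.0, 0.0],
    "g2": [0.0, 1.0],
    "g3": [5.0, 5.0],
    "g4": [5.0, 7.0],
}
merges = average_linkage(profiles)
for left, right, dist in merges:
    print(left, right, round(dist, 3))
```

The output is the full merge history rather than a fixed set of clusters, which reflects the caveat on the slide: the number of clusters only appears once the tree is cut at a chosen distance.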

[Figure: unclustered versus clustered expression data; average linkage hierarchical clustering, melanoma only]

Supervised Analysis
• Fisher’s linear discriminant analysis
• Quadratic discriminant analysis
• Logistic regression (a linear discriminant method)
• Neural networks
• Support vector machine
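As an illustration of one listed method, the sketch below fits a logistic regression classifier by batch gradient descent on a labeled training set and then classifies new observations. The learning rate, epoch count, function names, and toy data (two hypothetical marker genes per sample) are all assumptions.

```python
import math


def sigmoid(z):
    """Logistic function mapping a score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(samples, labels, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n_features = len(samples[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(samples, labels):
            p = sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)
            error = p - y
            for k in range(n_features):
                grad_w[k] += error * x[k]
            grad_b += error
        weights = [w - lr * g / len(samples) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(samples)
    return weights, bias


def predict(weights, bias, x):
    """Return class 1 if the predicted probability is at least 0.5."""
    score = sum(w * v for w, v in zip(weights, x)) + bias
    return 1 if sigmoid(score) >= 0.5 else 0


# Toy training set: expression of two hypothetical marker genes per sample,
# with label 1 for one class and 0 for the other.
train_x = [[2.1, 0.3], [1.8, 0.1], [2.4, 0.5], [0.2, 1.9], [0.4, 2.2], [0.1, 1.7]]
train_y = [1, 1, 1, 0, 0, 0]

weights, bias = train_logistic(train_x, train_y)
print(predict(weights, bias, [2.0, 0.2]))
print(predict(weights, bias, [0.3, 2.0]))
```

With real microarray data the number of genes far exceeds the number of samples, so a gene selection or regularization step would normally precede a fit like this.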

Example: Tumor Classification
• Reliable and precise classification is essential for successful cancer treatment
• Current methods for classifying human malignancies rely on a variety of morphological, clinical and molecular variables
• Uncertainties in diagnosis remain; it is likely that existing classes are heterogeneous
• Characterize molecular variations among tumors by monitoring gene expression (microarray)
• Hope: that microarrays will lead to more reliable tumor classification (and therefore more appropriate treatments and better outcomes)

Higher Level Microarray Data Analysis
• Clustering and pattern detection
• Data mining and visualization
• Controls and normalization of results
• Statistical validation
• Linkage between gene expression data and gene sequence/function/metabolic pathway databases
• Discovery of common sequences in co-regulated genes
• Meta-studies using data from multiple experiments

Thanks