Gene Expression Analysis Using Microarrays. Dr Mushtaq Ahmed, Technology Incubation Division, Persistent Systems Pvt. Ltd.


Gene Expression Analysis Using Microarrays
Dr Mushtaq Ahmed
Technology Incubation Division, Persistent Systems Private Ltd, Pune

Topics
1. Introduction
2. Data Storage and Exchange Standards
3. Analysis (Clustering)
4. Conclusion and References

1. Introduction
- Structure Activity Relationship
- Structural vs. Functional Genomics
- Principles of a Microarray Experiment
- Applications

Structure Activity Relationship
- Genes (finite), proteins, functions (infinite)
- Experimental setup
- Functional genomics, or confirmation work
- Structural genomics, or prediction work

Source: Yale Bioinformatics

Principles of a Microarray Experiment: Hybridization
1. Environment → Functions → Proteins → mRNA → cDNA
2. Incubating cells under different conditions results in up- or down-regulation of different sets of genes.
3. A microarray provides a medium for matching known and unknown DNA samples based on base-pairing rules, and automates the process of identifying the unknowns.
4. The set of expressed genes (at the mRNA stage) is isolated and identified using hybridization on a microarray chip.

HTS (High-Throughput Screening) Using Hybridization
- Target: cDNA (variables to be detected)
- Probe: oligos/cDNA (gene templates)
- Target + probe: hybridization on the microarray chip
- Samples
- Analysis of outcome: pathways, functional annotation, targets/leads, disease classification, physiological states

Timeline for drug discovery
- Discovery (5 yrs): 5000; gene expression study
- Pre-Clinical (1 yr): 50
- Clinical (6 yrs): 5
- Review (2 yrs): 1 marketed

2. Data Storage and Exchange Standards
- Raw and Processed Data
- Conceptual View of Database
- Example of ArrayExpress
- Issues
- Standardization for Exchange

Raw data: images
- Red (Cy5) dot: overexpressed or up-regulated
- Green (Cy3) dot: underexpressed or down-regulated
- Yellow dot: equally expressed
- Intensity: "absolute" level
- red/green: ratio of expression
  - 2: 2x overexpressed
  - 0.5: 2x underexpressed
- log2(red/green): "log ratio"
  - 1: 2x overexpressed
  - -1: 2x underexpressed
Figure: cDNA plotted microarray
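The ratio and log-ratio conventions on this slide can be checked with a few lines of Python. This is a minimal sketch, not part of the original deck; the function names `log_ratio` and `describe` are illustrative, and the intensities used are made-up numbers.

```python
import math

def log_ratio(red: float, green: float) -> float:
    """Return log2(red / green) for one spot (Cy5 over Cy3)."""
    if red <= 0 or green <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(red / green)

def describe(red: float, green: float) -> str:
    """Label a spot the way the slide does: red, green, or yellow."""
    m = log_ratio(red, green)
    if m > 0:
        return "overexpressed (red)"
    if m < 0:
        return "underexpressed (green)"
    return "equally expressed (yellow)"

# Illustrative intensities only: a ratio of 2 gives +1, a ratio of 0.5 gives -1.
print(log_ratio(200.0, 100.0), describe(200.0, 100.0))
print(log_ratio(100.0, 200.0), describe(100.0, 200.0))
```

The log scale makes over- and underexpression symmetric around zero, which is why the log ratio is usually preferred to the raw ratio for downstream analysis.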

Microarray Expression Value Representation
- Expression value types
- Primary images, composite images (e.g., green/red ratios)
- Primary spots, composite spots
- Primary measurements, derived values
Source: MGED

Gene expression database: a conceptual view
- Gene expression matrix: gene expression levels, indexed by genes and samples
- Gene annotations
- Sample annotations
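One way to picture this conceptual view is as a small container holding the matrix together with both sets of annotations. The sketch below is an assumption for illustration only: the class name `ExpressionDataset`, its fields, and the gene-by-sample orientation of the matrix are not taken from the slides or from any real database schema.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class ExpressionDataset:
    """Hypothetical container: matrix[i][j] is the level of genes[i] in samples[j]."""
    genes: list[str]
    samples: list[str]
    matrix: list[list[float]]
    gene_annotations: dict[str, dict[str, str]] = field(default_factory=dict)
    sample_annotations: dict[str, dict[str, str]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # The matrix must have one row per gene and one column per sample.
        if len(self.matrix) != len(self.genes):
            raise ValueError("one matrix row is required per gene")
        if any(len(row) != len(self.samples) for row in self.matrix):
            raise ValueError("one matrix column is required per sample")

    def level(self, gene: str, sample: str) -> float:
        """Expression level of one gene in one sample."""
        return self.matrix[self.genes.index(gene)][self.samples.index(sample)]

    def profile(self, gene: str) -> list[float]:
        """Expression profile of one gene across all samples."""
        return list(self.matrix[self.genes.index(gene)])

# Made-up values purely for illustration.
ds = ExpressionDataset(
    genes=["g1", "g2"],
    samples=["s1", "s2", "s3"],
    matrix=[[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]],
    sample_annotations={"s1": {"treatment": "control"}},
)
print(ds.level("g2", "s3"), ds.profile("g1"))
```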

DAG Representation of Biomaterials
- Figure nodes: sample source; a new state of sample source; primary samples 1 and 2; derived samples 1 and 2; extracts 1 and 2; labeled extracts 1 and 2; hybridization
- Figure edge labels: treatment, extraction, labeling
Source: MGED
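A directed acyclic graph of this kind can be sketched with Python's standard `graphlib` module (Python 3.9 or later). The node names come from the slide, but the edges below are an assumption: the transcript does not preserve which node connects to which, so the wiring shown is only one plausible reading, and the node "a new state of sample source" is left out because its position is unclear.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical edges: each biomaterial maps to the set of materials it was derived from.
derived_from = {
    "primary sample 1": {"sample source"},
    "primary sample 2": {"sample source"},
    "derived sample 1": {"primary sample 1"},   # treatment
    "derived sample 2": {"primary sample 2"},   # treatment
    "extract 1": {"derived sample 1"},          # extraction
    "extract 2": {"derived sample 2"},          # extraction
    "labeled extract 1": {"extract 1"},         # labeling
    "labeled extract 2": {"extract 2"},         # labeling
    "hybridization": {"labeled extract 1", "labeled extract 2"},
}

def processing_order(graph):
    """Return the nodes so that every material appears after its sources."""
    return list(TopologicalSorter(graph).static_order())

def ancestors(graph, node):
    """Return every material that the given node was ultimately derived from."""
    seen = set()
    stack = list(graph.get(node, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(graph.get(parent, ()))
    return seen

print(processing_order(derived_from))
print(ancestors(derived_from, "extract 1"))
```

Storing provenance this way lets a database answer questions such as "which source and treatments led to this hybridization" by walking the graph, and `TopologicalSorter` raises `CycleError` if the recorded history is not acyclic.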

ArrayExpress (MGED) Design
Source: MGED

ArrayExpress (MGED) Architecture
- Data submission and curation
- Curation pipeline (MAML data)
- Database
- Data warehouse
- Application server
- Web server
- Image server?
Source: MGED

Issues in Storage
- Size of data
  - Experiments: [...] genes, 320 cell types, 2000 compounds, 3 time points, 2 concentrations, 2 replicates
  - Data: 8 x 10^[...] data points; 1 x 10^15 bytes = 1 petabyte of data
- Others
  - Raw data are images
  - Lack of standard measurement units for gene expression
  - Lack of standards for sample annotation

Standardization
- MIAME (Minimum Information About a Microarray Experiment)
  - Experimental design, array design
  - Samples, hybridisations
  - Measurements, controls
- OMG-LSR-DTF
  - Life Sciences Research Domain Task Force
- Gene Expression RFP
  - Submitters: EBI (MAML), Rosetta (GEML), NetGenics
- Proposed MAGE-ML (MAML + GEML)
  - Annotations plus data; data stored as a set of external 2D matrices
  - Data format independent of any particular scanner or image analysis software
  - Samples and treatments can be represented as directed acyclic graphs
  - Concept of composite images and composite spots

3. Data Analysis (Clustering)
- Normalization
- Hierarchical Clustering
- Divisive Clustering
- Other Methods
- Visual Tools

Normalization
- Assumptions
  - Average expression ratio = 1
  - The amount of mRNA from both samples is the same
- Total intensity
  - Calculate a factor to rescale the intensities of all the genes so that total Cy3 = total Cy5
- Regression techniques
  - Adjust the intensities so that the slope of the scatter plot of Cy3 vs. Cy5 = 1
- Ratio statistics
  - Based on the expression of 'housekeeping genes', a probability density of the ratio is developed and used for normalization
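The total-intensity and regression approaches can be sketched in a few lines. This is an illustration under stated assumptions, not the deck's own procedure: the choice to rescale Cy5 rather than Cy3, the least-squares fit through the origin, and all function names are assumptions, and the intensities are made up.

```python
def total_intensity_factor(cy3, cy5):
    """Factor k such that sum(k * cy5) equals sum(cy3)."""
    return sum(cy3) / sum(cy5)

def normalize_total_intensity(cy3, cy5):
    """Rescale every Cy5 intensity so that total Cy5 matches total Cy3."""
    k = total_intensity_factor(cy3, cy5)
    return [v * k for v in cy5]

def slope_through_origin(x, y):
    """Least-squares slope of y against x for a line forced through the origin."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

def normalize_regression(cy3, cy5):
    """Divide Cy5 by the fitted slope so the Cy3 vs. Cy5 scatter plot has slope 1."""
    s = slope_through_origin(cy3, cy5)
    return [v / s for v in cy5]

# Made-up intensities in which the Cy5 channel is uniformly twice as bright.
cy3 = [100.0, 200.0, 300.0]
cy5 = [200.0, 400.0, 600.0]
print(normalize_total_intensity(cy3, cy5))
print(normalize_regression(cy3, cy5))
```

Both methods rest on the assumptions listed on the slide: if most genes really are differentially expressed, forcing the totals or the slope to match would remove genuine signal.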

Clustering
- Hierarchical
  - Single, complete and average linkage
- Divisive (partitioning)
  - K-means
  - Self-Organizing Maps (SOM)
- Others
  - Principal Component Analysis (PCA)
  - Supervised methods

Hierarchical clustering
- Distance metrics or similarity measures
  - Euclidean, Pearson, distance of slopes, etc.
- Cost functions
  - Single linkage: minimum distance between any two members (one from each of the two clusters)
  - Complete linkage: maximum distance between any two members (one from each of the two clusters)
  - Average linkage: UPGMA, WPGMA, within groups
  - Ward's method: joins the pair of clusters that produces the smallest possible increase in the sum of squared errors
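The distance measures and linkage rules above can be combined into a small bottom-up clustering routine. The following is a naive sketch for illustration, not an efficient or definitive implementation: function names are assumptions, "average" here means the plain mean of all pairwise distances (UPGMA), and Ward's method and WPGMA are not covered.

```python
import math
from itertools import combinations

def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.dist(a, b)

def pearson_distance(a, b):
    """1 minus the Pearson correlation, so correlated profiles are close."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return 1.0 - cov / math.sqrt(va * vb)

# Linkage rules reduce the list of member-to-member distances to one number.
LINKAGE = {
    "single": min,
    "complete": max,
    "average": lambda d: sum(d) / len(d),
}

def cluster_distance(c1, c2, points, metric, linkage):
    dists = [metric(points[i], points[j]) for i in c1 for j in c2]
    return LINKAGE[linkage](dists)

def agglomerate(points, n_clusters, metric=euclidean, linkage="average"):
    """Merge the two closest clusters repeatedly until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: cluster_distance(
                clusters[p[0]], clusters[p[1]], points, metric, linkage
            ),
        )
        clusters[a] = clusters[a] + clusters[b]  # a < b, so deleting b is safe
        del clusters[b]
    return clusters

# Made-up profiles forming two obvious groups.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
print(agglomerate(points, 2, linkage="single"))
```

Recording the order of merges instead of stopping at `n_clusters` would give the full dendrogram; the choice of metric and linkage can change the result, which is the point made in the conclusions of this deck.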

Divisive (partitioning) clustering
- K-means
  - 'k' random (or specified) points are used to create clusters; the average vectors of the clusters are then used iteratively
  - Knowledge of the probable number of clusters (k) is needed
  - Used in combination with PCA and hierarchical clustering
- Self-Organizing Maps
  - User-defined geometric configurations serve as partitions
  - Random vectors are generated for each partition and trained until convergence (ANN based)
- Visualization methods
  - Help in cluster visualization: scatter plot, web plot, histogram
  - May help in the clustering itself, e.g., the SuperGrouper utility of maxdView
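The K-means loop described above (start from k random or specified points, then alternate between assigning profiles and averaging) can be sketched as follows. This is a minimal illustration with assumed names and made-up data; the `initial` and `seed` parameters are conveniences added for this sketch.

```python
import math
import random

def kmeans(points, k, initial=None, max_iter=100, seed=0):
    """Return (labels, centroids) after iterating assignment and averaging."""
    if initial is None:
        # 'k' random points drawn from the data serve as starting centroids.
        centroids = [list(p) for p in random.Random(seed).sample(list(points), k)]
    else:
        if len(initial) != k:
            raise ValueError("exactly k initial centroids are required")
        centroids = [list(p) for p in initial]

    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the loop has converged
        labels = new_labels
        # Update step: each centroid becomes the average vector of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Made-up profiles with two well-separated groups and specified starting points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centroids = kmeans(points, 2, initial=[(0.0, 0.0), (10.0, 10.0)])
print(labels, centroids)
```

Because the outcome depends on the starting points and on k, a common practice matching the slide's remark is to pick k and the initial centroids from a prior hierarchical clustering or PCA view.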

Other Clustering Methods
- PCA (Principal Component Analysis)
  - Closely related to SVD (Singular Value Decomposition), which is commonly used to compute it
  - Reduces the dimensionality of the gene expression space
  - Finds the best view that helps separate the data into groups
- Supervised methods
  - SVM (Support Vector Machine)
  - Previous knowledge of which genes are expected to cluster is used for training
  - A binary classifier uses a 'feature space' and a 'kernel function' to define an optimal 'hyperplane'
  - Also used for classification of samples: 'expression fingerprinting' for disease classification
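To make the dimensionality-reduction idea concrete, the sketch below finds the first principal component of a small data set. It is an illustration only: instead of an SVD routine it uses power iteration on the covariance matrix, a simpler swapped-in technique that needs nothing beyond the standard library, and the function name and data are assumptions.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return (direction, scores) for the first principal component.

    Uses power iteration on the sample covariance matrix. Needs at least two rows.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    # Arbitrary starting vector; the iteration fails if it happens to be
    # orthogonal to the leading component, which the norm check below reports.
    v = [1.0 / (j + 1) for j in range(d)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            raise ValueError("no variance along the starting direction")
        v = [x / norm for x in w]
    # Scores are the coordinates of each row along the component.
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores

# Made-up two-dimensional data lying exactly on the line y = 2x.
rows = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
direction, scores = first_principal_component(rows)
print(direction, scores)
```

Projecting each profile onto the first one or two components gives the low-dimensional "best view" mentioned on the slide, in which groups of similar profiles are often easier to see.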

4. Conclusion and References
- Microarrays make HTS with hybridization possible
- There is no single standard unit for measuring expression levels
- Handling and interpretation are not yet exact
- Assumption: elements in a cluster must share some commonality
- Classification depends on the methods used for clustering and normalization, and on the distance function
- There is no "correct" way of classification; "biological understanding" is the ultimate guide
- Provides an extension to existing knowledge (e.g., classifying a novel gene into a known pathway)

Software
- Databases
  - Public repositories: GEO (NCBI), GeneX (NCGR), ArrayExpress (EBI)
  - In-house databases: Stanford, MIT, University of Pennsylvania
  - Organism-specific databases: Mouse Genome Informatics Database
  - Proprietary databases: Gene Logic, NCI, Synergy (NetGenics), Genomics Knowledge Platform (Incyte)
- Analysis tools
  - Public domain: maxdView (University of Manchester); CyberT and RCluster interfaces of GeneX
  - Proprietary: Spotfire, Xpression NTI (InforMax Inc.)

References
- Microarray Gene Expression Database Group
- National Center for Genome Resources
- University of Manchester, Bioinformatics Group
- Nature Reviews Genetics