Persistent Systems Pvt. Ltd. Gene Expression Analysis Using Microarrays Dr Mushtaq Ahmed Technology Incubation Division Persistent.


1 Persistent Systems Pvt. Ltd. http://www.persistent.co.in
Gene Expression Analysis Using Microarrays
Dr Mushtaq Ahmed
Technology Incubation Division, Persistent Systems Private Ltd, Pune

2 Topics
1. Introduction
2. Data Storage and Exchange Standards
3. Analysis (Clustering)
4. Conclusion and References

3 1. Introduction
Structure Activity Relationship
Structural vs. Functional Genomics
Principles of a Microarray Experiment
Applications

4 Structure Activity Relationship
[Diagram: Genes (finite), Proteins and Functions (infinite), linked through an experimental setup]
Functional genomics, or confirmation work
Structural genomics, or prediction work

5 [Figure] Source: Yale Bioinformatics

6 Principles of a Microarray Experiment: Hybridization
1. Environment → Functions → Proteins → mRNA → cDNA
2. Incubating cells under different conditions results in the up- or down-regulation of different sets of genes.
3. A microarray provides a medium for matching known and unknown DNA samples by base-pairing rules, and it automates the identification of the unknowns.
4. The set of expressed genes (at the mRNA stage) is isolated and identified by hybridization on a microarray chip.
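The base-pairing rule in point 3 can be sketched as follows. This sketch assumes exact Watson-Crick matching only. Real hybridization tolerates some mismatches and depends on temperature and salt. The probe sequences, gene names and function names are hypothetical.

```python
# Watson-Crick base-pairing: A pairs with T, C pairs with G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the strand (read 5'->3') that base-pairs with `seq`."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def hybridizes(probe: str, target: str) -> bool:
    """True if `target` contains a stretch exactly complementary to `probe`.

    Simplification: real hybridization tolerates some mismatches."""
    return reverse_complement(probe) in target.upper()


def identify(probes: dict[str, str], target: str) -> list[str]:
    """Names of the (hypothetical) spotted probes that the labeled target binds."""
    return [name for name, seq in probes.items() if hybridizes(seq, target)]


# Hypothetical probes spotted on the chip and one labeled cDNA target.
probes = {"geneA": "ATGCGT", "geneB": "TTTAAA"}
target = "GGACGCATGG"
print(identify(probes, target))
```

Each spot on the chip acts as a known template. The unknown is identified by which spots it sticks to.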

7 HTS (High-Throughput Screening) Using Hybridization
Target: cDNA (the variables to be detected)
Probe: oligos/cDNA (gene templates)
[Diagram: samples supply the target; target and probe hybridize on the microarray chip; analysis of the outcome yields pathways, functional annotation, targets/leads, disease classification and physiological states]

8 Timeline for Drug Discovery
Discovery (5 yrs): 5000 (gene expression study)
Pre-clinical (1 yr): 50
Clinical (6 yrs): 5
Review (2 yrs): 1 marketed

9 2. Data Storage and Exchange Standards
Raw and Processed Data
Conceptual View of Database
Example of ArrayExpress
Issues
Standardization for Exchange

10 Raw Data: Images
Red (Cy5) dot: overexpressed or up-regulated
Green (Cy3) dot: underexpressed or down-regulated
Yellow dot: equally expressed
Intensity: "absolute" level
red/green: ratio of expression (2 = 2x overexpressed; 0.5 = 2x underexpressed)
log2(red/green): "log ratio" (1 = 2x overexpressed; -1 = 2x underexpressed)
[Figure: cDNA plotted microarray]
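The ratio and log-ratio conventions above can be computed per spot. This is a sketch only: the 2-fold cutoff used to call a spot red, green or yellow is an assumed threshold, and the intensities are made-up values. The log ratio makes 2x up (+1) and 2x down (-1) symmetric, which the raw ratios (2 and 0.5) are not.

```python
import math


def log_ratio(cy5: float, cy3: float) -> float:
    """log2(red/green): +1 means 2x overexpressed, -1 means 2x underexpressed."""
    return math.log2(cy5 / cy3)


def call_spot(cy5: float, cy3: float, fold: float = 2.0) -> str:
    """Classify a spot by its red/green ratio; `fold` is an assumed cutoff."""
    lr = log_ratio(cy5, cy3)
    cutoff = math.log2(fold)
    if lr >= cutoff:
        return "red (up-regulated)"
    if lr <= -cutoff:
        return "green (down-regulated)"
    return "yellow (roughly equal)"


# Made-up (Cy5, Cy3) spot intensities.
for cy5, cy3 in [(2000.0, 1000.0), (500.0, 1000.0), (1100.0, 1000.0)]:
    print(cy5 / cy3, log_ratio(cy5, cy3), call_spot(cy5, cy3))
```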

11 Microarray Expression Value Representation
Expression value types:
– primary images
– composite images, e.g., green/red ratios
– primary spots
– composite spots
– primary measurements
– derived values
Source: MGED

12 Gene Expression Database: A Conceptual View
[Diagram: a gene expression matrix with genes as rows, samples as columns and gene expression levels in the cells; gene annotations attach to the genes and sample annotations to the samples]
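One way to hold this conceptual view in memory is a genes × samples matrix with annotation dictionaries keyed by gene and sample identifiers. This is a hypothetical sketch. It is not the schema of ArrayExpress or any other database, and the class, field names and annotation values are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class ExpressionMatrix:
    """Genes x samples matrix of expression levels, plus gene and sample annotations."""
    genes: list[str]            # row identifiers
    samples: list[str]          # column identifiers
    values: list[list[float]]   # values[i][j] = expression level of gene i in sample j
    gene_annotations: dict[str, dict[str, str]] = field(default_factory=dict)
    sample_annotations: dict[str, dict[str, str]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # The matrix must have exactly one row per gene and one column per sample.
        if len(self.values) != len(self.genes):
            raise ValueError("expected one row per gene")
        if any(len(row) != len(self.samples) for row in self.values):
            raise ValueError("expected one column per sample")

    def level(self, gene: str, sample: str) -> float:
        """A single cell of the matrix."""
        return self.values[self.genes.index(gene)][self.samples.index(sample)]

    def profile(self, gene: str) -> list[float]:
        """One gene's expression profile across all samples (a matrix row)."""
        return list(self.values[self.genes.index(gene)])

    def samples_where(self, key: str, value: str) -> list[str]:
        """Sample ids whose annotation `key` equals `value`."""
        return [s for s in self.samples
                if self.sample_annotations.get(s, {}).get(key) == value]


m = ExpressionMatrix(
    genes=["g1", "g2"],
    samples=["s1", "s2"],
    values=[[1.0, 2.0], [0.5, 4.0]],
    gene_annotations={"g1": {"pathway": "glycolysis"}},
    sample_annotations={"s1": {"tissue": "liver"}, "s2": {"tissue": "brain"}},
)
print(m.level("g2", "s2"), m.samples_where("tissue", "liver"))
```

Keeping the annotations separate from the numeric matrix means queries such as "all liver samples" never touch the expression values.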


14 DAG Representation of Biomaterials
[Diagram nodes: sample source; primary samples 1 and 2; derived samples 1 and 2 (a new state of the sample source); extracts 1 and 2; labeled extracts 1 and 2; hybridization. Edges (protocols): treatment, extraction, labeling]
Source: MGED
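The following sketch shows why a DAG, rather than a tree, fits here. In a two-channel experiment two labeled extracts converge on one hybridization, so a node can have several parents. The edge list is a hypothetical reading of the diagram, and the helper names are illustrative. `is_acyclic` checks the graph with Kahn's topological sort. `ancestors` traces every biomaterial a node descends from.

```python
from collections import defaultdict, deque

# Each edge: (parent biomaterial, child biomaterial, protocol applied).
Edge = tuple[str, str, str]


def is_acyclic(edges: list[Edge]) -> bool:
    """Kahn's topological sort: the graph is a DAG iff every node gets visited."""
    children = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for parent, child, _ in edges:
        children[parent].append(child)
        indegree[child] += 1
        nodes.update((parent, child))
    queue = deque(n for n in nodes if indegree[n] == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    return visited == len(nodes)


def ancestors(edges: list[Edge], node: str) -> set[str]:
    """Every biomaterial that `node` was (directly or indirectly) derived from."""
    parents = defaultdict(list)
    for parent, child, _ in edges:
        parents[child].append(parent)
    found, stack = set(), [node]
    while stack:
        for p in parents[stack.pop()]:
            if p not in found:
                found.add(p)
                stack.append(p)
    return found


# Hypothetical two-channel experiment loosely following the slide's diagram.
edges: list[Edge] = [
    ("sample source", "primary sample 1", "treatment"),
    ("primary sample 1", "derived sample 1", "treatment"),
    ("derived sample 1", "extract 1", "extraction"),
    ("extract 1", "labeled extract 1", "labeling"),
    ("sample source", "primary sample 2", "treatment"),
    ("primary sample 2", "extract 2", "extraction"),
    ("extract 2", "labeled extract 2", "labeling"),
    ("labeled extract 1", "hybridization", "hybridization"),
    ("labeled extract 2", "hybridization", "hybridization"),
]
print(is_acyclic(edges), sorted(ancestors(edges, "hybridization")))
```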

15 ArrayExpress (MGED) Design
[Figure]
Source: MGED

16 ArrayExpress (MGED) Architecture
[Diagram components: data submission and curation (MAML data passing through the ArrayExpress curation pipeline); database; data warehouse; application server; web server; image server (?)]
Source: MGED

17 Issues in Storage
Size of data
– Experiments: 100,000 genes, 320 cell types, 2000 compounds, 3 time points, 2 concentrations, 2 replicates
– Data: about 8 × 10^11 data points; about 1 × 10^15 bytes = 1 petabyte
Others
– Raw data are images
– Lack of standard measurement units for gene expression
– Lack of standards for sample annotation
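The data-point figure follows from multiplying the experiment dimensions: 7.68 × 10^11, which rounds to the 8 × 10^11 quoted. The bytes-per-point value below is only what the 1 petabyte estimate implies. The slide does not say how that byte estimate was derived.

```python
# Experiment dimensions quoted on the slide.
genes = 100_000
cell_types = 320
compounds = 2_000
time_points = 3
concentrations = 2
replicates = 2

data_points = genes * cell_types * compounds * time_points * concentrations * replicates
print(f"{data_points:,} data points ({data_points:.1e})")

# Bytes per data point implied by the slide's 1 petabyte (10**15 bytes) estimate.
bytes_per_point = 10**15 / data_points
print(f"about {bytes_per_point:.0f} bytes per data point")
```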

18 Standardization
MIAME (Minimum Information About a Microarray Experiment)
– Experimental design, array design
– Samples, hybridisations
– Measurements, controls
OMG (Object Management Group) LSR-DTF: Life Sciences Research Domain Task Force
– Gene Expression RFP; submitters: EBI (MAML), Rosetta (GEML), NetGenics
Proposed MAGE-ML (MAML + GEML)
– Annotations + data; data stored as a set of external 2D matrices
– Data format independent of any particular scanner or image analysis software
– Samples and treatments can be represented as directed acyclic graphs (DAGs)
– Concept of composite images and composite spots

19 3. Data Analysis (Clustering)
Normalization
Hierarchical Clustering
Divisive Clustering
Other Methods
Visual Tools

20 Normalization
Assumptions
– The average expression ratio is 1
– The amount of mRNA from the two samples is the same
Total intensity
– Calculate a factor that rescales the intensities of all the genes so that total Cy3 = total Cy5
Regression techniques
– Adjust the intensities so that the slope of the Cy3 vs. Cy5 scatter plot is 1
Ratio statistics
– A probability density for the expression ratio is derived from the expression of 'housekeeping genes' and used for normalization
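The first two strategies can be sketched as follows, assuming Cy5 is rescaled to match Cy3. The regression variant fits a least-squares line through the origin. That is one possible choice; fits with an intercept or local regression are also used. The housekeeping-gene ratio-statistics method is not shown, and the intensities are made up.

```python
def total_intensity_normalize(cy5, cy3):
    """Rescale Cy5 so that total Cy5 equals total Cy3; returns (scaled_cy5, factor)."""
    factor = sum(cy3) / sum(cy5)
    return [x * factor for x in cy5], factor


def regression_normalize(cy5, cy3):
    """Fit Cy5 = slope * Cy3 through the origin (least squares) and divide Cy5
    by the slope, so that the Cy3-vs-Cy5 scatter plot has slope 1."""
    slope = sum(x * y for x, y in zip(cy3, cy5)) / sum(x * x for x in cy3)
    return [y / slope for y in cy5], slope


# Made-up intensities where the Cy5 channel is uniformly twice as bright.
cy3 = [100.0, 200.0, 400.0]
cy5 = [200.0, 400.0, 800.0]
print(total_intensity_normalize(cy5, cy3))
print(regression_normalize(cy5, cy3))
```

Both corrections rely on the assumption stated on the slide: most genes are not differentially expressed, so any global imbalance between channels is treated as technical dye or labeling bias.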


22 Clustering
Hierarchical
– Single, complete and average linkage
Divisive
– K-means
– Self-Organizing Maps (SOM)
Others
– Principal Component Analysis (PCA)
– Supervised methods

23 Hierarchical Clustering
Distance metrics or similarity measures
– Euclidean, Pearson correlation, distance of slopes, etc.
Cost functions (linkage)
– Single linkage: minimum distance between any two members (one from each of the two clusters)
– Complete linkage: maximum distance between any two members (one from each of the two clusters)
– Average linkage: UPGMA, WPGMA, within groups
– Ward's method: join the two clusters whose merger gives the smallest possible increase in the sum of squared errors
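Below is a minimal agglomerative (bottom-up) clustering sketch. It shows where the distance metric and the linkage cost function plug in. Only single, complete and average (UPGMA) linkage are covered; WPGMA and Ward's method are omitted. The naive pairwise search is meant for illustration, not genome-scale data. The gene profiles are made up. Pearson distance is undefined for a flat, zero-variance profile.

```python
import math
from itertools import combinations


def euclidean(a, b):
    return math.dist(a, b)


def pearson_distance(a, b):
    """1 - Pearson correlation: 0 for identical shapes, 2 for opposite trends."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    norm = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return 1.0 - cov / norm


# Cost functions: how the distance between two clusters is derived from
# the distances between their members (one from each cluster).
LINKAGE = {
    "single": min,                            # closest pair
    "complete": max,                          # farthest pair
    "average": lambda ds: sum(ds) / len(ds),  # UPGMA: mean over all pairs
}


def agglomerate(profiles, metric=euclidean, linkage="average"):
    """Repeatedly merge the two closest clusters; return the merge history."""
    link = LINKAGE[linkage]

    def cluster_distance(pair):
        a, b = pair
        return link([metric(profiles[x], profiles[y]) for x in a for y in b])

    clusters = [frozenset([name]) for name in profiles]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=cluster_distance)
        merges.append((sorted(a), sorted(b), cluster_distance((a, b))))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges


profiles = {
    "g1": [1.0, 2.0, 3.0],
    "g2": [1.1, 2.1, 3.2],
    "g3": [5.0, 5.0, 1.0],
    "g4": [5.3, 4.9, 1.2],
}
for left, right, dist in agglomerate(profiles, linkage="average"):
    print(left, "+", right, f"at {dist:.3f}")
```

The merge history is exactly what a dendrogram draws: each merge is a branch point at the height given by its distance.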


25 Divisive Clustering
K-means
– k random (or specified) points are used to seed the clusters; the clusters' average vectors are then used iteratively
– Requires knowledge of the probable number of clusters (k)
– Used in combination with PCA and hierarchical clustering
Self-Organizing Maps
– A user-defined geometric configuration provides the partitions
– A random vector is generated for each partition and trained until convergence (ANN-based)
Visualization methods
– Help in visualizing clusters: scatter plot, web plot, histogram
– May help in the clustering itself, e.g., the SuperGrouper utility of maxdView
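The k-means loop described above can be sketched as Lloyd's algorithm. The starting centroids are either k randomly sampled profiles or user-specified points, passed through `init` (an illustrative parameter name). A cluster that ends up empty keeps its previous centroid, which is one of several possible policies. SOMs are not shown, and the points are made up.

```python
import math
import random


def kmeans(points, k, init=None, iterations=100, seed=0):
    """Lloyd's k-means. `init`, if given, must hold k starting points.
    Returns (centroids, labels)."""
    if init is None:
        # k randomly chosen profiles serve as the starting centroids.
        centroids = [list(p) for p in random.Random(seed).sample(points, k)]
    else:
        centroids = [list(c) for c in init]
    labels = None
    for _ in range(iterations):
        # Assignment step: each profile joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Update step: each centroid becomes the average vector of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


points = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]]
centroids, labels = kmeans(points, 2, init=[[1.0, 1.0], [8.0, 8.0]])
print(labels, centroids)
```

With random seeding the result can depend on the starting points. This is one reason k-means is often seeded or checked with PCA or a hierarchical clustering, as the slide notes.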


27 Other Clustering Methods
PCA (Principal Component Analysis)
– Closely related to, and usually computed by, SVD (Singular Value Decomposition)
– Reduces the dimensionality of the gene expression space
– Projects the data onto the directions of greatest variance, which often gives the best view for separating the data into groups
Supervised methods
– SVM (Support Vector Machine)
– Prior knowledge of which genes are expected to cluster together is used for training
– A binary classifier that uses a 'feature space' and a 'kernel function' to define an optimal (maximum-margin) 'hyperplane'
– Also used to classify samples: 'expression fingerprinting' for disease classification
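Here is a sketch of PCA computed through SVD with NumPy, illustrating the relationship mentioned above. After each column is centred, the right singular vectors are the principal axes. The squared singular values are proportional to the variance each axis explains. Treating genes as rows and samples as columns is an assumption; transpose the matrix to project samples instead. The matrix is made up.

```python
import numpy as np


def pca(X, n_components=2):
    """PCA via SVD. Rows of X are observations (here genes), columns are variables (samples)."""
    centered = X - X.mean(axis=0)                     # centre each column
    U, S, Vt = np.linalg.svd(centered, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]   # coordinates in reduced space
    explained = S**2 / np.sum(S**2)                   # variance fraction per component
    return scores, Vt[:n_components], explained[:n_components]


# Made-up expression matrix: 6 genes x 4 samples.
X = np.array([
    [2.0, 2.1, 0.5, 0.4],
    [1.9, 2.2, 0.6, 0.5],
    [0.4, 0.5, 2.0, 2.2],
    [0.5, 0.3, 2.1, 1.9],
    [1.0, 1.1, 1.0, 1.2],
    [1.2, 0.9, 1.1, 1.0],
])
scores, axes, explained = pca(X)
print(np.round(scores, 2))
print(np.round(explained, 3))
```

Plotting the first two columns of `scores` gives the two-dimensional "view" in which groups of genes can be inspected or handed to k-means.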


29 4. Conclusion and References
Microarrays make HTS with hybridization possible
There is no single standard unit for measuring expression levels
Handling and interpretation are not yet exact
Assumption: elements in a cluster must share some commonality
Classification depends on the methods used for clustering and normalization and on the distance function
There is no "correct" way of classification; "biological understanding" is the ultimate guide
Provides an extension to existing knowledge (e.g., classifying a novel gene into a known pathway)
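The point that classification depends on the distance function can be seen with three made-up profiles. Euclidean distance pairs gene A with a gene of similar magnitude but the opposite trend. Pearson distance pairs it with a gene of the same shape at ten times the level. The gene names and values are hypothetical.

```python
import math


def euclidean(a, b):
    return math.dist(a, b)


def pearson_distance(a, b):
    """1 - Pearson correlation (compares profile shape, ignores scale)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    norm = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return 1.0 - cov / norm


def nearest(query, profiles, metric):
    """Closest other gene to `query` under the given distance function."""
    others = [name for name in profiles if name != query]
    return min(others, key=lambda name: metric(profiles[query], profiles[name]))


profiles = {
    "geneA": [1.0, 2.0, 3.0],
    "geneB": [10.0, 20.0, 30.0],  # same shape as geneA, ten times the level
    "geneC": [3.0, 2.0, 1.0],     # similar level, opposite trend
}
for metric in (euclidean, pearson_distance):
    print(metric.__name__, "->", nearest("geneA", profiles, metric))
```

Neither answer is wrong. Which one is useful depends on whether absolute level or co-regulation matters for the biological question.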

30 Software
Databases
– Public repositories: GEO (NCBI), GeneX (NCGR), ArrayExpress (EBI)
– In-house databases: Stanford, MIT, University of Pennsylvania
– Organism-specific databases: Mouse Genome Informatics Database
– Proprietary databases: Gene Logic, NCI, Synergy (NetGenics), Genomics Knowledge Platform (Incyte)
Analysis tools
– Public domain: maxdView (University of Manchester), CyberT and RCluster interfaces of GeneX
– Proprietary: Spotfire, Xpression NTI (InforMax)

31 References
Microarray Gene Expression Database Group: http://www.mged.org
National Center for Genome Resources: http://genex.ncgr.org
University of Manchester, Bioinformatics Group: http://bioinf.man.ac.uk/microarray/resources.html
Nature Reviews Genetics: http://www.nature.com/nrg/

