March 4, 2010
Visualization Approaches for Gene Expression Data
Matt Hibbs, Assistant Professor, The Jackson Laboratory

Transcriptomics & Gene Expression
– Simultaneous measurement of transcription for the entire genome
– Useful for a broad range of biological questions
[Figure: DNA → mRNA (transcription) → proteins (translation at the ribosome)]

Outline
– Technologies & specific concerns
  – cDNA microarrays (2-color & 1-color arrays)
  – RNA-seq
– Normalization visualizations
– Full data displays
– Dimensionality reduction
– Sequence-order displays
– Comparative visualization
– Future directions

Technology: 2-color cDNA Microarrays
– Spot slide with known sequences
– Add mRNA to slide for hybridization
– Scan hybridized array
[Figure: reference mRNA labeled with green dye and test mRNA labeled with red dye are hybridized to spots A–D; example spot values A 1.5, B 0.8, C −1.2, D 0.1]

Technology: 2-color cDNA Microarrays

Technology: RNA-seq
[Figure: image from WikiMedia]

Normalization: MA-plot
– Need to account for intensity bias between channels (red vs. green, or between multiple 1-color arrays)
– The MA-plot (also called an RI-plot) shows the relationship between log ratio and intensity
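
The MA-plot axes can be computed directly from the two channel intensities, using the usual definitions M = log2(R/G) and A = ½·log2(R·G). The sketch below is a minimal illustration, assuming positive, background-corrected intensities. Its normalization step is a plain global median shift, swapped in as a simpler stand-in for the intensity-dependent (loess) correction that the MA-plot is usually used to motivate; `ma_values` and `median_center` are hypothetical helper names.

```python
import math
from statistics import median

def ma_values(red, green):
    """Return (M, A) lists for paired red/green spot intensities.

    M = log2(R / G)          (log ratio)
    A = 0.5 * log2(R * G)    (average log intensity)
    """
    if len(red) != len(green):
        raise ValueError("red and green must have the same length")
    m_vals, a_vals = [], []
    for r, g in zip(red, green):
        if r <= 0 or g <= 0:
            raise ValueError("intensities must be positive to take logs")
        m_vals.append(math.log2(r) - math.log2(g))
        a_vals.append(0.5 * (math.log2(r) + math.log2(g)))
    return m_vals, a_vals

def median_center(m_vals):
    """Global normalization (assumed simplification): shift M so its median is 0.

    A loess normalization would instead subtract a curve fitted to M as a
    function of A, removing intensity-dependent bias.
    """
    shift = median(m_vals)
    return [m - shift for m in m_vals]
```

Plotting A on the x-axis against M on the y-axis gives the MA-plot; a cloud that bends away from M = 0 at low or high A is the intensity bias the slide refers to.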

Normalization: Box-Whisker & Quantile
– Quantile normalization is often used to adjust for between-chip variance
– Box-whisker plots are typically used to visualize the process
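
Quantile normalization forces every chip onto a common reference distribution: the k-th smallest value on each chip is replaced by the mean of the k-th smallest values across all chips. Afterwards every chip has the same set of values, which is why box-whisker plots of the normalized chips line up. This is a minimal sketch with hypothetical names, assuming a chips-by-probes list of lists with no missing values; ties are broken by probe position rather than averaged.

```python
from statistics import mean

def quantile_normalize(arrays):
    """Quantile-normalize a list of equal-length chips (one list per chip)."""
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all chips must have the same number of probes")
    # Reference distribution: mean of the k-th smallest value across chips
    sorted_chips = [sorted(a) for a in arrays]
    reference = [mean(chip[k] for chip in sorted_chips) for k in range(n)]
    normalized = []
    for a in arrays:
        # Probe indices ordered by value within this chip (stable for ties)
        order = sorted(range(n), key=lambda i: a[i])
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]
        normalized.append(out)
    return normalized
```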

Full Data Displays
Techniques to show all of the data at once
– Heat maps
  – Display numerical values as colors
  – Good for seeing all data intuitively
  – Require clustering to reveal patterns
– Parallel coordinates
  – Line plots of high-dimensional data
  – Easy to see and select trends or patterns
  – Especially good for course data (time courses, drug treatments, etc.)

Heat Maps
[Figure: expression data are clustered, then rasterized into colored cells; color scale runs from −3 (under-expressed) through 0 to +3 (over-expressed)]
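
The rasterize step amounts to mapping each value to a color with saturation at the ends of the scale (±3 on the slide). The color scheme below is an assumption, not taken from the slide: the common 2-color-array convention of black at 0, red for over-expression and green for under-expression, with missing values drawn grey. The function names are hypothetical, and rows are assumed to be already ordered by clustering.

```python
def expression_to_rgb(value, limit=3.0):
    """Map a log-ratio to an (r, g, b) tuple with 0-255 channels.

    Assumed scheme: 0 -> black, positive (over-expressed) -> red,
    negative (under-expressed) -> green, saturating at +/- limit.
    Missing values (None) are drawn grey.
    """
    if value is None:
        return (128, 128, 128)
    # Clamp to [-limit, limit], then scale the magnitude to 0..255
    clipped = max(-limit, min(limit, value))
    level = round(255 * abs(clipped) / limit)
    if clipped >= 0:
        return (level, 0, 0)
    return (0, level, 0)

def rasterize(matrix, limit=3.0):
    """Turn a (clustered) genes x conditions matrix into rows of RGB cells."""
    return [[expression_to_rgb(v, limit) for v in row] for row in matrix]
```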

Heat Maps: Stats
– Clustering is important for seeing patterns
  – Hierarchical, k-means, SOM (self-organizing maps), etc.
  – The choice of distance metric matters in addition to the choice of method
– Match the visualization mapping to the statistics used for analysis
  – Coloration based on the actual numbers is appropriate for Euclidean distance measures
  – Centered or normalized measures should use corresponding colorings
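
One way to make the coloring match a centered or normalized measure is to transform each gene's row the same way before mapping it to colors. Pearson correlation ignores each gene's mean and scale (it equals the average product of the two z-scored profiles), so z-scored rows show what a Pearson-based clustering actually compared; mean-centered rows suit uncentered-scale comparisons. A minimal sketch with hypothetical names, assuming complete rows and population standard deviation:

```python
from statistics import mean, pstdev

def center_rows(matrix):
    """Subtract each gene's mean so colors show deviation from its own average."""
    out = []
    for row in matrix:
        mu = mean(row)
        out.append([v - mu for v in row])
    return out

def zscore_rows(matrix):
    """Center each gene and scale it to unit (population) standard deviation.

    Rows with zero variance are returned as all zeros.
    """
    out = []
    for row in matrix:
        mu, sd = mean(row), pstdev(row)
        out.append([0.0 if sd == 0 else (v - mu) / sd for v in row])
    return out
```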

Heat Maps: Distance Metrics
– Euclidean distance
– Pearson correlation
– Spearman correlation
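
The three metrics differ in what they treat as "similar": Euclidean distance compares raw values, Pearson correlation compares shape regardless of offset and scale, and Spearman correlation is Pearson correlation computed on ranks, so any monotone relationship scores 1. A minimal sketch with hypothetical names; turning a correlation r into a distance as 1 − r is a common convention and is assumed here.

```python
import math

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman(x, y):
    return pearson(ranks(x), ranks(y))

def correlation_distance(r):
    """Assumed convention: 0 for identical shape, 2 for opposite shape."""
    return 1 - r
```

For example, [1, 2, 3] and [10, 20, 30] are far apart by Euclidean distance but have Pearson correlation 1, which is why the heat-map coloring should follow whichever metric drove the clustering.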

Heat Maps: Stats
Data clustered using a rank-based statistic
[Figure: color scale from lowest value to highest value]

Heat Maps: Overview + Detail
[Figure: Java TreeView, Saldanha et al.; data from Spellman et al., 1998]

Parallel Coordinates
– View expression vectors as lines
  – X-axis = conditions
  – Y-axis = value
[Figure: Time Searcher, Hochheiser et al.]

Parallel Coordinates
[Figure: Time Searcher, Hochheiser et al.]
– Selection and interaction methods can answer specific questions
– Brushing techniques select patterns
– Drawbacks: displays become cluttered for large datasets, and only a limited number of conditions can be shown effectively
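
Brushing in a parallel-coordinates view can be thought of as a box query: the user draws a rectangle over a range of conditions and a range of values, and a profile is selected when its line stays inside the rectangle. The sketch below is written in the spirit of Time Searcher's box queries but is not its implementation; the function names, the inclusive 0-based condition indices, and the rule that multiple boxes are combined with AND are all assumptions.

```python
def box_query(profiles, first, last, low, high):
    """Names of profiles whose values at conditions first..last
    (inclusive, 0-based) all lie within [low, high]."""
    hits = []
    for name, values in profiles.items():
        window = values[first:last + 1]
        if window and all(low <= v <= high for v in window):
            hits.append(name)
    return hits

def multi_box_query(profiles, boxes):
    """Conjunctive query: a profile must pass through every box.

    Each box is a (first, last, low, high) tuple.
    """
    selected = set(profiles)
    for box in boxes:
        selected &= set(box_query(profiles, *box))
    return sorted(selected)
```

Usage with made-up profiles: `box_query({"geneA": [0.1, 1.2, 1.5, 0.2]}, 1, 2, 1.0, 2.0)` selects `geneA`, because both of its values at conditions 1 and 2 fall inside the box.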

Dimensionality Reduction
– Project data from a large, high-dimensional space into a smaller space (usually 2-D or 3-D)
– Several techniques:
  – SVD & PCA
  – Multidimensional scaling
– Once the data are projected into the lower dimension, use standard 2-D (or 3-D) techniques

Dimensionality Reduction

Dimensionality Reduction: SVD
Transform the original data vectors into an orthogonal basis that captures decreasing amounts of variation
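
For a genes x conditions matrix X with centered columns, the right singular vectors of X are the eigenvectors of XᵀX, and the eigenvalues (squared singular values) decrease, which is the "decreasing amounts of variation" above. The sketch below is a dependency-free illustration, not a production SVD: it swaps in power iteration with deflation to find the top-k axes and then projects each gene onto them to get 2-D coordinates. It assumes a complete matrix, k no larger than its rank, and distinct leading eigenvalues; all names are hypothetical.

```python
import math
import random

def _matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def _normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("matrix rank is lower than the requested k")
    return [x / norm for x in v]

def principal_axes(data, k=2, iters=500, seed=0):
    """Top-k principal axes of a genes x conditions matrix.

    Equal (up to sign) to the top-k right singular vectors of the
    column-centered data; found here by power iteration with deflation.
    Returns (centered data, axes, eigenvalues of X^T X).
    """
    n_rows, n_cols = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n_rows for j in range(n_cols)]
    centered = [[row[j] - means[j] for j in range(n_cols)] for row in data]
    # C = X^T X (conditions x conditions); eigenvalues are proportional to variance
    cov = [[sum(r[i] * r[j] for r in centered) for j in range(n_cols)]
           for i in range(n_cols)]
    rng = random.Random(seed)
    axes, eigenvalues = [], []
    for _ in range(k):
        v = _normalize([rng.random() + 0.1 for _ in range(n_cols)])
        for _ in range(iters):
            v = _normalize(_matvec(cov, v))
        eig = sum(a * b for a, b in zip(v, _matvec(cov, v)))
        axes.append(v)
        eigenvalues.append(eig)
        # Deflate: remove the found direction so the next axis is orthogonal
        cov = [[cov[i][j] - eig * v[i] * v[j] for j in range(n_cols)]
               for i in range(n_cols)]
    return centered, axes, eigenvalues

def project(centered, axes):
    """Coordinates of each gene in the reduced k-dimensional space."""
    return [[sum(a * b for a, b in zip(row, ax)) for ax in axes]
            for row in centered]
```

The resulting coordinates can then be drawn with any ordinary 2-D scatter plot, which is the "use standard 2D techniques" step on the previous slide.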

Dimensionality Reduction: SVD
[Figure: SVD]

SVD Example
[Figure: GeneVAnD, Hibbs et al.; legend: cell-cycle phases G1, S, G2, M, M/G1; data from Spellman et al., 1998]

Sequence-based Visualization
– View data in chromosomal order
  – Copy number variations & aneuploidies are common in cancers & other disorders
  – Comparative Genomic Hybridization (CGH)
  – mRNA sequencing (RNA-seq)
  – Borrows concepts from genome browsers
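
Viewing data in chromosomal order needs two small pieces: a natural chromosome sort (chr2 before chr10) and, for noisy CGH log ratios, some smoothing along each chromosome so that gains and losses show up as runs. The sketch below assumes probes are given as (chromosome, position, value) tuples with "chr"-prefixed names, and uses a centered moving average as a deliberately simple stand-in for proper copy-number segmentation methods; the names are hypothetical. Non-numeric chromosomes sort alphabetically after the numbered ones, so chrM would come before chrX.

```python
import re

def chrom_key(name):
    """Sort key giving chr1 < chr2 < ... < chr22, then chrX, chrY, etc."""
    label = re.sub(r"^chr", "", name, flags=re.IGNORECASE)
    if label.isdigit():
        return (0, int(label), "")
    return (1, 0, label.upper())

def genome_order(probes):
    """probes: list of (chrom, position, value) tuples."""
    return sorted(probes, key=lambda p: (chrom_key(p[0]), p[1]))

def smooth_by_chromosome(probes, window=3):
    """Centered moving average of values that never mixes chromosomes."""
    ordered = genome_order(probes)
    half = window // 2
    smoothed = []
    for i, (chrom, pos, _value) in enumerate(ordered):
        lo, hi = max(0, i - half), min(len(ordered), i + half + 1)
        same = [v for c, _p, v in ordered[lo:hi] if c == chrom]
        smoothed.append((chrom, pos, sum(same) / len(same)))
    return smoothed
```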

Sequence-based: CGH
[Figure: Karyoscope plots, Java TreeView, Saldanha et al.]

Sequence-based: RNA-seq
[Figure: IGV]

Comparative Visualization
– Use multiple simultaneous, complementary views of the data
  – Each scheme emphasizes different aspects; use several to show the overall picture
– Show multiple related datasets to identify common and unique patterns

Comparative Visualization: Single Dataset
[Figure: MeV, Saeed et al.]

Comparative Visualization: Single Dataset
[Figure: Spotfire; GeneSpring]

Comparative Visualization: Multi-dataset
[Figure: HIDRA, Hibbs et al.; panels: Dendrogram, Heat Map Overview; data from Spellman et al., 1998]

Comparative Visualization: Multi-dataset
[Figure: HIDRA, Hibbs et al.; panels: Selection, Synchronized Details; data from Spellman et al., 1998]
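
The core of synchronized details is a linked lookup: genes selected in one dataset's view are fetched, in the same order, from every other dataset, so their profiles can be shown side by side and genes missing from a dataset stand out. This is a minimal sketch of that idea, not HIDRA's code; it assumes each dataset is a dict mapping gene IDs to value lists, and the function and dataset names are hypothetical.

```python
def synchronized_details(datasets, selected_genes):
    """For genes selected in one view, collect their rows from every dataset.

    datasets: {dataset_name: {gene_id: [values, ...]}}
    Returns {gene_id: {dataset_name: values or None}} in selection order;
    None marks a gene that is absent from that dataset.
    """
    details = {}
    for gene in selected_genes:
        if gene in details:
            continue  # ignore duplicate selections
        details[gene] = {name: rows.get(gene) for name, rows in datasets.items()}
    return details

def present_in_all(details):
    """Selected genes that have a row in every dataset (shared patterns)."""
    return [gene for gene, rows in details.items()
            if all(values is not None for values in rows.values())]
```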

Comparative Visualization: Multi-dataset
[Figure: HIDRA, Hibbs et al.; panel: Selection; data from Spellman et al., 1998]

Summary & Tools
– R & Bioconductor
– Java TreeView (Saldanha, 2004)
– Time Searcher (Hochheiser et al., 2003)
– Integrative Genomics Viewer (IGV)
– TIGR’s MultiExperiment Viewer (MeV; Saeed et al., 2003)
– HIDRA (Hibbs et al., 2007)

Trends & Future Directions
– Emphasis on usability and audience
  – If a “wet bench” biologist can’t use it…
– Incorporate common statistical analysis techniques into visualizations
  – e.g., differential expression tests, GO enrichments, etc.
– Isoforms and splice variants
– New user interaction schemes
  – e.g., multi-touch interfaces, large-format displays
– Low-level “systems analysis”
  – Linking together multiple types of data into unified displays
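
As an example of a statistic that could sit alongside a visualization, a GO enrichment for a selected set of genes is often scored with a one-sided hypergeometric test: the probability of seeing at least k annotated genes among n selected, when K of the N background genes carry the term. The sketch below assumes that test (other enrichment tests exist), uses a hypothetical function name, and applies no multiple-testing correction, which a real analysis over many GO terms would need.

```python
from math import comb

def hypergeom_enrichment_p(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(N background genes, K annotated, n drawn).

    k: selected genes carrying the GO term
    n: number of selected genes
    K: background genes carrying the term
    N: background size
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    total = comb(N, n)
    upper = min(n, K)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total
```

Usage: with a background of 10 genes, 5 annotated, and 2 selected genes that are both annotated, `hypergeom_enrichment_p(2, 2, 5, 10)` is 10/45, about 0.22, so two hits out of two is not surprising at this scale.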

Acknowledgements
– Hibbs Lab: Karen Dowell, Tongjun Gu, Al Simons
– Olga Troyanskaya Lab: Patrick Bradley, Maria Chikina, Yuanfang Guan
– Chad Myers, David Hess, Florian Markowetz, Edo Airoldi, Curtis Huttenhower
– Kai Li Lab: Grant Wallace
– Amy Caudy, Maitreya Dunham
– Botstein, Kruglyak, Broach, Rose labs
– Kyuson Yun, Carol Bult

Postdoctoral Opportunities in Computational & Systems Biology
The Center for Genome Dynamics at The Jackson Laboratory
– Investigators use computation, mathematical modeling, and statistics, with a shared focus on the genetics of complex traits
– Requires a PhD (or equivalent) in a quantitative field such as computer science, statistics, or applied mathematics, or in the biological sciences with a strong quantitative background
– Programming experience recommended
– The Jackson Laboratory was voted #2 in a poll of postdocs conducted by The Scientist in 2009 and is an EOE/AA employer