
Visualization Approaches for Gene Expression Data
Matt Hibbs, Assistant Professor, The Jackson Laboratory
March 4, 2010





2 Transcriptomics & Gene Expression
Simultaneous measurement of transcription for the entire genome
Useful for a broad range of biological questions
[Figure: DNA is transcribed into mRNA, which the ribosome translates into proteins]

3 Outline
Technologies & specific concerns
– cDNA microarrays (2-color & 1-color arrays)
– RNA-seq
Normalization visualizations
Full data displays
Dimensionality reduction
Sequence-order displays
Comparative visualization
Future directions

4 Technology: 2-color cDNA Microarrays
Spot the slide with known sequences
Add labeled mRNA to the slide for hybridization
Scan the hybridized array
[Figure: reference mRNA is labeled with green dye and test mRNA with red dye; both are hybridized to spots A-D, giving per-spot values A 1.5, B 0.8, C -1.2, D 0.1]

5 Technology: 2-color cDNA Microarrays

6 Technology: RNA-seq (image from Wikimedia)

7 Normalization: MA-plot
Need to account for intensity bias between channels (red/green) or between multiple 1-color arrays
The MA-plot (also called an RI-plot) shows the relationship between log ratio and intensity
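The following is a minimal sketch of the quantities behind an MA-plot. It assumes the standard definitions M = log2(R/G) (log ratio) and A = 0.5 * log2(R * G) (average log intensity), where R and G are the red and green spot intensities. The function names are hypothetical. The median-centering step is only the simplest global normalization. It is not the intensity-dependent (e.g. loess) correction often fitted to MA-plots.

```python
import math
from statistics import median


def ma_values(red, green):
    """Return one (M, A) pair per spot of a two-color array.

    M = log2(R / G) is the log ratio; A = 0.5 * log2(R * G) is the
    average log intensity. Intensities must be positive.
    """
    if len(red) != len(green):
        raise ValueError("red and green must have the same length")
    pairs = []
    for r, g in zip(red, green):
        if r <= 0 or g <= 0:
            raise ValueError("intensities must be positive")
        pairs.append((math.log2(r / g), 0.5 * math.log2(r * g)))
    return pairs


def median_center(pairs):
    """Shift every M so the median M is 0 (removes a constant dye bias only)."""
    offset = median(m for m, _ in pairs)
    return [(m - offset, a) for m, a in pairs]


# Hypothetical intensities: red reads 4x brighter than green on every spot,
# so every M is 2 before normalization and 0 after it.
pairs = ma_values([4, 16, 64], [1, 4, 16])
print(pairs)
print(median_center(pairs))
```

Plotting M against A before and after normalization is what makes any remaining intensity-dependent curvature visible.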

8 Normalization: Box-Whisker Plots & Quantile Normalization
Quantile normalization is often used to adjust for between-chip variance
Box-whisker plots are typically used to visualize the process
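Quantile normalization takes only a few lines. First, sort each chip's values. Next, average the sorted values rank by rank across chips. Finally, give every value the average for its rank. This is a minimal sketch with a hypothetical function name. Ties are broken by original position rather than averaged, which some implementations handle differently.

```python
def quantile_normalize(chips):
    """Quantile-normalize a list of equal-length value lists (one per chip).

    Afterwards every chip has the same distribution: the rank-wise mean of
    the sorted input chips.
    """
    n = len(chips[0])
    if any(len(chip) != n for chip in chips):
        raise ValueError("all chips must have the same number of probes")
    # For each chip, probe indices ordered from smallest to largest value
    orders = [sorted(range(n), key=chip.__getitem__) for chip in chips]
    # Mean of the k-th smallest value across all chips
    rank_means = [
        sum(chip[order[k]] for chip, order in zip(chips, orders)) / len(chips)
        for k in range(n)
    ]
    normalized = []
    for order in orders:
        out = [0.0] * n
        for k, probe in enumerate(order):
            out[probe] = rank_means[k]  # the value at rank k gets the rank-k mean
        normalized.append(out)
    return normalized


print(quantile_normalize([[2, 4, 6], [1, 5, 3]]))  # [[1.5, 3.5, 5.5], [1.5, 5.5, 3.5]]
```

After this step, every chip holds the same set of values. Side-by-side box-whisker plots should therefore line up exactly, which makes them a quick visual check that normalization was applied.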

9 Full Data Displays
Techniques to show all of the data at once
Heat maps
– Display numerical values as colors
– Good for seeing all of the data intuitively
– Require clustering to reveal patterns
Parallel coordinates
– Line plots of high-dimensional data
– Easy to see and select trends or patterns
– Especially good for course data (time courses, drug treatments, etc.)

10 Heat Maps
[Figure: expression data are clustered, then rasterized into colored cells on a scale from -3 (under-expressed) through 0 to +3 (over-expressed)]
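The "rasterize" step maps each value of the clustered matrix to a color. This is a minimal sketch under a few assumptions. It uses the widely used red/green scheme: black at 0, brighter red for over-expression and brighter green for under-expression. It saturates at ±3 to match the scale on this slide. The function names and the 0-255 RGB output are illustrative choices.

```python
def expression_to_rgb(value, saturation=3.0):
    """Map one expression value to an (r, g, b) tuple with 0-255 channels.

    0 -> black; positive values -> red; negative values -> green.
    Values beyond +/-saturation are clipped to full brightness.
    """
    if saturation <= 0:
        raise ValueError("saturation must be positive")
    clipped = max(-saturation, min(saturation, value))
    level = round(255 * abs(clipped) / saturation)
    return (level, 0, 0) if clipped >= 0 else (0, level, 0)


def rasterize(matrix, saturation=3.0):
    """Turn a (row-ordered, already clustered) matrix into rows of RGB pixels."""
    return [[expression_to_rgb(v, saturation) for v in row] for row in matrix]


print(rasterize([[0.0, 3.0, -4.5]]))  # [[(0, 0, 0), (255, 0, 0), (0, 255, 0)]]
```

Clipping at a fixed saturation keeps a few extreme values from washing out the rest of the display. The trade-off is that differences beyond ±3 are no longer visible.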

11 Heat Maps: Stats
Clustering is important for seeing patterns
– Hierarchical, k-means, SOM, etc.
– Choice of distance metric in addition to method
Match the visualization mapping to the statistics used for analysis
– Coloring based on actual values is appropriate for Euclidean distance measures
– Centered or normalized measures should use corresponding colorings
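One way to match the coloring to the statistic applies when rows are clustered with Pearson correlation. In that case, color row-centered or z-scored values rather than raw numbers. The reason is that the Pearson correlation between two rows equals the mean product of their z-scores, computed with population standard deviations. A minimal sketch with hypothetical helper names:

```python
from statistics import mean, pstdev


def center_rows(matrix):
    """Subtract each row's mean so colors show deviation from that gene's average."""
    out = []
    for row in matrix:
        m = mean(row)
        out.append([v - m for v in row])
    return out


def zscore_rows(matrix):
    """Center each row and scale it to unit population standard deviation.

    Rows with zero variance (constant profiles) become all zeros.
    """
    out = []
    for row in matrix:
        m, s = mean(row), pstdev(row)
        out.append([(v - m) / s if s > 0 else 0.0 for v in row])
    return out


# Two hypothetical genes with the same shape but different baselines
data = [[1.0, 2.0, 3.0], [11.0, 12.0, 13.0]]
print(center_rows(data))  # [[-1.0, 0.0, 1.0], [-1.0, 0.0, 1.0]]
print(zscore_rows(data))
```

Colored from raw values, the two genes look unrelated. After centering or z-scoring they look identical, which agrees with their Pearson correlation of 1.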

12 Heat Maps: Distance Metrics
Euclidean distance
Pearson correlation
Spearman correlation
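The three measures can be written out directly. This is a minimal, dependency-free sketch with illustrative function names. For clustering, correlations are commonly turned into distances as 1 - r. Spearman correlation is Pearson correlation applied to ranks, with tied values given their average rank.

```python
import math


def euclidean(x, y):
    """Straight-line distance; sensitive to absolute expression levels."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pearson(x, y):
    """Linear correlation of shapes; ignores baseline and scale."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend over a run of tied values
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


x, y = [1.0, 2.0, 3.0], [11.0, 12.0, 13.0]
print(euclidean(x, y))                 # large: the baselines differ
print(pearson(x, y))                   # 1 up to rounding: same shape
print(spearman([1, 2, 3], [1, 4, 9]))  # 1 up to rounding: same ordering, not linear
```

The example shows why the metric choice matters: the same pair of genes is far apart under Euclidean distance but perfectly correlated under Pearson.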

13 Heat Maps: Stats
Data clustered using a rank-based statistic
[Figure: color scale running from the lowest value to the highest value]
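Clustering may use a rank-based statistic such as Spearman correlation. In that case the coloring can follow ranks too: each value is shown by its position from lowest to highest within its row, rather than by its magnitude. This is a minimal sketch that assumes per-row ranks scaled to [0, 1] and a grayscale ramp. The slide's actual color ramp is not given in the text, and the function names are hypothetical.

```python
def rank_fractions(row):
    """Scale each value's within-row rank to [0, 1]; ties share their average rank."""
    n = len(row)
    if n == 1:
        return [0.5]
    order = sorted(range(n), key=row.__getitem__)
    fractions = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and row[order[j + 1]] == row[order[i]]:
            j += 1  # extend over a run of tied values
        avg_position = (i + j) / 2  # 0-based average position of the tie run
        for k in range(i, j + 1):
            fractions[order[k]] = avg_position / (n - 1)
        i = j + 1
    return fractions


def rank_to_gray(row):
    """Lowest value -> black (0), highest value -> white (255)."""
    return [round(255 * f) for f in rank_fractions(row)]


print(rank_fractions([5.0, 1.0, 3.0]))  # [1.0, 0.0, 0.5]
print(rank_to_gray([0.1, 0.2, 100.0]))  # [0, 128, 255]
```

In the second example the outlier 100.0 gets no more emphasis than any other maximum, which mirrors how a rank statistic treats it during clustering.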

14 Heat Maps: Overview + Detail
Java TreeView, Saldanha; data from Spellman et al., 1998

15 Parallel Coordinates
View expression vectors as lines
– X-axis = conditions
– Y-axis = value
Time Searcher, Hochheiser et al.

16 Parallel Coordinates
Time Searcher, Hochheiser et al.
Selection and interaction methods can answer specific questions
Brushing techniques select patterns of interest
Displays become cluttered for large datasets, and only a limited number of conditions can be shown effectively
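A brushing query can be modeled as a rectangle drawn over the parallel-coordinates plot. It keeps the profiles whose values stay inside a value range across a span of conditions, and several rectangles combine with AND. This is loosely in the spirit of the box-shaped queries used by tools like Time Searcher. The function names and the every-value-inside rule are assumptions here, not that tool's implementation.

```python
def in_box(values, box):
    """True if the profile lies inside box = (first, last, low, high).

    first..last are inclusive condition indices; every value in that span
    must fall within [low, high].
    """
    first, last, low, high = box
    if not 0 <= first <= last < len(values):
        return False  # the box does not fit this profile
    return all(low <= values[t] <= high for t in range(first, last + 1))


def brush(profiles, boxes):
    """Return names of profiles that pass through every box (logical AND)."""
    return [name for name, values in profiles.items()
            if all(in_box(values, box) for box in boxes)]


profiles = {  # hypothetical gene -> expression across 4 conditions
    "geneA": [0.0, 1.0, 2.0, 3.0],
    "geneB": [0.0, -1.0, -2.0, -3.0],
    "geneC": [0.0, 1.0, 1.0, 0.0],
}
print(brush(profiles, [(1, 2, 0.5, 2.5)]))                    # ['geneA', 'geneC']
print(brush(profiles, [(1, 2, 0.5, 2.5), (3, 3, 2.5, 3.5)]))  # ['geneA']
```

Adding a second box narrows the selection. This is how interactive brushing lets a user refine a question step by step.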

17 Dimensionality Reduction
Project data from a large, high-dimensional space to a smaller space (usually 2 or 3 dimensions)
Several techniques:
– SVD & PCA
– Multidimensional scaling
Once the data are projected into fewer dimensions, use standard 2D (or 3D) techniques
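PCA, one of the listed techniques, can be written without libraries in four steps:

1. Center each condition (column).
2. Build the scatter matrix.
3. Find the leading eigenvectors by power iteration with deflation.
4. Project each gene onto those eigenvectors to get 2D coordinates.

This is a teaching sketch under those assumptions, with a hypothetical function name and a fixed iteration count. In practice a library routine such as R's `prcomp` or `svd` is the usual choice.

```python
import math
import random


def pca(rows, k=2, iters=500, seed=0):
    """Project rows (genes x conditions) onto their top-k principal components.

    Returns (coords, components): coords[i] holds row i's k coordinates,
    components[c] is the c-th unit-length direction in condition space.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Scatter matrix (covariance times n); the scale does not change eigenvectors
    cov = [[sum(r[i] * r[j] for r in centered) for j in range(d)] for i in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):  # power iteration: v <- cov v / |cov v|
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # no variance left to capture
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue; subtract its part (deflation)
        lam = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
        components.append(v)
    coords = [[sum(a * b for a, b in zip(r, c)) for c in components] for r in centered]
    return coords, components


# Hypothetical data whose spread is largest along the first condition
coords, comps = pca([[2.0, 0.0], [-2.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
print(comps)   # close to [±1, 0] then [0, ±1]; signs are arbitrary
print(coords)
```

Once the genes have 2D coordinates, any standard scatter plot can display them.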

18 Dimensionality Reduction

19 Dimensionality Reduction: SVD
Transform original data vectors into an orthogonal basis that captures decreasing amounts of variation
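In matrix form this is the standard (thin) SVD. The symbols are introduced here, not taken from the slides: X is the expression matrix with genes as rows and conditions as columns, r is its rank, the columns of V form the orthogonal basis, and the singular values σ_k are ordered from largest to smallest.

```latex
X = U \Sigma V^{\top}, \qquad U^{\top} U = V^{\top} V = I, \qquad
\Sigma = \operatorname{diag}(\sigma_1, \ldots, \sigma_r), \quad
\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r \ge 0

\text{fraction of variation captured by component } k:\quad
\frac{\sigma_k^2}{\sum_{j=1}^{r} \sigma_j^2}

\text{2D coordinates of the genes:}\quad
X V_{[:,1:2]} = U_{[:,1:2]} \, \Sigma_{[1:2,1:2]}
```

The denominator is the total sum of squares of X. When the columns of X are mean-centered, this sum is the total variance and the SVD is equivalent to PCA.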

20 Dimensionality Reduction: SVD

21 SVD Example
GeneVAnD, Hibbs et al.; data from Spellman et al., 1998
[Figure legend: cell-cycle phases G1, S, G2, M, M/G1]

22 Sequence-based Visualization
View data in chromosomal order
– Copy number variation & aneuploidies are common in cancers & other disorders
– Comparative Genomic Hybridization (CGH)
– mRNA sequencing (RNA-seq)
– Borrows concepts from genome browsers
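Putting data in chromosomal order amounts to sorting probes by (chromosome, position). For CGH-style copy-number views, a running median within each chromosome can then suppress single-probe noise. This is a minimal sketch that assumes probes arrive as hypothetical (chromosome, position, log ratio) tuples with "chr"-prefixed names. The smoothing choice is illustrative, not the method behind any particular tool.

```python
from itertools import groupby
from statistics import median


def chromosome_key(chrom):
    """Sort key: numbered chromosomes numerically, then the rest (X, Y, ...) by name."""
    name = chrom[3:] if chrom.lower().startswith("chr") else chrom
    return (0, int(name), "") if name.isdigit() else (1, 0, name)


def order_by_position(probes):
    """probes: (chromosome, position, value) tuples, returned in genome order."""
    return sorted(probes, key=lambda p: (chromosome_key(p[0]), p[1]))


def smooth_by_chromosome(probes, window=3):
    """Running median of values within each chromosome; window must be odd."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for chrom, group in groupby(order_by_position(probes), key=lambda p: p[0]):
        group = list(group)
        values = [value for _, _, value in group]
        for i, (_, position, _) in enumerate(group):
            # The window is clipped at chromosome ends, never crossing into a neighbor
            lo, hi = max(0, i - half), min(len(values), i + half + 1)
            smoothed.append((chrom, position, median(values[lo:hi])))
    return smoothed


probes = [("chr2", 100, 0.1), ("chr1", 300, 1.0), ("chr1", 100, 0.0),
          ("chrX", 50, -1.0), ("chr1", 200, 5.0), ("chr10", 10, 0.2)]
# chr1 at position 200: the single-probe spike of 5.0 is smoothed to 1.0
print(smooth_by_chromosome(probes))
```

Sorting "chr10" after "chr2", rather than alphabetically, keeps the display in the order a genome browser would use.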

23 Sequence-based: CGH
Karyoscope plots, Java TreeView, Saldanha

24 Sequence-based: RNA-seq
IGV, http://www.broadinstitute.org/igv
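The coverage track that a browser such as IGV draws above RNA-seq alignments shows per-base read depth. This is a minimal sketch under two assumptions. Each read is a hypothetical (start, length) pair in 0-based, half-open coordinates. Spliced (gapped) alignments are ignored, although real RNA-seq coverage must split them at introns.

```python
def coverage(reads, region_start, region_end):
    """Per-base read depth over [region_start, region_end).

    reads: (start, length) pairs, 0-based. Uses a difference array:
    +1 where a read's overlap begins, -1 just past where it ends,
    then a running sum.
    """
    size = region_end - region_start
    if size <= 0:
        raise ValueError("region_end must be greater than region_start")
    diff = [0] * (size + 1)
    for start, length in reads:
        s = max(start, region_start)
        e = min(start + length, region_end)
        if s < e:  # skip reads that do not overlap the region
            diff[s - region_start] += 1
            diff[e - region_start] -= 1
    depth, running = [], 0
    for k in range(size):
        running += diff[k]
        depth.append(running)
    return depth


print(coverage([(0, 3), (2, 3), (10, 5)], 0, 6))  # [1, 1, 2, 1, 1, 0]
```

The difference array costs one update per read plus one pass over the region. It does not add 1 at every covered base, which matters when millions of reads pile up on highly expressed genes.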

25 Comparative Visualization
Use multiple simultaneous, complementary views of the data
Each scheme emphasizes different aspects; use several to show the overall picture
Show multiple related datasets to identify common and unique patterns
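Synchronizing several views usually comes down to one shared selection that every view listens to (the observer pattern). This is a minimal sketch. The class names and datasets are hypothetical, and it is not how MeV or HIDRA are actually implemented. Each view shows only the selected genes present in its own dataset.

```python
class SharedSelection:
    """A set of selected gene IDs that notifies every subscribed view on change."""

    def __init__(self):
        self.genes = frozenset()
        self._listeners = []

    def subscribe(self, callback):
        self._listeners.append(callback)

    def select(self, genes):
        self.genes = frozenset(genes)
        for callback in self._listeners:
            callback(self.genes)


class DatasetView:
    """Hypothetical view of one dataset that follows a shared selection."""

    def __init__(self, name, data, selection):
        self.name = name
        self.data = data  # gene ID -> expression profile
        self.visible = {}
        selection.subscribe(self.update)

    def update(self, genes):
        # Show selected genes in a stable order; genes absent from this dataset are skipped
        self.visible = {g: self.data[g] for g in sorted(genes) if g in self.data}


selection = SharedSelection()
view_a = DatasetView("dataset A", {"g1": [1.2, 2.0], "g2": [0.1, -0.4]}, selection)
view_b = DatasetView("dataset B", {"g2": [0.8, -1.1, 0.3]}, selection)
selection.select({"g1", "g2"})  # e.g. genes brushed in some other view
print(view_a.visible)  # {'g1': [1.2, 2.0], 'g2': [0.1, -0.4]}
print(view_b.visible)  # {'g2': [0.8, -1.1, 0.3]}
```

Keeping the selection separate from the views means a new view can join without changes to the existing ones.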

26 Comparative Visualization: Single Dataset
MeV, Saeed et al.

27 Comparative Visualization: Single Dataset
Spotfire, GeneSpring

28 Comparative Visualization: Multi-dataset
HIDRA, Hibbs et al.; data from Spellman et al., 1998
[Figure panels: dendrogram, heat map, overview]

29 Comparative Visualization: Multi-dataset
HIDRA, Hibbs et al.; data from Spellman et al., 1998
[Figure panels: selection, synchronized details]

30 Comparative Visualization: Multi-dataset
HIDRA, Hibbs et al.; data from Spellman et al., 1998
[Figure panel: selection]

31 Summary & Tools
R & Bioconductor
Java TreeView (Saldanha, 2004)
Time Searcher (Hochheiser et al., 2003)
Integrative Genomics Viewer (IGV; www.broadinstitute.org/igv)
TIGR’s MultiExperiment Viewer (MeV; Saeed et al., 2003)
HIDRA (Hibbs et al., 2007)

32 Trends & Future Directions
Emphasis on usability and audience
– If a “wet bench” biologist can’t use it…
Incorporate common statistical analysis techniques into visualizations
– e.g. differential expression tests, GO enrichments, etc.
Isoforms and splice variants
New user interaction schemes
– e.g. multi-touch interfaces, large-format displays
Low-level “systems analysis”
– Linking multiple types of data into unified displays

33 Acknowledgements
Hibbs Lab
– Karen Dowell
– Tongjun Gu
– Al Simons
Olga Troyanskaya Lab
– Patrick Bradley
– Maria Chikina
– Yuanfang Guan
Chad Myers
David Hess
Florian Markowetz
Edo Airoldi
Curtis Huttenhower
Kai Li Lab
– Grant Wallace
Amy Caudy
Maitreya Dunham
Botstein, Kruglyak, Broach, Rose labs
Kyuson Yun
Carol Bult

34 Postdoctoral Opportunities in Computational & Systems Biology
The Center for Genome Dynamics at The Jackson Laboratory (www.genomedynamics.org)
Investigators use computation, mathematical modeling and statistics, with a shared focus on the genetics of complex traits
Requires a PhD (or equivalent) in a quantitative field such as computer science, statistics or applied mathematics, or in the biological sciences with a strong quantitative background
Programming experience recommended
The Jackson Laboratory was voted #2 in a poll of postdocs conducted by The Scientist in 2009 and is an EOE/AA employer

