Day 2: Session 8: Questions and follow-up…. James C. Fleet, PhD Distinguished Professor Department of Nutrition Science Pete Pascuzzi, PhD Assistant Professor Purdue Libraries
Day 2 Session 9: Visualization I: Basic techniques James C. Fleet, PhD Distinguished Professor Department of Nutrition Science Pete Pascuzzi, PhD Assistant Professor Purdue Libraries
What Data Do You Need?
Data Needed | Format | Reason
Raw Array Data | CEL file (Affymetrix) | Submit to GEO; run analysis from beginning
Processed, normalized data | MAS5, RMA (*.txt, Excel) | Present/Absent call; primary data
Filtered to remove low expression genes | *.txt, Excel | Used for statistical analysis
Differentially Expressed Gene List | *.gct | For GenePattern; for DAVID, Reactome, IPA
Fleet 2016
Clustering Methods
Grouping a large dataset into smaller data sets based upon some similarity: “learning a hidden data concept”
Marker selection-based: Heat maps
Unsupervised learning/Class Discovery:
  Connectivity-based: Hierarchical
  Centroid-based: K-means
  Self organizing maps
  PCA
Karimpour-Fard et al. (2015) Hum Genomics 9:28.
Hierarchical Clustering
Bottom-up or Agglomerative: Start with each obs = cluster, then merge going up the hierarchy
Top-down or Divisive: Start with all obs in one cluster, then split recursively going down the hierarchy
https://www.youtube.com/watch?v=2z5wwyv0Zk4
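The bottom-up (agglomerative) strategy can be sketched in a few lines of plain Python. This is only an illustration: the function name `agglomerative`, the choice of single linkage with Euclidean distance, and the five two-value "expression profiles" are assumptions made for the example, not part of the course material.

```python
import math

def agglomerative(points, n_clusters):
    """Bottom-up clustering: start with one cluster per observation,
    then repeatedly merge the two closest clusters (single linkage)."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        # Merge cluster b into cluster a, moving one level up the hierarchy.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

# Made-up profiles: two tight pairs and one outlier.
profiles = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9), (9.0, 0.0)]
result = agglomerative(profiles, 3)
print(result)
```

Stopping at a chosen number of clusters corresponds to cutting the dendrogram at one height; recording every merge instead would give the full hierarchy.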
Heat Maps
Table of values where the value in a cell is color coded
Ordered so similar patterns are next to one another
https://www.youtube.com/watch?v=f5MhPVeA3Z4
K-Means Clustering
(1) k initial means randomly generated
(2) k clusters created by linking each observation to the nearest mean
(3) Centroid of each of the k clusters becomes the new mean
(4) Repeat 2 and 3 until convergence
https://www.youtube.com/watch?v=zHbxbb2ye3E
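The four steps above can be sketched in plain Python as follows. The function name `kmeans`, the use of squared Euclidean distance, drawing the initial means from the observations themselves, and the toy data are all assumptions made for illustration; real analyses would normally rely on an established implementation.

```python
import random

def sqdist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, rng, max_iter=100):
    # (1) k initial means, here drawn at random from the observations
    means = rng.sample(points, k)
    assignment = None
    for _ in range(max_iter):
        # (2) link each observation to its nearest mean
        new_assignment = [
            min(range(k), key=lambda c: sqdist(pt, means[c])) for pt in points
        ]
        # (4) stop once the assignments no longer change (convergence)
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # (3) the centroid of each cluster becomes the new mean
        for c in range(k):
            members = [pt for pt, a in zip(points, assignment) if a == c]
            if members:  # keep the old mean if a cluster ends up empty
                means[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return assignment, means

# Made-up observations forming two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = kmeans(points, 2, random.Random(0))
print(labels, centers)
```

Because the starting means are random, cluster numbering can differ between runs, and on less clearly separated data the final grouping can differ too, which is why k-means is often repeated from several starting points.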
Self Organizing Maps (SOM)
Principal Components Analysis (PCA)
ID consecutive factors that are uncorrelated/orthogonal to each other; successive components account for less and less variability
Scores Plot (Principal Component 1 vs. Principal Component 2): used for interpreting relationships among observations
Scree Plot: defines the number of principal components worth considering
https://www.youtube.com/watch?v=eJ08Gdl5LH0
http://documents.software.dell.com/Statistics/Textbook/Principal-Components-Factor-Analysis
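To make the idea behind the scree plot concrete, the sketch below computes the share of variance captured by each principal component. It is deliberately restricted to two variables so that the eigenvalues of the covariance matrix have a closed form; the function name `pca_2d` and the data values are made up for this example.

```python
import math

def pca_2d(xs, ys):
    """PCA for two variables using the closed-form eigenvalues
    of the 2x2 sample covariance matrix [[sxx, sxy], [sxy, syy]]."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    half_trace = (sxx + syy) / 2
    spread = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    # Eigenvalues are the variances along each component, largest first.
    eigenvalues = [half_trace + spread, half_trace - spread]
    total = sum(eigenvalues)
    explained = [ev / total for ev in eigenvalues]
    return eigenvalues, explained

# Made-up measurements of two strongly correlated variables.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
eigenvalues, explained = pca_2d(xs, ys)
print(explained)
```

Plotting `explained` against component number gives the scree plot; with many variables the same quantities come from a full eigendecomposition or SVD rather than this two-variable shortcut.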
GenePattern: a freely available open-source software package developed at the Broad Institute of MIT and Harvard for the analysis of genomic data.
http://www.broadinstitute.org/cancer/software/genepattern#
GCT File Format for GenePattern
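As a rough illustration of the layout, a GCT file is tab-delimited text with a version line (`#1.2`), a line giving the number of data rows and the number of samples, a header line (`Name`, `Description`, then one column per sample), and one row per gene or probe. The helper `to_gct`, the probe names, and the values below are hypothetical and only serve to show the structure.

```python
def to_gct(sample_names, rows):
    """Build GCT-formatted text.

    rows is a list of (name, description, values) tuples, where values
    holds one number per sample, in the same order as sample_names.
    """
    lines = [
        "#1.2",                                   # format version
        f"{len(rows)}\t{len(sample_names)}",      # data rows, samples
        "\t".join(["Name", "Description"] + list(sample_names)),
    ]
    for name, description, values in rows:
        if len(values) != len(sample_names):
            raise ValueError(f"{name}: expected {len(sample_names)} values")
        lines.append("\t".join([name, description] + [str(v) for v in values]))
    return "\n".join(lines) + "\n"

# Hypothetical probes and expression values, for illustration only.
samples = ["Control_1", "Control_2", "Treated_1"]
rows = [
    ("probe_A", "example gene A", [7.12, 7.30, 9.85]),
    ("probe_B", "example gene B", [5.40, 5.22, 5.31]),
]
gct_text = to_gct(samples, rows)
print(gct_text)
```

Writing `gct_text` to a file with a `.gct` extension gives something that can be checked against the GenePattern file format documentation before upload.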
Tuesday BREAK #1
Day 2 Session 10: Functional Assessment I: Gene Set Enrichment Analysis, NIH DAVID James C. Fleet, PhD Distinguished Professor Department of Nutrition Science Pete Pascuzzi, PhD Assistant Professor Purdue Libraries
MSigDB: …a collection of annotated gene sets for use with GSEA software. 8 collections curated by the Broad Institute.
http://software.broadinstitute.org/gsea/msigdb/
Subramanian et al. (2005) Proc Natl Acad Sci 102:15545.
Data format for Functional Classification https://david.ncifcrf.gov/ Huang et al. (2009) Nat Protocols 4:44. Fleet 2016
Day 2 Session 11: Special Lecture Sean Davis, PhD Staff Scientist National Cancer Institute http://watson.nci.nih.gov/%7Esdavis/
Day 2 Session 12: Visualization II: Intro to Pathway Analysis > Reactome > Pathview James C. Fleet, PhD Distinguished Professor Department of Nutrition Science Pete Pascuzzi, PhD Assistant Professor Purdue Libraries
Data Formats for Analysis http://www.reactome.org/ http://onlinelibrary.wiley.com/doi/10.1002/pmic.201100066/full Fleet 2016
Data Format for Visualization
http://pathview.uncc.edu/
GeneID      DCIS_1    DCIS_2    DCIS_3
10000       -0.3076   -0.1472   -0.0238
10001        0.4159   -0.3348   -0.5131
10002        0.1985    0.0379    0.3419
10003       -0.2316   -0.0966   -0.1047
100048912   -0.0449   -0.0520    0.0364
10004       -0.0876   -0.0503    0.0018
10005       -0.1263    0.4778   -0.1061
10006        0.6503    0.1951   -0.0054
*.txt or *.csv format
Fold change or raw data
Also available in R as a Bioconductor package: http://www.bioconductor.org/packages/release/bioc/html/pathview.html
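A table like the one on this slide can be converted from whitespace-delimited text to *.csv with Python's standard `csv` module. The sketch below uses the first three rows shown above; the function name `table_to_csv` is an assumption for the example, and the values are kept as text so that nothing is reformatted on the way through.

```python
import csv
import io

def table_to_csv(text):
    """Convert a whitespace-delimited gene table to CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in text.strip().splitlines():
        fields = line.split()  # split on any run of spaces or tabs
        if fields:
            writer.writerow(fields)
    return out.getvalue()

# First rows of the example table from the slide.
raw = """
GeneID  DCIS_1   DCIS_2   DCIS_3
10000   -0.3076  -0.1472  -0.0238
10001    0.4159  -0.3348  -0.5131
10002    0.1985   0.0379   0.3419
"""
csv_text = table_to_csv(raw)
print(csv_text)
```

The first column holds the gene identifiers and each remaining column holds one sample, which is the gene-by-sample shape shown on the slide.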
Making Sense of Specific Gene Expression Data
BioGPS: Tissue Expression Pattern http://biogps.org/#goto=welcome
Coremine: Gene Annotation http://www.coremine.com/medical/#search
Tuesday BREAK #2
Day 2 Session 13: Pathway Analysis continued…. James C. Fleet, PhD Distinguished Professor Department of Nutrition Science Pete Pascuzzi, PhD Assistant Professor Purdue Libraries
Day 2 Session 14: Challenges of Metabolomic and Proteomic data James C. Fleet, PhD Distinguished Professor Department of Nutrition Science Pete Pascuzzi, PhD Assistant Professor Purdue Libraries
Levels of Metabolic Research
[Figure: levels from genotype to phenotype, with the labels "What may happen", "What can happen", "What has happened", and "What is happening"]
Genomics: DNA (Genotype), ~25,000 genes
Transcriptomics: mRNA, ~100,000 transcripts
Proteomics: ~ a million proteins
Metabolomics/Lipidomics: ~40,000; metabolic pathway (substrate, product)
Challenges: many metabolites, location, chemistry
Adapted from Navas-Iglesias et al. (2009) Analytical Chemistry, 28, 393. by C. Ferreira (Purdue)
The Increasing Complexity of Omic Analysis
Genome: ~25,000 genes
  Alternative promoters, splicing, editing
Transcriptome: ~100,000 mRNA + miRNA + lncRNA
  Post-translational modification
Proteome: ~1,000,000
  + location, interaction, and chemistry differences
Proteomics/Metabolomics Workflow
[Figure: workflow diagram with the following stages]
Well Designed Study
Sample Preparation
Sample Analysis and QC analysis
Raw Reads
Link Peaks to Databases, Quantification, Normalization
Normalized Vetted Data
Data Filters: Intensity, Fold
Statistical Analysis
Differentially Expressed List
Clustering and visualization
Pathway and Geneset Analysis
Network building
Interpret Experiment
Generalized Proteomic Pipeline Steps 1-4 are distinct from the genomics pipeline Walther and Mann (2010) J Cell Biol 190:491. Fleet 2016
MaxQuant
http://www.biochem.mpg.de/227318/MaxQuant
MS-based peptide profile; file formats: wiff, RAW, .d, mzXML
Link to peptide database
Integrate to proteins
Modified from Tiago Sobreira, Purdue
Proteomic Resources http://www.ebi.ac.uk/pride/archive/ http://www.peptideatlas.org/
http://www.metaboanalyst.ca/faces/home.xhtml
Metabolomics Resources http://metabolomicsworkbench.org/ http://metabolomicssociety.org/resources/metabolomics-databases