Published by Claire Anne-Marie Lepage. Modified over 6 years ago.
1
Day 2 Session 8: Questions and follow-up…
James C. Fleet, PhD, Distinguished Professor, Department of Nutrition Science; Pete Pascuzzi, PhD, Assistant Professor, Purdue Libraries
2
Day 2 Session 9: Visualization I: Basic techniques
James C. Fleet, PhD, Distinguished Professor, Department of Nutrition Science; Pete Pascuzzi, PhD, Assistant Professor, Purdue Libraries
3
What Data Do You Need?
Data Needed | Format | Reason
Raw array data | CEL file (Affymetrix) | Submit to GEO; run analysis from beginning
Processed, normalized data | MAS5, RMA (*.txt, Excel) | Present/Absent call; primary data
Filtered to remove low-expression genes | *.txt, Excel | Used for statistical analysis
Differentially expressed gene list | *.gct | For GenePattern; for DAVID, Reactome, IPA
Fleet 2016
4
Clustering Methods
Grouping a large dataset into smaller data sets based upon some similarity: “learning a hidden data concept”
Marker selection-based: Heat maps
Unsupervised learning/Class Discovery:
Connectivity-based: Hierarchical
Centroid-based: K-means
Self organizing maps
PCA
Karimpour-Fard et al. (2015) Hum Genomics Oct 28;9:28
Fleet 2016
5
Hierarchical Clustering
Top-down or Divisive: Start with all observations in one cluster, then split recursively going down the hierarchy.
Bottom-up or Agglomerative: Start with each observation as its own cluster, then merge going up the hierarchy.
Fleet 2016
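The bottom-up (agglomerative) strategy on this slide can be sketched in a few lines. This is a minimal illustration, not the method used in the course: the function name `agglomerative`, the choice of average linkage, and Euclidean distance are all assumptions made for the example.

```python
def agglomerative(points, n_clusters):
    """Bottom-up clustering sketch; returns clusters as lists of point indices.

    Assumptions (not from the slides): Euclidean distance, average linkage.
    """
    def dist(i, j):
        return sum((a - b) ** 2 for a, b in zip(points[i], points[j])) ** 0.5

    # Start with each observation as its own cluster.
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean pairwise distance between members.
                pairs = len(clusters[a]) * len(clusters[b])
                d = sum(dist(i, j) for i in clusters[a] for j in clusters[b]) / pairs
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        # Merge the closest pair: one step up the hierarchy.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters
```

Recording the sequence of merges, rather than stopping at a fixed number of clusters, would give the full hierarchy that a dendrogram displays.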
6
Heat Maps https://www.youtube.com/watch?v=f5MhPVeA3Z4
Table of values where the value in each cell is color coded. Ordered so similar patterns are next to one another. Fleet 2016
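One simple way to place similar patterns next to one another is a greedy nearest-neighbor ordering of the rows. The sketch below is only a stand-in for the clustering-based ordering that heat map tools typically apply; the function name `order_rows` and the squared Euclidean distance are assumptions for illustration.

```python
def order_rows(matrix):
    """Greedy row ordering sketch: start at row 0, then repeatedly
    append the unused row most similar to the last one placed."""
    if not matrix:
        return []

    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    remaining = list(range(1, len(matrix)))
    order = [0]
    while remaining:
        last = matrix[order[-1]]
        nearest = min(remaining, key=lambda i: dist(last, matrix[i]))
        order.append(nearest)
        remaining.remove(nearest)
    return order
```

Rows would then be drawn in the returned order, with each cell value mapped to a color.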
7
K-Means Clustering
(1) k initial means randomly generated
(2) k clusters created by linking each observation to a mean
(3) Centroid of each of the k clusters becomes the new mean
(4) Repeat 2 and 3 until convergence
Fleet 2016
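The four steps on this slide can be sketched directly in code. This is a minimal illustration under stated assumptions: the initial means are drawn at random from the observations themselves, distance is squared Euclidean, and the function name `kmeans` and its parameters are hypothetical.

```python
import random


def kmeans(points, k, max_iter=100, seed=0):
    """K-means sketch; points is a list of equal-length tuples."""
    rng = random.Random(seed)
    # (1) k initial means, here sampled from the observations (an assumption).
    means = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # (2) Link each observation to its nearest mean.
        new_labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, means[j])))
            for p in points
        ]
        # (4) Converged once the assignments stop changing.
        if new_labels == labels:
            break
        labels = new_labels
        # (3) The centroid of each cluster becomes the new mean.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous mean
                means[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, means
```

Because the starting means are random, different seeds can give different results on less clearly separated data, which is why k-means is often run several times.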
8
Self Organizing Maps (SOM)
Fleet 2016
9
Principal Components Analysis (PCA)
Identifies consecutive factors that are uncorrelated/orthogonal to each other; successive components account for less and less variability. Scores Plot (Principal Component 1 vs. Principal Component 2): used for interpreting relationships among observations. Scree Plot: defines the number of principal components worth considering. Fleet 2016
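The quantities behind the two plots can be computed with a short sketch. It uses power iteration with deflation on the covariance matrix, chosen only to keep the example dependency-free; it is not the procedure from the course, and the name `pca` and its parameters are hypothetical. The returned ratios are what a scree plot displays, and the scores are what a scores plot displays.

```python
def pca(rows, n_components=2, n_iter=1000, tol=1e-12):
    """PCA sketch: returns (components, explained_ratios, scores)."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - means[j] for j in range(m)] for r in rows]
    # Sample covariance matrix of the variables.
    cov = [[sum(x[a] * x[b] for x in centered) / (n - 1) for b in range(m)]
           for a in range(m)]
    total_var = sum(cov[j][j] for j in range(m))
    components, ratios = [], []
    for _ in range(min(n_components, m)):
        v = [1.0 / (j + 1) for j in range(m)]  # arbitrary start vector
        for _step in range(n_iter):
            w = [sum(cov[a][b] * v[b] for b in range(m)) for a in range(m)]
            norm = sum(x * x for x in w) ** 0.5
            if norm < tol:
                break
            v = [x / norm for x in w]
        eigval = sum(v[a] * cov[a][b] * v[b]
                     for a in range(m) for b in range(m))
        if eigval < tol:  # no variance left to explain
            break
        components.append(v)
        ratios.append(eigval / total_var)
        # Deflate so the next component is orthogonal to this one.
        cov = [[cov[a][b] - eigval * v[a] * v[b] for b in range(m)]
               for a in range(m)]
    # Scores: each centered observation projected onto the components.
    scores = [[sum(x[j] * c[j] for j in range(m)) for c in components]
              for x in centered]
    return components, ratios, scores
```

For data lying exactly on a line, the first component captures essentially all of the variance, so a scree plot would drop to zero after the first component.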
10
GenePattern: a freely available open-source software package developed at the Broad Institute of MIT and Harvard for the analysis of genomic data. Fleet 2016
11
GCT File Format for GenePattern
Fleet 2016
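A GCT file is tab-delimited text: a version line (`#1.2`), a line giving the number of data rows and the number of samples, a header line starting with `Name` and `Description`, and then one row per gene. The sketch below writes that layout; the function name `to_gct` and the sample and gene names in the usage note are hypothetical.

```python
def to_gct(sample_names, rows):
    """Build GCT (version 1.2) text.

    rows: list of (name, description, values) tuples, one per gene,
    with one value per sample.
    """
    lines = [
        "#1.2",
        f"{len(rows)}\t{len(sample_names)}",
        "\t".join(["Name", "Description", *sample_names]),
    ]
    for name, description, values in rows:
        if len(values) != len(sample_names):
            raise ValueError(f"{name}: expected {len(sample_names)} values")
        lines.append("\t".join([name, description, *(str(v) for v in values)]))
    return "\n".join(lines) + "\n"
```

For example, `to_gct(["S1", "S2"], [("GeneA", "na", [1.5, 2.0])])` returns four lines of text that can be saved with a `.gct` extension.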
12
Tuesday BREAK #1
13
Day 2 Session 10: Functional Assessment I: Gene Set Enrichment Analysis, NIH DAVID
James C. Fleet, PhD, Distinguished Professor, Department of Nutrition Science; Pete Pascuzzi, PhD, Assistant Professor, Purdue Libraries
14
…a collection of annotated gene sets for use with GSEA software
8 collections curated by the Broad Institute. Subramanian et al. (2005) Proc Natl Acad Sci 102:15545. Fleet 2016
15
Data format for Functional Classification
Huang et al. (2009) Nat Protocols 4:44. Fleet 2016
16
Day 2 Session 11: Special Lecture
Sean Davis, PhD, Staff Scientist, National Cancer Institute
17
Day 2 Session 12: Visualization II: Intro to Pathway Analysis
> Reactome
> Pathview
James C. Fleet, PhD, Distinguished Professor, Department of Nutrition Science; Pete Pascuzzi, PhD, Assistant Professor, Purdue Libraries
18
Data Formats for Analysis
Fleet 2016
19
Data Format for Visualization
Columns: GeneID DCIS_1 DCIS_2 DCIS_3
10000 10001 0.4159 10002 0.1985 0.0379 0.3419 10003 0.0364 10004 0.0018 10005 0.4778 10006 0.6503 0.1951
*.txt or *.csv format
Fold change or raw data
Also available in R as a Bioconductor package
Fleet 2016
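A table in this layout (a GeneID column followed by one column per sample) can be read with a few lines of code. This is a minimal sketch: the function name `read_expression_table` is hypothetical, and treating empty cells as missing values is an assumption.

```python
import csv
import io


def read_expression_table(text, delimiter="\t"):
    """Parse a GeneID-by-sample table; empty cells become None."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    samples = header[1:]
    table = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        gene_id, cells = row[0], row[1:]
        table[gene_id] = {
            s: (float(c) if c.strip() else None)
            for s, c in zip(samples, cells)
        }
    return samples, table
```

Passing `delimiter=","` handles the *.csv variant. In the test data used to check this sketch, only the 10002 row is taken from the slide; the placement of other values in specific columns is illustrative.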
20
Making Sense of Specific Gene Expression Data
BioGPS: Tissue Expression Pattern Coremine: Gene Annotation Fleet 2016
21
Tuesday BREAK #2
22
Day 2 Session 13: Pathway Analysis continued…
James C. Fleet, PhD, Distinguished Professor, Department of Nutrition Science; Pete Pascuzzi, PhD, Assistant Professor, Purdue Libraries
23
Day 2 Session 14: Challenges of Metabolomic and Proteomic data
James C. Fleet, PhD, Distinguished Professor, Department of Nutrition Science; Pete Pascuzzi, PhD, Assistant Professor, Purdue Libraries
24
Levels of Metabolic Research
Genomics (DNA, genotype; ~25,000 genes): what may happen
Transcriptomics (mRNA; ~100,000 transcripts): what can happen
Proteomics (~ a million proteins): what is happening
Metabolomics/Lipidomics (~40,000; metabolic pathway: substrate to product): what has happened
Phenotype
Challenges: many metabolites, location, chemistry
Adapted from Navas-Iglesias et al. (2009) Trends in Analytical Chemistry, 28, 393, by C. Ferreira (Purdue)
25
The Increasing Complexity of Omic Analysis
Genome: ~25,000 genes
Alternative promoters, splicing, editing
Transcriptome: ~100,000 mRNA + miRNA + lncRNA
Post-translational modification
Proteome: ~1,000,000 + location, interaction, and chemistry differences
Fleet 2016
26
Proteomics/Metabolomics Workflow
Well Designed Study
Sample Preparation
Sample Analysis and QC analysis
Raw Reads
Link Peaks to Databases, Quantification, Normalization
Normalized Vetted Data
Data Filters: Intensity, Fold
Statistical Analysis
Differentially Expressed List
Clustering and visualization; Pathway and Geneset Analysis; Network building
Interpret Experiment
Fleet 2016
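The intensity and fold filters named in this workflow can be sketched as follows. The thresholds, the field names (`id`, `control`, `treated`), and the function name `filter_features` are all hypothetical values chosen for illustration; appropriate cutoffs depend on the platform and the study.

```python
from math import log2


def filter_features(features, min_intensity=1000.0, min_abs_log2_fc=1.0):
    """Keep features that pass an intensity filter and a fold filter.

    features: list of dicts with 'id', 'control', 'treated' (mean intensities).
    Returns a list of (id, log2 fold change) tuples.
    """
    kept = []
    for f in features:
        # Intensity filter: drop features that are low in both groups.
        if max(f["control"], f["treated"]) < min_intensity:
            continue
        # A log ratio is undefined for non-positive intensities.
        if f["control"] <= 0 or f["treated"] <= 0:
            continue
        # Fold filter: require a minimum absolute log2 fold change.
        lfc = log2(f["treated"] / f["control"])
        if abs(lfc) >= min_abs_log2_fc:
            kept.append((f["id"], lfc))
    return kept
```

Filtering of this kind only reduces the list before testing; it does not replace the statistical analysis step.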
27
Generalized Proteomic Pipeline
Steps 1-4 are distinct from the genomics pipeline Walther and Mann (2010) J Cell Biol 190:491. Fleet 2016
28
MaxQuant
MS-based peptide profile. File formats: .wiff, RAW, .d, mzXML. Link to peptide database. Integrate to proteins. Modified from Tiago Sobreira, Purdue. Fleet 2016
29
Proteomic Resources http://www.ebi.ac.uk/pride/archive/
31
Metabolomics Resources