Published by Joan Shelton. Modified over 9 years ago.
1
Min Zhang, MD PhD
Purdue University
Joint work with Yanzhu Lin, Dabao Zhang
2
Outline
- Data Summary
- Methods
- Data Analysis Procedure
- Preliminary Results
- Preprocessing GC x GC-MS Data: Methods
3
CCE Data Summary
Phenotype summary of the data currently available for the CCE project:

                    Healthy  Colon Cancer  Rectal Cancer  Polyp  NA  Total
Lipidomics (Lipid)       22            10              2     12   0     46
GProteomics (GP)         33             9              2     20   1     65
NMR                      25             2              1     23   2     53
Teac                     54            17              5     41   2    119
Comet                    27            12              4          0     55
4
Summary of Data Set Overlap
Overlap between any 2 data sets:

        Lipid   GP  NMR  Teac  Comet
Lipid      46   41    0    46     41
GP         41   65   17    63     43
NMR         0   17   53    52      2
Teac       46   63   52   119     55
Comet      41   43    2    55     55

Overlap among any 3 data sets:

Lipid & GP & Teac      41    GP & NMR & Teac      16
Lipid & GP & Comet     37    GP & Teac & Comet    43
Lipid & Teac & Comet   41    NMR & Teac & Comet    2

Overlap among any 4 data sets:

Lipid & GP & Teac & Comet   37
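Overlap counts of this kind can be obtained by intersecting the sets of sample IDs in each data set. The following is a minimal sketch only; the sample IDs and the helper name `overlap_counts` are hypothetical and are not taken from the CCE data.

```python
from itertools import combinations

# Hypothetical sample IDs per data set (illustrative only, not the CCE data).
samples = {
    "Lipid": {"S01", "S02", "S03", "S04"},
    "GP": {"S02", "S03", "S05"},
    "Teac": {"S01", "S02", "S03", "S05", "S06"},
}


def overlap_counts(samples, k):
    """Count the sample IDs shared by every combination of k data sets."""
    counts = {}
    for names in combinations(sorted(samples), k):
        shared = set.intersection(*(samples[name] for name in names))
        counts[names] = len(shared)
    return counts


pairwise = overlap_counts(samples, 2)  # overlap between any 2 data sets
triple = overlap_counts(samples, 3)    # overlap among any 3 data sets

for names, count in {**pairwise, **triple}.items():
    print(" & ".join(names), count)
```

The keys are tuples of data set names in alphabetical order, so each combination appears exactly once.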
5
Overlap of Different Omics Data
6
Methods for Integrating Omics
Common methods:
- Principal Component Analysis (Jolliffe, I., 1986)
- Co-Inertia Analysis (Doledec, S. and Chessel, D., 1994)
- Partial Least Squares (Wold, H., 1966)
- Bayesian analysis (Webb-Robertson et al., 2009)
Our method: We fit a model to each individual data set with the iteratively weighted partial least squares (IWPLS) method, then use a Bayesian method to integrate the results from the individual data sets.
7
Overlap Between NMR and G-Proteomics
NMR: 53 samples
Global Proteomics: 65 samples
Overlap: 17 samples
- One sample without phenotype information
- One sample from blood draw 2
- 15 samples, all from blood draw 1, with phenotype either "Healthy Control" or "Polyp"
8
Data Analysis Procedure
- Metabolomics (NMR) data: preprocessing (ending with 1824 variables), then the IWPLS method
- Global Proteomics data: preprocessing (ending with 5407 variables), then the IWPLS method
- Integrate the results
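The slides do not specify how the Bayesian integration step works. As a hedged illustration only, one simple way to merge class probabilities from two separately fitted models is to assume the data sets are conditionally independent given the class and multiply the per-data-set evidence. The function `combine_posteriors` and all numbers below are hypothetical and do not represent the authors' method.

```python
def combine_posteriors(posteriors, prior):
    """Combine per-data-set class posteriors into one posterior.

    Assumes the data sets are conditionally independent given the class,
    so p(c | x1, ..., xm) is proportional to p(c) * prod_k [p(c | xk) / p(c)].
    This is an illustrative assumption, not the method used in the slides.
    """
    scores = {}
    for cls, p_cls in prior.items():
        score = p_cls
        for post in posteriors:
            score *= post[cls] / p_cls
        scores[cls] = score
    total = sum(scores.values())
    return {cls: score / total for cls, score in scores.items()}


# Hypothetical class probabilities for one sample from two separate models.
prior = {"Healthy": 0.5, "Polyp": 0.5}
nmr_post = {"Healthy": 0.6, "Polyp": 0.4}  # from the NMR model
gp_post = {"Healthy": 0.9, "Polyp": 0.1}   # from the proteomics model

combined = combine_posteriors([nmr_post, gp_post], prior)
print(combined)
```

With these made-up inputs, the combined posterior favors "Healthy" more strongly than either model alone, because both sources of evidence point the same way.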
9
Analysis Results Our method:
10
Analysis Results (cont.)
Summary:

Data                            Classification Rate
GProteomics                     100%
NMR                             85.7%
Integrated NMR and GProteomics  100%

Other methods tried:
- PLS: ended with 0 components
- Univariate t-test: no variable was significant
11
Example: Overlap of Three Data Sets
For the overlap among three data sets, we focus on Lipidomics, Teac and Comet.
Data summary:
- Phenotype summary:

Phenotype    Healthy  Polyp  Colon  Rectal  Total
Sample size       20     10      9       2     41

- Variable summary:

Lipidomics Teac Comet
Number of variables 5212

Data analysis: We group the colon cancer and rectal cancer patients together as one cancer group, while keeping the other two groups. Then we try the following methods:
Method 1: POCRE
Method 2: ANOVA test
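For the ANOVA step, each variable can be tested for a difference in means across the three groups (healthy, polyp, cancer). The sketch below computes the one-way ANOVA F statistic for a single variable; the measurements are made up for illustration and the function name `one_way_anova_f` is hypothetical.

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of groups of values."""
    k = len(groups)                         # number of groups
    n = sum(len(g) for g in groups)         # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]

    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Made-up values of one variable in the three phenotype groups.
healthy = [1.0, 2.0, 3.0]
polyp = [2.0, 3.0, 4.0]
cancer = [5.0, 6.0, 7.0]

f_stat = one_way_anova_f([healthy, polyp, cancer])
print(f_stat)
```

A p-value would then come from the F distribution with (k - 1, n - k) degrees of freedom, and the test would be repeated for every variable.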
12
Results
Misclassification rate:

POCRE  ANOVA
17%    39%

Variables identified:
POCRE  Lipids:   Teac: TEAC_mM
ANOVA  Lipids:   Teac: TEAC_mM
13
Preprocessing GC x GC-MS Methods
How to choose the reference sample for alignment?
- Choose the chromatogram in the middle of the run sequence, or the chromatogram containing the highest number of common chemical constituents (i.e. peaks).
- Choose the chromatogram that is most similar to the loading of the first principal component in a PCA model on the unaligned data, or simply to the mean of all chromatograms.
Similarity index method for choosing the reference sample: Each chromatogram is assigned a similarity index, and the chromatogram with the maximum similarity index is chosen as the reference sample.
Ref: Skov, T. et al., Automated Alignment of Chromatographic Data, Journal of Chemometrics, Vol. 20, Issue 11-12, pp. 484-497, 2007.
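The similarity index formula is not shown in the text above. As a rough sketch, assume the index of a chromatogram is the product of the absolute Pearson correlation coefficients between it and every other chromatogram. That definition is an assumption made here for illustration and should be checked against the cited paper. The synthetic Gaussian peaks and all function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length signals."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def similarity_indices(chromatograms):
    """Assumed index: product of |r| between each chromatogram and all others."""
    indices = []
    for i, x_i in enumerate(chromatograms):
        product = 1.0
        for j, x_j in enumerate(chromatograms):
            if i != j:
                product *= abs(pearson(x_i, x_j))
        indices.append(product)
    return indices


def choose_reference(chromatograms):
    """Return the position of the chromatogram with the largest index."""
    indices = similarity_indices(chromatograms)
    return max(range(len(indices)), key=indices.__getitem__)


def gaussian_peak(center, length=50, width=3.0):
    """Synthetic single-peak chromatogram, used only for this illustration."""
    return [math.exp(-((t - center) ** 2) / (2 * width ** 2)) for t in range(length)]


# Three runs with the same peak drifting in retention time.
chromatograms = [gaussian_peak(20), gaussian_peak(22), gaussian_peak(24)]
reference = choose_reference(chromatograms)
print(reference)
```

In this toy example, the middle run correlates well with both of its neighbours, so it receives the largest index and is picked as the alignment reference.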
14
Results