Min Zhang, MD PhD Purdue University Joint work with Yanzhu Lin, Dabao Zhang.


1 Min Zhang, MD PhD Purdue University Joint work with Yanzhu Lin, Dabao Zhang

2 Outline
- Data Summary
- Methods
- Data Analysis Procedure
- Preliminary Results
- Preprocessing GC x GC-MS Data Methods

3 CCE Data Summary
Phenotype summary of the currently available data for the CCE project:

Data set             Healthy  Colon Cancer  Rectal Cancer  Polyp  NA  Total
Lipidomics (Lipid)        22            10              2     12   0     46
GProteomics (GP)          33             9              2     20   1     65
NMR                       25             2              1     23   2     53
Teac                      54            17              5     41   2    119
Comet                     27            12              4          0     55

4 Summary of Overlap
Dataset overlap between any 2 data sets:

        Lipid   GP  NMR  Teac  Comet
Lipid      46   41    0    46     41
GP         41   65   17    63     43
NMR         0   17   53    52      2
Teac       46   63   52   119     55
Comet      41   43    2    55     55

Overlap among any 3 data sets:
- Lipid & GP & Teac: 41
- Lipid & GP & Comet: 37
- Lipid & Teac & Comet: 41
- GP & NMR & Teac: 16
- GP & Teac & Comet: 43
- NMR & Teac & Comet: 2

Overlap among any 4 data sets:
- Lipid & GP & Teac & Comet: 37
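Overlap counts of this kind can be derived from the sample IDs present in each data set. The following is a minimal sketch, assuming each data set is available as a set of sample IDs; the function name `overlap_counts` and the IDs are hypothetical and are not the CCE data.

```python
from itertools import combinations


def overlap_counts(datasets, k):
    """Count shared sample IDs for every k-way combination of data sets.

    datasets: dict mapping a data set name to a set of sample IDs.
    Returns a dict mapping a tuple of names (sorted) to the overlap size.
    """
    counts = {}
    for names in combinations(sorted(datasets), k):
        shared = set.intersection(*(datasets[name] for name in names))
        counts[names] = len(shared)
    return counts


# Hypothetical sample IDs, for illustration only.
datasets = {
    "Lipid": {"s1", "s2", "s3", "s4"},
    "GP": {"s2", "s3", "s4", "s5"},
    "Teac": {"s1", "s2", "s3", "s4", "s5", "s6"},
}

pairwise = overlap_counts(datasets, 2)
threeway = overlap_counts(datasets, 3)
```

Calling the function with `k=2`, `k=3` and `k=4` would give the three kinds of summary listed on this slide.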

5 Overlap of Different Omics Data

6 Methods for Integrating Omics
Common methods:
- Principal Component Analysis (Jolliffe, I., 1986)
- Co-Inertia Analysis (Doledec, S. and Chessel, D., 1994)
- Partial Least Squares (Wold, H., 1966)
- Bayesian analysis (Webb-Robertson et al., 2009)
Our method: we fit a model to each individual data set with the iteratively weighted partial least squares (IWPLS) method, then use a Bayesian method to integrate the results from the individual data sets.
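The slides do not spell out how the Bayesian integration step works. As a hedged illustration only, the sketch below combines per-data-set class probabilities under a naive conditional-independence assumption; this is a generic technique swapped in for illustration, not necessarily the authors' method. The function name `integrate_posteriors` and all numbers are hypothetical.

```python
def integrate_posteriors(per_dataset_probs, prior):
    """Combine class probabilities from several data sets.

    Assumes the data sets are conditionally independent given the class, so
    P(c | all data) is proportional to prior(c) * prod_d [P(c | data_d) / prior(c)].

    per_dataset_probs: list of dicts mapping class label to probability.
    prior: dict mapping class label to prior probability (all > 0).
    """
    unnormalised = {}
    for label, p_prior in prior.items():
        value = p_prior
        for probs in per_dataset_probs:
            value *= probs[label] / p_prior
        unnormalised[label] = value
    total = sum(unnormalised.values())
    return {label: value / total for label, value in unnormalised.items()}


# Hypothetical per-data-set outputs for one sample (not results from the slides).
prior = {"Healthy": 0.5, "Polyp": 0.5}
nmr_probs = {"Healthy": 0.6, "Polyp": 0.4}
gp_probs = {"Healthy": 0.9, "Polyp": 0.1}

combined = integrate_posteriors([nmr_probs, gp_probs], prior)
```

With these made-up inputs the combined probability of "Healthy" is higher than either data set gives alone, which is the general appeal of pooling evidence across platforms.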

7 Overlap Between NMR and G-Proteomics
- NMR: 53 samples
- Global Proteomics: 65 samples
- Overlap: 17 samples
  - One sample: without phenotype information
  - One sample: from blood draw 2
  - 15 samples: all from blood draw 1, with phenotype either “Healthy Control” or “Polyp”

8 Data Analysis Procedure
- Metabolomics (NMR) data: preprocessing, ending with 1824 variables, then the IWPLS method
- Global Proteomics data: preprocessing, ending with 5407 variables, then the IWPLS method
- Integrate the results

9 Analysis Results Our method:

10 Analysis Results (cont.)
Summary:

Data                            Classification Rate
GProteomics                     100%
NMR                             85.7%
Integrated NMR and GProteomics  100%

Other methods tried:
- PLS: ended with 0 components
- Univariate t-test: no variable is significant
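The slides do not say how the classification rates were estimated. With a sample this small, one common choice is leave-one-out cross-validation; the sketch below shows that idea with a simple nearest-centroid classifier standing in for IWPLS. Both the validation scheme and the classifier are assumptions made for illustration, and the data are hypothetical.

```python
def nearest_centroid_predict(train_X, train_y, x):
    """Predict the label whose class mean is closest to x (squared Euclidean)."""
    sums, counts = {}, {}
    for row, label in zip(train_X, train_y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1

    def squared_distance(label):
        return sum((s / counts[label] - v) ** 2 for s, v in zip(sums[label], x))

    return min(sums, key=squared_distance)


def loo_classification_rate(X, y):
    """Fraction of samples classified correctly when each is held out in turn."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if nearest_centroid_predict(train_X, train_y, X[i]) == y[i]:
            correct += 1
    return correct / len(X)


# Hypothetical two-variable data, for illustration only.
X = [[0.1, 1.0], [0.2, 0.9], [0.0, 1.1], [1.0, 0.1], [0.9, 0.0], [1.1, 0.2]]
y = ["Healthy", "Healthy", "Healthy", "Polyp", "Polyp", "Polyp"]

rate = loo_classification_rate(X, y)
```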

11 Example: Overlap of Three Data Sets
For the overlap among three data sets, we focus on the overlap among Lipidomics, Teac and Comet.
Data summary:
- Phenotype summary:

  Phenotype    Healthy  Polyp  Colon  Rectal  Total
  Sample size       20     10      9       2     41

- Variable summary (Lipidomics, Teac, Comet): number of variables 5212
Data analysis: we group the colon cancer and rectal cancer patients together as one cancer group, while keeping the other two groups. Then we try the following methods:
- Method 1: POCRE
- Method 2: ANOVA test
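For the ANOVA step, each variable can be screened with a one-way F statistic across the three groups (Healthy, Polyp, Cancer). The sketch below computes that statistic for a single variable; the function name `one_way_anova_f` and the measurements are hypothetical, and converting F to a p-value (which needs the F distribution) is left out.

```python
from statistics import mean


def one_way_anova_f(groups):
    """One-way ANOVA F statistic for a single variable.

    groups: list of lists, one list of measurements per phenotype group.
    """
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical measurements of one variable in three groups.
healthy = [1.0, 2.0, 3.0]
polyp = [2.0, 3.0, 4.0]
cancer = [5.0, 6.0, 7.0]

f_stat = one_way_anova_f([healthy, polyp, cancer])
```

A large F value indicates that the group means differ by more than the within-group scatter would suggest.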

12 Results
Misclassification rate:
- POCRE: 17%
- ANOVA: 39%
Variables identified:
- POCRE
  Lipids:
  Teac: TEAC_mM
- ANOVA
  Lipids:
  Teac: TEAC_mM

13 Preprocessing GC x GC-MS Methods
How to choose the reference sample for alignment?
- Choose the chromatogram in the middle of the run sequence, or the chromatogram containing the highest number of common chemical constituents (i.e. peaks).
- Choose the chromatogram that is most similar to the loading of the first principal component in a PCA model on the unaligned data, or simply to the mean of all chromatograms.
Similarity index method for choosing the reference sample: for a given chromatogram, the similarity index is defined as: [equation not preserved in this transcript]. The chromatogram with the maximum similarity index is chosen as the reference sample.
Ref: Skov, T. et al., Automated Alignment of Chromatographic Data, Journal of Chemometrics, Vol. 20, Issue 11-12, pp. 484-497, 2007.
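The similarity index formula itself is not reproduced in the slide text. As a hedged sketch, the code below assumes the index of a chromatogram is the product of the absolute Pearson correlation coefficients between it and every other chromatogram, which is our reading of the cited reference and should be checked against it. The function names and the toy signals are hypothetical.

```python
from math import prod, sqrt


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length signals."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def similarity_index(chromatograms, i):
    """Assumed definition: product of |r| between chromatogram i and all others."""
    return prod(
        abs(pearson(chromatograms[i], other))
        for j, other in enumerate(chromatograms)
        if j != i
    )


def choose_reference(chromatograms):
    """Index of the chromatogram with the maximum similarity index."""
    return max(
        range(len(chromatograms)),
        key=lambda i: similarity_index(chromatograms, i),
    )


# Hypothetical 1-D signals: the same peak at three retention-time shifts.
chromatograms = [
    [1, 3, 5, 3, 1, 0, 0],
    [0, 1, 3, 5, 3, 1, 0],
    [0, 0, 1, 3, 5, 3, 1],
]

reference = choose_reference(chromatograms)
```

In this toy case the middle signal is selected, since it correlates well with both of its shifted neighbours.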

14 Results

