Min Zhang, MD PhD, Purdue University. Joint work with Yanzhu Lin and Dabao Zhang.

Outline
- Data Summary
- Methods
- Data Analysis Procedure
- Preliminary Results
- Preprocessing GC x GC-MS Data
  - Methods

CCE Data Summary
Phenotype summary of the data currently available for the CCE project:
                    Healthy  Colon Cancer  Rectal Cancer  Polyp  NA  Total
Lipidomics (Lipid)       22            10              2     12   0     46
GProteomics (GP)         33             9              2     20   1     65
NMR                      25             2              1     23   2     53
Teac                     54            17              5     41   2    119
Comet                    27            12              4          0     55

Summary of Overlap
Overlap between any 2 data sets (number of shared samples; the diagonal is each data set's own sample count):
Dataset  Lipid   GP  NMR  Teac  Comet
Lipid       46   41    0    46     41
GP          41   65   17    63     43
NMR          0   17   53    52      2
Teac        46   63   52   119     55
Comet       41   43    2    55     55
Overlap among any 3 data sets:
Lipid & GP & Teac       41
Lipid & GP & Comet      37
Lipid & Teac & Comet    41
GP & NMR & Teac         16
GP & Teac & Comet       43
NMR & Teac & Comet       2
Overlap among any 4 data sets:
Lipid & GP & Teac & Comet    37
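Overlap counts of this kind can be tabulated by intersecting the sets of sample IDs available in each data set. The following is a minimal sketch under stated assumptions: the sample IDs and the helper name `overlap_counts` are hypothetical and serve only as illustration; they are not the CCE data or the authors' code.

```python
from itertools import combinations


def overlap_counts(datasets, k):
    """Count the samples shared by every combination of k data sets.

    datasets maps a data set name to the set of sample IDs it contains.
    """
    counts = {}
    for names in combinations(sorted(datasets), k):
        shared = set.intersection(*(datasets[name] for name in names))
        counts[names] = len(shared)
    return counts


# Hypothetical sample IDs, for illustration only.
datasets = {
    "Lipid": {"s1", "s2", "s3"},
    "GP": {"s2", "s3", "s4"},
    "Teac": {"s1", "s2", "s3", "s4", "s5"},
}

pairwise = overlap_counts(datasets, 2)   # overlap between any 2 data sets
threeway = overlap_counts(datasets, 3)   # overlap among any 3 data sets

for names, count in {**pairwise, **threeway}.items():
    print(" & ".join(names), count)
```

Keys are tuples of data set names in alphabetical order, so the same function covers the 2-way, 3-way and 4-way summaries by changing `k`.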

Overlap of Different Omics Data

Methods for Integrating Omics
Common methods:
- Principal Component Analysis (Jolliffe, I., 1986)
- Co-Inertia Analysis (Doledec, S. and Chessel, D., 1994)
- Partial Least Squares (Wold, H., 1966)
- Bayesian analysis (Webb-Robertson et al., 2009)
Our method: We fit a model to each individual data set with the iteratively weighted partial least squares method (IWPLS), then use a Bayesian method to integrate the results from the individual data sets.
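The slides do not spell out the integration step, so the following is only a generic sketch of one way per-data-set results could be combined with Bayes' rule. It assumes each data set yields a posterior class probability for a sample and that the data sets are conditionally independent given the class. It is not the IWPLS method and not necessarily the integration the authors used; the function name, priors and probabilities are all hypothetical.

```python
def combine_posteriors(posteriors, prior):
    """Combine per-data-set class posteriors into a single posterior.

    posteriors: list of dicts mapping class label -> P(class | data set i)
    prior:      dict mapping class label -> P(class)
    Assumes the data sets are conditionally independent given the class.
    """
    unnormalized = {}
    for label, p_class in prior.items():
        value = p_class
        for post in posteriors:
            # By Bayes' rule, P(x_i | class) is proportional to
            # P(class | x_i) / P(class).
            value *= post[label] / p_class
        unnormalized[label] = value
    total = sum(unnormalized.values())
    return {label: v / total for label, v in unnormalized.items()}


# Hypothetical numbers for one sample, for illustration only.
prior = {"Healthy": 0.5, "Polyp": 0.5}
nmr = {"Healthy": 0.6, "Polyp": 0.4}   # posterior from the NMR model
gp = {"Healthy": 0.9, "Polyp": 0.1}    # posterior from the proteomics model

combined = combine_posteriors([nmr, gp], prior)
print(combined)
```

With these made-up inputs the two data sets agree on the more likely class, so the combined posterior is more confident than either one alone.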

Overlap Between NMR and G-Proteomics
NMR: 53 samples
Global Proteomics: 65 samples
Overlap: 17 samples
- One sample: without phenotype information
- One sample: from blood draw 2
- 15 samples: all from blood draw 1, with phenotype either “Healthy Control” or “Polyp”

Data Analysis Procedure
- Metabolomics (NMR) data: preprocessing, ending with 1824 variables, then the IWPLS method
- Global Proteomics data: preprocessing, ending with 5407 variables, then the IWPLS method
- Integrate the results

Analysis Results Our method:

Analysis Results (cont.)
Summary:
Data                            Classification Rate
GProteomics                     100%
NMR                             85.7%
Integrated NMR and GProteomics  100%
Other methods tried:
- PLS: ending with 0 components
- Univariate t-test: no variable is significant

Example: Overlap of Three Data Sets
For the overlap among three data sets, we focus on Lipidomics, Teac and Comet.
Data summary:
- Phenotype summary:
Phenotype    Healthy  Polyp  Colon  Rectal  Total
Sample size       20     10      9       2     41
- Variable summary:
Number of variables (Lipidomics, Teac, Comet): 5212
Data analysis: We group colon cancer and rectal cancer patients together as one cancer group, and keep the other two groups as they are. Then we try the following methods:
Method 1: POCRE
Method 2: ANOVA test
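For the ANOVA step, each variable can be tested across the three groups (healthy, polyp, cancer) with a one-way ANOVA F statistic. The sketch below computes that statistic in plain Python; the function name and the measurements are hypothetical and are not taken from the CCE data.

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of groups of values."""
    k = len(groups)                         # number of groups
    n = sum(len(g) for g in groups)         # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)

    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical measurements of one variable, for illustration only.
healthy = [1.0, 2.0, 3.0]
polyp = [2.0, 3.0, 4.0]
cancer = [5.0, 6.0, 7.0]   # colon and rectal cancer pooled into one group

f_stat = one_way_anova_f([healthy, polyp, cancer])
print(f_stat)
```

A p-value follows from the F distribution with (k - 1, n - k) degrees of freedom; if SciPy is available, `scipy.stats.f_oneway` returns both the statistic and the p-value.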

Results
Misclassification rate:
POCRE  ANOVA
17%    39%
Variables identified:
POCRE  Lipids:    Teac: TEAC_mM
ANOVA  Lipids:    Teac: TEAC_mM

Preprocessing GC x GC-MS Methods
How to choose the reference sample for alignment?
- Choose the chromatogram in the middle of the run sequence, or the chromatogram containing the highest number of common chemical constituents (i.e. peaks).
- Choose the chromatogram that is most similar to the loading of the first principal component in a PCA model on the unaligned data, or simply to the mean of all chromatograms.
Similarity index method for choosing the reference sample: For a given chromatogram, the similarity index is defined as: where
The chromatogram with the maximum similarity index is chosen as the reference sample.
Ref: Skov, T. et al., Automated Alignment of Chromatographic Data, Journal of Chemometrics, Vol. 20, Issue 11-12, pp. 484-497, 2007.
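The formula for the similarity index did not survive in this transcript. The sketch below therefore rests on an assumption: it takes the similarity index of a chromatogram to be the product of the absolute correlation coefficients between that chromatogram and every chromatogram in the set, which is our reading of Skov et al. and should be checked against the paper. Function names and the toy chromatograms are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length signals."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def similarity_index(chromatograms, r):
    """ASSUMED definition: product of |correlation| between chromatogram r
    and every chromatogram in the set (the self-correlation contributes 1)."""
    index = 1.0
    for other in chromatograms:
        index *= abs(pearson(chromatograms[r], other))
    return index


def choose_reference(chromatograms):
    """Return the position of the chromatogram with the largest index."""
    scores = [similarity_index(chromatograms, r)
              for r in range(len(chromatograms))]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy chromatograms with one peak that drifts to the right, illustration only.
chromatograms = [
    [0.0, 1.0, 5.0, 1.0, 0.0, 0.0],
    [0.0, 0.5, 4.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 5.0, 1.0, 0.0],
]

reference = choose_reference(chromatograms)
print(reference)
```

In this toy set the middle chromatogram is selected, since its peak sits between the other two and it correlates reasonably well with both.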

Results
