Min Zhang, MD PhD Purdue University Joint work with Yanzhu Lin, Dabao Zhang.



Outline
- Data Summary
- Methods
- Data Analysis Procedure
- Preliminary Results
- Preprocessing GC x GC-MS Data
- Methods

CCE Data Summary
Phenotype summary for the currently available data in the CCE project.
Columns: Healthy, Colon Cancer, Rectal Cancer, Polyp, NA, Total
Rows: Lipidomics (Lipid), GProteomics (GP), NMR, Teac, Comet

Summary of Overlap
Dataset overlap between any 2 data sets (rows and columns: Lipid, GP, NMR, Teac, Comet)
Overlap among any 3 data sets:
- Lipid & GP & Teac: 41
- Lipid & GP & Comet: 37
- Lipid & Teac & Comet: 41
- GP & NMR & Teac: 16
- GP & Teac & Comet: 43
- NMR & Teac & Comet: 2
Overlap among any 4 data sets:
- Lipid & GP & Teac & Comet: 37

Overlap of Different Omics Data

Methods for Integrating Omics
Common methods:
- Principal Component Analysis (Jolliffe, I., 1986)
- Co-Inertia Analysis (Doledec, S. and Chessel, D., 1994)
- Partial Least Squares (Wold, H., 1966)
- Bayesian analysis (Webb-Robertson et al., 2009)
Our methods: we use the iteratively weighted partial least squares (IWPLS) method to fit the model for each individual data set, then use a Bayesian method to integrate the results from the individual data sets.
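The slides do not spell out the Bayesian integration step. One simple possibility, shown here only as a sketch, is to combine the class probabilities produced by each per-data-set model under the assumption that the data sets are conditionally independent given the class. This is an illustrative assumption, not necessarily the method used in this work; the function name `integrate_posteriors` and all numbers are hypothetical.

```python
# Sketch only: naive-Bayes style fusion of per-data-set class probabilities.
# Assumption (not from the slides): data sets are conditionally independent
# given the class, and each model was trained with the same class prior.

def integrate_posteriors(posteriors, prior):
    """Combine per-data-set posteriors P(class | data_k) into one posterior.

    posteriors: list of dicts, one per data set, mapping class -> probability
    prior:      dict mapping class -> prior probability (all non-zero)
    """
    combined = {}
    for cls, p_cls in prior.items():
        # P(c | d_1..d_K) is proportional to P(c) * prod_k [P(c | d_k) / P(c)]
        score = p_cls
        for post in posteriors:
            score *= post[cls] / p_cls
        combined[cls] = score
    total = sum(combined.values())
    return {cls: s / total for cls, s in combined.items()}


# Hypothetical probabilities for one sample from two models (NMR, proteomics).
prior = {"Healthy Control": 0.5, "Polyp": 0.5}
nmr = {"Healthy Control": 0.4, "Polyp": 0.6}
gp = {"Healthy Control": 0.1, "Polyp": 0.9}
fused = integrate_posteriors([nmr, gp], prior)
```

When both models lean the same way, the fused probability is more decisive than either one alone, which is the usual motivation for integrating data sets rather than analysing them separately.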

Overlap Between NMR and G-Proteomics
- NMR: 53 samples
- Global Proteomics: 65 samples
- Overlap: 17 samples
  - One sample: without phenotype information
  - One sample: from blood draw 2
  - 15 samples: all from blood draw 1, with phenotype either “Healthy Control” or “Polyp”

Data Analysis Procedure
- Metabolomics (NMR) data: preprocessing (ending with 1824 variables), then the IWPLS method
- Global proteomics data: preprocessing (ending with 5407 variables), then the IWPLS method
- Integrate the results

Analysis Results Our method:

Analysis Results (cont.)
Summary (classification rate by data set):
- GProteomics: 100%
- NMR: 85.7%
- Integrated NMR and GProteomics: 100%
Other methods tried:
- PLS: ended with 0 components
- Univariate t-test: no variables were significant

Example: Overlap of Three Data Sets
For the overlap among three data sets, we focus on Lipidomics, Teac and Comet.
Data summary:
- Phenotype summary: sample size by phenotype (Healthy, Polyp, Colon, Rectal, Total)
- Variable summary (Lipidomics, Teac, Comet): number of variables 5212
Data analysis: we group the colon cancer and rectal cancer patients together as a single cancer group, while keeping the other two groups. Then we try the following methods:
- Method 1: POCRE
- Method 2: ANOVA test
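The grouping step and the ANOVA test can be sketched as follows. This is a minimal illustration in plain Python, assuming a one-way ANOVA is run per variable across the three groups; the helper names and the sample values are made up and are not taken from the CCE data.

```python
# Sketch only: merge colon and rectal cancer into one "Cancer" group and
# compute a one-way ANOVA F statistic for a single variable.
# The sample values below are made up for illustration.

def merge_cancer(label):
    """Map the four phenotypes onto the three groups used in the analysis."""
    return "Cancer" if label in ("Colon", "Rectal") else label


def anova_f(values, labels):
    """Return the one-way ANOVA F statistic for values grouped by labels."""
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(merge_cancer(lab), []).append(v)
    n = len(values)
    k = len(groups)
    grand_mean = sum(values) / n
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups.values())
    # Variation of the samples around their own group mean.
    ss_within = sum((v - sum(g) / len(g)) ** 2
                    for g in groups.values() for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


labels = ["Healthy", "Healthy", "Polyp", "Polyp", "Colon", "Rectal"]
teac = [1.0, 1.2, 1.5, 1.7, 2.4, 2.6]  # hypothetical TEAC_mM readings
f_stat = anova_f(teac, labels)
```

A p-value would then be read from the F distribution with (k - 1, n - k) degrees of freedom, for example through a statistics library.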

Results
Misclassification rate:
- POCRE: 17%
- ANOVA: 39%
Variables identified:
- POCRE: Lipids: ; Teac: TEAC_mM
- ANOVA: Lipids: ; Teac: TEAC_mM

Preprocessing GC x GC-MS Methods
How to choose the reference sample for alignment?
- Choose the chromatogram in the middle of the run sequence, or the chromatogram containing the highest number of common chemical constituents (i.e. peaks).
- Choose the chromatogram that is most similar to the loading of the first principal component in a PCA model on the unaligned data, or simply to the mean of all chromatograms.
Similarity index method for choosing the reference sample: a similarity index is computed for each chromatogram, and the chromatogram with the maximum similarity index is chosen as the reference sample.
Ref: Skov, T. et al., Automated Alignment of Chromatographic Data, Journal of Chemometrics, Vol. 20, Issue 11-12, 2007.
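Reference selection can be sketched in a few lines. The formula for the similarity index is not reproduced in this transcript, so the sketch assumes one plausible definition: the product of the absolute Pearson correlations between a chromatogram and every other chromatogram. That definition, the function names, and the toy traces are all assumptions for illustration. The "closest to the mean" rule from the list above is included for comparison.

```python
# Sketch only: pick a reference chromatogram for alignment.
# Assumption (the formula is missing from the slide): the similarity index of
# chromatogram i is the product of |Pearson r| between it and every other one.
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def similarity_index(chroms, i):
    """Product of absolute correlations between chromatogram i and the rest."""
    index = 1.0
    for j, other in enumerate(chroms):
        if j != i:
            index *= abs(pearson(chroms[i], other))
    return index


def choose_reference(chroms):
    """Return the position of the chromatogram with the largest index."""
    return max(range(len(chroms)), key=lambda i: similarity_index(chroms, i))


def closest_to_mean(chroms):
    """Alternative rule: the chromatogram most correlated with the mean."""
    mean = [sum(col) / len(chroms) for col in zip(*chroms)]
    return max(range(len(chroms)), key=lambda i: pearson(chroms[i], mean))


# Three hypothetical 1-D traces: the same peak at slightly shifted positions.
chroms = [
    [0, 1, 4, 6, 4, 1, 0, 0, 0],
    [0, 0, 1, 4, 6, 4, 1, 0, 0],
    [0, 0, 0, 1, 4, 6, 4, 1, 0],
]
ref = choose_reference(chroms)
```

In this toy example both rules pick the middle trace, since it overlaps best with its two shifted neighbours. Real GC x GC-MS chromatograms are two-dimensional and would first need to be unfolded into vectors before applying either rule.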

Results