Exploring Metabolomic Data with Recursive Partitioning
Metabolomic Workshop, NISS, July 14-15, 2005



University of North Carolina Wilmington

Why study metabolites?
- Metabolomics: the global study of all small molecules produced in the human body
- Biochemical consequences of environment, drugs, and mutations can be observed directly through metabolites
- Understand how drugs work, their interactions, and possible side effects
- ~2500 metabolites

Challenges of metabolomic data
- Non-normal distributions
- Outliers
- Informative missing values
- High correlation among metabolites
- The n < p problem (n = number of biological samples, p = number of metabolites)

Why recursive partitioning?
- Is fairly robust to non-normal data
- Missing values are not an issue
- Correlation among variables is not an issue
- Useful for discovering outliers
- Handles large p, small n data sets efficiently

How recursive partitioning works
- Recursive partitioning efficiently searches through all of the variables and finds the one with the best (most significant) split
- Once the data are split, or “partitioned”, on this variable, the resulting daughter nodes are more homogeneous
- Each daughter node is then explored to find its best split
- This process continues until no significant split remains
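The search-split-recurse loop described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the FIRM procedure used by the software discussed later: it assumes a binary class label, binary splits on numeric variables, a Pearson chi-square statistic as the measure of "significance", and a fixed cutoff of 3.84 (the 5% critical value for one degree of freedom) with no multiplicity adjustment. The function names, the `min_size` parameter, and the toy data are all hypothetical.

```python
from collections import Counter


def chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def best_split(X, y):
    """Search every variable and every threshold; return (stat, var, threshold)."""
    best = (0.0, None, None)
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold between adjacent values
            a = sum(1 for row, lab in zip(X, y) if row[j] <= t and lab == 1)
            b = sum(1 for row, lab in zip(X, y) if row[j] <= t and lab == 0)
            c = sum(1 for row, lab in zip(X, y) if row[j] > t and lab == 1)
            d = sum(1 for row, lab in zip(X, y) if row[j] > t and lab == 0)
            stat = chi2_2x2(a, b, c, d)
            if stat > best[0]:
                best = (stat, j, t)
    return best


def grow(X, y, critical=3.84, min_size=4):
    """Split recursively until no split reaches the (assumed) cutoff."""
    counts = Counter(y)
    node = {"n": len(y), "counts": dict(counts)}
    if len(y) < min_size or len(counts) < 2:
        return node  # too small or already pure: leaf
    stat, j, t = best_split(X, y)
    if j is None or stat < critical:
        return node  # no significant split remains: leaf
    left = [i for i, row in enumerate(X) if row[j] <= t]
    right = [i for i, row in enumerate(X) if row[j] > t]
    node.update(
        var=j,
        threshold=t,
        stat=stat,
        left=grow([X[i] for i in left], [y[i] for i in left], critical, min_size),
        right=grow([X[i] for i in right], [y[i] for i in right], critical, min_size),
    )
    return node


# Toy data (made up): 8 samples, 3 "metabolites"; only column 1 tracks the class.
X = [
    [1.0, 0.2, 5.0], [2.0, 0.1, 3.0], [1.5, 0.3, 4.0], [2.5, 0.4, 6.0],
    [1.2, 2.1, 5.5], [2.2, 2.4, 3.5], [1.7, 2.2, 4.5], [2.7, 2.0, 6.5],
]
y = [0, 0, 0, 0, 1, 1, 1, 1]
tree = grow(X, y)
print(tree)
```

On the toy data the root splits on the one informative column and both daughter nodes are pure, so the recursion stops there. A real analysis would need an adjusted significance test and a strategy for missing values, neither of which this sketch attempts.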

Example

Multiple Trees
- Not all effects are necessarily found in a single tree
- In any node, there may be more than one significant variable
- Creating multiple trees may reveal a number of possible effects
- Gain an understanding of interactions/correlations among metabolites
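One way to see why a single tree can hide effects is to list every variable whose best split is significant at a node, rather than keeping only the winner. The sketch below does that for one node; each listed variable could seed an alternative tree. As a simplification it again uses a Pearson chi-square statistic with an unadjusted 3.84 cutoff, which is an assumption and not the test used by FIRM; `rank_variables` and the toy data are hypothetical.

```python
def chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def rank_variables(X, y, critical=3.84):
    """Return (stat, var, threshold) for every variable whose best split
    reaches the cutoff, strongest first."""
    ranked = []
    for j in range(len(X[0])):
        best_stat, best_t = 0.0, None
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            a = sum(1 for row, lab in zip(X, y) if row[j] <= t and lab == 1)
            b = sum(1 for row, lab in zip(X, y) if row[j] <= t and lab == 0)
            c = sum(1 for row, lab in zip(X, y) if row[j] > t and lab == 1)
            d = sum(1 for row, lab in zip(X, y) if row[j] > t and lab == 0)
            stat = chi2_2x2(a, b, c, d)
            if stat > best_stat:
                best_stat, best_t = stat, t
        if best_t is not None and best_stat >= critical:
            ranked.append((best_stat, j, best_t))
    return sorted(ranked, reverse=True)


# Toy data (made up): columns 0 and 2 both carry signal, column 1 is noise.
X = [
    [0.1, 1.0, 10.0], [0.2, 2.0, 11.0], [0.3, 1.5, 12.0], [0.4, 2.5, 30.0],
    [2.0, 1.2, 31.0], [2.1, 2.2, 32.0], [2.2, 1.7, 33.0], [2.4, 2.7, 13.0],
]
y = [0, 0, 0, 0, 1, 1, 1, 1]
ranked = rank_variables(X, y)
for stat, var, threshold in ranked:
    print(f"variable {var}: split at {threshold}, chi-square {stat:.2f}")
```

A single tree would split the root on the strongest variable only; the weaker but still significant one appears only when the alternatives are inspected or a second tree is grown from it.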

Software
- Helix Tree (Partitionator)
- Uses Formal Inference-based Recursive Modeling (FIRM), developed by Douglas Hawkins
- A free 7-day trial is available for download (webinars assist in using the software)

Illustration of Software
- Data
  - 317 metabolites
  - LC/MS and GC/MS
  - 63 biological samples
  - Want to discover which metabolites differentiate between the diseased group and the “healthy” individuals (within the diseased group there is a subset of individuals currently taking drugs)