Meredith L. Wilcox FIU, Department of Epidemiology/Biostatistics


SNP Data Analysis

Presentation Outline
Part I. Preliminary Analysis
- Introduction
- Methods
- Results
- Discussion
Part II. Ongoing Research
- Purpose
- Background
- Software

Part I. Preliminary Analysis: Introduction, Methods, Results, Discussion

Part I. Introduction
Purpose of the preliminary analysis:
- To develop two initial prediction models for the leukemia SNP data: logistic regression and a Naïve Bayes classifier
- To determine which of the two is the better prediction model
Overall goal: to improve SNP data analysis

Part I. Introduction: Data
SNP data consisting of 220 binary variables:
- Output variable: CACO (leukemia present/absent)
- Input variables: sex (F/M) and 218 SNPs (dominant/not dominant)
Observations: 485 individuals
The data are complete; there are no missing values.

Part I. Methods: Models (fitted using R)
- Logistic regression. Assumption: observations are independent. Only a simple logistic regression was considered (no interaction among input variables).
- Naïve Bayes classifier. Assumption: input variables are independent given the outcome.
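The Naïve Bayes assumption above can be illustrated with a minimal sketch. The slides state that the models were built in R; the code below is a standard-library Python illustration for binary inputs, not the author's code. The function names, the toy data, and the Laplace smoothing constant alpha are all assumptions introduced here.

```python
import math

def fit_naive_bayes(X, y, alpha=1.0):
    """Estimate class priors and per-feature Bernoulli parameters.

    X: list of rows of 0/1 inputs; y: list of 0/1 outcomes.
    alpha is a Laplace smoothing constant (an assumption, not from the slides).
    """
    n_features = len(X[0])
    model = {}
    for c in (0, 1):
        rows = [x for x, label in zip(X, y) if label == c]
        prior = (len(rows) + alpha) / (len(y) + 2 * alpha)
        # P(feature j = 1 | class c), smoothed so it is never exactly 0 or 1
        theta = [
            (sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
            for j in range(n_features)
        ]
        model[c] = (prior, theta)
    return model

def predict_proba(model, x):
    """Return P(outcome = 1 | x), treating inputs as independent given the outcome."""
    log_post = {}
    for c, (prior, theta) in model.items():
        lp = math.log(prior)
        for xj, t in zip(x, theta):
            lp += math.log(t if xj == 1 else 1.0 - t)
        log_post[c] = lp
    # Normalise in log space for numerical stability
    m = max(log_post.values())
    num = math.exp(log_post[1] - m)
    den = sum(math.exp(v - m) for v in log_post.values())
    return num / den

# Toy data (hypothetical): the first input separates the two classes.
X_toy = [[1, 0], [1, 1], [0, 0], [0, 1]]
y_toy = [1, 1, 0, 0]
nb_model = fit_naive_bayes(X_toy, y_toy)
```

Because the per-class likelihood is a product over inputs, each SNP contributes one term to the log posterior, which is what makes the model cheap to fit with many binary inputs.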

Part I. Methods: Variable Selection and Goodness-of-Fit Measures
Variable selection:
- Selection based on log-likelihood score
- 10 input variables per model
Goodness-of-fit measures:
- 4-fold cross-validation
- Area under the ROC curve (AUROC)
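The slides do not say exactly how the log-likelihood score was computed or how the folds were formed, so the following is only one plausible reading, sketched in standard-library Python rather than the R used in the analysis. It assumes each binary input is scored on its own by the maximized log-likelihood of the outcome given that input, and that folds are contiguous blocks of indices. All function names are hypothetical.

```python
import math

def binary_log_likelihood(x, y):
    """Maximized log-likelihood of a 0/1 outcome y given one 0/1 input x.

    With a single binary predictor, the logistic-regression fit reproduces
    the observed outcome proportion within each level of x, so the score
    can be computed from counts alone.
    """
    total = 0.0
    for level in (0, 1):
        ys = [yi for xi, yi in zip(x, y) if xi == level]
        n, k = len(ys), sum(ys)
        for count in (k, n - k):
            if count > 0:  # a zero count contributes nothing (0 * log 0 = 0)
                total += count * math.log(count / n)
    return total

def select_top_variables(columns, y, n_keep=10):
    """Rank named 0/1 columns by log-likelihood score, highest first."""
    scores = {name: binary_log_likelihood(col, y) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:n_keep]

def k_fold_indices(n, k=4):
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

# With the 485 individuals described earlier, 4 folds have sizes 122, 121, 121, 121.
fold_sizes = [len(f) for f in k_fold_indices(485, 4)]
```

In practice the rows would usually be shuffled before splitting, so that each fold has a similar mix of cases and controls.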

Part I. Results: Variables Selected
The same 10 SNPs were identified as input variables for both logistic regression and Naïve Bayes:
1. TFRC_rs406721
2. TGFB1_rs1982072
3. HFE_rs807212
4. HLA_DRB1_DQA1_rs2395225
5. LTF_rs1042073
6. TFRC_rs3326
7. HFE
8. RXRB_rs421446
9. ACP1_rs11553746
10. DQA1_3UTR

Part I. Results: Cross-Validation
Average training/test error for each model: logistic regression had the lower average training and test error.

Part I. Results: ROC Curve
AUROC: logistic regression (LR) = 0.79; Naïve Bayes (NB) = 0.70
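For readers unfamiliar with AUROC, a minimal sketch of how it can be computed from predicted scores follows. It uses the rank interpretation (the probability that a randomly chosen case receives a higher score than a randomly chosen control, with ties counted as one half). This is a standard-library Python illustration with a hypothetical function name and toy scores; it is not the code that produced the values reported on this slide.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison of cases and controls."""
    cases = [s for s, label in zip(scores, labels) if label == 1]
    controls = [s for s, label in zip(scores, labels) if label == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for d in controls:
            if c > d:
                wins += 1.0   # case ranked above control
            elif c == d:
                wins += 0.5   # ties count as half
    return wins / (len(cases) * len(controls))

# Toy example (hypothetical scores): three of the four case/control pairs
# are ordered correctly.
toy_auc = auroc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

A value of 0.5 corresponds to ranking no better than chance, and 1.0 to perfect separation of cases from controls.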

Part I. Discussion
Limitations of the preliminary analysis:
- Better variable selection methods are available (stepwise procedures: forward selection, backward elimination).
- Recessive, additive, and heterozygous properties of genes were not included in the analysis.
- The biology linking the disease and the genes was not considered in variable selection.
- Interactions were not considered in the logistic regression model.

Part II. Ongoing Research: Background, Purpose, Software

Part II. Background: Bayesian Networks
A Bayesian network is a probabilistic graphical model consisting of two components:
- a directed acyclic graph (DAG)
- a set of local probability distributions
Example of a DAG: nodes = random variables; arcs = conditional dependencies.

Part II. Background: Markov Condition
P(SNP1, L | SNP3) = P(SNP1 | SNP3) * P(L | SNP3)
That is, SNP1 and L are conditionally independent given SNP3.
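The Markov condition can be checked numerically on a small example. The sketch below assumes a DAG in which SNP3 is the only parent of both SNP1 and L, which is consistent with the equation on this slide but is not confirmed by the transcript, since the figure is missing. The probability values and function names are hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical local distributions for an assumed DAG: SNP3 -> SNP1, SNP3 -> L.
p_snp3 = {0: 0.7, 1: 0.3}
p_snp1_given_snp3 = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.4, 1: 0.6}}  # [snp3][snp1]
p_l_given_snp3 = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.5, 1: 0.5}}     # [snp3][l]

def build_joint():
    """Joint table keyed by (snp1, l, snp3), built from the DAG factorization."""
    return {
        (s1, l, s3): p_snp3[s3] * p_snp1_given_snp3[s3][s1] * p_l_given_snp3[s3][l]
        for s1, l, s3 in product((0, 1), repeat=3)
    }

def markov_condition_holds(joint, tol=1e-9):
    """Check P(SNP1, L | SNP3) = P(SNP1 | SNP3) * P(L | SNP3) for every value."""
    for s3 in (0, 1):
        p_s3 = sum(p for (_, _, c), p in joint.items() if c == s3)
        if p_s3 == 0:
            continue  # conditioning on an impossible value is undefined
        for s1, l in product((0, 1), repeat=2):
            p_both = joint[(s1, l, s3)] / p_s3
            p_s1 = sum(joint[(s1, b, s3)] for b in (0, 1)) / p_s3
            p_l = sum(joint[(a, l, s3)] for a in (0, 1)) / p_s3
            if abs(p_both - p_s1 * p_l) > tol:
                return False
    return True

joint = build_joint()
condition_ok = markov_condition_holds(joint)
```

The conditional and marginal probabilities are all recovered from the joint table, so the check would fail for a joint distribution in which SNP1 and L remain dependent after conditioning on SNP3.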

Part II. Purpose
- To analyze SNP data using the following models: Bayesian networks, multifactor dimensionality reduction, and support vector machines
- To compare the predictive capability of these models with that of other widely used models

Part II. Software
Development of R code for Bayesian network analysis:
- To search for the “best” Bayesian network
- To use a search over variable orderings with an MCMC method
- Arc reversal
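The transcript gives no detail on how arc reversal is implemented in the R code under development. As a rough illustration of the idea only, the sketch below shows an arc-reversal move that is accepted only if the resulting graph is still acyclic, which any structure search over DAGs must guarantee. It is written in standard-library Python; the function names and the example graph are hypothetical, and the acyclicity test uses Kahn's topological-sort algorithm as one possible choice.

```python
def is_acyclic(nodes, arcs):
    """Kahn's algorithm: the graph is a DAG if all nodes can be removed in topological order."""
    indegree = {n: 0 for n in nodes}
    children = {n: [] for n in nodes}
    for parent, child in arcs:
        indegree[child] += 1
        children[parent].append(child)
    ready = [n for n in nodes if indegree[n] == 0]
    removed = 0
    while ready:
        node = ready.pop()
        removed += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    return removed == len(nodes)

def reverse_arc(nodes, arcs, arc):
    """Return the arc set with `arc` reversed, or None if that would create a cycle."""
    if arc not in arcs:
        raise ValueError("arc not in graph")
    parent, child = arc
    proposal = (set(arcs) - {arc}) | {(child, parent)}
    return proposal if is_acyclic(nodes, proposal) else None

# Hypothetical example graph: SNP3 -> SNP1 and SNP3 -> L.
example_nodes = ["SNP1", "SNP3", "L"]
example_arcs = {("SNP3", "SNP1"), ("SNP3", "L")}
reversed_arcs = reverse_arc(example_nodes, example_arcs, ("SNP3", "L"))
```

In a search procedure, each accepted proposal would then be scored against the data, and the scoring step is not shown here.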

Thank you.