MML, inverse learning and medical data-sets Pritika Sanghi Supervisors: A./Prof. D. L. Dowe Dr P. E. Tischer.

2 Overview
What is this project about?
- Bayesian Networks
- Generalised Bayesian Networks
Some tools:
- Factor Analysis
- Logistic Regression
- Projections

3 What is this project about?
Learn properties of medical data-sets, which have high dimensionality.
The aim of the project is to estimate complex conditional probability distributions.

4 Bayesian Networks
A popular tool for data mining.
Models data to infer the probability of a certain outcome.
The frequency distributions for the values that an attribute can take are represented as Conditional Probability Distributions (CPDs); for discrete attributes these are stored as Conditional Probability Tables (CPTs).
[Figure: example network with CPTs, e.g. P(WS) = 0.75, P(GO) = 0.5, P(S | WS, GO) and P(A | S).]

5 Bayesian Networks - Limitations
When a child node depends on a large number of parent attributes, the CPD becomes very complex.
When the data has high dimensionality, the CPD will be complex.
Large amounts of data are required to construct the CPD, as there are many cases (rows in the CPT); this much data is not always available.
There will be cases in the CPT that are not seen in the training data.

6 Generalised Bayesian Networks
Comley and Dowe (2003, 2004), building on ideas from Dowe and Wallace (1998), introduced Generalised Bayesian Networks.
This project extends their work.

7 What was done in this project?
Additions to Generalised Bayesian Networks:
- Factor Analysis: the real model might depend on some underlying factors (e.g. height, weight → size); it also reduces dimensionality.
- Logistic Regression: gives the dependence of a binary attribute on other attributes; CPDs can be represented as a logistic regression function, which gives compact approximations for CPDs.
- Projections: help visualise the medical data (original dimensionality around 30,000).

8 The Tools
- Factor Analysis
- Logistic Regression
- Projections

9 The Minimum Message Length (MML) Principle
Models the data as a two-part message: the first part encodes a hypothesis H, the second encodes the data D using H.
The best model is the one with the minimum message length.
Minimising the message length maximises the posterior probability of the hypothesis given the data, Pr(H|D), since the message length is the negative log of the probability.
The message is represented as: [Hypothesis][Data given hypothesis]
The length of the message is: - log (prior) - log (likelihood)
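The two-part comparison above can be sketched in a few lines of code. This is a minimal illustration only: the coin-bias hypotheses, the equal priors and the `two_part_length` helper are invented for the example, not taken from the project.

```python
import math

def two_part_length(prior_h, likelihood):
    """Total length (in bits) of a two-part MML message:
    -log2 Pr(H) encodes the hypothesis, -log2 Pr(D|H) encodes the data."""
    return -math.log2(prior_h) - math.log2(likelihood)

# Hypothetical example: two candidate coin-bias hypotheses with equal
# prior probability 0.5, and observed data of 8 heads in 10 tosses.
heads, tosses = 8, 10
for p in (0.5, 0.8):
    likelihood = p ** heads * (1 - p) ** (tosses - heads)
    print(f"p = {p}: {two_part_length(0.5, likelihood):.2f} bits")
```

The hypothesis whose total message is shorter (here p = 0.8) is the one MML prefers, which is exactly the maximum-posterior choice.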

10 Factor Analysis
Multiple attributes may be explained by a common factor; representing factors results in a more compact Bayesian Network.
The Wallace and Freeman model for single factor analysis was implemented.
The validity of the program was checked using the artificial and real-world data-sets specified in the Wallace and Freeman paper.

  Size     Height    Weight
  Large    Tall      Average
  Large    Short     Heavy
  Medium   Average
  Small    Short     Light

11 Factor Analysis
Attributes A1 and A2 have a common factor F1; attributes A3, A4 and A5 have a common factor F2.
The equation for the model is

  x_nk = μ_k + a_k ν_n + σ_k r_nk

where x_nk is the data value of attribute k in record n, μ_k is the attribute mean, a_k ν_n is the record-related (factor) term, σ_k is the standard deviation, and the r_nk are random variates from N(0, 1).
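The single-factor model can be illustrated by generating data from it. The means, factor loads and deviations below are made-up numbers, not values from the project's data-sets; the point is only that attributes sharing the factor scores ν_n come out correlated.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical single-factor model with K = 3 attributes, N = 500 records.
mu = np.array([170.0, 70.0, 40.0])    # attribute means mu_k
a = np.array([8.0, 10.0, 3.0])        # factor loads a_k
sigma = np.array([2.0, 3.0, 1.0])     # attribute std. deviations sigma_k

nu = rng.standard_normal(500)         # factor scores nu_n, one per record
r = rng.standard_normal((500, 3))     # residual variates r_nk ~ N(0, 1)

# x_nk = mu_k + a_k * nu_n + sigma_k * r_nk
x = mu + np.outer(nu, a) + sigma * r

# Because every attribute shares the same factor scores nu_n,
# the columns of x are strongly correlated.
print(np.corrcoef(x[:, 0], x[:, 1])[0, 1])
```

Fitting the Wallace and Freeman MML estimator would run this generation in reverse: recover μ_k, a_k and σ_k from x alone.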

12 Results – Factor Analysis

13 Results – Factor Analysis

14 Results - Factor Analysis No Factor

15 Results - Factor Analysis

16 Logistic Regression
A mathematical modelling approach for describing the dependence of a variable on other attributes.
Used to define a discrete target attribute as a function of continuous attributes.
Gives a compact approximation for a Conditional Probability Distribution.

17 Logistic Regression
The equation for the model is

  Pr(Y_i = 1) = e^(β_0 + β_1 X_i) / (1 + e^(β_0 + β_1 X_i))

where Y_i is the target binary attribute, β_0 and β_1 are parameters, and X_i is the (continuous) parent attribute.
In the previous example, X_i = temperature and Pr(Y_i = 1) = probability of fire.
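The equation is straightforward to evaluate. In this sketch the parameter values β_0 = -10 and β_1 = 0.25 are invented for the temperature/fire example; they are not fitted values from the project.

```python
import math

def logistic_prob(x, beta0, beta1):
    """Pr(Y = 1 | x) for a single-parent logistic regression:
    e^(b0 + b1*x) / (1 + e^(b0 + b1*x))."""
    z = beta0 + beta1 * x
    return math.exp(z) / (1.0 + math.exp(z))

# Hypothetical parameters: the probability of fire rises with temperature.
for temp in (10, 30, 50):
    print(temp, round(logistic_prob(temp, -10.0, 0.25), 3))
```

Whatever the parameters, the output always lies strictly between 0 and 1, which is what lets a single pair (β_0, β_1) stand in for a whole conditional probability table.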

18 Projections
Medical data-sets have high dimensionality (approximately 30,000 attributes), which is impossible to visualise.
Projecting to lower dimensions (2D) helps visualise these data-sets.

19 Projections
Based on ideas from Yang (2003).
A Minimum Cost Spanning Tree (MCST) of the data set is created.
Points are laid out in 2D* by exactly preserving their distances to the two nearest neighbours that have already been laid out.
After the graph is created, a new point can be laid out by preserving its distances to its two nearest neighbours.
* Generalises to low target dimensions other than 2D.
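The layout step for a single point amounts to intersecting two circles: the new point must lie at the given distances from its two already-placed neighbours. The sketch below is an illustration of that geometric step only; `place_point`, the anchor coordinates and the distances are invented, not code from the project.

```python
import numpy as np

def place_point(p1, p2, d1, d2):
    """Lay out a new 2-D point at distance d1 from anchor p1 and d2
    from anchor p2 (one of the two circle-intersection solutions)."""
    p1, p2 = np.asarray(p1, float), np.asarray(p2, float)
    d = np.linalg.norm(p2 - p1)                   # distance between anchors
    a = (d1 ** 2 - d2 ** 2 + d ** 2) / (2 * d)    # offset along p1 -> p2
    h = np.sqrt(max(d1 ** 2 - a ** 2, 0.0))       # perpendicular offset
    ex = (p2 - p1) / d                            # unit vector p1 -> p2
    ey = np.array([-ex[1], ex[0]])                # perpendicular unit vector
    return p1 + a * ex + h * ey

# Anchors at (0, 0) and (4, 0); the new point must be 3 away from the
# first anchor and 5 away from the second.
q = place_point((0, 0), (4, 0), 3.0, 5.0)
print(q)  # -> [0. 3.]
```

Picking which of the two mirror-image intersection points to use (here, the one with h ≥ 0) is the kind of choice the full method resolves using the MCST structure.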

20 Results - Projections
[Figure panels: 3D Data - XY Plane; 3D Data - YZ Plane; 3D Data - XZ Plane; 2D Projection]

21 Results - Projections
Projection of Central Nervous System data (dimensionality 74; number of observations 60).

22 What is being done in this project?
The single factor analysis tool has been created.
The logistic regression tool is being created.
A tool for projecting to lower dimensions is being created (it currently projects to 2D, but has not been tested for correctness).
Incorporating these tools into the program for creating Generalised Bayesian Networks may not be done due to constraints.

23 References
- J. W. Comley and D. L. Dowe: General Bayesian Networks and Asymmetric Languages, Proceedings of the 2003 Hawaii International Conference on Statistics and Related Fields (HICS 2003), Honolulu, Hawaii, USA, 5-8 June 2003.
- J. W. Comley and D. L. Dowe: Minimum Message Length and Generalised Bayesian Nets with Asymmetric Languages, in P. D. Grunwald, I. J. Myung and M. A. Pitt (eds), Advances in Minimum Description Length: Theory and Applications, MIT Press. To be published.
- D. L. Dowe and C. S. Wallace: Kolmogorov complexity, minimum message length and inverse learning, in W. Robb (ed), Proceedings of the Fourteenth Biennial Australian Statistical Conference (ASC-14), Queensland, Australia, 6-10 July 1998, p. 144.
- C. S. Wallace and P. R. Freeman: Single factor analysis by MML estimation, J. Royal Stat. Soc. B, 54(1).
- Li Yang: Distance-preserving projection of high dimensional data, Pattern Recognition Letters, 25(2), 2004.

24 Thank You Any questions?