Bump Hunting
Outline: the objective; the PRIM algorithm; beam search; brief intro to undirected graphical models (overview, regression-based models).
References:
Feelders, A.J. (2002). Rule induction by bump hunting. In J. Meij (Ed.), Dealing with the data flood (STT, 65). Den Haag, the Netherlands: STT/Beweton.
Friedman, J.H. and Fisher, N.I. (1999). Bump hunting in high-dimensional data. Statistics and Computing, 9:123–143.

Bump Hunting - The objective
Find regions in the feature space where the outcome variable has a high average value. In classification, this means a region of the feature space where the majority of the samples belong to one class. The decision rule looks like an intersection of several conditions, each on one predictor variable:
If condition 1 & condition 2 & ... & condition N, then predict value ...
Ex: if 0 < x_1 < 1 & 2 < x_2 < 5 & ... & -1 < x_n < 0, then class 1
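As a concrete illustration, such a rule is just a conjunction of per-variable interval tests. A minimal sketch in Python (the variable indices and thresholds are made up for illustration, not taken from the slides):

    import numpy as np

    # Hypothetical box rule on three predictors: predict class 1 inside the box.
    def in_box(x):
        return (0 < x[0] < 1) and (2 < x[1] < 5) and (-1 < x[2] < 0)

    x_new = np.array([0.5, 3.0, -0.2])
    print(1 if in_box(x_new) else 0)   # -> 1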

Bump Hunting - The objective
When the dimension is high and there are many such boxes, the problem is not easy.

Bump Hunting - The objective
Let's formalize the problem:
Predictors x = (x_1, ..., x_p); target variable y, either continuous or binary.
Feature space S = S_1 × S_2 × ... × S_p, where S_j is the set of possible values of x_j.
Find a subspace R ⊂ S such that the average of y over the points falling in R is much larger than the overall average of y.
Note: when y is binary, the target is not the mean of y but rather Pr(y = 1 | x ∈ R).
Define any box B = s_1 × s_2 × ... × s_p, where each s_j ⊆ S_j.

Bump Hunting - The objective
[Figure: a box in a continuous feature space, defined by an interval condition on each variable.]

Bump Hunting - The objective
[Figure: a box in a categorical feature space, defined by a subset of levels for each variable.]

Bump Hunting - PRIM
Boxes are found sequentially on subsets of the data: once a box is found, the observations it covers are removed and the next box is sought in the remaining data.
The support of a box is the proportion of observations that fall inside it.
Continue the search for boxes until there is not enough support for the new box.

Bump Hunting - PRIM
PRIM stands for "Patient Rule Induction Method". It has two steps:
(1) patient successive top-down refinement (peeling), and
(2) bottom-up recursive expansion (pasting).
These are greedy algorithms.

Bump Hunting - PRIM
Peeling: begin with a box B containing all the data (or all remaining data in later rounds). Remove the sub-box b* whose removal maximizes the mean of y over the remaining box B − b*. Each candidate sub-box b is defined on a single variable (peeling is done in only one of the dimensions), and only a small fraction of the data is peeled at each step.
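A minimal sketch of one peeling pass, assuming continuous predictors, a box represented as a list of (low, high) bounds, and a peeling fraction alpha; the function and parameter names are illustrative, not part of the original PRIM code:

    import numpy as np

    def peel_once(X, y, box, alpha=0.05):
        # Peel a fraction alpha off the low or high end of one variable;
        # keep the single peel that maximizes the mean of y in what remains.
        inside = np.all([(X[:, j] >= lo) & (X[:, j] <= hi)
                         for j, (lo, hi) in enumerate(box)], axis=0)
        Xb, yb = X[inside], y[inside]
        best_mean, best_box = -np.inf, None
        for j, (lo, hi) in enumerate(box):
            for side in ("low", "high"):
                if side == "low":
                    cut = np.quantile(Xb[:, j], alpha)
                    keep, new_bounds = Xb[:, j] > cut, (cut, hi)
                else:
                    cut = np.quantile(Xb[:, j], 1 - alpha)
                    keep, new_bounds = Xb[:, j] < cut, (lo, cut)
                if keep.sum() == 0:
                    continue
                m = yb[keep].mean()
                if m > best_mean:
                    best_mean = m
                    best_box = [b if k != j else new_bounds for k, b in enumerate(box)]
        return best_box, best_mean

Repeating peel_once until the box support drops below the chosen threshold gives the top-down refinement.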

Bump Hunting - PRIM
This is a greedy hill-climbing algorithm. The iteration stops when the support of the box drops to a pre-determined threshold. Why is it called "patient"? Because only a small fraction of the data is removed at each step.

Bump Hunting - PRIM
Pasting: in peeling, box boundaries are determined without knowledge of later peels, so some non-optimal steps may be taken. The final box can therefore often be improved by readjusting its boundaries, expanding them whenever doing so increases the box mean.
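A corresponding sketch of one pasting pass, reusing the box representation from the peeling sketch above (again with illustrative names; this is not the authors' implementation):

    import numpy as np

    def paste_once(X, y, box, alpha=0.05):
        # Try expanding each face of `box` to admit roughly alpha*N more points;
        # keep the expansion (if any) that most increases the mean of y inside.
        def box_mask(b):
            return np.all([(X[:, j] >= lo) & (X[:, j] <= hi)
                           for j, (lo, hi) in enumerate(b)], axis=0)
        best_mean, best_box = y[box_mask(box)].mean(), None
        n_add = max(1, int(alpha * len(y)))
        for j, (lo, hi) in enumerate(box):
            below = np.sort(X[X[:, j] < lo, j])
            above = np.sort(X[X[:, j] > hi, j])
            candidates = []
            if len(below) > 0:   # pull the lower bound down
                candidates.append((below[max(0, len(below) - n_add)], hi))
            if len(above) > 0:   # push the upper bound up
                candidates.append((lo, above[min(len(above) - 1, n_add - 1)]))
            for new_bounds in candidates:
                cand = [b if k != j else new_bounds for k, b in enumerate(box)]
                m = y[box_mask(cand)].mean()
                if m > best_mean:
                    best_mean, best_box = m, cand
        return best_box, best_mean   # best_box is None if no expansion helps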

Bump Hunting - PRIM example
[Figures, seven panels: a step-by-step peeling example showing the candidate peels at each step, the winning peel, and the next peel, with β = 0.4 at one of the later steps.]

Bump Hunting - Beam search algorithm
At each step, the w best sub-boxes (each defined on a single variable) are selected, subject to a minimum support requirement. This is more greedy than PRIM: much more can be peeled at each step, because the step fully optimizes over one of the variables.

Bump Hunting - Beam search algorithm
[Figures: an illustration of the beam search with beam width w = 2.]
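Following up on the beam-search slides above, a rough sketch of a beam-search bump hunter in this spirit; the beam width w, the minimum support, the threshold grid, and the function name are all illustrative assumptions rather than the exact algorithm from the references:

    import numpy as np

    def beam_search_boxes(X, y, w=2, min_support=0.1, n_thresholds=10, max_depth=3):
        # Keep the w best boxes at each depth; a refinement tightens one bound
        # of one variable to a quantile threshold, subject to minimum support.
        p = X.shape[1]
        def mask(box):
            return np.all([(X[:, j] >= lo) & (X[:, j] <= hi)
                           for j, (lo, hi) in enumerate(box)], axis=0)
        beam = [[(-np.inf, np.inf)] * p]
        for _ in range(max_depth):
            candidates = []
            for box in beam:
                for j in range(p):
                    lo, hi = box[j]
                    for q in np.linspace(0.1, 0.9, n_thresholds):
                        t = np.quantile(X[:, j], q)
                        for new_bounds in [(max(lo, t), hi), (lo, min(hi, t))]:
                            cand = [b if k != j else new_bounds for k, b in enumerate(box)]
                            m = mask(cand)
                            if m.mean() >= min_support:
                                candidates.append((y[m].mean(), cand))
            if not candidates:
                break
            candidates.sort(key=lambda c: c[0], reverse=True)
            beam = [c for _, c in candidates[:w]]
        return beam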

Bump Hunting - About PRIM
PRIM is a greedy search, but it is "patient", and this matters. Methods that partition the data much faster, e.g. beam search and CART, can be less successful. The patient approach makes it easier to recover from earlier unfortunate steps, since we do not run out of data too quickly. PRIM also does not discard predictors merely because they are highly correlated with one another.

Undirected Graph Models - Introduction
A network/graph is a set of vertices connected by edges. Undirected edges give an "undirected network"; directed edges give a "directed network". A basic vertex-level characteristic is the number of connections to a vertex, its "degree". For directed graphs, incoming edges define the "in-degree" k_i and outgoing edges the "out-degree" k_o, with k = k_i + k_o. (Figure source: Evolution of Networks, S.N. Dorogovtsev and J.F.F. Mendes.)

Undirected Graph Models - Introduction
Graphical models are a visual expression of the joint distribution of an entire set of random variables. An undirected graphical model is also known as a "Markov random field" or "Markov network". A missing connection in such a network means conditional independence of the two variables given all other variables. Sparse graphs, i.e. graphs with a small number of edges, are easy to interpret. Edges encode the strength of the conditional dependency.

Undirected Graph Models - Introduction
[Figure: an example undirected graph, a chain X - Y - Z - W; the vertex Y "separates" X and Z.]
Pairwise Markov independence: the absence of an edge between two vertices means the two variables are conditionally independent given all the others. Ex: in the chain above, X ⊥ Z given the remaining variables.
Global Markov independence: for subgraphs A, B and C, if every path between A and B intersects a node in C, then C separates A and B, and A and B are conditionally independent given C. Ex: {Y} separates {X} from {Z, W}, so X ⊥ {Z, W} | Y.

Undirected Graph Models - Introduction
The pairwise Markov property follows from the global Markov property (and for positive densities the two are equivalent).
A clique is a complete (all pairs connected) subgraph; a maximal clique is a clique to which no other vertex can be added while still yielding a clique. Ex: {X, Y}, {Y, Z}, {Z, W} are the maximal cliques of the chain above.

Undirected Graph Models - Introduction
A probability density function f over a Markov graph G can be represented as a product of positive potential functions, either
(1) over the maximal cliques C of the graph, f(x) ∝ ∏_C ψ_C(x_C), or
(2) over the edges (j, k) of the graph, f(x) ∝ ∏_{(j,k) ∈ E} ψ_{jk}(x_j, x_k).
Either form can represent the dependence structure; pairwise Markov graphs concern the edge-wise form (2) above.

Undirected Graph Models – Gaussian Graphical Model
- Observations have a multivariate Gaussian distribution with mean μ and covariance matrix Σ.
- The Gaussian distribution represents at most second-order relationships, so it automatically encodes a pairwise Markov graph.
- All conditional distributions are also Gaussian.
- If the ij-th component of Θ = Σ^{-1} is zero, then variables i and j are conditionally independent given the others.
- Let Y be one variable and Z = (X_1, ..., X_{p−1}) the rest. The conditional distribution of Y given Z = z is Gaussian with mean μ_Y + (z − μ_Z)^T Σ_ZZ^{-1} σ_ZY, the same as the population multiple linear regression of Y on Z.

Undirected Graph Models – Gaussian Graphical Model
Partition Θ = Σ^{-1} the same way as Σ. Because ΣΘ = I, the regression coefficient of Y on Z is
β = Σ_ZZ^{-1} σ_ZY = −θ_ZY / θ_YY.
- Zero elements in β, and hence in θ_ZY, mean that the corresponding elements of Z are conditionally independent of Y given the rest.
- We can therefore learn the dependence structure through multiple linear regression.
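A small numerical check of this identity, assuming a known covariance matrix (the 4x4 matrix below is made up purely for illustration):

    import numpy as np

    # Illustrative covariance; the last variable plays the role of Y.
    Sigma = np.array([[1.0, 0.5, 0.0, 0.3],
                      [0.5, 1.0, 0.4, 0.0],
                      [0.0, 0.4, 1.0, 0.2],
                      [0.3, 0.0, 0.2, 1.0]])
    Theta = np.linalg.inv(Sigma)

    # Population regression coefficients of Y on Z = (X1, X2, X3):
    beta_regression = np.linalg.solve(Sigma[:3, :3], Sigma[:3, 3])

    # The same coefficients from the precision matrix: beta = -theta_ZY / theta_YY
    beta_precision = -Theta[:3, 3] / Theta[3, 3]

    print(np.allclose(beta_regression, beta_precision))   # True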

Undirected Graph Models – Gaussian Graphical Model
Finding the parameters when the network structure is known:
- Take the empirical mean x̄ and covariance matrix S.
- Up to constants, the log-likelihood of the data is l(Θ) = log det Θ − trace(SΘ).
- The quantity −l(Θ) is a convex function of Θ.

Estimating the graph structure: Meinshausen and Bühlmann's regression approach (2006)
- Fit a lasso regression using each variable as the response and all the others as predictors.
- θ_ij is estimated to be nonzero if the estimated coefficient of variable i on j is nonzero, OR (alternatively, AND) the estimated coefficient of variable j on i is nonzero.
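A rough sketch of this neighborhood-selection idea using scikit-learn's Lasso; the simulated data, the fixed penalty alpha, and the variable names are illustrative assumptions (the original method has its own way of choosing the penalty level):

    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 5))
    X[:, 1] += 0.8 * X[:, 0]          # induce some dependence for illustration
    X[:, 3] += 0.6 * X[:, 2]

    p = X.shape[1]
    selected = np.zeros((p, p), dtype=bool)
    for j in range(p):
        others = np.delete(np.arange(p), j)
        coef = Lasso(alpha=0.1).fit(X[:, others], X[:, j]).coef_
        selected[j, others] = coef != 0

    # OR rule: put an edge i-j if either regression selects the other variable
    # (replace | with & for the AND rule).
    adjacency = selected | selected.T
    print(adjacency.astype(int))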

Undirected Graph Models – Gaussian Graphical Model
More formally – the graphical lasso maximizes a penalized log-likelihood over Θ:
log det Θ − trace(SΘ) − λ ||Θ||_1,
where ||Θ||_1 is the sum of the absolute values of the elements of Θ and λ controls the sparsity of the estimated graph.
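For completeness, a minimal sketch using scikit-learn's GraphicalLasso estimator, which maximizes exactly this penalized log-likelihood; the simulated data and the penalty value are illustrative:

    import numpy as np
    from sklearn.covariance import GraphicalLasso

    rng = np.random.default_rng(1)
    X = rng.standard_normal((300, 4))
    X[:, 1] += 0.7 * X[:, 0]          # illustrative dependence

    model = GraphicalLasso(alpha=0.2).fit(X)
    Theta_hat = model.precision_

    # Zero off-diagonal entries of Theta_hat correspond to missing edges
    # (conditional independences) in the estimated Gaussian graphical model.
    print(np.round(Theta_hat, 3))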