Cluster-Based Modeling: Exploring the Linear Regression Model Space
Student: XiaYi (Sandy) Shen    Advisor: Rebecca Nugent
Carnegie Mellon University, Pittsburgh, Pennsylvania

Introduction

How do we normally build and choose a model? What does it look like graphically?

What is Linear Regression?

Regression model: Yi = β0 + β1 Xi,1 + β2 Xi,2 + … + β(p-1) Xi,p-1 + εi
Estimated regression function: Ŷi = b0 + b1 Xi,1 + b2 Xi,2 + … + b(p-1) Xi,p-1
- i = 1, 2, …, n observations
- j = 0, 1, 2, …, p-1; p = number of parameters, p-1 variables
- β0: E[Yi] when all Xi,j = 0
- βj: change in E[Yi] for a one-unit increase in Xi,j (all other variables fixed)
The coefficient estimates b0, …, b(p-1) are found by the method of least squares.

In practice, we have:
- one variable that we are interested in predicting: Y
- many possible predictor variables: X1, X2, X3, …
To predict Y from p-1 possible Xj variables, we have 2^(p-1) possible models.
Example: 2 variables X1, X2 give 4 possible models:
1. Y = β0
2. Y = β0 + β1 X1
3. Y = β0 + β2 X2
4. Y = β0 + β1 X1 + β2 X2

Model criteria: R², adjusted R², AIC, BIC, and stepwise regression.
Stepwise regression searches the "model space" for the "best subsets":
- Forward: adds variables one at a time
- Backward: removes variables one at a time
- Both: alternates forward and backward steps
All three are greedy algorithms.

Issues with Current Model Search Criteria

- Stepwise regression is greedy; it does not necessarily search the entire model space.
- We could end up with very complicated models that do not predict much better than simpler models.

Characterizing the Models

- Represent each model by its n×1 vector of fitted values.
- Models that predict similar values are close to each other (in space).
We look at the Linear Regression Model Space: 2^(p-1) possible models, each with n fitted values, i.e. 2^(p-1) observations in n-dimensional space.

Our questions:
- Do models cluster? Are there distinct "groups" of models with similar predictability?
- Are there complicated models that could be replaced by simpler models?
- How is stepwise doing?

Illustration of Idea

We have two predictor variables Xi,1, Xi,2, i = 1, 2, 3:
- Perfect model: Yi* = 3 + 2 Xi,1
- Real Y data: Yi = 3 + 2 Xi,1 + rnorm(3, 0, 1)
(Recall the 4 possible models from the previous panel.) The fitted values from each model and the original Yi* are plotted on the poster.
- The blue and red models predict more similar values and are closer to the perfect fit (brown) in the model space.
- The blue and red models contain the correct predictor variable X1.
- The black model does not contain any predictor variable and thus is furthest from the perfect fit.

Simulation with 60 Data Points

We have six predictor variables Xi,1, …, Xi,6, i = 1, 2, …, 60:
- Perfect model: Yi* = 2 Xi,1 + 3 Xi,2
- Real Y data: Yi = 2 Xi,1 + 3 Xi,2 + rnorm(60, 0, 1)
There are 2^6 = 64 possible models, so the model space is a 64 x 60 matrix: 64 models, each represented by its 60 fitted values.

Visualization of the model space:
- Heat map of the kernel density estimate of the model space (red: low density, white/yellow: high density). The perfect model is shown in green, the stepwise-chosen models in blue, and the model with the right variables in red.
- Pairs plot: there are too many pairwise plots to show in one graph, so we show two selected pairs of dimensions representing two cross sections of the model space.

Findings:
- Stepwise chose the model with variables X1, X2, and X3.
- There are two clusters of models: one group predicts similarly to the truth, the other does not.
- The perfect model, the stepwise-chosen model, and the model with the correct variables predict very similarly.
Note: it is hard to look at higher dimensions; we can only visualize two dimensions at a time.
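The poster does not include code; the following is a minimal R sketch of the 60-point simulation and of building the model-space matrix described above. The random seed, the variable names, and the use of step() with its default AIC criterion for the stepwise search are assumptions, not taken from the poster.

```r
set.seed(1)                                   # assumption: the poster gives no seed

n <- 60; p <- 6
X <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("X", 1:p)))
y_star <- 2 * X[, 1] + 3 * X[, 2]             # "perfect" mean function
y      <- y_star + rnorm(n, 0, 1)             # observed responses
dat    <- data.frame(y, X)

# Enumerate all 2^6 = 64 predictor subsets; each model contributes its
# n fitted values as one row of the 64 x 60 model-space matrix.
subsets <- expand.grid(rep(list(c(FALSE, TRUE)), p))
model_space <- t(apply(subsets, 1, function(keep) {
  rhs <- if (any(keep)) paste(colnames(X)[keep], collapse = " + ") else "1"
  fitted(lm(as.formula(paste("y ~", rhs)), data = dat))
}))
dim(model_space)                              # 64 models x 60 fitted values

# Greedy stepwise search (both directions) for comparison.
step_fit <- step(lm(y ~ 1, data = dat),
                 scope = ~ X1 + X2 + X3 + X4 + X5 + X6,
                 direction = "both", trace = 0)
formula(step_fit)                             # which variables stepwise kept
```

A two-dimensional cross section of model_space could then be visualized with a kernel density estimate (for example MASS::kde2d() followed by image()), matching the heat-map idea described above.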
Two further tools for looking at the model space:
- Principal Components (PC) projection: a lower-dimensional representation that retains information/structure from the high-dimensional model space.
- Hierarchical clustering: groups models whose fitted values are similar; applied to the Boston housing example below.

Boston Housing Data

We predict the median value of owner-occupied homes (in $1000s) for 506 suburbs of Boston. Selected predictor variables: crime rate, average number of rooms, distance to employment centers, proportion of blacks, accessibility to highways, and nitrogen oxides concentration.

Principal Component (PC) projection:
We randomly sampled 60 suburbs, since more models than observations (dimensions) are needed to run the PC projection. The truth model and the fitted model (red line) are shown on the poster and are not reproduced in this transcript.
- Three clusters of models: one group predicts closely to the truth, the other two groups do not.
- Stepwise behaves similarly in the PC projection as in the pairs plot.
Note: this relies on a projection, so it does not necessarily capture all of the structure/information.

Hierarchical clustering:
Hierarchical clustering is done on the PC projections. The stepwise-chosen model is labeled in blue, and each model is labeled by its number of variables.
- There are two large clusters of models; each could be split into two smaller clusters.
- The stepwise-chosen model predicts similarly to models with more variables; there is one 3-variable model that could be a possible replacement.
- Models with fewer variables are in the same cluster, with a few exceptions.
- The model with no variables is similar to a 1-variable model.

Conclusion / Discussion

- Stepwise regression models lie in high-frequency (high-density) areas of the model space. In our simulations, the stepwise model predicts similarly to the perfect model and to the model with the correct variables.
- The PC projection is more useful for visualizing higher dimensions.
- Increasing the number of observations increases the dimension of the model space; increasing the number of variables drastically increases the number of models.
- Future work: better characterize the clusters/model spaces.
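The following R sketch mirrors the Boston analysis, assuming the data are the Boston set shipped with the MASS package (predictors crim, rm, dis, black, rad, nox; response medv) and that a simple random sample of 60 suburbs is used. The poster does not state how many principal components the clustering uses; two are assumed here.

```r
library(MASS)                                  # assumption: data = MASS::Boston

set.seed(2)                                    # assumption: sampling seed not given
vars   <- c("crim", "rm", "dis", "black", "rad", "nox")
boston <- Boston[sample(nrow(Boston), 60), c("medv", vars)]

# Model space: fitted values of all 2^6 = 64 predictor subsets (64 x 60).
subsets <- expand.grid(rep(list(c(FALSE, TRUE)), length(vars)))
model_space <- t(apply(subsets, 1, function(keep) {
  rhs <- if (any(keep)) paste(vars[keep], collapse = " + ") else "1"
  fitted(lm(as.formula(paste("medv ~", rhs)), data = boston))
}))

# Project the 64 models onto their first two principal components.
pc <- prcomp(model_space)
plot(pc$x[, 1:2], pch = 19, main = "Model space: first two PCs")

# Hierarchical clustering on the PC projection, leaves labeled by model size.
hc <- hclust(dist(pc$x[, 1:2]))
plot(hc, labels = rowSums(subsets), cex = 0.7,
     main = "Models labeled by number of variables")
```

Cutting the dendrogram (for example with cutree(hc, k = 2)) would give the two large clusters described above, and comparing cluster membership against the stepwise-chosen model shows which simpler models predict similarly to it.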