LECTURE 16: BEYOND LINEARITY PT. 1 March 28, 2016 SDS 293 Machine Learning

Announcements
Assignments:
- Feedback for A4 should be in your inbox
- A5 solution posted; feedback will be out tomorrow
- A6 due Wednesday 11:59pm
T-minus 5 weeks until the end of the semester!

Final project
- Goal: apply the ML techniques we've learned to solve a real-world problem you care about
- Teams of 2-3 people (recommended) or on your own
Example problems:

Yelp dataset challenge

Yelp dataset challenge: example questions
- Cultural Trends: What makes a particular city different? For example, in which countries are Yelpers sticklers for service quality?
- Location Mining: How much of a business' success is really just location, location, location? Do reviewers' behaviors change when they travel?
- Seasonal Trends: What about seasonal effects: are there more reviews for sports bars on major game days, and if so, could you predict that?
- Infer Categories: Do you see any non-intuitive correlations between business categories, e.g. how many karaoke bars also offer Korean food, and vice versa?
- Natural Language Processing (NLP): How well can you guess a review's rating from its text alone?
- Change Points and Events: Can you detect when things change suddenly (e.g. a business coming under new management)?
- Social Graph Mining: Can you figure out who the trend-setters are? For example, who found the best waffle joint before waffles were cool?

Final project
T-minus 5 weeks until the end of the semester! Time to start thinking about final projects:
- Can work in teams of 2-3 people (recommended) or on your own
- Goal: apply the ML techniques we've learned to solve a real-world problem you care about
Activity: topic generation

Activity: real world problems
Step 1: Write a quick description of a data set you think would be interesting to explore at the top of the page, and write your 99 number at the bottom.

Activity: real world problems
Step 2: Pass your description clockwise to the next person.

Activity: real world problems
Step 3: Read the description of the dataset, and underneath the description, write a question you think someone might want to answer using it.

Activity: real world problems
Step 4: Fold over the top of the paper (leaving just your question visible), and pass it clockwise. Now repeat!

For next class: pick a topic
Before class on Wednesday, write a quick Piazza post about a potential final project topic. Please include:
- A description of the domain
- The problem(s) you're trying to solve / question(s) you're trying to answer
- The audience
- The data you'll be using (if you know)
Not 100% sure? Try a couple and get some feedback (you're free to change your mind later).
See a topic you like? Reply to the post and form a team!

Outline
- Final project overview / activity
- Moving beyond linearity
  - Polynomial regression
  - Step functions
  - Splines
  - Local regression
  - Generalized additive models (GAMs)
- Lab

So far: linear models
The good:
- Easy to describe & implement
- Straightforward interpretation & inference
The bad:
- The linearity assumption is (almost) always an approximation
- Sometimes it's a pretty poor one
Ridge regression, the lasso, PCA, etc. all try to improve on least squares by controlling the variance of a linear model… but linear models can only stretch so far.

Flashback: Auto dataset

Polynomial regression
Big idea: extend the linear model by adding extra predictors that are powers of the original predictors.
One simple fix is to use polynomial transformations, e.g.:
$y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \epsilon_i$
This example is a quadratic regression.
Note: this is still a linear model! (and so we can find its coefficients using regular ol' least squares)
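As an illustration (not from the original slides), here is a minimal sketch of fitting that quadratic model with ordinary least squares in Python; the data is synthetic and the variable names are hypothetical:

```python
# Minimal sketch: quadratic regression is still linear regression,
# just with x and x^2 as the two predictor columns.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(293)
x = rng.uniform(18, 80, size=200)                  # hypothetical predictor (think: age)
y = 50 + 4*x - 0.04*x**2 + rng.normal(0, 10, 200)  # noisy quadratic ground truth

# Expand x into the design matrix [x, x^2], then fit with plain least squares
X = PolynomialFeatures(degree=2, include_bias=False).fit_transform(x.reshape(-1, 1))
fit = LinearRegression().fit(X, y)
print(fit.intercept_, fit.coef_)                   # estimates of beta_0, (beta_1, beta_2)
```

The same fit could be obtained with `np.polyfit(x, y, 2)`; the point is only that the polynomial terms enter the model as ordinary columns.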

Polynomial regression in practice
For a large enough degree d, polynomial regression lets us produce an extremely non-linear curve. As d increases, this can produce some really weird shapes.
Question: what's happening in terms of bias vs. variance?
Answer: increased flexibility → less bias, more variance; in practice, we generally only go to degree 3 or 4 unless we have additional knowledge that more will help.
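The slides don't prescribe how to pick d, but one standard way to see this bias-variance trade-off is to compare degrees by cross-validated error; a sketch on assumed synthetic data:

```python
# Sketch: compare polynomial degrees by 5-fold cross-validated MSE.
# Low d underfits (high bias); high d overfits (high variance).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=300).reshape(-1, 1)
y = np.sin(x).ravel() + rng.normal(0, 0.3, size=300)  # nonlinear truth + noise

for d in range(1, 7):
    pipe = make_pipeline(PolynomialFeatures(degree=d), LinearRegression())
    mse = -cross_val_score(pipe, x, y, cv=5,
                           scoring="neg_mean_squared_error").mean()
    print(f"degree {d}: CV MSE = {mse:.3f}")
```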

Example: Wage dataset

Degree-4 polynomial fit, with 95% confidence interval (i.e. ±2× standard error)
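To make the "2× standard error" recipe concrete, here is a hedged sketch of computing those pointwise bands for a degree-4 fit; the (age, wage) arrays below are a synthetic stand-in for the Wage dataset:

```python
# Sketch: degree-4 polynomial fit with ~95% pointwise confidence bands,
# computed as fitted value +/- 2 standard errors of the mean prediction.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
age = rng.uniform(18, 80, size=500)                          # stand-in for Wage age
wage = 20 + 2.5*age - 0.025*age**2 + rng.normal(0, 15, 500)  # stand-in for Wage wage

def poly_design(v, degree=4):
    """Design matrix [1, v, v^2, ..., v^degree] using raw powers."""
    return sm.add_constant(np.column_stack([v**k for k in range(1, degree + 1)]))

fit = sm.OLS(wage, poly_design(age)).fit()

grid = np.linspace(18, 80, 100)
pred = fit.get_prediction(poly_design(grid))
lower = pred.predicted_mean - 2 * pred.se_mean               # ~95% band
upper = pred.predicted_mean + 2 * pred.se_mean
```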

Example: Wage dataset (79 "high earners")

Example: Wage dataset. What's going on here?

Example: Wage dataset. Where the data is relatively sparse, the fit is less confident (wider bands).

Global structure in polynomial regression
Polynomial regression gives us added flexibility, but imposes global structure on the non-linear function of X.
Question: what's the problem with this?
Answer: when our data has different behavior in different areas, we wind up with a messy, complicated function trying to describe both parts at once.

Step functions
Big idea: if our data exhibits different behavior in different parts, we can fit a separate "mini-model" on each piece and then glue them together to describe the whole.
Process (a code sketch follows below):
1. Create K cutpoints $c_1, c_2, \ldots, c_K$ in the range of X
2. Construct (K+1) dummy variables:
   $C_0(X) = I(X < c_1)$,
   $C_1(X) = I(c_1 \le X < c_2)$,
   $\ldots$,
   $C_K(X) = I(c_K \le X)$
3. Fit a least squares model using $C_1(X), \ldots, C_K(X)$ as predictors (we can exclude $C_0(X)$ because it is redundant with the intercept)
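A minimal sketch of that process in Python, assuming hypothetical cutpoints and synthetic data (pd.cut and get_dummies stand in for the indicator construction above):

```python
# Sketch: step-function regression via cutpoint dummies.
# pd.cut bins X at the cutpoints; get_dummies builds C_0(X), ..., C_K(X),
# and drop_first=True removes C_0 (redundant with the intercept).
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
x = rng.uniform(18, 80, size=400)
y = np.where(x < 35, 60, np.where(x < 65, 110, 90)) + rng.normal(0, 10, 400)

cutpoints = [35, 50, 65]                              # hypothetical c_1, c_2, c_3
bins = pd.cut(x, bins=[-np.inf, *cutpoints, np.inf])  # K+1 = 4 intervals
dummies = pd.get_dummies(bins, drop_first=True)       # C_1(X), ..., C_K(X)

fit = LinearRegression().fit(dummies, y)
print(fit.intercept_, fit.coef_)  # first-bin mean, then per-bin offsets
```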

Example: Wage dataset

Granularity in step functions
Step functions give us added flexibility by letting us model different parts of X independently.
Question: what's the problem with this?
Answer: if our data doesn't have natural breaks, choosing the wrong step size might mean that we "miss the action".