Published by Beverly Wilcox. Modified over 6 years ago.
1
QM222 Class 11 Section D1
1. Review and Stata: Time series data, multi-category dummies, etc. (Chapters 10, 11)
2. Capturing nonlinear relationships (Chapter 12)
Future topics before test:
    One variable with different slopes (for different groups, Chapter 13)
    Understanding more about the bias due to missing confounding factors (Chapter 14)
QM222 Fall 2015 Section D1
2
Schedule
Assignment 3 due today.
Assignment 4: Due date moved to Friday 6pm. I very much hope to quickly look at your Assignment 3 to see if you are on the right track.
3
Some of you are still unclear on wording
An “observation” is what a row in your dataset represents.
Your dependent variable is what is on the left-hand side of the regression equation.
Your explanatory (also called independent) variables are on the right-hand side.
If you can measure a possibly confounding variable, you want to include it among your explanatory variables.
4
Time series and time: Review
5
(review) In time-series data, you need to have a variable for time
The variable for time has to increase by 1 each time period.
If you have annual data, a variable Year does exactly this.
If you have quarterly or monthly (or decade) data, you need to create a variable time.
Example with quarterly data: Sales = … + 27 time
The coefficient on time tells us that Sales increase by 27 each quarter.
6
(review) Making a variable Time in Stata: background
Note: in Stata, _n means the observation number.
To refer to the previous value of a variable (i.e. its value in the previous observation), just use the notation:
    varname[_n-1]
The square brackets tell Stata which observation number you are referring to.
7
Making a variable for Time in time-series data in Stata (one observation per time period)
First make sure the data is in chronological order. For instance, if there is a variable “date”, go:
    sort date
Making a time variable (when the data is in chronological order):
    gen time = 1 in 1
    replace time = time[_n-1] + 1 if _n > 1
(“in #” tells Stata to do this only for observation #. “if _n > 1” stops observation 1, which has no previous observation, from being reset to missing.)
OR just:
    gen time = _n
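A minimal sketch of the two approaches side by side, assuming Stata's shipped sp500 example dataset as a stand-in (it is not the course data; its date variable plays the role of “date”). The name time2 is made up for the comparison.

```stata
* Stand-in data: daily S&P 500 series that ships with Stata
sysuse sp500, clear
sort date

* Short way: the observation number is the time variable
generate time = _n

* Long way: start at 1, then add 1 to the previous observation's value
generate time2 = 1 in 1
replace time2 = time2[_n-1] + 1 if _n > 1

* Stops with an error if the two versions ever differ
assert time == time2
```

If assert returns silently, both methods produced the same variable.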
8
Quarterly or monthly data
With quarterly or monthly data, you should also include indicator variables for seasonality.
For quarterly data, make 3 indicator variables. The fourth is the reference (base) category.
Example: Sales = … + 27 time - 4 Q1 + 10 Q2 + … Q3
Here, the coefficient on time tells us that Sales increase by 27 each quarter, holding season constant.
Q4 is the reference category. Sales in Q2 on average are 10 more than Sales in Q4. Sales in Q1 on average are 4 less than Sales in Q4.
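One way to build the three quarter indicators is sketched here, assuming Stata's shipped gnp96 dataset (quarterly real GNP) as a stand-in for the Sales data. The variable names q, Q1, Q2 and Q3 are illustrative choices, and the coefficients will not match the Sales example.

```stata
* Stand-in data: quarterly GNP series that ships with Stata
sysuse gnp96, clear
sort date
generate time = _n

* date is stored as a quarterly date; convert it to a daily date,
* then extract the quarter number (1, 2, 3 or 4)
generate q = quarter(dofq(date))

* Three indicators; Q4 is left out, so it is the reference category
generate Q1 = q == 1
generate Q2 = q == 2
generate Q3 = q == 3

regress gnp96 time Q1 Q2 Q3
```

Each Q coefficient is then read as the average difference from the fourth quarter, holding time constant.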
9
(review) Running a Stata regression using a categorical explanatory variable with many categories
You can make a single indicator variable in Stata easily, e.g.
    gen female = 0
    replace female = 1 if gender==2
OR in a single line:
    gen female = gender==2
10
(review) Running a Stata regression using a categorical explanatory variable with many categories
In Stata, you don’t need to make indicator variables separately for a variable with more than 2 categories.
Assuming that you have a numeric categorical variable season that takes four values (Winter, Fall, Spring and Summer), type:
    regress sales price i.season
If season is stored as a string, first convert it to a numeric variable with value labels (e.g. encode season, gen(seasoncode)), because i. only accepts numeric variables.
This will run a multiple regression of sales on price and on 3 seasonal indicator variables.
Stata chooses the reference category (by default the category with the lowest value, although you can set a different one if you want, e.g. ib3.season makes the category coded 3 the reference).
Stata will label each indicator variable in the output by the value, or value label, of its category.
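The i. notation can be tried on Stata's shipped nlsw88 dataset, used here only as a stand-in: wage, tenure and race are that dataset's variables, and race_str and race_code are names made up for this sketch.

```stata
* Stand-in data: 1988 wage survey extract that ships with Stata
sysuse nlsw88, clear

* race is numeric with value labels, so i. builds the indicators on the fly
regress wage tenure i.race

* ib2. makes the category coded 2 the reference instead of the default
regress wage tenure ib2.race

* i. rejects string variables. To illustrate, make a string copy of race,
* then encode it back into a labeled numeric variable before using i.
decode race, generate(race_str)
encode race_str, generate(race_code)
regress wage tenure i.race_code
```

Note that encode numbers the categories alphabetically, so the default reference category after encode may differ from the one in the original numeric variable.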
11
Let’s do this!
Use the hobbit data set (on our website, Other Materials, Data and other Materials).
Make time variable.
Make a weekend indicator variable.
Regress Gross on time and weekend indicator. Interpret each coefficient.
Regress Gross on time and day of week (Day) using i.
12
Estimating nonlinear relationships
Could the relationship be non-linear, and if so, how can we estimate this using linear regression?
13
Non-linear relationships between Y and X
Sometimes, the relationship between the Y variable and the X variable is unlikely to be linear. This may lead you to measure a very small, statistically insignificant slope.
e.g. If you ran a regression on the data in this graph, the slope coefficient would be zero.
14
Many of you believe that you might have nonlinear relationships
e.g. Maybe job satisfaction goes up with age and then down again.
e.g. You do not believe that a $1 increase in price will have the same effect going from $10 to $11 as going from $100 to $101.
Note that this section is only applicable for numerical variables. You cannot do these nonlinear things with indicator variables.
15
To solve the problem of Y possibly increasing with X and then decreasing:
You simply add to the regression a new X variable that is a nonlinear version of the old variable.
My suggestion: estimate a quadratic by making a new variable X^2 and run the regression with both the linear and nonlinear (quadratic) terms in the equation.
If you don’t know if a relationship is nonlinear, you can estimate the regression assuming it is nonlinear (e.g. quadratic) and then examine the results to see if this assumption is correct.
16
Quadratic: Y = b0 + b1 X + b2 X^2
In high school you learned that quadratic equations look like this. So by adding a squared term, you can estimate these shapes.
17
However, a regression with a quadratic can estimate ANY part of these shapes
So, using a quadratic does not mean that the curve need actually ever change from a positive to a negative slope or vice versa …
18
How do you know whether the relationship really is nonlinear?
Put in a nonlinear term (e.g. a squared term) and let the |t-stat|s in the equation tell you if it belongs in there.
If the |t-stat| > 2, you are more than 95% confident that the relationship is nonlinear.
Even if |t-stat| < 2, it’s a good idea to keep in the quadratic term as long as you are relatively confident it belongs in. I tend to leave it in if it has a |t-stat| > 1, which means that I am at least 68% confident the relationship is nonlinear.
Example: I know annual visitors to the park. I want to know if they are growing (or falling) at a constant rate over time, or not. First I make the variables:
    gen time = _n
    gen timesq = time^2
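The linear-versus-quadratic comparison can be sketched on Stata's shipped uslifeexp dataset, an annual series used here only as a stand-in for the park visitors data (le is U.S. life expectancy). The last line uses Stata's factor-variable notation as an alternative to creating timesq by hand.

```stata
* Stand-in data: annual U.S. life expectancy series that ships with Stata
sysuse uslifeexp, clear
sort year
generate time = _n
generate timesq = time^2

* Linear trend only
regress le time

* Linear plus quadratic term; check the t-stat on timesq
regress le time timesq

* Same quadratic model without creating timesq yourself
regress le c.time##c.time
```

The t-statistic on timesq (or on c.time#c.time in the last regression) is the one to compare against the thresholds described above.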
19
Here are regressions on time, then on time AND timesq
Is the relationship nonlinear? Are visitors growing/shrinking, and at a constant rate?
. regress annualvisitors time
[Stata output table: F(1, 21), Root MSE = 2.5e+05; coefficient rows for time and _cons. Other values not recoverable.]
. regress annualvisitors time timesq
[Stata output table: F(2, 20), Root MSE = 1.3e+05; coefficient rows for time, timesq and _cons. Other values not recoverable.]
20
Sketching the Quadratic: Visitors = 1102401 + 118498 time - 5374 time^2
The linear term is positive, so at a small X (e.g. X=0.1) the slope is positive.
The squared term is negative, so the slope eventually becomes negative.
So the general shape is as below. But which part of the curve is it?
For those who don’t think in derivatives, plug in low, medium and high values for X in the original equation. In this data, time goes from 1 to 23 so:
    At time=1,  Visitors = 1102401 + 118498(1) - 5374(1^2) = 1,215,525
    At time=10, Visitors = 1102401 + 118498(10) - 5374(10^2) = 1,749,981
    At time=23, Visitors = 1102401 + 118498(23) - 5374(23^2) = 985,009
So over these 23 years, predicted visitors go up, then back down again.
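The plug-in arithmetic can also be left to Stata. This is a small sketch using the three coefficients from the equation above; the loop variable t is just a local name.

```stata
* Predicted visitors at low, medium and high values of time
foreach t of numlist 1 10 23 {
    display "time = `t': Visitors = " 1102401 + 118498*`t' - 5374*`t'^2
}
```

Changing the numlist (for example to 1/23) prints the prediction for every year, which makes the rise and fall easy to see. Because the loop uses local macros, run it from a do-file or paste the whole block at once.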
21
Sketching the Quadratic using calculus: Visitors = 1102401 + 118498 time - 5374 time^2
Calculus tells us the slope: dVisitors/dtime = 118498 - 2*5374 time
The slope gets smaller as time increases. At the top of this curve, the slope is exactly zero. So solve
    0 = 118498 - 2*5374 time
    time = 11.03
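The turning point is b1 / (-2*b2), which Stata can compute directly. The first line uses the coefficients from the slide. The rest is a sketch on Stata's shipped uslifeexp dataset (a stand-in, not the visitors data) showing how nlcom could report the same quantity, with a standard error, after any quadratic regression. The label turn is an arbitrary name.

```stata
* Turning point from the slide's coefficients (about 11.03)
display 118498 / (2 * 5374)

* Same calculation from stored regression coefficients, on stand-in data
sysuse uslifeexp, clear
sort year
generate time = _n
generate timesq = time^2
regress le time timesq
nlcom (turn: -_b[time] / (2 * _b[timesq]))
```

If the estimated turning point falls outside the range of time in the data, the fitted curve never changes direction within the sample.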
22
What about this issue: You believe that a 1% increase in X will have the same % effect on Y no matter what value of X you start at. [NOT ON TEST]
e.g. You believe a 1 percent increase in price has a constant percentage effect on sales.
Mathematical rule: If lnY = b0 + b1 lnX, b1 represents %∆Y / %∆X, or the percentage change in Y when X changes by 1%.
(ln is the natural log, i.e. the log to the base e. Log means to the base 10. Either works.)
So just make two new variables, lnY and lnX, and run a regression:
    regress lnY lnX
The coefficient will be the percentage change in Y when X changes by 1%.
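A log-log regression can be sketched on Stata's shipped auto dataset, a stand-in in which price plays the role of Y and weight the role of X. The names lnprice and lnweight are made up here. One Stata detail worth knowing: both ln() and log() return the natural log in Stata, and base 10 is log10().

```stata
* Stand-in data: 1978 automobile data that ships with Stata
sysuse auto, clear

* Natural logs of the dependent and explanatory variables
generate lnprice = ln(price)
generate lnweight = ln(weight)

* The coefficient on lnweight is read as the % change in price
* when weight changes by 1%
regress lnprice lnweight
```

Observations with zero or negative values get a missing log and are dropped from the regression, so it is worth checking the number of observations in the output.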
23
A case when logs might be useful?
If you have skewed data (like lifetime gross in movies), you could just regress:
    ln(Lifetime gross) = b0 + b1 ln(metascore)
24
We should talk more if you want to use logs
25
Back to the hobbit data set
Make a variable for time squared.
Run a regression of gross on time, time squared, and the better of the other two (weekend indicator, or day of week indicator variables).
Is the relationship between gross and time nonlinear? What does it look like?
26
Dealing with skewed data
27
There are 3 ways you might deal with skewed data
1. Use logs for the skewed variable (if you believe the right relationship is with the percentage change).
2. If the skewed variable is the dependent variable, predict the median rather than the mean by going:
    qreg Yvariable Xvariable
3. You can topcode the variable (whether it is a dependent or explanatory variable), for instance:
    replace LifetimeGross = … if LifetimeGross > …
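Options 2 and 3 can be sketched on Stata's shipped nlsw88 dataset, where the right-skewed wage variable stands in for LifetimeGross. The 99th-percentile cutoff and the names cap and wage_tc are assumptions made for this illustration, not values from the slide.

```stata
* Stand-in data: 1988 wage survey extract that ships with Stata
sysuse nlsw88, clear

* Option 2: median (quantile) regression instead of the mean
qreg wage tenure

* Option 3: topcode at an assumed cutoff, here the 99th percentile
summarize wage, detail
local cap = r(p99)
generate wage_tc = wage

* Stata treats missing as larger than any number, so exclude missing
* values explicitly or they would be overwritten with the cap
replace wage_tc = `cap' if wage > `cap' & !missing(wage)

regress wage_tc tenure
```

Topcoding into a new variable rather than overwriting the original keeps the raw data available for comparison. Because the sketch uses a local macro, run it from a do-file or paste the whole block at once.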
28
More practice using Stata
What would you like me to demonstrate?
Otherwise: Help each other. Where are you stuck? What don’t you know how to do? What can you teach the others?