Interpret Interaction Effect Continuous and Dummy Variable
Marinela, the source is Joro Kolev, who says that the following two are algebraic regression facts:
1.) Fact one: if every industry is a singleton, nothing can be estimated beyond the constant. Here I tag exactly one observation per group defined by the variable rep78 (abbreviated to rep in the commands):
Code:
. sysuse auto, clear
(1978 Automobile Data)

. keep if !missing(rep)
(5 observations deleted)

. egen tag = tag(rep)

. reg price mpg i.rep if tag, absorb(rep)
note: mpg omitted because of collinearity
note: 2.rep78 omitted because of collinearity
note: 3.rep78 omitted because of collinearity
note: 4.rep78 omitted because of collinearity
note: 5.rep78 omitted because of collinearity

Linear regression, absorbing indicators         Number of obs     =          5
                                                F(0, 0)           =       0.00
                                                Prob > F          =          .
                                                R-squared         =     1.0000
                                                Adj R-squared     =          .
                                                Root MSE          =          0

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |          0  (omitted)
             |
       rep78 |
          2  |          0  (omitted)
          3  |          0  (omitted)
          4  |          0  (omitted)
          5  |          0  (omitted)
             |
       _cons |       6921          .        .       .            .           .
------------------------------------------------------------------------------
As you can see, I cannot estimate anything but a constant.
Now I keep rep==1 and rep==2 as singletons, but I leave the groups with rep>2 as they are (not singletons):
Code:
. replace tag = tag + 1 if rep>2
(59 real changes made)

. reg price mpg i.rep if tag, absorb(rep)
note: 2.rep78 omitted because of collinearity
note: 3.rep78 omitted because of collinearity
note: 4.rep78 omitted because of collinearity
note: 5.rep78 omitted because of collinearity

Linear regression, absorbing indicators         Number of obs     =         61
                                                F(1, 55)          =      17.25
                                                Prob > F          =     0.0001
                                                R-squared         =     0.3416
                                                Adj R-squared     =     0.2817
                                                Root MSE          =     2573.4

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -261.2437   62.89928    -4.15   0.000    -387.2967   -135.1908
             |
       rep78 |
          2  |          0  (omitted)
          3  |          0  (omitted)
          4  |          0  (omitted)
          5  |          0  (omitted)
             |
       _cons |   11945.14   1392.397     8.58   0.000     9154.718    14735.57
------------------------------------------------------------------------------
Now I managed to estimate the slope on mpg. However,
2.) Fact two: the slope I estimated on mpg is not determined by the two singleton groups. In fact, the regression above simply disregarded the singleton groups.
Below I estimate the regression only for rep>2, that is, I drop the singleton groups manually, and the slope on mpg is still the same.
Code:
. reg price mpg i.rep if rep>2, absorb(rep)
note: 4.rep78 omitted because of collinearity
note: 5.rep78 omitted because of collinearity

Linear regression, absorbing indicators         Number of obs     =         59
                                                F(1, 55)          =      17.25
                                                Prob > F          =     0.0001
                                                R-squared         =     0.2431
                                                Adj R-squared     =     0.2018
                                                Root MSE          =     2573.4

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -261.2437   62.89928    -4.15   0.000    -387.2967   -135.1908
             |
       rep78 |
          4  |          0  (omitted)
          5  |          0  (omitted)
             |
       _cons |   11864.94    1398.91     8.48   0.000     9061.463    14668.42
------------------------------------------------------------------------------
Note that the number of observations differs between the previous two regressions, 61 vs. 59, and yet the slope on mpg and its standard error are identical.
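The reason the two slopes coincide can be written out for the one-regressor case. This is only a sketch in my own notation (the symbols g for group, i for observation, n_g for group size, and the bars for group means are not from the output above): absorbing the group indicators is equivalent to demeaning price and mpg within each group, so the slope is

```latex
\hat{\beta}
  = \frac{\sum_{g}\sum_{i \in g} (x_{ig} - \bar{x}_g)(y_{ig} - \bar{y}_g)}
         {\sum_{g}\sum_{i \in g} (x_{ig} - \bar{x}_g)^{2}},
\qquad
\bar{x}_g = \frac{1}{n_g}\sum_{i \in g} x_{ig},
\quad
\bar{y}_g = \frac{1}{n_g}\sum_{i \in g} y_{ig}.

% For a singleton group, n_g = 1, so the group mean is the observation itself:
n_g = 1
\;\Longrightarrow\;
x_{ig} - \bar{x}_g = 0
\quad\text{and}\quad
y_{ig} - \bar{y}_g = 0 .
```

A singleton group therefore adds zero to both the numerator and the denominator, which is why the slope is the same with or without those observations. When every group is a singleton, the denominator itself is zero and mpg has to be omitted, as in fact one.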
The same will happen in your regression if you include dummies at the 2-digit industry level: the singleton industries will not contribute to the estimation of your slopes.
Finally, it is up to you what you do. If you think it is crucial to include dummies at the 2-digit industry level, do that, and live with the fact that the singleton industries are "silenced" and have no say in what your slope parameters are.
On the other hand, if you include dummies at the 1-digit industry level, you control for industry at a coarser level, but every industry gets a say in your slope estimates, provided no 1-digit industry is itself a singleton.
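Before choosing the level of the industry dummies, it may help to count how many observations sit in singleton groups. Below is a rough sketch on the auto data from above, where rep78 stands in for your industry code; the variable names n_g and first are my own, and you would replace rep78 with your own industry variable (whose name I do not know) and restrict to your own estimation sample.

```stata
* Rebuild the 61-observation sample used above (rep78 stands in for industry)
sysuse auto, clear
keep if !missing(rep78)
egen tag = tag(rep78)
replace tag = tag + 1 if rep78 > 2
keep if tag

* Number of observations in each group within the estimation sample
bysort rep78: generate n_g = _N

* One line per group, so the singleton groups are easy to spot
egen first = tag(rep78)
list rep78 n_g if first, noobs

* Observations that cannot contribute to the slope estimates
count if n_g == 1
```

In this sample the singleton groups are rep78 equal to 1 and 2, the two observations that separate the 61-observation regression from the 59-observation one.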
Source: https://www.statalist.org/forums/forum/general-stata-discussion/general/1599790-how-to-interpret-interaction-terms-with-continuous-variables