1 hour ago · Tech · 0 comments

I’ve often seen confusion between interactions in a regression model and correlations among the predictors. To keep it simple, consider the model y = b0 + b1*x1 + b2*x2 + b3*x1*x2 + error, and assume the predictors have been signed so that both b1 and b2 are positive. Then b3 represents the interaction. This has nothing to do with the joint distribution of x1 and x2 in the data, or in the population. (For simplicity, assume the data to which the model are being fit is a random sample from the population of interest.) The interaction depends on the model of y given x1 and x2, while the correlation depends on the model for x1 and x2. These are two completely different parts of the model. And yet, they often seem connected. I have the general impression that I’d be more likely to expect a positive interaction of x1 and x2 when predicting y, if x1 and x2 are positively correlated in the population. For example, when predicting income from height and sex, being taller and being male both…

No comments yet. Log in to reply on the Fediverse. Comments will appear here.