In 2019, Andrew blogged about collinearity in Bayesian models. In the comments, he pointed to an example from Bayesian Data Analysis, 2nd edition (BDA2). I think it is a useful example to keep in mind when extrapolating from sample to population. Since folks (like me) may only have BDA3 on their shelf, I thought I’d talk thru it.

Pretend it is 1980 and we are at the US Census Bureau. We just revamped the occupational coding system, and it’s so much better! We want 1980-style codes on all our old data that only had 1970-style codes. Let’s trade in our peasant blouses for some shoulder pads.

Say we have double-coded training data (n = 10,000) with:

O_1980 = occupation coded in the 1980 coding system
O_1970 = occupation coded in the 1970 coding system
E = education, either high or low
I = income, either high or low

We want to impute O_1980 for the single-coded full dataset (N = 1,000,000), which has only O_1970, E, and I.

Consider everyone with a specific occupation according to the 1970…
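To make the setup concrete, here is a minimal Python sketch of the double-coded training data and one naive way to impute O_1980 from it: tabulate how often each 1980 code shows up within each (O_1970, E, I) cell, then draw from those frequencies. Everything specific here is an assumption for illustration: the code labels (A70, A80, ...), the weights used to simulate the data, and the function names are all made up, and this is not necessarily the approach taken in BDA2.

```python
import random
from collections import Counter, defaultdict

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical occupation codes; the real coding systems have many more.
CODES_1970 = ["A70", "B70"]
CODES_1980 = ["A80", "B80", "C80"]
LEVELS = ["low", "high"]  # used for both E (education) and I (income)


def simulate_training(n):
    """Simulate n double-coded records as (O_1970, E, I, O_1980) tuples.

    The relationship between the variables is invented for illustration.
    """
    rows = []
    for _ in range(n):
        o70 = rng.choice(CODES_1970)
        e = rng.choice(LEVELS)
        i = rng.choice(LEVELS)
        weights = [3, 1, 1] if o70 == "A70" else [1, 1, 3]
        if e == "high":
            weights[1] += 2  # made-up education effect
        o80 = rng.choices(CODES_1980, weights=weights)[0]
        rows.append((o70, e, i, o80))
    return rows


def cell_counts(rows):
    """Count 1980 codes within each (O_1970, E, I) cell."""
    counts = defaultdict(Counter)
    for o70, e, i, o80 in rows:
        counts[(o70, e, i)][o80] += 1
    return counts


def impute(counts, o70, e, i):
    """Draw an O_1980 code in proportion to its frequency in the cell."""
    cell = counts.get((o70, e, i))
    if not cell:
        # A cell never seen in training has nothing to draw from.
        raise KeyError(f"no training records for cell {(o70, e, i)}")
    codes = list(cell)
    return rng.choices(codes, weights=[cell[c] for c in codes])[0]


training = simulate_training(10_000)  # n = 10,000, as in the setup
counts = cell_counts(training)
example = impute(counts, "A70", "high", "low")
```

With two 1970 codes and two binary covariates there are only 8 cells, so every cell is well populated here. With realistic numbers of occupation codes, many cells would be sparse or empty, and the `KeyError` branch marks where this naive version stops working.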