
Foreword: This post tries not to assume too much, but it will make the most sense if you have looked over the IDiom paper. If you haven't, I suggest doing so, then coming back!

We set out to audit the reinforcement learning results for IDiom, a recent protein language model for designing intrinsically disordered regions from the Rotskoff and Dunn labs at Stanford. Originally, we wanted to explore whether there was any reward hacking, given how readily RL-optimized generative models exploit shallow features in their reward signals. Instead, we found something more interesting: the biological distinctions that emerge after RL are neither reproduced by simple base-model filtering nor fully resolved by the reward model itself.

The Setup

IDiom is a 122M-parameter autoregressive model trained on 37 million intrinsically disordered region sequences from the AlphaFold Database. After pretraining, Liu et al. post-train the model with GRPO on ProtGPS, a localization predictor built on ESM2 embeddings…
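
To make the GRPO setup concrete: GRPO forgoes a learned value function and instead computes advantages by standardizing rewards within a group of completions sampled from the same prompt. Below is a minimal sketch of that advantage step. The reward function here is a toy stand-in for ProtGPS (which in reality scores compartment localization from ESM2 embeddings); all names and the reward shaping are illustrative, not IDiom's actual implementation.

```python
import torch

def reward_model(seq: str) -> float:
    # Placeholder reward: fraction of Ser/Arg residues, standing in for a
    # real localization probability from a classifier like ProtGPS.
    return sum(c in "SR" for c in seq) / max(len(seq), 1)

def grpo_advantages(rewards: torch.Tensor) -> torch.Tensor:
    """Group-relative advantages: standardize rewards within each group of
    completions sampled from the same prompt. The policy gradient then
    weights each completion's token log-probs by its advantage."""
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + 1e-8)

# Example: two prompts, four sampled completions each (toy sequences).
groups = [
    ["MSRSRSPS", "MGGGAAAG", "MSSRRSSR", "MPPQQNNG"],
    ["MKRKRKEE", "MSGSGSGS", "MRSRSRSR", "MAAAAAAA"],
]
rewards = torch.tensor([[reward_model(s) for s in g] for g in groups])
adv = grpo_advantages(rewards)  # shape (2, 4); above-average samples get positive weight
```

The within-group standardization is the key design choice: the model is only rewarded for being better than its own sibling samples, which is exactly the pressure that can amplify whatever shallow features the reward model happens to key on.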
