Put away your Betteridge’s Law of Headlines, I genuinely don’t know the answer :) In “Training large language models on narrow tasks can lead to broad misalignment”, Betley et al. found that fine-tuning LLMs on, for example, insecure coding practices can degrade the model’s answers to unrelated, non-coding questions.