This is a great example of preference tuning pulling a model away from an accurate world model (in this case, LLMs tuned by Kenyan reinforcement-learning trainers toward their idea of perfect written English).

No comments yet. Log in to reply on the Fediverse. Comments will appear here.