Saying that LLMs are just next-token predictors undersells these beasts to a mind-numbing degree.

First, LLMs aren't just predicting the next token; they plan ahead. The training loss is the next-token cross-entropy averaged over every position in the context window, and attention gives each position access to all previous tokens. Because the hidden state at one token feeds the predictions at many later positions, training pushes that state to encode whatever will be relevant far ahead, not just at the immediate next token.

Second, LLMs are trained to predict sequences drawn from text all over the internet, which contains not just human-written prose but also weather forecasts, financial time series, code, bash dumps, satellite pings and so on. To predict all of this well, LLMs have to infer the physics or dynamics of each of these domains (e.g. to predict weather patterns in data, you need to develop a model of Earth coordinates, sunlight patterns, monsoon cycles and so on). (Just think how hard this prediction problem is, given the diversity of texts in the pretraining…
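To make the training objective concrete, here is a minimal sketch of the loss described in the first point: each position is scored only on the token that follows it, and those per-position cross-entropy terms are averaged over the window. The function name and the plain list-of-lists layout for logits are illustrative assumptions, not any particular library's API. The sketch computes the objective only; it does not model attention or show the gradients that flow back through shared hidden states, which is where the "planning ahead" argument actually lives.

```python
import math
from typing import List


def next_token_loss(logits: List[List[float]], tokens: List[int]) -> float:
    """Mean next-token cross-entropy over every position in a context window.

    logits[t] holds unnormalized scores over the vocabulary computed at
    position t; in a causal transformer these would depend only on
    tokens[0..t]. Position t is scored on how well it predicts tokens[t + 1],
    and the per-position terms are averaged.
    """
    if len(logits) != len(tokens) - 1:
        raise ValueError("need one row of logits per predicted token")
    total = 0.0
    for t, scores in enumerate(logits):
        # log-softmax normalizer via log-sum-exp, shifted by the max for stability
        m = max(scores)
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        # -log p(tokens[t + 1] | tokens[0..t])
        total += log_z - scores[tokens[t + 1]]
    return total / len(logits)


# Usage: vocabulary of 3 tokens, uniform scores at every position.
print(next_token_loss([[0.0, 0.0, 0.0]] * 3, [0, 1, 2, 0]))  # ≈ 1.0986 (ln 3)
```

Frameworks typically implement the same thing by shifting the token sequence one step left to form the targets and calling a batched cross-entropy with mean reduction.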

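The second point, that driving prediction loss down forces a model to capture the process behind the data, can be illustrated with a deliberately tiny toy. The sequence below is a made-up dry/wet cycle standing in for weather data, and both predictors are hypothetical: one knows only how often each state occurs, the other has picked up the cycle. Only the second gets its per-token cross-entropy close to zero.

```python
import math

# Hypothetical toy "weather" record: a repeating 4-day dry/dry/wet/wet cycle.
CYCLE = ["dry", "dry", "wet", "wet"]
seq = CYCLE * 25  # 100 observations


def avg_loss(predict, seq):
    """Mean -log p(seq[t] | seq[:t]) over positions t = 1 .. len(seq) - 1."""
    losses = [-math.log(predict(seq[:t])[seq[t]]) for t in range(1, len(seq))]
    return sum(losses) / len(losses)


def marginal_model(history):
    # Knows only that both states are equally common; ignores order entirely.
    return {"dry": 0.5, "wet": 0.5}


def cycle_model(history, eps=0.01):
    # Has "inferred the dynamics": in this cycle each state is the opposite
    # of the state two steps earlier. eps keeps probabilities below 1.
    if len(history) < 2:
        return {"dry": 0.5, "wet": 0.5}
    guess = "wet" if history[-2] == "dry" else "dry"
    other = "dry" if guess == "wet" else "wet"
    return {guess: 1 - eps, other: eps}


print(f"marginal: {avg_loss(marginal_model, seq):.3f} nats/token")  # 0.693
print(f"cycle:    {avg_loss(cycle_model, seq):.3f} nats/token")     # 0.017
```

Real weather records are vastly messier than this; the toy only shows the direction of the effect. The lower the loss you want, the more of the generating process the predictor has to represent.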