RL economics, morally charged terms, and "distillation"

0 ▲

2 hours ago · Tech · 0 comments

After a number of Twitter discussions, and repeating myself a lot in these discussions, it is time to write a short note on the economics of advancing LLM capabilities through RL, about principles of propaganda and coining new words, and about my stubborn refusal to use the term "distillation" except in a specific narrow sense.How do models advance when human-curated data has run out?It's been a while since we ran out of human data to train LLMs on. We are training on copies of the internet, large piles of (originally pirated, then purchased-and-scanned-and-wholesale ingested) books, and whatever other data sources we can obtain. This leads to a certain performance plateau, as we haven't quite figured out how to make the models more data-efficient in training.The advancements we have seen in coding and mathematics in the last year are mostly due to reinforcement learning. At the highest level, you pose a problem to an LLM that the LLM has a small but nontrivial chance of solving. You…

No comments yet. Log in to reply on the Fediverse. Comments will appear here.