In the early days of LLMs, the scaling hypothesis was far from intuitive. Scaling, put very simply, is the idea that throwing more data and compute at a model will make it better. Surely super-intelligence would be achieved through complex architecture, not just... more? Yet the huge jump from GPT-2 to GPT-3 did not come from a fundamentally different design; the architecture was essentially the same, just much larger and trained on far more data. The model's greater understanding emerged not from cleverer engineering but from the sheer volume of text it had been exposed to. Past a certain scale, a model can understand things it was never explicitly taught. I learned this in my Digital Transformation class this semester, and it got me thinking about what that means for humans. After all, artificial neural networks were themselves inspired by the structure of the human brain. So surely the brain functions in the same way when exposed to enough of the world? Read enough history books, and you seem to be…