Writing an LLM from scratch, part 33 -- what I learned from finally getting round to the appendices

After finishing the main body of "Build a Large Language Model (from Scratch)", I set myself three follow-on goals. The first was to train a full GPT-2-small-style base model myself. That was reasonably easy to do, but it unlocked a bunch of irresistible side quests. Having finally got to the end of those, it's time to move on to the other two: reading through the book's appendices, and building my own GPT-2-style model in JAX.

This post is about the appendices. The TL;DR: there was material in there that could have saved me time during my side quests, but I think that having to work those things out from scratch probably helped me learn them better.

Appendix A: Introduction to PyTorch

This is an excellent overview of PyTorch. Given that I'm writing for people who are reading the book too, all I can really say is that it's well worth reading, even if you already have some experience with PyTorch. Sebastian Raschka gives an intro to what it is, some details on how to use GPUs (or Apple Silicon) if you have them,…
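For anyone who hasn't done device selection in PyTorch before, the usual idiom looks something like the sketch below. This isn't the appendix's own code. It's a minimal sketch assuming PyTorch 1.12 or later (the release that added the `mps` backend for Apple Silicon), and `pick_device` is a hypothetical helper name used only for illustration. It prefers a CUDA GPU, falls back to Apple Silicon, and otherwise uses the CPU, then moves a tiny model and its input onto the chosen device.

```python
import torch


def pick_device() -> torch.device:
    """Hypothetical helper: prefer CUDA, then Apple Silicon (MPS), then CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    # The MPS backend (Apple Silicon GPUs) is available in PyTorch 1.12+
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")


device = pick_device()

# The model's parameters and its inputs must live on the same device
model = torch.nn.Linear(4, 2).to(device)
x = torch.randn(3, 4, device=device)
out = model(x)
print(device, out.shape)
```

Picking the device once and passing it around (rather than hard-coding `"cuda"`) means the same training script runs unchanged on a GPU box, a Mac, or a CPU-only machine.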
