Writing an LLM from scratch, part 34a -- building a JAX training loop for an LLM training run

1 hour ago · 46 min read · 9250 words · Tech

For over a year, I've been using Sebastian Raschka's book "Build a Large Language Model (from Scratch)" -- and the multitude of side-projects that have branched out from reading it -- as something like a curriculum for learning about modern AI. The one final task I had set myself was to build and train an LLM from scratch just using my notes -- no reference to the book, no reference to the model code I'd written following the book. As an output, I wanted something as good as my best PyTorch model based on Raschka's code -- a base model, trained on 3.2B tokens, that my (admittedly limited) evals ranked as being close to the original GPT-2 small's quality. I wanted to use a different framework, just to make sure I wasn't parroting code that I'd somehow memorised, so I asked people on Twitter which one I should use, and the winner was JAX. I took a slightly different route to Raschka's book; he takes an inside-out perspective, explaining things like attention, gradually building up a…
