I've been working on a GPT-2-small-style LLM based on Sebastian Raschka's book "Build a Large Language Model (from Scratch)". I've trained various versions of it in the cloud to work out which interventions to the model and training code had the best effect on the loss it achieves on a specific test dataset, and now I want to do a training run locally to match the best of those. For that, I wanted to match the batch size I was using for the cloud training runs. When I first started learning t...