This is the last of the interventions I'm trying out to see if I can improve the test loss for a from-scratch GPT-2 small base model, trained using code based on Sebastian Raschka's book "Build a Large Language Model (from Scratch)". Back when I did my first training run for a base model, on my local RTX 3090, I used two optimisations: setting the 32-bit floating-point matrix multiplication precision to "high" rather than "highest", which means that it uses lower-precision (but still techn...
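For reference, here's a minimal sketch of how that precision setting is applied in PyTorch (assuming the training script calls `torch` directly; the exact placement in the original run may differ):

```python
import torch

# Allow TF32 (TensorFloat-32) for float32 matrix multiplications.
# "high" trades a little mantissa precision for much faster matmuls
# on Ampere-class GPUs like the RTX 3090; "highest" keeps full fp32.
torch.set_float32_matmul_precision("high")
```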