Finally, in this post, we put our money where our mouth is. We dragged a 2016 enterprise relic out of the closet (NAY, out of the grave), a single Intel Xeon running on agonizingly slow DDR3 RAM with absolutely no GPU to speak of, and forced it to run a cutting-edge, 26-billion-parameter Mixture-of-Experts architecture at reading speed. We did it without throwing exotic hardware at the problem. Instead, we took the deployment pipeline seriously: we mapped the architecture directly onto the physical hardware, tuned memory allocation, and pushed CPU cache optimization to its absolute limits.

A 10 year old Xeon is all you need - point.free

Or running Gemma 4 on a 2016 Xeon with no GPU, 25 flags, 128 GB of DDR3, and a 25B-parameter MoE.

Very cool post. I may go necro up some old servers and mess with bringing them back up to speed now. I wonder if, by pooling resources, something like this could start within the old IT-guard communities, spread, and begin to tip the scales…