Fixing a stuck Ollama runner and building a GPU watchdog

0 ▲

blog

2 hours ago · Tech · 0 comments

If you self-host LLMs, at some point, you’ll likely experience a stuck GPU consuming significant electricity for long periods of time. This is my summary of discovering the problem & then implementing automation that corrects the issue. I want automation to discover & correct these issues. I don’t want to be the first line of defense against unnecessary electricity consumption. My Self Hosted Setup I run a large language model on hardware at home — an Ollama server on a mini PC (AMD Ryzen AI MAX+ 395 with an integrated GPU). Self-hosting gives me privacy, no subscription, and a model that works without an internet connection. The tradeoff is that I’m my own ops team. When something goes wrong, there’s no provider watching for outages. This weekend, something went wrong. I noticed the cooling fan on my AI server running at full speed without letting up. I hadn’t been using it- and my family’s use of it is pretty limited. There likely were no active requests for the system. Either the…

No comments yet. Log in to reply on the Fediverse. Comments will appear here.