A while ago I migrated burningboard.net to a multi-jail FreeBSD setup: nginx, Puma, Sidekiq, and the database each in their own jail, with the host doing all the PF and routing. That post ended on the architecture. What it did not cover is the question that matters the morning after you put real users on a thing: is it actually healthy right now, and if it is not, will I find out before they do?

This is the observability half of that story. I run a completely separate machine whose only job is to watch the Mastodon host, plus a handful of other boxes in my network. It runs Prometheus for metrics, Loki for logs, Grafana to draw both, and Alertmanager to wake me up. None of it lives on the Mastodon host itself, because a monitoring system that shares a fate with the thing it watches is not a monitoring system, it is a single point of failure with extra steps.

I covered the in-the-moment FreeBSD troubleshooting toolkit (top, gstat, netstat, rctl) in an earlier article. This one is the…