Bluesky went down for half its users for about 8 hours on Monday. Jim Calabro wrote up an excellent post-mortem that is worth reading in full, but the root cause is something I keep seeing across languages and runtimes, and it's worth talking about on its own. One endpoint in their data plane, GetPostRecord, fanned out concurrent work based on the size of the incoming request. Every other endpoint in the system used bounded concurrency via errgroup.SetLimit. This one did not. A new internal c...
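To make the bounded-versus-unbounded distinction concrete, here is a minimal sketch in Go. It is not Bluesky's code. `processBounded`, its parameters, and the numbers in `main` are hypothetical names and values chosen for illustration. It also swaps in a buffered channel used as a semaphore instead of `errgroup.SetLimit`, so it runs with only the standard library. The idea is the same: the caller controls how many items arrive, but a fixed limit controls how many are worked on at once.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// processBounded calls fn once per item with at most limit calls in flight.
// It returns the peak number of concurrent calls it observed.
// Hypothetical helper for illustration; not taken from the post-mortem.
func processBounded(items []string, limit int, fn func(string)) int {
	if limit < 1 {
		limit = 1 // with capacity 0 the first send below would block forever
	}
	sem := make(chan struct{}, limit) // buffered channel used as a counting semaphore
	var (
		wg       sync.WaitGroup
		mu       sync.Mutex
		inFlight int
		peak     int
	)
	for _, item := range items {
		sem <- struct{}{} // blocks once limit goroutines are already running
		wg.Add(1)
		go func(it string) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot when this call finishes

			mu.Lock()
			inFlight++
			if inFlight > peak {
				peak = inFlight
			}
			mu.Unlock()

			fn(it)

			mu.Lock()
			inFlight--
			mu.Unlock()
		}(item)
	}
	wg.Wait()
	return peak
}

func main() {
	// The request size is caller-controlled; the concurrency limit is not.
	items := make([]string, 1000)
	peak := processBounded(items, 8, func(string) {
		time.Sleep(time.Millisecond) // stand-in for a downstream call
	})
	fmt.Printf("items=%d peak in-flight=%d\n", len(items), peak)
}
```

However large `items` gets, the reported peak never exceeds the limit. Without the semaphore, the same loop would start one goroutine per item, so the load on whatever `fn` calls would scale with the size of the request.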