Extending The Horizon Of Agent Autonomy Is A Testing Problem

5 hours ago · Tech · 0 comments

I’ve talked before about how improbable long-horizon autonomous agentic workflows are. Every step is a throw of a weighted die, and with each additional step the probability of overall success goes down. On top of this, decisions have dependencies, which means errors can compound downstream. Take a wrong turn at step N, and steps N+1, N+2 and N+3 could well build on that mistake.

The other side of the equation is verification. Mistakes aren’t a problem if they’re caught before they compound. So we now have two components: the probability of an error, and the probable number of subsequent steps before the error is detected. More bluntly, if the agent f***ed up, how soon would we/it know?

The E in my CRESS principles for context engineering stands for “Empirical” – input contexts should be grounded in observed reality, not unverified model output. I visualise raw model output as being like untreated sewage. Yes, there’s water in it. But it’s not safe for the model to drink. To make it safe,…
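
As a rough illustration of the two components described above (the per-step chance of an error, and how long an error survives before it's caught), here is a minimal Python sketch. It rests on simplifying assumptions that are mine, not the article's: every step fails independently with the same probability, and once an error exists, each later step has the same independent chance of detecting it. The function names are hypothetical.

```python
def success_probability(per_step_error: float, steps: int) -> float:
    """Chance that all `steps` steps succeed.

    Assumption (illustrative only): steps fail independently,
    each with probability `per_step_error`.
    """
    if not 0.0 <= per_step_error <= 1.0:
        raise ValueError("per_step_error must be in [0, 1]")
    return (1.0 - per_step_error) ** steps


def expected_steps_until_detection(per_step_detection: float) -> float:
    """Mean number of steps until an existing error is caught.

    Assumption (illustrative only): each step after the error
    catches it independently with probability `per_step_detection`,
    so the wait follows a geometric distribution with mean 1/p.
    """
    if not 0.0 < per_step_detection <= 1.0:
        raise ValueError("per_step_detection must be in (0, 1]")
    return 1.0 / per_step_detection


if __name__ == "__main__":
    # A 2% per-step error rate over 50 steps.
    print(f"{success_probability(0.02, 50):.3f}")
    # A check that catches an existing error half the time per step.
    print(expected_steps_until_detection(0.5))
```

Under these assumptions, a 2% per-step error rate leaves only about a 36% chance of a clean 50-step run, and halving the detection probability doubles the expected number of steps an error survives. Real workflows won't have independent steps, since downstream decisions build on earlier ones, so treat this as a way to see the shape of the problem and not as a prediction.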
