Letting AI agents drive a GUI app with -dbg-control

10 hours ago · 8 min read · 1576 words · Tech

SumatraPDF is a Windows GUI application, written in C++, for viewing PDF, ePub and comic book files. Lately I do a lot of my SumatraPDF coding with AI agents: Claude Code, Grok Build, OpenAI Codex. They’re good at writing code. They’re less good at knowing whether the code works, especially in GUI apps.

The problem: agents don’t drive UI well

Say I ask an agent to fix a bug in PDF text search, or in the new feature that translates selected text via an LLM. How does the agent verify the fix?

Surprisingly, agents can drive a GUI app by injecting mouse clicks and keyboard input and taking screenshots. But it’s slow and flaky. On my machine, injected mouse clicks would sometimes get dropped. Coordinates change when the layout changes. Screenshots need a vision model to interpret.

I wanted something an agent could drive deterministically: send a request, get a result back, assert on it. Like calling a function, except the function lives inside a running GUI app.

The solution: a control channel over a named pipe…
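To make the "send a request, get a result back, assert on it" idea concrete, here is a minimal sketch of the app side of such a channel. Everything in it is hypothetical and not taken from SumatraPDF's -dbg-control: the one-line text format ("<command> [argument]" in, "ok <result>" or "error <message>" out), the ControlDispatcher class, and the CountMatches stand-in for text search. The transport (a named pipe in SumatraPDF) is left out so the sketch stays portable; only the request-to-reply dispatch is shown.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical line-oriented control protocol, for illustration only.
// Request: "<command> [argument]"  Reply: "ok <result>" or "error <message>".
// None of these names come from SumatraPDF's -dbg-control.

using Handler = std::function<std::string(const std::string& arg)>;

class ControlDispatcher {
  public:
    // Registers a handler under a command name (replaces any previous one).
    void Register(const std::string& name, Handler h) { handlers_[name] = std::move(h); }

    // Parses one request line and returns exactly one reply line.
    std::string Handle(const std::string& line) const {
        std::string cmd = line;
        std::string arg;
        size_t sp = line.find(' ');
        if (sp != std::string::npos) {
            cmd = line.substr(0, sp);
            arg = line.substr(sp + 1);
        }
        auto it = handlers_.find(cmd);
        if (it == handlers_.end()) {
            return "error unknown command: " + cmd;
        }
        return "ok " + it->second(arg);
    }

  private:
    std::map<std::string, Handler> handlers_;
};

// Counts non-overlapping occurrences of needle in haystack.
// Stands in for a real text search running inside the app.
inline int CountMatches(const std::string& haystack, const std::string& needle) {
    if (needle.empty()) return 0;
    int n = 0;
    for (size_t pos = haystack.find(needle); pos != std::string::npos;
         pos = haystack.find(needle, pos + needle.size())) {
        ++n;
    }
    return n;
}
```

With a "find" handler registered that calls CountMatches on the text "the cat sat on the mat", Handle("find the") returns "ok 2" and Handle("zoom 200") returns "error unknown command: zoom". An agent can assert on those strings directly instead of interpreting a screenshot.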

No comments yet. Log in to reply on the Fediverse. Comments will appear here.