After I published Five Years of Trying to Add Recursion to lychee, one reply I got was a very fair question: If recursion is so hard, how do other link checkers do it? Plenty of them already crawl websites!

This sent me down a rabbit hole of reading the code of other link checkers. The key takeaway is: they didn’t find a clever trick we missed. They were built as crawlers from the very first commit, and I initially built lychee as a stream.

I went and read the source of the recursive checkers we list in lychee’s README: muffet (Go), LinkChecker (Python), linkinator (TypeScript), and broken-link-checker (JavaScript). This post is a teardown of how each one actually handles recursion, what it costs them, and what it means for lychee.

If you haven’t read the first post, the summary is that lychee was architected as a one-shot, unidirectional pipeline (inputs → extract → check → output). Recursion needs a cycle (responses create new inputs), and cycles in an async, channel-based pipeline…
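
To make the pipeline-versus-cycle distinction concrete, here is a minimal sketch in Python. It is not lychee's code (lychee is written in Rust) and not taken from any of the checkers named above. The in-memory `SITE` map and the functions `check_once` and `check_recursive` are hypothetical names invented for illustration, and a link counts as "ok" simply when it appears as a key in `SITE`.

```python
from collections import deque

# Hypothetical in-memory "site": each page maps to the links found on it.
# A link counts as working if it is a key in this map (an assumption for the sketch).
SITE = {
    "/": ["/a", "/b"],
    "/a": ["/", "/b"],
    "/b": ["/c"],
    "/c": ["/missing"],
}


def check_once(inputs):
    """One-shot pipeline: inputs -> extract -> check -> output, with no feedback."""
    results = {}
    for page in inputs:
        for link in SITE.get(page, []):
            results[link] = link in SITE
    return results


def check_recursive(start):
    """Crawler loop: every working link becomes a new input (the cycle)."""
    seen = {start}          # guards against revisiting pages and looping forever
    queue = deque([start])  # work queue that results feed back into
    results = {}
    while queue:
        page = queue.popleft()
        for link in SITE.get(page, []):
            if link in seen:
                continue
            seen.add(link)
            ok = link in SITE
            results[link] = ok
            if ok:
                queue.append(link)  # response creates a new input
    return results


if __name__ == "__main__":
    print(check_once(["/"]))
    print(check_recursive("/"))
```

In this sketch the one-shot version only ever sees the links on the pages it was given, while the recursive version keeps going until the queue drains. The `seen` set and the queue are the extra shared state a crawler needs and a pure stream does not.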