3 hours ago · Tech

Bot traffic on the internet has always been comparatively high, with search engine crawlers and various odd research tools doing their thing, but that changed massively in the past few years with LLM-training-content-scraping bots. Those don't care about robots.txt and are designed to be unblockable: they run on regular users' machines, fake browser user-agents, and by now do their thing from giant massively-distributed IP pools (they used to have somewhat blockable net ranges, but not anymore). This isn't a big deal for this blog, for example, as there are only a few static pages, which these bots grab before going away reasonably quickly. But for local git repos that isn't the case - the list of commits and various links there is effectively infinite, and these idiot bots want EVERY-THING! So what ends up happening is a never-ending stream of requests like this:

14.243.82.173 - - [15/May/2026:00:50:01 +0500] "GET /code/git/.../pa-mixer.example.cfg?id=9537b82b... HTTP/1.1" 200 3516 "-" "Mozilla/5.0…
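A quick way to see this kind of flood in the numbers is to tally requests per client IP from the access log. A minimal sketch of that, assuming the standard Combined Log Format shown above (this is just an illustration, not any actual tooling from this site):

```python
import re
from collections import Counter

# Combined Log Format: ip ident user [timestamp] "request" status size "referer" "user-agent"
# A loose regex is enough for a tally; a real parser would need to be stricter.
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "(?P<req>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "(?P<ref>[^"]*)" "(?P<ua>[^"]*)"'
)

def top_requesters(lines, n=10):
    """Count requests per client IP and return the n busiest ones."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if m:
            counts[m.group("ip")] += 1
    return counts.most_common(n)

if __name__ == "__main__":
    import sys
    # e.g.: python top_requesters.py < access.log
    for ip, n in top_requesters(sys.stdin):
        print(f"{n:8d}  {ip}")
```

With the distributed pools mentioned above, of course, no single IP stands out much anymore - which is exactly what makes per-IP counting (and blocking) so ineffective against these bots.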
