2 hours ago · Tech · 0 comments

I host most of my websites on a DreamHost VPS. This morning I discovered that a new file, agents.txt, had been added to the root of each site on May 7. It was easy to confirm that this is a new default file, similar to the default robots.txt and favicon.ico that DreamHost puts in every new site to get you started; apparently they retroactively added it to sites that didn't already have one. So it's a host action, not a hack. That's good, at least.

The contents are simple, and sensible for a new website: disallow LLM training and actions, allow on-the-fly "AI"-generated summaries, and disallow access to some common folders that shouldn't be used for any of the above.

Still, I'm annoyed that they added it retroactively, particularly since it includes what looks like an explicit opt-in to retrieval-augmented generation, even if that's something that's happening already and is less of a problem than a model vacuuming up your entire website for regurgitation. (Guess who's already in Common Crawl!)

# Data…
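For reference, a file along those lines might look something like this. This is a hypothetical sketch in robots.txt-style syntax, not DreamHost's actual file; the specific user-agent names and paths are illustrative:

```
# Hypothetical agents.txt sketch -- not DreamHost's actual file

# Disallow LLM training crawlers
User-Agent: GPTBot
Disallow: /

# Allow on-the-fly retrieval/summarization agents
User-Agent: OAI-SearchBot
Allow: /

# Keep common private folders off-limits for everyone
User-Agent: *
Disallow: /cgi-bin/
Disallow: /wp-admin/
```

Whether any given crawler honors such a file is, of course, entirely voluntary on the crawler's part, just as with robots.txt.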
