
Seeing this piece about the NY Times prohibiting the use of its content to train AI large language models reminds me to share that OpenAI (the company behind ChatGPT etc.) recently documented a method for excluding the content of sites you own or control from their crawlers and scrapers. You could argue that this is something they should have done before scraping most of the open web to feed their LLMs, but there’s no reason not to implement it now. Simply add the following to your site’s robots.txt file:

User-agent: ChatGPT-User
Disallow: /

User-agent: GPTBot
Disallow: /

Reason 151 to publish to a site that you control.
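If you want to sanity-check the robots.txt rules above before deploying them, one option is Python’s standard-library `urllib.robotparser`. This is a minimal sketch, not part of OpenAI’s instructions: the `is_allowed` helper and the `example.com` URLs are hypothetical, and it assumes you test with the bare product tokens (`GPTBot`, `ChatGPT-User`) rather than full user-agent strings. Keep in mind that robots.txt is advisory; a parser shows how a compliant crawler should read the file, not whether any particular crawler obeys it.

```python
from urllib.robotparser import RobotFileParser

# The rules from the post, one directive per line, with a blank line
# separating the two user-agent groups.
ROBOTS_TXT = """\
User-agent: ChatGPT-User
Disallow: /

User-agent: GPTBot
Disallow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    Hypothetical helper for local testing only.
    """
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "ChatGPT-User", "Googlebot"):
        print(agent, is_allowed(agent, "https://example.com/posts/hello"))
```

Both OpenAI agents should come back as disallowed for every path, while an agent not named in the file (here `Googlebot`) is still allowed, because no rule applies to it. Note that this parser matches on the text before the first `/` in the user-agent string, which is why the sketch passes product tokens rather than full `Mozilla/5.0 ...` strings.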
