Reddit recently updated its robots.txt file in an effort to stop AI companies from scraping content on its platform, or to make them pay for it. The move has fueled discussion between AI companies and content owners about data usage rights and business models. Reddit is not an isolated case: a growing number of websites are worried about large-scale data collection by AI companies and want to protect their intellectual property. This article explains Reddit's strategy and the reasons behind it.
Reddit is taking action to stop AI companies from crawling its content, or at least require them to pay.
Earlier this week, Reddit announced it was changing its Robots Exclusion Protocol file, also known as its robots.txt file. This seemingly dull edit is part of a larger negotiation, or battle, between AI companies that are eager for content to train their language models and the companies that actually own that content.

"Robots.txt" is how a website tells third parties how the site may be crawled. The classic example is a site that allows Google to crawl it so that its pages can be included in search results.
In the case of artificial intelligence, the value exchange is less obvious. When the business model of running a website depends on attracting clicks and eyeballs, having an AI company suck up your content and send no traffic in return (or, in some cases, outright plagiarize your work) isn't attractive.
So by changing its robots.txt file, and by continuing to rate-limit and block unknown bots and crawlers, Reddit appears to be working to prevent the kind of practices that companies like Perplexity AI have been criticized for.
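As a rough illustration of the kind of policy described above, the sketch below uses Python's standard `urllib.robotparser` module to evaluate a small robots.txt file that admits a search crawler and turns away everyone else. The rules, the `example.com` URL, the `ExampleAIBot` agent name, and the `can_crawl` helper are all hypothetical and chosen for illustration; this is not Reddit's actual file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: allow one search crawler, disallow all other bots.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://www.example.com/r/python/"
    for agent in ("Googlebot", "ExampleAIBot"):
        print(agent, can_crawl(ROBOTS_TXT, agent, page))
```

Note that robots.txt only states a site's wishes; a crawler has to choose to honor it, which is why a site that wants to enforce such a policy may also rate-limit or block bots it does not recognize.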
Highlights:
- Reddit is taking action to stop AI companies from crawling its content, or at least require them to pay.
- Robots.txt is how a website tells third parties how the site may be crawled. The classic example is a site that allows Google to crawl it so that its pages can be included in search results.
- Reddit changed its robots.txt file and continues to rate-limit and block unknown bots and crawlers, in order to prevent the kind of practices that companies like Perplexity AI have been criticized for.
Reddit's move suggests that content platforms and AI companies will clash more often over data usage rights, and it raises new questions about how to balance the development of AI technology with the protection of intellectual property. It is likely to push AI companies toward more sustainable ways of obtaining data, and to push both sides toward fairer and more reasonable models of cooperation.