Amazon Web Services (AWS) is investigating artificial intelligence search startup Perplexity AI for allegedly violating AWS terms of service by scraping content from websites that had tried to block it from doing so. Perplexity AI is valued at $3 billion and is backed by Jeff Bezos' family foundation and Nvidia. The investigation has raised widespread concern about how AI companies obtain data, touching on disputes over the Robots Exclusion Protocol, copyright, and the role of third-party service providers. How the case develops could have a significant effect on data acquisition rules and ethics in the artificial intelligence industry.
Amazon Web Services (AWS) is investigating artificial intelligence search startup Perplexity AI for allegedly violating AWS terms of service by scraping content from websites that tried to prevent it from doing so, Wired reports.
Perplexity AI, a startup backed by Jeff Bezos' family foundation and Nvidia, was recently valued at $3 billion. Wired found that the company appears to rely on scraping content from websites that block such access through the Robots Exclusion Protocol. The Robots Exclusion Protocol is a web standard, implemented through a site's robots.txt file, that indicates which pages should not be accessed by automated robots and crawlers. While the protocol is not legally binding, most companies have traditionally abided by it.
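
For readers unfamiliar with how the protocol works in practice, the following is a minimal sketch of the check a compliant crawler would run before fetching a page, using Python's standard `urllib.robotparser` module. The robots.txt contents, the `example.com` URLs, the `OtherBot` name, and the `is_allowed` helper are all hypothetical and chosen for illustration; they are not taken from any publisher's actual file or from Perplexity's code.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary publisher (illustration only).
# It blocks one named crawler everywhere and all other crawlers from /private/.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    checks = [
        ("PerplexityBot", "https://example.com/articles/1"),
        ("OtherBot", "https://example.com/articles/1"),
        ("OtherBot", "https://example.com/private/report"),
    ]
    for agent, url in checks:
        print(agent, url, is_allowed(ROBOTS_TXT, agent, url))
```

The check is purely advisory: nothing in the protocol stops a crawler from skipping it or from identifying itself under a different user agent, which is why compliance rests on convention rather than enforcement.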

An AWS spokesperson said the company's terms of service prohibit customers from using its services to conduct any illegal activities, and customers are responsible for complying with the terms and all applicable laws. AWS customers must adhere to the robots.txt standard when crawling websites.
Wired's investigation found that Perplexity had access to a server using an undisclosed IP address that visited Condé Nast-owned properties hundreds of times over the past three months, apparently to scrape content it was barred from accessing. Spokespersons for The Guardian, Forbes and The New York Times also said they had detected similar activity.
Perplexity CEO Aravind Srinivas said the scraping that was discovered had been carried out by a third-party company that provides web scraping and indexing services, but he declined to name the company. Perplexity spokesperson Sara Platnick said the company had responded to Amazon's inquiry, and that its PerplexityBot respects robots.txt but ignores the protocol when a user enters a specific URL.
Jason Kint, chief executive of the digital content industry trade association Digital Content Next, said that if the allegations against Perplexity are true, the company has violated a number of principles meant to prevent potential copyright infringement. He emphasized that, by default, AI companies should not access and use publishers' content without permission.
The incident has drawn widespread attention and discussion about how AI companies obtain data. The industry is awaiting the results of the AWS investigation and any further action against Perplexity.
The Perplexity AI incident highlights the challenges and ethical dilemmas that artificial intelligence companies face in acquiring data. It also serves as a warning that the AI industry needs more complete data standards and management mechanisms to ensure that data acquisition is legal and compliant, and to promote the healthy development of artificial intelligence technology.