Reddit sues Perplexity for scraping posts, widening user data battle with AI industry

Social media giant Reddit has filed a lawsuit against artificial intelligence company Perplexity, alleging that it illegally scraped user posts to train its AI model, marking the latest data rights clash between content owners and the AI industry.

The complaint, filed in New York federal court on Wednesday, also names three other defendants that Reddit says helped Perplexity harvest its data: Lithuanian data scraper Oxylabs, “former Russian botnet” AWMProxy and Texas startup SerpApi.

Reddit claims the three smaller entities were able to extract its copyrighted content “by masking their identities, hiding their locations and disguising their web scrapers as ordinary people.”

Perplexity, which operates an AI-based search engine, has denied the allegations and accused Reddit of “blackmail” and opposing the open internet, while SerpApi told CNBC that it “strongly disagrees” with Reddit’s claims and intends to defend itself in court.

The case is one of many brought by content owners accusing AI firms of using copyrighted material without permission to train their large language models. Reddit, in particular, has been on the front lines of this battle, with ongoing litigation against AI company Anthropic that it launched in June. CNBC was unable to reach Oxylabs and AWMProxy.

In a statement shared with CNBC, Ben Lee, Reddit’s chief legal officer, said AI companies are “engaged in an arms race for quality human content” and that this pressure has fueled an “industrial-scale data-laundering economy.”

Scrapers bypass technological protections to steal data, then sell it to customers hungry for training material, Lee said. Reddit is a prime target, he added, because it is one of the largest and most dynamic collections of human conversation ever created.

Reddit, which hosts more than 100,000 interest-based “subreddit” communities, said in its lawsuit that its user posts have become the most-cited source for AI-generated answers on Perplexity.

The company added that it had sent Perplexity a cease-and-desist letter, after which Perplexity “increased the volume of Reddit citations fortyfold.”

AI researchers have previously noted that Reddit’s high volume of moderated conversations could help AI chatbots produce more natural-sounding responses.

In the age of artificial intelligence, Reddit is working to leverage its massive pool of data, allowing access to it only through AI-related licensing agreements. The social media company has signed such agreements with OpenAI and Alphabet’s Google.

In response to the lawsuit, Perplexity said in a post on Reddit that it does not train AI models on the content, but simply summarizes and cites public discussions on Reddit. For that reason, it said, signing a license agreement is “impossible.”

“A year ago, after we explained this, Reddit insisted we pay anyway, despite legal access to Reddit’s data. Bowing to strong-arm tactics is simply not the way we do business,” the statement said, describing the lawsuit as “a show of strength in Reddit’s training data negotiations with Google and OpenAI.”

“Perplexity believes this is a sad example of what happens when public data becomes a large part of a public company’s business model,” Perplexity added, noting that data licensing has become an increasingly important source of revenue for Reddit.

In February, Reddit COO Jen Wong told the trade publication Adweek that AI licensing deals with Google and OpenAI make up nearly 10% of Reddit’s revenue.
