Perplexity AI Scraping Controversy: Cloudflare Says Bots Broke the Rules

When bots act like browsers, who really runs the internet: humans or their AIs?

Nkeiru Ezekwere
4 Min Read

What happens when your AI assistant visits a website it was told to stay out of? That is the messy, increasingly urgent question at the heart of a brewing battle between Cloudflare and Perplexity AI, one that could reshape how the internet works for everyone.

Earlier this week, Cloudflare accused Perplexity, the buzzy AI search engine, of quietly scraping data from websites that had explicitly blocked its bots. Cloudflare says it set up a honeypot: a fresh website with a robots.txt file that told Perplexity to back off. But when Cloudflare later asked Perplexity about the site’s content, it had answers. Too many.
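For readers unfamiliar with robots.txt: it is a plain-text file at a site's root that tells crawlers which paths they may fetch. A minimal sketch of a "back off" file like the one Cloudflare describes, checked with Python's standard-library parser (the user-agent names below match Perplexity's published crawler names, but the exact honeypot file is Cloudflare's, not shown publicly):

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt in the spirit of Cloudflare's honeypot:
# it disallows Perplexity's crawlers, and everyone else, everywhere.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: Perplexity-User
Disallow: /

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A compliant crawler checks before fetching; both lookups come back False.
print(parser.can_fetch("PerplexityBot", "https://example.com/any-page"))
print(parser.can_fetch("SomeOtherBot", "https://example.com/any-page"))
```

The catch, and the whole controversy: robots.txt is an honor system. Nothing technically stops a crawler from ignoring it.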

Cloudflare says Perplexity sidestepped the rules by disguising its crawler as a regular browser, a generic Chrome user agent, to be exact. “Some supposedly ‘reputable’ AI companies act more like North Korean hackers,” Cloudflare CEO Matthew Prince posted on X. “Time to name, shame, and hard block them.”

Spicy take. But the internet wasn’t entirely on board.

Plenty of developers and AI fans came to Perplexity’s defense, arguing that if a human is allowed to access public information, why shouldn’t an AI assistant acting on behalf of that human be allowed to do the same?

Even Perplexity itself pushed back. First, it denied that its bots were behind the activity. Then, in a blog post, it suggested the whole scandal was a Cloudflare marketing stunt and said the access likely came from a third-party tool it uses. More importantly, Perplexity argued that there is a real distinction between automated scraping and user-initiated access. In other words: “Our AI didn’t sneak in, we were invited.”

Related: “Ditch the Doomscroll” – Perplexity CEO’s Call to the Next Gen.

Cloudflare was not having it. The company pointed to OpenAI as a model of best practices, noting that ChatGPT agents follow robots.txt, respect blocks, and use a new standard called Web Bot Auth, a cryptographic method to let sites know a bot is friendly.
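Web Bot Auth replaces the honor system with cryptography: the bot signs parts of each request so the site can verify who actually sent it. The real standard builds on HTTP Message Signatures with asymmetric (Ed25519) keys; the sketch below is a loose, standard-library-only illustration of the signing idea using a shared-secret HMAC instead, with simplified names that are not the spec's actual header fields:

```python
import base64
import hashlib
import hmac

# Hypothetical shared secret for this toy example only; the real scheme
# uses public-key signatures, so the site never holds the bot's secret.
SHARED_KEY = b"demo-key-known-to-both-sides"

def sign_request(method: str, path: str, host: str) -> str:
    """Bot side: sign the request's covered components."""
    covered = f"{method.upper()} {path} host={host}".encode()
    digest = hmac.new(SHARED_KEY, covered, hashlib.sha256).digest()
    return base64.b64encode(digest).decode()

def verify_request(method: str, path: str, host: str, signature: str) -> bool:
    """Site side: recompute the signature and compare in constant time."""
    expected = sign_request(method, path, host)
    return hmac.compare_digest(expected, signature)

sig = sign_request("GET", "/article", "example.com")
print(verify_request("GET", "/article", "example.com", sig))   # True
print(verify_request("GET", "/article", "evil.example", sig))  # False
```

The point of the design: a forged User-Agent header is free, but a valid signature over the request is not, so "friendly bot" becomes something a site can check rather than something it has to take on faith.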

Behind all this is a bigger issue: bots are taking over the internet. As of 2024, bot traffic officially outpaces human traffic online, with over 50% of web activity coming from machines, much of it driven by crawlers feeding large language models. Not all of it is benign: scraping, spam, and even automated login attempts are on the rise.

This shift matters, especially for websites that rely on ad traffic. In the Google era, search engines would index content, send humans to your site, and (hopefully) boost your business. But AI assistants? They fetch info, give answers directly, and skip the link-clicking altogether.

Gartner even predicts that by 2026, search engine volume could drop 25% as AI tools take over. If Perplexity and others become the default middlemen between people and websites, who benefits? That is the real tension. If a human tells an AI to visit a page, is it still a bot? If a website blocks the AI, is it blocking a human too? And as AI agents start planning your vacations, ordering your takeout, and picking your outfits, will websites have any choice but to play along?
