Ask HN: What measures are you taking to stop AI crawlers?

Curious to know what steps people here are taking to protect their sites, products, and APIs. What have you tried that actually works in practice?

6 points | by kjok 15 hours ago

5 comments

  • mmarian 2 hours ago
    I set up the Cloudflare blocks on one site where I don't want the content to be ingested. It seems to work pretty well, and my SEO looks OK too.
  • JohnFen 15 hours ago
    I spent a lot of time trying to find a good solution to this problem and failed, so what I ended up doing was to give up and remove my sites from the public web entirely.

    I'm eager for a good solution that will allow me to put them back, but I'm doubtful that's going to happen. In any case, I'm extremely interested in other people's replies here. Maybe there's a solution that I haven't been able to find!

    • chistev 19 minutes ago
      What do you mean by removing them from the public web?
    • mmarian 2 hours ago
      I'm curious, what made you decide to completely remove them from the public web?
  • ATechGuy 11 hours ago
    Just saw this https://x.com/ycombinator/status/1960779353589211577

    They say "... can scrape any website—not even Cloudflare can detect it."

  • johng 14 hours ago
    Some of our sites have been getting absolutely hammered by AI bots, so much so that they are taking the sites down, even with Cloudflare protection and caching. The only thing we've been able to do so far is tell Cloudflare to block all AI bots and modify robots.txt, and even then we've had to manually identify IP addresses and bots that ignore all of the above and block them individually or at the ASN level.

    Cloudflare makes doing this kind of stuff easy but I would hate to have to do this manually on a webserver. And I don't like the idea of how much of the internet already relies on Cloudflare.
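
For anyone doing the IP-level part by hand, here is a minimal Python sketch of the matching step, not what johng actually runs. It assumes you have already exported the offending prefixes yourself (for example, the ranges announced by an ASN you want to block). The networks below are placeholders from the documentation-reserved ranges, and `BLOCKED_NETWORKS` and `is_blocked` are made-up names.

```python
import ipaddress

# Placeholder prefixes (documentation-reserved ranges), NOT real crawler networks.
# In practice this list would come from your own logs or an ASN prefix export.
BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("192.0.2.0/24", "198.51.100.0/24", "2001:db8::/32")
]


def is_blocked(addr, networks=BLOCKED_NETWORKS):
    """Return True if the client address falls inside any blocked network."""
    ip = ipaddress.ip_address(addr)
    # Compare only against networks of the same IP version (v4 vs v6).
    return any(ip.version == net.version and ip in net for net in networks)


if __name__ == "__main__":
    for client in ("192.0.2.55", "203.0.113.9", "2001:db8::1"):
        print(client, "blocked" if is_blocked(client) else "allowed")
```

A linear scan like this is fine for a few hundred prefixes. With a much longer list you would probably want the check in the firewall or the CDN rather than in application code.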

  • bediger4000 15 hours ago
    I have a lot of them in robots.txt with Disallow: /, of course. Several, mainly Meta's AI crawler and Bytespider, get a 404 on any request whatsoever via Apache httpd mod_rewrite.
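
The same two-layer idea (robots.txt for bots that respect it, a hard 404 for the ones that don't) can be sketched in Python for people not running Apache. This is an illustration under assumptions, not bediger4000's actual mod_rewrite config. The User-Agent substrings are my guess at what these crawlers send, so check them against your own access logs, and `BLOCKED_AGENTS`, `robots_txt` and `status_for` are hypothetical names.

```python
# Assumed User-Agent substrings; verify against your own access logs.
BLOCKED_AGENTS = ("Bytespider", "meta-externalagent")


def robots_txt(agents=BLOCKED_AGENTS):
    """Build robots.txt text that disallows the whole site for each agent."""
    groups = [f"User-agent: {agent}\nDisallow: /" for agent in agents]
    return "\n\n".join(groups) + "\n"


def status_for(user_agent, agents=BLOCKED_AGENTS):
    """Return 404 if the User-Agent contains a blocked substring, else 200."""
    ua = (user_agent or "").lower()  # a missing header is treated as empty
    return 404 if any(agent.lower() in ua for agent in agents) else 200


if __name__ == "__main__":
    print(robots_txt())
    print(status_for("Mozilla/5.0 (compatible; Bytespider)"))
    print(status_for("Mozilla/5.0 (X11; Linux x86_64) Firefox/128.0"))
```

Substring matching on User-Agent only catches crawlers that identify themselves honestly. Anything that spoofs a browser string needs the IP or ASN approach mentioned elsewhere in the thread.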