Cloudflare creates AI crawler tollbooth to pay publishers

Cloudflare has started blocking AI web crawlers by default in a bid to become the internet's gatekeeper. The term "gatekeeper" has been applied in a pejorative sense to platform companies like Apple and Google that use their contractual and technical control over operating systems to extract monopoly rents from developers …

  1. mebh

    So Cloudflare is creating the infrastructure to actually support the microtransactions discussed in the early days of the web?

    1. DS999 Silver badge

      It's a lot easier to support microtransactions for a couple hundred AI crawlers than for a couple billion individuals.

  2. Doctor Syntax Silver badge

    Let's hope archive.org doesn't become collateral damage.

    1. Pascal Monett Silver badge

      Agreed, but something has to be done to rein in Google and set things straight.

      I'm less bothered about Apple. You can choose to avoid Apple, but even Apple users cannot avoid Google.

      1. heyrick Silver badge

        Google is actually one of the few bots that seems to play reasonably nice.

        There are many (hello Perplexity, and some random SEO crap) that hammer a site, and then there are a number that don't disclose their identity and use regular user agent strings (but no human can fire off several dozen requests per second).

    2. Tubz Silver badge

      Why? Owners can just put archive.org into the £0.00 charge bracket so it gets free access. What archive.org then has to do is make sure the bots don't get free access via archive.org.


  3. iam_sysop

    name one - name 'em all

    Meta is one of the most abusive crawlers there is, and its lack of any controls to stop third-party developers harvesting "through" its platform doesn't help.

  4. wolfetone Silver badge

    Whack-a-mole

    This will work for a little while, then someone will find a way around it. That'll be whacked, then rinse and repeat.

  5. MOH

    That "deal" has been broken for years, with Google increasingly placing larger snippets at the top of the SERPs or in sidebars to discourage users from clicking through.

    The whole "AI overview" thing is just taking that practice further, but it's not new.

    Given how bad search has become, we must be reaching the point where it's worth considering whether everyone should be blocking both AI and search index bots by default.

    There's minimal benefit to sites with real content; at best you'll end up listed on page 3 behind dozens of identically inaccurate pages of AI-harvested dreck.

    1. tfewster

      Oh, for the earlier days of the internet:

      - When HP's website contained comprehensive, searchable info, downloads and user forums (before they were taken over by Compaq).

      - When AOL kept hoi polloi penned up so they didn't pollute the WWW.

      - When searching for "tfewster" returned hundreds of results from posts on tech forums - no, wait, that was enshittification.

  6. kmorwath

    Cloudflare-in-the-Middle

    Fine, but that means more and more Cloudflare-in-the-Middle, which means a dependency on its services and routing all traffic through Cloudflare itself.

    Which raises worrying doubts of its own.

    1. Wiretrip

      Re: Cloudflare-in-the-Middle

      Indeed, many eggs, one basket.

    2. ChrisElvidge Silver badge

      Re: Cloudflare-in-the-Middle

      I thought the idea behind "the internet" was that it could/would route round failures to complete requests.

      Cloudflare becoming the single point of failure?

      1. kmorwath

        Re: Cloudflare-in-the-Middle

        Not only that - Cloudflare is attempting to become the hub of all internet traffic - putting it in a very privileged position.

        See how it cunningly offers to terminate TLS connections for site owners who are too lazy to set up a Let's Encrypt ACME client to obtain certificates, or who can't. That way it can see the traffic.

        Does this offer mean Cloudflare needs to know which content is accessed?

        We might have escaped the "Microsoft Network" (and maybe a Google one), but are we going to be bound to the Cloudflarenet soon?


  7. This post has been deleted by its author

  8. Anonymous Coward

    An earlier name for Gatekeeper was ...

    Janitor, whose role largely became that of gatherer and disposer of trash and other garbage.

    The trash and garbage is a pretty apt description of today's internet.

    I notice that now, when accessing many more sites, but especially the "context rich" ones, CloudFlare is interposing its gooseberry self more frequently.

    After some wiggly-rotatey animation, a tick box often appears before the destination site loads (15s to a minute).

    I assume the animation hides some sort of client or browser detection and the tick box is used to detect human presence somehow.

    What I am not clear about is whether the connection is ultimately redirected to the requested site with some magic key or nonce, or whether CloudFlare acts as a web proxy (application-level gateway), necessarily relaying all the traffic between the client and the content provider.

    1. Arthur the cat Silver badge

      Re: An earlier name for Gatekeeper was ...

      Use(ful/less*) fact: janitor derives from the Latin for 'doorkeeper'.

      * delete as appropriate.

  9. nijam Silver badge

    It's a tollgate.

    How long before CloudFlare realise they can charge what they like, for traffic from or to anywhere?

    1. rgrgrgrg

      No, it's not a toll gate, which would be on the overall internet channel.

      It's an entry gate, chosen by the website owner; Cloudflare are just the provider.

      They can still choose not to use Cloudflare, or use them without the fee for AI bots, or indeed use a different CDN which may have a similar entry gate.

      What is needed is a standard for this. Oh yes, that already exists: we have the HTTP 402 (Payment Required) response.

  10. Homo.Sapien.Floridanus

    AI: You have fought hard, brave knight, but I have solved your captcha and your silly animal game. Now step aside and let me through.

    CF: It's just a flesh wound. Now present payment or it's a 402 for you.

  11. Anonymous Coward

    Add another option

    "The payment service will let publishers block AI crawlers, allow specific ones, charge for access, or grant free access."

    Add another option - provide the AI crawler with an intentionally-mangled version of the site. Most info would be altered to be wrong; simply putting "not" into most sentences in strategic places would do it, or adding incorrect adjectives, or swapping nouns with randomly-chosen ones. You could even have an AI generate (only once!) the mangled page.

    In other words, if AIs are coming to eat your lunch, poison what they get!
