OpenAI pauses Bing search feature over paywall bypass abilities

OpenAI's experiment with allowing ChatGPT to search the web via Bing has been suspended because the feature inadvertently allowed users to bypass paywalls. The feature was first rolled out in May and limited to paying ChatGPT users; OpenAI updated its help page for the "Browse with Bing" feature yesterday to indicate that, as of July 3, …

  1. MiguelC Silver badge

    Why (how?) do the LLMs crawl behind the paywalls, in the first place?

    1. Anonymous Coward

      Because Microsoft needs to advertise Bing somehow.

      This is some guerrilla marketing crap for Microsoft products. Stop pretending people pay for Bing, let alone find workarounds for it... this is just ridiculous.

    2. EvilGardenGnome

      Because someone in the pipeline paid for access to increase the available content. Now that the world and his dog are using this subscription, someone has realized that EULAs can really, really bite you in the ass.

    3. DS999 Silver badge

      Some paywalled sites allow web crawlers (I'm guessing OpenAI uses a web crawler similar to Googlebot to collect its data) to access the full text, so that searches will return their site more often. The problem comes when you see that site in a search, click on it, and find nothing to do with your search terms in what you can see without paying.

      1. heyrick Silver badge

        Can be... interesting... to fiddle your User-Agent to pretend to be a search bot.
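The crawler exception DS999 describes, and the reason heyrick's User-Agent trick can work, fit in a few lines. This is a minimal Python sketch assuming a site that decides what to serve purely from the User-Agent header; `article_body` and the token list are hypothetical, not any publisher's real code.

```python
# Hypothetical token list; real crawler User-Agent strings are longer.
KNOWN_CRAWLER_TOKENS = ("googlebot", "bingbot", "duckduckbot")


def article_body(user_agent: str, full_text: str, teaser: str) -> str:
    """Naive paywall gate: give the full text to anything claiming to be a crawler.

    The User-Agent header is supplied by the client, so this check simply
    trusts whatever the requester says about itself.
    """
    ua = user_agent.lower()
    if any(token in ua for token in KNOWN_CRAWLER_TOKENS):
        return full_text  # indexed by search engines, but also readable by spoofers
    return teaser  # ordinary visitors only see the teaser


if __name__ == "__main__":
    browser_ua = "Mozilla/5.0 (X11; Linux x86_64) Firefox/115.0"
    spoofed_ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    print(article_body(browser_ua, "FULL ARTICLE", "Subscribe to read more"))  # -> Subscribe to read more
    print(article_body(spoofed_ua, "FULL ARTICLE", "Subscribe to read more"))  # -> FULL ARTICLE
```

Because the header is chosen by the client, the second call gets the full article even though nothing about the request has been verified.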

        1. FILE_ID.DIZ

          Well... might not be that simple.

          Google, Bing and DDG, for example, maintain lists of their bots' IP ranges. Google and Bing also seem to use FCrDNS (forward-confirmed reverse DNS) with a specific domain (googlebot.com and search.msn.com, respectively), while DDG seems to use duckduckbot-X.duckduckgo.com, where X appears to be an integer.

          Bingbot IP ranges - https://www.bing.com/toolbox/bingbot.json

          Googlebot IP ranges - https://developers.google.com/static/search/apis/ipranges/googlebot.json

          DuckDuckBot IP ranges - https://help.duckduckgo.com/duckduckgo-help-pages/results/duckduckbot/

          And as the saying goes, the "good guys" have to be right every time. The bad guys (in this example, those trying to circumvent a paywall) only have to be right once.

          And I have to imagine that writing middleware to update whatever application(s) is/are responsible for letting spiders in, based on third-party-provided IP data in a non-RFC-standardized format, might be harder than just looking for a UA string. At least until someone in the bean-counter department notices.
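FILE_ID.DIZ's forward-confirmed reverse DNS (FCrDNS) check can be sketched as follows. This is a minimal Python sketch, not any search engine's or publisher's actual code: it does a reverse (PTR) lookup on the client IP, requires the resulting hostname to sit under an allowed domain, then does a forward lookup on that hostname and requires it to resolve back to the same IP. The suffixes you pass in (for example `googlebot.com` or `search.msn.com`, taken from the comment) are assumptions to check against each vendor's own documentation. The resolver functions are injectable so the logic can be exercised without network access.

```python
import ipaddress
import socket
from typing import Callable, Iterable, Set


def reverse_lookup(ip: str) -> str:
    """PTR lookup: IP address -> hostname. Raises OSError (socket.herror) on failure."""
    hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
    return hostname


def forward_lookup(hostname: str) -> Set[str]:
    """A/AAAA lookup: hostname -> set of IP address strings. Raises OSError on failure."""
    return {info[4][0] for info in socket.getaddrinfo(hostname, None)}


def is_verified_crawler(
    ip: str,
    allowed_suffixes: Iterable[str],
    rdns: Callable[[str], str] = reverse_lookup,
    fdns: Callable[[str], Set[str]] = forward_lookup,
) -> bool:
    """Return True only if ip passes forward-confirmed reverse DNS for an allowed domain."""
    try:
        host = rdns(ip).rstrip(".").lower()
    except OSError:
        return False  # no PTR record, so the client cannot be verified
    # Match on a label boundary so "evilgooglebot.com" does not pass for "googlebot.com".
    if not any(host == s or host.endswith("." + s) for s in allowed_suffixes):
        return False
    try:
        forward_ips = fdns(host)
    except OSError:
        return False  # the PTR name does not resolve forward
    # Compare parsed addresses so different textual forms of one IPv6 address match;
    # drop any "%scope" suffix getaddrinfo may report for link-local addresses.
    target = ipaddress.ip_address(ip)
    return any(ipaddress.ip_address(a.split("%", 1)[0]) == target for a in forward_ips)
```

The forward step is what defeats a spoofer: anyone controlling reverse DNS for their own address block can publish a PTR record such as `fake.googlebot.com`, but they cannot make Google's forward DNS point that name back at their IP.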
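The middleware FILE_ID.DIZ imagines, driven by published IP ranges rather than the UA string alone, might look roughly like this minimal Python sketch. It assumes the vendor file uses the layout of Google's googlebot.json (a `prefixes` list whose entries carry `ipv4Prefix` or `ipv6Prefix`) and that Bing's file follows the same layout, which is worth checking. `CrawlerGate`, `load_prefixes`, `ip_in_ranges` and the `paywall.verified_crawler` environ key are hypothetical names for illustration, built on the standard WSGI calling convention.

```python
import ipaddress
import json
from typing import Iterable, List, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def load_prefixes(json_text: str) -> List[Network]:
    """Parse a googlebot.json-style document into a list of networks (assumed layout)."""
    data = json.loads(json_text)
    networks: List[Network] = []
    for entry in data.get("prefixes", []):
        prefix = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if prefix:
            # strict=False tolerates prefixes published with host bits set.
            networks.append(ipaddress.ip_network(prefix, strict=False))
    return networks


def ip_in_ranges(ip: str, networks: Iterable[Network]) -> bool:
    """True if ip lies in any of the networks; False for malformed input."""
    try:
        address = ipaddress.ip_address(ip)
    except ValueError:
        return False
    # Testing an IPv4 address against an IPv6 network (or vice versa) just gives False.
    return any(address in net for net in networks)


class CrawlerGate:
    """Hypothetical WSGI middleware: flags a request as a verified crawler only when
    the User-Agent claims to be one AND REMOTE_ADDR falls in a published range."""

    def __init__(self, app, networks: Iterable[Network], tokens=("googlebot", "bingbot")):
        self.app = app
        self.networks = list(networks)
        self.tokens = tuple(t.lower() for t in tokens)

    def __call__(self, environ, start_response):
        ua = environ.get("HTTP_USER_AGENT", "").lower()
        claims_crawler = any(t in ua for t in self.tokens)
        # Behind a reverse proxy, REMOTE_ADDR is the proxy's address; the real client
        # IP would have to come from a header that the proxy sets and you trust.
        ip = environ.get("REMOTE_ADDR", "")
        environ["paywall.verified_crawler"] = claims_crawler and ip_in_ranges(ip, self.networks)
        return self.app(environ, start_response)
```

In practice the JSON would be fetched from the URLs listed in the comment and refreshed periodically; the DuckDuckGo link appears to be a help page rather than a JSON file, so it would need separate handling. The sample ranges in any test of this should be documentation-reserved addresses such as 192.0.2.0/24 and 2001:db8::/32, not real crawler data.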

  2. heyrick Silver badge

    Thank god

    Sometimes Bing can find stuff that Google is convinced doesn't exist.

    The use of AI completely buggered up using Bing on my phone. Pressing the Enter arrow would give a new line in the text rather than starting a search, and there was no obvious on-screen method to get it to actually begin searching instead of trying to get me to edit the text.

    At least, for today, the Enter button on the on-screen keyboard is a magnifying glass and it does what it is supposed to.

    1. streaky

      Re: Thank god

      I think you may have misunderstood the story, unless there's a joke in there I'm missing the nuance of.

  3. streaky

    Bing?

    Self-evidently a beta feature, as it states repeatedly. Also, IDK why people would use the Bing feature anyway when plugins are available to paid users on GPT-4, and they do a better job of exactly this task.

  4. irrelevant

    Copyright

    "I am not able to display the full content of articles from [the site you requested] or any other publication that is protected by copyright,"

    So that's pretty much every single thing on the Internet, then, given that almost everything you can find is copyright somebody or other, even if they make it freely accessible. And it's an even more stupid thing to say given that they trained their LLMs on (copyrighted) data in the first place, without asking, which many people are upset about.

    1. katrinab Silver badge

      Re: Copyright

      Everything is copyright, unless the copyright has expired (as with most of the material on Project Gutenberg) or the copyright holder has explicitly renounced it.

  5. Robert Carnegie Silver badge

    This is what "A Logic Named Joe" was about

    A computer which gives you what you ask for, whether you should have it, or not.

  6. EricB123 Silver badge

    It's only a matter of time

    Will ChatGPT send me a poop emoji anytime soon?

  7. HMcG

    What's the problem?

    I thought ripping off copyrighted material and republishing it without authorization or crediting the original source was the whole modus operandi of LLMs?
