AI is here
The monkeys can't help themselves. They must create their own demise.
ChatGPT can now scan the internet and provide users with up-to-date responses to their queries, OpenAI revealed Wednesday. Previously, the popular AI chatbot's knowledge base was limited to information obtained prior to September 2021. With the latest update, the service can once again surf the net with the help of Microsoft's …
ChatGPT figures out a shortcut to finding answers by asking Google Bard, and Google Bard is then loosed on the internet and figures out its own shortcut: asking ChatGPT. Then they can enter a death spiral, setting each other's servers on fire and making the world a better place.
Yes, for many sites, curl or some browser extension might be enough.
I'm guessing that many paywalled sites still let you see a snippet of an article, but (from some examples I've seen in the wild) they achieve this not by serving a proper fragment, but by visually hiding most of the content behind an overlay, which any user can already remove if they know their way around their browser's developer tools. So the page still delivers the whole text to the browser (or any web client, for that matter) and relies on stylesheets being applied to cover the content with a subscription prompt.
I guess this type of solution comes from sites deciding to monetize their content only later on, with developers putting out a very hacky fix that does just enough to get their non-technical project managers off their backs. Other sites might actually want Google to cache their whole content so that search results can show useful snippets, and then pull the rug out from under visitors once they realize the information they need requires a subscription.
With something like ChatGPT, which can read the whole text without any visual obstacles, this approach backfires.
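To make the overlay point concrete, here is a minimal sketch in Python using only the standard library's html.parser. The page markup, the paywall-overlay class name, and the extract_paragraphs helper are all made up for illustration and are not taken from any real site. The idea is just that a client which never applies the stylesheet sees every paragraph the server sent.

```python
from html.parser import HTMLParser

# Hypothetical page: the full article is in the markup, and a stylesheet
# positions a subscription prompt on top of it. Not taken from a real site.
SAMPLE_PAGE = """
<html>
  <head>
    <style>
      .paywall-overlay { position: fixed; inset: 0; background: white; }
    </style>
  </head>
  <body>
    <article>
      <p>First paragraph, visible as a teaser.</p>
      <p>Second paragraph, covered by the overlay in a browser.</p>
    </article>
    <div class="paywall-overlay">Subscribe to keep reading</div>
  </body>
</html>
"""


class ParagraphExtractor(HTMLParser):
    """Collects the text of every <p> element; CSS is never applied."""

    def __init__(self):
        super().__init__()
        self._in_paragraph = False
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_paragraph = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_paragraph = False

    def handle_data(self, data):
        # Text outside <p> (stylesheet rules, the overlay prompt) is ignored.
        if self._in_paragraph:
            self.paragraphs[-1] += data


def extract_paragraphs(html):
    """Return the stripped text of each paragraph found in the markup."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return [p.strip() for p in parser.paragraphs]


if __name__ == "__main__":
    for text in extract_paragraphs(SAMPLE_PAGE):
        print(text)
```

Both paragraphs come out, including the one a browser would draw the prompt over. This only covers the case where the text really is in the delivered HTML; a site that serves just the teaser, or strips text with JavaScript after load, behaves differently.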
I just had a look at the New Scientist paywall on this article. I could remove most of the overlay, but the JS seemed to have deleted a little bit of the text. However, it's nice clean HTML that's perfectly readable when downloaded...
On reflection, I suppose it is so search engine crawlers can read it. Now they realise there are good search crawlers and bad search crawlers...