Mozilla 404s '404 Not Found' pages: Firefox fills in blanks with archive.org copies
Will the Internet Archive be able to handle the extra traffic if this goes mainstream, and/or will Mozilla run a cache for popular requests?
Mozilla is trying out a new experimental feature in Firefox that lets you smash through annoying 404 dead-ends. The "404 No More" feature uses copies of webpages from the Internet Archive's Wayback Machine to replace 404 "not found" errors with something more useful. If you visit a link to a page that's disappeared, Firefox …
COMMENTS
-
Friday 5th August 2016 01:15 GMT HCV
Doesn't strike me as a good idea at all
I really don't like it when my web browser tries to outguess reality. If a page is missing, that's data of a sort right there. And instead it shows me what is, by definition, outdated information? No. Don't.
I'm especially not crazy about the potential for a chilling effect. Sites will quickly learn that they can't actually delete a page -- unless they request that it be deleted from the Wayback Machine, too!
I've already seen archived sites get wiped from the IA at the request of a new domain owner. This is going to cause even more disappearances.
-
Friday 5th August 2016 09:54 GMT paulf
Re: Doesn't strike me as a good idea at all
You make a very good sub-point. When a domain is acquired by a new owner, any request they make to remove pages from IA should only go back to the point at which they bought the domain - it shouldn't remove content that predates their ownership of that domain.
I've also seen the complete and only archive of a site removed from IA simply because the new owner of a domain (that had otherwise been dormant for 10 years) decided they didn't want any of their content archived. Having their new content omitted from IA is fine, if that's what they want, but they shouldn't be allowed to tell IA to wipe historic data of that domain that predates their ownership (content they had no hand in creating).
-
Friday 5th August 2016 02:03 GMT Peter Prof Fox
Why can't they
Add something like 'Click here to find the archived page' rather than serve up out of date/irrelevant information by default.
PS. I gave up on my blog because moderating the comments was grief. 80% were nasty-spam and then 19% were nasty-spam. I don't want Mozilla going round graveyards digging up putrid corpses.
-
This post has been deleted by its author
-
Friday 5th August 2016 03:33 GMT Geoffrey W
Aren't you all a bunch of naying nellies. Did any of you even try it? It still displays the 404 page but you get a bar at the top of the page if there is an archive entry wherein you can click to view said archival page. I think it's neat, and I've already found it useful: I had a dead link in my bookmarks which I never got round to checking the Wayback Machine for, and lo! there it is! Oh, and yes, when it shows the page it has a bar at the top telling you it's from the Wayback Machine archives. So quit griping and find something more worthwhile to bitch about.
-
Friday 5th August 2016 10:13 GMT VinceH
"Aren't you all a bunch of naying nellies. Did any of you even try it? It still displays the 404 page but you get a bar at the top of the page if there is an archive entry wherein you can click to view said archival page."
Perhaps they didn't feel the need to try it because the article told them what it does... inaccurately, by the sound of it:
The "404 No More" feature uses copies of webpages from the Internet Archive's Wayback Machine to replace 404 "not found" errors with something more useful. If you visit a link to a page that's disappeared, Firefox will fetch from archive.org a version of the page before it vanished.
Perhaps your criticism should therefore be directed at El Reg for incorrectly describing what it does, rather than readers for reading it.
-
Friday 5th August 2016 03:52 GMT GrapeBunch
It's tedious for the site-owner to do this by hand
If you write a web page with external links, typically 90% of those links will be 404 after a few years. The pages largely still exist, but the site has moved or its pages have been reorganized. Installing the latest and greatest content management system used to break most of the links on a site.
It's generally faster to change the links to point to IA than to hunt the pages down anew. After one boring session of editing links this way, I was tempted to suggest pointing to the IA version from the very start. Another pitfall avoided is when the link still exists but its content has changed - for example, when a domain is allowed to expire and is then taken over by pr0n or worse.
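For what it's worth, a rough sketch of that sort of link-fixing pass might look like the Python below. It queries archive.org's public availability endpoint; the function names and overall flow are purely illustrative, not anything the commenter actually ran.
# Illustrative only: walk the links in a local HTML file and, for any that now
# return 404, swap in the closest Wayback Machine snapshot if one exists.
from typing import Optional
import requests
from bs4 import BeautifulSoup

def wayback_snapshot(url: str) -> Optional[str]:
    """Return the closest archived copy of `url`, or None if there isn't one."""
    resp = requests.get("https://archive.org/wayback/available",
                        params={"url": url}, timeout=10)
    closest = resp.json().get("archived_snapshots", {}).get("closest")
    return closest["url"] if closest and closest.get("available") else None

def repoint_dead_links(html: str) -> str:
    """Rewrite 404'ing <a href> targets to their Wayback snapshots."""
    soup = BeautifulSoup(html, "html.parser")
    for a in soup.find_all("a", href=True):
        try:
            status = requests.head(a["href"], timeout=10,
                                   allow_redirects=True).status_code
            if status == 404:
                snapshot = wayback_snapshot(a["href"])
                if snapshot:
                    a["href"] = snapshot
        except requests.exceptions.RequestException:
            pass  # dead server or malformed URL: leave it for a human to judge
    return str(soup)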
I hope that IA doesn't delete, but rather deep-archives sites that are requested removed from public access. Eventually, what's on them will flow into the public domain. OK, most would be best forgotten, but the gems justify the overburden.
-
Friday 5th August 2016 16:51 GMT John Brown (no body)
Re: It's tedious for the site-owner to do this by hand
"I hope that IA doesn't delete, but rather deep-archives sites that are requested removed from public access. Eventually, what's on them will flow into the public domain. OK, most would be best forgotten, but the gems justify the overburden"
Archaeologists just love digging through rubbish tips and analysing coprolites. Maybe 100/200/300 years in the future, all those old archived Geocities pages will be gold to social historians.
-
Friday 5th August 2016 06:58 GMT Geoffrey W
Re: better than click-jacking the 'bad' URL
Yes, that was my point, though you put it more succinctly. But why couldn't they take it one step further? Server not found isn't that different from page not found. It can't be that hard to check an unknown server against the archives too and offer to show an archived copy if one is found.
-
Friday 5th August 2016 08:37 GMT Alister
Re: better than click-jacking the 'bad' URL
@Geoffrey W
Without looking into it in any depth, I would guess it currently works by examining the response header returned by the server, to determine when to spring to life.
If there is no server response (i.e. the server isn't there) then it can't do that.
If they also built it to work when there was no server response received, you would be in danger of flooding the archive with requests for non-existent or incorrectly typed URLs.
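Roughly, that guess amounts to something like the sketch below (illustrative Python, not how the Test Pilot experiment is actually implemented; the availability endpoint is archive.org's public one, everything else is assumed for the example):
# Sketch of the behaviour guessed at above: only act when the server actually
# answers with a 404, then ask the Wayback Machine whether it holds a snapshot.
from typing import Optional
import requests

WAYBACK_AVAILABILITY = "https://archive.org/wayback/available"

def archived_copy_if_404(url: str) -> Optional[str]:
    try:
        resp = requests.get(url, timeout=10)
    except requests.exceptions.RequestException:
        # No server response at all (DNS failure, connection refused, ...):
        # nothing to examine, so do nothing - the case described above.
        return None
    if resp.status_code != 404:
        return None  # the page exists, or failed in some other way
    lookup = requests.get(WAYBACK_AVAILABILITY, params={"url": url}, timeout=10)
    closest = lookup.json().get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]  # offered in a banner rather than auto-loaded
    return None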
-
Friday 5th August 2016 08:57 GMT Tannin
People aren't stupid
People aren't stupid, you know. They read the article, and the article is at pains to say that Firefox will redirect 404s to the archive. It does not, repeat does not, bother to make it clear that (according to various grumpy comments above which I have no reason to disbelieve) this isn't a redirect at all but a glorified error page that offers to serve the archive page instead. (A very different - and much more sensible - thing.)
Depending on who you are and how plausible your message is, people tend to believe things you tell them. When you are a writer for the Register, we tend to think you probably know your stuff and take it at face value. (Stand aside one loopy science malreporter, of course.) When what you write seems plausible (e.g., when you suggest that Mozilla management have come up with an ill-considered "improvement" of dubious value - just to pick an example completely at random), people tend to believe it.
In short, don't bloody criticise people for posting perfectly sensible responses to the (you would have thought) trustworthy news they read. Instead, criticise the highly misleading, headline-chasing article they are responding to.
Thank you, Mr Grumpy and your friends, for pointing out that Mozilla haven't been as stupid (this time) as the article makes them out to be. (Assuming you have your facts right, of course, which I am happy to do.) No thanks for the manner in which you did so.
-
Friday 5th August 2016 17:02 GMT Geoffrey W
Re: People aren't stupid
I assume you mean me, so: sorry. This is the internet, though, and rule one should be "Trust No One" and rule two "The Truth is Out There". It gets so tiresome to find hordes of (well, a few) people jumping up immediately and being negative before they really know. Instead of "Oh, that sounds interesting, I'll go and look further" they offer "That's just stupid!" I went and became a test pilot and flew the feature, up in the air and everything, without a seat belt, wearing only my big goggles, so what I reported is what I saw.
The Register may have people who know their stuff, but they aren't above trolling their readers, and perhaps just being wrong in their assumptions too, without researching what it actually does. Research these days often seems to comprise reading the press release, searching Google, then reading the headlines and perhaps the first paragraph if you feel like going really deep.
I did try not to be totally flamey and attempted an amused tone by using a silly phrase like naying nellies. I've had more than my share of downvotes this week. Sorry.
-
Friday 5th August 2016 08:59 GMT Cthonus
Real world issues
You know there are valid reasons pages get junked from websites.
Given the number of photographs of mine that have had their copyright abused over the years, I wouldn't be happy if some of those dead links resurrected images I've had ISPs remove from the archives.
What about articles that came to be considered libellous and were taken down? Given how lazy people are, there must still be hundreds of broken links/bookmarks cached and never updated, even on legitimate sites...
-
Friday 5th August 2016 11:53 GMT Anonymous Coward
This is great
This is a wonderful new feature.
I recently decided to go back and play a computer game from about 10 years ago.
What I found is that this generation has pretty much abandoned text-based web pages in favor of monetized YouTube videos. There was once a plethora of fan web sites for this game, with tons of information that you could search and find. Now those pages are gone, replaced by hundreds of YouTube videos/channels that are almost impossible to search accurately for specific data.
I had come to rely on the wayback machine to resurrect the old missing sites for my game, and started to wonder why no one had yet devised some plugin that would automatically redirect 404s to try the wayback, as I was getting tired of getting a 404, copying the original link and then going to the wayback and pasting the link in.
-
Friday 5th August 2016 12:34 GMT Anonymous Coward
Pulling the last available site before it went kaput? Hmm...
People with personal pages that don't change often (like that raw HTML homepage you tried to host and learn on in 1995) can simply load their websites, wait for archive.org to mirror them, and take them down.
Free hosting FTW.
And I wonder what happens to the likes of piratebay, and image repositories...
-
Friday 5th August 2016 12:38 GMT Richard Lloyd
It's a banner across the top that asks if you want to load the archived version
Unlike most people here (or the original article's author, I suspect), I actually went ahead and installed Test Pilot. Here's one of the screenshots showing the 404 Not Found thing in action:
https://testpilot-prod.s3.amazonaws.com/experiments_experimentdetail/2/4/24ddd4335aca6b96cca9106a6f3411d2_image_1470245154_0440.jpg
This seems like a good way to do things - show a banner at the top of the page giving the option of seeing the archived version. If the original article here at El Reg had made that clear, I suspect there'd be a lot less outrage. I, too, think that auto-replacing a 404 not found page with an archive.org version is a very bad idea - that's what this article implied...
-
Friday 5th August 2016 18:40 GMT Midnight
I don't get it.
I understand the github reference, but what's so amazing about Bloomberg's 404 page?
Aside from having over 110k of scripting and menus, the page just says "404. Page Not Found / Unfortunately, this page does not exist. Please check your URL or return to the Home Page".
Am I missing something, or are those two sentences just that much more amusing than anything else Bloomberg ever reports on?