The Register print version next!
Come on, you know you want to. Just think of all those Airport shelves you could dominate.
In January, an online publisher launched a website called Lab 6 that serves its content as a PDF to protest the state of the modern web, and has caused quite a stir. There's nothing novel about posting PDFs to the web, but doing so as a protest against web technology is akin to taking a stand in the tabs-spaces debate – PDFs …
Yep, I see the Joke Alert. The Print page isn't plain HTML, though:
jonathan@Odin:~$ curl -s https://www.theregister.com/Print/2021/07/20/pdf_html_debate/ | grep -A 1 "<script"
<script>
var RegArticle={id:216169,pf:0,af:0,bms:0,cat:'news',ec:['adobe'],kw:[["software",'Software'],["web",'Web'],["development",'Development']],short_url:'https://reg.cx/40A4',cp:0,noads:[],author:'Thomas Claburn'}
--
<script>var RegTruePageType = 'www print';</script>
<link rel="canonical" href="https://www.theregister.com/2021/07/20/pdf_html_debate/"><link rel=stylesheet href="https://fonts.googleapis.com/css?family=Arimo:400,700&display=swap">
--
<script>
var RegCR = true;
--
<script src="/design_picker/4c219a18bc536a8aa7db9b0c3186de409fcd74a7/javascript/_.js"></script>
<script async onerror="gpt_js_errored()" src="//securepubads.g.doubleclick.net/tag/js/gpt.js"></script>
<script>
RegGPT('reg_software/front');
--
<script async src="https://www.googletagmanager.com/gtag/js"></script>
--
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
HTML is currently effectively just whatever the browser vendors say it is
Isn't that the idea of a markup language? It just says "emphasize this", "use large text" or "make a list", and it's up to the browser to do that in whatever way it feels appropriate. It was never intended to be a precise page description language, even though it is sometimes (ab)used for that.
His argument seems to be that HTML is too flexible, and he doesn't like the way people take advantage of that, so he wants to rigidly define how information is displayed. That's his (rather OCD?) choice, but it's more a personal choice of a particular publication mechanism than of a standard for the web. A bit like a film maker choosing to use black & white silent film and decrying colour or sound: it's just one choice among many, and will have its fans and its detractors.
Whilst rummaging in the garage over the weekend I came across a backup CD from 2005.
First shock was just how many websites I was able to stack uncompressed on less than 1GB of disk. Second shock was just how many ran on a modern Apache2 server without a hitch, including some SSI and Perl scripting in the cgi-bin. They ran rather fast on my Raspberry Pi.
They were all personally hand-chiselled HTML & Perl code. But then I came across one site that used PHP and MySQL (an old phpBB version 1 forum). Impossible to hack back into a working site on a modern system, because changes in all three 'packages' blew the dependencies. That's where the rot started.
Simpler times, more dependable times, when you did everything server-side and used flat databases. Oh, and because you could assume no more than a 56k modem and a 640x480 screen, they were light and compact.
I always argued that a good website was about content, not appearance. I made more money from ads that blended in and were more relevant to the page (the original AdWords) than when they took over and created the ad-blocker market in a never-to-be-resolved war of attrition.
So a pint James. Achieving a similar result by another method.
What he was getting at is that the browser vendors can add features willy-nilly to HTML and get them written into the standards.
That's how web standards have always worked. TBL developed HTML as a project. HTTP + HTML may or may not have surpassed gopher and been written into a standard. It's always been a matter of throwing mud at the wall and seeing what sticks.
The problem is not so much browser vendors developing new features willy-nilly. It's more that the dominant vendor (Google) not only controls the engine that most browsers use, but also the ad-platforms and huge gobs of the internet - from consumer services like Search, gmail and YouTube to developer services like Google Fonts.
If they decide that something is a good idea they can shove it into their services, add it to Chrome and bang, it's a de facto standard. Same with AMP - dev to our "standard" or drop down the rankings.
This is a very different world from when Mozilla would push a feature but it would depend on a combination of developers thinking it was useful enough to use and vendors adding support to Webkit/Blink/Trident/Presto for it to get any traction.
It's a monopoly issue rather than a living-standard issue. But as he quite correctly says, modern browsers are basically an OS and the web has become an application platform. The complexity is such that it is impractical to build a new browser engine from scratch - Microsoft of all people tried and gave up.
Combine that complexity with modern dev practices leaning heavily on brittle dependencies and you have a recipe for disaster.
> The complexity is such that it is impractical to build a new browser engine from scratch - Microsoft of all people tried and gave up.
Ahem. Microsoft has never built a browser from scratch. The first version of Internet Explorer was simply Spyglass Mosaic with a different badge.
-A.
> TBL developed HTML as a project
No, he did not.
HTML was the product of a large-scale global discussion, a major integrated group effort.
Tim BL "merely" hacked up a proof-of-concept app to display the read-only subset of HTML (hence: "browser"), in a bid to prod the discussion about the interactive subset of HTML out of the doom loop it was getting caught up in.
Sadly, the browser took off so hard that the interactive intent got drowned in history. We are today still using only the read-only web, plus app-driven/code-driven hack-ins to get the interactive bits. Which seems to be the source of much of James's discontent.
"Amusingly", jwz was bitching-about precisely the same problem --of each browser differentially creating the "standard" by their idiosyncratic code/behaviour-- already in the mid-to-late 90s.
"His argument seems to be that HTML is too flexible"
I read it as something along the lines of "HTML has standards. But nobody adheres to them. Because standards are so confining."
Seems to me like a pretty good analysis.
I'm not so sure about the proposed solution. For one thing, PDF supports scripting, and once you allow scripting, every evil in the digital universe probably swarms out of the box when you open it. See https://nora.codes/post/pdf-embedding-attacks/
PDF/A forbids embedding JavaScript.
It's mostly JavaScript which turns the browser into a CPU-thrashing, data-slurping application runner. Turn off JavaScript and watch most websites degrade into a mess rather than fall gracefully back to an accessible page of static data, with the browser sitting at 0% CPU waiting for a click event. That's because HTML has been hijacked to provide a template for JavaScript to stick content into and mess around with. This was not HTML's original intent, but it's been pushed into that job.
So while he complains about HTML, it's really the newer parts of HTML and JavaScript which are the guilty parties.
"Turn off JavaScript and watch most websites degrade into a mess rather than fall gracefully back to an accessible page containing static data"
To be fair, I see quite a lot of sites failing gracefully to an accessible page containing static data. Unfortunately, it's usually a very small amount of static data that says "You need javascript for the website to work".
Indeed: the only part of the web site I run that is NOT hand-coded, W3C HTML 4.01-compliant markup is the drop-in image gallery, which uses PHP and AJAX for whatever it does.
And the stuff I am using is all tables and just enough CSS to keep things simple. As a result, it's small and it's freaking fast, even if it's a little on the plain-looking side.
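For context, here's a minimal sketch of that kind of hand-coded page; the file names and content are invented for illustration, not taken from the actual site:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<title>Photo index</title>
<!-- just enough CSS to keep the table readable; no scripts, no frameworks -->
<style type="text/css">
body  { font-family: sans-serif; margin: 1em; }
table { border-collapse: collapse; }
td    { border: 1px solid #999; padding: 4px; }
</style>
</head>
<body>
<h1>Photo index</h1>
<table summary="List of galleries">
<tr><td><a href="2005/beach.html">Beach, 2005</a></td><td>12 images</td></tr>
<tr><td><a href="2005/garden.html">Garden, 2005</a></td><td>8 images</td></tr>
</table>
</body>
</html>

A page like this weighs well under a kilobyte before images and renders in anything that can parse HTML.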
HTML was based on SGML, but the original implementations didn't use it in the way it was intended to be used. SGML originated in the 1970s as a way of marking up the structure of a document - headings, paragraphs, tables, equations, etc. It was supposed to prevent data being locked into proprietary, binary formats. It was never intended to include styling information (font, colour, margins, etc).
HTML 4 was intended to clean up the non-structural parts of HTML, while CSS was a decent attempt at separating style from structure. XHTML was then going to move to XML, since SGML had some complexity that was a legacy of the limitations of 70s era computers.
Sadly, the lunatics then took over and gave us HTML "5", which is not based on XML - it's just a moving target of tag soup. Couple that with the horror that is JavaScript, famously hacked together in about ten days and never meant to be more than a proof of concept, and now websites are a mess.
That being the point made above about badly defined standards and badly behaved, unplanned extensions from the browser and webserver builders. No one ever stopped bolting junk onto a pre-alpha grade design. (Scream all you want: it fails at deterministic encoding, still has encapsulation problems, and is preposterously inefficient from an encoding standpoint.)
The browser makers shouldn't have been in the driver's seat the whole time. We have millions of websites, about 4 relevant browsers, and about as many relevant webserver implementations. Ignoring those other stakeholders was a big part of the problem.
The other part of the problem is that the attempts to manage the standards were ad-hoc, incompetent, and lacked the broader vision to produce an integrated or workable whole. So now we are getting to the point where people are beginning to rebel and post content in static formats because HTML has become an unquenchable dumpster fire.
PDF/A isn't a good solution, but it was the better one for the author of Lab 6. HTML refuses to permit a safe, tight standard that results in a deterministic, reproducible layout, so I won't burn him at the stake.
HTML5 is absolutely not a moving target of tag soup. In fact, it's the first version that isn't, as it defines a deterministic parsing algorithm. Prior to HTML5 it was very much browser dependent, and I believe nailing it down took many years.
XHTML (HTML as XML), by contrast, would be wildly unsuitable as a format for the web, unless you want 50% of the web to fail to render due to parse errors.
Sounds like they're talking about PDFs having fixed page and font sizes, rather than being mark-up documents like HTML. A few PDF readers have tried to implement a "Reading View" that doesn't preserve the page size, but not many do that, and as for the results, YMMV.
Firtman is also wrong.
PDF was originally designed to fix a layout for print, but now includes features like text reflow for viewing on arbitrary screen sizes. Like HTML, it has evolved beyond its original design goals. Is that a good thing? That can be debated ad nauseam, but if HTML is allowed to evolve to HTML 5 then PDF is surely allowed a crack at branching out.
In all honesty, PDF often "just works", even if it's not as elegant as other options in various scenarios. You can archive it, checksum it. So many sites these days basically break the Wayback Machine because the Web Archive simply can't keep up with the dependencies and dynamic structure of what could (and often should) be a static page.
In the meantime: Behold, the lightest, most compatible and responsive site on the web. An example to us all.
I agree with most of the sentiments of this project, but surely going back to sanitised HTML would be better.
I am finding it increasingly difficult to read small text, so a basic thing I do with web pages is to enlarge them. In basic HTML, the text gets re-wrapped. "Clever" web pages often have fixed line lengths, making this impossible. (I do also resize PDFs.)
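To illustrate, a hypothetical pair of style rules (the class name is invented): relative units let the layout follow the reader's chosen text size, while hard pixel values pin the line length and font size regardless of preference.

<style type="text/css">
/* relative units: line length tracks the font size, so enlarged text still re-wraps */
body    { font-size: 100%; max-width: 40em; margin: 0 auto; }
/* hard pixel values: the "clever" approach that ignores the reader's settings */
.clever { width: 960px; font-size: 12px; }
</style>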
I agree with you. The sentiment is most assuredly well-placed, but plain HTML would have been a much more valid demonstration.
What is killing the Web's usefulness is JavaScript. The over-reliance on this demonstrably dangerous technology is not just a nuisance, it is responsible for 99% of all malware infestations and, at the best of times, just transforms a web site into something you can't even bookmark properly.
Long live NoScript !
You're correct of course. For one thing, I doubt that it will ever be possible to secure an internet that permits JavaScript or any other form of scripting. You'll surely end up playing Whack-a-mole with a boundless sea of exploitable OS and application vulnerabilities. BUT, if you don't allow scripting, how are you going to do slippy maps (Google Maps, OSM, etc)? Editable forms? Spell checking?
For example, composing and posting this message looks to require Javascript.
> if you don't allow scripting, how are you going to do
I'd say there are indeed valid uses for JavaScript, like those you quoted for instance. As a tool JavaScript has its uses, but like a knife it can easily be used in harmful and annoying ways, and today's web is full of examples.
Don't accuse JavaScript, accuse greed and arrogance.
Which is why the situation isn't going to improve, quite on the contrary.
"BUT, if you don't allow scripting, how are you going to do"
"slippy maps (Google Maps, OSM, etc)?": Well, you could have a client program which you download and run so you know what is running what code, but at some point you either have to let a complex program run on a browser or not.
"Editable forms?"
With the HTML tags form, input, etc. They've been there practically since the beginning of the web. That's what JS pages use too, they've just got extra junk around them.
"Spell checking?": Your browser does that. It sees a textarea tag and uses its local spellchecker on it. It works pretty well, especially because you control the dictionary and it doesn't have to send the typed text to a remote server to check against their dictionary. Most sites where you see spellcheck in action are doing that. Try turning off scripting on this page and writing some misspelled words in the box and I'm guessing it will be the same. It certainly is for me.
In fact, for most purposes an editable form should just be a text document with spaces to fill in stuff.
Only if it's something complex should there be anything more elaborate. ( With a special place to be reserved in Hell for the writers of anything with compulsory fields unless said information is of absolute necessity).
"In fact, for most purposes an editable form should just be a text document with spaces to fill in stuff."
No, it shouldn't. Unless you're processing everything manually, having an HTML-based form lets you get the contents of boxes without having to do any parsing and insulates you from whatever people might have done to the surrounding text. A text form is fine if you want to handle it that way, but nothing is wrong with a more common solution that's been supported for decades.
"With a special place to be reserved in Hell for the writers of anything with compulsory fields unless said information is of absolute necessity"
No problem with that either. They have to put something in fields which you want because it's your form. If you don't want to provide that information, you can probably just put in junk. If you don't like a form which requires you to enter an email to send them information, then don't send them information; they're obviously not interested in anonymous input.
The thing about sanitised HTML - at a casual glance, it's very hard to differentiate from common, jQuery-infested HTML.
What he says is true: even good devs with sound instincts find it very hard to resist the lure of "diagnostics" or "feedback". I've done it myself. PDF avoids temptation.
This implies that feedback forms are necessarily wrong. I don't have them on most of my pages, mostly because I don't want to deal with the readers' feedback, but if someone does want to give that ability, why should they be limited to writing an email address for someone to copy and paste over? It can be done without JS, it has been done for as long as the web has been a thing, it doesn't prevent archiving, so what's the major problem with it?
PDF cannot escape its PostScript origins: it will always attempt to create fixed-size pages.
EPUB would seem the most reasonable format: this is essentially a subset of HTML with a resource tree. Authoring tools are now pretty reasonable and, for those who want it, DRM is supported.
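Roughly speaking, an EPUB is just a zip archive wrapping XHTML content plus a small manifest; a sketch of the layout (file names invented):

book.epub  (a zip archive)
  mimetype                  - the string "application/epub+zip", stored first, uncompressed
  META-INF/container.xml    - points to the package file below
  OEBPS/content.opf         - manifest of resources and the reading order (spine)
  OEBPS/chapter1.xhtml      - ordinary XHTML content
  OEBPS/style.css           - CSS styling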
Still, I bet the guy loves all the attention he's getting!
Of course he knows PDFs are a provocation, but what he says about turning a simple way to share information using "hypertexts" into the ugliest way to develop the ugliest applications is true. And the only working business model is really to litter them with tracking to sell ads - leading straight into the "social networks" issue.
His complaints are true, and his reaction doesn't do anything about it. You can write very ugly pages with HTML, you can have lots of ads and trackers, it makes a lot of the internet a pain. His solution, however, is just not to do that, which could be done as easily with clean HTML as with PDF (better in my opinion). It doesn't change the calculations of the ad-supported business model, nor does it really fix problems. Not only that, but he also has to use some HTML so users can find the PDFs; they do hypertext badly. The complaints aren't original, and the solution isn't an efficient way to resolve any of them.
"PDF Is not a format suited to share in different formats and diverse devices," he told The Register. "It's a format created for printing. So it's like using a boat to drive across a street."
PDF used to be a format for printing, but those Fscktards at Adobe had to try and cram in a load of other functionality which wasn't needed, creating the bloated, insecure mess that is Acrobat Reader and ruining it for everyone.
Sure, Lab 6 are publishing them as PDF/A, without the ability to run JS, but how do you know it's PDF/A and not a different version which does allow JS, without opening it first?
I personally use the built-in Linux Mint PDF reader Xreader, which, after trying a couple of PDFs which contain JavaScript, appears not to support JS, so at least that should make it a bit safer to open PDF files from unknown sources.
PDF was and remains a handy format for sharing documents. PDF's current ubiquity exemplifies how neither a commercially produced program nor the ideas behind it can remain corralled by copyright/patents for long.
I am surprised Adobe continues to market it, bells, whistles, and all. Perhaps businesses, too lazy to look for now legitimate alternative sources of PDF software for their Windows-based devices, happily shell out subscriptions. They have become inured to paying through the nose for Windows operating system, and MS office software, so why not Adobe too?
Linux varieties are well endowed with PDF creation and reading software. I find LibreOffice excellent for creating PDFs. Presumably Windows versions of LibreOffice offer the same facilities. I would not be surprised if MS Office does too.
… and not a different version which does allow JS without opening it first?
You could feed the document to a PDF/A validator before using your preferred PDF viewer.
mark l 2 wrote:
I personally use the built-in Linux Mint PDF reader Xreader, which, after trying a couple of PDFs which contain JavaScript, appears not to support JS, so at least that should make it a bit safer to open PDF files from unknown sources.
I just checked to see if I had that installed, I do, but it's shown as Document Viewer in the open with menu. I've set it to use that, as it does a better job of displaying than LibreOffice.
It had to happen : ladies and gentlemen we have found the real Nathan Barley.
For instance, you could start here, with Issue 2 : A Gemini-PDF polyglot and The Tilde Quilt where "If you enjoyed last month's PDF-MP3 polyglot doc-cast, you're going to love this hybrid Gemini-PDF file".
You will be dumbfounded by The Tilde Quilt, which "combines the exclusivity of private pubnix membership, the value-hoarding artificial scarcity of sequential integers, the inspiration of spatial constraint, the north star of the unbounded cartesian grid, and the immediacy of named pipes" and is a "brand new media paradigm".
It is a crappy little piece of ASCII art.
I was quite excited when I read the article on my laptop - everything looked good and seemed to make sense. Tried it on my mobile and had to remind myself I'm not a youth, nor a hawk.
Also, PDFs are largely benevolent, right? But here's an example I found earlier....
unzip -v pocorgtfo19.pdf
Archive: pocorgtfo19.pdf
Length Method Size Cmpr Date Time CRC-32 Name
- -------- ------ ------- ---- ---------- ----- -------- ----
15905 Defl:X 14500 9% 2019-01-04 16:19 838d0657 defender.zip
90291 Defl:X 23139 74% 2019-03-04 19:23 8da8fe88 phrack6612.txt
181069 Defl:X 181080 0% 2019-01-04 16:19 a00486ec jonesforth.tar.bz2
......
19676 Defl:X 17862 9% 2019-03-09 19:41 d8324af6 polyocamlbyte.zip
83900 Defl:X 77184 8% 2019-01-04 16:19 c1f51574 PDFRick.zip
- -------- ------- --- -------
33897715 30472096 10% 33 files
.. and yes, it acts exactly like you'd expect a PDF to.
When running the command
file pocorgtfo19.pdf
I’d expect a PDF to return something like
pocorgtfo19.pdf: PDF document, version <version_number>
rather than something like, say,
pocorgtfo19.pdf: Zip archive data, at least v2.0 to extract
.pdf is a container, similar to .mov. Running whichever command will only yield its expected results, not the results of an undefined recursive lookup. If you're going beyond human representation, then you don't want recursive behaviour unless you find a match to something specific.
I feel both of the examples above would be better off using something like ExifTool (or whatever), but I'm not sure why at least 1 pipe (to any filter) wasn't included in either command.
It is all of our fault really. We are the idiots that keep on the treadmill of misfeatures.
Why don't we just grab the source to Firefox ~3.x, backport as many security fixes as we can identify, and just log bug reports with all website vendors whose sites don't render properly on it?
I don't even foresee there being too many security patch backports. Many of these crept up with the much later feature bloat.
Loading up 4MB of libraries just to put a colour gradient behind some text - that's what killed web technology! People who have no design experience or qualifications calling themselves designers and putting together apps and pages that are impossible to use. Design is not just about what something looks like, it's about practical considerations around use, something at least 80% of websites fail on in their owners' race to have the modern equivalent of Homer Simpson's Dancing Jesus on the front page!
I remember back in the late 80s floating my car at moderate speed. The road vibration died away as the wheels came off the road and it was then that I twigged that what I had thought was just a streetlight which had blown down, was actually the cabin light of a mostly submerged car drifting towards me.
I had a glorious vision of The World's Best Ever Insurance Claim:
"My right front wheel broke his windscreen as I sailed slowly over the top of him."
Sadly, at that point both front bungs popped, two little fountains burst up in the footwells, and it was out the window to push it back to "shore" whilst it was still buoyant. Some mates turned up, towed me up to the top of the hill, and we clutch-started it on the full downhill run. First 2 times there was water coming out the exhaust pipe, but the third time it caught, and I drove home via the hill-top route.
Good little car, that. Toyota Corolla station wagon, manual, rear wheel drive.
And Brisbane, back when it rained. And thundered and lightninged and flooded routinely. Gotta wait 10yrs before that starts again. Boo.
This sums up my sentiment about the current state of the web. It is all active content, with the result that it fails as an information-serving network; it's now just a bandwidth- and system-hogging mess that is entirely self-serving. It's also a major security risk -- passive content cannot host malware, it's just data. Instead of looking at the big picture, though, the discussion rapidly degenerates into back and forth about the deficiencies of PDF, how HTML version this or that provides essential capabilities, and even how disability-unfriendly pure text is. It's just smoke, missing the point entirely.
Ad-tech is a self serving business, a vortex of believers who absorb huge amounts of resources because the moment people stop believing in it a huge section of the economy would collapse. That's why you don't see much information about the effectiveness of advertising. I know that everyone likes to get paid but you have to draw a line somewhere and from my perspective that line was crossed a decade or more ago.
These discussions aren't as pointless as you think. We all agree that ads are a nightmare, probably useless, and that we'd be happier without them. Well, probably some people don't, but I think most of us here do and I'm prepared to skip the people who don't for now. The problem is what we do about this belief beyond the ad-blocking systems we already have running. The stated solution in this case is PDF, which isn't really a good solution because, if it were widely used by others, ads could be integrated into it as well through JS. This guy has simply decided not to put in any ads, which is great, but he could have used lots of formats to do that.
Since many options are available, it's worth discussing what format is optimal. Let's say I wanted to kill adtech so I decided to send you all my pages as image files of the rendered text, and to prevent possible bugs in decompression software, in an uncompressed format. That does technically solve the problem of active code in documents--you're not getting any scripting in a raw array of pixels. However, that's a terrible solution and you should tell me that before I waste my time on it. PDF is not that bad, but it also has problems that make its use suboptimal in this case. Hence it is worth discussing what would be best for a format without advertising or tracking.
It will never be finished, so the bugs will never be fixed
Why fix bugs when you can include ways of finding out the battery state of the host, or accessing a timer with enough resolution to theoretically detect cache misses in the processor. It seems to me that a lot of these things are somebody thinking "this would be awesome!" and throwing together some code without taking any time to ask "now I have this idea, how can it be abused"?
and the complexity has grown so immense that nobody other than the incumbent browser vendors can realistically implement the web at all any more
Not to mention, any book on the subject (usually split into multiple books for the markup, the CSS, the scripting, and getting it all working together) is liable to be out of date the moment it rolls off the press.
he took issue with the way independent bloggers have moved to specific platforms rather than running their own services
Because running your own service can be a nightmare. Using a third party service puts you at the whim of somebody else, but on the other hand that somebody else gets to worry about the updates, the security, the...
but he said the cost has been the death of format experimentation as content gets squeezed into standard templates
True, but I'd imagine the majority of bloggers aren't geeks. They may even create their content using an app and have no idea of what actually makes it all work.
and distributed through a handful of aggregators.
The flip side of this is that people don't want to search all over the place to find a blog they might be interested in. Things that aren't all tidily listed on one of the main bollocks-regurgitators can be hard to find (if not near impossible).
The kind of recipes you find on page one of search results seem to exist solely for the sake of attracting eyeballs
Forget recipes, that's true for a vast amount of stuff these days. I was looking for a manual for an old bit of garden equipment. There was a site that looked useful that said it had manuals for what I wanted. It lied. It was basically just that text on a page with lots of adverts (well, blank spaces where adverts were supposed to have been ;) ). Sadly this came quite high in the search listing, top ten (maybe top five).
not because someone genuinely loved a recipe and wanted to share it.
Argh! Copyright infringement! Go straight to hell! Do not collect £200!
You can still write a document in very plain HTML but there’s no demarcation between static documents and web applications.
Sure there is. If it works with scripting disabled (that's your default, I trust) then it's not an application.
Even techies who pride themselves on writing efficient lightweight markup usually can’t resist putting a comment feedback form on the page or hiding some tracking JavaScript in the background.
I wrote my own blog code in PHP. It's crap but functional. The only scripting hidden on the page is a little Easter Egg if the user enters the Konami code. It isn't necessary in order to use the system. There's a comment form, yes, but zero scripting necessary, any validation is handled server side. No embedded advertising. And no tracking. Just like Happy Harry Hard-on, I "assume" that there are three people following. There may be more, I don't care. I write stuff because I enjoy it, not because I think I'll get paid per eyeball (I don't get paid at all, and I've refused outfits offering to pay me to let them write a guest article that's really an advert).
He does make some valid points, but I can't help but feel that he's trying to shout down a hurricane. The web of old evolved into this mutated monster because it's what people wanted. And when I say people, I don't mean like you and me, I mean like the estimated 2.85 billion Facebook users and/or the billion Instagram users. Most of them won't give a crap how it works, only that it does.
…but pdf is not the answer.
You have to change the mindset - yours at least, if you can't educate the sheep. Publishing on the web costs money, just like running a symphony orchestra. Who pays? There are three models:
1. You subscribe to a site (the Financial Times, Jancis Robinson's wine site, etc.) for regular content that you value.
2. You take advantage of "patrons" who make a gift of their resources - the BBC (patron: the British taxpayer), GenBank (patron: the US taxpayer), Wikipedia (individual donations), hobby sites from organisations that wish to attract members, or individuals such as myself and my wife who are happy to do it themselves.
3. You take your chances in a sea of advertising excrement.
But if you prate on about “the web must be free” because you are too mean to pay its price or too stupid to realize it has a price, you end up believing that the solution to the mess is technical specifications.
While I probably wouldn't pick PDF as a shining light for standards rebellion, I do appreciate the sentiment. Chrome is almost impossible to escape, what with at least 3 major browsers that are basically reskins of it - and the advertising claws that come with it. I still have good memories of the Proxomitron, a web content filtering proxy. Rather difficult thing to pull off these days against dynamically generated pages.
As much as a new format would be fun (and possibly ad-liberating), the usual barriers of forked efforts and critical mass would apply here.
Sir Tim is onto something with his criticism of how the web has developed, but what would an alternative look like and be made of? And how would you prevent it being subverted too?