Ad tech ruined the web – and PDF files are here to save it, allegedly

In January, an online publisher launched a website called Lab 6 that serves its content as a PDF to protest the state of the modern web, and has caused quite a stir. There's nothing novel about posting PDFs to the web, but doing so as a protest against web technology is akin to taking a stand in the tabs-spaces debate – PDFs …

  1. Anonymous Coward

    The Register print version next!

    Come on, you know you want to. Just think of all those Airport shelves you could dominate.

    1. Aleph0
      Joke

      Re: The Register print version next!

      Still supported IIRC :)

      https://www.theregister.com/Print/2021/07/20/pdf_html_debate/

      (note case sensitive on "Print" )

      1. Jonathan Richards 1

        Re: The Register print version next!

        Yep, I see the Joke Alert. The Print page isn't plain HTML, though:

        jonathan@Odin:~$ curl -s https://www.theregister.com/Print/2021/07/20/pdf_html_debate/ | grep -A 1 "<script"

        <script>

        var RegArticle={id:216169,pf:0,af:0,bms:0,cat:'news',ec:['adobe'],kw:[["software",'Software'],["web",'Web'],["development",'Development']],short_url:'https://reg.cx/40A4',cp:0,noads:[],author:'Thomas Claburn'}

        --

        <script>var RegTruePageType = 'www print';</script>

        <link rel="canonical" href="https://www.theregister.com/2021/07/20/pdf_html_debate/"><link rel=stylesheet href="https://fonts.googleapis.com/css?family=Arimo:400,700&amp;display=swap">

        --

        <script>

        var RegCR = true;

        --

        <script src="/design_picker/4c219a18bc536a8aa7db9b0c3186de409fcd74a7/javascript/_.js"></script>

        <script async onerror="gpt_js_errored()" src="//securepubads.g.doubleclick.net/tag/js/gpt.js"></script>

        <script>

        RegGPT('reg_software/front');

        --

        <script async src="https://www.googletagmanager.com/gtag/js"></script>

        --

        <script>

        (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
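The curl-and-grep check above can also be done with a small parser. This is a minimal sketch using only Python's standard library; the `ScriptCounter` class, the `count_scripts` helper and the `SAMPLE` markup are illustrative assumptions (the sample is loosely modelled on the quoted output, not a copy of the real page).

```python
from html.parser import HTMLParser


class ScriptCounter(HTMLParser):
    """Records every <script> tag, split into inline and external."""

    def __init__(self):
        super().__init__()
        self.inline = 0
        self.external = []  # values of the src attributes, in document order

    def handle_starttag(self, tag, attrs):
        if tag != "script":
            return
        src = dict(attrs).get("src")
        if src:
            self.external.append(src)
        else:
            self.inline += 1


def count_scripts(html_text):
    """Return (number of inline scripts, list of external script URLs)."""
    parser = ScriptCounter()
    parser.feed(html_text)
    parser.close()
    return parser.inline, parser.external


# Hypothetical sample page, not the real Print page.
SAMPLE = """
<html><head>
<script>var RegTruePageType = 'www print';</script>
<script async src="//securepubads.g.doubleclick.net/tag/js/gpt.js"></script>
<script async src="https://www.googletagmanager.com/gtag/js"></script>
</head><body><p>Article text</p></body></html>
"""

if __name__ == "__main__":
    inline, external = count_scripts(SAMPLE)
    print(f"{inline} inline, {len(external)} external")
    for src in external:
        print("  ", src)
```

Feeding it the body of a fetched page instead of `SAMPLE` would give a rough count of how much script a "print" view still carries.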

    2. Anonymous Coward

      Re: The Register print version next!

      Brilliant idea! I’d subscribe to El Reg in print in a heartbeat. Tabloid format, please. Forums could be quite slow paced though…

      P.s. Live long James. Fuck Maximiliano.

  2. Phil O'Sophical Silver badge

    HTML is currently effectively just whatever the browser vendors say it is

    Isn't that the idea of a markup language? It just says "emphasize this", "use large text" or "make a list", and it's up to the browser to do that in whatever way it feels appropriate. It was never intended to be a precise page description language, even though it is sometimes (ab)used for that.

    His argument seems to be that HTML is too flexible, and he doesn't like the way people take advantage of that, so he wants to rigidly define how information is displayed. That's his (rather OCD?) choice, but it's more a personal choice of a particular publication mechanism than of a standard for the web. A bit like a film maker choosing to use black & white silent film, and decrying colour or sound, it's just one choice among many, and will have its fans & its detractors.

    1. Steve Graham

      I think you've misunderstood what he was saying. What he was getting at is that the browser vendors can add features willy-nilly to HTML and get them written into the standards.

      1. Lon24
        Pint

        Whilst rummaging in the garage over the weekend I came across a backup CD from 2005.

        First shock was just how many websites I was able to stack uncompressed on less than 1GB of disk. Second shock was just how many ran on a modern Apache2 server without a hitch including some SSI and Perl scripting in the cgi-bin. They ran rather fast on my RaspberryPi.

        They were all personally hand-chiselled HTML & Perl code. But then I came across one site that used php and mysql (An old phpBB version 1 forum). Impossible to hack into a working site on a modern system because changes in all three 'packages' blew the dependencies. That's where the rot started.

        Simpler times, more dependable times when you did everything serverside and used flat databases. Oh and because you could assume no more than a 54k modem and a 640x480 screen they were light and compact.

        I always argued that a good website was about content, not appearance. I made more money from ads that blended and were more relevant to the page (the original adwords) than when they took over and created the ad-blocker market in a never to be resolved war of attrition.

        So a pint James. Achieving a similar result by another method.

        1. Snowy Silver badge
          Thumb Up

          Yes

          If ads did not need to run their own scripts to display and were related to what I was reading rather than needing to track my every move on the internet I would think about dropping the adblocker.

        2. Anonymous Coward

          Whilst rummaging in the garage over the weekend I came across a backup CD from 2005.

          I just had a "Walnut Creek CDROM" PTSD episode!

          (in a good way)

        3. Anonymous Coward

          PERL in cgi-bin. Not as clumsy or random as a Wordpress site. An elegant weapon, for a more civilized age.

        4. sreynolds

          Naah, I beg to differ. The rot started in 1995, with the first version of Javascript. We all know that time travel isn't possible, as that event took place.

      2. rg287

        What he was getting at is that the browser vendors can add features willy-nilly to HTML and get them written into the standards.

        That's how web standards have always worked. TBL developed HTML as a project. HTTP + HTML may or may not have surpassed gopher and been written into a standard. It's always been a matter of throwing mud at the wall and seeing what sticks.

        The problem is not so much browser vendors developing new features willy-nilly. It's more that the dominant vendor (Google) not only controls the engine that most browsers use, but also the ad-platforms and huge gobs of the internet - from consumer services like Search, gmail and YouTube to developer services like Google Fonts.

        If they decide that something is a good idea they can shove it into their services, add it to Chrome and bang, it's a de facto standard. Same with AMP - dev to our "standard" or drop down the rankings.

        This is a very different world from when Mozilla would push a feature but it would depend on a combination of developers thinking it was useful enough to use and vendors adding support to Webkit/Blink/Trident/Presto for it to get any traction.

        It's a monopoly issue rather than a living-standard issue. But as he quite correctly says, modern browsers are basically an OS and the web has become an application platform. The complexity is such that it is impractical to build a new browser engine from scratch - Microsoft of all people tried and gave up.

        Combine that complexity with modern dev practices leaning heavily on brittle dependencies and you have a recipe for disaster.

        1. captain veg Silver badge

          of all people

          > The complexity is such that it is impractical to build a new browser engine from scratch - Microsoft of all people tried and gave up.

          Ahem. Microsoft has never built a browser from scratch. The first version of Internet Explorer was simply Spyglass Mosaic with a different badge.

          -A.

          1. Graham Dawson Silver badge

            Re: of all people

            Counterpoint: Edge prior to the move to Chromium was a brand new browser engine. It was starting to show some real promise, too.

            1. samoanbiscuit

              Re: of all people

              Well, they should've started work on Edge when IE6 was still relevant. Not years after the horse had bolted and Chrome became the dominant browser almost everywhere.

            2. captain veg Silver badge

              Re: of all people

              No it wasn't. It was Trident with as much cruft removed as they dared. There's a reason why there was a succession of security updates which targeted both browsers; it was to fix the same holes.

              -A.

        2. W.S.Gosset

          > TBL developed HTML as a project

          No, he did not.

          Html was a large-scale global discussion, a major integrated group effort.

          Tim BL "merely" hacked up a proof-of-concept app to display the read-only subset of html (hence: "browser"). In a bid to prod the discussion re the interactive subset of html out of the doom loop it was getting caught up in.

          Sadly, the browser took off so hard that the interactive intent got drowned in history. We are today still using only the read-only web, plus app-driven/code-driven hack-ins to get the interactive bits. Which seems the source of much of James's discontent.

          "Amusingly", jwz was bitching-about precisely the same problem --of each browser differentially creating the "standard" by their idiosyncratic code/behaviour-- already in the mid-to-late 90s.

    2. vtcodger Silver badge

      Maybe

      "His argument seems to be that HTML is too flexible"

      I read it as something along the line of "HTML has standards. But nobody adheres to them. Because standards are so confining"

      Seems to me like a pretty good analysis.

      I'm not so sure about the proposed solution. For one thing, PDF supports scripting and once you allow scripting, every evil in the digital universe probably swarms out of the box if you open it. see -- https://nora.codes/post/pdf-embedding-attacks/

      1. Dan 55 Silver badge

        Re: Maybe

        PDF/A forbids embedding JavaScript.

        It's mostly JavaScript which turns the browser into a CPU-thrashing data-slurping application runner. Turn off JavaScript and watch most websites degrade into a mess rather than fall gracefully back to an accessible page containing static data and then the browser using 0% CPU waiting for a click event... because HTML's been hijacked to provide a template for the JavaScript to stick content in and mess around with - this was not HTML's original intent but it's been pushed into that job.

        So while he complains about HTML, it's really the newer parts of HTML and JavaScript which are the guilty parties.

        1. Cuddles

          Re: Maybe

          "Turn off JavaScript and watch most websites degrade into a mess rather than fall gracefully back to an accessible page containing static data"

          To be fair, I see quite a lot of sites failing gracefully to an accessible page containing static data. Unfortunately, it's usually a very small amount of static data that says "You need javascript for the website to work".

        2. W.S.Gosset

          Re: Maybe

          Spot on. See my note above re "read-only web".

        3. J. Cook Silver badge

          Re: Maybe

          indeed: The only part of the web site I run that is NOT hand-coded W3C HTML 4.01 compliant is the drop-in image gallery, which is using PHP and AJAX for whatever it does.

          And the stuff I am using is all tables and just enough CSS to make things simple. As a result, it's small, and it's freaking fast, even if it's a little on the plain looking side.

        4. 89724102172714182892114I7551670349743096734346773478647892349863592355648544996312855148587659264921

          Re: Maybe

          It's great that the Reg still works without JS, for however long that lasts... The internet's future looks grim.

    3. nijam Silver badge

      > It just says "emphasize this", "use large text" or "make a list", and it's up to the browser to do that in whatever way it feels appropriate.

      Didn't CSS abomination stop all that?

      1. Anonymous Coward

        HTML was based on SGML, but the original implementations didn't use it in the way it was intended to be used. SGML originated in the 1970s as a way of marking up the structure of a document - headings, paragraphs, tables, equations, etc. It was supposed to prevent data being locked into proprietary, binary formats. It was never intended to include styling information (font, colour, margins, etc).

        HTML 4 was intended to clean up the non-structural parts of HTML, while CSS was a decent attempt at separating style from structure. XHTML was then going to move to XML, since SGML had some complexity that was a legacy of the limitations of 70s era computers.

        Sadly, the lunatics then took over and gave us HTML "5", which is not based on XML - it's just a moving target of tag soup. Couple that with the horror that is JavaScript, literally a weekend hack that was never meant to be more than a proof of concept, and now websites are a mess.

        1. W.S.Gosset

          > JavaScript, literally a weekend hack that was never meant to be more than a proof of concept,

          Well I never knew that. Interestingly, that's also precisely what the first browser was, too.

          1. Anonymous Coward

            Hence our present difficulties

            That being the point made above about badly defined standards and badly behaved, unplanned extensions from the browser and webserver builders. No one ever stopped bolting junk onto a pre-alpha grade design. (scream all you want, it fails at deterministic encoding, still has encapsulation problems, and is preposterously inefficient from an encoding standpoint)

            The browser makers shouldn't have been in the driver's seat the whole time. We have millions of websites, about 4 relevant browsers, and about as many relevant webserver implementations. Ignoring those other stakeholders was a big part of the problem.

            The other part of the problem is that the attempts to manage the standards were ad-hoc, incompetent, and lacked the broader vision to produce an integrated or workable whole. So now we are getting to the point where people are beginning to rebel and post content in static formats because HTML has become an unquenchable dumpster fire.

            PDF/A isn't a good solution, but it was a better one for the author of Lab6. But HTML refuses to permit a safe, tight standard that results in a deterministic and reproducible layout, so I won't burn them at the stake.

        2. Androgynous Cupboard Silver badge

          HTML5 is absolutely not a moving target of tag soup. In fact, it's the first version that isn't, as it defines a deterministic parsing algorithm. Prior to HTML5 it was very much browser dependent, and I believe nailing it down took many years.

          XHTML (HTML as XML), by contrast, would be wildly unsuitable as a format for the web, unless you want 50% of the web to fail to render due to parse errors.
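The difference between the two failure modes can be sketched in a few lines of Python. Note the assumptions: `html.parser` is a lenient tokenizer from the standard library, not an implementation of the HTML5 parsing algorithm, so it only stands in for "a parser that recovers"; `FRAGMENT`, `TagLogger`, `parse_as_xml` and `parse_as_html` are made-up names for illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical fragment: ordinary HTML (br has no end tag), but not well-formed XML.
FRAGMENT = "<p>one<br>two</p>"


class TagLogger(HTMLParser):
    """Remembers the start tags it sees, in order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def parse_as_xml(text):
    """Return True if text is well-formed XML, False on the first error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def parse_as_html(text):
    """Return the start tags a lenient HTML tokenizer recovers."""
    logger = TagLogger()
    logger.feed(text)
    logger.close()
    return logger.tags


if __name__ == "__main__":
    print("XML parser accepts it:", parse_as_xml(FRAGMENT))
    print("HTML tokenizer sees:", parse_as_html(FRAGMENT))
```

An XML parser has to stop at the first well-formedness error, which is the behaviour the comment above is worried about for hand-written pages.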

        3. ecofeco Silver badge

          Thank you. You just saved me a lot of typing.

          This proposal to use PDF is either trolling or attention whoring because all you really need to do is go back to basic HTML 4 to build fast, usable websites.

          Everything since then has been garbage added for someone else's profit and job security

  3. Tom 7

    PDF - does nothing it says on the tin.

    "Asked whether he felt the inability to alter PDFs through client-side intervention"

    News to me.

    1. rcxb1

      Re: PDF - does nothing it says on the tin.

      Sounds like they're talking about PDFs having fixed page and font sizes, rather than being mark-up documents like HTML. A few PDF readers have tried to implement a "Reading View" that doesn't preserve the page size, but not many do that, and as for the results, YMMV.

  4. Anonymous Coward

    James is not a fan of PDFs. They're ugly and inelegant, he explained. But

    despite his valiant effort to stir proper discussion - the world marches on :(

  5. FF22

    Dunning-Kruger

    Firtman is right. This guy is so ignorant that he's incapable of recognizing how stupid his idea is, for a multitude of reasons also ADDITIONAL to not fulfilling the goal he supposedly set out for it.

    1. quxinot

      Re: Dunning-Kruger

      If PDF is the answer, the question needs slapped.

    2. rg287

      Re: Dunning-Kruger

      Firtman is also wrong.

      PDF was originally designed to fix a layout for print, but now includes features like text reflow for viewing on arbitrary screen sizes. Like HTML it has evolved beyond its original design goals. Is that a good thing? That can be debated ad nauseam, but if HTML is allowed to evolve to HTML 5 then PDF is surely allowed a crack at branching out.

      In all honesty, PDF often "just works", even if it's not as elegant as other options in various scenarios. You can archive it, checksum it. So many sites these days basically break the Wayback Machine because the Web Archive simply can't keep up with the dependencies and dynamic structure of what could (and often should) be a static page.

      In the meantime: Behold, the lightest, most compatible and responsive site on the web. An example to us all.

      1. nijam Silver badge

        Re: Dunning-Kruger

        > PDF often "just works"

        Unless the originator has incorporated fonts that aren't available to the recipient.

        1. Falmari Silver badge

          Re: Dunning-Kruger

          They will be available if they are embedded in the pdf which they are in this case.

      2. The Sprocket

        Re: Dunning-Kruger

        ". . . of what could (and often should) be a static page." Agreed 100%

        That IS light and responsive. I designed sites like that in the late 90's with Adobe PageMill for 640/800 pages. I loved it!! God, I miss those days.

      3. NetBlackOps

        Re: Dunning-Kruger

        Back in the day, early 1990's, PDF technology was at the core of the NeXT workstations even being used for display. The NeXT was the machine that Tim Berners-Lee used to create the web, both server and client.

        1. DS999 Silver badge

          Re: Dunning-Kruger

          Postscript was used for display on the NeXT, not PDF.

          When Apple bought it and used it as the basis for OS X, they dropped Postscript due to licensing but its replacement (Quartz) uses something very similar to PDF's object graph. iOS also uses Quartz.

      4. W.S.Gosset

        Re: Dunning-Kruger

        > Behold, the lightest, most compatible and responsive site on the web

        Heh. Spot on.

        His mention of jQuery reminded me: isn't it jQuery which embeds in its core library a bitmap photo of the original coder?

    3. Anonymous Coward

      Re: Dunning-Kruger

      Classic. Call someone ignorant, and cite Dunning-Kruger, whilst at the same time not even understanding his point, despite him even mentioning in the article the problems with PDF.

  6. Dr Paul Taylor

    re-sizing and re-wrapping text

    I agree with most of the sentiments of this project, but surely going back to sanitised HTML would be better.

    I am finding it increasingly difficult to read small text, so a basic thing I do with web pages is to enlarge them. In basic HTML, the text gets re-wrapped. "Clever" web pages often have fixed line lengths, making this impossible. (I do also resize PDFs.)

    1. Pascal Monett Silver badge

      Re: re-sizing and re-wrapping text

      I agree with you. The sentiment is most assuredly well-placed, but plain HTML would have been a much more valid demonstration.

      What is killing the Web's usefulness is JavaScript. The over-reliance on this demonstrably dangerous technology is not just a nuisance, it is responsible for 99% of all malware infestations and, at the best of times, just transforms a web site into something you can't even bookmark properly.

      Long live NoScript !

      1. vtcodger Silver badge

        Re: re-sizing and re-wrapping text

        You're correct of course. For one thing, I doubt that it will ever be possible to secure an internet that permits Javascript or any other form of scripting. You'll surely end up playing Whack-a-mole with a boundless sea of exploitable OS and application vulnerabilities. BUT, if you don't allow scripting, how are you going to do slippy maps (Google Maps, OSM, etc)? Editable forms? Spell checking?

        For example, composing and posting this message looks to require Javascript.

        1. Dr Paul Taylor

          slippy maps

          Hate those too. Where the mouse wheel means "scroll" in most contexts, it seems to mean "wildly rescale" on maps, making it very difficult to focus on a particular place and then move smoothly.

          Kill Javascript!

        2. ThatOne Silver badge
          Unhappy

          Re: re-sizing and re-wrapping text

          > if you don't allow scripting, how are you going to do

          I'd say there are indeed valid uses for JavaScript, like those you quoted for instance. As a tool JavaScript has its uses, but like a knife it can easily be used in harmful and annoying ways, and today's web is full of examples.

          Don't accuse JavaScript, accuse greed and arrogance.

          Which is why the situation isn't going to improve, quite on the contrary.

        3. Irony Deficient

          composing and posting this message looks to require Javascript

          I’m able to compose and post comments here without JavaScript; if I can do that, any other commentard here should be able to do so too.

        4. doublelayer Silver badge

          Re: re-sizing and re-wrapping text

          "BUT, if you don't allow scripting, how are you going to do"

          "slippy maps (Google Maps, OSM, etc)?": Well, you could have a client program which you download and run so you know what is running what code, but at some point you either have to let a complex program run on a browser or not.

          "Editable forms?"

          With the HTML tags form, input, etc. They've been there since 1990. That's what JS pages use too, they've just got extra junk around them.

          "Spell checking?": Your browser does that. It sees a textarea tag and uses its local spellchecker on it. It works pretty well especially because you control the dictionary and it doesn't have to send the typed text to a remote server to check against their dictionary. Most sites where you see spellcheck in action are doing that. Try turning off scripting on this page and writing some misspelled words in the box and I'm guessing it will be the same. It certainly is for me.

          1. Terry 6 Silver badge

            Re: re-sizing and re-wrapping text

            In fact, for most purposes an editable form should just be a text document with spaces to fill in stuff.

            Only if it's something complex should there be anything more elaborate. ( With a special place to be reserved in Hell for the writers of anything with compulsory fields unless said information is of absolute necessity).

            1. doublelayer Silver badge

              Re: re-sizing and re-wrapping text

              "In fact, for most purposes an editable form should just be a text document with spaces to fill in stuff."

              No, it shouldn't. Unless you're processing everything manually. Having an HTML-based form lets you get the contents of boxes without having to do any parsing and insulates you from whatever people might have done to the surrounding text. A text form is fine if you want to handle it that way, but nothing is wrong with a more common solution that's been supported for decades.

              "With a special place to be reserved in Hell for the writers of anything with compulsory fields unless said information is of absolute necessity"

              No problem with that either. They have to put something in fields which you want because it's your form. If you don't want to provide that information, you can probably just put in junk. If you don't like a form which requires you to enter an email to send them information, then don't send them information; they're obviously not interested in anonymous input.
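To make the "contents of boxes without having to do any parsing" point concrete, here is a minimal server-side sketch in Python. It assumes a plain HTML form posting the default `application/x-www-form-urlencoded` body; the `read_form` helper, the field names and the `BODY` string are hypothetical, chosen only for illustration.

```python
from urllib.parse import parse_qs


def read_form(body):
    """Decode a form-urlencoded body into a dict of field name -> first value."""
    fields = parse_qs(body, keep_blank_values=True)
    return {name: values[0] for name, values in fields.items()}


# Hypothetical body, as a browser might send it for a form containing
# <input name="email"> and <textarea name="comment">. No JavaScript involved.
BODY = "email=reader%40example.com&comment=Works+without+JS"

if __name__ == "__main__":
    for name, value in read_form(BODY).items():
        print(f"{name}: {value}")
```

The browser does the encoding and the standard library does the decoding, so each box arrives already separated from the surrounding text.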

    2. veti Silver badge

      Re: re-sizing and re-wrapping text

      The thing about sanitised HTML - at a casual glance, it's very hard to differentiate from common, jQuery-infested HTML.

      What he says is true: even good devs with sound instincts find it very hard to resist the lure of "diagnostics" or "feedback". I've done it myself. PDF avoids temptation.

      1. doublelayer Silver badge

        Re: re-sizing and re-wrapping text

        This implies that feedback forms are necessarily wrong. I don't have them on most of my pages mostly because I don't want to deal with the readers' feedback, but if someone does want to give that ability, why should they be limited to writing an email address for someone to copy and paste over? It can be done without JS, it has been done for as long as the web was a thing, it doesn't prevent archiving, so what's the major problem with it?

        1. LybsterRoy Silver badge

          Re: re-sizing and re-wrapping text

          Sometimes making things easy is the wrong thing to do. The best example I can think of is social media. It's easy to post, like, follow etc. Doing a real good job there.

    3. Charlie Clark Silver badge

      Re: re-sizing and re-wrapping text

      PDF cannot escape its Postscript origins: it will always attempt to create fixed sized-pages.

      EPUB would seem the most reasonable format: this is essentially a subset of HTML with a resource tree. Authoring tools are now pretty reasonable and, for those who want it, DRM is supported.

      Still, I bet the guy loves all the attention he's getting!

    4. stiine Silver badge
      Facepalm

      Re: re-sizing and re-wrapping text

      My solution to the tiny text problem was a 55" 4k monitor....which won't fit on a desk...

  7. Anonymous Coward
    Thumb Up

    There's a lot of truth in what he says....

    Of course he knows PDFs are a provocation, but what he says about turning a simple way to share information using "hypertexts" into the ugliest way to develop the ugliest applications is true. And the only working business model is really to litter them with tracking to sell ads - leading straight into the "social networks" issue.

    1. doublelayer Silver badge

      Re: There's a lot of truth in what he says....

      His complaints are true, and his reaction doesn't do anything about it. You can write very ugly pages with HTML, you can have lots of ads and trackers, it makes a lot of the internet a pain. His solution, however, is just not to do that, which could be done as easily with clean HTML as with PDF (better in my opinion). It doesn't change the calculations of the ad-supported business model, nor does it really fix problems. Not only that, but he also has to use some HTML so users can find the PDFs; they do hypertext badly. The complaints aren't original, and the solution isn't an efficient way to resolve any of them.

  8. mark l 2 Silver badge

    "PDF Is not a format suited to share in different formats and diverse devices," he told The Register. "It's a format created for printing. So it's like using a boat to drive across a street."

    PDF used to be a format for printing, but those Fscktards at Adobe had to try and cram in a load of other functionality which wasn't needed and made the bloated insecure mess that is Acrobat reader and ruined it for everyone.

    Sure Lab 6 are publishing them as PDF/A without the ability to run JS but how do you know it's a PDF/A and not a different version which does allow JS without opening it first?

    I personally use the built in Linux Mint PDF reader Xreader which after trying a couple of PDFs which contain JavaScript appears not to support JS so at least that should make it a bit safer to open PDF files from unknown sources.

    1. Long John Silver
      Pirate

      Adobe

      PDF was and remains a handy format for sharing documents. PDF's current ubiquity exemplifies how neither a commercially produced program nor the ideas behind it can remain corralled by copyright/patents for long.

      I am surprised Adobe continues to market it, bells, whistles, and all. Perhaps businesses, too lazy to look for now legitimate alternative sources of PDF software for their Windows-based devices, happily shell out subscriptions. They have become inured to paying through the nose for Windows operating system, and MS office software, so why not Adobe too?

      Linux varieties are well endowed with PDF creation and reading software. I find LibreOffice excellent for creating PDFs. Presumably Windows versions of LibreOffice offer the same facilities. I would not be surprised if MS Office does too.

      1. Falmari Silver badge

        Re: Adobe

        Word does save to pdf.

        Where I work we wrote our own pdf driver to create pdfs.

        I have even added parts to it.

      2. The Sprocket

        Re: Adobe

        The Mac version of LibreOffice does a fine job of creating PDFs.

    2. Irony Deficient

      how do you know [it’s] a PDF/A …

      … and not a different version which does allow JS without opening it first?

      You could feed the document to a PDF/A validator before using your preferred PDF viewer.
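Short of a full validator, a crude first look is possible without opening the file in a viewer at all. The Python sketch below only searches the raw bytes for the `/JavaScript` and `/JS` name tokens; the `find_script_markers` helper and both sample byte strings are assumptions for illustration. It is a heuristic, not a safety check: names inside compressed object streams or written with `#xx` hex escapes will be missed, so an empty result proves nothing.

```python
import re

# Name tokens that accompany JavaScript actions in a PDF dictionary.
SUSPECT_NAMES = (b"/JavaScript", b"/JS")


def find_script_markers(data):
    """Return the suspect name tokens that appear in the raw bytes."""
    found = []
    for name in SUSPECT_NAMES:
        # A PDF name ends at whitespace or a delimiter, so require one
        # (or end of data) right after the token to avoid longer names.
        pattern = re.escape(name) + rb"(?=[\s/<>\[\]()]|$)"
        if re.search(pattern, data):
            found.append(name.decode("ascii"))
    return found


# Hypothetical, hand-written object snippets (not complete PDF files).
WITH_JS = b"%PDF-1.7\n1 0 obj\n<< /Type /Action /S /JavaScript /JS (app.alert(1)) >>\nendobj\n"
WITHOUT_JS = b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"

if __name__ == "__main__":
    print(find_script_markers(WITH_JS))
    print(find_script_markers(WITHOUT_JS))
```

For a real file the bytes would come from something like `open(path, "rb").read()`; anything that matters should still go through a proper PDF/A validator as suggested above.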

      1. ryokeken

        VeraPDF! Thanks this is my app of the week

        Thanks for the link

    3. druck Silver badge

      mark l 2 wrote:

      I personally use the built in Linux Mint PDF reader Xreader which after trying a couple of PDFs which contain JavaScript appears not to support JS so at least that should make it a bit safer to open PDF files from unknown sources.

      I just checked to see if I had that installed, I do, but it's shown as Document Viewer in the open with menu. I've set it to use that, as it does a better job of displaying than LibreOffice.

  9. Sceptic Tank Silver badge
    Trollface

    Flash!

    Will save every one of us!

    1. IGotOut Silver badge

      Re: Flash!

      With Real Audio embed sound!

    2. The Sprocket

      Re: Flash!

      Bwahahaha! Back to the Future.

  10. Howard Sway Silver badge

    Lab 6

    It had to happen : ladies and gentlemen we have found the real Nathan Barley.

    For instance, you could start here, with Issue 2 : A Gemini-PDF polyglot and The Tilde Quilt where "If you enjoyed last month's PDF-MP3 polyglot doc-cast, you're going to love this hybrid Gemini-PDF file".

    You will be dumbfounded by The Tilde Quilt, which "combines the exclusivity of private pubnix membership, the value-hoarding artificial scarcity of sequential integers, the inspiration of spatial constraint, the north star of the unbounded cartesian grid, and the immediacy of named pipes" and is a "brand new media paradigm".

    It is a crappy little piece of ASCII art.

  11. Flywheel

    Couple of things

    I was quite excited when I read the article on my laptop - everything looked good and seemed to make sense. Tried it on my mobile and had to remind myself I'm not a youth, nor a hawk.

    Also, PDFs are largely benevolent, right? But here's an example I found earlier....

    unzip -v pocorgtfo19.pdf

    Archive:  pocorgtfo19.pdf

      Length  Method      Size  Cmpr  Date        Time   CRC-32    Name
    --------  ------  --------  ----  ----------  -----  --------  ----
       15905  Defl:X     14500    9%  2019-01-04  16:19  838d0657  defender.zip
       90291  Defl:X     23139   74%  2019-03-04  19:23  8da8fe88  phrack6612.txt
      181069  Defl:X    181080    0%  2019-01-04  16:19  a00486ec  jonesforth.tar.bz2
    ......
       19676  Defl:X     17862    9%  2019-03-09  19:41  d8324af6  polyocamlbyte.zip
       83900  Defl:X     77184    8%  2019-01-04  16:19  c1f51574  PDFRick.zip
    --------          --------  ----                               -------
    33897715          30472096   10%                               33 files

    .. and yes, it acts exactly like you'd a PDF to.

    1. Irony Deficient

      … and yes, it acts exactly like you’d [expect] a PDF to.

      When running the command

      file pocorgtfo19.pdf

      I’d expect a PDF to return something like

      pocorgtfo19.pdf: PDF document, version version_number

      rather than something like, say,

      pocorgtfo19.pdf: Zip archive data, at least v2.0 to extract

      1. Irony Deficient

        Re: … and yes, it acts exactly like you’d [expect] a PDF to.

        If it’s a bit of applied steganography in action (similar to a .gif image and a .zip archive “cohabitating” in the same file), then something like

        strings pocorgtfo19.pdf | grep '\....UT'

        might reveal some hints.

      2. Anonymous Coward
        Anonymous Coward

        Re: … and yes, it acts exactly like you’d [expect] a PDF to.

        .pdf is a container, similar to .mov. Running whichever command will only yield its expected results, not the results of an undefined recursive lookup. If you're going beyond human representation, then you don't want recursive behavior unless you find a match to something specific.

        I feel both of the examples above would be better off using something like ExifTool (or whatever), but I'm not sure why at least 1 pipe (to any filter) wasn't included in either command.
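The disagreement above between `unzip -v` and `file` comes down to where each format keeps its signature: a PDF is recognised by a `%PDF-` header near the start of the file, while a ZIP archive is located through its end-of-central-directory record near the end, so one file can satisfy both checks. A minimal Python sketch of that idea follows. The helper names and the toy file are hypothetical, the 1024-byte header window reflects common reader leniency rather than a strict rule, and the demo bytes are not a valid PDF, only a PDF-style header glued in front of a real ZIP.

```python
import io
import zipfile

PDF_MAGIC = b"%PDF-"


def looks_like_pdf(data: bytes) -> bool:
    # Assumption: accept the header anywhere in the first 1024 bytes,
    # as lenient PDF readers commonly do.
    return PDF_MAGIC in data[:1024]


def looks_like_zip(data: bytes) -> bool:
    # zipfile.is_zipfile searches for the end-of-central-directory record,
    # which sits at the END of the file, so leading junk does not matter.
    return zipfile.is_zipfile(io.BytesIO(data))


def classify(data: bytes) -> str:
    pdf, zp = looks_like_pdf(data), looks_like_zip(data)
    if pdf and zp:
        return "pdf+zip polyglot"
    if pdf:
        return "pdf"
    if zp:
        return "zip"
    return "neither"


def make_demo_polyglot() -> bytes:
    # Hypothetical toy input: a PDF-style header followed by a real ZIP.
    # This is NOT a renderable PDF; it only exercises both signature checks.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("hello.txt", "hi")
    return b"%PDF-1.5\n% toy header only\n" + buf.getvalue()


if __name__ == "__main__":
    print(classify(make_demo_polyglot()))
```

Checking both ends of the file, rather than trusting whichever signature a single tool reports first, is what separates a polyglot from an ordinary PDF or an ordinary archive.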

  12. karlkarl Silver badge

    It is all of our fault really. We are the idiots that keep on the treadmill of misfeatures.

    Why don't we just grab the source to Firefox ~3.x, backport as many security fixes as we can identify and just log bug reports with all website vendors that don't render properly on it?

    I don't even foresee there being too many security patch backports. Many of these crept up with the much later feature bloat.

    1. W.S.Gosset

      Firefox 3 ?!!?

      !

      Netscape.

  13. Plest Silver badge
    Facepalm

    JavaScript abuse killed the web.

    Loading up 4MB of libraries just to put a colour gradient behind some text, that killed web technology! People who have no design experience or qualifications calling themselves designers and putting together apps and pages that are impossible to use. Design is not just about what something looks like, it's about practical considerations around use, something at least 80% of websites fail on in their owners' race to have the modern equivalent of Homer Simpson's Dancing Jesus on the front page!

    1. ecofeco Silver badge

      Re: JavaScript abuse killed the web.

      This. But I blame marketing first, THEN the designers.

  14. poohbear

    FWIW back in the 90s Adobe was pushing very hard for PDF to be what the web was built on, rather than HTML.

    1. JulieM Silver badge

      I suspect that when you are selling expensive, proprietary hammers, everything looks like a nail.

  15. heyrick Silver badge

    So it's like using a boat to drive across a street."

    Have you not been watching the news?

    Sometimes, a boat is the only way you're getting across that street...

    1. Anonymous Coward
      Anonymous Coward

      Re: So it's like using a boat to drive across a street."

      My boat is all I have left. My car floated across the street(s) days ago XD

      :(

      1. David 132 Silver badge

        Re: So it's like using a boat to drive across a street."

        Ah, if you wanted the car to stay put while the water flowed around it, should have bought a Morris Marina. The clue's in the name.

      2. W.S.Gosset

        Re: So it's like using a boat to drive across a street."

        I remember back in the late 80s floating my car at moderate speed. The road vibration died away as the wheels came off the road and it was then that I twigged that what I had thought was just a streetlight which had blown down, was actually the cabin light of a mostly submerged car drifting towards me.

        I had a glorious vision of The World's Best Ever Insurance Claim:

        "My right front wheel broke his windscreen as I sailed slowly over the top of him."

        .

        Sadly, at that point both front bungs popped, two little fountains burst up in the footwells, and it was out the window to push it back to "shore" whilst it was still buoyant. Some mates turned up, towed me up to the top of the hill, and we clutch-started it on the full downhill run. First 2 times there was water coming out the exhaust pipe, but the third time it caught, and I drove home via the hill-top route.

        Good little car, that. Toyota Corolla station wagon, manual, rear wheel drive.

        And Brisbane, back when it rained. And thundered and lightninged and flooded routinely. Gotta wait 10yrs before that starts again. Boo.

  16. martinusher Silver badge

    Yes, but let's miss the point and argue about trifles

    This sums up my sentiment about the current state of the web. It is all active content, with the result that it fails as an information serving network; it's now just a bandwidth and system hogging mess that is entirely self serving. It's also a major security risk -- passive content cannot host malware, it's just data. Instead of looking at the big picture, though, the discussion rapidly degenerates to back and forth about the deficiencies of PDF, how HTML version this or that provides essential capabilities and even how disability unfriendly pure text is. It's just smoke, missing the point entirely.

    Ad-tech is a self serving business, a vortex of believers who absorb huge amounts of resources because the moment people stop believing in it a huge section of the economy would collapse. That's why you don't see much information about the effectiveness of advertising. I know that everyone likes to get paid but you have to draw a line somewhere and from my perspective that line was crossed a decade or more ago.

    1. doublelayer Silver badge

      Re: Yes, but let's miss the point and argue about trifles

      These discussions aren't as pointless as you think. We all agree that ads are a nightmare, probably useless, and that we'd be happier without them. Well, probably some people don't, but I think most of us here do and I'm prepared to skip the people who don't for now. The problem is what we do about this belief above the ad blocking systems we have running. The stated solution in this case is PDF, which isn't really a good solution because, if it was used by others, ads could be integrated in it as well through JS integration. This guy has simply decided not to put in any ads, which is great, but he could have used lots of formats to do that.

      Since many options are available, it's worth discussing what format is optimal. Let's say I wanted to kill adtech so I decided to send you all my pages as image files of the rendered text, and to prevent possible bugs in decompression software, in an uncompressed format. That does technically solve the problem of active code in documents--you're not getting any scripting in a raw array of pixels. However, that's a terrible solution and you should tell me that before I waste my time on it. PDF is not that bad, but it also has problems that make its use suboptimal in this case. Hence it is worth discussing what would be best for a format without advertising or tracking.

    2. NetBlackOps

      Re: Yes, but let's miss the point and argue about trifles

      Plain text is not disability unfriendly. As a matter of fact, it's the easiest on my screen reader in use here.

  17. steelpillow Silver badge
    Facepalm

    Flash! A-aah! (Subtitled Adobe-dobe-do)

    At least nobody ever tried to do this before and embedded active content like Flash in their PDFs.

    Oh, wait...

    (Mind you, it was rather a long time ago now, wasn't it Adobe?)

  18. heyrick Silver badge

    He has a point

    It will never be finished, so the bugs will never be fixed

    Why fix bugs when you can include ways of finding out the battery state of the host, or accessing a timer with enough resolution to theoretically detect cache misses in the processor? It seems to me that a lot of these things are somebody thinking "this would be awesome!" and throwing together some code without taking any time to ask "now I have this idea, how can it be abused?"

    and the complexity has grown so immense that nobody other than the incumbent browser vendors can realistically implement the web at all any more

    Not to mention, any book on the subject (usually split into multiple books for the markup, the CSS, the scripting, and getting it all working together) is liable to be out of date the moment it rolls off the press.

    he took issue with the way independent bloggers have moved to specific platforms rather than running their own services

    Because running your own service can be a nightmare. Using a third party service puts you at the whim of somebody else, but on the other hand that somebody else gets to worry about the updates, the security, the...

    but he said the cost has been the death of format experimentation as content gets squeezed into standard templates

    True, but I'd imagine the majority of bloggers aren't geeks. They may even create their content using an app and have no idea of what actually makes it all work.

    and distributed through a handful of aggregators.

    The flip side of this is that people don't want to search all over the place to find a blog they might be interested in. Things that aren't all tidily listed on one of the main bollocks-regurgitators can be hard to find (if not near impossible).

    The kind of recipes you find on page one of search results seem to exist solely for the sake of attracting eyeballs

    Forget recipes, that's true for a vast amount of stuff these days. I was looking for a manual for an old bit of garden equipment. There was a site that looked useful that said it had manuals for what I wanted. It lied. It was basically just that text on a page with lots of adverts (well, blank spaces where adverts were supposed to have been ;) ). Sadly this came quite high in the search listing, top ten (maybe top five).

    not because someone genuinely loved a recipe and wanted to share it.

    Argh! Copyright infringement! Go straight to hell! Do not collect £200!

    You can still write a document in very plain HTML but there’s no demarcation between static documents and web applications.

    Sure there is. If it works with scripting disabled (that's your default, I trust) then it's not an application.

    Even techies who pride themselves on writing efficient lightweight markup usually can’t resist putting a comment feedback form on the page or hiding some tracking JavaScript in the background.

    I wrote my own blog code in PHP. It's crap but functional. The only scripting hidden on the page is a little Easter Egg if the user enters the Konami code. It isn't necessary in order to use the system. There's a comment form, yes, but zero scripting necessary, any validation is handled server side. No embedded advertising. And no tracking. Just like Happy Harry Hard-on, I "assume" that there are three people following. There may be more, I don't care. I write stuff because I enjoy it, not because I think I'll get paid per eyeball (I don't get paid at all, and I've refused outfits offering to pay me to let them write a guest article that's really an advert).
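A comment form that needs no client-side scripting, as described above, only requires the server to check whatever the browser posts. The sketch below shows that pattern in Python rather than PHP. It is not the commenter's code; the field names `name` and `body` and the length limits are assumptions made purely for illustration.

```python
import html

# Hypothetical limits, chosen only for this sketch.
MAX_NAME = 40
MAX_BODY = 2000


def validate_comment(form: dict) -> tuple:
    """Validate a posted comment entirely on the server side.

    Returns (cleaned_fields, errors). The page works with scripting
    disabled because nothing here depends on the browser running code.
    """
    errors = []
    name = form.get("name", "").strip()
    body = form.get("body", "").strip()

    if not name:
        errors.append("name is required")
    elif len(name) > MAX_NAME:
        errors.append("name is too long")

    if not body:
        errors.append("comment is required")
    elif len(body) > MAX_BODY:
        errors.append("comment is too long")

    # Escape before the text is ever echoed back into a page,
    # so posted markup is displayed as text instead of being executed.
    cleaned = {"name": html.escape(name), "body": html.escape(body)}
    return cleaned, errors
```

On failure the server would simply re-render the form with the error list, which is the whole of the "validation UI" and works in any browser.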

    He does make some valid points, but I can't help but feel that he's trying to shout down a hurricane. The web of old evolved into this mutated monster because it's what people wanted. And when I say people, I don't mean like you and me, I mean like the estimated 2.85 billion Facebook users and/or the billion Instagram users. Most of them won't give a crap how it works, only that it does.

  19. mevets

    tempting....

    to put up a service that takes URLs in, and produces PDFs out, so I can have a sort of global reader mode for the web....

    1. Anonymous Coward
      Anonymous Coward

      Re: tempting....

      https://www.sejda.com/html-to-pdf

      https://pdfcrowd.com/

  20. John Savard

    HTML can serve

    How about just use HTML, but with browsers that can't handle any of the modern features like JavaScript and popups?

    And I have always believed that tabs are for word processors, and do not belong in programming languages.

  21. Jonjonz

    The saddest thing is how this complexity hinders archiving the internet for posterity (and future knowledge). Even now trying to go back more than six months to a site/page that answered a question you had is getting iffy. So much information eternally lost.

  22. Social Ambulator

    Yes, the problem is advertising…

    …but pdf is not the answer.

    You have to change the mindset, yours at least, if you can’t educate the sheep. Publishing on the web costs money, just like running a symphony orchestra. Who pays? There are three models: 1. You subscribe to a site (The Financial Times, Jancis Robinson’s wine site etc.) for regular content that you value. 2. You take advantage of “patrons” who make a gift of their resources: the BBC (patron the British taxpayer), Genbank (patron the US taxpayer), Wikipedia (individual donations), hobby sites from organizations that wish to attract members, or individuals such as myself and my wife who are happy to do it themselves. 3. You take your chance in a sea of advertising excrement.

    But if you prate on about “the web must be free” because you are too mean to pay its price or too stupid to realize it has a price, you end up believing that the solution to the mess is technical specifications.

  23. Binraider Silver badge

    While I probably wouldn't pick PDF as a shining light for standards rebellion, I do appreciate the sentiment. Chrome is almost impossible to escape, what with at least 3 major browsers that are basically reskins of it - and the advertising claws that come with it. I still have good memories of the Proxomitron; a web content filtering proxy. Rather difficult thing to do in present day versus dynamically generated pages.

    As much as a new format would be fun (and possibly ad-liberating), the usual barriers of forked efforts and critical mass would apply here.

    Sir Tim is onto something with his criticism of how the web has developed, but what would an alternative look like and be made of? And how would you prevent it being subverted too?
