US Patent Office to take only DOCX in future – or PDFs if you pay extra

Documents submitted to the US Patent and Trademark Office should be in .DOCX format starting from next year – and if you want to stick to PDFs, that will cost extra. “At the USPTO, we are continuously working to modernize and streamline our patent application systems,” the agency announced this week. “To improve application …

  1. Mr. Balise
    Thumb Up

    The best idea always wins. SGML was the best idea, XML only its native son. PDF, once a clever ps extension, is the dirty old money-making tat its inventors always meant it to be. Short-term America at its greediest. I speak from what I know, I tell ya.

  2. ShadowSystems

    You want interoperability?

    Plain. Fekkin. Text.

    No special fonts, no colours, no special characters that get mangled by OCR. It. Just. Works.

    "You don't have to use MS Office to produce it, you can use..."

    Stop. I don't care what format(s) you use internally or for archiving, but the applicant submissions should be in plain text. No obfuscation, no bullshit buzzword bingo, no "on a computer", just plain text attached to a physical working prototype to prove you can actually *make* what your brain shits out. If you can't build the prototype, you obviously don't know the subject well enough to deserve a patent on it, now do you?

    "Why should I need to prototype a software patent application?"

    You shouldn't, because software should not be patentable. Copyrightable yes, patentable no. It's that whole "on a computer" bit. If it's on a computer then fek off.

    Plain. Damned. Text... It. Just. Works.

    1. Ken Hagan Gold badge

      Re: You want interoperability?

      With ASCII art? And formulae written out in TeX source code? No thanks. There's a reason that normal people use rich text formats. Plain text is inadequate for most forms of technical communication even before you insist on a layer of legalese.

      1. Flocke Kroes Silver badge

        Re: ASCII art

        Back when I looked at patents (an offence punishable by triple damages) ASCII art would have been an improvement. Patents are not about technical communication. They need to be specific enough to avoid being rejected because of prior art. Apart from that, vagueness is the next highest priority to cover as much ground as possible.

        I expect tech communication to arrive as PDF because of its high interoperability and expressiveness. None of this "you have a patent licence to precisely implement the standard but if you make it compatible with Word instead we can sue you to bankruptcy."

      2. LybsterRoy Silver badge

        Re: You want interoperability?

        "There's a reason that normal people use rich text formats."

        Yup and that reason is assholery.

        If you are trying to communicate you do not need rich text. You may need a diagram but that is not rich text and could, for technical documents, be attached separately.

        Emails are a brilliant example of why you do NOT need rich text. The BBC news site is a good example of why it's bad to allow pictures to be inserted in text.

        Rich text generally equals "oooo pretty" not good communication.

    2. bazza Silver badge

      Re: You want interoperability?

      It's also a bit tricky to attach a copy to a physical working example of, say, a comb over. Staples? Or glue?!

      1. Tom 7

        Re: You want interoperability?

        Is there any patent the US PTO has had a working example for in the last 40 years?

        1. Flocke Kroes Silver badge

          Re: Prototypes

          Not sure about 40 but the wheel was patented under 50 years ago in the US (and much more recently in Australia).

          1. Kane
            Alien

            Re: Prototypes

            "Not sure about 40 but the wheel was patented under 50 years ago in the US (and much more recently in Australia)."

            All right, Mr Wiseguy, you're so clever, you tell us what colour it should be.

      2. LybsterRoy Silver badge

        Re: You want interoperability?

        Staples - a stationery company - just give them the address but why you would want to patent it I do not know.

        Glue on the other hand is generally easy to attach things to.

    3. Anonymous Coward

      "Plain. Damned. Text.."

      Why should we be limited by the limitations of primitive computers? Humans never used only "plain damned text" because it is unable to communicate everything, and human beings always used specific text formats and drawings to communicate better.

      The fact that mainframes and Unix systems were later limited to poor English-only text and a monospaced font is something we should put in the IT Stone Age and leave there. Thank heaven it evolved too. Sometimes some IT people really look like religious zealots: everything not written by K & R themselves on their stone cards is something demonic that should be avoided.... c'mon, the XXI century began twenty years ago....

      1. LybsterRoy Silver badge

        Re: "Plain. Damned. Text.."

        What you've typed seems to indicate the validity of your argument that you need prettiness as well as words.

        1. Anonymous Coward

          Re: "Plain. Damned. Text.."

          It was a great thing that computers were invented only recently. Looking at some IT people fixated on not improving anything for fear of learning something new, we probably would not have invented the wheel because Stonix didn't have one.

          Document organization goes far beyond text formatting, something that evidently many still fail to understand 500 years after Gutenberg, given how some people use a word processor today.

          1. MrDamage Silver badge

            Re: "Plain. Damned. Text.."

            You do realise computers predate cars, right?

            1. doublelayer Silver badge

              Re: "Plain. Damned. Text.."

              "You do realise computers predate cars, right?"

              That's a stretch. Conceptually, no they didn't, because cars are based on wheeled vehicles and just added the engine. As purchasable products, no they didn't, because general purpose computers, even those which broke down all the time, arrived a couple decades after cars that lots of people bought. The only method I can think of which allows it to work is if you're counting from first attempt to manufacture something of the kind, in which case the computer slightly predates the car, but then the computer got stuck and the car didn't.

      2. Arthur the cat Silver badge
        Trollface

        Re: "Plain. Damned. Text.."

        Humans never used only "plain damned text" because it is unable to communicate everything

        Insist that all patent specifications are done by interpretive dance. It couldn't make the USPTO any worse.

    4. dajames

      Re: You want interoperability?

      Plain. Damned. Text... It. Just. Works.

      Well said ... but patent applications will include pictures and diagrams, so I'd allow Markdown with PNG images.

      There is absolutely no justification for the Patent Office to worry about difficulties representing layout and fonts as the appearance of the application on the page has no bearing on the meaning of its text.

      ... and do they really imagine that those difficulties can be avoided by using .DOCX?!!!

    5. doublelayer Silver badge

      Re: You want interoperability?

      Plain text in which encoding? Does the line ending matter? How about long lines vs short lines? Mathematical formulae in what format? Written as words or assume the reader knows LaTeX? Tables allowed or not? If allowed, use the Unicode box drawing characters; only _, |, and -; or no lines, just line up with whitespace? Unicode or ASCII quotation marks? Diagrams permitted as separate images referenced by filename?

      Plain. Text. Lots. Of. Questions.
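
A few of those questions (encoding, line endings, quotation marks) can be made concrete with a minimal sketch. Everything here is an assumption for illustration: the helper name `normalize_submission`, the choice of UTF-8 with a Windows-1252 fallback, and the decision to flatten curly quotes to ASCII. Nothing in the thread says any patent office works this way.

```python
import unicodedata

# Map typographic quotes to their ASCII equivalents (one possible policy).
QUOTE_MAP = str.maketrans({
    "\u2018": "'", "\u2019": "'",
    "\u201c": '"', "\u201d": '"',
})


def normalize_submission(raw: bytes) -> str:
    """Hypothetical helper: turn 'plain text' bytes into one canonical form."""
    try:
        # utf-8-sig also strips a leading byte order mark if one is present.
        text = raw.decode("utf-8-sig")
    except UnicodeDecodeError:
        # Assumed fallback for legacy Windows files; undecodable bytes
        # become U+FFFD rather than raising.
        text = raw.decode("cp1252", errors="replace")
    # Collapse Windows (CRLF) and old Mac (CR) line endings to LF.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Use composed forms so visually identical strings compare equal.
    text = unicodedata.normalize("NFC", text)
    return text.translate(QUOTE_MAP)
```

Each branch is a policy decision somebody has to make before "plain text" means one thing, and this sketch does not even touch formulae, tables, or diagrams.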

  3. binaryhermit

    How well does DOCX work with non-Microsoft tools?

    Also, why not ODF? Maybe I'm out of the loop, but I seem to remember OOXML being less open than ODF.

    1. Joe W Silver badge

      Short answer: not well. Not well enough. Things get mangled. A lot. Collaboratively writing papers with Word is such a mess. The comment function is good, that works well (since I have not yet experienced the stupid update where a comment sends mails). The rest? A bloody mess.

      Don't get me started with figures and their captions and reformatting text. This is a major pain in the proverbial - also with MS Word. Sure, you add the caption from a context menu, but the stupid program has no recollection that these two (figure and caption) belong together. Oh, and when you open a document on two different machines (even company machines, centrally administered) the whole layout gets mangled if you sit in different offices with different standard printers. The printer determines the paper margins (it seems), and that in turn the available space on the page. Then, figures move around, captions get detached (also happens with tables, of course), and you can start over again. Burn it with fire!

      1. Screwed

        It was always such fun opening Word documents on a computer which was primarily used for other tasks such as printing labels. With a small format label printer set as default.

        Add allowing Windows to change your default printer automatically.

        This is why I have sometimes set the default printer to whatever print to PDF option has been available. You can get Word to be pretty consistent at the cost of always having to select a specific printer if you actually want paper output.

        And agree about captioning. Whoever would have thought a caption belongs with something else?

    2. bazza Silver badge

      As with any "standard" the challenge is to understand what it means. The XSD schemas for all MS's formats are public and curated by the Library of Congress. You can download these schemas, get a code generator from someone like Objective Systems and automagically end up with source code in a language of your choice that is able to parse the files from an unzipped docx, turning each one into an object(s) in your program. There are also binary blobs for jpegs, etc.

      The tricky part so far as building a word processor application is in understanding how to render all that on screen / paper. Understandings vary...

      However, I suspect that the USPTO isn't interested in rendering any of it to screen. If all they want to do is search and compare content, then all they need is the parsed objects. And that is well understood thanks to the schemas.

      So all in all, not a bad idea.

      The simplicity of parsing the docx files shows up in various places. Beyond Compare, the best comparison tool on the planet in my opinion as a paid up user, does a really neat job with docx comparison, showing just the text differences. It's curiously useful to be able to do that.
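
The "unzip it and parse the XML" point can be sketched with nothing but the Python standard library. This is a rough illustration, not a schema-driven parser of the kind described above: the helper names `extract_docx_paragraphs` and `make_minimal_docx` are made up for the example, and it assumes the common (transitional) WordprocessingML namespace and that the body lives in `word/document.xml`.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Main WordprocessingML namespace used by word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def extract_docx_paragraphs(source):
    """Hypothetical helper: return the text of each paragraph in a .docx.

    `source` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as archive:
        xml_bytes = archive.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is spread over runs; join every w:t element.
        pieces = [node.text or "" for node in para.iter(f"{{{W_NS}}}t")]
        paragraphs.append("".join(pieces))
    return paragraphs


def make_minimal_docx(paragraph_texts):
    """Demo only: build an in-memory archive holding just word/document.xml.

    A real .docx also needs [Content_Types].xml and relationship parts,
    and the texts here are assumed to contain no XML special characters.
    """
    body = "".join(
        f"<w:p><w:r><w:t>{text}</w:t></w:r></w:p>" for text in paragraph_texts
    )
    xml = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        f'<w:document xmlns:w="{W_NS}"><w:body>{body}</w:body></w:document>'
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", xml)
    buffer.seek(0)
    return buffer
```

Calling `extract_docx_paragraphs(make_minimal_docx(["A claim.", "Another claim."]))` gives back the two strings without rendering anything, which is the search-and-compare use case rather than the layout one.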

  4. JWLong

    USPTO

    It should be shut down, blown up, and reinvented as the "USDoSE".

    UNITED STATES DEPARTMENT of SHIT ENGINEERING

    1. bazza Silver badge

      Re: USPTO

      Let's be charitable. This move may improve the quality of their work. We could even wish them well with this endeavour.

      (I'll leave the choice of icon up to your imagination)

  5. Potemkine! Silver badge

    The XML nature of the format, we're told, makes it easier for the patent office's internal systems to automatically extract and process the content of text, tables, graphs, and schematics

    You're told? Come on, it's obvious XML is great to automatically extract and process data.

    1. Flocke Kroes Silver badge

      You are confusing OOXML with XML. OOXML has a weasel worded patent license and pointless binary blobs. If you want an XML document, select ODF.

  6. Anonymous Coward

    Ah XML...

    XML is the perfect balance of not really human readable and not very easy to parse programmatically.

    1. Arthur the cat Silver badge

      Re: Ah XML...

      XML is Lisp S-expressions done badly. At least with Lisp you don't have to worry whether the end brackets match the opening ones.

      1. doublelayer Silver badge

        Re: Ah XML...

        "At least with Lisp you don't have to worry whether the end brackets match the opening ones."

        Which makes it much easier to miss one of the brackets and mess up the entire expression since there are now several places you could fix it, only one of which works. If the closing ones have to match, then the compiler can tell you where the mismatch occurred. If generated automatically, the readability problem isn't so bad.
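
The claim that matched closing tags let the parser point at the mismatch can be checked with a small sketch using Python's standard `xml.etree.ElementTree`. The helper name `find_xml_error` and the sample documents are made up for illustration.

```python
import xml.etree.ElementTree as ET


def find_xml_error(document: str):
    """Hypothetical helper: return (line, column) of the first
    well-formedness error, or None if the document parses cleanly."""
    try:
        ET.fromstring(document)
    except ET.ParseError as err:
        # ParseError.position is a (line, column) tuple from the parser.
        return err.position
    return None


good = "<claim><step>text</step></claim>"
# <step> is never closed, so </claim> does not match the open element.
bad = "<claim>\n<step>text</claim>"

print(find_xml_error(good))
print(find_xml_error(bad))
```

The parser reports the line where the closing tag fails to match, which is the diagnostic an anonymous closing bracket cannot give.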

  7. thx1111

    PDFs a pain for machines to grok

    > PDFs, meanwhile, are more of a pain for machines to grok ...

    PDFs, meanwhile, are more of a pain for machines and humans to grok ...

  8. rcxb Silver badge

    worst-case rely on character recognition techniques to scrape the text into an easier format.

    Well that's not a fair comparison. Somebody could scan a piece of paper and insert it as an image into a DOCX file just as easily. PDFs (that aren't just images of scanned pages) are trivially easy to extract to images and text.

    1. doublelayer Silver badge

      No, they're not. There are a bunch of tools that create PDFs and they do weird things to text layers. Sometimes they'll omit some characters because a font they used had a different glyph for a few combinations. Because their font didn't keep the characters apart, they're left out of the text. I've seen that repeatedly. Or there will be a table, and the text from the table will come out just as it went in, but the table's organization is completely destroyed. Row major, column major, sometimes completely random order, there's no way to tell. Whitespace sometimes indicates what it used to be, but it never lines up completely so if you try to split on whitespace you'll find the table is not easily parsed. There are lots of ways getting just the text out of a PDF won't work.
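
The whitespace problem is easy to reproduce. The sketch below uses invented sample lines standing in for text pulled out of a PDF table (no PDF library is involved), and the two splitter functions are hypothetical names for the two obvious strategies.

```python
import re

# Invented stand-in for a three-column table after text extraction.
extracted = [
    "Part        Qty  Note",
    "Left wheel  2    spare included",
    "Axle             on order",
]


def split_on_whitespace(line):
    """Naive approach: every run of whitespace is a column break."""
    return line.split()


def split_on_gaps(line, min_gap=2):
    """Slightly better: only runs of min_gap or more spaces break columns."""
    return re.split(r"\s{%d,}" % min_gap, line.strip())


for line in extracted:
    print(split_on_whitespace(line), split_on_gaps(line))
```

Naive splitting turns "Left wheel" into two cells. Splitting on wider gaps fixes that row, but the Axle row still comes out with two fields instead of three because the empty Qty cell leaves no trace, so "on order" lands in the wrong column.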

    2. James O'Shea

      Not if the original document has tables, or captioned figures or even just more than one column per page. Almost all OCR has serious trouble with tables and columns. Inserting a scanned in, not OCRed, picture of a page is worse. Unless the pic is high res, the text in it will be blurry... which is why the pic has to be high res just to OCR it. And you're at the mercy of whatever ink/toner was used, and what kind of paper, and how old it is. Extracting text from that mess is very difficult.

  9. Anonymous Coward
    Thumb Up

    The UK approach to document formats

    It's not often that I applaud a govt. initiative but this is one. I won't spell it out because the page I link below is short and sweet:

    https://www.gov.uk/government/publications/open-standards-for-government/sharing-or-collaborating-with-government-documents
