The best idea always wins. SGML was the best idea, XML only its native son. PDF, once a clever PostScript extension, is the dirty old money-making tat its inventors always meant it to be. Short-term America at its greediest. I speak from what I know, I tell ya.
US Patent Office to take only DOCX in future – or PDFs if you pay extra
Documents submitted to the US Patent and Trademark Office should be in .DOCX format starting from next year – and if you want to stick to PDFs, that will cost extra. “At the USPTO, we are continuously working to modernize and streamline our patent application systems,” the agency announced this week. “To improve application …
COMMENTS
-
Thursday 27th May 2021 02:21 GMT ShadowSystems
You want interoperability?
Plain. Fekkin. Text.
No special fonts, no colours, no special characters that get mangled by OCR. It. Just. Works.
"You don't have to use MS Office to produce it, you can use..."
Stop. I don't care what format(s) you use internally or for archiving, but the applicant submissions should be in plain text. No obfuscation, no bullshit buzzword bingo, no "on a computer", just plain text attached to a physical working prototype to prove you can actually *make* what your brain shits out. If you can't build the prototype, you obviously don't know the subject well enough to deserve a patent on it, now do you?
"Why should I need to prototype a software patent application?"
You shouldn't, because software should not be patentable. Copyrightable yes, patentable no. It's that whole "on a computer" bit. If it's on a computer then fek off.
Plain. Damned. Text... It. Just. Works.
-
Thursday 27th May 2021 07:41 GMT Flocke Kroes
Re: ASCII art
Back when I looked at patents (an offence punishable by triple damages) ASCII art would have been an improvement. Patents are not about technical communication. They need to be specific enough to avoid being rejected because of prior art. Apart from that, vagueness is the next highest priority to cover as much ground as possible.
I expect tech communication to arrive as PDF because of its high interoperability and expressiveness. None of this "you have a patent licence to precisely implement the standard but if you make it compatible with Word instead we can sue you to bankruptcy."
-
Thursday 27th May 2021 11:45 GMT LybsterRoy
Re: You want interoperability?
"There's a reason that normal people use rich text formats."
Yup and that reason is assholery.
If you are trying to communicate you do not need rich text. You may need a diagram but that is not rich text and could, for technical documents, be attached separately.
Emails are a brilliant example of why you do NOT need rich text. The BBC news site is a good example of why it's bad to allow pictures to be inserted in text.
Rich text generally equals "oooo pretty" not good communication.
-
Thursday 27th May 2021 08:40 GMT Kane
Re: Prototypes
"Not sure about 40 but the wheel was patented under 50 years ago in the US (and much more recently in Australia)."
All right, Mr Wiseguy, you're so clever, you tell us what colour it should be.
-
Thursday 27th May 2021 07:47 GMT Anonymous Coward
"Plain. Damned. Text.."
Why should we be limited by the limitations of primitive computers? Humans never used only "plain damned text", because it cannot communicate everything; human beings have always used specific text formats and drawings to communicate better.
The fact that mainframes, and later Unix systems, were limited to poor English-only text and monospaced fonts is something we should put in the IT Stone Age and leave there. Thank heaven things evolved. Sometimes some IT people really look like religious zealots: everything not written by K & R themselves on their stone cards is something demonic that should be avoided... c'mon, the XXI century began twenty years ago...
-
Thursday 27th May 2021 13:05 GMT Anonymous Coward
Re: "Plain. Damned. Text.."
It's a great thing that computers were invented only recently. Looking at some IT people fixated on not improving anything for fear of learning something new, we probably would not have invented the wheel because Stonix didn't have one.
A document's organization goes far beyond text formatting - something that evidently many still fail to understand 500 years after Gutenberg, given how some people use a word processor today.
-
Thursday 27th May 2021 22:16 GMT doublelayer
Re: "Plain. Damned. Text.."
"You do realise computers predate cars, right?"
That's a stretch. Conceptually, no they didn't, because cars are based on wheeled vehicles and just added the engine. As purchasable products, no they didn't, because general purpose computers, even those which broke down all the time, arrived a couple decades after cars that lots of people bought. The only method I can think of which allows it to work is if you're counting from first attempt to manufacture something of the kind, in which case the computer slightly predates the car, but then the computer got stuck and the car didn't.
-
Thursday 27th May 2021 10:20 GMT dajames
Re: You want interoperability?
Plain. Damned. Text... It. Just. Works.
Well said ... but patent applications will include pictures and diagrams, so I'd allow MarkDown with PNG images.
There is absolutely no justification for the Patent Office to worry about difficulties representing layout and fonts as the appearance of the application on the page has no bearing on the meaning of its text.
... and do they really imagine that those difficulties can be avoided by using .DOCX?!!!
-
Thursday 27th May 2021 22:25 GMT doublelayer
Re: You want interoperability?
Plain text in which encoding? Does the line ending matter? How about long lines vs short lines? Mathematical formulae in what format? Written as words or assume the reader knows LaTeX? Tables allowed or not? If allowed, use the Unicode box drawing characters; only _, |, and -; or no lines, just line up with whitespace? Unicode or ASCII quotation marks? Diagrams permitted as separate images referenced by filename?
Plain. Text. Lots. Of. Questions.
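To make a few of those questions concrete, here is a minimal sketch of what a "plain text" intake step would have to decide before anything else. The specific choices (UTF-8, LF line endings, NFC normalisation, ASCII quotes) and the name normalize_submission are assumptions for illustration, not anything the USPTO has specified.

```python
import unicodedata

# Hypothetical mapping of typographic quotes to ASCII; extend as needed.
QUOTE_MAP = str.maketrans({
    "\u2018": "'", "\u2019": "'",
    "\u201c": '"', "\u201d": '"',
})


def normalize_submission(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode bytes and apply one fixed set of plain-text conventions."""
    text = raw.decode(encoding)           # raises UnicodeDecodeError on bad input
    text = text.lstrip("\ufeff")          # drop a leading byte-order mark
    text = unicodedata.normalize("NFC", text)
    text = text.replace("\r\n", "\n").replace("\r", "\n")  # LF line endings
    text = text.translate(QUOTE_MAP)      # ASCII quotation marks
    # Strip trailing whitespace on every line.
    return "\n".join(line.rstrip() for line in text.split("\n"))


if __name__ == "__main__":
    sample = "\ufeff\u201cclaim 1\u201d \r\nwidget\r".encode("utf-8")
    print(repr(normalize_submission(sample)))
```

Every line of that function is a policy decision somebody has to write down, and it still says nothing about formulae, tables or diagrams.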
-
Thursday 27th May 2021 04:36 GMT Joe W
Short answer: not well. Not well enough. Things get mangled. A lot. Collaboratively writing papers with Word is such a mess. The comment function is good, that works well (since I have not yet experienced the stupid update where a comment sends mails). The rest? A bloody mess.
Don't get me started with figures and their captions and reformatting text. This is a major pain in the proverbial - also with MS Word. Sure, you add the caption from a context menu, but the stupid program has no recollection that these two (figure and caption) belong together. Oh, and when you open a document on two different machines (even company machines, centrally administered) the whole layout gets mangled if you sit in different offices with different standard printers. The printer determines the paper margins (it seems), and that in turn the available space on the page. Then, figures move around, captions get detached (also happens with tables, of course), and you can start over again. Burn it with fire!
-
Thursday 27th May 2021 06:43 GMT Screwed
It was always such fun opening Word documents on a computer which was primarily used for other tasks such as printing labels. With a small format label printer set as default.
Add allowing Windows to change your default printer automatically.
This is why I have sometimes set the default printer to whatever print to PDF option has been available. You can get Word to be pretty consistent at the cost of always having to select a specific printer if you actually want paper output.
And agree about captioning. Whoever would have thought a caption belongs with something else?
-
Thursday 27th May 2021 06:11 GMT bazza
As with any "standard" the challenge is to understand what it means. The XSD schemas for all of MS's formats are public and curated by the Library of Congress. You can download these schemas, get a code generator from someone like Objective Systems and automagically end up with source code in a language of your choice that is able to parse the files from an unzipped docx, turning each one into an object (or objects) in your program. There are also binary blobs for JPEGs, etc.
The tricky part so far as building a word processor application is in understanding how to render all that on screen / paper. Understandings vary...
However, I suspect that the USPTO isn't interested in rendering any of it to screen. If all they want to do is search and compare content, then all they need is the parsed objects. And that is well understood thanks to the schemas.
So all in all, not a bad idea.
The simplicity of parsing the docx files shows up in various places. Beyond Compare, the best comparison tool on the planet in my opinion as a paid up user, does a really neat job with docx comparison, showing just the text differences. It's curiously useful to be able to do that.
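As a rough illustration of the "unzip and parse" point, the sketch below pulls paragraph text out of a .docx using only the Python standard library rather than schema-generated classes. It assumes the usual layout (a ZIP archive with the body in word/document.xml, paragraphs as w:p and text runs as w:t). The helper make_minimal_docx is hypothetical and only builds a stripped-down archive for the demo, not a file Word would open.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# WordprocessingML main namespace used in word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{" + W_NS + "}"


def docx_paragraphs(docx_file) -> list:
    """Return the text of each paragraph, ignoring all formatting."""
    with zipfile.ZipFile(docx_file) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    return [
        "".join(t.text or "" for t in p.iter(W + "t"))
        for p in root.iter(W + "p")
    ]


def make_minimal_docx(paragraphs) -> io.BytesIO:
    """Hypothetical demo helper: an archive holding only word/document.xml."""
    body = "".join(
        "<w:p><w:r><w:t>" + escape(text) + "</w:t></w:r></w:p>"
        for text in paragraphs
    )
    xml = (
        '<w:document xmlns:w="' + W_NS + '"><w:body>'
        + body + "</w:body></w:document>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", xml)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    demo = make_minimal_docx(["A widget comprising a wheel.", "Claims: 1 & 2"])
    for line in docx_paragraphs(demo):
        print(line)
```

Feeding the paragraph lists from two versions into difflib.unified_diff gives a crude text-only comparison. Real documents have headers, footnotes, tracked changes and text boxes that this sketch ignores.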
-
Thursday 27th May 2021 22:20 GMT doublelayer
Re: Ah XML...
"At least with Lisp you don't have to worry whether the end brackets match the opening ones."
Which makes it much easier to miss one of the brackets and mess up the entire expression since there are now several places you could fix it, only one of which works. If the closing ones have to match, then the compiler can tell you where the mismatch occurred. If generated automatically, the readability problem isn't so bad.
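A small sketch of why named closing tags help: with a stack of open tag names, a checker can point at the first place where the close does not match the open, whereas with bare parentheses all you can do is count. The regular expression here is a deliberate simplification (no comments, CDATA or attribute values containing angle brackets), and first_mismatch is a made-up name.

```python
import re

# Simplified tag pattern: optional "/", a name, anything up to ">",
# with an optional "/" just before ">" for self-closing tags.
TAG = re.compile(r"<(/?)([A-Za-z][\w.-]*)[^<>]*?(/?)>")


def first_mismatch(markup: str):
    """Return a description of the first tag mismatch, or None if balanced."""
    stack = []  # (tag name, offset where it was opened)
    for m in TAG.finditer(markup):
        closing, name, self_closing = m.group(1), m.group(2), m.group(3)
        if self_closing:
            continue
        if not closing:
            stack.append((name, m.start()))
        elif not stack:
            return f"unexpected </{name}> at offset {m.start()}"
        else:
            open_name, open_pos = stack.pop()
            if open_name != name:
                return (
                    f"</{name}> at offset {m.start()} closes "
                    f"<{open_name}> opened at offset {open_pos}"
                )
    if stack:
        name, pos = stack[-1]
        return f"<{name}> opened at offset {pos} is never closed"
    return None


if __name__ == "__main__":
    print(first_mismatch("<a><b></b></a>"))
    print(first_mismatch("<a><b></a></b>"))
```

The second call reports the mismatch at the offending close tag. A parenthesis counter given the equivalent input would only learn that the totals happen to balance.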
-
Thursday 27th May 2021 22:29 GMT rcxb
worst-case rely on character recognition techniques to scrape the text into an easier format.
Well that's not a fair comparison. Somebody could scan a piece of paper and insert it as an image into a DOCX file just as easily. PDFs (that aren't just images of scanned pages) are trivially easy to extract to images and text.
-
Thursday 27th May 2021 22:54 GMT doublelayer
No, they're not. There are a bunch of tools that create PDFs and they do weird things to text layers. Sometimes they'll omit some characters because a font they used had a different glyph for a few combinations. Because their font didn't keep the characters apart, they're left out of the text. I've seen that repeatedly. Or there will be a table, and the text from the table will come out just as it went in, but the table's organization is completely destroyed. Row major, column major, sometimes completely random order, there's no way to tell. Whitespace sometimes indicates what it used to be, but it never lines up completely so if you try to split on whitespace you'll find the table is not easily parsed. There are lots of ways getting just the text out of a PDF won't work.
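To show the table problem in miniature, here is a sketch using invented text of the kind a PDF text layer might give back for a three-column table. No PDF library is involved; the sample string and column names are assumptions for the demo.

```python
import re

# Hypothetical extracted text: columns are city, units, price.
# One cell contains a space ("New York") and one cell is empty (Lyon's units).
extracted = (
    "City      Units  Price\n"
    "New York  12     3.50\n"
    "Lyon             4.10\n"
)


def split_any_whitespace(text):
    """Naive approach: every run of whitespace is a column boundary."""
    return [line.split() for line in text.splitlines()]


def split_wide_gaps(text):
    """Common heuristic: only runs of two or more spaces are boundaries."""
    return [re.split(r"\s{2,}", line.strip()) for line in text.splitlines()]


if __name__ == "__main__":
    for row in split_any_whitespace(extracted):
        print(len(row), row)
    for row in split_wide_gaps(extracted):
        print(len(row), row)
```

The naive split yields 3, 4 and 2 fields for the three lines. The wide-gap heuristic repairs "New York" but still silently drops the empty cell, so the price lands in the units column. Fixed-width slicing only helps when the extractor preserves alignment, which is exactly what cannot be relied on.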
-
Thursday 27th May 2021 23:14 GMT James O'Shea
Not if the original document has tables, or captioned figures or even just more than one column per page. Almost all OCR has serious trouble with tables and columns. Inserting a scanned in, not OCRed, picture of a page is worse. Unless the pic is high res, the text in it will be blurry... which is why the pic has to be high res just to OCR it. And you're at the mercy of whatever ink/toner was used, and what kind of paper, and how old it is. Extracting text from that mess is very difficult.
-
Thursday 27th May 2021 23:32 GMT Anonymous Coward
The UK approach to document formats
It's not often that I applaud a govt. initiative but this is one. I won't spell it out because the page I link below is short and sweet:
https://www.gov.uk/government/publications/open-standards-for-government/sharing-or-collaborating-with-government-documents