Re: Reminds me of Mr. Bastards.
After googling that, I recommend reading the Guardian "It's a funny old world" page that comes up in the Google search.
Companies House has blocked someone who registered a new biz with a name that contained the right characters arranged in the right order to trigger a cross-site scripting (XSS) attack against users of the service's API. The company in question, registered number 12956509, was originally signed up with the UK's official company …
A woman in a prestigious family owned a racehorse called "Fanny"
I recall a radio announcement: "We apologise for an earlier announcement in our racing summary when we stated that Lady Argyle's "Fanny" had been scratched. That was incorrect. We have since learned that Lady Argyle's "Fanny" had not been entered."
Like Companies House, the various national registration bodies for racehorses also ensure that nothing naughty is allowed.
Occasionally one slips through, like the Aussie nag "Hoof Hearted". Innocently horsey enough, until you shout it rapidly and repeatedly like, for example, a commentator as it approaches the line in first place...
>Why any system tries to process any data without sanitizing it is beyond me.
Especially if that data originates from outside the organisation.
I was unpopular back in 2005, when implementing a B2B gateway, for insisting on having an "application firewall", i.e. an appliance that did "deep packet" inspection of received XML files/streams.
Well, yeah, because that's a bad idea and doesn't work. You need to be sure that the system that actually USES the XML is parsing it correctly, including doing complete validation on the syntax AND the structure AND the values parsed out of it. Oh, and Postel was wrong. The right aphorism would be "Be conservative in what you generate, and absolutely inflexible in what you accept". Any deviation from the expected protocol should automatically be a fatal error.
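For what it's worth, here is a minimal sketch of what "absolutely inflexible" could look like inside the consuming application itself. The order format (an `order` element holding exactly one `sku` and one `quantity`), the value rules, and the names `parse_order` and `RejectedMessage` are all made up for illustration; they are not from any real B2B schema. It uses Python's standard-library parser, and the Python docs warn that the stdlib XML modules are not hardened against maliciously constructed data, so treat the parser choice as an assumption too.

```python
import re
import xml.etree.ElementTree as ET


class RejectedMessage(Exception):
    """Any deviation from the expected (hypothetical) order format."""


def parse_order(xml_text: str) -> dict:
    # 1. Syntax: the input must be well-formed XML.
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        raise RejectedMessage(f"not well-formed: {exc}") from exc

    # 2. Structure: exactly <order><sku/><quantity/></order>, nothing extra.
    if root.tag != "order" or root.attrib:
        raise RejectedMessage("unexpected root element or attributes")
    children = list(root)
    if [child.tag for child in children] != ["sku", "quantity"]:
        raise RejectedMessage("unexpected child elements")
    if (root.text or "").strip():
        raise RejectedMessage("stray text inside <order>")
    for child in children:
        if child.attrib or len(child) or (child.tail or "").strip():
            raise RejectedMessage(f"unexpected content around <{child.tag}>")

    # 3. Values: check every parsed value against an explicit rule.
    sku = children[0].text or ""
    quantity_text = children[1].text or ""
    if not re.fullmatch(r"[A-Z0-9]{1,12}", sku):
        raise RejectedMessage("bad sku")
    if not re.fullmatch(r"[1-9][0-9]{0,3}", quantity_text):
        raise RejectedMessage("bad quantity")
    return {"sku": sku, "quantity": int(quantity_text)}
```

Anything that is not exactly the expected message raises `RejectedMessage`; there is no "best effort" path that tries to salvage a slightly odd document.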
If you try to assure correctness with some kind of outboard filtering hack, you open yourself up to differences between the way the filter parses it and the way the actual application parses it (cf Sassaman, Patterson, Bratus, and Shubina: "The Halting Problems of Network Stack Insecurity"). AND you add attack surface; most of those "application firewalls" are full of dangerous security bugs themselves. AND you create an interdependency that makes upgrades dangerous or impossible. AND you complicate your network so you're more likely to make a fatal mistake.
A filter like that might work as a band-aid on a known bug while the application is being fixed, but in practice they're always expected to deal with ALL bugs, known and unknown. They're invariably used as an excuse never to fix the application, and in fact never to demand that the application be correct in the first place... as well as deterring routine updates of the application or its components, leading to ever-growing technical debt. Essentially no organization has the discipline to avoid this, and it predictably leads to failure.
If you let such a device into your network, you may avoid some immediate problems, but you're setting yourself up to lose really big at some unpredictable future time... in a way you won't be able to recover from because you've made everything overcomplicated, created too many dependencies, and let too many things deteriorate.
If you can't create an application that won't choke on bad XML, then you shouldn't use XML.
While not disagreeing with the idea that the ultimate consumer needs to be robust, I can see an argument for a layered defence, with each layer developed independently, and I can see an argument for a conceptually single layer being implemented in practice as multiple boxes each doing one thing well.
Your initial premise is that when you provide filtering in the firewall you use a different XML parser to your business layer. Any system that includes multiple components to do the same job has issues, wherever they are deployed. The firewall is a bit of a red herring in your example; it's the conflicting parsers that are the root of the problem.
While I appreciate the point you make, this isn't as clear-cut as you describe.
>If you can't create an application that won't choke on bad XML, then you shouldn't use XML.
From the evidence I've seen over the years, I would agree - the majority of XML developers need to be taken outside and shot!
Now what to use instead of XML..
>you open yourself up to differences between the way the filter parses it and the way the actual application parses it
That's life in the absence of conformance testing! You need to look no further than reading a DOC/ODF file in MS Office and LibreOffice and their differing levels of feature support.
The problem here isn't so much to assure correctness as to guard against malformed or, as we have here, mischievous XML.
With B2B interfaces both the XML forms and their syntax and semantics should be well defined, so all you are loading into the XML checker is the schema. It's basically what we did with EDIFACT and ASN.1, i.e. standard defensive programming practice...
In this case it also forced both the business and software developers to actually document the XML form (which takes thought and time) rather than simply embed their assumptions in the code. A side effect of this is to introduce a basis for coding efficiency and trust.
An additional problem in the B2B space is the acceptance handshake: the broker will deem a form (e.g. an order) correctly delivered on successful transfer to a third-party system which, due to normal DMZ firewalling, isn't typically the end system that will be processing the XML form. Being able to flag that certain forms are incorrectly completed prevents the broker from raising a 'success' flag (and starting an order fulfilment countdown clock) and will/should cause it to report the failure back to the originator. The issue is generating the error report at the right application level so that, in this case, the business user placing the order knows their order has failed. In this instance the use of a gateway was deemed appropriate as the receiving XML applications were part of the white-label infrastructure underpinning about 30 different websites, so the extra overhead was deemed preferable to having a malformed XML file take the whole show down...
>Why any system tries to process any data without sanitizing it is beyond me.
Because it's harder than it looks. It's got to the point that if anyone makes it out to be easy I automatically assume they don't know how to do it.
Three points come to mind right away:
What do you mean by "sanitise"? What is potentially hazardous varies with context and sometimes even with library versions or settings. Consider the various forms of regular expression, for example. Or that a string safe in SQL may not be safe in shell.
It is another case where dynamic typing is evil. "Quote all strings, job done" doesn't cut it when you have what should be an integer but it's easy to put a string in its place.
It shouldn't be necessary in any event, it's largely a manifestation of the "let's cobble something together" approach to development and the consequent design of languages and approaches that rise to prominence as a result.
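To make the first two points concrete, here is a small sketch using only the Python standard library. The sample string and the `parse_quantity` helper are invented for illustration. The same value needs a different treatment for each context it ends up in, and a field that should be an integer is better converted than quoted.

```python
import re
import shlex

value = "it's *.txt; rm -rf ~"

# Each context has its own idea of what is hazardous, so each has its own escaper.
for_shell = shlex.quote(value)  # one literal word for a POSIX shell
for_regex = re.escape(value)    # a literal inside a regular expression

print(for_shell)
print(for_regex)


def parse_quantity(text: str) -> int:
    """Hypothetical helper: accept only 1 to 6 ASCII digits, then convert."""
    if not re.fullmatch(r"[0-9]{1,6}", text):
        raise ValueError(f"not an integer: {text!r}")
    return int(text)
```

The two escaped forms are different strings, and neither is appropriate in the other's context. `parse_quantity("1 OR 1=1")` raises `ValueError` instead of passing a quoted string along where a number was expected.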
The best approach is architectural, ensuring data and commands simply can't be confused - it tends to be a non-issue in compiled languages for example, but even there you need to be careful when other "languages" are introduced perhaps without considering them as such, e.g. SQL or even REs. Alternate approaches are similarly structural such as parameterised SQL. The final, least preferred option is to render the data inert on use, such as quoted strings in shell.
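A minimal sketch of the parameterised-SQL option, using Python's built-in `sqlite3` module and an in-memory database. The table name and the sample value are made up for illustration. The point is that the value travels separately from the statement text, so it is never parsed as SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT)")

# A hostile-looking value, stored exactly as typed.
name = "Robert'); DROP TABLE students;--"

# The "?" placeholder keeps data and command apart; no quoting is done by hand.
conn.execute("INSERT INTO students (name) VALUES (?)", (name,))

rows = conn.execute("SELECT name FROM students").fetchall()
print(rows)
```

The table survives and the row comes back byte-for-byte as entered, with no escaping logic in the application code.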
Companies Ho. doesn't actively police this sort of thing. It's triggered by complaints. Taking data from them without sanitising it is just plain stupid.
Notably, there are no restrictions whatsoever* on what can be registered as a company address. There is nothing stopping you naming your building (or business unit, office, etc) after Bobby Tables.
*Apart from those the Post Office has about addresses in general.
Years ago I went through a phase of adding the first line of address to mailing lists as 'The Bin'. I got junk mail for years after addressed to 'The Bin, 1 The Mall, SW1A 1AA'. (That's Buck House. Not my real address ;)
The Business Names Act (and possibly other peripheral legislation) restricts or prohibits certain choices of company name, but not on the grounds discussed in this piece. Mostly to protect government and infrastructure services or to avoid misrepresentation of scale or scope of businesses.
Don't start me about address formats!
The number of websites I encounter which refuse to let me enter my address properly is infuriating!
It really is not at all a rare thing for hundreds of thousands, if not millions, of people to live in flats, which often have addresses like:
23/4 Apartment Street
42 Tenement Road (3/2)
...but there are so many websites which throw up their hands in horror at the thought that your address includes a slash or brackets, or whatever other character you have tried to substitute instead. Yeah, you're trying to prevent Bobby Tables from doing his stuff, but outright mindlessly rejecting valid and essential characters in people's addresses is not how you do it! They should be sanitising input and protecting database statements properly instead.
Fortunately, this is less of a problem now that more websites do address lookup directly from the Royal Mail address database (apart from where the Royal Mail has an official numbering system for flats (usually 42/8 format), but the local council completely ignores it (more often, 42 blah (3/2))). One of these days we'll get this right, and get everyone to agree, hopefully, eventually…
Rule #1: Use whitelists, not blacklists.
Rule #2: Use well-validated libraries, not home-spun character escapes.
Rule #2 is almost as fundamental as "don't spin your own crypto primitives".
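A rough sketch of Rule #1 applied to an address line, in Python. The allowed character set and the 100-character limit are assumptions for illustration (real address data may well need more), and `is_acceptable_address_line` is a made-up name. The whitelist explicitly includes the slash and brackets that flat numbers need, rather than blacklisting "scary" characters.

```python
import re

# Assumed whitelist: letters, digits, underscore, spaces and the punctuation
# commonly seen in flat and building numbers. \w matches Unicode letters too.
ADDRESS_LINE = re.compile(r"[\w .,'/()#&-]{1,100}")


def is_acceptable_address_line(line: str) -> bool:
    """Return True only if every character is on the whitelist."""
    return ADDRESS_LINE.fullmatch(line) is not None


for candidate in ["23/4 Apartment Street", "42 Tenement Road (3/2)", "<script>alert(1)</script>"]:
    print(candidate, is_acceptable_address_line(candidate))
```

A check like this only decides what counts as a plausible address; it does not replace proper encoding or parameterised statements when the value is later used.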
As maddening as it is when a website won't accept valid inputs, I must credit them with TRYING. Then attempt an out-of-band sales pitch to help them get it right.
That would be a creative blurring of meanings. But even a tech-phobic judge would struggle to imagine a computer becoming upset (entering an emotional state) upon reading a piece of gibberish. (Actually, that argument might have a better chance if the computer threw an exception - that's the best parallel to an upset human.)
You could maybe argue the name is deceptive. But the best bet might be for Randall Munroe to assert his copyright - then they might get in serious legal hot water.
You aren't allowed a business name such that someone could commit a crime by uttering it aloud, so it would not be a massive stretch to block a business name that someone could commit an offence under the Computer Misuse Act by entering into a computer.
On the other hand, a Companies House employee entering a name into a Companies House computer in the course of their regular employment probably would not be unauthorised, so there still might not be any offence committed.
And if we do end up with a new law to prevent this, it's almost certain to be unfit for purpose .....
Thanks be to Cory Doctorow, Randall licences his work under CC BY-NC 2.5.
Frequently missed, but you'll find it just underneath his advice that reads:
I think that company name should be perfectly valid (along with any other name that is not illegal under the very limited rules).
If Companies House, and their APIs, cannot handle suitable quoting schemes then they should fix them PDQ.
If people download the data and can't handle suitable quoting then they don't deserve to be in whatever business they are in and probably have MUCH more serious problems handling other data. Like personal data.
There seems to be a tendency for companies to "validate" fields and reject the user's input, rather than handle it. It isn't really that hard to handle arbitrary input -- many, many suitable escaping formats exist -- and companies which don't do it probably aren't handling other IT processes correctly either.
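As a sketch of "handle it, don't reject it", in Python with only the standard library: the name is stored unchanged and encoded for each output context at the point of use. The company name below is a hypothetical example, not a real registration, and the `<td>` wrapper is just an illustrative output context.

```python
import html
import json

# Hypothetical company name containing markup.
company_name = '"><script src=//example.invalid/x.js></script> LTD'

# HTML context: escape the markup-significant characters.
as_html = "<td>" + html.escape(company_name) + "</td>"

# JSON context: let the JSON library do the quoting.
as_json = json.dumps({"company_name": company_name})

print(as_html)
print(as_json)
```

Note that the JSON string is only safe as JSON. Pasting it into an HTML page is a different context again and needs its own encoding.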
Am I missing something?
You don't leave knives or matches lying around in a playground. Although it's not explicitly prohibited by law, you would be liable for the consequences.
Similarly, Companies House should be free to reject anything potentially problematic. FCUK* the companies trying to be "clever".
* French Connection-UK, who pushed the limits in their advertising.
Currently in the UK, venues are being encouraged to put up QR codes relating to Track and Trace. This presumably normalises having codes scattered around the public environment so I expect we already have a few "hostile" QR codes put up by mischief makers.
Still waiting for the Banksy take on this though.
There was a bloke in the US who thought that using NULL and VOID as his and his missus's license plates would get him out of any traffic violations forever.
Turns out that EVERY traffic violation automatically registered against vehicles such as ambulances and police cruisers, and then declared NULL or VOID by humans, was then addressed TO HIM.
He ended up with over $34,000.00 worth of traffic violations in less than one month or something to that effect.
Nice plot twist there!
ISTR that there was some issue with Police Officers reporting Polish drivers for offences and putting what the Polish is for "Driving Licence" on the report as the driver's name.
Ah here we go, it was Irish police hunting the serial offender Mr Prawo Jazdy of many and multiple unique addresses.