MWN?
I thought I'd stumbled upon Mac Weekly News. It's like LWN but with more colour.
Nice article - proper nerdy.
There has been much sniggering into sleeves after wags found they could upset iOS 6 iPhones and iPads, and Macs running OS X 10.8, by sending a simple rogue text message or email. A bug is triggered when the CoreText component in vulnerable Apple operating systems tries to render on screen a particular sequence of Unicode …
> Nice article - proper nerdy.
The Williams[*] is back! (Back to writing articles at least, not sure if he ever went away completely). This is excellent news.
By the way, I had no idea Mr. Williams was so technically competent, in addition to his pathologically thorough journalism.
[*] The one and only. One of very few who deserve to be called a Journalist, feared all the way from BT offices to Ofcom corridors.
Chris Williams, the author of this article, works on The Register's sub desk.
Chris Williams, who wrote all the Phorm articles, went to work for The Telegraph, and now has the byline "Christopher Williams".
To avoid confusion between the two we changed all bylined articles on The Register by Chris Williams the First to "Christopher Williams".
Agreed. Really good to see an article like this on el Reg.
Unicode is notoriously difficult to get right, so I have sympathy for the Apple developer w̻̔̽ͯ̄͒́̎ͅh̻̰̭̗̣̪̩͗̎ͯͣ͆̓o̬̱͚̟̹͉ͦͥ̔̈́̓ͨ͋ ͤͤg̭̩̲̍͐ͣ̈́̆͗ͅͅǫ̐ͥͬͣ̀̿̂t͚̤̙̠̫̐̌̾̉̽ ̫̳̫̈̅̍͗̑ṱ̴͎̲͇̯͉̖̊ͤ̈͐ͬḧ̤̳̭̠͉̱͌ͬ͞i̜̺͓̞̳̓̉̓ş͔̩̲͙̤̺ͬ̆̉̂ ̲̭̍̑̉̉̄̆ͫ͞wͬr̛͖̭͎͉̪ͬ͂ͩͥ̚o̢̰͉͙͇͖ṅ̌g҉̫͕̺
I'm not a Mac owner or user, so I don't have access to one, and I have no familiarity with Apple's coding/decoding tools, but that debugger looks quite nice. In my time, oh so many years ago, I used SoftICE, which undoubtedly some of you know.
That's the first time that I have seen any reverse engineering for quite some years and it brings back some nice memories. All-nighters trying to determine where program logic could be "modified" ever so slightly, for a variety of reasons ;-) ;-) ;-)
That's the first time that I have seen any reverse engineering for quite some years
Plenty of this sort of reverse engineering (typically of similar crashes) appears in messages posted to software-security lists like BUGTRAQ and VULN-DEV. Prominent examples include Tavis Ormandy's 2010 explanation of the Windows #GP Trap Handler bug, and the flurry of Java vulnerabilities documented by Security Explorations / Adam Gowdiak last year.
Incidentally, in the article Chris writes "it's mighty hard to leverage an end-of-array read fault into something more serious". Rather overstated, I'm afraid; exploiting integer over/underflow and OOB access is of course a time-honored practice[1], and memory-access violations (SIGSEGVs on *ix systems, exception 0xc0000005 on Windows, etc) in particular have some well-known exploit vectors. Typically those require a second vulnerability to be anything more than a DoS - for example, a vulnerability that lets the attacker alter trap handling for the process - but "mighty hard" is not warranted.
[1] That's why it's Sin #3 in The 19 Deadly Sins of Software Security, which should be mandatory reading for anyone who writes code.
Seems to me that CTRunGetGlyphCount is returning -1 to indicate an error and the bug is simply that this error condition is not caught and handled. (Probably because these days programmers use Exceptions too much and have forgotten that return values can also be used to indicate problems.)
I thought that, too, but Apple's definition of CTRunGetGlyphCount() is clear: "The number of glyphs that the run contains, or if there are no glyphs in this run, a value of 0."
It's not impossible that Cupertino's manual doesn't match operation, of course, but the code for CTRunGetGlyphCount() is pretty simple: it returns 0 or a pre-calculated count for the run. It's probably within the glyph run creation.
C.
Right, but the "-1 = error" logic seems to hold: probably a miscommunication between programming teams. One team expects a string-handling function to always be successful (hey, after all, how hard is it to parse a string?), while the other team, who know their Unicode-fu rather better, are aware that an error condition *can* easily be reached with invalid strings.
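A minimal sketch of how that miscommunication would bite, assuming the count really could come back as -1 (only CTRunGetGlyphCount, CTRunRef, CFIndex and CGGlyph are real CoreText names; the caller is made up):

#include <CoreText/CoreText.h>  /* CFIndex is a signed long; CTRunGetGlyphCount() returns one */
#include <stddef.h>

/* Hypothetical caller, not Apple's actual code: a stray -1 becomes an
   enormous unsigned loop bound and the read runs off the end of the array. */
static void render_run(CTRunRef run, const CGGlyph *glyphs)
{
    CFIndex count = CTRunGetGlyphCount(run);   /* documented to return 0 or more */
    size_t n = (size_t)count;                  /* if count were -1, n is now SIZE_MAX */
    for (size_t i = 0; i < n; i++) {
        CGGlyph g = glyphs[i];                 /* eventually faults */
        (void)g;
    }
}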
> ... The question remains: Why are they using signed integers to represent a quantity that can never be <0 by definition ?
It has probably been implemented by someone used to returning negative values from a function to indicate errors, although that appears not to be the case here. Perhaps the type is used elsewhere where this is done.
Personally, if I see this in code that I'm reviewing, I insist it be changed. It is an abomination. Just like 'char' being signed by default in most C compilers, it makes no logical sense and is a hanger on from less enlightened times.
This post has been deleted by its author
I think you are looking at this from the wrong starting point. It's poor practice to mix data and control/status in-band within the same variable; i.e. a function that normally returns the result of a computation should not return an invalid value to indicate failure. Better to call that function with a pointer to the output, then use an enumerated status type to return success/fail. One of the fundamental things about programming is being fussy about data typing, and for embedded work at least you nearly always need to know and track the size and type of variables.
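Something along these lines (names and details made up, just to show the shape of it):

#include <stddef.h>

typedef enum { STATUS_OK, STATUS_INVALID_INPUT } status_t;

/* Illustrative only: the result goes out through a pointer, and the return
   value carries nothing but success/failure. */
static status_t count_chars(const char *text, size_t *out_count)
{
    if (text == NULL || out_count == NULL)
        return STATUS_INVALID_INPUT;

    size_t n = 0;
    while (text[n] != '\0')
        n++;

    *out_count = n;
    return STATUS_OK;
}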
What was the title of that old s/w engineering book?
Algorithms + Data Structures = Programs :-).
Fwir, a bit anal in its approach, but can't argue with the title....
Chris
"There is a perfectly valid case for returning -1 for an error in a function that should always return zero or positive from normal processing, rather than throwing exceptions, but that valid case assumes that you are using a language which does not have unsigned numbers."
Given the documentation, the conventions followed by the sibling functions, and the nature of the bug, that case definitely does not apply here. The code is broken in this case.
That said, occasionally I make use of returning magic numbers myself, but I prefer to throw an exception if possible. Sometimes I am forced to revert to the old C style where you return an OK/FAIL code and write the results via a pointer argument. Mixing exception conditions with results is asking for trouble in my experience.
The docs say: it returns "The number of glyphs that the run contains, or if there are no glyphs in this run, a value of 0."
I expect it's something along the lines of:
charcount=0
here=0
repeat
  if char[here]=DELETE then charcount=charcount-1 else charcount=charcount+1
  here=here+1
until here=length
and then it's asked for the length of the string "DELETE". I remember in my Software Engineering uni course 25 years ago several people getting caught by this.
You can start by flooding the sender of the offending text with clean texts; this will push the naughty text off the screen. I guess that you will have problems if you try to go through the text history, though.
You could try asking Apple, but they (still) haven't publicly acknowledged the bug, and accordingly aren't offering public advice.
Unsigned integers are best avoided in C (and C-derived languages like C++ and Objective-C) because they are contagious. For example, (1u - 2) is not -1 as you might expect, but some huge number.
It's also useful to have -1 available to represent "no such number"; for example, the length of a file that doesn't exist. Using 0 would be wrong because it's a legitimate value. More generally, having invalid representations adds redundancy which can help error checking.
"For example, (1u - 2) is not -1 as you might expect, but some huge number."
True, but that's a gap in the language and the coder should be aware of it.
"It's also useful to have -1 available to represent "no such number"; for example, the length of a file that doesn't exist."
The length of a file that does not exist should never be a concept you deal with. The program should have already done an exists(filename) call before even thinking about asking for the length of the file (not to mention stat or something). That sort of code is like something I would have written in BBC Basic 30 years ago. We should have moved on in our coding styles, even if still using languages like C which allow it.
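On a POSIX system it's a single stat() call anyway; existence and size come back together and no magic -1 length is needed. A rough sketch (the wrapper is my own invention):

#include <stdbool.h>
#include <sys/stat.h>

/* Illustrative wrapper: returns true and fills *size only if the file exists. */
static bool file_size(const char *path, off_t *size)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return false;          /* no such file, or not accessible */
    *size = st.st_size;
    return true;
}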
Well, it might be. Consider the following:
unsigned int ui = 1U - 2;
signed int si = 1U - 2;
printf( "Unsigned : %u %u\r\n", ui, si );
printf( " Signed : %d %d\r\n", ui, si );
This gives:
Unsigned : 4294967295 4294967295
Signed : -1 -1
Which shows that the context is important, not just the operation.
It's all to do with the (complex) integral promotion rules within the language. Another trap for the unwary is that:
a + b + c
may not give the same result as
c + b + a
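A contrived but compilable illustration of that, assuming a 32-bit int and a 64-bit long long:

#include <stdio.h>

int main(void)
{
    unsigned int a = 1u;
    int          b = -2;
    long long    c = 0;

    /* (a + b) happens in unsigned int, wrapping to UINT_MAX before c is added */
    printf("a + b + c = %lld\n", a + b + c);   /* 4294967295 */

    /* (c + b) happens in long long, so the -1 survives */
    printf("c + b + a = %lld\n", c + b + a);   /* -1 */

    return 0;
}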
"It's also useful to have -1 available to represent 'no such number'; for example, the length of a file that doesn't exist. Use 0 would be wrong because it's a legitimate value. More generally, having invalid representations adds redundancy which can help error checking."
Wouldn't this be a good case to use the humble "null" value rather than resorting to signed integers?
Wouldn't this be a good case to use the humble "null" value rather than resorting to signed integers?
There is no "null" value at this low level of coding. What is treated at higher levels as null is generally represented at this level by a separate bit in a bitmap or a value outside the acceptable range.
"Tsk! Baby out with the bathwater again, youth of today, general lack of wherewithal, three world wars, shrapnel in the head, wouldn't've happened in my day, flogging too good for them etc etc."
E by gum, we used to have to lick out pond etc ad nauseam :-)
Nothing wrong with the youth of today. They are just as rebellious and arrogantly (un)sure of themselves as we all were at that age :-).
The problem is that programming is now seen as easy, when it never was, and still isn't, easy to design and build robust systems. While any fool can write a two-page utility that works, designing systems takes knowledge of data structures, algorithmics, and hardware capabilities and limitations to get the best results. I wonder how many web programmers you meet these days have any clue about what goes on under the hood in the hardware, or have ever read a book on data structures or operating system principles?...
Chris
Don't use one parameter to mean two (or more) different kinds of things.
That's one of my golden rules. I'm also a firm believer in syntactic salt. Combining two things into one is 'syntactic sugar' - it's more convenient for the developer. I actually avoid returning errors when I can. Most languages allow you to ignore returned values. I can't force developers to use an error code returned by reference, but I can bloody well force them to declare a variable to store it, so at least they can't claim they didn't notice it :)
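In practice it ends up looking something like this (function and names invented for illustration; the point is that the caller has no choice but to declare somewhere for the error to land):

#include <stddef.h>

typedef enum { ERR_NONE, ERR_BAD_INPUT } err_t;

/* Illustrative only: the error comes back through a mandatory parameter,
   so every caller has to declare an err_t of its own before calling this. */
static size_t parse_count(const char *s, err_t *err)
{
    if (s == NULL) {
        *err = ERR_BAD_INPUT;
        return 0;
    }
    *err = ERR_NONE;
    return 42;   /* stand-in for the real parsing */
}

/* Callers end up looking like:
       err_t err;
       size_t n = parse_count(input, &err);
*/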
"The compiler should really kick up a fuss for implicit casts from signed to unsigned. If there is an explicit cast then the programmer is due a spanking."
The compiler only knows what it's told. If data is coming in from outside then it is relying on the coder to correctly define what the format is.
Ime, it's typical of modern programming idiom in some areas, where people are sloppy about typing and sloppy about algorithm design, or just plain cut and paste bad code from elsewhere. K&R and other programming texts actually encourage this by using signed types (char, int etc) as the default, but signed types should only be used where actually needed, i.e. for arithmetic ops, with unsigned the default everywhere else. It's the same story with the original C standard library, which tended to use (signed) int or char for everything. There, the function typing is a real mess, but may have a historical-context excuse...
Back in the days of assembler, it was quite common to find bugs, e.g. a compare or increment/decrement op followed by a signed branch when processing unsigned values. The bug was perhaps discovered much later, when the value flipped the overflow or sign bit. If you do a lot of embedded work, failure to use the correct type is something you ignore at your peril, and standards like MISRA specifically warn against this sort of thing.
Programming != Software engineering :-)...
Chris
"Programming != Software engineering"
True, but when I was a Cobol database programmer we knew enough to guard against this stupidity when zipping in and out of separately collected subroutines via the Linkage Section.
Tsk! Baby out with the bathwater again, youth of today, general lack of wherewithal, three world wars, shrapnel in the head, wouldn't've happened in my day, flogging too good for them etc etc.
Talking of VMS, and VT100 terminals, reminds me of much fun in the old days. I remember a certain PAD that would drop the connection if sent something like ZZZZZ (as well as certain control sequences). People used to put them everywhere, including in nicknames and in finger IDs on the unix boxes. If you had hacked, er, appropriated someone's account, you might set it as their prompt too.
Oh, it's good to see how far we have come.
VT132s were fun; they had a "readback screen" escape sequence. If you crafted the right string in mail you could write something like
DELETE<CR>EXIT<CR> SYS$SYSTEM:SHUTDOWN<CR> to the system admin's terminal, and instruct the terminal to play the string back. By the time he had rebooted the system all the evidence had gone...
Right, killer sequences... "ZZZZZ as well as certain control sequences", you mention... Well, there was this Hayes patent on inserting a "two second pause" between parts of a command to make their modems not hang up on the (otherwise valid) +++ATH0\n sequence. Of course, most manufacturers did license this, and several manufacturers found ways not to... But I do remember seeing a modem that could be pushed to hang up this way. Very fun for us BBS (ab)users! :)
Given all that "Premium product, because it just works" BS, I'd expect them to a) issue a bug fix, b) scan all their code to find similar code sequences, and c) update those as well.
IOW not just the bug, but the pattern of the bug.
Let's see if we revisit this gag in the future: different string, different function, same epic fail.
Should not be possible, and yet......
Excellent article and very detailed low level stuff.
It's actually a frustratingly easy mistake to make with Apple's APIs — those CFIndexes pop up in quite a few places — and Xcode ships with the implicit signedness conversions warning disabled. It's one of the things I always enable when I'm starting a new project. Just enabling that would probably help them catch stuff like this.
That said, if it's a latent problem in the initial table setup then the true diagnosis is probably that whatever is supposed to guarantee the signed value is always positive needs fixing, so they'd probably just have thrown in an explicit cast and forgotten about it.
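For reference, the flag behind that Xcode setting is clang's -Wsign-conversion, if I remember rightly, and it flags exactly this shape of code:

/* sign.c -- compile with:  clang -Wsign-conversion -c sign.c */
#include <stddef.h>

typedef long index_t;            /* stand-in for CFIndex, which is a signed long */

size_t as_size(index_t count)
{
    return count;                /* warning: implicit conversion changes signedness */
}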
Could have been the unicode bug in all versions of NT4/IIS up to service pack 6a.
I remember a friend telling me how awesome the BackOffice server he'd set up for his boss was. I told him to get that shit off the Internet now. He refused.
An hour later I came back to him with a printed directory listing of the server's hard drive and pointed out that I could execute a "format c: /y" just as easily.
Within an hour, the server had been firewalled.
quote: "An hour later I came back to him with a printed directory listing of the server's hard drive and pointed out that I could execute a "format c: /y" just as easily."
And if you did that today, you'd be arrested for unauthorised interference with a computer system, an offence under the Computer Misuse Act 1990. In fact, since you're talking about NT4, it would have been a criminal act at the time... hopefully you didn't get collared for the 12 to 24 months' custodial sentence you could have been liable to.
Interesting thought, that: using a flaw to point out the flaw is a criminal offence, yet a failure to act (failing to secure the system after being notified of the flaw) is deemed only a civil offence...
This post has been deleted by its author
This is such a fundamentally elementary bug that the fact it ended up in OS X and the iPhone product lines is, to me at least, inconceivable! Truly incompetent! When dealing with buffer sizes/lengths, one NEVER uses signed variables, for just this reason... A true FAIL moment for the Apple software team!
FWIW, I have been writing software for large-scale systems for 30+ years. I am a senior systems engineer for a tier-one hardware/software manufacturer. And I was writing software to support Unicode back in the late 1980's when it was still in the development stages.
If I understood the article, it sounds like it's meant to be calculating something related to the string width, and getting that wrong because by passing from one place to another it interprets a glyph count of -1 as a glyph count of UINT64_MAX.
So if you're asking why floating point arithmetic is used, it's because fonts are designed with floating point arithmetic and rendered with floating point arithmetic. The OS X graphics system uses the same drawing primitives as PDF and Postscript.
I'm sure the bug could be triggered by any uneven number of characters. I've not had a full chance to play, but my assumption is they're using the direction of the text to mess up the counter: two forwards, three back. Personally I'd compute the LTR direction, then the char direction (i.e. for the backspace char).
No, because the function would appear to get the direction right for a single character or any group of characters with the same direction.
You'd need an initial set of characters going one direction, and then a wider reversing set of characters going the other direction.
If we're talking proportional fonts and actual pixel width rather than character width, it could be done with one narrow LTR character followed by one wider RTL character.
Otherwise, you'd need one more RTL character than LTR (or vice versa if you started with the RTL character).
If this were the case, the correct answer would be either
a) the absolute value of the result, or
b) the sum of the absolute values of the character widths in each direction
depending on whether the desired result is overlapping characters or each set of characters rendered side-by-side.
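To put toy numbers on it (invented widths, nothing to do with the real font metrics):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* One LTR advance of +2 units, one RTL advance of -3 units (invented figures) */
    int widths[] = { 2, -3 };
    int sum = 0, sum_abs = 0;

    for (int i = 0; i < 2; i++) {
        sum     += widths[i];
        sum_abs += abs(widths[i]);
    }

    printf("naive sum          : %d\n", sum);        /* -1 : the suspect value      */
    printf("a) |sum|           : %d\n", abs(sum));   /*  1 : overlapping characters */
    printf("b) sum of |widths| : %d\n", sum_abs);    /*  5 : rendered side by side  */
    return 0;
}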
This little exploit rang a bell, so I searched Bruce Schneier's website. And, sure enough, on July 15, 2000, he observed "Unicode is just too complex to ever be secure." See https://www.schneier.com/crypto-gram-0007.html#9. Doesn't exactly warm the cockles of the paranoid's heart.