Oh yes, a "developer" argument
find, grep, and make are usually I/O bound. And single-threaded too, except for make if you ask it nicely (make -j).
If you care about quick compiling then C++ isn't the best choice; it's even worse than C, and C was never quick to compile either, which is why back in the day "we" used Turbo Pascal: poor generated code but lightning-quick compiling, and a reasonable dev environment that fit on a 360kB floppy. It didn't need tons of include files either, saving big on I/O again. C, by comparison, has always been rather slow and bloated, and survived by the grace of efficient pipe implementations on unix.

Even so, code quality is not that important for, say, 90-odd percent of the code (Knuth puts it at 97%), and the last bits you attack first with the right algorithms, then with optimisation, and then with assembly. For numerical work you can often just pick the right library, which might be written in something else entirely, like FORTRAN. Though personally I start by organising my programs so that less needs to be done overall; only after that do I worry about whether clever algorithms are needed, and only then try to pick the best one.
But I digress. Really, you may get a bigger speed boost from a decent SSD in your development box than from switching CPU architectures.
What I've heard is that the biggest problem in finance is the incessant time stamping, meaning lots and lots of syscalls: millions of calls per second to time(). Yes, very smart, that. Just batch up a thousand transactions, push them all through, and timestamp the batch before and after. If both stamps come out the same, fine. If not, well, what chip have you got in there, a dorito? Either record the difference or mop up the damage if needed. Though the high frequency crowd will have moved to something with greater precision by now. Wonder if they know the difference between accuracy and precision.
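To sketch that batching idea in C; trade_t, BATCH, and process() are made-up placeholders here, not anybody's actual trading code:

    /* A sketch of batched timestamping: two time() calls per thousand
     * transactions instead of one per transaction. trade_t, BATCH and
     * process() are hypothetical stand-ins, not anyone's real API. */
    #include <stdio.h>
    #include <time.h>

    #define BATCH 1000

    typedef struct { long id; time_t stamp; } trade_t;

    static void process(trade_t *t) { (void)t; /* business logic goes here */ }

    static void stamp_batch(trade_t trades[BATCH])
    {
        time_t before = time(NULL);            /* one call for the whole lot */
        for (int i = 0; i < BATCH; i++)
            process(&trades[i]);
        time_t after = time(NULL);             /* second and last call */

        if (before == after) {
            /* the whole batch fell within one tick: stamp them all alike */
            for (int i = 0; i < BATCH; i++)
                trades[i].stamp = before;
        } else {
            /* the tick rolled over mid-batch: record the spread or mop up */
            fprintf(stderr, "batch straddled %ld..%ld\n",
                    (long)before, (long)after);
            for (int i = 0; i < BATCH; i++)
                trades[i].stamp = after;       /* conservative: later bound */
        }
    }

    int main(void)
    {
        static trade_t trades[BATCH];
        for (int i = 0; i < BATCH; i++)
            trades[i].id = i;
        stamp_batch(trades);
        printf("first stamp: %ld\n", (long)trades[0].stamp);
        return 0;
    }

Swap time() for clock_gettime(CLOCK_REALTIME, ...) and the same trick hands out nanosecond-resolution stamps; that buys you precision, whether the clock is actually right is accuracy, and that's a different fight.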
As mentioned elsewhere in the discussion, there's quite a difference between single-threaded throughput and the ability to run lots and lots of threads concurrently. x86 does reasonably on the former, not so much on the latter; for sparc it's the reverse.
If you still believe more clockticks per second is heaven, I don't understand why you're even bothering with x86. Financial outfits, one expects, have the budget for POWER, which comes with a decent sack of threads too. Now to make use of them.
To illustrate: Suppose, hypothetically, and ignoring all sorts of important factors, you have an x86 that can run ten million trades a second per thread, and since it has four cores it can run forty million trades all told. And suppose you have a sparc that can only run five million trades a second per thread, but it runs eight threads per core and comes in packages of eight cores. That's 8 x 8 x 5 million, or three hundred twenty million trades a second, for a dead slow sparc. Whoo.
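Making use of them is the usual pthreads affair; a toy sketch, where NTHREADS, TRADES_PER_THREAD and process_trade() are made-up placeholders rather than anyone's actual engine:

    /* A toy pthreads sketch of spreading trades over many hardware
     * threads. NTHREADS, TRADES_PER_THREAD and process_trade() are
     * made-up placeholders. Compile with -pthread. */
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define NTHREADS 64                 /* e.g. 8 cores x 8 threads on a t2 */
    #define TRADES_PER_THREAD 1000000L

    static void process_trade(long id) { (void)id; /* placeholder work */ }

    static void *worker(void *arg)
    {
        long base = (long)arg * TRADES_PER_THREAD;
        for (long i = 0; i < TRADES_PER_THREAD; i++)
            process_trade(base + i);
        return NULL;
    }

    int main(void)
    {
        pthread_t tids[NTHREADS];

        for (long t = 0; t < NTHREADS; t++)
            if (pthread_create(&tids[t], NULL, worker, (void *)t) != 0) {
                perror("pthread_create");
                return EXIT_FAILURE;
            }
        for (int t = 0; t < NTHREADS; t++)
            pthread_join(tids[t], NULL);

        printf("processed %ld trades across %d threads\n",
               NTHREADS * TRADES_PER_THREAD, NTHREADS);
        return 0;
    }

Pinning each worker to its own hardware thread (pthread_setaffinity_np on Linux, processor_bind on Solaris) is where the real tuning starts, but that's platform-specific.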
Important factors ignored in the example include the ten gigabit ethernet interfaces integrated on-die in the sparc t2, which can give much lower latencies than x86 will manage for the next couple of generations or so. Still think x86 will always win? It has the biggest market share and therefore the least incentive to innovate. You'd better pay attention to that sort of thing, dear developer, or you'll get out-quanted by some upstart with lower ping.
How many cycles does a single trade need anyway?
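Back of the envelope, using the made-up numbers from above: at 3 GHz, ten million trades a second per thread leaves 3x10^9 / 10^7 = 300 cycles per trade. A single syscall can easily eat that whole budget, which is rather the point.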