Hardware acceleration
Consider yourself corrected, or at least challenged to prove your point!
I've seen this argument time and time again, and I always ask for the same thing: Proof.
There are specialised h.264 decoding parts. They're usually in TVs and the like, because you don't want to put an overly complex software system in a device like that.
But when people say "hardware acceleration", they usually think something along the lines of "the processor coordinates data transfer via DMA or some other bus to a special chip which decodes the video and puts it directly onto the screen".
Yep, those special chips in dumb devices like a TV do that, and do it at very low power and heat output.
In a phone or on a laptop? There is no block of hardware dedicated to h.264 in that manner. That would be nuts, because it would restrict you.
Instead, there are blocks of specialised computation that aren't much different to MMX, SSE, and so forth. That's what people are talking about when they talk hardware acceleration on a more complex device.
Think about it - otherwise, the iPhone/Android "h.264 chip" would need to be connected directly to the orientation sensor, and would be doing the animation AND resizing when you turn the device from one orientation to another. That's one heck of a complex bit of hardware when compared to the original vision of "chip which does video".
Basically, if the h.264 decoder uses them, then so can WebM's. It's just a matter of doing so - which has already been done for the most part. Some of the first patches I heard of for the decoder were ARM assembler routines to improve speed, for example.
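To make that concrete, here's a hedged sketch (not code from any actual decoder - the function name and block size are my own illustration) of the kind of inner loop those specialised blocks accelerate. Averaging two predicted blocks with rounding is a typical motion-compensation step, and written with ARM NEON intrinsics it runs on the same SIMD unit whether the bitstream was h.264 or WebM:

```c
#include <arm_neon.h>
#include <stdint.h>

/* Illustrative only: rounded average of two 16x16 predicted blocks,
 * the kind of motion-compensation inner loop a video decoder runs
 * millions of times per second. Names and sizes are made up here,
 * but both h.264 and VP8/WebM decoders contain loops of this shape. */
static void average_block_16x16(uint8_t *dst, const uint8_t *pred0,
                                const uint8_t *pred1, int stride)
{
    for (int row = 0; row < 16; row++) {
        uint8x16_t a = vld1q_u8(pred0);       /* load 16 pixels at once    */
        uint8x16_t b = vld1q_u8(pred1);       /* load 16 pixels at once    */
        vst1q_u8(dst, vrhaddq_u8(a, b));      /* rounding average, 16-wide */
        pred0 += stride;
        pred1 += stride;
        dst   += stride;
    }
}
```

The point being: nothing in that code cares which codec produced the prediction blocks. The same intrinsics compile for any ARM core with NEON, which is exactly why a software decoder update is all a phone or laptop needs to get "hardware accelerated" WebM.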
Hardware acceleration isn't an issue unless you have a device you can't get a software decoder update for. And the device manufacturers & developers have pretty much sorted that. (Although I wouldn't hold my breath waiting for Apple to join in!)
There are still dedicated "dumb" hardware decoders out there, in camcorders and TVs. But for the use cases you mentioned (desktops, laptops, phones), WebM can be accelerated, and without much hassle.
It's down to the willingness of the vendor, and most seem willing. Check the WebM Wikipedia article for a nice list...
Of course, I could be wrong here. If you know otherwise, then I want proof. I want a spec sheet that shows that a common GPU has a dedicated in-silicon decoder of a dumb nature, one which could not be reprogrammed to handle a new size/orientation/output destination or be partially reused for WebM decoding.
Without wishing to sound snotty, that places the ball in your court. I've put forth my understanding, and you now have to prove me wrong. Which I would welcome, by the way.
I've been looking for that magic spec sheet since WebM was first announced, and haven't found it yet. Nobody has presented it to me, despite numerous challenges to do so. I'm not yet tempted to call the hardware acceleration argument total balls, but I'm pretty close to it!