"The fact is, all modern CPUs since the (IIRC) K5 already do code-morphing in silicon."
You mean like Itanium, POWER6, and almost all SPARC chips? Oh wait, those are all in-order native implementations.
There certainly are advantages to scheduling and optimizing in hardware. However, there are advantages to doing it in software, too. For example, software can consider a much larger instruction window, and can perform more radical transformations, since optimization need not take place concurrently with execution. These days, it's pretty cheap to have an idle (hardware) thread that can do optimization while the main program is running.
And software has been doing many of the things you claim are impossible for years: speculative code motion, profile-feedback-based optimizations, and so on. There are even some runtime systems that will detect that program behavior has changed over time, and reoptimize accordingly.
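
To make the "detect that behavior has changed and reoptimize" point concrete, here is a toy sketch in Python. It is not how any particular runtime does it; the class name `AdaptiveDispatcher`, the `window` parameter, and the "most frequent key wins" policy are all made up for illustration. The idea is just: count what a call site actually sees, specialize a fast path for the hot case, and re-specialize when the profile drifts.

```python
from collections import Counter


class AdaptiveDispatcher:
    """Hypothetical toy runtime: profiles a call site and re-specializes it."""

    def __init__(self, handlers, window=100):
        self.handlers = dict(handlers)  # key -> handler function
        self.window = window            # calls per profiling window (assumed policy)
        self.counts = Counter()         # profile for the current window
        self.calls = 0
        self.hot_key = None             # key the fast path is specialized for
        self.hot_handler = None
        self.reoptimizations = 0

    def call(self, key, arg):
        self.counts[key] += 1
        self.calls += 1
        if key == self.hot_key:
            result = self.hot_handler(arg)    # fast path: no table lookup
        else:
            result = self.handlers[key](arg)  # slow path: generic dispatch
        if self.calls % self.window == 0:
            self._reoptimize()
        return result

    def _reoptimize(self):
        # Pick the most frequent key in the window just finished.
        new_hot, _ = self.counts.most_common(1)[0]
        if new_hot != self.hot_key:
            # Behavior changed: specialize the fast path for the new hot case.
            self.hot_key = new_hot
            self.hot_handler = self.handlers[new_hot]
            self.reoptimizations += 1
        self.counts.clear()  # start a fresh profiling window


dispatcher = AdaptiveDispatcher(
    {"int": lambda x: x + 1, "str": lambda s: s + "!"}, window=10
)
for _ in range(10):
    dispatcher.call("int", 1)   # phase 1: the site only sees ints
for _ in range(10):
    dispatcher.call("str", "a")  # phase 2: behavior shifts to strings
print(dispatcher.hot_key, dispatcher.reoptimizations)
```

A real system would be regenerating machine code rather than swapping a function pointer, and the reoptimization could run on that otherwise idle hardware thread, but the feedback loop has the same shape.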