Re: No worries
"JVM can turn on vector instructions if it discovers them in the cpu."
That presumes someone has already built a JVM that targets those special instructions. Your binary is only as portable as the JVMs available to you - someone, somewhere, has to tweak those JVMs and release them to you in the first place.
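To make the point concrete, here's a minimal sketch of the kind of loop a HotSpot JVM's JIT *can* turn into SIMD instructions when it detects support (e.g. AVX) on the host CPU - but only if the JVM build and flags allow it. The class name and the numbers are mine; whether vectorisation actually happens depends entirely on which JVM you were shipped:

```java
// A loop shape HotSpot's C2 compiler can auto-vectorize at runtime if the
// CPU supports SIMD and the JVM was built/configured to use it. The
// bytecode is portable; the optimisation is not - it lives in the JVM.
public class VecSum {
    static void add(float[] a, float[] b, float[] out) {
        for (int i = 0; i < out.length; i++) {
            out[i] = a[i] + b[i]; // candidate for SIMD on capable CPUs
        }
    }

    public static void main(String[] args) {
        float[] a = {1f, 2f, 3f, 4f};
        float[] b = {10f, 20f, 30f, 40f};
        float[] out = new float[4];
        add(a, b, out);
        System.out.println(out[0] + " " + out[3]); // prints: 11.0 44.0
    }
}
```

The same class file runs everywhere; the speed-up only shows up where a suitably built JVM meets a suitable CPU.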
"It is obvious that if you optimize continuously, it is better than optimizing once in the beginning, yes?"
It's not obvious at all. Optimisation is not free: mechanical optimisation doesn't always make code faster, it isn't perfect, and it introduces risk as a consequence. Finally, if you want stable behaviour from your code you need to be able to switch it off - jitter can be a huge problem in a distributed system, where going to sleep for a millisecond can easily multiply into a penalty of a few seconds across multiple processes... :)
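One way that multiplication happens is fan-out: a back-of-envelope sketch (illustrative numbers of my choosing, not measurements) - if a request touches M processes and each independently has probability p of hitting a small runtime-optimisation or GC pause, the chance the *request* gets delayed is 1 - (1-p)^M:

```java
// Back-of-envelope: pauses that are rare per process become near-certain
// per request once the request fans out across many processes.
// Numbers here are purely illustrative assumptions.
public class PauseAmplification {
    // Probability that at least one of m independent processes pauses,
    // given each pauses with probability p.
    static double pDelayed(double p, int m) {
        return 1.0 - Math.pow(1.0 - p, m);
    }

    public static void main(String[] args) {
        System.out.printf("p=0.01, M=1:   %.3f%n", pDelayed(0.01, 1));   // ~0.010
        System.out.printf("p=0.01, M=100: %.3f%n", pDelayed(0.01, 100)); // ~0.634
    }
}
```

A 1% pause rate per process turns into a ~63% delay rate per request at a fan-out of 100 - which is why you want the option to switch the pauses off.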
I am not dismissing runtime optimisation out of hand, but the downsides can be significant in practice.
In my line of work the size of the working set is often a key determinant of performance - simply because cache miss penalties are so big... JVM-based apps tend to be at a disadvantage here because they usually chew up *more* memory than a native binary - with a JVM you bring the kitchen sink with you regardless. That may not matter when you're running a couple of instances on your desktop PC, but it matters a great deal when you're trying to maximise the throughput of a box running tens, hundreds or even thousands of instances in parallel. It's not uncommon for the Java apps we watch over to run into the OOM killer - and that kills throughput stone dead.
Memory efficient code does have a legitimate role to play, it's not just a fetish. :)
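A quick sketch of what that buys you in Java itself: the same million ints stored boxed versus primitive. The per-object sizes below are typical 64-bit HotSpot estimates with compressed oops (an assumption on my part, not a measurement - real layouts vary by JVM):

```java
// Rough working-set estimate: int[] vs Integer[] for n values.
// Assumed HotSpot-with-compressed-oops sizes: 16 B array header,
// 4 B per int / per reference, ~16 B per boxed Integer object.
public class WorkingSet {
    static long primitiveBytes(int n) {
        return 16L + 4L * n;            // header + packed ints
    }

    static long boxedBytes(int n) {
        return 16L + 4L * n + 16L * n;  // header + refs + one object per int
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        System.out.println("int[]:     ~" + WorkingSet.primitiveBytes(n) / 1024 + " KiB");
        System.out.println("Integer[]: ~" + WorkingSet.boxedBytes(n) / 1024 + " KiB");
    }
}
```

Roughly a 5x difference in working set for identical data - which is exactly the kind of gap that turns into cache misses, and eventually into a visit from the OOM killer.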
Generally speaking I find that optimising the code at the source level actually gives the biggest benefits of all - a bad design will always perform badly. :)