This is an interesting argument today. Particularly as I think the key point of such unified environments is that they would have to be open source (likely Apache 2.0 or equivalent, or some variant of GPL/LGPL) so that hardware vendors could implement the graph systems that came up yesterday. You'd want the compute graphs to arrive in a platform-neutral form, but there is all kinds of room for a vendor to apply its own optimization techniques and translation approaches to the output graph. Only the graph intermediate form needs to be standardized.
I dare say I'd rather trust a vendor's team of optimization experts to tune code for their own platform than attempt it myself as a mere human being. Nor is there any limit to how many graphs can be chased down during optimization; in theory, you could apply the optimization to the entire set of code the OS will have in memory while the HPC evaluation is performed. After all, you're working with HPC hardware, so you can parallelize the bejeezus out of the optimization and spend only a paltry fraction of the run time on the transformation.
Yes, that article on compute graphs really struck my fancy!