I did not know that
I did not know that ARM actually prohibited adding instructions. I just assumed the vast majority of ARM licensees simply didn't want to trouble themselves. I mean, I wouldn't.
"Basically, chip makers will be encouraged to come up with libraries and APIs that access their special instructions in a standardized way, and provide these frameworks to developers who then purchase and use the system-on-chips."
Very smart. This is like the "C intrinsics" for Intel CPUs: a special set of headers that let you use MMX and SSE (SSE2, SSE3, etc.) while still having portable code. You want to use "MMXdofoo(a,b,c)" in your code? OK, use the header. If you're building for Intel (and not something like a 486...), inline assembly for MMXdofoo is put into your code, so there's no overhead from using this type of header compared to putting the inline assembly in yourself. If you're building for something else, a C loop that implements MMXdofoo is put in. Interestingly, a few years back I saw an ARM port of the Intel intrinsics, so code using Intel intrinsics for MMX, SSE, SSE2, and SSE3 instructions would use the ARM NEON equivalents; instead of merely being able to run some MMX-using code, it could actually run it with the equivalent speedups.
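To make the header idea concrete, here's a rough sketch of what such a wrapper could look like. The name add4_f32 is made up for illustration (like "MMXdofoo" above, it is not a real intrinsic); the calls inside it (_mm_add_ps and friends from xmmintrin.h, vaddq_f32 and friends from arm_neon.h) are the real SSE and NEON intrinsics. I'm assuming a GCC- or Clang-style compiler that defines __SSE__ or __ARM_NEON; anything else just gets the plain C loop. As far as I know the stock Intel headers don't include a C fallback themselves, so a wrapper like this is where that part would live.

```c
#include <stddef.h>

/* Pick the SIMD header for the target, if there is one.
   Assumption: the compiler defines __SSE__ / __ARM_NEON (GCC and Clang do). */
#if defined(__SSE__)
#include <xmmintrin.h>   /* SSE intrinsics */
#elif defined(__ARM_NEON)
#include <arm_neon.h>    /* NEON intrinsics */
#endif

/* Hypothetical wrapper: out[i] = a[i] + b[i] for exactly four floats.
   The caller writes the same call on every platform. */
static inline void add4_f32(const float *a, const float *b, float *out)
{
#if defined(__SSE__)
    /* x86 with SSE: unaligned load, packed add, unaligned store. */
    _mm_storeu_ps(out, _mm_add_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
#elif defined(__ARM_NEON)
    /* ARM with NEON: the equivalent 4-lane float add. */
    vst1q_f32(out, vaddq_f32(vld1q_f32(a), vld1q_f32(b)));
#else
    /* Anything else: plain C loop with the same result. */
    for (size_t i = 0; i < 4; ++i)
        out[i] = a[i] + b[i];
#endif
}
```

The point is that the calling code only ever says add4_f32(a, b, out); which instruction it turns into is decided at compile time, so the SIMD paths cost nothing extra at run time.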