"IIRC one of the problems with wanting to put stuff in userland for security reasons was that performance on x86 was shit due to the overhead of context switching."

I believe ARM is no different in this regard: as far as I know, the architecture has no hardware feature that removes the penalty of switching back and forth between kernel mode and user mode. The cost matters most for subsystems that have historically needed close-to-the-metal code for performance, such as graphics and networking, both of which are latency-sensitive. Recall Windows NT, which originally ran its graphics subsystem in user mode and moved it into the kernel in NT 4.0 for performance reasons.
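For a rough feel of the cost, here is a minimal sketch that compares a plain function call with a call that has to enter the kernel. The helper names (`time_per_call`, `noop`, `kernel_call`) are made up for illustration, and `os.getppid()` is just an assumed stand-in for "a cheap system call". Interpreter overhead is included in both numbers, so only the gap between them says anything, and it will vary by CPU, OS, and mitigation settings.

```python
import os
import time


def time_per_call(fn, iterations=200_000):
    """Return the average wall-clock seconds per call of fn (hypothetical helper)."""
    start = time.perf_counter()
    for _ in range(iterations):
        fn()
    return (time.perf_counter() - start) / iterations


def noop():
    # Stays entirely in user mode: the baseline for call overhead.
    return None


def kernel_call():
    # Assumed to be a cheap system call, so most of its extra cost
    # is the user-to-kernel transition and back.
    return os.getppid()


if __name__ == "__main__":
    user = time_per_call(noop)
    kernel = time_per_call(kernel_call)
    print(f"user-mode call:    {user * 1e9:8.1f} ns")
    print(f"kernel round trip: {kernel * 1e9:8.1f} ns")
```

This only measures a single mode switch on an otherwise idle path. A userland driver pays more than that per operation, since a full context switch to another process also costs cache and TLB state.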

