Not acceptable.
I realize their target market is not the enterprise but it seems you did very little failure testing.
Regarding your mismatched node upgrade, I'll share my experience on HP 3par.
HP services involved the whole time. First run through validation tests, everything checks out. Next, upgrade the first controller. Comes up, no issue. Upgrade the 2nd controller. Does not come up. We wait for a while and then they determine a failure occurred (the controller's internal disk drive failed during reboot).
Ok, so dispatch on-site support with a replacement drive. Takes a few hrs to get there. Array keeps running in degraded mode. Replace drive. Controller does not come up. Replace entire controller, still does not come up. On-site support is having trouble with their USB-to-serial adapter crashing their laptop every 20 mins.
I get my friends at 3par involved at this point, monitoring the situation to make sure the right resources are engaged.
HP tries the run book processes to get the controller online a half dozen times. Fails every time. Mismatched software is causing the active cluster node to reject the replaced controller. They try to image the controller a few more times; still rejected.
I lose patience and tell HP to fix this now; it had been over 16 hrs. Got level 4 engineering involved and the situation was resolved within an hour after that. (No other reboots or outages required.)
Wasn't happy with how long it took, but I'd rather they take their time and get it right. Can't afford to lose the remaining controller.
They fixed it though, without too much impact to production (there was some impact given the high write workload and lack of cache mirroring without a 2nd controller). Obviously new purchases are 4 controller units (with 3par persistent cache), something I had been pushing for years already.
Conversely, I remember a BlueArc upgrade many years ago that caused a 7 hr outage simply because the company lacked an escalation policy and on-site support didn't have contacts to get help. My coworker, who had the most BlueArc experience at our company, was able to kick people at BlueArc HQ and get help when their own support could not. The CEO later apologized to us and showed they had now implemented an escalation policy.
So yeah, hard to get shit right. Sometimes the worst failure scenarios are ones that you might never think of. I learned that a long time ago, so do not compromise on quality storage.