Maintenance
"Over the last two years, researchers have seen a failure rate of an eighth of that seen in a control group of servers on land, running the same workloads."
Having a lower failure rate is obviously beneficial, but it seems to come at the cost of making the parts that do fail extremely difficult to fix or replace. Would this mean moving to SSD-style overprovisioning, where you start with roughly 10% more servers than you actually need and bring the spares online as the active ones fail, until capacity finally drops far enough that you have to replace the whole thing?
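To put rough numbers on that idea, here is a back-of-the-envelope sketch in Python. Everything in it is an assumption for illustration: the function name, the 4% annual failure rate for the land-based group, and the simple model in which each server fails independently at a constant yearly rate and is never repaired. The only figure taken from the quote is the one-eighth ratio between the two failure rates.

```python
import math


def years_until_short(needed: int, spare_fraction: float, annual_failure_rate: float) -> float:
    """Expected years until the number of working servers drops below `needed`.

    Hypothetical model: the expected survivor count after t years is
    total * (1 - rate) ** t, with no repairs. Setting that equal to
    `needed` and solving for t gives the formula below.
    """
    if not 0 < annual_failure_rate < 1:
        raise ValueError("annual_failure_rate must be between 0 and 1 (exclusive)")
    if spare_fraction < 0:
        raise ValueError("spare_fraction must be non-negative")
    total = needed * (1 + spare_fraction)
    return math.log(needed / total) / math.log(1 - annual_failure_rate)


# Assumed, not measured: a 4% yearly failure rate on land.
LAND_RATE = 0.04
# From the quote: the sealed servers failed at an eighth of the land rate.
SEALED_RATE = LAND_RATE / 8

land_years = years_until_short(needed=1000, spare_fraction=0.10, annual_failure_rate=LAND_RATE)
sealed_years = years_until_short(needed=1000, spare_fraction=0.10, annual_failure_rate=SEALED_RATE)

print(f"land:   spares last about {land_years:.1f} years")
print(f"sealed: spares last about {sealed_years:.1f} years")
```

Under these assumptions, the same 10% of spares lasts roughly eight times longer at the lower failure rate, which is what would make a "seal it and replace the whole thing later" approach plausible. Real hardware failure rates are not constant over time, so treat this as an order-of-magnitude estimate only.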