Why the focus on microcontrollers?
If it is true that these algorithmic improvements can dramatically reduce both training time and resource consumption while preserving accuracy, wouldn't the same hold for models of all sizes on all devices?