This bit about asynchronous training providing noise, thus enhancing recognition.
That's quite straightforward. Optimization algorithms like Expectation Maximization can converge on local minima (or maxima, depending on what the evaluation function looks like), just as any other algorithm that follows the local gradient would. (Think of Newton's method, for example.) Adding noise perturbs the system, making it more likely to jump out of a local minimum if the ridge (local maximum) that forms one side of the trough isn't much higher than the minimum itself.
Of course, the same noise can knock the system out of the global minimum if there's a nearby local minimum and the ridge separating them isn't much higher than the global minimum itself, so the effectiveness of this tweak depends on the shape of the curve, how much noise you're adding, and so on.
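A minimal sketch of the effect, using noisy gradient descent rather than EM (a simpler optimizer swapped in for illustration) on a hypothetical tilted double-well loss f(x) = (x^2 - 1)^2 + 0.3x. Its global minimum is near x = -1 and its local minimum is near x = +1, with a ridge near x = 0. The noise is Langevin-style Gaussian jitter whose scale is annealed to zero; the loss, step size, and noise schedule are all assumptions chosen for the toy, not anything from the paper. Plain descent started at x = 1 stays in the local well, while noisy runs can cross the ridge, and how often they do depends on the barrier height relative to the noise.

```python
import math
import random


def grad(x: float, tilt: float = 0.3) -> float:
    """Derivative of the toy loss f(x) = (x**2 - 1)**2 + tilt * x.

    With tilt > 0 the well near x = -1 is the global minimum and the well
    near x = +1 is only a local minimum; a ridge near x = 0 separates them.
    """
    return 4.0 * x * (x * x - 1.0) + tilt


def descend(x0: float, lr: float = 0.01, steps: int = 5000,
            temp0: float = 0.0, seed: int = 0) -> float:
    """Gradient descent with optional Gaussian noise annealed towards zero.

    temp0 == 0 gives plain, deterministic descent. temp0 > 0 adds noise with
    variance 2 * lr * T each step (Langevin-style), where the "temperature"
    T falls linearly from temp0 towards 0 over the run.
    """
    rng = random.Random(seed)
    x = x0
    for k in range(steps):
        x -= lr * grad(x)
        temp = temp0 * (1.0 - k / steps)
        if temp > 0.0:
            x += rng.gauss(0.0, math.sqrt(2.0 * lr * temp))
    return x


if __name__ == "__main__":
    plain = descend(1.0)
    print(f"plain descent from x = 1: ends at x = {plain:+.3f}")  # local well

    trials = 20
    escaped = sum(descend(1.0, temp0=0.5, seed=s) < 0.0 for s in range(trials))
    print(f"noisy descent from x = 1: {escaped}/{trials} runs ended in the global well")
```

Changing `tilt` or `temp0` shifts how often runs cross, and a noisy run started in the global well can also be knocked into the local one, which is the caveat about the shape of the curve.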
It also means your classification accuracy depends on today's hardware architecture, which is a poor idea.
In this case, the noise is due to out-of-order inputs caused by the asynchrony of the partitioning mechanism: a happy accident. The effect really doesn't depend on the source of that asynchrony, just its degree.
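A toy way to see the "degree, not source" point, assuming asynchrony can be modeled as a fixed staleness: each gradient is evaluated at the parameter value from `delay` updates ago (a hypothetical parameter) but applied to the current value, roughly what happens when workers read a shared model, compute, and write back after other updates have landed. The problem is 1-D least squares with target y = 2x, chosen only for illustration. With `delay = 0` this is ordinary sequential SGD; larger delays perturb the path, and in this small, well-conditioned example it still converges.

```python
def stale_sgd(delay: int, epochs: int = 100, lr: float = 0.02) -> list[float]:
    """SGD on 1-D least squares (target y = 2x) with stale gradients.

    Each gradient is evaluated at the parameter value from `delay` updates
    ago but applied to the current value, a crude model of asynchronous
    workers. delay == 0 is ordinary sequential SGD. Returns the whole
    trajectory of the weight w, starting from w = 0.
    """
    xs = [0.5, 0.8, 1.0, 1.2, 1.5]  # hypothetical inputs, visited in a fixed order
    history = [0.0]
    for _ in range(epochs):
        for x in xs:
            w_stale = history[max(0, len(history) - 1 - delay)]
            grad = (w_stale * x - 2.0 * x) * x  # d/dw of 0.5 * (w*x - 2x)**2
            history.append(history[-1] - lr * grad)
    return history


if __name__ == "__main__":
    baseline = stale_sgd(delay=0)
    for d in (1, 2, 4):
        path = stale_sgd(delay=d)
        gap = max(abs(a - b) for a, b in zip(path, baseline))
        print(f"delay={d}: max deviation from sync path {gap:.4f}, final w {path[-1]:.4f}")
```

Nothing in the model cares why the reads are stale; only the size of `delay` enters the update.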
The bit about jumping out of local minima: well, there have been lots of approaches to this over the last 25 years.
Sure. Many of those approaches are highly deterministic, though, which means with a large, deep hierarchy of neural networks, you'll tend to train large portions of the net to respond identically to a given input. The noise created (accidentally) by the asynchrony is presumably rather more stochastic, so it might produce less self-similarity across the net. I freely admit that's just a guess - obviously we don't have the paper to read yet.
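A small illustration of the self-similarity point, under stated assumptions: a hypothetical two-hidden-unit tanh network fit to y = sin(2x) by full-batch gradient descent, with both hidden units initialized identically. Deterministic updates give both units identical gradients, so they stay exactly identical for the whole run. Adding independent Gaussian jitter to each unit's update (standing in for the asynchrony noise, which is an assumption) means they are no longer forced to match. This only shows symmetry breaking in a toy model, not anything about the paper's network.

```python
import math
import random


def train(noise: float = 0.0, steps: int = 500, lr: float = 0.1,
          seed: int = 0) -> list[tuple[float, float]]:
    """Fit y = sin(2x) with a two-hidden-unit tanh net whose units start identical.

    Returns [(w0, v0), (w1, v1)]: input and output weight of each hidden unit.
    noise == 0 gives deterministic full-batch gradient descent; noise > 0 adds
    independent Gaussian jitter to every weight update.
    """
    rng = random.Random(seed)
    xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
    ys = [math.sin(2.0 * x) for x in xs]
    w = [0.5, 0.5]  # input weights, identical on purpose
    v = [0.5, 0.5]  # output weights, identical on purpose

    def jitter() -> float:
        return rng.gauss(0.0, noise) if noise > 0.0 else 0.0

    for _ in range(steps):
        gw = [0.0, 0.0]
        gv = [0.0, 0.0]
        for x, y in zip(xs, ys):
            h = [math.tanh(w[i] * x) for i in range(2)]
            err = v[0] * h[0] + v[1] * h[1] - y  # residual of the squared-error loss
            for i in range(2):
                gv[i] += err * h[i] / len(xs)
                gw[i] += err * v[i] * (1.0 - h[i] ** 2) * x / len(xs)
        for i in range(2):
            w[i] -= lr * gw[i] + jitter()
            v[i] -= lr * gv[i] + jitter()
    return list(zip(w, v))


if __name__ == "__main__":
    print("deterministic:", train())            # both units identical
    print("noisy:        ", train(noise=1e-3))  # units no longer identical
```

In a deep net the same argument applies to any group of units that start, and keep receiving, identical updates.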
If this was working as suggested, it would mean that the neural network was matching the training data more specifically, reducing performance on unseen test data.
Not necessarily; better optimization of the weights for the training data doesn't imply overtraining, particularly if, without the added noise, training would have settled into local minima far from the global minimum.