All very well but ...
A worthy attempt (if only at the symptomatic level), but benchmarks seem somewhat moot given the basic failings of principle in the current AI paradigm. The stochasticity of results and the practical impossibility of verifying how they were arrived at are fundamental barriers to trust (and, to a great extent, barriers to improvement as well).