Evaluation was probably all wrong
I may have got this all wrong, or all right, but I think the evaluation was probably all wrong. You have to consider the way we link information together to create what we call intelligence.
Your intelligence is mostly built on experience of actual events and on connections between related or unrelated data, which together produce an approximation of the answer: an exact one in the case of mathematics, a relative one in the case of catching a ball.
The problem with search engines, and this will always be the case, is that they take a top-down approach: create an algorithm that approximates a complex action and display the results. So we build the computer to recognise the words we understand and give them meaning without the computer knowing why, then just tell it to return certain results when it receives certain input.
This is a critical problem, because the computer will never truly understand what it is trying to achieve. So I believe what you might be seeing here is not an attempt to create a Google killer, but an attempt to model how to link information together at the most basic level. It's very primitive, remember: where the top-down approach gives you results like a fully grown adult *MIGHT*, W|Alpha seems to be giving you the answers you'd expect from a very basic intelligence.
This might be the first step in trying a bottom-up approach, where you are not so interested AT THIS STAGE in giving correct results, but in building the model upon which to rebuild the system a stage higher on the intelligence scale.
If this is true, then it's premature to judge the engine on its current level of intelligence (I use the term lightly); it might be better to see it as a primitive experiment in driving the next level. If and when they start to get results from data associated together, and can view those results and see whether the behaviour is correct or not, then we can start to see where they might go.
However, the idea might not be to produce something that gives the correct results. Remember, humans might actually arrive at the correct result by performing an inverse search, using the incorrect data. We know how NOT to catch a ball; surely there are a million ways to NOT catch it, and knowing them narrows down how to ACTUALLY catch it. We know that throwing our leg into the air will not result in a caught ball. So perhaps teaching the computer to recognise a WRONG answer is more useful than teaching it to find a CORRECT answer.
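To make the "inverse search" idea a bit more concrete, here is a toy sketch in Python. Everything in it is made up for illustration: the list of actions, the set of known failures and the `eliminate` helper are my own assumptions, not anything W|Alpha is known to do. It only shows the principle of narrowing down an answer by throwing out what we recognise as wrong.

```python
# Hypothetical toy: find an answer by rejecting candidates we know are wrong.
def eliminate(candidates, looks_wrong):
    """Keep only the candidates that the wrong-answer check does not reject."""
    return [c for c in candidates if not looks_wrong(c)]

# Made-up "ways to react to a thrown ball".
actions = [
    "kick leg in the air",
    "close eyes",
    "turn away",
    "reach hand toward ball",
]

# Made-up knowledge: things we have already learned do NOT catch a ball.
known_failures = {"kick leg in the air", "close eyes", "turn away"}

# Whatever survives the elimination is the remaining candidate answer.
remaining = eliminate(actions, lambda a: a in known_failures)
print(remaining)
```

Note that the code never says what a correct catch looks like. It only knows what failure looks like, and the answer is whatever is left over.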
Just a few thoughts. Maybe we are looking at it the wrong way; perhaps the idea is not to build a Google clone, but to experiment and get free testing from the public, along with their ideas on how to create the next level.
Could be?