Re: Sigh...
Oh Sigh, Sigh!
You have a point, but I think you take it far too far.
"... teach the idiots how to use them properly (even the ones that have PhDs). Obvious problems:"
You and I clearly assign markedly different severity to these issues.
"Their dataset has a prior probability of criminality around 50%. That's way higher than normal and leads the system to think that criminality is common." And "Same problem with a lot of diagnostic medicine ANNs. They try to detect rare diseases with an equal handful of normal and diseased cases. They look great in the literature, but never get adopted, because they keep flagging up healthy people--they've been heavily biased to think that the problem exists."
I'm not at all sure that this is relevant, especially the first point. Training is best done with near-equal numbers of samples for each class: otherwise there is likely to be criticism on that very issue. Evaluation is likewise best done on datasets of near-equal class size, which also makes the per-class error rates directly comparable.
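As a purely illustrative sketch (not from the paper; the label counts are made up), balancing a skewed dataset for training can be as simple as undersampling the majority class down to the size of the rarest class:

```python
import random

def balance_by_undersampling(labels, seed=0):
    """Return sample indices giving equal counts of every label."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    n = min(len(ix) for ix in by_class.values())  # size of rarest class
    chosen = []
    for ix in by_class.values():
        chosen.extend(rng.sample(ix, n))          # keep n from each class
    rng.shuffle(chosen)
    return chosen

# 990 'negative' (0) vs 10 'positive' (1) labels -> 10 of each kept
labels = [0] * 990 + [1] * 10
idx = balance_by_undersampling(labels)
print(len(idx), sum(labels[i] for i in idx))  # 20 10
```

The same trick applies to the evaluation split; the point is that the balance is a deliberate experimental-design choice, not a hidden assumption about real-world prevalence.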
For operational use: Bayesian statistics does indeed require weighting by the real-life class occurrence rates, but this can be handled entirely outside the class-specific modelling, by applying a priori knowledge of the class occurrence statistics to the model's outputs.
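A sketch of that a priori weighting (my own illustration, not from the paper; the 0.5 and 0.01 priors are assumed numbers): divide the model's output odds by the training prior's odds and multiply by the operational prior's odds, so a balanced-trained classifier's confident "positive" collapses once a realistic base rate is applied:

```python
def adjust_posterior(p_model, train_prior, real_prior):
    """Rescale a classifier's posterior for the positive class.

    p_model:     P(positive | x) from a model trained with train_prior.
    train_prior: positive-class rate in the training set (e.g. 0.5).
    real_prior:  positive-class rate in the deployed population.
    """
    odds_model = p_model / (1.0 - p_model)            # odds implied by model
    odds_train = train_prior / (1.0 - train_prior)    # training-set prior odds
    odds_real = real_prior / (1.0 - real_prior)       # operational prior odds
    odds_adjusted = odds_model / odds_train * odds_real
    return odds_adjusted / (1.0 + odds_adjusted)

# "90% positive" from a 50/50-trained model, against a 1% base rate:
print(round(adjust_posterior(0.9, 0.5, 0.01), 3))  # 0.083
```

This is exactly why a system that "looks great in the literature" on balanced test sets need not flag up the healthy (or the innocent) in deployment, provided the operational priors are applied.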
"Second problem is the data. Are the pictures random? I doubt it."
Read the paper, as linked. It is much better than you (think and) write, though it does have its deficiencies.
They've started by just looking at Han Chinese.
Looking within one racial group (especially on such a small dataset) is actually sound science.
"Then they picked pictures of non-criminals by browsing the web and picked pictures of criminals by scouring for wanted posters."
No! Read the paper. All the photos are from non-criminal identification sources. I suspect these are existing photos from ID cards, driving licences, or similar. Whilst this is not ideal, there is no bias in the data-capture mechanism or in the likely 'happiness' of the subjects.
"Looking at their conclusion faces, I can easily classify criminals vs. non-criminals simply by noticing whether the person is smiling."
No you cannot: see above!
However, there is a problem with the demographics, particularly of the non-criminal dataset: a heavy preponderance of university-educated people. I suspect (only suspect) that it was drawn from current students/staff and their spouses or partners. Note, in the paper, the collared shirts of the non-criminals versus the non-collared shirts of the criminals. Some explicit demographic selection would have been useful here: most likely on employment status and earnings for the non-criminals, and on the type of crime for the criminals - violence against the person, violence against property, white-collar crime, and so on.
"Third problem is feature selection. I'm sure the algorithm didn't automatically choose to look at facial features."
True, but so what?
"So, the authors picked out a bunch of features they thought might be relevant (neo-phrenology as previously noted) and discovered that some of them were more relevant than others."
Again, so what? Whether the individual or composite features are designed manually or automatically matters not at all, provided their training and evaluation are unbiased (including being free of bias introduced by repeated manual feedback).
"From this paper, I would conclude that Chinese people tend to post pictures of smiling people online and criminals tend to look unhappy in mugshots. Thus, it's easy to distinguish between a selfie and a mugshot."
See the paper and above: neither 'mugshots' (definitely) nor 'selfies' (it seems) were used. Thus neither data-capture quality nor the associated (mood/stylistic) effects are relevant deficiencies.
Best regards