Human judges aren't very good either
"humans are excellent at making sense of the nuance of the written word"
No, in fact, we're not - at least not by any objective metric. Of course, much depends on what you mean by "making sense", which is not a term of art in Natural Language Processing, or for that matter in linguistics or cognate fields.
Human judges are wildly inconsistent when interpreting parole (specific instances of language use). That's been demonstrated in study after study. Indeed, the article goes on to cite a couple of examples, e.g. identifying "false" online reviews.1
There are micro-inconsistencies, due to ambiguities and other interpretive issues at the phrasal level; and there are systemic macro-inconsistencies in interpreting larger passages or entire texts. The history of literary criticism amply demonstrates that. Vast swathes of theory in various disciplines - linguistics, literature, translation theory, etc. - document and discuss the issues.
NLP techniques for classifying documents this way - between "probably genuine" and "suspect" - actually do best when they don't try to emulate human judges. The article hints at that too, but without discussing the technology it's hard to give a useful picture. Algorithmic tools such as Support Vector Machines, Maximum Entropy Markov Models, and Latent Semantic Analysis are almost certainly quite different from whatever it is that human readers do; they produce usable results in these applications for just that reason. Bots and trolls are generally optimized for deceiving human judges,2 so classifiers that use radically different techniques are more likely to spot them.
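To make the "surface statistics rather than human-like reading" point concrete, here's a minimal sketch of the idea. It uses a multinomial Naive Bayes over bag-of-words counts as a stand-in for the SVM/MEMM/LSA family mentioned above (it's the simplest member of that broad "count words, not meanings" family, not any particular production system), and the training "reviews" are invented for illustration:

```python
import math
from collections import Counter

# Toy sketch: a multinomial Naive Bayes over word counts. Like the
# classifiers mentioned above, it never "understands" a review -- it
# just scores surface word statistics, which is exactly why it isn't
# fooled the way a human reader might be. All data here is invented.

def train(docs):
    """docs: list of (label, text) pairs. Returns log-priors per class
    and Laplace-smoothed log-likelihoods per class over the vocabulary."""
    class_counts = Counter(label for label, _ in docs)
    word_counts = {c: Counter() for c in class_counts}
    for label, text in docs:
        word_counts[label].update(text.lower().split())
    vocab = {w for cnt in word_counts.values() for w in cnt}
    priors = {c: math.log(n / len(docs)) for c, n in class_counts.items()}
    loglik = {}
    for c, cnt in word_counts.items():
        total = sum(cnt.values()) + len(vocab)   # +|V| for smoothing
        loglik[c] = {w: math.log((cnt[w] + 1) / total) for w in vocab}
        loglik[c]["<unk>"] = math.log(1 / total)  # unseen-word fallback
    return priors, loglik

def classify(priors, loglik, text):
    """Pick the class with the highest total log-probability."""
    scores = {
        c: priors[c] + sum(loglik[c].get(w, loglik[c]["<unk>"])
                           for w in text.lower().split())
        for c in priors
    }
    return max(scores, key=scores.get)

# Invented toy data: shill-style superlatives vs. plainer genuine prose.
training = [
    ("suspect", "best product ever amazing amazing must buy"),
    ("suspect", "amazing best ever five stars must buy now"),
    ("genuine", "the strap broke after a month but support replaced it"),
    ("genuine", "decent sound though the battery life is shorter than claimed"),
]
priors, loglik = train(training)
print(classify(priors, loglik, "amazing best ever must buy"))
# -> suspect
print(classify(priors, loglik, "battery life is decent but the strap broke"))
# -> genuine
```

The classifier's "reasoning" is nothing like a human's - it has no idea what a strap or a battery is - and that's the point: a shill who writes to sound convincing to people leaves statistical fingerprints this kind of model picks up anyway. Real systems, of course, use far richer features and far more data.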
Anyone interested in further information on the topic might look up some of the presentations Bing Liu has online. He's one of the experts in finding false online reviews, which is an interesting area because it has strong economic consequences, so there's an active arms race going on. Wikipedia, by contrast, is more a matter of ideological axes being ground, and lulz.
1Here "false" generally means "referring to events which did not occur", as when paid shills or bots create positive reviews for one purveyor or negative reviews for competitors.
2Or for performance, taking the "shotgun" approach of making more work than the evaluators can handle. But that's a different problem.