Words are the new sticks and stones, boffins say.
And there I was, thinking that "malicious email" was about the payload that would take over your computer when clicked upon.
So tell me, what is the minimum amount of text the algorithm needs? Will this be enough?
"Hi! How are you?
I send you this file in order to have your advice!
See you later. Thanks"
On a more serious note, this algorithm has, given a sufficiently large corpus and after a sufficient amount of fine-tuning (with a priori knowledge? how else would they tune?), an 80% chance of picking the right author from a list of suspects. That's not quite lifting the veil of general anonymity. I wonder how well reference humans would do, sufficiently tuned through being familiar with the writing style of (all of) the suspects. Did they perchance do a comparative study?
Also, what is it that makes this /suitable for court use/? The 80%? Is that it? Is the Enron corpus the only one they tested this on? Then I wouldn't be so sure they'd get 80% elsewhere too. In theory maybe, but that doesn't automatically carry over to the courts. Go find a few other bodies of emails to try this on, and we'll see what happens then.