I guess they picked the wrong guy to test the spam algorithm on.
<see title>
Linux kernel supremo Linus Torvalds has published a scathing open letter to Google's Gmail team after discovering that the service had incorrectly marked hundreds of his incoming email threads as spam – including ones containing kernel patches. "Something you did recently has been an unmitigated disaster," Torvalds wrote in …
At least they were not discarding it altogether as they do with mail being sent to domains which have both v6 and v4 MXes.
If gmail starts trying to deliver via v6 it _NEVER_ falls back to a v4 MX. So if the v6 MX has an issue, well you just lost your mail. So effectively, there is no MX order fallback.
So, looking at that, do you expect antispam to work in a company which has degenerated to the point where the mail team does not understand the concept of an MX or how mail delivery should work? I would not. I guess I am not the only one, either: I have just noticed that Comcast has removed the v6 MX from their DNS records.
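To make the complaint concrete, here is a minimal sketch of MX fallback as RFC 5321 (section 5.1) describes it: try hosts in ascending preference order and, for each host, every address it resolves to, IPv6 or IPv4. The `MXRecord` type, the pre-resolved address tuples and the `try_deliver` callback are hypothetical stand-ins for real DNS lookups and SMTP sessions; this is not a description of how Gmail's MTA is built.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class MXRecord:
    preference: int              # lower value is tried first
    host: str
    addresses: tuple[str, ...]   # hypothetical pre-resolved AAAA and A results


def deliver(mx_records: Iterable[MXRecord],
            try_deliver: Callable[[str], bool]) -> Optional[str]:
    """Try every address of every MX host, in ascending preference order.

    Returns the address that accepted the message, or None if all failed,
    in which case a real MTA would queue the message and retry later.
    """
    for mx in sorted(mx_records, key=lambda r: r.preference):
        for addr in mx.addresses:          # IPv6 and IPv4 alike
            if try_deliver(addr):
                return addr
    return None


def ipv6_is_broken(addr: str) -> bool:
    """Simulated SMTP attempt in which every IPv6 destination fails."""
    return ":" not in addr


records = [
    MXRecord(20, "mx2.example.net", ("192.0.2.25",)),
    MXRecord(10, "mx1.example.net", ("2001:db8::25", "192.0.2.10")),
]
print(deliver(records, ipv6_is_broken))   # prints 192.0.2.10
```

The failed IPv6 attempt simply moves the client on to the next address rather than losing the message.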
Basically, Google's enforcing DKIM from certain domains, and if a message is "from" someone whose e-mail host provides proper DKIM, but it's missing it, Google (and Yahoo) servers reject it. Mailing lists aren't usually set up to properly handle DKIM (being, effectively, a relay), and therefore get rejected.
The workaround that I saw one mailing list use was to resend the e-mail from the mailing list's address, append "via (mailing list name)" to the name on the from field, and just have both the mailing list and the original author in reply-to.
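A rough sketch of that workaround using Python's standard `email` package. `rewrite_for_list` is a hypothetical helper, not code from any particular list manager, and it only shows the header changes; real list software would also need to DKIM-sign the rewritten message for its own domain.

```python
from email.message import EmailMessage
from email.utils import formataddr, parseaddr


def rewrite_for_list(msg: EmailMessage, list_name: str, list_addr: str) -> EmailMessage:
    """Make the list the visible sender while keeping the author reachable."""
    author_name, author_addr = parseaddr(str(msg["From"]))
    display = f"{author_name or author_addr} via {list_name}"

    del msg["From"]                      # no error if the header is absent
    msg["From"] = formataddr((display, list_addr))

    # Replies go to both the list and the original author.
    del msg["Reply-To"]
    msg["Reply-To"] = ", ".join([list_addr, formataddr((author_name, author_addr))])
    return msg


post = EmailMessage()
post["From"] = "Jane Developer <jane@example.org>"
post["To"] = "dev-list@lists.example.net"
post["Subject"] = "[PATCH] fix off-by-one"
post.set_content("patch body here")

rewritten = rewrite_for_list(post, "dev-list", "dev-list@lists.example.net")
print(rewritten["From"])
print(rewritten["Reply-To"])
```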
It's DMARC that's to blame (being a broken solution). DKIM itself is fine.
It's also the obnoxious "conversation view" I suspect. Switch that off and only the actual spam ends up in the spam box. I probably get less mail than Linus, but I do get 100 or more per day on gmail, and the false positives are rare. But come on, if you care about this, you need to eyeball the spam folder a couple of times a week. I see less than 1% false positives and they are almost all from user@yahoo.com via mailing lists, and caused by DMARC.
DMARC is massively broken, because it mandates an SPF test on the From header, even if a Sender header is present. What it should do is to test the Sender if present, else the From, but it doesn't.
Most mailing lists work in a completely RFC-compliant way by adding a Sender header (known as the 'secretary scenario'). However, to get past DMARC tests, they have to violate the RFC and rewrite the From header instead, concealing the originator of the mail.
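As a sketch of why the Sender header doesn't help: under DMARC (RFC 7489) a message passes only if an SPF-authenticated envelope domain or a valid DKIM signing domain aligns with the domain in From, and Sender is never consulted. The function names and example domains are made up, only strict alignment is shown, and whether a failure actually leads to rejection depends on the policy the From domain publishes (e.g. p=reject).

```python
from email.message import EmailMessage
from email.utils import parseaddr


def from_domain(msg: EmailMessage) -> str:
    """The domain DMARC evaluates: taken from From, never from Sender."""
    _, addr = parseaddr(str(msg["From"]))
    return addr.rpartition("@")[2].lower()


def dmarc_aligned(msg: EmailMessage, spf_domain: str, dkim_domains: list[str]) -> bool:
    """Strict-mode alignment: SPF domain or a valid DKIM d= must equal the From domain.

    Relaxed mode would compare organizational domains via the Public Suffix
    List; that lookup is omitted here.
    """
    target = from_domain(msg)
    return spf_domain.lower() == target or any(d.lower() == target for d in dkim_domains)


# A post relayed by a list: the author's DKIM signature was broken by a footer,
# so only the list's own domain authenticates.
post = EmailMessage()
post["From"] = "Jane <jane@yahoo.example>"
post["Sender"] = "dev-list-owner@lists.example.net"

print(dmarc_aligned(post, spf_domain="lists.example.net",
                    dkim_domains=["lists.example.net"]))   # prints False
```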
He probably uses it because:
1) Volume - The quantity of general mail he gets and the amount of spam he might get could be vast, so he needs something that can handle GBs of email smoothly. His inbox could be huge and difficult not only to maintain but to search. Bet he doesn't "Inbox Zero" at the end of the day.
2) Conversation View - Yes I know that Thunderbird or other tools do provide conversation view but Thunderbird is increasingly less performant especially on larger mail boxes (in my experience).
3) Spam - This is the controversial part of the mix because it is the bit that failed, but good spam filters are rare.
I spent many years running my own mail server and dealing with spam filters, but I've moved to a hosted solution mainly because I can't be bothered with the hassle any more. My IMAP host isn't Google, which means it is relatively slow and has terrible webmail, but at least I don't have to worry any more.
Anyone have any suggestions of good-value hosted IMAP providers with decent webmail (e.g. Horde IMP, not just SquirrelMail) and configurable spam filters?
Google has the legal right to keep and analyse every single mail that comes in or goes out. Also, being an American company, it can create major international problems for hackers.
NSA probably runs a cooler Linux (SELinux) too, why doesn't Linus get linus@nsa.gov next time? They probably have better anti spam filters.
Is GMail code open source?
I know it doesn't violate the GPL because it's never transferred to anyone, making it even more closed than non-open source code released as binary only.
So the fact that it runs on some variant of Linux is close to irrelevant. Google does whatever it likes and you have to accept it.
If I were Linus, I would run the mailing list on standard, well-known mail applications, without Google in the middle.
Like an April Fool gag. You're ready to post "Really!! He uses Gmail??" - then you remember what day it is.
(Notice, I'm hedging my bets, here!)
Nb. So, what, he uses Gmail because it's free?? My superior provider charges me ~£5 a year. Runs a tight ship, does ol' Linus?
"Nb. So, what, he uses Gmail because it's free?? My superior provider charges me ~£5 a year."
Gmail may not necessarily be free, depending on how Torvalds and/or the Foundation use it. For UK pricing, I think they'll see your £5 per year and raise it to £3.30 or £6.60 per user per month.
I didn't check those prices for myself (I've no intention of farming out the email for any of my domains to Google) but because of clients who seem to think Google's arse is the source of our sunshine.
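Since one figure is per year and the others are per user per month, a quick conversion helps. The prices are just the ones quoted in this exchange and are not verified here against Google's actual price list.

```python
from decimal import Decimal

MONTHS_PER_YEAR = 12
HOSTED_PER_YEAR = Decimal("5")   # the roughly £5/year provider mentioned above

for monthly in (Decimal("3.30"), Decimal("6.60")):   # GBP per user per month
    yearly = monthly * MONTHS_PER_YEAR
    print(f"£{monthly}/month = £{yearly}/year per user, "
          f"about {yearly / HOSTED_PER_YEAR:.0f}x the £5 plan")
```

`Decimal` is used rather than floats so that the money arithmetic stays exact.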
There is something *soul sucking* about having people, some very clever indeed in their anti-social way, making your job harder on purpose...and it's every day. It rarely slows down and it never ever stops. After 20 years of Spam (I had been running a mailserver for some years by 1994) it was just too depressing.
Antispam tends to be a grind job and I, for one, don't want Linus Torvalds having to deal with it. He has plenty of necessary grind in his life already without adding to it.
The day I stopped running a significant mail server was a bright and wonderful new chapter in my life.
I am in no way exaggerating.
> Antispam tends to be a grind job
Yes and unfortunately our continued use of ever-fancier methods of looking for it after it has arrived only serves to mask the problem so the end users (and NB higher-ups) don't understand the scale of it except for some stats and using it to pretend they never got that email.
There are easy ways to reduce the quantity of it (and I'm as guilty as anyone of posting that 'your spam solution does not work' list), but they require several things to happen. The fact that no single measure will kill all of it at once shouldn't be a reason not to bother, but there's a huge amount of money (as well as our time and effort) tied up in this 'filtering technology', so nobody wants to try anything different.
100% agree.
It's a pity though that a man who knows how hard large scale software development is, how hard a bug free program is, and how hard accurate spam detection is... would have a rant at a bunch of software developers for making the occasional and utterly unavoidable mistake.
"He can write his own OS, his own version control system (Git), why can't he write his own spam filter? Or at least bother to setup his own LINUX server with a nice FOSS mail server and spam filter."
Linus Torvalds openly admits that his strength is coding the Linux kernel, and he has little time, interest or skill for other aspects of computing, such as the GNOME or KDE shells.
Is it just me, or does Google manage to break everything that they create?
Certainly search is not as useful as it used to be, the new Contacts thingy sucks big time, Maps has become just plain irritating, and the new Gmail interface looks like a dud.
And Android... don't get me started.
Google's excellent spam filtering was one of the very few things that have made me stay with Gmail.
Lately though I'm seeing hundreds of spam filtered messages each day in the spam folder. A month ago it may have been a couple dozen.
Sure enough, it looks like Google is "improving" their spam handling.
I expect nothing but a complete disaster.
Do you have any mail messages looking like gorillas? If Google used the same AI, I'm not surprised.
Moreover I believe the mailing list messages are very different from the average mail message because of their peculiar contents, and I'm sure they fooled that AI easily.
"Is it just me, or does Google manage to break everything that they create?"
Yes, just like everyone else does. Marketing is involved in everything, and products need to be "refreshed" regularly. People have shorter attention spans and expect "new, shiny" on an ever more regular basis, mostly instigated and exacerbated by the selfsame marketing crowd. They've created a demand for "new" and have to fulfil it, even if it's just a new coat of paint with no technical improvement, or even a technical step backwards.
Those of us who are older or getting older either never fell for this marketing trick or are becoming jaded and cynical enough to see through it. But there's another (million) suckers born every minute.
Yep, this explains it all:
the Gmail team said it is bringing the same machine learning technology it developed for Google Search and Google Now to bear on Gmail's spam filter
Well, except I don't know how it accounts for the yuckiness of the new Maps.
"I thought that everybody just used gmail for web accounts.... stuff that's not related to genuine work."
You need to think this through a bit further. You are wrongly assuming that places where spammers harvest addresses and genuine work addresses are different things. If genuine work involves a presence on mailing lists, then you use Gmail, Yahoo or Hotmail/Live/Outlook/whatever-MS-calls-it-this-week for such addresses. And you have another private address for less public work and probably another for private life. Is that really so difficult to understand?
... Why didn't Linus just call up someone at google or perhaps even send an email, I am sure he knows *someone that could allow him to bypass the lack of customer service.
I have a lot of respect for this guy's skills and what he has done, but I only seem to hear about him going nuts about something or at someone... is he Steve Ballmer's long-lost brother, raised by the kind and loving people of Finland, and thus not the same evil Steve?
Linus, a nice guy but calm down, given the work on the kernel never ends, he of all people must know computers are not yet perfect!
For some reason, those never made the news. Much more of Linus' work is available for all to peruse via the LKML. Judge by those emails, not the unusual ones picked out by journalists. (Imagine how dull The Register would be if journalists did not filter the LKML.)
>>"Why didn't Linus just call up someone at google or perhaps even send an email, I am sure he knows *someone that could allow him to bypass the lack of customer service."
Possibly because Linus Torvalds, creator of Linux, has a community spirit and wants to raise things publicly and for the benefit of all, rather than some invisible and perhaps partial fix.
Though equally plausible, when dealing with a corporation the size of Google, even Linus realises that public opinion is a useful weapon to wield.
It may be that Google doesn't even have places where they can take bug reports, and there is a serious reason why Linus might not know people involved in Google Mail.
There seem to be 2 groups of software engineers. The one Linus belongs to is the one trying to solve problems in the most elegant and simplest way. They think that knowing how to solve a problem is the most important part of software design, and a low number of lines of code is one of their top priorities. They know that, when they do a proper job, a small number of orthogonal features can provide a world of use to the user.
The other group puts its emphasis on development processes. They commonly start with hugely complex designs and frameworks designed to solve very general cases, often even much more general than what they want to actually do. The rationale for this is that, hypothetically, you could reuse those components. In reality, this rarely gets done, as they are not as general as the developer thought they would be, and changing them to be more general would mean changing them, which means changing your old projects.
Those two groups rarely talk to each other, since their views are so different. Google Mail was probably built by the latter group.
It's noteworthy that in the bigger scale of things, the first group is seen as the one that gets things done. UNIX is a typical example of a product of that first group. In contrast the second one seems to be responsible for many projects which try to solve a rather trivial problem in such a complex way, it's hard to maintain the code. Such projects also seem to "never get done" and continuously evolve for years without getting to a point where they are "done".
"Ah, much like systemd then?"
Perhaps, yes, but there are so many other examples. Great examples are "modern" desktop systems like GNOME or KDE, which try to solve trivial problems but are huge. That's why there are other developments like "suckless", which aims to create simple yet powerful tools.
Systemd is probably not the worst in that range, but it's the most problematic as both groups need to boot their systems. Therefore it's a point of conflict. It's possible to live without a GUI, but it's incredibly hard to live without your OS booting up.
'A great example are "modern" desktop systems like Gnome or KDE which try to solve trivial problems, but are huge. That's why there are other developments like "suckless" which aims to create simple yet powerful tools.'
However one of my requirements for a desktop system is to enable me to place files etc on the desktop just where I want them. It's amazing how many developers who style themselves "UX designers" or the like seem to take it upon themselves to design systems which expressly prevent this in the name of simplicity. Things should be as simple as possible but no simpler.
"However one of my requirements for a desktop system is to enable me to place files etc on the desktop just where I want them."
Yes, but that's actually not simplicity; that's just the usual UX-designer idiocy. In fact, not having that ability creates more complexity: suddenly the desktop behaves differently from directory windows, which both takes more code and makes everything less consistent to use.
I don't think the Linux kernel can reasonably claim to be solving problems in the "most elegant and simplest way". Maybe back in 1990 when it was tiny and lightweight and didn't need to run on much besides i386, but even then I seem to recall all kinds of ugly hackery to get past the processor's way of doing things. Fast forward 25 years and the amount of cruft in there is astounding. That's not a dig at anyone's l33t skillz, but Linux is huge now and by definition huge is not elegant nor simple.
How much of that bloat is device drivers? I get the impression the kernel core is still relatively compact, but keep in mind it's got drivers for every common desktop device since 1990 in there. Fortunately they made it modular so you don't have to load that stuff if you don't need it.
Actually The Reg and other news sites generate revenue based on hit rate, which means they write sensational stories to get more people to click/read. This increases ad revenues considerably and that's what the hacks here get paid to do.
As far as Linus is concerned, like many of us, he is outraged at the incompetence, ignorance and unacceptable demeanor of people who are entrusted with properly performing their responsibilities. Improperly marking 30% of legitimate e-mail as SPAM can cause a world of problems for anyone, especially anyone in business. Those like Google, Comcast, AT&T et al. who, through ignorance or evil, cause millions of people serious financial losses with their improper and misguided decisions couldn't care less about the people they damage or the costs their irresponsible behavior imposes on others.
The fact that governmental authorities charged with protecting consumers from this type of abuse rarely punish unscrupulous companies who violate laws while damaging consumers, only serves to illustrate how mad the world has gotten and why people like Linus are outraged by the devious acts of brain dead entities like Google, Comcast, AT&T et al. When these unethical, disgraceful entities can buy favor with the government consumer protection and regulation agencies to continue their evil ways, it frustrates the Hell out of honest, educated, hard working people who are doing their best to contribute to society and make this a better world. The bad guys are starting to surpass the good guys and it's because the governmental agencies have sold out in many cases to the crims.
I was going to write exactly the same thing. People have short memories and, as a victim of the recent Adobe hack, I get spam delivered into my Outlook inbox at work all the fucking time.
Google reckons their AI now catches 99.9% of spam email.
"Why doesn't he put a spam filter into Linux, if he thinks this is important."
Because the kernel is no place for a spam filter? The kernel doesn't know anything about email.
Your question is equivalent to asking why an engine designer doesn't paint the white walls on tyres.
I made two separate points in my post.
The main point is - why would a grown-up use gmail for work purposes? The irony (which downvoters obviously didn't get), is that this particular free software isn't fit for purpose, as it never is, BECAUSE NOBODY PAID FOR IT.
On the second point, you've focused on the wrong thing. Have a ponder about whether IE is part of Windows. Most of the general population will say yes...
Hint: very few of the users would understand the concept of kernel space.
A child complains "why won't the rest of the world do what I want"
A grownup would have sat back and said "I see a problem in the world (SPAM, not SPAM Filter) I have a huge influence in what I do, but I'm in slightly the wrong box to fix it. What can I do to expand the box to put myself in the position to fix it."
I can think of half a dozen fixes at kernel level, given the Linux install base in email servers. Probably most of them are crap solutions, as IANAE, but surely Linus and his cronies could have a go? Just consider email as a (standardised) NIC driver with broken security and fix it. Write an RFC to standardise an email security layer, if that makes you more comfortable. Or maintain hash tables to detect duplicated SMTP packets.
Change your ideas of the boundary between kernel and user. $deity$ didn't write the boundary!
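The hash-table idea at the end of that comment can at least be sketched, though as an ordinary user-space mail filter rather than anything in the kernel. `fingerprint` and `BulkDetector` are hypothetical names and the threshold is arbitrary. Two caveats: spammers who randomise each copy defeat an exact hash, which is why collaborative checksum systems such as DCC use fuzzier checksums, and a legitimate mailing-list post (an LKML patch, say) is bulk mail too, so duplication can only ever be one signal among several.

```python
import hashlib
from collections import Counter


def fingerprint(body: str) -> str:
    """Hash a message body after collapsing whitespace and case."""
    normalised = " ".join(body.split()).lower()
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


class BulkDetector:
    """Hash table of body fingerprints seen across all incoming mail."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold          # hypothetical cut-off for "bulk"
        self.seen: Counter = Counter()

    def observe(self, body: str) -> bool:
        """Record one body; True once the same fingerprint reaches the threshold."""
        key = fingerprint(body)
        self.seen[key] += 1
        return self.seen[key] >= self.threshold


detector = BulkDetector(threshold=3)
for body in ["Cheap pills now!", "cheap   PILLS now!", "Cheap pills NOW!"]:
    print(detector.observe(body))   # False, False, then True
```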
@sequester
Are you a spammer? Please be more verbose about your rant... if you want to avoid a lot of spam, a receiving system must be able to test whether the sender really exists.
SPAM accounts for over 97% of the email our company receives. Yes, we manage to filter out the whole lot of it; no, we do not use Gmail. I guess that for Gmail it is more like 99.99997% (note that that was just a wild guess; it might be much worse), which is why checking that the sender is legitimate matters.
Unfortunately ISPs like Comcast Cable are illegally blocking ALL e-mail from hundreds of ISPs around the world under the false claims that the ISP is sending out excessive SPAM. Instead of just blocking SPAM as reputable ISPs do, Comcast has made a unilateral and arbitrary decision to use automated blanket blockage of all e-mail including proper, legitimate e-mail sent to U.S. Comcast customers from ISPs in Europe, Asia and Australia including some of the largest ISPs in these countries.
That means that U.S. Comcast customers are unable to communicate with family, friends, business associates, etc. in other countries when Comcast is illegally blocking whatever ISP these people or businesses are using and there are many of these legitimate ISPs.
Comcast has failed even to notify its U.S. customers that it has been illegally blanket-blocking legitimate international e-mail for close to two years. The FCC and FTC have been provided with irrefutable proof of Comcast's illegal international e-mail blockage via the error messages Comcast sends to people sending e-mail to U.S. Comcast subscribers. When contacted, Comcast ALWAYS insists that it is only blocking specific e-mail addresses and not ALL e-mail from certain ISPs. This is an outright lie.
Even after being shown proof that it is illegally blocking ALL e-mail, including legitimate e-mail (the majority of the traffic), from ISPs whose servers are "white listed" by industry sources that check for SPAM daily, such as Cisco, Comcast still refuses to end its illegal blockage of legitimate international e-mail servers. On occasion Comcast will unblock a specific e-mail address for perhaps a few days or weeks, and then its automated system blocks the legitimate e-mail once again. Comcast routinely lies to its customers and government authorities about this illegal e-mail blockage.
Like most U.S. government agencies, the FCC and FTC have proven completely incompetent and unresponsive regarding Comcast's illegal international e-mail blockage. The typical response is to forward the complaint to Comcast, which responds with a politically correct letter to deceive the regulatory agency into believing that Comcast is using an appropriate procedure to reduce SPAM. In fact, the blanket blockage of legitimate e-mail constitutes consumer fraud, as Comcast customers are paying for ALL legitimate e-mail to be delivered and it is not being delivered. No one authorized Comcast to illegally block legitimate international e-mail.
Probably not one in a hundred Comcast customers knows that international e-mail sent to them is being blocked. So if you are for instance trying to book hotels in Europe, Asia or Australia for travel purposes, you are unable to do so when Comcast is blocking ALL e-mail from whatever ISP the hotel is using. This illegal blockage of international e-mail sent to U.S. Comcast subscribers is not only illegal, it's unacceptable and undermines the services that Comcast customers pay for monthly.
While we all want to stop SPAM, illegally blanket-blocking all international e-mail from legitimate ISPs with white-listed servers is simply unacceptable and outrageous, yet Comcast has been doing this for close to two years and refuses to end this unlawful practice.
Reality is stranger than fiction but the Comcast illegal e-mail blocking is as real as a heart attack. Attacking the messenger doesn't change reality. All statements in the above are documented to be factual and that is precisely why people should be outraged at both Google and Comcast and any entity that illegally blocks legitimate e-mail or mis-marks e-mail as SPAM when it is not. There is a very good reason why Linus was pissed off because the e-mails he failed to receive are important and failing to receive them has compromised his ability to perform his responsibilities.
There is no SPAMMING at all, just facts that should make any reasonable person angry just like Linus is.
Your worthless comment adds nothing to this discussion about good e-mail being improperly labeled as SPAM. If Google, Comcast or some other ISP is also completely blocking e-mail or mislabeling e-mail as SPAM, it is certainly useful for people to know this problem exists so that they can take action to resolve this serious matter. Most people care greatly about their e-mail as Linus demonstrated.
" that its spam filter's rate of false positives is down to less than 0.05 per cent."
But that's just a single statistic: an average across all mailboxes. Without knowing something more descriptive (such as the coefficient of variation), you can't really measure their success as experienced by users.
You could get 0.05% by having 99 users with an exceptional 0.01% false positives and one unlucky sod with 4% falsely identified
Probably a better descriptor would be the Positive predictive value (http://www.networkworld.com/article/2336754/software/spam-and-statistics.html)
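The arithmetic in the 99-users example holds: (99 × 0.01 + 4) / 100 = 0.0499%, which rounds to the quoted 0.05%. A short sketch with Python's `statistics` module shows how a dispersion measure such as the coefficient of variation (standard deviation divided by mean) exposes what the average hides; the per-user rates are the commenter's hypothetical ones.

```python
from statistics import mean, pstdev

# Hypothetical per-user false-positive rates, in percent:
# 99 users at 0.01% and one unlucky user at 4%.
rates = [0.01] * 99 + [4.0]

avg = mean(rates)
cv = pstdev(rates) / avg          # coefficient of variation = std dev / mean

print(f"mean rate: {avg:.4f}%")   # mean rate: 0.0499%
print(f"coefficient of variation: {cv:.1f}")
print(f"worst user: {max(rates)}%")
```

A coefficient of variation far above 1 signals that the headline average says little about any individual mailbox.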
I've NEVER had real spam to my Google email address, but regularly have to check it to pull out false positives, so make that a 100% false-positive rate for me.
I was annoyed when I discovered what was happening, so went to disable it. But can you? Like hell!
"I know, let's add a feature that silently deletes people's email..... and make it so that it can't be switched off!"
[ Yes, you can get around it by creating filters to automatically undo the spam categorising, but what's wrong with a simple "Off"? ]
Unfortunately, Gmail doesn't provide an explicit way to disable their spam filter, something simple, like checking a box in the configuration menu that reads "disable spam filtering". However, there's a non-obvious way to throw a monkey wrench into the works to effectively disable it so ALL messages stay in the inbox or are forwarded to a destination of one's choice. This is particularly important for POP3 users, as the Junk folder on Gmail's server is inaccessible with POP. It is also important if you're using multiple Gmail accounts and want copies of messages to the various accounts to funnel into a main account that you monitor regularly.
The way to do it is to set up a filter that says something like: if the message DOESN'T HAVE the string "AbeCeiw32#%x139tt3", NEVER send it to Spam. By picking a long, random string of characters for the test, the probability of ever getting a message that contains that string is essentially zero, so all messages go to the inbox.
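The filter itself is set up in Gmail's web interface (a "Doesn't have" criterion plus the "Never send it to Spam" action). The sketch below only models that rule's logic and generates a random token with Python's `secrets` module instead of typing one by hand. Using hex digits avoids punctuation that a word-based search might split on; that is an assumption about how Gmail matches, not documented behaviour, and `never_send_to_spam` is a hypothetical name.

```python
import secrets

# 32 hex characters: letters and digits only. The length is an arbitrary choice.
SENTINEL = secrets.token_hex(16)


def never_send_to_spam(message_text: str, sentinel: str = SENTINEL) -> bool:
    """Models the rule: message doesn't have <sentinel> => never send it to Spam."""
    return sentinel not in message_text


print("Filter criterion (Doesn't have):", SENTINEL)
print(never_send_to_spam("Subject: [PATCH] fix it\n\nbody"))   # True
```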
Developers rarely enjoy being system administrators, and they're rarely much good at it. The two skill sets are actually very different.
I *am* a system administrator, and I still started farming out my email after a while. There's a reason this was one of the first "cloud" services. Maintaining a mail server is a lot of work, and the work doesn't scale down much with size; if anything it gets worse, because many spam filtering techniques don't work as well without a large volume of mail to chew on. When I realized I was spending a couple hours a week tweaking spam filters and babysitting queues, I decided I had better uses for that time.
Google had not responded to The Register's request for comment. Torvalds, on the other hand, did have some advice for the Chocolate Factory's mail wranglers.
Long before the hacks at El Reg wrote this up Gmail Anti-Spam Product Manager Sri Somanchi had already replied to the G+ post saying "The Gmail anti-spam team would love to look into why this is happening"
just sayin'
Is it possible that Torvalds gets a lot of email containing or with attached program code? Surely the mere presence of that should raise a red flag for potential malware. OK, so maybe G have recently done something that increases the likelihood of that being flagged as spam, and maybe if Torvalds flags the false positives, G's AI will "learn" his preference. Viewed from another perspective, if G's change is to be more aggressive about code inclusion or attachment, that's probably beneficial to most non-tech users.
Is it possible that Torvalds gets a lot of email containing or with attached program code? Surely the mere presence of that should raise a red flag for potential malware
Any code that Torvalds receives will be source. This looks very different to executable malware; a spam filter that mistakes the two is beyond useless.
Vic.
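To illustrate how different the two look at the byte level, a crude sketch: native executables begin with fixed "magic" bytes (`\x7fELF` for ELF, `MZ` for DOS/PE), while a patch is plain text. The helper names are hypothetical and real scanners do far more; a text file that happened to begin with "MZ" would be misreported here.

```python
from typing import Optional

# Leading "magic" bytes of two common native executable formats.
EXECUTABLE_MAGIC = {
    b"\x7fELF": "ELF (Linux and other Unix-likes)",
    b"MZ": "DOS/PE (Windows)",
}


def executable_format(data: bytes) -> Optional[str]:
    """Return the executable format name if the bytes start with a known magic."""
    for magic, name in EXECUTABLE_MAGIC.items():
        if data.startswith(magic):
            return name
    return None


def looks_like_text(data: bytes) -> bool:
    """Crude check: valid UTF-8 with no NUL bytes, as a source patch would be."""
    if b"\x00" in data:
        return False
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


patch = b"diff --git a/kernel/fork.c b/kernel/fork.c\n--- a/kernel/fork.c\n"
binary = b"\x7fELF\x02\x01\x01\x00"

print(executable_format(patch), looks_like_text(patch))    # None True
print(executable_format(binary), looks_like_text(binary))
```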
I discovered the same problem last week, but didn't go beyond moving the messages back to my inbox. This could be a real problem for any businesses relying on gmail (hosted gmail counts too!). I need to make an announcement to my company, and google needs to make a fix to theirs!
Top marks to Google. I for one am sick to death of Linux spam.
Every day I get numerous emails from Nigerians offering me ten million lines of code if I'll send them my bank details. Then there are the offers of kernel patches to increase my penis size, to say nothing of the improbable emails from young women offering hot device driver action.
Ask Google what their "Lost Message Rate" is, as opposed to their "False Positive Rate".
The "Lost Message Rate" tells you what proportion of non-spam messages are being gobbled up by the spam filter. The "False Positive Rate", on the other hand, is more a reflection of the amount of spam you are getting: the more spam you get, the better the "False Positive Rate" looks, even though you may be losing the same number of non-spam messages.
In any case, fortunately it is easily fixed. Just put a key in your email address itself, which might look something like this:
"John Smith -12345" <joHN.SmiTH -12345@eXamPLE.Com>
OK - Google currently won't let you do this, but that is another issue.
More here: http://www.geobytes.com/message-keys/
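The difference between the two rates in that comment comes down to the denominator, as this sketch with hypothetical mailbox numbers shows: ten legitimate messages lost out of 200, first alongside 800 spam messages and then alongside 9,800. Note that in standard classifier terminology, lost legitimate mail divided by all legitimate mail is itself called the false-positive rate, so the "Lost Message Rate" here is the figure a statistician would use; the "False Positive Rate" criticised in the comment divides by all mail instead.

```python
from fractions import Fraction


def false_positive_rate(lost: int, total_received: int) -> Fraction:
    """The comment's 'False Positive Rate': lost legitimate mail / all mail."""
    return Fraction(lost, total_received)


def lost_message_rate(lost: int, legitimate_received: int) -> Fraction:
    """Lost legitimate mail / legitimate mail received."""
    return Fraction(lost, legitimate_received)


legit, lost = 200, 10           # hypothetical mailbox: 10 of 200 real messages lost
for spam in (800, 9_800):
    total = legit + spam
    fpr = false_positive_rate(lost, total)
    lmr = lost_message_rate(lost, legit)
    print(f"spam={spam:>5}: FPR {float(fpr):.2%}, lost-message rate {float(lmr):.2%}")
```

Ten times more spam makes the first figure look ten times better while the same five per cent of real mail still disappears.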
Just checked. Only 31 messages in my spam folder, but 18 of those should not have been spam.
In fact they belong to a mailing list that I've created a filter for in gmail that should be putting them into an appropriate folder before any other filtering takes place!
What's the point of being able to create email filters when gmail decides to ignore them and do its own thing?
(And yes, I do own and run my own email server, but I've had the gmail address for some time and use it for all my personal email)
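The figures in that comment are exactly what the positive predictive value suggested earlier in the thread measures: of the messages the filter flagged, what share really were spam. A minimal sketch using those numbers (31 in the spam folder, 18 of them legitimate); `positive_predictive_value` is a hypothetical helper name.

```python
from fractions import Fraction


def positive_predictive_value(true_pos: int, false_pos: int) -> Fraction:
    """Share of messages flagged as spam that really are spam: TP / (TP + FP)."""
    flagged = true_pos + false_pos
    if flagged == 0:
        raise ValueError("nothing was flagged, so PPV is undefined")
    return Fraction(true_pos, flagged)


in_spam_folder, legitimate = 31, 18   # figures from the comment above
ppv = positive_predictive_value(true_pos=in_spam_folder - legitimate,
                                false_pos=legitimate)
print(f"PPV = {ppv} ({float(ppv):.1%})")   # PPV = 13/31 (41.9%)
```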