Hmmm
Allowing google to even more intimately know my details or insecure search helping black hatters.
I know which one I'm going to choose and it's not secure search.
Last month Google made secure search the default option for logged in users – mostly to improve privacy protection. But there is a beneficial side-effect - it is harder for fraudsters to manipulate the search engine rankings of scam sites. Users signed into Google are now offered the ability to send search queries over secure …
When you're not logged in, you can still surf to https://www.google.com rather than http://www.google.com and gain the advantages of secure search. It's simply been made the default for logged-in users, that's all. It's been available for non-logged-in users for some time now.
I understand that referers should not be passed to http sites from an https site, per:
http://tools.ietf.org/html/rfc2616#section-15.1.3
So, the question is, are Google just playing by those rules, or are they also hiding the referers when they link to an https site?
But even if that's intended for browsers, isn't it the case that for Google to have been passing referers across the secure > insecure boundary is against the spirit of that guidance, if not the letter?
Just to be clear, I'm not trying to argue in favour or otherwise of what they are doing, just trying to get clarification on what they are doing, and if possible, why. For example, if I change my site to be https, will I get my referer data back?
Would the tech from Google lurking in the corner care to comment?
Logically, if you need to suppress the Refer(r)er header when you cross from https to http for privacy reasons, it only makes sense to do the same when you cross from one domain to another when using https for both (since you don't want sensitive information from example.com to get logged in the access log for nosey-buggers.net).
Therefore, regardless of what rfc2616 requires browsers actually do, suppressing all referral information would actually be in the spirit of the document, while passing potentially sensitive information along would not.
I have no idea which google actually do, but since they call it "secure", I'd imagine they do the sensible thing and suppress for all.
Ok, wondering what the deal was, I just went to a trio of google.com sites (https://encrypted..., http://www... & https://www...) to see what the differences were in the links provided. For anyone interested, my search term was "the register" without quotes. Unsurprisingly, El Reg popped up first, and I have included the initial part of the link below:
From https://encrypted.google.com
https://encrypted.google.com/url?sa=t&rct=j&q=the%20register&source=web&cd=1&sqi=2
From http://www.google.com
http://www.google.com/url?sa=t&rct=j&q=the%20register&source=web&cd=1&sqi=2
and https://www.google.com
http://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&sqi=2
I abbreviated the links because, from what I can tell, most of the differences lie here. I don't know what value there is in several 20+ character long strings that are probably just cookie info anyway, and everyone is free to perform the same experiment.
I find it interesting to note that only in the last of the three links does the query term, &q=, get removed but it is transferring from an https site to an http site as pointed out by AC mentioning rfc2616 above. It doesn't tell us if that info gets passed beyond google to any other https server however.
What does it mean for you and me? Personally, it means copy the real address from below the link and paste it in the address bar to perform an end run around both google and the scammers. Yeah, I know the all seeing eyes of GOOG will read that and soon disappear the actual address from the search description.
Should I mention that on trying the same at Bing, the browser produced an invalid SSL cert. warning? Oh, I guess I already did.
Interesting experiments, but what you're looking at are the querystring parameters on the links, whereas we're talking about the HTTP Referer header that gets sent along with the request.
You might be able to see that using Firebug in Firefox, or equivalent tools, or of course if you have access to the destination website you can see it in the logs.
The Referer header usually includes the query string, so taking the search terms out of the query string is the only way for Google to get them out of the Referer header (which is generated by your browser, not by Google).
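To make that concrete, here is a rough sketch of what a destination site could do with the Referer it receives. It assumes the browser sends the full URL of the Google redirect page as the Referer, and it reuses the two abbreviated links posted earlier in the thread; the function name search_terms_from_referer is made up for illustration.

```python
from urllib.parse import urlsplit, parse_qs


def search_terms_from_referer(referer):
    """Return the value of the q parameter in a Referer URL ('' if absent or empty)."""
    query = urlsplit(referer).query
    # keep_blank_values=True so that a bare "q=" is reported as an empty string
    params = parse_qs(query, keep_blank_values=True)
    return params.get("q", [""])[0]


# Abbreviated links quoted earlier in the thread (assumed to arrive as the Referer)
plain = "http://www.google.com/url?sa=t&rct=j&q=the%20register&source=web&cd=1&sqi=2"
secure = "http://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&sqi=2"

print(repr(search_terms_from_referer(plain)))   # 'the register'
print(repr(search_terms_from_referer(secure)))  # ''
```

With the q value blanked in the link itself, there is nothing left for the destination to recover, whatever the browser decides to send.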
The original article doesn't make it clear that SSL is irrelevant to the changed behaviour - indeed, if most browsers follow RFC2616, and don't send a Referer when following a link from an https to an http site, the change in the query string doesn't matter, because the target site won't see any referrer information at all.
In fact, it's possible that this is just an efficiency tweak - there's no point in wasting cycles and bytes including the search terms in the querystring if it's not going to be used because RFC2616 says that the browser should discard it.
It's quite easy to test (I'm using Firefox 8.0):
.
http->http (http://google.com->http://ssltest.net)
Referer sent: true
.
http->https (http://google.com->https://ssllabs.com)
Referer sent: true
.
https->https (https://google.com->https://ssllabs.com)
Referer sent: true
.
https->http (https://google.com->http://ssltest.net)
Referer sent: false
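Those four results match the rule in RFC 2616 section 15.1.3 (no Referer on a non-secure request when the referring page came over a secure protocol). A minimal sketch of that rule, assuming the decision depends only on the two URL schemes; referer_sent is a made-up name, and real browsers may apply further policies on top:

```python
from urllib.parse import urlsplit


def referer_sent(source_url, target_url):
    """True unless the link goes from an https page to a non-https target."""
    source_secure = urlsplit(source_url).scheme == "https"
    target_secure = urlsplit(target_url).scheme == "https"
    # Only the secure -> insecure transition suppresses the header
    return not (source_secure and not target_secure)


# The four combinations tested above
cases = [
    ("http://google.com", "http://ssltest.net"),
    ("http://google.com", "https://ssllabs.com"),
    ("https://google.com", "https://ssllabs.com"),
    ("https://google.com", "http://ssltest.net"),
]

for source, target in cases:
    sent = str(referer_sent(source, target)).lower()
    print(f"{source} -> {target}  Referer sent: {sent}")
```

Only the last case comes out false, the same as in the Firefox test.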
Thank you, most helpful.
So now the logical next step is that everyone moves their sites over to https, which, if we're limited to one site per IP address, means we should run out of IPv4 addresses even sooner!
To which end, I don't suppose anyone has any cunning tricks to get round this, using multiple certificates simultaneously to decode https requests and shunt them over to the correct site when you have several unrelated https sites on the same server/IP address? (And yes, I mean multiple certificates: wildcard etc. certs aren't flexible enough.) I'm thinking of IIS/Windows here, but Linux tricks are equally welcome.
...if I'm not the only one here who's thinking lately that "black-hat SEO" is kind of a redundant expression these days. I'm wondering because I've read some Web site copy that's supposedly search-engine optimized, and, maa-aaan, what a load of gibberish. I suppose it's "human-readable" in that I can understand the words on the page, but what I've seen is so infested with empty buzzwords and catch phrases that I can hardly imagine any actual humans being able to get through it all without their brains exploding.
Also... stop me if I'm wrong, but isn't it possible to do a secure search by simply typing "https://" ahead of the URL instead of regular old "http://", without being logged in? I'm only wondering because I have three gmail accounts -- two for a couple of Blogger blogs I run, and one as a backup address for work -- but I never run searches while I'm logged in...for security reasons, if you know what I mean.