Re: Pixel binning
As you know, the individual sensor "pixels" are only sensitive to luminance, so in order to produce a colour image you need to apply colour filters.
In a "traditional" sensor, as I understand it, one pixel in a group of 2x2 has a red filter, one a blue filter and two have green filters. Other combinations are possible, of course.
In the final result, each image pixel corresponds to one sensor pixel, and the full RGB value of each image pixel is calculated by interpolating from adjacent pixels, so that - in effect - what you end up with is an image with full-resolution luminance but quarter-resolution colour. It's actually a bit more complex than that: what the sensor pixels are measuring doesn't give them a true luminance value, and with the RGBG system you get quarter resolution for red and blue but half resolution for green - our eyes are more sensitive to green anyway.
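To make the interpolation step concrete, here's a minimal numpy sketch of naive bilinear demosaicing. The RGGB layout, the function name and the use of scipy's convolve2d are my choices for illustration - real camera pipelines are far more sophisticated:

```python
import numpy as np
from scipy.signal import convolve2d

def demosaic_bilinear(mosaic):
    """Crude bilinear demosaic of an RGGB Bayer mosaic.

    `mosaic` is a 2D array of raw sensor readings laid out as
        R G
        G B
    repeating across the sensor. Each missing colour sample is
    filled in by averaging whichever neighbours do have it.
    """
    h, w = mosaic.shape
    rgb = np.zeros((h, w, 3))
    # Masks marking which sensor pixels sit under which filter.
    r = np.zeros((h, w), bool); r[0::2, 0::2] = True
    b = np.zeros((h, w), bool); b[1::2, 1::2] = True
    g = ~(r | b)
    kernel = np.ones((3, 3))
    for c, mask in enumerate((r, g, b)):
        # Sum of the available samples of this colour in each 3x3
        # neighbourhood, divided by how many samples there were.
        total = convolve2d(np.where(mask, mosaic, 0.0), kernel, mode="same")
        count = convolve2d(mask.astype(float), kernel, mode="same")
        interp = total / np.maximum(count, 1)
        # Keep the measured value where this pixel has the filter.
        rgb[..., c] = np.where(mask, mosaic, interp)
    return rgb
```

Note how red and blue have to be reconstructed from only a quarter of the sensor pixels while green has half of them - exactly the resolution asymmetry described above.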
And then, as you say, yet more colour detail is thrown away if the image is stored as a JPEG (chroma subsampling typically halves the colour resolution again in each direction), so the "true" resolution of the final image is lower still.
With "pixel binned" sensors the four pixels in the group are output as just one final image pixel. Effectively you end up with full resolution colour, but you "throw away" a lot of luminance and some green-channel resolution. What you gain (in theory) is "accuracy" - by combining the luminance values of four sensor pixels you should end up with a "cleaner" (less noisy) result. In theory you also gain some sensitivity because although each sensor pixel is smaller than in a traditional sensor and so can "collect" fewer photons during the period when it is doing so, there are four of them all collecting at the same time.
These sorts of techniques have been used for many years in various forms. "HDR" image recording is one common example - in that case the "sensor pixels" being combined are separated in time rather than space, as the camera takes (usually) three images in rapid succession, each with slightly different exposure settings. The difference is that while a 12Mpixel camera which uses HDR is marketed as a 12Mpixel camera - not a 36Mpixel one - a 12Mpixel camera which uses pixel binning is marketed as having 48Mpixels. Of course, it really does have 48Mpixels, but you only get that resolution if you treat the sensor as a traditional sensor, which rather defeats the object and probably produces worse images than an actual 12Mpixel sensor with larger pixels would have.
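For comparison, here's the temporal version in the same spirit - a toy merge of bracketed exposures. The function name, the [0, 1] value scale and the clipping threshold are all mine; real HDR pipelines also align the frames and tone-map the result:

```python
import numpy as np

def merge_exposures(frames, exposure_times, clip=0.95):
    """Toy HDR merge: the combined samples are separated in time.

    `frames` are images of the same scene, values scaled to [0, 1],
    shot with different exposure times. Each frame is normalised
    back to a common radiance scale and then averaged, skipping
    pixels that are close to clipping.
    """
    acc = np.zeros(frames[0].shape)
    weight = np.zeros(frames[0].shape)
    for frame, t in zip(frames, exposure_times):
        valid = frame < clip                 # ignore blown highlights
        acc += np.where(valid, frame / t, 0.0)
        weight += valid
    return acc / np.maximum(weight, 1)       # averaged radiance estimate
```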
There is an additional thing at play here, of course: very few people view their images at 1:1 on screen. If you are viewing a 12Mpixel image on a 1920x1200 desktop monitor (about 2.3Mpixels), each monitor pixel is combining the values of roughly five image pixels, so the display is effectively performing "pixel binning" for you! And with very high resolution displays, we rarely view them from a distance at which individual display pixels are discernible to the eye, so even if we did have the image at 1:1 scale on screen, the Mk I eyeball performs the pixel binning.
The same idea applies in other fields. I was once involved in the creation of a device for measuring the thickness of materials (mainly metals) by firing an ultrasonic pulse into the material and timing the reflection(s). To make a cheaper device we tried to do away with the extremely accurate high-speed timers usually used in these circumstances and instead took multiple measurements with a less accurate timer that was not synchronised to the measurement process, theorising that the natural "dither" thus introduced could be averaged out to give a more accurate result. It did seem to work, but I left the project before it was commercialised.
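That trick is easy to demonstrate in simulation - a small sketch with made-up numbers (a 1 µs timer measuring a 13.37 µs echo):

```python
import numpy as np

rng = np.random.default_rng(42)

true_delay = 13.37   # µs: the round-trip time we want
tick = 1.0           # µs: resolution of the cheap, free-running timer

# Because the timer is not synchronised to the pulse, each shot sees
# a random phase offset relative to it - the natural "dither".
phase = rng.uniform(0.0, tick, size=10_000)
readings = np.floor((true_delay + phase) / tick) * tick

print(readings[0])                # one reading is only good to a whole tick
print(round(readings.mean(), 2))  # ~13.37: averaging recovers sub-tick accuracy
```

With a uniformly random phase, the probability of the count rounding up is exactly proportional to the fractional part of the delay, so the average of many coarse readings converges on the true value.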