Testing is easy?
This could, conceivably, not be down to Apple at all.
The measuring equipment and test chambers needed to determine whether a device meets radio regulations are not cheap, and testing is a relatively specialised process, so it makes sense for a company to specialise in offering testing services. Such a company gains rare, marketable expertise and gets to use its test infrastructure all the time, rather than only when a new product is being developed. Only the largest companies will have their own facilities, and Apple might be one. Standards testing organisations include the TÜVs, Element, Eurofins, DEKRA... (there are many).
There is an assumption lurking here: that if two testing organisations test the same pre-production model of a device, they will get the same answer. Also, if there is natural variability between devices, I would expect procedures to be in place to prevent someone privately pre-testing a batch of devices and sending on for formal testing only those that passed the pre-test.
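To see why pre-screening would matter, here is a minimal simulation sketch. Every number in it (the mean SAR, the unit-to-unit spread, the screening cutoff) is made up for illustration and does not come from any real device or test lab; the function name `simulate_prescreening` is likewise hypothetical.

```python
import random
import statistics


def simulate_prescreening(n_units=1000, mean_sar=1.5, spread=0.2,
                          screen_cutoff=1.6, seed=0):
    """Return (all_units, screened_units) as lists of SAR values in W/kg.

    All parameters are hypothetical. Units are drawn from a normal
    distribution to mimic natural device-to-device variability.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    all_units = [rng.gauss(mean_sar, spread) for _ in range(n_units)]
    # Private pre-test: keep only the units at or below the cutoff.
    screened = [sar for sar in all_units if sar <= screen_cutoff]
    return all_units, screened


all_units, screened = simulate_prescreening()
print(f"mean of all units:       {statistics.mean(all_units):.3f} W/kg")
print(f"mean of screened units:  {statistics.mean(screened):.3f} W/kg")
print(f"fraction passing screen: {len(screened) / len(all_units):.1%}")
```

The screened batch always looks better than the production population it came from, which is why a formal test on hand-picked samples would say little about the devices people actually buy.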
You also have to make sure that the testing organisation knows its stuff - that's what you pay for, but...
Radio Testing: An Insider’s Guide From an Outsider’s View
There's a lot going on here: two tests on the same device by different organisations could give different results for many reasons, and the variability could be higher still if they are testing two different samples from (possibly) different production lines. The testing organisations could have methodological differences. Production devices could be subtly different from the pre-production ones used during product development. Somebody, somewhere, might have been trying to game the system.
Meanwhile, the SAR limits for the general public are set deliberately low (as in, around 10% of the levels shown to cause an effect*), so there's a pretty small chance this will have had any detectable health effect.
SAR thresholds for electromagnetic exposure using functional thermal dose limits
Conditional safety margins for less conservative peak local SAR assessment: A probabilistic approach
A quick overview of what SAR testing involves: Varkotan: Specific absorption rate (SAR) Testing – 5 things for companies to consider
*The effect is heating of body tissue. At the frequencies used by mobile phones, the emitted electromagnetic radiation is non-ionising, so the working assumption is that the principal effect is tissue heating. This is controversial in some circles, where people claim biological effects other than heating, but the regulations are concerned with heating effects. Epidemiological studies of mobile phone usage have been pretty good at confirming the null hypothesis, green jelly beans aside. The reasoning behind the SAR limits is set out by the International Commission on Non-Ionizing Radiation Protection (ICNIRP):
ICNIRP webpage on: RF EMFs 100 kHz - 300 GHz
[pdf] ICNIRP Guidelines for Limiting Exposure to Electromagnetic Fields (100 kHz to 300 GHz)
The tl;dr, grossly oversimplified summary is that for exposure to the head, the commission took a conservative (low) estimate of the temperature rise known to cause adverse biological effects, determined the radiation power level required to generate that rise, and set the exposure limit at one tenth of that level.
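The arithmetic behind that summary is a single division. As a rough sketch, assuming the figures I recall from the ICNIRP 2020 guidelines for local exposure of the head and torso (an adverse-effect threshold of 20 W/kg averaged over 10 g of tissue, and a reduction factor of 10 for the general public), and with the helper name `exposure_limit` made up for this example:

```python
def exposure_limit(threshold_w_per_kg, reduction_factor):
    """Divide the adverse-effect threshold by a safety (reduction) factor."""
    return threshold_w_per_kg / reduction_factor


# Assumed values; check them against the ICNIRP document before relying on them.
HEAD_THRESHOLD_W_PER_KG = 20.0   # local SAR threshold, head and torso
PUBLIC_REDUCTION_FACTOR = 10     # general-public safety factor

limit = exposure_limit(HEAD_THRESHOLD_W_PER_KG, PUBLIC_REDUCTION_FACTOR)
print(f"general-public local SAR limit: {limit} W/kg")  # prints 2.0 W/kg
```

So a device that measures somewhat over the limit is still a long way below the level at which a heating effect would be expected.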