Having to lose two power supplies does not make a SPOF
To Anonymous Coward (Posted Thursday 15th January 2009 13:13 GMT): SPOF means Single Point of Failure. Losing two power supplies is Multiple Points of Failure...far less probable. Even so, in the IBM Bladecenter, the two power supplies that must fail together to take the enclosure down are connected to separate power harnesses, with each power harness meant to connect to a separate external power grid. Therefore the failure of any single power grid will only take out one member of each of the two power supply pairs, leaving the enclosure running.
HP and Dell have also adopted the same design in splitting the six power supply inputs over two power grids. Unfortunately, instead of extending this redundancy to the DC side, their power supplies all converge onto one DC bus on one midplane.
The IBM Bladecenter has two separate midplanes, each one with its own DC bus. That's why all IBM blade servers have two power connectors: they draw power from two separate DC buses. I don't know if the active components you speak of are hardware monitors or in the data and power paths...whatever the case may be, there are two duplicate sets because there are two midplanes, so again, no SPOF.
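To make the SPOF argument concrete, here is a minimal Python sketch of the two layouts as described above. It is a toy model, not either vendor's actual wiring: each power path is just a set of components, the component names are made up for illustration, and a component counts as a SPOF only if every path depends on it.

```python
def single_points_of_failure(paths):
    """Return the components that appear in every power path.

    If a component is in all paths, losing it alone cuts all power,
    which is the definition of a single point of failure.
    """
    return set.intersection(*(set(path) for path in paths))


# Hypothetical layout with a separate DC bus per midplane
# (the dual-midplane design described above).
dual_bus = [
    {"grid_a", "psu_a", "midplane_1_dc_bus"},
    {"grid_b", "psu_b", "midplane_2_dc_bus"},
]

# Hypothetical layout where redundant supplies converge on one DC bus.
shared_bus = [
    {"grid_a", "psu_a", "shared_dc_bus"},
    {"grid_b", "psu_b", "shared_dc_bus"},
]

print(single_points_of_failure(dual_bus))    # set()
print(single_points_of_failure(shared_bus))  # {'shared_dc_bus'}
```

In this simplified model the AC-side redundancy is identical in both layouts; the only difference is whether the two paths meet again on the DC side.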
Was the HP power supply recall a result of a bad batch of power supplies? That does happen to every vendor from time to time, so it is plausible that this is just bad luck. However, the recall affects all power supplies for the C7000 manufactured before 20 March 2008, that is, since the launch of the C7000 in 2006. By their own calculation, HP claim to have shipped more than a million blades. Considering e-class, p-class and c-class, c-class is by far the most successful and would account for 500,000 blades or more. Assuming 6 power supplies for every 16 blades, that's around 180,000 power supplies! That is not a bad batch, it's an expensive design flaw. A profit-making company would not make that kind of recall unless the cost of not doing it was even higher...it makes you wonder about HP's definition of "extremely rare".
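The back-of-envelope estimate above can be checked with a few lines of Python. This sketch uses only the figures quoted in the comment; the 500,000 c-class blades is the guess made above, and it assumes every enclosure is fully populated with 16 blades and 6 power supplies, which will not be true in practice.

```python
# Figures taken from the comment above; all of them are rough assumptions.
C_CLASS_BLADES = 500_000      # guessed share of HP's "more than a million" blades
BLADES_PER_ENCLOSURE = 16     # assumes every C7000 is fully populated
PSUS_PER_ENCLOSURE = 6        # assumes every enclosure carries all six supplies


def estimate_power_supplies(blades, blades_per_enclosure, psus_per_enclosure):
    """Estimate shipped power supplies from a blade count."""
    enclosures = blades / blades_per_enclosure
    return enclosures * psus_per_enclosure


estimate = estimate_power_supplies(
    C_CLASS_BLADES, BLADES_PER_ENCLOSURE, PSUS_PER_ENCLOSURE
)
print(f"{estimate:,.0f}")  # 187,500
```

That works out to 31,250 enclosures and 187,500 power supplies, in line with the rounded figure quoted above. Half-empty enclosures would push the real number higher, since each enclosure still needs its supplies.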
Unfortunately, the design flaw is not in the power supply (I would expect HP to be capable of making power supplies as good as IBM's) but in not having a redundant DC bus. Fixing this is a lot harder, because the midplane would need to be changed and a redundant power connector would have to be added to every blade. This is a whole new architecture which would be incompatible with existing blades, something HP would be loath to do given that e-class, p-class and c-class blades are already mutually incompatible.
So rather than fix the real problem, HP have elected to issue improved power supplies (probably with better DC fault isolation) to reduce the probability of failure. It's like issuing a recall on all cars to upgrade the suspension rather than fixing the potholes in the road that are causing the crashes in the first place. I can understand why they have done this, but it certainly convinces me that my VMware cluster is going to be deployed on rack mount servers rather than blades...at least not HP or Dell blades anyway.