Have a euphemism on us.
"non-deterministic endpoint behaviour"
I'm nicking that for my next status report.
Resistors, which cost a few cents apiece, are bricking pricey Cisco Adaptive Security Appliances (ASAs). A Cisco field notice reveals that models ASA5508 and ASA5516 “might fail in operation, after 18 months or longer, due to a damaged component.” “Due to a manufacturing process issue, some ASA5508 and ASA5516 security …
Is the result of allowing a bean counter to have input on a component purchasing decision.
Looks as though anyone who has one of those pieces of kit should apply for a replacement immediately, rather than wait for a failure and a further three months.
The report sounds as though this is a 'when' not an 'if' situation.
As much as we might like to blame bean counters for all the ills in the world: Considering how cheap resistors are in bulk I would guess it's more likely a design or BOM error, using an under-specced component which is overheating and destroying itself over time.
That a single resistor failure renders the entire kit non-repairable suggests to me it's in a critical path somewhere, perhaps part of the power supply (hence my suspicion of overheating), causing a cascade failure when it eventually expires.
And if the components pass the manufacturer's QC process?
E.g. capacitor plague, premature flash failure, Intel Atom CPUs.
While Cisco "always" seems to have these issues, it's hard to tell how many are caused by volume and how many are just down to Cisco being historically more open about hardware issues and RMAs than other parts of the IT industry. I've been told by HP/IBM/Sun field engineers "you should get X replaced, it's a known issue", and I have had to prove that Dell have known issues ("why do all the serial numbers in this range reboot and fry CPUs/VRMs?"), while Cisco will either tell you in a TAC case that it's a known issue and arrange an RMA, or have the information publicly available if you choose to look.
If the bean counter bought a resistor with a lower power rating than the one on the BOM, then they didn't do their job properly. If the component on the BOM had a lower power rating than was proper for the circuit to have proper margins, then that is an Engineering screw up.
A bean counter might substitute the make (supplier) of a resistor, but would never substitute a lower wattage part without approval from the R&D department. No, it's almost certain that this is a design error. It is possible that the designer specified a resistor that had a power rating greater than the power it needed to dissipate, but failed to take into consideration that the stated power rating of a resistor must be de-rated if the ambient temperature is high and/or there is inadequate ventilation.
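To put rough numbers on the de-rating point, here is a minimal Python sketch. It assumes a simple linear de-rating curve: full rated power up to 70 °C, falling to zero at 155 °C. Those two temperatures are illustrative assumptions of the kind a datasheet would state for a specific part; they are not taken from the field notice, and the function name is made up for the example.

```python
def derated_power(rated_w: float, ambient_c: float,
                  knee_c: float = 70.0, max_c: float = 155.0) -> float:
    """Allowed dissipation (watts) at a given ambient temperature.

    Assumes linear de-rating: 100% of rated power up to knee_c,
    dropping in a straight line to 0% at max_c. Both temperatures
    are assumed example values, not data for any real part.
    """
    if ambient_c <= knee_c:
        return rated_w
    if ambient_c >= max_c:
        return 0.0
    return rated_w * (max_c - ambient_c) / (max_c - knee_c)


def is_overloaded(dissipated_w: float, rated_w: float, ambient_c: float) -> bool:
    """True if the circuit asks more of the part than the de-rated limit."""
    return dissipated_w > derated_power(rated_w, ambient_c)
```

On those assumed numbers, a 0.1 W part in a 112.5 °C ambient is only good for about half its rating, so a design that dissipates 0.08 W in it passes at room temperature on the bench and is overloaded inside a hot enclosure.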
"A process failure during manufacturing of one batch of resistors should be found when the random samples of the batch are tested."
Generally, items like resistors are not tested for 18 months. As another poster suggested, this may be an under-spec'd item which is not itself at fault, merely over-used. That a mere resistor dying causes an entire unit to fail sounds like a potential design issue too. Re-furb of a board with a dead resistor ought to be possible.
I do not know of any manufacturer that does sample testing of resistors apart from verifying the value. In my factory the pick & place machine has a built-in tester on its centering jaws, and checks the value and dimensions of most SMD passive components (and diodes) prior to placing them on the board. But we have never tested that a resistor can indeed dissipate its rated power for a long time without failing.
It all sounds very familiar. There was a clock issue that would brick 5516s a while ago, and you couldn't order a replacement immediately unless your kit had been in service for 18 months or it was a critical device in your infrastructure. You had to join a queue just to order replacements. Massive fail.
From the FAQ:
Q: When did you become aware of this issue?
Cisco learned about the scope of potential customer impacts due to this issue in late November 2016.
How is this news now?
It depends what the actual cause is. The actual resistor is unlikely to have a fault. The issues I can see are:
Incorrect specification, i.e. it is a lower wattage than is required
Incorrect installation so that it is stressed on the board (assuming a traditional wired component)
Incorrect lead shaping that has stressed the body so that it breaks.
There is a remote possibility of a duff batch of resistors but I think it is quite far down the list of options.
I favour the first: the resistor is at the very top end of its dissipation rating (or over) and they are frying.
This is far more likely to be an issue on the production line; there are many variables involved but it could be as simple as a chip shooter being misaligned (it only has to be a few degrees out) which can cause damage to the part that might not show up for quite a while.
I have seen that in the past.
It might also be an incorrect value (I have seen that on multiple occasions) that causes a cascade failure later.
The vast majority of parts used today are surface mount (no colour codes), and different manufacturers have different ways of encoding the value that do not match the classic method (1st significant digit, 2nd significant digit, multiplier) (argh), so it is not always clear just what the part really is. Sometimes there is a third significant digit.
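For what it's worth, the common markings can be decoded mechanically. Below is a minimal Python sketch that covers only three of them: the three-digit code (two significant digits plus multiplier), the four-digit code (three significant digits plus multiplier), and the form where "R" stands in for the decimal point. EIA-96 and vendor-specific schemes are deliberately not handled, and the function name is hypothetical.

```python
def decode_smd_resistor(code: str) -> float:
    """Return the resistance in ohms for a simple SMD marking.

    Handles:
      - 'R' as decimal point, e.g. '4R7' or 'R47'
      - 3-digit codes: 2 significant digits + power-of-ten multiplier
      - 4-digit codes: 3 significant digits + power-of-ten multiplier
    Anything else raises ValueError rather than guessing.
    """
    code = code.strip().upper()
    if "R" in code:
        # 'R' marks where the decimal point sits.
        return float(code.replace("R", ".", 1))
    if code.isdigit() and len(code) in (3, 4):
        significant = int(code[:-1])
        multiplier = 10 ** int(code[-1])
        return float(significant * multiplier)
    raise ValueError(f"unrecognised marking: {code!r}")
```

Refusing to guess on an unknown marking is the point: a part read as the wrong value is exactly the kind of mistake that turns into a cascade failure later.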
There is little incentive to ship counterfeit resistors of the bog standard type (in large quantities such as Switchzilla would buy, they are literally dozens to a penny. These things come on reels of up to 10,000 parts).
There are occasions when the incoming product is defective; if that is true then the bill would probably go to the vendor.
Whatever it is, this sounds like a batch problem (either from the part or the actual line) as it is highly likely that Cisco uses multiple manufacturing locations for different pieces of kit.
I have seen (and experienced) a PCB assembly where the PCB was designed with one resistor in too close a proximity to the stand-off, so it would short out to it. Either that, or it was a pick-and-place or assembler error.
A simple fix: just prise it away with a fingernail.
The Advisory doesn't mention anything that can be taken to mean "substandard component fitted".
In fact from the wording of the advisory:
“Due to a manufacturing process issue, some ASA5508 and ASA5516 security appliances might have a damaged resistor component,”
I suspect some machine was set up incorrectly so that the component on installation was subject to stresses (mechanical/thermal) that weakened it.
I know of a similar issue with counterfeit parts (diodes) that entered the supply chain of a major US telecom vendor in the early/mid 2000s (merged & bought twice since).
As a UK based engineer I had to visit our UK and Ireland staging areas and inspect products waiting to be sent to customer sites, as the company didn't know when (time frame) they were used in the manufacturing process.
BTW, these diodes were used in the -48 V DC power circuits and would/could fail spectacularly! Hence icon.
This is always the trade off between setting a short time, and having to develop the mechanisms for updates in the field, and picking a long date and hoping nobody ever hits it!
I'm guessing the person who used a certificate lasting 10 years is long gone. Then again they probably also didn't think people would still be using the phone 10 years later.
Cisco’s fix for the mess is sending you a new appliance. Administrators in Asia, Argentina, Mexico, Venezuela, Colombia, Brazil, Russia, Turkey and the UAE have been warned they may need to wait up to three months for their new kit to arrive. Cisco blames “importation regulations” for that delay.
Just a moment. Your firewall fails, and Cisco says that it's going to take three months to provide a replacement?
They might as well not bother, frankly. If a supplier told me that then I suspect that the business might not want to be without any internet access for three months and that's going to leave me with no choice but to buy a replacement, and with that sort of service the replacement certainly wouldn't be a Cisco box.
No, the field notice says not only that you can proactively replace, but they recommend that you do. Obviously you then have to keep your fingers crossed that the current one doesn't fail until it arrives.
Also if your current one does fail, I'm sure shouting at them to get the import stuff done quicker might help. If the only issue is money and taxes then Cisco can eat that cost as well.
I can't imagine replacing all customer units is going to be cheap. The units they get back can't be resold as new.
"the field notice says not only that you can proactively replace, but they recommend that you do."
The problem comes if that three-month delivery time applies to units under support contracts.
If so, the support contract isn't worth the paper it's printed on.
"The units they get back can't be resold as new."
They'll probably go on the shelf as "spares" for warranty replacements. It might be convenient for them to delay shipping out to customers whose units have not yet failed while they build up a stock of refurbed units.
It's caused by US import restrictions for those countries - it's not a Cisco-specific issue.
As this is potentially a large scale replacement programme, it is unlikely that there is sufficient stock in said countries so they try and flag this to their customers.
There are alternatives to this process (no communication, hope things don't fail, use a vendor that never has hardware failures) but it's not as terrible as you make out.
By the way, if you find the hardware vendor that doesn't have failures, can you let us all know? Just make sure you use a realistic sample size (i.e. at least 100 units, ideally 1000) so it matches the real world.
"As this is potentially a large scale replacement programme, it is unlikely that there is sufficient stock in said countries so they try and flag this to their customers."
They know how many they've sold in any given country. This means they can bulk ship the replacements to the local warehouse NOW and announce a proactive replacement program as soon as they have sufficient stock.
"This means they can bulk ship the replacements to the local warehouse NOW and announce a proactive replacement program as soon as they have sufficient stock."
They can bulk ship equipment once it has passed the various customs requirements, which, for devices incorporating "munitions" such as encryption takes three months.
The delay isn't getting equipment from in-house warehouses - the delay is getting equipment into in-country warehouses for distribution.
SMD resistors in bulk cost much less than a few cents a piece. A UK small volume supplier is offering reels of 5000 for £4.50 (0.09p each), and in real bulk the price can go even lower. Buying substandard grey components therefore makes no real sense at all. But the bean counters always think otherwise. A colleague was told by the company accountant that he should find and re-stock or formally account for any resistor that got dropped on the floor in the workshop.
However it's also possible that the pick and place or the soldering line is out of tolerance, and that's a quite tricky problem to solve. Mostly happens in low cost assembly plants where they have inadequate QA monitoring.
"A colleague was told by the company accountant that he should find and re-stock or formally account for any resistor that got dropped on the floor in the workshop."
My response to that would be to tell said accountant that the time spent doing so would be charged to the accountant.
Back in techie days, the supervisor not only allocated jobs but also filled in timesheets for the staff. When supervisors were removed (for cost reasons) and staff told to fill in their own timesheets, the accountants objected LOUDLY to the techies documenting the time taken to fill in timesheets because it was being charged back to them.
Meanwhile, in a particular research lab I worked in, every.single.fscking.resistor or other component would get entered individually into an ERP database, tracked, and "dispensed" as needed. This makes sense for a computer or $50,000 spectrum analyzer. But a $0.005 discrete?
No ship! Buy a reel of 5,000 resistors? You get 5,000 manual entries in ERP. Care to guess how many entries are jacked? That's how your $0.005 component becomes $2-5. Granted, we were a lab instead of a manufacturer, but that's unreal.
On top of this, I'd frequently get called on the carpet to justify things such as, "why does this design have four precision resistors and several hundred other resistors?" My reply would be along the lines of "We've got five guys in here, at 150 bucks an hour, arguing over six cents on a $45k piece of equipment. What are you smoking? I want some..."
And they wonder why the competent design engineers left, followed by the clients.
No-one's going for the conspiracy theory angle?
This is Switchzilla after all. Unlike Huawei, who 'may' have installed backdoors for government shady types, Cisco verifiably have installed backdoors for government shady types. Especially since this is a firewall product: what better piece of kit to monitor internal and external traffic?
Or yeah... Occam's razor, and it's just crap QA on the kit (more boring though).
Hang on, I think I hear helicopters....
I'm not saying Cisco kit isn't good. Just that the idea that Cisco is better than all other kit, the 'no one ever got fired for using Cisco' attitude, and the roll of the eyes if you suggest to a CCNA that some other vendor's kit is much better value for money, or that you can get faster kit with more features for the same price, all seem misplaced.
Cisco have had more than their fair share of security vulnerabilities, US government hacking, errors that have stopped the kit working properly, bugs, failed components etc., just like any other manufacturer can have. However, for Cisco kit you have to pay a premium.
For some operations it may be the only usable option, sure, but they are few and far between. I remember having a discussion with someone once about a datacentre switch. The 1 Gig switch he was showing was about 5% faster than the competition and had 10Gig SFP+ ports for the backbone or high throughput servers. It cost a lot of money. I pointed out that you could get a full rack of 10Gig ethernet ports on a switch, with all the features that were required, for half the price, and it also had 4 SFP+ ports for extended connectivity over fibre. Not only that, the SFP+ fibre modules were 1/5th the price. Not Cisco though, was it: reliability, industry standard etc. etc. Anyway, we used the other manufacturer and after 5 years it was still running, and supporting an ethernet SAN.
Yes, it took a little while to learn the new terms for the equivalent Cisco proprietary functions (which performed the same thing), and the command line was different but logically the same. However, many features were far, far better: a proper centralised management system, greater estate visibility, and a shallower learning curve with a fairly decent GUI for less experienced techs. Even the switch monitoring software could read flows from them and save configs, though most of it was just SNMP anyway.
If the thinking is 'it has to be Cisco' then you can miss out on some great products and you might end up buying rubbish like their Phone system rather than one of the much better alternatives, or something inappropriate for your organisation like the Cisco WiFi can be.
"Cisco seems to have gone down the intentionally not obvious to configure"
When you look at the ancestry of the different modules, all bodged together from various borged companies, with wildly different underlying coding strategies and command syntax, it's more understandable.
It's also more understandable that "Cisco" is a rat's nest of poor quality - and frequently internally incompatible - code all mashed together with questionable QC and as little care as can be justified - and then there are the INTENTIONAL backdoors to take into consideration.
I take that back. Calling it a rat's nest is unkind to rats.
The ASA 5508 and 5516 have Intel C2000 Series CPUs...
The resistor is just the workaround that was installed in "re-worked" units - intended to stop AVR 54.
Cisco hasn't fixed the underlying issue ...instead they're blaming the band-aid.
Biting the hand that feeds IT © 1998–2020