
Not sure I can top the story
but I have been on a 30 hour round trip in Europe to configure a printer as the hardware outsourcer was not permitted access to PCs that could execute its commands...
Welcome again to On-Call, our weekly wander through readers' recollections of their ramblings to customer sites after being called out to fix stuff. This week, reader Tim shares a tale from “About 7 years ago when I was working for a software-as-a-service company based in the UK.” Things were going well and the company was …
Easy to solve - ask the guy to use his phone to take a short video of the server front and back (or even a photo), and email it to you. All the information you need to say the words "Press that button there".
Plus - if the servers aren't built or working at all, you don't even need this evidence to say "Press the button...." without any cost or risk at all
To be fair the inventor of "off" leds should be drawn and quartered followed by serious torture. Many years ago my mum bought a Sony receiver that had a red "off" led which turned green when it was powered on. The problem was up to then nearly everything had red "on" leds which went dark when powered off. Imagine my surprise when I pressed the 1/0 button to turn it "off" only to be met with Aida with the volume set at 11. Funny thing, not many devices still have that arrangement but unfortunately the evil bastards have discovered blue leds for damn near everything and they seemingly pump them at 20 watts.
Exactly. This is incompetence from both sides.
The people trying to log in are so incompetent they can't do basic troubleshooting, so they pass it on to the techie. But unfortunately the techie, although knowledgeable enough to fix the issue, is socially incompetent, so doesn't call the company to "clarify" the situation beforehand and save a trip.
Knowledge vs complexity gap
knowing how to rack a server is one thing
troubleshooting why it won't power up once power cable(s) is/are plugged in and the lights are on is another.
http://blog.ipspace.net/2012/03/knowledge-and-complexity.html
http://blog.ipspace.net/2015/11/can-you-afford-to-reformat-your-data.html
Loads of outsourcing companies rely on unskilled and inexperienced staff to provide services that on paper are cheaper but in the long run are poor value, like in this case requiring an engineer to cross a continent to power on kit.
"the light behind the power button flashes orange when the server is off, and lights up bright blue when you turn it on. So even if you were colour blind the flashing should have been a give away."
It's going to be pretty much impossible to win this argument, because even leaving colour blindness aside, there are no standards for power/running indicators - there is nothing obvious. I've seen it go every which way, even separate marked leds.
At the end of the day, if you don't know what the bloody led does on a Dell, or any other type of box, you just don't know.
Even checking the fans is iffy. I've seen domestic grade equipment among enterprise gear, and on those, the fans are rather subtle - I've resorted to a torch at the back to see if I can see the blades turning.
So, as per the other post, this is a fail on multiple sides.
Because DRAC/ILO/IPMI look very expensive, so the bean counters dislike them.
When you're building any infrastructure that has quite a few servers, the additional cost of DRAC/ILO/IPMI soon adds up to a hefty bill.
Everybody here knows that when you factor in the potential costs - longer outages, and time saved when called out - they're actually pretty good value for money. Not wasting time having to go to the data centre to deploy the Mk I Finger O' Doom is pretty handy. An IP KVM was a useful alternative, but the lack of the power feature made it very much an inferior solution - which was reflected in the pricing of the two technologies.
But try telling that to the guy who doesn't understand, and is wondering why every server is more expensive by a three figure sum...
The drive to virtualisation has often been justified solely on the basis of shaving that cost off each server (and having standardised drivers/devices on your servers). As you scale up, it becomes a significant saving.
IIRC the basic DRAC functionality, including power control, is free or nearly so. The Enterprise version with remote console and the like has to be paid for separately.
While a KVM is usually better for remote control than the ActiveX/Java "virtual KVM" of DRAC, for basic tasks and to check/control hardware, on-board management systems are very useful, especially if the machines are away from you. Virtualization doesn't help, you still need a way to manage the hw the hypervisors are running on...
Also, the current versions of on-board management systems can host the server drivers and automatically install them at setup.
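For anyone who hasn't used one of these management controllers, here is a minimal sketch of what remote power control can look like from a script. It assumes the server's management controller speaks IPMI over LAN and that the ipmitool command-line client is installed. The function names, host address and user name are made up for illustration and are not from anyone's setup in this thread. The password is assumed to be in the IPMI_PASSWORD environment variable, which ipmitool reads when given -E.

```python
import subprocess

# Chassis power subcommands that ipmitool accepts.
POWER_ACTIONS = {"status", "on", "off", "cycle", "reset", "soft"}


def build_power_command(host, user, action="status"):
    """Return the ipmitool argument list for a chassis power action.

    host and user are placeholders supplied by the caller (hypothetical values).
    """
    if action not in POWER_ACTIONS:
        raise ValueError(f"unsupported power action: {action!r}")
    # -I lanplus: IPMI v2.0 over LAN; -E: take the password from IPMI_PASSWORD.
    return ["ipmitool", "-I", "lanplus", "-H", host, "-U", user, "-E",
            "chassis", "power", action]


def run_power_command(host, user, action="status", timeout=15):
    """Run the command and return ipmitool's output (requires ipmitool on PATH)."""
    result = subprocess.run(build_power_command(host, user, action),
                            capture_output=True, text=True,
                            timeout=timeout, check=True)
    return result.stdout.strip()
```

Something like run_power_command("192.0.2.10", "admin", "on") would then stand in for the trip to press the button, provided the management port was cabled and configured in the first place.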
Did your IPKVM and power strip combo support remote-mounting media so that you could re-install the server from bare metal remotely? Did it alert you to hardware issues with the server? Was it able to automatically reboot your server (ASR) when it detected something was wonky or crashed? Did it provide access to the integrated management (hardware) log on the server when the server was offline? Did it allow you to power-adjust the CPUs to stay within power requirements/tolerances? Did it allow you to automatically power down servers at 'quiet' times and then start them up again when needed? I'm hazarding a guess the answer to most of those is "no".
So, as ever, one pays one's money for the features and things one particularly needs. Some people only need basic power control and KVM, some people need more. Price alone is an invalid comparison - we're IT people, we know this.
Personally, I'm only interested in some, but not all, of the things mentioned above, but experience has been a hard task master and now if anyone around our place provides/specs a server without proper iLo/DRAC-level management on board they get a very stern Paddington-Bear-Stare-grade talking to. It's a philosophy that's saved our bacon (or at least saved a lot of wasted time) enough times to be warranted for us. Meanwhile, our physical KVM (where needed) does just that and nothing more. YMMV, as they say... :-)
S.
"https://www.getonsip.com/"
I must be the only one who reads T&Cs. No commercial use for getonsip.
"How can they tell" you ask?
Yeah right, that's what's making the bulk of software writers ONLY offer ridiculously expensive subscription models for everyone.
Thanks for your attitude.
I will never forget the customer that told me the computer was *definitely* on because they "weren't allowed" to turn them off and therefore refused to even consider just turning it on as an option. The fact the screen was black and it wasn't on the network failed to shift her viewpoint. OK ticket logged, someone might get back to you tomorrow...
"0th - Have you plug it in?"
Having returned a printer to a customer after workshop repair she phoned back to say it had arrived but now it didn't work at all, it's worse than it was before it was sent away so she'd have to use the loan one while we take it back.
So tech takes her through the obvious, starting with the zeroth law: is it plugged in? Yes, it's plugged in, it's turned on, the head does its normal start-up sequence, ready light is on, paper loaded, paper out light is off but nothing prints, no head movement, nada.
So I get sent to site, arrive on site, plug in the centronics cable, demonstrate it is working and leave.
Well, it WAS plugged in, wasn't it?
I was sent from London to Milan for the day to cable and configure some modems. When I got back to Milan airport the Customs went on strike for a couple of hours. I sat in the bar being paid an hourly rate of £mucho and was very happy :)
P.S. When they resumed working, the customs guys queried the pin hammer and Stanley knife in my carry-on toolbox. I said that it was impossible to do any damage with those. Been proved wrong there.
"Can you send me a HD picture of front and rear of the racks?"
<get pictures>
"OK, the servers have power but are not actually on. Press the little 'On' button."
<servers power on>
"Happy to help! Bill is in the post."
Time and impact of flights saved, look like a hero to the customer. Profit?
"Cameras are banned in here" - quite often.
Back in the day when I was involved in designing industrial computers, all the little lights were labelled, either next to LEDs or actually on the lenses of larger ones. So the red LED in the bottom left corner was clearly labelled "Power" and the big green latching pushbutton with the guard against accidental shutoff was labelled "Run".
But designers know better.
The gazillion optical drives with the insert/eject button underneath the tray are a testament to saving 5 cents per drive.
The manufacturers probably saved a few million over the years with that design, but every single user pays a price when pushing the button with the tray in the ejected position.
"But designers know better."
Or beancounters. Or PHBs.
"I say chaps, how can we save money on fascia mouldings or labels having to be localised for language?"
How about we invent some little cross-language intuitive symbols that everyone can understand no matter what language they speak or even if dyslexic. But make them different to everyone else's so we can patent the design.
Great! Now how can we save on the cost of highlighting them in contrasting colours?
Just emboss or raise the symbols so they are black on a black background because we just know everyone has bright lights pointing directly at the front panel and it's never dark there, no siree.
Excellent! Anyone for choca mocha skinny latte?
"Just emboss or raise the symbols so they are black on a black background because we just know everyone has bright lights pointing directly at the front panel and it's never dark there, no siree."
Anyone tried finding the supplied screwdriver plus drive mounting screws in an HP MicroServer in a dark corner of a server room?
Black on black.
And yes, mobile phones were banned in that server room.
OR the all-knowing server case designers place the power button right where your little knuckle would naturally rest while levering out a thumbdrive from the conveniently placed top USB 3.0 ports...
Took three damn hours to resync the RAID on that bitch, ate the billable time because it was my fuck up. Spent it between staring at the percentage on the screen, praying the OS would boot, and hating on whichever genius thought it was a clever design...
I used to work on a helldesk for a financial company that supported remote branches that had no local techie. One chap landed a call about a file server that wouldn't boot and after supposedly trying all of the usual booked an engineer (which cost the company) to go out and check it. The engineer arrived on-site, pressed the eject button on the floppy drive and went home. Needless to say my colleague got thoroughly chewed out for that.
The big one these days is a USB drive plugged in overnight so the pc doesn't boot the next day.
On Lenovos all you see is a black screen with a white blinking cursor.
Trivial to fix once you know about it, but with USB drives getting tinier, can be easy to overlook on a quick glance at the box. It's one of my top questions for checking out novice techies.
I'll use that to replace my old one. That ran: two new WinBoxes, identical specs, with fresh installs of the OS and DTP software. Both have Quark Xpress parallel port dongles in place. Both are connected to the Internet. Different serial numbers for Quark on both machines, adjacent to each other on the list of serial numbers as supplied by Quark for this deployment. One machine runs Quark, the other doesn't, producing a "license failure" message. Diagnose and repair.
@TRT: Check if both PCs have the same settings for the parallel port in BIOS/EFI. Check that Windows has parallel port with exactly the same settings on both machines. Confirm that the OS sees the dongles. If not, check that they are screwed in (or clasped) properly. Swap the dongles if it still fails. Swap the hard disks if the error is now that the serial number doesn't match.
Almost certain it's a parallel port off or failed, or one dongle failed. Very unlikely to be any other cause.
@toughluck.
I'd swap the dongles over first, but you're right. Based on a real world occurrence. F***ing Dell delivered 40 machines with (parallel port settings) 30 set to ECP, 10 set to SPP in the BIOS. Took the previous tech days to discover that some didn't work. And to figure out what was wrong, they set it in the interview for the new computer tech, me. I figured it was such a stinker I'd carry on using it!
Not just "pc" in scope for the usb key issue. A Dell 1950 does that if usb hd is enabled as a target in the bios and the bootloader on the key is borked. The first time it nearly had me napping because I thought I had video issues on some of the boxes by the time I wandered back to the kvm station on another floor, before deciding to get someone else to perform the complex task of pushing the on button on the contents of a rack one by one while I stood at the station, able to see all the perc controller crap etc before it went into blinky underscore of death mode.
To the original story, as a *owner of dell 1u hardware, there was obviously a requirement to be utterly deaf with no skin sensation of draft in addition to colour blind for the local IT support. When they first power up, before the environmental sensor tells the board thingy that no, it's not about to melt (this is an achievement...), all the considerable amount of very small high rpm fans arranged across the middle of the chassis accelerate to max speed and it has a go at making the rack move from the rearward thrust if you leave the rear doors off the cab/had to find a creative solution to a too short rack cab...
* now ex, I ripped the Xeons and RAM for my workstation out of the last still twitching, still overly hot carcass of the last one this week, and it felt good to finally slay the last of the beasts.
"It's one of my top questions for checking out novice techies."
A blank screen 'cos an extra bootable device is plugged in? In 20 years I've not seen that, so maybe it's only testing if a techy has seen a Lenovo do that rather than any real indication of skill/knowledge.
Surely the standard boot order should NOT be USB or removable devices before the HDD?
"Surely the standard boot order should NOT be USB or removable devices before the HDD?"
The HP MicroServer N40L has USB Priority set to High by default, and simply plugging a new backup drive in will cause it to look at that on the next reboot, even if you have previously set the boot order to look at the internal disks first.
Simple answer: Set the USB Priority to Low in the BIOS.
Not quite so straightforward when you want to use the internal USB socket for booting on a permanent basis. Net booting or a hacked BIOS seem to be the real answer here.
"Simple answer: Set the USB Priority to Low in the BIOS."
Yes, exactly my point. Users' desktops especially, and most servers, should be set to boot from their specific boot device as their highest (only?) priority. Only a tech or higher should ever need to boot any other device and it should never "occidentally" attempt to boot some other device. Any tech needing to boot a pendrive or other device probably has bigger problems to deal with, and finding the key-press to select the boot device or changing the BIOS config should be part of that process.
"Not quite so straightforward when you want to use the internal USB socket for booting on a permanent basis. Net booting or a hacked BIOS seem to be the real answer here."
Yes, there's always an exception :-)
> pressed the eject button on the floppy drive and went home
The lone techie in a far away office was instructed to insert *the* one and only diagnostics CD into the drive of a rather large server box to determine what the fault was. The CD drive didn't have a tray to put the disc in, it had a slot that you pushed the CD into. Sadly, there was a small gap in the chassis just under the CD drive.
Andrew, that wasn't at a call-centre in Bristol was it?
I once worked on a project for M$ elsewhere in the UK. The Bristol call-centre systems started collapsing at 18:00 every day. Much wailing and finger-pointing at us the software devs. How could our code be so useless? Eventually, we were sent to the call-centre in Bristol. Standing looking at the main server screen at 18:00, it suddenly powered-down.
WTF just happened?
Still staring at the dead server, we were annoyed by the interruption. "Excuse me, dears" said a cheery voice behind. Turning round, we found the cleaner, with her hoover plugged into the one power socket in the server room. With the server's power lead dangling on the floor.
"Err, did you just unplug that?"
"Oh yes, dear, but I always plug it back in when I'm finished."
A while back I was between assignments, and got some ad hoc assignments from my temp agency.
One such memorable assignment was related to a credit card terminal.
The assignment: to set up a new terminal in a shop.
I got a nice little "burn before you read and self-terminate after" documentation package, and spent some time familiarizing myself with the procedure I was supposed to perform the next day. Didn't seem too complex, but there was a part with activation that needed a 25 digit code or some such. This part looked like it was a bit beyond your regular user.
I then drove a couple of hours to the site. Took a look at the terminal I was supposed to set up and activate. It was already placed where it should be, and nicely connected. But not on or activated.
Talked to the customer, and she'd had a bad experience last time she tried to activate a terminal.
OK, the connections looked fine, so I fired it up and looked at it while it self tested all OK. Including connections to the bank... basically it seemed to be activated already.
- Hm, has this terminal been used before somewhere? I asked.
- Naturally, it's the same one I had a problem with the last time. We didn't need this POS for a while, so we've had the terminal in storage the last 6 months.
OK, so we tested charging my card some minimal amount. Worked as it should, both banks accepted. Had the back office test while I found some lunch. All OK. (I always get suspicious when something looks to be too easy, hence the vigorous testing.)
Total time spent fixing the issue: 5 min
Time spent testing: 15 min
Invoice: 1 hour work + 4 hour drive + expenses.
If nothing else, it makes for a good story.
All tests turned out OK.
I had to drive for about 2 hours to Manchester (4 hour roundtrip) once just to prove to the data centre engineer that the aircon had failed. It was the usual response, everything's green, all the units are on. My environment monitoring told me different, as did my servers which were close to their operational limits and automatic power down.
So it was on a Sunday afternoon I drove to Manchester with a thermometer, on arrival took said engineer to cab where it was possible without the thermometer to diagnose the temperature issue, the warm draft up your trouser leg from the floor vent gave it away. So he took me to the offending air con unit and pointed out it was working - everything was green. On my request he lifted the floor panel next to it and stuck his hand in the air flow - it was blowing hot air - no **** Sherlock.
So one Sunday afternoon ruined due to said engineer clearly having no common sense, a lack of ability to tell what temperature it is either from his own senses or utilising something as simple as a thermometer and believing what the aircon was telling him rather than his own senses or umpteen temperature sensors.
Back in the last millennium, I worked for a company who had a server at a customer's site in a London train depot. I was based in Derby.
It "stopped working", no power to it at all. Being the lowly trainee at the time I wasn't important enough to speak to the client myself, but was dispatched on a train to the bright lights to try to resolve the issue, armed with spare PSU and cables.
Arrived on site, lifted the floorbox, and turned the plug back on
Case closed, long lunch on expenses, beers on the train home
Back in 1997, I was asked to drive from work to a travel shop to fix a printer issue.
The staff had moved the PC and Printer and reported that the printer parallel cable would not go back in the same slot on the computer.
Drove down in my little Fiat Panda to find the parallel cable connector had the screw holder attached.
Two minutes later the PC was able to print again, when the screw holder was reattached to the PC.
On my way back, I decided that this was time to move jobs. A 6 hour round trip for two screws.
In the heady days before triage...
I had to drive from Manchester to Aberdeen for a printer that wouldn't print. Walked in the room spotted that they were using cheap ribbons that didn't have the little piece of conductive tape at the end that marked the end of ribbon.
Asked for a spare, fitted it and walked out, drove back home again.
Had to get from Manchester to Leeds within the hour or "we'll cancel the contract" for a dead monitor. Walked in turned up the brightness control and walked right out again.
Sometime around 2000 when I was a support technician, I was asked to go to see one of the directors. He'd been abroad on a business trip, somewhere like Hong Kong, and had been unable to get any sound from his laptop at all. He demonstrated in his office by trying to play a CD.
10 seconds after looking at it I rotated the volume knob at the front (Toshiba Satellite Pro) and walked out before his face became too red through embarrassment.
Responding to a call out for 'network problems' I once drove 2 hours into the treacherous snow covered mountains of western Canada only to find that the machine in question was in WORKGRUOP. 2 hours there, 2 hours back, plus one hour of on-site (even though it took about 2 minutes to solve). Our troubleshooter did ask over the phone, but unfortunately there still hasn't been dyslexia for cure found.
I was staring at a nice rack (ooh-err) in a server room in West Africa, a few years ago, and recorded some interesting observations about the Cowboys we'd hired, when the civil war was on. The main observation? Well, what do you use to keep a server in place, in a rack, to prevent it falling on the server below it? Erm, you use the server below it. So, one server correctly fixed in, on rails, another two on top of it, and a customer with a distinctly African sense of humour taking the piss out of my employer!
About 15 years ago the company I worked for bought a HP Laserjet. It came with an Infrared port that was on a lead that plugged into a serial port (it looked like a small mouse). The idea being that a user could maneuver the IR port to line up with their laptop to send a print job (not that IR printing ever worked!).
Anyhow, I was setting the printer up whilst a colleague was configuring the network queue (Novell as I recall). While I was plugging it in and filling it with paper a Senior Design Engineer saw me tinkering with the IR "mouse" and asked what it was. With tongue firmly in cheek, I said it was a voice-recognition device for the printer. She didn't believe me until I said "print test page" whilst holding the IR port up to my mouth. She didn't know my colleague had already queued up said test page on the Novell queue so, sure enough, out popped a test page once the printer had warmed up.
She was gobsmacked.
Absolute true story.
"(not that IR printing ever worked!)"
Oh, I don't know... Some 15 years ago I had a MP2100 (Apple Newton, the first generation of them that actually worked and consequently the last they ever made, had still the best address database on any mobile device ever, but I digress...). The local Staples had a HP Laserjet as a demo model on display. The customers could press a button to print a test page. The printer also had an IR port. So every now and then when I had to print something I'd pop into Staples and use the demo printer. Always worked like a charm.
Working for an audio company once, I got called out because the techie on site almost went nuts. It was a small setup with just a few mics and he did not manage to get the signal out to the speakers. I'm quite skilled in phone support. So I went through the procedure step by step up to the point where the guy asked me if I thought he was an idiot. Inputs fine, output routing on the mixer fine, amp lit up but no signal indication. So I went there (didn't have to fly to the States) to find that the bloody amp was indeed lit up by some blue LEDs onboard the PCB, shining brightly out of the cooling grill on the front of the amp. Without being switched on ... Techie was quite embarrassed but I don't blame him. ISO Norm states that an electrical device has to be switched off completely by the main switch. Well, that piece of crap is no longer available on the market. As for servers, I appreciate some blinkyblinky to indicate the state of the PSUs before I switch it on.
When I worked as agency staff as a field engineer, I once attended a business that had a problem with an IBM PC, that the clock kept going wrong on.
Unknown colleagues had previously investigated this issue on multiple occasions, the last time they had replaced the motherboard - but this issue hadn't gone away...
So, when the support request was reopened due to a "repeat issue" I was sent to site.
When I was able to access the site, I discovered what the problem was - before I had even touched the PC.
Approximately 5 feet to the right of this PC was a Novell Netware Server - that we didn't support
- and its system time was wrong by exactly the same amount...
I think the customer would have been a lot happier if they'd received this information - and the bill - on the first ever site visit, rather than after my colleagues had already been to site on several occasions...
About 10 years ago I was asked to fix a Cisco router by my (then) employer's office in Kampala, Uganda. Not being Cisco qualified to write a config, but having a config available that they could use, I emailed them the file and informed the office that they needed a Cisco engineer to do this, which they then told me they would source locally. A few days later they emailed me to say that they had got an engineer in, that he had taken the router back to his office, zeroed the router and loaded the configuration and brought it back to them. I asked if there was still a problem and they said that it was just not working - nothing switched on or booted. Asked them to check the power source, which they did, and they said it was all correctly powered and plugged in. As a precursor to coming out there I requested them to send me a photograph of the server cabinet.
The cables were all correctly plugged in, exactly as they said ... just that nothing was actually connected up to the mains.
When you support one of the largest resources projects in Australia, people tend to get toey when they can't access the Internet. I got a phone call on a Monday morning to say that their internet was down and they could not get it working. So I had to catch a flight at 11.30 in the morning, fly to site (2 hours), sit around for 2 hours (in 35 + degrees Celsius) while they tied up the barge and installed a walkway, get on board then find out that they had fixed the problem shortly after I left Perth. Turns out that the POE injector for the wireless link (over which the vessel connected to a satellite service) was in the bridge and someone decided they needed the power outlet to charge their mobile phone.
Fortunately I did have to adjust the rotation of the aerials, so at least I wasn't totally wasted on site (took 30 minutes for the paperwork and 5 minutes for the adjustment).
Could not fly back until the following day so was there overnight.
...I was once nearly refused access to a satellite office.
Having discovered that I actually lived extremely close to said office (a converted house), the duty to look after the place was duly transferred to me rather than sending a support person that lived a bus and ferry ride away. Fair enough.
I'd been supporting them for months, gotten to know the staff over the phone, gotten friendly.
Well, the day came when the boss wanted me to pop by and give the computers a once-over (don't ask). 'Great', I thought. 'I'll rock up there, and be home with the kettle on pronto'.
I rang the doorbell. Admin / reception person answered with a look of incredulity on her face. 'Yes?' she asked, just barely removed from full League Of Gentlemen membership.
'It's ******, from IT'
'What?'
'It's ******, from IT. We talk on the phone - I'm here today like we organised'
'Oh no. No, you look too young'.
'Sorry?'
'You look too young. I thought you were a bit older. I...I don't know if I should let you...'
'Touch the computers?'
'No, sorry love'.
I turn, take a couple of steps down the path.
'That's OK, I'll just close the ticket and let my boss know I'm too young to be trusted'.
'No..I...come back...'
With that, her manager came out (having not heard this exchange), shook my hand, and took me inside. With age and wisdom, I don't know why I didn't clear off anyway owing to flagrant ageism - it runs both ways.
A small local firm I do tech support for forwarded me an email from their ISP late on Fri 27 Nov instructing them to update their router with new DSL sign-in details by Mon the 30th Nov "or it'll stop working".
So I go in near closing time on Saturday to minimise disruption just in case anything goes wrong. Put in the new details just to find the new credentials don't work. Try ringing the ISP only to find their support desk is shut until Mon.
So I ring them up 1st thing on Mon to have a moan. Luckily the guy's more awake than he was when he first sent the details out: he checks the login attempt log at his end and notices, after pasting the password into Notepad, that the two characters in the middle should be lI and not 1I...
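For what it's worth, the lI versus 1I mix-up is the sort of thing a few lines of script can flag before credentials go out. This is only a rough sketch: the set of look-alike characters and the function name are my own assumptions, not any standard, and the sample string is invented rather than the real password.

```python
# Characters that are easy to confuse in many fonts (an illustrative set, not a standard).
LOOKALIKES = set("lI1|O0")


def ambiguous_positions(secret):
    """Return (index, character) pairs for every look-alike character in secret."""
    return [(i, ch) for i, ch in enumerate(secret) if ch in LOOKALIKES]


if __name__ == "__main__":
    # Invented example string with a lowercase L next to a capital I in the middle.
    for index, ch in ambiguous_positions("ablIcd"):
        print(f"position {index}: {ch!r} is easy to misread, spell it out")
```

An ISP sending out new sign-in details could run something like this and add a note such as "lowercase L, then capital I" to the email.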