What's in a name?
As the old mantra goes, it's always DNS...
Welcome to a continent-trotting edition of On Call, in which a Register reader takes a trip to sunnier climes only to be let down by a clown in windswept Blighty. Our hero, whom we shall call Simon though that is not his name, was gainfully employed at a UK telecoms outfit way back in the mid-1990s. Carrying the vaunted title …
Or you go to check the cable is indeed properly plugged in only to find the family pet happily gnawing through the insulation to get to the tasty, tasty tingly bits inside.
"I've found the cause of the outage. I need to buy a new cable & you need to bury Mister Giggles. Rest in peace little kitty..."
My rats loved the cable whose insulation had plasticizer in it to allow bending, but they never touched the large-diameter copper in brown plastic.
The nibbled edges of the cable are still visible in my office, covered with blue electricians' tape where bitten. Like the blue plasters used in food kitchens: visible, and it gives a warning.
RIP my rat.
A friend described a similar problem with a house rabbit (who knew!) that nibbled through various cables until the mains lead was the last straw. Years ago my cunning plan to have the builder route thin ethernet round a new build house eventually fell foul of mice. Perhaps not always DNS?
Had something similar a while back.
Plugged cable into socket... couple of hours later the link went down, found cable not fully inserted into socket. Assumed my error, reinserted cable, checking for the click this time... couple more hours, no link, cable actually hanging loose this time.
Turned out to be an odd combination of extra strong spring contacts in the socket and the motherboard being slightly out of alignment with the case, so the socket was fouling the back panel slightly, the latch wasn't latching properly and the strong contacts had the strength to eject the plug.
A place I used to work at had a hideously complex solution for DNS.
Some core DNS servers looking after all sorts of name spaces
A couple of ADs, each running DNS, with forwarders set up to the core
External DNS servers replicated the public bits of the namespace.
All tied together with a bunch of scripts, schedules, databases and who knows what. The people that put most of this together had left long ago, so the only bits anyone really understood were the AD DNS (not that it is particularly difficult). The rest became a complete mystery, and it was not unknown for all sorts of weird things to happen if the core DNS was given a funny look or a keyboard pressed too hard. Clients would "go missing" or only exist on one of the servers. Websites would only resolve if you were inside the border, or the reverse: public facing services would stop resolving and nobody could find out why.
As usual, smart people doing really clever things that are great as long as those people are still around to support it.
DNS can be as simple as you want or as complex........
My approach to the "Websites would only resolve if you were inside the border, or the reverse" problem is:
Any external domains where the services are hosted internally are a CNAME to the internal domain.
Externally, the internal domain resolves to the gateway address. Internally, it resolves to the actual server.
It (mostly) works. You just need to remember to update the internal domain in three different places - external, guest network, internal network.
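The scheme above can be sketched as a toy resolver. The names and addresses here are invented placeholders, not anyone's real zones:

```python
# Toy split-horizon resolver for the scheme above: the public name is a
# CNAME to the internal name, and the internal name answers differently
# depending on where you ask from. All names/addresses are placeholders.
RECORDS = {
    ("www.example.com", "external"): ("CNAME", "www.internal.example"),
    ("www.example.com", "internal"): ("CNAME", "www.internal.example"),
    ("www.internal.example", "external"): ("A", "203.0.113.10"),  # gateway
    ("www.internal.example", "internal"): ("A", "10.1.2.3"),      # actual server
}

def resolve(name: str, vantage: str) -> str:
    """Chase CNAMEs to an A record, as seen from 'internal' or 'external'."""
    rtype, value = RECORDS[(name, vantage)]
    while rtype == "CNAME":
        rtype, value = RECORDS[(value, vantage)]
    return value
```

In practice this is what BIND views or separate internal/external servers give you; the remaining chore is exactly the one noted above, keeping the internal zone updated in all three places.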
>>Most of the AD admins I've met didn't even understand the concept of a reverse resolution zone
Where I work this appears to be true.... none of the other windoze DCs (there are many) are set to create/maintain PTRs in their AD Sites (mine are...) and that's for quite a large domain.
There is another small matter, which I will not elucidate more than saying "internal domain controller located in a DMZ", that serves to convince me that the bunch of windows acolytes in charge don't actually know or, indeed, care what is going on as long as the lights are blinking and the rust continues to spin.
Have a pint for Friday ---->
Not a DNS story, but ... at a Big Hospital+Uni where I worked in the 1990s, we ran bootp, not DHCP. Win95 boxes used billgpc.exe to deal with bootp, and IP addresses were assigned by techies editing a text file, from the console, on the Novell server which served that subnet (of which there were many - we had a class C IP assignment).
We techies were motivated and careful, and we had no disasters, but we did get calls from people whose PCs wouldn't get an IP address ... because instead of calling IT, they'd simply picked up their PC, moved it to a different office, and plugged it into the network jack ... which was connected to a different subnet than the office the PC came from.
10+ years ago at a previous employer, we had a nerd in HR, he had all of his home network devices set to static addresses, including the wireless on his mobile phone, which he would bring into the office, with the same IP address as the device handling our DNS**, fun times chasing that one down*. Funny thing is he was almost always the first to complain the internet was down!! When we figured it out, I was happy to drop the bomb on him! It's your own feckin fault! Nimrod...
*(he later claimed it kept him safe while torrenting?)
** this was quickly changed to something different, the network guy learned something!
It isn't always DNS
I had someone do something similar, but they plugged a DHCP server into a production segment. The server was for building and imaging HP thin clients.
Could not work out why nobody could get an IP address, as it was giving out the same addresses as the lab but with itself as the gateway.
That was a fun morning….
We have a projector which even if set to "do not do DHCP" still does DHCP. Odd, as it's a Panasonic and we have a lot of other Panasonics which are rock solid and respect the DHCP setting. Panasonic tech support (who are generally pretty good, once they get around to answering your email) refused to believe us, so the blasted thing is barred from the network altogether and when I need to fiddle with it I actually have to go out and use the remote!
Not seen that with projectors, but have seen it happen with random printers.
For the worst ones, I go to the DHCP server and set up a reservation for that IP address, so if it does go into DHCP client mode it gets the IP address I wanted it to have in the first place!
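That reservation trick is just a MAC-to-address pin checked ahead of the dynamic pool. A minimal sketch of the lookup order (all values are placeholders):

```python
# Toy DHCP offer logic: a reserved MAC always receives its fixed address,
# everything else draws from the dynamic pool. All values are placeholders.
RESERVATIONS = {"00:11:22:33:44:55": "192.168.1.50"}  # the misbehaving device

pool = iter(f"192.168.1.{host}" for host in range(100, 200))

def offer(mac: str) -> str:
    """Return the address the server would offer this client."""
    return RESERVATIONS.get(mac) or next(pool)
```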
Printers? DHCP? How about we set the wayback machine for early 90s, with some lab gear. The devices did not use DHCP, or bootp, but RARP. Having gotten an IP address, they fetched various config data via TFTP. Worked fine (for small enough values of "fine").
Then it didn't.
One day we started getting complaints that the boxen could not be accessed from Windows (NT3.1?). Worked fine from SunOS, SYSV on 386, even VMS Vaxen. Huh?
Apparently, someone had brought in a "network printer adapter", so they could use their Centronics-cabled printer over ethernet. This box had the charming feature of, if it saw an RARP reply, deciding that it was the intended recipient, and re-assign its IP to that in the reply (which it had not solicited).
When a user's machine initiated a connection and didn't already have the mapping cached, it would ARP for the desired IP address. Both the actual box _and_ the "mini network print server" would respond. Which MAC address should it use?
Apparently, most OSes would take the first ARP reply, while Windows would take the most recent. Since the computers were typically in the same lab, if not on the same bench, the gear was on the same ethernet segment and would be seen first. The printer adapter was on a different segment, so would reply later. Windows, and only Windows, would accept the second, unsolicited, update. So the first packets actually intended for the lab gear were instead sent to the printer adapter, which for some reason was not listening on any of the intended ports.
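The difference described above comes down to two ARP-cache policies: keep the first answer you hear, or keep the latest. A toy model (the MAC addresses are invented):

```python
# Toy model of the two ARP-cache policies described above.
# MAC addresses are invented for illustration.
replies = [
    "aa:aa:aa:aa:aa:aa",  # the real lab box: same segment, answers first
    "bb:bb:bb:bb:bb:bb",  # the rogue print adapter: other segment, answers later
]

def first_reply_wins(replies):
    """SunOS/VMS-style: cache the first reply, ignore later ones."""
    cache = None
    for mac in replies:
        if cache is None:
            cache = mac
    return cache

def last_reply_wins(replies):
    """The Windows behaviour described: always accept the latest update."""
    cache = None
    for mac in replies:
        cache = mac
    return cache
```

Same replies, opposite outcomes: the first policy keeps the lab box's MAC, the second ends up pointing at the print adapter.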
Sorry about the verbosity. Trust me, the actual diagnosis took longer than typing (or reading) this.
Suffice to say that this pioneer of BYODADTI (bring your own device and don't tell IT), a department head of course, was informed of what his brilliant actions had wrought.
Then an actual policy about such unsolicited upgrades was written. (and read? don't be silly)
Err… I set my desktop devices (computers, servers, printers, etc) to use DHCP reserved addresses. Mobile devices (laptops, phones, tablets, etc.) just use DHCP. It makes it trivial to detect Someone Else attempting to get on the network. I have a nice little spreadsheet with the IPs, MACs, and names of each device, plus other relevant info. The only devices which have a truly static IP are the DNS/DHCP server and the cableco ‘modem’. (Not the same device, I turned DNS/DHCP off on the cableco device, it screwed up too often.)
Several times there have been attempts to access the wireless network using default cableco creds and/or default router vendor creds, and default IPs. As my net doesn’t use the default creds or IPs, those attempts fail. And the attempts are really obvious.
In the late nineties I set up a similar sounding "suite" of computers for a research conference we had organised. It was hosted at a local hotel and the hotel had an ADSL line (with BT) installed for us. Setup was simple: our university IT bods had supplied a router and configuration instructions, and all was running well - the academics could all log onto their emails (most had login details written on bits of paper) the day before the conference started. Next morning there was no connection to the outside world. All the connections checked out, the configuration hadn't been changed. The telephone support that BT offered were confident that the fault was at our end.
I had to get the hotel management to call in the BT engineer. Said management made it clear we would have to meet the costs if there was no fault. Fortunately the hotel's ADSL contract was sufficiently expensive that an engineer arrived inside an hour. He checked all my wiring, configurations, etc. and could find no fault. He then made a call to the relevant exchange and, two minutes and some sheepish looks later, the system was working again. It turned out someone had unplugged the ADSL line at the exchange end.
Or as in one memorable case a whole small CSP was down for a week with the BT leased line failing in interesting ways and other business users in the same area also reporting issues, but BT repeatedly stated there were no known issues. Then suddenly it started working without us doing anything (we'd run out of ideas after doing everything we possibly could multiple times over that week) and a couple of days later BT sent out an email telling us that they'd finally fixed a major fault and they were sorry for the intermittent service over the last week...
For me, at home, Plusnet, but same difference.
I had degraded ADSL service and no incoming voice calls, so called customer service.
"That will cost you £150 for the engineer" came the reply.
So as I was out of contract, I just got Virgin instead, not much extra for 100Mbps, and not a penny more.
After a few years Virgin dropped to 50Mbps and upped their fees. They refused to fix the speed.
So I gave notice and moved to Now! who use the old telephone line.
Lo! and behold, the old Plusnet fault was still there, but as it was a new install, they came out to fix it gratis.
BUT they didn't fix it for 3 weeks (it was a single break in the copper pair on the pole outside, and the ADSL still managed a respectable 20Mbps) so I got compo from Now! equating to 6 months free service. I have yet to pay a bill.
A week after I gave notice to Virgin they 'fixed' the speed (i.e. removed the traffic shaping) and I got my 100Mbps back. But the damage was already done. Bye bye beardy!
This, to be fair, isn't just a BT phenomenon. It seems to occur with lots of tech based services.
You try all the usual fixes. Nothing helps. (Online it says there's no problem, if such an option is available).
You phone the service supplier who insists there's no problem at their end.
You go for a cup of tea while you think about it.
You return to the problem.... and it's gone.
Think you've got your dates or technologies mixed up. ADSL didn't launch until the early 2000s in the UK. Unplugging it at the exchange sounds about right for BT though!
BT certainly didn't have any back in the 1990s; it was the very end of the 1990s when we put in Colossus - the backbone network that initially delivered dial-up and leased line connections and later broadband services.
By 2000 all we had was a lot (250,000+) of dial ports and a load of leased lines.
That would make sense. One of BT's favourite tricks was disconnecting ISDN lines randomly, that included Business Highway, Home Highway and traditional ISDN.
Lost count of how many times we came to use our fallback ISDN connections for sites only to find that BT had switched them off or unpatched them for no reason. Part of the issue, I suspect, was the lack of dial tone, so some engineers thought it was a spare copper pair.
Likewise. Every time I saw a BT van outside the wooden hut that BT laughingly referred to as our exchange, it was a 50/50 chance that one of our ISDN lines had been nicked. Despite the fact the ISDN boxes had little green lights on when they were working, every time I called to register a fault I had to agree to foot the bill if it turned out to be my fault.
It's still pretty normal. There are exactly 3 properties served from the DP atop the pole by the roadside here.
The third property have had great problems with their connection, which didn't exactly fit well with "working from home".
Coming back from town a few months back, I noticed a batch of engineers working at the manholes up the road. Sure enough, on getting in the house, I found our line dead.
A walk up the road followed by "Oh it's nothing to do with us. We're just testing the cables here."
"In that case whose is the ladder leaning against the pole with the DP on it? I'll go and lock it safely away."
I did, and logged a fault using my mobile. Later that day I handed the fresh engineer a spare ladder, and was told that on examination our pair had been "borrowed".
They've been that way for years.
I remember one case they had been working in the manholes for hours trying to get a neighbour working for their business line, at which point our line went off. Had a chat with them, engineer was adamant he'd not done anything to cause it. Fault raised, engineer next day found our line snapped in a temporary joint. He looked at the bag, it had the previous days date on and details of the engineer, he then gave me a description of said engineer - yep the one the day before who said it wasn't him!
Another time I got home, found no dial tone and my line was down at 57M from 80M. Engineer turned up and found the issue instantly, another engineer had redone the DP, but instead of the pair to my house picked two random wires! Correct wires connected and back working. Impressive that I could get 57M with one wire disconnected in the DP!
We had one of the earliest ADSL lines out into a school I worked at, and that would be late 99 or early 2000 from memory as I only worked there for a year.
After that, I moved back to the local County Council, supporting the Regional Broadband Consortium, who eventually ended up with a telco licence and putting naked DSL, then EFM into schools. As there was no dial tone, we regularly lost service when someone stole our pairs as there was no tone…
(probably told this before - apologies)
Working in local radio in the late 1990s, we still had analogue lines (EPS85?) into local sports grounds etc. When not in-use (no kit plugged in) the thing looped studio programme back to our racks so it was possible to check the line was working without going to the stadium.
On several occasions everything would be absolutely fine up until a day or two before a big international event, then our line would go dead.
Nine times out of ten it was a local BT engineer, tasked with installing temporary phone lines for foreign broadcasters, checking the local DP by simple dint of looking for line voltage (50V for phones, 70V for ISDN?). Of course, our analogue lines were free of such so, quick yank of the jumpers and temporary line installed... but not working.
I think it was about 1996 that we got our first ISDN kit and lines installed. Didn't solve everything, but did make ad-hoc outside broadcasts somewhat easier to arrange, especially with Cabletel coming to Cardiff and very, very keen for business.
Comcast has a nasty habit of doing network "upgrades" that require you to reboot your modem to regain connectivity the next morning.
One Sunday, this did not work. I looked, as one does, at what IP was being pulled by the WAN side of my router. "That's funny," I thought, "the default GW subnet doesn't match the IP I was assigned", followed shortly by me setting the router for manually assigned IP and setting the default GW to the same subnet as the IP address I was originally assigned. Voila! Connectivity! Yeah, they had entered, and were handing out the wrong value.
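That "that's funny" moment, a default gateway outside the subnet you were assigned, is easy to check with Python's standard library (the addresses here are documentation examples, not real ISP values):

```python
from ipaddress import ip_address, ip_interface

def gateway_on_subnet(assigned_cidr: str, gateway: str) -> bool:
    """True if the default gateway sits on the network DHCP handed out.

    assigned_cidr is the leased address in CIDR form, e.g. '198.51.100.17/24'.
    All addresses used here are placeholders for illustration.
    """
    return ip_address(gateway) in ip_interface(assigned_cidr).network
```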
Then, I made my fatal error... I tried to communicate what I found to Comcast's helpline. Suffice it to say, I had great performance until mid-afternoon, because I was the only user on my subnet.
Now on FIOS. It seems to be a much more professional operation. Which isn't really saying all that much.
Once upon a time an HPUX server had to be upgraded from 10.20 to 11.
It was top critical, as it was handling all the bank transactions of a company with a $9bn-a-year turnover, through X400 messaging relying on an X25 line.
No-one had dared to even touch the thing, but my then boss (a notorious cowboy) gave me this "gift" as a new joiner :)
"Easy stuff, mate, just boot on disks, install HPUX 11/X25/X400, then patch all with the repository, end of job". Bastard.
As it happened, all went well, that evening. Except the darn X25 daemon which would flatly refuse to launch.
It took me all night, many tired calls to HP support, with the first lines asking the usual trivial questions, before ending up at the X25 L3 specialist which gave me the solution, one I would have never found alone.
I left the building at 11:30 am, having started the day prior at 8:00 am, completely zombified ...
>>It took me all night, many tired calls to HP support, with the first lines asking the usual trivial questions, before ending up at the X25 L3 specialist which gave me the solution, one I would have never found alone.
did you really leave us hanging like that? i mean...
I remember going to a conference in the US. One of my colleagues, his first trip abroad, did a couple of back to back sessions logging on to the systems back in the UK.
These were the days before wireless, when you unplugged the cable from the telephone and plugged it into your laptop.
All went well - till he came to check out of the hotel and had a phone bill of thousands of pounds (more than the cost of his whole trip).
When he got home and tried to claim his expense, his manager paid (after lots of grumbling) and asked "Why didn't you use the AT&T freephone local number". No one had explained to the poor guy how you work internationally.
Ah I remember the joys of our International remote access system and the joys of finding the right AT&T local number so it was free from the hotel. Generally recall most of them gave you terrible speeds.
Thankfully I managed to be one of the first people on our VPN-based solution, so I only had to suffer it for a couple of trips. I also recall back then quite a few hotels were not expecting VPNs, so you could use their WiFi back to the VPN without ever seeing the sign-in/pay screen for the service.
Now international roaming...
One of my colleagues came with us to Canada when his son was about one month old; as we were flying there, his son was taken into hospital seriously ill. Got the message when we arrived (I had the group mobile) and gave him the option to fly home straight away. He decided to stay, and my boss told him to use the mobile/international charge card as needed to speak to family. My boss wouldn't tell me the size of the roaming bill, but he did confirm it would have been cheaper for him to fly home.
"Ever been as ready as you can, only to have all your hard work undone by slipshod behavior?"
Yes. Every time BT is involved in anything. If you want something well and truly fscked up, you can rely on BT.
(the guys on the ground are generally fine, it's the management level that balls it up)
This holds true for Virgin Media too. With the rider that the frontline telephone support are often pretty clueless if their script doesn't take them to a known resolution.
But VM managers seem to think it's a good idea to pretend there's no problem when there is. So the service status won't show an area-wide fault and those frontline script followers won't have been told, so they just carry on with the "turn it off and on again"s. But sometimes you'll get one who's a bit less clueless (or even actually sensible, though that's a rarity) who'll say "Funny, I've had quite a few calls saying that today...".
Often you'll only find out it's a problem at their end after you've been to the "We'll send an engineer to you, how would a week on Wednesday do?" end of the line, then found out that the service has come back on again an hour or two later.
Cellnet was 60% BT until they bought out Securicor in 1999. But it was run at arm's length until it became 100% owned.
BT didn't quite seem sure that mobile phones would take off, hence a joint venture and not using the BT name at the start. Similar with the internet: it took a good while before they realised just how big it was going to become; pre-Colossus (multiple STM-16 links), their internet platform didn't even have anything as big as an STM-1 link in it.
For the youngsters: SDH is the funny link technology we used before we dared to use Ethernet for WAN links, STM-1 being 155Mbps and STM-16 being 2.5Gbps.
And then bought EE, one half of which was formerly Orange, initially established by Hutchison, who now own Three, and was at one point owned by NTL, now Virgin Media, the new owners of O2, and at another point owned by Vodafone. Orange is now owned by France Telecom who changed their own name to Orange.
The history gets fun in Telecoms. Freeserve were launched by Dixons and an ISP back in 1998, Wanadoo (part of France Telecom) purchased it and then it became part of their Orange UK operations. That then became part of EE and of course BT purchased EE, so now part of BT.
Freeserve shook up the market back in the day and BT fought against their business model/approach with the regulators. So now whilst the original brand may be dead several of the team that worked on Freeserve in the early days are now BT employees via the chain of acquisitions above!
And on the telecoms side of things, Freeserve partnered with Planet Internet which became part of Cable and Wireless which is now part of Vodafone (infrastructure bit), and also part of BT via the T-Mobile side of the EE merger (wireless bit), and also part of Virgin Media (residential cable bit).
Once upon a time, me and a counterpart on a project were working on a split DNS concept for a big intranet in a meeting room which had a lovely massive whiteboard (I think it was easily 4-5m long). After what was a good session we had the whole thing full of notes and scenarios.
It was only when we decided to wipe the board that we discovered that we had been using permanent markers that some nitwit had placed there instead of the wipeable ones that should have been there.
As we had cause to be in that room I could see that it took weeks before they tasked some poor sap with trying to get it clean again.
And yes, since that day I check what I have in my hand, without fail :).
Hmmm. I might..."might", I say...have been inclined to think, "F it", and leave the cleanup to whoever put the markers there.
But I would have been responsible, and gone in search of a large bottle of isopropanol (making sure it took some time, which may have involved a visit to the local pub) before returning to clean the board.
That's all doable as long as it's your meeting room and not booked hourly with others, and it wasn't our building so we couldn't hang around either.
Intentions were aplenty, but nil ability to execute them. All we could do is notify them.
Oh, and we removed the markers. Just in case.
At one place I worked it was a regular occurrence.
Some "bod" would write a message with a permanent marker instead of a whiteboard marker.
Went on for years until a new H&S elf discovered _exactly_ which solvent we used to clean off the ink.
Oh dear ..... :-)
Once saw an entire trading floor network taken down by some genius (these were the days of manually configured IP addresses, NT 3.51 era, so late 1990s): the technician manually entered the workstation IP and gateway backwards, making the new workstation think it was the gateway for the entire network.
Actually, I say once. More than once
Had the same sort of thing today. I was supposed to be demonstrating a new deployment system via Teams. To make it easier to stream, I set up a VM and have been deploying stuff to that machine for weeks, restoring a snapshot taken just after the VM was set up to ensure it was clean.
It's been working fine all week, but come the demonstration, the VM just sat there doing nothing when I tried to deploy to it. Restoring the machine to a clean state and restarting the process just produced the same result.
Thankfully, I'd screen-captured one of my test runs, so I was able to use that to demonstrate.
I was onsite deploying servers at a private hospital once where the rack had been carefully placed by the CTO of my firm right next to the emergency power cut-off switch. Essentially, if you opened the door too quickly, you'd hit the power switch and cut the power to the whole building.
Naturally, all us peasant engineers were careful and we took the door off the rack and flipped it so that it would open the other way, but the CTO whenever he went onsite would flip it back and 9 times out of 10 he would trip the emergency cut off like the colossal bell end he was.
What might not surprise you is that he now works for Rackspace.
Had similar issues years ago, used to run the IT for DVLA number plate auctions for my company. Temporary dial-up lines (long time ago) and the connections were often flaky. DNS caused much grief at the time and a few entries in the HOSTS file solved many potential problems
I was managing a launch of a very early public access system in libraries, everything had been tested many times but the delivery of the kit and the kiosk to house it was late.
To make matters worse the cabinet provider delivered the kiosk to the IT department not the library on a day when no van was available. Fortunately that fitted in the back of my Ford Granada with the seats down.
We fitted everything into the cabinet and then found out that it refused to talk to the central server over the network. My technical team were getting nowhere, and we had the Leader of the County Council, the local MP, the service director and press arriving in an hour.
The only quick and definite fix was to get the server to site and connect the kiosk directly to it.
I left the team sorting out cabling from the kiosk to a nearby issues desk and went to get the server in the trusty Granada. I did notice some white powder on the floor by the server (under a desk, of course). I brought the server back, it was connected to the kiosk, everything fired up and the launch was a success. All the way through the presentation of the new service, interviews etc I noticed my hands were itching, but I was required to be present until the launch was completed.
By the time we managed to get out and adjourn to a local hostelry my hands were double their normal size and beetroot red. It turned out that the white powder was ant poison, which was very toxic. Needless to say, once we sorted out the comms problem the server wasn't returned to the Library HQ but was installed in a rack in my data centre (having had the ant poison cleaned off the cabinet).