I wonder if he is the same guy
I remember one guy explained something about AOL mail's inner workings on Slashdot. A typical "citation needed" nerd asked "How would you know?", and he replied "Because I coded it."
Welcome again to Who, me? In case you've missed previous editions of the column, it's a confessional in which readers share their stories of having broken stuff. Badly. This week, a fellow reader named "Bert" asked you to "Cast your mind back to the mid-90s when America Online (AOL) was the biggest online service and the …
Probably fully formatted on the vehicle as:
I had a friend who operated a mobile disco service, and considered himself to be quite “with it” on the internet. He wasn’t bad, as he was perfectly capable of setting up a hosted domain and coding a rudimentary HTML page complete with every flashing and marquee effect the W3C had made available. An early adopter of internet advertising, he had posted his URL on his brand new vehicle INSTEAD OF his phone number.
His business pretty much evaporated overnight, for two reasons. First, it was 1997 and no-one who seemed to require his services owned a computer, let alone a modem. Second, the URL on his van had omitted all the non-alphabetic characters... it said something along the lines of httpwwwmobilediscocouk.
When I was at university in Oxford in the early 80s there was a company selling audio equipment that used their phone number as their business name, with regular ads in the student what's-on listings and the tag line "you know our name, you know our number"... which some of us referred to as "you don't know our number, you'll never find it!"
A better solution would have been Exim, seeing as this resource exhaustion tale has all the hallmarks of a sendmail shop. In the 1990s a friend of mine was running a forum mailing list for a very well-known computer software company and was suffering the same sort of problem, caused by sendmail's "one message, one delivery, one process to deliver it" paradigm. A bit of a problem on a forum with 5000 people on it. He changed to Exim and the problem went away.
Correct - Exim was "experimental" for a fair few years. Postfix came out in 1998, but I didn't use it till v2.2.
smtpd_timeout was a lovely thing, not to mention all the smtp client timeouts going the other way (no waiting forever tying up a process waiting for a receiving MTA to respond).
Commercial load balancers were a bit shit but you could do something with a fast PC with a handful of network cards installed. They had FDDI on the outside and even then you could install versions of Linux that could then fan out that FDDI with a host of machines with 10Mbit ethernet on the inside, either simple round robin or something slightly more intelligent.
All they had to do to figure out load balancing was ask one of the major Usenet outfits. Wouldn't have changed the main problem, though ... AOL's mail system was a home-grown clusterfuck that had Internet email grafted onto it as an afterthought. The guy that ran it (not this "Bert" character, but rather someone I'll call "L") told me on more than one occasion that he wished he could rip it all out and start over with something sane like sendmail (!!), but the PTB wouldn't let him.
qmail and postfix had nothing to do with AOL's issues. qmail was written to address security issues that weren't seen as a problem when sendmail was written; postfix was written to be an easier alternative to sendmail. qmail's bones were laid down in late 1995, before the AOL meltdown. postfix was an IBM-Watson research project about a year later, and I have it on good authority that IBM didn't give a rat's ass about anything AOL was doing.
qmail was written to address security issues that weren't seen as a problem when sendmail was written
Indeed. My home use of sendmail was *extremely* short (as in a matter of days) I then switched over to this new, secure and lightweight qmail (and later added on ezmlm rather than listserv for mail list handling)
From the article - remember this 1996 when servers were feeble – the server would probably reboot
I've *never* had one of my Linux boxes reboot itself because of swap exhaustion. Probably because I use the above-mentioned qmail (and postfix on FreeBSD boxes) rather than sendmail. Both of them handle queuing a hell of a lot better than sendmail does (not difficult!)
I foolishly later took a job herding Sun boxen, all of which used sendmail. So I got to experience the joys later, including making it play nicely with Exchange 5.5, which advertised that it supported ESMTP. At which point the Solaris sendmail tried doing batch SMTP delivery, which Exchange couldn't handle, and so it silently discarded the emails. Fixed by telling sendmail to ignore ESMTP announcements for the internal set of IP addresses that the various Exchange boxes used. Those were the good old days of packet-switched frame relay networks...
I had a thought about this ...
They could have fiddled with the DNS to get a poor man's load balancer. Set the MX to (say) a.domain.tld with (say) a TTL of 3 hours. After (say) half an hour, change the MX to b.domain.tld, also with a TTL of 3 hours. After another half hour, change to c.domain.tld. And so on. You could script the DNS updates to automate it.
Then each resolver would cache just one of a.domain.tld, b.domain.tld, etc., and so, using the numbers originally given, would try to contact only one of 5 different MXs. Different resolvers would cache different records depending on when they last fetched them. That was definitely doable back then.
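A minimal sketch of the scripted rotation described above, as POSIX shell functions that could be run from cron. Everything here is an assumption for illustration: the five hypothetical hosts a to e under domain.tld, the half-hour rotation, the 3-hour TTL, and the names `mx_for_time` and `mx_record`. It only works out which MX should be published at a given moment and prints it as a zone-file line; bumping the SOA serial and reloading the name server are left out.

```shell
#!/bin/sh
# Hypothetical sketch of the "rotate the MX every half hour" idea.
# Assumes five hosts a..e.domain.tld, a 30-minute rotation and a 3-hour TTL.

ROTATE_SECS=1800      # publish a different MX every half hour
TTL_SECS=10800        # resolvers cache each answer for 3 hours
HOSTS="a b c d e"     # hypothetical MX host prefixes

# Print the MX host published during the slot containing time $1 (epoch seconds).
mx_for_time() {
    now=$1
    set -- $HOSTS                              # split host list into $1..$N
    slot=$(( (now / ROTATE_SECS) % $# ))       # which half-hour slot, modulo N
    shift "$slot"                              # drop the hosts before this slot
    echo "$1.domain.tld"
}

# Print a zone-file style MX record for time $1.
mx_record() {
    echo "domain.tld. $TTL_SECS IN MX 10 $(mx_for_time "$1")."
}
```

For example, `mx_record "$(date +%s)"` prints the record for the current half hour (`date +%s` works with GNU and BSD date). Because each answer is cached for 3 hours while the published host changes every 30 minutes, resolvers that fetched the record at different times end up holding different hosts, which is the whole trick.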
If they had geographically distributed servers then they could also have done some conditional DNS stuff to present different MXs to different areas. That can be done with BIND using views, but I don't know whether that feature was available then.
'95/'96 we both had JANET email access at work and we had a rented property with a single phone point just inside the door so our Centris 650 was offline. It had to wait for the move up here to Dundee end of '98 and into our own house with a Telewest cable account and a separate phone line to get it online.
Eudora was our mail app of choice, at work for me as well as at home. Having two email addresses, one at home and one professionally seemed luxurious and somewhat decadent.
I was working as a data network administrator at an Australian university when this happened. Luckily I had written local hacks into sendmail to do a form of exponential back-off when emails were unduly delayed. At the time we hosted a number of listservs and various other sundries, as well as 25,000 users doing their normal thing.
So when the outage hit, our mail queue grew to tens of thousands of emails, a fair number of which were to the mailing lists (hi, Pavement fans mailing list) enquiring if anyone else had noticed the outage and asking others to reach out to users who weren't answering; my response: "Are you helping? Good, well stop." Our poor SPARCserver 20 reached a load average of 128 but it stayed up, one of only a handful of Aussie university mail servers that didn't bounce at least once during the outage. I know some other university mail admins null-routed email to AOL via DNS lame-delegation hacks. All our email to AOL was eventually delivered about 48-72 hours later.
Did I mention I hand wrote our sendmail.cf file?
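For anyone who hasn't met it, exponential back-off just means doubling the wait after each failed delivery attempt, up to some ceiling. The poster's actual sendmail hacks aren't shown, so this is only a hypothetical shell sketch of the idea; the 15-minute starting delay, the 4-hour cap and the name `backoff_delay` are all assumptions.

```shell
#!/bin/sh
# Hypothetical sketch of exponential back-off for a delayed message:
# wait BASE seconds before the first retry, double the wait after each
# further failure, and never wait longer than CAP seconds.
BASE=900        # assumed first retry delay: 15 minutes
CAP=14400       # assumed maximum delay: 4 hours

# Print the delay in seconds before retry number $1 (1 = first retry).
backoff_delay() {
    delay=$BASE
    n=1
    while [ "$n" -lt "$1" ] && [ "$delay" -lt "$CAP" ]; do
        delay=$((delay * 2))   # double the wait for each earlier failure
        n=$((n + 1))
    done
    [ "$delay" -gt "$CAP" ] && delay=$CAP   # clamp to the ceiling
    echo "$delay"
}
```

With these numbers the retries come 15 minutes, 30 minutes, 1 hour, 2 hours and then 4 hours apart, staying at 4 hours after that, so a stuck destination costs fewer and fewer delivery attempts the longer it stays down.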
I vaguely remember it now, because of the effect on LISTSERV. I had a UK JANET account at work back then and, other than the effects on email and LISTSERV, we had ringside seats. That big, fat optical JANET pipe was a serious luxury. I could send multi-megabyte email attachments (science data) and, apart from having to confirm that I wanted to do that, it would go. It had to be to another JANET account, though.
"10MB seemed to be an infinite limit for a mail attachment"
When I was putting together my first PC, Tiny Computers sold you the box but you had to add an HDD, and I remember thinking as I bought a 40MB HDD that despite buying the smallest HDD available I was getting something so vastly huge (twice the size of the disk on my work PC) that I was never likely to fill!
(Though going back another 5-10 years I can remember when adding a 4kB RAM *card* - that's really 4096 bytes - to the 6800 processor at school was a huge deal as it meant we could play StarTrek!)
"I bought a 40MB HDD [...] I was never likely to fill"
Considering floppy disks used to carry 1.44 megs meaning 40 megs held less than thirty disks, and even back then a single floppy held basically not a whole lot of anything, that sounds a tad bit optimistic if said PC was meant for anything beyond Haiku storage.
"Considering floppy disks used to carry 1.44 megs meaning 40 megs held less than thirty disks, and even back then a single floppy held basically not a whole lot of anything, that sounds a tad bit optimistic if said PC was meant for anything beyond Haiku storage."
Actually, small (physically - 5.25 inch) floppies used to hold 160 kb, compared to the 2nd generation full sized (8 inch) floppies which held a massive 250 kb.
I eventually provided office applications (word processing, spreadsheet, database) for an entire office (one floor of a 20 story office tower) running on a computer with a 5MB hard drive, enabling us to ditch the stand alone word processing machines.
Considering floppy disks used to carry 1.44 megs meaning 40 megs held less than thirty disks, and even back then a single floppy held basically not a whole lot of anything, that sounds a tad bit optimistic
You had 1.44M floppies? You kids didn't know when you had it good. We had 180K floppies, and were grateful for it; we didn't have to hump cartons of punchcards around anymore.
Formatting a microdrive cartridge would stretch the tape and increase its potential capacity. If you were a really boring nerd, you could write a program that would repeatedly format the cartridge then measure its capacity, stopping after two formats gave more-or-less the same value. Or the tape snapped.
Kids of today- tell 'em you used to literally physically stretch your storage to get a couple of extra kilobytes out of it, they'll ignore you and hope you go away.
1.44 megs? Luxury ...
Seriously, my first 8" floppies were 256K... My first 5¼s were 160K. The system ran off one of those disks. In those days, the thought of a 40meg HDD at home was purely in the realm of fantasy. (By way of reference, in mid-1980 an 18 meg NorthStar drive cost in the neighborhood of $4,200... in 1980 dollars. About a year and a half later, Apple debuted a 5 meg drive for $3500... People lusted after these GIGANTIC storage devices.)
Oh boo-hoo. And punched cards held mere bytes. Yeah, the 8" floppies on our CP/M Z80 boxes held stupid little, but it's irrelevant - by the time buying a HDD for a home computer was a thing that even non-NASA personnel could reasonably do, nobody used anything other than 1.44 floppies; compared to which 40 megs were a luxury, but a vewwy-vewwy modest luxury indeed. Nobody I knew walked around with less than a full box of floppies by then, and when your existing data instantly takes up over a third of your allegedly humongolicious new HDD, starting to longingly ogle one at least four times as big before you even installed this one is what you do, not expecting it to never fill up.
By the time buying a HDD for a home computer was a thing that even non-NASA personnel could reasonably do, the IBM XT came with a 10meg HDD and a 5¼ 360K floppy. 1.44s were a number of years in the future. Nobody I knew walked around with a full box of floppies, unless they had just purchased them.
"by the time buying a HDD for a home computer was a thing that even non-NASA personnel could reasonably do, nobody used anything other than 1.44 floppies"
Nope. My family's second computer had a hard drive and a 5.25" floppy drive. Played many games off of 5.25" floppies. I can't remember if the first one had a HDD; I do recall it had two 5.25" floppy drives.
I once had a general sysadmin-type interview where I was asked if I'd ever hand-edited a sendmail.cf. I think it was a trap, but couldn't be sure. However, I had done it, just once: I'd added a line (or commented one out, I forget exactly) to block the open relay that was there in the default, so I gave that reply.
Mind you, I didn't get the job. Maybe they needed gung-ho and/or expert sendmail hackers? Or was me even thinking about going near a sendmail.cf file with a text editor too crazy for words? I suppose I will never know.
For those too young to remember, AOL used to bombard us with "free" CDs. The worst mistake that you could make was to put one in your PC. Trying to remove AOL was almost impossible, it wrote itself into multiple registry locations. (possibly hundreds!)
Future archaeologists will find an entire strata of discarded AOL CDs :(
That's Usenet Oracle, you heathen!
After pondering deeply, the Oracle decrees: Citing Wikipedia when there is a perfectly good Jargon File entry is grounds for immediate banishment, only revocable by printing an ASCII art Snoopy on a line printer and hand-delivering it to the Harvard Science Center's observatory whilst waving a rubber chicken.
n.b. Biff was not a barker, that is a baseless, malicious lie!
"For those too young to remember, AOL used to bombard us with "free" CDs. The worst mistake that you could make was to put one in your PC. Trying to remove AOL was almost impossible, it wrote itself into multiple registry locations."
Registry locations? I recall AOL floppies showing up everywhere, and I am reasonably sure they only needed MSDOS...
Very few people had CDROM drives. I managed a special deal available for Microsoft employees and their families that could net a 2x CDROM drive, complete with the SCSI card, for only $400. No other non-business users I knew had them for years after that.
Early AOL floppies came with a "runtime" version of PC/GEOS that could be modified to become bootable. I know a couple of folks who used this as their primary GUI...
For some unknown reason, I've been in the habit of burying "time capsules" of miscellaneous industry tat since 1993. One of those archives contains AOL floppies; the next one in the series contains AOL CDs.
Trying to remove AOL was almost impossible
At one company I worked for, we had a real charmer for a marketing director (he took pride in the fact that he could make his secretary cry, refused to answer his own emails and generally treated anyone lower-grade than him with utter contempt, especially IT...).
We had a *very* strict 'thou shalt not install non-work software on thy PC' rule; breaching it was a disciplinary offence and could lead to dismissal.
Said director had finally been given a laptop so he 'could work from home'. After a week, he stormed into the Desktop support cube and threw the laptop at us saying "it doesn't work". Eventually, we managed to get out of him that the corporate dialup (which was flaky on a good day) no longer worked.
Delving into the reasons why, it soon became obvious that he'd installed AOL. And his home personal finance software. And had an 'interesting' collection of images (this was the late 90s, so nothing too amazing, but they still drove a Chieftain tank through the corporate guidelines).
The presence of the AOL dialler meant that there was no way whatsoever to get the corporate dialler to work. Even uninstalling it failed so we informed said director that he needed to save his information off the laptop as we were going to have to rebuild it.
Once we had the laptop back, we nuked it from orbit.. Corporate dialup now worked again.
Two days later, he was back in our cube screaming that he was going to get us all sacked because we'd deleted all his finance data - turns out that the home finance programme saved all its data in the programme directory (as was common in the late 90s) and that he hadn't bothered to back that up so all his data was gone. He went off to HR while I went to have a chat with the local site director (a really nice guy who had had this marketing director foisted on him by headquarters but nevertheless outranked him).
Site director apparently tore very large lumps out of Marketing director and told him to amend his ways or he would get relocated to the smallest, most rural backwoods US location that could be found and left there to rot (company policy was that directors *never* got sacked - even if the site they worked at got closed then they just got found an essentially-meaningless job elsewhere).
He never spoke to us again - any interaction was via his secretary (who also benefitted from his enforced change in attitude).
JANET, CompuServe, Demon, then Force9 (later to become Plusnet), for me. I have since BT'd, but I'm better now, thanks (A&A if someone else is paying, but Plusnet still used, and yes, I know they are BT-owned... but they are not BT. You can ring them, for a start. With A&A, or more specifically AK, I installed an early-ish consumer-level VoIP system, Network Alchemy I think, back in the days when the Rev came out to install the equipment on the cabling I had pre-installed. I had a Nokia 9000 at the time and AK had just got a brand new 9110, so it would be 1998). I was gutted when Demon went under. I first hosted my own website in the Demon days... and it's still running now... but hosted under very different circumstances.
AOL was always one to be avoided. The volume, and quality, of the advertising was the clue.
I seem to recall that would be about the time my organisation rolled out Lotus cc:Mail, after which I pointed out that it cached your username and password in plain text in the .INI file (which didn't go down well; apparently messengers are still not liked...)
Was also about the time I got my first 18.8baud modem and screeched onto the internet with CompuServe along with hideous numeric email address.
In more recent times I was involved in one of the many early internet bank launches. You will remember these as several crashed at launch due to the number of reporters wanting to report how they would crash at launch...
In this case, though, I identified that one of the key web-handling programs was a flagrant copy of the reference code in the OS vendor's manual, and guess what: it didn't work and didn't release resources on exit, so it kept crashing.
Turned up on a customer site miles away from home after being called in to help; it was getting late, and I was just told to buy some new pants and shirts on expenses because I wasn't going home until it was fixed... Wasn't even my code!
"[...] screeched onto the internet with CompuServe along with hideous numeric email address."
When our company gave us access to the internet all our email addresses were derivatives of the company's X400 format address - which were neither short nor memorable.
In spite of the company having a short brand name - it was several years before we got email@example.com
I worked at one of the last JANET X25 sites before the service closed in 1997 - the internal network was TCP/IP but one Unix host had Coloured Book Protocol support for JANET access to the Uni over the road.
Getting mail from SMTP (internal) to X25 (JANET) to X400 hosts was... interesting.
Sometimes, someone even managed to get a reply to me.
Just not very often.
(Hmm, maybe El Reg needs a "What did YOU do in the War, Grandad ?" icon...)
On or about 2002 a similar failure occurred at Yahoo when a new MX was added that inadvertently pushed the DNS response past the maximum UDP message size (512 bytes without EDNS) and thus caused all DNS queries to fall back to TCP.
The sudden influx of huge numbers of TCP connections swamped the DNS servers such that our update mechanisms failed when we attempted a reversion. We couldn't even ssh into them to make a manual fix. Furthermore the servers also stopped responding to UDP queries!
We had to get our data-centre folk to disconnect each server from the network so they could log in at the console and apply a manual fix to unwind the mess.
While no mail should have been lost it caused queries to *.yahoo.com to timeout for a number of hours so presumably many millions of dollars of lost ad revenue during that time.
Unsurprisingly the instigator of the change lost their DNS update privileges but they did keep their job and went on to become a VP.
Seems unfair that a reasonably unexpected failure of a routine update means that you're disqualified from maintaining that system in future. On the other hand, you probably would want someone else to do it in future, because you've used up your "get out of cockup free" card there.
I was summoned to the top people one day and told that henceforth I would be working on something called Novell NetWare. As the official support channel from Novell was via Compuserve I was given an account and a dialup modem.
I found it amazingly useful and you could do all sorts of things such as book flights electronically which was unheard of back then. Unfortunately the bosses forgot to tell me it cost something like 25 dollars an hour to use, with predictable results :(
Afterwards I found that if you waited until 7PM it "only" cost 7 dollars so I used to login at 7:01 and typed my fingers off trying to cram everything in.
I was told a similar story by an ex network-op of mine. As I recall (feel free to correct me if I'm wrong), it was something silly like Ghana (or at least, the state I was told about) having a very small netblock allocation as a country, or only having a single gateway in, or the whole lot being proxied through one server, or some similar 'brick wall' hard-limit connectivity problem that, up until then, hadn't been a problem.
Then 'something' occurred that got their IP range flagged, and the whole internet went down for the country.
I think the story I was told was about a Gulf state (Jordan? Qatar?) but it's apocryphal anyway, so it might be the same tale.
Anyway, reminiscence over, back to work, sigh.
'Something' occured that made their IP range get flagged up, the whole internet went down for the country
This sort of thing used to occasionally happen for the NHS. You'd end up having to answer captchas to use Google, and get snotty "suspicious usage pattern" messages from other websites.
I haven't seen it in a while, so maybe the proxies have been properly sorted out now. Who am I kidding.
My first modem was also 2400 bps, it was a 'loan' modem from where I worked as they'd just bought a 28.8kbs one.
I just used it for BBSs etc at the time. I remember one local computer shop (PCs, Amigas, Atari STs etc) had things like stock-lists on a dial-up BBS system. You could even pre-order items via the BBS, and then go pick them up. (No payment system etc back then). They even gave you a free Compuserve email address (wow! ;-) ).
Once I got proper Internet access (Demon), I went and bought a USRobotics external 56k modem. Was in an aluminium case, almost exactly the same size as an external 3.5" floppy drive. This was all on an Amiga, didn't get round to switching to PC till the late 90s!
My first modem was also 2400 bps, it was a 'loan' modem from where I worked as they'd just bought a 28.8kbs one.
Ohh, lookshoorey. My first modem was banging rocks together... no, wait: a bit of veroboard with an NE565, an NE567, a CD4066 and an op-amp, plus a bunch of resistors, capacitors, switches and some trimmers, lovingly soldered into a device that could whistle 300 bits per second down a phone line, and listen to them coming the other way too. You needed to dial the number yourself on a Real Telephone. It's still somewhere in a box. Then came a Racal-Decca 1200 baud modem, which also required External Dial Assist. Then a 9600 baud one, previously used for Remote Diagnostics at DEC FS, and finally a 28k8 V32.something before ISDN arrived with its whopping 128k (and still 64k when you got a call).
My first modem was also 2400 bps
Pah. The youth of today. My first modem was a 1200/75 non-autodial (about the size - and weight - of two housebricks!).
I well remember embiggening my parents phone bill by dialling to Almac BBS in Scotland (we lived in London). There was some... discussion about the on-line time and my parents took to occasionally lifting their phone extension in order to knock me offline..
"Well do I remember AOL. We had it back in the '90s. Used a 2400 bps modem to connect, and everything took FOREVER. My parents finally got fed up and bought a 28.8 kbps modem - and got no increase in speed. We rapidly switched to a local ISP, which was WAY faster."
Don't be silly. Things got very fast when I traded in my fast (300 bps) modem for a 1200 bps hot rod.
My first was a 600bps modem I nicked out of a skip at a warehouse that was closing down, sometime around 1989. The modem had big rubber acoustic couplers and my Dad had to "borrow" a suitable phone from work that would fit the coupler pads! Then this thing called the "Internet" appeared around 1991; my mate at uni said it was the best thing ever and... it was utter crap! Lots of boring pages through this thing called a browser, tons of very boring text-only pages, and you needed Windows. No pirated software, no message boards worth bothering with, and I went back to BBSs for another couple of years, as you could use a BBS with software that ran on DOS without needing a whopping 4MB of memory to run Windows.
I thought it would be really cool to include a wedding email address on my wedding invitations, sent out in 1996. Nobody RSVP'd by email, of course, because none of our friends had an email address at the time.
I still thought it was cool.
It wasn't AOL either!
My father-in-law, in the early '80s, sent an email to everyone on the 'net at the time, regarding the completion of a 9-month-long batch job. Included a few details such as height and weight, and ended it with "IT'S A GIRL!" - it was my wife's birth announcement. Came across the printout recently. I may frame it.
"There was no way we could have crammed all that traffic through a tiny 10Mbps Ethernet port." So he didn’t."
If he'd just used a regular sized Ethernet port (at least assuming it was UTP...) rather than some non-standard tiny one, he could be running 10Gbps by now...
I'll get my coat.
*The capitalisation is correct :)
This just sounds like they were on the bleeding edge of email systems, and something was going to die, somewhere.
The fact that they couldn't get a load balancer strong enough to handle their volume tells you something was going to give.
An honest attempt was made to address the issues, and it failed. Meh.
When I was a sysadmin at PSINet, we'd inherited some UUCP servers from the EUNet GB days. Despite it being a popular service with a big reseller, we didn't want to continue it, so we used Y2K as an excuse to shut it down.
Now, I'm kind of sad we did.
Posting anonymously because there are probably people still angry about that!
I did it with an SQL query. I’d been doing lots of fairly long (to run) queries without issue, and forgot that busy hour was actually going to be busy that day (the customer had just migrated a load of their subscribers, but still, we weren’t expecting it to be so busy). The whole situation went like this:
1. Loyal AC starts the query running.
2. And pops out to the local Chinese chippy, for some chips and curry.
3. And arrives back to find the company’s directors (all four of the trilogy) in the server room at 5:30PM.
Turns out one of our processes wasn’t resilient to a rather important table being locked too long.
AC - that one doesn’t go on my CV.
Back in that time, email in South Africa was only available to a select few - and we had to dial up to CompuServe (ick) or our local BBS in order to retrieve our email.
Then it started to gather momentum, and more people got firstname.lastname@example.org addresses - mine was ****@mweb.co.za for a long while, but when they got too uppity with their pricing, I went over to google.com
And today I'm contemplating the move away from google.com to something else where my emails won't be perused by bots in order to serve advertisements targeted at myself.
I remember that day and it's all Bert's fault! Well... no, really I don't remember that day. And given what I was using email for back then, I'm fairly certain that no negative impact would have found its way into my life. I might have missed a turn or two of my play-by-email D&D game (yes, play-by-email D&D in 1996. Have I mentioned I was, sorry, am a massive nerd?) but that would have been it.
I once did this when outraged by receiving SEVERAL UNWANTED SPAM MAILS at work, in 1996 or early 97. So I banged out a quick one-liner in shell to send BIGNUM emails (I seem to recall 10 million) back to him saying "Stop spamming me!" or somesuch. Then got on with my job, fierce with the pride of the righteously revenged.
Got a half-aggro half-WTF phonecall around close-of-business from our main sysadmin in Mountain View, CA. Our London boxes all connected to head office there, and thence via their ISP to the world.
Apparently I brought down the biggest silicon valley ISP's mail server. After apparently a lot of drama recovering, they'd given our man a serious shouty earful and a half for hosting a spammer in-house, and threatened to drop the company. :) Oops. All were soothed once I explained, albeit with some stern finger-wagging.
15ish years later the boot was on the other foot. I was doing some part-time sysadmin work at a little ISP/hosted-services shop, and a BadActor cracked a client's website and kicked off a massive spam-spray (a real one this time). The queue was BIGBIGNUM and had filled the queue volume, all in the space of a few minutes. There were too many files for "rm *" to work, and the main man was having a major stress attack because he couldn't see a way out that didn't involve dropping services for the duration. A quick little shell script along the lines of "while true; do DELFILES=`ls PATTERN | head -n 200`; rm -f $DELFILES; done", then sit back and wait for the race condition to resolve in our favour.
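The one-liner above, tidied into a loop that actually stops once the directory is empty. It's a sketch under assumptions: the directory holds only regular files with no whitespace in their names (true of typical mail queue files), and `purge_in_batches` is a hypothetical name. Listing the directory with a plain `ls DIR`, rather than a glob, keeps the argument list short, which is exactly why `rm *` fell over ("Argument list too long").

```shell
#!/bin/sh
# Sketch of a batch-delete loop for a directory holding too many files
# for "rm *" (the glob expands past the kernel's argument-size limit).
# Assumes regular files only, with no whitespace in their names.

# Delete all files in directory $1, $2 names per rm call; print the pass count.
purge_in_batches() {
    dir=$1
    batch=$2
    passes=0
    while :; do
        # "ls DIR" takes no glob, so this command line stays tiny.
        files=$(ls "$dir" | head -n "$batch")
        [ -z "$files" ] && break            # nothing left: done
        ( cd "$dir" && rm -f $files )       # unquoted on purpose: split into names
        passes=$((passes + 1))
    done
    echo "$passes"
}
```

`purge_in_batches /path/to/queue 200` prints how many passes it took. If new spam keeps landing while it runs, the loop just keeps going until deletion outpaces arrival, which is the race the poster was waiting to win. POSIX `find` can also do the batching itself with `find DIR -type f -exec rm -f {} +`.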
"Apparently I brought down the biggest silicon valley ISP's mail server."
That would have been Netcom. The only major crash I can remember Netcom having back then was a misconfiguration in the BGP code, pushed out to all border routers. Seems to me somebody fat-fingered a stray "&" where it shouldn't ought to have been... But that was perpetrated by MAE-East, not San Jose. They also had issues with majordomo and bounce floods/loops in that time frame, but not enough to crash the system. Mail was already fairly robust by 1996...
Biting the hand that feeds IT © 1998–2020