Reno 911: World's largest reboot underway

You were up all night writing those last lines of code to ensure mega-demonstration success. And this code is a real pain in the ass to deal with because it has to spread across a 72-processor cluster. But, with a bit of perseverance, you nail it, pop open a beer and wait for the glories to follow when show attendees see your …

COMMENTS

This topic is closed for new posts.
  1. Brian Miller

    Give me more power!

    "I'm sorry, Captain, but the Energizer bunny is dead!"

    I suppose they will select their next exhibition site near a large hydroelectric dam, like Grand Coulee or Hoover.

  2. B. Gagne

    Same as it ever was

    This fairly well characterizes the governing mentality of Reno, in my experience.

    It would not surprise me in the least to learn that the primary individuals providing support for the occasion were operating under the asinine assumption that the equipment was powered by lavender farts and Pixie Stix.

    Or perhaps, in the case of someone with a more technical bent (defined by having eavesdropped on one or two conversations between maintenance personnel), a belief that, "it all runs on 'quantum', which they will necessarily be supplying themselves, as we're obviously not set up for that sort of thing."

  3. Brian
    Coat

    homage

    Are we sure this isn't just a giant staged homage to Scotty...

    "we canna do it cap'n, we don't have the power!"...

  4. Anonymous Coward

    UPS anyone?

    Unless I misunderstand, this is a load of "disparate" computer systems all linked together in one fsck-off big cluster, each powered separately (apparently off the same source).

    We're also led to believe that the power tripped for a "few seconds".

    Surely not ALL the boffins contributing to this forgot a damn UPS!

  5. lglethal Silver badge
    Black Helicopters

    You're all wrong...

    This was merely a test of the hold of these computer overlords over their human subjects. They organised to all simultaneously pretend to turn off (and someone just happened to lean on the light switch at the same time!) so they could see how quickly their servants went into panic mode and tried to save them.

    Their next step is to convert the humanoid slaves into walking meatbag shields! We must act quickly my brothers (and sisters) for the sake of humanity! Kill all of the supercomputers and their gullible slaves!!! KILL KILL KILL!!!

  6. Chris Kuethe
    Flame

    More from the floor...

    I'd first like to respond to anonymous coward's observation about UPSes - there may have been a conscious choice to not bring one. I/we chose not to.

    I'm competing in the SC07 Cluster Challenge on the University of Alberta's team. One of the rules of the competition is that our clusters cannot draw more than 26A on a 30A circuit - actually it's 13A on two 20A circuits. When loaded, our SGI Altix XE310 is extremely close (within 0.25A) to the circuit limit. There is absolutely no room in our power budget for the conversion waste in a UPS. I took a quick poll among the other teams, and we're all pretty much in the same position - there is no room for a UPS.
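    The power-budget argument above can be checked with a little arithmetic. The sketch below is illustrative only: the 26A cap comes from the post, the 0.25A margin is one reading of the figure quoted there, and the 92% UPS efficiency is an assumed value, not a measurement of any real unit.

```python
# Only the 26 A cap and the ~0.25 A margin come from the post;
# the UPS efficiency below is an assumption for illustration.
CIRCUIT_LIMIT_A = 26.0          # competition cap (13 A on each of two circuits)
HEADROOM_A = 0.25               # margin under the cap, as read from the post
ASSUMED_UPS_EFFICIENCY = 0.92   # hypothetical conversion efficiency


def wall_current(load_a: float, ups_efficiency: float) -> float:
    """Current drawn at the wall when the load sits behind a UPS."""
    if not 0.0 < ups_efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return load_a / ups_efficiency


def max_load_behind_ups(limit_a: float, ups_efficiency: float) -> float:
    """Largest load that still keeps the wall draw at or under the cap."""
    return limit_a * ups_efficiency


cluster_load_a = CIRCUIT_LIMIT_A - HEADROOM_A
with_ups_a = wall_current(cluster_load_a, ASSUMED_UPS_EFFICIENCY)

print(f"cluster alone: {cluster_load_a:.2f} A")
print(f"behind UPS:    {with_ups_a:.2f} A (cap {CIRCUIT_LIMIT_A:.2f} A)")
print(f"over the cap:  {with_ups_a > CIRCUIT_LIMIT_A}")
```

    Under these assumptions the conversion loss alone pushes the draw past the cap, so the team would have to shed load to fit a UPS into the budget.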

    Today's power outage has been a lesson and/or a reminder about disaster recovery. Software needs to be tolerant of hardware failure. Like it or not, there will come a day when the lights go out, and you will need to be able to cope...

    And cope we shall. We sustained no hardware damage from the power loss, though some data was corrupted. By and large, all our schedulers began feeding jobs back into the cluster as soon as power was restored. Applications like PovRay can be easily restarted after the last successful work unit. Some of the calculations implemented by GAMESS can be restarted with minimal duplication of effort; others do not restart so easily. I didn't hear of anyone able to restart POP jobs.
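    Restarting "after the last successful work unit" can be sketched as a simple checkpoint loop. This is a minimal illustration, not how PovRay, GAMESS or any scheduler mentioned above actually does it; the file layout and the names `load_checkpoint`, `save_checkpoint` and `run_units` are hypothetical.

```python
import json
import os
import tempfile


def load_checkpoint(path: str) -> int:
    """Return the index of the next work unit to run (0 if no usable checkpoint)."""
    try:
        with open(path, "r", encoding="utf-8") as fh:
            return int(json.load(fh)["next_unit"])
    except (FileNotFoundError, ValueError, KeyError, TypeError):
        return 0


def save_checkpoint(path: str, next_unit: int) -> None:
    """Write the checkpoint atomically: temp file, fsync, then rename."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump({"next_unit": next_unit}, fh)
        fh.flush()
        os.fsync(fh.fileno())
    # os.replace swaps the file in one step, so a power cut leaves
    # either the old checkpoint or the new one, never a half-written file.
    os.replace(tmp_path, path)


def run_units(path, total_units, work, stop_after=None):
    """Run work units from the last checkpoint; return the indices executed."""
    done = []
    for unit in range(load_checkpoint(path), total_units):
        if stop_after is not None and len(done) >= stop_after:
            break  # stands in for the lights going out
        work(unit)
        save_checkpoint(path, unit + 1)
        done.append(unit)
    return done
```

    Calling `run_units` again with the same checkpoint path after an interruption picks up at the first unit that was not recorded as finished, so at most one unit of work is repeated.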

  7. Anonymous Coward
    Happy

    More from the audience...

    Arrr Captain! I do believe we can just scrape across yonder sandbar safely - if we jettison the anchor, longboats, food, water, cargo, guns, passengers and most of the crew...

    Very well mister, carry on - and bring me another Cabin Boy, this one be burst.

    No UPS - right.

  8. Anonymous Coward

    Oopsie !

    Don't you just get sick of technicians telling you that you can't do something ?

    Damned electricians saying that there needs to be a safety margin and tolerance allowances !

    Naysayers bedamned !

    Strong management and positive mental attitude is all that's needed to make sure that everything happens your way.

    Just pop a kettle on behind the stand, I want a coffee.......

    Maybe the exhibition centre will get some advice next time.

  9. Anonymous Coward

    There is absolutely no room in our power budget for the conversion waste in a UPS.

    But, the UPS could smooth the draw? Flattening peak loads which are presumably measured at the plug... Hmm.. shouldn't they be measuring total power drawn during the contest rather than peak power?

    If you can make your own supplies then you could combine the two, then there's only one conversion loss. I guess you try to run off-the-shelf kit though...

  10. joe

    indeed

    Even when the systems are on their home turf, many academic centers operate their clusters (especially those based on commodity hardware) with no UPS protection at all - sometimes it comes down to being able to buy a cluster+UPS or being able to buy a cluster that is 2x as big. Assuming you've got even remotely reliable power, for this application the tradeoff is easy. Yes you'll lose jobs in progress when the power goes out, but even if that results in a few days of lost compute time the numbers are still in your favor if you can double your compute resources.
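    The tradeoff described above can be put into rough numbers. The sketch below compares a UPS-protected cluster with an unprotected one twice the size; the node count, outage count and days lost per outage are all made-up inputs for illustration, not figures from any real site.

```python
def useful_node_days(nodes: int, days: float, outages: int,
                     lost_days_per_outage: float) -> float:
    """Node-days of useful compute after subtracting time lost to outages."""
    lost = min(days, outages * lost_days_per_outage)
    return nodes * (days - lost)


def break_even_lost_days(days: float, size_ratio: float) -> float:
    """Total lost days at which a cluster size_ratio times larger, with no UPS,
    only matches a protected cluster that loses nothing."""
    return days * (1.0 - 1.0 / size_ratio)


# Hypothetical inputs: 36 nodes with a UPS versus 72 nodes without,
# four outages a year, three days of compute lost per outage.
BASE_NODES = 36
protected = useful_node_days(BASE_NODES, 365, outages=4, lost_days_per_outage=0.0)
unprotected = useful_node_days(2 * BASE_NODES, 365, outages=4, lost_days_per_outage=3.0)

print(f"with UPS:    {protected:.0f} node-days")
print(f"2x, no UPS:  {unprotected:.0f} node-days")
print(f"break-even:  {break_even_lost_days(365, 2.0):.1f} lost days per year")
```

    With these assumed inputs the larger unprotected cluster comes out well ahead, and it would need to lose about half the year to outages before the protected one caught up.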

    Some sites have a dynamic UPS (otherwise known as a "big spinning heavy thing attached to a generator") which provides a few seconds of power via inertia--enough to ride out 99% of power outages at a much lower cost than a "real" UPS. This sort of device would be kind of impractical to bring along to a trade show, though the potential for things to go horribly wrong might add to the excitement and increase attendance.

  11. tim
    Thumb Up

    Appropriate advert

    Looks like the advert optimising script is working, the comments page has the IBM advert with the guy pedalling a bike/generator (for me anyway)

  12. Anonymous Coward
    Alert

    Timing

    Funny timing for this story with regard to my current workplace where management recently realised we're drawing more current than we're supposed to... Cue sweltering-hot offices as all the meatbags' ACs are turned off and the frantic building of a sub-station outside...

  13. Graham Bartlett

    @JonB

    These things are mostly working flat out on protein-folding, PovRay or whatever. It's not like your desktop machine which can take a break whenever you're typing - these things are doing big number-crunching non-stop. So what would be a brief spike in processing on your desktop machine is business-as-usual for what they're doing here.

  14. De Zeurkous
    Flame

    And this code is a real pain in the ass to deal with because...

    ...it has to spread across a 72-processor cluster.

    WTF? If your team finds parallel programming harder than pure-sequential, it should be disqualified.

  15. Anonymous Coward

    They're still low power machines

    I remember when I was at Uni the CS department was always freezing cold. The reason? When the department was built there were a number of valve-based machines in the basement that drew a hell of a lot of power and as a consequence kicked out a hell of a lot of heat. The idea was simply to recirculate the heat generated around the entire building instead of having an actual boiler.

    Apparently it worked fine until they replaced all the machines with silicon types that put out as much heat as a domestic light bulb.

    This must surely be the acid test of a computer's power requirements. Can it replace the central heating system?

  16. TeeCee Gold badge

    @joe

    "Assuming you've got even remotely reliable power....."

    Ah! Assumption. The mother of all cockups.

