Discord details how it dodged latency with a super-disk made in the cloud

Chat platform Discord delivered a playful slap to Google yesterday with a post describing how the company dealt with "reliability issues" to achieve some impressively low latency. Discord deals with 4 billion messages sent through the platform per day by its millions of users. The company runs a set of NoSQL database clusters …

  1. Bitsminer Silver badge

    "behave in ways that are nothing like their physical datacenter counterparts"

    Things are not what they seem.

    Pink Floyd - Sheep

  2. Anonymous Coward

    Let us not forget that "issue" is not a valid synonym for "problem". It's an Americanism that has made its way across the Atlantic.

    1. Anonymous Coward

      Not sure where the downvotes came from. Makes sense to me.

      Quite a few service desk calls are things that users are unable to do that turn out to be matters of config or product understanding. Meaning they are "fixed" without any underlying changes.

      Only last week, I worked on a "problem" that arose from a subtle misunderstanding about what a display was showing. Once it was understood, no further action was needed. So an issue, not a problem.

      The most important thing was to record the incident so successors and other colleagues also understand it.

  3. Colin Bull 1
    Happy

    Why not RAID 10 on SSD

    Back in the day when 4GB of memory was about £4k, we would RAID 10 our data on live services, because a good controller would allow writing to one disk of the pair while reading from the other, and because reads do not need any locks they would be extremely fast. This was more expensive than RAID 5, 3 or 4, but much more resilient, allowed for different batches of disks in a mirror pair, and was an order of magnitude faster at replacing a failed disk.
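
As a rough illustration of the layout described above, here is a minimal toy model of RAID 10, assuming in-memory dicts in place of disks and a simple round-robin read policy. The class `Raid10` and its method names are hypothetical and do not correspond to any real controller or to mdadm. It shows why a rebuild only has to copy from the one surviving mirror, whereas parity RAID has to read every remaining disk in the set.

```python
class Raid10:
    """Toy RAID 10: blocks are striped across mirror pairs, two copies per pair."""

    def __init__(self, n_pairs):
        # Each pair is two dicts standing in for two physical disks.
        self.pairs = [[{}, {}] for _ in range(n_pairs)]
        # Round-robin pointer so reads alternate between the two mirrors.
        self._next_reader = [0] * n_pairs

    def _locate(self, block):
        # Striping: consecutive logical blocks land on consecutive pairs.
        return block % len(self.pairs), block // len(self.pairs)

    def write(self, block, data):
        pair, offset = self._locate(block)
        for disk in self.pairs[pair]:
            if disk is not None:  # a failed disk is represented as None
                disk[offset] = data

    def read(self, block):
        pair, offset = self._locate(block)
        disks = self.pairs[pair]
        for i in range(2):
            idx = (self._next_reader[pair] + i) % 2
            if disks[idx] is not None:
                self._next_reader[pair] = (idx + 1) % 2
                return disks[idx][offset]
        raise IOError("both mirrors of pair %d have failed" % pair)

    def fail(self, pair, idx):
        self.pairs[pair][idx] = None

    def rebuild(self, pair, idx):
        # Only the surviving mirror is read; other pairs are untouched.
        survivor = self.pairs[pair][1 - idx]
        if survivor is None:
            raise IOError("no surviving mirror to rebuild from")
        self.pairs[pair][idx] = dict(survivor)
        return len(survivor)  # number of blocks copied


array = Raid10(n_pairs=2)
for b in range(8):
    array.write(b, "d%d" % b)
array.fail(0, 1)            # lose one disk of pair 0
print(array.read(4))        # still served by the other mirror
print(array.rebuild(0, 1))  # copies only pair 0's blocks
```

With two pairs and eight blocks, the rebuild copies four blocks, all from a single disk.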

    1. Roland6 Silver badge

      Re: Why not RAID 10 on SSD

      Well, the article isn't particularly clear, but I suspect the servers were on GCP, so the SAN options are limited to what Google offered.

      It does seem the team at Discord had a hard lesson in the fundamentals of design for transactional performance, with solution design made more difficult by the use of cloud.

  4. Anonymous Coward

    Since when do SSD drives have bad sectors? We run thousands in our organization and have never had an issue with a single one.

    1. DougMac

      Because they auto-remap the bad sectors behind your back.

      Now every access to a remapped sector needs an additional lookup, which adds latency.

      You can run out of the spare sectors that they hide from you as well.

      This is one reason that disk wipe software had to develop special methods dealing with SSDs, because wiping all *active* sectors doesn't wipe *all* data off the disk.
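
The points above can be sketched with a toy model. It assumes a drive that keeps a hidden pool of spare sectors and a logical-to-physical remap table. The class `RemappingDisk` and its methods are hypothetical, and real SSD firmware is far more involved. The sketch shows both effects mentioned: the spare pool can run out, and overwriting every visible sector leaves the old copy in the retired sector untouched.

```python
class RemappingDisk:
    """Toy disk with hidden spare sectors and a remap table (not real firmware)."""

    def __init__(self, n_visible, n_spare):
        self.n_visible = n_visible
        self.store = {}   # physical sector -> data
        self.remap = {}   # logical sector -> spare physical sector
        # Spare sectors sit beyond the range the host can address.
        self.spares = list(range(n_visible, n_visible + n_spare))

    def _physical(self, logical):
        # The extra lookup every access pays once remapping exists.
        return self.remap.get(logical, logical)

    def mark_bad(self, logical):
        if not self.spares:
            raise IOError("no spare sectors left")
        old = self._physical(logical)
        new = self.spares.pop(0)
        if old in self.store:
            # Data is copied to the spare; the stale copy stays in the old sector.
            self.store[new] = self.store[old]
        self.remap[logical] = new

    def write(self, logical, data):
        self.store[self._physical(logical)] = data

    def read(self, logical):
        return self.store.get(self._physical(logical))

    def wipe_visible(self):
        # What a naive wipe tool does: overwrite every addressable sector.
        for logical in range(self.n_visible):
            self.write(logical, b"\x00")


disk = RemappingDisk(n_visible=4, n_spare=1)
disk.write(3, b"secret")
disk.mark_bad(3)        # sector 3 is silently moved to the spare
disk.wipe_visible()
print(disk.read(3))     # the host sees zeros
print(disk.store[3])    # the retired physical sector still holds the old data
```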

    2. Joe Dietz

      Not particularly relevant to the tech stack in question, but remapping around bad sectors is a very old problem and an SSD is no different. NTFS, for instance, has had a feature since the NT 3.5 era where, if there is a write error, it will mark the sector as bad in its internal tables, remap the LBA to another free block and perform the write again. Doesn't help for read errors, obviously... but it's one of those little things that, if you start to see this happening in the event log, you've got a disk on borrowed time. Then came SMART, which basically does the same thing at the hardware level; again, if you notice these events, your disk is on borrowed time. These behaviors are great... right until they aren't, since it's easy to ignore or never see the events that tell you of the pending failure.

      1. Nifty Silver badge

        Every HDD that's failed on me has done so with warning enough - slowdowns and noisiness - that I've managed to back up and migrate to a new drive in time.

        Meanwhile every SSD that's failed has done so catastrophically. Though in one case it was just the boot sector of the SSD, and that was bad enough.

  5. Sin2x

    Plain old mdadm RAID is a super-disk nowadays? We sure are degrading quick.

    1. TheWeetabix
      Coat

      I see what you did there, sir. Time to go!
