Oracle boasts zettascale 'AI supercomputer,' just don’t ask about precision

Oracle says it's already taking orders on a 2.4 zettaFLOPS cluster with "three times as many GPUs as the Frontier supercomputer." But let's get precise about precision: Oracle hasn't actually managed a 2,000x performance boost over the United States' top-ranked supercomputer — those are "AI zettaFLOPS" and the tiniest, …

  1. Justthefacts Silver badge

    And yet, the predictability of performance improvement…

    So, here’s RFC 1607, written in 1993 by Vint Cerf, extrapolating thirty years out to 2023 and the “CERN Exaflop array”. Everybody laughed, of course, because it’s obviously ludicrous to extrapolate Moore’s Law across half a lifetime. Except the first exaflop machine did indeed arrive right on time, in 2022. Unreasonably accurate.

    https://datatracker.ietf.org/doc/html/rfc1607

    1. Anonymous Coward

      Re: And yet, the predictability of performance improvement…

      At least he fesses up right at the beginning that this 30-years-advanced knowledge comes from "a reverse time-capsule apparently sent from 2023".

      That really explains the accuracy IMHO!

  2. HuBo
    Go

    Round and round it goes

    Hard to keep up with the Joneses in this race for the biggest AI machine: Nadella's MS Azure Eagles at 14K H100s each, with five of those built per month it seems (72K GPUs/month); Zuckerberg's Meta Grand Tetons at 50K H100s across two clusters; Musk's xAI Colossus at 100K H100s, to be upgraded soon-ish to 200K H100/H200s; and now Ellison's Oracle Zettascale AI Supercomputer (OZAIS?) at 131K Blackwells (equiv. 210K H100s), phew! ... But only until that Altman/Nadella 5GW, million-GPU death-star-gate project emerges ... and the whole cycle starts again!

    It's great business for Nvidia, but "between 5.2 and 5.9 exaFLOPS" of FP64 HPC-oriented oomph (while likely better than some of China's secret supercomputers) is not very much for a machine the size of "OZAIS". AMD's MI300s would boost that by 3x or 4x, I think, with similar AI performance, except for whatever relates specifically to the convenience and performance of CUDA.
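
    For a rough sense of where those numbers come from, here is a back-of-envelope sketch (not from the article; the per-GPU figures are assumptions based on Nvidia's published Blackwell spec-sheet numbers, rounded) showing how a 131K-GPU count yields both the headline "AI zettaFLOPS" figure and the 5.2-5.9 exaFLOPS FP64 estimate:

    # Back-of-envelope sketch: how the headline numbers scale from per-GPU figures.
    # The per-GPU specs below are assumptions (rounded marketing numbers), not
    # figures taken from the article itself.

    GPUS = 131_072                      # ~131K Blackwell GPUs, as quoted

    FP4_SPARSE_PFLOPS_PER_GPU = 18      # "AI FLOPS": 4-bit precision, with sparsity
    FP64_TFLOPS_PER_GPU_LOW   = 40      # double precision, low estimate
    FP64_TFLOPS_PER_GPU_HIGH  = 45      # double precision, high estimate

    ai_zettaflops      = GPUS * FP4_SPARSE_PFLOPS_PER_GPU * 1e15 / 1e21
    fp64_exaflops_low  = GPUS * FP64_TFLOPS_PER_GPU_LOW   * 1e12 / 1e18
    fp64_exaflops_high = GPUS * FP64_TFLOPS_PER_GPU_HIGH  * 1e12 / 1e18

    print(f"sparse FP4: ~{ai_zettaflops:.1f} zettaFLOPS")                              # ~2.4
    print(f"FP64:       ~{fp64_exaflops_low:.1f}-{fp64_exaflops_high:.1f} exaFLOPS")   # ~5.2-5.9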

  3. Claptrap314 Silver badge
    Facepalm

    FP4 ???

    Okay, so is double precision 64-bit? Because this article seems to think that it is, and that is a (bit) of a problem. (DP is in fact 53 bits of precision, one of them implied, plus a sign bit and 11 bits for the exponent.) So FP4 would be... a sign bit, plus two bits for the exponent and one bit for the mantissa. Here's the map:

    1111 QNAN
    1110 -inf
    1101 -3.0
    1100 -2.0
    1011 -1.5
    1010 -1.0
    1001 -0.5
    1000 -0.0
    0111 SNAN
    0110 +inf
    0101 +3.0
    0100 +2.0
    0011 +1.5
    0010 +1.0
    0001 +0.5
    0000 +0.0

    Yeah, I'm a mathematician, but I would be hard pressed to do anything with such a representation.
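
    For anyone who wants to poke at it, here is a minimal Python decoder for that reading of FP4: sign, two exponent bits, one mantissa bit, with the top exponent reserved for inf/NaN as in the map above. It is a sketch of that interpretation only; as far as I can tell, the OCP/Nvidia E2M1 format has no inf/NaN and uses those codes for ±4 and ±6 instead.

    # A minimal decoder for the 4-bit map above (sign + 2 exponent bits + 1
    # mantissa bit), keeping the commenter's choice of inf/NaN at the top
    # exponent. Not an official spec; just a sketch of that reading.

    def decode_fp4(code: int) -> str:
        sign = -1.0 if code & 0b1000 else 1.0
        exp  = (code >> 1) & 0b11          # 2 exponent bits, bias 1
        man  =  code       & 0b1           # 1 mantissa bit
        if exp == 0b11:                    # top exponent reserved here
            return "NaN" if man else f"{'-' if sign < 0 else '+'}inf"
        if exp == 0:                       # subnormals: 0.0 or 0.5
            return f"{sign * man * 0.5:+.1f}"
        return f"{sign * (1 + man * 0.5) * 2.0 ** (exp - 1):+.1f}"

    for code in range(16):
        print(f"{code:04b} {decode_fp4(code)}")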

    1. Kevin McMurtrie Silver badge

      Re: FP4 ???

      4 bits can be good enough for AI if there are enough parameters to make the data essentially sparse.

      The big stretch is calling it floating point. At 4 bits it's just a lookup table. You can hard-wire all the math so it finishes in one clock cycle.
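
      A quick sketch of that lookup-table point, assuming the value map from the parent comment: with only 16 possible codes, a full FP4 × FP4 multiply can be precomputed once into a 16×16 table (accumulating in a wider type), after which every product is a single indexed read.

      # With only 16 possible FP4 values, a multiply can be precomputed as a
      # 16x16 table and every FP4 x FP4 product becomes one lookup.

      import math

      # Index = 4-bit code, value = decoded number (inf/NaN reading from the map above).
      FP4 = [ 0.0,  0.5,  1.0,  1.5,  2.0,  3.0,  math.inf, math.nan,
             -0.0, -0.5, -1.0, -1.5, -2.0, -3.0, -math.inf, math.nan]

      # Precompute once; products are held in a wider format (Python float),
      # roughly analogous to hardware accumulating in FP16/FP32.
      MUL_TABLE = [[a * b for b in FP4] for a in FP4]

      def fp4_mul(code_a: int, code_b: int) -> float:
          return MUL_TABLE[code_a][code_b]    # one lookup, no arithmetic

      print(fp4_mul(0b0101, 0b1100))          # +3.0 * -2.0 = -6.0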

    2. Bebu
      Windows

      Re: FP4 ???

      "mathematician, but I would be hard pressed to do anything with such a representation"

      Without any idea at all about what FP4 would be useful for or about LLMs generally, I would punt that you could use FP4 numbers as weights for a very large collection of linearly independent vectors (neither orthogonal nor normalised, I suspect).

      Like Camelot, AI is best avoided as 'tis a silly place and hopefully equally ephemeral.

    3. Anonymous Coward

      Re: FP4 ???

      In "Ultra-Low Precision 4-bit Training of Deep Neural Networks", the IBM Watson folks who invented FP4, describe it as:

      "The radix-4 FP4 format with [sign,exponent,mantissa] = [1,3,0] is essentially a logarithmic format (4ⁿ) that spans a range of ±4³ (= ±2⁶) and can represent (scaled) gradient values as small as 4⁻³ (= 2⁻⁶) (Fig. 2(a))."

      ... aka, it has nothing to do with your math-degree-from-a-box-of-Cracker-Jacks in-depth analysis ...

      HuggingFace notes that an E2M1 (and sign bit) 4-bit FP format may be used too, with the caveat: "In general, 3 exponent bits do a bit better in most cases. But sometimes 2 exponent bits and a mantissa bit yield better performance".
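
      One plausible reading of that quoted [1,3,0] description, sketched in Python. The exponent bias and the zero encoding are my assumptions, not taken from the IBM paper; they are chosen only so that the values span the quoted ±4³ down to 4⁻³ range.

      # Sketch of the quoted radix-4 [1,3,0] format: no mantissa bits, so each
      # code is just a sign and a power of four. Bias and zero encoding below
      # are assumptions that reproduce the quoted range.

      def radix4_fp4(code: int) -> float:
          sign = -1.0 if code & 0b1000 else 1.0
          e = code & 0b0111                  # 3-bit exponent field, no mantissa
          if e == 0:
              return sign * 0.0              # assumed zero encoding
          return sign * 4.0 ** (e - 4)       # e = 1..7  ->  4^-3 .. 4^3

      for code in range(8):                  # non-negative half of the code space
          print(f"{code:04b} {radix4_fp4(code):g}")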

  4. Pete 2 Silver badge

    More power

    > a 2.4 zettaFLOPS cluster

    Is that the new minimum specification for running Oracle software?
