AMD demos 'Berlin' Opteron, world's first heterogeneous system architecture server chip

AMD will give the first public demo of its second-generation Opteron X-Series server processor, code-named "Berlin", at the Red Hat Summit in San Francisco on Wednesday. The company tells The Reg that the demo will consist of an X2100 Series Opteron running a Linux environment based on the Fedora Project. Berlin is not one of …

COMMENTS

This topic is closed for new posts.
  1. phil dude

    computational density...

    if this chip can get to 5TFlop DGEMM DP, I will be impressed.

    I have been reading about the use of 3x Radeon 6990 which achieved 5.1 TFlops using OpenCL.

    http://devgurus.amd.com/message/1285375#1285375

    I mention it because the FP64 rate of those cards is 1/4 or 1/8 of the FP32 rate. Apparently the GeForce Titan's FP64 rate is 1/3 of its FP32 rate.

    Are there technical reasons FP64=1/2 FP32 is not available from either vendor?

    P.

    1. Nick Ryan

      Re: computational density...

      That's intriguing. To me it implies that FP64 processing is implemented as multiple passes through the FP32 circuitry (splitting and then re-merging the values?) rather than with native FP64 circuitry.

      1. phil dude

        Re: computational density...

        thank you. It will be interesting to see the "opteron vs GPU" ratio of operational density, as clearly this will be a pricey chip....

        P.

        1. Steve Todd

          Re: computational density...

          It's not always faster to build wider ALUs; there is a propagation delay for the carry bits from one binary digit to the next, for example. There are clever tricks that can be used to reduce/minimise that delay, but they cost silicon real estate. With GPU computation you have many cores and want to keep each one as small as possible, so what you end up with is a compromise between size and performance.

          1. This post has been deleted by a moderator

            1. Steve Todd

              Re: computational density...

              You've forgotten about ripple effects with carry. An integer add of 1 can ripple all the way up to the MSB, and there's an extra delay as the signal cascades up each bit. The maximum clock speed for a 32 bit add is, from this, twice the maximum for a 64 bit (FPGA designers sometimes save space by doing the work on smaller hardware multiple times, but at a higher clock).

              You can bypass this with more complex circuitry, but back to my point of space requirements.

              I can't help with your prejudices against phrases that are common when talking about this kind of thing.

              1. Frumious Bandersnatch

                Re: computational density...

                Have an upvote. I can't see any reason why someone would downvote your original post, since what you say is perfectly fine (even using "real estate", which makes for a good analogy). The detractors should take a look at FPGA programming (eg, this free introductory course) if they don't understand (or want to know) why "wider isn't always better" as you put it.

              2. Anonymous Coward

                Re: computational density...

                "You've forgotten about ripple effects with carry. An integer add of 1 can ripple all the way up to the MSB, and there's an extra delay as the signal cascades up each bit."

                Cool that you remember this from digital logic design but you seem to have missed the subsequent lesson. Circuits can be designed to see if a group of bits will overflow into the next group of bits, and then these circuits can be arranged in a hierarchy such that the time for the LSB to "ripple" over to the MSB is a base-2 log of the number of bits. So e.g. an adder can easily be designed to add 64 bit numbers vs. 32 bit with only one extra gate delay and not 32 extra delays.
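The two positions in this exchange can be compared with a small software model. The sketch below is only an illustration, not a description of any real ALU: `ripple_carry_add` and `prefix_add` are hypothetical names, and the second function uses a Kogge-Stone style parallel-prefix scheme as one assumed example of the hierarchical carry circuits described above. Each function returns the sum together with the number of carry stages it needed.

```python
def ripple_carry_add(a, b, width):
    """Add bit by bit; the carry passes through `width` stages in sequence."""
    carry, result = 0, 0
    for i in range(width):
        x, y = (a >> i) & 1, (b >> i) & 1
        result |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & (x ^ y))
    return result, width  # (sum mod 2**width, sequential carry stages)


def prefix_add(a, b, width):
    """Kogge-Stone style adder: merge (generate, propagate) pairs level by level."""
    mask = (1 << width) - 1
    g = a & b   # bit i generates a carry on its own
    p = a ^ b   # bit i passes an incoming carry along
    p0 = p      # keep the per-bit propagate for the final sum
    levels, shift = 0, 1
    while shift < width:
        # Each level doubles the span of bits summarised at every position.
        g = (g | (p & (g << shift))) & mask
        p = (p & (p << shift)) & mask
        shift *= 2
        levels += 1
    carries = (g << 1) & mask  # carry into bit i comes from bits 0..i-1
    return (p0 ^ carries) & mask, levels


if __name__ == "__main__":
    for width in (32, 64):
        worst = (1 << width) - 1  # all ones plus 1: the carry crosses every bit
        print(width, ripple_carry_add(worst, 1, width), prefix_add(worst, 1, width))
```

In this model the ripple adder needs 32 and 64 sequential stages, while the prefix adder needs 5 and 6 levels, so going from 32 to 64 bits adds a single level. The price is the extra merge logic at every level, which is the silicon area cost raised earlier in the thread.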

        2. Alan Brown

          Re: computational density...

          These were announced 10 months ago: http://www.engadget.com/2013/05/29/amd-unveils-opteron-x-series/ and listed there for $64 (X1150) / $99 (X2150)

          The question will be what the X2150 brings to the party that's better than the Athlon 5350 (which is 1/2 that price) other than ECC support

          http://en.wikipedia.org/wiki/Jaguar_%28microarchitecture%29

          If you don't need the GPU (and most won't in a server) then go for the X1100s

      2. xyzw

        Re: computational density...

        It's because you need more transistors for FP64 to make "the same operation" as FP32.

        To make a simple example

        - an FP32 multiply is [A]*[B] (where A and B are 32-bit quantities),

        - an FP64 multiply is [CA]*[DB] = A*B + (C*B + A*D)*2^32 + C*D*2^64 (where A, B, C and D are 32-bit quantities, with C and D the high halves)

        This means a (full) FP64 MUL requires about 4 times as many transistors as an FP32 MUL... or you can reuse some of the existing circuitry, at the cost of lower performance for FP64 operations.
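The identity in the list above can be checked with plain integers. This is a minimal sketch of the arithmetic only: `mul64_from_32bit_parts` is a hypothetical name, and real FP64 hardware multiplies 53-bit significands (FP32 uses 24-bit ones) and also handles exponents and rounding, none of which is modelled here.

```python
MASK32 = (1 << 32) - 1


def mul64_from_32bit_parts(x, y):
    """Multiply two unsigned 64-bit integers using four 32x32-bit products."""
    c, a = x >> 32, x & MASK32  # x = [C A]: C is the high half, A the low half
    d, b = y >> 32, y & MASK32  # y = [D B]: D is the high half, B the low half
    # A*B + (C*B + A*D)*2^32 + C*D*2^64, with the scaling done by shifts
    return a * b + ((c * b + a * d) << 32) + ((c * d) << 64)


if __name__ == "__main__":
    x, y = 0x0123456789ABCDEF, 0xFEDCBA9876543210
    print(mul64_from_32bit_parts(x, y) == x * y)
```

The four partial products are what drives the "4 times" estimate: they can be computed by four multipliers side by side, or by one multiplier used four times in a row.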

        If you are manufacturing video game cards, you care only (mostly) about FP32, but you can give away some FP64 capability without hurting your costs.

        If you are manufacturing cards for GPGPU in finance, your buyers will likely care mostly (only?) about FP64 capability and are "happy" (historically!) to pay a premium to get more transistors, i.e. more FP64 performance (it's not totally true today).

    2. Anonymous Coward

      Re: computational density...

      FirePro W9100?

      http://www.anandtech.com/show/7927/amd-launches-firepro-w9100

      "Double Precision 1/2"

      1. Iceman

        Re: computational density...

        I can confirm the W9100 is indeed 1/2 rate.

    3. Marco van de Voort

      Re: computational density...

      Personally I hope for the opposite: that uploading to and downloading from the GPU no longer takes more time than doing the work directly on the CPU in the first place. (*)

      (*) I'm not thinking HPC here, but simple machine-vision applications.

      1. ToddR

        Re: computational density...

        You don't need to upload/download, do you? Isn't it using hUMA, so the CPU and GPU share the same DRAM?

  2. btrower

    Crossed fingers, but...

    I am an AMD supporter from way back, but recent times have been pretty grim. Unless something dramatic happens I will be switching to Intel CPUs on my own workstations or maybe even something like ARM.

    What I would like to see is a Lego-style system that allows natural additions of whatever you need be it CPU, GPU, RAM, DISK, etc.

    In the 21st century we should not be stuck upgrading entire systems or scrapping perfectly good portions of systems because one portion of the system is not up to snuff.

    We keep bashing our heads against artificial architectural limitations because of the single moronic meme "That should be enough. I can't see why we would need more." I understand that for practical reasons it is difficult to produce 6 bit vs 4 bit, 8 bit vs 6 bit, 16 bit vs 8 bit, 32 bit vs 16 bit, 64 bit vs 32 bit, etc, but why are architects unable to see the pattern there? The fact that a chip designer just flat out cannot imagine why I would want a 16K bit register should not be my problem. Their lack of imagination should not be permanently baked into the architecture. Perhaps the implementations need to be crippled due to practical considerations, but the architectures and the APIs should not be dragged down too.

    Yes, there are timing and EM considerations, heat dissipation problems, fundamental limitations due to the speed of light, etc, etc. We may *never* be able to physically build some systems but our architectural designs should still not be assuming failure in advance.

    I have seen people argue strenuously in favor of the GOF 'singleton' as a valid pattern rather than a corrupt extension of global variables. It is a bad pattern from which significant evil flows: It breaks scope by definition and specifies a specific cardinality (typically 1) that creates havoc in future designs. Examples: mouse, keyboard, window, screen, desktop, CPU, thread of execution, directory, disk, etc. All of those have been deeply crafted into architecture in such a way that breakage continues to this day. If something is architecturally designed with a perfectly artificial limit due to the lack of imagination of the architect it will eventually break.
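The cardinality complaint can be shown in a few lines. This is a minimal sketch with hypothetical class names (`Pointer`, `TheMouse`, `Desktop`), not code from any real windowing system: the singleton fixes the count at one for every caller, while the alternative is simply handed however many devices exist.

```python
class Pointer:
    """A pointing device (mouse, trackpad, pen, ...)."""

    def __init__(self, name):
        self.name = name
        self.position = (0, 0)

    def move(self, dx, dy):
        x, y = self.position
        self.position = (x + dx, y + dy)


class TheMouse:
    """Singleton style: exactly one pointer, reachable from anywhere."""

    _instance = None

    @classmethod
    def instance(cls):
        if cls._instance is None:
            cls._instance = Pointer("mouse")
        return cls._instance


class Desktop:
    """Alternative: the desktop receives whatever pointers exist."""

    def __init__(self, pointers):
        self.pointers = list(pointers)

    def positions(self):
        return {p.name: p.position for p in self.pointers}


if __name__ == "__main__":
    # Every caller of the singleton shares the same, single device.
    print(TheMouse.instance() is TheMouse.instance())

    # A second device needs no redesign when the count is not baked in.
    desk = Desktop([Pointer("mouse"), Pointer("pen")])
    desk.pointers[1].move(3, 4)
    print(desk.positions())
```

Code written against `TheMouse.instance()` has no way to express a second pointer without touching every call site, which is the kind of breakage the comment describes.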

    We should have something akin to an architectural cloud whereby implementation and architecture are deeply separate such that scaling out to address spaces in the trillions of Yottabytes and well beyond is no problem.

    There has been in the past a frighteningly moronic argument that we won't ever need addressing beyond a certain point because it exceeds the number of all the particles in the universe. That made sense to too many people for my comfort. That which we can specify is effectively without bound. If your math microscope is powerful enough you can count more points than the number of particles in the universe between 0 and 1, 0 and 0.1, 0 and .000000000001 ... carve it as fine as you please and there are always more points there. There is a simple counting argument used for things like this. If all the particles in the universe are the number X and I wish to specify X+1, I need an address space larger than X. The argument can be repeated ad infinitum. We don't have a rule that the counting numbers stop at a googolplex because it does not make sense. All the artificial limits in the computing architecture universe make no more sense than specifying a particular 'top number' for counting beyond which you have to stop.
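The counting argument is easy to put numbers on. The sketch below is illustrative only: `address_bits_needed` is a hypothetical helper, and the figure of 10^80 particles is an assumed rough estimate, not a number taken from this thread.

```python
def address_bits_needed(count):
    """Smallest number of bits that gives each of `count` items its own address."""
    if count < 1:
        raise ValueError("count must be at least 1")
    # Addresses run from 0 to count - 1, so size the field for the largest one.
    return max(1, (count - 1).bit_length())


if __name__ == "__main__":
    particles = 10**80  # assumed rough estimate, for illustration only
    print(address_bits_needed(particles))   # bits to address every particle
    print(address_bits_needed(2**64))       # a full 64-bit address space
    print(address_bits_needed(2**64 + 1))   # one more item needs a 65th bit
```

Under that assumption, 266 bits are enough to give every particle its own address, and each item beyond a power of two costs only one more bit, so the argument for leaving the architecture open-ended is cheap to honour on paper.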

    </rant>

    1. Steve Todd

      Re: Crossed fingers, but...

      Eh? I'm not following your logic. It seems to be similar to that used by old mainframe users when the microprocessor came out ("but if it breaks how will you repair it?"). More and more gets integrated onto the CPU because (1) there is now space for it, (2) it makes little difference to the price and (3) the propagation delays inherent in going off-chip kill performance.

      As to whether you need 8, 16, 32, 64 or whatever size of register, that's down to the performance wall in clock speeds. You can't (easily) make the chip go any faster, but you can make it do more in one clock cycle by making the processor wider. You keep doing that until diminishing returns set in and the extra width ceases to be useful, which is at around 64-bit integer and (possibly) 128-bit floating point. Once you have hit those limits you can only multiply the number of compute cores, which HSA is one variant of.

    2. Dazed & Confused

      Re: Crossed fingers, but...

      >There has been in the past a frighteningly moronic argument that we won't ever need addressing beyond a certain point because it exceeds the number of all the particles in the universe.

      That is not a moronic argument at all.

      Where could we store the x+1th address?

  3. Arthur Jackson

    Berlin

    With specs like these AMD are set to Cruise away from the opposition and become top gun in the processor world.

    Certainly took my breath away.

    1. AMBxx

      Re: Berlin

      That's just awful.

      Have an upvote.

      1. Chika

        Re: Berlin

        hUMA-na-hUMA-na...

  4. Daniel von Asmuth

    heterogeneous architectures

    Well, there would be the Intel Itanic that supported both the IA-32 and IA-64 instruction sets. Will this AMD baby drive the next version of the Microsoft Xbox?

  5. Anonymous Coward

    Lots of potential

    AMD's new heterogeneous APUs have a lot of potential that is just starting to show. They also have ARM-based SoCs and x86 SoCs, so they are looking better than many might expect. As usual AMD is thinking outside the box. Now it's up to their partners to focus the capabilities of these new products, be it in consoles, desktops or servers.

  6. Tel

    But how well will it mine crypto currencies?

  7. W. Anderson

    Even if this new chip technology has all the benefits and advantages stated, which I hope it does, AMD has lost considerable ground in recent years to Intel, and now to ARM and IBM Power series chip technologies, in adoption and sales. The cause was the company's frenetic suckup and embarrassing pandering to Microsoft, to the dismissal and ultimate detriment of enterprise Linux vendors and their close hardware partners, while Microsoft cooperated with Intel in shutting AMD out of most Windows deployments.

    While I hope this could be a turning point for AMD chip sales and deployments, the outreach to Linux vendors, who by coincidence control almost all cloud computing, virtualization and big data analysis, may be "too little, too late", as Intel has firmly and "consistently" established itself with Linux as a tier 1 partner and has even contributed substantially to Linux kernel development.

    1. asdf

      >"maybe too little, too late "

      That could perhaps be said of Intel too. The world seems to be moving away from volume sales of high-margin, latest-and-greatest, super-powerful processors to cut-rate commodity SoCs, as they are now good enough for most use cases. Stuff like APUs is for niche markets. Even in the cloud, as you mention, they are starting to look at ARM and (shiver) Atom instead of super-powerful, hot-running chips.

  8. Anonymous Coward

    Once again it's AMD doing the innovation. Intel really suck.

    1. jason 7

      Indeed, if you look back most of the big 'pushes' over the past 10 years or more have been developed by AMD and not Intel. Sure Intel make nice powerful chips but...boringggg.

  9. Stevie

    Bah!

    See? "Berlin". Sexy, intriguing, mysterious. Not "Adventurous Aardvark" or "Batshit Badger".

    *That's* how to code name IT stuff.


Biting the hand that feeds IT © 1998–2022