This supercomputing board can be yours for $99. Here's how

Adapteva, an upstart RISC processor and co-processor designer that has tried to break into the big-time with its Epiphany chips for the past several years, is sick and tired of the old way of trying to get design wins to fund its future development. So it has started up a community project called Parallella that seeks to get …


This topic is closed for new posts.
  1. Yet Another Anonymous coward Silver badge

    So the market is

    HPC but where power consumption matters more than performance and you don't need to transfer much data on and off the parallel processor. Can't immediately think of anything.

    As for the $99 price point, if you want to play - you can rent a Tesla GPU on Amazon for $bugger_all

    1. The Indomitable Gall

      Missed point.

      Err... you may have missed the point.

      1) They're suggesting that the power consumption is a barrier to wider adoption.

      2) Also, what the Reg didn't cover was the other barrier to adoption: parallel computing suffers from a lack of skilled programmers. The first computing revolution was powered by self-taught hobbyist programmers on single-processor boards. The developers believe that this has created a generation of single-processor-centric programmers without the skills for parallel work. They want to create a hobbyist scene for parallel processing and foment a skills revolution in the parallel computing sphere, which will then (hopefully) allow genuine parallel processing to become part of mainstream computing, as opposed to the minimalist OS-managed parallelism of current-gen multicore processors.

      Cynic's viewpoint: what we have is a bunch of clever blokes who developed a clever processor and found that the people who could use it don't want it, and those who might want it couldn't use it, so they're repositioning it as a hobbyist teaching toy.

      Optimist's viewpoint: a bunch of clever blokes developed a clever processor that solves a clever problem, and finding that the market couldn't take advantage of it, they decided to try to develop the market by themselves.

      1. Yet Another Anonymous coward Silver badge

        Re: Missed point.

        There are a lot of clever engineers inventing clever CPUs that ended up doing nothing in the market - the Transputer, the Transmeta, Intel i860. We spent most of the 90s fending off salesmen from the latest "direct compile into hardware will solve all your problems" outfit.

        If you want to encourage hobbyist programmers into parallel computing then an Amazon account on a Tesla is better than a $99 homebrew board

        1. Eddie Edwards

          Re: Missed point.

          There is no "parallel computing". Like cancer, there are many varieties, and they're all different.

          GPU programming is kind of a solved problem, and has been for a while. To a first and second approximation, it's just very wide SIMD. The kinds of things you can do well on GPUs have been done well on GPUs already.

          This is a MIMD unit for learning MIMD programming, which is very different in approach and in what can be efficiently programmed (available local store is very much increased, for instance). The things you can do well on MIMD + fast interconnect are pretty much unknown so far, except in theory.

          I mean, sure, hobbyists can learn last decade's techniques, but it would be much more fun to learn next decade's. And it's not like a hobbyist doesn't already have a GPU he can play with anyway. But he probably gave up on that because shoe-horning stuff onto GPUs that doesn't fit is kind of tedious.

          The only problem I can see with this particular MIMD chip is it's not even competitive with a modern CPU, so why bother?

  2. Anonymous Coward

    Nice concept I reckon

    But if they are after developers, why do developers have to pay more for the SDK, or have I misread the article?

    As someone who works on GPUs, it seems to me that some GPUs already have minimal instruction set cores on them for particular tasks - one I work on has a couple of RISC-like vector cores, plus multiple other very RISC-like cores with tiny instruction sets. Not arranged like this concept though, but similar in other aspects.

    1. vagabondo

      Re: Nice concept I reckon

      > why do developers have to pay more for the SDK, or have I misread the article?

      It says "you get early access to the Parallella SDK.", i.e. Feb rather than May.

      There are actually more options available. Look at the column at the right of the Adapteva Kickstarter page:

  3. Mage Silver badge

    Intel & IBM & MS

    This idea goes back to the 1970s. Early implementation in 1980s with Transputer.

    The x86 PC, MSDOS, UNIX, Windows and Linux have held back parallel RISC SW development and mainstream adoption for 20 years.

    1. Mage Silver badge

      Re: Intel & IBM & MS

      Occam was a good idea for programming stuff like this. As were co-routines in Modula-2.

      Instead we have been stuck with the portable Assembler for porting UNIX, aka C

      1. Captain TickTock

        Re: Intel & IBM & MS

        Occam was "mathematically provably correct" at the cost of being very static. This made it ok for fixed embedded tasks like radar processing, but very difficult for anything more general purpose, unless you built your own dynamic memory management on top of it, like our project did, or used the C compiler and libraries.

        By the time they ironed out the H1 (or T-9000, or whatever the next big thing after the T-800 was), Intel x86 performance had left it in the dust, and has done until now, when the mainstream is running out of ways to improve it economically, and is being forced back into considering these old ideas.

        The article is right. What is holding back progress is ultimately software to take advantage of all that horsepower. Put enough cheap hardware into the hands of the hobbyist masses, and we should see some interesting things come out of it.

      2. Mage Silver badge

        Re: Intel & IBM & MS

        I think some people who have never done parallel programming don't understand the differences between a language where the compiler or run time takes care of co-routine / thread / process to core mapping and a language that only has libraries for the host OS (NT, Linux, Unix etc). Or don't understand the difference between OS-based multiprocessing and language-based. Or between multi-core and distributed systems.

        Intel's x86 multi-core chips are not hugely different from a motherboard with 2 or 4 single-core CPU sockets. In both cases NT or Linux hardly sees the difference. Most actual programs can't tell the difference at all and only run on one core. Beyond 4 cores the speed improvement is marginal for NT Windows or Linux.

        Systems with 100s to 1000s of cores (ideally each with enough RAM to run instructions in parallel without a memory bottleneck) need a different way of thinking and programming to take advantage of them. While people have done clever stuff with patched custom Kernels and exotic C and C++ and Fortran compilers and clever libraries, that's a dead end for huge numbers of cores/cpus.

        While all this is well known to those working in HPC, Parallel systems etc for 30 years, it seems not well known even to many programmers.

        1. Destroy All Monsters Silver badge

          Re: Intel & IBM & MS

          Whatever happened to the Parallel Inference Machines and PARLOG and whatnot of the 5th Generation Computer Project (back in the early 80's, the time of SDI and War Games). Sank without a trace?

          1. BlueGreen

            Re: Intel & IBM & MS

            It's a good question. I guess that most people couldn't be bothered with learning anything unfamiliar so these languages didn't get critical mass and C won by default.

            Although these languages did have drawbacks such as very high overheads which when coupled with low processing power at the time and what sometimes appears to be an indifference by academics to efficiency (mathematical purity of semantics is great but not enough) didn't help at all. Throw in utter inertia by industry as well, and the fact that very few people had access to anything resembling multicore so naturally parallel languages were wasted on them.

            I do think functional programming is starting to stick its little-bird beak out from the shell at last.

    2. Roo

      Re: Intel & IBM & MS

      You need to remove UNIX & Linux from that list of things holding back 'parallel RISC SW development' because that is utter bollocks. In fact the UNIX community has invested a vast amount of effort into parallel computing over a very long period of time. Here are four vendors off the top of my head that have shipped production boxes with a shed load of cores running a UNIX variant: IBM, SGI, Cray, SiCortex.

      OTOH if you are expecting the mainstream/Windows community to "grok" it you will be waiting a very long time - many of them are currently following the fad of running one app per (multitasking) OS image and running multiple OS images on a box under VMs. The problem with our mainstream brethren is that they keep getting bigger faster single cores to play with so they don't feel the need to progress. Truth told that ain't really going to change - although single cores probably won't get a lot faster, but they will get smaller & burn less juice - so folks can get slimmer iPhones next year.

      I think some progress has been made in my lifetime, it's slow though. There are a lot of distributed systems out there now, and the push to do this "Cloud" crap is dependent on getting distributed systems to work... Personally I dislike the Cloud stuff, seems like a waste of perfectly good hardware.

      I'll get my coat, it's the one with the Transputer Instruction Set manual in it. :)

      1. Christian Berger

        Re: Intel & IBM & MS

        Unix is great for distributing multiple processes onto multiple CPUs, but the C/Unix System (and I believe we should look at both in unison) sucks at distributing a task into multiple parallel processes. If you write a loop in C, it is a single process. Distributing it among multiple processes is hard there, the compiler cannot do it for you.

        FORTRAN for example can analyze your program much more easily. Therefore more FORTRAN compilers are able to distribute your loop among multiple processors.

        Then there are special parallel languages which behave like tiny lightweight processes strung together by pipes. Those make it quite easy to run your software on 10k cores.
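The "lightweight processes strung together by pipes" idea can be roughly imitated in Python with threads and bounded queues. This is only a sketch of the style, not occam or any real parallel language: the names `producer`, `transform`, `run_chain` and the `DONE` sentinel are made up for illustration, and a queue of size 1 only approximates a channel.

```python
import queue
import threading

DONE = object()  # hypothetical end-of-stream marker passed down the chain


def producer(out, n):
    """First process: emit 0..n-1, then signal end of stream."""
    for i in range(n):
        out.put(i)
    out.put(DONE)


def transform(inp, out, fn):
    """Middle process: read from one pipe, apply fn, write to the next."""
    while True:
        item = inp.get()
        if item is DONE:
            out.put(DONE)  # pass the end-of-stream marker along
            return
        out.put(fn(item))


def run_chain(n):
    """Wire producer -> (+1) -> (*2) and collect what comes out the end."""
    a = queue.Queue(maxsize=1)
    b = queue.Queue(maxsize=1)
    c = queue.Queue(maxsize=1)
    stages = [
        threading.Thread(target=producer, args=(a, n)),
        threading.Thread(target=transform, args=(a, b, lambda x: x + 1)),
        threading.Thread(target=transform, args=(b, c, lambda x: x * 2)),
    ]
    for t in stages:
        t.start()
    results = []
    while True:
        item = c.get()
        if item is DONE:
            break
        results.append(item)
    for t in stages:
        t.join()
    return results
```

Each stage only knows about its input and output pipe, which is what makes this style easy to spread over many cores when the runtime supports it. CPython threads will not give a real speed-up for CPU-bound stages; the point here is the structure.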

        1. prefect42

          Re: Intel & IBM & MS

          C as a language simply doesn't deal with parallel, but that's hardly unusual. I'm not really convinced Fortran has a leg up though, as the two most common ways of parallel programming in Fortran are OpenMP and MPI, both of which are equally available with C.

          1. -tim

            Re: Intel & IBM & MS

            C required nonstandard keywords as hints to the early multi-CPU attempts to get it to work across cores. There were hacks such as a replacement for the for statement, such as parallel_32(a=0;a<1024;a++) ... which would run the loop on 32 cores (a=0..31 on 1st, a=32..63 on the 2nd etc) but you had to be very careful about memory and variable scope since the system had to ensure that the code, heap, stack and malloced vars had been moved to each core. You would run into problems with non-reentrant libraries and the thing would just die if any system calls were used.
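The block-wise split described for parallel_32 (a=0..31 on the first core, a=32..63 on the second, and so on) can be sketched in Python. This is not the original C extension: `block_bounds` and `parallel_for` are hypothetical names, and a thread pool stands in for the cores, so it shows the partitioning rather than a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def block_bounds(start, stop, workers):
    """Return one (lo, hi) half-open range per worker, in contiguous blocks."""
    total = stop - start
    block = -(-total // workers)  # ceiling division
    bounds = []
    for w in range(workers):
        lo = min(start + w * block, stop)
        hi = min(lo + block, stop)
        bounds.append((lo, hi))
    return bounds


def parallel_for(start, stop, workers, body):
    """Run body(a) for every a in range(start, stop), one block per worker."""

    def run_block(bound):
        lo, hi = bound
        # body must not depend on shared mutable state, which is the
        # Python analogue of the scope and reentrancy caveats in the post.
        return [body(a) for a in range(lo, hi)]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        chunks = list(pool.map(run_block, block_bounds(start, stop, workers)))
    return [x for chunk in chunks for x in chunk]
```

With `workers=32` and the range 0..1023, the first two blocks come out as (0, 32) and (32, 64), matching the split in the post.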

            FORTRAN on the other hand had a history of vector processing that worked very well if you followed the normal procedures and the compilers could move looped code onto several cores while dealing with a much simpler memory model.

            /Mine's the one with a ray-traced self-portrait of a 286 Intel hypercube in the pocket

  4. Anonymous Coward

    It's the Transputer all over again.

    1. Anonymous Coward

      A Great British Invention

      Sadly screwed up by incompetent Government, under investment and bad management. Sums up the last 50 years of UK technical prowess.

      1. Anonymous Coward

        Re: A Great British Invention

        Did you try making a product with it? It was both years ahead of its time in concept, and years behind its competition in delivering actual usefulness. It would have been a waste of taxpayers' money to shore it up, because the developers of the Transputer themselves were not focussed on making something that could be economically used in making competitive products.

        1. Pete 2 Silver badge

          How screwups happen

          > the Transputer themselves were not focussed on making something that could be economically used in making competitive products

          Which is exactly WHY the transputer team needed some intervention. It's all very well being a hairy-arsed techy, but for every HAT you need someone to turn the technical solution into a marketable product - and then someone else to actually make and sell the gubbins at a reasonable price.

          It's not realistic to expect people who wave soldering irons around to be able to commercialise the fruits of their labours, nor for them to know what the "market" will be looking for in the next year or two. Those are the areas that need help - not the scientific innovation. Fortunately, a lot of universities have woken up to this and a lot of them are getting good at turning academic developments into commercial products. Sadly, they do seem to be hampered by lack of funding, archaic interactions with government and an inability to find and cultivate people who know how to make items by the million.

  5. Roland 2

    What about memory bandwidth ?

    Most real-world HPC workloads require significant memory bandwidth to keep the cores busy.

    Even more so for Hadoop style parallel processing.

    How much memory bandwidth does the 64 core part have ?

    1. Mage Silver badge

      Re: What about memory bandwidth ?

      Seriously multicore systems need a lot of memory per core. This is the advantage of fine-grained multiple-core RISC etc vs 1 billion transistor Chipzilla coarse-grained multiprocessing cores. You have space for RAM for each core on chip. With coarse-grained cores using OS threads, you have a serious memory bottleneck beyond 4 cores. More than 8 off x86 cores is silly unless you have multiple external buses to multiple RAM.

    2. BlueGreen

      Re: What about memory bandwidth ?

      Endorsed. Thanks for asking for me.

      re. @Mage "This is advantage of fine grained multiple core RISC etc vs 1 billion transistor Chipzilla coarse grained multiprocessing cores. You have space for RAM for each core on chip." That doesn't add up. However you achieve your computation, ants or elephants, lots of it needs lots of memory, typically.

      Another way: 100 tons of ants or 100 tons of elephants need lots of food.

  6. No, I will not fix your computer

    It's all about timing

    While it may well be true that the thermal envelope is stopping a revolution, it would only take a traditional CPU revolution to stop the potential revolution (again), transputer would have been a big hit if it was *needed* but no, we shrunk dies, upped transistor counts and upped cores (along with exponential memory increases), so we didn't need to "think smarter".

    It's certainly possible that we'll need something of a revolution soon, but that could well be moving from Silicon on Insulator to diamond or graphene based chips instead, it will push the thermal envelope to new boundaries, 50GHz chips etc. and RISC goes back to niche until we approach the boundaries again.

    1. Mage Silver badge

      Re: It's all about timing

      Actually the problem with Transputer was it needed a different way of programming and thinking. Traditional OS and programming and libraries and existing applications can't be sensibly run on such highly parallel systems.

      So not about the timing. Transputer was only ever a niche product in a market that Intel with hugely more resources has struggled with. If the Transputer had been an Intel or US invention, animation Render Farms would be running Transputer based 1000s of cores per chip and 1000s of chips per rack today.

      1. No, I will not fix your computer

        Re: It's all about timing

        You're missing the point, it's *need* I'm stressing here, you're absolutely right that *a* big problem with the transputer is that it's fundamentally different, but it's only *a* problem fitting in with the existing paradigm, if there's some crossover possible, to allow migration (a slot-in RISC accelerator, like a slot-in GPU) and apps start to appear for it, it's a hideous chimera perhaps but it's still a step, like using PS3s to crack encryption there will be some niche markets which will expand.

  7. Sir Alien

    If I were AMD...

    If this article and the technology hold true then if I were AMD I would so buy this guy and his tech.

    AMD x86 processor for everyday things like running Windows/Linux/Unix and then use this as a 4000 core coprocessor. This would blow both current and Intel chips out the water for floating-point performance. This would be a good market seller to multiple markets looking for raw parallel performance.

  8. This post has been deleted by its author

  9. Alan Brown Silver badge


    The problem with occam isn't the language.

    It's that humans can't write parallel code very well, and that race conditions develop when the results of several parallel operations depend on each other.

    This isn't a problem for embarrassingly parallel problems (which is what I have to deal with in my day job), but in a lot of instances the fundamental issue is a marked lack of programmer ability/flexibility. Most people simply can't wrap their minds around the complexities of tight parallelisation and those who can are unsurprisingly expensive to hire.

    BTW, 35 instructions is somewhat more than the CHIPOS command set I cut my teeth on 35+ years ago.

    1. Graham Bartlett

      Re: Occam

      +1 on that.

      As a one-time Occam/Transputer coder though, the main issue isn't race conditions and livelocks - it's deadlocks. A waits on B, B waits on A, neither gives way. But finding the silver lining, it's totally possible to statically detect this at compilation and warn the coder about it. At least, it is if you use a language like Occam where you can easily see what's going on. By the time you've baked the parallelism into a ton of nasty interface code though, it's all pretty gnarly and you're basically on your own.
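The "A waits on B, B waits on A" case amounts to a cycle in a wait-for graph, and a cycle check is easy to sketch. This is a generic depth-first search over a hand-written graph, not a description of how any occam compiler actually did its analysis; `find_deadlock` and the dictionary format are assumptions made for the example.

```python
def find_deadlock(waits_on):
    """Return a list of processes forming a wait cycle, or None.

    waits_on maps each process name to the processes it is blocked on
    (a hypothetical representation chosen for this sketch).
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    colour = {}
    path = []

    def visit(p):
        colour[p] = GREY
        path.append(p)
        for q in waits_on.get(p, ()):
            state = colour.get(q, WHITE)
            if state == GREY:
                # q is already on the current path: that is a cycle.
                return path[path.index(q):]
            if state == WHITE:
                cycle = visit(q)
                if cycle:
                    return cycle
        path.pop()
        colour[p] = BLACK
        return None

    for p in list(waits_on):
        if colour.get(p, WHITE) == WHITE:
            cycle = visit(p)
            if cycle:
                return cycle
    return None
```

For example, `find_deadlock({"A": ["B"], "B": ["A"]})` reports the pair A and B, while a simple chain A waits on B waits on C reports nothing. The hard part in practice is extracting the graph from the code, which is where a language with explicit channels helps.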

      1. Nick Ryan Silver badge

        Re: Occam

        "gnarly"... now that's a word that I've not heard for a while. May have to drop it in randomly from now on... :)

  10. Tom 7 Silver badge

    If I can get one of these in the UK without fin about

    I think I might go for some of it.

    There are actually quite a lot of things that could really benefit from simple multiprocessing like this. And they're already out there. And they're not Office of any form.

  11. matt_p

    Erm, why don't I just buy a GPU?

    From your article it looks like they have just re-invented the GPU and are desperately trying to distance themselves from them.

    GPUs have more or less the same architecture, sure they might be a little bit more power hungry, but they are here and now and everything you need to start programming them is available for free and backed by the biggest players in the industry. Anyway, sit tight and the performance/watt will get better and better each year.

    In the table of performance they seem to have picked a pretty slow GPU, perhaps something I'd find in a laptop. Even a small amount of cash will buy you a GPU with in excess of 200 cores and a bit more money will take you up to 2000+ cores.

    Need $10K to get going? Rubbish, all you need to do is download a CUDA toolkit and head on over to Amazon.

    1. melt

      Re: Erm, why don't I just buy a GPU?

      That's nice, but hobbyists seem to like some actual hardware to hold in their hand, which runs out-of-the-box.

      Look at the success of the Raspberry Pi - to play around with Linux you could have rented a VM from anywhere, or set up VMware Workstation, but somehow the act of having something real to hold in your hand has sparked a bit of excitement.

      Look at the success of the Arduino - before then sure, grab a PIC or a Stamp and code away, learn everything from utter bare-bones, but somehow the act of having something real to hold in your hand that runs out of the box with a cross-platform IDE and some good real-world examples seems to have sparked a bit of excitement.

      These guys have got their heads around the first idea; if they get the second idea sorted they might well spark another Saturday-afternoon makers' revolution.

  12. Harry Kiri

    It's not about the chip...

    The issue is the basic paradigm - not all problems map well to parallel. It introduces new problems, bottlenecks and deadlocks. Introducing parallel threads/cores does not give a proportional increase in performance so the whole effort can be dubious.
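One common way to put a number on "more cores does not give a proportional increase" is Amdahl's law, which is brought in here as an assumption and is not named in the post. It models a program where a fraction p of the work parallelises perfectly and the rest stays serial; `amdahl_speedup` is a name chosen for this sketch.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Ideal speed-up under Amdahl's law: 1 / ((1 - p) + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)
```

Under this model a program that is 90% parallel tops out below 10x however many cores are added, since the serial 10% never shrinks. It ignores communication cost and deadlock risk, so real results are usually worse.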

    What is needed are better tools and approaches as opposed to more hardware. These tools need to identify whether parallelising is worth the effort.

    And they've been in the business 4 years? Wow.

  13. Nick Ryan Silver badge

    Parallel programming's not hard (at least not for me), but it's not taught well either and while there are quite a lot of very good tools for it the mainstream ones, such as MS Visual Studio, do not provide the level of support that is required to really do a good job. If you've used the Intel Parallel (studio) tools, you'll see just how good the tools can (or should if polished) be - they're not perfect but for anybody that's attempted to debug parallel problems in Visual Studio (or just multi-process/threaded apps), you'll appreciate the difference against plain old Visual Studio.

    For years the mainstream of Windows/PC has been single thread algorithms running on faster and faster processors, the transition to a mind set where you can distribute load is quite a shift in a way of working for the majority of developers who are used to A > B > C > D code and nothing else. Not that there will ever be no need for this kind of code, and even in massively parallel systems there's a need for it, it's just that it is not the only way to do it. Just the switch to A > [B|B|B|B|B|B|B|B|] > C > D is enough to cause many developers to run for the hills.
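The A > [B|B|B|B|B|B|B|B|] > C > D shape can be sketched with a thread pool for the fan-out step. The stage functions here are placeholders invented for the example (produce numbers, square them, sum, format); only the shape is the point.

```python
from concurrent.futures import ThreadPoolExecutor


def stage_a(n):
    """A: produce the work items (serial)."""
    return list(range(n))


def stage_b(x):
    """B: independent per-item work, safe to run many copies at once."""
    return x * x


def stage_c(results):
    """C: combine the B results (serial)."""
    return sum(results)


def stage_d(total):
    """D: final serial step."""
    return f"total={total}"


def run_pipeline(n, workers=8):
    items = stage_a(n)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so C sees results as if B ran serially.
        squared = list(pool.map(stage_b, items))
    return stage_d(stage_c(squared))
```

The serial code on either side of B is unchanged, which is why this is often the gentlest first step away from strictly A > B > C > D code. For CPU-bound work in CPython a process pool would be needed to see an actual speed-up.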

    But then I bought a dual processor (real processors, not cores) AMD-M board as soon as it was available and started playing with that years ago. I also vividly remember (even more years ago), rather upsetting a Uni lecturer who taught both neural networks and concurrent programming courses by implying that he was a bit of a plank for not using double buffering in his neural net simulations: with a moderate increase in storage requirements (not really a problem even then), the solution removed almost all of the coding stupidity and algorithm induced problems in his network simulations.
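A toy illustration of why double buffering matters in a network simulation, assuming a simple threshold neuron and a synchronous update rule. Nothing is known about the lecturer's actual code, so `step_in_place`, `step_double_buffered` and the two-neuron example are all hypothetical.

```python
def activate(weights_row, state):
    """Threshold neuron: fire (1) if the weighted input sum is positive."""
    total = sum(w * s for w, s in zip(weights_row, state))
    return 1 if total > 0 else 0


def step_in_place(state, weights):
    """Update neurons one by one in the same list.

    Later neurons read values already overwritten in this step, so the
    result depends on update order.
    """
    for i in range(len(state)):
        state[i] = activate(weights[i], state)
    return state


def step_double_buffered(current, spare, weights):
    """Read only from current, write only to spare, then swap roles."""
    for i in range(len(current)):
        spare[i] = activate(weights[i], current)
    return spare, current  # new current, new spare
```

With two neurons that each copy the other (`weights = [[0, 1], [1, 0]]`) and a starting state of `[1, 0]`, the double-buffered step swaps the values to `[0, 1]`, while the in-place step loses the 1 and gives `[0, 0]`. The cost is one extra state array, and because every write goes to a separate buffer, the per-neuron updates could also be run in parallel without locks.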


Biting the hand that feeds IT © 1998–2022