Any fool can write a language: It takes compilers to save the world

Here's a recipe for happiness. Don't get overexcited by the latest "C is not a language" kerfuffle. Proper coders have known since its inception that C is as much a glorified library of assembler macros as anything else. Don't sweat it. That business with operating systems being infected by their old C genes, crippling all the …

  1. fg_swe Silver badge

    Too Scary, Too Complicated

    As with every nontrivial technical endeavor, one should apply the KISS principle. Demonstrate your innovative idea in a way that is as straightforward as possible. You can always perfect it later.

    What does this mean?

    Don't use the gcc infrastructure; instead, generate C or C++ code and let existing compilers do the optimization and architecture-specific work. Debugging a compiler that emits C++ is far easier than debugging ASTs and machine code. As an added benefit, your compiler will be able to run on almost any CPU, because almost any CPU has a C++ compiler.
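
    To make that concrete, here is a minimal sketch (my own illustration, not output from any real transpiler) of what emitted C++ might look like for a bounds-checked loop; the host compiler's optimizer is then free to hoist or fold any checks it can prove redundant:

    #include <cstddef>
    #include <vector>

    // Hypothetical transpiler output for "sum the elements of a":
    // every element access is bounds-checked, and the optimizing
    // C++ compiler may elide checks it can prove can never fail.
    int sum_generated(const std::vector<int>& a) {
        int total = 0;
        for (std::size_t i = 0; i < a.size(); ++i) {
            total += a.at(i);   // at() throws std::out_of_range instead of corrupting memory
        }
        return total;
    }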

    This is what I did here

    http://sappeur.ddnss.de/

    The innovative aspect is that this language is a memory safe C++ variant. All the goodies of C++ plus single- and multi-threaded memory safety, due to a type system which "knows" of single- and multi-threaded sections of the program.

    There exist Eiffel compilers which use a similar concept of emitting memory-safe C.

    1. fg_swe Silver badge

      GCC vs C++ / ELBRUS

      With the mentioned approach, the compiler can also run on ELBRUS, which does not have a gcc compiler. In fact, ELBRUS has a secret instruction set, but that does not stop me from generating Sappeur programs for this CPU.

    2. TeeCee Gold badge
      Facepalm

      Re: Too Scary, Too Complicated

      So.

      You took an article based on "It doesn't matter which language you use, compiler technology is where it's at", with sound reasons why. You turned that into an advert for your favourite niche language and how you never think about compilers anyway.

      Just one question: Exactly how far above your head was the entire point of the sodding article when it passed you by?

      1. Anonymous Coward
        Anonymous Coward

        Re: Too Scary, Too Complicated

        I haven't worked out if it's the same language that was last updated on Sourceforge ten years ago, or a later version, or a completely different language, but the website has a commercial licence for it, so all these messages could be classified as spam.

        1. fg_swe Silver badge

          So ?

          I am not interested in "own nothing and be happy".

          1. Anonymous Coward
            Anonymous Coward

            Re: So ?

            As it is a commercial product I'm sure El Reg would be happy to agree terms for advertising or a sponsored article, because judging by the downvotes I'm not the only one who finds that every C/C++ article getting derailed by tenuous segues into this language spoils their enjoyment of this esteemed organ.

    3. Cederic Silver badge

      Re: Too Scary, Too Complicated

      Take your nice clean code from your language of choice and translate it into an archaic language just so that you can then compile that other language?

      I fail to see how this adds value to the process. To benefit from the strengths of the intermediary language, the intermediary mangling must perform a level of optimisation and restructuring of your code.

      In other words, be a compiler. You're recommending we use a compiler so that we can use a compiler.

      It feels rather simpler to me to follow the rather more traditional approach, which lets you more quickly and easily benefit from existing compiler technology.

      1. fg_swe Silver badge

        Re: Too Scary, Too Complicated

        Sappeur is adding value by making your C++ code memory safe. Including multi-threaded code!

        And it is indeed a compiler which performs type checking and has a very special type system. It just does not have an optimizer and a machine code generator of its own.

        Compared to Java, Sappeur programs are lean and efficient. For example, you can write small Unix command line utilities with Sappeur. You cannot do this in Java, because of the excessive startup time of Java programs.

        1. Cederic Silver badge

          Re: Too Scary, Too Complicated

          Now you've completely lost me. Random tool makes code I haven't written memory safe by translating my memory safe language into a language renowned for memory management bugs?

          No.

          1. Yet Another Anonymous coward Silver badge

            Re: Too Scary, Too Complicated

            Your language is only memory safe because it checks accesses.

            If your random tool translates that into C with access checks, the C compiler then compiles it into machine code with access checks.

            1. Loyal Commenter Silver badge
              Facepalm

              Re: Too Scary, Too Complicated

              If you take code that might or might not be thread safe, and translate it into another language, with added memory-safety checks around everything that might or might not have needed them, you have achieved exactly three things:

              1) Given the compiler's optimiser a hell of a lot of work to do, because that translation is sure to have incurred costs. Hopefully though, the compiler is going to be smart enough to undo most of the damage in the optimisation phase and fold away unnecessary loops and so on.

              2) Made your code unmaintainable, because I'll bet that even if it has carried comments through, they're now either meaningless, or misleading.

              3) Made your code non-performant. Thread safety checks where they are not needed are going to be achieved by some sort of locking, because sooner or later, all multithreading comes down to locking, and exclusive code paths. The art of writing multithreaded code that is both performant and thread-safe comes from a hell of a lot of detailed knowledge and experience, not from running your code through a magic tool. The only way such a tool is going to achieve this is either by locking stuff up everywhere to enforce critical paths, or by actually understanding what you meant your code to do. Since the latter implies that you have managed to single-handedly solve the "hard AI problem", I'm going to go ahead and assume the former. Ever heard of deadlocks (see the sketch below)? Because I'm willing to bet anyone using your "tool" will hear of them quickly enough.
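
              For the deadlock point, a minimal sketch (contrived lock names, nothing to do with any particular tool's output): two threads taking the same pair of locks in opposite order.

              #include <mutex>
              #include <thread>

              std::mutex m1, m2;

              void worker_a() {
                  std::lock_guard<std::mutex> a(m1);   // takes m1 first...
                  std::lock_guard<std::mutex> b(m2);   // ...then waits for m2
              }

              void worker_b() {
                  std::lock_guard<std::mutex> a(m2);   // takes m2 first...
                  std::lock_guard<std::mutex> b(m1);   // ...then waits for m1: deadlock
              }

              int main() {
                  std::thread t1(worker_a), t2(worker_b);   // can hang on an unlucky interleaving
                  t1.join();
                  t2.join();
              }

              (C++17's std::scoped_lock(m1, m2) acquires both without deadlocking, but an automatic translator would first have to know that the two locks belong together.)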

          2. fg_swe Silver badge

            Explanation

            The Sappeur language defines a type system which enforces memory safety. The compiler parses the Sappeur code and checks the rules of the type system. If all is good, then memory-safe C++ code will be generated. Just try it out and see for yourself.

    4. drankinatty

      Re: Too Scary, Too Complicated

      "Don't use the gcc infrastructure, instead generate C or C++ code and let existing compilers do the optimization and architecture-specific work."

      That is a novel concept, but how is that different from simply adding one additional layer of abstraction to the compiler process? Where existing C/C++ compilers translate what you have written into assembly, you want an additional layer that translates from whatever language you have to C++.

      I see the benefit: you buy yourself portability in the compilation process. But then wouldn't the additional layer make optimization less reliable, depending on the choices made by the new compiler's translations of language "A" into C++?

      That's a tough one: the more abstract C++ becomes in C++20/23, the harder it becomes to write the set of choices needed to make the translation optimize well.

      1. fg_swe Silver badge

        Sappeur vs C++

        Conceptually, Sappeur is very similar to C++, except for the unsafe stuff such as unsafe casts, pointer arithmetic and so on. The good things such as stack allocation, RAII, destructors, ARC smart pointers and complex aggregate data structures are preserved.

        Transforming Sappeur code to C++ is straightforward.

        Please see page 7 of the following for an example:

        http://sappeur.ddnss.de/SAPPEUR.pdf
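
        The linked PDF is the authoritative example; as a rough illustration (mine, not from the Sappeur docs) of the C++ features the post says are preserved - stack allocation, deterministic destructors (RAII), and reference-counted smart pointers - with std::shared_ptr standing in for the ARC pointers:

        #include <cstdio>
        #include <memory>

        struct Resource {
            Resource()  { std::puts("acquired"); }
            ~Resource() { std::puts("released"); }   // destructor runs deterministically
        };

        int main() {
            Resource on_stack;                            // stack allocation + RAII
            auto shared = std::make_shared<Resource>();   // atomically reference-counted
            auto alias  = shared;                         // count rises to 2; freed at 0
        }   // alias, shared, then on_stack are released here, in reverse order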

        1. Anonymous Coward
          Terminator

          Re: Sappeur vs C++

          Conceptually, Sappeur is very similar to C++

          So what you mean is 'my language which is very much like C++ can be compiled well to C'. Gosh, that is a really big surprise. To no one.

          1. fg_swe Silver badge

            No

            Sappeur programs are transformed into *memory safe* C++.

            It's a simple predecessor of Rust, actually.

      2. Anonymous Coward
        Alien

        Re: Too Scary, Too Complicated

        That is a novel concept

        It is not novel at all. Kyoto Common Lisp compiled to C in 1984, 38 years ago, and it was certainly not the first language implementation to do this. Descendants of KCL still exist today: at least GCL and ECL. It is significant that they have not driven out, or even made any real impact on, implementations with native compilers, which perform much better.

        1. Someone Else Silver badge
          Holmes

          Re: Too Scary, Too Complicated

          Originally, C++ compiled to C; it was called Cfront, appeared in 1983 (from predecessor work that started in 1979), and you can read about it on Wikipedia.

          1. Mage Silver badge
            Coat

            Re: Originally, C++ compiled to C

            To give fast access to the C++ language in the 1980s. Glockenspiel C++ used back-end C compilers, such as MS C. But it was only a stepping stone.

            The other approach was used in the UCSD p-System: an ideal instruction set for the language, and then a virtual machine to execute the p-code. Later used for Visual Basic, Java, MS J++ (which became C#) and also Android's VM, which uses Java-like source. Maybe some versions of Forth too.

            GCC is a good idea. Having a Pre-processor that emits C, C++, Java or Javascript is fine to prototype and test a new language before porting it to either GCC or a compiler for a VM.

            JAL and some BASICs for PIC micros use an intermediate code like the idea of p-code, but a final pass creates the machine code. A little like GCC.

      3. Marco van de Voort

        Re: Too Scary, Too Complicated

        There are more issues: anything that defies the procedural regime (e.g. home-made exceptions, or advanced forms of nested functions, such as those with displays) is hard.

        Generating your own debug info (and an own debugger to interpret it): hard.

        The problem is that if you choose the transpiler way, you will forever remain bound to the transpiler's (read: the target language's) limitations.

    5. Peter D

      Re: Too Scary, Too Complicated

      What you are describing is the equivalent of old-fashioned cfront. When C++ came out there were no native compilers, so cfront emitted C and the C was compiled. It was an unholy mess. For example, when templates came out, cfront would create a .pty directory and emit often hundreds of C source files to speculatively represent each possible specialisation of the parameterised types. LLVM is a vast improvement on that approach.

    6. JulieM Silver badge

      Re: Too Scary, Too Complicated

      I can only seem to find pre-compiled versions there.

      1. fg_swe Silver badge

        Yes

        I am not giving away the source, as I no longer believe in giving away work for free.

        1. JulieM Silver badge

          Re: Yes

          Fair enough. There are plenty of other people out there who do want my business.

        2. Loyal Commenter Silver badge

          Re: Yes

          Just in free advertising, eh?

  2. fg_swe Silver badge

    KISS 2: Generic Code

    Some people say you need a dedicated generics mechanism, similar to C++ templates.

    This is not true. A proper macro processor such as m4 can (for all practical purposes) do the same. Debugging is much easier, as the "instantiated" code can/will exist in a file to be inspected with a standard editor. No insane, cryptic C++ template error messages.

    Not my invention; I saw it when I worked for Dassault on CATIA.
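
    The same trick, sketched with the C preprocessor rather than m4 (the m4 version differs only in macro syntax; this is my illustration, not the CATIA code): the "template" is a macro, and the instantiated code is ordinary source you can inspect, e.g. in the output of g++ -E.

    #include <cstddef>

    // A macro as a poor man's template: one generic definition,
    // one explicit instantiation per element type.
    #define DEFINE_MAX(T)                              \
        static T max_##T(const T* a, std::size_t n) {  \
            T m = a[0];                                \
            for (std::size_t i = 1; i < n; ++i)        \
                if (a[i] > m) m = a[i];                \
            return m;                                  \
        }

    DEFINE_MAX(int)      // generates max_int()
    DEFINE_MAX(double)   // generates max_double()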

    1. Arthur the cat Silver badge

      Re: KISS 2: Generic Code

      To mangle an old trope:

      You have a problem. You decide to use m4. You now have two problems and a maintenance nightmare that will prematurely age you.

      [Speaking from experience. You may know what it's doing when you write it but the moment someone else touches it (including you 6 months later) you're in for a world of pain.]

      1. fg_swe Silver badge

        Re: KISS 2: Generic Code

        m4 worked nicely for me, but I never dared to use all of its features.

        1. bombastic bob Silver badge
          Meh

          Re: KISS 2: Generic Code

          M4 - I just deal with it for autoconf (etc.) and that's about it

          Let's not complicate things TOO much ok?

        2. Alan Brown Silver badge

          Re: KISS 2: Generic Code

          This is exactly the problem with m4, you dare not use all its "features" - doing so is very likely to unleash an eldritch horror

          But what do I know? I always handcoded my sendmail.cf, because I could and m4 usually didn't do what I wanted

          1. jake Silver badge

            Re: KISS 2: Generic Code

            Little known fact: Sendmail's configuration language is Turing Complete.

            So naturally, I had to write a C compiler in it, just to prove to myself that I could.

            It's slow, ugly, cantankerous, and has few polished edges ... but it works.

            I'm sure you'll pardon me for never updating it past C89 ...

            1. Arthur the cat Silver badge
              Pint

              Re: KISS 2: Generic Code

              So naturally, I had to write a C compiler in it, just to prove to myself that I could.

              OK, that's perversion above and beyond the call of insanity(*). A major tip of the hat and enough alcohol so you never do it again.

              (*) I've only written a calculator in it. I'm bloody jealous.

            2. JulieM Silver badge

              Re: KISS 2: Generic Code

              It's my conjecture that any sufficiently-sophisticated IT project ends up implementing a Turing-complete interpreter. (PHP is an example of one such that escaped and became feral.)

              One day I'll write a SPICE deck for that microprocessor I designed, and see if I can implement an interpreter on it .....

    2. Mage Silver badge
      Devil

      Re: KISS 2: Generic Code

      Macros are evil. Unexpected side effects. Great for Assemblers, daft for a new high level language.

  3. Marco van de Voort

    I miss a critical note and some figures.

    I totally miss any form of criticism in the article. It is just a baseless glory story. No analysis of how it actually works out for non-corporate third-party users. No numbers, not even names of successful independent frontends and how they do/did it. It could have been an old glory story from the days of the GIMPLE introduction, polished up a little.

    I agree with fg_swe: in recent years we have actually seen a decline in attempts to use the backends directly, in favour of the last-resort C backend (which often means giving up hope of a speedy compile-run-debug cycle). Only corporates with vast manpower can afford to tangle with these beasts.

    On GCC the major version transitions are notorious and break your frontend all the time, and you can only get an old version into a Linux distro for so long. LLVM has so much undocumented behaviour (other than "look how the C/C++ frontend does it") that you don't even get that far. Both teams are notorious for ignoring bug reports and merge requests for issues that don't touch the dominant frontend(s) and/or their corporate sponsors, even for multiple major cycles. So basically if you run into a problem you are fscked.

    Performance issues are often papered over with parallel compilation, which is also not that easy with a new frontend.

    1. Alan Brown Silver badge

      Re: I miss a critical note and some figures.

      Speaking from experience with the commercial rivals for GCC/LLVM, they aren't much better and come with an added dash of long-term abandonment

      1. Marco van de Voort

        Re: I miss a critical note and some figures.

        Just curious: which ones allow direct access to their internal structure? (Direct, rather than via the source.)

    2. Torben Mogensen

      Re: I miss a critical note and some figures.

      Web Assembly (WASM) is better defined than GCC and LLVM (it even has a formal specification), and it is simpler too, so it makes an easier target language than GCC or LLVM intermediate representations. Unfortunately, WASM is intended for running in browsers, so it is sandboxed and requires an external program to handle i/o and everything else you would normally get from an OS, so it is not suited for all purposes. Also, the existing implementations of WASM are not super fast.

      Speaking of browsers, several languages use JavaScript as a target language, allowing them to run in browsers. So much effort has been put into making the travesty called JavaScript run fast in browsers that these languages can have competitive performance. But since JavaScript is very ill defined, this is not a good solution.

      The article also fails to mention JVM and .NET as compiler targets. While farther from silicon than both the GCC and LLVM intermediate languages, they can provide decent performance and (more importantly) access to large libraries (which, unfortunately, in both cases require you to support the object models of Java and C#, respectively).

      And while I agree that you need parallelism for compute-intensive applications, not all applications are that, so there is still plenty of room for languages and compilers that do not support parallelism.

  4. Pascal Monett Silver badge
    Trollface

    "Any fool can write a language"

    Many have.

    1. fg_swe Silver badge

      Indeed

      We should never have touched fire, and continuing to live in the trees would be like paradise. Except when the yellow-black devil cat shows up...

    2. Version 1.0 Silver badge
      Happy

      Re: "Any fool can write a language"

      I started using assembler and then moved to FORTRAN in a new job, and since then I've spent years on tasks that involved writing in many new languages to fix problems. Essentially, when a new language appears, if I have to use it then I will ... I've always laughed at the old joke "A FORTRAN programmer can write FORTRAN programs in any language", but LOL, it's a fact.

      1. Arthur the cat Silver badge

        Re: "Any fool can write a language"

        I've always laughed at the old joke; "A FORTRAN programmer can write FORTRAN programs in any language" but LOL, it's a fact.

        Oh god yes. I've seen FORTRAN in Lisp:

        (BEGIN
          (SETQ X ...)
          (SETQ Y ...)
          ...
          (RETURN ...))

        [And why does code markup not keep indents but add blank lines?]

        1. Someone Else Silver badge

          Re: "Any fool can write a language"

          [And why does code markup not keep indents but add blank lines?]

          This.

      2. Stumpy

        Re: "Any fool can write a language"

        In a similar vein, I once had to unpick a C program written by an ex-COBOL programmer.

        Christ, that was a nightmare. Around 1500 lines of COBOL-like C ... all in a single main() function.

        1. Alan Brown Silver badge

          Re: "Any fool can write a language"

          Trust me, it's easier to debug C written by a COBOL programmer than the inverse....

          1. jake Silver badge

            Re: "Any fool can write a language"

            Only because most COBOL programmers have been coding a lot longer than most C programmers. One tends to learn some things along the way.

            This used to be true ... these days, I'm not so sure anymore.

    3. Ozan

      Re: "Any fool can write a language"

      I am one of many. I should not be allowed to touch languages.

      1. jake Silver badge
        Pint

        Re: "Any fool can write a language"

        Don't be silly. Of course you should. Look at how much you learned in the attempt!

  5. Will Godfrey Silver badge

    Too many have!

  6. Anonymous Coward
    Anonymous Coward

    Ken Thompson had something to say about compilers.....in 1984....

    Link: https://wiki.c2.com/?TheKenThompsonHack

    So......the subject of "trust" probably needs an airing......before we all start to think that "...it takes compilers to save the world..." Development environments (and that includes compilers) are under attack.

    Lest you think that I'm just some paranoid old fool:

    - Link: https://www.npr.org/2021/04/16/985439655/a-worst-nightmare-cyberattack-the-untold-story-of-the-solarwinds-hack?t=1649071881395

    - Link: https://www.theregister.com/2021/05/11/solarwinds_ceo_orion_build_system/

  7. Peter Gathercole Silver badge

    C of the '80s

    I dispute that the major C compilers for UNIX platforms were fragmented and difficult to port to.

    Bell Labs/AT&T produced the Portable C Compiler for UNIX Edition 7 in the late 1970s; it was adopted as the basis for the C compiler on most UNIX vendors' platforms, and was written precisely to prevent what you described.

    This worked by breaking the C compiler into two parts (with an optional optimizer), and only the last part was involved in generating machine specific code, normally assembler code which would be passed to the machine specific assembler program. The first part could be pretty much common to all platforms.

    If I remember it correctly, it implemented something akin to a virtual "P" machine, unrelated to the target platform, that the program would initially be coded to, and the code generator would take that and generate the actual machine-specific code. I believe it was possible to tune some aspects of the intermediary code to give hints about the number of registers, argument passing to functions, etc.

    This made the C itself portable. The other bits of the systems, like the included libraries, system calls and other aspects of the API were not so common between systems (even UNIX ones!). This is what Posix was designed to fix.

    So your comment about C hell was really not about C, but about the systems that you used C to code for.

    Of course, if you used the vendor-specific compiler extensions, rather than limiting yourself to the standardized common subset, then you were asking to be locked in!

    1. Phil O'Sophical Silver badge

      Re: C of the '80s

      Indeed, the idea of compilers having front/back ends that could be individually adapted long predates GCC. As we see all too often, people who grew up during the Linux years assume that everything in Linux is new, because it's different to Windows. They don't realise that it was almost all done before, both in the various Unix flavours and in the OSes that predated Unix.

      1. fg_swe Silver badge

        Even Worse

        Many software developers are of the myopic opinion that C and Unix are the pinnacle of practical computer science. Because that is what they heard in a slogan.

        These folks are too lazy to do some research into the history of computers and software. They would find interesting systems such as the Algol Mainframes, HP MPE, S/360 and successors, Modula/2, Oberon, LISP workstations, Smalltalk.

        S/360-derived computers are still a mainstay of financial and business transaction processing. The Algol mainframes had some sort of type safety in their instruction set. MPE was a very successful "small mainframe" written in a Pascal variant. Oberon has many great ideas and is very elegant.

        Knowing history is important for software engineers, too.

        1. Ozan

          Re: Even Worse

          Also I wanted to mention Multics. It did so much, so long ago.

        2. Peter Gathercole Silver badge

          Re: Even Worse

          I hope I didn't come across as myopic. I used the UNIX portable C compiler as an example, because it's one I know off the top of my head.

          But I am sufficiently aware of other companies back in the day (I had direct experience of some of Digital's compilers, and also worked supporting some of IBM's cross-platform compilers) to know that other good things were going on outside of UNIX space.

          I don't think I've ever claimed that UNIX was the pinnacle of OS development; it was just earlier and better in some ways than much of what has happened since. It was a product of its time, and has had a major influence on the digital world we now know.

          But the time of genetic UNIX has now passed, and while I'm sad about this, I just hope the world does not forget all of the principles that made it so successful for so long.

    2. vincent himpe

      Re: C of the '80s

      So we'll build it for an imaginary computer with an imaginary instruction set and then shoehorn that imaginary binary onto real hardware? -FAIL-

      THAT is the problem with all those catch-all compilers (and languages) we have today.

      They compile for an architecture that does not exist and an instruction set that does not exist.

      I've picked apart the output of several compilers for embedded systems sending "Hello world" to a serial port. Strings are null-terminated, like C.

      In pseudo code:

      string = {"hello world",0x00}   // 12 bytes of ROM space, null-terminated string (11 chars + null)
      MOV DPTR #string                // load the address of the string into the data pointer
      :next_char
      SNZ @DPTR                       // skip next instruction if not zero
      JMP exit
      :wait_for_port
      SBC SCON,02                     // skip next instruction if bit clear: Serial CONtrol register, bit 2
      JMP wait_for_port
      MOVI TX @DPTR                   // send character to TX register; move with post-increment of DPTR
      JMP next_char                   // see if there is something else
      :exit

      I've seen compilers that produce this (because they did not know the MOVI instruction):

      MOV TX @DPTR                    // send character to TX register
      INCR DPTR                       // increment the data pointer in a separate instruction

      or this (because they did not understand DPTR is a hardware register and treated it as a variable):

      MOV TX @DPTR                    // send character to TX register
      MOV A, DPTR                     // move DPTR to accumulator
      INCR A                          // increment accumulator
      MOV DPTR,A                      // move accumulator back

      On a Harvard machine you have dedicated MOV operations that change depending on whether you are moving:

      RAM to RAM
      RAM to register
      register to RAM
      IO to RAM
      RAM to IO
      ROM to RAM
      ROM to IO

      I've seen compilers that take that string in ROM, allocate 12 bytes of RAM, copy the string from ROM to RAM and send it out. Slightly smarter ones do it character by character, but still need a ROM-to-RAM copy first.

      Those are clear examples of the misery of generating intermediate machine code. They have one print function that is designed to take a RAM pointer. Try to print a string from ROM and it needs copying over first.

      Optimization is not something to be done afterwards. Optimization needs to be done first: see what instructions are in the machine and map the source onto them in the most efficient way.

      Jump tables are a prime example of that. Switch/case statements can easily be translated into jump tables. Depending on the selected branch, all you do is add an offset to the code pointer so it lands in the jump table. There it finds a single move operation with the new target.

      Your switch/case statement translates to a constant-speed operation (one ADD, one JMP), no need for testing anything.
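
      In C++ the same shape can be sketched with a table of function pointers (a minimal sketch; the handler names are invented for illustration): one bounds test and one indexed indirect call, regardless of which case is taken.

      #include <cstdio>

      void on_start() { std::puts("start"); }
      void on_stop()  { std::puts("stop");  }
      void on_reset() { std::puts("reset"); }

      using Handler = void (*)();

      // The "jump table": dense case values become an indexed fetch of a target.
      constexpr Handler jump_table[] = { on_start, on_stop, on_reset };

      void dispatch(unsigned op) {
          if (op < sizeof jump_table / sizeof jump_table[0])
              jump_table[op]();   // constant time, no chain of compares
      }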

      Now, I do realize this is different on machines with a different architecture, or with dynamically loaded programs. The above is just an example of how bad compilers can be when compiling to an "imaginary machine".

      It pays to take a look at compilers that make code for one and only one system. They can heavily optimize for the architecture. Portability is in the source code; no need for an intermediate layer.

      1. Peter Gathercole Silver badge

        Re: C of the '80s @vincent

        It will always be more efficient to directly generate code for the target platform. That's not really what the author was talking about.

        But having different compilers for every platform can result in different behavior of the same code compiled on the different platforms. And this often only shows up in corner cases that are either not defined in the language specification, or were just not considered when the compiler test suites were written.

        At least using a common syntax and intermediate code generation, these parts of the compiler will generate common results. You then just have to worry about taking the intermediate code to the final machine code for the target platform.

        Yes, the compiler will have to go through more stages than direct code generation. Yes, the resultant code may not make best use of all of the features of the target platform, but yes again, the code you write is more portable.

        It's one of these swings and roundabout problems. While you gain in compiler development and code portability, you lose in code efficiency, and compiler speed. And over all of this, you have the various associated costs.

        And coupled with this, target platforms change. While code written for, say, an original IBM POWER processor, Intel 8086 or ARM 1 may still run on the latest Power 10, Xeon or M1 Pro processor, it won't take any advantage of new features in the later processors. And the compilers need to reflect this.

        If you can leave the first pass syntax checking and code analysis of the compiler alone, and just concentrate on the code generation and optimization, you can speed up the development of compilers to target the latest, greatest processor innovations.

        If you ever read about any of the standardization committees for languages, you will find that they get bogged down in the nitty gritty bits of exactly what the results of certain code generated by a compiler should be. For the portable C compiler, the people writing the compiler were also very close to the people initially defining the language, which meant much of these discussions never needed to happen.

        But then again, these were simpler days, and the languages themselves and the compilers were much less complex.

      2. Justthefacts Silver badge

        Re: C of the '80s

        Well, although I didn't read all of your examples of compiler error, at least several of these may be coder error, not compiler error:

        “[the compiler] did not understand DPTR is a hardware register and treat it as a variable”. You need to use the C-standard keyword “volatile” to let the compiler know the hardware can change the value underneath it. Unless you’ve found a compiler that doesn’t respect keyword “volatile”. TLDR “some compilers have non-standards-conforming subtle bugs. Use a professionally-written and commercially-maintained one”
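
        A minimal sketch of the volatile point (the register address here is made up): without the qualifier the compiler may cache the load and spin forever.

        #include <cstdint>

        // Memory-mapped serial control register at a hypothetical address.
        // volatile tells the compiler the hardware can change the value
        // underneath it, so every read must actually touch the register.
        volatile std::uint8_t* const SCON = reinterpret_cast<volatile std::uint8_t*>(0x98);

        void wait_for_tx_ready() {
            while ((*SCON & 0x02) == 0) {
                // busy-wait: the load is re-issued on every iteration
            }
        }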

        “there are good reasons to want jump-tables, e.g. constant-time which reduces testing”. Yes, and security implications. That’s why “implement switch statements as jump-tables” is a compiler option in every *professional* compiler I know. TLDR “Use a professionally-written and commercially-maintained compiler and RTFM”

        “Compilers that take that string in rom , allocate 12 bytes of ram , copy the string from rom to ram and send it out.” Not sure whether you mean ROM (which you might), but more usually Flash nowadays, whether embedded or SPI Flash. Flash is usually significantly slower than RAM, which makes it more energy-efficient to read blockwise into RAM before use. I get your point that it's a tradeoff which a layered compiler can't make optimally, but equally it *is* usually the best implementation.

    3. oldtaku Silver badge
      Black Helicopters

      Re: C of the '80s

      I'm sure he's referencing things like the PC side, which had monstrosities like Microsoft C, which was based on Lattice C and was not K&R. Later on they made it K&R compatible (breaking backward compatibility), but it was still a stupendously slow, buggy piece of dog crap. There was also Watcom C (similar, though less buggy, but slow). Then Turbo C for PC (which was re-branded Wizard C) came along and kicked everyone's ass. These were all incompatible with each other unless you stuck to the barest of bare-bones C.

      So yes, it was somewhat about the systems you were forced to code on. Things were better on the UNIX side. But if you weren't in academia then it was hard to ignore the PC market in the late 80s, even if your company had some UNIX as well.

      Nor was 'UNIX' a complete panacea at the time. Various vendors like HP and IBM were busy perverting things with abominations like HPUX and AIX. You couldn't just download, compile, and run anything but the simplest standard UNIX C applications on these. Almost anything needed tweaking or, god forbid if it had a GUI component, major hacks.

      1. Roland6 Silver badge

        Re: C of the '80s

        Yes, there were a variety of compilers for the PC, I seem to remember Aztec-C being one of the better ones.

        Not only were there language differences to trip up the unwary (I suspect some were due to differing approaches to handling the x86 segmentation model), but there were also important differences in the libraries, such as what happens when you move a file pointer beyond the end of the physical file, a condition not defined in K&R.

        However, the author is just showing their ignorance. Yes, porting C intended for Unix to another platform such as the MS-DOS PC wasn't simple (in fact just getting the source off the Unix box onto the PC wasn't trivial). But porting a C program from, say, SunOS (BSD) to NCR (System V) in the 1980s wasn't a simple recompilation either, and things weren't much better in the 90s: for example, the Bull DPX/2 200 and 300 both ran System V on the 68030, but as the 300 was SMP, everything had to be recompiled and retested (for one project we used a DPX/2 200 as a comms processor, as it had a certified X.25 card, unlike the 300 at that time).

        Similar issues arose with other languages such as Cobol, Fortran, Algol-60 and Algol-68, as all had vendor introduced differences and extensions.

        1. jake Silver badge

          Re: C of the '80s

          Mark Williams Company's "Let's C" was a complete professional-caliber C development environment for the IBM PC. At $179, it even came with an early IDE (a slightly hacked variation of MicroEMACS). Coded almost entirely in assembler, it was tiny ... You could fit the whole thing, including compiler, assembler, linker, libraries, IDE, etc. on a single, DOS-bootable 1.44 floppy. If you needed it, you could fit csd, their C source debugger, and a small inline help system, onto a second floppy along with your project files.

        2. Alan Brown Silver badge

          Re: C of the '80s

          As an aside, IIRC the original Atari ST roms were coded/compiled using Lattice C

          In the early 2000s someone recompiled them using a more modern compiler (I don't know how much source mangling was required), resulting in them occupying less than half the ROM space as well as running significantly faster.

          1. Roland6 Silver badge

            Re: C of the '80s

            I remember the size of output being a big talking point in the articles comparing C compilers. If memory serves me correctly one compiler gave a surprising result - an executable of almost zero bytes. On investigation the optimiser had determined that the source code performed no function as it took no input and returned no output and thus had optimised out the entire module...

            1. Someone Else Silver badge
              Pint

              Re: C of the '80s

              Nice! - - - ->

      2. Peter Gathercole Silver badge

        Re: C of the '80s @oldtaku

        Most incompatibilities between different UNIX vendors were caused by people writing to a particular vendor's extensions.

        Keeping to Posix once that had been defined meant that code became much more (but not completely) portable.

        The biggest problem in the UNIX space was the split between genetic, AT&T-derived UNIXes and BSD-derived ones. Things like AIX and SunOS 4 (SVR4) tried to be all things to all people, providing alternate libraries and include files, plus a number of macros that allowed you to take code from another platform and compile it against the alternate universe by setting #ifdefs and compiler flags, but some things were just too different. This was particularly true when it came down to fundamental things changing due to progress (I particularly remember the pain that switching from 16 to 32 bit UIDs caused all over the place when some vendors had made the change and others hadn't), and people writing endianness-sensitive code (not really a UNIX issue) was always a problem when porting.

        There was an art and skill to writing portable code. Most developers didn't have it.
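
        On the endianness point, a minimal sketch (my illustration, not anyone's production code): reinterpreting bytes bakes the host's byte order into the program, while assembling the value arithmetically ports cleanly.

        #include <cstdint>

        // Endian-sensitive: the result depends on the host's byte order
        // (and the cast risks an unaligned access on some CPUs).
        std::uint32_t read_u32_unportable(const unsigned char* p) {
            return *reinterpret_cast<const std::uint32_t*>(p);
        }

        // Portable: always interprets the bytes as little-endian, on any host.
        std::uint32_t read_u32_le(const unsigned char* p) {
            return  static_cast<std::uint32_t>(p[0])
                 | (static_cast<std::uint32_t>(p[1]) << 8)
                 | (static_cast<std::uint32_t>(p[2]) << 16)
                 | (static_cast<std::uint32_t>(p[3]) << 24);
        }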

    4. Anonymous Coward
      Anonymous Coward

      Re: C of the '80s

      pcc was a good compiler ... if you either did not care about performance or were using a machine that was enough like a PDP-11 that it worked reasonably well. So it was fine on VAXen, and OK on 68ks. If you had instead, in the late 1980s, a computer which was fast, like a SPARC (never very fast but a lot faster than a 68k) or PA-RISC (actually fast) or Alpha (very fast, but 90s not 80s), then pcc was just terrible. So you were stuck with either the vendor's compiler which was probably good but perhaps quirky and perhaps expensive, or very slow code.

      Until gcc came along. After gcc no-one used pcc if they could possibly avoid it, and there is a reason for that (and it's not gnu fandom).

      1. Peter Gathercole Silver badge

        Re: C of the '80s

        It's funny. One of the platforms Bell ported UNIX to was the Interdata 8/32.

        According to the Wikipedia article on the Interdata, "Bell chose the 8/32 for their port because it was as different from the DEC PDP-11 as possible"

        The C compiler used was the Portable C Compiler, so I would contend that similarity to the PDP-11 was not required for the PCC to work.

        1. jake Silver badge

          Re: C of the '80s

          Thus the "Portable" component of the name.

          All these mewling naysayers attempting to downplay C by claiming "it's just the PDP11's architecture" tend to forget that C was designed to port UNIX to pretty much any platform. It is still doing a rather nice job of (almost) precisely that, 50 years on.

          Not all things need retiring due to age ... The wax seal under your bog is probably over forty years old, are you planning on replacing it any time soon? When was the last time you replaced your soil-stack? The pipe that supplies freshwater to your house? Your upstairs plumbing? When stuff is built right the first time, for the job at hand, with properly selected parts, there is little need to replace it, until things really go pear-shaped.

          Contrary to popular belief, C hasn't yet gone pear-shaped.

          The quality of programmers being churned out by the STEMinistas, on the other hand, has changed the ratio of good to bad programmers into a mockery of what it once was.

          It ain't the tool that's b0rken, it's the tool wielders.

  8. Anonymous Coward
    Anonymous Coward

    Does really a single application promote innovation?

    If everything is compiled by GCC, where's the innovation? In the past, in the PC world, we saw compilers that could beat Microsoft's easily, and that meant real competition. MS competed the wrong way: it crippled those companies and only later improved VC++, because, yes, there was still GCC.

    It is true that at least there is some competition between GCC and LLVM; still, the landscape of development tools is worse today than it was twenty years ago. The integration among the many tools needed to deliver a well-working application is worse, and the quality of applications shows it, regardless of all the devops beliefs. Running another application to tie everything together is just like regex: if you think you'll solve a problem with devops, now you have two problems.

    It's the smooth flow from source code (and UI design, for interactive applications) to a fully and properly debugged and profiled application that delivers great software.

    1. Alan Brown Silver badge

      Re: Does really a single application promote innovation?

      "if everything is compiled by GCC, where's the innovation?"

      In a commercial environment it's driven by competition, and the risk is monopolism (embrace, extend, extinguish).

      In a GNU environment it's driven by the commons itself. If something's better it WILL be folded into the mainstream. Successful forks flourish and become the mainstream; unsuccessful ones have their useful parts incorporated into the mainstream anyway.

      1. jake Silver badge

        Re: Does really a single application promote innovation?

        And sometimes successful forks and the mainstream merge into a better for everybody whole.

  9. Anonymous Coward
    Anonymous Coward

    There used to be an adage:

    Pascal - if it compiles it will run ok.

    C - it always compiles.

    1. DarkwavePunk

      "C - it always compiles." - I get your point although not entirely true. There is as always the question of "Should it do what you tell it to do, or what it thinks you want it to do?". I'm mostly in the former camp despite having trashed many many things in my life.

      1. Nick Ryan Silver badge

        Computers tend to do what they are told to do. The problem comes where there is a difference between what was said and what was originally intended.

        That and just unthinking developers who struggle with a singular binary state and are unable to comprehend that there could be more states beyond this, let alone how to handle them safely.

    2. fg_swe Silver badge

      In Other Words

      Proper Software Engineering should avoid C.

    3. vincent himpe

      Provided you have enough closing parentheses for your opening ones, got all your semicolons in line, and told the compiler every time whether you mean assign or compare (= versus ==).

      I still cannot understand how so many languages cannot figure this simple thing out. This was solved in many languages older than C. Even every, often scoffed-at, BASIC compiler/interpreter can do it.

      1. Loyal Commenter Silver badge

        It's because languages like BASIC don't return a value from an assignment, so they can overload the operator.

        They know that IF A=1 is different to LET A=1 because of the syntax.

        In C-type languages though, assignment returns a value, so you can write things like a=b=c=d=e=f=g=1; and it knows this is a string of assignments, just as it knows that if (a = 1) {...} means "if the result of assigning 1 to a is non-zero" (always true unless something bad™ has happened), and not "if a is 1". Just as if (a==1){...} is unambiguously a test to see if a equates to 1.
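
        A minimal sketch of both behaviours (the variable names are invented); gcc and clang flag the suspicious assignment under -Wall via -Wparentheses:

        #include <cstdio>

        int main() {
            int a = 0, b, c;
            b = c = a = 1;       // assignment yields a value, so it chains right to left

            if (a = 2)           // assigns 2 to a, then tests the result: always true here
                std::puts("always taken");

            if (a == 2)          // the comparison that was almost certainly meant
                std::puts("a is 2");
        }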

        Yes, it's a huge pit-trap to the unwary, but it's also where a lot of the flexibility of modern languages comes from, and the IDE and compilers of things like C# are smart enough to stop you doing something you pretty obviously didn't mean to do.

        I'd lay the blame at languages that conflate equality with assignment by using the same operator, rather than at those which use different operators for equality and assignment and then behave unexpectedly when you use one in place of the other, because the assignment also returns a result which can be treated as the condition of a branching statement such as if or while.

        1. drankinatty

          "it's a huge pit-trap to the unwary"

          should be modified to:

          "it's a huge pit-trap to those who don't enable compiler warnings..."

          1. Nick Ryan Silver badge

            I once had a complete flip at a developer whose approach to the hundreds of compiler warnings in his code was to explicitly turn off compiler warnings.

            1. Loyal Commenter Silver badge

              If you turn off compiler warnings, you deserve to get them replaced with one warning. A "final warning" followed by termination of employment if you don't get the message.

              It's the professional equivalent of a surgeon not bothering to scrub up because "he knows he has no bacteria on his hands".

            2. John PM Chappell
              Thumb Up

              /Wall /WX

              Completely agree. I compile with all warnings on and "treat warnings as errors"; those warnings almost always warrant some attention, either improved or more explicit code, and the truly harmless ones, where you really do mean what was coded (and no, it is not a problem), can be dealt with on a case-by-case basis with a #pragma or similar.

              @Loyal Commenter - totally agree with your sentiment, too.

        2. Torben Mogensen

          Regarding = and ==, I fully agree that assignment and equality should use different operators. But I much prefer the Algol/Pascal choice of using = for equality and := for assignment. = is a well-established symbol for mathematical equality, whereas mathematics generally doesn't use assignment to mutable variables, so there is no a priori best symbol for this. So why use the standard equality symbol for assignment and something else for equality? I suspect to save a few characters, since assignment in C is more common than equality testing. But that is not really a good reason.

          Note that I'm fine with using = for definition of constants and such, because this implies mathematical equality (and = is used in maths for defining values of named entities). Context is needed to see whether = is used to declare equality or to test for equality, but the context is usually pretty clear.

          1. jake Silver badge

            "But that is not really a good reason."

            As soon as I invent my time machine I'll be sure to go back and let K&R know that you'll disapprove at some point in the future. I'm absolutely certain they will ignore me. As they should.

    4. Yes Me Silver badge
      Coat

      If it compiles it will run ok

      Did you never read "How to avoid getting SCHLONKED by Pascal" then?

      ACM SIGPLAN Notices, Vol. 17, No. 12, 1982.

      https://dl.acm.org/doi/10.1145/988164.988167

      1. fg_swe Silver badge

        Male Cow Output

        Just looked at the paper you reference. They claim Pascal is unsuitable for numerical processing, process control and operating systems.

        I once wrote a planetary simulation in Turbo Pascal and found it perfectly fine for this purpose. The compiler was lightning fast on a 16MHz 80286 (three seconds for a project of 1000 LOC).

        HP wrote the MPE OS kernel in a Pascal dialect, and the resulting "mini mainframe" went on to be highly successful in business settings such as corporate email, manufacturing management, inventory management, etc.

        Currently I write automotive control unit code in C, and I see no reason why we could not use Pascal or Ada as a replacement for C. Pascal would be an improvement, because one can specify value domains for variables. This has to be grafted on top of C (using special comments) if you want to use static checkers for variable domains.
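
        A minimal sketch (my own, in C++ rather than Pascal) of what a value domain buys you: a subrange type that rejects out-of-domain values, much like Pascal's 0..100 subranges.

        #include <cassert>

        // A Pascal-style subrange grafted onto C++: construction checks the domain.
        template <int Lo, int Hi>
        class Subrange {
            int v;
        public:
            Subrange(int x) : v(x) { assert(Lo <= x && x <= Hi); }   // range check
            operator int() const { return v; }
        };

        using Percent = Subrange<0, 100>;   // hypothetical domain 0..100

        int apply_duty_cycle(Percent p) { return p; }
        // apply_duty_cycle(150) trips the assert, like a Pascal range error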

        Arianespace, Leonardo and many others use Ada for aerospace control units, and the issues they had were related to project execution (not doing a HIL test for the Ariane V first flight, for example). I never heard that they had issues with the language.

        The lack of index checking in C is also a very real problem in real-world automotive control units. It can be fixed to some degree using PC-Lint and PolySpace. It is also "fixed" by memory protection units. At least you can contain the cancer...

        1. Peter Gathercole Silver badge

          Re: Male Cow Output

          I started reading the paper, but didn't finish it, but...

          When you talk about Pascal, you need to be careful about which Pascal you mean. 'Standard' Pascal, of either the ISO or ANSI variant, was pretty useless at everything save as a teaching language (what it was written for). This is because it was deliberately so limited in what it could do.

          It was a perfectly good language for teaching the concepts of structured programming, and good in that it had strong typing (although it was a bit short of pre-defined types), but if you tried to do anything like casting from one type to another, there was almost nothing in standard Pascal to allow you to do it (you had to use variant records, which is clunky).

          The I/O was limited, and did not really cope with anything other than record- or line-based I/O, and there was no structured file handling built in other than fixed-length records or variable-length text lines.

          When people talk about "writing an OS in Pascal", what they really mean is "writing an OS in an extended version of Pascal". You could argue that all "extended" Pascals were not really Pascal at all, but other variants of an ALGOL-like language.

          And here we run into the author's point. No two extended Pascals were alike, and porting between them was hard, probably harder than porting between the various C implementations.

  10. PhilCoder

    The most important feature of any language

    IMHO the most important feature of any computer language is the quality and scope of its libraries and frameworks. Part of the success of the likes of C and Microsoft VB was the huge hinterland of 3rd party libraries and components which were readily available for purchase.

  11. amanfromMars 1 Silver badge

    Amen to that .... and IT aint Simply Pocket Rocket Science.

    It’s Quite Designedly Diabolically Tricky and Fundamentally Revolutionary

    Compiling the future, however, is where the real fun's at.

    And amen to all of this ...... Words create, command and control and destroy worlds .... and nobody owns them but beware those somebodies who would disown and deny certain streams of them the light of further days and 0days, fearing what is revealed of them to more than just themselves via their catalogues of terrors virtually planned for practically everything else/nearly all and sundry.

    Real Fun? ....... Yeah, I do suppose IT can be, and therefore inevitably eventually it certainly is.

  12. Anonymous Coward
    Anonymous Coward

    One problem with the article... it's mostly fantasy

    Just how familiar is the author with C compilers in the 1980s? Did he use any of them? Ship any product with them? Was he even alive at the time? Because his description bears not the slightest resemblance to the world of C software development for commercial shrink-wrap software back then.

    I shipped my first commercial mass-market C application in 1984, compiled with a custom in-house C compiler; K&R compilers are easy to roll your own. After that I shipped lots of products written in C over the next decade, until C++ compilers became stable enough around 1992/93. C compiler quality was variable. The Green Hills C compiler that shipped with MPW 1.0 in 1986 still has the best codegen output I've seen. Beautiful output. The Watcom C compiler, which was the (non-MS) x86 industry standard back then, was also very good. It's open source now, so you can see how to write a great, small, fast C/C++ compiler. Watcom C/C++ was put out of business by MS spending large amounts of money to "buy" their best customers. Long story. Sad ending.

    So by the late 1980s it was Watcom or Borland for very good C compilers in x86 land, and Green Hills for superlative (but expensive) ones. On the Mac there was the OK compiler shipped with MPW 2/3 and the very good Think C, which was the de facto standard in Mac-land. And a whole bunch of honorable also-rans.

    Microsoft C chugged along in MS-DOS land but was only used when there was no other choice. Very serious linker bugs for large symbol tables. Hence MFC. And no one in my experience used GCC 1 or 2 for mass-market software development, because it was garbage; it just never came up as a platform. The first time I saw GCC in a commercial dev project was for an embedded system. In 2001. And GCC was eventually dumped because it was so buggy. After GCC 3 was released it was no longer terrible, but it was still usually only used when there was no other alternative. Or when people did not know any better.

    LLVM got such a fast takeup because GCC was/is so terrible and LLVM is easy to port to new platforms. After 15 mins of reading the GCC source code you just want to start hitting your head with a hammer. LLVM source code is not so toxic.

    The dirty little secret of usable compilers that produce the best code output is that their internals bear little resemblance to what you see in those academic compiler books. Even the optimizers. I can think of two compilers that have unusable optimization features (important ones, like instruction scheduling) because they allowed some grad student to try to implement their thesis. The same goes for front ends. I just finished a retargetable IR codegen. Very small, very fast. And pretty much the same code as the roll-your-own compilers back in the 1980s. Because back then there was not a large body of academic literature that tried to make what was very straightforward look very complex.

    That's how it actually was in the trenches back then. For those of us who had to ship product.

    1. fg_swe Silver badge

      gcc Experience

      As a user of gcc I cannot confirm your observations. I have used gcc to good effect for at least 15 years now. It seems to be a bit slow, but the generated code was always sufficiently efficient for me. Also, I did not experience bad bugs. Using static code checks is highly recommended with ANY C compiler, though.

      Current gcc is also the backend of the GNAT Ada compiler, which is successfully used in demanding aerospace projects such as Jäger 90.

      1. vincent himpe

        Re: gcc Experience

        15 years? You have not even broken the 2000 barrier. The OP is talking mid-80s! That's 35+ years ago!

      2. Anonymous Coward
        Anonymous Coward

        Re: gcc Experience..so post 3.0

        So you only used GCC 3.0 and beyond, which was after the throwaway and rewrite. It was the pre-3.0 versions that were total garbage.

        Post-3.0 versions are mostly usable and not too buggy, but I will always use an alternative if available, usually LLVM-based now. GCC breaks far too many basic principles of compiler tool sets. GCC was a mess because Stallman knew f*ck all about compilers and writing reliable software, and a lot of that incoherence has survived because of some terribly bad decisions in the early days.

        I was referring to the pre-3.0 universe, which was pre-early-2000s. To those of us who used traditional and properly designed compiler tool sets long before GCC (and after), working in GCC land is like getting into a Lada after driving a Range Rover. Very basic stuff, like having a well-defined separate linker. There is a very good reason why all CLI compiler tool sets apart from GCC have a separate linker: it makes life much easier for people who have to do bare-iron work. And for those who don't, Make takes care of the housework. A properly designed traditional tool chain makes for full flexibility and customization. Stuff that is simple and straightforward in other compiler tool sets is usually drudgery and grief in GCC.

        Saying that, I have been a very happy user of IDEs as well since the mid-1980s. A good source-level debugger is worth its weight in gold, and a JTAG debugger with full CPU hardware debug control is heaven. But when I have to do real systems-level work, out comes the relevant tool chain, to be tweaked to get the best results, often by writing a new tool for the chain or modifying an existing one. It's amazing what one can do with that universal software duct tape, TCL_Interp().

        1. that one in the corner Silver badge

          Re: gcc Experience..so post 3.0

          Don't have a reference to hand, but my understanding is that the vileness of the GCC 1 (and hence 2) source was a deliberate move on Stallman's part: he didn't want people to be able to split out the various compiler phases, say to use the parser to write out an AST that could be the basis for a refactoring tool or feed an automatic documentation generator (how much better Doxygen and the like could have been right from the start...).

          Post-3.0 we got GNAT (Ada) and g77 (FORTRAN) rolled into GCC and the move to calling it the compiler collection, as opposed to C/C++/other-C-with-classes variants. But it was still a pain to pull out a decent program representation: GCCXML did manage enough for their purposes, but they've since moved away from using GCC as a frontend.

          1. Anonymous Coward
            Anonymous Coward

            Re: gcc Experience..so post 3.0..yeup

            GCC 1/2 was a mess because of Stallman and his manifest personal failings. He had a toxic reputation even before GCC 1.0 was released. Ex-MIT people were everywhere.

            I looked at the source code for an early release, shook my head at the mess, and pretty much ignored it for the next decade plus. As did every other serious commercial dev project I saw or heard of: the whole MSDOS/Win16/Win32/MacOS shrinkwrap world. Some game console guys tried using custom GCC tool chains, which kinda worked for weird processors like the SH-2, but were a nightmare to use.

            I remember a long dev team meeting about GCC 3.0 in the summer of 2001 when it came out. Once I heard it was a throwaway rewrite I gave it a fresh look, and it turned out to be pretty stable and usable the next time I used it in an embedded project.

            So now if GCC is part of a platform toolchain and I just want to do straightforward, low-complexity dev work, then I see no reason to change. But if the project requires anything special, custom or high-complexity, then GCC is chucked and a more flexible compiler is used. LLVM compilers have their own problems, but the problems I find rarely have me saying: why the f*ck have you done that? Which tends to be a constant refrain in GCC land when trying to do anything non-trivial.

            But if you are not very familiar with traditional tool sets and how they work, the GCC world might seem the only way to do things. For most dev work, it's good enough. But there are better, more efficient ways. Even though I am a hard-core IDE guy I still really miss MPW Shell, which showed how it should be done. Commando tool integration has never been bettered.


    2. Anonymous Coward
      Anonymous Coward

      Re: One problem with the article..its mostly fantasy..

      I found the Green Hills compiler produced buggy code for the 68332 processor around '95-'96. It felt like I was doing the quality control testing for them. I have to admit they did fix the bugs reasonably quickly.

      Nowadays I compile everything with gcc and clang on Linux with all warnings on. I quite like the latest Visual Studio and compilers. I used to dread having to get code to compile on Windows, but with boost and the expansion of the standard libraries, porting is significantly easier.

      I now very rarely use vendor specific compilers unless the contract specifically requires them to be used.

      1. Anonymous Coward
        Anonymous Coward

        Re: One problem with the article..its mostly fantasy....microcontrollers?

        That was a 68K microcontroller? With the usual weird add-ons? So a very low-volume product.

        Well, the story with the 68000/20 Green Hills compiler under MPW is that I had it compile some of the biggest application codebases of the day on MacOS without issue. Due to some linker bugs in the Apple C compiler shipped with MPW 3.0, which gagged on large symbol tables, I was still compiling with the GH compiler until MPW 3.2 shipped. So well into the 1990's.

        Never any problems with the GH 68k codegen. I spent a lot of time in 68K asm. And disassembly. Can still write pages of 68K asm with little provocation. Can still look at disassembled 68k hex and feel completely at home.

        The GH compiler shipped with MPW was so good and predictable that I mostly stopped writing 68k asm and just wrote simple unrolled C which produced final codegen output with the same instruction count as safe asm written by hand. Stuff like replacing someone's high-maintenance asm code tailored to fit in 680x0 instruction caches with low-maintenance C code that had the same final instruction count and size.

        I was still using the GH compiler years later to do quick turnarounds on Sega Genesis code (asm speed with a C codebase), and it almost got used for a quick-and-dirty PalmOS project in the late 1990's. Still have a copy somewhere.

        The only other compiler that came close was the Watcom C compiler. Good enough output to avoid the hell that is writing x86 asm.

  13. Arthur the cat Silver badge

    gcc and LLVM are good but …

    They are huge and take their time compiling, especially LLVM. The optimisation can be superb and is getting better all the time, but it takes masses of time and space. Arguably 90% of optimisation can be done using just the techniques Frances Allen listed in her 1972 paper "A catalogue of optimizing transformations".
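
    To make that concrete, here is one entry from that catalogue applied by hand: loop-invariant code motion, as a minimal sketch in C (function names are illustrative):

      void scale_before(double *out, const double *in, int n, double s) {
          for (int i = 0; i < n; i++)
              out[i] = in[i] * (s * 2.0);   /* s * 2.0 is recomputed every iteration */
      }

      void scale_after(double *out, const double *in, int n, double s) {
          double k = s * 2.0;               /* the invariant, hoisted out of the loop */
          for (int i = 0; i < n; i++)
              out[i] = in[i] * k;
      }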

    1. fg_swe Silver badge

      tcc

      If you want a free and lightning-fast C compiler, use tcc. It does not optimize, though.

      https://en.wikipedia.org/wiki/Tiny_C_Compiler

      1. Loyal Commenter Silver badge

        Re: tcc

        No problem, simply* write optimal code to start with.

        *"simply" here is a NP-hard problem.

        1. rafff

          "No problem, simply* write optimal code to start with."

          The way we used to when I were a lad back in the 60s and 70s. If you have a von Neumann architecture (not always true nowadays) then nearly all the possible optimisations can be done at source-language (or IR) level. Even cache coherence can be handled largely (not totally) in source code. Register allocation/spillage is about all you need to worry about at the machine level.

          If the compiler has to work hard to optimise your program you probably chose a bad data structure or algorithm. Don't blame the compiler for your own poor work.

    2. Roland6 Silver badge

      Re: gcc and LLVM are good but …

      https://research.ibm.com/interactive/frances-allen/

      Another lady to add to ElReg's Geeks Guide to Women in Computing.

  14. Chris Gray 1
    Meh

    I prefer simple

    I can be called lots of things ( :-) ), but in terms of what I do, I'm a compiler writer. Ignoring my first efforts in a compiler course at University and another project I was hired and paid to do, all of my compilers have been straightforward monolithic programs. No passes, no phases. Read source code and emit optimized object files. (My current project is all one program, but does create a detailed internal representation of the program.)

    Makes for fast execution and minimal I/O - very important back in the days of floppy disks.

    Also makes for a minimum of total code involved, hence reducing the opportunity for bugs.

    Sure, if you have multiple phases/passes you can in theory manually examine the intermediate stuff, but do you really want to? It's a whole other language and set of conventions you now have to understand. Simple, specific test code can nearly always trigger any bugs you are looking for, and following the path of what's going on is far simpler in a monolithic compiler.

    As for some of the other points folks have raised...

    Yes, you can implement a language with more checking (array bounds, arithmetic overflows, ...) by emitting C source code. The checks are simply there in the source you emit. A good C compiler can likely optimize those into something closely resembling what a monolithic compiler would generate.
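
    For instance, for a source-language statement like "x := a[i]" the emitted C might look like the following sketch (the __bounds_trap runtime helper is hypothetical, purely for illustration):

      if (i >= a_len)
          __bounds_trap("a", i, a_len);   /* hypothetical runtime hook: report and abort */
      x = a_data[i];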

    Sorry to hear that both gcc and llvm are hard to work with. I vaguely recall friends trying to send me in the direction of using one of them, but I think now I'm thankful that I didn't.

    Early compilers were often split up simply because of memory constraints. With the K&R compiler, one of the things they didn't have to deal with in the compiler proper was branch shortening. Most CPUs have branch instructions with both long and short offsets. You want to use the short-offset forms wherever possible, but it's hard to know where the target of a forward branch is, because you haven't generated code for all the stuff between the branch and the target yet. The Unix PDP-11 assembler handled that for them. It also handled the details of emitting valid object files. It wasn't all about not dealing with the target instruction set.
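
    A minimal sketch of such a branch-shortening (relaxation) pass, with invented names and sizes: assume every branch is short, lay out addresses, widen any branch whose target turns out to be out of range, and repeat until nothing changes:

      struct insn {
          int is_branch;   /* 1 if this is a branch */
          int target;      /* index of the target instruction */
          int size;        /* current encoding size; branches start at SHORT */
          int addr;        /* address assigned during layout */
      };

      enum { SHORT = 2, LONG = 4, RANGE = 127 };

      void relax(struct insn *code, int n) {
          for (int changed = 1; changed; ) {
              changed = 0;
              for (int i = 0, a = 0; i < n; i++) {   /* recompute the layout */
                  code[i].addr = a;
                  a += code[i].size;
              }
              for (int i = 0; i < n; i++) {          /* widen out-of-range short branches */
                  if (!code[i].is_branch || code[i].size == LONG)
                      continue;
                  int off = code[code[i].target].addr - (code[i].addr + code[i].size);
                  if (off > RANGE || off < -RANGE - 1) {
                      code[i].size = LONG;           /* promote to the long form */
                      changed = 1;
                  }
              }
          }
      }

    Widening only ever grows sizes, so the loop is guaranteed to reach a fixed point.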

    The code generator for x86-64 in my current project is still under 8000 lines of code. There are a few things left - currently I'm doing my 'bits' types. The CPU has single-bit extract/insert, but nothing more, so I'll have to special-case to use those instructions. I badly need at least a move/move optimizer, and doing that for this CPU will need an abstract representation of the instructions, unfortunately - the binary format is just too complex to do directly. The AMD CPU manuals are large and complete, but as in any such technical endeavour, you have to keep in mind stuff you read a few hundred pages back when figuring out exactly what can happen in encodings, semantics, etc. Sigh.
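
    For the single-bit case, a portable shift-and-mask form that a codegen could pattern-match onto the CPU's bit instructions might look like this (sketch only, names illustrative):

      #include <stdint.h>

      unsigned bit_extract(uint64_t w, unsigned i) {            /* maps onto a BT-style insn */
          return (unsigned)((w >> i) & 1u);
      }

      uint64_t bit_insert(uint64_t w, unsigned i, unsigned b) { /* maps onto BTS/BTR */
          return (w & ~((uint64_t)1 << i)) | ((uint64_t)(b & 1u) << i);
      }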

  15. nautica Silver badge
    Boffin

    THE secret as to why there are SO MANY useless computer "languages".

    "...My early work clearly treated modularisation as a design issue,‭ ‬not a language issue.‭ ‬A module was a work assignment, not a subroutine or other language element.‭ ‬Although some tools could make the job easier,‭ ‬no special tools were needed to use the principal,‭ ‬just discipline and skill.

    When language designers caught on to the idea,‭ ‬they assumed that modules had to be subroutines,‭ ‬or collections of subroutines,‭ ‬and introduced unreasonable restrictions on the design.‭ ‬They also spread the false impression that the important thing was to learn the language‭; ‬in truth,‭ ‬the important thing is‭ ‬to learn how to design and document.

    "We are still trying to undo the damage caused by the early treatment of modularity as a language issue and,‭ ‬sadly, we still try to do it BY INVENTING LANGUAGES AND TOOLS.‭"

    ‬--David L.‭ ‬Parnas

    1. fg_swe Silver badge

      Really ?

      Whatever nice design you have, you will have a bug now and then. If your language is not memory safe, chances are "good" that you have an exploitable bug to be used by criminals and even higher-powered adversaries.

  16. DS999 Silver badge

    It isn't C that was/is fragmented

    It was the layers over the top. I had some experience porting software between different RISC Unixes in the early/mid 90s, and the kind of stuff that would bite you was mostly due to where the code was originally written. For example, code written for SunOS could be, and therefore often was, cavalier about memory allocation - its free() would allow freeing a NULL pointer, and freeing an already-freed pointer worked too. Those could wreak havoc on stricter systems (and while not the strictest, Solaris was strict enough that plenty of SunOS code required fixes before it would compile).

    Enabling the compiler switch on HP-UX's C compiler that caused references to page zero to fault (gcc didn't have that back then) would expose a lot of brokenness, and often fix real bugs that had been present in the code forever but couldn't be tracked down, due to SunOS being so permissive about the crap you could get away with. Being allowed to dereference NULL pointers - and therefore hiding real bugs - was a problem everywhere, and should never have been allowed.
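
    A sketch of the kind of code that "worked" on SunOS and promptly bit you elsewhere (illustrative only):

      #include <stdlib.h>
      #include <string.h>

      void sloppy(void) {
          char *p = malloc(16);
          strcpy(p, "hello");   /* no NULL check, but that's the least of it */
          free(p);
          free(p);              /* double free: tolerated by SunOS, corrupts stricter heaps */
          free(NULL);           /* fine on SunOS (and in ANSI C), fatal on some older libcs */

          char *q = NULL;
          if (*q == '\0')       /* NULL dereference: page zero was readable on SunOS, */
              return;           /* faults on systems configured to unmap it */
      }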

    There were subtle differences in stuff like network and other system libraries; different include files would include different things, so you might need to add a few include files to get things to compile on one system. That doesn't even get into bigger, more hardware-based differences, like mmap implementations, or libraries that were behind the times on some platforms, like X Windows. And just forget about getting something built for one windowing library ported to another; the best you could hope for was running it as a raw X Windows app.

    The actual implementations of C, even when using gcc on some platforms and the vendor C compiler on another, were way way way WAY down the list of issues you had to handle.

    1. fg_swe Silver badge

      In Other Words

      Real-world programs are full of memory management errors, and you had better have a type system plus runtime checking which will find these errors for the developer.

      valgrind and purify are only partial fixes, as they cannot find the problems which come from "creative input", as long as such input is not generated by an intelligent and well-resourced test engineer.

      Also, valgrind slows execution down by a factor of ~100 and therefore does not expose multithreading issues under high load.

      1. DS999 Silver badge

        Re: In Other Words

        You can easily argue there are better languages than C from an ease of producing correct / secure code standpoint, but any language that does its own memory management behind the scenes is unsuitable for systems programming, so C will always have a role even if most programmers start using stuff like Rust and Swift.

        1. fg_swe Silver badge

          Indeed

          C might be of use inside the kernel and drivers, but should be avoided for almost anything above the kernel. Even inside the kernel, the case for many memory safety features such as bounds checking can be made.

          1. DS999 Silver badge

            Re: Indeed

            No one is stopping you from writing C code that bounds checks. You don't think languages that provide automatic bounds checking do it without incurring the exact same overhead that doing it explicitly in C causes, do you? You can add those checks to C, and the code will perform just as well as in languages that implicitly do bounds checking. And writing the code for that takes almost no time at all compared to the overall coding task, because this is something you can do without thinking.
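
            A minimal sketch of the sort of explicit check meant here (names illustrative):

              #include <stdlib.h>

              int checked_get(const int *a, size_t len, size_t i) {
                  if (i >= len)   /* the whole cost: one compare-and-branch, which is */
                      abort();    /* what a bounds-checked language emits implicitly  */
                  return a[i];
              }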

            The fact that people write code that doesn't bounds check is their failure, and the failure of those who approve their code, not the language. Just like no one can blame the language if your assignment is to create a blue dialog box on screen with an "OK" in it and you produce a red dialog box with "Okay I guess" in it.

    2. Bitsminer Silver badge

      Re: It isn't C that was/is fragmented

      ...porting software between different RISC Unixes...

      In those 1980s/1990s days $WORK had a corporate database called Ingres. (It was a commercial predecessor to Postgres.)

      The vendor had major issues with reliability and competitiveness with Oracle. I believe this was because they had decided to support too many variants of Unix (DEC Ultrix, SunOS, HP UX, AIX, DG UX, and Windows and VMS too). Each had their own compiler variant with different library details, compiler switches and so on.

      Bugfixes and new features were severely constrained by the sum of all compiler bugs they had to deal with. Because they had so many platforms to support, a bugfix took "N" times longer than the competition.

      They knew this, and did manage to complete a rewrite. They had learned their lesson and recoded for the least common denominator of C. But it didn't save them, because it came too late. They had been too reliant on the sum of all vendor C compilers, bugs, features and all.

      I consider it a counter-argument to those who complain about technology monocultures. Having just LLVM or GCC nowadays is a panacea compared to the incompatible diversity of the past.

      1. fg_swe Silver badge

        1990s Oracle

        You could do

        $ telnet oraserver 1521

        Then type random characters into the keyboard.

        The Oracle listener (Oracle 8?) would crash in less than a minute.

        That demonstrates how quick-and-dirty IT was just 20 years ago.

      2. Roland6 Silver badge

        Re: It isn't C that was/is fragmented

        >I believe this was because they had decided to support too many variants of Unix (DEC Ultrix, SunOS, HP UX, AIX, DG UX, and Windows and VMS too).

        In addition to bugs this caused another problem: position in the porting queue.

        I remember bidding with one vendor's Unix box and then switching to another due to slippage in the DBMS porting. This also meant platform vendors were caught in a difficult situation, particularly on big bids where Ingres/Oracle etc. were necessary to win: they had to vie with each other to encourage the DBMS suppliers to give priority to their port, so that they could pass the demonstration stage.

        I suspect once you get outside the Wintel/Linux PC platform, these considerations still matter.

      3. Peter Gathercole Silver badge

        Re: It isn't C that was/is fragmented @Bitsminer

        Before Ingres was a commercial product, it was a free (within the confines of BSD license) database, initially written on UNIX at UC Berkeley in the 1970's. I was taught relational databases using it in 1979.

        The initial commercial Ingres was marketed by Relational Technology Inc. before rebranding to the Ingres Corporation, then being bought by ASK, then Computer Associates, then Actian; it now appears to be owned by HCL. None of these had any involvement in Postgres.

        As far as I am aware, Postgres was a result of an academic project to take the original non-commercial Ingres to the next level, and was kept under the BSD license, allowing it to be used for free.

        1. Bitsminer Silver badge

          Re: It isn't C that was/is fragmented @Bitsminer

          All true. Thanks for the history reminder.

          Except.

          I was trying to keep my blood pressure regulated by not naming C***** A****** or their corporate history or the .....there goes my blood pressure.....I'll stop now.

          1. dinsdale54

            Re: It isn't C that was/is fragmented @Bitsminer

            LOL!

            A friend worked on databases at UCB, Ingres, ASK, Postgres and others. He was at ASK when the takeover by CA was announced. More different corporate cultures you could not imagine. Apparently Oracle had a recruitment van parked outside within a couple of hours, Sybase a bit after that.

            By the end of the day over 90% of the development team had quit. That was the final nail in the coffin of Ingres as a leading product.

            I worked extensively on Oracle & Ingres back in the day. By the mid 90s Ingres was reliable and low maintenance but was looking VERY out of date by then.

            1. Bitsminer Silver badge

              Re: It isn't C that was/is fragmented @Bitsminer

              I remember reading and re-reading and re-reading the installation and configuration guides for both Oracle (delivered to customers) and Ingres (used in-house).

              Eventually I grokked how these software beasts worked.

              One day the Ingres remote support guys dialed in (9600 baud modems!), and even through such a filter, they remarked on how fast our Ingres installation was. Very snappy.

              Well, the log file was on a separate disk from the database files, and both were mirrored. On a VAX, no less. The boss was impressed by the comments, coming straight from the vendor.

              It really was just "read the instructions".

              1. jake Silver badge

                Re: It isn't C that was/is fragmented @Bitsminer

                Not just separate disks, but separate spindles entirely.

                Works with low-end systems, too. Proper use of disk spindles and partitions eludes most people. For example, when I set up my one remaining Windows box to run ACad2K back in 2000 (yes, over 22 years ago), I did it like this:

                OS on controller1, spindle1, partition1 (with a bootable backup on partition2) ... Registry on controller1, spindle2, partition1 (with a rolling, usable backup on partition2) ... Swapfile and tempfiles (TEMP and TMP[0]) on controller2, spindle1, partitions 1 and 2 (WinSwap can also be used as a Linux swapfile, but that's another story) ... and last but not least, user data on controller 2, spindle 2, partition1 (odd day backups to partitions on the other three spindles, external backups on even days, off-site backup on Sunday).

                The OS isn't slowed down by the second drive (spindle) being accessed or written to for registry contents, and the swapfile and temp files are rarely called for by the OS at the same time. User data being on its own spindle just makes sense. The whole kludge separates the cluster-fuck that Windows insists on for its filesystem into four completely separate drives.

                It's ugly, but it works. My old installation of Win2K has never once crashed, lost data, or otherwise given me any file-system headaches in 20 years of near daily operation. (I've physically lost drives, but that's a hardware issue not a file system logic error ... and I've always been able to recover quickly with the above setup.)

                The old girl is airgapped, so fuhgeddaboudit.

                [0] On DOS one of the first things I did was set TMP=D:\TMP and TEMP to D:\TEMP ... most folks probably still don't know it, but Microsoft uses TEMP for user temporary files, and TMP for development temporary files. Pointing them at separate directories can save headaches occasionally. This includes "development" tools like Excel etc. Try it, you might like it.

  17. DaemonProcess
    Trollface

    Groan

    The GNU version of yacc is bison: a flawless open-source pun.

  18. Confused of Tadley

    YALs to the right, YALs to the left

    Every time I hear someone has solved a problem by inventing a new language, I groan: 'A YAL. Why is the solution a YAL?'

    Inventing yet another language may be fun, but by their nature most languages will never be complete and bug-free entities; they merely drive the need for a new YAL. Surely all the innovation time needed for a YAL could better help the community by improving a class library or compiler for an existing language.

    In the past I wrote a lot of COBOL (and also a lot in other languages) and remain convinced that language syntax is next to irrelevant in deciding how useful a language is. The magic is all in the translation of the human-written code into machine instructions. If only the industry could get that done well people might stop feeling the need for a YAL for each new problem domain.

  19. Anonymous Coward
    Trollface

    You'd think

    With all the article's descriptions of wondrous compilers and the commenters' descriptions of wondrous new languages and compilers, we wouldn't have so many wondrous new patches.

    1. jake Silver badge

      Re: You'd think

      If you're looking for a dearth of wonderful new patches, you'll have to start with releasing a plenitude of wonderful new programmers into the world.

      But we're no longer teaching that because programming should be easy and no programmer should be left behind. All Devs get participation trophies. Instead of yelling at devs for insisting on their right to fuck things up, they get unlimited do-overs. EVERY DEV IS EQUAL!!!!

      How about a nice round of Kumbaya? Fitting music, as we all converge into a large, bottomless pool consisting of the grey goo of sameness.

  20. Blackjack Silver badge

    Compilers are a blessing; manually checking your code is a huge chore and can be an outright nightmare if the thing is more than a thousand lines of code.

  21. Henry Wertz 1 Gold badge

    ROCm

    Yup, AMD ROCm for one takes full advantage of LLVM. From the user perspective you have three frontends: ROCm itself (an ATI-developed programming system for running data processing on the card), which I think is "deprecated" although some of their own utilities use it, and any end-user code written for it still works; a CUDA frontend, so CUDA code should run with it (recognising how much code is written for CUDA that was never going to be available for their cards, since nobody was going to port it to ROCm); and an (at least partial) OpenCL frontend. But all three feed into LLVM, so the user doesn't need to care. Good thing too...

    Based on the internals, it appears it uses LLVM to turn the code into intermediate form, and then a separate backend for each and every card it's going to spit out code for. The chips within a series are generally similar, but can handle more or fewer pieces of data per instruction, apparently requiring specific code (i.e. the same code won't run on the one that does data 64 at a time versus the one that does 128 at a time). Some series appear to use completely different architectures: there are mentions of VLIW (Very Long Instruction Word) instructions internally for one series while the next series doesn't mention VLIW at all. I.e. they appear to be using completely different instruction sets on their cards as they see fit, and letting LLVM take care of it.

    Mesa does this too; the shader compilers now use LLVM to optimize things before they spit out bytecode for whatever Intel, Nvidia, or ATI/AMD card (or, I suppose, ARM Mali or whatever else) the shaders are going to.

  22. Anonymous Coward
    Anonymous Coward

    Agitation seems mostly to miss the point!

    So....commentards here seem to get agitated about C:

    (1) my version of C was much better than the one you mentioned

    (2) K&R C is so 1970's......much better stuff came afterwards

    (3) UNIX is so 1970's......much better stuff came afterwards

    Well....each of these types of comment might actually be true.....

    .....and also quite beside the point!!

    EXAMPLE ONE

    In my limited (but recent) experience I've been writing processing-intensive applications for my own use.

    Often I've found that Python3 is a much better place to try out various algorithms (simpler code, easier to modify and improve).....

    .....followed by a re-implementation in C (modelled on the Python prototype), where the C re-write can be as much as 20 times faster.

    Not only that....but having a way of validating an algorithm in a simpler language makes writing the C version so much easier!

    EXAMPLE 2

    Where parsing is needed, another two step process has proved to be helpful.

    Write a yacc (or bison) description of the proposed parsing needed (but with no "action code" specified).

    Rework the description until there are no conflicts (or at least no reduce/reduce conflicts).

    Then write the final parser in C, structured per the language description in yacc (or bison).

    Once again, having a way of validating an algorithm in a simpler language makes writing the C version so much easier!

    In this case, you also avoid the maintenance effort of a traditional use of yacc, namely having to deal with both yacc AND C.
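
    As a minimal sketch of that final step, assume a tiny conflict-free grammar for sums and products (expr : expr '+' term | term ; term : term '*' NUM | NUM ;). The hand-written C parser mirrors each nonterminal with a function, with the left recursion becoming a loop:

      #include <ctype.h>
      #include <stdio.h>
      #include <stdlib.h>

      static const char *p;                /* input cursor */

      static int num(void) {               /* NUM */
          if (!isdigit((unsigned char)*p)) {
              fputs("syntax error\n", stderr);
              exit(1);
          }
          int v = 0;
          while (isdigit((unsigned char)*p))
              v = v * 10 + (*p++ - '0');
          return v;
      }

      static int term(void) {              /* term : term '*' NUM | NUM */
          int v = num();
          while (*p == '*') { p++; v *= num(); }
          return v;
      }

      static int expr(void) {              /* expr : expr '+' term | term */
          int v = term();
          while (*p == '+') { p++; v += term(); }
          return v;
      }

      int main(void) {
          p = "2+3*4";
          printf("%d\n", expr());          /* prints 14 */
          return 0;
      }

    Adding parentheses or further operators is then a purely mechanical extension of the same pattern, one function per nonterminal.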

    The point is simple: C is FAST in execution, but it may not be the best place to START implementing a design!

    1. Roland6 Silver badge

      Re: Agitation seems mostly to miss the point!

      >The point is simple: C is FAST in execution, but it may not be the best place to START implementing a design!

      That applies to all programming, which is why professional software development organisations will have adopted a Structured Design Methodology and toolset.

  23. Anonymous Coward
    Alien

    If your code is limited by its OS interactions, you should probably go write a kernel.

    So: if a language is a poor impedance-match for interfaces designed around C, then either you do not write programs which actually use those interfaces in the language, which means programs in the language cannot really do anything and the whole thing is just an academic toy; or you write an OS for your language, which is a great deal of work and will almost certainly see no use and die.

    In other words: don't ever consider languages which are poor impedance-matches for C. What a stupid, stupid thing to say.

    Instead, perhaps existing operating systems should grow interfaces which are well-specified without assuming C? Perhaps that is an option? Gosh.

    1. Roland6 Silver badge

      >if language has poor impedance-match with interfaces designed for C then you should either not write programs which actually use these interfaces in language

      This was a problem, and probably still is, when combining code from different languages, e.g. COBOL, Fortran and C. I suspect combining Rust and Swift with C, given their common heritage, doesn't present the same challenges.

  24. Anonymous Coward
    Anonymous Coward

    compilers are part of software supply chain

    There's something missing from this. No article talking about the future of compilers can afford to ignore security and auditability of the supply chain.

  25. Steve Channell
    Thumb Down

    Is El-Reg going down-market?

    "any fool" can barely understand the rules of grammar and operator precedence, writing an unambiguous language grammar is not trivial (even with ANTLR).

    'C' is a lower-level language than {Fortran, Cobol, Algol, PL/1, etc} because it has increment/decrement operators and pointer arithmetic - but that does not make it a glorified macro assembler. It might be a punchy line for an opinion piece in a tabloid, but it has as little relevance to 'C' programming as it does to macro assemblers.

    The case against 'C' is the lack of intrinsic bounds checking and automatic conversion; the case against C++ is complexity. ABI compatibility is a facet of its age and backward compatibility (e.g. name mangling), not some fundamental brain-fuzz - Rust would have the same issues if it were as old.

  26. Paul Hovnanian Silver badge
    Boffin

    "Not a language" debate

    If it doesn't allow multiple line statements with an explicit statement delimiter, it's not a language.

    1. Roland6 Silver badge

      Re: "Not a language" debate

      So COBOL and Fortran only became languages when they allowed the use of continuation punched cards?

    2. Loyal Commenter Silver badge

      Re: "Not a language" debate

      If it's Turing-complete, it's a language. If it's not, it might still be, if it can be made to do useful work. Move on.

