Any fool can write a language: It takes compilers to save the world

Here's a recipe for happiness. Don't get overexcited by the latest "C is not a language" kerfuffle. Proper coders have known since its inception that C is as much a glorified library of assembler macros as anything else. Don't sweat it. That business with operating systems being infected by their old C genes, crippling all the …

  1. fg_swe Silver badge

    Too Scary, Too Complicated

    As with every nontrivial technical endeavor, one should apply the KISS principle. Demonstrate your innovative idea in a way that is as straightforward as possible. You can always perfect it later.

    What does this mean?

    Don't use the gcc infrastructure, instead generate C or C++ code and let existing compilers do the optimization and architecture-specific work. Debugging a compiler that emits C++ is much easier than debugging ASTs and machine code. As an added benefit, your compiler will be able to run on almost any CPU, because almost any CPU has a C++ compiler.

    This is what I did here

    http://sappeur.ddnss.de/

    The innovative aspect is that this language is a memory safe C++ variant. All the goodies of C++ plus single- and multi-threaded memory safety, due to a type system which "knows" of single- and multi-threaded sections of the program.

    There exist Eiffel compilers which use a similar concept of emitting memory-safe C.
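The "emit C and let an existing compiler do the rest" approach described in the comment above can be illustrated with a toy translator. This is only a minimal sketch under stated assumptions, not Sappeur's actual implementation: the node types `Num`, `Var`, `BinOp` and the functions `emit_expr` and `emit_function` are hypothetical names invented for the illustration, and the "language" is just integer arithmetic.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Num:
    value: int


@dataclass
class Var:
    name: str


@dataclass
class BinOp:
    op: str            # one of "+", "-", "*"
    left: "Expr"
    right: "Expr"


Expr = Union[Num, Var, BinOp]


def emit_expr(e: Expr) -> str:
    """Render an expression tree as a fully parenthesised C expression."""
    if isinstance(e, Num):
        return str(e.value)
    if isinstance(e, Var):
        return e.name
    if isinstance(e, BinOp):
        if e.op not in ("+", "-", "*"):
            raise ValueError(f"unsupported operator: {e.op}")
        return f"({emit_expr(e.left)} {e.op} {emit_expr(e.right)})"
    raise TypeError(f"unknown node: {e!r}")


def emit_function(name: str, params: List[str], body: Expr) -> str:
    """Wrap an expression in a C function that takes and returns long."""
    args = ", ".join(f"long {p}" for p in params) or "void"
    return f"long {name}({args}) {{\n    return {emit_expr(body)};\n}}\n"


if __name__ == "__main__":
    # x * 2 + 1, emitted as C text; any C compiler can take it from here.
    tree = BinOp("+", BinOp("*", Var("x"), Num(2)), Num(1))
    print(emit_function("f", ["x"], tree))
```

The translator never touches registers or instruction selection; constant folding, inlining and target-specific work are all left to the C compiler that consumes the output.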

    1. fg_swe Silver badge

      GCC vs C++ / ELBRUS

      With the mentioned approach, the compiler can also run on ELBRUS, which does not have a gcc compiler. In fact, ELBRUS has a secret instruction set, but that does not stop me generating Sappeur programs for this CPU.

    2. TeeCee Gold badge
      Facepalm

      Re: Too Scary, Too Complicated

      So.

      You took an article based on "It doesn't matter which language you use, compiler technology is where it's at", with sound reasons why so. You turned that into an advert for your favourite niche language and how you never think about compilers anyway.

      Just one question: Exactly how far above your head was the entire point of the sodding article when it passed you by?

      1. Anonymous Coward

        Re: Too Scary, Too Complicated

        I haven't worked out if it's the same language that was last updated on Sourceforge ten years ago, or a later version, or a completely different language, but the website has a commercial licence for it, so all these messages could be classified as spam.

        1. fg_swe Silver badge

          So ?

          I am not interested in "own nothing and be happy".

          1. Anonymous Coward

            Re: So ?

            As it is a commercial product, I'm sure El Reg would be happy to agree terms for advertising or a sponsored article, because judging by the downvotes I'm not the only one who finds that every C/C++ article getting derailed by tenuous segues into this language is spoiling their enjoyment of this esteemed organ.

    3. Cederic Silver badge

      Re: Too Scary, Too Complicated

      Take your nice clean code from your language of choice and translate it into an archaic language just so that you can then compile that other language?

      I fail to see how this adds value to the process. To benefit from the strengths of the intermediary language, the intermediary mangling must perform a level of optimisation and restructuring of your code.

      In other words, be a compiler. You're recommending we use a compiler so that we can use a compiler.

      It feels rather simpler to me to follow the rather more traditional approach, which lets you more quickly and easily benefit from existing compiler technology.

      1. fg_swe Silver badge

        Re: Too Scary, Too Complicated

        Sappeur is adding value by making your C++ code memory safe. Including multi-threaded code !

        And it is indeed a compiler which performs type checking and has a very special type system. It just does not have an optimizer and a machine code generator of its own.

        Compared to Java, Sappeur programs are lean and efficient. For example, you can write small Unix command line utilities with Sappeur. You cannot do this in Java, because of the excessive startup time of Java programs.

        1. Cederic Silver badge

          Re: Too Scary, Too Complicated

          Now you've completely lost me. Random tool makes code I haven't written memory safe by translating my memory safe language into a language renowned for memory management bugs?

          No.

          1. Yet Another Anonymous coward Silver badge

            Re: Too Scary, Too Complicated

            Your language is only memory safe because it checks for access.

            If your random tool translates that into C with checks for access, the C compiler then compiles this into machine code with checks for access.
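To make the point above concrete, here is a small sketch of how a translator might carry an access check through into the C it emits. It assumes the simplest possible scheme, a run-time bounds check on array reads; it is not a description of how Sappeur or any particular language does it (fg_swe describes a type-system approach instead). `RUNTIME_PRELUDE`, `checked_load` and `emit_index` are hypothetical names.

```python
# C helper emitted once at the top of every generated file (hypothetical).
RUNTIME_PRELUDE = """\
#include <stddef.h>
#include <stdlib.h>

static inline long checked_load(const long *a, size_t len, size_t i) {
    if (i >= len) abort();  /* out-of-bounds read stops the program */
    return a[i];
}
"""


def emit_index(array: str, length: str, index: str) -> str:
    """Emit C for `array[index]` that goes through the checked helper.

    The index is cast to size_t, so a negative value becomes a huge
    unsigned number and fails the same `i >= len` test.
    """
    return f"checked_load({array}, {length}, (size_t)({index}))"


if __name__ == "__main__":
    print(RUNTIME_PRELUDE)
    print("long v = " + emit_index("a", "a_len", "i + 1") + ";")
```

Because the check is ordinary C, the downstream compiler sees it like any other branch and is free to hoist or remove it where it can prove the index is in range.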

            1. Loyal Commenter Silver badge
              Facepalm

              Re: Too Scary, Too Complicated

              If you take code that might or might not be thread safe, and translate it into another language, with added memory-safety checks around everything that might or might not have needed them, you have achieved exactly three things:

              1) Given the compiler's optimiser a hell of a lot of work to do, because that translation is sure to have incurred costs. Hopefully though, the compiler is going to be smart enough to undo most of the damage in the optimisation phase and fold away unnecessary loops and so on.

              2) Made your code unmaintainable, because I'll bet that even if it has carried comments through, they're now either meaningless, or misleading.

              3) Made your code non-performant. Thread safety checks where they are not needed are going to be achieved by some sort of locking, because sooner or later, all multithreading comes down to locking, and exclusive code paths. The art of writing multithreaded code that is both performant and thread-safe comes from a hell of a lot of detailed knowledge and experience, not from running your code through a magic tool. The only way such a tool is going to achieve this is either by locking stuff up everywhere to enforce critical paths, or by actually understanding what you meant your code to do. Since the latter implies that you have managed to single-handedly solve the "hard AI problem", I'm going to go ahead and assume the former. Ever heard of deadlocks? Because I'm willing to bet anyone using your "tool" will quickly enough.
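The deadlock worry raised above is easy to show in miniature. The sketch below is only an illustration of the general hazard and one common mitigation (always acquiring locks in a fixed global order); it says nothing about what any particular tool emits. `Account` and `transfer` are hypothetical names.

```python
import threading


class Account:
    def __init__(self, ident: int, balance: int) -> None:
        self.ident = ident
        self.balance = balance
        self.lock = threading.Lock()


def transfer(src: Account, dst: Account, amount: int) -> None:
    """Move `amount` from src to dst while holding both account locks.

    If each call locked src first and dst second, two opposite transfers
    could each hold one lock and wait forever for the other. Locking in
    ascending `ident` order rules that interleaving out.
    """
    if src is dst:
        return  # nothing to move, and a plain Lock must not be taken twice
    first, second = sorted((src, dst), key=lambda acct: acct.ident)
    with first.lock:
        with second.lock:
            src.balance -= amount
            dst.balance += amount


if __name__ == "__main__":
    a, b = Account(1, 100), Account(2, 100)
    t1 = threading.Thread(target=transfer, args=(a, b, 10))
    t2 = threading.Thread(target=transfer, args=(b, a, 10))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    print(a.balance, b.balance)
```

The ordering rule is exactly the kind of program-wide knowledge the comment argues a mechanical "lock everything" translation would not have.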

          2. fg_swe Silver badge

            Explanation

            The Sappeur language defines a type system which enforces memory safety. The compiler parses the Sappeur code and checks the rules of the type system. If all is good, then memory-safe C++ code will be generated. Just try it out and see for yourself.

    4. drankinatty

      Re: Too Scary, Too Complicated

      "Don't use the gcc infrastructure, instead generate C or C++ code and let existing compilers do the optimization and architecture-specific work."

      That is a novel concept, but how is that different from simply adding one additional layer of abstraction to the compiler process? Where existing C/C++ compilers translate what you have written into assembly, you want an additional layer that translates from whatever language you have to C++.

      I see the benefit: you buy yourself portability in the compilation process. But wouldn't the additional layer make optimization less reliable, depending on the choices made by the new compiler's translation of language "A" into C++?

      That's a tough one: the more abstract C++ becomes in C++20/23, the harder it gets to write the set of choices needed to make the translation optimize well.

      1. fg_swe Silver badge

        Sappeur vs C++

        Conceptually, Sappeur is very similar to C++, except for the unsafe stuff such as unsafe casts, pointer arithmetic and so on. The good things such as stack allocation, RAII, destructors, ARC smart pointers, complex aggregate data structures are preserved.

        Transforming Sappeur code to C++ is straightforward.

        Please see page 7 of the following for an example:

        http://sappeur.ddnss.de/SAPPEUR.pdf

        1. Anonymous Coward
          Terminator

          Re: Sappeur vs C++

          Conceptually, Sappeur is very similar to C++

          So what you mean is 'my language which is very much like C++ can be compiled well to C'. Gosh, that is a really big surprise. To no one.

          1. fg_swe Silver badge

            No

            Sappeur programs are transformed into *memory safe* C++.

            It's a simple predecessor of Rust, actually.

      2. Anonymous Coward
        Alien

        Re: Too Scary, Too Complicated

        That is a novel concept

        It is very much not novel. Kyoto Common Lisp compiled to C in 1984, 38 years ago. Certainly it was not the first language implementation to do this. Descendants of KCL still exist today: at least GCL and ECL. It is significant that they have not driven out, or even made any real impact on, implementations with native compilers, which perform much better.

        1. Someone Else Silver badge
          Holmes

          Re: Too Scary, Too Complicated

          Originally, C++ compiled to C; it was called Cfront, appeared in 1983 (from predecessor work that started in 1979), and you can read about it in Wikipedia here.

          1. Mage Silver badge
            Coat

            Re: Originally, C++ compiled to C

            To give fast access to the C++ language in the 1980s. Glockenspiel C++ used back-end C compilers, such as MS C. But it was only a stepping stone.

            The other approach was used in the UCSD p-System: an ideal instruction set for the language, and then a virtual machine to execute the p-code. It was later used for Visual Basic, Java, MS J++ (which became C#) and also Android's VM, which uses Java-like source. Maybe some versions of Forth too.

            GCC is a good idea. Having a Pre-processor that emits C, C++, Java or Javascript is fine to prototype and test a new language before porting it to either GCC or a compiler for a VM.

            JAL and some BASIC for PIC micro use an intermediate code like the idea of p-code, but a final pass creates the machine code. A little like GCC.
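The p-code idea mentioned in the comment above (an instruction set designed for the language, plus a virtual machine to execute it) can be sketched as a tiny stack machine. This is a toy under stated assumptions, not the UCSD p-System instruction set: the opcodes `push`, `add`, `sub`, `mul` and the function `run` are invented for the illustration.

```python
from typing import List, Tuple

# An instruction is ("push", n) or a one-element tuple such as ("add",).
Instr = Tuple


def run(program: List[Instr]) -> int:
    """Execute a list of stack-machine instructions and return the result."""
    stack: List[int] = []
    for instr in program:
        op = instr[0]
        if op == "push":
            stack.append(instr[1])
        elif op in ("add", "sub", "mul"):
            right = stack.pop()   # operands come off in reverse order
            left = stack.pop()
            if op == "add":
                stack.append(left + right)
            elif op == "sub":
                stack.append(left - right)
            else:
                stack.append(left * right)
        else:
            raise ValueError(f"unknown opcode: {op}")
    if len(stack) != 1:
        raise ValueError("program must leave exactly one value on the stack")
    return stack[0]


if __name__ == "__main__":
    # (2 + 3) * 4 in postfix form
    program = [("push", 2), ("push", 3), ("add",), ("push", 4), ("mul",)]
    print(run(program))
```

A front end only has to emit this instruction list; porting to a new machine means porting the small `run` loop rather than the whole compiler, at the cost of interpretation overhead.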

      3. Marco van de Voort

        Re: Too Scary, Too Complicated

        There are more issues: anything that defies the procedural regime (e.g. home-made exceptions, or advanced forms of nested functions such as those using displays) is hard.

        Generating your own debug information (and writing your own debugger to interpret it): hard.

        The problem is that if you choose the transpiler way, you will forever remain bound to transpiler (read: target language) limitations.

    5. Peter D

      Re: Too Scary, Too Complicated

      What you are describing is the equivalent of old-fashioned cfront. When C++ came out there were no native compilers, so cfront emitted C and the C was compiled. It was an unholy mess. For example, when templates came out, cfront would create a .pty directory and emit often hundreds of C source files to speculatively represent each possible specialisation of the parameterised types. LLVM is a vast improvement on that approach.

    6. JulieM Silver badge

      Re: Too Scary, Too Complicated

      I can only seem to find pre-compiled versions there.

      1. fg_swe Silver badge

        Yes

        I am not giving away the source, as I no longer believe in giving away work for free.

        1. JulieM Silver badge

          Re: Yes

          Fair enough. There are plenty of other people out there who do want my business.

        2. Loyal Commenter Silver badge

          Re: Yes

          Just in free advertising, eh?

  2. fg_swe Silver badge

    KISS 2: Generic Code

    Some people say you need a dedicated generics mechanism, similar to C++ templates.

    This is not true. A proper macro processor such as m4 can (for all practical purposes) do the same. Debugging is much easier, as the "instantiated" code can/will exist in a file to be inspected with a standard editor. No insane, cryptic C++ template error messages.

    Not my invention; I saw it when I worked for Dassault on CATIA.
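The macro-processor style of generics described above boils down to textual instantiation: substitute a type name into a template and write the result to a file that can be read and debugged like any other source. The sketch below uses Python's `string.Template` purely as a stand-in for m4 (it is not m4 syntax); the template text, `STACK_TEMPLATE` and `instantiate_stack` are hypothetical.

```python
from string import Template

# A "generic" fixed-capacity stack, written once as C text with placeholders.
STACK_TEMPLATE = Template("""\
typedef struct {
    $T items[$N];
    int count;
} ${NAME};

static int ${NAME}_push(${NAME} *s, $T value) {
    if (s->count >= $N) return 0;  /* stack is full */
    s->items[s->count++] = value;
    return 1;
}
""")


def instantiate_stack(name: str, elem_type: str, capacity: int) -> str:
    """Return C source for a stack of `elem_type` with the given capacity."""
    return STACK_TEMPLATE.substitute(NAME=name, T=elem_type, N=capacity)


if __name__ == "__main__":
    # The instantiated code is plain C that can be saved and inspected.
    print(instantiate_stack("IntStack", "int", 16))
    print(instantiate_stack("DoubleStack", "double", 8))
```

The trade-off raised in the replies still applies: the substitution is purely textual, so nothing checks that the element type makes sense until the C compiler sees the generated file.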

    1. Arthur the cat Silver badge

      Re: KISS 2: Generic Code

      To mangle an old trope:

      You have a problem. You decide to use m4. You now have two problems and a maintenance nightmare that will prematurely age you.

      [Speaking from experience. You may know what it's doing when you write it but the moment someone else touches it (including you 6 months later) you're in for a world of pain.]

      1. fg_swe Silver badge

        Re: KISS 2: Generic Code

        m4 worked nicely for me, but I never dared to use all of its features.

        1. bombastic bob Silver badge
          Meh

          Re: KISS 2: Generic Code

          M4 - I just deal with it for autoconf (etc.) and that's about it

          Let's not complicate things TOO much ok?

        2. Alan Brown Silver badge

          Re: KISS 2: Generic Code

          This is exactly the problem with m4, you dare not use all its "features" - doing so is very likely to unleash an eldritch horror

          But what do I know? I always handcoded my sendmail.cf, because I could and m4 usually didn't do what I wanted

          1. jake Silver badge

            Re: KISS 2: Generic Code

            Little known fact: Sendmail's configuration language is Turing Complete.

            So naturally, I had to write a C compiler in it, just to prove to myself that I could.

            It's slow, ugly, cantankerous, and has few polished edges ... but it works.

            I'm sure you'll pardon me for never updating it past C89 ...

            1. Arthur the cat Silver badge
              Pint

              Re: KISS 2: Generic Code

              So naturally, I had to write a C compiler in it, just to prove to myself that I could.

              OK, that's perversion above and beyond the call of insanity(*). A major tip of the hat and enough alcohol so you never do it again.

              (*) I've only written a calculator in it. I'm bloody jealous.

            2. JulieM Silver badge

              Re: KISS 2: Generic Code

              It's my conjecture that any sufficiently-sophisticated IT project ends up implementing a Turing-complete interpreter. (PHP is an example of one such that escaped and became feral.)

              One day I'll write a SPICE deck for that microprocessor I designed, and see if I can implement an interpreter on it .....

            3. John Smith 19 Gold badge
              Coat

              "Sendmail's configuration language is Turing Complete."

              Wow.

              It's like "Free development language with every utility. "

    2. Mage Silver badge
      Devil

      Re: KISS 2: Generic Code

      Macros are evil. Unexpected side effects. Great for Assemblers, daft for a new high level language.

  3. Marco van de Voort

    I miss a critical note and some figures.

    I totally miss any form of criticism in the article. It is just an article telling a baseless glory story. No analysis of how it actually works out for non-corporate third-party users. No numbers, not even names of successful independent frontends and how they do/did it. It could have been an old glory story from the days of the GIMPLE introduction that got polished up a little.

    I agree with fg_swe. Actually, in recent years we saw a decline in attempts to use the backends directly, and a move to the last-resort C backend (which often means giving up hope of a speedy compile-run-debug cycle). Only corporates with vast manpower can afford to tangle with these beasts.

    On GCC, the major version transitions are notorious and break your frontend all the time, and you can only get an old version into a Linux distro for so long. LLVM has so much undocumented behaviour (other than "look how the C/C++ frontend does it") that you don't even get that far. Both teams are notorious for ignoring bug reports and merge requests for issues that don't touch the dominant frontend(s) and/or their corporate sponsors, even for multiple major cycles. So basically, if you run into a problem you are fscked.

    Performance issues are often papered over with parallel compilation that is also not that easy with a new frontend.

    1. Alan Brown Silver badge

      Re: I miss a critical note and some figures.

      Speaking from experience with the commercial rivals for GCC/LLVM, they aren't much better and come with an added dash of long-term abandonment

      1. Marco van de Voort

        Re: I miss a critical note and some figures.

        Just curious: which ones allow direct access to their internal structure ? (direct rather than the source way)

    2. Torben Mogensen

      Re: I miss a critical note and some figures.

      Web Assembly (WASM) is better defined than GCC and LLVM (it even has a formal specification), and it is simpler too, so it makes an easier target language than GCC or LLVM intermediate representations. Unfortunately, WASM is intended for running in browsers, so it is sandboxed and requires an external program to handle i/o and everything else you would normally get from an OS, so it is not suited for all purposes. Also, the existing implementations of WASM are not super fast.

      Speaking of browsers, several languages use JavaScript as target language, allowing them to run in browsers. So much effort has been put into making the travesty called JavaScript run fast in browsers that these languages can have competitive performance. But since JavaScript is very ill defined, this is not a good solution.

      The article also fails to mention JVM and .NET as compiler targets. While farther from silicon than both the GCC and LLVM intermediate languages, they can provide decent performance and (more importantly) access to large libraries (which, unfortunately, in both cases require you to support the object models of Java and C#, respectively).

      And while I agree that you need parallelism for compute-intensive applications, not all applications are that, so there is still plenty of room for languages and compilers that do not support parallelism.

  4. Pascal Monett Silver badge
    Trollface

    "Any fool can write a language"

    Many have.

    1. fg_swe Silver badge

      Indeed

      We should never have touched fire, and continuing to live in the trees would be like paradise. Except when the yellow-black devil cat shows up...

    2. Version 1.0 Silver badge
      Happy

      Re: "Any fool can write a language"

      I started using assembler and then moved to FORTRAN in a new job, and since then I've had tasks that have involved writing in many new languages to fix problems for years now. Essentially when a new language appears, if I have to use it then I will ... I've always laughed at the old joke; "A FORTRAN programmer can write FORTRAN programs in any language" but LOL, it's a fact.

      1. Arthur the cat Silver badge

        Re: "Any fool can write a language"

        I've always laughed at the old joke; "A FORTRAN programmer can write FORTRAN programs in any language" but LOL, it's a fact.

        Oh god yes. I've seen FORTRAN in Lisp:

        (BEGIN

        (SETQ X ...)

        (SETQ Y ...)

        ...

        (RETURN ...))

        [And why does code markup not keep indents but add blank lines?]

        1. Someone Else Silver badge

          Re: "Any fool can write a language"

          [And why does code markup not keep indents but add blank lines?]

          This.

      2. Stumpy

        Re: "Any fool can write a language"

        In a similar vein, I once had to unpick a C program written by an ex-COBOL programmer.

        Christ, that was a nightmare. Around 1500 lines of COBOL-like C ... all in a single main() function.

        1. Alan Brown Silver badge

          Re: "Any fool can write a language"

          Trust me, it's easier to debug C written by a COBOL programmer than the inverse....

          1. jake Silver badge

            Re: "Any fool can write a language"

            Only because most COBOL programmers have been coding a lot longer than most C programmers. One tends to learn some things along the way.

            This used to be true ... these days, I'm not so sure anymore.

    3. Ozan

      Re: "Any fool can write a language"

      I am one of many. I should not be allowed to touch languages.

      1. jake Silver badge
        Pint

        Re: "Any fool can write a language"

        Don't be silly. Of course you should. Look at how much you learned in the attempt!

      2. John Smith 19 Gold badge
        Coat

        "I am one of many. I should not be allowed to touch languages."

        A person who knows the limits of their own competence.

        This is very wise, and very rare.
