back to article In Rust we trust: Shoring up Apache, ISRG ditches C, turns to wunderkind lang for new TLS crypto module

At almost 26 years old, the Apache HTTP Server, known as httpd, has a memory problem: it is written in C, a language known among other things for its lack of memory safety. C requires programmers to pretty much manage computer memory themselves, which they don't always do very well. And poor memory management can lead to …

  1. Tom 7

    So they going to write a Rust asp module for Apache too?

    Just wonderin.

  2. Pascal Monett Silver badge

    I knew it

    "We currently live in a world where deploying a few million lines of C code on a network edge to handle requests is standard practice, despite all of the evidence we have that such behavior is unsafe "

    I knew it. This is hacker heaven waiting to happen.

    1. Version 1.0 Silver badge
      Joke

      FTFY

      "We currently live in a world where deploying a few million lines of code on a network to handle requests is standard practice, despite all of the evidence we have that such behavior is unsafe "

      1. bombastic bob Silver badge
        Pint

        Re: FTFY

        nice use of the joke icon to make a point. I wholeheartedly agree!

        /me points out that since that "few million lines of code" is relatively STABLE and WELL TESTED by time, there's really no need to play the "Arthur C. Clarke's 'Superiority'" gambit only to end up the loser and ALSO not knowing how it happened...

        1. Snake Silver badge

          Re: FTFY

          "STABLE and WELL TESTED"

          Ah. You mean, like the code found in hundreds of modules, for example Libcrypt just this week??

          What exactly *is* "stable and WELL TESTED"? Does "Well tested" equate to "No security holes found. Yet"?

          So, when will we learn that just because it's old and known, doesn't mean it's safe in today's modern world???

          1. Roland6 Silver badge

            Re: FTFY

            >So, when will we learn that just because it's old and known, doesn't mean it's safe in today's modern world???

            Be interesting to see what they say about code written in Rust in 20 years...

            1. Anonymous Coward
              Anonymous Coward

              Re: FTFY

              Except for the fact that :

              1. Rust's memory safety has been proven (https://plv.mpi-sws.org/rustbelt/) (technically a large subset, but big enough to guarantee safety for most use cases)

              2. A car that is 20 years old and respects the safety standards of 20 years ago is obviously better than a 40-year-old car, made back when there wasn't really any safety standard at all. So even if, in 20 years, we discovered a critical flaw in Rust, it's better to already have a lot of (security) ground covered and have to transition from Rust to a 20-years-in-the-future language, rather than from a language that will be almost 70 years old in that hypothetical future 20 years from now (and we all know how we treat Algol... although tbf that's for different reasons)

              1. Version 1.0 Silver badge

                Re: FTFY

                I agree, but this means that new code is going to be written - airplanes have been flying safely for about 80 years now, but building a new 737 MAX illustrates that moving a design into a new area can have issues. Sure, Rust is safe, but it's new code; that doesn't mean there will be no issues.

                1. StrangerHereMyself Silver badge

                  Re: FTFY

                  The 737 MAX became unsafe BECAUSE they added a computer to it.

                  The code was rubbish too, and so was the design. Written by Indian programmers who got $5 an hour IIRC.

                  1. Anonymous Coward
                    Anonymous Coward

                    Re: FTFY

                    The MAX became lethal because of the computer they added to it. But that wasn't the programmers' fault.

                    Apparently the developers contracted by Boeing did question the design of the system that Boeing had sent them. They were so concerned in fact that they queried it with Boeing, "Are you really sure?" Boeing replied, more or less, "Of course we are you idiots, get on with it". They took care to preserve this email exchange, possibly the most valuable piece of email archival work ever done.

                    The code was actually perfectly OK in the sense that it met the design spec. Boeing screwed up the overall system design and couldn't shift the blame.

                    Another aspect to consider is the safety of the 737 in general. Whilst aside from the MAX it's got a fairly good record, it's a 1960s fuselage design. They're not as crash worthy as a more modern design, the regulatory standards having been updated a couple of times since. 737s will fall to pieces in situations where more modern designs will remain intact. The BA 777 that landed short and hard at Heathrow was remarkable in the damage it took but, staying intact, resulted in I think nothing worse than a single broken leg. A 737 in the same situation would likely have come to pieces, causing a lot of deaths.

                    There have also been other 737 crashes that were blamed on pilots at the time, but are being reconsidered now that we know more about Boeing's approach to engineering and its capturing of regulatory processes.

                    Boeing have been desperate to avoid making changes to the 737 that would mean it diverged too far from the original design, because then they'd have been forced to recertify it from scratch and meet the latest crashworthiness standards. That in turn would force a total redesign and also full conversion training for the pilots. That training is very expensive and makes it unappealing to existing 737 operators, who might buy Airbus instead.

              2. Anonymous Coward
                Anonymous Coward

                Re: FTFY

                1. Rust's memory safety has been proven (https://plv.mpi-sws.org/rustbelt/) (technically a large subset, but big enough to guarantee safety for most use cases)

                So the safety guarantee has corroded by the end of the sentence.

                1. Anonymous Coward
                  Anonymous Coward

                  Re: FTFY

                  Did you even read the abstract or the conclusion of the paper?

                  Sure, it's not the *entirety* of the language, but again, a very large subset of it is completely safe, including `unsafe` blocks (because the safe parts are already, you know, safe). There are always ways to completely mess up memory in almost every (general-purpose/non-esoteric) language; it's just that some make it really easy whereas others make it way harder and almost require intentional sabotage

          2. needmorehare
            Joke

            I'm sick of this...

            Let's just port FreeDOS to Xen and embrace a world of single-process VMs load-balanced by Cloudflare!

        2. dajames

          STABLE and WELL TESTED

          It's a dilemma that we all face. We have a piece of code that was written years ago when software development tended to be a less rigorous process than it can be (but usually isn't) today. The code mostly works and doesn't normally fall over, but it can misbehave in extreme circumstances (such as when it is given deliberately incorrect inputs).

          Do we patch it, and continue to benefit from the large body of mostly-good code but also continue to live with the possibility that it may contain still more flaky parts ... or do we take a chance on rewriting it from scratch which will enable us to benefit from modern programming languages and techniques but possibly take a lot of iterations to get right (particularly if the original software was not well documented and not written to a well-defined set of interfaces)?

          The green-field approach is seductive, but to adopt it is courageous indeed.

      2. Adam Inistrator

        Re: FTFY

        cue up the hilarious youtube rust satire "ex rust dev interview leaked"

  3. martinusher Silver badge

    But///but...this is routine programming

    Don't blame the language for the failings of programmers. Memory pool management is an important part of programming service libraries and operating system components. 'C' itself does allow you to take liberties with allocation and freeing, but only if you don't test your code with a wrapper that provides diagnostic and statistical information; the implication being that code which uses the basic functions has either been extensively tested or is so simple that it really can't fail.

    My guess as to what's gone wrong isn't 'malloc' as such but 'new'. This can create quite large objects either from pool memory or on the stack, and most programmers are so schooled in 'I've got a huge computer' that they rarely stop to think about the impact they're having on the system. Since these objects are often shared between threads and they're invariably part of a chain (linked list) there's plenty of scope for an error to cause the entire edifice to come crashing down. Rewriting the component in Rust will probably help, but it's probably the 'rewrite' bit that will make the difference rather than the 'Rust'.

    1. guyr

      Re: But///but...this is routine programming

      "My guess as to what's gone wrong isn't 'malloc' as such but 'new'. This can create quite large objects either from pool memory or on the stack"

      "new" does not allocate objects on the stack. But the larger point is that while I haven't looked at the httpd code recently, most code written in the last 10-15 years doesn't use "new" directly, but instead relies on libraries to manage dynamically allocated memory. The problem of memory not being freed has been well-known for decades, so projects of any significant size use libraries to ensure that doesn't happen.

      1. Steve Channell
        Flame

        slow news day?

        they're porting a Mozilla TLS library to the Apache httpd mod api and exporting a C api. The fact that the Mozilla library is implemented using Rust seems pretty incidental.

        "In Rust we trust" is catchy, but given that all shared buffer management will continue to be done in C it's not justified. Ironically C++ shared_ptr<> and other smart pointers can't be used with this mod without compiling httpd using CLang

    2. Anonymous Coward
      Anonymous Coward

      Re: But///but...this is routine programming

      only if you don't test your code with a wrapper that provides diagnostic and statistical information,

      Yes, the languages that are touted as protecting lazy programmers from themselves have this built-in so it runs all the time, with the corresponding performance hit. Other languages make it optional so that it can be used when testing, but isn't there when not needed.

      Even back in the 1970s you could compile FORTRAN programs with array bounds checking during the test/debug cycle.

      1. Phil Lord

        Re: But///but...this is routine programming

        Rust does neither. You can access arrays either safely (with bounds checking) or unsafely (without). The unsafe portions are marked syntactically and semantically and clearly identifiable as such.

        In actual use it favours richly functional iterators. These fulfil 95% of the use cases where C would use array access. This high-level functionality is bounds-safe. The underlying low-level code uses non-bounds-checked array access, and so is unsafe; as a result it has been heavily checked and audited.

        Result: most Rust code cannot index an array out of bounds, and this is achieved without runtime bounds checking. It's a key example of a Rust high-level, zero-cost abstraction.
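
        To make that concrete, here's a trivial sketch (a made-up sum over a fixed array, nothing to do with httpd):

        fn main() {
            let samples = [3, 1, 4, 1, 5, 9];

            // Indexed access is bounds-checked at runtime: samples[i] panics
            // rather than reading out of bounds if i is ever too large.
            let mut total = 0;
            for i in 0..samples.len() {
                total += samples[i];
            }

            // Idiomatic Rust reaches for the iterator instead, so there is no
            // index to get wrong and no per-element bounds check to pay for.
            let total_iter: i32 = samples.iter().sum();

            assert_eq!(total, total_iter);
            println!("{}", total_iter);
        }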

    3. DrXym

      Re: But///but...this is routine programming

      "Don't blame the language for the failings of programmers."

      Yes, actually blame the language. Programmers ARE fallible and if the language allows them to make stupid mistakes then they WILL make stupid mistakes. Those mistakes will make it into production and cause bugs, crashes and exploits.

      Even if you mitigate, through training, coding guidelines and code reviews, all the things you can do in C (or C++) but shouldn't, you will STILL have bugs caused by the language.

      "it's probably the 'rewrite' bit that will make the difference rather than the 'Rust'."

      If you're going to rewrite then why would you choose a language that opens you up to all kinds of bugs and flaws that have nothing to do with the problem you're trying to solve?

      1. Anonymous Coward
        Anonymous Coward

        Re: But///but...this is routine programming

        "Yes, actually blame the language." - I guess you've never written anything in assembler?

        1. DrXym

          Re: But///but...this is routine programming

          Yes I have - Zilog Z80A & Motorola 68000 assembly language a very long time ago. C is a step up but that doesn't mean I'm some kind of technological Amish who draws a line and says no further if there is a better way to do something.

    4. AVee

      Re: But///but...this is routine programming

      Don't blame the language for the failings of programmers.

      Don't blame the car for the failings of the driver! Ban airbags and seat-belts!

    5. The Man Who Fell To Earth Silver badge
      Boffin

      Re: But///but...this is routine programming

      While I agree that programmers are responsible for writing safe code, if the compiler can add guard rails that will force the writing of safer code (or automatically add safety overhead checks to the binary) to get through compilation, that should be taken advantage of wherever possible.

    6. AndrueC Silver badge
      Boffin

      Re: But///but...this is routine programming

      What's wrong is

      1) Programmers not using new/dispose when it's available to them. (You ought to write your own wrappers for C.)

      2) C++ programmers defaulting to the heap for storage instead of the stack, statics or fields. The heap should be your storage of last resort.

      3) C++ programmers not using RAII and smart pointers for those situations where the heap is the only reasonable choice.

      4) Not enough use being made of copy ctors (especially private ones).

      5) Not enough use being made of references, especially const references.

      6) Not enough use being made of const full stop.

      7) Programmers using a language that requires them to choose whether or not to follow the above rules (and others) in order to write safe code.

      I loved programming in C++ but it requires too much knowledge and too much care and attention. I prefer C# these days. That too has its issues but at least you have to try (or be deliberately stupid) to write dangerous code.

    7. Anonymous Coward
      Anonymous Coward

      Re: But///but...this is routine programming

      operator new is from C++ not C

      // these obtain memory from global_arena_allocate()

      // 1) called as: new Arena
      static void* operator new(size_t nbytes) throw();

      // 2) ? - returns memory unmodified, as everyone seems to
      static void* operator new(size_t nbytes, void* locality_hint) throw();

      // 3) called as: new Arena[count]
      static void* operator new[](size_t count, size_t nbytes) throw();

      The rest of what you wrote is equally incorrect.

      Using placement new or an explicit allocator it is indeed possible to use stack memory, but by default operator new in C++ will back onto malloc from the platform libc.

      The rewrite hopefully won't be undertaken by people who cannot understand that threads don't share stacks; a thread is little more than an independent set of registers and its own stack.

      TL;DR C and C++ are not the same language. You don't have "memory pools" in C or in C++.

      You have raw untyped memory of a given alignment and size.

      But Rust is the issue, not the howlers "informing" the problem with C, er, pseudocode with curly braces.

  4. jvf

    Real problem mentioned first

    “C requires programmers to pretty much manage computer memory themselves, which they don't always do very well. “ So, what you’re really saying is that it takes skill to write code. Twentysomethings and script kiddies please exit left.

    1. Mark #255
      Coat

      Re: Real problem mentioned first

      A bad worker blames their tools.

      A good worker decides that

      (a) their hammer has a loose handle, and needs throwing away; and

      (b) the correct tool to use is actually a screwdriver

      My coat is the one with the metaphor-torturer in the pocket.

      1. John Miles

        Re: Real problem mentioned first

        I think a better analogy is:

        A good worker realises a sharp knife is dangerous, so extreme care is needed, but they also know that trying to maintain the level of vigilance required for long periods/large projects is hard or impossible, so they start looking at whether there is a more appropriate tool for the job at hand. Once found, they will put the knife back in the drawer for the times they need the fine cutting ability not possible with the tool at hand (which will likely reduce as they learn the new tool). They also understand the fancy new tool isn't foolproof: it just stops you cutting yourself on the blade, it doesn't stop you dropping it on your foot.

        1. Roland6 Silver badge

          Re: Real problem mentioned first

          >A good worker realises a sharp knife is dangerous so extreme care is needed...

          I've always promised myself to actually get my Grandfather's cut-throat razor out of the drawer and learn how to use it for the job it was originally intended for; however, for everyday purposes an electric razor does just fine...

          A concern with Rust has to be that people not only get used to Rust's security but start to write code that depends on those features being present, to the extent that they would be dangerous if they ever attempted to write code that did not have bounds checking, for example; but that is part of progress...

          1. Anonymous Coward
            Anonymous Coward

            Re: Real problem mentioned first

            A straight razor is pretty easy provided you remember it's a wet shave, so you need to keep adding fresh moisture, as shaving yourself takes a bit longer than having a barber do it.

            I've been using one for years, still can't sharpen or strop the blades properly without ruining the edge.

            So I end up using a shavette, which is a disposable razor blade in a straight-edge razor, like the barbers would use.

            A sharp razor requires much less pressure so you don't cut yourself; it takes longer, but gives a much better finish than a safety razor.

            A programming language has an ecosystem of users, vendors, tools, and literature.

            What is progress about rewriting the code in a different font?

            Software needs to be reliable as a systemic whole; the idea that a "perfect" programming language exists, or would even help in that journey, has never been proved, due to the obvious flaw.

            There is a limit on how much difference your software can make to the problem as a whole, so, by all means use Rust, or any other language of your choice.

            Designing the system into simple, stupid code that runs in supervision trees will give you far more robust systems, with much more fault tolerance, than the "Beware of bugs in the above code; I have only proved it correct, not tried it" assertion that Rust is being sold on.

          2. ssokolow

            Re: Real problem mentioned first

            From what I've seen in various discussions, Rust seems to act like training wheels for idiomatic C++ for many people who use both, because it enforces architectural decisions considered good modern C++ practice using error messages and, as you become habituated to not triggering the error messages in the first place, those habits transfer over to C++.

            It's just a bunch of anecdotes, so I can't say if it has any statistical significance, but that's what I've seen.

    2. Anonymous Coward
      Anonymous Coward

      "what you’re really saying is that it takes skill to write code."

      No, it means C requires you to write a lot of potentially unsafe code to handle basic tasks like string management. If C hadn't been written in a time when punched cards and line printers were all the I/O you had, it would have been a better language from the beginning.

      "It takes skills" reminds me of workers who don't want to follow proper safety practices and remove safety protections while working with dangerous tools - and all of them now have some piece of their body missing or badly damaged. Hey, bat men don't need safety, right?

      1. Anonymous Coward
        Anonymous Coward

        Re: "what you’re really saying is that it takes skill to write code."

        No, it means C requires to write a lot of potentially unsafe code to handle basic tasks like string management.

        It really doesn't; you need to reframe your thinking about C. C is a language that allows you to express concepts. You may have to squint a little to see the concept, but it gives you immutability with const, functions as a data type, and a preprocessor to build a correct DSL.

        You can write correct-by-construction code by using explicit data structures; a string is essentially a vector. Iterators help.

        see from the last tile https://forums.theregister.com/post/4196844

        The issue with C is you need supporting libraries of data structures to be productive, and that seems to be lost on people who insist on behaving as if explicit inline ad-hoc data structures are not just poor structure.

        It's also best used in small doses rather than forced for every task, but that's not a failing of the language itself.

        1. Anonymous Coward
          Anonymous Coward

          "C is a language that allows you to express concepts"

          Yes, it's the very definition of any language.

          The issue with C is that the concepts are expressed in ways that are very unsafe: a little mistake and you have an RCE. Especially for tasks where there is really no reason to be so unsafe fifty years later.

          1. Anonymous Coward
            Anonymous Coward

            Re: "C is a language that allows you to express concepts"

            https://en.wikipedia.org/wiki/Concepts_(C%2B%2B) with ASSERT and macros, but same idea.

            RCE stands for Remote Code Execution. It's a system vulnerability, not a language flaw.

            C's flaw is that people slating it are not comparing it to assembler. Write a decent set of wrappers and the job is a lot easier.

            You literally have a built-in preprocessor, so you have no excuse not to write a nice DSL that is a checked variant of what you are trying to do, over explicit data structures.

            I would like to see where I'm supposed to find the paragon of good practice.

            Is it Java? Golang? Rust, C#, Erlang, Prolog, Haskell, Pascal, Basic, F#, Cobol, Fortran, Swift

            Which language is it that I'm supposed to look to for my salvation?

            C and C++ and a decent knowledge of data structures, provide everything I need.

            Including the ability to build interpreters of other languages to ease my task.

            Not using the freedom afforded by process boundaries to do polyglot programming, or to introduce fault tolerance into systems, is not a language issue; it's a design flaw.

    3. Adam Azarchs

      Re: Real problem mentioned first

      Making something possible to do right is very different from making you have to go out of your way to do it wrong. When you're talking about a few million lines of code, written by a large number of contributors, some of whom may have passed away before the youngest contributors were even born, then even if all of them are highly skilled the odds of a mistake creeping in are very high. Human beings aren't perfect, and it only takes one mistake to get a serious security vulnerability in something directly exposed to the internet like https.

      This is of course leaving aside the fact that though 85% of people think their driving is safer than average, over a million people die each year in car accidents. Frankly, if you say only other people make mistakes, I'm going to stay as far away from you as I can. I say this as an expert in the Dunning-Kruger effect.

    4. DrXym

      Re: Real problem mentioned first

      Well if you're such a manly programmer, why aren't you using Brainfuck?

    5. lostinspace

      Re: Real problem mentioned first

      What's more depressing is that the old farts can't see that languages have improved in the last 30 years, and that maybe the thing they learnt 30 years ago, having learnt nothing new since, isn't the best way to do things anymore.

      1. Electronics'R'Us
        Windows

        Re: Real problem mentioned first

        I (and many others like me) who have been programming for a few decades are fully aware that new languages are available and use them where they make sense.

        C is perfect in embedded microcontrollers for numerous reasons (C constructs map very nicely onto such hardware) but it wouldn't be appropriate in some other contexts. It is telling to note that the (mostly free) IDEs that are available for microcontroller development support C and C++ out of the box (or download).

        When doing 8 bit microprocessor / microcontroller designs (yes, those are still very much alive and well) I may even drop into assembly for some stuff.

        Many modern languages are in response to a particular set of use cases that other languages were not really suited to; there is no universal language (I may have just started a holy war there).

        So we are very much aware of new languages, but using any language in an application it is not really suitable for is a poor design choice. That will be as true for Rust as it is for any other language.

        1. ssokolow

          Re: Real problem mentioned first

          Not just alive and well; the hobbyist world has discovered them, thanks to the sub-$1 STM8S Minimum Development System boards available on Aliexpress (for which an enthusiast wrote an sdcc-compatible C99 approximation of the Arduino API, Sduino) and the 3¢ Padauk microcontrollers (for loose SMD chips in the one-time-programmable version) that EEVBlog brought attention to and enthusiasts then reverse-engineered open-source support for.

      2. Anonymous Coward
        Anonymous Coward

        Re: Real problem mentioned first

        What's depressing is you don't think that 30 years of experience might inform your opinion.

        What's your argument - that I can't take a kid with access to Stack Overflow and make them useful in C or C++? You say that like you presented an actual argument rather than "you old, so you dumb".

        I will happily hire good people, old or young, but old and still in this game should give you a little bit of advance warning.

        Being young is not a qualification for knowledge work. Having an iPhone doesn't make you clued up.

        Tell you what, make something from discrete components on a breadboard.

        Attach an MCU to it; don't wear a hair shirt, you can use a Pi or some other massive MCU if you like.

        Write some code to do some basic control, see how long it takes before the hardware starts to disagree with your "academic" model of how software works.

        You will be back, begging to be allowed C, once it almost works, but you've used the entire hw budget for that device. Because - "productivity"...

        1. Electronics'R'Us
          Thumb Up

          Re: Real problem mentioned first

          Well said.

          I have been doing electronics for money for over 50 years and programming for 40 - and I am still very much in the game.

          Before others say something like "but you are stuck in old methods" I will mention a couple of things (the list could be longer but just as a starter):

          1. We have not yet repealed Ohm's Law, and some might be surprised at just how many faults can be analysed with that alone. Mr. Kirchhoff's rules are also rather useful to this day. Ohm's Law was proposed by Georg Ohm in 1827 and eventually accepted in the 1850s. Does the new crowd think that is outdated?

          2. I have my name on a couple of high speed (multi-gigabit) networking standards (one of which is used by so much stuff that it is ubiquitous now) so I am not 'stuck in the past' - I have moved with the industry.

          Old and still in the game should indeed give a bit of advanced warning.

          1. Anonymous Coward
            Anonymous Coward

            Re: Real problem mentioned first

            I regret I have but one upvote to give.

        2. ssokolow

          Re: Real problem mentioned first

          The point of Rust isn't to force some "academic" model of how software works onto C. The point is to provide a clear boundary between what needs to be audited for memory safety and what can only do memory-unsafe things if the stuff within the auditing boundary fails to uphold its promises.

          That's why Rust makes it so easy to call into C code, and why they put in the work to design a syntax for embedding blocks of inline assembly that's disentangled from the quirks of LLVM's specific implementation of it, despite also allowing modules to be annotated with `#![forbid(unsafe_code)]`.

          As long as you recognize that you can't cast away constructs from the safe subset of the language that map to "noalias" when translated to LLVM IR (i.e. You can't cast from a unique pointer to a raw pointer to a shared pointer and use that to bypass safe Rust's aliasing restrictions), you can do all the C-esque stuff you want inside functions or blocks marked `unsafe`...you're just expected to expose an API that is either marked `unsafe` or constrains/validates its arguments so they can't cause memory-unsafe behaviour.

          Heck, the safe primitives of the Rust standard library are written in `unsafe` Rust by design. The goal of the language is to not need anything lower level as long as you've got intrinsics and, for guaranteeing that optimizers won't break constant-time guarantees for crypto, inline assembly.
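
          As a made-up illustration of that boundary (the function name and the use of get_unchecked are just for this example, nothing from the httpd/rustls work): the argument is validated in safe code, so callers cannot violate the precondition of the `unsafe` block.

          fn middle_byte(data: &[u8]) -> Option<u8> {
              if data.is_empty() {
                  return None;
              }
              let idx = data.len() / 2;
              // SAFETY: data is non-empty, so idx = len / 2 is < len.
              Some(unsafe { *data.get_unchecked(idx) })
          }

          fn main() {
              assert_eq!(middle_byte(b"hello"), Some(b'l'));
              assert_eq!(middle_byte(b""), None);
          }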

          1. Anonymous Coward
            Anonymous Coward

            Re: Real problem mentioned first

            So I get to suffer and then do it in assembler.

            Again with the safety. It's a vague, undefined term. The code works or it doesn't. If you can't find anything with AFL and Valgrind, and ASan/UBSan, stick a fork in it, it's done.

            These have specific meanings: AFL means I fuzzed it with mutated input and couldn't make it deviate from expected behaviour; Valgrind/ASan prove my memory is accounted for.

            Diverse compilation on multiple platforms checks for different warnings, and not a single line of assembler is needed.

            Barriers are inserted where I need barriers. Cryptographic code is the last place I want Rust.

            CBC mode has a look when implemented in C.

            CTR mode has a look when implemented in C.

            That's the community who measure things by how easy it is to implement in hardware.

            Safety is yet to be defined, except in contradiction to C.

            I don't want to trade essential liberties for temporary safety, and then end up doing it in assembler.

            Why? I can do it with C++ and ruby and never need to touch assembler.

            I have portable threads, mutexes, condition variables.

            I have a decent data structure library.

            1. ssokolow

              Re: Real problem mentioned first

              Rust has a very specific definition of safety:

              If the code inside your `unsafe` blocks is sound and in the absence of compiler bugs, nothing you write outside the `unsafe` blocks will be able to cause a dangling pointer, a use-after-free, or any other form of undefined behaviour.

              AFL and Valgrind and ASan/UBSan only catch problems in code paths they manage to execute, and may need to run for a long time to do so.

              If you can rule something out in the type system, you've ruled out entire classes of bugs in a much shorter span of time, and it's much easier to prevent regressions.

              Type systems are essentially a very limited form of proving, which is what Dijkstra was talking about when he wrote “Program testing can be a very effective way to show the presence of bugs, but it is hopelessly inadequate for showing their absence.”

              (In the same way that C and C++ enable more compile-time guarantees than assembly languages did.)
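
              A small, generic example of what that buys you (standard library only, nothing from the article): the counter below can only be reached through the Mutex, so "forgot to take the lock" isn't something a test has to catch; it simply can't be written in the safe subset.

              use std::sync::{Arc, Mutex};
              use std::thread;

              fn main() {
                  let counter = Arc::new(Mutex::new(0u32));

                  let handles: Vec<_> = (0..4)
                      .map(|_| {
                          let counter = Arc::clone(&counter);
                          thread::spawn(move || {
                              for _ in 0..1000 {
                                  // The data is only reachable through the lock guard.
                                  *counter.lock().unwrap() += 1;
                              }
                          })
                      })
                      .collect();

                  for handle in handles {
                      handle.join().unwrap();
                  }

                  assert_eq!(*counter.lock().unwrap(), 4000);
              }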

              If you want to call your C++ from Rust or vice-versa, both speak C ABI very easily and Google's Chrome team are interested in efforts to make C++ interop even easier. If you *want* to write something in Rust, Rust also has portable threads, mutexes, condition variables, decent data structures, access to intrinsics, etc.

              Rust is just trying to be the best language it can be in a problem space where there haven't really been any viable alternatives to C or C++ for a long time. It's not Rust's fault if other people decide they want to rewrite C or C++ things in it.

              1. Anonymous Coward
                Anonymous Coward

                Re: Real problem mentioned first

                If you can rule something out in the type system, you've ruled out entire classes of bugs in a much shorter span of time, and it's much easier to prevent regressions.

                So essentially you are arguing for C++ concepts, https://en.cppreference.com/w/cpp/language/constraints

                Btw that is actually mentioned in D&E, so it took twenty years to go from book to being standardized.

                So far it's "strict static typing" which is available in C and C++ for some years now.

                If the code inside your `unsafe` blocks is sound and in the absence of compiler bugs, nothing you write outside the `unsafe` blocks will be able to cause a dangling pointer, a use-after-free, or any other form of undefined behaviour.

                So use std::unique_ptr<T>: it's not possible to dangle, it's a move-only data type, you can't use after free, and it doesn't invoke UB. https://en.cppreference.com/w/cpp/memory/unique_ptr Very useful for making C functions much safer.

                #include <cstdio>    // popen, pclose, fread, FILE
                #include <memory>    // std::unique_ptr, std::default_delete
                #include <sstream>   // std::ostringstream
                #include <string>    // std::string

                // for popen only: specialise the default deleter so that
                // std::unique_ptr<FILE> releases the pipe with pclose()
                namespace std
                {
                    template <>
                    class default_delete< FILE >
                    {
                    public:
                        void operator()(FILE* ptr)
                        {
                            pclose(ptr);
                        }
                    };
                } // namespace std

                std::string
                popen_result(const std::string& command)
                {
                    std::unique_ptr< FILE > pipe;
                    pipe.reset(popen(command.c_str(), "r"));
                    std::ostringstream ss;
                    if(pipe)
                    {
                        char output[100];
                        size_t length; // fread returns size_t
                        while((length = fread(output, 1, sizeof(output), pipe.get())) > 0)
                        {
                            ss.write(output, length);
                        }
                    }
                    return ss.str();
                }

                Trivially make a C api impossible to misuse by simple composition.

                Rust is just trying to be the best language it can be in a problem space where there haven't really been any viable alternatives to C or C++ for a long time.

                Rust is largely the people at Mozilla NIH-ing another language rather than fixing the horror that is the Firefox C++ codebase.

                I'm happy to welcome Rust to the fold; it will slowly take up the sort of usage at AWS that I'm happy to avoid. I don't object to its use.

                1. ssokolow

                  Re: Real problem mentioned first

                  Yes and no.

                  Yes, in that Rust is trying to be minimally revolutionary, being more of an effort to bake verification for various established best practices into the compiler.

                  No, in that I'm skeptical that C++ can be retrofitted to achieve a comparable level of compile-time verification, given the massive ecosystem of existing legacy code that doesn't tell the compiler enough about the data flow.

                  In that respect, Rust cheats by disguising "fill in all this information" as part of "write language bindings".

                  Also, I'd be interested in seeing your take on a blog post from 2019 named "Modern C++ Won't Save Us" by Alex Gaynor, which gives some examples of how modern C++ constructs are still very prone to misuse.

                  As for Mozilla NIH, it's more that they supported Graydon Hoare's experimental project which, until surprisingly close to the v1.0 freeze, was more Go-like, with built-in support for green threading and a reserved sigil for future plans for garbage-collected pointers.

                  1. Anonymous Coward
                    Thumb Up

                    Re: Real problem mentioned first

                    https://alexgaynor.net/2019/apr/21/modern-c++-wont-save-us/

                    It's an interesting article and it's hard to argue against his examples.

                    I'd be surprised to see them in the wild tbh; take std::string_view, which came in in C++17.

                    It's not intended to be used in this way, as it's a non-owning view onto an existing buffer.

                    I would expect a recent compiler to warn on this, https://godbolt.org/z/Er94nh

                    It seems that Clang catches this but GCC lets it through. Still this is really easy to get wrong, and compiler support is lagging.

                    The majority fall into the case of trying to be too clever, but yes, in the usage described there are some footguns, and the only mitigation is "don't do that".

                    For example, capturing a shared pointer by reference is not done in that way; it's done by using CRTP and inheriting from enable_shared_from_this<T>, so that shared_from_this() can be used in the case where you are passing your reference in this way.

                    // https://en.cppreference.com/w/cpp/memory/enable_shared_from_this
                    #include <memory>
                    #include <iostream>

                    struct Good : std::enable_shared_from_this<Good> // note: public inheritance
                    {
                        std::shared_ptr<Good> getptr() {
                            return shared_from_this();
                        }
                    };

                    #include <memory>
                    #include <iostream>
                    #include <functional>

                    // the article's example: the lambda captures the shared_ptr by
                    // reference, so the returned closure dangles once x goes out of scope
                    std::function<int(void)> f(std::shared_ptr<int> x) {
                        return [&]() { return *x; };
                    }

                    int main() {
                        std::function<int(void)> y(nullptr);
                        {
                            std::shared_ptr<int> x(std::make_shared<int>(4));
                            y = f(x);
                        }
                        std::cout << y() << std::endl;
                    }

                    The previous example is much less defensible; it goes out of its way to do something contrived. When you remove the noise, it's just doing this:

                    #include <iostream>
                    #include <functional>

                    std::function<int(void)> f(int x) {
                        return [=]() { return x; };
                    }

                    int main() {
                        auto y = f(4);
                        std::cout << y() << std::endl;
                    }

                    And rewritten in idiomatic C++:

                    #include <iostream>
                    #include <functional>
                    #include <iomanip>

                    void foo(std::ostream& out, int x) {
                        out << x << '\n';
                    }

                    int main() {
                        auto y = std::bind(foo, std::ref(std::cout), 4);
                        y();
                        return 0;
                    }

                    It's generally a fairly accurate article, in that Modern C++ hasn't made it impossible to shoot yourself in the foot. I would tend to file most of that as coding standards issues tbf, but most of the examples of where the tooling is lagging or where it's easy to mess up are solid arguments for unit testing - which would catch the string_view/std::span cases.

                    I think it's more an argument for being conservative with new language features rather than an Anti-C++ article.

                    That said, I would have likely missed the std::string_view in a code review, and I would have only caught it with Clang > 10. My setup is clang version 11.0.0 / GCC 9.3.0

                    Thank you for pointing me at the article.

                    1. ssokolow

                      Re: Real problem mentioned first

                      I'll agree to that. Whatever point people are meant to take away from it, the one thing that's not in dispute is that it's a "Modern C++ doesn't obviate the need to code responsibly" article.

                      I wouldn't code in C++, but that's because, as a responsible person, I recognize my own limitations.

                      1. Anonymous Coward
                        Anonymous Coward

                        Re: Real problem mentioned first

                        See, I don't think the language matters that much tbh.

                        Partly because I think C and C++ have first-class tooling for verification.

                        Partly because the verification of a system is required regardless of the code, so it's mostly irrelevant to the language.

                        I wouldn't code in C++, but that's because, as a responsible person, I recognize my own limitations.

                        Would you say that's more about C++ or about your level of comfort with the language?

                        I eschew golang, but that's purely a matter of taste. I dislike Java intensely; that's actually more about the Java build ecosystem than the language. (I use M4 as an external Java preprocessor, which helps.)

                        C++ is about taking C and wrapping it in a light sprinkling of correct-by-construction idioms.

                        For example, RAII - meaning that resources naturally map to object lifetimes, and are released in inverse order of acquisition, even in the presence of exceptions.

                        Also C++ is standardized, that's a major factor in investing time in C++.

                        Why bother with someone else's "let's build a language runtime" project until, at launch, I'm offered something better than C++, by people who write good C++.

                        C++ people have the best of all worlds; you can make very easy-to-use tooling. For example, if I get a bug or a crash, I get a stack trace. This is written to work in the case where you get a segfault; a much nicer version is used for debug operation, but is not async-signal-safe.

                        // writes a minimally formatted backtrace to a descriptor in an
                        // async-signal-safe manner (the async_safe_* helpers are our own,
                        // defined elsewhere)
                        void
                        pretty_backtrace_fd(int fd)
                        {
                            typedef void *stack_frame_list_t[50];

                            // grab the call stack, which might well be damaged
                            stack_frame_list_t stack_frames = {0};
                            size_t depth = backtrace(stack_frames, (sizeof(stack_frames) / sizeof(stack_frames[0])));

                            for(size_t call_site = 1; call_site != depth; ++call_site)
                            {
                                // Format the current depth of call-site for presentation in decimal
                                char imsg[16 + 1] = {0};
                                char prefix[7 + 1] = "Caller[";
                                char suffix[5 + 1] = "]: 0x";
                                char msg[sizeof(prefix) + sizeof(suffix) + sizeof(imsg) + 1] = {0};
                                char *p = &msg[0];

                                async_safe_to_decimal(call_site, imsg, sizeof(imsg));
                                p = async_safe_strcpy(prefix, prefix + sizeof(prefix), p);
                                p = async_safe_strcpy(imsg, imsg + async_safe_strlen(imsg), p);
                                p = async_safe_strcpy(suffix, suffix + sizeof(suffix), p);

                                // Write a nicely formatted line for this frame
                                async_safe_fprintln(fd, msg, ((uintptr_t)(( void * )stack_frames[call_site])));
                            }

                            // write the symbolised stack trace
                            backtrace_symbols_fd(stack_frames, depth, fd);
                        }

                        1. ssokolow

                          Re: Real problem mentioned first

                          As someone who judges tooling by what I can afford to use in my hobby projects, I'd say that, for practical purposes, Rust unavoidably has better tooling, because I'm comparing based on free and open-source tooling for Linux users to run offline.

                          As for comfort level, I'd say that I *could* get more comfortable with C++, but I just generally don't feel comfortable being responsible for correctly using languages that make accidental memory-unsafety so easy.

                          (With one exception. Using C for retro-hobby programming meant to run in DOSBox or on airgapped DOS machines. I do have a DOS retro-hobby project I'm working on in C using Open Watcom C/C++.)

                          Most of my programming experience is 20 years of Python, with the rest being in other scripting languages like JavaScript (and some CoffeeScript in the past and TypeScript now), PHP, Lua, Bourne shell script, and some now forgotten Visual Basic 6 from high school and QBasic and DOS Batch files from my childhood.

                          While I do have some C, C++, and Java experience (some high school and university courses and the aforementioned hobby project), it's important to understand that I came to Rust, not as a way to write low-level code, but as a way to get more static guarantees than MyPy could add to Python because I was tired of burning out trying to meet my own standards writing automated tests.

                          I eschew golang and Java for basically the same reasons as you, though, last time I was using Java (in university), it also didn't have any kind of equivalent to C++'s "auto", and I found that added to the pile of things which sapped my ability to enjoy writing it.

                          As for C++'s strengths, I agree that RAII is good, but Rust does RAII and adds a layer of extra compiler verification to it beyond anything I'd be able to afford for C++ for hobby-programming.
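
                          For instance, a minimal Drop sketch (a toy type, not from any real project): values are released deterministically at end of scope, in reverse order of construction, and the borrow checker additionally rejects any use of a value after it has been moved or dropped.

                          struct Resource(&'static str);

                          impl Drop for Resource {
                              fn drop(&mut self) {
                                  println!("releasing {}", self.0);
                              }
                          }

                          fn main() {
                              let _first = Resource("first");
                              let _second = Resource("second");
                              println!("work happens here");
                              // prints "releasing second" then "releasing first" as main returns
                          }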

                          I've never *had* a segfault in Rust, but it's been designed to work with the same debugging infrastructure C++ uses if I ever do. (Plus, its internal assert system can generate backtraces if in the default "unwind the stack and run destructors on panic" configuration. All I have to do is set the RUST_BACKTRACE environment variable to either 1 or full, depending on how much detail I want.)

                          C++'s strengths just don't line up with my needs enough to justify becoming proficient in it.

                          (If the copy of Microsoft C/C++ 7.0 my father brought home when I was in elementary school had come with tutorial materials, or I hadn't been driven away from Macmillan Publishing's Game Programming Starter Kit 2.0 in high school by being thrown face-first into the verbosity of a Win32 Hello, World! application after having tasted Visual Basic 6 and wxPython... in the days before devices to read big "learn to program" books comfortably in PDF form, I might look at it differently now.)

                          1. Anonymous Coward
                            Anonymous Coward

                            Re: Real problem mentioned first

                            I think you might be surprised, C++ is evolving into a functional language one step at a time.

                            We have lambdas, pattern matching, algebraic data types, coroutines, concepts.

                            Come in and dip a toe; it's so much easier and more fun these days, it almost feels like cheating.

                            https://godbolt.org/ - online compilers for every compiler you can shake a stick at.

                            A Linux box to try a quick compile and run on: https://coliru.stacked-crooked.com/

                            So long as you are open source, https://scan.coverity.com/ is free - and it's one of the few tools I would recommend paying for.

                            UBSAN/ASAN/TSAN are all built on llvm and are free, but you need to compile them yourself, as you need to jump through some hoops.

                            But basically, that's a decent chunk of undefined behaviour / memory / threading statically verifiable.

                            The code should be simple and clear not clever.

                            so

                            std::vector<int> foo = {1, 2, 3};
                            int sum = 0;
                            for(int i = 0; i != 3; ++i) {
                                sum += foo.at(i); // checked variant of foo[i]
                            }
                            std::cout << "sum of elements is: " << sum << '\n';

                            Now, I'm a big fan of Sean Parent's C++ - it's clean and very readable, yet extremely efficient - and he would advocate:

                            std::vector<int> foo = {1, 2, 3};
                            int sum = std::accumulate(foo.begin(), foo.end(), 0); // needs <numeric>
                            std::cout << "sum of elements is: " << sum << '\n';

                            1. ssokolow

                              Re: Real problem mentioned first

                              Honestly, it's not the functional parts that are the big problem. They're very nice to have, but I'm writing a hobby project in what is essentially C89 and it's not really grating on me for a use-case closer to what it was designed for.

                              It's that, for the situations where C++ would be suitable (ie. where I have access to a compiler more modern than Open Watcom C/C++), Rust meets my needs better.

                              Coverity is good, but I don't like my code's correctness to depend on tools that are only available as proprietary SaaS. (I prefer all that to be stuff I can run offline.)

                              UBSan/ASan/TSan etc. are all dynamic analyzers. Code written in the safe subset of Rust gives me the same guarantees at compile time without me needing to fuzz it or achieve full branch coverage to get the same level of confidence.

                              The promise of Rust's safe subset is that *all* undefined behaviour, memory, and threading bugs are statically verifiable. (i.e. That, if you see them, the blame lies either with the `unsafe` blocks or a compiler bug... which means that applying `#![forbid(unsafe_code)]` to the top level of your project limits your responsibility on that front to choosing and managing your dependencies well.)

                              ...plus, I like how expression-oriented Rust is.

                          2. Anonymous Coward
                            Anonymous Coward

                            Re: Real problem mentioned first

                            I came across this - it's C99, not even C11 - and this caught my eye:

                            Thanks to Rust and ML for their implementations of sum types.

                            https://github.com/Hirrolot/datatype99

                            // Sums all nodes of a binary tree.
                            #include <datatype99.h>
                            #include <stdio.h>

                            datatype(
                                BinaryTree,
                                (Leaf, int),
                                (Node, struct BinaryTree *, int, struct BinaryTree *)
                            );

                            int sum(const BinaryTree *tree) {
                                match(*tree) {
                                    of(Leaf, x) {
                                        return *x;
                                    }
                                    of(Node, lhs, x, rhs) {
                                        return sum(*lhs) + *x + sum(*rhs);
                                    }
                                }
                            }

                            #define TREE(tree) ((BinaryTree *)(BinaryTree[]){tree})
                            #define NODE(left, number, right) TREE(Node(left, number, right))
                            #define LEAF(number) TREE(Leaf(number))

                            int main(void) {
                                const BinaryTree *tree = NODE(NODE(LEAF(1), 2, NODE(LEAF(3), 4, LEAF(5))), 6, LEAF(7));
                                printf("%d\n", sum(tree));
                            }

                            1. ssokolow

                              Re: Real problem mentioned first

                              Now that could be useful.

                              I wouldn't use it for all of the parts of my DOS hobby project, given that part of the project is an installer stub where I've been replacing various standard library and graph.h functions with more compact snips of inline assembly, but it'd be very convenient for the support tooling that doesn't need to fit on a floppy disk without crowding off the actual content.

                              Thanks. :)

                              1. Anonymous Coward
                                Anonymous Coward

                                Re: Real problem mentioned first

                                Turns out I've got some catching up to do.

                                Someone posted this problem, and people are code golfing.

                                Most candidates cannot solve this interview problem:

                                Input: "aaaabbbcca"

                                Output: [("a", 4), ("b", 3), ("c", 2), ("a", 1)]

                                Write a function that converts the input to the output

                                I suspect you'll like this, https://godbolt.org/z/cr8j63

                                This is C++20, and some clever chap bested my solution by some distance.

                                Mine is fairly old fashioned and *much, much, much longer*,

                                https://coliru.stacked-crooked.com/a/f1bfb7b65fd4d413

                                Give it a spin in Rust; it's @Al_Grigor on Twitter.

                                1. ssokolow

                                  Re: Real problem mentioned first

                                  To be honest, I don't usually code golf because I have so many hobby projects I could be working on instead. Why solve a toy problem when I can solve a real problem and have useful code at the end?

                                  (Same reason I don't really game any more. I'm either in the mood to read or in the mood to solve problems and, if I'm in the mood to solve problems, why not do something useful?)

                                  That said, since you offered, here's a shot at it... with the caveat that, since the addition of group_by to Rust's standard library has been delayed, and Rust's standard library is intentionally lean (eg. random number generation and regular expressions are kept out of the standard library so they can be versioned independently), I implemented what *will* be possible via the standard library using the popular itertools library which implements it now.
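
                                  Roughly, the core of it looks like this (a from-memory sketch assuming a Cargo project with the itertools crate as a dependency; the playground links below have the full, checked versions):

                                  use itertools::Itertools;

                                  fn main() {
                                      let input = "aaaabbbcca";
                                      // group_by batches consecutive equal chars; counting each group
                                      // gives the (char, run length) pairs.
                                      let groups = input.chars().group_by(|&c| c);
                                      let result: Vec<(char, usize)> = groups
                                          .into_iter()
                                          .map(|(key, group)| (key, group.count()))
                                          .collect();
                                      assert_eq!(result, vec![('a', 4), ('b', 3), ('c', 2), ('a', 1)]);
                                      println!("{:?}", result);
                                  }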

                                  https://play.rust-lang.org/?version=stable&mode=release&edition=2018&gist=25caf8ba1f1df57d14dc9887803516d0

                                  That initial version returns group keys of type `char` because I was unsure what the *specific* requirements were.

                                  That said, since I was going from prior experience with the itertools module from Python's standard library and didn't look at the godbolt link you provided until after, I didn't realize the intent was to get that exact string representation, rather than to get a data structure and demonstrate that you'd done so.

                                  Here's a revised version which matches the output exactly while still using Rust's debug formatting operation... though it is a bit more verbose to still keep the assert_eq! in.

                                  https://play.rust-lang.org/?version=stable&mode=release&edition=2018&gist=4b3c5c702ec9e9056d8de68ffe4e4d04

                                  Finally, here's one where I took out the assert_eq! and *just* produce the desired output:

                                  https://play.rust-lang.org/?version=stable&mode=release&edition=2018&gist=b2a48b801504be3ea51fae717a9ddd65

                                  ...and, since it was so easy, here's one that actually takes advantage of how strings are the expected return type to iterate over extended grapheme clusters instead of code points, so it'll work properly with emoji and scripts that use combining codepoints:

                                  https://play.rust-lang.org/?version=stable&mode=release&edition=2018&gist=44bc06c45965ba0bcec1df1015175698
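
                                  For anyone who doesn't want to click through, here's a minimal sketch of that last idea (assuming the itertools and unicode-segmentation crates; it's not the exact playground code):

                                  use itertools::Itertools;
                                  use unicode_segmentation::UnicodeSegmentation;

                                  fn main() {
                                      let input = "aaaabbbcca";
                                      let groups: Vec<(String, usize)> = input
                                          .graphemes(true) // extended grapheme clusters rather than code points
                                          .group_by(|g| g.to_string())
                                          .into_iter()
                                          .map(|(key, group)| (key, group.count()))
                                          .collect();
                                      println!("{:?}", groups); // [("a", 4), ("b", 3), ("c", 2), ("a", 1)]
                                  }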

                                  Also, is @Al_Grigor you or the person who posted the code golf problem?

                                  1. Anonymous Coward
                                    Anonymous Coward

                                    Re: Real problem mentioned first

                                    @al_grigor posted the problem. I follow some tech related content on twitter and it made its way into my TL. FWIW I think the requirements are unclear and should have been: produce a data structure, then render the output.

                                    But interestingly there is a decent split between functions which parse into some data structure, and functions which manipulate the string representation to derive the answer.

                                    It seemed from replies to the thread that a data structure was sought (critical replies to string-only answers), but personally, if I asked for output in a test and got a function that didn't produce output, I'd be unimpressed.

                                    That does seem to be a minority view as few other people submitted entire programs, or checked against the example data (I didn't check the entire thread..)

                                    I likewise don't code golf much but this is such a toy problem, it struck me as light relief ;)

                                  2. Anonymous Coward
                                    Anonymous Coward

                                    Re: Real problem mentioned first

                                    let result: Vec<_> = input.chars()
                                        .group_by(|x| x.to_string()).into_iter()
                                        .map(|(k, group)| (k, group.into_iter().count()))
                                        .collect();

                                    So this is nice and high level, but it's write-only code.

                                    Comment out any single line, and the entire thing fails to compile with an error that offers no clue to the issue.

                                    That limited the ability to play with that expression and try to modify it to do something else.

                                    I suspect some more familiarity with the language would help, but, well, I'm a little lost..

                                    Compiling playground v0.0.1 (/playground)

                                    warning: unused import: `itertools::Itertools`
                                     --> src/main.rs:1:5
                                      |
                                    1 | use itertools::Itertools;
                                      |     ^^^^^^^^^^^^^^^^^^^^
                                      |
                                      = note: `#[warn(unused_imports)]` on by default

                                    error[E0308]: mismatched types
                                      --> src/main.rs:16:15
                                       |
                                    16 | .map(|(k, group)| (k, group.into_iter().count()))
                                       |      ^^^^^^^^^-
                                       |      |        |
                                       |      |        expected due to this
                                       |      expected `char`, found tuple
                                       |
                                       = note: expected type `char`
                                                  found tuple `(_, _)`

                                    error: aborting due to previous error; 1 warning emitted

                                    1. ssokolow

                                      Re: Real problem mentioned first

                                      It's definitely a matter of familiarity... though it's more about familiarity with how functional APIs approach the problem. Syntax aside, that would look more or less the same in any functional iterator API, and it would be the way to approach the problem in a pure functional language like Haskell.

                                      To explain what the compiler said:

                                      It complains about itertools::Itertools being unused because that's the trait/interface that adds `.group_by()` to the standard library iterator API and you removed or commented out the use of `.group_by()`, but that warning can be ignored as it's only indirectly relevant to why it won't compile.

                                      Here's what those operations do, including links to their API documentation, which contains examples:

                                      .chars() produces an iterator over chars from a string or string slice.

                                      .group_by(function) calls the given function on each item in the iterator to generate a grouping key and produces an iterator over `(key, iterator_over_group)`. (|args| expression is the syntax for a closure/lambda function)

                                      .map(function) applies the given function to each item in an iterator to produce a new iterator.

                                      The actual error is that, without the `.group_by` before it, the function inside the `.map()` expects a two-element tuple as input but is receiving a `char` instead.

                                      .collect() consumes an iterator and produces a collection. Because it's generic over its output, you'll usually need to either provide a type signature for the variable it collects into or, if you don't want to assign to an intermediate variable, call it using the "turbofish" syntax, which looks like this (the ::<> form exists to avoid C++'s parsing complications with generic syntax and less/greater than):

                                      .collect::<Vec<_>>()

                                      (The `_` in `Vec<_>` is a wildcard and, except for function declarations, which must state everything explicitly so they can serve as the boundary of API stability, you can use it in type signatures wherever the type inference already has enough information to know what you meant.)
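
                                      Putting those together, a minimal sketch of the two equivalent spellings (toy values, nothing to do with the golf problem):

                                      let text = "abc";
                                      let v1: Vec<char> = text.chars().collect();    // type annotation on the binding
                                      let v2 = text.chars().collect::<Vec<_>>();     // "turbofish" on the call itself
                                      assert_eq!(v1, v2);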

                                      Here's a solution in a more imperative style:

                                      https://play.rust-lang.org/?version=stable&mode=release&edition=2018&gist=38d8782b8d42e1314339bc5cd7eef880

                                      1. Anonymous Coward
                                        Thumb Up

                                        Re: Real problem mentioned first

                                        Thank you, this is going to take a little while to work through, but much reading is needed I think.

                                        1. ssokolow

                                          Re: Real problem mentioned first

                                          Oh, actually, I should correct one thing. You can't use wildcards in type declarations like `struct` or `enum` either because those are also part of the code's external API. By "use them in type signatures", I meant things within functions like `let x: <type> = ...`
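
                                          A tiny sketch of the difference (the struct line is left as a comment because it wouldn't compile):

                                          fn tag_chars(input: &str) -> Vec<(char, usize)> {   // fn signature: every type spelled out
                                              let tagged: Vec<_> = input.chars().enumerate().map(|(i, c)| (c, i)).collect(); // `_` is fine here
                                              tagged
                                          }

                                          // struct Holder { items: Vec<_> }   // not allowed: `_` can't appear in a type declaration

                                          fn main() {
                                              println!("{:?}", tag_chars("abc")); // [('a', 0), ('b', 1), ('c', 2)]
                                          }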

                                  3. Anonymous Coward
                                    Pint

                                    Re: Real problem mentioned first

                                    thanks again btw it's been a nice change..

                                    have one on me

      3. Anonymous Coward
        Anonymous Coward

        Re: Real problem mentioned first

        What's more depressing is that the old farts can't see that languages have improved in the last 30years, and that maybe the thing they learnt 30 years ago and haven't learnt anything new since, isn't the best way to do things anymore.

        I'm an old fart who has been doing C for well over 30 years, but even I'm a keen advocate of Rust. It's a sensible and well thought out improvement on the tooling, and we'll all be better off if it takes over.

        The question over whether or not automation vs manual is better was settled a long time ago, but people (especially in the software world) like to continue to argue about it. Carefully thought out automation (in this discussion, Rust's compiler) helps prevent all manner of ills that crop up without it (i.e. C, or even assembler). It's easy to find various examples of this. The commercial aviation industry has become objectively a whole lot safer with the introduction of fly-by-wire, even whilst growing in size prodigiously. There have been some wrinkles - Boeing and the MAX (an absolutely scandalous turn of events), and on the rare occasion when something has gone wrong with the FCS some pilots have forgotten how to fly (Air France into the Atlantic) - but there is no doubt that cockpit automation has gone hand in hand with reductions in crash rates over the past 30 years. So my view is, if that's what's true in aviation, why on earth wouldn't it also be true in software development?

        So this old fart is all for Rust, in the same way I think that smart pointers in C++ are a significant improvement on raw pointers. It's good, there are very few valid reasons not to use it (especially given the ecosystem and library interfacing design that the Rust community has built up), and only a fool would preclude it from projects on points of principle.

        Come on Guys: Smell the Roses, Grow Up and Do Things Properly

        I think the software world is particularly prone to a lack of rigour persisting to everyone's detriment, with little motivation inside the industry to stamp that out and near zero motivation to make fundamental improvements across the whole industry.

        Take a look at JavaScript, and indeed all things web; the mistakes that browsers tolerate, accept as common practice, or simply gloss over with minimal external bleating is simply appalling. This ambiguity causes so many problems.

        And my pet favourite - interfaces. The number of bugs that are associated with malformed data being passed through interfaces unchecked, or at best slightly checked, is shameful. Very few can be arsed to write code to enforce interface specs. Now, wouldn't it be nice if that drudgery task could be automated?

        Turns out it can; people are perhaps familiar with JSON schemas these days which, properly used, can be quite effective at checking data for interface compliance. There are even some decent JSON validators. XML schemas too can at least have good, tight definitions on what's valid and what's not, but a lot of the tooling is truly rubbish and often ignores tags like <maxval> (or whatever it is).
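
        To make the idea concrete, here's a minimal Rust-flavoured sketch of letting the machine police an interface (assuming the serde and serde_json crates; a schema language or ASN.1 toolchain does far more than this):

        use serde::Deserialize;

        // The interface spec written once, as a type; malformed data is rejected at the boundary.
        #[derive(Debug, Deserialize)]
        struct Reading {
            sensor_id: u32,
            value: f64,
        }

        fn main() {
            let good = r#"{"sensor_id": 7, "value": 21.5}"#;
            let bad = r#"{"sensor_id": "seven", "value": 21.5}"#;

            let parsed: Reading = serde_json::from_str(good).expect("matches the spec");
            println!("{:?}", parsed);
            assert!(serde_json::from_str::<Reading>(bad).is_err()); // wrong type caught, not glossed over
        }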

        But both of those pale in comparison to ASN.1 which has been around for 30 odd years now, more or less ignored by the majority of the software industry unless people happen to have to tinker with certificates. Google have part reinvented it with Google Protocol Buffers (apparently in ignorance of ASN.1's existence; ironic, a search company not using a search engine to look for existing serialisation technologies), but have missed out the good bits.

        Most programmers have no idea how nice it is to have proper constraints checking of data that can be freely interchanged between anything ranging from a micro-controller running a bandwidth constrained radio link all the way up to something consuming JSON, in languages like C/C++, Java, C#, Ada even.

        It's been ignored I think for various reasons, starting off with the fact that it's basically a French idea, an ITU standard and doesn't emanate from the IETF. The extant tooling that's good costs money (but has saved me millions over the years), but for some reason outfits like Google that can and do put effort into free tools have chosen to develop their own inferior ones from scratch.

        Basically, the software industry is dominated by, and riddled with, the "macho programmer", and the fault rate is getting worse not better. The US Gov reckons it's costing the US economy $2trillion a year. But despite having the technologies available for 30 years to make hard and inarguable improvements, the industry still prefers to "hack it by hand", or "Agile it until we get paid". This is immature, and it's getting worse. If it continues, people will stop asking us to write software.

        In comparison, the aviation industry adopted automation pretty much as soon as it became technologically feasible. And look how their business has taken off (until Covid).

        So, Rust ought to become widely used, otherwise we're just refusing to improve. We're either going up or down; there's no such thing as staying still.

        The macho programmer is also behind most of the self driving car projects, none of which are actually succeeding (just tame demos that you'd never trust to look after your kids on a stormy night, pouring with rain, lots of busy traffic, etc). Sooner or later the investors are going to realise there's nothing coming of it, and then what's to encourage them to invest money in the next "big idea"? It's going to be looked on as an example of mis-selling by the software / tech industry's engineers and luminaries.

  5. jake Silver badge

    Is it just me ...

    ... or does anybody else see the similarity between the anti-C folks and anti-vaxers?

    1. sabroni Silver badge

      Re: Is it just me ...

      Where's the similarity? Anti-vaxxers have no evidence to back them up, anti-C folks have plenty of evidence that C devs can't do their jobs properly.

      Your reaction, evidence-free and appealing to emotion over logic, is much closer to an anti-vaxxer than a scientist.

      1. Anonymous Coward
        Anonymous Coward

        Re: Is it just me ...

        Anti-C folks running on a world built in C are advocating from a faulty premise.

        That BSD socket and Linux kernel thing seems to be doing okay, as well as almost every webpage you've ever seen, every image you've ever seen, and almost every video game you'll have played.

        C and C++ built your world after assembler got too time consuming to do everything in.

        How's the lisp machine coming on?

        When will TypeScript React in Jest? That's a stack that fills me with hope.

        Learn the tools, write stupider code, test it properly. That's going to work in every language.

        But you don't want to work in C or C++ so *nobody* is allowed to work in C or C++.

        I don't want to work in C# but I'm not out there boycotting "Anders Hejlsberg".

        Post some code that is "impossible to write correctly, cleanly" in C or C++ with the context.

        Because we have huge volumes of software, including all the major scripting languages, written in C.

        And we have Boost, and Qt which underpins a lot of commercial software; Photoshop and lots of Adobe's suite are C++.

        Could it be that your issue with C and C++ is not really about C and C++ per se, but equal parts dogma ("memory bad") and faulty reasoning, a poor workman blaming his tools?

        My electricity supply is sufficient to kill me, it's not the fault of the installation if I defeat the safety interlocks.

        1. doublelayer Silver badge

          Re: Is it just me ...

          I'm not an anti-C person. I use it with some frequency. Still, I don't see a problem with the core argument that C makes it easier to include certain types of vulnerabilities than other languages do, primarily buffer overflows and memory mismanagement. We see such vulnerabilities in code that's been tested and written by experienced people; this isn't just a problem of novices.

          People who wish to continue using C for its various advantages should either agree with this and state a reason why it's not a problem this time, disagree with this and explain why manual memory management isn't the cause of buffer overflow vulnerabilities as we have seen, or agree with this but explain a reason why alternatives aren't going to fix the problem or aren't suitable for the situation. For example, I frequently use C because of its memory efficiency, which makes it suitable for systems with limited specifications, and most alternatives lack that efficiency, making them unsuitable. I have to consider a few newer alternatives to determine if they have fixed this problem. Unfortunately, I don't see much of this here. I see people blaming all problems on bad coders, which is far too simplistic. I see people asking for perfection, using the fact that there is no foolproof language to excuse any and all arguments. And I see perfect analogies to prove this point. Let's look at yours:

          "My electricity supply is sufficient to kill me, it's not the fault of the installation if I defeat the safety interlocks."

          No, it's not. Unless the safety system is built wrong and there's a live phase where you're about to touch. That would be the system's fault. But even if it's not the system's fault, we might decide to replace the system if we find that the system is unsafe, because it's far too easy to accidentally defeat the safety components. The plug sockets which attempt to ensure that a ground connection is available before the other pins make contact were put in place because they were safer than the previous sockets. The decision was made that the previous equipment was sufficiently likely to cause a preventable problem that it should be replaced with a safer alternative. The system wasn't at fault for acting as designed, but it wasn't safe enough to defeat its alternative.

          1. Anonymous Coward
            Anonymous Coward

            Re: Is it just me ...

            I'm not an anti-C person. I use it with some frequency. Still, I don't see a problem with the core argument that C makes it easier to include certain types of vulnerabilities than other languages do, primarily buffer overflows and memory mismanagement. We see such vulnerabilities in code that's been tested and written by experienced people; this isn't just a problem of novices.

            https://forums.theregister.com/post/4196844 This is plain C.

            There is no manual memory management, or risk of buffer overflows.

            People who wish to continue using C for its various advantages should either agree with this and state a reason why it's not a problem this time, disagree with this and explain why manual memory management isn't the cause of buffer overflow vulnerabilities as we have seen, or agree with this but explain a reason why alternatives aren't going to fix the problem or aren't suitable for the situation.

            The idea that something is easy to get wrong is not the resounding argument that you are suggesting. If you don't know what you are doing and you mess with live electricity you will end up hurt or dead. That's not to do with electricity but simply that the user was unqualified to handle it safely and lacked the experience to determine which circumstance was safe and unsafe. The issue is that you actually need to know C quite well to use it safely; that is the beginning and the end of the problem.

            Unless you can write const-correct code, your code is broken. Unless you can write warning-free code, your code is broken. In the rest of the programming world, this is considered an unfair restriction.

            Here's the thing, I'm still waiting to see some C or C++ code that is well written, and yet exhibits these flaws that are much lamented but not evidenced.

            Making life easier for people too lazy to do the work is not a winning move.

            https://www.theregister.com/2021/02/03/enigma_patch_zero/

            https://msrc.microsoft.com/update-guide/en-US/vulnerability/CVE-2018-8653

            A remote code execution vulnerability exists in the way that the scripting engine handles objects in memory in Internet Explorer. The vulnerability could corrupt memory in such a way that an attacker could execute arbitrary code in the context of the current user. An attacker who successfully exploited the vulnerability could gain the same user rights as the current user. If the current user is logged on with administrative user rights, an attacker who successfully exploited the vulnerability could take control of an affected system. An attacker could then install programs; view, change, or delete data; or create new accounts with full user rights. How do I fix that? Oh, I can't; I'm hosed because I used a "safe" language. Can you tell me why this argument is still advanced?

            But that's MS JS, right? Let's see Google.

            Google's V8 JavaScript engine. The exploit was possible because Google's code failed to handle a NaN (not a number) result from the addition of JavaScript values Negative Infinity and Positive Infinity when iterating over a list of integers.

            Yep, hosed, no mitigation and even "safe code" is bad now.

            For example, I frequently use C because of its memory efficiency, which makes it suitable for systems with limited specifications, and most alternatives lack that efficiency, making them unsuitable. I have to consider a few newer alternatives to determine if they have fixed this problem. Unfortunately, I don't see much of this here.

            What's wrong with a DSL implemented in preprocessor macros that backs onto a small support library in C? Expressive and easy to handle.

            You can template it and generate variants with a script as part of a compile step, if you'd rather not use macros explicitly.

            I see people blaming all problems on bad coders, which is far too simplistic.

            It's not that simplistic; people who understand C and C++ write markedly different code than people who don't, and it's easy to see at a glance. It's the easiest discriminator, e.g. the difference between memcpy and memmove. The answer to that question is a really good heuristic for the sort of code produced. Warnings: some people know the additional warnings and turn on everything they can find.

            Others are content when the warnings don't exceed the compiler limit.

            I see people asking for perfection, using the fact that there is no foolproof language to excuse any and all arguments. And I see perfect analogies to prove this point. Let's look at yours:

            "My electricity supply is sufficient to kill me, it's not the fault of the installation if I defeat the safety interlocks."

            No, it's not. Unless the safety system is built wrong and there's a live phase where you're about to touch. That would be the system's fault. But even if it's not the system's fault, we might decide to replace the system if we find that the system is unsafe, because it's far too easy to accidentally defeat the safety components.

            It's not accidentally defeating the safety components (at least not in the UK with modern CU and RCD), it's me cosplaying as a sparks.

            Firstly, programming is not done by accident, so these people are supposed to be qualified sparks in this analogy; a qualified sparks should be able to make safe, and work on, dangerous equipment.

            Secondly

            The plug sockets which attempt to ensure that a ground connection is available before the other pins make contact were put in place because they were safer than the previous sockets. The decision was made that the previous equipment was sufficiently likely to cause a preventable problem that it should be replaced with a safer alternative. The system wasn't at fault for acting as designed, but it wasn't safe enough to defeat its alternative.

            That's great and all, but in the UK we use ring final circuits, which is a stupid design and very, very easy to turn from a safe design into an overloaded radial circuit; the plug sockets don't help. It's that the 2.5mm t/w is rated at 23 amps and the typical breaker is a 32 amp MCB.

            So the answer is to use a 20 amp fuse on a radial circuit, and call the sparks in to fit a CU with an RCD that is actually tested.

            So, knowing what you are doing, and getting a professional to do it, are the key things.

            There is nothing actually wrong with a Ring Final Circuit, it has its place, but in the hands of someone who doesn't know how to test an installation correctly, it's a fire waiting to happen.

            So given we have people (qualified sparks) who can do the job safely, why exactly is it that we are not blaming the bodge-and-scarper programmer on day release from JS boot camp?

            1. doublelayer Silver badge

              Re: Is it just me ...

              "Here's the thing, I'm still waiting to see some C or C++ code that is well written, and yet exhibits these flaws that are much lamented but not evidenced. Making life easier for people too lazy to do the work is not a winning move."

              I'll grant you all of that. The problem, then, is that most of the core stuff, on which we rely, which is developed by many people who have experience, but evidently not enough, is not "well written". These are large projects, which have been tested to some extent, and they still do this wrong. Perhaps there are some people who can be absolutely trusted to never do that, but they don't seem to be writing this core code. We can't fix this problem by telling all the developers of libcrypt that they're rubbish and need retraining. They will ignore us.

              The electricity analogy is continuing to make my point for me. I made a point about safe or unsafe plug sockets. You countered that a different part of the system can also be risky, changing the subject. Similarly, you have successfully pointed out that you can get vulnerabilities in languages other than C, which nobody argued against. Security requires good practice in coding, and it especially requires it in C because bad practice in C leads more often to security vulnerabilities whereas bad practice in other languages leads more often to crashes. You can still get security vulnerabilities in those languages. If we removed C tomorrow, we wouldn't solve security. However, that point is not in itself a cogent argument for keeping C. Such arguments exist, and they're convincing, but you're not making one. You are not defending C. You are not really even attacking anything else. You're just trying to change the subject to point out that I can't get perfection and hard work is required to approach it. Which is correct and beside the point.

              1. Anonymous Coward
                Anonymous Coward

                Re: Is it just me ...

                I'll grant you all of that. The problem, then, is that most of the core stuff, on which we rely, which is developed by many people who have experience, but evidently not enough, is not "well written".
                "Has bugs" is not the same as "not well written"; now you are misrepresenting the point.

                How good or bad the average bit of code is isn't really a language issue; it's more to do with experience and tooling. "Chronically under-resourced projects don't have the resources to spend on release engineering" is a little closer.

                We can't fix this problem by telling all the developers of libcrypt that they're rubbish and need retraining. They will ignore us.
                As well they should; the answer is to simply run tests, including AFL, Valgrind etc., as part of the build. Not switch language to the still-unnamed bug-free future that awaits us once we free ourselves from the shackles of being tied to a standardized portable language and use... what?

                The electricity analogy is continuing to make my point for me. I made a point about safe or unsafe plug sockets. You countered that a different part of the system can also be risky, changing the subject.
                In point of fact I made the analogy that you need to know what you are doing.

                I said

                "My electricity supply is sufficient to kill me, it's not the fault of the installation if I defeat the safety interlocks.
                You decided that "safety interlocks" meant plugs, I'm guessing not being that aware that Ring Final Circuits are a little nutty and that the UK is the only user of them; your contention that the "plug socket" helps is exactly what's wrong with the debate about Rust.

                The socket protects your appliance, the wires protect your house. It's not a different part of the system: if your wires end up burning the house down, the socket is hosed anyway. I countered by showing that safer sockets didn't absolve you from the consequences of failing to understand the correct method of safe usage, and that your professional is still 100% responsible for detecting your dangerous solution by means of appropriate testing.

                Similarly, you have successfully pointed out that you can get vulnerabilities in languages other than C, which nobody argued against.
                Implicit in the thread and the article is that using another language, rather than using professional practices and test equipment, is the way to go.

                It's hard to see that the premise is not being argued.

                Security requires good practice in coding, and it especially requires it in C because bad practice in C leads more often to security vulnerabilities whereas bad practice in other languages leads more often to crashes.
                And you've decided this is better because? It seems like bad practice is statically detectable in C and C++ and not detectable in "other languages".

                You can still get security vulnerabilities in those languages. If we removed C tomorrow, we wouldn't solve security. However, that point is not in itself a cogent argument for keeping C
                Again, we don't need an argument for keeping it; you need an argument for removing it, and you've yet to make one that stands up to minor examination.

                Java, Ruby, Perl, Python, Erlang, Haskell: written in C. Seems like they managed to get working code out the door.

                Such arguments exist, and they're convincing, but you're not making one. You are not defending C. You are not really even attacking anything else.

                I don't need to defend it, as you have no case. A: it's the alternative to assembler, and you've not addressed this at all.

                B: for higher-level code you should be using C++ with scripting languages.

                So the use case for C is between assembler and C++, and neither you nor anyone else has said a word about how the meaningful use cases are met in that arena.

                I've posted code in C, what exactly makes that code so hard to write, or maintain?

                It's fairly pythonic and frankly it could be ported to a scripting language easily enough.

                You're just trying to change the subject to point out that I can't get perfection and hard work is required to approach it. Which is correct and beside the point.

                It's not about perfection, but pragmatism. There's a lot of shit code out there; if people insist on doing application programming in systems programming languages, I don't see why I have to care.

                I know a wide community of C and C++ users, they don't have these problems, perhaps it's that they know what they are doing, or maybe they are just lucky.

                I don't see how tabs and spaces swapping round and breaking my Python is an advance.

                Or really what you are advocating other than don't use C, but all and any other issue with any other language is just fine, because the bar is lower.

                C and C++ are going nowhere; the last time this discussion came up it was Java, then it was Golang, now it's Rust. I wonder what we'll be ignoring next...

  6. Elledan
    IT Angle

    Valgrind says 'hi'

    Tracking down and resolving memory issues is pretty much a solved problem at this point, even for languages which provide you with all the rope you need to be your own worst enemy, like assembly, and C.

    Static analysis has become infinitely easier with C since C99 nailed down hard typing, instead of allowing for the weak typing that C89 still allowed. A similar weak typing to what Rust allows, incidentally. Yes, the Rust compiler is supposed to check your homework, but much like in Python or JavaScript, why would the compiler or runtime know what you actually intended to do?

    C++ solved the issue of memory leaks and range issues with C++98 already, using smart pointers (much improved in C++11) and explicit range checks for buffers (when using the STL buffer types).

    The real issue with programming languages isn't the languages as such, but the way developers use them, along with the discipline in using them, or lack thereof. As said, a compiler or runtime does not know what you intended. That's why we have strong typing, or in the case of safety-critical languages like Ada super-strong typing. In Ada you have to explicitly confirm every action and every type conversion. Rust allows and even promotes implicit actions and conversions, whereas C++ has been discouraging this for years now (C++ style casts).

    The other issue is that of logic errors, which is arguably the largest problem. Who of us hasn't written a parsing routine or such with branching paths which went totally off into the woods due to a wrongful assumption, typo or other oversight? Your compiler or runtime won't save you there.

    For my personal and professional projects I use CDD, or Comment-Driven Development, which puts a large focus on getting design and requirements documents ready before the first line of code is even written. These documents are maintained as their contents are first transformed into an architectural design document, then the code skeleton (including liberal commentary with intent and reasoning), which is filled in with code. Finally, the up to date design documents and code are used to write the final documentation.

    It is this coherent process which can prevent such logic errors. While languages like Ada also help a lot in preventing such errors by using clear English words instead of symbol soup that enable off-by-one errors (e.g. == vs =, which is = and := in Ada), this does not remove the need for a solid software development process.

    If someone hails a programming language as 'the solution' to all of one's problems with a project, then it is clear that they refuse to acknowledge that their development process is faulty. One saw this with Mozilla and their Firefox codebase (which was beyond horrid), and Apache as a whole is also infamous for shoddy, poorly thought-out code. That's not meant as an attack, but as an observation after blundering through a variety of codebases and working directly with the Mozilla one.

    1. Phil Lord

      Re: Valgrind says 'hi'

      Advocating for use of valgrind and static analysis is just suggesting that people use better tools to help them avoid problems. That makes sense. Choice of language is, likewise, just another tool.

      In terms of comment-driven design, Rust has an integrated documentation format which allows inline examples that compile down to tests. This is in addition to its unit-testing features, and its support for "examples" -- longer free-form source files that, again, are compiled and run as tests. All doable in C, all standard in Rust.
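
      As a minimal sketch of a doc-test (the crate name demo is made up):

      /// Adds one to its argument.
      ///
      /// ```
      /// assert_eq!(demo::add_one(2), 3);
      /// ```
      pub fn add_one(x: i32) -> i32 {
          x + 1
      }

      Running cargo test compiles the fenced example against the crate and runs it, so the documentation can't silently rot.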

      With respect to logic errors, indeed, no language can stop you from just coding the wrong thing. Although it can stop you forgetting to handle specific cases, stop you forgetting to handle return values, and warn you about dead code, all of which are common sources of logic error.

      In the end, I think that, contrary to your statement, they are acknowledging their development process is faulty. One solution is to switch parts of the code away from C.

      Not sure where you get the bit about Rust and casting. Do you have an example?

      1. Roland6 Silver badge

        Re: Valgrind says 'hi'

        >Advocating for use of valgrind and static analysis is just suggesting that people use better tools

        No what was being advocated was to adopt a disciplined aka Software Engineering approach, where you will have worked out the logic and associated data structures before even opening the code development toolset.

        One of the problems with programming is that there is little cost involved in code writing, it is easy to code a little and modify it as you go along to get it right. With engineering such an approach is much more expensive - try building a bridge in the same way as many people code; even Agile doesn't really translate very well...

        1. Phil Lord

          Re: Valgrind says 'hi'

          > No what was being advocated was to adopt a disciplined aka Software Engineering approach, where you will have worked out the logic and associated data structures before even opening the code development toolset.

          Yes, if that helps you. But how do you do that? Keep it all in your head? Or do you use some design tool to record those data structures before you open up your code development toolset? Some clicky, pointy, UML type tool?

          Personally, I've never particularly liked those. I'm just not much of a figure person; I don't draw flow charts when writing documents either. I'd rather write a set of outline notes, with some comments, which I gradually expand into a long document. With code, I do the same thing, just using comments in the code, with some rough outline of the data structures, in code. I can't see the point of using a tool where at some point you have to say "now we stop and write these things in Rust".

          The OP was advocating the same thing. Writing comments first. So straight into a code development environment.

          1. Roland6 Silver badge

            Re: Valgrind says 'hi'

            >I can't see the point of using a tool that at some point you have to say "now we stop and write these things in Rust".

            Clearly you've not been involved in any large-scale development, or complex code that implements a state machine, for example. However, this also shows a difference between engineering and computing; with engineering, you do get to a point where you stop using Autocad and start playing with the CNC machines. Which would suggest that the approach many programmers take is more akin to that of an artist than an engineer.

            Personally, I find it helpful to be disciplined in my thinking and so produce layered documentation, something that is encouraged by the architectural and design frameworks, applying my skills to determine just how much of it I really need to use for a particular project...

            >The OP was advocating the same thing. Writing comments first. So straight into a code development environment.

            I suggest this mindset is more to do with the tools generally available.

            One of the big problems is keeping documentation up to date and also keeping it cross-referenced to the code (and in sync). There are tools available to help with this task - some have been around since the 1970s (e.g. SADT) - however, they've not been widely used, for many reasons.

            The easiest is to do exactly as many do: jump straight into the text editor and produce some outline design comments; once happy, simply add comment markers, change the file extension from .txt to .c, and edit again to insert code...

            What is in some respect surprising is that 50 years on we are (in general) still developing code (whether it be ASM, C, Rust etc.) in exactly the same way as we were back then...

            1. Phil Lord

              Re: Valgrind says 'hi'

              > What is in some respect surprising is that 50 years on we are (in general) still developing code (whether it be ASM, C, Rust etc.) in exactly the same way as we were back then...

              Perhaps that is not so surprising. All the massive changes in technology that we have both seen over the years, and here we are, writing short notes to each other in English. Not that different from a bulletin board in the late 70's.

              But the languages have changed, and so have the idioms and standard practices. I routinely hack on one piece of software that has been in continuous development for 40 years. You can see the change and flow of the decades, sometimes in pieces of code that sit next to each other in the same source file. How much of this is a fashion cycle? When I see people getting excited about functional programming, like it's new, I wonder at times.

              1. ssokolow

                Re: Valgrind says 'hi'

                As far as functional programming goes, I think it's that, in the early days, it wasn't suitable for purpose for various reasons (performance, language design being too alien, etc.) and system architectures didn't present an especially strong demand for it, so people got very comfortable with imperative programming.

                Now, we've reached a point where compilers have CPU budgets unheard of in the 70s and earlier, CPU clock speeds have plateaued and functional programming designs lend themselves well to compile-time-correct parallelism, and language designers are looking for ways to incorporate functional ideas into languages without going full Haskell.

                As a result, that combination of things is prompting people to take a second look.

                Will we ever see pure functional languages like Haskell taking over? I seriously doubt it. Why require the masses of mediocre programmers to think in two different paradigms to cover the niches where the abstraction is simply too thick when you can have languages that are fundamentally imperative all up and down the stack?

                ...but is the market ready for the use of more functional stuff in multi-paradigm languages? It certainly looks like it.

        2. Electronics'R'Us
          Thumb Up

          Obligatory quote

          "If carpenters built buildings the way programmers write programs, then civilization would be destroyed by the first woodpecker to come along".

      2. DrXym

        Re: Valgrind says 'hi'

        It's kind of depressing that people think valgrind and static analysis are an excuse for problems caused by C. Certainly they help find a bug after it has been compiled, but that is no substitute for the faulty code not being allowed to compile in the first place.

    2. dajames

      Are you sure?

      Rust allows and even promotes implicit actions and conversions, whereas C++ has been discouraging this for years now (C++ style casts).

      Rust doesn't even allow an integer variable to be assigned the value from a shorter integer variable -- an explicit cast is needed (even though the value can never be truncated or otherwise corrupted). Such an assignment is always benign, and is accepted without even a warning in C++ (as it is in C). I'm not aware of any case in which Rust allows a coercion implicitly that C++ wouldn't.

      C++ discourages C-style casts because they lack clarity of intent. Casts that are meaningful can be expressed in C++ as static_cast, dynamic_cast, etc., which make the programmer's intention explicit.

      The meaning of a C-style cast depends on its context in the program, and it may end up being treated as a reinterpret_cast, which is likely to result in non-portable or undefined behaviour. That's why they're discouraged.

      1. DrXym

        Re: Are you sure?

        It's not even a cast, it's a coercion because it's potentially lossy/destructive.

        Rust can cast within unsafe blocks (make something mutable, or mutate an object) but the only excusable time to do this is when dealing with hardware or some foreign function or structure.

        1. nijam Silver badge

          Re: Are you sure?

          > It's not even a cast, ...

          If you want to know how casts should work, read the Algol-68 manual.

      2. Anonymous Coward
        Anonymous Coward

        Re: Are you sure?

        Rust doesn't even allow an integer variable to be assigned the value from a shorter integer variable -- an explicit cast is needed (even though the value can never be truncated or otherwise corrupted). Such an assignment is always benign, and is accepted without even a warning in C++ (as it is in C). I'm not aware of any case in which Rust allows a coercion implicitly that C++ wouldn't.

        Why is this supposed to be a good thing? *even though the value can never be truncated or otherwise corrupted*

        Oh sign me up, a finger wagging compiler enforced purity test.

        1. ssokolow

          Re: Are you sure?

          What you're replying to is a misrepresentation of the situation. Rust has three different ways to change an integer's type, each with different properties:

          my_var.into() or TargetType::from(my_var) is implemented for conversions which can never fail or truncate. (If a type implements from(), it gets into() for free and .into() is the one typically used because, usually, the type inference can tell what type you need to convert to, so ".into()" is all you need.)

          If you try to use .into() for a conversion that can't be guaranteed to be lossless (eg u64 → u32), the compiler will give you an error about it not being implemented for that type signature.

          my_var.try_into() or TargetType::try_from(my_var) is implemented for types where the conversion can't be guaranteed to be lossless and infallible at compile time. They return Result<TargetType, Error>, which is a data-carrying enum (tagged union with compile-time checks) that could be Ok(value) or Err(error).

          the "as" keyword can be used for lossy but infallible conversion that will truncate, zero-extend, sign-extend, etc. depending on the types you're casting between.

          "as" can be used in safe Rust, but Clippy (Rust's linter) has a lint that can be set to warn or error out on use of "as" in all or just certain parts of your codebase.

          ...which is in line with the Rust design philosophy of encapsulating unsafety to make auditing more viable.
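
          A minimal sketch of all three in one place (toy values):

          use std::convert::TryFrom; // needed on the 2018 edition; it's in the prelude from 2021

          fn main() {
              let small: u32 = 7;
              let wide: u64 = small.into();      // lossless and infallible, so From/Into exists
              // let narrow: u32 = wide.into(); // no such impl: u64 -> u32 could lose data
              let narrow = u32::try_from(wide);  // Result<u32, TryFromIntError>, checked at runtime
              let clipped = 300u32 as u8;        // lossy but infallible: wraps to 44

              assert_eq!(wide, 7);
              assert_eq!(narrow.unwrap(), 7);
              assert_eq!(clipped, 44);
          }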

          1. Anonymous Coward
            Anonymous Coward

            Re: Are you sure?

            Thank you for the clarification; it seems that a cast in Rust returns an optional type like https://en.cppreference.com/w/cpp/utility/optional if I understand you

            I don't quite see the integer truncation or sign extension as being related to safety, can you give me an example where that might be considered helpful?

            Generally I'm quite laid back about integer truncation, and I'm not entirely sure what an optional type which I examine at runtime brings to the party.

            "as" seems like an implicit admission that the desire to enforce a certain style of development is not coming from mechanical sympathy. I suspect I'm missing something, so apologies if these are foolish questions.

            1. ssokolow

              Re: Are you sure?

              Almost. Rust's equivalent to std::optional<T> is Option<T>. Result<T, E> is Rust's alternative to raising an exception.

              (It's an idea from the world of functional programming referred to as "monadic error handling using sum types" and, in practice, it means encoding all return paths that aren't assertion failures in a function's type signature. Think of it like Java's checked exceptions but done with generics instead of an out-of-band extension to the type system.)

              As for truncation and sign extension being related to safety, it has to do with upholding invariants. Suppose the number is the reference count inside the implementation of a reference counted pointer type. Truncating it could result in the memory getting freed early, leaving dangling pointers. Rust is big on letting you teach the compiler to watch for broken invariants where practical.

              (For example, because the ownership-and-borrowing paradigm can be used to turn use of stale references into a compile-time error, it can be used to implement a design pattern known as the typestate pattern, which lets you enforce proper traversal of finite state machines at compile time and turn things like "Tried to set an HTTP header after the response body began streaming" into compile-time errors.)
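
              A minimal sketch of the typestate idea, using made-up types rather than any real HTTP library:

              struct HeaderStage { headers: Vec<(String, String)> }
              struct BodyStage;

              impl HeaderStage {
                  fn new() -> Self { HeaderStage { headers: Vec::new() } }
                  fn header(mut self, k: &str, v: &str) -> Self {
                      self.headers.push((k.to_string(), v.to_string()));
                      self
                  }
                  fn start_body(self) -> BodyStage {
                      // `self` is consumed here, so the header-stage value can't be used again
                      let _ = self.headers; // (would be serialized out in a real implementation)
                      BodyStage
                  }
              }

              impl BodyStage {
                  fn write(&mut self, _chunk: &[u8]) { /* stream the body */ }
                  // no header() method here, so "set a header after the body started" can't even be written
              }

              fn main() {
                  let resp = HeaderStage::new().header("Content-Type", "text/plain");
                  let mut body = resp.start_body();
                  body.write(b"hello");
                  // resp.header("X-Too-Late", "1"); // compile error: `resp` was moved into start_body()
              }

              Because the transition consumes the old value, the check costs nothing at runtime.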

              As for "as", it's an admission that, if your language is meant to span from very high level to very low level tasks, it's probably going to need to have two or more APIs meeting different needs.

              For example, Rust has a constructor for the String type which verifies that the resulting object upholds Unicode's well-formedness invariants and can be used anywhere, but, when profiling reveals that to be problematic and you know you're already checking that invariant elsewhere, it also has a constructor which must be called from within an `unsafe` block which is basically just a typecast from a byte vector to a String type.
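
              Roughly, the two constructors look like this (toy example):

              fn main() {
                  let bytes = vec![0xF0, 0x9F, 0xA6, 0x80]; // UTF-8 encoding of a single emoji
                  // Checked: returns Err if the bytes aren't well-formed UTF-8.
                  let checked = String::from_utf8(bytes.clone()).expect("valid UTF-8");
                  // Unchecked: the caller promises validity, so it's just a cast and must be `unsafe`.
                  let unchecked = unsafe { String::from_utf8_unchecked(bytes) };
                  assert_eq!(checked, unchecked);
              }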

              1. Anonymous Coward
                Anonymous Coward

                Re: Are you sure?

                (It's an idea from the world of functional programming referred to as "monadic error handling using sum types" and, in practice, it means encoding all return paths that aren't assertion failures in a function's type signature. Think of it like Java's checked exceptions but done with generics instead of an out-of-band extension to the type system.)

                I think you mean the "Either Monad", in boost its;

                https://www.boost.org/doc/libs/1_75_0/libs/outcome/doc/html/tutorial/essential/outcome.html

                It's proposed as std::expected but not standardised yet http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0323r3.pdf

                As for truncation and sign extension being related to safety, it has to do with upholding invariants. Suppose the number is the reference count inside the implementation of a reference counted pointer type. Truncating it could result in the memory getting freed early, leaving dangling pointers.
                A reference counted pointer type like std::shared_ptr is correct by construction: it atomically increments and decrements reference counts. There is no use case for assignment of mixed-size integers in reference counting. It's either CAS, which would use same-sized quantities, or inc/dec.

                (For example, because the ownership-and-borrowing paradigm can be used to turn use of stale references into a compile-time error, it can be used to implement a design pattern known as the typestate pattern, which lets you enforce proper traveral of finite state machines at compile time and turn things like "Tried to set an HTTP header after the response body began streaming" into compile-time errors.)
                https://github.com/nitnelave/ProtEnc

                Incidentally, you *can* set headers after the body when you have chunked transfer encoding in HTTP/1.1

                Chunked encoding allows the sender to send additional header fields after the message body. This is important in cases where values of a field cannot be known until the content has been produced, such as when the content of the message must be digitally signed. Without chunked encoding, the sender would have to buffer the content until it was complete in order to calculate a field value and send it before the content.

                https://en.wikipedia.org/wiki/Chunked_transfer_encoding

                As for "as", it's an admission that, if your language is meant to span from very high level to very low level tasks, it's probably going to need to have two or more APIs meeting different needs.
                If your language spans very high to very low, you need more than one language.

                For example, Rust has a constructor for the String type which verifies that the resulting object upholds Unicode's well-formedness invariants and can be used anywhere

                What does this mean?

                UTF-8 has a known set of codepoints, and a well defined simple mapping, but "Unicode"?

                std::string utf8chr(int cp)
                {
                    char c[5] = { 0x00, 0x00, 0x00, 0x00, 0x00 };
                    if (cp <= 0x7F)
                    {
                        c[0] = cp;
                    }
                    else if (cp <= 0x7FF)
                    {
                        c[0] = (cp >> 6) + 192;
                        c[1] = (cp & 63) + 128;
                    }
                    else if (0xd800 <= cp && cp <= 0xdfff) {} // invalid block of utf8
                    else if (cp <= 0xFFFF)
                    {
                        c[0] = (cp >> 12) + 224;
                        c[1] = ((cp >> 6) & 63) + 128;
                        c[2] = (cp & 63) + 128;
                    }
                    else if (cp <= 0x10FFFF)
                    {
                        c[0] = (cp >> 18) + 240;
                        c[1] = ((cp >> 12) & 63) + 128;
                        c[2] = ((cp >> 6) & 63) + 128;
                        c[3] = (cp & 63) + 128;
                    }
                    return std::string(c);
                }

                A process boundary, the entire process crashes or not, is a safety boundary imo.

                1. ssokolow

                  Re: Are you sure?

                  The Either monad can be used to *implement* monadic error handling, but the two terms aren't interchangeable. However, it could be said that Result<T, E> is an Either monad with a convention for the meaning and purpose of the two arguments.

                  As for reference counted pointer types, the relevant detail is that Rust has a mechanism that allows you to restrict things to a single thread, and one of its two shared pointer types takes advantage of that to omit the atomics as a performance optimization.

                  Rust has Haskell-like typeclasses which it calls traits (basically interfaces that can carry default implementations if you're not familiar with Haskell jargon) and there are two marker traits, Send and Sync, that the compiler will implement automatically for any struct where all its members implement them. (Forcing an implementation of Send or Sync, like inside Arc<T> or Mutex<T>, requires `unsafe`)

                  APIs for creating threads or which can be used to pass data between threads require arguments to implement Send or you get a compile-time error.

                  Arc<T> is equivalent to std::shared_ptr, with the "A" standing for "Atomic", and does implement Send.

                  Rc<T>, on the other hand, does NOT use atomics but also does not implement Send, meaning any data structure which contains it will also not implement Send and cannot wind up being referenced from more than one thread at once without being wrapped in something which restores the Send bound.
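
                  A minimal sketch of what that buys you (toy values):

                  use std::rc::Rc;
                  use std::sync::Arc;
                  use std::thread;

                  fn main() {
                      let shared = Arc::new(42);
                      let a = Arc::clone(&shared);
                      thread::spawn(move || println!("{}", a)).join().unwrap(); // fine: Arc<i32> is Send

                      let local = Rc::new(42);
                      let r = Rc::clone(&local);
                      // thread::spawn(move || println!("{}", r)).join().unwrap();
                      // ^ rejected at compile time: `Rc<i32>` cannot be sent between threads safely
                      println!("{}", r);
                  }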

                  That's why my mind went to the example of a data structure which uses a non-atomic for a reference count. (However, I could also imagine something like an integer used to offset a pointer or as an index into a Vec<T>.)

                  Anyway, for ProtEnc, I'm curious to know how it prevents you from storing a reference to an intermediate state and calling methods on it after you've transitioned out of it. That's what the ownership and borrowing checks prevent and, last I checked, C++ compilers didn't have a reliable way to make use of a moved value a compile-time error.

                  Remember that the design goal of safe Rust is for these sorts of things to be impossible if the `unsafe` Rust is implemented correctly, not merely "why would anyone ever do that?".

                  As for "Unicode invariants", String is UTF-8 internally but I said "Unicode" invariants because one could argue that the "unpaired surrogate codepoints are forbidden" requirement of UTF-16 is a subset of the "surrogate codepoints are forbidden" requirement of UTF-8.

                  I'd certainly agree that a process boundary is a safety boundary, but a process boundary wouldn't be enough to prevent something like Heartbleed. Rust is concerned with memory safety and data race prevention.

                  1. Anonymous Coward
                    Anonymous Coward

                    Re: Are you sure?

                    Thank you for the explanation, I'm not at all acquainted with Haskell.

                    As for reference counted pointer types, the relevant detail is that Rust has a mechanism that allows you to restrict things to a single thread, and one of its two shared pointer types takes advantage of that to omit the atomics as a performance optimization.
                    I don't see what that has to do with needing to use mixed-size numbers in this use case.

                    Rust has Haskell-like typeclasses which it calls traits (basically interfaces that can carry default implementations if you're not familiar with Haskell jargon) and there are two marker traits, Send and Sync, that the compiler will implement automatically for any struct where all its members implement them. (Forcing an implementation of Send or Sync, like inside Arc<T> or Mutex<T>, requires `unsafe`)

                    This https://wiki.haskell.org/OOP_vs_type_classes seems to suggest roughly what I think you are describing but I'm a little confused by the naming..

                    I don't understand the model you are describing tbh.

                    #include <atomic>
                    #include <algorithm>
                    #include <utility>
                    #include <functional>
                    #include <iostream>
                    #include <memory>
                    #include <thread>
                    #include <numeric>
                    #include <vector>

                    struct worker
                    {
                        template <typename T>
                        worker(T && callable)
                            : m_thread{ std::move(callable) }
                        {
                        }

                        ~worker()
                        {
                            m_thread.join();
                        }

                        std::thread m_thread;
                    };

                    int main()
                    {
                        constexpr int max_workers = 4;
                        constexpr int slice = 1024 * 64;

                        size_t v1[slice * max_workers] = {0};
                        size_t v2[slice * max_workers] = {0};
                        std::fill(v1, v1 + sizeof(v1) / sizeof(*v1), 1);
                        std::fill(v2, v2 + sizeof(v2) / sizeof(*v2), 2);

                        {
                            std::vector< std::shared_ptr<worker> > workers;
                            for (int i = max_workers + 1; (--i); )
                            {
                                workers.emplace_back(
                                    std::make_shared<worker>(
                                        [&v1, &v2, slice, i]() {
                                            int end = i * slice;
                                            int begin = end - slice;
                                            while (begin != end)
                                            {
                                                v1[begin] *= v2[begin];
                                                ++begin;
                                            }
                                        })
                                );
                            }
                            // implicitly join all threads
                        }

                        std::size_t dot_product = std::accumulate(std::begin(v1), std::end(v1), std::size_t{0});
                        std::cout << "Got dot product of v1.v2: " << dot_product << "\n";
                        return 0;
                    }

                    This is a toy parallel dot product across multiple threads with final addition as the join.

                    It's thread safe, memory safe, and correct by construction.

                    Can you port it to Rust to show the semantics please? I'm sorry to ask for homework, but I really am struggling with the Haskell syntax and functional terminology.

                    It seems the Rust-for-C++ posts don't know C++ very well. This might be a more informative resource:

                    https://www.youtube.com/watch?v=sWgDk-o-6ZE (Sean is head sw scientist at Adobe)

                    This seems like it handles the type-trait case from the linked article:

                    std::vector<std::variant<char, long, float, int, double, long long>>  // 1
                        vecVariant = {5, '2', 5.4, 100ll, 2011l, 3.5f, 2017};

                    // display each value
                    for (auto& v : vecVariant) {
                        std::visit([](auto&& arg){ std::cout << arg << " "; }, v);  // 2
                    }
                    std::cout << std::endl;

                    That's why my mind went to the example of a data structure which uses a non-atomic for a reference count. (However, I could also imagine something like an integer used to offset a pointer or as an index into a Vec<T>.)

                    So integer with pointer should translate to displacement offset addressing in ASM, but there are specific types for this. https://en.cppreference.com/w/cpp/types/ptrdiff_t

                    Anyway, for ProtEnc, I'm curious to know how it prevents you from storing a reference to an intermediate state and calling methods on it after you've transitioned out of it.

                    I personally don't use that library.

                    That's what the ownership and borrowing checks prevent and, last I checked, C++ compilers didn't have a reliable way to make use of a moved value a compile-time error.

                    Why would using a moved value be a problem? You move the value, as it has a single owner.

                    So your typical use cases for move-only types are sources and sinks, e.g.:

                    std::unique_ptr<ExpensiveStruct> acquire_expensive_resource() {
                        return std::make_unique<ExpensiveStruct>();
                    }

                    The move assignment operator or move constructor is responsible for this constraint.

                    Typically you just steal the handle to the underlying resource. So std::unique_ptr<T> has a T *ptr member.

                    A move constructor would make a copy of the value of the ptr, so now two unique_ptr<T>::ptr point at the same address. Then it sets the donor's copy of ptr to nullptr (it may elide this; you are not supposed to use the moved-from resource afterwards as it will be in a valid but unspecified state).

                    Typically you would return std::unique_ptr<T>/move-only types from a function, which makes such improper usage impossible.

                    Remember that the design goal of safe Rust is for these sorts of things to be impossible if the `unsafe` Rust is implemented correctly, not merely "why would anyone ever do that?".

                    I must confess I don't see the design goal.

                    As for "Unicode invariants", String is UTF-8 internally but I said "Unicode" invariants because one could argue that the "unpaired surrogate codepoints are forbidden" requirement of UTF-16 is a subset of the "surrogate codepoints are forbidden" requirement of UTF-8.

                    In C++ we have:

                    std::string is suitable for UTF-8

                    std::u16string is suitable for UTF-16

                    std::u32string is suitable for UTF-32

                    So fair enough, sorry for being pedantic, and thank you for many interesting points.

                    1. ssokolow

                      Re: Are you sure?

                      It doesn't strictly have anything to do with reference counted pointers, just that there is a reason that types other than atomics require those sorts of explicit conversions. There exist valid use cases for non-atomics where truncation of such integers might induce memory unsafety and Rust is designed around giving you as much ability as is reasonably possible to rule out abuse of your APIs at compile time.

                      As for typeclasses, my advice is to only turn to Haskell material as a last resort. It tends to be like consulting Wikipedia about mathematics.

                      I mentioned the name in case you were already familiar with them, but, if not, "a cross between interfaces and abstract classes" is a good enough conceptual starting point as long as you understand that structs in Rust are always plain old data.

                      Monomorphic/static dispatch is done by having methods be syntactic sugar for functions which take the struct as the first argument and dynamic dispatch is done by storing the vtable separately and using a fat pointer instead.

                      The Box<dyn Trait> and &dyn Trait parts of this slide explain that visually:

                      https://docs.google.com/presentation/d/1q-c7UAyrUlM-eZyTo1pd8SZ0qwA_wYxmPZVOQkoDmH4
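
                      If it helps, here's a minimal sketch of both dispatch forms (hypothetical Shape/Circle names, my own example, not from the slides):

                      trait Shape {
                          fn area(&self) -> f64;
                          fn describe(&self) -> String { format!("area = {}", self.area()) } // default method
                      }

                      struct Circle { r: f64 }

                      impl Shape for Circle {
                          fn area(&self) -> f64 { std::f64::consts::PI * self.r * self.r }
                      }

                      // Monomorphised / static dispatch: a separate copy is generated per concrete T.
                      fn static_dispatch<T: Shape>(s: &T) -> f64 { s.area() }

                      // Dynamic dispatch: &dyn Shape is a fat pointer (data pointer + vtable pointer).
                      fn dynamic_dispatch(s: &dyn Shape) -> f64 { s.area() }

                      fn main() {
                          let c = Circle { r: 1.0 };
                          println!("{} {}", static_dispatch(&c), dynamic_dispatch(&c));
                          let boxed: Box<dyn Shape> = Box::new(Circle { r: 2.0 });
                          println!("{}", boxed.describe());
                      }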

                      As for your example code, I'll give three implementations, because that kind of parallelism is typically handled using one of two libraries in real-world code, but you probably want to see it done using the primitives.

                      First, Rayon, which uses Rust's support for implementing new traits on other people's types to add parallel iterators:

                      https://play.rust-lang.org/?version=stable&mode=release&edition=2018&gist=b933ba75d1db68e5f69009aa3e41501a

                      Second, Crossbeam, which provides an abstraction over the `unsafe` code necessary to assure the compiler that your threads won't outlive the backing store so you can hand out immutable slices without needing to maintain a reference count.

                      https://play.rust-lang.org/?version=stable&mode=release&edition=2018&gist=3cacfac45bf302019efd19df481a7778

                      Third, using the Rust equivalent of std::shared_ptr and a less functional style closer to your code:

                      https://play.rust-lang.org/?version=stable&mode=release&edition=2018&gist=9352a28cb8c111ee644ad1a8f6ce46b3

                      Apologies if any of these are sub-optimal. It's late and I spend most of my time writing I/O-bound code, so I haven't yet built up the strongest intuition for where LLVM's optimizers might fall short and I'm starting to get drowsy, so I didn't throw them into godbolt to check.

                      (Because I'm getting drowsy, I'll have to reply to the rest of what you said tomorrow.)

                      ...but thanks for the video. I'll see if I can find time for it within the next little while. (Lately, most of my video watching time has been for stuff where it's OK to listen with half an ear while I do something else.)

                      1. Anonymous Coward
                        Thumb Up

                        Re: Are you sure?

                        Thank you for the very comprehensive answer, the last link was indeed the easiest for me to understand, and it describes a few of my concerns with rust.

                        // Rust does bounds checks on indexing by default, but iterating downward is one way to prompt the optimizer to elide them, aside from using iterators, because if foo[N] is valid, then all values between foo[0] and foo[N] must also be valid.

                        If I understand this, it imposes the constraint that I have reverse iterators rather than the weaker constraint of forward iterators. What have I gained from this restriction, and can I avoid the tax?

                        Secondly, the code goes through the loop in reverse order relative to the cache, which is surprising, and slower; that it's faster than mandated bounds checking does not feel like a good trade-off.

                        for idx in (begin..end).rev() {

                        // Your algorithm gives me the sense it's going to suffer from false sharing problems, so I used per-thread mutable variables (https://www.youtube.com/watch?v=WDIkqP4JbkE)

                        accum += v1[idx] * v2[idx];

                        }

                        accum // Omit the ; and the last line in a block is a return

                        So this is not the same algo any more.

                        The multiplication only is supposed to happen in parallel, with the summation afterwards in serial.

                        You multiply and sum the subvectors, then sum the sub sums, which is mathematically the correct answer, but this is not about the maths, it's about fork / join parallelism. It's just the simplest example that demonstrates the point, and is not totally contrived. I nevertheless applaud your creative hack.

                        Also the slices are 64K or 1024 cache lines, which would be aligned to a 64-byte cache boundary, and not suffer from false sharing. There is precisely one writer for each slice, and the forward direction means we will process full cache lines at a time. So the natural usage and size provide both the efficiency and the clarity without sacrificing correctness.

                        https://www.sciencedirect.com/topics/computer-science/cache-line-size

                        Aligning data allocations to 64B boundaries can be important for several reasons. First, because the cache-line sizes on Intel Xeon processors and Knights Landing are also 64B, this can help to prevent false sharing for per-thread allocations.
                        (The same link shows looking up the correct cache-line size value at runtime with CPUID on x86-64; other platforms vary, but the magic number exists, as cache lines are both the cause of and the fix for false sharing.)

                        let result: u32 = (&v1 as &[_])
                            .par_iter()  // <-- This swaps in Rayon's version of the usual iterator API
                            .zip_eq(&v2 as &[_])
                            .fold(|| 0, |accum, (x1, x2)| accum + (x1 * x2))
                            .sum();

                        Likewise this also sums the sub sums, and doesn't appear to do the work in SLICE-sized blocks. I would personally find this quite difficult to review for security or clarity.

                        Where are the copies taking place, which are references, are there locks? It seems that you "fork" with par_iter and "join" with fold.

                        I would expect the ASM to look very different from the source here, completely impossible to audit the generated code.

                        Second, Crossbeam, which provides an abstraction over the `unsafe` code necessary to assure the compiler that your threads won't outlive the backing store so you can hand out immutable slices without needing to maintain a reference count.

                        The threads are declared after the backing store, so it's correct by construction, as the stack is a LIFO data structure.

                        c1.iter().zip(c2).map(|(x, y)| x * y) seems closest to the spirit, if I read it correctly, but again you seem to be making another copy with zip. I can see why, coming from a Python background, you like it. It's Ruby-ish enough for me to read the map block syntax.

                        Also you now have to remember to call handle.join() explicitly rather than, correct by construction, allowing the worker's destructor to call join.

                        Apologies if any of these are sub-optimal. It's late and I spend most of my time writing I/O-bound code, so I haven't yet built up the strongest intuition for where LLVM's optimizers might fall short and I'm starting to get drowsy, so I didn't throw them into godbolt to check.

                        Firstly thank you again for showing these examples, they are much appreciated. Secondly you've no need to apologise, you are going out of your way to educate and enlighten, it's a goodness ;)

                        1. ssokolow

                          Re: Are you sure?

                          Anything that proves to the optimizer that the bounds checks are unnecessary will elide them.

                          The iterator APIs are the idiomatic in-standard-library way to get guaranteed omission of bounds checks without using `unsafe` to access the unchecked operations. (But Rust does offer unchecked operations.)

                          When I said it was late, I neglected to mention how tired I was. Iterating in reverse was just uttering an incantation without thinking to check if it was necessary.

                          The other common way to remove bounds checks without using iterators is to use the `assert!` macro to assert that the array is long enough before doing your indexing. (`debug_assert!` is the macro which gets omitted from release builds in Rust.)
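
                          A minimal sketch of that assert! approach (a hypothetical dot function, my own example):

                          fn dot(v1: &[u64], v2: &[u64]) -> u64 {
                              assert!(v2.len() >= v1.len()); // one up-front length check
                              let mut accum = 0;
                              for i in 0..v1.len() {
                                  accum += v1[i] * v2[i]; // indexing the optimizer can now prove in-bounds
                              }
                              accum
                          }

                          fn main() {
                              println!("{}", dot(&[1, 2, 3], &[2, 2, 2]));
                          }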

                          On a related note, simplest way to explain the split between things with and without `mut` is that, rather than forbidding mutation like functional languages, Rust solves the problem of provably correct shared mutability by annotating every reference or slice (except the "raw pointer" which you can only dereference inside `unsafe` blocks) with zero-overhead compile-time reader-writer lock semantics. (It's purely a type system thing that constrains the set of programs considered valid. Nothing visible at runtime.)
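
                          A minimal sketch of that "reader-writer" framing (my own toy example):

                          fn main() {
                              let mut v = vec![1, 2, 3];
                              let r1 = &v;
                              let r2 = &v;                 // any number of shared readers is fine
                              println!("{} {}", r1[0], r2[1]);
                              let w = &mut v;              // one exclusive writer, once the readers are done
                              w.push(4);
                              // println!("{}", r1[0]);    // un-commenting this is a compile-time error:
                              //                           // cannot borrow `v` as mutable while it is also borrowed as immutable
                              println!("{:?}", v);
                          }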

                          Also, it's been a while since I watched that Scott Meyers talk I linked, but I remembered him saying that the CPU will prefetch forward or backward equally well/poorly. That and the aforementioned tiredness is why I did what I did.

                          (I have to admit that Rust is the first non-scripting language I've done any non-trivial, non-coursework stuff in, so I have a lot of book learning but very little practical experience writing CPU-bound code.)

                          Speaking of being tired, that's also why I said false sharing. I was actually thinking of a mud-ball of concepts relating to false sharing and cache associativity effects when I wrote that and grabbed the first term that came to mind.

                          That said, you make a good point.

                          As for "my creative hack", I can't take credit for that. It's a recurring pattern that's popular in both imperative languages, because it minimizes the amount of cross-thread synchronization needed, and in functional languages because it lends itself well to one-way data flows.

                          As for Rayon, slice-sized blocks, security, and clarity, Rayon is designed to be a very high-level API for parallelism that's trivial to drop in and then profile to see if you really need to put in the effort to hand-roll something or if your bottlenecks lie elsewhere. (In some cases, it's literally just "Add a line for Rayon in `Cargo.toml`, `use` the trait, and change `.iter()` to `.par_iter()`".)

                          That's why I included it... not because it's the best way to do a fork-join dot product, but because it's the best answer for many other fork-join parallelism problems, so it's in common use.

                          It will do slice-sizing calculations similar to what I did in the other examples and, if I hadn't specified a number of threads, it would decide automatically based on the number of CPU cores in the system.

                          Clarity-wise, I think that's just a matter of familiarity. The iterator API that Rayon provides a parallel version of is about as standard as can be in functional programming languages.

                          As for security, I won't be foolish enough to say that security isn't a problem in code written by a human, but I think you're underestimating the degree of correctness Rust can assure as long as the code inside your `unsafe` blocks is sound.

                          One article I'd suggest looking at for that is The Problem With Single-threaded Shared Mutability by Manish Goregaokar, which shows how Rust's ownership and borrowing system statically prevents things like iterator invalidation or using a stale reference to the wrong variant of a tagged union.

                          As for copies, references, and locks, funny enough, your concerns with Rayon are exactly the reason I have no interest in pure functional languages like Haskell, so I understand where you're coming from, even if I'm a little more permissive about it.

                          The Rayon version is algorithmically the same as the crossbeam version, just formulated via an iterator API.

                          1. `par_iter()` will set up to dispatch the work to a pool of threads, each of which will receive an iterator over part of the source sequence. Each will hold a reference to the original sequence and no locking or reference counts will be used because it can be statically guaranteed that they're unnecessary.

                          2. `fold()` takes two functions which will be inlined: A constructor to build per-thread accumulators and a function defining the operation to run once for each item in the sequence.

                          3. `sum()` takes the sequence of per-thread accumulators and sums them together. I haven't looked at Rayon's internals but, in the normal iterator API, this is what would drive the lazy iterator to produce output.

                          That said, I've only used Rayon for tasks like image thumbnailing, so I'm not an expert on its performance characteristics.

                          (This sort of CPU-bound high-performance coding isn't my usual fare. In fact, after 20 years of mostly Python, I'm still unlearning a habit of dismissing hobby project ideas that would require me to write new CPU-bound code (rather than glueing together things like PyOpenCV) before they reach my conscious awareness.)

                          As for the stack being LIFO, there aren't enough invariants encoded in Rust's type system for the minimal wrapper around OS-level threading primitives to be statically verifiable without something like a reference-counted pointer for the backing store.

                          Safe rust errs on the side of "prefer to reject valid programs over accepting invalid ones" and the `unsafe` keyword is how you flip that and say "you don't know how to verify what I'm saying, but this has been manually audited so trust me". It's meant to be used to build abstractions like `crossbeam::scoped` so that as small a portion of the codebase as possible needs manual auditing to be memory safe.

                          (`unsafe` allows you to call functions/methods marked `unsafe`, dereference the raw pointers that abstractions like `Arc` are built using, implement traits marked `unsafe`, and perform FFI calls. From the perspective of Rust coders, a C++ program is written inside one giant `unsafe` block.)
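
                          A minimal sketch of the sort of operation that needs such a block (my own toy example):

                          fn main() {
                              let x: u64 = 42;
                              let p = &x as *const u64; // creating a raw pointer is safe...
                              let y = unsafe { *p };    // ...dereferencing it must happen inside `unsafe`
                              assert_eq!(y, 42);
                          }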

                          As for having to call `handle.join()` explicitly, I wrote it that way to take advantage of how `handle.join()` can have a return value, making a thread more like a function that's run in the background. The Crossbeam approach is probably the most idiomatic one for statically guaranteeing all threads get joined by a certain point.

                          That said, it's definitely possible to implement joining threads your way. To specify custom destruction behaviour outside what the built-in RAII gives you, you implement the Drop trait.

                          Example
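
                          A minimal sketch of the idea (a hypothetical Worker type of my own, not the linked playground code), mirroring your C++ worker struct:

                          use std::thread::{self, JoinHandle};

                          struct Worker {
                              handle: Option<JoinHandle<()>>, // Option lets Drop take ownership of the handle
                          }

                          impl Worker {
                              fn new<F: FnOnce() + Send + 'static>(f: F) -> Self {
                                  Worker { handle: Some(thread::spawn(f)) }
                              }
                          }

                          impl Drop for Worker {
                              fn drop(&mut self) {
                                  if let Some(h) = self.handle.take() {
                                      h.join().unwrap(); // join when the Worker goes out of scope
                                  }
                              }
                          }

                          fn main() {
                              let _w = Worker::new(|| println!("hello from the worker thread"));
                          } // _w is dropped here; the thread is joined before main returns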

                          Now, to reply to the rest of your previous message...

                          Sorry for being confusing. "Use of a moved value" is Rust terminology for using the moved from resource.

                          In Rust, there are no move or copy constructors. A move is semantically equivalent to a memcpy but with attempts to access the original copy being a guaranteed compile-time error. Move semantics are the default, and you either apply the `Copy` marker trait to a struct to get copy semantics via memcpy (which is forbidden for structs that implement `Drop` and intended for things that are cheaper to memcpy than to take a reference to, such as a struct representing a pixel) or you implement the `Clone` trait, which serves a similar role to a copy constructor but is invoked explicitly via the `.clone()` method so potentially costly operations are explicit.

                          Example

                          If all members of your struct implement `Clone`, you can put `#[derive(Clone)]` on your struct to auto-generate a default implementation which clones all members by calling their `Clone` implementations.

                          (Try changing `#[derive(Debug)]` to `#[derive(Debug, Clone, Copy)]` in the example I linked, or changing it to `#[derive(Debug, Clone)]` and `let dummy2 = dummy;` to `let dummy2 = dummy.clone();`.)
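
                          A minimal sketch of the move/Clone distinction (my own example, not the linked playground one):

                          #[derive(Debug, Clone)]
                          struct Dummy {
                              data: Vec<u8>,
                          }

                          fn main() {
                              let dummy = Dummy { data: vec![1, 2, 3] };
                              let dummy2 = dummy.clone();  // explicit, potentially costly copy
                              let dummy3 = dummy;          // move: a memcpy of the struct; the original is now unusable
                              // println!("{:?}", dummy);  // un-commenting this is a "use of moved value" compile error
                              println!("{:?} {:?}", dummy2, dummy3);
                          }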

                          ...and thank you for listening. I always enjoy explaining things to people who honestly want to know.

                          1. Anonymous Coward
                            Thumb Up

                            Re: Are you sure?

                            Anything that proves to the optimizer that the bounds checks are unnecessary will elide them.

                            Sometimes state is not visible to the compiler. This means we need to annotate with the "volatile" qualifier.

                            volatile int foo = 1;
                            volatile int bar = 0;
                            bar = foo;

                            This says: don't remove dead stores or redundant loads. In the conceptual model where the hardware disagrees (the MCU has bugs, the hardware has bugs), software must make the hardware present a sane interface.

                            The iterator APIs are the idiomatic in-standard-library way to get guaranteed omission of bounds checks without using `unsafe` to access the unchecked operations. (But Rust does offer unchecked operations.)

                            So for the bounds checks, I'd prefer an opt-in model.

                            On a related note, simplest way to explain the split between things with and without `mut` is that, rather than forbidding mutation like functional languages, Rust solves the problem of provably correct shared mutability by annotating every reference or slice (except the "raw pointer" which you can only dereference inside `unsafe` blocks) with zero-overhead compile-time reader-writer lock semantics. (It's purely a type system thing that constrains the set of programs considered valid. Nothing visible at runtime.)

                            C++ and C maintain the notion of const correctness, which, since C11 and the standardisation of the shared C11/C++11 memory model, means read-only implies thread-safe. Const provides data-flow protection to reject invalid data flow, in this case using modern constexpr (since C++11, and much expanded since into compile-time functions). The std library is a security boundary.

                            C++ and C have the notion of "correct by construction". Correct by construction code is correct, so the problem reduces to using the standard constructions.

                            “… to prevent data races (1.10). … [a] C++ standard library function shall not directly or indirectly modify objects (1.10) accessible by threads other than the current thread unless the objects are accessed directly or indirectly via the function’s non-const arguments, including this.”—ISO C++ §17.6.5.9

                            https://herbsutter.com/2013/05/24/gotw-6a-const-correctness-part-1-3/

                            Also, it's been a while since I watched that Scott Meyers talk I linked, but I remembered him saying that the CPU will prefetch forward or backward equally well/poorly.

                            This is not the case for all or even most hardware; it generally has a preferred method of access that is correct or doesn't impose a cost by virtue of the underlying design choice.

                            The iterator API that Rayon provides a parallel version of is about as standard as can be in functional programming languages.

                            The issue is that C programmers and C++ programmers see the compiler as an assembly language generator, so being able to opt into high-level expressive features is fine, but it must be possible to write efficient code and predict the output assembler. https://stackoverflow.com/questions/7919304/gcc-sse-code-optimization is a good thread, showing a nice snapshot of C++ culture.

                            As for security, I won't be foolish enough to say that security isn't a problem in code written by a human, but I think you're underestimating the degree of correctness Rust can assure as long as the code inside your `unsafe` blocks is sound.

                            One article I'd suggest looking at for that is The Problem With Single-threaded Shared Mutability by Manish Goregaokar, which shows how Rust's ownership and borrowing system statically prevents things like iterator invalidation or using a stale reference to the wrong variant of a tagged union.

                            Aliasing that doesn’t fit the RWLock pattern is dangerous. ... Let’s step back a bit and figure out why we need locks in multithreaded programs. The way caches and memory work, we’ll never need to worry about two processes writing to the same memory location simultaneously and coming up with a hybrid value, or a read happening halfway through a write.

                            This is overly simplistic and incorrect for all but the simplest cases; his C++ ignores const, which even in C++98 catches problematic usage. This is from https://manishearth.github.io/blog/2015/05/17/the-problem-with-shared-mutability/

                            https://godbolt.org/z/s1TzeG


                            In a language like C++ there’s only one choice in this situation; that is to clone the vector1. In a large C++ codebase, if I wished to use a pointer I would need to be sure that the vector isn’t deallocated by the time I’m done with it, and more importantly, to be sure that no other code pushes to the vector (when a vector overflows its capacity it will be reallocated, invalidating any other pointers to its contents).

                            https://manishearth.github.io/blog/2015/05/03/where-rust-really-shines/ His opinion on C++ really cannot be taken seriously since he doesn't understand the difference between the internal pointer to storage held within a vector, and the "aggregate of fields" that represents the vector. The vector reallocating the memory pointed at by the internal pointer doesn't change the address of the "aggregate of fields" which defines the representation of a vector, e.g.

                            // simplest possible definition, just to show the types of T and vector<T>
                            template <typename T> struct vector { size_t cap; size_t len; T *ptr; };

                            vector<int> vec{};
                            vector<int> *vec_ptr = &vec;  // pointer to the vector object itself
                            int *int_ptr = vec.ptr;       // pointer to the heap storage the vector manages

                            As you can see, these are different types. The rest of his article is riddled with howlers; suffice to say, he's not clear on threads or C++.

                            As for copies, references, and locks, funny enough, your concerns with Rayon are exactly the reason I have no interest in pure functional languages like Haskell, so I understand where you're coming from, even if I'm a little more permissive about it.

                            ...

                            As for the stack being LIFO, there aren't enough invariants encoded in Rust's type system for the minimal wrapper around OS-level threading primitives to be statically verifiable without something like a reference-counted pointer for the backing store.

                            Safe rust errs on the side of "prefer to reject valid programs over accepting invalid ones" and the `unsafe` keyword is how you flip that and say "you don't know how to verify what I'm saying, but this has been manually audited so trust me".

                            ...

                            From the perspective of Rust coders, a C++ program is written inside one giant `unsafe` block.

                            C++ is a very safe language; idiomatic usage tends to avoid all problems. It seems like most of Rust is inside the subset blessed by the compiler, and outside of that subset not much is offered in protection.

                            Whereas C++ is very safe in the subset that Rust seems to have just fenced off as "unsafe", which is fairly judgmental naming ;)

                            Thanks for reading this far and pointing me at that article, it did make me chuckle.

        2. DrXym

          Re: Are you sure?

          It's a good thing because it forces you as a programmer to use the right type in the first place or to consciously write code to coerce between types.

          C/C++ might throw up some warnings but would otherwise not care much. It is quite typical to read some code in C which uses signed ints as a loop counter where the upper bound is a size_t or something. If the language were more strict it wouldn't happen.

          1. Anonymous Coward
            Anonymous Coward

            Re: Are you sure?

            That is a signed/unsigned mismatch which gcc and clang should flag.

            UBSAN and cppcheck would catch that; most linters would get that.

            You have to explicitly disable that with GCC with -Wno-sign-compare, and if you compile with -Werror then all warnings are fatal and you will catch that every time.

            The issue is bad code that nobody wants to clean up, so people don't run with warning-free code on the highest settings with diverse compilation, as much as they should.

            1. DrXym

              Re: Are you sure?

              As I said, the compiler might throw up some warnings and lint / static analysis might notice it after the fact. But it would still compile. And that's just a trivial example.

              Rust's compiler would tell you to GTFO. It would tell you where in the code you're going wrong, marking it for you, and suggest what you should do to fix it. And because the language has proper type inference (far better than the auto keyword in C++) as well as proper functional iterators, chances are you wouldn't screw up the types to start with.
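
              A minimal sketch of what I mean (my own toy example):

              fn main() {
                  let v = vec![10, 20, 30];
                  // let limit: i32 = v.len();  // error[E0308]: expected `i32`, found `usize`
                  let limit = v.len() as i32;   // the coercion has to be written out explicitly
                  let mut total = 0;
                  for i in 0..v.len() {         // idiomatic: the loop counter is simply usize
                      total += v[i];
                  }
                  println!("{} {}", total, limit);
              }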

    3. DrXym

      Re: Valgrind says 'hi'

      Valgrind implies you already have compiled code and only after the fact realise it's leaking. Maybe you, the programmer, found the leak, but it may well be the (very unhappy) customer who finds the bug.

      Then of course just *finding* the leak is one thing but fixing it could be another. Maybe you're lucky and it's a function that forgot to free something. Or you could be unlucky and it's a broken ownership model which takes even more time to fix properly.

      Either way it's hours or days wasted for something the compiler could have done by telling you to fix your broken code.

      And frankly I find the defensive attitude kind of weird to be honest. C programmers of all people should *get* Rust. I've programmed C and C++ (and other languages) for pretty much my entire career and I get it. It is a low level systems programming language that is compatible with C (it is trivial to call C from Rust or Rust from C), uses the same backend as Clang (LLVM), runs as fast as C and means their code will be better. What is the problem?

      1. druck Silver badge

        Re: Valgrind says 'hi'

        The defensive attitude is because once you've been in the industry for a while, you see that every five years or so someone comes along and says "you don't want to use that language any more, it's crap, you are crap, look at the dozens of bugs in (several million lines of) your code. You want to use #newshiny instead, it's so much better, look I've written half a screen full of code with no memory leaks whatsoever."

        Some of these new languages may have significant advantages and become popular, others won't. But you can be sure that in a few years' time, someone will be trashing the investment you've made in Rust and suggesting you rewrite everything in #newevenmoreshiny.

        1. DrXym

          Re: Valgrind says 'hi'

          Personally I think the defensiveness comes from C and C++ being under assault for years and now something comes along that negates the few remaining reasons for using C.

          And nobody, least of all me, is advocating rewriting *anything* in Rust for the sake of it. If however you are rewriting something because it needs to be rewritten, or writing something from scratch, then Rust is compelling.

          1. Anonymous Coward
            Anonymous Coward

            Re: Valgrind says 'hi'

            In what way is it compelling?

            C++ and a scripting language for the win.

            Why am I using Rust again when I can use C++?

          2. Electronics'R'Us
            Holmes

            C has many reasons to use it

            In microcontroller embedded development (small standalone systems with perhaps a serial interface) C is perfect.

            Let's look at the 2 major problems that can crop up.

            Dynamic memory management. In these types of products that is simply not an issue. Any buffers are declared using a const (not malloc()ed). This is pretty much standard practice in this area.

            Buffer overflows. Pretty trivial to catch. I already know the size of the buffer(s) involved (because they are defined by a const rather than trying to keep track through an assigned variable) and can easily bounds check them. In many function calls, I actually specify the buffer size being passed by reference.

            The one area that could catch someone out is communication buffers. Once more, it is trivial to simply stop receiving data if an overflow would occur (once more, because I know the size of the buffer).

            Yes, there are parts of C that can produce unsafe code but there are many applications where those issues simply do not exist.

            If I am writing a bare metal application I will, of necessity, need to use pointers, because that is precisely the use case for internal dedicated hardware registers. Provided the headers have been properly written (which doesn't happen every time, admittedly, but that can be checked via proper testing), I use the definitions from the headers and not magic numbers.

            Oh, and -Wall.

            Certainly there are things I would not necessarily write in C but remaining aware of the pitfalls (particularly if I am maintaining some older code) keeps things clean.

        2. Anonymous Coward
          Anonymous Coward

          Re: Valgrind says 'hi'

          And the, "I already run static/dynamic analysis tooling, two separate compilers, on three platforms."

          I can produce bit-for-bit identical binaries going back through as many verified chains of dependencies as I care to name. In C++ this is reasonable; in C it's out of the box.

          I don't see why we're expected to trade that for slow compilers and, in two articles over several days, not a single example of something that couldn't be written well in C++ or C.

          Just again and again the assertion, "your tool is rubbish". Okay, use another one then mate, it's a free keyboard after all.

          I see the poor Java programmers at work doing cargo-cult changes from onejar to shadowjar, or possibly the other way round, to get the sodding code to compile.

          I see the golang programmers needing to upgrade the compiler every time something new is installed.

          Rust builds are a "thief of time"; however, unlike C++ or C, you can't fix the problem.

          But at least you'll never assign a byte to a quadword to idiomatically set the bottom byte, rather than ORing in a byte that you've pointlessly expanded to a mask, to shut the sodding compiler up.

          This is the point when voices are raised and pointed questions asked, like: why won't you emit "movzbq 0x01(%eax,%ecx), rax"? It's right there in the ABI, this was supposed to be easier, my life was supposed to be better.

          Oh, you mean X = (y*0x0101010101010101) & (~0xffffffffffffff00); That's so much more readable and covers my intent so much better.

          That takes a byte, fans it out to the full word, clears those bits, effectively zero extending the byte to be the correct size. Only in slow code, rather than the fast single instruction specifically for this, which is emitted in C by X = y. Seems like I'm pretty clear on what I want to do here..

    4. StrangerHereMyself Silver badge

      Re: Valgrind says 'hi'

      The problem is that the C++ language doesn't enforce good usage and that's basically what Rust brings to the table.

      I've seen many so-called C++ programs which were really just C programs with some OO-sauce added to them.

      With C++ programs the memory corruption problems were potentially solved decades ago, with RAII, but the dire fact is that no one uses it consistently in their programming.

      The wxWidgets framework consistently uses RAII and AFAIK there have been few, if any, memory corruption bugs found in it.

  7. Anonymous Coward
    Anonymous Coward

    As a low level programmer, working with C and assembler, I think that people who want a "safe C" are missing the point.

    C is not and never really has been a high level language like Pascal and all the modern object oriented languages that came afterwards.

    C is basically a platform-independent and more readable way of writing assembler code. If that is what you want, then C is brilliant at it. Better than any other language. This is why it has been around for so long and probably always will be. It's absolutely perfect for low level manipulation of data and pointers and registers. The only way to make a "safe C" is to take away the very features that make it so good.

    Over the years people have written all kinds of large applications that C was never really designed for. This was exacerbated by C++, a language which made it possible to write incredibly complex applications and almost impossible to do so correctly. It is not the fault of C that those applications have problems. It is simply because somebody chose the wrong tool for the job in the first place. Probably because C programmers were cheap and plentiful at the time.

    1. StrangerHereMyself Silver badge

      The problem is that there are legions of developers who insist on using the absolutely fastest programming language there is. And C, because of its lack of safety features (safety generally means overhead) fits that bill.

      Most of those programs could've been easily written in Pascal or some other safe application programming language, but they insist on using a systems programming language for their trivial applications for the sake of speed.

      Since people can't be weaned off this we need a safe systems programming language which everyone knows will be used for application programming.

      1. Roland6 Silver badge

        >Since people can't be weaned off this we need a safe systems programming language which everyone knows will be used for application programming.

        Rust is more than just a "safe systems programming language"; there is a good comparative review here: Using Rust for an Undergraduate OS Course

        I think part of the problem is that C back in the 1980's was seen as new and many only learnt Unix and C, rather than a handful of languages and so had little exposure to application programming languages. Employers, not wishing to invest, decided to simply allow projects to be coded in C - given those skills were readily available...

  8. Julz
    Unhappy

    Continuing Issue

    "C programmers were cheap and plentiful at the time"

    s/C /Javascript /

  9. Ilsa Loving

    Can't disagree

    I can't say I disagree. IMO we should be moving all system code to Rust. Rust is the single best programming language to come about since C itself. Fully compatible with C so you can link it to almost all other languages, but is specifically designed to force safe coding practices.

    1. yetanotheraoc Silver badge

      Re: Can't disagree

      Rust the language is great.

      The world needs more code than the talent pool of good C coders can provide. So we need a safer language that allows us to draw on the anti-talented pool of "other" coders to make up the gap. (I put myself in the "other" category.)

      My problem with Rust is the platform is not stable. It has a repository which I'm supposed to update from nightly. So in order to get this great type safety and compiler safety I'm supposed to accept this unstable and unsafe library environment. Not a great tradeoff.

      1. ssokolow

        Re: Can't disagree

        I'm not sure what you mean by "supposed to update from nightly".

        Yes, there *are* some projects which rely on language or standard library features that aren't yet covered by the API stability promise, and those require you to build them using the nightly-channel compiler to acknowledge that you accept the risk, but they're a shrinking percentage as more language features get stabilized.

        For example, a basic subset of constant generics support just got marked as stabilized in nightly channel in December and is now riding the release trains down to stable channel.

        Personally, the only things I've ever needed the nightly release channel for are:

        1. rustfmt, because I'm stubborn and some of the formatting knobs needed to force it to match the coding style I'm used to haven't yet become API-stable.

        2. Miri, the execution engine for running `const` functions at compile-time (Mid-Level Intermediate Representation Interpreter), which currently needs nightly channel if you want to run it separately from compilation as an analogue to LLVM's UBSan. I run it on the test suite of every dependency which uses `unsafe` as part of deciding whether it's suitable for purpose.

        1. yetanotheraoc Silver badge

          Re: Can't disagree

          Yes, but you missed the point. If Rust is supposed to paper over the programmer's inability to write memory safe code, then you just moved the problem domain into the programmer's inability to use the library in a repeatable way. If there's a way to build against an "API-stable" version of Rust then I missed that memo, and I'm sure many others will make the same mistake.

          1. ssokolow

            Re: Can't disagree

            Rust is not supposed to "paper over the programmer's inability to write memory safe code". Rust is supposed to allow you to move enforcement of invariants from the programmer to the compiler, so that it can watch your back on that day when you didn't sleep well, or you got sick but didn't have any sick days left, or the boss put some new idiot on your team who changed something half a codebase away and broke an invariant.

            (Rust is designed around the reality that people writing secure code have to get things right every time, while attackers only have to succeed once.)

            As for API stability, with the exception of code that's discovered to rely on compiler or standard library bugs, the Rust developers rigorously stick to "The Rust API Stability Promise"... a promise that anything that developed against any stable-channel compiler version back to v1.0 will continue to build on any future version.

            They're so dedicated to this that they have a bot called Crater running on cloud hosting (donated by Microsoft, I think) which, if commanded, is capable of building and running the unit tests for any selected subset (including all) of packages registered on the crates.io package repository against a proposed patch that "should be fine", and have, in the past, postponed plans to improve things until they could resolve things with developers who had found ways to depend on internal implementation details.

            Now ABI stability, that might be what you're thinking of. Rust's ABI is currently unstable (same as C++'s, technically... though C++ is more reluctant to make changes there), so Rust binaries are statically linked unless you use Rust's equivalent of "extern C"... though there is a crate (package) named `abi_stable` which does Rust-to-Rust FFI through the C ABI for the purpose of things like runtime-loadable modules.
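
            A minimal sketch (my own example) of opting into the stable C ABI:

            #[repr(C)]          // lay the struct out exactly as C would
            pub struct Point {
                pub x: f64,
                pub y: f64,
            }

            #[no_mangle]        // keep a predictable, C-callable symbol name
            pub extern "C" fn point_length(p: Point) -> f64 {
                (p.x * p.x + p.y * p.y).sqrt()
            }

            fn main() {
                println!("{}", point_length(Point { x: 3.0, y: 4.0 }));
            }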

            1. yetanotheraoc Silver badge

              Re: Can't disagree

              I didn't downvote you.

              You know a lot about Rust, and no doubt know more about programming than I do. Since I'm a little fuzzy on the difference between API and ABI you are probably correct there as well. Still, I stand by my initial assessment of Rust (good) and the Rust platform (less good). And crates just give me the willies. I looked at some of them; am I supposed to rely on them? I couldn't make myself do it.

              I know enough C to write a function, but would never dream of writing a large program in C. Too painful. I think I could code a large application in Rust but for now I'm sticking with other languages. We need something like Rust but to repeat myself I'm not happy with some of their choices.

              1. ssokolow

                Re: Can't disagree

                Basically, API is to source code as ABI is to machine code.

                Rust promises source code compatibility from v1.0 into the future, but, if you don't use Rust's equivalent to "extern C", they reserve the right to revise things like how structs get laid out in memory and the Rust-to-Rust calling convention for functions.

                That's why the default is to statically link all the Rust code into a single binary.

                In fact, they already did that at least once, when they taught the compiler to automatically reorder members in structs not marked "#[repr(C)]" to minimize the amount of padding imposed by alignment restrictions.
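
                A minimal sketch of what that means in practice (my own example; the exact sizes depend on the target):

                #[allow(dead_code)]
                #[repr(C)]
                struct CLayout {
                    a: u8,
                    b: u64,
                    c: u8,
                } // field order is fixed, so padding typically brings this to 24 bytes on 64-bit targets

                #[allow(dead_code)]
                struct RustLayout {
                    a: u8,
                    b: u64,
                    c: u8,
                } // the compiler is free to reorder to (b, a, c), typically 16 bytes

                fn main() {
                    println!("C layout: {} bytes", std::mem::size_of::<CLayout>());
                    println!("Rust layout: {} bytes", std::mem::size_of::<RustLayout>());
                }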

                As for your opinion on the platform and their choices, that's fair and I can respect that.

      2. Phil Lord

        Re: Can't disagree

        This is true. Rust has gone the approach of having a small standard library and then a dependency mechanism. This avoids the issue that Python has with a big standard library, a lot of which is covered in rust (sorry, bad pun). And so, while the core of Rust is relatively stable (though not compared to C) and has strong promises about forward compatibility, it's hard to get far using just Rust the language and the standard library.

        There needs to be an additional layer of metadata in the dependency ("crate") infrastructure that lets you know which crates are stable enough and which are going to provide future stability promises. This would allow you to get some idea of how long the software that you are building on is likely to be supported for.

        By the same token, Rust version moves are generally pretty easy to manage and are fairly well tooled. They haven't got to the Python 2/3 (or Perl 6!) situation so far.

  10. StrangerHereMyself Silver badge

    Rust is taking over

    It's about time too. I'm sick of all those C programmers who keep telling me *they* can write code without memory management errors.

    I've been writing C / C++ code for almost 30 years and even I do not believe I can write C code flawlessly all the time (certainly not the code I wrote 20 years ago).
