Arm rages against the insecure chip machine with new Morello architecture

Arm has made available for testing prototypes of its Morello architecture, aimed at bringing features into the design of CPUs that provide greater robustness and make them resistant to certain attack vectors. If it performs as expected, it will likely become a fundamental part of future processor designs. The Morello programme …

  1. b0llchit Silver badge
    Alert

    David Weston, director of Enterprise and OS Security at Microsoft, said that memory safety exploits are one of the longest-standing and most challenging problems in software security.

    Translation: writing software is hard, very hard. Writing good and secure software is even harder.

    It is ok to let hardware "help" with some problems. However, the main focus should be educating the software writers. There is a reason why C/C++ are popular languages (because they do specific jobs very well). But simply leaving the checks to hardware is a potentially dangerous path too. The less brilliant programmer might get the impression that the hardware will think for the programmer, so the programmer can think less. That would be a path to destruction.

    1. Mishak Silver badge

      The problem with C and C++ (possibly to a lesser extent when C++ is well implemented) is that there is virtually no runtime error checking. This means it is very easy to run off the end of something like a C-style array, as these are accessed (at the machine level) using a pointer (possibly with an offset).

      I have for years thought it would be beneficial if the CPU had registers dedicated to pointers, so that automatic protection could be built in that would do something like trigger a hardware trap if the pointer ever held an invalid value. For example, the pointer "register" would include the upper and lower bounds, so that any out-of-bounds dereference of the pointer value would lead to an exception (hardware or software). This would obviously require some work within the compilers and would really need pointer provenance being added to the relevant language standards.

      The real issue with the way things are is that programmers know what needs to be done, but it is very, very easy to make one or two mistakes when working on large, complex applications where the combinatorial explosion of control flow paths means it is not possible for a human or analysis tool to detect all possible failures. Lots of explicit bounds tests can be added to try to stop them, but that can lead to serious runtime performance penalties.
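
      Just to make the idea concrete: here is a software sketch of such a bounds-carrying pointer (BoundedPtr is a hypothetical illustration, trapping in software where the proposed hardware register would trap):

      ```cpp
      #include <cstddef>
      #include <stdexcept>

      // Sketch of a bounds-carrying ("fat") pointer: the pointer value travels
      // with the valid range it was derived from, and every dereference is
      // checked -- roughly what a capability register would do in hardware.
      template <typename T>
      class BoundedPtr {
          T* base_;   // lower bound of the valid region
          T* limit_;  // one past the upper bound
          T* cur_;    // current pointer value
      public:
          BoundedPtr(T* base, std::size_t n) : base_(base), limit_(base + n), cur_(base) {}
          BoundedPtr& operator+=(std::ptrdiff_t d) { cur_ += d; return *this; }  // moving is unchecked...
          T& operator*() const {                                                 // ...dereferencing is checked
              if (cur_ < base_ || cur_ >= limit_)
                  throw std::out_of_range("pointer dereference outside bounds");
              return *cur_;
          }
      };
      ```

      In hardware the base, limit and check live in the pointer representation and the load/store path, so none of this would cost an explicit branch in the program.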

      1. Flocke Kroes Silver badge

        Or you could use valgrind right now.

        1. Conor Stewart

          Just writing good software doesn't provide much defence against malware or hackers so a hardware solution to stop unauthorized access is a good thing.

          1. Alan Brown Silver badge

            The problem is that mediocre writers will use the hardware as a crutch instead of as a fire extinguisher.

            Defense in depth. Securing software after it's written is a fool's errand, and relying on externals to secure the system is an exercise in futility - there is ALWAYS a hole, and the issue is ensuring the various holes don't all line up.

            1. Anonymous Coward
              Anonymous Coward

              Then software security is itself a fool's game, as there's no real assurance of getting things right the first time, especially if sabotaged by a well-resourced adversary like a state actor. If you can't rely on securing it in the design phase, and you can't rely on securing it after the fact, where does that leave you?

        2. fg_swe

          Valgrind

          A valgrind program execution takes 10 to 100 times more CPU time. Use an efficient memory safe language such as Sappeur and the penalty will be in the order of 10%.

          1. Dan 55 Silver badge

            Re: Valgrind

            Valgrind is obviously only supposed to be used in development environments as it's a development tool. Once you have caught the bugs, the penalty in the release version is in the order of 0%.

            1. fg_swe

              Production Runtime Bugs

              Typically, programs cannot be exposed to all theoretically possible inputs during the test+validation phase, so some programming errors will only show up in production. For example, when little Ivan from Tomsk enumerates all possible inputs.

              See Sir Tony Hoare on the issue of runtime checking.

      2. Brewster's Angle Grinder Silver badge

        16-bit and 32-bit x86 had an instruction to do this: BOUND (first appeared in the 286, IIRC). It was never used and was not carried over to 64-bit.

        1. _andrew

          286 BOUND instructions on one hand, and segment memory registers on the other, were an attempt at a cut-down (and therefore viable) version of the iAPX 432 project, which was indeed supposed to be a capability machine along the lines of the IBM AS/400.

          Bounded segment registers seem to have always had an impedance mismatch to real software, usually unable to deal nicely with large data structures, like frame buffers. (The 286's segment bounds were limited to 64k, which to be fair was quite a lot at the time.)

      3. martinusher Silver badge

        Its a trade-off

        There's nothing stopping you implementing variables as objects that include access methods (or operators) that check on the legality and validity of what you're doing. There will be a performance and space penalty if you do this. That's why we limit these checks to design testing and control over interface functions.
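
        As a sketch of that trade-off, here is a minimal array object (CheckedArray is a hypothetical illustration) whose access operator checks the index, but only in debug builds -- exactly the "limit these checks to design testing" pattern described above:

        ```cpp
        #include <array>
        #include <cassert>
        #include <cstddef>

        // A "variable as object" with a checking access method: the index is
        // validated on every access in debug builds, and the assert compiles
        // away when NDEBUG is defined -- so the check is paid for only during
        // testing, and release builds keep raw-array performance.
        template <typename T, std::size_t N>
        struct CheckedArray {
            std::array<T, N> data{};
            T& operator[](std::size_t i) {
                assert(i < N && "index out of range");  // debug-build check only
                return data[i];
            }
        };
        ```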

        You can also break down memory into segments and access everything indirectly through those segments, placing limits on segment size, segment usage (data, code or stack for example) and strictly controlling who gets to access segment allocation and control by segregating system from user code and tightly controlling who gets to run code that allocates, de-allocates and modifies segments. I can't speak for Intel's latest and greatest but this has been in the Intel processor architecture since the 80286 (circa 1984) and, used correctly, can make managing and debugging code really easy. Unfortunately, nobody to my knowledge actually uses it -- common code bases are designed for a non-segmented architecture so the best you're going to get is everything put in a big user segment with consequent anarchy about who points where and who gets to execute data. (As a bonus, segment word width is set as a segment property so there's no need to rebuild the entire system around a particular word size.)

        It's difficult finding non-marketing information about Morello but it appears to be a reverse segment table -- instead of memory being described in terms of segment:offset translated to physical address, the physical address is translated to segment:offset, matched with properties and access type, and an exception is thrown if there's an issue with the access. Quite a good idea, but with all those Intel users out there why have we never figured out how to use their segmenting model?

        1. Charles 9 Silver badge

          Re: Its a trade-off

          Their segments were a relic of Real Mode and were of a fixed size (64KiB, IIRC, due to the registers at the time being 16-bit). Segment jumping made for a major performance hit, much like context jumping does today, so Protected Mode with its flat memory model without all the segment jumping was adopted relatively quickly for performance and large-memory applications.

        2. J.G.Harston Silver badge

          Re: Its a trade-off

          That's exactly what I was pondering. They've re-invented segmentation registers. It even pre-dates the 80x86, the memory mapping in the PDP11 does it, with the Morello improvement of not being fixed sizes.

        3. dajames Silver badge

          Re: Its a trade-off

          You can also break down memory into segments and access everything indirectly through those segments, ...

          Segments -- they're called "selectors" these days -- are still used in 32/64 bit x86 code, but they are used by the OS to separate the address spaces of different processes, and so operate at a much coarser granularity than you seem to have in mind.

          It's the wrong solution for array-bound checking, because loading a selector into a register is a fairly slow operation and would cripple performance compared with the smaller overhead of bound checking instructions emitted by the compiler.

          It is worth noting, though, that in some languages -- in C in particular -- there is not enough information available to the compiler for the generation of runtime checks. Other languages (such as C++ (when not used simply as a "better C") or Rust) are able to make the necessary information available to the compiler, so runtime checking is possible.
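
          A small sketch of that information gap: once a C-style array is passed to a function it decays to a bare pointer, and the length is simply gone (bytes_seen_by_callee is just an illustrative name):

          ```cpp
          #include <cstddef>

          // Once a C-style array is passed to a function it decays to a bare
          // pointer: the declared [16] is only documentation, and the element
          // count is not visible to the callee (or to the compiler), which is
          // why a C compiler cannot emit bounds checks for it.
          std::size_t bytes_seen_by_callee(const int arr[16]) {
              return sizeof(arr);  // size of a pointer, NOT 16 * sizeof(int)
          }
          ```

          A container like std::vector, by contrast, carries its size with it, which is what makes a checked access possible at all.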

        4. fg_swe

          Sir Tony Hoare

          We already had very nice Algol mainframes, complete with lots of runtime checks. Then came the Hamburger Approach to computing in the form of Unix+C.

          https://www.infoq.com/presentations/Null-References-The-Billion-Dollar-Mistake-Tony-Hoare/

          1. Dan 55 Silver badge

            Re: Sir Tony Hoare

            I don't know what the hamburger approach is, but the link you've given states Algol had null pointers before C.

            1. fg_swe

              Null Pointers Are Not The Worst

              A null pointer will result in a deterministic crash. What is much worse is heap corruption from invalid pointers, which can modify program heap in all sorts of weird ways. Attackers also love these pointers.

            2. fg_swe

              Hamburger Computing

              Hamburger computing is focusing on cost and execution speed only, disregarding safety and security.

              Like eating fast food and being amazed about cost efficiency, while wondering where all the health issues come from.

      4. nintendoeats Silver badge

        The problem with C/C++ is that they don't do runtime bounds/pointer checking.

        The good thing about C/C++ is that you don't have to pay the performance cost of a runtime bounds/pointer check.

        Even if there were dedicated hardware to do bounds/pointer checks, it would still consume power and die space, so in effect there is still a cost. Hopefully it would be a low cost - low enough that high performance computing applications can feel good about it.

        1. Dan 55 Silver badge

          C++ is not C, there are references and bounds checking in C++.

          1. nintendoeats Silver badge

            Yes, optionally. I look at (and in fact write) a lot of C++ code that does not have bounds checking.

          2. jrtc27

            References can be, and often are, turned back into raw pointers, and nothing stops you using the unchecked parts of C++. How many times have you used the unchecked std::vector<T>::operator[] instead of the checked std::vector<T>::at, or used the unsafe std::vector<T>::front/back, for example? Just because checks exist in parts of the language doesn't mean C++ is suddenly a panacea.
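
            For example, the same std::vector exposes both faces (detects_overrun is just an illustrative helper):

            ```cpp
            #include <cstddef>
            #include <stdexcept>
            #include <vector>

            // The checked and unchecked faces of the same container:
            // operator[] does no bounds check (out-of-range access is
            // undefined behaviour), while at() throws std::out_of_range,
            // so an overrun is caught deterministically.
            bool detects_overrun(const std::vector<int>& v, std::size_t i) {
                try {
                    (void)v.at(i);  // checked access
                    return false;   // index was in range
                } catch (const std::out_of_range&) {
                    return true;    // overrun caught
                }
            }
            ```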

      5. Anonymous Coward
        Anonymous Coward

        Design of Intel MPX ~2015

        https://intel-mpx.github.io/design/

        Now that the pointer to the object is loaded in objptr, the program wants to load the obj.len subfield. By design, MPX must protect this second load by checking the bounds of the objptr pointer. Where does it get these bounds from? In MPX, every pointer stored in memory has its associated bounds also stored in a special memory region accessed via bndstx and bndldx MPX instructions (see next subsection for details). Thus, when the objptr pointer is retrieved from memory address ai, its corresponding bounds are retrieved using bndldx from the same address (Line 9). Finally, the two bounds checks are inserted before the load of the length value on Lines 11-12.

        1. jrtc27

          Re: Design of Intel MPX ~2015

          MPX was racy, had no protection against using the wrong bounds with a given pointer as it was totally separate, and only had four bounds registers. It was not a good design.

    2. swm Silver badge

      The people at Franz Lisp were so fed up with continual security patches to ftp etc. that they wrote their own in Franz Lisp (Common Lisp) in less than a day. No further problems.

  2. Hawkeye Pierce

    +1 for the Title

    ... not sure how many will get it though...

    1. chivo243 Silver badge
      Happy

      Re: +1 for the Title

      I got it, and wondering if Tom should get any royalties?

      +1 for the handle! And does Hawkeye have sisters or not... is his mom alive? If not when did she die? Yes, I've seen them too many times...

    2. Excellentsword (Written by Reg staff)

      Re: +1 for the Title

      One is enough for me

  3. Doctor Syntax Silver badge

    "The Morello programme was started in 2019 by UK Research and Innovation and intended to span five years."

    Span five? With Morello involved that should be Take Five. Just to drum up some business of course.

  4. Old Used Programmer Silver badge

    What's old is new again....

    This sounds a lot like the base-bound register pairs in the CDC 6400/6500/6600 machines...from the mid-1960s.

    1. fg_swe

      ICL, UNISYS, MOSCOW

      They all had mainframe computers with plenty of runtime checking. Then came the "cheap" Unix+C. "presented" like a Trojan Horse to the computing world.

      https://en.wikipedia.org/wiki/ICL_2900_Series

      https://en.wikipedia.org/wiki/Burroughs_large_systems

      https://en.wikipedia.org/wiki/Elbrus_(computer)

  5. Primus Secundus Tertius

    Pointers to failure

    Pointers are fine when the right chaps use them. But a lot of programmers I met did not seem to understand them properly.

    1. Tom 7 Silver badge

      Re: Pointers to failure

      If only there was some way to write some code to make them Smart Pointers so the programmer doesn't have to.

      1. fg_swe

        Re: Pointers to failure

        Smart pointers can't help you in case of multithreaded race bugs. A proper programming language can. See my other post.

    2. fg_swe

      Pointers Are Fine For Code Generators

      ...but very bad for 100% of human developers. See the CVE database and all the exploits in the code developed by "seasoned" kernel developers. Or Mozilla, Oracle, MSFT, QNX, ...

  6. fg_swe

    Very Expensive Approach

    Software-based approaches are much more lightweight and powerful than doing it by means of hardware. The compiler's type system has a much better view of the entire program than a CPU can ever have at runtime.

    See this language of mine:

    http://sappeur.ddnss.de/SAPPEUR.pdf

    1. fg_swe

      Re: Very Expensive Approach / Details

      If I understand the ARM concept correctly, they need very fat pointers to store all the safety information.

      Compared to that, a Sappeur program needs just the native pointer size (which can be anything from 16 to 64 bits) plus a reference counter (typically 4 octets, plus a pthread_mutex for multithreaded objects) at the targeted object.

      ARM seems to consume 16 octets for a safe pointer. That is a lot of cache bloat and a lot of excess memory transfer bandwidth spent.

      Compile this sample to see it yourself: http://gauss.ddnss.de/

      1. jrtc27

        Re: Very Expensive Approach / Details

        Safe languages are great, but the world has huge piles of C/C++ and people keep adding to that pile every day. Some projects are being rewritten in languages like Rust, but they are few and far between. If you can write everything in a safe language then you do not need hardware memory protection (though there is the possibility that having bounds checks in hardware could remove the need for generated bounds checks in software). C/C++ are going to remain around for decades and something needs to be done to tame those languages, and that is what CHERI/Morello addresses. It also allows you to limit the damage of unsafe regions of code in safe languages in the same way that it limits the damage of buggy C/C++, be that something like Rust's unsafe or something like Java's JNI.

  7. msobkow Silver badge

    A little light on details, but intriguing. They don't mention anything about the so-called "side-band" attacks some CPUs are vulnerable to; I fail to see how memory bounding would resolve that.

    1. jrtc27

      If you want the details, there's Arm's full specification at https://developer.arm.com/documentation/ddi0606/latest, as well as our CHERI specification at https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-951.pdf; Arm's spec describes everything about the architecture but leaves out a lot of the design rationale that is present in the CHERI spec, instead choosing to reference our spec. https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-916.pdf is a few years old now but has ideas for how CHERI can help lessen the effects of speculative execution attacks; the high-level observation is you can avoid speculatively accessing out of bounds, so if you have accurate bounds for your language-level objects then you can avoid the `if (x < len) return a[x];`-style Spectre gadgets being abused as arbitrary read gadgets, only for reads within bounds (some of which may still not be permitted by the language-level checks, but it's at least a start). There's still a lot of nuance though.

    2. fg_swe

      Avoiding Side Channel Attacks

      The main guy behind seL4, Gernot Heiser, explained how to get rid of SPECTRE-style side channel attacks in a video (he has lengthy videos online - cannot find the right one; please look at them for your enlightenment). Essentially, what must be done is to flush/invalidate all kinds of caches, TLBs and similar mechanisms before you perform a task switch.

      SPECTRE definitely is an issue, but it is orthogonal to memory safety.

      Similar to ABS brakes, automatic video/radar brakes and safety belts. All are important measures in their own right.

      1. Charles 9 Silver badge

        Re: Avoiding Side Channel Attacks

        Raises a new question, then. How often does your average processor core switch tasks?

        1. fg_swe

          Invalidating Caches, Cost of That

          Certainly invalidating caches and TLBs on each task switch (and also each interrupt ???) is not cost-free. Security does not come for free, insecurity might be fatal. Please refer all detailed questions to Admirals Yamamoto and Dönitz.

          Some smart computer scientists/engineers should be inventing a smart approach which does not cost too many transistors while achieving the intended security properties. Might take a few years until we have them.

          1. fg_swe

            Re: Invalidating Caches, Cost of That

            For the time being, just use a dedicated computer for any high security application. Cloud computing is a security risk anyway. Can you trust a random $employee in $CloudCorporation?

            For example, run a dedicated Email server just for your CEO. Lock this server in a specially hardened, access-controlled space. Never let a single Snowden do the maintenance on the server etc etc.

            The latest RPI is both cost efficient and small. You can mount it in a safe, which is bolted into the concrete of your data center,...

      2. fg_swe

        Re: Avoiding Side Channel Attacks

        Mr Heiser now works for Hensoldt Cyber, formerly EADS/AIRBUS avionics. If you have an olive green NATO application, maybe he has a SPECTRE-safe CPU for you already.

        https://hensoldt-cyber.com/mig-v/

        Please disregard the naming; it's a Munich product.

  8. David Halko

    Silicon Secured Memory (SSM)

    >> Code operating within one compartment has no access to any other area, which means that even if an attacker compromises one piece of the code or data, they cannot access other areas. Arm claims there has never been a silicon implementation of this kind of hardware capability in a high-performance CPU

    The industry had Silicon Secured Memory (SSM) on SPARC since 2015, in the highest performing processors on the market.

    https://www.theregister.com/2015/10/28/oracle_sparc_m7/

    Looking back on a 2015 register article...

    - SPARC Solaris actually used the MMU to separate User from Kernel memory by default, avoiding typical non-SPARC pointer security exploits by hackers (do OS's on ARM actually separate Kernel from User memory maps today? This is an OS issue, not a hardware issue, and OS's should have been called out.)

    - There is virtually no CPU cost for SPARC Silicon Secured Memory (SSM) protection in hardware, making sure pointers do not exit their [already reasonably secured] MMU isolated area

    - The SSM under SPARC really only needs to protect from stray pointers in a very limited MMU area, so 4 bits is more than enough, which the 2015 article did not understand

    - "If it doesn't alert anyone" was a false fear in 2015, since violations immediately notify Oracle via ASR in real time, before app owners are aware

    Claiming protection in hardware "in a high performance CPU" is a first of its kind is ludicrous... The Register should have caught it and compared it to what already exists.

    1. jrtc27

      Re: Silicon Secured Memory (SSM)

      SSM, also known as ADI, is not a compartmentalisation technique. It is the same concept as Arm's MTE (which was clearly inspired by it). Such tagging techniques do not provide deterministic fine-grained memory protection; they operate only at a coarse granularity, typically on the order of 16 bytes (that's what MTE typically uses, though the SPARC M7 has a whopping 64-byte, i.e. cache line, granularity), and are in the general case only probabilistic with a 1 in 2^N chance, for some small N that's typically around 4 or 8, since if you pick two allocations at random there is a 1 in 2^N chance the memory colours/versions will happen to match. On the M7 that means a 1 in 16 chance if you have no prior knowledge about the version.

      Moreover, you can only bound the allocation; you cannot hand out access to part of an allocation, since recolouring part of the allocation would break the original pointer (you can, though, *split* the allocation at the tagging granularity), and in a compartmentalised world you rely solely on a malicious compartment not just guessing, or inferring, the right version to use. This is the killer that makes it useless for compartmentalisation, resigned to being solely a debugging and probabilistic mitigation technology. These limitations are all laid out for you in the very article you link to.

      Also, I'm not sure where you got this idea of "making sure pointers do not exit their [already reasonably secured] MMU isolated area"; there's no such thing - SSM does nothing to stop you storing your pointer wherever you like.
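
      The 1-in-2^N figure is easy to verify by brute force; a small sketch (count_matches is an illustrative helper, not anything from SSM or MTE) enumerating every pair of N-bit colours:

      ```cpp
      // With N-bit memory colours, two independently chosen colours collide
      // with probability 1 in 2^N. Enumerating every pair of N-bit colours
      // shows the figure directly: 16 matches out of 256 pairs for 4 bits,
      // i.e. the M7's 1-in-16 chance.
      int count_matches(int bits) {
          const int colours = 1 << bits;  // 2^bits distinct colours
          int matches = 0;
          for (int a = 0; a < colours; ++a)
              for (int b = 0; b < colours; ++b)
                  if (a == b)
                      ++matches;
          return matches;  // out of colours * colours pairs
      }
      ```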

      CHERI suffers from none of those issues. Capabilities cannot be forged, even if you know every single bit of the capability, the bounds are in the pointer so you can hand out multiple aliasing capabilities with different bounds and permissions, and single-byte capabilities are supported.

  9. Sparkus Bronze badge

    Need more info

    the 'pointer control and compartmentalization' sounds very much like a family of features that have been in Power since 2010 or so......
