* Posts by Torben Mogensen

474 posts • joined 21 Sep 2006

What's that hurtling down the Bifröst? Node-based network fun with Yggdrasil 0.4

Torben Mogensen

What's with the Ös?

It may seem a bit metal to add random umlauts over Os (I blame Motörhead). But in Nordic languages it does, in fact, change the pronunciation, unlike in English, where an umlaut indicates vowels being pronounced separately rather than as a diphthong (as in naïve). Bifrost and Ragnarok definitely have O sounds, not Ö sounds.

Realizing this is getting out of hand, Coq mulls new name for programming language

Torben Mogensen

*Bleep*

Given the sounds that cover up four-letter words on TV, how about using the name *Bleep*? I'm sure a suitable backronym can be found. "p" could obviously stand for "prover", and "l" could stand for "logic", but the people behind the language would probably prefer French words. Any suggestions?

Blessed are the cryptographers, labelling them criminal enablers is just foolish

Torben Mogensen

Are banks criminal?

When I use my online banking services, I believe (and seriously hope) that all traffic is strongly encrypted. Does that make the banks criminal? (OK, they may be, but for other reasons.) What about the VPN I need to use to access my work server when not on the local network? What about using https instead of http? And so on.

If governments do not want us to use crypto, they should show an example and stop using it themselves, making all documents and communication public. Like that's ever gonna happen.

Ah, you know what? Keep your crappy space station, we're gonna try to make our own, Russia tells world

Torben Mogensen

Unmanned space station == satellite?

An unmanned orbital space station is just a satellite by another name. "Not permanently manned" could mean anything from short-term maintenance crews every five years to almost always manned, but given that the stated reason for not having a permanent crew is radiation, my guess is that it is closer to the first. Higher radiation probably means inside the inner Van Allen belt, which is lower than the ISS. This would lower the cost, but require more frequent boosting to maintain orbit.

What's this about a muon experiment potentially upending Standard Model of physics? We speak to one of the scientists involved

Torben Mogensen

Connection to new Dark Matter model?

Researchers at the University of Copenhagen recently released a theoretical study where they replace Dark Energy with adding magnetic-like properties to Dark Matter. It would be interesting (though highly unlikely) if the observed muon magnetic anomaly was related to this.

Where did the water go on Mars? Maybe it's right under our noses: Up to 99% may still be in planet's crust

Torben Mogensen

Not really surprising

As the article states, water on Earth is recycled by volcanic activity and would otherwise not be found in any great quantity on the surface. This has long been known, so it is not really a surprise that the lack of volcanic activity on Mars has contributed to its loss of liquid water.

What is new is that measurements of H2O vs. D2O can give a (very rough) estimate of how much was lost underground compared to how much was lost to space.

In any case, for those who dream of terraforming Mars, its low gravity and lack of volcanic activity will make it hard to sustain a viable biosphere without having to replenish it forever. In spite of its current unfriendly environment, I think Venus is a better long-term option for terraforming: Blow away most of the current atmosphere and add water. Redirecting comets from the Kuiper belt to hit Venus will contribute to both. Sure, we are a long way from being able to do that, but in the long run, it will make more sense.

Memo to scientists. Looking for intelligent life? Have you tried checking for worlds with a lot of industrial pollution?

Torben Mogensen

What good would it do us to build and send such a missile if the other side has already launched one (or will do so before our missile arrives)? At best, the satisfaction, as we all die, of knowing that we will be avenged, but that is poor comfort.

And, in the event that such a missile misses its mark or is intercepted, we will have made an enemy that might otherwise have been an ally.

Interstellar distances are so large that invasion of another civilized planet is unrealistic. We can destroy one, yes, but invasion assumes that there is something worthwhile left to invade. And the amount of war material that it is realistic to bring across interstellar distances will be relatively easily countered by the defender, even if their level of technology is lower -- as long as they have orbital capability. Added to that, invasion is only really worthwhile if the goal is colonization -- sending goods back to the mother planet is too expensive to be worth it -- and sending a large number of colonizers across interstellar space is unrealistic. This is why invasion sci-fi postulates hypothetical technologies such as FTL flight.

It might make sense to colonize extrasolar planets that have biospheres but no civilization. You can send frozen fertilised eggs there and let them be raised by robots until they grow up. This will in no way help Earth, but it can ensure long-term survival of the human species.

PayPal says developer productivity jumped 30% during the COVID-19 plague

Torben Mogensen

Meetings

I'm sure the main reason is that developers didn't waste so much time on useless meetings. In Zoom meetings, they can code in another window and only pay attention to the meeting in the 5% of the time when something useful is said.

Useful quantum computers will be impossible without error correction. Good thing these folks are working on it

Torben Mogensen

"All we have to do is put them together"

That must be the understatement of the decade. Problems arise in quantum computers exactly when you put elements together. Each element may perform predictably on its own, but when you put them together, chaos ensues.

Arm at 30: From Cambridge to the world, one plucky British startup changed everything

Torben Mogensen

Re: Depends on what you mean by "reduced"

"The *real* point of RISC was that it worked round the memory bandwidth problem."

That too, but mostly the load-store architecture prevented a single instruction from generating multiple TLB lookups and multiple page faults. On a Vax, a single instruction could (IIRC) touch up to four unrelated addresses, which each could require a TLB lookup and each cause a page fault. In this respect x86 isn't all bad, as most instructions only touch one address each (though they may both load from and store to this address).

On the original ARM, a load/store multiple registers could cross a page boundary, which actually caused faulty behaviour on early models.

A load-store architecture requires more registers, which is why ARM had 16 registers from the start, which x86 only got in the 64-bit version. In retrospect, letting one register double as the PC (a trick they got from the PDP-11) was probably a mistake, as it made the pipeline visible, which caused complications when the pipeline was lengthened (as it was in the StrongARM).

Torben Mogensen

Re: Depends on what you mean by "reduced"

"As you can imagine, decoding an "instruction" is a lot harder if you don't know how many bytes it contains until you've already begun decoding the first part!"

Even worse, you can't begin decoding the next instruction until you have done a substantial part of the decoding of the current instruction (to determine its size). Decoding the next N instructions in parallel is easy if they are all the same size, but difficult if they are not. You basically have to assume that every byte boundary can be the start of an instruction and start decoding at all of them, throwing away a lot of work when you later discover that these were not actual instruction starts. This costs a lot of energy, which is a limiting factor in CPUs, and increasingly so over time.

You CAN design multi-length instructions without this problem, for example by letting each 32-bit word hold either two 16-bit instructions or a single 32-bit instruction, so you can decode at every 32-bit boundary in parallel. But this is not the case for x86, because it has grown by bits and pieces over time, so it is a complete mess. So you need to do speculative decoding, most of which is discarded.
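
As a minimal C sketch of the fixed-boundary idea (the encoding here is made up purely for illustration -- the top bit of each 32-bit word marks a full-width instruction -- and does not correspond to any real ISA), the point is that each word can be classified without looking at its neighbours, so N words can be decoded independently:

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical encoding: if the top bit of a 32-bit word is set, the word
       holds one 32-bit instruction; otherwise it holds two 16-bit instructions.
       Each word can be decoded without information from neighbouring words. */
    static void decode_word(uint32_t w)
    {
        if (w & 0x80000000u)
            printf("one 32-bit instruction: 0x%08x\n", (unsigned)w);
        else
            printf("two 16-bit instructions: 0x%04x and 0x%04x\n",
                   (unsigned)(w & 0xffffu), (unsigned)(w >> 16));
    }

    int main(void)
    {
        uint32_t code[] = { 0x80001234u, 0x00125678u, 0x9abcdef0u };
        for (size_t i = 0; i < sizeof code / sizeof code[0]; i++)
            decode_word(code[i]);   /* every iteration is independent */
        return 0;
    }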

Torben Mogensen

You can't use the number of transistors to measure RISC vs. CISC. The majority of transistors in modern CPUs are used for cache, branch prediction, and other things that don't depend on the size or complexity of the instruction set.

Torben Mogensen

Who killed MIPS?

The article states that Arm killed off its RISC rival MIPS. I do not believe this to be true. IMO, it was Intel's Itanium project that killed MIPS: Silicon Graphics, which at the time had the rights to MIPS, stopped development of it to join the Itanium bandwagon, long before any hardware was available. Hewlett-Packard (which had its own PA-RISC architecture) did the same, as did Compaq, who had recently acquired the Alpha architecture from DEC. So, effectively, Itanium killed three of the four dominant server RISC architectures (the fourth being Sun's SPARC architecture, which was later acquired by Oracle), and that was solely based on wildly optimistic claims about future performance made by Intel. MIPS continued to exist as an independent company for some years, but never regained its position. It was eventually open-sourced and used as the basis of some Chinese mobile-phone processors, but these were, indeed, swamped by Arm. Itanium didn't affect Arm much, except that Intel stopped producing their StrongARM (acquired from DEC) and its successor XScale.

So, while Itanium itself was a colossal failure, it actually helped Intel gain dominance in the server market -- with x86 -- as it had eliminated potential competitors in the server market. Now, it seems Arm is beginning to make inroads on this market.

The evolution of C#: Lead designer describes modernization journey, breaks it down about getting func-y

Torben Mogensen

Functional C# == F#

If you like the .NET platform and C#, but want something more functional, you could try F#. F# is sort of a merge of OCaml and C#, having most of the features of both, but works best if you program mostly in a functional style. You can use all .NET libraries, and there are some F#-specific libraries that better support a functional style.

There are some places where having to support both functional and OO styles makes things a bit ugly (for example, two different syntaxes for exactly the same thing), but overall it is not a bad language. Not as elegant as Standard ML, though.

Torben Mogensen

"You can't take anything away"

While it is traditional not to remove features from a language, to ensure full backwards compatibility, there are ways around it:

- You can make a tool that will transform programs using the deleted feature into programs that don't. This can require a bit of manual work, but not too much. Of course, fully automatic is best.

- You can remove the feature from all future compilers, but keep supporting the last compiler that has the feature (without adding new features to this).

- Warn that it will be removed in X years, and then remove it, in the meantime letting compilers warn that the feature will disappear. People will then have the choice between modifying their programs or using old, unsupported compilers once the feature is gone.

- You can introduce a completely new language that is only library-compatible with the old, let the old stay unchanged forever, and suggest people move to use the new language. This is sort of what Apple did with Swift to replace Objective C.

Third event in 3 months, Apple. There better be some Arm-powered Macs this time

Torben Mogensen

Emulation

It used to be that emulation caused a ×10 slowdown or thereabouts because the emulator had to decode every instruction before executing it. These days, emulation is more like JVM systems: You compile code on the fly into a code cache, optimising the compiled code if it is executed a lot (and this optimisation can be done on separate cores). This can keep the slowdown down to around 20% on average and almost nothing on programs with very small intensive compute kernels. On top of this, calls to the OS run natively, so you are unlikely to feel a significant slowdown. The cost is more memory use (for the code cache) and more heat generation (as you use otherwise idle cores for compilation and optimisation of code).
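
Below is a minimal C sketch of the code-cache idea. To keep it self-contained, a toy guest "ISA" and pre-decoded blocks stand in for real machine-code generation, and all the names are invented for illustration; the point is only that each block is translated once, cached, and then reused on every later visit:

    #include <stdio.h>
    #include <stdlib.h>

    /* Toy guest "ISA": add a constant to an accumulator, jump back to
       address 0, or halt. */
    typedef enum { OP_ADD, OP_JMP0, OP_HALT } Op;
    typedef struct { Op op; int arg; } Insn;

    /* A "translated" block: here just a pre-decoded view of the guest code,
       created once and then reused from the cache. */
    typedef struct { const Insn *code; int len; } Block;

    enum { CODE_LEN = 4 };
    static const Insn guest[CODE_LEN] = {
        { OP_ADD, 1 }, { OP_ADD, 2 }, { OP_JMP0, 0 }, { OP_HALT, 0 }
    };

    static Block *cache[CODE_LEN];      /* one slot per possible block start */

    /* "Translate" the block starting at pc: take instructions up to and
       including the first control transfer. */
    static Block *translate(int pc)
    {
        int end = pc;
        while (guest[end].op == OP_ADD) end++;
        Block *b = malloc(sizeof *b);
        b->code = &guest[pc];
        b->len = end - pc + 1;
        return b;
    }

    int main(void)
    {
        long acc = 0, executed = 0;
        int pc = 0;
        while (executed < 12) {          /* bound just to keep the demo finite */
            Block *b = cache[pc];
            if (!b) b = cache[pc] = translate(pc);   /* translate only on a miss */
            for (int i = 0; i < b->len; i++, executed++) {
                const Insn *in = &b->code[i];
                if (in->op == OP_ADD)       acc += in->arg;
                else if (in->op == OP_JMP0) pc = 0;
                else { printf("halt, acc=%ld\n", acc); return 0; }
            }
        }
        printf("acc=%ld after %ld guest instructions\n", acc, executed);
        return 0;
    }

A real dynamic binary translator would of course emit and patch native code, chain blocks together and use hot counters to trigger re-optimisation, but the lookup/translate-on-miss structure is the same.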

I can even imagine that Apple has tweaked the x86 code generation in their compilers to avoid code that is difficult to cross-compile to ARM, such as non-aligned memory accesses. This will only have marginal impact on x86 performance (it might actually improve it), but it can have a significant impact on the performance of the generated ARM code.

Amazon blasts past estimates, triples profits to $6.2bn but says COVID will cost it $4bn over the next quarter

Torben Mogensen

COVID?

Only the headline mentions COVID, the article itself just says "employee safety". That can, of course, include COVID measures, but it probably includes all sorts of other measures.

And COVID is likely to make more people do their Black Friday and Christmas shopping online, so it will probably gain Amazon more than the $4bn that they claim to spend on employee safety, including COVID measures.

As a "Carthago delenda est" line, I will add that I think Amazon should be forced to split into at least two independent companies: One for online sale and one for streaming videos. It would not be difficult to argue that Amazon uses its dominant position in online shopping to do unfair competition towards Netflix and other streaming services, by combining a free shipping membership with their streaming services.

What will you do with your Raspberry Pi 4 this week? RISC it for a biscuit perhaps?

Torben Mogensen

Dead end?

Much as I like RISC OS (I had an Archimedes and an Acorn A5000, and used RISC OS on an RPC emulator for a while), I think it is painting itself into a corner from which it cannot escape. It is still mainly written in 32-bit ARM assembly code, and the world is moving to 64 bits -- it is even becoming common for ARM processors to be 64-bit only. And it can only use one core, where even tiny systems these days are multicore. Cooperative multi-tasking is also rather dated. There were good reasons for these design decisions in the late 1980s, but they do not fit modern computing. MacOS had similar issues in the 1980s, but when it moved to a platform based on BSD (from Steve Jobs' NeXT project), most of these problems were solved. Acorn died before it could make similar changes to its own platform, and attempts at moving RISC OS to a more modern kernel have been half-hearted -- there was a RISC OS style desktop called ROX for Linux, but it mainly copied the look and feel of RISC OS and didn't go very deep. And nothing seems to have happened with it for a long time.

So, I can't really see RISC OS moving out of a hobbyist niche and into anything approaching mainstream. Not without a rewrite so complete that it is debatable whether you could call the result RISC OS anymore. It might be better to port some of the interesting parts of RISC OS (some of the apps, the app-as-a-folder idea, and the font manager) to Linux and let the rest die.

Heads up: From 2022, all new top-end Arm Cortex-A CPU cores for phones, slabtops will be 64-bit-only, snub 32-bit

Torben Mogensen

Makes sense

I loved the old 32-bit instruction set when I had my Archimedes and A5000 home computers, but over time the instruction set accumulated so much baggage that it became a mess. So I'm fine with a 64-bit only ARM. Nearly all modern applications use 64-bit only, so support for the 32-bit ISA is just extra silicon area that could better be used for something else.

Sure, it is a drag that future Raspberry Pis will not be able to run RISC OS, as this is (still) mainly 32-bit assembly code. But RISC OS, for all its qualities, will not amount to anything other than a hobby system until it is ported to a high-level language (such as Rust) and made more secure. Even as a past user of RISC OS, it was not the OS core that I loved -- it was higher-level details such as the GUI, the built-in apps, the font manager, the file system (with file types and applications as folders), and the easy-to-use graphics system. These could well be ported to a more modern OS kernel.

Ah yes, Sony, that major player in the smartphone space, has a new flagship inbound: The Xperia 5 II

Torben Mogensen

Lenses

I seriously doubt the tiny lenses used in smartphones are precise enough for true 8K video. So you could probably do just as well with intelligent upscaling of 4K or lower.

0ops. 1,OOO-plus parking fine refunds ordered after drivers typed 'O' instead of '0'

Torben Mogensen

ABC80

In the 1980s there was a Swedish-made home computer called ABC80. On this computer, the pixel patterns for O and 0 were EXACTLY the same. Since O and 0 are close on a keyboard, this could give hard-to-find errors when programming in BASIC. Is this a variable called "O" or the number 0? It didn't help that the designers had the bright idea that, to distinguish integer constants from floating point constants, you added a "%" at the end of integer constants (similar to how integer variables were suffixed in most BASICs at the time). So O% and 0% were both valid. Variable names could only be a single letter or a single letter followed by a single digit (and suffixed with % or $ to indicate integer or string variables). All in all, this was not hugely user friendly. The follow-up ABC800 added a dot in the centre of zero, but the BASIC was otherwise the same.

I was the happy owner of a BBC Micro, but I was briefly hired by a company to port some school software to ABC80. The way it operated on strings used huge amounts of memory, so I had to add a small machine-code routine to make in-place updates (insert char, delete char, replace char) in strings to keep it from running out of memory.

Nvidia to acquire Arm for $40bn, promises to keep its licensing business alive

Torben Mogensen

Independence

The main reason ARM was spun off from Acorn Computers to become an independent company was that Apple (who wanted to use ARM in their Newton hand-held) did not want to be dependent on a competitor (however tiny). Having NVIDIA control ARM can lead to similar sentiments from ARM licensees that compete with NVIDIA.

I would prefer ARM to be neutral, with no single entity (company or person) owning more than 20% of the company.

'A guy in a jetpack' seen flying at 3,000ft within few hundred yards of passenger jet landing at LA airport

Torben Mogensen

I wonder why commercial airplanes don't have cameras recording everything visible through the cockpit windows? That way, you could review all sightings after the plane lands. Such cameras would not need extensive testing, as they do not affect the flight (except by drawing a small amount of power, and even that could come from batteries).

Toshiba formally and finally exits laptop business

Torben Mogensen

Libretto

I remember Toshiba best for their ultra-tiny laptops -- the Libretto range. I had a Libretto 50CT -- 210×115×34 mm with a screen smaller than many modern smart phones. A curiosity was that it had a mouse "nub" besides the screen and mouse buttons on the back of the lid. So you would use your thumb to move the cursor and index and middle fingers to operate the buttons. It was great for taking along when travelling.

See more at https://www.youtube.com/watch?v=7HQt6EwA0JE

A tale of mainframes and students being too clever by far

Torben Mogensen

Ray-tracing on a Vax

When I was doing my MSc thesis about ray-tracing in the mid-1980s, we didn't have very good colour screens or printers, so to get decent images you had to use dithering, where different pixels are rounded differently to the available colours to create dot patterns that average out to the true colour. One such technique is called error distribution: when you round a pixel, you divide the rounding error by 4 and add it to the next pixel on the same row and the three adjacent pixels in the next row. This way, the colours in an area average out to the true colour.
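
A minimal C sketch of that error-distribution scheme, reduced to a greyscale image rounded to pure black and white (the gradient test image and the 8x8 size are just for illustration, not anything from the thesis):

    #include <stdio.h>

    #define W 8
    #define H 8

    int main(void)
    {
        float img[H][W];

        /* Simple left-to-right gradient as a test image. */
        for (int y = 0; y < H; y++)
            for (int x = 0; x < W; x++)
                img[y][x] = 255.0f * x / (W - 1);

        /* Round each pixel to 0 or 255 and spread the rounding error equally
           (1/4 each) to the next pixel on the row and the three adjacent
           pixels on the row below. */
        for (int y = 0; y < H; y++) {
            for (int x = 0; x < W; x++) {
                float old = img[y][x];
                float rounded = old < 128.0f ? 0.0f : 255.0f;
                float err = (old - rounded) / 4.0f;
                img[y][x] = rounded;
                if (x + 1 < W)
                    img[y][x + 1] += err;
                if (y + 1 < H) {
                    if (x > 0)
                        img[y + 1][x - 1] += err;
                    img[y + 1][x] += err;
                    if (x + 1 < W)
                        img[y + 1][x + 1] += err;
                }
            }
        }

        /* Print the dithered result as ASCII art. */
        for (int y = 0; y < H; y++) {
            for (int x = 0; x < W; x++)
                putchar(img[y][x] > 127.0f ? '#' : '.');
            putchar('\n');
        }
        return 0;
    }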

I ran the ray-tracer program (written in C) on the department Vax computer, but I had an annoying problem: At some seemingly random place on the image, a "shadow" would appear making all the pixels below and to the right of this be really odd colours. I looked at my code and could see nothing wrong, so I ran the program again (this would take half an hour, so you didn't just run again without looking carefully at the code). The problem re-appeared, but at a different point in the image! Since I didn't use any randomness, I suspected a hardware fault, but I needed more evidence to convince other people of this. I used a debugger and found that, occasionally, multiplying two FP numbers would give the wrong result. The cause of the shadow was that one colour value was ridiculously high, so even after distributing the error to neighbouring pixels, these would also be ridiculously high, and so on.

To make a convincing case, I wrote a short C program that looped the following:

1. Create two pseudo-random numbers A and B.

2. C = A*B; D=A*B;

3. if (C != D) print A, B, C, and D and stop program.

This program would, on average, stop and print out different values for C and D after one or two minutes of running (but never with the same two numbers), and this convinced the operators that there was something wrong, and they contacted DEC. They sent out some engineers, and they found out that there was a timing problem where the CPU sometimes would fetch the result of the multiplication from the FPU slightly before it was ready, so by increasing the delay slightly, they got rid of the problem.
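
For reference, a rough modern reconstruction of that test (not the original 1980s code; the volatile qualifiers are only there to discourage the compiler from folding the two multiplications into one):

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        for (;;) {
            double a = (double)rand() / RAND_MAX;
            double b = (double)rand() / RAND_MAX;
            volatile double c = a * b;
            volatile double d = a * b;
            if (c != d) {   /* on working hardware this never happens */
                printf("a=%.17g b=%.17g c=%.17g d=%.17g\n", a, b, c, d);
                return 1;
            }
        }
    }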

Apple to keep Intel at Arm's length: macOS shifts from x86 to homegrown common CPU arch, will run iOS apps

Torben Mogensen

Acorn welcoming Apple to the RISC club

When the PPC-based Macs first came out in 1994, Apple claimed to be the first to use a RISC processor in a personal computer. Acorn, which had done so since 1987, ran an ad welcoming Apple to the club (while pushing their RISC PCs).

If Acorn still existed, they could run a similar ad welcoming Apple into the ARM-based PC club.

Torben Mogensen

Re: RIP Hackintosh

"I do feel like Apple are consistently missing a trick here - not everyone can afford to buy their expensive hardware"

Apple is primarily a hardware company, and they actively oppose running their software on other hardware, as the software is mainly there to sell the hardware. Same reason they don't license iOS to other phone makers.

Torben Mogensen

"The only ones with a problem are people buying a Mac exclusively to run Windows."

That used to be a thing, when Macs were known for superior hardware and design, but these days you can get Wintel laptops with similar design and build quality for less than equivalent Macs. So while a few people may still do it, it is not as widespread as it was 15 years ago. It is certainly not enough to make a dent in Apple's earnings if they switch to other brands.

Besides, if you have already bought a Mac to run Windows, you should not have any problems: Windows will continue to run just fine. Just don't buy one of the new ARM-based Macs to run Windows, buy Asus, Lenovo, or some of the other better PC brands.

Torben Mogensen

Re: Keyword here is "maintained"

"Compare Apple's approach to Windows and the differences are clear: Windows runs code from 30 years ago, and likely will continue to run it unchanged."

30 year old (or even 10 year old) software should run fast enough even with a naive cross-compilation to modern ARM CPUs. I occasionally run really old PC games on my ARM-based phone using DosBox, and that AFAIK uses emulation rather than JIT compilation.

And running really old Windows programs (XP or earlier) is not really that easy on Windows 10.

Torben Mogensen

Re: Rosetta

One of the things that made Rosetta work (and will do so for Rosetta 2) is that OS calls were translated to call natively compiled OS functions rather than emulating OS code in the old instruction set. These days, many apps use dynamically loaded shared libraries, and these can be precompiled, so apps spend most of their time in precompiled code even if the apps themselves are x86 code. Also, with multicore CPUs, JIT translation can be done on some cores while other cores execute already-compiled code.

But the main advantage is that nearly all software is written in high-level languages, so it can essentially be ported with just a recompilation. The main thing that hinders this is that programs in low-level languages like C (and Objective C) may make assumptions about memory alignment and layout that may not be preserved when recompiling for another platform. Swift is probably less problematic in this respect. And these days you don't need to buy a new CD to get a recompiled program from the vendor -- online updates can do this for you. So moving to an ARM-based Mac is much less of a hassle for the user than the move from PPC to x86 (and the earlier move from 68K to PPC).
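
As a small, hypothetical C example of the kind of assumption that can bite when recompiling for another platform (the buffer contents are arbitrary):

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        unsigned char buf[8] = { 0, 0x78, 0x56, 0x34, 0x12, 0, 0, 0 };

        /* Non-portable: assumes unaligned 32-bit loads are harmless (and a
           particular byte order).  This is undefined behaviour in C and can
           trap on CPUs that enforce alignment. */
        uint32_t risky = *(uint32_t *)(buf + 1);

        /* Portable: memcpy makes no alignment assumption, and compilers turn
           it into a single load on targets where that is allowed. */
        uint32_t safe;
        memcpy(&safe, buf + 1, sizeof safe);

        printf("%08x %08x\n", (unsigned)risky, (unsigned)safe);
        return 0;
    }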

Moore's Law is deader than corduroy bell bottoms. But with a bit of smart coding it's not the end of the road

Torben Mogensen

Re: Dennard scaling

Video compression is not really highly serial. The cosine transforms (or similar) used in video compression are easily parallelised. It is true that there are problems that are inherently sequential, but not as many as people normally think, and many of those that are, are not very compute intensive. It is, however, true that not all parallel algorithms are suited to vector parallelism, so we should supplement graphics processors (SIMD parallelism) with multi-cores (MIMD parallelism), but even here we can gain a lot of parallelism by using many simple cores instead of a few complex cores.

But, in the end, we will have to accept that there are some problems that just take a very long time to solve, no matter the progress in computer technology.

Torben Mogensen

Dennard scaling

The main limiter for performance these days is not the number of transistors per cm², it is the amount of power drawn per cm². Dennard scaling (formulated in the 1970s, IIRC) stated that this would remain roughly constant as transistors shrink, so you could get more and more active transistors operating at a higher clock rate for the same power budget. This stopped a bit more than a decade ago: transistors now use approximately the same power as they shrink, so with the same number of transistors in a smaller area you get higher temperatures, which requires fancier cooling, which requires more power. This is the main reason CPU manufacturers stopped doubling the clock rate every two years (it has been pretty much constant at around 3GHz for laptop CPUs for the last decade). To get more compute power, the strategy is instead to have multiple cores rather than faster single cores, and now the trend is to move compute-intensive tasks to graphics processors, which are essentially a huge number of very simple cores (each using several orders of magnitude fewer transistors than a CPU core).
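
For reference, the usual first-order model of switching power is P ≈ α·C·V²·f, where α is the activity factor, C the switched capacitance, V the supply voltage and f the clock frequency. Under classic Dennard scaling, C and V fell with feature size quickly enough that power per unit area stayed roughly constant even as f rose; once V could no longer be reduced much (largely because of leakage), that balance broke, which is the wall described above.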

So, if you want to do something that will benefit a large number of programs, you should exploit parallelism better, in particular the kind of parallelism that you get on graphics processors (vector parallelism). Traditional programming languages (C, Java, Python, etc.) do not support this well (and even Fortran, which does to some extent, requires very careful programming to do so), so the current approach is to use libraries of carefully tuned OpenCL or CUDA code and call these from, say, Python, so few programmers ever have to worry about parallelism. This works well as long as people use linear algebra (such as matrix multiplication) and a few other standard algorithms, but it will not work if you need new algorithms -- few programmers are trained to use OpenCL or CUDA, and using these effectively is very, very difficult. And expecting compilers for C, Java, Python, etc. to automatically parallelize code is naive, so we need languages that are designed for parallelism from the start and do not add features unless the compiler knows how to generate parallel code for them. Such languages will require more training to use than Python or C, but far less than OpenCL or CUDA. Not all code will be written in these languages, but the compute-intensive parts will, while things such as business logic and GUI stuff will be written in more traditional languages. See Futhark.org for an example of such a language.

In the longer term, we need to look at low-power hardware design, maybe even going to reversible logic (which, unlike irreversible logic, has no theoretical lower bound on power use per logic operation).

ALGOL 60 at 60: The greatest computer language you've never used and grandaddy of the programming family tree

Torben Mogensen

The influence of ALGOL 60

You can see the influence of ALGOL 60 on modern languages most clearly if you compare programs written in 1960 versions of ALGOL, FORTRAN, COBOL, and LISP (which were the most widespread languages at the time). The ALGOL 60 program will (for the most part) be readily readable by someone who has learned C, Java, or C# and nothing else. Understanding the FORTRAN, COBOL, or (in particular) LISP programs would require a good deal of explanation, but understanding the ALGOL 60 program would mainly be realising that begin, end, and := correspond to curly braces and = in C, Java, and C#. Look, for example, at the Absmax procedure at https://en.wikipedia.org/wiki/ALGOL_60#Examples_and_portability_issues

FORTRAN and COBOL continued to evolve into something completely unlike their 1960 forms while still retaining their names -- even up to today. ALGOL mainly evolved into languages with different names, such as Pascal, C, CPL, Simula and many others. So ALGOL is not really more of a dead language than FORTRAN II or COBOL 60. A computer scientist in the late 60s was asked, "What will programming languages look like in 2000?". He answered, "I don't know, but I'm pretty sure one of them will be called FORTRAN". This was a pretty good prediction, as Fortran (the only name change being the dropping of the all caps) still exists, but looks nothing like FORTRAN 66, which was the dominant version at the time. You can argue that the modern versions of Fortran and COBOL owe more to ALGOL 60 than they do to the 1960 versions of themselves.

Torben Mogensen

Re: BNF

BNF is essentially just a notation for context-free grammars, which originated in Noam Chomsky's work on linguistics. While compiler books and parser generators use somewhat different notations, they are still BNF at heart. EBNF (extended Backus-Naur form) extended the notation with elements from regular expressions (but using a different notation), including repetition ({...}), optional parts ([...]) and local choice (...|...).
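
As a small illustration (a toy grammar invented for the purpose, not taken from any particular standard), here is the same little language of sums written in both notations:

    BNF:
      <expr> ::= <term> | <expr> "+" <term>
      <term> ::= <number> | "(" <expr> ")"

    EBNF:
      expr = term, { "+", term };
      term = number | "(", expr, ")";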

Happy birthday, ARM1. It is 35 years since Britain's Acorn RISC Machine chip sipped power for the first time

Torben Mogensen

Sophie Wilson post about early ARM history from 1988

I saved a USENET post that Sophie Wilson made in November 1988 about the early ARM days. Since USENET is publicly archived, I don't think there are any IP issues with showing this. I think many of you will find the following interesting:

From: RWilson@acorn.co.uk

Newsgroups: comp.arch

Subject: Some facts about the Acorn RISC Machine

Keywords: Acorn RISC ARM

Message-ID: <543@acorn.UUCP>

Date: 2 Nov 88 18:03:47 GMT

Sender: andy@acorn.UUCP

Lines: 186

There have now been enough partially correct postings about the Acorn RISC Machine (ARM) to justify semi-official comment.

History:

ARM is a key member of a 4 chip set designed by Acorn, beginning in 1984, to make a low cost, high performance personal computer. Our slogan was/is "MIPs for the masses". The casting vote in each design decision was to make the final computer economic.

The chips are (1) ARM: a 32 bit RISC Microprocessor; (2) MEMC: a MMU and DRAM/ROM controller; (3) VIDC: a video CRTC with on chip DACs and sound; and (4) IOC: a chip containing I/O bus and interrupt control logic, real time clocks, serial keyboard link, etc.

The first ARM (that referred to by David Chase @ Menlo Park) was designed at Acorn and built using VLSI Technology Inc's (VTI) 3 micron double level metal CMOS process using full custom techniques; samples, working first time, were obtained on 26th April 1985. The target clock was 4MHz, but it ran at 8. The timings that David gives are for the ARM Evaluation System, where ARM was run at 3.3MHz and 6.6MHz (20/3) for initial and page-mode DRAM cycles, respectively. The ARM comprises 24,000 transistors (circa 8,000 gates). Every instruction is conditional, but there are neither delayed loads/stores nor delayed branches (sorry, Martin Hanley). Call is via Branch and Link (same timing as Branch). All instructions are abortable, to support virtual memory.

The first VIDC was obtained on 22nd Oct 1985, the first MEMC on 25th Feb 1986, and the first IOC 30th Apr 1986. All were "right first time".

We then redesigned ARM to make it go faster (since, by this time, Acorn had decided roughly what market to aim the completed machines at and 8MHz minimum capability was required - but we did continue to develop software on the 3 micron part!). Some more FIQ registers were added, bringing the total to 27 (some of our "must go as fast as possible for real time reasons" code didn't manage with the smaller set). A multiply instruction (2 bits per cycle, terminate when multiplier exhausted so that 8xn multiply takes 4 cycles max) and a set of coprocessor interfaces were added. Scaled indexed by register shifted by register (i.e. effective address was ra+rb<<rc) was removed from the instruction set (too hard to compile for) [scaled indexed by register shifted by constant was NOT removed!].

The new, 2 micron ARM was right first time on 19th Feb 1987. It's peak performance was 18MHz; its die size 230x230 mil^2; 25,000 transistors.

VTI were given a license to sell the chips to anyone. They renamed the chips: VL86C010 (ARM), VL86C110 (MEMC), VL86C310 (VIDC), VL86C410 (IOC).

Acorn released volume machines "Acorn Archimedes" in June 1987. Briefly:

A305: 1/2 MByte, 1MByte floppy, graphics to 640x514x16 colours
A310: ditto, 1MByte
A310M: ditto with PC software emulator (circa a PC XT, if you're interested)
A440: 4MByte, 20MByte hard disc, 1152x896 graphics also.

All machines have ARM at 4/8MHz (circa 5000 dhrystones 1.1), 8 channel sound synthesiser, proprietry OS, 6502 software emulator, software.... Prices between 800 and 3000 pounds UK with monitor and mouse and all other useful bits. Not available in the US, but try Olivetti Canada.

VTI make ARM available as an ASIC cell. Sanyo have taken a second source license (in April 1988) for the chip set, and make a 32 bit microcomputer (single chip controller). In "VLSI Systems Design" July 1988, the following statements are made by VTI: ARM in 1.5 micron (18-20MHz clock), 180x180 mil^2; future shrink to 1 micron (they are expecting "perhaps 40MHz" and 150 mil square with the price dropping from $50 to $15); expected sales in 1988 90-100,000 units.

Contact Ron Cates, VTI Application Specific Logic Products Division, Tempe, Arizona for details (e.g. the "VL86C010 RISC Family Data Manual"). Plug in boards for PCs are available. A controller for Laser printers with ARM, MEMC, VIDC and 4MBytes DRAM has been sold to Olivetti [Acorn's parent company as of 1985-6] (contact SWoodward@acorn.co.uk if you want to know more).

In the Near Future:

We have a Floating Point Coprocessor interface chip working "in the lab" - the fifth member of the four chip set. It interfaces an ATT WE32206 to ARM's coprocessor bus. It benchmarks at 95.5 KFlops LINPACK DP FORTRAN Rolled BLAS (slowest) (11KFlops with a floating point emulator) on an A310. Definitely have to make our own, some time...

Acorn is about to release UNIX 4.3BSD including TCP/IP, NFS, X Windows and IXI's X.desktop on the A440. Contact MJenkin@acorn.co.uk or DSlight@acorn.co.uk for more info (and to be told that it isn't available in the US {yet}).

Operating Systems:

Acorn's proprietry OS "Arthur" is written in machine code: it fills 1/2MByte of ROM! (yes, writing in RISC machine code is truly wonderful as others have noted on comp.arch). Its main features are windows, anti-aliased fonts (wonderful at 90 pixels per inch - I use 8 point all the time) and sound synthesis. It runs on all Archimedes machines. A 2nd release is due real soon now and features multitasking, a better desktop and a name change to RISC OS.

VTI are porting VRTX to the ARM; Cambridge (UK) Computer Lab's Tripos has been ported to A310/A440. UNIX has been ported by Acorn: see above. There are MINIX ports everywhere one looks (try querying the net...).

Software:

C Compiler: ANSI/pcc; register allocation by graph colouring; code motion; dead code elimation; tail call elimination; very good local code generation; CSE and cross-jumping work and will be in the next release. No peepholing (yet - not much advantage, I'm afraid). Can't turn off most optimisation features.

Also FORTRAN 77, ISO PASCAL, interpreted BASIC (structured BBC BASIC, very fast), Forth, Algol, APL, Smalltalk 80 (as seen at OOPSLA 88: on an A440 it approximates a Dorado) and others (LISP, Prolog, ML, Ponder, BCPL....).

Specific applications for Archimedes computers are too numerous to mention! (though the high speed Mandelbrot calculation has to be seen to be believed - one iteration of the set in 28 clock ticks [32 bit fixed point] real time scroll across the set [calculate row/column in a frame time and move the picture]).

There is a part of the net that talks about Archimedes machines: (eunet.micro.acorn).

Random Info:

Code density is approximately that of 80x86/68020. Occasionally 30% worse (usually on very small programs).

The average number of ticks per instruction 1.895 (claims VTI - we've never bothered to measure it).

DRAM page mode is controlled by the MEMC, but there is a prediction signal from the ARM saying "I will use a sequential address in the next cycle" which helps the timing a great deal! S=125nS, N=250nS with current MEMC and DRAM (see David Chase's article for instruction timing). Static RAM ARM systems have been implemented up to 18MHz - S=N=1/18 with these systems.

Approximately 1000 dhrystones 1.1 per MHz if N=S; about 1000/1.895 dhrystones per MHz if N=2S (i.e. 5K dhrystones for a 4/8MHz system; 18K dhrystones for an 18/18MHz system).

Most recent features: Electronic Design Jul 28 1988, VLSI Systems Design July 1988.

We had a competition to see who would use "ra := rb op rc shifted by rd" with all of ra, rb, rc and rd actually different registers, but the graphics people won it too easily!

ARM's byte sex is as VAX and NS32000 (little endian). The byte sex of a 32 bit word can be changed in 4 clock ticks by:

EOR R1,R0,R0,ROR #16
BIC R1,R1,#&FF0000
MOV R0,R0,ROR #8
EOR R0,R0,R1,LSR #8

which reverses R0's bytes. Shifting and operating in one instruction is fun. Shifted 8bit constants (see David Chase's article) catch virtually everything.

Major use of block register load/save (via bitmask) is procedure entry/exit. And graphics - you just can't keep those boys down. The C and BCPL compilers turn some multiple ordinary loads into single block loads.

MEMC's Content Addressable Memory inverted page table contains 128 entries. This gives rather large pages (32KBytes with 4MBytes of RAM) and one can't have the same page at two virtual addresses. Our UNIX hackers revolted, but are now learning to love it (there's a nice bit in the standard kernel which goes "allocate 31 pages to start a new process"....)

Data types: byte, word aligned word, and multi-word (usually with a coprocessor e.g. single, double, double extended floating point).

Neatest trick: compressing all binary images by around a factor of 2. The decompression is done FASTER than reading the extra data from a 5MBit winchester!

Enough! (too much?) Specific questions to me, general brickbats to the net.

.....Roger Wilson (RWilson@Acorn.co.uk)

DISCLAIMER: (I speak for me only, etc.)

The above is all a fiction constructed by an outline processor, a thesaurus and a grammatical checker. It wasn't even my computer, nor was I near it at the time.

Fomalhaut b exoplanet may have been cloud in a trench coat: Massive 'world' formed after 'mid-space super-prang'

Torben Mogensen

Or a comet?

Bright. Not emitting heat. Disperses over time. That sounds suspiciously like a large chunk of ice that sheds its surface as it nears a star. Otherwise known as a comet.

'I give fusion power a higher chance of succeeding than quantum computing' says the R in the RSA crypto-algorithm

Torben Mogensen

About voting machines and blockchain: https://xkcd.com/2030/

WebAssembly: Key to a high-performance web, or ideal for malware? Reg speaks to co-designer Andreas Rossberg

Torben Mogensen

Re: WaSm to you too

The "populos" have already accepted Wasm, since it is included as standard in the major browsers. So, no I don't think it will take much convincing. Another thing is that Wasm is extremely simple and designed for sandboxing from the start, so it can not encrypt your files. It can be used to mine bitcoin, as all that requires is some CPU time and the ability to send short messages to a web server somewhere. But it does not have access to your own bitcoins.

Wasm has a complete formal specification, which means that undefined or implementation-specific behaviour (which is often used as an attack vector) is avoided. I trust Wasm much more than Javascript, which is only somewhat safe because browsers have become good at detecting bad behaviour in Javascript.

Moore's Law isn't dead, chip boffin declares – we need it to keep chugging along for the sake of AI

Torben Mogensen

Hot Chips, indeed

The major problem with cramming more transistors into chips is that if they all operate at the same time (which is a requirement for more performance), they will generate a lot of heat and consume a lot of power. The generated heat requires more cooling, which increases the power usage even more. There is a reason that your graphics card has a larger heat sink than your CPU, and the article talks about many more processing elements than on a GPU (albeit a bit simpler).

So rather than focusing on speed, the focus should be on power usage: Less power implies less heat implies less cooling. One option is to move away from silicon to other materials (superconductors, for example), but another is to use reversible gates: They can, in theory at least, use much less power than traditional irreversible gates such as AND and OR, and you can build fully capable processors using reversible gates. But even that requires a different technology: Small-feature CMOS uses more power in the wires than in the gates, so reducing the power of the gates does not help a lot. Maybe the solution is to go back to larger features (at the end of the Dennard scaling range).

Poor old Jupiter has had a rough childhood after getting a massive hit from a mega-Earth

Torben Mogensen

Alternative explanation

Another possibility is that, when the core was massive, it had sufficient fissionable material to start a nuclear reaction that pushed the material apart again. This could be an ongoing thing: fissionable materials collect at the centre, react, disperse, collect again, and so on.

I have absolutely no evidence for this theory, though.

Packet's 'big boy' servers given a shot in the Arm with 32-core, 3.3GHz Ampere CPUs

Torben Mogensen

Re: Could do with that in a laptop.

If you are willing to have a 7kg laptop with a battery life of 30 minutes and a noisy fan running constantly, then by all means.

If you're worried that quantum computers will crack your crypto, don't be – at least, not for a decade or so. Here's why

Torben Mogensen

Re: 6,681 qubits?

Spontaneous collapse of the quantum state to a classical state is indeed a major problem for quantum computing. And the problem increases not only with the number of entangled qubits, but also with the number of operations performed on these. And to crack codes with longer keys, you not only need more qubits, you also need more operations on each.

The simple way to avoid quantum computers cracking your code is just to increase the key length -- if 256 bit keys become crackable in 10 years (which I doubt), you just go to 1024 bit key length, and you will be safe for another decade or more. Unless some giant breakthrough is made that will make quantum computers scale easily, and I seriously doubt that.

That doesn't mean that quantum computers are pointless. They can be used for things such as simulating quantum systems and for quantum annealing. But forget about cracking codes or speeding up general computation. You are better off with massively parallel classical computers, and to avoid huge power bills, you should probably invest in reversible logic, which can avoid the Landauer limit (a thermodynamic lower bound on the energy cost of irreversible logic operations).

In a galaxy far, far away, aliens may have eight-letter DNA – like the kind NASA-backed boffins just crafted

Torben Mogensen

Which subsets are viable?

So, these people have "shown" that a DNA variant with eight bases is viable. But I find it more interesting to know which subsets of these eight bases are viable. Obviously, the four-base subset GATC is perfectly viable, but are there other four-base subsets that are viable? And are any in some sense superior to GATC? For example, is there a subset that uses simpler molecules or less energy to replicate? Or some that allow simpler ribosomes? Are there viable two-base subsets?

And what other possible bases are there? Could there be a base that pairs with itself, so you can have a three-base set?

What a smashing time, cheer astroboffins: Epic exoplanet space prang evidence eyeballed

Torben Mogensen

This happened to Earth

The modern theory of the formation of Earth and its moon is that two planets in near-identical orbits collided and merged, and that the impact ejected a large mass that became the moon.

Roughly 30 years after its birth at UK's Acorn Computers, RISC OS 5 is going open source

Torben Mogensen

Re: arguably other languages better suited to the modern world

Yes, BBC BASIC was only an improvement on what you got by default on home computers at the time, which in almost all cases were BASIC variants, and most often inferior to BBC BASIC. Hard disks were uncommon even at the time the Archimedes shipped (with a floppy as standard), so using compilers was impractical -- you wanted programs to load and run without storing extra files on your floppy or using large amounts of memory for compiled code. It was only after I got a hard disk that I started using compilers on my Archimedes. Several compilers existed early on for RISC OS, including Pascal, which was arguably the most popular compiled language at the time (until C took over that role). There was even a compiler for ML, which is a much better language than either Pascal or C.

So, to look at alternatives to BBC BASIC for floppy-only machines, let us consider languages that run interpreted without too much overhead and which were well known at the time. Pascal and C are ruled out because they are compiled. Forth was mentioned, but only RPN enthusiasts would consider this a superior language. LISP (and variants such as Scheme) is a possibility (LISP was actually available even for the BBC Micro as a plug-in ROM), but it is probably a bit too esoteric for most hobbyists, and it requires more memory than BASIC and similar languages. Prolog is even more esoteric. COMAL is not much different from BBC BASIC, so this is no strong contender.

Also, BASIC was what was taught in schools, so using a radically different language would have hurt sales. So, overall, I would say BBC BASIC was a sensible choice as the default language. Other languages such as C, Pascal, or ML, were available as alternatives for the more professionally minded (who could afford hard disks and more than 1MB of RAM).

Torben Mogensen

Re: Vector drawing

One thing that I would have wanted for RISC OS is to allow !Draw-files for file/application icons. This would allow them to be arbitrarily scalable, and a bit of caching would make the overhead negligible. It was such caching that made Bezier-curve fonts render in reasonable time on an 8MHz machine. Similarly, you could define cursor shapes as !Draw files to make cursors easily scalable. Thumbnails of files could also be !Draw files.

Another obvious extension would be folder types, similar to file types. As it is, you need a ! in front of a folder name to make it into an application, and there are no other kinds of folder than normal or application. Using types, you could make, say, word-processor files into typed folders. For example, a LaTeX folder could contain the .tex, .aux, .log, .toc, and all the other files generated by running LaTeX. I recall that one of the RISC OS word processors used application folders as save files, but that meant that all file names had to start with !. You could get rid of this for both save files and applications by using folder types instead.

For modern times, you will probably need more bits for file types than back in the day. I don't know if later versions of RISC OS have added more bits, but if not, you might as well add folder types at the same time that you add more bits for file types.

Torben Mogensen

A bit too old now.

While I love the GUI, the file system (typed files, applications as folders, uniform display and print graphics, and a modular file system), the font manager and the standard applications (especially Draw), I haven't used RISC OS in over a decade. It lacks a lot of things that are needed for modern desktop/laptop use: multi-core support, pre-emptive multitasking, proper Unicode support (unless this has been added since I last looked), support for most USB devices, support for graphics cards, and so on. It is also a problem that it is written in ARM32 assembly. Not only does that make it harder to maintain, it also limits use on ARM64 and other modern systems (except through emulation).

I think the best route would be to build a RISC OS desktop on top of a Linux kernel, rewriting the RISC OS modules and applications in Rust (or C), and use Linux drivers etc. to make it exploit modern hardware.

Relive your misspent, 8-bit youth on the BBC's reopened Micro archive

Torben Mogensen

Re: Damn, daniel!

"Your first touch point with this new fangled tech is...Lisp? That's some serious brain engagement."

LISP is not really that complex to learn. The syntax is dead simple (albeit a bit verbose), and you can code with loops and assignments just like in BASIC, if you find that simpler than recursive functions (I don't).

LISP on the BBC was, however, considerably slower than BASIC, which was probably because BBC BASIC was written by Sophie Wilson, who did a lot to optimise it.

Now Microsoft ports Windows 10, Linux to homegrown CPU design

Torben Mogensen

Re: Computer says "No"

"Actually, microcode has been around since CPUs existed. it's how they work internally."

True, but in the beginning microcode interpreted the instructions, whereas modern processors compile instructions into microcode. This gets rid of the interpretation overhead (which is considerable).

Boffins offer to make speculative execution great again with Spectre-Meltdown CPU fix

Torben Mogensen

Re: Speculative versus parallel execution

"You want to have a context switch every time a branch causes a cache miss? That would be a Bad Thing."

It would indeed. But that is not what I say. What I say is that there is a pipeline of instructions interleaved from two or more threads, each having its own registers. No state needs to be saved, and executing every second instruction from a different thread is no more expensive than executing instructions from a single thread. The advantage is that functional units can be shared, and since independent threads do not have fine-grained dependencies between each other, instructions from one thread can easily execute in parallel with instructions from another.

This is not my idea -- it has been found in processors for decades (just look for "X threads per core" in specifications). IMO, it is a better approach than speculative execution, since it does not waste work (all instructions that are executed will be needed by one thread or another) and it is not considerably more complex than having one thread per core. Note that out-of-order execution is not a problem: that also executes only instructions that are needed, it just does so out of sequence, which requires register renaming, but that is not a huge problem. The main cost is complex scheduling, which increases power use (OOO processors use more energy scheduling instructions than actually executing them).

What speculation gives that these do not is (potentially) much faster execution of a single thread. But to do so, it uses resources that could have been used to execute instructions that are definitely needed. So it improves latency at the cost of throughput. OOO execution improves both at a cost in complexity and power use, and multi-threading improves only throughput, at a small cost in latency, because the two (or more) threads are given equal priority, so each thread may have to wait for others to stop using functional units.
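
A toy C sketch of that interleaving (just the scheduling idea -- it models nothing of a real pipeline, and the two "programs" are made-up lists of constants to add):

    #include <stdio.h>

    /* Two independent instruction streams, each with its own "register"
       (an accumulator), issued alternately.  Nothing is saved or restored
       when switching, because each stream keeps its own state. */
    typedef struct { const int *prog; int len, pc, acc; } Thread;

    int main(void)
    {
        const int prog_a[] = { 1, 2, 3, 4 };
        const int prog_b[] = { 10, 20, 30 };
        Thread t[2] = {
            { prog_a, 4, 0, 0 },
            { prog_b, 3, 0, 0 },
        };

        int live = 2;
        for (int turn = 0; live > 0; turn = (turn + 1) % 2) {
            Thread *cur = &t[turn];
            if (cur->pc >= cur->len) continue;      /* this thread is finished */
            cur->acc += cur->prog[cur->pc++];       /* issue one instruction   */
            printf("thread %d: acc = %d\n", turn, cur->acc);
            if (cur->pc == cur->len) live--;
        }
        printf("results: %d and %d\n", t[0].acc, t[1].acc);
        return 0;
    }

Because each stream carries its own accumulator (standing in for a per-thread register file), switching between them costs nothing, which is the point made above.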
