* Posts by Torben Mogensen

451 posts • joined 21 Sep 2006


Toshiba formally and finally exits laptop business

Torben Mogensen


I remember Toshiba best for their ultra-tiny laptops -- the Libretto range. I had a Libretto 50CT -- 210×115×34 mm with a screen smaller than many modern smart phones. A curiosity was that it had a mouse "nub" beside the screen and mouse buttons on the back of the lid. So you would use your thumb to move the cursor and your index and middle fingers to operate the buttons. It was great for taking along when travelling.

See more at https://www.youtube.com/watch?v=7HQt6EwA0JE

A tale of mainframes and students being too clever by far

Torben Mogensen

Ray-tracing on a Vax

When I was doing my MSc thesis about ray-tracing in the mid-1980s, we didn't have very good colour screens or printers, so to get decent images you had to use dithering, where different pixels are rounded differently to the available colours to create dot patterns that average out to the true colour. One such technique is called error distribution: When you round a pixel, you divide the rounding error by 4 and add this to the next pixel on the same row and to each of the three adjacent pixels in the next row. This way, the colours in an area average out to the true colour.
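A minimal C sketch of the error-distribution scheme just described, assuming a greyscale image stored row by row as doubles in [0, 1] and quantised to a given number of evenly spaced levels. The function names and image layout are illustrative assumptions, not the original ray-tracer's code, and error that would fall outside the image is simply dropped. The better-known Floyd-Steinberg dithering uses the same four neighbours but with unequal weights (7/16, 3/16, 5/16 and 1/16) instead of a quarter each.

```c
#include <assert.h>
#include <stddef.h>

/* Round v to the nearest of `levels` evenly spaced values in [0, 1]. */
static double quantise(double v, int levels)
{
    double step = 1.0 / (levels - 1);
    long k = (long)(v / step + 0.5);     /* index of the nearest level */
    if (k < 0) k = 0;
    if (k > levels - 1) k = levels - 1;
    return k * step;
}

/* Error distribution: each pixel's rounding error is split into four equal
   parts, added to the next pixel on the same row and to the three adjacent
   pixels on the next row. Error that would leave the image is dropped. */
void dither(double *img, size_t width, size_t height, int levels)
{
    assert(levels >= 2);
    for (size_t y = 0; y < height; y++) {
        for (size_t x = 0; x < width; x++) {
            double *p = &img[y * width + x];
            double q = quantise(*p, levels);
            double err = (*p - q) / 4.0;
            *p = q;
            if (x + 1 < width)
                img[y * width + x + 1] += err;          /* right neighbour */
            if (y + 1 < height) {
                double *below = &img[(y + 1) * width + x];
                if (x > 0) below[-1] += err;            /* below-left  */
                below[0] += err;                        /* below       */
                if (x + 1 < width) below[1] += err;     /* below-right */
            }
        }
    }
}
```

With two levels, a single row of mid-grey (0.5) pixels comes out as alternating white and black.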

I ran the ray-tracer program (written in C) on the department Vax computer, but I had an annoying problem: At some seemingly random place in the image, a "shadow" would appear, turning all the pixels below and to the right of it really odd colours. I looked at my code and could see nothing wrong, so I ran the program again (this would take half an hour, so you didn't just rerun it without looking carefully at the code). The problem reappeared, but at a different point in the image! Since I didn't use any randomness, I suspected a hardware fault, but I needed more evidence to convince other people of this. I used a debugger and found that, occasionally, multiplying two FP numbers would give the wrong result. The cause of the shadow was that one colour value was ridiculously high, so even after distributing the error to neighbouring pixels, these would also be ridiculously high, and so on.

To make a convincing case, I wrote a short C program that looped the following:

1. Create two pseudo-random numbers A and B.

2. C = A*B; D=A*B;

3. if (C != D) print A, B, C, and D and stop program.

This program would, on average, stop and print out different values for C and D after one or two minutes of running (but never with the same two numbers), and this convinced the operators that there was something wrong, and they contacted DEC. They sent out some engineers, and they found out that there was a timing problem where the CPU sometimes would fetch the result of the multiplication from the FPU slightly before it was ready, so by increasing the delay slightly, they got rid of the problem.
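The three-step test program can be sketched in modern C as follows; the seed, the range of the random numbers and the function name are assumptions. One detail matters today: an optimising compiler would normally compute A*B once and reuse it for both C and D, so the sketch declares the variables `volatile` to force two separate multiplications. On healthy hardware it should never find a mismatch.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Repeatedly multiply the same two pseudo-random numbers twice and compare.
   `volatile` makes the compiler re-read a and b and multiply twice instead of
   reusing one product. Returns the iteration at which the two products
   differed (after printing A, B, C and D), or -1 if none did. */
long find_mismatch(long max_iters, unsigned seed)
{
    assert(max_iters >= 0);
    srand(seed);
    for (long i = 0; i < max_iters; i++) {
        volatile double a = (double)rand() / RAND_MAX * 1000.0;
        volatile double b = (double)rand() / RAND_MAX * 1000.0;
        volatile double c = a * b;
        volatile double d = a * b;
        if (c != d) {
            printf("A=%.17g B=%.17g C=%.17g D=%.17g\n", a, b, c, d);
            return i;
        }
    }
    return -1;
}
```

Passing `LONG_MAX` (from `<limits.h>`) as the iteration count approximates the original endless loop.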

Apple to keep Intel at Arm's length: macOS shifts from x86 to homegrown common CPU arch, will run iOS apps

Torben Mogensen

Acorn welcoming Apple to the RISC club

When the PPC-based Macs first came out in 1994, Apple claimed to be the first to use a RISC processor in a personal computer. Acorn, which had done so since 1987, ran an ad welcoming Apple to the club (while pushing its RISC PCs).

If Acorn still existed, it could run a similar ad welcoming Apple into the ARM-based PC club.

Torben Mogensen

Re: RIP Hackintosh

"I do feel like Apple are consistently missing a trick here - not everyone can afford to buy their expensive hardware"

Apple is primarily a hardware company, and they actively oppose running their software on other hardware, as the software is mainly there to sell the hardware. Same reason they don't license iOS to other phone makers.

Torben Mogensen

"The only ones with a problem are people buying a Mac exclusively to run Windows."

That used to be a thing, when Macs were known for superior hardware and design, but these days you can get Wintel laptops with similar design and build quality for less than equivalent Macs. So while a few people may still do it, it is not as widespread as it was 15 years ago. It is certainly not enough to make a dent in Apple's earnings if they switch to other brands.

Besides, if you have already bought a Mac to run Windows, you should not have any problems: Windows will continue to run just fine. Just don't buy one of the new ARM-based Macs to run Windows, buy Asus, Lenovo, or some of the other better PC brands.

Torben Mogensen

Re: Keyword here is "maintained"

"Compare Apple's approach to Windows and the differences are clear: Windows runs code from 30 years ago, and likely will continue to run it unchanged."

30-year-old (or even 10-year-old) software should run fast enough even with a naive cross-compilation to modern ARM CPUs. I occasionally run really old PC games on my ARM-based phone using DOSBox, and AFAIK that uses emulation rather than JIT compilation.

And running really old Windows programs (XP or earlier) is not really that easy on Windows 10.

Torben Mogensen

Re: Rosetta

One of the things that made Rosetta work (and will do so for Rosetta 2) is that OS calls were translated into calls to natively compiled OS functions rather than emulating OS code written for the old instruction set. These days, many apps use dynamically loaded shared libraries, and these can be precompiled, so apps spend most of their time in native code even if the apps themselves are compiled for x86. Also, with multicore CPUs, JIT translation can be done on some cores while other cores execute already-translated code.

But the main advantage is that nearly all software is written in high-level languages, so it can essentially be ported with just a recompilation. The main thing that hinders this is that programs in low-level languages like C (and Objective-C) may make assumptions about memory alignment and layout that are not preserved when recompiling for another platform. Swift is probably less problematic in this respect. And these days you don't need to buy a new CD to get a recompiled program from the vendor -- online updates can do this for you. So moving to an ARM-based Mac is much less of a hassle for the user than the move from PPC to x86 (and the earlier move from 68K to PPC).
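A hypothetical illustration of such a layout assumption: C code that saves a struct with `fwrite(&r, sizeof r, 1, f)` bakes in the compiler's padding and byte order (on common ABIs the struct below occupies 8 bytes, not 6, because of padding). The sketch writes each field explicitly in little-endian order instead, so the bytes are the same whichever platform the program is recompiled for. The `record` type and function names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record type, for illustration only. */
struct record {
    uint16_t id;
    uint32_t count;
};

/* Serialise field by field in a fixed (little-endian) order, so the result
   does not depend on the compiler's padding or the CPU's byte order. */
size_t record_encode(const struct record *r, unsigned char out[6])
{
    out[0] = (unsigned char)(r->id & 0xFF);
    out[1] = (unsigned char)(r->id >> 8);
    for (int i = 0; i < 4; i++)
        out[2 + i] = (unsigned char)((r->count >> (8 * i)) & 0xFF);
    return 6;
}

/* The inverse of record_encode. */
void record_decode(const unsigned char in[6], struct record *r)
{
    assert(in != NULL && r != NULL);
    r->id = (uint16_t)(in[0] | (in[1] << 8));
    r->count = 0;
    for (int i = 0; i < 4; i++)
        r->count |= (uint32_t)in[2 + i] << (8 * i);
}
```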

Moore's Law is deader than corduroy bell bottoms. But with a bit of smart coding it's not the end of the road

Torben Mogensen

Re: Dennard scaling

Video compression is not really highly serial. The cosine transforms (or similar) used in video compression are easily parallelised. It is true that there are problems that are inherently sequential, but not as many as people normally think, and many of those that are sequential are not very compute-intensive. It is, however, true that not all parallel algorithms are suited to vector parallelism, so we should supplement graphics processors (SIMD parallelism) with multi-cores (MIMD parallelism), but even here we can gain a lot of parallelism by using many simple cores instead of a few complex ones.
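To illustrate why such transforms parallelise well: codecs apply their transform to small blocks of samples, and no block depends on any other. The sketch below uses an 8-sample Walsh-Hadamard transform as a stand-in for the cosine transform (a deliberate swap, to stay in integer arithmetic and avoid the maths library); the block size and names are assumptions. The outer loop over blocks has no dependencies between iterations, so it could be split across cores or SIMD lanes.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK 8  /* block length; must be a power of two */

/* In-place fast Walsh-Hadamard transform of one block (unnormalised). */
static void wht_block(int *v)
{
    for (int len = 1; len < BLOCK; len *= 2)
        for (int i = 0; i < BLOCK; i += 2 * len)
            for (int j = i; j < i + len; j++) {
                int a = v[j], b = v[j + len];
                v[j] = a + b;
                v[j + len] = a - b;
            }
}

/* Transform a signal block by block. No iteration reads or writes another
   block's data, so the iterations are independent (an OpenMP `parallel for`
   or a GPU kernel could run them concurrently). */
void transform_blocks(int *signal, size_t n)
{
    assert(n % BLOCK == 0);
    for (size_t b = 0; b < n / BLOCK; b++)
        wht_block(signal + b * BLOCK);
}
```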

But, in the end, we will have to accept that some problems just take a very long time to solve, no matter the progress in computer technology.

Torben Mogensen

Dennard scaling

The main limiter for performance these days is not the number of transistors per cm², it is the amount of power drawn per cm². Dennard scaling (formulated in the 1970s, IIRC) stated that this would remain roughly constant as transistors shrink, so you could get more and more active transistors operating at higher clock rates for the same power budget. This stopped a bit more than a decade ago: A transistor now uses approximately the same power even as it shrinks, so the same number of transistors in a smaller area gives higher temperatures, which requires fancier cooling, which requires more power. This is the main reason CPU manufacturers stopped doubling the clock rate every two years (it has been pretty much constant at around 3GHz for laptop CPUs for the last decade). To get more compute power, the strategy is instead to have multiple cores rather than faster single cores, and now the trend is to move compute-intensive tasks to graphics processors, which are essentially a huge number of very simple cores (each using several orders of magnitude fewer transistors than a CPU core).
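The usual back-of-the-envelope version of this argument uses the standard dynamic switching power of CMOS, P = C·V²·f (capacitance, supply voltage, clock frequency), which the comment does not spell out. Under ideal Dennard scaling, a shrink by a factor κ > 1 divides capacitance and voltage by κ, divides area A by κ², and multiplies frequency by κ. The second pair of equations shows what happens once the voltage can no longer be lowered:

```latex
% Ideal Dennard scaling: C -> C/\kappa, V -> V/\kappa, f -> \kappa f, A -> A/\kappa^2
P' = \frac{C}{\kappa}\left(\frac{V}{\kappa}\right)^{2}(\kappa f) = \frac{P}{\kappa^{2}},
\qquad \frac{P'}{A'} = \frac{P/\kappa^{2}}{A/\kappa^{2}} = \frac{P}{A}

% Voltage stuck (V fixed): same shrink, same power per transistor
P' = \frac{C}{\kappa}\,V^{2}(\kappa f) = P,
\qquad \frac{P'}{A'} = \frac{P}{A/\kappa^{2}} = \kappa^{2}\,\frac{P}{A}
```

With power density growing by κ² per shrink in the second case, something else has to give, which is why clock rates flattened out.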

So, if you want to do something that will benefit a large number of programs, you should exploit parallelism better, in particular the kind of parallelism that you get on graphics processors (vector parallelism). Traditional programming languages (C, Java, Python, etc.) do not support this well (and even Fortran, which does to some extent, requires very careful programming to do so), so the current approach is to use libraries of carefully written OpenCL or CUDA code and call these from, say, Python, so few programmers have to worry about parallelism at all. This works well as long as people use linear algebra (such as matrix multiplication) and a few other standard algorithms, but it will not work if you need new algorithms -- few programmers are trained to use OpenCL or CUDA, and using these effectively is very, very difficult. And expecting compilers for C, Java, Python etc. to automatically parallelise code is naive, so we need languages that are designed for parallelism from the start and that do not add features unless the compiler knows how to generate parallel code for them. Such languages will require more training to use than Python or C, but far less than OpenCL or CUDA. Not all code will be written in these languages, but the compute-intensive parts will, while things such as business logic and GUI stuff will be written in more traditional languages. See Futhark.org for an example of such a language.
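A small C illustration of one reason automatic parallelisation is hard: the compiler cannot run the iterations of the first function in parallel unless it can prove that `dst` and `src` do not overlap, because C's sequential semantics must be preserved when they do. The second function (a hypothetical helper) simulates what a naive parallel version would compute, by reading all inputs before writing any output. C99's `restrict` qualifier lets the programmer promise there is no overlap, but the compiler then has to take that promise on trust.

```c
#include <assert.h>
#include <stddef.h>

/* Sequential semantics: element i is written before element i+1 is read, so
   with overlapping arrays each iteration can see the previous one's result. */
void add_one_sequential(int *dst, const int *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] + 1;
}

/* "Parallel" semantics, simulated: read every input first (into scratch),
   then write every output, as independent parallel iterations would. */
void add_one_parallel_order(int *dst, const int *src, size_t n, int *scratch)
{
    assert(scratch != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        scratch[i] = src[i];
    for (size_t i = 0; i < n; i++)
        dst[i] = scratch[i] + 1;
}
```

With an array of five zeros, `add_one_sequential(a + 1, a, 4)` produces 0, 1, 2, 3, 4, while the parallel order produces 0, 1, 1, 1, 1.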

In the longer term, we need to look at low-power hardware design, maybe even going to reversible logic (which, unlike irreversible logic, has no theoretical lower bound on energy use per logic operation).

ALGOL 60 at 60: The greatest computer language you've never used and grandaddy of the programming family tree

Torben Mogensen

The influence of ALGOL 60

You can see the influence of ALGOL 60 on modern languages most clearly if you compare programs written in 1960 versions of ALGOL, FORTRAN, COBOL, and LISP (which were the most widespread languages at the time). The ALGOL 60 program will (for the most part) be readily readable by someone who has learned C, Java, or C# and nothing else. Understanding the FORTRAN, COBOL, or (in particular) LISP programs would require a good deal of explanation, but understanding the ALGOL 60 program would mainly be realising that begin, end, and := correspond to curly braces and = in C, Java, and C#. Look, for example, at the Absmax procedure at https://en.wikipedia.org/wiki/ALGOL_60#Examples_and_portability_issues
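For comparison, here is a sketch of the Absmax idea in C: find the element of largest absolute value in an n-by-m matrix and record its row and column. The flat row-major array, the pointer parameters standing in for ALGOL's result and subscript parameters, and the 0-based indices are choices made for this C version; the comments show the corresponding ALGOL 60 constructs.

```c
#include <assert.h>
#include <stddef.h>

/* Copies the element of largest absolute value in the n-by-m matrix a
   (stored row by row) to *y, and its row and column to *i and *k. */
void absmax(const double *a, size_t n, size_t m, double *y, size_t *i, size_t *k)
{
    assert(n > 0 && m > 0);
    *y = 0;                              /* ALGOL: y := 0; i := k := 1 */
    *i = *k = 0;                         /* (C indices start at 0)     */
    for (size_t p = 0; p < n; p++)       /* for p := 1 step 1 until n do */
        for (size_t q = 0; q < m; q++) { /* for q := 1 step 1 until m do */
            double v = a[p * m + q];
            double abs_v = v < 0 ? -v : v;
            if (abs_v > *y) {            /* begin ... end becomes { ... } */
                *y = abs_v;
                *i = p;
                *k = q;
            }
        }
}
```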

FORTRAN and COBOL continued to evolve into something completely unlike their 1960 forms while still retaining their names -- even up to today. ALGOL mainly evolved into languages with different names, such as Pascal, C, CPL, Simula and many others. So ALGOL is not really more of a dead language than FORTRAN II and COBOL 60 are. A computer scientist in the late 60s was asked "What will programming languages look like in 2000?". He answered "I don't know, but I'm pretty sure one of them will be called FORTRAN". This was a pretty good prediction, as Fortran (the only name change being the dropped all-caps) still exists, but looks nothing like FORTRAN 66, which was the dominant version at the time. You can argue that the modern versions of FORTRAN and COBOL owe more to ALGOL 60 than they do to the 1960 versions of themselves.

Torben Mogensen


BNF is essentially just a notation for context-free grammars, which originated in Noam Chomsky's work on linguistics. While compiler books and parser generators use somewhat different notations, they are still BNF at heart. EBNF (extended Backus-Naur form) extends the notation with elements from regular expressions (but using a different notation), including repetition ({...}), optional parts ([...]) and local choice (...|...).
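To show how the EBNF extensions map onto code, here is a hypothetical grammar for integer arithmetic with a recursive-descent evaluator in C. Each rule becomes a function: repetition {...} becomes a while loop, an optional part [...] becomes an if, and a local choice (...|...) becomes a test on the next character. It is a sketch: the input must contain no spaces, and errors are only caught by assertions.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical EBNF grammar:
     expr   = term { ("+" | "-") term } .
     term   = factor { ("*" | "/") factor } .
     factor = [ "-" ] ( number | "(" expr ")" ) .
     number = digit { digit } .                      */

static const char *pos;   /* current position in the input */

static long expr(void);

static long number(void)
{
    assert(isdigit((unsigned char)*pos));
    long v = 0;
    while (isdigit((unsigned char)*pos))        /* digit { digit } */
        v = v * 10 + (*pos++ - '0');
    return v;
}

static long factor(void)
{
    int negate = 0;
    if (*pos == '-') {                          /* [ "-" ] */
        negate = 1;
        pos++;
    }
    long v;
    if (*pos == '(') {                          /* "(" expr ")" */
        pos++;
        v = expr();
        assert(*pos == ')');
        pos++;
    } else {                                    /* | number */
        v = number();
    }
    return negate ? -v : v;
}

static long term(void)
{
    long v = factor();
    while (*pos == '*' || *pos == '/') {        /* { ("*" | "/") factor } */
        char op = *pos++;
        long rhs = factor();
        v = (op == '*') ? v * rhs : v / rhs;
    }
    return v;
}

static long expr(void)
{
    long v = term();
    while (*pos == '+' || *pos == '-') {        /* { ("+" | "-") term } */
        char op = *pos++;
        long rhs = term();
        v = (op == '+') ? v + rhs : v - rhs;
    }
    return v;
}

/* Evaluates a space-free expression such as "2*(3+4)-5". */
long evaluate(const char *s)
{
    pos = s;
    long v = expr();
    assert(*pos == '\0');   /* the whole input must be consumed */
    return v;
}
```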

Happy birthday, ARM1. It is 35 years since Britain's Acorn RISC Machine chip sipped power for the first time

Torben Mogensen

Sophie Wilson post about early ARM history from 1988

I saved a USENET post that Sophie Wilson made in November 1988 about the early ARM days. Since USENET is publicly archived, I don't think there are any IP issues with showing this. I think many of you will find the following interesting:

From: RWilson@acorn.co.uk
Newsgroups: comp.arch
Subject: Some facts about the Acorn RISC Machine
Keywords: Acorn RISC ARM
Message-ID: <543@acorn.UUCP>
Date: 2 Nov 88 18:03:47 GMT
Sender: andy@acorn.UUCP
Lines: 186

There have now been enough partially correct postings about the Acorn RISC Machine (ARM) to justify semi-official comment.

ARM is a key member of a 4 chip set designed by Acorn, beginning in 1984, to make a low cost, high performance personal computer. Our slogan was/is "MIPs for the masses". The casting vote in each design decision was to make the final computer economic.

The chips are (1) ARM: a 32 bit RISC Microprocessor; (2) MEMC: a MMU and DRAM/ROM controller; (3) VIDC: a video CRTC with on chip DACs and sound; and (4) IOC: a chip containing I/O bus and interrupt control logic, real time clocks, serial keyboard link, etc.

The first ARM (that referred to by David Chase @ Menlo Park) was designed at Acorn and built using VLSI Technology Inc's (VTI) 3 micron double level metal CMOS process using full custom techniques; samples, working first time, were obtained on 26th April 1985. The target clock was 4MHz, but it ran at 8. The timings that David gives are for the ARM Evaluation System, where ARM was run at 3.3MHz and 6.6MHz (20/3) for initial and page-mode DRAM cycles, respectively. The ARM comprises 24,000 transistors (circa 8,000 gates). Every instruction is conditional, but there are neither delayed loads/stores nor delayed branches (sorry, Martin Hanley). Call is via Branch and Link (same timing as Branch). All instructions are abortable, to support virtual memory.

The first VIDC was obtained on 22nd Oct 1985, the first MEMC on 25th Feb 1986, and the first IOC 30th Apr 1986. All were "right first time".

We then redesigned ARM to make it go faster (since, by this time, Acorn had decided roughly what market to aim the completed machines at and 8MHz minimum capability was required - but we did continue to develop software on the 3 micron part!). Some more FIQ registers were added, bringing the total to 27 (some of our "must go as fast as possible for real time reasons" code didn't manage with the smaller set). A multiply instruction (2 bits per cycle, terminate when multiplier exhausted so that 8xn multiply takes 4 cycles max) and a set of coprocessor interfaces were added. Scaled indexed by register shifted by register (i.e. effective address was ra+rb<<rc) was removed from the instruction set (too hard to compile for) [scaled indexed by register shifted by constant was NOT removed!].

The new, 2 micron ARM was right first time on 19th Feb 1987. It's peak performance was 18MHz; its die size 230x230 mil^2; 25,000 transistors.

VTI were given a license to sell the chips to anyone. They renamed the chips: VL86C010 (ARM), VL86C110 (MEMC), VL86C310 (VIDC), VL86C410 (IOC).

Acorn released volume machines "Acorn Archimedes" in June 1987. Briefly:
A305: 1/2 MByte, 1MByte floppy, graphics to 640x514x16 colours
A310: ditto, 1MByte
A310M: ditto with PC software emulator (circa a PC XT, if you're interested)
A440: 4MByte, 20MByte hard disc, 1152x896 graphics also.

All machines have ARM at 4/8MHz (circa 5000 dhrystones 1.1), 8 channel sound synthesiser, proprietry OS, 6502 software emulator, software.... Prices between 800 and 3000 pounds UK with monitor and mouse and all other useful bits. Not available in the US, but try Olivetti Canada.

VTI make ARM available as an ASIC cell. Sanyo have taken a second source license (in April 1988) for the chip set, and make a 32 bit microcomputer (single chip controller). In "VLSI Systems Design" July 1988, the following statements are made by VTI: ARM in 1.5 micron (18-20MHz clock), 180x180 mil^2; future shrink to 1 micron (they are expecting "perhaps 40MHz" and 150 mil square with the price dropping from $50 to $15); expected sales in 1988 90-100,000 units.

Contact Ron Cates, VTI Application Specific Logic Products Division, Tempe, Arizona for details (e.g. the "VL86C010 RISC Family Data Manual").

Plug in boards for PCs are available. A controller for Laser printers with ARM, MEMC, VIDC and 4MBytes DRAM has been sold to Olivetti [Acorn' parent company as of 1985-6] (contact SWoodward@acorn.co.uk if you want to know more).

In the Near Future:

We have a Floating Point Coprocessor interface chip working "in the lab" - the fifth member of the four chip set. It interfaces an ATT WE32206 to ARM's coprocessor bus. It benchmarks at 95.5 KFlops LINPACK DP FORTRAN Rolled BLAS (slowest) (11KFlops with a floating point emulator) on an A310. Definitely have to make our own, some time...

Acorn is about to release UNIX 4.3BSD including TCP/IP, NFS, X Windows and IXI's X.desktop on the A440. Contact MJenkin@acorn.co.uk or DSlight@acorn.co.uk for more info (and to be told that it isn't available in the US {yet}).

Operating Systems:

Acorn's proprietry OS "Arthur" is written in machine code: it fills 1/2MByte of ROM! (yes, writing in RISC machine code is truly wonderful as others have noted on comp.arch). Its main features are windows, anti-aliased fonts (wonderful at 90 pixels per inch - I use 8 point all the time) and sound synthesis. It runs on all Archimedes machines. A 2nd release is due real soon now and features multitasking, a better desktop and a name change to RISC OS.

VTI are porting VRTX to the ARM; Cambridge (UK) Computer Lab's Tripos has been ported to A310/A440. UNIX has been ported by Acorn: see above. There are MINIX ports everywhere one looks (try querying the net...).


C Compiler: ANSI/pcc; register allocation by graph colouring; code motion; dead code elimation; tail call elimination; very good local code generation; CSE and cross-jumping work and will be in the next release. No peepholing (yet - not much advantage, I'm afraid). Can't turn off most optimisation features.

Also FORTRAN 77, ISO PASCAL, interpreted BASIC (structured BBC BASIC, very fast), Forth, Algol, APL, Smalltalk 80 (as seen at OOPSLA 88: on an A440 it approximates a Dorado) and others (LISP, Prolog, ML, Ponder, BCPL....).

Specific applications for Archimedes computers are too numerous to mention! (though the high speed Mandelbrot calculation has to be seen to be believed - one iteration of the set in 28 clock ticks [32 bit fixed point] real time scroll across the set [calculate row/column in a frame time and move the

There is a part of the net that talks about Archimedes machines:


Random Info:

Code density is approximately that of 80x86/68020. Occasionally 30% worse (usually on very small programs).

The average number of ticks per instruction 1.895 (claims VTI - we've never bothered to measure it).

DRAM page mode is controlled by the MEMC, but there is a prediction signal from the ARM saying "I will use a sequential address in the next cycle" which helps the timing a great deal! S=125nS, N=250nS with current MEMC and DRAM (see David Chase's article for instruction timing). Static RAM ARM systems have been implemented up to 18MHz - S=N=1/18 with these systems.

Approximately 1000 dhrystones 1.1 per MHz if N=S; about 1000/1.895 dhrystones per MHz if N=2S (i.e. 5K dhrystones for a 4/8MHz system; 18K dhrystones for an 18/18MHz system).

Most recent features: Electronic Design Jul 28 1988, VLSI Systems Design July


We had a competition to see who would use "ra := rb op rc shifted by rd" with all of ra, rb, rc and rd actually different registers, but the graphics people won it too easily!

ARM's byte sex is as VAX and NS32000 (little endian). The byte sex of a 32 bit word can be changed in 4 clock ticks by:

EOR R1,R0,R0,ROR #16
BIC R1,R1,#&FF0000
MOV R0,R0,ROR #8
EOR R0,R0,R1,LSR #8

which reverses R0's bytes. Shifting and operating in one instruction is fun.

Shifted 8bit constants (see David Chase's article) catch virtually everything.

Major use of block register load/save (via bitmask) is procedure entry/exit. And graphics - you just can't keep those boys down. The C and BCPL compilers turn some multiple ordinary loads into single block loads.

MEMC's Content Addressable Memory inverted page table contains 128 entries. This gives rather large pages (32KBytes with 4MBytes of RAM) and one can't have the same page at two virtual addresses. Our UNIX hackers revolted, but are now learning to love it (there's a nice bit in the standard kernel which goes "allocate 31 pages to start a new process"....)

Data types: byte, word aligned word, and multi-word (usually with a coprocessor e.g. single, double, double extended floating point).

Neatest trick: compressing all binary images by around a factor of 2. The decompression is done FASTER than reading the extra data from a 5MBit


Enough! (too much?) Specific questions to me, general brickbats to the net.

.....Roger Wilson (RWilson@Acorn.co.uk)

DISCLAIMER: (I speak for me only, etc.)

The above is all a fiction constructed by an outline processor, a thesaurus and a grammatical checker. It wasn't even my computer, nor was I near it at the time.
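The four-instruction byte reversal in the post can be checked by mimicking each ARM instruction with one line of C. The function names are made up for this sketch, and it only models the data flow, not the 4-tick timing:

```c
#include <assert.h>
#include <stdint.h>

/* Rotate right by n bits, like ARM's ROR shift (n must be 1..31 here). */
static uint32_t ror(uint32_t x, unsigned n)
{
    assert(n > 0 && n < 32);
    return (x >> n) | (x << (32 - n));
}

/* The byte reversal from the post, one C statement per instruction. */
uint32_t arm_byte_reverse(uint32_t r0)
{
    uint32_t r1;
    r1 = r0 ^ ror(r0, 16);   /* EOR R1,R0,R0,ROR #16 */
    r1 &= ~0x00FF0000u;      /* BIC R1,R1,#&FF0000   */
    r0 = ror(r0, 8);         /* MOV R0,R0,ROR #8     */
    r0 ^= r1 >> 8;           /* EOR R0,R0,R1,LSR #8  */
    return r0;
}
```

For example, `arm_byte_reverse(0x11223344)` yields `0x44332211`.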

Fomalhaut b exoplanet may have been cloud in a trench coat: Massive 'world' formed after 'mid-space super-prang'

Torben Mogensen

Or a comet?

Bright. Not emitting heat. Disperses over time. That sounds suspiciously like a large chunk of ice that sheds its surface as it nears a star. Otherwise known as a comet.

'I give fusion power a higher chance of succeeding than quantum computing' says the R in the RSA crypto-algorithm

Torben Mogensen

About voting machines and blockchain: https://xkcd.com/2030/

WebAssembly: Key to a high-performance web, or ideal for malware? Reg speaks to co-designer Andreas Rossberg

Torben Mogensen

Re: WaSm to you too

The "populos" have already accepted Wasm, since it is included as standard in the major browsers. So no, I don't think it will take much convincing. Another thing is that Wasm is extremely simple and designed for sandboxing from the start, so it cannot encrypt your files. It can be used to mine bitcoin, as all that requires is some CPU time and the ability to send short messages to a web server somewhere. But it does not have access to your own bitcoins.

Wasm has a complete formal specification, which means that undefined or implementation-specific behaviour (which is often used as an attack vector) is avoided. I trust Wasm much more than Javascript, which is only somewhat safe because browsers have become good at detecting bad behaviour in Javascript.

Moore's Law isn't dead, chip boffin declares – we need it to keep chugging along for the sake of AI

Torben Mogensen

Hot Chips, indeed

The major problem with cramming more transistors into chips is that if they all operate at the same time (which is a requirement for more performance), they will generate a lot of heat and consume a lot of power. The generated heat requires more cooling, which increases the power usage even more. There is a reason that your graphics card has a larger heat sink than your CPU, and the article talks about many more processing elements than on a GPU (albeit a bit simpler).

So rather than focusing on speed, the focus should be on power usage: Less power implies less heat implies less cooling. One option is to move away from silicon to other materials (superconductors, for example), but another is to use reversible gates: They can, in theory at least, use much less power than traditional irreversible gates such as AND and OR, and you can build fully capable processors using reversible gates. But even that requires a different technology: Small-feature CMOS uses more power in the wires than in the gates, so reducing the power of the gates does not help a lot. Maybe the solution is to go back to larger features (at the end of the Dennard scaling range).
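As a sketch of a reversible gate, here is the Toffoli (controlled-controlled-NOT) gate modelled in C on single bits; the type and function names are invented for the example. Applying the gate twice gives back the original inputs, so no information is erased. With the third input fixed at 0 it computes AND while passing its inputs through, and with the third input fixed at 1 it gives NAND, which is why Toffoli gates suffice to build arbitrary logic.

```c
#include <assert.h>

typedef struct { unsigned a, b, c; } bits3;

/* Toffoli gate: c is flipped exactly when a and b are both 1; a and b pass
   through unchanged. The gate is its own inverse. */
bits3 toffoli(bits3 in)
{
    assert(in.a <= 1 && in.b <= 1 && in.c <= 1);
    bits3 out = { in.a, in.b, in.c ^ (in.a & in.b) };
    return out;
}

/* AND built reversibly: with c initialised to 0, the third output is a AND b.
   Keeping a and b as outputs is the price of reversibility. */
unsigned reversible_and(unsigned a, unsigned b)
{
    bits3 r = toffoli((bits3){ a, b, 0 });
    return r.c;
}
```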

Poor old Jupiter has had a rough childhood after getting a massive hit from a mega-Earth

Torben Mogensen

Alternative explanation

Another possibility is that, when the core was massive, it had sufficient fissionable material to start a nuclear reaction that pushed the material apart again. This could be an ongoing thing: Fissionable materials collect in the centre, react, disperse, collect again, and so on.

I have absolutely no evidence for this theory, though.

Packet's 'big boy' servers given a shot in the Arm with 32-core, 3.3GHz Ampere CPUs

Torben Mogensen

Re: Could do with that in a laptop.

If you are willing to have a 7kg laptop with a battery life of 30 minutes and a noisy fan running constantly, then by all means.

If you're worried that quantum computers will crack your crypto, don't be – at least, not for a decade or so. Here's why

Torben Mogensen

Re: 6,681 qubits?

Spontaneous collapse of the quantum state to a classical state is indeed a major problem for quantum computing. And the problem increases not only with the number of entangled qubits, but also with the number of operations performed on these. And to crack codes with longer keys, you not only need more qubits, you also need more operations on each.

The simple way to avoid quantum computers cracking your code is just to increase the key length -- if 256-bit keys become crackable in 10 years (which I doubt), you just go to 1024-bit keys, and you will be safe for another decade or more. That is, unless some giant breakthrough makes quantum computers scale easily, and I seriously doubt that.

That doesn't mean that quantum computers are pointless. They can be used for things such as simulating quantum systems and for quantum annealing. But forget about cracking codes or speeding up general computation. You are better off with massively parallel classical computers, and to avoid huge power bills, you should probably invest in reversible logic, which can avoid the Landauer limit (a thermodynamic lower bound on the energy cost of irreversible logic operations).
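For a sense of scale, the Landauer bound for erasing one bit at temperature T, evaluated here at an assumed room temperature of 300 K:

```latex
E_{\min} = k_B\,T \ln 2
\approx \left(1.380649\times10^{-23}\,\mathrm{J/K}\right)(300\,\mathrm{K})(0.6931)
\approx 2.87\times10^{-21}\,\mathrm{J} \;\approx\; 0.018\,\mathrm{eV}
```

A reversible operation erases no bits, which is why, in principle, it is not subject to this bound.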

In a galaxy far, far away, aliens may have eight-letter DNA – like the kind NASA-backed boffins just crafted

Torben Mogensen

Which subsets are viable?

So, these people have "shown" that a DNA variant with eight bases is viable. But I find it more interesting to know which subsets of these eight bases are viable. Obviously, the four-base subset GATC is perfectly viable, but are there other four-base subsets that are viable? And are any in some sense superior to GATC? For example, is there a subset that uses simpler molecules or less energy to replicate? Or some that allow simpler ribosomes? Are there viable two-base subsets?

And what other possible bases are there? Could there be a base that pairs with itself, so you can have a three-base set?

What a smashing time, cheer astroboffins: Epic exoplanet space prang evidence eyeballed

Torben Mogensen

This happened to Earth

The modern theory of the formation of Earth and its moon is that two planets in near-identical orbits collided and merged, and that the impact ejected a large mass that became the moon.

Roughly 30 years after its birth at UK's Acorn Computers, RISC OS 5 is going open source

Torben Mogensen

Re: arguably other languages better suited to the modern world

Yes, BBC BASIC was only an improvement on what you got by default on home computers at the time, which in almost all cases were BASIC variants, and most often inferior to BBC BASIC. Hard disks were uncommon even at the time the Archimedes shipped (with a floppy as standard), so using compilers was impractical -- you wanted programs to load and run without storing extra files on your floppy or using large amounts of memory for compiled code. It was only after I got a hard disk that I started using compilers on my Archimedes. Several compilers existed early on for RISC OS, including Pascal, which was arguably the most popular compiled language at the time (until C took over that role). There was even a compiler for ML, which is a much better language than either Pascal or C.

So, to look at alternatives to BBC BASIC for floppy-only machines, let us consider languages that run interpreted without too much overhead and that were well known at the time. Pascal and C are ruled out because they are compiled. Forth was mentioned, but only RPN enthusiasts would consider it a superior language. LISP (and variants such as Scheme) is a possibility (LISP was actually available even for the BBC Micro as a plug-in ROM), but it is probably a bit too esoteric for most hobbyists, and it requires more memory than BASIC and similar languages. Prolog is even more esoteric. COMAL is not much different from BBC BASIC, so it is no strong contender.

Also, BASIC was what was taught in schools, so using a radically different language would have hurt sales. So, overall, I would say BBC BASIC was a sensible choice as the default language. Other languages, such as C, Pascal, or ML, were available as alternatives for the more professionally minded (who could afford hard disks and more than 1MB of RAM).

Torben Mogensen

Re: Vector drawing

One thing that I would have wanted for RISC OS is to allow !Draw-files for file/application icons. This would allow them to be arbitrarily scalable, and a bit of caching would make the overhead negligible. It was such caching that made Bezier-curve fonts render in reasonable time on an 8MHz machine. Similarly, you could define cursor shapes as !Draw files to make cursors easily scalable. Thumbnails of files could also be !Draw files.

Another obvious extension would be folder types, similar to file types. As it is, you need a ! in front of a folder name to make it into an application, and there are no other kinds of folder than normal or application. Using types, you could make, say, word-processor files into typed folders. For example, a LaTeX folder could contain the .tex, .aux, .log, .toc, and all the other files generated by running LaTeX. I recall that one of the RISC OS word processors used application folders as save files, but that meant that all file names had to start with !. You could get rid of this for both save files and applications by using folder types instead.

For modern times, you will probably need more bits for file types than back in the day. I don't know if later versions of RISC OS have added more bits, but if not, you might as well add folder types at the same time as you add more bits for file types.
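For context on how tight the current scheme is: a classic RISC OS file is typed when the top 12 bits of its 32-bit load address are all set (&FFF). Bits 8-19 then hold a 12-bit file type (for example &AFF for Draw files and &FFF for text), and the remaining bits of the load address plus the execution address hold a date stamp, which leaves room for only 4096 types. The helpers below are a sketch with invented names, not a RISC OS API.

```c
#include <assert.h>
#include <stdint.h>

/* Returns the 12-bit file type from a load address of the form &FFFtttdd,
   or -1 if the file has old-style (untyped) load/exec addresses. */
int riscos_filetype(uint32_t load_addr)
{
    if ((load_addr & 0xFFF00000u) != 0xFFF00000u)
        return -1;
    return (int)((load_addr >> 8) & 0xFFFu);
}

/* Returns load_addr with its file type replaced; the file must already be
   typed, and the type must fit in 12 bits. */
uint32_t riscos_with_filetype(uint32_t load_addr, unsigned type)
{
    assert(type <= 0xFFFu);
    assert(riscos_filetype(load_addr) >= 0);
    return (load_addr & ~0x000FFF00u) | (type << 8);
}
```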

Torben Mogensen

A bit too old now.

While I love the GUI, the file system (typed files, applications as folders, uniform display and print graphics, and a modular file system), the font manager and the standard applications (especially Draw), I haven't used RISC OS in over a decade. It lacks a lot of things that are needed for modern desktop/laptop use: multi-core support, pre-emptive multitasking, proper Unicode support (unless this has been added since I last looked), support for most USB devices, support for graphics cards, and so on. It is also a problem that it is written in 32-bit ARM assembler. Not only does this make it harder to maintain, it also limits its use on ARM64 and other modern systems (except through emulation).

I think the best route would be to build a RISC OS desktop on top of a Linux kernel, rewriting the RISC OS modules and applications in Rust (or C), and use Linux drivers etc. to make it exploit modern hardware.

Relive your misspent, 8-bit youth on the BBC's reopened Micro archive

Torben Mogensen

Re: Damn, daniel!

Your first touch point with this new fangled tech is...Lisp? That's some serious brain engagement.

LISP is not really that complex to learn. The syntax is dead simple (albeit a bit verbose), and you can code with loops and assignments just like in BASIC, if you find that simpler than recursive functions (I don't).

LISP on the BBC was, however, considerably slower than BASIC, which was probably because BBC BASIC was written by Sophie Wilson, who did a lot to optimise it.

Now Microsoft ports Windows 10, Linux to homegrown CPU design

Torben Mogensen

Re: Computer says "No"

"Actually, microcode has been around since CPUs existed. it's how they work internally."

True, but in the beginning microcode interpreted the instructions, whereas modern processors compile instructions into microcode. This gets rid of the interpretation overhead (which is considerable).

Torben Mogensen

Re: CPU meets EPIC / GPGPU hybrid

Actually, the description reminded me a lot of EPIC/Itanium: Code compiled into explicit groups of instructions that can execute in parallel. The main difference seems to be that each group has its own local registers. Intel had problems getting pure static scheduling to run fast enough, so they added run-time scheduling on top, which made the processor horribly complex.

I can't say if Microsoft has found a way to solve this problem, but it still seems like an attempt to get code written for single-core sequential processors to automagically run fast. There is a limit to how far you can get on this route. The future belongs to explicitly parallel programming languages that do not assume memory is a flat sequential address space.

Boffins offer to make speculative execution great again with Spectre-Meltdown CPU fix

Torben Mogensen

Re: Speculative versus parallel execution

"You want to have a context switch every time a branch causes a cache miss? That would be a Bad Thing."

It would indeed. But that is not what I am saying. What I am saying is that there is a pipeline of instructions interleaved from two or more threads, each having their own registers. No state needs to be saved, and executing every second instruction from a different thread is no more expensive than executing instructions from a single thread. The advantage is that functional units can be shared, and since independent threads do not have fine-grained dependencies on each other, instructions from one thread can easily execute in parallel with instructions from another.

This is not my idea -- it has been used in processors for decades (just look for "X threads per core" in specifications). IMO, it is a better approach than speculative execution since it does not waste work (every instruction that is executed is needed by one thread or another) and it is not considerably more complex than having one thread per core. Note that out-of-order execution is not a problem: it also executes only instructions that are needed, it just does so out of sequence, which requires register renaming, but that is not a huge problem. The main cost is complex scheduling, which increases power use (OOO processors use more energy scheduling instructions than actually executing them).

What speculation gives that these do not is (potentially) much faster execution of a single thread. But to do so, it uses resources that could have been used to execute instructions that are definitely needed. So it improves latency at the cost of throughput. OOO execution improves both at a cost in complexity and power use, and multi-threading improves only throughput, at a small cost in latency, because the two (or more) threads are given equal priority, so each thread may have to wait for others to stop using functional units.
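A toy model may make the throughput argument concrete. This is a minimal sketch of fine-grained multi-threading on a single-issue pipeline: each instruction is given a made-up number of stall cycles (standing in for, say, waiting for a branch condition), and each cycle the scheduler issues one instruction from whichever thread is ready, round-robin. It ignores caches, functional-unit conflicts and everything else, so the numbers only illustrate the idea:

```python
def cycles_to_issue(threads: list) -> int:
    """Cycles until every instruction of every thread has been issued.

    Each thread is a list of stall counts: after issuing instruction i,
    that thread cannot issue again for threads[t][i] extra cycles.
    One instruction issues per cycle, picked round-robin among ready threads.
    """
    n = len(threads)
    pcs = [0] * n          # next instruction index per thread (own "registers")
    ready_at = [0] * n     # first cycle at which each thread may issue again
    cycle, next_thread = 0, 0
    while any(pcs[t] < len(threads[t]) for t in range(n)):
        for k in range(n):
            t = (next_thread + k) % n
            if pcs[t] < len(threads[t]) and ready_at[t] <= cycle:
                ready_at[t] = cycle + 1 + threads[t][pcs[t]]
                pcs[t] += 1
                next_thread = (t + 1) % n   # give the other thread priority next
                break
        cycle += 1         # if nothing was ready, this cycle is simply wasted
    return cycle

stream = [0, 3, 0, 3]                          # made-up stall pattern
print(cycles_to_issue([stream * 2]))           # both streams back to back: 17
print(cycles_to_issue([stream, stream]))       # two interleaved threads: 10
```

With these made-up stalls, the stalls of one thread are filled with work from the other, and no instruction is executed that isn't needed; each individual thread, however, finishes a little later than it would have alone (cycle 9 instead of cycle 6 for the second thread's last instruction).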

Torben Mogensen

Speculative versus parallel execution

Speculative execution is basically a way to make sequential computation faster. When the processor has to wait for, say, a condition to be decided, it makes a guess as to the outcome and starts working from that guess. If it guesses right, you save time; if not, you both lose time (for clean-up) and waste heat (for doing wasted work). You can try to work on multiple different outcomes simultaneously, but that is more complicated, and you will definitely waste work (and heat).

Speculative execution relies on very precise predictions, and making these costs a lot of resources for gathering, storing and analysing statistics. The bottom line is that speculative execution is very costly in terms of complexity and energy.
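To give an idea of the statistics involved, here is a minimal sketch of one classic textbook scheme, a 2-bit saturating counter per branch. Real processors use far more elaborate (and far larger) predictors; this only shows the principle, and the branch patterns are invented for illustration:

```python
def predictor_accuracy(outcomes: list) -> float:
    """Fraction of branch outcomes (True = taken) predicted correctly
    by a single 2-bit saturating counter."""
    state = 2                      # 0-1: predict not taken, 2-3: predict taken
    correct = 0
    for taken in outcomes:
        if (state >= 2) == taken:
            correct += 1
        # Move towards the actual outcome, saturating at 0 and 3.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return correct / len(outcomes)

loop_branch = ([True] * 9 + [False]) * 10   # loop taken 9 times, then exits
alternating = [True, False] * 50
print(predictor_accuracy(loop_branch))
print(predictor_accuracy(alternating))
```

The loop branch is predicted correctly 90% of the time, but the alternating branch defeats this predictor half the time, so catching more patterns means keeping more history per branch, which is exactly the cost in silicon and power described above.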

Another solution is to pause execution until the outcome is known. While this pause lasts, you can let another thread use the execution units. This is called multi-threading, and is usually implemented by having one (or more) extra copies of all registers and scheduling instructions from two (or more) threads simultaneously. You only execute instructions that are guaranteed to be needed, so there is no speculation. You can even have both threads execute instructions simultaneously, if there are no resource conflicts. The scheduling unit is somewhat more costly, as it has to look at more instructions, but it is not as bad as the complexity of speculative execution. The downside is that each thread runs no faster than it would alone on a processor without speculative execution, but the total instruction throughput is likely higher than in that case. If the threads share a cache, there is a risk of information leakage, so you generally limit this to threads from the same program.

The next step is to make multiple cores, each with its own cache. If the memory is protected (and cleared when given to a new process), this can be made safe from leakage, it scales better than multi-threading, and the complexity is lower. This is part of the reason why the trend is towards more cores rather than faster single cores. In the extreme, we have graphics processors: a large number of very simple cores that do no speculation and no out-of-order execution, and which even share the same instruction stream. Sequential execution on these is horribly slow, but the throughput is fantastic, as long as you can supply a suitable workload. It is nigh impossible to make C, Java, and similar languages run fast on graphics processors, so you either need specialised languages (https://futhark-lang.org/) or must call, from C or Java, hand-optimised library routines written in very low-level languages.

In conclusion, the future belongs to parallel rather than speculative execution, so you should stop expecting your "dusty decks" of programs written in C, Java, Fortran, etc. to automagically run faster on the next generation of computers.

Intel outside: Apple 'prepping' non-Chipzilla Macs by 2020 (stop us if you're having deja vu)

Torben Mogensen

Re: What was wrong with m68k anyway?

Apple moved from 68K to PowerPC because there was no high-performance 68K processor. PowerPC promised (and delivered) higher performance than 68K, at least in the foreseeable future. At that time, Apple was mainly about desktop machines, so power use was not all-important.

The move to x86 was allegedly motivated by lower power use for the same (or higher) performance, which was required for laptop use. Competition between Intel and AMD had driven an arms race for more performance for less power, and Apple could ride on that.

A move to ARM can be partially motivated by a desire for lower power use, but it is more likely so Apple can build their own ASICs, as they have done for iPhone, and so more code can be shared between iOS and MacOS.

Torben Mogensen

About time

I have long been expecting this move, and I'm surprised it hasn't happened earlier. Using the same CPU on iPhones and Macs will simplify a lot of things for Apple, as will having the ability to make their own chips combining CPUs with coprocessors of their own choice instead of relying on the fairly limited choice that Intel offers.

With the advent of 64-bit ARMs, integer performance is similar to x86 performance, and due to the smaller core size, you can fit more cores onto a chip, increasing overall performance. Where ARM CPUs have lagged behind Intel is in floating-point performance, but that may not be important for Apple. And if it is, they have a licence that allows them to make their own FPU to go alongside the ARM cores. In any case, the most FP-intensive tasks are rapidly moving from classical FPUs to GPUs, so as long as Apple supplies their Macs with GPUs that run OpenCL at decent speed, sequential FP performance may not matter much. In general, single-core performance (whether integer or FP) is becoming less and less important as the number of cores grows: to get high performance, you have to code for multiple cores, regardless of whether you code for Intel or ARM.

As for running legacy code, this can be done with just-in-time binary translation: The first time a block of x86 code is executed, it is emulated by interpretation, but a process is at the same time started on another core that translates the x86 code to ARM. As soon as this translation finishes, the code will run compiled when next executed. There might even be multiple steps: Interpretation, simple translation, and optimised translation, each being started when the previous form has been executed sufficiently often that it is expected to pay off.
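The tiering described above can be sketched as a toy model. The thresholds, tier names and class are hypothetical, and the model assumes that a translation started in the background (on another core) has finished by the next time the block runs; a real translator would have to check for that and keep interpreting until it has:

```python
from collections import defaultdict

# Hypothetical thresholds: executions before each upgrade is started.
SIMPLE_AFTER = 2
OPTIMISE_AFTER = 5

class TieredTranslator:
    """Toy model of tiered just-in-time translation of x86 code blocks."""

    def __init__(self):
        self.counts = defaultdict(int)   # block address -> times executed
        self.tier = {}                   # block address -> current form

    def execute(self, block: int) -> str:
        """Run one block and return the form it was run in this time."""
        self.counts[block] += 1
        current = self.tier.get(block, "interpreted")
        n = self.counts[block]
        if current == "interpreted" and n >= SIMPLE_AFTER:
            self.tier[block] = "simple"      # start a quick translation
        elif current == "simple" and n >= OPTIMISE_AFTER:
            self.tier[block] = "optimised"   # start an optimising translation
        return current

jit = TieredTranslator()
print([jit.execute(0x1000) for _ in range(6)])
```

A block that runs only once is never translated at all, so translation effort is spent only where it is expected to pay off.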

Programming languages can be hard to grasp for non-English speakers. Step forward, Bato: A Ruby port for Filipinos

Torben Mogensen

Someone (I don't recall who) once said something along the lines of "If they ever build a computer that can be programmed in English, they will find that people can't program in English". The point being that the level of precision required for instructing a computer is far beyond most people even when using their native language -- or maybe in particular when using their native language.

As a side note, I recall that BBC BASIC had a "colour" command, while most other BASICs had a "color" command.

Torben Mogensen

Re: Oh, the irony!

"If memory serves me right Simula and Erlang are both Scandinavian."

Yes. Simula is/was Norwegian and Erlang is/was Swedish, but both became international efforts once they gained popularity.

Torben Mogensen

Re: Nothing new here

In Algol 68, keywords are distinguished from identifiers by case: keywords are upper case (or boldface or quoted) and identifiers lower case. This allowed non-English versions of Algol 68 just by providing a table of keyword names. And since keywords cannot overlap with identifiers, the code could automatically be converted to use English keywords (or vice versa) without risk of variable capture.

Similarly, in Scratch, keywords are just text embedded in graphical elements. Changing the bitmaps of these graphical elements changes the language of the keywords without affecting other parts of the program, and because the internal representation does not include the bitmaps, the same program will be shown with English keywords in an English-language Scratch system and with Japanese (or whatever) keywords in a Japanese Scratch system.

But I agree that, unless the programming language attempts to look like English (COBOL, AppleScript, etc.), the language of the keywords matters next to nothing, as long as the letters used are easily accessible from your keyboard. There are programming languages with next to no keywords (APL being an extreme example), and (apart from sometimes requiring special keyboards) they are not really more or less difficult to learn than languages with keywords in your native language (what makes APL difficult to learn is not its syntax). An exception may be children, which is why Scratch allows "reskinning" the graphical elements.
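The Algol 68 keyword-table idea can be sketched in a few lines, assuming upper-case stropping. The Danish keyword table is my own invention for illustration, not an official Algol 68 localisation, and the tokeniser is deliberately naive (it ignores strings and comments):

```python
import re

# Hypothetical Danish keyword table, invented for illustration.
EN_TO_DA = {"IF": "HVIS", "THEN": "SAA", "ELSE": "ELLERS", "FI": "SIVH",
            "INT": "HELTAL"}
DA_TO_EN = {da: en for en, da in EN_TO_DA.items()}

WORD = re.compile(r"[A-Za-z][A-Za-z0-9]*")

def translate(source: str, table: dict) -> str:
    """Translate keywords, assuming upper-case stropping.

    Upper-case words are keywords and are looked up in the table;
    lower-case words are identifiers and are never changed, so an
    identifier cannot be captured by a keyword of the other language.
    """
    def replace(match):
        word = match.group(0)
        return table.get(word, word) if word.isupper() else word
    return WORD.sub(replace, source)

english = "INT hvis := 1; IF hvis > 0 THEN x ELSE y FI"
danish = translate(english, EN_TO_DA)
print(danish)
print(translate(danish, DA_TO_EN) == english)
```

Note that the program can use an identifier called hvis even though HVIS is a keyword in the Danish table: the case distinction keeps the two apart in both directions.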

Death notice: Moore's Law. 19 April 1965 – 2 January 2018

Torben Mogensen

HmmYes writes: "To be honest, Moores law died about 2005ish.

Youve not really seen much in the way of clock speeds beyond 2-3G."

What you observe here is not the end of Moore's Law, but the end of Dennard scaling.

Torben Mogensen

Speculative execution

The root of Spectre and Meltdown is speculative execution -- the processor trying to guess which instructions you are going to execute in the future. While this can increase performance if you can guess sufficiently precisely, it will also (when you guess incorrectly) mean that you will have to discard or undo work that should not really have been done in the first place. On top of that, accurate guesses aren't cheap. Some processors use more silicon for branch prediction than they do for actual computation.

This means that speculative execution is not only a security hazard (as evidenced by Meltdown and Spectre), but it also costs power. Power usage is increasingly becoming a barrier, not only for mobile computing powered by small batteries, but also for data centres, where a large part of the power is drawn by CPUs and cooling for these. Even if Moore's law continues to hold for a decade more, this won't help: Dennard scaling died a decade ago. Dennard scaling is the observation that, given the same voltage and frequency, power use in a CPU is pretty much proportional to the area of the active transistors, so halving the size of transistors would also halve the power use for similar performance.

This means that, to reduce power, you need to do something other than reduce transistor area. One possibility is to reduce voltage, but that will effectively also reduce speed. You can reduce both speed and voltage and gain the same overall performance for less power by using many low-frequency cores rather than a few very fast cores. Making cores simpler while keeping the same clock frequency is another option. Getting rid of speculative execution is an obvious possibility, and while this will slow processors down somewhat, the decrease in power use (and transistor count) is greater. As with reducing clock speed, you need more cores to get the same performance, but the power use for a given performance will fall. You can also use fancier techniques such as charge-recovery logic, Bennett clocking, reversible gates, and so on, but for CMOS these will only gain you a little, as leakage is becoming more and more significant. In the future, superconducting materials or nano-magnets may be able to bring power down to where reversible gates make a significant difference, but that will take a while yet.
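To make the voltage/frequency trade-off concrete: a common first-order model of dynamic (switching) power in CMOS is P ≈ αCV²f, where α is the activity factor, C the switched capacitance, V the supply voltage and f the clock frequency. The 0.7 voltage factor below is an assumption for illustration (how far voltage can drop at half the frequency depends on the process), and leakage is ignored:

```latex
\begin{aligned}
P_{\text{dyn}} &\approx \alpha\, C\, V^{2} f \\
P_{\text{1 core at } f,\; V} &\propto V^{2} f \\
P_{\text{2 cores at } f/2,\; 0.7V} &\propto 2 \cdot (0.7V)^{2} \cdot \tfrac{f}{2} = 0.49\, V^{2} f
\end{aligned}
```

So if the work parallelises perfectly, two half-speed cores deliver the same throughput as one full-speed core for roughly half the dynamic power.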

In the short term, the conclusion is that we need simpler cores running at lower frequencies, but many more of them, to get higher performance at lower power use. This requires moving away from the traditional programming model of sequential execution and a large uniform memory. Shared-memory parallelism doesn't scale very well, so we need to program with small local memories and explicit communication between processors to get performance. Using more specialised processors can also help somewhat.

Arm Inside: Is Apple ready for the next big switch?

Torben Mogensen

About time, I think

When Apple started making their own ARM chips, I predicted that they would move to ARM on the Mac line also. It has taken longer than I expected, but Apple has good reasons for this:

1. It would make Macs the ideal tool for developing iPhone software, as they could run it without emulation.

2. More parts of the OS can be shared between Mac and iPhone.

3. It allows Apple to make a SoC to exactly their specifications instead of relying on what Intel produces.

4. It removes dependency on Intel (or AMD).

It is not impossible for Apple to make a 64-bit ARM processor that will outperform the fastest Intel processor. I'm sure Apple would love having the fastest laptops around, so people would migrate from Wintel to Apple for performance reasons. Apple need to do more work on the FP side to make this happen, but it is not impossible.

Amazon to make multiple Lord of the Rings prequel TV series

Torben Mogensen

Could go either way

There is potential for disaster, but if handled well, it could be good. I think the best period for an initial run is the period between the Hobbit and LotR, as mentioned earlier. I'm not sure Moria will work well as a main storyline -- the retaking is probably not all that interesting, and the fall happens rather late in the time line (since Gimli is not aware of it when the Fellowship enters Moria). The rangers fighting orcs and goblins up north is probably a better idea, but with a better storyline than the "War in the North" game. It could feature a young Aragorn so there is some name recognition.

I'm not sure the story of Túrin Turambar (The Children of Húrin) will work on TV, nor Beren and Lúthien. The tale of Númenor definitely takes place over too long a time frame to work on TV.

ASUS smoking hashes with 19-GPU, 24,000-core motherboard

Torben Mogensen

Re: Those scientists...

Machine learning of almost any kind is sufficiently parallelisable to exploit such a monster. Deep learning neural networks are very popular these days, and they need lots of processor power; for that reason, they are already run on graphics processors. The same goes for DNA analysis.

Faking incontinence and other ways to scare off tech support scammers

Torben Mogensen

Quick solution

While playing elaborate pranks on the scammers may be fun, you are wasting your own time as well as theirs -- and your time is probably much more valuable, to you at least.

So when I get a call from someone claiming to be from the Microsoft Tech Support Centre or some such, I just say "No, you're not" and hang up.

81's 99 in 17: Still a lotta love for the TI‑99/4A – TI's forgotten classic

Torben Mogensen

Double interpretation overhead

The main reason that BASIC on the TI99/4a was slow was that the BASIC interpreter was not written in assembly language (which would not have been difficult, as the TMS9900 was much easier to program than 8-bit alternatives such as the 6502 or Z80), but in a language called GPL (Graphics Programming Language), which was compiled to a byte code that was itself interpreted by the CPU. I estimate the overhead of using interpreted byte code to be a factor of 5-10, so a BASIC interpreter written directly in assembly language would have sped up BASIC enormously, depending on what you do. For some operations, such as floating-point calculations or graphics primitives, the overhead is relatively small, but for integer calculations it is pretty hefty. Games written in assembly language are not affected by this, but I still find it a curious design decision: it made the TI99/4a compare very badly to other home computers in BASIC benchmarks, which is what most magazines used to compare the speed of home computers.
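A toy model of the layering: below, a BASIC statement handler is written as byte code for a tiny stack machine, which is in turn interpreted by host code. The instruction set and all names are invented for illustration and bear no relation to TI's actual GPL; the point is only that every BASIC statement turns into several byte-code dispatches, each a trip through an interpreter loop, where an interpreter written in assembly language would execute native instructions directly:

```python
from collections import Counter

def run_bytecode(code, env, stats):
    """Interpret a list of (opcode, argument) pairs on a tiny stack machine."""
    stack = []
    for op, arg in code:
        stats["bytecode dispatches"] += 1   # one trip round the interpreter loop
        if op == "load":
            stack.append(env[arg])
        elif op == "push":
            stack.append(arg)
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "store":
            env[arg] = stack.pop()
        else:
            raise ValueError(f"unknown opcode {op!r}")

def let_increment(var, k):
    """Byte code for the BASIC statement LET var = var + k (hypothetical handler)."""
    return [("load", var), ("push", k), ("add", None), ("store", var)]

def run_basic(program, env, stats):
    """The BASIC level: each statement is handled by running byte code."""
    for var, k in program:
        stats["BASIC statements"] += 1
        run_bytecode(let_increment(var, k), env, stats)

env, stats = {"A": 0}, Counter()
run_basic([("A", 1)] * 10, env, stats)   # ten times: LET A = A + 1
print(env["A"], dict(stats))
```

Ten BASIC statements cost forty byte-code dispatches on top of the BASIC-level interpretation, which is the "double interpretation" in the title.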

UK prof claims to have first practical blueprint of a quantum computer

Torben Mogensen


"the machines will be able to do things like factor very large prime numbers"

That is not very difficult. A prime number factorises to itself, no matter how large it is.

What is meant is that it can (in theory) factor products of very large primes.
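A quick way to see the distinction, using naive trial division (a sketch, not how serious factoring is done): a prime "factorises" to itself, and a product of two primes splits into those primes. The loop runs up to √n, so the work grows exponentially with the number of digits, which is why factoring products of very large primes is hard on classical computers:

```python
def factorise(n: int) -> list:
    """Prime factors of n (n >= 2) by trial division, smallest first."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:      # divide out each factor as often as it occurs
            factors.append(d)
            n //= d
        d += 1
    if n > 1:                  # what remains has no factor <= its square root
        factors.append(n)
    return factors

print(factorise(104729))       # a prime: its only factor is itself
print(factorise(101 * 103))    # a product of two primes
```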

Also, the D-Wave is not a universal quantum computer, but is specialised to do (quantum) annealing. There probably was a remark about that in an earlier version of the article, since there is an orphaned footnote explaining annealing. The D-Wave is similar to analogue computers that can also solve some optimisation problems very quickly, and there is debate about whether the D-Wave actually uses quantum effects at all or is just a fancy analogue computer.

For $deity's sake, smile! It's Friday! Sad coders write bad code – official

Torben Mogensen


xkcd has this comment on the matter: https://xkcd.com/1790/

Coming to the big screen: Sci-fi epic Dune – no wait, wait, wait, this one might be good

Torben Mogensen

SciFi Channel version

SciFi Channel made a low-budget, but decent adaptation as a TV miniseries (http://www.imdb.com/title/tt0142032/), followed by a somewhat-higher-budget version of Dune Messiah/Children of Dune as another miniseries (just called "Children of Dune"), which was also quite decent.

I agree that a single film is not enough to give a decent treatment of the book. A GoT-scale TV-series would be best, but a film trilogy could also work. Then one film for Messiah and another trilogy for Children. If all succeed, one film for each of the following books (God Emperor, Heretics, and Chapter House) is a possibility.

Google man drags Emacs into the 1990s

Torben Mogensen

Re: Already in the 1980s

You could, if you added sideways RAM. In any case, most games used lower resolution screen modes not only to save space but also to make updates faster.

Torben Mogensen

Already in the 1980s

Double buffering was widely used on home computers in the early 1980s -- I remember doing it on my BBC micro, and most games did it to get smoother updates. I suspect the method is much earlier than that.
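A minimal sketch of the technique, with a bytearray standing in for video memory. On real hardware, the swap is usually done by telling the video hardware to display from a different address rather than by copying anything; this model doesn't try to reproduce that:

```python
class DoubleBuffer:
    """Two frame buffers: one is displayed while the next frame is drawn in the other."""

    def __init__(self, width: int, height: int):
        self.width = width
        self.buffers = [bytearray(width * height), bytearray(width * height)]
        self.front = 0  # index of the buffer currently displayed

    @property
    def back(self) -> bytearray:
        """The buffer being drawn into, which is never visible."""
        return self.buffers[1 - self.front]

    def plot(self, x: int, y: int, colour: int) -> None:
        self.back[y * self.width + x] = colour

    def swap(self) -> None:
        """Show the finished frame; the old front buffer becomes the new back buffer."""
        self.front = 1 - self.front

    def visible(self) -> bytes:
        return bytes(self.buffers[self.front])

screen = DoubleBuffer(4, 2)
screen.plot(1, 0, 7)          # drawn off-screen...
print(screen.visible()[1])    # ...so the display still shows 0
screen.swap()
print(screen.visible()[1])    # now the completed frame is shown: 7
```

After a swap, the back buffer holds an old frame, so a game would clear or redraw it before the next swap. The viewer never sees a half-drawn frame, which is where the smoother updates come from.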

WebAssembly: Finally something everyone agrees on – websites running C/C++ code

Torben Mogensen

O.K. concept

I have long wished for a low-level, ubiquitous browser language that allows static types/checks (unlike JavaScript) and which is not steeped in object-oriented legacies such as null pointers everywhere, downcasts, dynamic calls, and so on. Something that is as well suited for ML or Haskell as it is for Java or C#.

There is a lot of research in typed assembly language, proof-carrying code, and so on, that allow static verification of safety properties without relying on sandboxing. Something like that would be great. I don't know enough about WebAssembly to decide if it does that, but I suspect not.

Perlan 2: The glider that will slip the surly bonds of Earth – and touch the edge of space

Torben Mogensen

Re: This makes me wonder . .

The payload of this glider is two people and life support for these in addition to instruments for sampling air. So my guess is 300-400 kg. That could be enough to carry a small rocket that could reach space, but probably not enough to get anything into orbit. Using a balloon to carry a rocket to the edge of the atmosphere seems more practical.

Science non-fiction: Newly spotted alien world bathes in glow of three stars

Torben Mogensen

Re: habitable?

From the orbit of Pluto (which is a similar distance to our sun as this planet is from its main star), our sun just looks like a very bright star. The two other stars are even further away, so there is very little light indeed.
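The inverse-square law gives a rough idea of how dim that is. Assuming a star of the Sun's luminosity and a distance of about 40 AU (roughly Pluto's average distance from the Sun), the light received per unit area is:

```python
def relative_flux(distance_au: float) -> float:
    """Light per unit area relative to what Earth gets at 1 AU,
    assuming the same stellar luminosity (inverse-square law)."""
    return 1.0 / distance_au ** 2

print(relative_flux(40))  # 1/1600 of the sunlight Earth receives
```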

It is plausible that the planet is still somewhat hot, since it is only 16 million years old, and it is possible that tidal heating might warm some moons. But there is little chance that life has had time to evolve: Initially, the planet would be too hot for life, so if it has the right temperature for life now, it has only had that for a couple of million years, which is probably not enough. The moons are not much better off.

There are much better candidates for life among the known exoplanets.

Three-bit quantum gate a step closer to universal quantum computer

Torben Mogensen

Re: quantum memcopy?

A Fredkin gate can in theory copy qubits: Use the qubit as control and apply 0 and 1 on the two inputs. The control will be unchanged but one of the outputs will be a copy of the control (and the other will be its negation).
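A classical sketch of the trick, for the basis states 0 and 1 only (it does not model superpositions): a Fredkin gate is a controlled swap, so feeding 0 and 1 into the two data inputs makes them come out as the control bit and its negation:

```python
def fredkin(c: int, a: int, b: int) -> tuple:
    """Controlled swap: if the control c is 1, swap a and b; c passes through."""
    return (c, b, a) if c else (c, a, b)

for c in (0, 1):
    control, copy, negation = fredkin(c, 0, 1)
    print(c, "->", control, copy, negation)
```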



Biting the hand that feeds IT © 1998–2020