Re: bottleneck in the memory access pathways.
"the first Alpha chips also had a bottleneck in the memory access pathways. Once the data and code had been loaded into cache it went blindingly fast (for the time),"
Citation welcome. See e.g. the results from John McCalpin's STREAM memory bandwidth benchmark from the era.
Bear in mind also that, in general, early Alpha system designs (with the exception of the 21066/21068, see below) could have either a 64-bit-wide or a 128-bit-wide path to memory. The 128-bit designs were generally faster. The presence or absence of ECC in the main memory design could also affect memory performance.
You *may* be thinking of the 21066/21068 chips. These were almost what would now be called a "system on chip" - a 21064 first-generation Alpha core (as you say, blindingly fast for its time), and pretty much everything else needed for a PC of the era ("northbridge", "southbridge", junk IO), all on one passively-coolable 166MHz or 233MHz chip. Just add DRAM. It even included on-chip VGA. In 1994. I'll say that again: in 1994.
Unfortunately it had a seriously bandwidth-constrained DRAM interface, which was a shame.
The 21066/21068 were used in a couple of models of DEC VMEbus boards and the DEC Multia ultra-small Windows NT desktop, which was later sold as the "universal desktop box", because it could run NT (supported), OpenVMS (worked but unsupported), or Linux (customer chooses supported or not). The 21066/21068 weren't used in any close-to-mainstream Alpha systems, not least because of the performance issues.
Alternatively, someone may be (mis?)remembering that early Alpha chips didn't directly support byte write instructions or the associated memory interface logic, which meant that modifying one byte in a word was a read/modify/write operation. I wonder if this is confusing the picture here.
The compilers knew how to hide this byte-size operation, but in code that did a lot of byte writes the impact was sometimes visible, which is why hardware support was added quite quickly in the next generation of Alpha designs (and DEC's own compilers were changed to make the hardware support accessible).
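For anyone who hasn't met this before, here is a rough C sketch of what a byte store turns into when the hardware can only load and store whole aligned 64-bit words. It is an illustration only, not actual compiler output: the function name store_byte_rmw is made up for this post, and the comments pointing at the Alpha LDQ_U/MSKBL/INSBL/STQ_U instructions are a loose analogy, assuming little-endian byte numbering.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (name invented for illustration): store one byte
 * into memory that can only be accessed as aligned 64-bit words. */
void store_byte_rmw(uint64_t *mem, size_t byte_index, uint8_t value)
{
    assert(mem != NULL);

    size_t word = byte_index / 8;                     /* which 64-bit word holds the byte */
    unsigned shift = (unsigned)(byte_index % 8) * 8;  /* bit position of the byte, little-endian */

    uint64_t q = mem[word];                 /* read the whole word (roughly LDQ_U) */
    q &= ~((uint64_t)0xFF << shift);        /* clear the target byte (roughly MSKBL) */
    q |= (uint64_t)value << shift;          /* drop in the new byte (roughly INSBL + BIS) */
    mem[word] = q;                          /* write the whole word back (roughly STQ_U) */
}
```

That is one load, a few ALU operations and one store where a byte-capable machine needs a single store, which is why byte-heavy code could show the cost. Note also that the sequence is not atomic, so anything else updating a neighbouring byte in the same word at the same moment needs extra care.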
As for Mr Chen's original Alpha-related comments: (a) it's hearsay (b) it's somewhat short of logic (at least re hardware constraints), and even shorter on hardware facts.
References include:
Alpha Architecture Handbook (generic architecture handbook, freely downloadable), see e.g.
https://www.cs.arizona.edu/projects/alto/Doc/local/alphahb2.pdf
Alpha Architecture Reference Manuals (2nd edition or later, if you want the byte-oriented stuff)
Digital Technical Journal article on the 21066:
http://www.hpl.hp.com/hpjournal/dtj/vol6num1/vol6num1art5.pdf