Here's the thing.
Suppose I want to buy a gizmo on Amazon. There are a hundred different gizmo brands, and they are sold by a thousand different sellers. Because of this, deciding exactly which gizmo I'm going to buy is not trivial.
If I'm on the website, I can enter 'gizmo' in the search box, then flag a few checkboxes to refine my search, and then scan 20 results.
If I'm on Alexa, I can ask for a gizmo, and then what? What's the voice UX equivalent of reading the filter options and deciding which ones I want to flag? Do I just start qualifying my desired gizmo, without knowing what attributes Amazon's back end is actually capable of searching? Do I listen to an enormous list of flags, and then enunciate the ones I want to activate? How long is that going to take, even in the best case?
When I finally manage to get results, what's the UX equivalent of scanning the list with my eyeballs? Do I listen to the title of each entry? Title + price? Title + price + avg user review? Again, how long is that going to take, even in the best case?
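To put a rough number on "how long", here is a back-of-envelope sketch. Every figure in it is an assumption chosen for illustration (the speech rate, the words per title, price and rating), not a measurement of Alexa or of Amazon listings; only the 20 results come from the website example above.

```python
# Back-of-envelope estimate of how long it takes to listen to search results.
# All constants are illustrative assumptions, not measured values.
WORDS_PER_MINUTE = 150   # assumed speaking rate of the voice assistant
WORDS_PER_TITLE = 12     # assumed average length of a product title
WORDS_PER_PRICE = 4      # assumed, e.g. "nineteen dollars ninety-nine"
WORDS_PER_RATING = 6     # assumed, e.g. "four point three stars on average"
RESULTS = 20             # the 20 results scanned on the website


def listening_seconds(words_per_result, results=RESULTS, wpm=WORDS_PER_MINUTE):
    """Seconds needed to hear `results` entries of `words_per_result` words each."""
    return results * words_per_result * 60 / wpm


title_only = listening_seconds(WORDS_PER_TITLE)
title_price = listening_seconds(WORDS_PER_TITLE + WORDS_PER_PRICE)
title_price_rating = listening_seconds(
    WORDS_PER_TITLE + WORDS_PER_PRICE + WORDS_PER_RATING
)

print(f"title only:             {title_only:.0f} s")
print(f"title + price:          {title_price:.0f} s")
print(f"title + price + rating: {title_price_rating:.0f} s")
```

Under these assumptions, hearing 20 titles takes about a minute and a half, and adding price and rating pushes it to roughly three minutes, with no pauses and no repeats. Scanning the same list by eye takes a few seconds.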
Or is the idea that the AI in Alexa's back end will be able to figure out exactly what I want, just because it knows me so well? Okay, but if that were feasible, then why do searches on the website consistently return tons of irrelevant results? And that's on the website, where I can provide much clearer query parameters. Why would I trust it to work any better by voice? Am I supposed to just say "confirm order" and pray that it has picked a gizmo that's mostly what I want? Not gonna happen.
Alexa as a shopping assistant might, just might, start making sense when the website search starts working not just "well", but "almost miraculously well". Only then might it make sense to attempt a voice shopping assistant.