I'm having trouble seeing what the publishers are complaining about. From reading the article, I get the impression that the text is created and viewable by the listener only while the same phrase or sentence is being spoken, in which case the text is as transitory as the audio component.
Surely the publishers should have raised their objections at the stage when audio books were first created.