The basic problem isn't physical, as in what your eyes or hands are doing. It's where your mind is focused. Is it focused on the outside environment and what is going on there, or is it focused internally on some discussion: picturing who's speaking, what they are speaking about, or what that coffee shop and the neighborhood around it look like? It doesn't matter whether it's a heads-up display, a phone conversation, or a text message.
The difference between those and the car radio is that all of them are two-way communications where a response is expected, so your attention must go into composing one. The car radio is one-way: no response is needed or expected. You can sing along or shake, rattle, and roll, but nothing is expected of you in return.