Still humans in the mix here, not just machines.
"it should be possible to identify html or xml just by looking at the first few characters" So you have to open the file??? Trusting the OS to identify it is a recipe for disaster, and it doesn't work in a text interface.
How about a simple convention, say a 3-letter label on each file, so it's human-readable in every context and you at least get an idea of what it contains BEFORE you look inside it?
Those labels never should've been hidden imho; that way lies pain and confusion.
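
To make the difference concrete, here's a rough Python sketch of the two approaches. The function names are made up for illustration, and the content check only covers the most obvious HTML/XML openings, so treat it as a toy rather than a real detector. The point is that the first function only needs the name, while the second needs bytes from inside the file.

```python
from pathlib import Path


def kind_from_name(filename: str) -> str:
    """Guess the type from the label after the last dot. Never opens the file."""
    suffix = Path(filename).suffix.lower().lstrip(".")
    return suffix or "unknown"


def kind_from_content(first_bytes: bytes) -> str:
    """Guess the type from the first few bytes. Someone has to open the file to get them."""
    head = first_bytes.lstrip().lower()
    if head.startswith(b"<?xml"):
        return "xml"
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "html"
    return "unknown"
```

With the first one, a plain directory listing in a terminal already tells you what you're dealing with; with the second, you (or some tool) have to read the file before you know anything.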