The fundamental things apply
The issue of bandwidth (or 'quantity' of information) of electromagnetic radiation is not as complex as the authors of the 'twisted state' stories present it. Let me break it down for you:
There are four properties of electromagnetic radiation: (i) the amplitude (strength), (ii) the frequency, (iii) the polarization, and (iv) the localization.
The limit of information content as a function of (i) is trivial: in the ultimate limit one might use a single photon to transmit a '1' and the absence thereof to transmit a '0'. In practice, there is noise to consider.
Point (ii) is a bit tricky to classify: if you want to transmit information with a well-defined frequency (leaving the neighboring frequencies open for another sender), then you need a long pulse (the wave frequency of a short pulse is not well defined). So communication becomes slow. If you use shorter pulses, you can communicate fast, but you need more frequency space for it. In the end, Heisenberg's uncertainty limit tells you how much information you can get through a limited frequency range. It turns out there is a hard limit: it does not matter whether you use short pulses with broad frequency or long pulses with narrow frequency.
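To put a number on the pulse-length versus frequency-width trade-off, here is a minimal sketch. It assumes Gaussian pulses (my choice for illustration, nothing above requires that shape) and the function names are made up. For a Gaussian, the RMS duration times the RMS spectral width of the intensity works out analytically to 1/(4π), roughly 0.0796, no matter how short or long the pulse is; the code checks that numerically with a brute-force Fourier sum.

```python
import cmath
import math


def rms_width(xs, weights):
    """RMS width of a distribution sampled at points xs with the given weights."""
    total = sum(weights)
    mean = sum(x * w for x, w in zip(xs, weights)) / total
    var = sum((x - mean) ** 2 * w for x, w in zip(xs, weights)) / total
    return math.sqrt(var)


def time_bandwidth_product(sigma_t, n=400):
    """Duration times spectral width for a Gaussian pulse (assumed shape).

    sigma_t is the RMS duration of the pulse intensity, in seconds.
    """
    # Field envelope chosen so that the intensity |E|^2 has RMS width sigma_t.
    t_max = 8.0 * sigma_t
    ts = [-t_max + 2.0 * t_max * i / (n - 1) for i in range(n)]
    field = [math.exp(-t * t / (4.0 * sigma_t ** 2)) for t in ts]

    # Frequency grid scaled as 1/sigma_t, wide enough to hold the whole spectrum.
    f_max = 1.0 / sigma_t
    fs = [-f_max + 2.0 * f_max * i / (n - 1) for i in range(n)]

    # Power spectrum from a direct (slow but dependency-free) Fourier sum.
    spectrum = [
        abs(sum(e * cmath.exp(-2j * math.pi * f * t) for e, t in zip(field, ts))) ** 2
        for f in fs
    ]
    intensity = [e * e for e in field]
    return rms_width(ts, intensity) * rms_width(fs, spectrum)


# A nanosecond pulse and a picosecond pulse: very different speeds and
# bandwidths, same product.
for sigma_t in (1e-9, 1e-12):
    print(sigma_t, time_bandwidth_product(sigma_t))
print("analytic value:", 1.0 / (4.0 * math.pi))
```

The picosecond pulse lets you send symbols a thousand times faster, but it also occupies a thousand times more spectrum, so the number of pulses per second per unit of bandwidth comes out the same.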
Point (iii) is trivial: there are two polarization states for the photon. You could call them horizontal and vertical or right-handed and left-handed, but whatever you call them, you can create your polarization state of choice as a sum of horizontally and vertically polarized beams. So by using polarization you can double the information content.
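The "whatever you call them" part can be sketched with Jones vectors, i.e. pairs of complex amplitudes for the horizontal and vertical components. The helper names are hypothetical, and which circular state gets called right-handed depends on a sign convention, so take the labels R and L below as an assumption.

```python
import math

# Basis states as (horizontal amplitude, vertical amplitude).
H = (1 + 0j, 0j)
V = (0j, 1 + 0j)


def combine(a, b):
    """Return the polarization state a*H + b*V."""
    return (a * H[0] + b * V[0], a * H[1] + b * V[1])


def inner(u, v):
    """Complex inner product <u|v>; zero means the two states are orthogonal."""
    return u[0].conjugate() * v[0] + u[1].conjugate() * v[1]


s = 1.0 / math.sqrt(2.0)
R = combine(s, -1j * s)  # circular (handedness label is convention-dependent)
L = combine(s, 1j * s)   # the opposite circular state
D = combine(s, s)        # diagonal linear polarization

# The two circular states are orthogonal, so they are just another basis.
print("overlap of R and L:", abs(inner(R, L)))

# Any state, here the diagonal one, can be rebuilt from the circular pair.
c_r, c_l = inner(R, D), inner(L, D)
D_again = (c_r * R[0] + c_l * L[0], c_r * R[1] + c_l * L[1])
print("diagonal rebuilt from R and L:", D_again)
```

Either way you slice it, the space has two dimensions, so you get two independent channels and no more.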
Point (iv) is the hardest to grasp, hence the big discussion about 'twisted' states. You can overlap multiple beams, which will lead to interference and a spatial localization of your photons. Indeed, a simple laser beam propagating through space can be properly described as the sum of many spherical waves which interfere constructively only in the forward propagating direction and cancel each other in all other directions. The 'twisted' states are nothing else but a slightly more complex superposition of waves. If you localize the intensity onto multiple detectors, then you can multiplex the information and transmit more thereof. But there is no magic to it: you could create an interference of 20 beams onto 20 detectors to multiplex your data 20-fold, or you could just send 20 beams separately. In the end it comes down to the technical practicality of the transmission and reception setup, but there is no magic increase in information density with those twisted pulses.
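Here is a toy version of the 20-beams example. It rests on assumptions that are mine: each 'twisted' mode is modeled as a phase ramp exp(2πi·l·k/N) around a ring of N detectors, the detectors measure the complex field rather than just intensity, and there is no noise. Real receivers are more involved, and the function names are invented. Under those assumptions the superposition is just a discrete Fourier transform, and undoing it hands back exactly N symbols from N detectors.

```python
import cmath
import math


def mix(symbols):
    """Superpose one phase-ramp mode per symbol; return the field at each detector."""
    n = len(symbols)
    return [
        sum(a * cmath.exp(2j * math.pi * l * k / n) for l, a in enumerate(symbols))
        / math.sqrt(n)
        for k in range(n)
    ]


def unmix(fields):
    """Project the detector fields back onto each mode (the inverse transform)."""
    n = len(fields)
    return [
        sum(f * cmath.exp(-2j * math.pi * l * k / n) for k, f in enumerate(fields))
        / math.sqrt(n)
        for l in range(n)
    ]


# Twenty bits, one per mode, sent as a single interfering superposition.
bits = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1]
detector_fields = mix(bits)
recovered = [round(abs(a)) for a in unmix(detector_fields)]

print("sent:     ", bits)
print("recovered:", recovered)
```

Twenty modes onto twenty detectors gives twenty bits per shot, which is what twenty separate beams onto twenty detectors would give you as well. The mixing is a change of basis, not a source of extra capacity.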
Hope this helps.