@Michael
You really need to read the paper; the Reg's description has no useful information about how it works, or in fact what it does.
To the extent I understand the paper, it appears to filter its frequency bins in the time domain. If each bin is assumed to be dominated by (or, in the special case, to contain only) a single frequency, it's trivially easy to sample the remaining sine wave and deduce its exact frequency and phase. The filter design seems to be about cleanly splitting the bins so that it's safe to process the signal in each bin on its own rather than the entire source sample.
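A minimal sketch of the "one frequency per bin" idea, in Python with NumPy. It is not the paper's filter: as a simplifying assumption it splits the spectrum into bins by plain subsampling (aliasing), which only works when the tones happen to land in different bins, and it does nothing about noise. What it does show is why a bin holding a single tone is easy: shifting the samples by one step rotates that tone's phase by 2πk/N, so two small FFTs give its exact frequency k and its complex amplitude (magnitude and phase). The function name `recover_one_tone_per_bin` and its parameters are made up for illustration.

```python
import numpy as np

def recover_one_tone_per_bin(x, B, tol=1e-9):
    """Recover {frequency index: DFT coefficient} from a length-N signal,
    ASSUMING each of the B aliasing bins holds at most one non-zero frequency."""
    N = len(x)
    L = N // B                       # subsampling stride; assumes B divides N
    # Keeping every L-th sample folds the spectrum: bin b of a length-B FFT
    # collects every frequency k with k % B == b, scaled by B/N.
    Y0 = np.fft.fft(x[0::L])         # x[0], x[L], x[2L], ...
    Y1 = np.fft.fft(x[1::L])         # the same grid shifted by one sample
    found = {}
    for b in range(B):
        if abs(Y0[b]) < tol:         # empty bin: nothing folded into it
            continue
        # A one-sample shift multiplies a lone tone k by exp(2j*pi*k/N), so the
        # phase of Y1/Y0 gives k exactly (modulo N) when the bin holds one tone.
        k = int(round(np.angle(Y1[b] / Y0[b]) * N / (2 * np.pi))) % N
        found[k] = Y0[b] * N / B     # undo the B/N folding scale: amplitude and phase
    return found

# Usage: three tones in a 1024-point signal, landing in distinct bins mod 16.
N, B = 1024, 16
true = {37: 2.0 + 1.0j, 300: -1.5j, 1000: 0.5}
X = np.zeros(N, dtype=complex)
for k, v in true.items():
    X[k] = v
x = np.fft.ifft(X)                   # time-domain signal with that sparse spectrum
est = recover_one_tone_per_bin(x, B)
print(sorted(est))                   # [37, 300, 1000]
```

Two tones sharing a bin would make the phase ratio meaningless; handling those collisions is exactly what a real filter design has to do, and this sketch ignores it.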
One worry is that the algorithm is probabilistic: the quoted times are *expected runtime*, and they don't make clear what the worst case is or how small a sample would provoke it. For audio encoding I can see an adaptive search seeded by the previous frame fixing this, and it might be useful in real life, with masking effects allowing quite ruthless frequency decimation.
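A rough sketch of the adaptive search suggested above, under loudly stated assumptions: the function `track_frame`, the `energy_frac` threshold and the fallback rule are all hypothetical, it uses ordinary DFT projections rather than any sparse-FFT machinery, and it ignores psychoacoustic masking entirely. The point is only the control flow: try the previous frame's frequencies first at O(N) cost each, and if they no longer capture enough of the energy, fall back to a plain FFT so the worst case is bounded and deterministic.

```python
import numpy as np

def track_frame(frame, prev_freqs, energy_frac=0.99):
    """HYPOTHETICAL frame-to-frame search: test the previous frame's frequency
    bins first, and pay for a full FFT only when they stop explaining the frame."""
    N = len(frame)
    n = np.arange(N)
    total = np.sum(np.abs(frame) ** 2)
    # Direct DFT of each candidate bin: O(N) per candidate, no full transform.
    coeffs = {int(k): np.sum(frame * np.exp(-2j * np.pi * k * n / N)) for k in prev_freqs}
    # Parseval: energy held by a set of bins is sum(|X[k]|^2) / N.
    captured = sum(abs(c) ** 2 for c in coeffs.values()) / N
    if total > 0 and captured >= energy_frac * total:
        return coeffs, False                 # seed still good: no search needed
    # Fallback bounds the worst case at one deterministic O(N log N) FFT,
    # then keeps the fewest bins that reach the energy target.
    X = np.fft.fft(frame)
    order = np.argsort(np.abs(X))[::-1]
    cum = np.cumsum(np.abs(X[order]) ** 2) / N
    m = int(np.searchsorted(cum, energy_frac * total)) + 1
    return {int(k): X[k] for k in order[:m]}, True

# Usage: two tones; the first call is seeded correctly, the second is not.
N = 256
n = np.arange(N)
frame = np.exp(2j * np.pi * 10 * n / N) + 0.5 * np.exp(2j * np.pi * 50 * n / N)
hit, searched1 = track_frame(frame, [10, 50])
miss, searched2 = track_frame(frame, [10])
print(searched1, searched2, sorted(miss))    # False True [10, 50]
```

A real encoder would replace the fixed energy target with a masking threshold, which is where the "ruthless decimation" would come from.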
For video encoding they misrepresent the nature of DCT block-based coding. We don't pick a number of Fourier addends up front and throw away all the others (the 57 of 64 argument); we compute them all *then* let the arithmetic decimate them dynamically and automatically. With no search being made, an inherently messier signal, and less scope than audio for simply discarding components, it looks like a bad fit. I'd say the same applies to non-block-based encodings as well.
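A small sketch of the "compute them all, then decimate" point, assuming an 8x8 orthonormal DCT-II and a single uniform quantiser step of 16 (real codecs use per-frequency quantisation tables followed by entropy coding, both left out here). Every block gets all 64 coefficients; how many survive quantisation depends on the block's content rather than on a budget fixed in advance. `dct_matrix` and `quantise_block` are illustrative names, not any codec's API.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis: row k holds the k-th cosine basis vector."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    C[0, :] /= np.sqrt(2.0)          # DC row scaled so C @ C.T == identity
    return C

def quantise_block(block, step=16.0):
    """Compute ALL coefficients of the block, then quantise with a single
    uniform step (an assumption; real codecs use per-frequency tables)."""
    C = dct_matrix(block.shape[0])
    coeffs = C @ block @ C.T         # separable 2D DCT-II
    return np.round(coeffs / step).astype(int)

# Usage: the same code path keeps very different numbers of coefficients.
i, j = np.meshgrid(np.arange(8), np.arange(8), indexing="ij")
smooth = 128 + 4.0 * i + 2.0 * j                          # gentle gradient
noisy = np.random.default_rng(0).uniform(0, 255, (8, 8))  # texture-like detail
kept_smooth = np.count_nonzero(quantise_block(smooth))
kept_noisy = np.count_nonzero(quantise_block(noisy))
print(kept_smooth, kept_noisy)
```

With these inputs the gentle gradient keeps only a handful of non-zero coefficients while the noisy block keeps most of its 64, and nothing had to decide that count up front.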