* Posts by mr_z

1 post • joined 29 Sep 2014

How the FLAC do I tell MP3s from lossless audio?


The frequency domain adds pre-echo

If you just sample a low-pass filtered signal, even with a bit of jitter, and play it back, you won't add appreciable artifacts. Yes, there will be some, but the artifacts inherent in MP3 and related algorithms are orders of magnitude larger.

MP3 divides the signal into frames and applies a Modified Discrete Cosine Transform (MDCT) to each one. This transforms the signal from the time domain to the frequency domain. Then, it compresses the MDCT coefficients by quantizing them, guided by a psycho-acoustic model.
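
A rough sketch of that frame, transform, quantize, inverse-transform loop may help make it concrete. This is not MP3's actual pipeline: it leaves out the analysis window and the psycho-acoustic model, and stands in a single uniform step size for the real bit allocation. The names `mdct`, `imdct`, `round_trip` and `step` are made up for illustration.

```python
import math

def mdct(frame):
    """Turn 2N time-domain samples into N frequency-domain coefficients."""
    n = len(frame) // 2
    return [
        sum(x * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
            for t, x in enumerate(frame))
        for k in range(n)
    ]

def imdct(coeffs):
    """Turn N coefficients back into 2N time-aliased samples."""
    n = len(coeffs)
    return [
        sum(c * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
            for k, c in enumerate(coeffs)) / n
        for t in range(2 * n)
    ]

def round_trip(signal, n, step):
    """Transform 50%-overlapped frames, quantize, invert, and overlap-add.

    step is a hypothetical uniform quantizer step; step=0 skips quantization.
    """
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * n + 1, n):
        coeffs = mdct(signal[start:start + 2 * n])
        if step > 0:
            coeffs = [step * round(c / step) for c in coeffs]
        # Overlap-add: aliasing from neighbouring frames cancels out.
        for t, y in enumerate(imdct(coeffs)):
            out[start + t] += y
    return out

n = 8
signal = [math.sin(2 * math.pi * 3 * t / 64) for t in range(64)]
interior = range(n, len(signal) - n)  # samples covered by two frames

exact = round_trip(signal, n, step=0)
lossy = round_trip(signal, n, step=0.5)
err_exact = max(abs(exact[t] - signal[t]) for t in interior)
err_lossy = max(abs(lossy[t] - signal[t]) for t in interior)
print("max error without quantization:", err_exact)
print("max error with quantization:   ", err_lossy)
```

Without quantization the overlapped frames reconstruct the interior of the signal to within floating-point error; all of the loss comes from rounding the coefficients.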

(Psycho-acoustic model means: "We've algorithmically determined that you can't hear this thing we're throwing away." It's based on many studies of the masking effects inherent in human hearing, such as not being able to hear certain sounds after a loud plosive sound, etc.)

Quantizing in the frequency domain adds non-causal artifacts to the signal. What do I mean by "non-causal"? You can get what some call a _pre-echo_ before sharp time-domain discontinuities in the input, such as percussive sounds. Pre-echo is what makes percussion sound "muddy" or "blurred". You start to hear a snare hit or cymbal before it's been hit.

That's why I call it non-causal: Analog filtering and properly designed digital filtering don't change the leading edge of a discontinuity; rather, there's an impulse response that appears after the discontinuity. But, with frequency domain quantization, the artifacts get spread to both sides.
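
The spreading is easy to see in a toy example. The sketch below is a simplification, not MP3 itself: it uses a plain orthonormal DCT-II over one 64-sample block instead of MP3's overlapped MDCT, and one uniform step size instead of a psycho-acoustic model. The block is silent until a single click at sample 48. `ONSET`, `STEP` and the function names are hypothetical.

```python
import math

def dct(x):
    """Orthonormal DCT-II of a block of samples."""
    n = len(x)
    return [
        math.sqrt((1 if k == 0 else 2) / n)
        * sum(v * math.cos(math.pi * (t + 0.5) * k / n) for t, v in enumerate(x))
        for k in range(n)
    ]

def idct(c):
    """Inverse of dct() above (an orthonormal DCT-III)."""
    n = len(c)
    return [
        sum(math.sqrt((1 if k == 0 else 2) / n) * v
            * math.cos(math.pi * (t + 0.5) * k / n) for k, v in enumerate(c))
        for t in range(n)
    ]

N, ONSET, STEP = 64, 48, 0.1  # illustrative values only

# Silence, then a single click: a crude stand-in for a snare hit.
click = [0.0] * N
click[ONSET] = 1.0

coeffs = dct(click)
quantized = [STEP * round(c / STEP) for c in coeffs]
decoded = idct(quantized)

# Energy in the samples *before* the click arrives.
pre_before = sum(v * v for v in click[:ONSET])
pre_after = sum(v * v for v in decoded[:ONSET])
print("energy before the onset, original:", pre_before)
print("energy before the onset, decoded: ", pre_after)
```

The original has exactly zero energy ahead of the click. The decoded block does not, because each coefficient's rounding error turns back into a cosine that spans the whole block, including the part before the onset.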

You've likely already experienced this elsewhere: highly compressed JPEGs and MPEG video! Take a look at what JPEG and MPEG do to areas of sharp contrast, such as text. You see "sparkles", "ringing" or "mosquito noise" on all sides. Both are based around a similar frequency domain transform, the DCT, and both perform similar quantization, only in two dimensions (horizontal and vertical) rather than one (time).

But the artifacts arise from the same place, mathematically.
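
For the two-dimensional case, here is a minimal sketch on one 8x8 block with a hard vertical edge, roughly what the side of a black letter on a white background looks like. It assumes one uniform step size for every coefficient, where JPEG uses a per-frequency quantization table, and it skips everything else a real codec does. `STEP`, `apply_2d` and `spread` are illustrative names.

```python
import math

def dct(x):
    """Orthonormal DCT-II of one row or column."""
    n = len(x)
    return [
        math.sqrt((1 if k == 0 else 2) / n)
        * sum(v * math.cos(math.pi * (t + 0.5) * k / n) for t, v in enumerate(x))
        for k in range(n)
    ]

def idct(c):
    """Inverse of dct() above."""
    n = len(c)
    return [
        sum(math.sqrt((1 if k == 0 else 2) / n) * v
            * math.cos(math.pi * (t + 0.5) * k / n) for k, v in enumerate(c))
        for t in range(n)
    ]

def apply_2d(transform, block):
    """Apply a 1-D transform to every row, then to every column."""
    rows = [transform(list(r)) for r in block]
    cols = [transform(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]

STEP = 50  # hypothetical uniform step, chosen coarse to exaggerate the effect

# Left half black (0), right half white (255).
block = [[0] * 4 + [255] * 4 for _ in range(8)]

coeffs = apply_2d(dct, block)
quantized = [[STEP * round(c / STEP) for c in row] for row in coeffs]
decoded = apply_2d(idct, quantized)

def spread(pixels):
    """Difference between the brightest and darkest pixel."""
    return max(pixels) - min(pixels)

dark_before = [row[c] for row in block for c in range(4)]
dark_after = [row[c] for row in decoded for c in range(4)]
print("variation in the flat black area, original:", spread(dark_before))
print("variation in the flat black area, decoded: ", spread(dark_after))
```

The black half is perfectly flat going in and visibly uneven coming out: that ripple next to the edge is the ringing, and it comes from the same coefficient rounding as the audio case.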

If you read the design documents on Ogg Vorbis, you'll see its designers are very sensitive to the issue of pre-echo.

There are other artifacts I can hear in MP3 (especially heavily-compressed MP3) that others don't notice. There are burbles, the occasional tone that sounds like Morse code, and so on. These too are artifacts of popping to the frequency domain and quantizing frequencies to varying degrees.

As for the idea that "most people hear differently": Because I've worked with our digital video folks, I'm quite sensitive to video artifacts, including DCT artifacts, but also spatial domain quantization (resulting in "contouring") and so forth. My wife and friends never really noticed many of these until I started pointing them out. Now they hate me for "ruining" them. ;-)

All that said: I can definitely hear artifacts in 128kbps CBR recordings, fewer in 192kbps VBR recordings, and rarely or never at 256kbps or 320kbps. At 320kbps, you're only compressing about 4.4:1 relative to uncompressed CD audio (roughly 1,411kbps), and so you're leaving most of the signal intact.
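
To put rough numbers on those bitrates, the arithmetic below assumes the source is uncompressed CD audio (44.1kHz, 16-bit, stereo). That baseline is an assumption; a different source format would give different ratios. Against it, 320kbps works out to roughly 4.4:1. The function name is made up for illustration.

```python
# Assumed baseline: CD audio at 44,100 samples/s, 16 bits per sample, 2 channels.
CD_KBPS = 44_100 * 16 * 2 / 1000  # 1411.2 kbps

def compression_ratio(mp3_kbps, source_kbps=CD_KBPS):
    """How many times smaller the MP3 stream is than the source stream."""
    return source_kbps / mp3_kbps

ratios = {kbps: compression_ratio(kbps) for kbps in (128, 192, 256, 320)}
for kbps, ratio in ratios.items():
    print(f"{kbps} kbps -> about {ratio:.2f}:1")
```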

Likewise, I rarely notice JPEG artifacts on something compressed at 90% quality or higher, but then the compression ratio also drops significantly compared to lower quality levels. At that point, if it has a lot of text, you may be better off with PNG anyway.


Biting the hand that feeds IT © 1998–2020