The grand age of digital music, along with the grand age of digital just-about-everything, holds one idea as sacred: files must be as small as possible so we can have moar files on our iPods, or phones, or computers, or anything else. We have taken as truth the axiom “more is better,” and left behind the famous “bigger is better.” Emasculation aside, compression of audio files is definitely the norm, and uncompressed audio is the exception. Do you buy CDs at a record store, or do you download your music via an online service like iTunes (which, with the advent of iCloud, just became much cooler), Amazon, or eMusic? Personally, I still buy almost all of my music on CD. That may make me sound old, but I’m only 25. My dad doesn’t even buy CDs anymore. Whatever your preference, I want to do some fun experiments with compressed audio to give you a better idea of what you are (or rather, aren’t) getting.
Lossy or Lossless
In any sort of file compression, there are two main types: lossy and lossless. They are what they sound like; a lossy algorithm throws some data away, and a lossless algorithm preserves every bit of it. JPEG, MP3, AAC (usually stored in .m4a files), and WMA are all examples of lossy compression, while ZIP, FLAC, ALAC, PNG, and GIF are examples of lossless compression (GIF is limited to 256 colors, though, and TIFF files can be uncompressed, lossless, or lossy). Lossy compression is typically used for distribution, while lossless is used for archival purposes. You wouldn’t want your website to have all the images be huge uncompressed TIFF files because it would kill your load times, so much smaller compressed formats like JPEG are used. But you also wouldn’t want to save that picture you’ve been working on for hours in Photoshop as a JPEG and use that as your backup; you want everything at the full, original quality. The reason I still buy CDs is that I want an uncompressed, and therefore lossless, copy of the original, not the less accurate version, even though in most cases the download is a little cheaper. I know it’s hard to hear the difference when you’re dealing with a 256 kbps AAC file or a 320 kbps MP3 file, but I can make those on my own from the CD.
A professor of music at Stanford named Jonathan Berger has been giving a listening test to his students at the beginning of every year, and every year the preference for the MP3 sound goes up. He says they prefer “the sizzle” that MP3s have, which is caused by MP3’s problem with encoding higher frequencies. Alright, here’s the test, of my own making. I have a clip from Digital Rain off of Star One’s new album Victims of the Modern Age encoded two ways: one is a WAV, and the other is an MP3. Listen, and rank them in your head. Pay especially close attention to the subtleties: reverbs, cymbal decay, and vocal decay. The answer is at the bottom of the post so you don’t accidentally see it. You’ll probably notice that both files are .wav files; the trick is that one of them was encoded to MP3 first and then converted back, so the file extension won’t give it away.
The basic theory behind MP3 (and other kinds of lossy audio compression) is auditory masking. Basically, due to the way we perceive audio, some sounds are capable of making other sounds inaudible, masking them from our sense of hearing. A guy named Richard H. Ehmer wrote a whole article on which frequencies affect others, and that’s where the base idea of the MP3 came from. An MP3 encoder removes the parts of the signal that we can’t perceive anyway, and then intelligently compresses what remains. This is why it’s hard to tell the difference between MP3 and WAV. MP3s do, however, struggle with frequencies above 16 kHz, which is why I asked you to pay attention to cymbal decay and reverb before; the loss is more apparent in those things. This is the same reason a lot of people prefer the MP3 sound. Because MP3s have less high-frequency content, they sound “bassier,” and nowadays that is touted as the greatest thing ever. High-fidelity audio does not sound “bassy,” but crisp and clear, which many people think is too tinny. Being able to appreciate good audio takes practice.
The successor to MP3 was Advanced Audio Coding, or AAC. AAC took the basic MP3 idea and improved it in many ways, making it the preferred way to encode audio. Among other things, AAC covers a wider frequency range, supports more channels, and compresses more efficiently, especially at lower bitrates like 128 kbps. When the bitrate is increased, however, the difference between MP3 and AAC is not as obvious. The big downside to AAC is the availability of players that support the format, even though that’s becoming more standard, and most mobile devices can play AAC.
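To put those bitrates in perspective, here is a rough sketch of the arithmetic in Python. The CD parameters (44.1 kHz, 16-bit, stereo) are the standard CD audio values and are my assumption here, not something measured for this post; the function names are just made up for illustration.

```python
# Standard CD audio parameters (assumed; these are not measurements from the post).
SAMPLE_RATE_HZ = 44_100
BIT_DEPTH = 16
CHANNELS = 2


def cd_bitrate_kbps():
    """Bitrate of uncompressed CD audio in kilobits per second."""
    return SAMPLE_RATE_HZ * BIT_DEPTH * CHANNELS / 1000


def megabytes_per_minute(kbps):
    """Approximate size of one minute of audio, with 1 MB = 1,000,000 bytes."""
    bits_per_minute = kbps * 1000 * 60
    return bits_per_minute / 8 / 1_000_000


formats = [
    ("CD / WAV", cd_bitrate_kbps()),
    ("320 kbps MP3", 320),
    ("256 kbps AAC", 256),
    ("128 kbps AAC", 128),
]

for label, kbps in formats:
    ratio = cd_bitrate_kbps() / kbps
    size = megabytes_per_minute(kbps)
    print(f"{label:>13}: {size:6.2f} MB/min, {ratio:5.2f}x smaller than CD")
```

Under those assumptions a 128 kbps file is roughly one eleventh the size of the CD audio it came from, which is a lot of data for the encoder to decide you won’t miss.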
I’m actually in the middle of a decision of how I’m going to encode my large-ish CD collection. Lossless is tempting, but players are limited, and I’m having a hard time deciding between the other formats.
Phase Invert Test
So now for the cool stuff. Below I’ve taken a clip from Opeth’s The Grand Conjuration, compressed it three different ways, flipped the phase of each compressed version, and played it along with the original clip. What that does, basically, is “subtract” the compressed version from the original WAV, leaving you with all the sounds that were excluded by the compression. Originally, I wanted to compare MP3, WMA, and AAC, but MP3 encoding adds a few milliseconds to the beginning of the file, and lining the clips up perfectly was difficult. Every time I tried it, I got a slightly different result, so I figured I would just do AAC, since that’s the easiest thing for me to encode into. Plus, iTunes is the most popular music player anyway.
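If you want to see the idea in code, here is a minimal sketch of a phase invert (null) test using NumPy. It is not what I actually ran for the clips below; the signals are synthetic stand-ins, the “lossy codec” is a pretend one that simply drops a quiet high tone, and the names `null_test` and `rms_db` are made up for this example. It also assumes the two signals are already sample-aligned, which is exactly the part that was hard with MP3.

```python
import numpy as np


def null_test(original, decoded):
    """Return the residual: whatever the codec left out of the original."""
    n = min(len(original), len(decoded))  # guard against a length mismatch
    # Mixing in a phase-inverted copy is the same as subtracting it.
    return original[:n] + (-decoded[:n])


def rms_db(signal):
    """RMS level in dB relative to full scale (1.0); -inf for pure silence."""
    rms = np.sqrt(np.mean(np.square(signal)))
    return -np.inf if rms == 0 else 20 * np.log10(rms)


# Synthetic stand-in for a clip: one second of a loud 440 Hz tone
# plus a quiet 17 kHz tone.
sample_rate = 44_100
t = np.arange(sample_rate) / sample_rate
low_tone = 0.5 * np.sin(2 * np.pi * 440 * t)
high_tone = 0.01 * np.sin(2 * np.pi * 17_000 * t)
original = low_tone + high_tone

lossless = original.copy()  # a lossless codec hands back identical samples
lossy = low_tone.copy()     # pretend codec that throws away the quiet high tone

silent = null_test(original, lossless)
leftover = null_test(original, lossy)

print("lossless residual:", rms_db(silent), "dB")
print("lossy residual:   ", round(rms_db(leftover), 1), "dB")
```

The lossless residual is exact silence, while the lossy residual is the quiet high tone the pretend codec discarded. A real encoder makes much more complicated choices, so its residual sounds like a ghost of the whole song rather than a single tone.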
This is the original clip.
Don’t worry, you’re not going crazy; this track is supposed to be silent. Since FLAC is a lossless codec, there is no leftover signal, which is a good thing. ALAC (Apple Lossless) gives the same result; I just didn’t want to put two silent tracks up.
128 kbps AAC (old iTunes Store)
That is what is not being encoded into your AAC file. You can still hear the song! You could sing along to the song if you wanted!
256 kbps AAC (new iTunes Store)
The interesting thing about this is that it sounds almost exactly the same as the previous one, but it’s just quieter. About 6 dB quieter, which is roughly half the amplitude, and that means that much more of the original song is in the file.
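For anyone wondering what 6 dB works out to, here is a quick sketch of the standard decibel conversions in Python. The 6 dB figure is my by-ear estimate from above, not a measurement, and the function names are just for illustration.

```python
def db_to_amplitude_ratio(db):
    """Convert a level change in dB to an amplitude (voltage-like) ratio."""
    return 10 ** (db / 20)


def db_to_power_ratio(db):
    """Convert a level change in dB to a power (energy-like) ratio."""
    return 10 ** (db / 10)


drop_db = -6  # approximate difference between the two leftover tracks
print(f"amplitude ratio: {db_to_amplitude_ratio(drop_db):.3f}")
print(f"power ratio:     {db_to_power_ratio(drop_db):.3f}")
```

So if the estimate is right, the 256 kbps leftovers have about half the amplitude, or about a quarter of the power, of the 128 kbps leftovers.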
Answers from above: option 2 is the WAV, and option 1 is the MP3, encoded at 128 kbps.