Linear distortion

Linear distortion testing in Loudspeakers

No topic is probably more cryptic for the average DIY'er. What I'll try to do is explain in plain (i.e. non-mathematical) English what it is, why it's important, and how you measure it. As I mentioned in my other primer, on nonlinear distortion, there are two types of distortion: nonlinear and linear.

Nonlinear distortion is said to occur when the output waveform has any frequency components not present in the original signal. If the electrical drive signal on a woofer's terminals is a 50 Hz signal, any frequencies besides 50 Hz on the acoustic output side (i.e. the sound from the speaker) are termed nonlinear distortion.
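To make the distinction concrete, here is a minimal Python sketch (NumPy assumed). The two "devices" are made-up stand-ins, not models of any real woofer: one only changes the level, the other adds a small cubic term. The sample rate and the 0.2 coefficient are arbitrary assumptions. Only the nonlinear one puts energy at a frequency (150 Hz) that was not in the 50 Hz drive signal.

```python
import numpy as np

fs = 8000                        # sample rate in Hz (assumed)
t = np.arange(fs) / fs           # exactly one second, so FFT bins are 1 Hz apart
x = np.sin(2 * np.pi * 50 * t)   # 50 Hz drive signal

linear_out = 0.5 * x             # hypothetical linear device: level change only
nonlinear_out = x - 0.2 * x**3   # hypothetical nonlinear device: adds a cubic term

def level_at(signal, freq_hz):
    """Amplitude of the FFT bin at freq_hz (bins are 1 Hz wide for a 1 s record)."""
    spectrum = np.abs(np.fft.rfft(signal)) / (len(signal) / 2)
    return spectrum[int(freq_hz)]

for name, y in [("linear", linear_out), ("nonlinear", nonlinear_out)]:
    print(name, "| 50 Hz:", round(level_at(y, 50), 4),
          "| 150 Hz:", round(level_at(y, 150), 4))
```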

If you haven't read my primer, take a look at it for more detail.

Linear distortion is said to occur if the system has a nonflat amplitude transfer function or if the group delay is not zero or constant.

What? "I thought you said it would be simple?" OK, let's simplify that a bit.

Linear distortion can occur for two reasons.

The first is a nonflat amplitude transfer function. Oh, come on, you know what this is. It's called frequency response. It's just a graph of the reproduced amplitude as a function of frequency (as opposed to amplitude as a function of time, i.e. the time domain). Don't let this stuff confuse you.

The second is a bit more confusing and has to do with the phase shift that can occur. A signal has an amplitude, but it also has a phase characteristic. If the amplitude relationships are reproduced correctly, but the phase relationships are not, this can cause linear distortion. A certain amount of phase shifting between frequencies occurs wherever there is nonflat frequency response. But a device can have a flat amplitude transfer function and still have this phase shifting going on between adjacent frequencies. This is the crux of the debate regarding "transient perfect" crossovers.
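A flat amplitude response with phase shifting still going on is easy to demonstrate numerically. The sketch below (Python with NumPy, both assumed) evaluates a textbook first-order all-pass filter. The 1 kHz corner frequency is an arbitrary choice for illustration and is not taken from any particular crossover. The magnitude stays at 0 dB everywhere, yet the group delay is clearly not constant.

```python
import numpy as np

f0 = 1000.0                              # all-pass corner frequency in Hz (assumed)
f = np.linspace(20.0, 20000.0, 2000)     # frequency axis in Hz
w = 2 * np.pi * f
w0 = 2 * np.pi * f0

# First-order all-pass: H(jw) = (1 - jw/w0) / (1 + jw/w0)
H = (1 - 1j * w / w0) / (1 + 1j * w / w0)

magnitude_db = 20 * np.log10(np.abs(H))  # amplitude response in dB
phase = np.unwrap(np.angle(H))           # phase response in radians
group_delay = -np.gradient(phase, w)     # seconds: minus d(phase)/d(omega)

print("largest deviation from flat (dB):", np.max(np.abs(magnitude_db)))
print("group delay at 20 Hz (ms): ", 1e3 * group_delay[0])
print("group delay at 20 kHz (ms):", 1e3 * group_delay[-1])
```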

So, what's the big deal? Who cares if the frequency response isn't perfectly flat? And everyone knows the slight phase distortion that occurs with a typical 2nd or 4th order LR crossover isn't audible, right?

Well, here's the big deal.

Linear distortion is transient distortion

It has to be a big deal. It's in 14 point type and bold.

All poor humor aside, if you forget everything else, remember this. If there is significant amplitude variation, or phase variation (as defined above) in a given frequency range, every transient that has frequency components in this range will be distorted. The amount of distortion and its audibility will vary, but this principle will not. Don't forget it.

Before we veer off on a transient, uh, tangent, let me go back and I'll make a simple analogy. That's always a bit dangerous, because analogies are never completely correct. What the heck though. I think it's important because phase confuses people. Phase turns folks off. Try it at a cocktail party if you don't believe me.

Think of a signal as having 3 major characteristics: a frequency, or set of frequencies; an amplitude, or set of amplitudes; and a phase, or set of phases. This is analogous to a word. A word is made of a letter or letters - these would be the frequencies. The letters also have a certain size, or amplitude. (Capital, lower case, font, etc.) They also have a certain order. Dog is not the same as Dgo, even though the letters and capitalization are the same. The order is analogous to the phase. So to get the word right, all of these need to be correct.

To reproduce a signal, amplitude, phase, and frequency have to be correct.
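The "Dog" versus "Dgo" idea can be checked with a few lines of Python (NumPy assumed). The two test signals below are invented for illustration: both contain a 1 kHz component and a 3 kHz component at one third the amplitude, and only the phase of the 3 kHz part differs. An amplitude-only spectrum cannot tell them apart, but the waveforms are not the same.

```python
import numpy as np

fs = 48000                    # sample rate in Hz (assumed)
t = np.arange(480) / fs       # 10 ms: whole cycles of both 1 kHz and 3 kHz

# Same frequencies, same amplitudes; only the phase of the 3 kHz part differs.
a = np.sin(2 * np.pi * 1000 * t) + (1 / 3) * np.sin(2 * np.pi * 3000 * t)
b = np.sin(2 * np.pi * 1000 * t) + (1 / 3) * np.sin(2 * np.pi * 3000 * t + np.pi)

mag_a = np.abs(np.fft.rfft(a))
mag_b = np.abs(np.fft.rfft(b))

print("amplitude spectra match:", np.allclose(mag_a, mag_b, atol=1e-6))
print("peak of a:", np.max(np.abs(a)))
print("peak of b:", np.max(np.abs(b)))
```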

Got it.

Now, what you really want to know. How does variation in amplitude, phase, or frequency distort a transient? Well, to answer that, you have to define a transient. Let's start very loosely by saying a transient is any signal with a defined start and stop point. There, that pretty much covers every real world signal. But not every signal. What it doesn't cover is a mathematical definition, say, of a sine wave that goes from negative infinity to positive infinity. Anything shorter than this is a transient.

Why am I babbling on about this? Because of a guy named Fourier who, in the early 1800s, made a rather bold claim that any periodic signal can be constructed out of a series of sine waves of different amplitudes, frequencies and phases. It's a fascinating idea and mathematicians spent the better part of that century trying to decide if this were true. And it is. So, to wear out my analogy a bit more, every periodic signal can be thought of as a word made up of letters which are the different sine wave components. A transient can be made into a periodic signal by stringing the transients together from negative infinity to positive infinity. So, in essence, a transient is also made of different sine wave components of varying frequency, amplitude, and phase.
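As a small illustration of building a periodic signal out of sine waves, the Python sketch below (NumPy assumed) adds up the odd harmonics of a square wave. The square wave is just a convenient textbook example chosen here, not a signal used elsewhere in this primer. The more sine waves you add, the closer the sum gets to the target.

```python
import numpy as np

t = np.linspace(0.0, 1.0, 2000, endpoint=False)   # one period, arbitrary time units
target = np.sign(np.sin(2 * np.pi * t))           # ideal square wave

def partial_sum(n_terms):
    """Sum of the first n_terms odd harmonics of the square wave's Fourier series."""
    total = np.zeros_like(t)
    for k in range(n_terms):
        n = 2 * k + 1                             # harmonic number: 1, 3, 5, ...
        total += (4 / np.pi) * np.sin(2 * np.pi * n * t) / n
    return total

def rms_error(n_terms):
    """RMS difference between the partial sum and the ideal square wave."""
    return float(np.sqrt(np.mean((partial_sum(n_terms) - target) ** 2)))

for n_terms in (1, 5, 50):
    print(n_terms, "sine waves -> RMS error", round(rms_error(n_terms), 3))
```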

A transient, by definition, is not one frequency. It is a spread of frequencies. There is no such thing as a transient of 1k. It may be centered at 1k, but it consists of a spread of frequencies. The only signal that has energy at one frequency is a single frequency sine wave starting at negative infinity and going to positive infinity. So, the crash of a cymbal, the "t" in "bat" - these are all transients that have sine wave components spread across a spectrum of frequencies.

OK, my head is starting to hurt. Let's have some practical examples and pretty graphs to look at. Quick.

Well, let's take a typical spoken transient. The sound "t" in bat, as above. Or bath, bad, red, bed, shed. You get the idea. These consonants are typically 5-20ms in duration. You can record one and use it as a transient, but it's simpler and more reproducible to make one up. There is a legion of ways to make a transient, but let's keep it very simple. Let's just window a single sine wave with some type of envelope. Specifically, we'll take a cosine shaped "envelope" multiplied by the 1k signal. Below you can see three different window lengths of a 1k signal.
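For anyone who wants to make these bursts at home, here is a minimal Python sketch (NumPy assumed). It assumes the "cosine shaped" envelope is a raised cosine (zero at both ends, one in the middle) and a 48 kHz sample rate. Both are guesses made for illustration, and `windowed_burst` is a made-up helper name.

```python
import numpy as np

fs = 48000  # sample rate in Hz (assumed)

def windowed_burst(freq_hz, duration_s, fs=fs):
    """Sine wave at freq_hz multiplied by a raised-cosine envelope of length duration_s."""
    n = int(round(duration_s * fs))
    t = np.arange(n) / fs
    envelope = 0.5 * (1 - np.cos(2 * np.pi * t / duration_s))  # 0 -> 1 -> 0 over the burst
    return t, envelope * np.sin(2 * np.pi * freq_hz * t)

# Three window lengths of the same 1 kHz sine wave: 4 ms, 20 ms and 100 ms.
bursts = {ms: windowed_burst(1000.0, ms / 1000.0) for ms in (4, 20, 100)}

for ms, (t, x) in bursts.items():
    print(f"{ms} ms burst: {len(x)} samples, peak {np.max(np.abs(x)):.3f}")
```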

In a sense, these are all 1k, but they are not all the same, are they? If I say, look at the graph with the 1k signal, which one am I referring to? The "fundamental frequency" is 1k, but something else is going on. Like I said, really what these graphs represent is a 1k fundamental plus a series of sine waves at different frequencies, all added together to get these resultant graphs.

Just to hammer home the point, here's an expanded graph.

The 4ms transient in aqua has come and gone before the 100ms transient has started. In fact, the 20ms transient in red is almost gone before the 100ms signal has any significant amplitude.

"They still all look like 1k signals to me?"

Still don't believe me about that "it's really a series of multiple frequencies" thing? Well, I can experimentally show it. Let's feed each one of these bursts through my "spectrum" analyzer to see the range of frequencies that each signal contains. What's a spectrum analyzer? It's just a device that analyzes a "real" time domain signal and looks at how much energy is distributed across the frequency spectrum. Like those bar graph displays on some graphic equalizers, but a bit more sophisticated.

Now watch what happens.

All the graphs are centered on 1k. But they are not just single lines suggesting one frequency. And, the width of the graphs is different. Think of the area under each curve representing the distribution of frequency energy. The 100ms burst has almost all of its frequency energy in a very small region between 950-1050 Hz. On the other hand, there is significant energy at frequencies below 700 and above 1.4k for the FFT of the 4ms burst shown as the aqua curve.

Notice something else. Not only do the graphs show a spread, or spectrum of frequency content, but the content spreads out as the transient gets narrower. This leads to another one of those statements in bold.

The narrower the transient in the time domain, the wider the frequency spread in the frequency domain.

Specifically, take a look at the 4ms windowed 1k sine wave. There is significant frequency content from 600 Hz to 1.4k and beyond.
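The narrower-in-time, wider-in-frequency rule can be checked with an FFT. The Python sketch below (NumPy assumed) rebuilds the three bursts with an assumed raised-cosine envelope and measures how wide each spectrum is at 20 dB below its peak. The 20 dB threshold, the sample rate, and the FFT length are arbitrary choices for illustration, so the exact widths it prints should not be read as matching the original plots.

```python
import numpy as np

fs = 48000          # sample rate in Hz (assumed)
n_fft = 1 << 18     # zero-padded FFT length, about 0.18 Hz between bins

def burst(freq_hz, duration_s):
    """Raised-cosine windowed sine burst (the envelope shape is an assumption)."""
    n = int(round(duration_s * fs))
    t = np.arange(n) / fs
    return 0.5 * (1 - np.cos(2 * np.pi * t / duration_s)) * np.sin(2 * np.pi * freq_hz * t)

def bandwidth_hz(x, drop_db=20.0):
    """Width of the frequency span where the spectrum is within drop_db of its peak."""
    mag = np.abs(np.fft.rfft(x, n_fft))
    freqs = np.fft.rfftfreq(n_fft, 1 / fs)
    inside = freqs[mag >= mag.max() * 10 ** (-drop_db / 20)]
    return inside.max() - inside.min()

widths = {ms: bandwidth_hz(burst(1000.0, ms / 1000.0)) for ms in (4, 20, 100)}

for ms, width in widths.items():
    print(f"{ms:>3} ms burst: spectrum about {width:.0f} Hz wide at -20 dB")
```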

Again, the key point here is that all transients contain a spectrum of frequency energy.

Let's take a break and let that settle in. We'll consider something else for a minute.

Take a look at this graph.

What is it? Well, it's a fantasy graph. I made it using Soundeasy (version 10). I started with a perfectly flat FR, then added a 12dB/octave high pass filter at 50 Hz and an 18dB/octave low pass filter at 20k. This is arbitrary. Just pretend I have the perfect full range driver. Now, I also put a small dip and peak in around 1k. Why? Because I'm trying to simulate a real-world driver irregularity, say the frequency response irregularity associated with a cone edge resonance. Note it's not a very big dip or peak. The dip is approximately 1.15 dB, as is the peak. Plus or minus 1.15 dB isn't much. Very real world.
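A curve of this general kind can be roughed out without Soundeasy. The Python sketch below (NumPy assumed) multiplies together textbook analog filter responses. The Butterworth alignments, the 900 Hz and 1100 Hz center frequencies, and the Q of 5 are all guesses made for illustration; the real settings behind the fantasy driver are not given here. Because the guessed dip and peak overlap a little, each ends up slightly shallower than 1.15 dB.

```python
import numpy as np

f = np.logspace(np.log10(10), np.log10(40000), 4000)   # 10 Hz to 40 kHz
s = 2j * np.pi * f

def highpass2(s, fc):
    """2nd order (12 dB/octave) Butterworth high pass; the alignment is assumed."""
    w0 = 2 * np.pi * fc
    return s**2 / (s**2 + np.sqrt(2) * w0 * s + w0**2)

def lowpass3(s, fc):
    """3rd order (18 dB/octave) Butterworth low pass; the alignment is assumed."""
    p = s / (2 * np.pi * fc)
    return 1 / ((p + 1) * (p**2 + p + 1))

def peaking(s, fc, gain_db, q):
    """Bell-shaped boost (positive gain_db) or cut (negative gain_db) centered on fc."""
    w0 = 2 * np.pi * fc
    a = 10 ** (gain_db / 40)
    return (s**2 + s * w0 * a / q + w0**2) / (s**2 + s * w0 / (a * q) + w0**2)

# Hypothetical fantasy driver: band limits plus a small dip and peak around 1 kHz.
H = (highpass2(s, 50.0) * lowpass3(s, 20000.0)
     * peaking(s, 900.0, -1.15, 5.0) * peaking(s, 1100.0, +1.15, 5.0))
response_db = 20 * np.log10(np.abs(H))

mid = (f > 500) & (f < 2000)
print("deepest dip near 1 kHz (dB): ", round(response_db[mid].min(), 2))
print("highest peak near 1 kHz (dB):", round(response_db[mid].max(), 2))
```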

Now, take a look at this graph.

This is a "perfect" 4 cycle cosine squared windowed burst at 1k. Just remember this is a brief, somewhat arbitrary 4 ms transient, maybe what the consonant "t" in the word "bat" might look like. (Well, it's not really; it's very oversimplified. But the basic concept holds: it's a brief transient like one we might hear in music or speech.)

Now, the $64,000 question. What does the "perfect" 1k transient above look like when we play it through our fantasy driver with the little dip/peak at 1k?

It looks like this.

Notice the difference. In the "perfect" input transient, the signal's value has settled to zero by 4 ms. In the driver's acoustic response, the value does not settle for quite a bit longer. In fact, the waveform has trouble following the original after about 2.5-3 ms.

We can look at these another way, with something called an ETC curve. It basically looks at the amount of energy in the system as a function of time. The ETC curve of the input signal (the "perfect" 1k 4ms transient) looks like this.

Notice how the curve drops very quickly, and symmetrically around the peak at 2ms.
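One common way to compute an ETC is to take the envelope of the signal (the magnitude of its analytic signal) and plot it in dB. The Python sketch below (NumPy assumed) does that for a 4 ms, 1 kHz burst with an assumed raised-cosine envelope. This is a generic recipe and may differ in detail from what a given measurement package does; `analytic_signal` and `etc_db` are made-up helper names.

```python
import numpy as np

fs = 48000   # sample rate in Hz (assumed)

def analytic_signal(x):
    """Analytic signal via the FFT: drop negative frequencies, double the positive ones."""
    n = len(x)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * weights)

def etc_db(x, floor_db=-120.0):
    """Energy-time curve: envelope in dB relative to its own peak, clipped at floor_db."""
    envelope = np.abs(analytic_signal(x))
    envelope = np.maximum(envelope / envelope.max(), 10 ** (floor_db / 20))
    return 20 * np.log10(envelope)

# 4 ms, 1 kHz raised-cosine burst followed by silence (10 ms in total).
n_total, n_burst = 480, 192
t = np.arange(n_total) / fs
x = np.zeros(n_total)
x[:n_burst] = (0.5 * (1 - np.cos(2 * np.pi * t[:n_burst] / 0.004))
               * np.sin(2 * np.pi * 1000 * t[:n_burst]))

etc = etc_db(x)
print("ETC peak at", round(1e3 * t[np.argmax(etc)], 2), "ms")
print("ETC at 0.5 ms and 3.5 ms (dB):", round(etc[24], 1), round(etc[168], 1))
```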

Now look at the ETC of the response through the fantasy driver.

Aha! You're playing a 1k tone. Of course it distorts since the FR abnormality is exactly at 1k.

OK, then let's do this. Let's use a 1.4k sine wave and window it to 4 cycles. It looks like this.

Now, let's feed this signal through my fantasy driver. Note that the frequency curve is close, but not quite back to normal by 1.4k. Nonetheless, look what happens when we measure my driver's theoretical acoustic response.

Well, it isn't bad. Still, it isn't normal: the ideal curve settles by ~2.8ms, while there is significant low level energy tailing out to 5-6ms. So, to spell it out, here we have a 1.4k transient affected by an anomaly at 1k.

The corresponding ETC curve looks like this. Now it looks even worse.

A quick note about the ETC curves. Why do it this way? Well, according to SL, there is some study data that the distortion is audible if the curve suggests energy storage earlier than 30dB. So this transient "smearing" would be on the edge of audibility. Now, I have not seen the study data myself, so I'll reserve complete judgment on this. I would just keep it in the back of my mind, so to speak.

Now, a bit more explaining how the curves above were generated. No actual, real, physical drivers were harmed, or otherwise used to generate the above graphs. Turns out you can do this all mathematically. In very general terms, if you have a device and know its impulse response, then you can generate its response to any time domain signal through a technique known as convolution. What I've done is modeled the frequency response in Soundeasy and generated the corresponding minimum phase curve. Then, I imported the data into Praxis, and I used the inverse Fourier Transform to generate the corresponding impulse response. Once you have the impulse response, you can convolve, or, in my case, convolute the 4 cycle transients with the impulse response, again, using Praxis. And voilà, you know exactly what the output of the system would be.
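The same chain of steps can be sketched in Python (NumPy assumed) for readers without Soundeasy or Praxis. This is not a reproduction of the plots above. The dip and peak settings are guesses, the 50 Hz and 20 kHz band limits are left out to keep the math simple, and the minimum phase response is derived with the standard cepstrum folding method, which is a substitute for whatever those programs do internally.

```python
import numpy as np

fs = 48000            # sample rate in Hz (assumed)
n_fft = 1 << 15       # FFT length used for the model (assumed)
freqs = np.fft.rfftfreq(n_fft, 1 / fs)

def peaking_magnitude(f, fc, gain_db, q):
    """Magnitude of an analog bell-shaped boost or cut of gain_db centered on fc."""
    a = 10 ** (gain_db / 40)
    u2 = (f / fc) ** 2
    return np.sqrt(((1 - u2) ** 2 + u2 * (a / q) ** 2)
                   / ((1 - u2) ** 2 + u2 / (a * q) ** 2))

# Step 1: model the frequency response (hypothetical dip and peak around 1 kHz).
magnitude = (peaking_magnitude(freqs, 900.0, -1.15, 5.0)
             * peaking_magnitude(freqs, 1100.0, +1.15, 5.0))

def minimum_phase_impulse(mag):
    """Steps 2 and 3: minimum phase response from the magnitude, then inverse FFT."""
    n = 2 * (len(mag) - 1)
    cepstrum = np.fft.irfft(np.log(mag), n)        # real cepstrum of the magnitude
    folded = np.zeros(n)
    folded[0] = cepstrum[0]
    folded[1:n // 2] = 2 * cepstrum[1:n // 2]      # fold the negative-time half forward
    folded[n // 2] = cepstrum[n // 2]
    return np.fft.irfft(np.exp(np.fft.rfft(folded)), n)

impulse = minimum_phase_impulse(magnitude)

# Step 4: convolve a 4 cycle, 1 kHz raised-cosine burst with the impulse response.
n_burst = int(round(0.004 * fs))
t = np.arange(n_burst) / fs
burst = 0.5 * (1 - np.cos(2 * np.pi * t / 0.004)) * np.sin(2 * np.pi * 1000 * t)

output = np.convolve(burst, impulse[:4096])
tail = output[n_burst:]                            # output after the input has ended
print("largest output sample after the input stops:", np.max(np.abs(tail)))
```

The input is exactly zero after 4 ms, so anything left in `tail` is energy the modeled resonances are still releasing.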

But what's really happening on a physical level in all of this? What does that wiggle at 1k represent on the fantasy driver curve? Well, like I said, maybe it's the cone edge resonance. Hmmm, a resonance. Resonances. Don't they store energy? That's exactly what is happening. A loudspeaker is a mechanical system with moving parts, springs and masses, and the like. Turns out my driver has a bit of resonance at 1k and stores a bit of energy when excited by 1k frequency energy. The transient stops quickly, as transients are prone to do, but the loudspeaker unit has stored up a bit of energy in an itsy bitsy resonance. It has to release this energy. And that's why the time domain curves "ring" a little bit and the ETC curves drift above the ideal one. It's also why the frequency response curve has a wiggle. It's all 3 sides of the same coin.

Let's look at some more curves for a couple of interesting examples.

Let's do the same thing and feed a 4 cycle 1k burst through each of these fantasy drivers.

This is the time response to the black driver (the high Q peak curve).

Rings like a bell. Here's the corresponding ETC curve.

How about the aqua driver? He's the low Q peak guy.

Hey, not bad. Even the ETC looks good, though not perfect. (I used a different scale on these ETC curves: 5dB/div instead of 10dB/div. So it's not quite as good as it looks. And the ETC above is even worse than it looks.)

Now, what about the red speaker? (The high Q dip.) Everyone knows dips are less offensive than peaks, right?

Whoa, this little guy is ringing and distorted as well. Almost as bad as the high Q peak. And the ETC curve below, well, that stinks.

So a dip can cause ringing and smearing of transients as bad as a peak. Dips are not innocuous. However, audibly, dips sound less offensive than peaks even though the "smearing" of the signal is the same. Why? Well, because the peak is, well, louder than the dip. Say you're listening to Diana Krall's sibilance. The ideal signal is "ss." When filtered through a peak, you hear "SSSS." When filtered through a dip, you might hear "ssss." Each signal is smeared in the time domain an equivalent amount, but the peak is yelling at you and brings attention to itself. The dip is more subtle. But dips are still a problem. They can be one of the reasons folks complain about their center channel dialogue problems. All those syllabic transients being slurred by peaks and dips. (Whether it's an inherent peak or dip from the driver or a peak or dip from the crossover system design is irrelevant. They both cause the same problem. It's the final system response that counts.)

Now, finally, how do you measure this stuff and why is it important?

Well, as you can guess, the first and most common way to look at linear distortion is to look at a frequency response curve. Everyone does this already, so they're paying attention to linear distortion already, even if they aren't thinking about it. However, the frequency curve must be very detailed. The peak and dip in the first fantasy driver I generated were modest and might easily not show up on a short window, in-room measurement. Manufacturers' smoothed curves won't help either. This is most problematic below 1-2k, in an average home FFT-based, windowed measurement. As an adjunct, looking at an in-box nearfield plot below 1k can be useful as well.

CSD plots can also be used to detect linear distortion: ridges in the CSD reflect areas of delayed energy release. But CSD plots can be made to look like, well, anything you want. To be truly useful, CSD plots have to be made under very similar conditions. In general, it's hard to compare one manufacturer's curve to another. Also, remember that there is no "new" information in a CSD plot that isn't in the FR curve. It's just processed in a different way. That means that CSD plots are subject to the same resolution limitations as the frequency response curve.
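To show why a CSD contains nothing that isn't already in the impulse response, here is a bare-bones Python sketch (NumPy assumed). It just takes the FFT of the impulse response over and over, each time starting a little later. The impulse response is invented (direct sound plus a small decaying 1 kHz resonance), and the step size and fade-in length are arbitrary choices, which is exactly why two CSD plots made with different settings are hard to compare.

```python
import numpy as np

fs = 48000   # sample rate in Hz (assumed)
n = 4096
t = np.arange(n) / fs

# Hypothetical impulse response: direct sound plus a small decaying 1 kHz resonance.
impulse = np.zeros(n)
impulse[0] = 1.0
impulse += 0.003 * np.exp(-t / 0.002) * np.sin(2 * np.pi * 1000 * t)

def csd(impulse, fs, n_slices=20, step_s=0.00025, taper_s=0.0005):
    """Spectra (in dB) of the impulse response with the window start moved later each time."""
    n = len(impulse)
    freqs = np.fft.rfftfreq(n, 1 / fs)
    taper_len = int(round(taper_s * fs))
    rise = 0.5 * (1 - np.cos(np.pi * np.arange(taper_len) / taper_len))  # smooth fade-in
    slices = []
    for k in range(n_slices):
        start = int(round(k * step_s * fs))
        window = np.zeros(n)
        window[start:] = 1.0
        if k > 0:                                  # keep the first slice unwindowed
            window[start:start + taper_len] = rise
        spectrum = np.abs(np.fft.rfft(impulse * window))
        slices.append(20 * np.log10(spectrum + 1e-12))
    return freqs, np.array(slices)

freqs, slices = csd(impulse, fs)
ridge_hz = freqs[np.argmax(slices[-1])]
print("strongest remaining energy in the last slice is near", round(ridge_hz), "Hz")
```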

Using shaped tonebursts and measuring in the time domain can be helpful. These stay in the time domain, so to speak, and sometimes it's easier to interpret them. For instance, you can measure and see a reflection in the time domain more easily. If you do a frequency response curve or CSD and leave reflections in the impulse, it often becomes very difficult to interpret the resultant frequency irregularities and ridges on a CSD plot. However, it becomes a bit of a question as to what, really, is a true transient and what time domain signal should be used as a measure.
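The point about reflections is easy to see with a toy example in Python (NumPy assumed). The impulse response below is invented: direct sound plus a single half-level reflection arriving 2 ms later. In the time domain the reflection is one obvious spike; in the frequency response the same reflection turns into a row of dips repeating every 1/delay Hz.

```python
import numpy as np

fs = 48000                 # sample rate in Hz (assumed)
n = 4800                   # 100 ms record, so FFT bins are 10 Hz apart
delay_s = 0.002            # hypothetical reflection arriving 2 ms after the direct sound

impulse = np.zeros(n)
impulse[0] = 1.0                               # direct sound
impulse[int(round(delay_s * fs))] = 0.5        # reflection at half the level

freqs = np.fft.rfftfreq(n, 1 / fs)
response_db = 20 * np.log10(np.abs(np.fft.rfft(impulse)))

# Time domain: the reflection is a single, easily located spike.
late = np.flatnonzero(np.abs(impulse[1:]) > 0.1) + 1
print("reflection found at", 1e3 * late / fs, "ms")

# Frequency domain: the same reflection shows up as a comb of regularly spaced dips.
band = (freqs > 0) & (freqs < 2000)
dips = freqs[band][response_db[band] < response_db[band].min() + 0.01]
print("dips below 2 kHz at", dips, "Hz")
```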

In summary, linear distortion is caused by amplitude or phase variation and causes transient distortion. As with nonlinear distortion, the amount that is audible is open to debate, but it does provide a scientific basis for setting a flat FR curve as the goal, and preservation of phase relationships as well. Linear distortion information is obtained from high quality frequency response and CSD plots, as well as looking directly at signals in the time domain. No one single test can evaluate this fully, just as no one test can clearly evaluate a driver in terms of nonlinear distortion.

Odds and ends.

The point has been made that linear distortion, unlike nonlinear distortion, can be equalized out. While in theory this is true, it's not a trivial matter to equalize out multiple, irregular dips and peaks, distinguishing between resonances and interference. It does mean that, if you are designing a passive system, you would weigh linear distortion less heavily as you go out to the stopband, since the overall linear distortion will likely be dominated by the crossover effects. Remember, it's the linear distortion of the system that counts. It doesn't hurt to start with more linear drivers, all things considered.

If you've read Dr. Toole's paper, you'll note that low Q resonances are also quite problematic. Meaning that high Q resonances, though visually offensive, may not be as audibly so. Likewise, poor performance on a single toneburst test may not be as audibly offensive as the plots would suggest. I suggest you try to take all my driver data as a whole and be very careful about overemphasizing or focusing on a single test.

Something I briefly mentioned above, and which is also covered in Dr. Toole's paper, is the distinction between a resonance and an interference effect. A resonance exists on all axes, and the on-axis and all off-axis plots will contain the frequency irregularity. An irregularity from the baffle edge or some other interference will generally be less pronounced, or not present at all, on other axes, and so its final contribution to the overall sound you hear will be less. Just looking at a CSD or FR curve on one axis will not allow you to distinguish this. You have to do multiple samples at different spatial points. Using a shaped toneburst may allow you to distinguish reflections, assuming the reflection doesn't get buried in the original waveform.

If you still want to buy a low frequency driver with "fast bass," then you didn't really understand any of this and need to start again from the beginning.

What about step and square wave testing? I thought those, and the CSD, determined transient performance. Well, yes, they do look at transient performance/linear distortion. But I find these difficult to interpret.