Believing is hearing  

Visit a couple of audio forums and you will notice that there are as many opinions as forum members. Often very strong opinions, so strong you might call it a passion.

Passion is our driving force, but it doesn’t improve our reality testing.
Anybody a bit familiar with perception knows that a lot of what we believe to be 'facts' exists only in our own Plato’s cave.

 

It might be interesting to read a primer about perception. Psychologists do all kinds of very funny experiments.
As usual you have an experimental group and a control group.
You give a paper on digital audio to the experimental group and ask them to rate it (hopefully not this one). But before you do, you tell them a little more about the author: one of the founding fathers of digital audio, played a key role in developing the CD, a highly respected expert in this area, etc.

The control group is asked to rate the same paper, but you don't tell them anything about the author. Guess what, the experimental group rates the paper significantly higher than the control group.


Likewise, give both groups exactly the same text and ask them to rate it. The only difference is the name at the bottom of the text: in one group it indicates the text was written by a man, in the other group by a woman. Guess what, there is a significant difference in rating, and any woman can tell you which one was the lowest.

 

Sean Olive tested loudspeaker preferences using the same speakers in both a sighted and an unsighted test. The sighted tests produced a significant increase in preference ratings for the larger, more expensive loudspeakers.

 

The same effect could be observed when the speakers were positioned differently.
In a sighted test the visual stimuli obviously camouflage the auditory stimuli.

 

The best way to protect yourself against your own beliefs is to do an unsighted test.
There are a couple of methods.
Don't ask me to explain the methodological implications; methodology is a rather complex phenomenon.
One thing is obvious: if the differences are big (like night and day, blown away, the veil is lifted, the dark gets darker, etc.), it must be easy to detect them in an unsighted test.

ABX

You have bought a high-end power cord.
Listen to your equipment with the standard power cord; call this A.
Listen to your equipment with the high-end power cord; call this B.
Now you know exactly how A and B sound.


Now ask a friend to do the X: he will connect a power cord at random and keep notes (trial 1=A, trial 2=A, trial 3=B, etc.).
You don't even want to know which one he is using; you are a scientist and you want to establish facts, not prejudices.
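"At random" is harder than it sounds; people who pick by feel tend to alternate too neatly. A minimal sketch of how your friend could draw the sequence beforehand, assuming 16 trials (the function name `make_abx_schedule` is made up for this illustration):

```python
import secrets


def make_abx_schedule(n_trials=16):
    """Return one randomly chosen label, 'A' or 'B', per trial."""
    return [secrets.choice("AB") for _ in range(n_trials)]


if __name__ == "__main__":
    # Your friend prints this list and keeps it out of your sight.
    for number, cord in enumerate(make_abx_schedule(), start=1):
        print(f"trial {number} = {cord}")
```

Each trial is drawn independently, so runs like A, A, A are perfectly normal and should not be "corrected".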

You keep notes as well: for each trial, write down whether you think you heard A or B.

After 16 trials, you compare notes.

If you score above chance level, you can hear the difference; you have made the right choice.

If you do not, you know you had better not spend your money on another high-end power cord. You win either way.
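What "above chance level" means can be made concrete with a little arithmetic. The sketch below compares the two sets of notes and computes how likely it is to score at least that well by pure guessing (a one-sided binomial test with a 50% hit rate). The function names are made up for this illustration, and the usual 5% threshold is a convention, not a law of nature.

```python
from math import comb


def count_correct(friend_notes, your_notes):
    """Number of trials where your guess matches what was actually connected."""
    return sum(f == y for f, y in zip(friend_notes, your_notes))


def abx_p_value(correct, trials=16):
    """Probability of getting at least `correct` right out of `trials` by guessing."""
    ways = sum(comb(trials, k) for k in range(correct, trials + 1))
    return ways / 2 ** trials


if __name__ == "__main__":
    for correct in (11, 12):
        print(correct, "of 16:", round(abx_p_value(correct), 3))
```

With 16 trials, 12 correct gives a probability of about 0.038 of being a lucky guess, while 11 correct gives about 0.105. So 12 out of 16 is the lowest score that stays under 5%.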

 

Like all unsighted testing, an ABX test removes the bias of the listener.
No more, no less.
If the score shows you can identify A and B correctly, you have proven that you can hear the difference. This is not to be mistaken for 'A sounds better than B'; that’s a judgment.
The ABX method is about perceptible differences, not about quality.
If you don’t hear a difference, well you don’t.

You have not proven that there is no difference.

You have proven that you, in your specific experimental setup, are not able to distinguish between the two. A different setup or another person might yield different results.

 

A good example of a blind test in practice, comparing speaker cables.

I admire the courage and the honesty of the poster, Mike Lavigne.

People able to admit that they spent over $30,000 (thirty thousand) on speaker wire for zero audible difference are rare.

 

More about ABX: http://www.hydrogenaudio.org/forums/index.php?showtopic=16295

Memory

Echoic memory is one of the sensory memory registers; a component of short-term memory (STM) that is specific to retaining auditory information. This particular sensory store is capable of storing large amounts of auditory information that is only retained for a short period of time (3-4 seconds). This echoic sound resonates in the mind and is replayed for this brief amount of time shortly after the presentation of auditory stimuli.

…….

A short-term memory model proposed by Nelson Cowan attempts to address this problem by describing a verbal sensory memory input and storage in more detail. It suggests a pre-attentive sensory storage system that can hold a large amount of accurate information over a short period of time and consists of an initial phase input of 200-400ms and a secondary phase that transfers the information into a more long term memory store to be integrated into working memory that starts to decay after 10-20s.[5]

Echoic memory – Wikipedia

If our ability to remember details accurately (and assessing sound quality is often about subtle differences) depends on our short-term memory, then we have a problem with the experimental setup.
We listen to A then to B and then to a series of X's.
Even if the decay starts after 20 seconds instead of 3-4 seconds, we will not have an accurate representation of A and B in our memory for most of the test session.
As a consequence, an ABX test might yield too many false negatives.
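How many false negatives? That depends on how often you really do identify X correctly, which nobody knows in advance. The sketch below therefore just tries a few assumed hit rates. It also assumes a 16-trial session with a pass mark of 12 correct, the smallest score that pure guessing reaches less than 5% of the time. The function name and the hit rates are illustrative only.

```python
from math import comb


def pass_probability(true_hit_rate, trials=16, needed=12):
    """Chance of scoring at least `needed` correct when each trial is
    answered correctly with probability `true_hit_rate`."""
    return sum(
        comb(trials, k) * true_hit_rate ** k * (1 - true_hit_rate) ** (trials - k)
        for k in range(needed, trials + 1)
    )


if __name__ == "__main__":
    # Hypothetical listeners who hear the difference, but not every time.
    for rate in (0.6, 0.7, 0.8, 0.9):
        missed = 1 - pass_probability(rate)
        print(f"hit rate {rate:.0%}: fails the test {missed:.0%} of the time")
```

A listener who genuinely hears a difference, but only some of the time, can fail a short session quite often. More trials reduce that risk.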

You can test the impact of a small delay here: Sieveking Sound.

If an ABX test shows a statistically significant difference then you can reasonably conclude that there is a difference. Unfortunately, a likely outcome will be that the test will not show a statistically significant difference. In that event it would be improper to conclude that there is no difference. You will be back where you started.
Absence of evidence is not evidence of absence.
Tony Lauck

ABC/HR

ABC/Hidden Reference (ITU-R BS.1116-1) is pretty much like ABX.
This time the listener also rates the difference on a standardized scale.
This method allows for quantifying the difference.

It is a recommended method for testing for small differences between codecs and the original.
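As far as I know, BS.1116 uses a five-grade impairment scale running from 5.0 (imperceptible) down to 1.0 (very annoying). The listener grades both the test item and the hidden reference, and the result is usually reported as a difference grade: the grade of the test item minus the grade of the hidden reference. A minimal sketch with made-up grades and hypothetical function names:

```python
def difference_grade(test_grade, hidden_reference_grade):
    """Grade given to the test item minus grade given to the hidden reference.

    Zero means no audible impairment was reported; negative means the test
    item was graded worse; positive means the listener mistook the hidden
    reference for the impaired signal.
    """
    return test_grade - hidden_reference_grade


def mean_difference_grade(trials):
    """Average difference grade over (test_grade, hidden_reference_grade) pairs."""
    grades = [difference_grade(test, ref) for test, ref in trials]
    return sum(grades) / len(grades)


if __name__ == "__main__":
    # Made-up grades from one listener for one codec, for illustration only.
    session = [(4.0, 5.0), (3.0, 5.0), (5.0, 5.0), (5.0, 4.0)]
    print(mean_difference_grade(session))
```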

 

More: http://wiki.hydrogenaudio.org/index.php?title=ABC/HR

MUSHRA

MUlti Stimulus test with Hidden Reference and Anchors is recommended for audio quality assessment by EBU/ITU-R.
The MUSHRA approach is recommended when there are obvious differences between codecs and original, but small differences between codecs tested.


The listener is presented with the reference (labeled as such), a certain number of test samples, a hidden version of the reference and one or more anchors. The recommendation specifies that one anchor must be a 3.5 kHz low-pass version of the reference. The purpose of the anchor(s) is to make the scale be closer to an "absolute scale", making sure that minor artifacts are not rated as having very bad quality.

http://en.wikipedia.org/wiki/MUSHRA

It is used to compare lossy codecs.

More: http://www.ebu.ch/en/technical/trev/trev_283-kozamernik.pdf

BS.1116

This is a method by the ITU for detecting small impairments in audio systems.

It covers almost everything like

Experimental setup

All these testing methods have one thing in common: removing the bias of the participant. They won’t protect you against errors in your experimental setup.
Suppose that in an ABX test you hear a significant difference between Toslink and USB.

If Toslink is set to full range and USB to laptop speakers (Windows default), this is not a surprise.

If you overlook these aspects you draw the wrong conclusion using the right method.

More about the right conditions: Statistical Analysis of ABX Results Using Signal Detection Theory

How long should you listen?

Much to my surprise, sometimes very short.

Tomasz, regarding memory-based testing and instant switching:

Such things are determined by the test interface, not necessarily the test procedure. In my opinion, an interface which does not allow the listener to define playback loops of arbitrary length (down to 1 sec or so) or to switch between codecs during playback (maybe with some short crossfading or fade-out and fade-in to prevent clicks) is a bad interface. Regardless of whether it's an ABX, ABX/HR, or MUSHRA test. For me, personally, loops of 2 - 3 sec reduce bias due to my memory to a minimum. Indeed, if I make the loops longer, I start to forget the "sound details" of the beginning of the loop when I reach its end, and that makes it difficult to compare two loops. I usually don't go lower than 2 sec, though, because then the looped segment loses stationarity, and I run into the risk of focusing more on loop effects than on the recording itself. But these numbers vary somewhat from person to person.
C.R.Helmrich
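The "short crossfading" mentioned in the quote can be sketched in a few lines. This assumes two time-aligned versions of the same excerpt, held as equal-length lists of samples, and uses a plain linear fade. It illustrates the idea only and is not how any particular test interface does it; the function name is made up.

```python
def switch_with_crossfade(a, b, switch_at, fade_len):
    """Play `a` up to sample `switch_at`, then fade linearly into `b`
    over `fade_len` samples, and continue with `b`."""
    if len(a) != len(b):
        raise ValueError("both versions must have the same number of samples")
    out = []
    for i in range(len(a)):
        if i < switch_at:
            weight_b = 0.0                        # still entirely version a
        elif i >= switch_at + fade_len:
            weight_b = 1.0                        # entirely version b
        else:
            weight_b = (i - switch_at) / fade_len  # in between: mix the two
        out.append((1.0 - weight_b) * a[i] + weight_b * b[i])
    return out
```

Without the fade, an abrupt jump from one waveform to the other can produce a click, and then you are comparing clicks instead of codecs.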

Loudness matching

Loudness matching is very important when comparing products because the perception of timbre, spatial and dynamic attributes is level dependent.

A lot of salesmen know this. Just turning up the volume a little when demonstrating the expensive one does wonders.
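If you have recordings or measurements of both devices playing the same signal, you can at least check whether the levels match. The sketch below compares RMS levels, which is only a rough stand-in for perceived loudness. The sample lists and function names are made up for this illustration.

```python
import math


def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def level_difference_db(a, b):
    """How many dB louder (positive) or softer (negative) a is than b."""
    return 20.0 * math.log10(rms(a) / rms(b))


def gain_to_match(a, b):
    """Linear factor to multiply b by so that its RMS level equals that of a."""
    return rms(a) / rms(b)


if __name__ == "__main__":
    loud = [0.5, -0.5, 0.5, -0.5]      # made-up samples
    soft = [0.25, -0.25, 0.25, -0.25]
    print(round(level_difference_db(loud, soft), 2), "dB")
    print("multiply the softer one by", gain_to_match(loud, soft))
```

In this made-up case the first signal has twice the amplitude, which works out to about 6 dB.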

Listening position

An important aspect is your seating position.

You listen to your system.

You swap a speaker cable and start listening again.

You do hear a difference.

Your conclusion: you do hear differences between speaker cables.

Another conclusion might be that you do hear a difference because your seating position has changed. In fact, a shift of just 4 inches makes a difference. Not an ‘audiophile’ difference but a measurable difference.

 

Frequency response at two locations four inches apart

Ethan Winer: Why We Believe. A common-sense explanation of audiophile beliefs.

 

Me and my perception

Got some music in M4A format containing Apple Lossless.
MusicBee, WMP, none of my media players would play them.
Even dBpoweramp wasn’t able to convert them to FLAC (well it did but the content was static).
Obviously the files were corrupted.

Then I found a player (MusiCHI) able to convert them to high bit rate MP3.
As lossy audio is better than no audio at all, I decided to convert to MP3.
At least I could play them.

 

Sometime later I got the same M4A, this time not corrupted.
Converted them to FLAC.

Could not resist comparing a couple of tracks with their MP3 equivalent.
Listened to the MP3 first, then the FLAC.
The difference was striking; the improved transparency offered by FLAC was obvious.
Far better inner detail, more life like.

 

I tried a second set.
Looking at my screen to select another set I saw the media player had sorted them by track and then by file type.
Obviously the first track was not the MP3 as I thought but the FLAC!

Lesson learned (for the 1000th time): trust your ears but don’t trust your perception. It will let you hear what you believe.

 

Reference
  1. The Allegory of the Cave - Plato
  2. Links to blind listening tests - Pio2001, Hydrogenaudio
  3. Why We Believe - Ethan Winer
  4. EBU listening test on internet audio codecs - G. Stoll, IRT & F. Kozamernik, EBU
  5. The Dishonesty of Sighted Listening Tests - Sean Olive
  6. Statistical Analysis of ABX Results Using Signal Detection Theory - Jon Boley and Michael Lester, LSB Audio, Lafayette, USA
  7. Visual Psychophysics - Mark E. McCourt. 1997

  8. On Some Biases Encountered in Modern Audio Quality Listening Tests—A Review - Slawomir Zielinski, Francis Rumsey, Søren Bech
  9. Testing audiophile claims and myths - Head-Fi
  10. Assessing audio quality - Opticom
  11. BS.1116 : Methods for the subjective assessment of small impairments in audio systems including multichannel sound systems - ITU 1997
  12. Your aural memory...did I really hear that? What's Best Forum