WASAPI

WASAPI (Windows Audio Session API) is a low-latency audio interface that, when used in exclusive mode, talks directly to the driver of the audio device. It is Microsoft’s own counterpart to ASIO.

ASIO is a proprietary protocol. You can only use it if your audio device supports it.
WASAPI is an integral part of Windows (Vista and higher).
In principle it works with all audio devices.
In practice not all combinations of audio applications and audio device drivers will work correctly using this interface.

 

In exclusive mode, no other application can use the sound card.
No more system sounds at full blast over the stereo!

 

As WASAPI in exclusive mode talks straight to the driver of the audio device, the stream sent to this device must match the capabilities of the device in terms of bit depth, sample rate, number of channels and audio format (PCM most of the time). Otherwise the result is silence.
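
The matching rule above can be sketched as a simple check. This is a minimal illustration, not WASAPI code: StreamFormat, DeviceCaps and find_mismatches are hypothetical names, and a real application would ask the device itself (WASAPI offers IAudioClient::IsFormatSupported for this) instead of comparing against a hand-written table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamFormat:
    """Properties of the audio stream (hypothetical helper type)."""
    sample_rate: int   # Hz
    bit_depth: int     # bits per sample
    channels: int
    encoding: str = "PCM"


@dataclass(frozen=True)
class DeviceCaps:
    """What the audio device accepts (assumed, simplified model)."""
    sample_rates: frozenset
    bit_depths: frozenset
    channel_counts: frozenset
    encodings: frozenset = frozenset({"PCM"})


def find_mismatches(fmt, caps):
    """Return the properties of fmt that the device does not support."""
    problems = []
    if fmt.sample_rate not in caps.sample_rates:
        problems.append("sample rate")
    if fmt.bit_depth not in caps.bit_depths:
        problems.append("bit depth")
    if fmt.channels not in caps.channel_counts:
        problems.append("channels")
    if fmt.encoding not in caps.encodings:
        problems.append("encoding")
    return problems


if __name__ == "__main__":
    dac = DeviceCaps(frozenset({44100, 48000, 96000}),
                     frozenset({16, 24}),
                     frozenset({2}))
    # A 192 kHz mono file on a stereo DAC limited to 96 kHz
    print(find_mismatches(StreamFormat(192000, 24, 1), dac))
```

An empty list means the stream can be sent as is. Any entry in the list means the application has to reconfigure the device or convert the audio first.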

The application using WASAPI can do this by configuring the audio device to match the source.
In this case we have bit perfect playback.
This allows for automatic sample rate switching as well.

 

Automatic sample rate switching and hardware
In case of a USB DAC (using native mode drivers) you get automatic sample rate switching using WASAPI exclusive.

Most of the time the onboard audio allows for automatic sample rate switching as well.
A lot of discrete sound cards don’t allow automatic switching using WASAPI.
If the discrete sound card comes with ASIO, it is better to use that driver if you want automatic sample rate switching.

The developer can also choose to adapt the source to the capabilities of the audio device.
If the source is mono and the audio device is two-channel, the developer might decide to send the same signal to both channels.
If the sample rate of the source is not supported by the hardware, e.g. a 192 kHz source with a 96 kHz audio device, the program using WASAPI has to do the SRC (Sample Rate Conversion).
This can be done by calling the SRC provided by Windows or one provided by the application.
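
Both adaptations can be sketched in a few lines. The function names are made up for this illustration, and the resampler is a deliberately naive linear interpolation. A real SRC applies a low-pass filter before reducing the sample rate to avoid aliasing, so treat this as a picture of the idea, not as something to listen to.

```python
def mono_to_stereo(samples):
    """Send the same mono signal to both channels: one (left, right) frame per sample."""
    return [(s, s) for s in samples]


def resample_linear(samples, src_rate, dst_rate):
    """Naive sample rate conversion by linear interpolation.

    No anti-alias filtering is done here, which a real converter needs
    when going down in sample rate.
    """
    if src_rate == dst_rate or not samples:
        return list(samples)
    out_len = len(samples) * dst_rate // src_rate
    step = src_rate / dst_rate          # source samples per output sample
    out = []
    for i in range(out_len):
        pos = i * step                  # fractional position in the source
        left = int(pos)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out


if __name__ == "__main__":
    source = [0.0, 1.0, 2.0, 3.0]                      # pretend this is 192 kHz audio
    print(resample_linear(source, 192000, 96000))      # half as many samples
    print(mono_to_stereo([0.25, -0.25]))
```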

Windows audio architecture

Vista has a completely new audio mixing engine, so WASAPI gives you the chance to plug directly into it rather than going through a layer of abstraction. The reasons for the new audio engine are:

  • A move to 32 bit floating point rather than 16 bit, which greatly improves audio quality when dealing with multiple audio streams or effects.
  • A move from kernel mode into user mode in a bid to increase system stability (bad drivers can't take the system down).
  • The concept of endpoints rather than audio devices - making it easier for Windows users to send sounds to "headphones" or record sound from "microphone" rather than requiring them to know technical details about the sound cards installed on their system.
  • Grouping audio streams. In Vista, you can group together all audio streams out of a single application and control their volume separately. In other words, a per-application volume control. This is a bit more involved than might be at first thought, because some applications such as IE host all kinds of processes and plugins that all play sound in their own way.
  • Support for pro audio applications, which need to be as close to the metal as possible and keep latency to a bare minimum. (see Larry Osterman's Where does WASAPI fit in the big multimedia API picture?)

Source: Mark .Net

Windows audio diagram (Vista and higher)

 

By default all sounds are sent to the mixer.
The mixer converts the audio to 32 bit float and does the mixing.
The result is dithered and converted back to a format the audio driver accepts (most of the time 16 or 24 bit).
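
The three steps can be sketched as follows. This is only a model of the idea. The function names are invented, the TPDF dither and the hard clipping are assumptions made for the illustration, and nothing here is taken from the Windows audio engine itself.

```python
import random


def int16_to_float(sample):
    """Step 1: convert a 16-bit integer sample to float in the range [-1.0, 1.0)."""
    return sample / 32768.0


def mix(streams):
    """Step 2: sum equal-length float streams sample by sample."""
    return [sum(frame) for frame in zip(*streams)]


def float_to_int16_dithered(samples, rng):
    """Step 3: add dither, round to 16-bit integers and clip to the valid range.

    The dither is triangular (difference of two uniform values, within
    +/- 1 LSB). That choice is an assumption, not the documented behaviour
    of the Windows mixer.
    """
    out = []
    for x in samples:
        noise = rng.random() - rng.random()
        quantized = round(x * 32768.0 + noise)
        out.append(max(-32768, min(32767, quantized)))
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    music = [int16_to_float(s) for s in [1000, -2000, 3000]]
    system_sound = [int16_to_float(s) for s in [500, 500, 500]]
    print(float_to_int16_dithered(mix([music, system_sound]), rng))
```

Note that even a single stream passing through unchanged comes out with the dither noise added, so the output can differ from the input by one step.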

 

The applications sending sound to the mixer must see to it that the sample rate matches the default rate of the mixer. This default is set in the Advanced tab of the audio panel.

Even if the source matches the default sample rate, dithering will be applied.

Q: If you
  • don't apply any per-stream or global effects and
  • only have one application outputting audio and
  • the sample rate and bit-depth set for the sound card matches the material's sample rate
then there should theoretically be no difference to the original because a conversion from even 24-bit integer to 32-bit float is lossless.

 

A: Not quite. Since we can not assure that there was nothing added, no gain controls changed, etc, we must dither the final float->fix conversion, so you will incur one step of dithering at your card's level. As annoying as this is for gain=1 with single sources, we can't possibly single-source in general.

If you don't want even that, there is exclusive mode, which is roughly speaking a memcopy.
J. D. (JJ) Johnston
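
The premise of the question, that going from 24-bit integer to 32-bit float loses nothing, can be checked with a short sketch. A 32-bit float has a 24-bit significand, so every 24-bit integer sample fits exactly. The helper names and the scaling by a power of two are assumptions of this illustration.

```python
import struct


def to_float32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def roundtrip(sample, bits):
    """Scale an integer sample to [-1.0, 1.0), store it as 32-bit float, convert back."""
    scale = float(1 << (bits - 1))
    return int(to_float32(sample / scale) * scale)


if __name__ == "__main__":
    # Every 24-bit value survives, including the extremes.
    for s in (-8388608, -1, 0, 1, 1234567, 8388607):
        assert roundtrip(s, 24) == s
    # A 32-bit integer sample needing more than 24 significant bits does not.
    print(roundtrip(16777217, 32) == 16777217)
```

So the conversion to float is indeed harmless for 24-bit material. As the answer above says, the change comes from the dither applied on the way back.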

Exclusive mode

WASAPI in exclusive mode bypasses the audio engine (the mixer).

The conversion to 32-bit float and the dither as applied by the mixer are avoided.

It also locks the audio driver; no other application can use the audio device.

Shared mode

This is equivalent to DS (Direct Sound).

All audio is sent to the mixer.

The application must invoke sample rate conversion if the sample rate differs from the value set in the Windows audio panel.

Typically, the application is responsible for providing the Audio Engine audio buffers in a format that is supported by the Audio Engine. Audio sample formats consist of the sampling frequency, the bit depth, and the number of channels. The native bit depth of samples that the Audio Engine uses internally is 32-bit float. However, the Audio Engine accepts most integer formats that are up to 32-bits. Additionally, the Audio Engine converts most formats to the floating point representation internally. The Audio Control Panel specifies the required sampling frequency as the “Default format.” The Default format specifies the format that is used to provide the content by the audio device. The number of channels that the Audio Engine supports is generally the number of speakers in the audio device.

Changing the sampling frequency and data bit depth is called sample rate conversion. An application may decide to write its own sample rate converter. Alternatively, an application may decide to use APIs such as PlaySound, WAVE, Musical Instrument Digital Interface (MIDI), or Mixer. In these APIs, the conversion occurs automatically. When it is required, Windows Media Player performs sample rate conversion in its internal Media Foundation pipeline. However, if Windows Media Player is playing audio that the Audio Engine can handle natively, Windows Media Player rebuilds its own pipeline without a sample rate converter. This behavior occurs to reduce the intermediate audio transformations and to improve performance.

Microsoft

 

Event style

WASAPI can be used in push and in pull mode (event style).

A couple of asynchronous USB DACs had all kinds of problems using push mode due to buffer problems in WASAPI.
This has been solved by using WASAPI – Event style.
The audio device pulls the data from the system.

 

Most of the time you can't choose the mode. It simply depends on how the programmer implemented WASAPI in the media player.

 

The difference between doing push or doing event is only who is responsible to know when the host has to send audio to the hardware.

Event based:

- Host tells API that it wants to be informed when it is the appropriate moment to send audio
- Host might prepare some audio in a separate thread so that it is ready when the API asks for it
- API asks host for more audio
- Host sends the prepared buffer if it was ready, or prepares the buffer then and sends it.

Push based:

- Host tells API that it will ask when it is the appropriate moment to send the audio.
- Host prepares some audio so that it is ready when the API is ready.
- Host asks the API if it is ready.
- If it is not ready, it waits some time and asks again
- When the API replies that it is ready, the host sends the prepared buffer. It might also prepare the buffer at this time and send it.

[JAZ]
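
The two styles described above can be compared with a small simulation. Nothing here talks to WASAPI: FakeDevice, hardware_clock and the two host functions are hypothetical stand-ins, and a Python threading.Event plays the role of the signal the API would raise.

```python
import threading
import time


class FakeDevice:
    """Hypothetical stand-in for the audio API, not a WASAPI binding."""

    def __init__(self):
        self.need_data = threading.Event()   # set when the device wants a buffer
        self.played = []

    def is_ready(self):
        return self.need_data.is_set()

    def write(self, buffer):
        self.played.append(buffer)
        self.need_data.clear()


def hardware_clock(device, periods, period_s=0.001):
    """Simulate the device asking for one buffer per period."""
    for _ in range(periods):
        device.need_data.set()
        while device.need_data.is_set():     # wait until the host has delivered
            time.sleep(period_s)


def run_event_style(device, buffers):
    """Event based: the host sleeps until the API signals that it wants audio."""
    for buf in buffers:
        device.need_data.wait()
        device.write(buf)


def run_push_style(device, buffers, poll_s=0.0005):
    """Push based: the host keeps asking and waits a little when the API is not ready."""
    polls = 0
    for buf in buffers:
        while not device.is_ready():
            polls += 1
            time.sleep(poll_s)
        device.write(buf)
    return polls


def play(host, buffers):
    """Run one host style against a fresh fake device and return what was played."""
    device = FakeDevice()
    clock = threading.Thread(target=hardware_clock, args=(device, len(buffers)))
    clock.start()
    host(device, buffers)
    clock.join()
    return device.played


if __name__ == "__main__":
    print(play(run_event_style, ["buf1", "buf2", "buf3"]))
    print(play(run_push_style, ["buf1", "buf2", "buf3"]))
```

Both hosts deliver the same buffers in the same order. The only difference is who keeps track of the timing: in the push version the host has to guess a polling interval, in the event version it simply waits to be told.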

 

WASAPI - Event Style

The output mode lets a sound device pull data from Media Center. This method is not supported by all hardware, but is recommended when supported.

WASAPI - Event Style has several advantages:

  • It lets the audio subsystem pull data (when events are set) instead of pushing data to the system. This allows lower latency buffer sizes, and removes an unreliable Microsoft layer.
  • It creates, uses, and destroys all WASAPI interfaces from a single thread.
  • The hardware (or WASAPI interface) never sees any pause or flush calls. Instead, on pause or flush, silence is delivered in the pull loop. This removes the need for hacks for cards that circle their buffers on pause, flush, etc. (ATI HDMI, etc.).
  • It allows for a more direct data path to the driver / hardware.
  • The main 'pull loop' uses a lock-free circle buffer (a system that J. River built for ASIO), so that fulfilling a pull request is as fast as possible.

WASAPI – JRiver Wiki
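
Two of the points above, the circle buffer and delivering silence on pause, can be sketched together. This is not J. River's code. RingBuffer is a hypothetical single-producer, single-consumer buffer in which the writer only moves the write index and the reader only moves the read index, which is the usual idea behind lock-free designs. In Python it only illustrates the bookkeeping, not the performance.

```python
class RingBuffer:
    """Single-producer / single-consumer circle buffer (illustrative sketch)."""

    def __init__(self, capacity):
        # One slot is left unused so that "full" and "empty" can be told apart.
        self._data = [0.0] * (capacity + 1)
        self._read = 0    # only changed by the consumer (the pull loop)
        self._write = 0   # only changed by the producer (the decoder)

    def push(self, samples):
        """Producer side: store as many samples as fit and return how many were stored."""
        size = len(self._data)
        written = 0
        for s in samples:
            nxt = (self._write + 1) % size
            if nxt == self._read:          # buffer is full
                break
            self._data[self._write] = s
            self._write = nxt
            written += 1
        return written

    def pull(self, count, paused=False):
        """Consumer side: always return count samples.

        When paused, or when the buffer runs dry, the rest is filled with
        silence, so the device never sees a pause or flush call.
        """
        size = len(self._data)
        out = []
        while len(out) < count and not paused and self._read != self._write:
            out.append(self._data[self._read])
            self._read = (self._read + 1) % size
        out.extend([0.0] * (count - len(out)))
        return out


if __name__ == "__main__":
    rb = RingBuffer(4)
    rb.push([0.1, 0.2, 0.3, 0.4])
    print(rb.pull(2))                  # normal playback
    print(rb.pull(2, paused=True))     # paused: silence, buffered audio is kept
    print(rb.pull(3))                  # resumes, then pads with silence
```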

Practice

Using WASAPI requires a media player supporting this driver in exclusive mode.
Players like MusicBee or Foobar do; WMP doesn’t.

 

I do think WASAPI exclusive sounds a bit more transparent than DS (Direct Sound), the Win default audio engine.
However, as all that is sent to the audio endpoint must match the capabilities of this device exactly, WASAPI is also more troublesome. The slightest mismatch in number of channels, bit depth or sample rate and the result is silence or static.

 

Configure your media player for WASAPI and DS and do a listening test.
If you don’t hear a difference, stick to DS.
If you do hear a difference, use the one you prefer.

 

In general, WASAPI doesn't work with discrete sound cards.

In case of a USB DAC it is the way to go.

Conclusion

WASAPI is a low latency interface to the driver of the audio device.

Bypassing the mixer is all it does.
It is up to the developer or the user of the application using WASAPI to see to it that the properties of the audio file and the capabilities of the audio device match.

References
  1. User-Mode Audio Components - MSDN
  2. Exclusive-Mode Streams - MSDN
  3. What's up with WASAPI? - Mark Heath
  4. Where does WASAPI fit in the big multimedia API picture? - Larry Osterman
  5. WASAPI – JRiver Wiki