An introduction to computer audio
The USB audio class implementation uses the isochronous mode of data transfer over the USB bus, with three possible types of synchronization: synchronous, adaptive and asynchronous.
I will go over these in detail a little further down, covering how they work and their jitter ramifications.
But first, some detail on isochronous mode, and a few myths about USB. The USB bus works very differently from an Ethernet connection, and this has given rise to some erroneous assumptions I hear a lot around here. First off, the host (computer) is in complete control of the bus; there is no way that too many devices trying to talk at once can "overload" the bus, as can happen on Ethernet. Data is sent out in frames that go out every millisecond. This happens whether there is any data in them or not. The rate at which the frames go out is determined by the oscillator in the host transmitter, not by the speed at which the computer is running or by the software running on it. Some piece of software busily running in the background cannot "slow down" the bus. It might prevent a program from keeping up with the bus (providing data to it or accepting data from it), but the data rate on the bus stays constant (or at least fairly constant, within the limits of the oscillator in the host PHY).
Now on to isochronous: this means that the host reserves bandwidth on the bus for an endpoint (your DAC is an endpoint). This is easy for it to do since it is in complete control of the bus; nobody else can "steal" it. It doesn't matter that you have a 300 GB hard drive, a scanner, a mouse or whatever else on the bus: once an isochronous endpoint is set up, they all have to work around it and get what's left over. If you try to set up more isochronous endpoints than the bus can handle, the host will not allow them all.
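To make the reservation idea concrete, here is a minimal sketch of an admission check. The names (IsoEndpoint, admit) and the 1000-byte per-frame budget are assumptions chosen for illustration, not values from the USB specification, and packet overhead is ignored.

```python
from dataclasses import dataclass


@dataclass
class IsoEndpoint:
    # Hypothetical description of one isochronous audio endpoint.
    name: str
    sample_rate_hz: int
    channels: int
    bytes_per_sample: int

    def bytes_per_frame(self, frame_hz: int = 1000) -> int:
        # Worst case per 1 ms frame: round the sample count up, because
        # some frames carry one extra sample when the rate is not a
        # whole multiple of the frame rate.
        samples = -(-self.sample_rate_hz // frame_hz)  # ceiling division
        return samples * self.channels * self.bytes_per_sample


def admit(endpoints, budget_bytes_per_frame):
    """Accept endpoints in order while their worst-case payload fits."""
    accepted, rejected, used = [], [], 0
    for ep in endpoints:
        need = ep.bytes_per_frame()
        if used + need <= budget_bytes_per_frame:
            accepted.append(ep.name)
            used += need
        else:
            rejected.append(ep.name)
    return accepted, rejected, used


requests = [
    IsoEndpoint("dac_44k", 44100, 2, 2),  # 45 samples * 4 bytes
    IsoEndpoint("dac_96k", 96000, 2, 3),  # 96 samples * 6 bytes
    IsoEndpoint("adc_96k", 96000, 2, 3),  # would push past the budget
]
accepted, rejected, used = admit(requests, budget_bytes_per_frame=1000)
```

With this assumed budget the first two endpoints are accepted and the third is refused, which mirrors the behaviour described above: the host simply will not configure an endpoint it cannot guarantee bandwidth for.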
One interesting tidbit about isochronous streams is that there is no error correction. There is rudimentary error detection, but no mechanism for doing anything about it: no retries and no ECC. I read somewhere that it's estimated that a 44.1 kHz stream running 24 hours a day will see an error about once a month or so. The standards committee did not consider this worth doing anything about. I guess if you detect an error you can either flash a light on the front panel to let people know an error occurred, or you can just play the previous sample (or maybe interpolate with the next).
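The two fallback options mentioned above (repeat the previous sample, or interpolate with the next) can be sketched as follows. The function name conceal and the idea that the receiver hands over a set of flagged sample indices are assumptions for illustration. A real device would more likely flag a whole packet, which this sketch handles by holding the last good value across the run.

```python
def conceal(samples, bad, mode="hold"):
    """Return a copy of samples with flagged entries replaced.

    samples: list of numeric sample values
    bad:     set of indices flagged as corrupted (hypothetical input)
    mode:    "hold" repeats the previous output sample,
             "interpolate" averages the previous and next sample
             when the next one is itself usable.
    """
    out = list(samples)
    for i in sorted(bad):
        prev = out[i - 1] if i > 0 else 0
        next_ok = i + 1 < len(samples) and (i + 1) not in bad
        if mode == "interpolate" and next_ok:
            out[i] = (prev + samples[i + 1]) / 2
        else:
            out[i] = prev
    return out


held = conceal([0, 10, 999, 30], {2})
smoothed = conceal([0, 10, 999, 30], {2}, mode="interpolate")
```

Here the corrupted value 999 becomes 10 in hold mode and 20 in interpolate mode.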
Now on to the fun part: the synchronization modes. In all cases the data from the bus goes into a buffer and gets clocked out by a clock; how that clock is generated and how it interacts with the bus is the difference between the modes.
Synchronous: in this mode the readout clock is directly derived from the 1 kHz frame rate. A PLL takes in the start-of-frame signal and generates a clock. Using this scheme it's rather difficult to generate 44.1 kHz, but very easy to generate 48 kHz. This is a primary reason why many early USB audio devices only supported 48 kHz: they used this mode. As you can guess, this mode is very susceptible to jitter on the bus. Pretty much anything that causes the output from the host to be jittered (power supply noise, vibrations, interference etc.) AND anything that causes jitter on the interconnect (interference, reflections, ground noise etc.) will wind up as jitter on the readout clock. This is a VERY poor mode to use for decent quality audio.
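A quick way to see why 48 kHz is easy and 44.1 kHz is awkward is to reduce the ratio between the sample rate and the 1 kHz frame rate. This is only a sketch of the arithmetic: the function name pll_ratio is made up here, and a real design would usually multiply up to a much higher master clock, but the same ratio problem remains.

```python
from fractions import Fraction

FRAME_RATE_HZ = 1000  # start-of-frame rate, one frame per millisecond


def pll_ratio(sample_rate_hz):
    """Ratio the PLL must synthesize from the frame rate, in lowest terms."""
    return Fraction(sample_rate_hz, FRAME_RATE_HZ)


for rate in (48000, 44100):
    r = pll_ratio(rate)
    print(f"{rate} Hz = {r.numerator}/{r.denominator} x frame rate")
```

48 kHz comes out as a plain integer multiple (48/1), while 44.1 kHz reduces to 441/10, so the PLL needs a non-integer ratio: for instance dividing the reference down first or using a fractional scheme.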
Adaptive: in this mode the clock comes from a separate clock generator (usually implemented as a PLL referenced by a crystal oscillator) that can have its frequency adjusted in small increments over a wide range. A control circuit (either hardware or firmware running on an embedded processor) measures the average rate of the DATA coming over the bus and adjusts the clock to match it. Since the clock is not directly derived from a bus signal, it is far less sensitive to bus jitter than synchronous mode, but what is going on on the bus can still affect it. It's still generated by a PLL that takes its control from the circuits that see the jitter on the bus. It's a lot better than synchronous mode, but still not perfect by a long shot. This is the mode that MOST USB audio devices use today.
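As a rough illustration of "measure the average data rate and adjust the clock", the sketch below generates the per-frame sample counts a host might send for a 44.1 kHz stream and then tracks them with a simple low-pass filter. Everything here is an assumption for illustration: the function names, the accumulator scheme for splitting samples across frames, and the smoothing constant alpha. Real devices do this in hardware or firmware with their own loop design.

```python
def packet_sizes(sample_rate_hz, frames, frame_hz=1000):
    """Samples per frame such that the long-run average is exact."""
    sizes, acc = [], 0
    for _ in range(frames):
        acc += sample_rate_hz
        n = acc // frame_hz      # whole samples that fit in this frame
        acc -= n * frame_hz      # carry the fractional remainder forward
        sizes.append(n)
    return sizes


def estimate_rate(sizes, frame_hz=1000):
    """Average data rate in Hz over the observed frames."""
    return sum(sizes) * frame_hz / len(sizes)


def track(sizes, start_hz, alpha=0.01, frame_hz=1000):
    """Nudge a clock estimate toward the per-frame data rate."""
    clock = start_hz
    for n in sizes:
        clock += alpha * (n * frame_hz - clock)
    return clock


sizes = packet_sizes(44100, 5000)
average = estimate_rate(sizes)
tracked = track(sizes, start_hz=48000.0)
```

The first ten frames come out as nine packets of 44 samples and one of 45, and the long-run average is exactly 44100 Hz. The tracked clock settles close to 44100 Hz but keeps a small ripple from the 44/45 pattern, a toy version of how bus activity can still leak into an adaptive clock.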
Asynchronous: in this mode an external clock is used to clock the data out of the buffer, and a feedback stream is set up to tell the host how fast to send the data. A control circuit monitors the status of the buffer and tells the host to speed up if the buffer is getting too empty or slow down if it's getting too full. Note that this is still isochronous: the host is continuously sending samples, and there is no "per packet handshake" going on. Since the readout clock is not dependent on anything going on with the bus, it can be fed directly from a low jitter oscillator, no PLL need apply. This mode can be made to be VERY insensitive to bus jitter.
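The buffer-driven feedback can be sketched as a small simulation. This is a toy proportional controller under stated assumptions: the function name simulate_async, the target fill of 256 samples, the gain, and the use of fractional samples per frame are all made up for illustration and are not the feedback format defined by the USB audio class.

```python
def simulate_async(nominal_per_frame, consumed_per_frame, frames,
                   target_fill=256.0, gain=0.05):
    """Track buffer fill (in samples) frame by frame.

    nominal_per_frame:  samples per frame the device expects (e.g. 44.1)
    consumed_per_frame: samples per frame its local oscillator really drains
    gain:               how strongly the request reacts to fill error;
                        0 disables the feedback entirely
    """
    fill = target_fill
    history = []
    for _ in range(frames):
        # Device: ask for more when the buffer is low, less when it is high.
        request = nominal_per_frame + gain * (target_fill - fill)
        # Host: deliver the requested amount in this frame.
        fill += request
        # Device: the local oscillator drains the buffer at its own pace.
        fill -= consumed_per_frame
        history.append(fill)
    return history


fast_dac = 44.1 * (1 + 100e-6)  # device clock assumed 100 ppm fast
with_feedback = simulate_async(44.1, fast_dac, 5000)
without_feedback = simulate_async(44.1, fast_dac, 5000, gain=0.0)
```

With feedback the fill level stays within a fraction of a sample of the target. With the gain set to zero the buffer slowly drains (about 22 samples over these 5000 frames) and would eventually underrun. The readout clock itself never changes in either case; only the amount of data requested from the host does.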
Source: John Swensen