I’ve been developing a project that aims to leverage the DSP processing capabilities of FMOD in Unity game/virtual-environment development. I am also looking to employ the capabilities of a network-based Immersive Sound system.
Essentially, I’m looking to pass audio samples across to a server, which can then play them back using the ASIO SDK.
To this end, I’ve written a custom FMOD DSP plugin with an embedded socket client that communicates with the server on every process callback. A sample buffer is filled inside the process callback loop and transmitted once the “processing” is complete. Each FMOD event (whether bound to an object or not) gets a unique instance of the DSP plugin and therefore acts as a unique client. The server receives (buffers) and plays back the samples from each client, assigning each client an output channel, which the Immersive Sound system then uses to spatialize the sound upon receiving 3D coordinates from the relevant client/FMOD event.
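As a rough sketch of that callback flow (the function name, signature and `sendToServer` helper here are illustrative stand-ins, not the actual FMOD plugin API):

```cpp
#include <cstring>
#include <vector>

// Simplified stand-in for a DSP process callback: pass audio through
// unchanged while staging a copy of the block for network transmission.
// (A real FMOD DSP read callback takes an FMOD_DSP_STATE* and returns
// FMOD_RESULT; this sketch only models the data flow.)
void processBlock(const float* in, float* out,
                  unsigned int length, int channels,
                  std::vector<float>& txBuffer)
{
    const size_t samples = static_cast<size_t>(length) * channels;

    // 1. Pass-through "processing": copy input to output.
    std::memcpy(out, in, samples * sizeof(float));

    // 2. Stage the same samples; the buffer is sent once per callback,
    //    after the processing loop, so the socket call never sits
    //    inside the per-sample loop.
    txBuffer.assign(in, in + samples);
    // sendToServer(txBuffer);  // hypothetical socket send
}
```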
I’ve been successful in receiving the samples on the server side and playing them back. However, the audio is riddled with artefacts, which leads me to believe I have a synchronization error (my audio driver is consuming samples at a slower rate than the processing callback is producing them).
I was wondering whether anyone could shed some light on how I would be able to synchronize both processes.
At the moment, I’m looking to implement circular buffers on the server, to minimize the chance of the server writing new samples into the buffer while the ASIO driver is reading and playing them back.
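A minimal sketch of such a circular buffer, assuming a single network thread writing and a single ASIO callback reading (a generic single-producer/single-consumer queue, not taken from any actual server code):

```cpp
#include <atomic>
#include <cstddef>
#include <vector>

// Single-producer/single-consumer ring buffer: the network thread calls
// write(), the audio callback calls read(). Each index is only advanced
// by its owning thread, so two atomics suffice and no locks are needed.
class RingBuffer {
public:
    explicit RingBuffer(size_t capacity) : buf_(capacity), w_(0), r_(0) {}

    // Producer side: returns the number of samples actually written
    // (stops early if the buffer is full rather than overwriting).
    size_t write(const float* src, size_t n) {
        size_t written = 0;
        size_t w = w_.load(std::memory_order_relaxed);
        const size_t r = r_.load(std::memory_order_acquire);
        while (written < n && (w + 1) % buf_.size() != r) {
            buf_[w] = src[written++];
            w = (w + 1) % buf_.size();
        }
        w_.store(w, std::memory_order_release);
        return written;
    }

    // Consumer side: returns the number of samples actually read.
    size_t read(float* dst, size_t n) {
        size_t rd = 0;
        size_t r = r_.load(std::memory_order_relaxed);
        const size_t w = w_.load(std::memory_order_acquire);
        while (rd < n && r != w) {
            dst[rd++] = buf_[r];
            r = (r + 1) % buf_.size();
        }
        r_.store(r, std::memory_order_release);
        return rd;
    }

    // Distance between write and read cursors (samples buffered) --
    // this is the quantity to watch when checking for drift.
    size_t fill() const {
        const size_t w = w_.load(std::memory_order_acquire);
        const size_t r = r_.load(std::memory_order_acquire);
        return (w + buf_.size() - r) % buf_.size();
    }

private:
    std::vector<float> buf_;
    std::atomic<size_t> w_, r_;
};
```

Because the indices are only ever advanced by their owning thread, the audio callback never blocks on the network thread, which matters inside a real-time ASIO callback.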
If you require any additional information, please don’t hesitate to ask. Other important information:
-> The ASIO driver uses a buffer size of 512 samples, so I’ve accordingly used the setDSPBufferSize function to give the FMOD system a matching DSP buffer size of 512.
-> The ASIO driver runs at a sample rate of 44.1 kHz, so I’ve set the sample rate of the FMOD system to the same value using setSoftwareFormat() before the system is initialized.
Technically, these settings should mean that the driver consumes data at the same rate as the processing callback produces it (one 512-sample block roughly every 11.6 ms).
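As a quick sanity check on those numbers (buffer size and sample rate as stated above):

```cpp
// One block's duration in milliseconds = 1000 * bufferSize / sampleRate.
// With 512 samples at 44100 Hz this is about 11.61 ms, so producer and
// consumer should tick at the same nominal rate -- any artefacts then
// point at clock drift rather than a configuration mismatch.
double blockDurationMs(int bufferSize, int sampleRate)
{
    return 1000.0 * bufferSize / sampleRate;
}
```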
A way to debug this: first, on the receive side, create a large circular buffer. This will introduce a lot of latency, but it is a good way to isolate the problem.
You need to determine whether your stream is arriving uncorrupted and in order on the receive side. If it is, then you know that part is OK, and you are probably looking at a rate-synchronization issue.
The way to debug that: if you have a 2-5 second buffer and it starts artefacting after a while (not immediately, but after 20, 40 or 60 seconds, for example), then you can tell there is a slight drift happening.
You should be able to determine that drift programmatically, from the position you are writing to and the position you are reading from.
If they stay the same distance apart, then they are not drifting.
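One way to quantify that, as a hedged sketch: snapshot the write/read cursor distance at a fixed interval and look at its slope. A flat trend means the clocks agree; a steady slope means one side is running fast.

```cpp
#include <vector>

// Estimate drift from periodic snapshots of the buffered-sample count
// (write cursor minus read cursor, taken once per interval, e.g. once
// per second). Returns the drift in samples per snapshot interval,
// using a simple endpoint slope. 0.0 means no drift.
double driftPerInterval(const std::vector<long>& fillSnapshots)
{
    if (fillSnapshots.size() < 2) return 0.0;
    return static_cast<double>(fillSnapshots.back() - fillSnapshots.front())
         / static_cast<double>(fillSnapshots.size() - 1);
}
```

For example, snapshots of {88200, 88210, 88220, 88230} taken once per second would indicate the sender's clock is running about 10 samples per second fast relative to the receiver.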
If the write cursor slowly starts approaching the read cursor one way or the other (backwards or forwards), then the only successful approach we have found, with the most minimal artefacts, is to adjust the playback frequency of the channel on the receive side.
If the receive side is playing back at 44100 (and so is the send side, but there is some clock inaccuracy between the different pieces of hardware), then to control the gap you can do things like setFrequency(44105) to widen the gap, or setFrequency(44095) to reduce it, until it is back to an acceptable delta, then set it back to 44100 again.
The frequency change shouldn’t be noticeable on most content. You might want to play with those values/granularity, as it is a trade-off between:
- minimizing pitch distortion of your content (you could use something finer/coarser/more clever than flat 5 Hz changes), and
- the speed at which the delta adjustment gets back to an acceptable value without failing to do its job (i.e. your adjustment is too slow and it stutters).
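A minimal sketch of that correction loop (the 44100 Hz nominal rate and ±5 Hz step mirror the example above; `targetFill` and `tolerance` are hypothetical tuning parameters you would choose for your content):

```cpp
// Three-state rate controller: nudge the playback frequency up or down
// by a small step while the buffered amount is outside a tolerance band
// around the target, otherwise run at the nominal rate.
int adjustFrequency(long fillSamples, long targetFill, long tolerance)
{
    const int nominal = 44100;  // nominal playback rate (Hz)
    const int step    = 5;      // small enough to avoid audible pitch shift

    if (fillSamples > targetFill + tolerance)
        return nominal + step;  // buffer growing: consume slightly faster
    if (fillSamples < targetFill - tolerance)
        return nominal - step;  // buffer shrinking: consume slightly slower
    return nominal;             // within band: back to the nominal rate
}
```

Call this once per snapshot interval and pass the result to the channel's frequency/pitch control; a finer step or a proportional step (scaled by how far outside the band the fill level is) would trade correction speed against pitch distortion, as discussed above.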
Also, if your playback events are 3D, have doppler enabled and contain the receiving DSP, then they will pull the data faster or slower as doppler changes the pitch of the source. It is better to turn doppler off in this case.
I also used setFrequency as an example of pitch control, which doesn’t exist on Studio events, so you will have to fine-tune using EventInstance::setPitch instead (which takes a multiplier rather than an absolute frequency, e.g. 44105/44100 ≈ 1.0001), or maybe play the sound back through the low-level API as a ‘programmer sound’.