More accurate getPosition values?

Hi!

I’m trying to replace UE’s unpredictable audio playback behavior by switching to FMOD, which is completely new to me.
We’re making a VR rhythm game where score is based on timing precision, and we would need :

  • getPosition values updated faster or in sync with the VR framerate (around 11ms) so that it’s new every frame
  • Stable and quantifiable latency : it doesn’t need to be very small but we need to know how much it is so we can compensate.

I have successfully played sounds with the core audio API in C++ (I’d rather not use studio banks).
Changing the hardware sample rate, the DSP buffer length and count, the file buffer size or the studio update period in the project settings has no effect at all on the refresh rate of the getPosition function.

I’ve tried to follow this topic which seems very close to my needs :
https://qa.fmod.com/t/more-accuracy-than-channel-getposition-possible/9920/2
And it seems that I can get a callback to fire after every buffer read but it greatly exceeds the knowledge I have of FMOD for now.

Could someone please elaborate on how to implement the solution proposed in the thread ?
Thanks a lot !

The DSP buffer size is the setting you need to tweak (System::setDSPBufferSize). Make sure it is set before calling System::init, otherwise it will not apply.

The mixing engine defaults to a buffer size of 512 or 1024 samples depending on the platform, which gives a granularity of ~10.7 or ~21.3 milliseconds respectively when running at 48 kHz.
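To make the arithmetic above concrete, here is a minimal sketch of the conversion from buffer length to milliseconds. The function name `bufferDurationMs` is hypothetical and is not part of the FMOD API; it only restates the samples / rate calculation.

```cpp
#include <cassert>

// Duration of one mixer block in milliseconds.
// bufferLength: samples per DSP buffer; sampleRate: output rate in Hz.
// Hypothetical helper, not an FMOD function.
double bufferDurationMs(unsigned int bufferLength, int sampleRate)
{
    assert(sampleRate > 0);
    return 1000.0 * bufferLength / sampleRate;
}
```

With these inputs, 512 samples at 48000 Hz comes out to roughly 10.67 ms and 1024 samples to roughly 21.33 ms, which is why a position value that only advances once per block cannot be fresh on every ~11 ms VR frame at the default sizes.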

Thanks for your answer !
I figured out that it needed to be set before the init just a few hours ago.
I’ve pushed it to 128*2 to get 5.33 ms of buffer time, and I offset by half of that to get a maximum error of 2.67 ms, which is small enough for my needs.
What are the drawbacks of such low values? Is there a risk of audio malfunctioning for some users?

Ideally I would love to know the timestamp at which the last position update was done. That would allow me to use any buffer size and still offset by the correct amount of time. Is there a way to do that?

Depending on how complex your project is, there is a fixed CPU cost per mixer render, regardless of the DSP buffer size. With a really small buffer (e.g. 5 ms), that fixed cost is paid every 5 ms instead of every 20 ms. You can measure this with System::getCPUStats, comparing the “DSP” value at different buffer sizes. Additionally, reducing the number of buffers from the default 4 to 2 reduces the latency between calling an API and hearing its result, but if you go too low you will get stuttering because the OS will request more audio than is available.

These settings are largely platform dependent and there is no one-size-fits-all answer. In general, we find four buffers of 512 samples to be a good baseline. Going lower should be carefully tested on all target hardware to ensure stability for your project workload.
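As a rough way to compare configurations before testing on hardware, the trade-off described above can be tabulated with a small helper. This is only a sketch: `MixerTiming` and `describeMixer` are hypothetical names, and the "queued audio" figure assumes the total buffered audio is simply buffer length times buffer count, which ignores any extra latency added by the OS or the output device.

```cpp
#include <cassert>

// Hypothetical summary of a DSP buffer configuration (not an FMOD type).
struct MixerTiming {
    double blockMs;        // time covered by one DSP buffer
    double mixesPerSecond; // how often the fixed per-render cost is paid
    double queuedMs;       // audio held across all buffers (assumed length * count)
};

MixerTiming describeMixer(unsigned int bufferLength, int numBuffers, int sampleRate)
{
    assert(bufferLength > 0 && numBuffers > 0 && sampleRate > 0);
    MixerTiming t;
    t.blockMs = 1000.0 * bufferLength / sampleRate;
    t.mixesPerSecond = static_cast<double>(sampleRate) / bufferLength;
    t.queuedMs = t.blockMs * numBuffers;
    return t;
}
```

For the baseline of four buffers of 512 samples at 48 kHz this gives about 94 mixes per second and about 43 ms of queued audio. Two buffers of 256 samples doubles the mix rate to about 188 per second and leaves only about 11 ms of headroom, which is where stuttering becomes a risk.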

If you are looking to do sample accurate playback, this can be achieved with Channel::setDelay. This is what the Studio engine uses to implement sample accurate playback of the Event timeline. For timestamping you can use ChannelControl::getDSPClock.

So would it be correct to say that buffer size impacts CPU cost but not stability, and the number of buffers impacts stability?

I’m not sure I get what you suggest with setDelay and getDSPClock. From what I understood, it can be used to set in advance the playback time of a sound to make sure it’s correctly timed in relation to the other sounds in this DSP chain (which would be very handy to trigger the audio feedback in time, for example), but I don’t see how it can help for a whole song playback?

What I’m looking for is a way to know what was the system/game time when the last buffer played (and thus updated the DSPClock value), so I can compare it to the system/game time when my game thread ticks and compute at what time I am in the song playback with a formula like :

“audio current time when in game thread tick” = getDSPClock() + (“system time in the game thread tick” - “system time when the last playback time was sent by the audio thread”)
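The formula above could be sketched as follows. This is an illustration under stated assumptions, not FMOD code: `ClockSnapshot`, `estimateSongSeconds` and `songStartClock` are hypothetical names, the DSP clock is assumed to be a sample count at the output rate (so it is divided by the sample rate to get seconds), and the result is the mixer's position, which does not include the output latency between the mixer and the speakers.

```cpp
#include <cassert>
#include <chrono>

using Clock = std::chrono::steady_clock;

// One DSP clock reading paired with the wall-clock time it was captured at.
// Hypothetical type, not part of FMOD.
struct ClockSnapshot {
    unsigned long long dspClock; // samples, as reported by the mixer
    Clock::time_point takenAt;   // wall-clock time of the capture
};

// Estimated song time in seconds at the moment `now`.
// songStartClock: DSP clock value (in samples) at which the song started.
double estimateSongSeconds(const ClockSnapshot& snap,
                           unsigned long long songStartClock,
                           int sampleRate,
                           Clock::time_point now)
{
    assert(sampleRate > 0);
    assert(snap.dspClock >= songStartClock);
    // Song position at the time of the snapshot, converted from samples.
    const double atSnapshot =
        static_cast<double>(snap.dspClock - songStartClock) / sampleRate;
    // Wall-clock time elapsed since the snapshot was taken.
    const double sinceSnapshot =
        std::chrono::duration<double>(now - snap.takenAt).count();
    return atSnapshot + sinceSnapshot;
}
```

The game thread would call this once per tick with `Clock::now()`. A monotonic clock such as `std::chrono::steady_clock` is used so that system time adjustments cannot make the estimate jump.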

Right now I have to guess that the last playback time was half of the latency ago on average, which is okay when latency is low but not accurate enough if it’s (relatively) high.

Sorry if this isn’t very clear, I’m quite new to using an audio engine with this much accuracy…

So would it be correct to say that buffer size impacts CPU cost but not stability, and the number of buffers impacts stability?

That’s a fair assessment.

From what I understood, it can be used to set in advance the playback time of a sound to make sure it’s correctly timed in relation to the other sounds in this DSP chain

Yes, that is correct.

What I’m looking for is a way to know what was the system/game time when the last buffer played

By this do you mean a wall clock time for when each DSP buffer is produced?
We don’t have that information, but you can capture it by taking a timestamp in the premix or postmix callback. Both run on the mixer thread: the premix callback fires just before we start generating the next DSP buffer’s worth of audio, and the postmix callback fires just after.
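Because the timestamp is written on the mixer thread and read on the game thread, the pair (DSP clock, wall-clock time) has to be handed over without tearing and without blocking the mixer. One possible approach is a small sequence-counter mailbox, sketched below. `TimestampMailbox` is a hypothetical class, not something FMOD provides; the assumption is that `publish` would be called from a premix or postmix system callback (registered with System::setCallback), with the DSP clock value and a monotonic wall-clock reading in nanoseconds.

```cpp
#include <atomic>
#include <cstdint>

// Single-writer, multi-reader mailbox for a (dspClock, wallNanos) pair.
// Hypothetical helper. Default (sequentially consistent) atomics are used
// for simplicity rather than for maximum performance.
class TimestampMailbox {
public:
    // Writer side: call from one thread only (e.g. the mixer callback).
    void publish(std::uint64_t dspClock, std::int64_t wallNanos)
    {
        const std::uint32_t s = seq_.load();
        seq_.store(s + 1);            // odd: write in progress
        dspClock_.store(dspClock);
        wallNanos_.store(wallNanos);
        seq_.store(s + 2);            // even: write complete
    }

    // Reader side: retries until both values come from the same publish.
    // Returns false if nothing has been published yet.
    bool read(std::uint64_t& dspClock, std::int64_t& wallNanos) const
    {
        for (;;) {
            const std::uint32_t before = seq_.load();
            if (before == 0) return false; // no data yet
            if (before & 1u) continue;     // writer is mid-update, retry
            dspClock = dspClock_.load();
            wallNanos = wallNanos_.load();
            if (seq_.load() == before) return true; // pair is consistent
        }
    }

private:
    std::atomic<std::uint32_t> seq_{0};
    std::atomic<std::uint64_t> dspClock_{0};
    std::atomic<std::int64_t> wallNanos_{0};
};
```

The writer never waits on the reader, so the mixer thread is not stalled by the game thread. The reader only loops if it happens to overlap a publish, which should be rare at one publish per DSP buffer.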