Sound buffer having crack/pop sounds when trying to make voice chat

Hello, I am trying to build voice chat using Steam voice recording, with FMOD playing back the players’ voices. I was previously using XAudio2, which worked fine apart from one issue with playing multiple voices at once, so I switched to FMOD. Now, with FMOD, I can’t play a player’s voice without popping/crackling sounds at the start and end of each received buffer (I didn’t have that issue with XAudio2).

Here’s my code for playing received voice data:

void Multiplayer::Voice::AddVoiceData(const uint8* voiceData, uint32 length)
{
    FMOD_CREATESOUNDEXINFO exinfo = {0};
    exinfo.cbsize = sizeof(FMOD_CREATESOUNDEXINFO);
    exinfo.format = FMOD_SOUND_FORMAT_PCM16;
    exinfo.defaultfrequency = VOICE_OUTPUT_SAMPLE_RATE;
    exinfo.numchannels = 1;
    exinfo.length = length;

    FMOD::Sound* snd = 0;

    std::cout << "Create sound res: " << fmod->createSound((const char*) voiceData, FMOD_LOOP_OFF | FMOD_CREATESAMPLE | FMOD_OPENMEMORY | FMOD_OPENRAW, &exinfo, &snd) << '\n';

    std::cout << "playSound res: " << fmod->playSound(snd, 0, false, &fmodChannel) << '\n';
}

// runs every frame
void Multiplayer::Voice::Update()
{
    fmod->update();

    // ...
    // recording voice data and sending over the steam network
    // ...
}

I tried using a new channel for each created sound. I tried releasing the sound right after playSound, or after calling update first (this resulted in the sound not being played at all, which makes sense). I also tried using a cached FMOD::Sound with Sound::lock/unlock, and I tried FMOD_OPENUSER. Nothing helped.

In case it helps, I’ve also captured the popping/crackling sound in an FMOD Studio profiling session: Recorded profiling

So AddVoiceData is called every time new samples become available from Steam? If so, the crackles and pops are caused by short gaps between AddVoiceData calls: the FMOD::Sound finishes playing, there are no more samples to play, and so it outputs silence until the next one starts. This could loosely be described as “buffer starvation”, and the way to avoid it is to do three things:

  1. Rather than playing a new FMOD::Sound every time samples become available, use AddVoiceData to write your samples to an intermediate ring buffer, then in your Update loop read from the buffer and add it to your FMOD::Sound using Sound::lock/Sound::unlock. This will ensure samples are being given to the FMOD::Sound without any gaps.
  2. Introduce some latency to give yourself a little more time for samples to become available. You can do this by waiting for the write position of your buffer to reach some value before starting to read from it and filling your FMOD::Sound.
  3. Keep track of the distance between your read and write positions. When the distance is greater than your target latency, use Channel::setFrequency to raise the sample rate of your FMOD::Channel so the FMOD::Sound consumes samples faster. Likewise, when the distance is less than your target latency, reduce the sample rate of your FMOD::Channel until you get back to your target latency. This will account for any irregularities in the rate at which samples arrive from AddVoiceData, and will ensure you never run out of samples to play. If you speed up or slow down too quickly, however, you will get audible pitch distortion, so only make very small changes to the sample rate.
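
To make the three steps a bit more concrete, here is a minimal sketch of the FMOD-independent parts: the intermediate ring buffer, the latency gate, and the drift calculation. All names here (VoiceRingBuffer, readyToStart, driftCorrectedFrequency, the 1% maxAdjust limit) are hypothetical and chosen for illustration only. It also assumes AddVoiceData and Update run on the same thread; if they do not, the buffer needs a mutex or a lock-free design. The FMOD calls themselves (Sound::lock/Sound::unlock and Channel::setFrequency) are only indicated in comments.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical ring buffer of 16-bit mono PCM samples (not thread-safe).
class VoiceRingBuffer {
public:
    explicit VoiceRingBuffer(std::size_t capacity)
        : data_(std::max<std::size_t>(capacity, 1)) {}

    // Step 1, producer side: call from AddVoiceData instead of createSound/playSound.
    // Samples that do not fit are dropped; returns how many were stored.
    std::size_t write(const std::int16_t* src, std::size_t count) {
        count = std::min(count, data_.size() - size_);
        for (std::size_t i = 0; i < count; ++i) {
            data_[(head_ + size_ + i) % data_.size()] = src[i];
        }
        size_ += count;
        return count;
    }

    // Step 1, consumer side: call from Update to fill the region returned by
    // Sound::lock. If fewer than `count` samples are buffered, the remainder is
    // padded with silence. Returns how many real samples were copied.
    std::size_t read(std::int16_t* dst, std::size_t count) {
        const std::size_t n = std::min(count, size_);
        for (std::size_t i = 0; i < n; ++i) {
            dst[i] = data_[(head_ + i) % data_.size()];
        }
        std::fill(dst + n, dst + count, static_cast<std::int16_t>(0));
        head_ = (head_ + n) % data_.size();
        size_ -= n;
        return n;
    }

    // Distance between the write and read positions, in samples.
    std::size_t buffered() const { return size_; }

private:
    std::vector<std::int16_t> data_;
    std::size_t head_ = 0;  // read position
    std::size_t size_ = 0;  // samples currently stored
};

// Step 2: hold playback until the target latency has accumulated.
bool readyToStart(const VoiceRingBuffer& rb, std::size_t targetLatencySamples) {
    return rb.buffered() >= targetLatencySamples;
}

// Step 3: compute the value to pass to Channel::setFrequency. The relative error
// is clamped to [-1, 1], so the result never deviates from baseHz by more than
// maxAdjust (1% by default, an assumed value to keep pitch changes small).
float driftCorrectedFrequency(float baseHz, std::size_t buffered,
                              std::size_t targetLatencySamples,
                              float maxAdjust = 0.01f) {
    if (targetLatencySamples == 0) {
        return baseHz;
    }
    float error = (static_cast<float>(buffered) -
                   static_cast<float>(targetLatencySamples)) /
                  static_cast<float>(targetLatencySamples);
    error = std::max(-1.0f, std::min(1.0f, error));
    return baseHz * (1.0f + maxAdjust * error);
}
```

In Update you would then check readyToStart once before starting the channel, copy samples into the locked region of a single looping FMOD::Sound with read, and pass the result of driftCorrectedFrequency(VOICE_OUTPUT_SAMPLE_RATE, rb.buffered(), target) to Channel::setFrequency. The target latency and the adjustment limit are things you would tune by ear.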

I recently wrote an example that applies this approach to hook into a video’s audio source in Unity and get it playing through FMOD. It is the same concept and should help demonstrate how to avoid buffer starvation.