DSP optimization for accessing FMOD PCM data

Howdy all,

I’m working on a project that pipes FMOD’s output to a Discord bot’s voice chat output. To do this, I’ve followed a few of the DSP examples and advice from threads on this forum (primarily C# and Unity ones) doing much the same thing. I’ve got it partly working, but I’m struggling to make everything run correctly and in real time: it takes longer to fill my PCM buffer than for my bot to play it back, and the played-back audio is at the correct pitch but sounds like someone scrubbing quickly forward through a piece of audio.

I would love some help understanding what I’m doing wrong, and whether there’s a better approach. I considered Output Plugins but couldn’t wrap my head around them or figure out exactly how to write and use one for my purposes. Below is the essence of my code, with some irrelevant functions and definitions removed. The full code is visible on GitHub for more context.

Using FMOD API 2.02.16.

//---FMOD Declarations---//
FMOD::Studio::System* pSystem = nullptr;				//overall system
FMOD::System* pCoreSystem = nullptr;					//overall core system
FMOD::Studio::Bank* pMasterBank = nullptr;				//Master Bank
FMOD::Studio::Bank* pMasterStringsBank = nullptr;		//Master Strings
FMOD::Studio::Bus* pMasterBus = nullptr;				//Master bus
FMOD::ChannelGroup* pMasterBusGroup = nullptr;			//Channel Group of the master bus
FMOD::DSP* mCaptureDSP = nullptr;						//DSP to attach to Master Channel Group for stealing output

//---Misc Bot Declarations---//
dpp::discord_voice_client* currentClient = nullptr;
std::vector<uint16_t> myPCMData;						//Main buffer of PCM audio data, which FMOD adds to and D++ cuts "frames" from
std::mutex pcmDataMutex;
bool isRunning = true;
bool isConnected = false;
bool fromSilence = true;

//FMOD and Audio Functions
FMOD_RESULT F_CALLBACK captureDSPReadCallback(FMOD_DSP_STATE* dsp_state, float* inbuffer, float* outbuffer, unsigned int length, int inchannels, int* outchannels) {
	//inchannels and *outchannels = 2, error checking for that excluded from example
	FMOD::DSP* thisdsp = (FMOD::DSP*)dsp_state->instance;
	std::vector<uint16_t> pcmdata;
	if (isConnected) {				//Evals True when connected to Voice Chat
		std::lock_guard lk(pcmDataMutex);
		for (unsigned int samp = 0; samp < length; samp++) {
			for (int chan = 0; chan < *outchannels; chan++) {
				outbuffer[(samp * *outchannels) + chan] = 0.0f;				//"Mutes" system output.
				pcmdata.push_back(floatToPCM(inbuffer[(samp*inchannels) + chan]));
			}
		}
		//Pass PCM data to our larger buffer
		myPCMData.insert(myPCMData.end(), pcmdata.cbegin(), pcmdata.cend());
	}
	else {
		return FMOD_ERR_DSP_SILENCE;		//Unsure if I'm using this correctly
	}
	return FMOD_OK;
}

void init() {

	//FMOD Init
	ERRCHECK(FMOD::Studio::System::create(&pSystem));
	ERRCHECK(pSystem->initialize(128, FMOD_STUDIO_INIT_NORMAL, FMOD_INIT_NORMAL, nullptr));
	ERRCHECK(pSystem->getCoreSystem(&pCoreSystem));

	//Load Master Bank and Master Strings
	ERRCHECK(pSystem->loadBankFile(..., FMOD_STUDIO_LOAD_BANK_NORMAL, &pMasterBank));
	ERRCHECK(pSystem->loadBankFile(..., FMOD_STUDIO_LOAD_BANK_NORMAL, &pMasterStringsBank));

	//Also get the Master Bus, set volume, and get the related Channel Group
	ERRCHECK(pSystem->getBus("bus:/", &pMasterBus));
	ERRCHECK(pMasterBus->setVolume(dBToFloat(-10.0f)));
	ERRCHECK(pMasterBus->lockChannelGroup());					//Tell the Master Channel Group to always exist even when events aren't playing...
	ERRCHECK(pSystem->flushCommands());							//And wait until all previous commands are done (ensuring Channel Group exists)...
	ERRCHECK(pMasterBus->getChannelGroup(&pMasterBusGroup));	//Or else this fails immediately, and we'll have DSP problems.

	//Define and create our capture DSP on the Master Channel Group.
	//Copied from FMOD's examples, unsure why this particular configuration works and why it must be in brackets
	{
		FMOD_DSP_DESCRIPTION dspdesc;
		memset(&dspdesc, 0, sizeof(dspdesc));
		strncpy_s(dspdesc.name, "LH_captureDSP", sizeof(dspdesc.name));
		dspdesc.version = 0x00010000;
		dspdesc.numinputbuffers = 8;
		dspdesc.numoutputbuffers = 8;
		dspdesc.read = captureDSPReadCallback;
		//dspdesc.userdata = (void*)0x12345678;
		ERRCHECK(pCoreSystem->createDSP(&dspdesc, &mCaptureDSP));
	}
	ERRCHECK(pMasterBusGroup->addDSP(FMOD_CHANNELCONTROL_DSP_TAIL, mCaptureDSP));		//Adds the newly defined dsp

	// Create event instance
	std::cout << "Creating Test Event Instance...";
	ERRCHECK(pSystem->getEvent("event:/Master/Music/TitleTheme", &pEventDescription));
	ERRCHECK(pEventDescription->createInstance(&pEventInstance));
	ERRCHECK(pEventInstance->start());									//Start test audio event...
	ERRCHECK(pEventInstance->release());								//...and release its resources when it stops playing.
}

int main() {

	init();

	/* Start the bot */
	bot.start();

	//Program loop
	while (isRunning) {

		//Update FMOD processes
		pSystem->update();
		
		//Send PCM data to D++, if applicable
		//Evals True when bot is in voice chat
		if (isConnected) {

			//protect all data in this block. Not usually a problem, but all the same.
			std::lock_guard lk(pcmDataMutex);
			
			//If larger than the amount needed for an Opus packet
			if (myPCMData.size() > dpp::send_audio_raw_max_length) {
				while (myPCMData.size() > dpp::send_audio_raw_max_length) {
					currentClient->send_audio_raw(myPCMData.data(), dpp::send_audio_raw_max_length);		//Send usable chunk of main buffer
					myPCMData.erase(myPCMData.begin(), myPCMData.begin() + dpp::send_audio_raw_max_length);	//Trim main buffer of the data just sent
				}
			}
		} 
		Sleep(10);
	}

	// Program quit. We never actually reach here as it stands,
	// but we'll deal with proper protocol for this when it all actually works.
	pMasterBusGroup->removeDSP(mCaptureDSP);
	mCaptureDSP->release();

	//Unload and release System
	pSystem->unloadAll();
	pSystem->release();

	return 0;
}

Wanted to give an update to this thread with some tests I’ve run.

  • When listening to system playback (i.e. filling the outbuffer with the same samples as the inbuffer instead of setting them to 0.0f), and filling the myPCMData vector which exists on the main thread, the sound stutters.
  • When filling the local vector (pcmdata) but never passing it to the buffer on the main thread (myPCMData), everything works great, though obviously the data never makes it to the main thread so it doesn’t accomplish the goal.
  • If I don’t touch the outbuffer at all, only pass the data of inbuffer to the main thread (where I converted it and added it to the main buffer), and return FMOD_ERR_DSP_SILENCE, the sound from the bot stutters. This is about as efficient as I can think to make the process to put minimal strain on the FMOD Mixer thread.

Based on the above, the bottleneck seems to be getting the PCM data out of that function to the main thread, regardless of mutex use or what type it was in (float, int, etc), but I don’t know of any more effective ways to do that. Surely it’s too much to put in the DSP’s User Data and access from elsewhere? How exactly do these callback functions work?
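One common way to hand audio from a real-time callback to another thread without a mutex is a single-producer/single-consumer ring buffer. The sketch below is only an illustration of that idea, not FMOD or D++ code: `PcmRingBuffer` and its methods are hypothetical names, and it assumes exactly one thread pushes (the DSP read callback) and exactly one thread pops (the main loop).

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical lock-free ring buffer for ONE producer thread and ONE consumer thread.
// Storage is allocated once up front, so push() and pop() never lock or allocate.
class PcmRingBuffer {
public:
    // One slot is kept empty so that "full" and "empty" can be told apart.
    explicit PcmRingBuffer(std::size_t capacity) : buffer_(capacity + 1) {
        assert(capacity > 0);
    }

    // Producer side (e.g. the mixer thread). Copies up to `count` samples
    // and returns how many actually fit; the rest are dropped by the caller.
    std::size_t push(const std::uint16_t* src, std::size_t count) {
        const std::size_t n = buffer_.size();
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        const std::size_t space = (tail + n - head - 1) % n;
        const std::size_t toCopy = count < space ? count : space;
        for (std::size_t i = 0; i < toCopy; ++i)
            buffer_[(head + i) % n] = src[i];
        head_.store((head + toCopy) % n, std::memory_order_release);
        return toCopy;
    }

    // Consumer side (e.g. the main loop). Copies up to `count` samples
    // into `dst` and returns how many were available.
    std::size_t pop(std::uint16_t* dst, std::size_t count) {
        const std::size_t n = buffer_.size();
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        const std::size_t available = (head + n - tail) % n;
        const std::size_t toCopy = count < available ? count : available;
        for (std::size_t i = 0; i < toCopy; ++i)
            dst[i] = buffer_[(tail + i) % n];
        tail_.store((tail + toCopy) % n, std::memory_order_release);
        return toCopy;
    }

    // Number of samples currently buffered (a snapshot; may change immediately).
    std::size_t size() const {
        const std::size_t n = buffer_.size();
        const std::size_t head = head_.load(std::memory_order_acquire);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        return (head + n - tail) % n;
    }

private:
    std::vector<std::uint16_t> buffer_;
    std::atomic<std::size_t> head_{0};  // next write index, owned by the producer
    std::atomic<std::size_t> tail_{0};  // next read index, owned by the consumer
};
```

With something like this, the callback would only call `push()` and the main loop would only call `pop()`, so neither side waits on the other. The capacity would need to cover at least a few mixer blocks plus one outgoing packet.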

(Github link I forgot before with full context)

Hi,

Thank you for sharing the project! This will be invaluable to a lot of our users!

I have cloned your repo and am trying to run it on my machine. Unfortunately, I am getting a copy /y command exited with code 4 error. Is there any additional setup I need? To confirm, I add the BOT_TOKEN on line 20 of SessionMusicBot.cpp?

I hope we can get this working!

Hi!

Apologies for those issues, I didn’t anticipate anybody cloning my repo yet so I haven’t fully cleaned up my code or worked out the kinks between machines. The error you’re encountering appears to be part of the Post-Build step, which simply copies the DLL files for D++ and FMOD, as well as the token.config file, to your Output directory. I’m not great with Macros yet for variable build paths, so make sure all the paths evaluate correctly (example below for 64-bit Debug). If you have a different FMOD API install location, that could do it, but if you don’t have the Bot Token portion set up that may also be failing.

copy /y "$(ProjectDir)dependencies\64\debug\bin\*.dll" "$(OutDir)"    #D++
copy /y "C:\Program Files (x86)\FMOD SoundSystem\FMOD Studio API Windows\api\core\lib\x64\*.dll" "$(OutDir)"    #FMOD Core
copy /y "C:\Program Files (x86)\FMOD SoundSystem\FMOD Studio API Windows\api\studio\lib\x64\*.dll" "$(OutDir)"    #FMOD Studio
for /r "$(MSBuildProjectDirectory)\soundbanks" %%D in (*) do xcopy /y /d /f /i "%%D" "$(OutDir)soundbanks\"    #Sound banks and loose files
xcopy /f /d /k $(MSBuildProjectDirectory)\token.config $(OutDir)token.config    #Discord Bot Token

For the last line, you’ll need to find the file at MyBot\token.config.example, rename it to token.config, and replace all text it contains with your Discord Bot Token. That token is uniquely generated by Discord and can be set up using this guide. Then add the bot to a server, build and run the program, and give it a go with the slash commands in a voice chat.

Thank you for the info, I got it running and connected to a Discord server and was able to reproduce the issue.

Here are a couple of things to try:

  1. Instead of adding the sample data in the captureDSPReadCallback to the intermediate buffer pcmdata, add it directly to myPCMData, which is more efficient.
  2. I found that removing the std::lock_guard lk(pcmDataMutex); slightly reduced the stuttering, as the locking was causing the mixer thread to stall too.
  3. Implement a form of drift compensation. An example of this can be found here: Unity Integration | Scripting Examples: Video Playback. The sample data passed from the captureDSPReadCallback corresponds to the SamplesFramesAvailable buffer, and the amount of PCM data passed to currentClient->send_audio_raw() corresponds to mTotalSamplesRead.
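As a rough illustration of the bookkeeping behind drift compensation, the sketch below tracks how many samples have been written by the producer and read by the consumer, smooths the difference, and reports whether the consumer is falling behind or running ahead. It is not taken from the linked Unity example: `DriftTracker`, its members, the 0.9/0.1 smoothing weights, and the threshold are all assumptions chosen for illustration, and what to do with the result (drop samples, pad with silence, resample) is left open.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper that watches the gap between produced and consumed samples.
struct DriftTracker {
    std::uint64_t totalWritten = 0;  // samples added by the capture side
    std::uint64_t totalRead = 0;     // samples handed to the playback side
    double smoothedLatency;          // running average of buffered samples
    double targetLatency;            // how many samples we would like buffered
    double threshold;                // tolerated deviation before reacting

    DriftTracker(double target, double tolerance)
        : smoothedLatency(target), targetLatency(target), threshold(tolerance) {
        assert(target > 0.0 && tolerance >= 0.0);
    }

    void onWrite(std::uint64_t samples) { totalWritten += samples; }
    void onRead(std::uint64_t samples) { totalRead += samples; }

    // Returns +1 if too much audio is buffered (consume faster or drop),
    // -1 if too little is buffered (consume slower or pad), 0 if within tolerance.
    int update() {
        const double latency =
            static_cast<double>(totalWritten) - static_cast<double>(totalRead);
        // Exponential smoothing so a single late block does not trigger a correction.
        smoothedLatency = 0.9 * smoothedLatency + 0.1 * latency;
        if (smoothedLatency > targetLatency + threshold) return +1;
        if (smoothedLatency < targetLatency - threshold) return -1;
        return 0;
    }
};
```

In a setup like the one in this thread, `onWrite` would be called from the capture callback and `onRead` after each packet is sent, with `update()` polled once per main-loop iteration.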

If you have any questions about implementing drift compensation, please do not hesitate to ask.

Thank you for the suggestions! I’ve started a new branch to toy around with these changes, but I’m not sure exactly how to implement drift compensation here. While I can control the frequency at which I send packets to D++, they have to contain the same number of samples each time, and the playback in Discord from there is out of my hands. In the example script, a single sound’s playback can be sped up and slowed down, but I’m not sure I have that flexibility, as the input is the entire 48 kHz output from FMOD Studio.

That said, some findings and changes from the linked branch:

  • Implemented changes 1 & 2 in your list for efficiency. No more intermediate buffer, nor mutex locking, which is still fairly memory safe because:
  • I now update the FMOD Studio system after sending off packets and just before Sleeping in the main loop. This allows FMOD and the main thread to take turns and not fight, increasing performance and breathing space for both, a gap increased by sleeping for 20 ms instead of 10.
  • I also reworked the callback function to account for mono and stereo inputs. Because D++ expects interleaved stereo, a mono input will simply have its samples added to the buffer twice per-sample. This happens sometimes right at the beginning of playback, or sometimes with mono inputs.
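The mono/stereo handling described in the last bullet could look roughly like the sketch below. The names `floatToPcm16` and `appendAsStereo` are hypothetical and not from the project, the conversion assumes FMOD-style float samples in the range -1 to 1, and it uses signed 16-bit values; channels beyond the first two are ignored.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical conversion: clamp a float sample to [-1, 1] and scale to signed 16-bit.
inline std::int16_t floatToPcm16(float sample) {
    const float clamped = std::clamp(sample, -1.0f, 1.0f);
    return static_cast<std::int16_t>(clamped * 32767.0f);
}

// Appends `frames` frames from an interleaved float buffer to `out`
// as interleaved stereo. A mono input is written twice per frame (L = R).
void appendAsStereo(const float* in, unsigned int frames, int inChannels,
                    std::vector<std::int16_t>& out) {
    assert(inChannels >= 1);
    const unsigned int stride = static_cast<unsigned int>(inChannels);
    out.reserve(out.size() + frames * 2u);
    for (unsigned int f = 0; f < frames; ++f) {
        const float left = in[f * stride];
        const float right = (stride >= 2u) ? in[f * stride + 1u] : left;
        out.push_back(floatToPcm16(left));
        out.push_back(floatToPcm16(right));
    }
}
```

Note that `reserve` and `push_back` can still allocate; in a real-time callback a preallocated buffer would avoid that.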

So, FMOD is working well enough now, consistently claiming to spit out 96k samples per second (which I’m not sure about, unless FMOD’s sample rate is per channel, since this is running in stereo). For some reason, though, the amount of audio received by D++ doesn’t match the duration of the samples it’s getting. D++ is being given 11520 samples with each chunk (necessary to convert the audio to Opus packets), which in stereo at 48 kHz should be 0.12 seconds of audio; instead, it reports receiving only 0.06.
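The arithmetic behind those figures can be checked with a small sketch. A sample rate counts frames per second, that is, samples per channel, so interleaved stereo at 48 kHz carries 96000 individual samples per second. The function names below are made up for illustration.

```cpp
#include <cstddef>

// Duration in seconds of a block of interleaved samples.
constexpr double secondsOfAudio(std::size_t interleavedSamples,
                                int sampleRate, int channels) {
    return static_cast<double>(interleavedSamples) /
           (static_cast<double>(sampleRate) * channels);
}

// Individual samples per second across all channels.
constexpr std::size_t interleavedSamplesPerSecond(int sampleRate, int channels) {
    return static_cast<std::size_t>(sampleRate) * static_cast<std::size_t>(channels);
}

// 48 kHz stereo: 48000 frames/s * 2 channels = 96000 samples/s.
static_assert(interleavedSamplesPerSecond(48000, 2) == 96000,
              "stereo at 48 kHz is 96000 interleaved samples per second");
```

By the same formula, 11520 interleaved samples come to 11520 / 96000 = 0.12 s, and a reported 0.06 s corresponds to 5760 interleaved samples, exactly half.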

Clearly there’s some mismatch here that I’m not fully understanding. I will discuss with the D++ devs, but I’m hopeful the solution is near, I just can’t wrap my head around it yet.

(Not necessarily an FMOD solution, but please read the thread for context)

Last update: I DID have a misunderstanding with D++, or more specifically with Opus.

Opus expects 16-bit signed integer PCM data. But because it’s a C library, it wants that data cast to unsigned char, which is 8 bits each. I assumed Opus wanted 11520 samples per frame, but instead it wanted 11520 bytes, so only half of what I expected was sent with each Opus frame before I deleted the full amount from the PCM buffer.
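A minimal sketch of the fix, assuming the send call takes a length in bytes as described above: derive the number of 16-bit samples from the byte length, and erase exactly that many samples per packet. `PacketSink` and `drainPackets` are hypothetical names; the sink stands in for a call such as `send_audio_raw`, whose exact signature is not reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-in for the real send call: a pointer plus a length in BYTES.
using PacketSink =
    std::function<void(const std::uint16_t* data, std::size_t lengthInBytes)>;

// Sends every whole packet the buffer holds, then removes exactly the
// samples that were sent. Returns the number of packets sent.
std::size_t drainPackets(std::vector<std::uint16_t>& pcm, std::size_t packetBytes,
                         const PacketSink& sink) {
    assert(packetBytes >= sizeof(std::uint16_t));
    assert(packetBytes % sizeof(std::uint16_t) == 0);
    // 11520 bytes of 16-bit PCM is 5760 samples, not 11520.
    const std::size_t samplesPerPacket = packetBytes / sizeof(std::uint16_t);

    std::size_t offset = 0;
    std::size_t packets = 0;
    while (pcm.size() - offset >= samplesPerPacket) {
        sink(pcm.data() + offset, packetBytes);
        offset += samplesPerPacket;
        ++packets;
    }
    // One erase at the end instead of one per packet avoids repeated shifting.
    pcm.erase(pcm.begin(), pcm.begin() + static_cast<std::ptrdiff_t>(offset));
    return packets;
}
```

For example, with 12000 samples buffered and a packet length of 11520 bytes, this would send two packets of 5760 samples each and leave 480 samples in the buffer.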

Thank you sincerely for the help, and hopefully this thread can help somebody in the future!
