Implementing Vivox voice chat and Fmod in Unreal Engine

Luisjakg · March 14, 2024, 7:42pm

I’ve been searching for guidance on integrating Vivox voice chat with Fmod within Unreal Engine 5. While I stumbled upon various forum discussions on the topic, I haven’t been able to find definitive solutions. Any assistance or pointers in the right direction would be immensely valued!

Connor_FMOD · March 18, 2024, 3:48am

Hi,

There is currently a thread discussing something similar here: Trying to play a Voice Data stream using a ByteArray in Unreal Engine 5 - #5 by RobPotter.

We are still working out some poping issues but hopefully, it will point you in the right direction. If there are any updates I will continue to update that thread.

Luisjakg · March 20, 2024, 11:47pm

Thanks for the directions! After following that thread and some of the steps I think I’m getting a little bit closer, however I currently only hear static whenever I try to talk. I am using the following callbacks in order to get voice data from the Vivox SDK:

This callback is the most appropriate to inject audio to replace the captured audio:

void AfterCaptureReadCallback(
	void* callback_handle,
	const char* session_group_handle,
	const char* initial_target_uri,
	short* pcm_frames,
	int pcm_frame_count,
	int audio_frame_rate,
	int channels_per_frame
)
{
	// Access and process captured audio data in pcm_frames
	if (GNew_PCM_Frames != nullptr)
	{
		*pcm_frames = *GNew_PCM_Frames;
	}

	// PrintString(FString::FromInt(pcm_frames[0]));
	// Example: Log audio data
	// ...
}

and

This callback is the most appropriate for recording applications that are designed to capture a player’s speech:

void BeforeCaptureSentCallback(
	void* callback_handle,
	const char* session_group_handle,
	const char* initial_target_uri,
	short* pcm_frames,
	int pcm_frame_count,
	int audio_frame_rate,
	int channels_per_frame,
	int is_speaking
)
{
	// Access and process captured audio data in pcm_frames

	if (GBuffer.IsEmpty())
	{
		PrintString("Initializing GBuffer...");
		GBuffer.Init(0, pcm_frame_count);
	}
	
	for (int i = 0; i < pcm_frame_count; i++ )
	{
		GBuffer.Add(pcm_frames[i]);

		// Check if GBuffer has frames that arent in pcm frames
		if (GBuffer.Num() > pcm_frame_count)
		{
			GBuffer.RemoveAt(0, GBuffer.Num() - pcm_frame_count);
		}
	}
	
	// GBuffer.Append(reinterpret_cast<uint8*>(pcm_frames), pcm_frame_count * sizeof(short));
	
	// Example: Log audio data
	// ...
}

Then, following the thread I ended up with the following:

void UVoiceChatSubsystem::SetupFMODSystem()
{
	FMODSystem = IFMODStudioModule::Get().GetStudioSystem(EFMODSystemContext::Runtime);
	FMODSystem->getCoreSystem(&FMODStudioCoreSystem);
	
	DriftThreshold = static_cast<uint32>(SampleRate * DRIFT_MS) / 1000;
	TargetLatency = static_cast<uint32>(SampleRate * LATENCY_MS) / 1000;
	AdjustedLatency = TargetLatency;
	ActualLatency = static_cast<int>(TargetLatency);
	
	SoundExInfo.cbsize = sizeof(SoundExInfo);
	SoundExInfo.format = FMOD_SOUND_FORMAT_PCM16;
	SoundExInfo.defaultfrequency = SampleRate;
	SoundExInfo.numchannels = 1;
	SoundExInfo.length = TargetLatency * sizeof(uint8);
	SoundExInfo.pcmreadcallback = PCMReadCallback;		
}

void UVoiceChatSubsystem::ProcessAudioData()
{
	if (!IsConnectedToChannel) return;
	if (!FMODChannel && TotalSamplesWritten > AdjustedLatency)
	{
		const FMOD_RESULT result = FMODStudioCoreSystem->createSound("PlayerVoice", FMOD_OPENUSER | FMOD_LOOP_NORMAL, &SoundExInfo, &FMODSound);
		FMODStudioCoreSystem->playSound(FMODSound, nullptr, false, &FMODChannel);
	}

	if (GBuffer.Num() > 0 && FMODChannel)
	{
		uint32 readPosition;
		FMODChannel->getPosition(&readPosition, FMOD_TIMEUNIT_PCMBYTES);
		UE_LOG(LogTemp, Warning, TEXT("Channel read position: %d"), readPosition)

		int bytesRead = readPosition - LastReadPosition;
		if (readPosition < LastReadPosition)
		{
			bytesRead += SoundExInfo.length;
		}

		if (bytesRead > 0 && GBuffer.Num() >= bytesRead)
		{
			void* ptr1;
			void* ptr2;
			uint32 len1 , len2;

			FMOD_RESULT result = FMODSound->lock(LastReadPosition, bytesRead, &ptr1, &ptr2, &len1, &len2);

			if (result != FMOD_OK) UE_LOG(LogTemp, Warning, TEXT("VOIP Manager Size: %d"), result);

			int sampleLen1 = static_cast<int>(len1 / sizeof(uint8));
			int sampleLen2 = static_cast<int>(len2 / sizeof(uint8));
			int samplesRead = sampleLen1 + sampleLen2;
			TArray<uint8> TmpBuffer;

			TmpBuffer.Init(0, samplesRead);
			TmpBuffer.SetNum(GBuffer.Num(), false);

			// copy the data from Buffer to tmpBuffer using FMemory::Memcpy
			if (GBuffer.Num() > 0 && samplesRead <= GBuffer.Num())
			{
				FMemory::Memcpy(TmpBuffer.GetData(), GBuffer.GetData(), samplesRead);
			}

			//Remove the copied range from GBuffer
			GBuffer.RemoveAt(0, TmpBuffer.Num());

			if (len1 > 0)
			{
				// Directly copy from the start of TmpBuffer
				FMemory::Memcpy(ptr1, TmpBuffer.GetData(), sampleLen1);
			}
			if (len2 > 0)
			{
				// Calculate the start index for the second copy by offsetting the source pointer
				// Note: This assumes sampleLen1 is the byte offset to start from for the second copy
				uint8* sourcePtrOffset = TmpBuffer.GetData() + sampleLen1;
				FMemory::Memcpy(ptr2, sourcePtrOffset, sampleLen2);
			}

			result = FMODSound->unlock(ptr1, ptr2, len1, len2);

			if (result != FMOD_OK) UE_LOG(LogTemp, Warning, TEXT("VOIP Manager Size: %d"), result);
			LastReadPosition = readPosition;
			TotalSamplesRead += static_cast<uint32>(samplesRead);
		}
	}
	//Drift compensation
	uint32 samplesWritten = GBuffer.Num();

	TotalSamplesWritten += samplesWritten;

	if (samplesWritten != 0 && samplesWritten < MinimumSamplesWritten)
	{
		MinimumSamplesWritten = samplesWritten;
		AdjustedLatency = FMath::Max(samplesWritten, TargetLatency);
	}

	int32 latency = TotalSamplesWritten - TotalSamplesRead;
	ActualLatency = static_cast<uint32>((0.93f * ActualLatency) + (0.03f * latency));

	
	int32 PlaybackRate = SampleRate;
	if (ActualLatency < (AdjustedLatency - DriftThreshold))
	{
		PlaybackRate = SampleRate - static_cast<int32>(SampleRate * (DRIFT_CORRECTION_PERCENTAGE / 100.0f));
	}
	else if (ActualLatency > (AdjustedLatency + DriftThreshold))
	{
		PlaybackRate = SampleRate + static_cast<int32>(SampleRate * (DRIFT_CORRECTION_PERCENTAGE / 100.0f));
	}
	FMODChannel->setFrequency(PlaybackRate);
	
}

FMOD_RESULT F_CALLBACK PCMReadCallback(FMOD_SOUND* sound, void *data, unsigned int datalen)
{
	// Populate New_PCM_Frames with the data from the sound
	return FMOD_OK;
}

I am currently trying to get data from the microphone through vivox’s BeforeCaptureSentCallback, then process it through fmod in order to apply voice filters and such, and then inject the audio data from fmod into vivox’s AfterCaptureReadCallback in order to send the processed audio (with filters) and such to other players.

Im still not sure which parameters of the callback give me access to the audio data,
as it is my first time working with audio streams in general so its a bit confusing.

Sorry for the long post! Any help would be extremely appreciated!

Connor_FMOD · March 21, 2024, 12:37am

Hi,

Good to hear it helped a bit.

Would it be possible to get a copy of your project or a stripped-down version displaying the issue uploaded to your profile? Please note you must register a project with us before uploading files.

Hopefully, we will be able to find a solution for everyone together

Luisjakg · April 3, 2024, 3:44pm

Hey Connor!

Sorry for the wait. I’ve uploaded a stripped-down version of the project that shows the problem. Just a heads up, I took out the Vivox credentials from the VoiceChatSubsystem script which contains all the relevant code, so you’ll need to set it up again.

Thanks again for all your help!

Connor_FMOD · April 9, 2024, 5:07am

Hi,

Unfortunately, I have not been able to find a solution. However, there is a task to improve this workflow and I have noted your interest. Once there are updates I will post them here. Thank you for sharing your project and I apologize I cannot assist further.

Luisjakg · April 11, 2024, 7:06pm

Hi Connor,

No worries! Thanks for the support along the way. In case it might be useful in the future, I believe Odin’s voice chat has documentation on how to integrate its audio with FMOD. I’ll keep working on it, and if I find a working solution, I will share it here.

Connor_FMOD · April 11, 2024, 10:58pm

Hi,

Thank you, that would be a massive help!

Connor_FMOD · April 12, 2024, 6:37am

Hi, I will just post this here: Vivox to Unity OnAudioFilterRead to FMOD programmer sound. Stutters/crackling - #15 by dougmolina.

A user may have found a solution, I have not had time to test it in Unreal Engine yet, but I will continue investigating.