Hi there!
My goal is to generate text-to-speech audio, play it via FMOD, and drive lipsync from the audio data. So the first step is: once the audio has been generated, I play it via FMOD:
private void PlayFMODSound(Sound sound)
{
    _fmodBus.lockChannelGroup();

    // Flush commands to make sure the bus is loaded before getting the channel group;
    // otherwise an error is thrown and FMOD enters an unstable state.
    FMODUnity.RuntimeManager.StudioSystem.flushCommands();
    _fmodBus.getChannelGroup(out var channelGroup);

    FMOD.RESULT result = RuntimeManager.CoreSystem.playSound(sound, channelGroup, false, out var channel);
    if (result != FMOD.RESULT.OK)
    {
        Debug.LogError("Error playing FMOD sound: " + result);
        _fmodBus.unlockChannelGroup();
        return;
    }

    _currentPlaybackChannel = channel;
    OnFMODChannelGroupPlaybackStarted?.Invoke(channelGroup);
}
_fmodBus is an FMOD.Studio.Bus that I set up like this at startup:
_fmodBus = RuntimeManager.GetBus("bus:/Dialogue/Character");
As soon as playback stops, I unlock the bus again (roughly like the sketch below). So far, so good.
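For context, the end-of-playback detection looks roughly like this (a sketch; WatchForPlaybackEnd is a hypothetical helper name of mine, and the callback delegate is kept in a member variable so it isn't garbage collected):

// Kept in a member variable so the delegate isn't garbage collected.
private FMOD.CHANNELCONTROL_CALLBACK _channelEndCallback;

private void WatchForPlaybackEnd(FMOD.Channel channel)
{
    _channelEndCallback = ChannelCallback;
    channel.setCallback(_channelEndCallback);
}

[AOT.MonoPInvokeCallback(typeof(FMOD.CHANNELCONTROL_CALLBACK))]
private static FMOD.RESULT ChannelCallback(IntPtr channelControl, FMOD.CHANNELCONTROL_TYPE controlType, FMOD.CHANNELCONTROL_CALLBACK_TYPE callbackType, IntPtr commandData1, IntPtr commandData2)
{
    if (controlType == FMOD.CHANNELCONTROL_TYPE.CHANNEL && callbackType == FMOD.CHANNELCONTROL_CALLBACK_TYPE.END)
    {
        // Playback finished. This runs on FMOD's thread, so I only set a flag here
        // and call _fmodBus.unlockChannelGroup() from the main thread.
    }
    return FMOD.RESULT.OK;
}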
Next, I need to analyse the audio and feed that analysis to Salsa lipsync. For that, I created a custom DSP that I’m adding to the channel group I got when playing the sound above.
First, I create the custom DSP (similar to how it is showcased in the FMOD examples):
private void InitializeCustomDSP()
{
    // Assign the callback to a member variable so the delegate isn't garbage collected.
    _readCallback = CustomDspReadCallback;

    // Allocate a data buffer large enough for 8 channels.
    FMODUnity.RuntimeManager.CoreSystem.getDSPBufferSize(out uint bufferLength, out int numBuffers);
    _dataBuffer = new float[bufferLength * 8];
    _bufferLength = bufferLength;

    // Keep a handle to this object so it can be recovered from userdata inside the callback.
    _objHandle = GCHandle.Alloc(this);
    if (_objHandle.IsAllocated) // GCHandle is a struct, so compare via IsAllocated rather than null
    {
        FMOD.DSP_DESCRIPTION dspDesc = new FMOD.DSP_DESCRIPTION();
        dspDesc.numinputbuffers = 1;
        dspDesc.numoutputbuffers = 1;
        dspDesc.read = _readCallback; // use the stored delegate, not the method group
        dspDesc.userdata = GCHandle.ToIntPtr(_objHandle);

        var result = RuntimeManager.CoreSystem.createDSP(ref dspDesc, out _customDsp);
        if (result != FMOD.RESULT.OK)
        {
            Debug.LogError("Error creating custom DSP: " + result);
            return;
        }
    }
}
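The read callback itself follows the pattern from FMOD's DSP capture scripting example: recover the object from the userdata pointer, copy the samples out for analysis, and write the input through to the output unchanged. Roughly like this (a sketch; DialogueLipsync stands in for the class all of this lives in, and it needs using System.Runtime.InteropServices;):

[AOT.MonoPInvokeCallback(typeof(FMOD.DSP_READCALLBACK))]
private static FMOD.RESULT CustomDspReadCallback(ref FMOD.DSP_STATE dspState, IntPtr inBuffer, IntPtr outBuffer, uint length, int inChannels, ref int outChannels)
{
    // Recover the managed object from the userdata pointer set in createDSP.
    var functions = (FMOD.DSP_STATE_FUNCTIONS)Marshal.PtrToStructure(dspState.functions, typeof(FMOD.DSP_STATE_FUNCTIONS));
    functions.getuserdata(ref dspState, out IntPtr userData);
    var obj = (DialogueLipsync)GCHandle.FromIntPtr(userData).Target;

    int sampleCount = (int)length * inChannels;

    // Copy the samples out so they can be analyzed (and fed to Salsa) later...
    Marshal.Copy(inBuffer, obj._dataBuffer, 0, sampleCount);

    // ...and pass the audio through unchanged so it remains audible.
    Marshal.Copy(obj._dataBuffer, 0, outBuffer, sampleCount);

    return FMOD.RESULT.OK;
}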
Once the audio playback starts and I have the channel group, I add the DSP:
_channelGroup.Value.addDSP(FMOD.CHANNELCONTROL_DSP_INDEX.HEAD, _customDsp);
Now to the part that I don’t understand. My goal is to analyze the raw audio before effects like a spatializer are applied to the signal (the lips should not move differently just because the character is further away). To my understanding, an index of “HEAD” should do exactly that. But it behaves the other way around: when I add the DSP at “HEAD”, the values it receives have already been processed by all the DSPs added via FMOD Studio, for instance a delay effect that I added to the bus just for testing purposes. When I add the custom DSP at “TAIL”, however, it is not affected by that effect (just as I want). Interestingly, the standard fader then also does not affect the values received by the custom DSP, unless the volume is set to minus infinity, in which case it is affected (the lips don’t move). I suspect the default fader is treated in some special way.
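In case it helps, this is roughly how I've been inspecting the order of the chain (a quick sketch):

private void DumpDspChain(FMOD.ChannelGroup channelGroup)
{
    channelGroup.getNumDSPs(out int numDsps);
    for (int i = 0; i < numDsps; i++)
    {
        channelGroup.getDSP(i, out FMOD.DSP dsp);
        dsp.getInfo(out string name, out _, out _, out _, out _);
        // Index 0 is the DSP at the HEAD of the chain.
        Debug.Log("DSP " + i + ": " + name);
    }
}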
Anyway, I am wondering why “TAIL” behaves the way I would expect “HEAD” to function; from all the documentation I’ve seen, “HEAD” should be the right index for what I want to achieve.
Hope someone can point me in the right direction!
// edit: Hmm, I just realized that I can’t add a spatializer effect to an entire channel group; that seems to work only for individual events. But I can’t work with events when I have a procedural sound like this, can I?