How to get waveform data for lipsync when audio is muted?

So I have separate volume controls for various buses in game.

If the player sets the volume of the voice bus to zero, the DSP won't have anything to process.

What’s the correct setup I should be using here? Should I set up a bus that the user cannot control for raw voice that feeds into one of the controllable ones? Is there a way to hook a DSP directly to the underlying events?

Also, a second question: is there a way to automatically collapse multiple channels in the DSP buffer? If I am reading a 5.1 setup from a 2D stereo event, it distributes the two real channels across all of the channels, so the buffer ends up 4/6 empty. That's a lot of wasted processing if I need to manually copy through all of that.

Third question:
Working from your example code: https://www.fmod.com/docs/2.02/unity/examples-dsp-capture.html

I'm getting 2-channel stereo output, and this is the part I modified:

        // Copy the incoming buffer to process later
        int lengthElements = (int)length * inchannels;
        Marshal.Copy(inbuffer, obj.mDataBuffer, 0, lengthElements);

        // Copy the inbuffer to the outbuffer so we can still hear it
        Marshal.Copy(obj.mDataBuffer, 0, outbuffer, lengthElements);

If I modify this so that it only copies length (not length * inchannels) to the outbuffer, then instead of getting audio from only one ear, I get very garbled audio in both. What's happening here?

Hi,

Correct. I will note that if the parent bus has its volume set to 0.0f, its child buses and events will be virtualized and the data will stop. So I suggest setting the volume to 0.0001f to keep the data processing.
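For example, keeping a bus processing while effectively silent could look something like this (a minimal sketch, assuming the FMOD for Unity integration; "bus:/Voice" is a hypothetical bus path):

    // Sketch: keep the voice bus effectively silent without virtualizing it.
    // A volume of exactly 0.0f would virtualize its events and stop the data,
    // so use a near-zero value instead.
    FMOD.Studio.Bus voiceBus = FMODUnity.RuntimeManager.GetBus("bus:/Voice");
    voiceBus.setVolume(0.0001f);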

Yes, you could add a DSP to individual event instances using EventInstance::getChannelGroup() (FMOD Engine | Studio API Reference - Studio::EventInstance), which gives you the ChannelGroup to add the DSP to. Keep in mind that if the parent bus is muted, the event will still be virtualized.
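For instance, something along these lines (a rough sketch; the event path and DSP type are placeholders, and error handling is minimal):

    // Sketch: attach a DSP to a single event instance's ChannelGroup.
    var instance = FMODUnity.RuntimeManager.CreateInstance("event:/Dialogue/Line01");
    instance.start();

    // The ChannelGroup is only valid once the instance has actually been
    // created in the mixer, so check the result before using it
    if (instance.getChannelGroup(out FMOD.ChannelGroup group) == FMOD.RESULT.OK)
    {
        FMODUnity.RuntimeManager.CoreSystem.createDSPByType(FMOD.DSP_TYPE.FFT, out FMOD.DSP dsp);
        group.addDSP((int)FMOD.CHANNELCONTROL_DSP_INDEX.HEAD, dsp);
    }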

Yes, this can be done in Studio. You can control the mixing in the event on either the master track or the audio track.

Could you elaborate on the desired behavior here?

So I did the collapse via DSP with: m_DSP.setChannelFormat( 0, 1, SPEAKERMODE.MONO );
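For context, the surrounding setup looks roughly like this (a sketch; mDSPDescription and mChannelGroup are my stand-ins for the capture example's description struct and whatever group the DSP gets attached to):

    // Create the capture DSP, force its buffer format to mono so the
    // callback receives a downmixed signal, then attach it
    FMODUnity.RuntimeManager.CoreSystem.createDSP(ref mDSPDescription, out m_DSP);
    m_DSP.setChannelFormat(0, 1, FMOD.SPEAKERMODE.MONO);
    mChannelGroup.addDSP((int)FMOD.CHANNELCONTROL_DSP_INDEX.HEAD, m_DSP);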

Is this collapse (I've learned this is called downmixing) isolated to the DSP, or am I setting the format on the underlying channel?

Forgive me, I know nothing about low level audio. This is all black magic to me. I had never heard of DSP a week ago.


“Could you elaborate on the desired behavior here?”

No particular behavior is desired; I just would have expected that if I copy half of the input (2 channels, sequential in the array) to the output, one channel would be muted and the other would play normally. Instead it's garbled. I was trying to understand what was happening.


I'm a little concerned about the volume issue. For lipsync / facial animation, is there a more reliable method than setting the volume to 0.0001? Needing to multiply by 1/0.0001 to bring the samples back to a consistent baseline for the calculations will introduce a lot of floating-point inaccuracy, and many of the calculations are quite sensitive.

I'd also like to do some bulk processing to build voice profiles. Obviously I don't want to sit there and play 1000 audio clips in realtime, and ideally the results should match runtime values as closely as possible. Is there a reasonably easy way to do this?

I'm pretty new to audio and my C++ is very weak, but I am a programmer by trade.

Hi,

This will be isolated to the DSP; the ChannelGroup that the DSP is attached to will maintain its channel configuration. You can check this by connecting to Live Update (Unity Integration | User Guide - Connecting Using Live Update) and checking the Master Bus mix on the far right of the Mixer (FMOD Studio | Mixing - Anatomy of the Mixer).
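If you would rather check it from code, something like this should work (a sketch; group is assumed to be the ChannelGroup the DSP was added to):

    // Compare the DSP's forced format against the head DSP of the group
    mDSP.getChannelFormat(out FMOD.CHANNELMASK dspMask, out int dspChannels, out FMOD.SPEAKERMODE dspMode);

    group.getDSP((int)FMOD.CHANNELCONTROL_DSP_INDEX.HEAD, out FMOD.DSP head);
    head.getChannelFormat(out FMOD.CHANNELMASK headMask, out int headChannels, out FMOD.SPEAKERMODE headMode);

    // dspChannels will be 1 (mono) while headChannels keeps the
    // group's original configuration
    UnityEngine.Debug.Log($"DSP: {dspChannels} channels, group head: {headChannels} channels");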

The data buffer is not sequential (i.e. all of the left channel's samples followed by all of the right channel's); rather, it is interleaved, one sample per channel per frame:

[L0, R0, L1, R1, L2, R2, ...]

So rather than getting all of the first channel's data, you are getting half of the buffer, which contains a mix of both channels. Hope this makes sense.
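For example, pulling a single channel out of the interleaved buffer could be done with a small helper like this (hypothetical, not part of the example script):

    // Hypothetical helper: copy one channel out of an interleaved buffer.
    // For stereo, channel 0 is left and channel 1 is right.
    static float[] ExtractChannel(float[] interleaved, int numChannels, int channel)
    {
        int frames = interleaved.Length / numChannels;
        float[] samples = new float[frames];
        for (int f = 0; f < frames; f++)
        {
            // Frame f starts at f * numChannels; the channel's sample sits
            // at a fixed offset within each frame
            samples[f] = interleaved[f * numChannels + channel];
        }
        return samples;
    }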

In your suggestion:

If I understand correctly, you should be able to do all of the calculations on the bus (child bus) feeding into the player-controlled bus (parent bus); the child bus won't have any volume changes applied to it, so the values shouldn't be affected. Let me know if I am misunderstanding the situation.
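Something along these lines (a sketch; the bus paths are hypothetical and mCaptureDSP is assumed to already exist):

    // Attach the capture DSP to a child bus that sits before the
    // player-controlled volume
    FMOD.Studio.Bus rawVoice = FMODUnity.RuntimeManager.GetBus("bus:/Voice/VoiceRaw");

    // lockChannelGroup forces the bus's ChannelGroup to be created even
    // if nothing is playing on it yet
    rawVoice.lockChannelGroup();
    FMODUnity.RuntimeManager.StudioSystem.flushCommands();

    if (rawVoice.getChannelGroup(out FMOD.ChannelGroup rawGroup) == FMOD.RESULT.OK)
    {
        // The player's volume slider only ever touches the parent
        // "bus:/Voice", so this DSP sees unattenuated samples
        rawGroup.addDSP((int)FMOD.CHANNELCONTROL_DSP_INDEX.HEAD, mCaptureDSP);
    }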

Thank you for the explanation. I'm still a bit confused, though, given the example code:


    void Update()
    {
        // Do what you want with the captured data

        if (mChannels != 0)
        {
            float yOffset = 5.7f;

            for (int j = 0; j < mChannels; j++)
            {
                var pos = Vector3.zero;
                pos.x = WIDTH * -0.5f;
                for (int i = 0; i < mBufferLength; ++i)
                {
                    pos.x += (WIDTH / mBufferLength);
                    pos.y = mDataBuffer[i + j * mBufferLength] * HEIGHT;
                    // Make sure Gizmos is enabled in the Unity Editor to show debug line draw for the captured channel data
                    Debug.DrawLine(new Vector3(pos.x, yOffset + pos.y, 0), new Vector3(pos.x, yOffset - pos.y, 0), Color.green);
                }
                yOffset -= 1.9f;
            }
        }
    }

This loops through the copied buffer sequentially, separating each channel vertically. Is this example oversimplifying the layout or is there some other black magic happening?

You're right, sorry, I confused myself. I have the dialog bus parented only to master now, and setting master to 0.0001 has no effect on dialog, so that'll be OK. It's not ideal that it's never actually muted, but that's not too bad.


I'm gonna guess that bulk capture isn't possible and that DSPs are only really set up to run against a live feed. Kinda sucks, but it is what it is, I guess.


Are there any docs on how DSPs work and what exactly they all do?

I tried doing a bunch of parameter grabs, and the results seem to change depending on the order in which things are requested. For example, if I call DSP.getMeteringInfo( IntPtr.Zero, out var meteringInfo ) after grabbing a DSP_LOUDNESS_METER_INFO_TYPE, the RMS values are different from when I don't grab the INFO type. It appears to reflect the raw data unless I pull the info type, in which case it is based on the loudness. Or at least that's my best guess.
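Roughly what I'm doing (a sketch from memory, error checks omitted; mDSP is the loudness meter DSP, and the enum/struct names are as I understand them from the C# wrapper):

    // First read: appears to reflect the raw signal
    mDSP.getMeteringInfo(IntPtr.Zero, out FMOD.DSP_METERING_INFO before);

    // Pull the loudness info parameter...
    mDSP.getParameterData((int)FMOD.DSP_LOUDNESS_METER.INFO, out IntPtr data, out uint length);
    var info = Marshal.PtrToStructure<FMOD.DSP_LOUDNESS_METER_INFO_TYPE>(data);

    // ...after which the RMS values here no longer match the first read
    mDSP.getMeteringInfo(IntPtr.Zero, out FMOD.DSP_METERING_INFO after);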

Is there a doc on how all of this works, outside of the API docs, which are clearly aimed at people who already know what they're doing?

Thank you very much for your help so far.


EDIT: Also, when creating a DSP I occasionally get an error: NotSupportedException: Delegates cannot be marshalled from native code into a domain other than their home domain. I'm doing this from Unity's Update/Start/Awake functions. Any suggestions for preventing this?


Hi,
The scripting example is wrong; I have made a task to fix it. Thank you for bringing this to our attention. You are correct that it is going through the buffer sequentially; however, it should not be. Here is the updated code:

const float WIDTH = 0.01f;
const float HEIGHT = 10.0f;

void Update()
{
    // Do what you want with the captured data

    if (mChannels != 0)
    {
        // The buffer is interleaved, so step through it one frame
        // (i.e. mChannels samples) at a time
        for (int i = 0; i < mBufferLength; i += mChannels)
        {
            float yOffset = 5.7f;
            for (int j = 0; j < mChannels; j++)
            {
                var pos = Vector3.zero;
                pos.x = (i + j) * WIDTH;
                // Channel j's sample for this frame sits at offset j within the frame
                pos.y = mDataBuffer[i + j] * HEIGHT;
                // Make sure Gizmos is enabled in the Unity Editor to show debug line draw for the captured channel data
                Debug.DrawLine(new Vector3(pos.x, yOffset + pos.y, 0), new Vector3(pos.x, yOffset - pos.y, 0), Color.green);
                yOffset -= 1.9f;
            }
        }
    }
}

Apologies for the confusion.

I am still a bit confused about what you are wanting to do with the bulk processing; could you explain further?

Yes, we have a glossary entry here: FMOD Engine | Glossary - DSP, and a white paper outlining how to use a DSP here: FMOD Engine | White Papers - DSP Architecture and Usage.

Yes, this can be caused by interacting with a Unity GCHandle after it has been freed. If you are following the DSP Capture example, this would be the objHandle. Check that it is not being freed until the event has stopped, so that the CaptureDSPReadCallback isn't being called anymore. If you have made changes to the example script, would it be possible to see your script?
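For reference, a teardown order that avoids the freed-handle case could look like this (a sketch using the capture example's names; mChannelGroup stands in for whatever group the DSP was added to):

    void OnDestroy()
    {
        if (mCaptureDSP.hasHandle())
        {
            // Detach and release the DSP first so CaptureDSPReadCallback
            // can no longer fire...
            mChannelGroup.removeDSP(mCaptureDSP);
            mCaptureDSP.release();
        }

        // ...and only then free the handle that the callback dereferences
        if (mObjHandle.IsAllocated)
        {
            mObjHandle.Free();
        }
    }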