Accessing real-time audio samples of incoming mic capture via DSPCallback

akwerius · March 4, 2022, 11:03pm

Hi everyone,

We are working on a game and want to access the samples arriving from a microphone capture in real time, buffer by buffer. We adapted the custom DSP example from the API documentation and attached the DSP to a channel group that receives the microphone samples in its inbuffer. Here is the code:

// The CustomDSPCallback code was adapted from: https://fmod.com/resources/documentation-unity?version=2.02&page=examples-dsp-capture.html


using System;
using FMODUnity;
using UnityEngine;
using System.Runtime.InteropServices;

public class mic_in : MonoBehaviour
{
    //public variables
    [Header("Capture Device details")]
    public int captureDeviceIndex = 0;
    [TextArea] public string captureDeviceName = null;

    FMOD.CREATESOUNDEXINFO exinfo;
      	
	// Custom DSPCallback variables 
	private FMOD.DSP_READCALLBACK mReadCallback;
    private FMOD.DSP mCaptureDSP;
    public float[] mDataBuffer;
    private GCHandle mObjHandle;
    private uint mBufferLength;
	
	[AOT.MonoPInvokeCallback(typeof(FMOD.DSP_READCALLBACK))]
    static FMOD.RESULT CaptureDSPReadCallback(ref FMOD.DSP_STATE dsp_state, IntPtr inbuffer, IntPtr outbuffer, uint length, int inchannels, ref int outchannels)
    {
        FMOD.DSP_STATE_FUNCTIONS functions = (FMOD.DSP_STATE_FUNCTIONS)Marshal.PtrToStructure(dsp_state.functions, typeof(FMOD.DSP_STATE_FUNCTIONS));

        IntPtr userData;
        functions.getuserdata(ref dsp_state, out userData);

        GCHandle objHandle = GCHandle.FromIntPtr(userData);
        mic_in obj = objHandle.Target as mic_in;

		Debug.Log("inchannels:"+inchannels);
		Debug.Log("outchannels:"+outchannels);
		
        // Copy the incoming buffer to process later
        int lengthElements = (int)length * inchannels;
        Marshal.Copy(inbuffer, obj.mDataBuffer, 0, lengthElements);

        // Copy the inbuffer to the outbuffer so we can still hear it
        Marshal.Copy(obj.mDataBuffer, 0, outbuffer, lengthElements);

        return FMOD.RESULT.OK;
    }
	
    // Start is called before the first frame update
    void Start()
    {        
        // how many capture devices are plugged in for us to use.
	    int numOfDriversConnected;
        int numofDrivers;
	    FMOD.RESULT res = RuntimeManager.CoreSystem.getRecordNumDrivers(out numofDrivers, out numOfDriversConnected);
        
        if(res != FMOD.RESULT.OK)
        {
            Debug.Log("Failed to retrieve driver details: " + res);
            return;
        }

        if(numOfDriversConnected == 0)
        {
            Debug.Log("No capture devices detected!");
            return;
        }
        else
            Debug.Log("You have " + numOfDriversConnected + " capture devices available to record with.");
            

        // info about the device we're recording with.
        System.Guid micGUID;
        FMOD.DRIVER_STATE driverState;
        FMOD.SPEAKERMODE speakerMode;  
        int captureSrate;
        int captureNumChannels;
        RuntimeManager.CoreSystem.getRecordDriverInfo(captureDeviceIndex, out captureDeviceName, 50,
            out micGUID, out captureSrate, out speakerMode, out captureNumChannels, out driverState);
            

        Debug.Log("captureNumChannels of capture device: " + captureNumChannels);
		Debug.Log("captureSrate: " + captureSrate);


        // create sound where capture is recorded
        FMOD.Sound sound;
		exinfo.cbsize = System.Runtime.InteropServices.Marshal.SizeOf(typeof(FMOD.CREATESOUNDEXINFO));
        exinfo.numchannels = captureNumChannels;
        exinfo.format = FMOD.SOUND_FORMAT.PCM16;
        exinfo.defaultfrequency = captureSrate;
        exinfo.length = (uint)captureSrate * sizeof(short) * (uint)captureNumChannels;

        RuntimeManager.CoreSystem.createSound(exinfo.userdata, FMOD.MODE.LOOP_NORMAL | FMOD.MODE.OPENUSER, 
            ref exinfo, out sound);

        // start recording    
        RuntimeManager.CoreSystem.recordStart(captureDeviceIndex, sound, true);

        // play sound on dedicated channel in master channel group
        FMOD.ChannelGroup masterCG;
        FMOD.Channel channel;

        if (FMODUnity.RuntimeManager.CoreSystem.getMasterChannelGroup(out masterCG) != FMOD.RESULT.OK)
            Debug.LogWarningFormat("FMOD: Unable to create a master channel group: masterCG");

        FMODUnity.RuntimeManager.CoreSystem.getMasterChannelGroup(out masterCG);
        RuntimeManager.CoreSystem.playSound(sound, masterCG, true, out channel);
        channel.setPaused(false);

        // Assign the callback to a member variable to avoid garbage collection
        mReadCallback = CaptureDSPReadCallback;

        // Allocate a data buffer large enough for 8 channels
        uint bufferLength;
        int numBuffers;
        FMODUnity.RuntimeManager.CoreSystem.getDSPBufferSize(out bufferLength, out numBuffers);
        mDataBuffer = new float[bufferLength * 8];
        mBufferLength = bufferLength;

		// Tentatively changed buffer length by calling setDSPBufferSize in file Assets/Plugins/FMOD/src/RuntimeManager.cs	
		// Tentatively changed buffer length by calling setDSPBufferSize in file Assets/Plugins/FMOD/src/fmod.cs - line 1150
		
		Debug.Log("buffer length:" + bufferLength);
		
		// Get a handle to this object to pass into the callback
        mObjHandle = GCHandle.Alloc(this);
        if (mObjHandle.IsAllocated)
        {
            // Define a basic DSP that receives a callback each mix to capture audio
            FMOD.DSP_DESCRIPTION desc = new FMOD.DSP_DESCRIPTION();
            desc.numinputbuffers = 1;
            desc.numoutputbuffers = 1;
            desc.read = mReadCallback;
            desc.userdata = GCHandle.ToIntPtr(mObjHandle);

            // Create an instance of the capture DSP and attach it to the master channel group to capture all audio            
            if (FMODUnity.RuntimeManager.CoreSystem.createDSP(ref desc, out mCaptureDSP) == FMOD.RESULT.OK)
            {
                if (masterCG.addDSP(0, mCaptureDSP) != FMOD.RESULT.OK)
                {
                    Debug.LogWarningFormat("FMOD: Unable to add mCaptureDSP to the master channel group");
                }
            }
            else
            {
                Debug.LogWarningFormat("FMOD: Unable to create a DSP: mCaptureDSP");
            }
        }
        else
        {
            Debug.LogWarningFormat("FMOD: Unable to create a GCHandle: mObjHandle");
        }
    }
}

It sort of works, except for two main issues:
1. There is an echo. The DSP is a passthrough that copies the inbuffer to the outbuffer, but every sound produces an echo (it is heard twice), even with headphones.
2. The latency is huge and doesn’t seem to be affected at all by the audio buffer size set in the FMOD settings. Any idea how we could fix this?

Maybe a stream solution is better for the capture of the sound?

Side note: Even though the mic is mono, the custom DSP receives 6 channels. We would understand 2 (stereo output), but why the extra 4 channels, which are always silent?

Any help or insights would be much appreciated!

kwe

jeff_fmod · March 11, 2022, 5:21am

The echo is because you are copying the inbuffer into the outbuffer, and because of the latency you have created a delay line and thus an echo. Commenting out this line should prevent that:

Marshal.Copy(obj.mDataBuffer, 0, outbuffer, lengthElements);

I am less sure about the latency. I see you haven’t implemented any latency or drift compensation; the example waits 50ms before starting to read back from the buffer, and I found that adding that feature improved latency compared to the code sample you shared. This is very much copied and pasted from the C++ example, but perhaps try this:

Unity recording example

// The CustomDSPCallback code was adapted from: https://fmod.com/resources/documentation-unity?version=2.02&page=examples-dsp-capture.html


using System;
using FMODUnity;
using UnityEngine;
using System.Runtime.InteropServices;

public class mic_in : MonoBehaviour
{
    //public variables
    [Header("Capture Device details")]
    public int captureDeviceIndex = 0;
    [TextArea] public string captureDeviceName = null;

    FMOD.CREATESOUNDEXINFO exinfo;

    // Custom DSPCallback variables 
    private FMOD.DSP_READCALLBACK mReadCallback;
    private FMOD.DSP mCaptureDSP;
    public float[] mDataBuffer;
    private GCHandle mObjHandle;
    private uint mBufferLength;
    private uint soundLength;
    int captureSrate;
    const int DRIFT_MS = 1;
    const int LATENCY_MS = 1000;
    uint driftThreshold;
    uint desiredLatency;
    uint adjustedLatency;
    uint actualLatency;
    uint lastRecordPos = 0;
    uint samplesRecorded = 0;
    uint samplesPlayed = 0;
    uint minRecordDelta = (uint)uint.MaxValue;
    uint lastPlayPos = 0;

    bool recordingStarted = false;

    FMOD.ChannelGroup masterCG;
    FMOD.Channel channel;
    FMOD.Sound sound;


    [AOT.MonoPInvokeCallback(typeof(FMOD.DSP_READCALLBACK))]
    static FMOD.RESULT CaptureDSPReadCallback(ref FMOD.DSP_STATE dsp_state, IntPtr inbuffer, IntPtr outbuffer, uint length, int inchannels, ref int outchannels)
    {
        FMOD.DSP_STATE_FUNCTIONS functions = (FMOD.DSP_STATE_FUNCTIONS)Marshal.PtrToStructure(dsp_state.functions, typeof(FMOD.DSP_STATE_FUNCTIONS));

        IntPtr userData;
        functions.getuserdata(ref dsp_state, out userData);

        GCHandle objHandle = GCHandle.FromIntPtr(userData);
        mic_in obj = objHandle.Target as mic_in;

        Debug.Log("inchannels:" + inchannels);
        Debug.Log("outchannels:" + outchannels);

        // Copy the incoming buffer to process later
        int lengthElements = (int)length * inchannels;
        Marshal.Copy(inbuffer, obj.mDataBuffer, 0, lengthElements);

        // Copy the inbuffer to the outbuffer so we can still hear it
        //Marshal.Copy(obj.mDataBuffer, 0, outbuffer, lengthElements);

        return FMOD.RESULT.OK;
    }

    // Start is called before the first frame update
    void Start()
    {
        // how many capture devices are plugged in for us to use.
        int numOfDriversConnected;
        int numofDrivers;
        FMOD.RESULT res = RuntimeManager.CoreSystem.getRecordNumDrivers(out numofDrivers, out numOfDriversConnected);

        if (res != FMOD.RESULT.OK)
        {
            Debug.Log("Failed to retrieve driver details: " + res);
            return;
        }

        if (numOfDriversConnected == 0)
        {
            Debug.Log("No capture devices detected!");
            return;
        }
        else
            Debug.Log("You have " + numOfDriversConnected + " capture devices available to record with.");


        // info about the device we're recording with.
        System.Guid micGUID;
        FMOD.DRIVER_STATE driverState;
        FMOD.SPEAKERMODE speakerMode;
        int captureNumChannels;
        RuntimeManager.CoreSystem.getRecordDriverInfo(captureDeviceIndex, out captureDeviceName, 50,
            out micGUID, out captureSrate, out speakerMode, out captureNumChannels, out driverState);

        driftThreshold = (uint)(captureSrate * DRIFT_MS) / 1000;       /* The point where we start compensating for drift */
        desiredLatency = (uint)(captureSrate * LATENCY_MS) / 1000;     /* User specified latency */
        adjustedLatency = (uint)desiredLatency;                      /* User specified latency adjusted for driver update granularity */
        actualLatency = (uint)desiredLatency;                                 /* Latency measured once playback begins (smoothened for jitter) */


        Debug.Log("captureNumChannels of capture device: " + captureNumChannels);
        Debug.Log("captureSrate: " + captureSrate);


        // create sound where capture is recorded
        exinfo.cbsize = System.Runtime.InteropServices.Marshal.SizeOf(typeof(FMOD.CREATESOUNDEXINFO));
        exinfo.numchannels = captureNumChannels;
        exinfo.format = FMOD.SOUND_FORMAT.PCM16;
        exinfo.defaultfrequency = captureSrate;
        exinfo.length = (uint)captureSrate * sizeof(short) * (uint)captureNumChannels;

        RuntimeManager.CoreSystem.createSound(exinfo.userdata, FMOD.MODE.LOOP_NORMAL | FMOD.MODE.OPENUSER,
            ref exinfo, out sound);

        // start recording    
        RuntimeManager.CoreSystem.recordStart(captureDeviceIndex, sound, true);


        sound.getLength(out soundLength, FMOD.TIMEUNIT.PCM);

        // play sound on dedicated channel in master channel group

        if (FMODUnity.RuntimeManager.CoreSystem.getMasterChannelGroup(out masterCG) != FMOD.RESULT.OK)
            Debug.LogWarningFormat("FMOD: Unable to create a master channel group: masterCG");

        FMODUnity.RuntimeManager.CoreSystem.getMasterChannelGroup(out masterCG);
        RuntimeManager.CoreSystem.playSound(sound, masterCG, true, out channel);
        channel.setPaused(true);

        // Assign the callback to a member variable to avoid garbage collection
        mReadCallback = CaptureDSPReadCallback;

        // Allocate a data buffer large enough for 8 channels
        uint bufferLength;
        int numBuffers;
        FMODUnity.RuntimeManager.CoreSystem.getDSPBufferSize(out bufferLength, out numBuffers);
        mDataBuffer = new float[bufferLength * 8];
        mBufferLength = bufferLength;

        // Tentatively changed buffer length by calling setDSPBufferSize in file Assets/Plugins/FMOD/src/RuntimeManager.cs	
        // Tentatively changed buffer length by calling setDSPBufferSize in file Assets/Plugins/FMOD/src/fmod.cs - line 1150

        Debug.Log("buffer length:" + bufferLength);

        // Get a handle to this object to pass into the callback
        mObjHandle = GCHandle.Alloc(this);
        if (mObjHandle.IsAllocated)
        {
            // Define a basic DSP that receives a callback each mix to capture audio
            FMOD.DSP_DESCRIPTION desc = new FMOD.DSP_DESCRIPTION();
            desc.numinputbuffers = 2;
            desc.numoutputbuffers = 2;
            desc.read = mReadCallback;
            desc.userdata = GCHandle.ToIntPtr(mObjHandle);

            // Create an instance of the capture DSP and attach it to the master channel group to capture all audio            
            if (FMODUnity.RuntimeManager.CoreSystem.createDSP(ref desc, out mCaptureDSP) == FMOD.RESULT.OK)
            {
                if (masterCG.addDSP(0, mCaptureDSP) != FMOD.RESULT.OK)
                {
                    Debug.LogWarningFormat("FMOD: Unable to add mCaptureDSP to the master channel group");
                }
            }
            else
            {
                Debug.LogWarningFormat("FMOD: Unable to create a DSP: mCaptureDSP");
            }
        }
        else
        {
            Debug.LogWarningFormat("FMOD: Unable to create a GCHandle: mObjHandle");
        }
    }

    private void FixedUpdate()
    {
        RuntimeManager.CoreSystem.getRecordPosition(captureDeviceIndex, out uint recordPos);

        uint recordDelta = (recordPos >= lastRecordPos) ? (recordPos - lastRecordPos) : (recordPos + soundLength - lastRecordPos);
        lastRecordPos = recordPos;
        samplesRecorded += recordDelta;

        if (recordDelta != 0 && (recordDelta < minRecordDelta))
        {
            minRecordDelta = recordDelta; /* Smallest driver granularity seen so far */
            adjustedLatency = (recordDelta <= desiredLatency) ? desiredLatency : recordDelta; /* Adjust our latency if driver granularity is high */
        }

        if (!recordingStarted)
        {
            if (samplesRecorded >= adjustedLatency)
            {
                channel.setPaused(false);
                recordingStarted = true;
            }
        }

        /*
            Delay playback until our desired latency is reached.
        */
        if(recordingStarted)
        {
            sound.@lock(recordPos, soundLength, out IntPtr one, out IntPtr two, out uint lone, out uint ltwo); 
            int lengthElements = (int)soundLength * 1;
            Marshal.Copy(one, mDataBuffer, 0, lengthElements);

            /*
                Stop playback if recording stops.
            */
            RuntimeManager.CoreSystem.isRecording(captureDeviceIndex, out bool isRecording);
 
            if (!isRecording)
            {
                channel.setPaused(true);
            }

            /*
                Determine how much has been played since we last checked.
            */
            channel.getPosition(out uint playPos, FMOD.TIMEUNIT.PCM);

            uint playDelta = (playPos >= lastPlayPos) ? (playPos - lastPlayPos) : (playPos + soundLength - lastPlayPos);
            lastPlayPos = playPos;
            samplesPlayed += playDelta;

            /*
                Compensate for any drift.
            */
            uint latency = samplesRecorded - samplesPlayed;
            actualLatency = (uint)((0.97f * actualLatency) + (0.03f * latency));

            int playbackRate = captureSrate;
            if (actualLatency < (int)(adjustedLatency - driftThreshold))
            {
                /* Play position is catching up to the record position, slow playback down by 2% */
                playbackRate = captureSrate - (captureSrate / 50);
            }
            else if (actualLatency > (int)(adjustedLatency + driftThreshold))
            {
                /* Play position is falling behind the record position, speed playback up by 2% */
                playbackRate = captureSrate + (captureSrate / 50);
            }

            channel.setFrequency((float)playbackRate);
        }

    }
}

I am also getting a lot of crackling when running this script, are you finding the same on your end?
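
For anyone following the latency bookkeeping in the script above, here is a minimal standalone sketch of the arithmetic it relies on: converting milliseconds to samples, measuring progress in a looping buffer, and nudging the playback rate by 2%. The `LatencyMath` class and its method names are hypothetical and only for illustration, and the 48000 Hz rate used in the note below is an assumed example value; the real rate comes from getRecordDriverInfo.

```csharp
using System;

// Hypothetical helper, not part of FMOD: restates the arithmetic from the script above.
public static class LatencyMath
{
    // Convert a duration in milliseconds to a sample count at the given rate.
    public static uint MsToSamples(int sampleRate, int ms)
    {
        return (uint)((long)sampleRate * ms / 1000);
    }

    // Samples advanced between two positions in a looping buffer of
    // bufferLength samples, accounting for wrap-around at the end.
    public static uint WrappedDelta(uint current, uint previous, uint bufferLength)
    {
        return (current >= previous)
            ? (current - previous)
            : (current + bufferLength - previous);
    }

    // Choose a playback rate that moves the measured latency back toward the target.
    // Uses long arithmetic so target - threshold cannot underflow.
    public static int CompensatedRate(int sampleRate, uint actualLatency,
                                      uint targetLatency, uint driftThreshold)
    {
        long actual = actualLatency;
        long low = (long)targetLatency - driftThreshold;
        long high = (long)targetLatency + driftThreshold;

        if (actual < low) return sampleRate - sampleRate / 50;  // play head catching up: slow down 2%
        if (actual > high) return sampleRate + sampleRate / 50; // play head falling behind: speed up 2%
        return sampleRate;                                      // inside the tolerance band
    }
}
```

Assuming a 48000 Hz capture rate, `MsToSamples(48000, 50)` gives 2400 samples and `MsToSamples(48000, 1000)` gives 48000. The script above sets LATENCY_MS to 1000 rather than the 50ms mentioned in the post, and one second is also the full length of the recording sound it creates, so a lower value may be worth trying.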

akwerius · March 11, 2022, 9:22pm

Hi Jeff,

Thank you for the reply… this is very helpful! We will need to delve deeper into the code to better understand what it’s doing. But we’re already confused by the fact that you comment out the Marshal.Copy in the DSP callback and copy the samples from the sound instead in FixedUpdate. Is the DSP callback even being used then?

No sound is heard when running the script as is, only after uncommenting the Marshal.Copy line in the DSP. There is no crackling either.

Thank you again!

jeff_fmod · March 18, 2022, 12:22am

The DSP callback is being used to fill the mDataBuffer. I’m not sure why I left that part in FixedUpdate; I must have been experimenting with the crackling noise and forgot to remove it.
I commented out that line because it was causing echo for me. Are you still getting echo after commenting that line back in, or is it working fine now?

akwerius · March 21, 2022, 7:50pm

Hi Jeff,
Thank you for the reply and clarification. It is working fine for me after commenting that line back in, no echo or crackling. Regarding the extra channels, were there also 6 channels on your end?
All the best,
kwe

jeff_fmod · March 23, 2022, 4:32am

I’m only getting 1 in and 1 out channel in the callback. If I plug in a dual-input interface then I get 2 ins and 2 outs. Do you have an external interface plugged in or something like that?

akwerius · April 23, 2022, 1:14am

Hi Jeff,

Sorry for the late reply. I don’t have an external interface plugged in. But that is a minor issue at this point as we can always skip the empty samples.
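
Skipping the silent channels amounts to reading one channel out of the interleaved buffer. Here is a minimal sketch, assuming the buffer is laid out frame by frame (as FMOD documents for DSP read callbacks with more than one channel) and that channel 0 carries the mic signal. `ChannelTap` and `ExtractChannel` are hypothetical names, not FMOD API:

```csharp
using System;

// Hypothetical helper, not part of FMOD.
public static class ChannelTap
{
    // Copy a single channel out of an interleaved buffer.
    // Layout assumed: frame 0 = [ch0, ch1, ..., chN-1], frame 1 = [ch0, ch1, ...], and so on.
    public static float[] ExtractChannel(float[] interleaved, int frames, int channels, int channel)
    {
        if (interleaved == null) throw new ArgumentNullException(nameof(interleaved));
        if (channels <= 0) throw new ArgumentOutOfRangeException(nameof(channels));
        if (channel < 0 || channel >= channels) throw new ArgumentOutOfRangeException(nameof(channel));
        if (frames < 0 || (long)frames * channels > interleaved.Length)
            throw new ArgumentException("Buffer is too small for the requested frame count.");

        float[] mono = new float[frames];
        for (int i = 0; i < frames; i++)
        {
            // Step through the buffer one frame at a time, picking the wanted channel.
            mono[i] = interleaved[i * channels + channel];
        }
        return mono;
    }
}
```

With the script above, this would be called on mDataBuffer with `frames` set to the callback's length and `channels` set to inchannels. Allocating a new array on every call is fine for a sketch, but writing into a preallocated buffer is probably a better fit for code that runs on every mix.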

After several attempts, I’m still not sure I fully understand the FixedUpdate method, so I’m not sure how to modify it to record the inbuffer samples to an mp3 file. I keep reading about an output_mp3 example on the forum but have yet to find it anywhere. Is the sound from createSound stored somewhere?

Thanks,
kwe

jeff_fmod · April 28, 2022, 2:27am

The output_mp3 script should be in “C:\Program Files (x86)\FMOD SoundSystem\FMOD Studio API Windows 2.02.06\api\core\examples\plugins”.

The Sound gets cleaned up by FMOD when running in real-time. If you use the output plugin approach that shouldn’t be an issue though as the output plugin will be reading from the mixer.

rich.ireland · January 6, 2023, 10:12pm

Apologies for resurrecting an old thread. But this is confusing:

Can you explain? With that line commented out, I get only silence from this example.
Is not the purpose of the callback to return valid data in the outbuffer?

jeff_fmod · January 8, 2023, 10:50pm

That is the purpose of this callback. I am not sure why I suggested otherwise, and I am finding that the script I posted does indeed require the copy in the DSP_READ_CALLBACK. Apologies for the misinformation.
I am not sure where the original echo was coming from. Maybe something like mic monitoring being enabled in the headset?
In any case, feel free to comment that line back in and let me know if you run into any other issues.
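
To illustrate why the copy into outbuffer matters, here is a small standalone sketch that imitates the callback body without FMOD. Unmanaged memory from Marshal.AllocHGlobal stands in for the mixer's buffers, and `PassThroughDemo`, `TapAndPassThrough` and `Simulate` are hypothetical names. Clamping the copy length to the managed buffer size is an added safety measure, not something the posted script does:

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical stand-in for the read callback body; no FMOD types involved.
public static class PassThroughDemo
{
    // Tap the input into a managed buffer, then write it to the output
    // so that audio continues downstream. Returns the number of floats copied.
    public static int TapAndPassThrough(IntPtr inbuffer, IntPtr outbuffer,
                                        uint length, int inchannels, float[] tap)
    {
        // Never copy more than the managed buffer can hold.
        int lengthElements = Math.Min((int)length * inchannels, tap.Length);

        Marshal.Copy(inbuffer, tap, 0, lengthElements);  // native input -> managed tap
        Marshal.Copy(tap, 0, outbuffer, lengthElements); // managed tap -> native output
        return lengthElements;
    }

    // Run one simulated mix: the output starts as silence and is read back afterwards.
    public static float[] Simulate(float[] input, int channels)
    {
        int frames = input.Length / channels;
        int bytes = input.Length * sizeof(float);
        IntPtr inPtr = Marshal.AllocHGlobal(bytes);
        IntPtr outPtr = Marshal.AllocHGlobal(bytes);
        try
        {
            Marshal.Copy(input, 0, inPtr, input.Length);
            Marshal.Copy(new float[input.Length], 0, outPtr, input.Length); // zero the output

            float[] tap = new float[input.Length];
            TapAndPassThrough(inPtr, outPtr, (uint)frames, channels, tap);

            float[] output = new float[input.Length];
            Marshal.Copy(outPtr, output, 0, input.Length);
            return output;
        }
        finally
        {
            Marshal.FreeHGlobal(inPtr);
            Marshal.FreeHGlobal(outPtr);
        }
    }
}
```

Without the second Marshal.Copy, the output memory in this sketch would keep its initial zeros, which matches the silence reported above when that line is commented out.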

rich.ireland · January 9, 2023, 4:31pm

Thanks for the clarification. Now I need to understand where that echo is coming from. This is a good clue:

jeff_fmod · January 15, 2023, 10:01pm

I am not getting echo with any of these scripts. Is your script identical to one of the previously posted ones? If not, please share it and I will try to reproduce the issue on my end.

rich.ireland · January 16, 2023, 5:29pm

I have confirmed that the echo I was hearing was from the Bluetooth headphones I was using at the time.

hardsmoke · January 10, 2024, 1:09pm

Hi Jeff, using the code you wrote, for some reason I get 6 inchannels and 6 outchannels in the DSP_READ_CALLBACK method, but in the sampleBuffer 4 of them are filled with zeros and the second one holds the same values as the first. What could be the reason?

Leah_FMOD · January 15, 2024, 1:42am

Hi, I’ve responded to your question in the other thread you posted here: DSP_READ_CALLBACK use 6 in/out channels instead of 1
