Accessing real-time audio samples of incoming mic capture via DSPCallback

Hi everyone,

We are working on a game. Our aim is to access the samples coming in from a microphone capture in real time, buffer by buffer. We used the custom DSP example from the API and attached the DSP to a channel group that receives the input samples from the microphone, so they arrive in the DSP's inbuffer. Here is the code:

// The CustomDSPCallback code was adapted from: https://fmod.com/resources/documentation-unity?version=2.02&page=examples-dsp-capture.html


using System;
using FMODUnity;
using UnityEngine;
using System.Runtime.InteropServices;

public class mic_in : MonoBehaviour
{
    //public variables
    [Header("Capture Device details")]
    public int captureDeviceIndex = 0;
    [TextArea] public string captureDeviceName = null;

    FMOD.CREATESOUNDEXINFO exinfo;
      	
    // Custom DSPCallback variables
    private FMOD.DSP_READCALLBACK mReadCallback;
    private FMOD.DSP mCaptureDSP;
    public float[] mDataBuffer;
    private GCHandle mObjHandle;
    private uint mBufferLength;

    [AOT.MonoPInvokeCallback(typeof(FMOD.DSP_READCALLBACK))]
    static FMOD.RESULT CaptureDSPReadCallback(ref FMOD.DSP_STATE dsp_state, IntPtr inbuffer, IntPtr outbuffer, uint length, int inchannels, ref int outchannels)
    {
        FMOD.DSP_STATE_FUNCTIONS functions = (FMOD.DSP_STATE_FUNCTIONS)Marshal.PtrToStructure(dsp_state.functions, typeof(FMOD.DSP_STATE_FUNCTIONS));

        IntPtr userData;
        functions.getuserdata(ref dsp_state, out userData);

        GCHandle objHandle = GCHandle.FromIntPtr(userData);
        mic_in obj = objHandle.Target as mic_in;

        Debug.Log("inchannels:" + inchannels);
        Debug.Log("outchannels:" + outchannels);

        // Copy the incoming buffer to process later
        int lengthElements = (int)length * inchannels;
        Marshal.Copy(inbuffer, obj.mDataBuffer, 0, lengthElements);

        // Copy the inbuffer to the outbuffer so we can still hear it
        Marshal.Copy(obj.mDataBuffer, 0, outbuffer, lengthElements);

        return FMOD.RESULT.OK;
    }
	
    // Start is called before the first frame update
    void Start()
    {        
        // how many capture devices are plugged in for us to use.
        int numOfDriversConnected;
        int numofDrivers;
        FMOD.RESULT res = RuntimeManager.CoreSystem.getRecordNumDrivers(out numofDrivers, out numOfDriversConnected);
        
        if(res != FMOD.RESULT.OK)
        {
            Debug.Log("Failed to retrieve driver details: " + res);
            return;
        }

        if(numOfDriversConnected == 0)
        {
            Debug.Log("No capture devices detected!");
            return;
        }
        else
            Debug.Log("You have " + numOfDriversConnected + " capture devices available to record with.");
            

        // info about the device we're recording with.
        System.Guid micGUID;
        FMOD.DRIVER_STATE driverState;
        FMOD.SPEAKERMODE speakerMode;  
        int captureSrate;
        int captureNumChannels;
        RuntimeManager.CoreSystem.getRecordDriverInfo(captureDeviceIndex, out captureDeviceName, 50,
            out micGUID, out captureSrate, out speakerMode, out captureNumChannels, out driverState);
            

        Debug.Log("captureNumChannels of capture device: " + captureNumChannels);
        Debug.Log("captureSrate: " + captureSrate);


        // create sound where capture is recorded
        FMOD.Sound sound;
        exinfo.cbsize = System.Runtime.InteropServices.Marshal.SizeOf(typeof(FMOD.CREATESOUNDEXINFO));
        exinfo.numchannels = captureNumChannels;
        exinfo.format = FMOD.SOUND_FORMAT.PCM16;
        exinfo.defaultfrequency = captureSrate;
        exinfo.length = (uint)captureSrate * sizeof(short) * (uint)captureNumChannels;

        RuntimeManager.CoreSystem.createSound(exinfo.userdata, FMOD.MODE.LOOP_NORMAL | FMOD.MODE.OPENUSER, 
            ref exinfo, out sound);

        // start recording    
        RuntimeManager.CoreSystem.recordStart(captureDeviceIndex, sound, true);

        // play sound on dedicated channel in master channel group
        FMOD.ChannelGroup masterCG;
        FMOD.Channel channel;

        if (FMODUnity.RuntimeManager.CoreSystem.getMasterChannelGroup(out masterCG) != FMOD.RESULT.OK)
            Debug.LogWarningFormat("FMOD: Unable to get the master channel group: masterCG");

        RuntimeManager.CoreSystem.playSound(sound, masterCG, true, out channel);
        channel.setPaused(false);

        // Assign the callback to a member variable to avoid garbage collection
        mReadCallback = CaptureDSPReadCallback;

        // Allocate a data buffer large enough for 8 interleaved channels
        uint bufferLength;
        int numBuffers;
        FMODUnity.RuntimeManager.CoreSystem.getDSPBufferSize(out bufferLength, out numBuffers);
        mDataBuffer = new float[bufferLength * 8];
        mBufferLength = bufferLength;

        // Tentatively changed buffer length by calling setDSPBufferSize in file Assets/Plugins/FMOD/src/RuntimeManager.cs
        // Tentatively changed buffer length by calling setDSPBufferSize in file Assets/Plugins/FMOD/src/fmod.cs - line 1150

        Debug.Log("buffer length:" + bufferLength);

        // Get a handle to this object to pass into the callback
        mObjHandle = GCHandle.Alloc(this);
        if (mObjHandle.IsAllocated)
        {
            // Define a basic DSP that receives a callback each mix to capture audio
            FMOD.DSP_DESCRIPTION desc = new FMOD.DSP_DESCRIPTION();
            desc.numinputbuffers = 1;
            desc.numoutputbuffers = 1;
            desc.read = mReadCallback;
            desc.userdata = GCHandle.ToIntPtr(mObjHandle);

            // Create an instance of the capture DSP and attach it to the master channel group to capture all audio            
            if (FMODUnity.RuntimeManager.CoreSystem.createDSP(ref desc, out mCaptureDSP) == FMOD.RESULT.OK)
            {
                if (masterCG.addDSP(0, mCaptureDSP) != FMOD.RESULT.OK)
                {
                    Debug.LogWarningFormat("FMOD: Unable to add mCaptureDSP to the master channel group");
                }
            }
            else
            {
                Debug.LogWarningFormat("FMOD: Unable to create a DSP: mCaptureDSP");
            }
        }
        else
        {
            Debug.LogWarningFormat("FMOD: Unable to create a GCHandle: mObjHandle");
        }
    }
}
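For scale, the buffer sizes in this setup work out as below. This is a standalone C# sketch with hypothetical helper names and no FMOD calls; 48 kHz and 1024 samples are example inputs, not values read from a device. `exinfo.length` is in bytes, so the record buffer above holds one second of PCM16 audio, while one DSP block of `bufferLength` frames lasts only a few tens of milliseconds.

```csharp
using System;

// Hypothetical helpers reproducing the size arithmetic from Start(); not part of the FMOD API.
public static class CaptureSizes
{
    // Bytes needed for `seconds` of interleaved PCM16 audio (what exinfo.length expects).
    public static uint Pcm16Bytes(int sampleRate, int channels, int seconds = 1)
    {
        return (uint)(sampleRate * sizeof(short) * channels * seconds);
    }

    // Duration in milliseconds of one mixer block of `bufferLength` sample frames.
    public static double BlockMs(uint bufferLength, int sampleRate)
    {
        return bufferLength * 1000.0 / sampleRate;
    }
}
```

With these inputs, `Pcm16Bytes(48000, 1)` is 96000 bytes (48000 sample frames) and `BlockMs(1024, 48000)` is about 21.3 ms.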

It sort of works, except for two main issues:
1. There is an echo. The DSP is a passthrough that copies the inbuffer to the outbuffer, but every sound produces an echo (it is heard twice), even with headphones.
2. The latency is huge, and it doesn't seem to be affected at all by the size of the audio buffer set in the FMOD settings. Any idea how we could fix this?

Would a stream-based approach be better for capturing the sound?

Side note: even though the mic is mono, the custom DSP receives 6 channels. We would understand 2 (stereo output), but why the extra 4 channels, which are always silent?

Any help or insights would be much appreciated!

kwe

The echo is because you are copying the inbuffer into the outbuffer, and because of the latency you have created a delay line and thus an echo. Commenting out this line should prevent that:

Marshal.Copy(obj.mDataBuffer, 0, outbuffer, lengthElements);
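As a toy model of why a passthrough plus latency is heard as an echo (plain C#, not FMOD's mixer): if the listener hears a sound once directly and again `delay` samples later through the capture path, a single click turns into two. `EchoModel.MixWithDelayedCopy` is a hypothetical illustration only.

```csharp
using System;

public static class EchoModel
{
    // Sums a signal with a copy of itself delayed by `delay` samples,
    // i.e. "direct sound + the same sound returned late by the capture path".
    public static float[] MixWithDelayedCopy(float[] dry, int delay)
    {
        if (delay < 0) throw new ArgumentOutOfRangeException(nameof(delay));
        var output = new float[dry.Length + delay];
        for (int i = 0; i < dry.Length; i++)
        {
            output[i] += dry[i];          // direct path
            output[i + delay] += dry[i];  // delayed path
        }
        return output;
    }
}
```

For example, a click `{1, 0, 0, 0}` with `delay = 2` becomes `{1, 0, 1, 0, 0, 0}`: the click is heard twice, and the larger the capture latency, the more clearly the two copies separate.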

I'm less sure about the latency. I see you haven't implemented any latency or drift compensation: the C++ example waits 50 ms before it starts reading back from the buffer, and I found that adding that feature improved the latency compared to the code sample you shared. This is very much just copied and pasted from the C++ example, but perhaps try this:

Unity recording example
// The CustomDSPCallback code was adapted from: https://fmod.com/resources/documentation-unity?version=2.02&page=examples-dsp-capture.html


using System;
using FMODUnity;
using UnityEngine;
using System.Runtime.InteropServices;

public class mic_in : MonoBehaviour
{
    //public variables
    [Header("Capture Device details")]
    public int captureDeviceIndex = 0;
    [TextArea] public string captureDeviceName = null;

    FMOD.CREATESOUNDEXINFO exinfo;

    // Custom DSPCallback variables 
    private FMOD.DSP_READCALLBACK mReadCallback;
    private FMOD.DSP mCaptureDSP;
    public float[] mDataBuffer;
    private GCHandle mObjHandle;
    private uint mBufferLength;
    private uint soundLength;
    int captureSrate;
    const int DRIFT_MS = 1;
    const int LATENCY_MS = 50;   // 50 ms, as in the C++ example; should stay well below the 1 s record buffer
    uint driftThreshold;
    uint desiredLatency;
    uint adjustedLatency;
    uint actualLatency;
    uint lastRecordPos = 0;
    uint samplesRecorded = 0;
    uint samplesPlayed = 0;
    uint minRecordDelta = (uint)uint.MaxValue;
    uint lastPlayPos = 0;

    bool recordingStarted = false;

    FMOD.ChannelGroup masterCG;
    FMOD.Channel channel;
    FMOD.Sound sound;


    [AOT.MonoPInvokeCallback(typeof(FMOD.DSP_READCALLBACK))]
    static FMOD.RESULT CaptureDSPReadCallback(ref FMOD.DSP_STATE dsp_state, IntPtr inbuffer, IntPtr outbuffer, uint length, int inchannels, ref int outchannels)
    {
        FMOD.DSP_STATE_FUNCTIONS functions = (FMOD.DSP_STATE_FUNCTIONS)Marshal.PtrToStructure(dsp_state.functions, typeof(FMOD.DSP_STATE_FUNCTIONS));

        IntPtr userData;
        functions.getuserdata(ref dsp_state, out userData);

        GCHandle objHandle = GCHandle.FromIntPtr(userData);
        mic_in obj = objHandle.Target as mic_in;

        Debug.Log("inchannels:" + inchannels);
        Debug.Log("outchannels:" + outchannels);

        // Copy the incoming buffer to process later
        int lengthElements = (int)length * inchannels;
        Marshal.Copy(inbuffer, obj.mDataBuffer, 0, lengthElements);

        // Copy the inbuffer to the outbuffer so we can still hear it
        //Marshal.Copy(obj.mDataBuffer, 0, outbuffer, lengthElements);

        return FMOD.RESULT.OK;
    }

    // Start is called before the first frame update
    void Start()
    {
        // how many capture devices are plugged in for us to use.
        int numOfDriversConnected;
        int numofDrivers;
        FMOD.RESULT res = RuntimeManager.CoreSystem.getRecordNumDrivers(out numofDrivers, out numOfDriversConnected);

        if (res != FMOD.RESULT.OK)
        {
            Debug.Log("Failed to retrieve driver details: " + res);
            return;
        }

        if (numOfDriversConnected == 0)
        {
            Debug.Log("No capture devices detected!");
            return;
        }
        else
            Debug.Log("You have " + numOfDriversConnected + " capture devices available to record with.");


        // info about the device we're recording with.
        System.Guid micGUID;
        FMOD.DRIVER_STATE driverState;
        FMOD.SPEAKERMODE speakerMode;
        int captureNumChannels;
        RuntimeManager.CoreSystem.getRecordDriverInfo(captureDeviceIndex, out captureDeviceName, 50,
            out micGUID, out captureSrate, out speakerMode, out captureNumChannels, out driverState);

        driftThreshold = (uint)(captureSrate * DRIFT_MS) / 1000;   /* The point where we start compensating for drift */
        desiredLatency = (uint)(captureSrate * LATENCY_MS) / 1000; /* User-specified latency */
        adjustedLatency = desiredLatency;                          /* User-specified latency adjusted for driver update granularity */
        actualLatency = desiredLatency;                            /* Latency measured once playback begins (smoothed for jitter) */


        Debug.Log("captureNumChannels of capture device: " + captureNumChannels);
        Debug.Log("captureSrate: " + captureSrate);


        // create sound where capture is recorded
        exinfo.cbsize = System.Runtime.InteropServices.Marshal.SizeOf(typeof(FMOD.CREATESOUNDEXINFO));
        exinfo.numchannels = captureNumChannels;
        exinfo.format = FMOD.SOUND_FORMAT.PCM16;
        exinfo.defaultfrequency = captureSrate;
        exinfo.length = (uint)captureSrate * sizeof(short) * (uint)captureNumChannels;

        RuntimeManager.CoreSystem.createSound(exinfo.userdata, FMOD.MODE.LOOP_NORMAL | FMOD.MODE.OPENUSER,
            ref exinfo, out sound);

        // start recording    
        RuntimeManager.CoreSystem.recordStart(captureDeviceIndex, sound, true);


        sound.getLength(out soundLength, FMOD.TIMEUNIT.PCM);

        // play sound on dedicated channel in master channel group

        if (FMODUnity.RuntimeManager.CoreSystem.getMasterChannelGroup(out masterCG) != FMOD.RESULT.OK)
            Debug.LogWarningFormat("FMOD: Unable to get the master channel group: masterCG");

        RuntimeManager.CoreSystem.playSound(sound, masterCG, true, out channel);
        channel.setPaused(true);

        // Assign the callback to a member variable to avoid garbage collection
        mReadCallback = CaptureDSPReadCallback;

        // Allocate a data buffer large enough for 8 interleaved channels
        uint bufferLength;
        int numBuffers;
        FMODUnity.RuntimeManager.CoreSystem.getDSPBufferSize(out bufferLength, out numBuffers);
        mDataBuffer = new float[bufferLength * 8];
        mBufferLength = bufferLength;

        // Tentatively changed buffer length by calling setDSPBufferSize in file Assets/Plugins/FMOD/src/RuntimeManager.cs	
        // Tentatively changed buffer length by calling setDSPBufferSize in file Assets/Plugins/FMOD/src/fmod.cs - line 1150

        Debug.Log("buffer length:" + bufferLength);

        // Get a handle to this object to pass into the callback
        mObjHandle = GCHandle.Alloc(this);
        if (mObjHandle.IsAllocated)
        {
            // Define a basic DSP that receives a callback each mix to capture audio
            FMOD.DSP_DESCRIPTION desc = new FMOD.DSP_DESCRIPTION();
            desc.numinputbuffers = 2;
            desc.numoutputbuffers = 2;
            desc.read = mReadCallback;
            desc.userdata = GCHandle.ToIntPtr(mObjHandle);

            // Create an instance of the capture DSP and attach it to the master channel group to capture all audio            
            if (FMODUnity.RuntimeManager.CoreSystem.createDSP(ref desc, out mCaptureDSP) == FMOD.RESULT.OK)
            {
                if (masterCG.addDSP(0, mCaptureDSP) != FMOD.RESULT.OK)
                {
                    Debug.LogWarningFormat("FMOD: Unable to add mCaptureDSP to the master channel group");
                }
            }
            else
            {
                Debug.LogWarningFormat("FMOD: Unable to create a DSP: mCaptureDSP");
            }
        }
        else
        {
            Debug.LogWarningFormat("FMOD: Unable to create a GCHandle: mObjHandle");
        }
    }

    private void FixedUpdate()
    {
        RuntimeManager.CoreSystem.getRecordPosition(captureDeviceIndex, out uint recordPos);

        uint recordDelta = (recordPos >= lastRecordPos) ? (recordPos - lastRecordPos) : (recordPos + soundLength - lastRecordPos);
        lastRecordPos = recordPos;
        samplesRecorded += recordDelta;

        if (recordDelta != 0 && (recordDelta < minRecordDelta))
        {
            minRecordDelta = recordDelta; /* Smallest driver granularity seen so far */
            adjustedLatency = (recordDelta <= desiredLatency) ? desiredLatency : recordDelta; /* Adjust our latency if driver granularity is high */
        }

        if (!recordingStarted)
        {
            if (samplesRecorded >= adjustedLatency)
            {
                channel.setPaused(false);
                recordingStarted = true;
            }
        }

        /*
            Delay playback until our desired latency is reached.
        */
        if(recordingStarted)
        {
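            // Leftover experiment, not needed: the DSP callback already fills mDataBuffer.
            // As written it can overrun mDataBuffer (soundLength elements), reads PCM16 data
            // as floats, and never calls sound.unlock() to match the lock.
            //sound.@lock(recordPos, soundLength, out IntPtr one, out IntPtr two, out uint lone, out uint ltwo);
            //int lengthElements = (int)soundLength * 1;
            //Marshal.Copy(one, mDataBuffer, 0, lengthElements);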

            /*
                Stop playback if recording stops.
            */
            RuntimeManager.CoreSystem.isRecording(captureDeviceIndex, out bool isRecording);
 
            if (!isRecording)
            {
                channel.setPaused(true);
            }

            /*
                Determine how much has been played since we last checked.
            */
            channel.getPosition(out uint playPos, FMOD.TIMEUNIT.PCM);

            uint playDelta = (playPos >= lastPlayPos) ? (playPos - lastPlayPos) : (playPos + soundLength - lastPlayPos);
            lastPlayPos = playPos;
            samplesPlayed += playDelta;

            /*
                Compensate for any drift.
            */
            long latency = (long)samplesRecorded - samplesPlayed;   /* signed, so it cannot wrap around if playback briefly runs ahead */
            actualLatency = (uint)Math.Max(0f, (0.97f * actualLatency) + (0.03f * latency));

            int playbackRate = captureSrate;
            if (actualLatency < (int)(adjustedLatency - driftThreshold))
            {
                /* Play position is catching up to the record position, slow playback down by 2% */
                playbackRate = captureSrate - (captureSrate / 50);
            }
            else if (actualLatency > (int)(adjustedLatency + driftThreshold))
            {
                /* Play position is falling behind the record position, speed playback up by 2% */
                playbackRate = captureSrate + (captureSrate / 50);
            }

            channel.setFrequency((float)playbackRate);
        }

    }
}
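To check the bookkeeping in `FixedUpdate()` in isolation, here is a hypothetical standalone version of the same arithmetic with the FMOD calls stripped out (`DriftMath` and its methods are illustrative names, not FMOD API). It shows the ms-to-samples conversion, the wrap-around delta on the looping record buffer, the 0.97/0.03 smoothing, and the ±2% playback-rate nudge. At 48 kHz, the 50 ms the C++ example waits is 2400 samples, while 1000 ms is 48000 samples, i.e. the whole one-second record buffer.

```csharp
using System;

// Hypothetical standalone version of the bookkeeping in FixedUpdate(), with no FMOD calls.
public static class DriftMath
{
    // Converts milliseconds to sample frames, e.g. LATENCY_MS or DRIFT_MS.
    public static uint MsToSamples(int sampleRate, int ms)
    {
        return (uint)((long)sampleRate * ms / 1000);
    }

    // Samples advanced since the last poll in a looping buffer of `length` samples,
    // accounting for the position wrapping back to 0.
    public static uint RingDelta(uint pos, uint lastPos, uint length)
    {
        return pos >= lastPos ? pos - lastPos : pos + length - lastPos;
    }

    // One-pole smoothing of the measured latency (same 0.97/0.03 weights as the script),
    // clamped at 0 so a negative measurement cannot wrap the unsigned result.
    public static uint Smooth(uint previous, long measured)
    {
        return (uint)Math.Max(0.0, 0.97 * previous + 0.03 * measured);
    }

    // Playback rate nudged by 2% to steer the smoothed latency back toward `target`.
    public static int PlaybackRate(int sampleRate, uint actual, uint target, uint threshold)
    {
        if (actual < (long)target - threshold) return sampleRate - sampleRate / 50; // catching up: slow down
        if (actual > (long)target + threshold) return sampleRate + sampleRate / 50; // falling behind: speed up
        return sampleRate;
    }
}
```

For instance, with a 48000-sample buffer, `RingDelta(10, 47990, 48000)` is 20, because the record position wrapped past the end of the loop.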

I am also getting a lot of crackling when running this script. Are you finding the same on your end?

Hi Jeff,

Thank you for the reply… this is very helpful! We will need to delve deeper into the code to better understand what it's doing. But we're already confused by the fact that you commented out the Marshal.Copy in the DSP callback and copy the samples from the sound in FixedUpdate instead. Is the DSP callback even being used then?

No sound is heard when running the script as is; we only hear it after uncommenting the Marshal.Copy line in the DSP callback. There is no crackling either.

Thank you again!

The DSP callback is being used to fill mDataBuffer. I'm not sure why I left that part in FixedUpdate; I must have been experimenting with the crackling noise and forgot to remove it.
I commented out that line because it was causing echo for me. Are you still getting echo with that line uncommented, or is it working fine now?
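Since the callback fills `mDataBuffer` on FMOD's mixer thread while `Update()`/`FixedUpdate()` run on Unity's main thread, reading `mDataBuffer` directly can tear mid-write. One way to hand samples over is a small ring buffer; below is a minimal lock-based sketch. `SampleRing` is a hypothetical helper, not part of FMOD or Unity, and a lock-free single-producer/single-consumer ring would avoid blocking the mixer thread.

```csharp
using System;

// Hypothetical helper: a fixed-size ring of interleaved float samples.
// Write() is meant for the DSP read callback (mixer thread), Read() for the main thread.
public sealed class SampleRing
{
    private readonly float[] buffer;
    private readonly object gate = new object();
    private int writeIndex;   // next slot to write
    private int count;        // samples currently stored

    public SampleRing(int capacity)
    {
        if (capacity <= 0) throw new ArgumentOutOfRangeException(nameof(capacity));
        buffer = new float[capacity];
    }

    // Appends `length` samples; when full, the oldest samples are overwritten.
    public void Write(float[] source, int length)
    {
        lock (gate)
        {
            for (int i = 0; i < length; i++)
            {
                buffer[writeIndex] = source[i];
                writeIndex = (writeIndex + 1) % buffer.Length;
            }
            count = Math.Min(count + length, buffer.Length);
        }
    }

    // Copies up to destination.Length of the oldest stored samples, removes them,
    // and returns how many were copied.
    public int Read(float[] destination)
    {
        lock (gate)
        {
            int n = Math.Min(destination.Length, count);
            int readIndex = (writeIndex - count + buffer.Length) % buffer.Length;
            for (int i = 0; i < n; i++)
                destination[i] = buffer[(readIndex + i) % buffer.Length];
            count -= n;
            return n;
        }
    }
}
```

In the callback, after the existing `Marshal.Copy(inbuffer, obj.mDataBuffer, 0, lengthElements)`, a call such as `obj.ring.Write(obj.mDataBuffer, lengthElements)` (with `ring` a hypothetical field) would queue the block for the main thread.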

Hi Jeff,
Thank you for the reply and clarification. It is working fine for me with that line uncommented: no echo or crackling. Regarding the extra channels, were there also 6 channels on your end?
All the best,
kwe

I'm only getting 1 input and 1 output channel in the callback. If I plug in a dual-input interface, I get 2 in and 2 out. Do you have an external interface plugged in, or something like that?

Hi Jeff,

Sorry for the late reply. I don't have an external interface plugged in, but that is a minor issue at this point, as we can always skip the empty samples.
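The DSP buffers are interleaved (frame by frame, one sample per channel), so skipping the silent channels amounts to taking every `inchannels`-th sample. A minimal sketch, where `ExtractChannel` is a hypothetical helper; in the read callback, `frames` would be `length` and `channels` would be `inchannels`.

```csharp
using System;

public static class Interleave
{
    // Copies channel `channel` out of an interleaved buffer holding `frames` frames of
    // `channels` samples each (frame f, channel c lives at index f * channels + c).
    public static float[] ExtractChannel(float[] interleaved, int frames, int channels, int channel)
    {
        if (channel < 0 || channel >= channels) throw new ArgumentOutOfRangeException(nameof(channel));
        if (interleaved.Length < frames * channels) throw new ArgumentException("buffer too small", nameof(interleaved));
        var mono = new float[frames];
        for (int f = 0; f < frames; f++)
            mono[f] = interleaved[f * channels + channel];
        return mono;
    }
}
```

With 6 incoming channels of which only the first carries the mic, `ExtractChannel(obj.mDataBuffer, (int)length, inchannels, 0)` would keep just that channel.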

After several attempts, I'm still not sure I fully understand the FixedUpdate method, so I'm not sure how to modify it to record the inbuffer samples to an MP3 file. I keep reading about an output_mp3 example on the forum but have yet to find it anywhere. Is the sound from createSound stored somewhere?

Thanks,
kwe

The output_mp3 example should be in "C:\Program Files (x86)\FMOD SoundSystem\FMOD Studio API Windows 2.02.06\api\core\examples\plugins".

The Sound gets cleaned up by FMOD when running in real time. If you use the output plugin approach, that shouldn't be an issue, though, as the output plugin reads from the mixer.
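If MP3 isn't strictly required, a simpler alternative (a swapped-in technique, not what the output_mp3 example does) is to accumulate the float samples received in the DSP callback and write them out as an uncompressed 16-bit WAV file. A minimal sketch, assuming samples in the -1..1 range; `WavWriter` is a hypothetical helper, not an FMOD or Unity API.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: writes mono or interleaved float samples (-1..1) as 16-bit PCM WAV.
public static class WavWriter
{
    public static void WritePcm16(Stream stream, float[] samples, int sampleRate, int channels)
    {
        int dataBytes = samples.Length * sizeof(short);
        using (var w = new BinaryWriter(stream, Encoding.ASCII, leaveOpen: true))
        {
            w.Write(Encoding.ASCII.GetBytes("RIFF"));
            w.Write(36 + dataBytes);                          // RIFF chunk size
            w.Write(Encoding.ASCII.GetBytes("WAVE"));
            w.Write(Encoding.ASCII.GetBytes("fmt "));
            w.Write(16);                                      // fmt chunk size for PCM
            w.Write((short)1);                                // format tag: PCM
            w.Write((short)channels);
            w.Write(sampleRate);
            w.Write(sampleRate * channels * sizeof(short));   // byte rate
            w.Write((short)(channels * sizeof(short)));       // block align
            w.Write((short)16);                               // bits per sample
            w.Write(Encoding.ASCII.GetBytes("data"));
            w.Write(dataBytes);
            foreach (float s in samples)
            {
                float clamped = Math.Max(-1f, Math.Min(1f, s));
                w.Write((short)(clamped * short.MaxValue));   // float -> PCM16
            }
        }
    }
}
```

In Unity this could be called on the main thread with `File.Create(path)` as the stream, once enough samples have been collected from the callback.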