Low Latency Microphone Input and DSP

I’ve tried a lot of different approaches and am hitting some dead ends!

I’m developing low-latency pitch detection for a Unity application on iOS devices. Is it possible to get microphone input through FMOD with latency as low as what I can get from a regular Unity AudioSource? In a test scene, a regular Unity AudioSource gives very low latency. Unfortunately, since FMOD is integrated into our project, the same approach picks up extra latency when I use it inside the project. I like the regular AudioSource because its built-in GetSpectrumData function works well with my pitch detection.
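For reference, the Unity-only path I’m comparing against is roughly this (a simplified sketch, not our exact project code): record into an AudioClip with the Microphone class, play it on an AudioSource, and read the spectrum with GetSpectrumData.

    using UnityEngine;

    public class UnityMicSpectrum : MonoBehaviour
    {
        public int spectrumSize = 2048;
        private AudioSource source;
        private float[] spectrum;

        void Start()
        {
            spectrum = new float[spectrumSize];
            source = gameObject.AddComponent<AudioSource>();

            // Record from the default microphone into a looping 1-second clip.
            // (Rate could also come from Microphone.GetDeviceCaps.)
            source.clip = Microphone.Start(null, true, 1, AudioSettings.outputSampleRate);
            source.loop = true;

            // Wait until the microphone has actually started delivering samples.
            while (Microphone.GetPosition(null) <= 0) { }
            source.Play();
        }

        void Update()
        {
            // Spectrum of whatever the AudioSource is currently playing (the mic loop).
            source.GetSpectrumData(spectrum, 0, FFTWindow.BlackmanHarris);
            // spectrum[] is what gets fed to the pitch detectors.
        }
    }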

I tried the approach where you call recordStart in FMOD, play the recording back as a sound, and then attach an FFT DSP to get the spectrum data. This may be the way to go if I can get the latency short enough and get it working with my pitch detectors. Unfortunately, I ran into a few problems.

  1. The data causes both of my pitch detectors to output the wrong frequency, always far too flat. This still happens even when I disable the speed-up/slow-down drift correction, change the window size, and change the window type (both pitch detectors are correct when fed data from a regular Unity AudioSource).
  2. After running the profiler, it seems that every time I grab data from the DSP object I incur ~38 ms of latency.
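(For scale: assuming a 48 kHz native rate and a 2048-sample FFT window, one analysis window spans 2048 / 48000 ≈ 43 ms of audio, so some delay on that order is inherent to the windowed analysis itself, independent of how the data is fetched.)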

Here is the code:

    // Requires: using System; using System.Runtime.InteropServices; using UnityEngine;

    // spectrumSize matches the FFT window size used in the second attempt below;
    // EstimateHPS and estimator are my two pitch detectors, defined elsewhere.
    public int spectrumSize = 2048;
    private FMOD.DSP dsp;
    private FMOD.DSP_PARAMETER_FFT _fftparam;

    private float[] spectrumData;
    private uint LATENCY_MS                         = 10;
    private uint DRIFT_MS                           = 1;

    private uint samplesRecorded, samplesPlayed     = 0;
    private int nativeRate, nativeChannels          = 0;
    private uint recSoundLength                     = 0;
    uint lastPlayPos                                = 0;
    uint lastRecordPos                              = 0;
    private uint driftThreshold                     = 0;
    private uint desiredLatency                     = 0;
    private uint adjustLatency                      = 0;
    private int actualLatency                       = 0;
    private uint minRecordDelta                     = uint.MaxValue; // Smallest record delta (driver granularity) seen so far
    private System.IntPtr _data;

    private FMOD.CREATESOUNDEXINFO exInfo           = new FMOD.CREATESOUNDEXINFO();
    private float timer = 0f;
    private float interval = .08f;

    private FMOD.Sound recSound;
    private FMOD.Channel channel;

    // Start is called before the first frame update
    void Start()
    {
        /*
            Determine latency in samples.
        */

        FMODUnity.RuntimeManager.CoreSystem.getRecordDriverInfo(0, out _, 0, out _, out nativeRate, out _, out nativeChannels, out _);

        driftThreshold          = (uint)(nativeRate * DRIFT_MS) / 1000;
        desiredLatency          = (uint)(nativeRate * LATENCY_MS) / 1000;
        adjustLatency           = desiredLatency;
        actualLatency           = (int)desiredLatency;

        /*
            Create user sound to record into, then start recording.
        */
        exInfo.cbsize           = Marshal.SizeOf(typeof(FMOD.CREATESOUNDEXINFO));
        exInfo.numchannels      = nativeChannels;
        exInfo.format           = FMOD.SOUND_FORMAT.PCM16;
        exInfo.defaultfrequency = nativeRate;
        Debug.Log("Native rate " + nativeRate);
        exInfo.length           = (uint)(nativeRate * sizeof(short) * nativeChannels);

        Debug.Log("cbsize " + exInfo.cbsize + " num channels " + exInfo.numchannels + " format " + exInfo.format + " default frequency " + exInfo.defaultfrequency + " length " + exInfo.length);
        FMODUnity.RuntimeManager.CoreSystem.createSound("", FMOD.MODE.LOOP_NORMAL | FMOD.MODE.OPENUSER, ref exInfo, out recSound);

        FMODUnity.RuntimeManager.CoreSystem.recordStart(0, recSound, true);

        recSound.getLength(out recSoundLength, FMOD.TIMEUNIT.PCM);
        // Create the FFT DSP used for spectrum analysis.
        FMODUnity.RuntimeManager.CoreSystem.createDSPByType(FMOD.DSP_TYPE.FFT, out dsp);
        dsp.setParameterInt((int)FMOD.DSP_FFT.WINDOWTYPE, (int)FMOD.DSP_FFT_WINDOW.BLACKMAN);
        dsp.setParameterInt((int)FMOD.DSP_FFT.WINDOWSIZE, spectrumSize);

        // Allocate the buffer the spectrum is copied into.
        spectrumData = new float[spectrumSize];

        // The DSP is attached to the channel once playback starts (see Update).
    }

    // Update is called once per frame
    void Update()
    {
        //Debug.Log("Final Frequency " + EstimateHPS(GetSpectrumData()));
        timer += Time.deltaTime;

        if (timer > interval)
        {
            timer = 0;
            Debug.Log("Frequency: " + EstimateHPS(GetSpectrumData()));
            Debug.Log("SRH Freq:" + estimator.Estimate(GetSpectrumData()));
        }
        
        /*
            Determine how much has been recorded since we last checked
        */
        uint recordPos      = 0;
        FMODUnity.RuntimeManager.CoreSystem.getRecordPosition(0, out recordPos);
        
        uint recordDelta    = (recordPos >= lastRecordPos) ? (recordPos - lastRecordPos) : (recordPos + recSoundLength - lastRecordPos);
        lastRecordPos       = recordPos;
        samplesRecorded     += recordDelta;

        // Track the smallest record delta seen; minRecordDelta is a field initialised
        // to uint.MaxValue so this comparison can actually succeed.
        if (recordDelta != 0 && (recordDelta < minRecordDelta))
        {
            minRecordDelta = recordDelta; // Smallest driver granularity seen so far
            adjustLatency = (recordDelta <= desiredLatency) ? desiredLatency : recordDelta; // Adjust our latency if driver granularity is high
        }

        /*
            Delay playback until our desired latency is reached.
        */

        if (!channel.hasHandle() && samplesRecorded >= adjustLatency)
        {
            FMODUnity.RuntimeManager.CoreSystem.getMasterChannelGroup(out FMOD.ChannelGroup mCG);
            FMODUnity.RuntimeManager.CoreSystem.playSound(recSound, mCG, false, out channel);
            channel.addDSP(FMOD.CHANNELCONTROL_DSP_INDEX.HEAD, dsp);
            //Debug.Log("First If statement called!");
        }

        /*
            Determine how much has been played since we last checked.
        */
        if (channel.hasHandle())
        {
            uint playPos = 0;
            channel.getPosition(out playPos, FMOD.TIMEUNIT.PCM);
            //Debug.Log("Get Position() = playPos)");
            uint playDelta  = (playPos >= lastPlayPos) ? (playPos - lastPlayPos) : (playPos + recSoundLength - lastPlayPos);
            lastPlayPos     = playPos;
            samplesPlayed   += playDelta;

            //Debug.Log("Samples recorded: " + samplesRecorded + " Samples Played " + samplesPlayed);
            // Compensate for any drift.
            int latency     = (int)(samplesRecorded - samplesPlayed);
            actualLatency   = (int)((0.97f * actualLatency) + (0.03f * latency));
            //Debug.Log("Latency: " + latency + " Actual Latency: " + actualLatency);
            int playbackRate = nativeRate;
            //Debug.Log("Adjust Latency: " + adjustLatency);
            if (actualLatency < (int)(adjustLatency - driftThreshold))
            {
                // Playback position is catching up to the record position, slow playback down by 2%
                playbackRate = nativeRate - (nativeRate / 50);
            }

            else if (actualLatency > (int)(adjustLatency + driftThreshold))
            {
                // Playback is falling behind the record position, speed playback up by 2%
                playbackRate = nativeRate + (nativeRate / 50);
            }

            channel.setFrequency((float)playbackRate);
        }
    }
    public float[] GetSpectrumData()
    {
        
        uint _length;
        dsp.getParameterData((int)FMOD.DSP_FFT.SPECTRUMDATA, out _data, out _length);
        _fftparam = (FMOD.DSP_PARAMETER_FFT)Marshal.PtrToStructure(_data, typeof(FMOD.DSP_PARAMETER_FFT));

        if (_fftparam.numchannels >= 1)
        {
            Debug.Log(_fftparam.numchannels);
            for (int s = 0; s < spectrumSize; s++)
            {
                float _totalChannelData = 0f;
                for (int c = 0; c < _fftparam.numchannels; c++)
                    _totalChannelData += _fftparam.spectrum[c][s];
                spectrumData[s] = _totalChannelData / _fftparam.numchannels;
            }
        }

        string spectrumLog = "Spectrum Data: ";
        for (int i = 0; i < spectrumData.Length; i++)
        {
           spectrumLog += spectrumData[i] + ", ";
        }
        Debug.Log(spectrumLog);
        Debug.Log("speclength " + spectrumData.Length);
        return spectrumData;
    }
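One thing I still need to rule out for the “too flat” problem is which sample rate the FFT bins actually correspond to. My assumption is that the FFT DSP processes the signal at FMOD’s mixer output rate (from getSoftwareFormat) rather than at the record driver’s native rate, so the bin-to-frequency mapping would look something like this (the helper name is made up):

    // Hypothetical helper: map an FFT bin index to a frequency in Hz.
    // Assumption: the FFT DSP runs at the system mixer rate reported by
    // getSoftwareFormat, not necessarily at the record driver's native rate.
    private float BinToFrequency(int binIndex, int windowSize)
    {
        FMODUnity.RuntimeManager.CoreSystem.getSoftwareFormat(out int mixRate, out _, out _);
        float hertzPerBin = mixRate / (float)windowSize; // width of one FFT bin
        return binIndex * hertzPerBin;
    }

If the pitch detectors assume a different rate than the one the DSP actually runs at, every detected frequency would be scaled by the same ratio, which would read as consistently flat or sharp.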

My next attempt is to use sound.lock and grab the sample data directly. This seems like it could be the best approach because I don’t have to worry about the overhead of playing audio and using a DSP object; all I need is the audio data so I can analyze it for real-time pitch detection. I have attempted it, but I think I am accessing the data incorrectly: my sample data is filled with NaNs, and when the buffer wraps around, ptr2 and len2 don’t change.

Here is the code for that attempt:

    // Requires: using System; using System.Runtime.InteropServices; using UnityEngine;

    // Microphone input
    public int spectrumSize = 2048;
    private float[] spectrumData;
    private int nativeRate, nativeChannels;
    private uint recSoundLength;
    private FMOD.Sound recSound;
    private float timer = 0f;
    private float interval = 0.08f;

    private uint recordPosition = 0;

    // Start is called before the first frame update
    void Start()
    {
        FMODUnity.RuntimeManager.CoreSystem.getRecordDriverInfo(0, out _, 0, out _, out nativeRate, out _, out nativeChannels, out _);
        var exInfo = new FMOD.CREATESOUNDEXINFO
        {
            cbsize = Marshal.SizeOf(typeof(FMOD.CREATESOUNDEXINFO)),
            numchannels = nativeChannels,
            format = FMOD.SOUND_FORMAT.PCM16,
            defaultfrequency = nativeRate,
            length = (uint)(nativeRate * sizeof(short) * nativeChannels)
        };
        FMODUnity.RuntimeManager.CoreSystem.createSound("", FMOD.MODE.LOOP_NORMAL | FMOD.MODE.OPENUSER, ref exInfo, out recSound);
        FMODUnity.RuntimeManager.CoreSystem.recordStart(0, recSound, true);
        recSound.getLength(out recSoundLength, FMOD.TIMEUNIT.PCM);
        spectrumData = new float[spectrumSize];
        FMODUnity.RuntimeManager.CoreSystem.getRecordPosition(0, out recordPosition);

    }

    // Update is called once per frame
    void Update()
    {
        timer += Time.deltaTime;
        if (timer > interval)
        {
            timer = 0f;
            GetRawAudioSamples(spectrumSize);
        }
    }

    private float[] GetRawAudioSamples(int windowSize)
    {
        // Validate window size
        Debug.Log("native channels:" + nativeChannels);
        Debug.Log("Offset: " + (uint)(recordPosition - windowSize * sizeof(short) * nativeChannels));
        Debug.Log("Length: " + (uint)(spectrumSize * sizeof(short) * nativeChannels));

        if (windowSize * sizeof(short) * nativeChannels > recSoundLength)
        {
            Debug.LogError("Window size is too large for the sound buffer length.");
            return null;
        }

        // Lock the sound to access the latest data
        IntPtr ptr1, ptr2;
        uint len1, len2;

        //Get current length of recording
        FMODUnity.RuntimeManager.CoreSystem.getRecordPosition(0, out recordPosition);


        Debug.Log("Position of sound: " + recordPosition);

        FMOD.RESULT result = recSound.@lock((uint)(recordPosition - windowSize * sizeof(short) * nativeChannels), (uint)(spectrumSize * sizeof(short) * nativeChannels), out ptr1, out ptr2, out len1, out len2);

        // Check the result of the lock operation
        if (result != FMOD.RESULT.OK)
        {
            Debug.LogError($"Failed to lock sound buffer: {result}");
            return null;
        }

        Debug.Log("len1 " + len1);
        Debug.Log("len2 " + len2);
        
        // Validate the pointers and lengths
        if (ptr1 == IntPtr.Zero || len1 == 0)
        {
            Debug.LogError("Invalid pointer or length after locking sound.");
            recSound.unlock(ptr1, ptr2, len1, len2); // Ensure unlock is called even on failure
            return null;
        }
        // If the requested region wrapped around the end of the ring buffer,
        // ptr2/len2 describe the second part of the data.
        if (ptr2 != IntPtr.Zero || len2 != 0)
        {
            Debug.Log("Lock wrapped: ptr1 = " + ptr1 + ", ptr2 = " + ptr2 + ", len2 = " + len2);
        }

        // Copy data for analysis (assuming mono for simplicity)
        // Ensure that the length does not exceed the buffer size
        int copyLength = Mathf.Min((int)len1 / sizeof(short), spectrumData.Length);
        Marshal.Copy(ptr1, spectrumData, 0, copyLength);

        // Validate copied data (check for NaNs)
        if (Array.Exists(spectrumData, float.IsNaN))
        {
            Debug.LogError("Copied data contains NaN values.");
        }

        // Unlock the sound
        result = recSound.unlock(ptr1, ptr2, len1, len2);
        if (result != FMOD.RESULT.OK)
        {
            Debug.LogError($"Failed to unlock sound buffer: {result}");
        }

        Debug.Log($"Spectrum Data Length: {spectrumData.Length}");
        Debug.Log("Spectrum Data: " + floatArrayToString(spectrumData));
        return spectrumData;
    }
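One thing I’m now second-guessing in this attempt is the units in the lock call. As far as I can tell from the FMOD docs, getRecordPosition reports a position in PCM samples while sound.lock takes a byte offset and byte length, and my offset can also underflow right after recording starts. A sketch of how I think the offset should be computed (untested, and the variable names are just for illustration):

    // Assumptions: getRecordPosition is in PCM samples, sound.lock wants bytes,
    // and the start offset must wrap around the ring buffer instead of going negative.
    uint bytesPerFrame     = (uint)(sizeof(short) * nativeChannels);       // PCM16
    uint bufferLengthBytes = recSoundLength * bytesPerFrame;               // recSoundLength is in PCM samples
    uint windowLengthBytes = (uint)windowSize * bytesPerFrame;

    FMODUnity.RuntimeManager.CoreSystem.getRecordPosition(0, out uint recordPosSamples);
    uint recordPosBytes    = recordPosSamples * bytesPerFrame;

    // Start of the most recent full window, wrapped into the buffer.
    uint offsetBytes = (recordPosBytes + bufferLengthBytes - windowLengthBytes) % bufferLengthBytes;

    FMOD.RESULT result = recSound.@lock(offsetBytes, windowLengthBytes,
                                        out IntPtr p1, out IntPtr p2,
                                        out uint l1, out uint l2);
    // ...copy/convert p1/l1, plus p2/l2 when l2 != 0 (the wrap case),
    // then always call recSound.unlock(p1, p2, l1, l2).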

Are there any resources in the FMOD API examples where microphone input is recorded and analyzed that I could take a look at? What advice do you have for getting real-time microphone input for analysis? I see four options:

  1. Try to reduce the latency of a regular Unity AudioSource running alongside FMOD
  2. Use an FFT DSP in FMOD to analyze the spectrum of the played-back recording
  3. Use sound.lock to access the data directly and use a third-party FFT tool to get the spectrum
  4. Use Swift in the iOS app and see if I can feed that data into my Unity project

Right now I’m leaning towards the third option and trying to get sound.lock to correctly return the most recent microphone data.

Here is an example of the sample data I am currently receiving from sound.lock:

Spectrum Data: NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, 2.802597E-45, 2.755135E-40, 1.83671E-40, 1.401298E-45, NaN, 9.183129E-41, 0, 0, 4.591803E-40, 6.428597E-40, 1.285711E-39, 1.377552E-39, 1.744897E-39, 1.469395E-39, 1.285717E-39, 1.19388E-39, 1.102046E-39, 1.285714E-39, 1.285719E-39, 1.469389E-39, 1.193877E-39, 1.102047E-39, 1.010207E-39, 1.010207E-39, 1.10204E-39, 1.010204E-39, 1.377546E-39, 1.285721E-39, 1.285718E-39, 1.469388E-39, 1.377555E-39, 1.65306E-39, 1.469392E-39, 1.285721E-39, 1.469388E-39, 1.193882E-39, 1.19388E-39, 1.193875E-39, 7.346994E-40, 7.347008E-40, 1.10204E-39, 1.469388E-39, 1.377551E-39, 8.265335E-40, 6.428611E-40, 9.183704E-40, 9.183648E-40, 5.510256E-40, 4.591873E-40, 3.673476E-40, 3.673476E-40, 3.673448E-40, 3.673462E-40...
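Looking at those numbers again (NaNs mixed with denormals around 1e-39), I suspect the direct Marshal.Copy into a float[] is the main culprit: the locked region is raw PCM16, so copying it as floats reinterprets pairs of 16-bit samples as 32-bit float bit patterns. A minimal sketch of the conversion I think is needed (the helper name is made up, and it assumes PCM16 with the record driver’s channel count):

    using System;
    using System.Runtime.InteropServices;

    // Hypothetical helper: read a locked PCM16 region (ptr/lenBytes from sound.lock)
    // as 16-bit samples and convert them to normalized mono floats in [-1, 1].
    static float[] Pcm16ToFloats(IntPtr ptr, uint lenBytes, int channels)
    {
        short[] raw = new short[lenBytes / sizeof(short)];
        Marshal.Copy(ptr, raw, 0, raw.Length);

        int frames = raw.Length / channels;
        float[] samples = new float[frames];
        for (int i = 0; i < frames; i++)
        {
            // Average the channels down to mono, then scale 16-bit PCM to [-1, 1].
            int sum = 0;
            for (int c = 0; c < channels; c++)
                sum += raw[i * channels + c];
            samples[i] = (sum / (float)channels) / 32768f;
        }
        return samples;
    }

The same conversion would need to be applied to ptr2/len2 whenever the lock wraps around the end of the buffer.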

Update on this! I made progress with the sound.lock method and found a few small errors in my logic that were causing the NaN values. My pitch detection now works in the editor but not in the build. I am currently working out why the audio data is incorrect in the build; I’m guessing it might have to do with microphone permissions.
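If it does turn out to be permissions, the first thing I’ll check is the “Microphone Usage Description” field in Unity’s iOS Player Settings, which adds the NSMicrophoneUsageDescription key to Info.plist; iOS won’t grant microphone access without it.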


Hi,

I would love to see your fixed code if that would be ok?

This can be very difficult to sort out. Are you building just to iOS, or to Android as well?