Is it possible to detect completely blank audio within Unity?

nilavazhagan · April 16, 2024, 7:20pm

Hi,

Total newbie here. I am recording audio within Unity to send it to a speech-to-text (STT) model and use the result in game. Unfortunately, the STT doesn’t process audio that is completely blank, for example when the user presses the record button but doesn’t say anything.

In this case I want to be able to differentiate between such blank audio and audio that actually has speech in it, so that I can handle it myself before sending it to the STT.

So, is there any way to do it? Also, is there a way to trim out the silent parts of a recording and retain only the parts where the user actually spoke?

I tried googling, looked at docs… But it’s hard for me to find anything, cos as I already mentioned I’m a newbie and don’t even understand many terms that are being used.

So, I would appreciate greatly, if someone could point me to some resource which can help me or help me directly.

Thanks a lot

Connor_FMOD · April 17, 2024, 5:20am

Hi,

We have a scripting example outlining adding a DSP to the master ChannelGroup to create a visualizer: Unity Integration | Scripting Examples - DSP Capture. We can then track the value of mDataBuffer, a non-zero value indicating audio. Please note that this will track all audio being output, as the DSP is on the master channel group. If you only want to track the mic input, the DSP will have to be added to its own channel group.

In terms of only processing an audible signal, an option could be only passing in the sample data when mDataBuffer has a non-zero value.

Hope this helps!

nilavazhagan · April 17, 2024, 7:25am

I already have the float sample data with me. But, it seems to have non-zero values for empty audio too. Any idea what could be causing it? Is it just the background noise the mic is picking up and if so, is there a way to handle it?

The recording is done in RuntimeManager.CoreSystem.recordStart so, I think it’s using the Master Channel Group. Could that be the cause of the issue? If so, how can I use a separate channel group for this?

nilavazhagan · April 18, 2024, 4:54am

An update: I tried playing the audio that was being recorded. I noticed the audio volume is too low. It is barely audible. I think that’s causing the values to be very similar to empty noise. Any idea why this could be the case and how it can be fixed?

Connor_FMOD · April 23, 2024, 12:16am

Yes, you can choose a range that the float has to fall outside of before you track it, e.g. between -0.1 and 0.1. This should ignore background sounds.
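A minimal sketch of that kind of amplitude gate, assuming the recording is already available as PCMFLOAT samples in the range [-1, 1]. The class and method names are hypothetical, and the 0.1 threshold is only the example value from above; it would need tuning per microphone.

```csharp
using System;

public static class AmplitudeGate
{
    // Returns true if at least one sample lies outside [-threshold, threshold].
    // 'threshold' is an assumed tuning value, e.g. 0.1f for PCMFLOAT data.
    public static bool HasSampleAbove(float[] samples, float threshold)
    {
        foreach (float s in samples)
        {
            if (Math.Abs(s) > threshold)
                return true;
        }
        return false;
    }
}
```

For example, `AmplitudeGate.HasSampleAbove(samples, 0.1f)` could be checked before sending the samples to the STT model. A single loud click would also pass this test, so it is only a first approximation.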

Depending on how you are playing the sound you pass to RuntimeManager.CoreSystem.recordStart, you may be able to just add the DSP to its channel: FMODUnity.RuntimeManager.CoreSystem.playSound(recSound, mCG, false, out channel /* add the DSP to this channel */);

Could I please get the full code that you are testing?

nilavazhagan · April 25, 2024, 5:31am

This is how we record the sound:

public bool StartRecording()
{
    if (IsRecording)
        return true;

    _activeMicrophone = _microphones.FirstOrDefault(obj => obj.IsConnected);
    if (_activeMicrophone.IsValid == false)
        return false;

    const uint maxLength = 30; // seconds

    _createSoundInfo.cbsize           = Marshal.SizeOf(typeof(CREATESOUNDEXINFO));
    _createSoundInfo.numchannels      = _activeMicrophone.ChannelsCount;
    _createSoundInfo.format           = SOUND_FORMAT.PCMFLOAT;
    _createSoundInfo.defaultfrequency = _activeMicrophone.SampleRate;
    _createSoundInfo.length           = maxLength * (uint)_activeMicrophone.SampleRate * sizeof(float) * (uint)_activeMicrophone.ChannelsCount;

    var result = RuntimeManager.CoreSystem.createSound(
        _createSoundInfo.userdata, MODE.LOOP_NORMAL | MODE.OPENUSER, ref _createSoundInfo, out _sound
    );
    if (result != RESULT.OK)
    {
        StopRecording();
        return false;
    }

    result = RuntimeManager.CoreSystem.recordStart(_activeMicrophone.Index, _sound, true);
    if (result != RESULT.OK)
    {
        StopRecording();
        return false;
    }

    return true;
}

And, this is how we extract the sound data:

private RecordedAudio ExtractAudioAndRelease()
{
    _ = _sound.getLength(out var byteCount, TIMEUNIT.PCMBYTES);
    _ = _sound.@lock(0, byteCount, out var readData, out var ptr2, out var readBytes, out var len2);
    Assert.IsTrue(readBytes <= byteCount);

    _ = _sound.getFormat(out _, out _, out var channels, out _);
    _ = _sound.getDefaults(out var frequency, out _);

    // TODO: If the following allocation starts creating problems with GC performance consider implementing a global pool allocator
    var bytes = new byte[readBytes];
    Marshal.Copy(readData, bytes, 0, (int)readBytes);

    _ = _sound.unlock(readData, ptr2, readBytes, len2);
    _ = _sound.release();

    var length = bytes.FindLastIndex(value => value != 0);
    if (length == -1)
        return default;

    return new RecordedAudio {
        Data      = BytesToFloats(bytes, length),
        Frequency = (int)frequency,
        Channels  = channels
    };
}
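The BytesToFloats helper is not shown in the post. The following is only a hypothetical stand-in to illustrate how PCMFLOAT bytes map to samples, assuming the second argument is a byte count and the data is in the machine's native byte order. It rounds the count down to whole 4-byte samples, since a count that is not a multiple of sizeof(float) would otherwise cut the last sample in half.

```csharp
using System;

public static class PcmFloatConversion
{
    // Hypothetical stand-in for the BytesToFloats helper (its real code is not shown).
    // Converts the first 'byteCount' bytes of raw PCMFLOAT data into float samples.
    public static float[] ToFloats(byte[] bytes, int byteCount)
    {
        if (byteCount < 0 || byteCount > bytes.Length)
            throw new ArgumentOutOfRangeException(nameof(byteCount));

        // Drop any trailing partial sample so only whole floats are copied.
        int sampleCount = byteCount / sizeof(float);
        var samples = new float[sampleCount];

        // Buffer.BlockCopy counts in bytes, not elements.
        Buffer.BlockCopy(bytes, 0, samples, 0, sampleCount * sizeof(float));
        return samples;
    }
}
```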

Connor_FMOD · April 29, 2024, 2:16am

Hi,

Thank you for the code. Unfortunately, I cannot see how you are starting the _sound. I believe you could add the filter to the bytes, could you try iterating through readData and only adding the values that are outside of your filter?
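One way to sketch such a filter is shown below. Note that it is a variation on the suggestion: it trims only the quiet lead-in and tail, rather than dropping every sub-threshold sample, because removing individual samples from the middle of a waveform would distort the speech. It assumes mono float samples (for interleaved multi-channel data the cut points would need aligning to the channel count), and the class name and threshold are hypothetical.

```csharp
using System;

public static class SilenceTrimmer
{
    // Returns the span between the first and last sample whose absolute value
    // exceeds 'threshold'. Returns an empty array if no sample does.
    public static float[] Trim(float[] samples, float threshold)
    {
        int first = Array.FindIndex(samples, s => Math.Abs(s) > threshold);
        if (first < 0)
            return Array.Empty<float>();

        int last = Array.FindLastIndex(samples, s => Math.Abs(s) > threshold);

        var trimmed = new float[last - first + 1];
        Array.Copy(samples, first, trimmed, 0, trimmed.Length);
        return trimmed;
    }
}
```

An empty result can then double as the "nothing was said" signal, so the recording is never sent to the STT model in that case.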

Hope this helps.

nilavazhagan · April 29, 2024, 4:56am

So, I don’t want to play the sound back to the user. Instead I’m sending the audio data to a STT model which would then transcribe the audio to text. I tried to play the sound back only for debugging purposes, and this is how I did it.

RuntimeManager.CoreSystem.getMasterChannelGroup(out var channelGroup);
RuntimeManager.CoreSystem.playSound(_sound, channelGroup, false, out _);

And, the thing here is I’m not trying to remove silences from the recorded audio… I’m only trying to differentiate between audio which has some speech in it and audio which just has background noise.

The problem seems to be that the values for both the empty audio and speech audio are very similar. Sometimes the empty audio has higher values than the speech audio too. So, I am not able to define a specific threshold to filter out the empty audio.
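When individual sample values overlap like this, comparing short-time energy can be more robust than comparing single samples. The sketch below computes the RMS level of fixed-size frames and treats the recording as likely speech when the loudest frame is several times louder than the quietest one. This is a simple heuristic, not a full voice activity detector: it assumes the background noise is roughly steady and that the recording contains at least one frame without speech. All names, the frame size, and the ratio are hypothetical tuning choices.

```csharp
using System;

public static class FrameEnergy
{
    // RMS level of each non-overlapping frame; a trailing partial frame is ignored.
    public static float[] FrameRms(float[] samples, int frameSize)
    {
        if (frameSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(frameSize));

        int frameCount = samples.Length / frameSize;
        var rms = new float[frameCount];
        for (int f = 0; f < frameCount; f++)
        {
            double sumSquares = 0.0;
            for (int i = 0; i < frameSize; i++)
            {
                float s = samples[f * frameSize + i];
                sumSquares += s * s;
            }
            rms[f] = (float)Math.Sqrt(sumSquares / frameSize);
        }
        return rms;
    }

    // Heuristic: true if the loudest frame is at least 'ratio' times
    // louder than the quietest frame (used here as a noise-floor estimate).
    public static bool LooksLikeSpeech(float[] samples, int frameSize, float ratio)
    {
        float[] rms = FrameRms(samples, frameSize);
        if (rms.Length < 2)
            return false;

        float min = float.MaxValue;
        float max = 0f;
        foreach (float r in rms)
        {
            if (r < min) min = r;
            if (r > max) max = r;
        }

        const float epsilon = 1e-9f; // keeps all-zero input from passing
        return max >= ratio * Math.Max(min, epsilon);
    }
}
```

As an assumed starting point, a frame of about 20 ms (for example `sampleRate / 50` samples for mono audio) and a ratio of around 4 could be tried and then adjusted against real recordings.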

I suspect this is because of the low volume of the recorded audio. It is barely audible and it is very close to the empty sound. So, if we can find a way to increase the volume of the speech, we can probably differentiate it from the empty audio.
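Raising the volume in software can be sketched as peak normalization, shown below with hypothetical names and an assumed target peak. One caveat: a constant gain multiplies the background noise by the same factor as the speech, so on its own it makes the recording louder without making the two easier to tell apart.

```csharp
using System;

public static class PeakNormalizer
{
    // Returns a copy of 'samples' scaled so the largest absolute value equals 'targetPeak'.
    // An all-zero input is returned as all zeros, since there is nothing to scale.
    public static float[] Normalize(float[] samples, float targetPeak)
    {
        float peak = 0f;
        foreach (float s in samples)
            peak = Math.Max(peak, Math.Abs(s));

        var result = new float[samples.Length];
        if (peak == 0f)
            return result;

        float gain = targetPeak / peak;
        for (int i = 0; i < samples.Length; i++)
            result[i] = samples[i] * gain;
        return result;
    }
}
```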

Connor_FMOD · May 2, 2024, 12:43am

I see, thanks for the info.

Could you try increasing the record level of the mic in the Sound settings of your operating system?

Let me know if that helps differentiate between the background noise and actual voice.

nilavazhagan · May 2, 2024, 10:56am

It is already at maximum, unfortunately.

Connor_FMOD · May 7, 2024, 5:43am

I see, are you able to test different microphones to make sure this is a hardware issue?

nilavazhagan · May 7, 2024, 6:30am

Yes, I’ve tried different microphones and also asked some of my colleagues to try with their hardware. Still the same.

Connor_FMOD · May 9, 2024, 2:34am

I see,

Would it be possible to get a copy of your project or a stripped-down version displaying the issue uploaded to your profile? Please note you must register a project with us before uploading files.
