When recording mic input to a wav file, a second of silence appears at the beginning of the file

Whenever I record microphone input to a wave file, silence is always inserted at the beginning of the track. I am using two FMOD systems (one for recording and another that uses the wav writer output to play back the sound currently being recorded), following the method described in this thread: Save a recorded sound to .wav (post #3 by jeff_fmod).

After trying many things, I have discovered that the amount of silence seems to be directly connected to the PCM format.

PCM8: 2 seconds of silence
PCM16: 1 second of silence
PCM24: 0.75 seconds of silence
PCM32 or PCMFLOAT: 0.5 seconds of silence.

It always seems to be exactly those amounts of silence, with no variation. Tweaking things like DSPBufferSize, the number of channels, and the frequency has no effect.
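One possible reading of these figures, offered as an assumption rather than something confirmed in this thread: if exInfo.length is left at 44100 * sizeof(short) bytes (as in the code further down) while only exInfo.format changes, then the duration of the record buffer changes with the sample size, and the silence may simply be one full pass through that buffer. The helper below is a hypothetical sketch (RecordBufferMath and BufferSeconds are not FMOD names) that only does the arithmetic.

```csharp
using System;

static class RecordBufferMath
{
    // Duration, in seconds, of a PCM buffer that is lengthBytes long.
    // Hypothetical helper for illustration; it does not call into FMOD.
    public static double BufferSeconds(uint lengthBytes, int bitsPerSample, int channels, int sampleRate)
    {
        // One frame holds one sample for every channel.
        int bytesPerFrame = (bitsPerSample / 8) * channels;
        return (double)lengthBytes / ((double)bytesPerFrame * sampleRate);
    }
}
```

For an 88200-byte mono buffer at 44100 Hz this gives 2 s for 8-bit, 1 s for 16-bit, about 0.67 s for 24-bit, and 0.5 s for 32-bit samples. That matches three of the four reported values exactly and is close to the 0.75 s reported for PCM24.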

On top of that, when I stop the recording, the same amount of recording gets cut off from the end of the track. For example: if I record in PCM16, the first second of the track will be empty and the last second of microphone input will be missing entirely.

Any idea what could be causing this?

I pulled the code out of my engine and set up a C# console program. It is still having the exact same issue with the empty second of silence in the exported wave file. Here is the code:

static void Main(string[] args)
{
	Console.WriteLine("Press any key to record");
	Console.ReadKey();

	ChannelGroup channelGroup = new ChannelGroup();
	CREATESOUNDEXINFO exInfo = new CREATESOUNDEXINFO();

	exInfo.cbsize = Marshal.SizeOf(typeof(CREATESOUNDEXINFO));
	exInfo.numchannels = 1;
	exInfo.format = SOUND_FORMAT.PCM16;
	exInfo.defaultfrequency = 44100;
	exInfo.length = 44100 * sizeof(short) * 1;

	Factory.System_Create(out FMOD.System micRecordSystem);
	Factory.System_Create(out FMOD.System wavWriterSystem);

	micRecordSystem.init(64, INITFLAGS.NORMAL, IntPtr.Zero);
	wavWriterSystem.init(64, INITFLAGS.NORMAL, IntPtr.Zero);

	micRecordSystem.createSound(exInfo.userdata, MODE.LOOP_NORMAL | MODE.OPENUSER, ref exInfo, out Sound Sound);
	micRecordSystem.recordStart(0, Sound, true);

	wavWriterSystem.setOutput(OUTPUTTYPE.WAVWRITER);
	wavWriterSystem.playSound(Sound, channelGroup, false, out Channel wavChannel);

	Console.WriteLine("Press any key to stop");

	while (!Console.KeyAvailable)
	{
		micRecordSystem.update();
		wavWriterSystem.update();

		System.Threading.Thread.Sleep(16);
	}

	micRecordSystem.recordStop(0);
	wavChannel.stop();
	Sound.release();
	wavWriterSystem.setOutput(OUTPUTTYPE.NOSOUND);
	wavWriterSystem.close();
	wavWriterSystem.release();
	micRecordSystem.close();
	micRecordSystem.release();

	Console.WriteLine("Done!");
}

After looking at record.cpp and researching other threads, I see that starting playback of the sound that is being recorded immediately can cause a big delay. Instead, you need to intentionally delay the playback by a small amount so that playback always trails the record position. I took the latency code from record.cpp and ported it to C# for a working example.

Sorry that this code isn’t super clean. Hopefully this will help others!

ChannelGroup channelGroup = new ChannelGroup();
CREATESOUNDEXINFO exInfo = new CREATESOUNDEXINFO();

exInfo.cbsize = Marshal.SizeOf(typeof(CREATESOUNDEXINFO));
exInfo.numchannels = 1;
exInfo.format = SOUND_FORMAT.PCM16;
exInfo.defaultfrequency = 44100;
exInfo.length = (uint)exInfo.defaultfrequency * sizeof(short) * (uint)exInfo.numchannels;

Factory.System_Create(out FMOD.System micRecordSystem);
Factory.System_Create(out FMOD.System wavWriterSystem);


wavWriterSystem.setOutput(OUTPUTTYPE.NOSOUND);
micRecordSystem.init(32, INITFLAGS.NORMAL, IntPtr.Zero);
wavWriterSystem.init(32, INITFLAGS.NORMAL, IntPtr.Zero);

Console.WriteLine("Press any key to record");
Console.ReadKey();

bool isWritingWav = false;
uint lastRecordPos = 0;
uint samplesRecorded = 0;
uint minRecordDelta = uint.MaxValue;
uint desiredLatency = ((uint)exInfo.defaultfrequency * 200) / 1000;
uint adjustedLatency = desiredLatency;
Channel wavChannel = new Channel();

micRecordSystem.createSound(exInfo.userdata, MODE.LOOP_NORMAL | MODE.OPENUSER, ref exInfo, out Sound Sound);
micRecordSystem.recordStart(0, Sound, true);

Sound.getLength(out uint soundLength, TIMEUNIT.PCM);

Console.WriteLine("Press any key to stop");

while (!Console.KeyAvailable)
{
    micRecordSystem.getRecordPosition(0, out uint recordPos);

    uint recordDelta;
    if (recordPos >= lastRecordPos)
    {
        recordDelta = recordPos - lastRecordPos;
    }
    else
    {
        recordDelta = recordPos + soundLength - lastRecordPos;
    }
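    // Remember this position so the next delta is measured from here, as record.cpp does.
    lastRecordPos = recordPos;
    samplesRecorded += recordDelta;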

    if (recordDelta > 0 && recordDelta < minRecordDelta)
    {
        minRecordDelta = recordDelta;
        if (recordDelta <= desiredLatency)
        {
            adjustedLatency = desiredLatency;
        }
        else
        {
            adjustedLatency = recordDelta;
        }
    }

    if (!isWritingWav && samplesRecorded >= adjustedLatency)
    {
        wavWriterSystem.setOutput(OUTPUTTYPE.WAVWRITER);
        wavWriterSystem.playSound(Sound, channelGroup, false, out wavChannel);
        isWritingWav = true;
    }

    micRecordSystem.update();
    wavWriterSystem.update();

    System.Threading.Thread.Sleep(16);
}

micRecordSystem.recordStop(0);
wavChannel.stop();
            
Sound.release();
wavWriterSystem.setOutput(OUTPUTTYPE.NOSOUND);
micRecordSystem.close();
micRecordSystem.release();
wavWriterSystem.close();
wavWriterSystem.release();

Console.WriteLine("Done!");
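The loop above relies on two small pieces of arithmetic: converting the desired latency from milliseconds to samples, and measuring how far the record cursor has moved when the looping buffer wraps around. Pulled out on their own, they might look like the sketch below. The class and method names are hypothetical and nothing here calls FMOD.

```csharp
static class RecordLatency
{
    // Number of PCM samples that correspond to `ms` milliseconds at `sampleRate` Hz.
    public static uint MsToSamples(int sampleRate, uint ms)
    {
        return (uint)sampleRate * ms / 1000;
    }

    // Samples recorded since the previous poll of a looping buffer that is
    // `soundLength` samples long. If the cursor wrapped, add one buffer length.
    public static uint RecordDelta(uint recordPos, uint lastRecordPos, uint soundLength)
    {
        return recordPos >= lastRecordPos
            ? recordPos - lastRecordPos
            : recordPos + soundLength - lastRecordPos;
    }
}
```

With the values used in this post, MsToSamples(44100, 200) is 8820 samples. The delta is only meaningful if the caller stores the current recordPos as lastRecordPos after each call.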

Also, if this helps anyone… I tried getting a version working with just one FMOD system, writing to the file myself while locking/unlocking the sound’s buffer. It worked, but the audio had a few pops here and there, so I probably got something wrong. I’ll post it anyway in case it helps anyone in the future.

static void SingleSystemVersion()
{
    CREATESOUNDEXINFO exInfo = new CREATESOUNDEXINFO();
    exInfo.cbsize = Marshal.SizeOf(typeof(CREATESOUNDEXINFO));
    exInfo.numchannels = 1;
    exInfo.format = SOUND_FORMAT.PCM16;
    exInfo.defaultfrequency = 44100;
    exInfo.length = (uint)exInfo.defaultfrequency * sizeof(short) * (uint)exInfo.numchannels * 2;

    Factory.System_Create(out FMOD.System micRecordSystem);

    micRecordSystem.init(32, INITFLAGS.NORMAL, IntPtr.Zero);

    Console.WriteLine("Press any key to record");
    Console.ReadKey();

    micRecordSystem.createSound(exInfo.userdata, MODE.LOOP_NORMAL | MODE.OPENUSER, ref exInfo, out Sound Sound);
    micRecordSystem.recordStart(0, Sound, true);

    FileStream fs = File.Create("record.wav");
    BinaryWriter bw = new BinaryWriter(fs);

    uint soundLength, dataLength = 0;

    WriteWavHeader(bw, Sound, dataLength);

    Sound.getLength(out soundLength, TIMEUNIT.PCM);

    uint lastRecordPos = 0;

    Console.WriteLine("Press any key to stop");

    while (!Console.KeyAvailable)
    {
        uint recordPos;

        micRecordSystem.getRecordPosition(0, out recordPos);

        if (recordPos != lastRecordPos)
        {
            IntPtr ptr1, ptr2;
            int blockLength;
            uint length1, length2;

            blockLength = (int)recordPos - (int)lastRecordPos;
            if (blockLength < 0)
            {
                blockLength += (int)soundLength;
            }

            Sound.@lock(lastRecordPos * (uint)exInfo.numchannels * 2, (uint)blockLength * (uint)exInfo.numchannels * 2, out ptr1, out ptr2, out length1, out length2);

            if (ptr1 != IntPtr.Zero && length1 > 0)
            {
                byte[] bytes = new byte[length1];
                Marshal.Copy(ptr1, bytes, 0, (int)length1);
                fs.Write(bytes, 0, (int)length1);
                dataLength += length1;
            }
            if (ptr2 != IntPtr.Zero && length2 > 0)
            {
                byte[] bytes = new byte[length2];
                Marshal.Copy(ptr2, bytes, 0, (int)length2);
                fs.Write(bytes, 0, (int)length2);
                dataLength += length2;
            }    

            Sound.unlock(ptr1, ptr2, length1, length2);
        }

        lastRecordPos = recordPos;

        micRecordSystem.update();

        System.Threading.Thread.Sleep(16);
    }

    WriteWavHeader(bw, Sound, dataLength);

    bw.Close();
    fs.Close();

    micRecordSystem.recordStop(0);
    Sound.release();


    micRecordSystem.close();
    micRecordSystem.release();

    Console.WriteLine("Done!");
}

static void WriteWavHeader(BinaryWriter bw, Sound sound, uint dataLength)
{
    sound.getFormat(out _, out _, out int numChannels, out int bits);
    sound.getDefaults(out float sampleRate, out _);

    bw.Seek(0, SeekOrigin.Begin);

    bw.Write(System.Text.Encoding.ASCII.GetBytes("RIFF"));      //RIFF                          4 bytes chars
    bw.Write(36 + dataLength);                                  //File Size (after this field)  4 bytes int     (36 for rest of header + wave data)
    bw.Write(System.Text.Encoding.ASCII.GetBytes("WAVEfmt "));  //WAVEfmt                       8 bytes chars
    bw.Write(16);                                               //Length of fmt data below      4 bytes int
    bw.Write((short)1);                                         //Format 1 is PCM               2 bytes short
    bw.Write((short)numChannels);                               //Number of Channels            2 bytes short
    bw.Write((int)sampleRate);                                  //Sample Rate                   4 bytes int
    bw.Write((int)(sampleRate * bits / 8 * numChannels));       //Byte rate (bytes per second)  4 bytes int
    bw.Write((short)(bits / 8 * numChannels));                  //Block align (bytes per frame) 2 bytes short
    bw.Write((short)bits);                                      //Bits per sample               2 bytes short
    bw.Write(System.Text.Encoding.ASCII.GetBytes("data"));      //data                          4 bytes chars
    bw.Write(dataLength);                                       //Size of data section          4 bytes int
}
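For reference, a canonical PCM WAV header is 44 bytes long, and the RIFF size field counts everything after itself, so it holds 36 plus the size of the sample data. The sketch below builds such a header in memory so the sizes can be checked without touching a file. WavHeaderSketch and Build are hypothetical names, and this is an illustration of the layout rather than the code FMOD's wav writer uses.

```csharp
using System.IO;
using System.Text;

static class WavHeaderSketch
{
    // Returns a 44-byte PCM WAV header for dataLength bytes of sample data.
    public static byte[] Build(int channels, int sampleRate, int bitsPerSample, uint dataLength)
    {
        using (var ms = new MemoryStream())
        using (var bw = new BinaryWriter(ms))
        {
            short blockAlign = (short)(channels * bitsPerSample / 8); // bytes per frame

            bw.Write(Encoding.ASCII.GetBytes("RIFF"));
            bw.Write(36u + dataLength);            // bytes that follow this field
            bw.Write(Encoding.ASCII.GetBytes("WAVE"));
            bw.Write(Encoding.ASCII.GetBytes("fmt "));
            bw.Write(16);                          // size of the fmt chunk body
            bw.Write((short)1);                    // format tag 1 = integer PCM
            bw.Write((short)channels);
            bw.Write(sampleRate);
            bw.Write(sampleRate * blockAlign);     // byte rate
            bw.Write(blockAlign);
            bw.Write((short)bitsPerSample);
            bw.Write(Encoding.ASCII.GetBytes("data"));
            bw.Write(dataLength);                  // size of the sample data
            bw.Flush();
            return ms.ToArray();
        }
    }
}
```

Reading the 4-byte little-endian values at offsets 4 and 40 of the returned array should give 36 + dataLength and dataLength respectively, which is a quick way to check a header written by hand.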


Hi,

You are correct, this is the solution, and thank you for sharing the code. Please do not hesitate to ask if there is anything else we can assist with!