Audio artifacts when playing audio generated via code

Hello.

I’m working on recording voice from a microphone for further transmission over the network and playback. For this I use lock/unlock.

In the script that records the voice, I use everything from the Start method of the recording example. The only thing I change is the sample rate, which I set to 16000 instead of the native rate.
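For reference, my Start is essentially the example's setup with the rate swapped out. A trimmed sketch only (recSound is the field the playback code below uses, and I am assuming the buffer stays one second long at 16 kHz):

Record setup (sketch)
// Query the default record driver, as in the example; only the rate is changed to 16000.
FMODUnity.RuntimeManager.CoreSystem.getRecordDriverInfo(0, out _, 0, out _,
        out int nativeRate, out _, out int nativeChannels, out _);

FMOD.CREATESOUNDEXINFO exinfo = new FMOD.CREATESOUNDEXINFO();
exinfo.cbsize           = System.Runtime.InteropServices.Marshal.SizeOf(typeof(FMOD.CREATESOUNDEXINFO));
exinfo.numchannels      = nativeChannels;
exinfo.format           = FMOD.SOUND_FORMAT.PCM16;
exinfo.defaultfrequency = 16000;                                            // instead of nativeRate
exinfo.length           = (uint)(16000 * sizeof(short) * nativeChannels);   // 1 second ring buffer

FMODUnity.RuntimeManager.CoreSystem.createSound("Record Sound",
        FMOD.MODE.LOOP_NORMAL | FMOD.MODE.OPENUSER, ref exinfo, out recSound);
FMODUnity.RuntimeManager.CoreSystem.recordStart(0, recSound, true);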

In the same example, the Update method plays the audio back and compensates for latency. I noticed that if you don't delay playback, the audio may have artifacts, but I don't need to play this audio at all, only extract the PCM data. So my questions are:

  1. If I only need this audio for extracting samples, do I still need to wait out the delay to get rid of the artifacts?
  2. Do I need to use a latency compensator that speeds up/slows down the playback rate? As far as I understand, no, because it works exclusively on the channel, which only matters for listening to the sound correctly, and I don't need to listen to this particular sound.
Play Sound with Latency
if (!channel.hasHandle() && samplesRecorded >= adjustLatency)
{
        FMODUnity.RuntimeManager.CoreSystem.getMasterChannelGroup(out FMOD.ChannelGroup mCG);
        FMODUnity.RuntimeManager.CoreSystem.playSound(recSound, mCG, false, out channel);
}
Latency Compensator
// Determine how much has been played since we last checked.
if (channel.hasHandle())
{
        uint playPos = 0;
        channel.getPosition(out playPos, FMOD.TIMEUNIT.PCM);

        uint playDelta  = (playPos >= lastPlayPos) ? (playPos - lastPlayPos) : (playPos + recSoundLength - lastPlayPos);
        lastPlayPos     = playPos;
        samplesPlayed   += playDelta;

        // Compensate for any drift.
        int latency     = (int)(samplesRecorded - samplesPlayed);
        actualLatency   = (int)((0.97f * actualLatency) + (0.03f * latency));

        int playbackRate = nativeRate;
        if (actualLatency < (int)(adjustLatency - driftThreshold))
        {
                // Playback position is catching up to the record position, slow playback down by 2%
                playbackRate = nativeRate - (nativeRate / 50);
        }

        else if (actualLatency > (int)(adjustLatency + driftThreshold))
        {
                // Playback is falling behind the record position, speed playback up by 2%
                playbackRate = nativeRate + (nativeRate / 50);
        }

        channel.setFrequency((float)playbackRate);
}

After that I transfer the samples to another computer over the network (let's assume that part works perfectly and everything arrives intact, because I tested it without the network, just passing the samples locally). On this new computer I need to create a sound and put the transferred samples into it. I did this successfully, again using lock/unlock on the sound.
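For context, this is roughly how I pull the newly recorded samples out of the record sound before sending them. Just a sketch - lastReadPos and SendOverNetwork are placeholders for my actual transport code, and error checks are omitted:

Reading new samples for transmission (sketch)
FMODUnity.RuntimeManager.CoreSystem.getRecordPosition(0, out uint recordPos);

// How many samples were recorded since the last read, accounting for the ring buffer wrapping.
uint newSamples = (recordPos >= lastReadPos)
        ? (recordPos - lastReadPos)
        : (recordPos + recSoundLength - lastReadPos);

if (newSamples > 0)
{
        recSound.@lock(lastReadPos * (uint)sizeof(short), newSamples * (uint)sizeof(short),
                out IntPtr ptr1, out IntPtr ptr2, out uint len1, out uint len2);

        byte[] packet = new byte[len1 + len2];
        if (len1 > 0) Marshal.Copy(ptr1, packet, 0, (int)len1);
        if (len2 > 0) Marshal.Copy(ptr2, packet, (int)len1, (int)len2);

        recSound.unlock(ptr1, ptr2, len1, len2);

        SendOverNetwork(packet);   // placeholder for the actual network send
        lastReadPos = recordPos;
}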

I need to play this sound through a programmer sound. Everything works fine, but sometimes I get the same artifacts as in the first case, when I did not apply a delay before playback. So the question is: 3. Where should I apply this delay before playing, and should I do latency compensation as in the first example but for the case where the programmer sound is used? If latency compensation is needed, which channel should I base it on?

Below is the script I use to play the restored sound via a programmer sound. In the Initialize method I pass in the restored sound itself and the EventReference of the programmer event. Then I wait until the eventEmitter is initialized (its Start method has run) and start the sound; a rough usage sketch follows the script.

Restored sound and programmer
public class FMODProgrammerTest : MonoBehaviour
{
    [Header("Programmer Sound")]
    [SerializeField] private StudioEventEmitter _eventEmitter;
    public StudioEventEmitter EventEmitter => _eventEmitter;

    private bool _is3D;
    private Transform _soundSourceTransform;
    private SoundWrapper _soundWrapper;
    private EVENT_CALLBACK _programmerSoundEventCallback;

    public EventInstance EventInstance => _eventEmitter.EventInstance;

    public void Initialize(Sound sound, EventReference reference, Transform soundSourceTransform = null)
    {
        _soundWrapper = new SoundWrapper(sound);
        _is3D = soundSourceTransform != null;
        _soundSourceTransform = soundSourceTransform;
        _eventEmitter.Preload = true;
        _eventEmitter.EventReference = reference;
    }

    [ContextMenu("Play Sound")]
    public void PlaySound()
    {
        if (EventInstance.isValid() == false)
        {
            _eventEmitter.CreateInstance(_is3D);
        }

        _programmerSoundEventCallback = new EVENT_CALLBACK(ProgrammerSoundEventCallback);

        if (_soundSourceTransform != null)
        {
            RuntimeManager.AttachInstanceToGameObject(EventInstance, _soundSourceTransform.gameObject);
        }

        GCHandle soundHolderHandle = GCHandle.Alloc(_soundWrapper);
        EventInstance.setUserData(GCHandle.ToIntPtr(soundHolderHandle));
        EventInstance.setCallback(_programmerSoundEventCallback);
        EventInstance.start();
        EventInstance.release();
    }

    [ContextMenu("Stop Play Sound")]
    public void StopPlaySound()
    {
        EventInstance.clearHandle();
    }

    [AOT.MonoPInvokeCallback(typeof(EVENT_CALLBACK))]
    private static RESULT ProgrammerSoundEventCallback(EVENT_CALLBACK_TYPE type, IntPtr instancePtr, IntPtr parameterPtr)
    {
        EventInstance instance = new EventInstance(instancePtr);

        // Retrieve the SoundWrapper stored via setUserData() in PlaySound().
        instance.getUserData(out IntPtr userDataPtr);

        GCHandle wrapperHandle = GCHandle.FromIntPtr(userDataPtr);
        SoundWrapper soundWrapper = wrapperHandle.Target as SoundWrapper;
        switch (type)
        {
            case EVENT_CALLBACK_TYPE.CREATE_PROGRAMMER_SOUND:
                {
                    PROGRAMMER_SOUND_PROPERTIES parameter = (PROGRAMMER_SOUND_PROPERTIES)Marshal.PtrToStructure(parameterPtr, typeof(PROGRAMMER_SOUND_PROPERTIES));
                    parameter.sound = soundWrapper.Sound.handle;

                    Marshal.StructureToPtr(parameter, parameterPtr, false);
                    break;
                }
            case EVENT_CALLBACK_TYPE.DESTROY_PROGRAMMER_SOUND:
                {
                    PROGRAMMER_SOUND_PROPERTIES parameter = (PROGRAMMER_SOUND_PROPERTIES)Marshal.PtrToStructure(parameterPtr, typeof(PROGRAMMER_SOUND_PROPERTIES));
                    Sound sound = new Sound(parameter.sound);
                    sound.release();

                    break;
                }
            case EVENT_CALLBACK_TYPE.DESTROYED:
                {
                    // Free the GCHandle that was allocated in PlaySound().
                    wrapperHandle.Free();
                    break;
                }
        }

        return RESULT.OK;
    }
}
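For completeness, this is roughly how I drive that script once the restored sound exists. A sketch only - _programmerTest, _voiceFiller and _voiceEventReference are placeholders for my actual fields, and I simplify "wait until the emitter is initialized" to waiting one frame:

Usage (sketch)
private IEnumerator PlayRestoredVoice()
{
    // Hand over the restored sound and the programmer event reference.
    _programmerTest.Initialize(_voiceFiller.Sound, _voiceEventReference);

    // Give the StudioEventEmitter's Start() a frame to run before starting the sound.
    yield return null;

    _programmerTest.PlaySound();
}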

My last question: 4. I tried playing the restored sound without a programmer sound alongside the sound that is created before being sent off for restoration (in both cases I wait out the delay before starting playback), using the CreateSound method in the script below. If I start it from something that does not freeze the game (a button binding or a UI button), the sound plays at the same time as the first sound. But if I start it through the ContextMenu (which causes a freeze, because the Unity editor lags), the sound plays with a slight delay relative to the original. Is there any way I can track this lag on the sound so I can prevent it?

Restoring and playing the sound
public class FMODVoiceFiller : MonoBehaviour, IFMODSoundHolder
{
    private Sound _sound;
    public Sound Sound => _sound;

    private int _samplesCaptured = 0;
    private uint _bytesCaptured = 0;

    private Channel _channel;

    public static int SampleRate = 16000;
    public static int LatencyMs = 75;
    public static int DesiredLatency = SampleRate * LatencyMs / 1000;
    public static int NumChannels = 1;

    [ContextMenu("Create Sound")]
    public void CreateSound()
    {
        if (_channel.hasHandle() == false)
        {
            CREATESOUNDEXINFO exInfo = CreateSoundInfo();
            RuntimeManager.CoreSystem.createSound("Realtime Sound", MODE.LOOP_NORMAL | MODE.OPENUSER, ref exInfo, out _sound);
        }
    }

    public void DestroySound()
    {
        if (_channel.hasHandle())
        {
            _bytesCaptured = 0;
            _samplesCaptured = 0;
            _channel.clearHandle();
            _sound.release();
        }
    }

    public void FillFrame(byte[] samplesFrameBytes)
    {
        ApplyRawData(_bytesCaptured, samplesFrameBytes);
        // Wrap the write offset at the ring buffer length (SampleRate * sizeof(short) * NumChannels bytes).
        _bytesCaptured = (_bytesCaptured + (uint)samplesFrameBytes.Length) % (uint)(SampleRate * sizeof(short) * NumChannels);

        _samplesCaptured += samplesFrameBytes.Length / (sizeof(short) * NumChannels);
        if (_channel.hasHandle() == false && _samplesCaptured >= DesiredLatency)
        {
            RuntimeManager.CoreSystem.getMasterChannelGroup(out ChannelGroup channelGroup);
            RuntimeManager.CoreSystem.playSound(_sound, channelGroup, false, out _channel);
        }
    }

    private void ApplyRawData(uint startOffsetBytes, byte[] samples)
    {
        uint lenBytes = (uint)samples.Length;
        var result = _sound.@lock(startOffsetBytes, lenBytes, out IntPtr ptr1, out IntPtr ptr2, out uint l1, out uint l2);
        if (result != RESULT.OK)
            throw new InvalidOperationException($"FMOD lock failed: {result} | startOffsetBytes: {startOffsetBytes} | lenBytes: {lenBytes}");

        try
        {
            if (l1 > 0) Marshal.Copy(samples, 0, ptr1, (int)l1);
            if (l2 > 0) Marshal.Copy(samples, (int)l1, ptr2, (int)l2);
        }
        finally
        {
            _sound.unlock(ptr1, ptr2, l1, l2);
        }
    }

    private CREATESOUNDEXINFO CreateSoundInfo()
    {
        CREATESOUNDEXINFO exInfo = new CREATESOUNDEXINFO();
        exInfo.cbsize = Marshal.SizeOf(typeof(CREATESOUNDEXINFO));
        exInfo.numchannels = NumChannels;
        exInfo.format = SOUND_FORMAT.PCM16;
        exInfo.defaultfrequency = SampleRate;
        exInfo.length = (uint)(SampleRate * sizeof(short) * NumChannels); // 1 second ring buffer
        return exInfo;
    }
}

Thanks.

Hi,

Thank you for the scripts and all the information.

1. No, the delay and latency compensation are only needed when playing the audio back.

2. No, since it only affects playback timing.

We also have a scripting example outlining playing back Unity video audio, which is similar in concept: Unity Integration | Scripting Examples - Video Playback. It also uses the lock/unlock functions and implements latency drift compensation.

I’m sorry I don’t quite follow the issue here.

Would it be possible to create a stripped-out version of the project for me to test? It can be uploaded to your profile.

Thank you again for all the information.

Hello. I have uploaded a project containing my version of the lock/unlock code and a version reworked as in the example.

In my version of lock/unlock I fill in the audio data without an intermediate buffer, and everything works fine if I send the data in small fixed packets (320 samples). But if I send the data as it comes in (640, 1280, 640, 0, ... depending on the frame rate), I start hearing artifacts (clearly audible at low FPS, occasionally at high FPS).
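For clarity, the fixed-packet case that works looks roughly like this on the sending side. A sketch only - _outgoing and SendOverNetwork are placeholders:

Fixed-size packets (sketch)
private const int PacketBytes = 320 * sizeof(short);        // 320 samples of 16-bit PCM
private readonly List<byte> _outgoing = new List<byte>();

private void QueueCapturedBytes(byte[] captured)
{
    _outgoing.AddRange(captured);

    // Only ever send full 320-sample packets; keep the remainder for the next call.
    while (_outgoing.Count >= PacketBytes)
    {
        byte[] packet = _outgoing.GetRange(0, PacketBytes).ToArray();
        _outgoing.RemoveRange(0, PacketBytes);
        SendOverNetwork(packet);
    }
}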

When I do it as shown in the video playback example, where I use a buffer, everything works fine in both cases (with and without fixed-size packets, and at low FPS). But then I get a huge delay (about 1 second). I don't understand what I'm doing wrong.

Regarding my 4th question: I mean that if a freeze occurs when starting playback (for debugging you can use a method with the ContextMenu attribute - it will make the game freeze), then the playback delay increases. This is solved by compensating for the delay; you just need to wait a while.
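For reference, this is how I imagine tracking that extra delay, reusing the delta logic from the compensator above. A sketch - _lastPlayPos and _samplesPlayed would be new placeholder fields on the filler class:

Tracking the playback lag (sketch)
if (_channel.hasHandle())
{
    _channel.getPosition(out uint playPos, FMOD.TIMEUNIT.PCM);

    // Same wrap-around delta as the latency compensator; the sound is 1 second (SampleRate samples) long.
    uint playDelta = (playPos >= _lastPlayPos)
        ? (playPos - _lastPlayPos)
        : (playPos + (uint)SampleRate - _lastPlayPos);

    _lastPlayPos    = playPos;
    _samplesPlayed += (int)playDelta;

    // How far playback lags behind what has been written, e.g. after an editor freeze.
    int lagSamples = _samplesCaptured - _samplesPlayed;
    float lagMs    = lagSamples * 1000f / SampleRate;
}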

Thanks.


Thank you for the project, it greatly helped in testing!

Some things I tested:

  • Artifacts
  • Latency
    • Reducing the size of the buffer created in the exinfo: exInfo.length = SampleRate * PCMMultiplier; - I tested halving this value.

Another option may be using an FMOD_SOUND_PCMREAD_CALLBACK (FMOD Engine | Core API Reference - FMOD_SOUND_PCMREAD_CALLBACK), which would further decouple the recording from the frame rate and tie it more closely to the FMOD system.
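A very rough sketch of what that could look like in your FMODVoiceFiller, assuming the received bytes are queued somewhere the callback can reach (the names here are only placeholders, not a final implementation):

PCM read callback (sketch)
private SOUND_PCMREAD_CALLBACK _pcmReadCallback;
private static readonly Queue<byte> _receivedBytes = new Queue<byte>();

private CREATESOUNDEXINFO CreateSoundInfoWithCallback()
{
    _pcmReadCallback = new SOUND_PCMREAD_CALLBACK(PcmReadCallback);

    CREATESOUNDEXINFO exInfo = new CREATESOUNDEXINFO();
    exInfo.cbsize           = Marshal.SizeOf(typeof(CREATESOUNDEXINFO));
    exInfo.numchannels      = NumChannels;
    exInfo.format           = SOUND_FORMAT.PCM16;
    exInfo.defaultfrequency = SampleRate;
    exInfo.length           = (uint)(SampleRate * sizeof(short) * NumChannels);
    exInfo.pcmreadcallback  = _pcmReadCallback;   // FMOD pulls data whenever it needs more
    return exInfo;
}

[AOT.MonoPInvokeCallback(typeof(SOUND_PCMREAD_CALLBACK))]
private static RESULT PcmReadCallback(IntPtr soundPtr, IntPtr data, uint dataLen)
{
    // Hand FMOD whatever has arrived from the network; anything missing stays silent (zeros).
    byte[] buffer = new byte[dataLen];
    lock (_receivedBytes)
    {
        for (int i = 0; i < dataLen && _receivedBytes.Count > 0; i++)
        {
            buffer[i] = _receivedBytes.Dequeue();
        }
    }
    Marshal.Copy(buffer, 0, data, (int)dataLen);
    return RESULT.OK;
}

You would pass this exinfo to createSound exactly as you do now; the rest of the setup stays the same.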

Thank you for the explanation.

Let me know if that makes a difference!

Hello.
I reduced the sound size and the delay decreased as well. The delay suits me now.
If I implement PCM_READ_CALLBACK, does it still make sense to add latency compensation and wait for several samples before starting? The callback itself requests the number of samples it needs, so there should be no discrepancies.
Thanks.
