Audio artifacts when playing audio generated via code

Hello.

I’m working on recording voice from a microphone, sending it over the network, and playing it back on the other end. For reading and writing the PCM data I use lock/unlock on the sound.

In the recording script I use everything from the Start method of the recording example; the only thing I change is the sample rate, which I set to 16000 instead of the native rate.
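
To be concrete, my Start looks roughly like this: a simplified sketch of the recording example, with the rate forced to 16000 (record device 0 is assumed, and recSound, recSoundLength, nativeRate, nativeChannels are the fields from that example).

private void Start()
{
    // Same as the FMOD recording example, but the buffer is created at 16000 Hz
    // instead of the native rate reported by the record driver.
    FMODUnity.RuntimeManager.CoreSystem.getRecordDriverInfo(0, out _, 0, out _,
        out nativeRate, out _, out nativeChannels, out _);

    FMOD.CREATESOUNDEXINFO exinfo = new FMOD.CREATESOUNDEXINFO();
    exinfo.cbsize           = System.Runtime.InteropServices.Marshal.SizeOf(typeof(FMOD.CREATESOUNDEXINFO));
    exinfo.numchannels      = nativeChannels;
    exinfo.format           = FMOD.SOUND_FORMAT.PCM16;
    exinfo.defaultfrequency = 16000;                                          // forced, instead of nativeRate
    exinfo.length           = (uint)(16000 * sizeof(short) * nativeChannels); // 1-second ring buffer

    FMODUnity.RuntimeManager.CoreSystem.createSound("Record Buffer",
        FMOD.MODE.LOOP_NORMAL | FMOD.MODE.OPENUSER, ref exinfo, out recSound);
    recSound.getLength(out recSoundLength, FMOD.TIMEUNIT.PCM);

    FMODUnity.RuntimeManager.CoreSystem.recordStart(0, recSound, true);
}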

In that same example, the Update method plays the recorded sound back and compensates for latency. I noticed that if I don’t delay playback, the audio can have artifacts, but I don’t actually need to play this audio at all, only to extract the PCM data (a sketch of what I mean follows the questions below). So my questions are:

  1. If I need this audio only for extracting samples, do I still need to wait out that latency to get rid of the artifacts?
  2. Do I need to use the latency compensator that speeds up/slows down the playback rate? As far as I understand, no, because it works exclusively on the channel, which only matters for listening to the sound correctly, and I don’t need to listen to this particular sound.
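
For clarity, this is roughly how I pull the PCM out of the record buffer myself instead of playing it (a simplified sketch living in the same recording script; lastReadPos is a field tracking the last read offset in samples, SendOverNetwork is just a placeholder for my send, and error handling is omitted).

private void ReadNewSamples()
{
    // How far the recording has advanced since the last read, in PCM samples (wraps with the ring buffer).
    FMODUnity.RuntimeManager.CoreSystem.getRecordPosition(0, out uint recordPos);
    uint deltaSamples = (recordPos >= lastReadPos)
        ? recordPos - lastReadPos
        : recordPos + recSoundLength - lastReadPos;
    if (deltaSamples == 0)
        return;

    // Lock the ring buffer at the read offset and copy the new bytes out.
    uint bytesPerFrame = (uint)(sizeof(short) * nativeChannels);
    recSound.@lock(lastReadPos * bytesPerFrame, deltaSamples * bytesPerFrame,
        out System.IntPtr ptr1, out System.IntPtr ptr2, out uint len1, out uint len2);

    byte[] pcm = new byte[len1 + len2];
    if (len1 > 0) System.Runtime.InteropServices.Marshal.Copy(ptr1, pcm, 0, (int)len1);
    if (len2 > 0) System.Runtime.InteropServices.Marshal.Copy(ptr2, pcm, (int)len1, (int)len2);

    recSound.unlock(ptr1, ptr2, len1, len2);
    lastReadPos = recordPos;

    SendOverNetwork(pcm); // placeholder for my actual send
}
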
Play Sound with Latency
if (!channel.hasHandle() && samplesRecorded >= adjustLatency)
{
        FMODUnity.RuntimeManager.CoreSystem.getMasterChannelGroup(out FMOD.ChannelGroup mCG);
        FMODUnity.RuntimeManager.CoreSystem.playSound(recSound, mCG, false, out channel);
}
Latency Compensator
// Determine how much has been played since we last checked.
if (channel.hasHandle())
{
        uint playPos = 0;
        channel.getPosition(out playPos, FMOD.TIMEUNIT.PCM);

        uint playDelta  = (playPos >= lastPlayPos) ? (playPos - lastPlayPos) : (playPos + recSoundLength - lastPlayPos);
        lastPlayPos     = playPos;
        samplesPlayed   += playDelta;

        // Compensate for any drift.
        int latency     = (int)(samplesRecorded - samplesPlayed);
        actualLatency   = (int)((0.97f * actualLatency) + (0.03f * latency));

        int playbackRate = nativeRate;
        if (actualLatency < (int)(adjustLatency - driftThreshold))
        {
                // Playback position is catching up to the record position, slow playback down by 2%
                playbackRate = nativeRate - (nativeRate / 50);
        }
        else if (actualLatency > (int)(adjustLatency + driftThreshold))
        {
                // Playback is falling behind the record position, speed playback up by 2%
                playbackRate = nativeRate + (nativeRate / 50);
        }

        channel.setFrequency((float)playbackRate);
}

After that I send the samples to another computer over the network (let’s assume this part works perfectly and everything arrives intact, because I tested the whole chain without the network, just handing the samples over locally). On this second computer I need to create a sound and write the transferred samples into it. I did this successfully, again using lock/unlock on the sound.
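
For context, the receiving side is glued together roughly like this (network code omitted; OnVoicePacketReceived is a stand-in name for wherever my packets actually arrive, and FMODVoiceFiller is the script at the end of this post):

using UnityEngine;

// Hypothetical receiver glue: create the buffer once, then write every received
// PCM16 frame into it through FMODVoiceFiller.
public class VoiceReceiver : MonoBehaviour
{
    [SerializeField] private FMODVoiceFiller _voiceFiller;

    private void Start()
    {
        _voiceFiller.CreateSound(); // 1-second looping PCM16 buffer at 16000 Hz
    }

    // Stand-in for my actual network receive callback.
    private void OnVoicePacketReceived(byte[] pcm16Frame)
    {
        // Playback starts automatically once DesiredLatency samples have been buffered.
        _voiceFiller.FillFrame(pcm16Frame);
    }
}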

I need to play this sound through a programmer instrument. Everything works fine, but sometimes I get the same artifacts as in the first case, when I did not wait for the latency before playback. So the question is: 3. Where should I apply this latency before playing, and should I also do latency compensation as in the first example when a programmer instrument is used? If compensation is needed, which channel should I base it on?
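
If compensation does turn out to be necessary here, the only handle I can think of adjusting is the event instance’s channel group, something like the sketch below. This is just my guess, which is exactly what question 3 is about; the ratio would come from the same drift logic as in the latency compensator above.

// Guess for question 3: adjust pitch on the event's master channel group,
// since with a programmer instrument I never get a raw FMOD.Channel.
private void CompensateProgrammerDrift(FMOD.Studio.EventInstance instance, float ratio)
{
    if (instance.getChannelGroup(out FMOD.ChannelGroup group) == FMOD.RESULT.OK)
    {
        // ratio ~0.98f to slow down, ~1.02f to speed up (analogous to setFrequency above).
        group.setPitch(ratio);
    }
}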

Below is the script I use to play the restored sound through the programmer instrument. In the Initialize method I pass the restored sound itself and the EventReference of the programmer event. Then I wait until the eventEmitter is initialized (the Start method inside StudioEventEmitter has run) and start the sound (a short usage sketch follows the script).

Restored sound and programmer
using System;
using System.Runtime.InteropServices;
using FMOD;
using FMOD.Studio;
using FMODUnity;
using UnityEngine;

public class FMODProgrammerTest : MonoBehaviour
{
    [Header("Programmer Sound")]
    [SerializeField] private StudioEventEmitter _eventEmitter;
    public StudioEventEmitter EventEmitter => _eventEmitter;

    private bool _is3D;
    private Transform _soundSourceTransform;
    private SoundWrapper _soundWrapper;
    private EVENT_CALLBACK _programmerSoundEventCallback;

    public EventInstance EventInstance => _eventEmitter.EventInstance;

    public void Initialize(Sound sound, EventReference reference, Transform soundSourceTransform = null)
    {
        _soundWrapper = new SoundWrapper(sound);
        _is3D = soundSourceTransform != null;
        _soundSourceTransform = soundSourceTransform;
        _eventEmitter.Preload = true;
        _eventEmitter.EventReference = reference;
    }

    [ContextMenu("Play Sound")]
    public void PlaySound()
    {
        if (EventInstance.isValid() == false)
        {
            _eventEmitter.CreateInstance(_is3D);
        }

        _programmerSoundEventCallback = new EVENT_CALLBACK(ProgrammerSoundEventCallback);

        if (_soundSourceTransform != null)
        {
            RuntimeManager.AttachInstanceToGameObject(EventInstance, _soundSourceTransform.gameObject);
        }

        GCHandle soundHolderHandle = GCHandle.Alloc(_soundWrapper);
        EventInstance.setUserData(GCHandle.ToIntPtr(soundHolderHandle));
        EventInstance.setCallback(_programmerSoundEventCallback);
        EventInstance.start();
        EventInstance.release();
    }

    [ContextMenu("Stop Play Sound")]
    public void StopPlaySound()
    {
        EventInstance.clearHandle();
    }

    [AOT.MonoPInvokeCallback(typeof(EVENT_CALLBACK))]
    private static RESULT ProgrammerSoundEventCallback(EVENT_CALLBACK_TYPE type, IntPtr instancePtr, IntPtr parameterPtr)
    {
        EventInstance instance = new EventInstance(instancePtr);

        instance.getUserData(out IntPtr userDataPtr);

        GCHandle soundWrapperHandle = GCHandle.FromIntPtr(userDataPtr);
        SoundWrapper soundWrapper = soundWrapperHandle.Target as SoundWrapper;
        switch (type)
        {
            case EVENT_CALLBACK_TYPE.CREATE_PROGRAMMER_SOUND:
                {
                    PROGRAMMER_SOUND_PROPERTIES parameter = (PROGRAMMER_SOUND_PROPERTIES)Marshal.PtrToStructure(parameterPtr, typeof(PROGRAMMER_SOUND_PROPERTIES));
                    parameter.sound = soundWrapper.Sound.handle;

                    Marshal.StructureToPtr(parameter, parameterPtr, false);
                    break;
                }
            case EVENT_CALLBACK_TYPE.DESTROY_PROGRAMMER_SOUND:
                {
                    PROGRAMMER_SOUND_PROPERTIES parameter = (PROGRAMMER_SOUND_PROPERTIES)Marshal.PtrToStructure(parameterPtr, typeof(PROGRAMMER_SOUND_PROPERTIES));
                    Sound sound = new Sound(parameter.sound);
                    sound.release();

                    break;
                }
            case EVENT_CALLBACK_TYPE.DESTROYED:
                {
                    soundWrapperHandle.Free();
                    break;
                }
        }

        return RESULT.OK;
    }
}
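
For completeness, the component above is driven roughly like this (names are illustrative; the one-frame wait stands in for "wait until the emitter's Start has run"):

using System.Collections;
using UnityEngine;

// Hypothetical driver for FMODProgrammerTest: Initialize, wait for the emitter
// to be set up, then PlaySound.
public class VoicePlaybackStarter : MonoBehaviour
{
    [SerializeField] private FMODProgrammerTest _programmerTest;
    [SerializeField] private FMODUnity.EventReference _voiceEvent;

    public void Play(FMOD.Sound restoredSound, Transform source = null)
    {
        StartCoroutine(PlayWhenReady(restoredSound, source));
    }

    private IEnumerator PlayWhenReady(FMOD.Sound restoredSound, Transform source)
    {
        _programmerTest.Initialize(restoredSound, _voiceEvent, source);
        yield return null; // simplification: give StudioEventEmitter.Start a frame to run
        _programmerTest.PlaySound();
    }
}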

My last question: 4. I tried playing the restored sound without a programmer instrument, alongside the original sound that is created before sending (in both cases waiting out the latency before starting playback), using the CreateSound method in the script below. If I trigger playback with something that does not freeze the game (a button binding or a UI element), the restored sound plays at the same time as the original. But if I trigger it through the ContextMenu (which does cause a hitch, because the Unity editor lags), the restored sound plays with a slight delay relative to the original. Is there any way to track this lag on the sound so I can prevent it? (A sketch of what I have in mind follows the script.)

Restoring and playing the sound
using System;
using System.Runtime.InteropServices;
using FMOD;
using FMODUnity;
using UnityEngine;

public class FMODVoiceFiller : MonoBehaviour, IFMODSoundHolder
{
    private Sound _sound;
    public Sound Sound => _sound;

    private int _samplesCaptured = 0;
    private uint _bytesCaptured = 0;

    private Channel _channel;

    public static int SampleRate = 16000;
    public static int LatencyMs = 75;
    public static int DesiredLatency = SampleRate * LatencyMs / 1000;
    public static int NumChannels = 1;

    [ContextMenu("Create Sound")]
    public void CreateSound()
    {
        // Only create the buffer once; check the sound handle rather than the channel.
        if (_sound.hasHandle() == false)
        {
            CREATESOUNDEXINFO exInfo = CreateSoundInfo();
            RuntimeManager.CoreSystem.createSound("Realtime Sound", MODE.LOOP_NORMAL | MODE.OPENUSER, ref exInfo, out _sound);
        }
    }

    public void DestroySound()
    {
        _bytesCaptured = 0;
        _samplesCaptured = 0;

        // Stop playback (if any) before releasing the buffer, and clear both handles.
        if (_channel.hasHandle())
        {
            _channel.stop();
            _channel.clearHandle();
        }

        if (_sound.hasHandle())
        {
            _sound.release();
            _sound.clearHandle();
        }
    }

    public void FillFrame(byte[] samplesFrameBytes)
    {
        ApplyRawData(_bytesCaptured, samplesFrameBytes);

        // Wrap within the 1-second ring buffer (SampleRate * sizeof(short) * NumChannels bytes).
        uint soundLengthBytes = (uint)(SampleRate * sizeof(short) * NumChannels);
        _bytesCaptured = (_bytesCaptured + (uint)samplesFrameBytes.Length) % soundLengthBytes;

        _samplesCaptured += samplesFrameBytes.Length / (sizeof(short) * NumChannels);
        if (_channel.hasHandle() == false && _samplesCaptured >= DesiredLatency)
        {
            RuntimeManager.CoreSystem.getMasterChannelGroup(out ChannelGroup channelGroup);
            RuntimeManager.CoreSystem.playSound(_sound, channelGroup, false, out _channel);
        }
    }

    private void ApplyRawData(uint startOffsetBytes, byte[] samples)
    {
        uint lenBytes = (uint)samples.Length;
        var result = _sound.@lock(startOffsetBytes, lenBytes, out IntPtr ptr1, out IntPtr ptr2, out uint l1, out uint l2);
        if (result != RESULT.OK)
            throw new InvalidOperationException($"FMOD lock failed: {result} | startOffsetBytes: {startOffsetBytes} | lenBytes: {lenBytes}");

        try
        {
            if (l1 > 0) Marshal.Copy(samples, 0, ptr1, (int)l1);
            if (l2 > 0) Marshal.Copy(samples, (int)l1, ptr2, (int)l2);
        }
        finally
        {
            _sound.unlock(ptr1, ptr2, l1, l2);
        }
    }

    private CREATESOUNDEXINFO CreateSoundInfo()
    {
        CREATESOUNDEXINFO exInfo = new CREATESOUNDEXINFO();
        exInfo.cbsize = Marshal.SizeOf(typeof(CREATESOUNDEXINFO));
        exInfo.numchannels = NumChannels;
        exInfo.format = SOUND_FORMAT.PCM16;
        exInfo.defaultfrequency = SampleRate;
        // One second of 16-bit PCM acts as the ring buffer.
        exInfo.length = (uint)(SampleRate * sizeof(short) * NumChannels);
        return exInfo;
    }
}
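
And for question 4, this is the kind of check I have in mind: an Update inside FMODVoiceFiller that compares what has been written against what the channel has actually played, the same way the latency compensator does (a sketch only; _lastPlayPos and _samplesPlayed would be new fields).

// Sketch for question 4, intended to live inside FMODVoiceFiller.
private uint _lastPlayPos;
private int _samplesPlayed;

private void Update()
{
    if (_channel.hasHandle() == false)
        return;

    _channel.getPosition(out uint playPos, TIMEUNIT.PCM);

    // The buffer is one second long, so the play position wraps every SampleRate samples.
    uint soundLengthSamples = (uint)SampleRate;
    uint playDelta = (playPos >= _lastPlayPos)
        ? playPos - _lastPlayPos
        : playPos + soundLengthSamples - _lastPlayPos;

    _lastPlayPos = playPos;
    _samplesPlayed += (int)playDelta;

    // How much written-but-unplayed audio is buffered. If this settles above DesiredLatency,
    // playback is running behind by the extra amount (what I see after a ContextMenu freeze);
    // if it drops toward zero, the buffer is underrunning and artifacts appear.
    int lagSamples = _samplesCaptured - _samplesPlayed;
    UnityEngine.Debug.Log($"Playback lag: {lagSamples} samples ({1000f * lagSamples / SampleRate:F1} ms)");
}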

Thanks.