Can I get you to elaborate on your exact setup here? How are you extracting the audio from (presumably) your FMOD System? I recall from a previous post of yours that you were using sound.lock and sound.unlock - are you calling these in MonoBehaviour.Update(), MonoBehaviour.FixedUpdate(), or somewhere else?
What format is the recorded audio in when it’s (presumably) retrieved from your FMOD System, i.e. sample rate, bit depth, number of channels, etc.? If you’re not already using lossy compression, I would recommend switching to it from lossless. Also, if the microphone audio is attenuated in game (e.g. by spatialization, or some other means), I would recommend optimizing it out of your network stream (i.e. not sending audio that won’t be heard) if you’re not doing so already. If it’s not spatialized, you may be able to sum the individual audio streams into a single stream.
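To put rough numbers on why the format matters, here's a minimal sketch in plain Python (not FMOD or Unity code). The function names `pcm_bitrate_bps` and `mix_streams` are hypothetical, and the 48 kHz / 16-bit / mono figures in the usage note are only an illustration, not your actual format. It shows the raw bitrate of uncompressed PCM and one simple way of summing several non-spatialized streams, assuming they're float samples in the range -1.0 to 1.0 and have the same length.

```python
def pcm_bitrate_bps(sample_rate_hz, bit_depth, channels):
    """Raw (uncompressed) PCM bitrate in bits per second."""
    return sample_rate_hz * bit_depth * channels


def mix_streams(streams):
    """Sum equal-length lists of float samples, clamping each result to [-1.0, 1.0]."""
    if not streams:
        return []
    length = len(streams[0])
    if any(len(s) != length for s in streams):
        raise ValueError("all streams must have the same number of samples")
    # zip(*streams) groups the samples that share the same position in time
    return [max(-1.0, min(1.0, sum(frame))) for frame in zip(*streams)]
```

For example, `pcm_bitrate_bps(48000, 16, 1)` gives 768000 bits per second (96000 bytes per second) per uncompressed mono stream, which is the baseline any compression would be reducing. Clamping is just the simplest way to keep the summed signal in range; a real mixer might scale or limit instead.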
Is this an issue with FMOD specifically, or an issue with the 3rd-party library you’re using to compress the recorded audio?