Multiple 3D gameobjects with the same spatialised event

Hi! I’m trying to implement an adaptive music system for my game in which hundreds of GameObjects (flowers) emit synced spatialised events. My approach has been full of bugs: some events not playing, WASAPI output buffer starvations, desynchronizations… you name it. That’s why I’m looking for a smarter implementation (please forgive my ignorance, I’m quite new to FMOD).

There are 8 unique flower types, and each one emits a looping sound that is synchronised with the rest (rhythm, harmonic context, etc.); there are approximately 40 copies of each flower type spread through a 3D space. I want them to be precisely synchronised (not only within the same kind of flower, but as a whole, since all the music assets have been rendered at the same tempo) and to avoid “adding” the mixes of the same type of flower. What I mean by this is that if there are two close flowers of the same type, they shouldn’t make double the noise of a single one. Instead, they should “divide” that total flower type volume between the two spatial points.

My horrendous implementation consists of using an Event Emitter for each individual flower GameObject, referencing the corresponding event based on its flower type. Each flower type has its own mix bus in FMOD Studio, on which a compressor limits at a high ratio to create the illusion that the volumes are not stacking up (but at what cost…). As you may imagine, this not only degrades the quality of the sound but, more importantly, results in a crazy amount of commands being sent to the FMOD system (which can barely hold up).

I’ve been reading about the Transceiver effect in the documentation, but I’m not quite sure it’s actually what I’m looking for (since, to my knowledge, it still requires more event instances, which will have their own processing demands, and it won’t solve the stacking mix problem). There must be a better way of doing this, perhaps by having only one event instance playing for each flower type and then having a way of controlling the volume of that one event to distribute it spatially to all nearby flowers of its respective type.

For now, I’ll try experimenting with the transceiver, and I would greatly appreciate any kind of help! There’s so much to learn with this software, great stuff. Thank you so much for your time :slight_smile:

There’s a few different parts to what you’re trying to do, so I’ll address them individually.

Keeping Things Synchronized

To perfectly synchronize multiple music assets, you must do two things.

First, set all the music assets to not stream. Because of the unique way in which they are loaded, streaming assets cannot be perfectly synchronized with sample accuracy. By setting your music assets to use either of the other two loading modes, you will make it possible to play them with perfect synchronicity.

Second, either load the assets’ sample data sufficiently ahead of when you start playing their event instances, so that they can start without needing to load sample data and so reliably begin at the exact moment when you schedule them to; or place all of the assets on different tracks of a single event and use pre-fader transceiver effects to send their signals to different event instances. Of these two methods, the former is more resource-efficient if you only plan to play one instance of each track, and the latter is more resource-efficient if you want to play a large number of event instances, as you do in this case.

This is much less expensive than you might be imagining. The resource cost of an event instance is dependent on the event’s content, and transceiver effects are much cheaper than instruments tend to be: the resource cost of a transceiver effect set to transmit is comparable to that of a send, that of a transceiver effect set to receive is comparable to that of a gain effect, and that of a transceiver channel is comparable to that of a return bus. All of these things are cheap, so using transceiver effects would likely result in a substantial reduction in resource costs when compared to playing multiple instances of an event that contains instruments.

Preventing Multiple Simultaneously-Playing Copies of an Asset from Adding to Each Other’s Loudness

There are a number of ways to prevent multiple instances of the same signal from combining their amplitudes and thus producing a louder sound.

One such way is the one you have already tried: a compressor on each flower type’s group bus.

This will work, and is cheaper than you might be imagining.

If the sounds are perfectly in sync, you won’t suffer phasing issues, which is the biggest source of sound quality degradation in this case.

FMOD Studio’s built-in virtualization system will automatically virtualize channels that aren’t currently audible, so in practice only a small number of your many event instances will actually play and consume resources at any given time*.

As for the eight compressor effects on eight group buses, the compressor effect is classified as a “medium overhead” effect - and unless your game’s audio resource budget is painfully tight even by mobile phone game standards, eight medium overhead effect instances are unlikely to break the bank.

Still, if the default methods of keeping quality high and resource costs down don’t suit your game’s needs, there are alternatives.

As you have intuited, playing only one instance of each event is an effective way to both save resources and simplify synchronization. Unfortunately, FMOD Studio doesn’t provide any easy built-in tools for spatializing one event instance to multiple different positions without using multiple different event instances, so you’ll need to delve into your game’s code if you want to make it work - but there is at least some precedent for something similar that you could use as a reference.

In multiplayer splitscreen games, it’s common for different players to share the same set of speakers while their characters are in different locations in the game world. To handle this, the FMOD Engine supports multiple listeners. You can read about it in detail here, but the short version is that when there are multiple listeners in different locations, spatialized events are attenuated based on their distance to the closest listener, but panned according to the average position of all listeners within the event’s max distance, weighted by their closeness to the event.

To create a similar effect for your use-case, your game’s code would need to keep track of the locations of each flower in your game world and the location of the listener, and use these to calculate both the distance between the listener and the closest flower of each type and the weighted average position of all the flowers of each type within the event’s max distance. This would allow you to have only a single event instance playing for each flower type, as you could continually move these event instances around to positions matching the distance and the direction of the weighted average position calculated by your code. The effect would be that each flower is spatialized to where it appears to be when there is only one instance of that flower type nearby, and to a position somewhere between all the visible flowers of that type if there’s more than one, while always being as loud as you’d expect based on the distance to the nearest flower of that type.
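As a concrete illustration, here is a minimal sketch of that per-type calculation (Python used as pseudocode; the function name is hypothetical, and the specific closeness weighting `1 - distance / max_distance` is my own assumption; any monotonically decreasing weighting would serve):

```python
import math

def spatialize_flower_type(listener, flowers, max_distance):
    """Collapse many same-type flowers into one virtual emitter.

    Returns (nearest_distance, position): attenuation should come from
    the distance to the closest flower, while the direction comes from
    the closeness-weighted average position of all flowers in range.
    """
    def dist(a, b):
        return math.sqrt(sum((ax - bx) ** 2 for ax, bx in zip(a, b)))

    in_range = [(f, dist(listener, f)) for f in flowers]
    in_range = [(f, d) for f, d in in_range if d < max_distance]
    if not in_range:
        return None  # no audible flower of this type

    nearest = min(d for _, d in in_range)

    # Weight each flower by its closeness to the listener (assumed linear).
    weights = [1.0 - d / max_distance for _, d in in_range]
    total = sum(weights)
    average = tuple(
        sum(w * f[axis] for (f, _), w in zip(in_range, weights)) / total
        for axis in range(3)
    )

    # Place the single event instance in the direction of the weighted
    # average, but at the nearest flower's distance, so that attenuation
    # always matches the closest flower.
    direction = tuple(c - l for c, l in zip(average, listener))
    length = math.sqrt(sum(c * c for c in direction)) or 1.0
    position = tuple(
        l + (c / length) * nearest for l, c in zip(listener, direction)
    )
    return nearest, position
```

Each frame (or at some lower update rate), your game would run this once per flower type and move that type’s single event instance to the returned position.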

If you only care about the distance-based attenuation component of spatialization and don’t actually require panning, there’s a much simpler method that uses snapshots:

  1. Set each of the eight flower-specific group buses’ volume faders to -oo dB.
  2. Create one snapshot for each of your eight unique flower types.
  3. Scope the volume fader of each flower’s group bus into the corresponding snapshot, and set the volume of that bus in each snapshot to 0 dB.
  4. Add a distance parameter to your project’s parameters browser (if you’re using FMOD Studio version 2.03.00 or later) or preset parameter browser (if you’re using an earlier version).
  5. In each snapshot, automate the snapshot’s intensity on the new distance parameter such that intensity is 100% at distance 0 and 0% at max distance.
  6. In your game, create one instance of each of the flower events, and have them playing at all times. Their exact locations don’t matter, as they don’t have any spatialization.
  7. Also in your game, create 40 instances of each of your snapshots. Position one of these instances at each of the flowers of the corresponding species.

This method works by taking advantage of the way multiple active instances of the same snapshot are blended, weighted by intensity, to simulate distance attenuation. Its biggest disadvantage is, as I mentioned, that it only accounts for the distance-attenuation component of spatialization, and not the panning based on the direction from the listener to the instance.
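To make the mechanism concrete, here is a toy model of that blend (Python; the additive combination of instance intensities clamped at 100%, and the linear-in-dB interpolation toward the bus’s scoped 0 dB value, are simplifying assumptions rather than a description of FMOD’s exact internals):

```python
def snapshot_intensity(distance, max_distance):
    """Step 5 above: intensity is 100% at distance 0, falling to 0% at
    max distance (a linear curve is assumed; any automation shape works)."""
    return max(0.0, 1.0 - distance / max_distance)

def blended_bus_volume_db(distances, max_distance, floor_db=-80.0):
    """Toy model of the blend: the bus sits at -oo dB (approximated here
    by floor_db), and each active snapshot instance pulls it toward the
    scoped 0 dB value in proportion to that instance's intensity."""
    total = min(1.0, sum(snapshot_intensity(d, max_distance) for d in distances))
    return floor_db * (1.0 - total)
```

The upshot is that a flower at distance 0 brings its type’s bus to full volume, while a lone distant flower leaves it quiet, all without any per-flower event instances.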


*The voice virtualization system can, under some circumstances, have the side effect of causing individual event instances’ assets to restart playing from the beginning, potentially causing those assets to be out of sync with other sounds. There are two possible ways to prevent this.

The first is to set your music assets to not stream, and to use synchronous instruments for your music assets rather than asynchronous instruments. This allows assets to resume playing in sync as if they had never stopped playing in the first place.

The second way is to place the instrument that plays an asset in one event, but use transceiver effects to send the signal to multiple other event instances, as described above. This bypasses the need for virtualization entirely, as transceiver effects are (under-the-hood) a type of routing, and so do not require any additional channels that might be virtualized.

Thank you so much, Joseph! That was incredibly insightful, I really appreciate it.

I’ve set the music assets to not stream and used transceiver effects as you described, and the desynchronization problems have been solved!

However, I’m still running into mixerThread starvations that stutter the game every couple of minutes, regardless of whether I disable the flower-type bus compressors or virtualize the hundreds of receiver events I have scattered around the 3D environment. If it’s of any use, here are the warning logs I get in Play mode.

I still haven’t tried the code approach you mentioned, but I have some doubts about it before trying to implement it (since I’m a self-taught programmer, and I know it will take me quite some time to figure out). I didn’t mention it in the original post, but as you guessed, panning is a must for my use-case, since players must be able to track the flower they’re looking for using a first-person controller (so the snapshot approach won’t work for me). I was wondering how different this method is from using a spatialized event for each flower when it comes to panning correctly responding to the player’s direction, since the panned signal will come from the weighted average point only, and not from all the surrounding flowers. I figure this could potentially lead to confusion, since only this one instance reacts to the player’s direction in the panning calculations (so, if you were close to two flowers of the same type in roughly opposite directions at similar distances, you wouldn’t be able to tell where to go based on direction alone).

Again, thank you so much for your time, this has been very helpful.

“OutputWASAPI::mixerThread : Starvation detected in WASAPI output buffer” means that the combined CPU cost of processing all your event instances is too high.

This may be because you have too many event instances playing, which can happen if some event instances aren’t being stopped or released when they should be. Try recording a profiler session, then look at the lifespans view to see whether there are any event instances living longer than they should.

It could also happen if your event instances contain expensive effects. If there are effects other than spatializers on the tracks of any of your instanced events, try moving them to buses in the mixer instead.

It is true that the code-based spatialization method I described would make it difficult for a player to tell what direction a spatialized sound is coming from when that sound is coming from multiple similarly-distant locations.

However, I should point out that it will always be difficult for a player to tell what direction a sound is coming from when it is coming from multiple similarly-distant locations. Playing two perfectly-synchronized instances of the same event in different locations will have a similar result: The sound will be distributed widely over several of the player’s speakers, making it hard for them to pick what direction it’s coming from. Getting closer to one of the places will help - but it’d help with the code-based solution, as well.

Okay, I see what you mean. Thank you for clarifying it!

I’ve tried running the profiler, both using virtualization (with 10 max instances per flower type) and with no virtualization; here are some notes on both:

When it comes to the CPU, its usage was quite consistent all throughout, with no peaks whatsoever. What surprised me is that at the point where the Starvation warning is displayed and the game stutters, there isn’t a CPU spike (in fact, there’s a very small reduction in CPU usage, as you can see in the first image of this Imgur post), and the FMOD session runs smoothly, without the effect it produced in the Unity Editor.

Some average values, to provide some context:

  • CPU: Mixer and CPU: Update (same values for Self and Total):
    • No virtualization: 10.57% and 11.47%
    • Virtualization: 8.63% and 10.88%
  • Memory: Data (Global):
    • No virtualization: 34.889%
    • Virtualization: 34.795%
  • Instances: Active, Instances: Playing and Instances: All (same values for Self and Total)
    • No virtualization: 297, 297, 297
    • Virtualization: 94, 297, 297
  • Voices: Active and Voices: All (same values for Self and Total)
    • No virtualization: 12, 18
    • Virtualization: 12, 17

I’ve investigated a bit, and this CPU usage isn’t too far out of bounds, taking into consideration the needs of my game. I’m not sure what’s causing the starvation issue. Could it be related to the way Unity allocates resources for FMOD?

Besides this, I also noticed during this test that when the end of the Loop Region of the Send Transceiver Event is reached, a small pop/duplicated sample occurs. Looking at the profiler, I noticed that at that moment the levels of some events drop to -infinity for a very short moment, and that in some waveforms you can see a very small fragment of the beginning of the Loop Region being played twice (images 2 and 3 of the Imgur post). You can also see that around the loop point, the Send Transceiver Event suddenly doubles its voices (from 9 to 18) for 53 milliseconds (image 4).

This is a bit weird. Could it also be related to the sheer number of Receive Transceiver Events in use? For reference, I’m also using Loop Regions (of the same length and position) on these events, if that’s relevant. I can provide files/logs if that’s of any use.

Thanks again for your help!

If FMOD’s performance seems reasonable, then it is also possible that the machine is running out of resources (e.g. CPU spikes from the game or other applications). If you are able to use the Unity profiler (or another machine-wide profiler), that may indicate whether this is the issue.

I haven’t been able to reproduce this behavior so far. Could you try adding a transition timeline with a small crossfade to the loop region, to see whether that helps?

If you are able to share a profiler capture, packaged with banks (and any other data you would be happy sharing), and upload it to your profile I would be happy to take a look.