Hi! I’m trying to implement an adaptive music system for my game in which hundreds of GameObjects (flowers) emit synced spatialised events. My approach has been full of bugs: some events not playing, WASAPI output buffer starvations, desynchronizations… you name it. That’s why I’m looking for a smarter implementation (please forgive my ignorance, I’m quite new to FMOD).
There are 8 types of unique flowers, and each one emits a looping sound that is synchronised with the rest (rhythm, harmonic context, etc.); there are approximately 40 copies of each flower type spread through a 3D space. I want them to be precisely synchronised (not only between flowers of the same kind, but as a whole, since all the music assets have been rendered at the same tempo) and to avoid “adding” the mixes of the same type of flower. What I mean by this is that if there are two close flowers of the same type, they shouldn’t make twice the noise that a single flower would. In that case, they should be “dividing” that flower type’s total volume between the two spatial points.
My horrendous implementation consists of using Event Emitters for each individual flower GameObject and referencing their corresponding event based on their flower type. Each flower type has its own mix bus in FMOD Studio, on which a compressor with a high ratio creates the illusion that the volumes are not stacking up (but at what cost…). As you may imagine, this not only degrades the quality of the sound but, more importantly, results in a crazy amount of commands sent to the FMOD system (which can barely hold up).
I’ve been investigating the Transceiver effect in the documentation, but I’m not quite sure it’s actually what I’m looking for (since, to my knowledge, it still requires having more event instances, which will have their own processing demands, and it won’t solve the stacking mix problem). There must be a better way of doing this, perhaps by having only one event instance playing for each flower type and then having a way of controlling that one event’s volume to distribute it spatially across all the nearby flowers of the respective type.
For now, I’ll try experimenting with the transceiver, and I would greatly appreciate any kind of help! There’s so much to learn with this software, great stuff. Thank you so much for your time!
There are a few different parts to what you’re trying to do, so I’ll address them individually.
Keeping Things Synchronized
To perfectly synchronize multiple music assets, you must do two things.
First, set all the music assets to not stream. Because of the unique way in which they are loaded, streaming assets cannot be perfectly synchronized with sample accuracy. By setting your music assets to use either of the other two loading modes, you will make it possible to play them with perfect synchronicity.
Second, either load the assets’ sample data sufficiently ahead of when you start playing their event instances, so that they can start without needing to load sample data and so reliably begin at the exact moment when you schedule them to; or place all of the assets on different tracks of a single event and use pre-fader transceiver effects to send their signals to different event instances. Of these two methods, the former is more resource-efficient if you only plan to play one instance of each track, and the latter is more resource-efficient if you want to play a large number of event instances, as you do in this case.
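For reference, the game-code side of the preloading approach can look roughly like the sketch below. This assumes the FMOD Studio C++ API; the event paths are placeholders, error checking is omitted, and the blocking wait is only for illustration (in a real game you would poll the loading state across frames instead):

```cpp
#include <fmod_studio.hpp>
#include <vector>

// Preload every flower event's sample data, then create and start all of the
// instances together so they can begin at the moment they are scheduled.
void StartFlowerMusic(FMOD::Studio::System* system,
                      const std::vector<const char*>& eventPaths,
                      std::vector<FMOD::Studio::EventInstance*>& outInstances)
{
    // Load every event's sample data up front so that starting the
    // instances later doesn't have to wait on disk or decompression.
    std::vector<FMOD::Studio::EventDescription*> descriptions;
    for (const char* path : eventPaths)
    {
        FMOD::Studio::EventDescription* description = nullptr;
        system->getEvent(path, &description);
        description->loadSampleData();
        descriptions.push_back(description);
    }

    // Wait until all sample data has finished loading.
    // (Blocking here is just for illustration - poll this across frames in a real game.)
    bool allLoaded = false;
    while (!allLoaded)
    {
        system->update();
        allLoaded = true;
        for (FMOD::Studio::EventDescription* description : descriptions)
        {
            FMOD_STUDIO_LOADING_STATE state;
            description->getSampleLoadingState(&state);
            if (state != FMOD_STUDIO_LOADING_STATE_LOADED)
                allLoaded = false;
        }
    }

    // Create and start all instances between the same pair of update calls
    // so they all begin together.
    for (FMOD::Studio::EventDescription* description : descriptions)
    {
        FMOD::Studio::EventInstance* instance = nullptr;
        description->createInstance(&instance);
        instance->start();
        outInstances.push_back(instance);
    }
    system->update();
}
```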
This is much less expensive than you might be imagining. The resource cost of an event instance depends on the event’s content, and transceiver effects are much cheaper than instruments tend to be: the resource cost of a transceiver effect set to transmit is comparable to that of a send, the resource cost of a transceiver effect set to receive is comparable to that of a gain effect, and the resource cost of a transceiver channel is comparable to that of a return bus. All of these things are cheap, so using transceiver effects would likely result in a substantial reduction in resource costs when compared to playing multiple instances of an event that contains instruments.
Preventing Multiple Simultaneously-Playing Copies of an Asset Adding to Each Other's Loudness
There are a number of ways to prevent multiple instances of the same signal from combining their amplitudes and thus producing a louder sound.
One such way is the one you have already tried: routing each flower type into its own group bus and using a compressor on that bus to keep the combined volume down.
This will work, and is cheaper than you might be imagining.
If the sounds are perfectly in sync, you won’t suffer phasing issues, which is the biggest source of sound quality degradation in this case.
FMOD Studio’s built-in virtualization system will automatically virtualize channels that aren’t currently audible, so in practice only a small number of your many event instances will actually play and consume resources at any given time*.
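If you want to tune how many voices are actually mixed at once versus tracked virtually, that budget is set on the Core System before initializing the Studio System. A minimal sketch, assuming the FMOD Studio C++ API; the channel counts are arbitrary examples and error checking is omitted:

```cpp
#include <fmod_studio.hpp>
#include <fmod.hpp>

FMOD::Studio::System* CreateSystemWithVoiceBudget()
{
    FMOD::Studio::System* studioSystem = nullptr;
    FMOD::Studio::System::create(&studioSystem);

    // Only this many channels are actually mixed at once; the remaining
    // virtual voices are tracked but silent, which is what lets hundreds of
    // flower instances coexist cheaply.
    FMOD::System* coreSystem = nullptr;
    studioSystem->getCoreSystem(&coreSystem);
    coreSystem->setSoftwareChannels(64);

    // The first argument is the total (virtual) voice count.
    studioSystem->initialize(1024, FMOD_STUDIO_INIT_NORMAL, FMOD_INIT_NORMAL, nullptr);
    return studioSystem;
}
```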
As for the eight compressor effects on eight group buses, the compressor effect is classified as a “medium overhead” effect - and unless your game’s audio resource budget is painfully tight even by mobile phone game standards, eight medium overhead effect instances are unlikely to break the bank.
Still, if the default methods of keeping quality high and resource costs down don’t suit your game’s needs, there are alternatives.
As you have intuited, playing only one instance of each event is an effective way to both save resources and simplify synchronization. Unfortunately, FMOD Studio doesn’t provide any easy built-in tools for spatializing one event instance to multiple different positions without using multiple different event instances, so you’ll need to delve into your game’s code if you want to make it work - but there is at least some precedent for something similar that you could use as a reference.
In multiplayer splitscreen games, it’s common for different players to be sharing the same set of speakers, yet for their characters to be in different locations in the game world. To handle this, the FMOD Engine supports multiple listeners. You can read about it in detail here, but the short version is that when there are multiple listeners in different locations, spatialized events are attenuated based on their distance to the closest listener, but panned according to the average position of all listeners within the event’s max distance, weighted by their closeness to the event.
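For reference, the splitscreen-style multiple listener setup looks something like this minimal sketch, assuming the FMOD Studio C++ API and placeholder positions:

```cpp
#include <fmod_studio.hpp>

// Register two listeners at different positions; spatialized events will be
// attenuated and panned taking both into account, as described above.
void SetupTwoListeners(FMOD::Studio::System* system)
{
    system->setNumListeners(2);

    FMOD_3D_ATTRIBUTES player1 = {};
    player1.position = { -10.0f, 0.0f, 0.0f };
    player1.forward  = { 0.0f, 0.0f, 1.0f };
    player1.up       = { 0.0f, 1.0f, 0.0f };
    system->setListenerAttributes(0, &player1);

    FMOD_3D_ATTRIBUTES player2 = {};
    player2.position = { 25.0f, 0.0f, 5.0f };
    player2.forward  = { 0.0f, 0.0f, 1.0f };
    player2.up       = { 0.0f, 1.0f, 0.0f };
    system->setListenerAttributes(1, &player2);
}
```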
To create a similar effect for your use-case, your game’s code would need to keep track of the locations of each flower in your game world and the location of the listener, and use these to calculate both the distance between the listener and the closest flower of each type and the weighted average position of all the flowers of each type within the event’s max distance. This would allow you to have only a single event instance playing for each flower type, as you could continually move these event instances around to positions that match the nearest-flower distance and the direction of the weighted average position calculated by your code. The effect would be that each flower is spatialized to where it appears to be when there is only one instance of that flower type nearby, and to a position somewhere between all the visible flowers of that type if there’s more than one, while always being as loud as you’d expect based on the distance to the nearest flower of that type.
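As a rough illustration of the game-side math, here is a sketch assuming the FMOD Studio C++ API; the Vec3 type, the function name, and the max distance are placeholders standing in for whatever your engine provides:

```cpp
#include <fmod_studio.hpp>
#include <algorithm>
#include <cmath>
#include <vector>

struct Vec3 { float x, y, z; };

static FMOD_VECTOR ToFmod(const Vec3& v) { return { v.x, v.y, v.z }; }

// Re-position one event instance so it stands in for every flower of its type
// that is within maxDistance of the listener: attenuation matches the nearest
// flower, panning follows the weighted average position.
void UpdateFlowerTypeInstance(FMOD::Studio::EventInstance* instance,
                              const std::vector<Vec3>& flowerPositions,
                              const Vec3& listenerPosition,
                              float maxDistance)
{
    float nearestDistance = maxDistance;
    Vec3 weightedSum = { 0.0f, 0.0f, 0.0f };
    float totalWeight = 0.0f;

    for (const Vec3& p : flowerPositions)
    {
        float dx = p.x - listenerPosition.x;
        float dy = p.y - listenerPosition.y;
        float dz = p.z - listenerPosition.z;
        float distance = std::sqrt(dx * dx + dy * dy + dz * dz);
        if (distance > maxDistance)
            continue; // Too far away to contribute.

        nearestDistance = std::min(nearestDistance, distance);

        // Closer flowers get more influence over the perceived direction.
        float weight = 1.0f - (distance / maxDistance);
        weightedSum.x += p.x * weight;
        weightedSum.y += p.y * weight;
        weightedSum.z += p.z * weight;
        totalWeight += weight;
    }

    if (totalWeight <= 0.0f)
        return; // No flowers of this type in range; leave the instance alone.

    // Direction from the listener towards the weighted average position...
    Vec3 average = { weightedSum.x / totalWeight,
                     weightedSum.y / totalWeight,
                     weightedSum.z / totalWeight };
    float dx = average.x - listenerPosition.x;
    float dy = average.y - listenerPosition.y;
    float dz = average.z - listenerPosition.z;
    float length = std::sqrt(dx * dx + dy * dy + dz * dz);
    if (length < 0.0001f) { dx = 0.0f; dy = 0.0f; dz = 1.0f; length = 1.0f; }

    // ...placed at the distance of the nearest flower, so attenuation matches it.
    Vec3 virtualPosition = {
        listenerPosition.x + dx / length * nearestDistance,
        listenerPosition.y + dy / length * nearestDistance,
        listenerPosition.z + dz / length * nearestDistance
    };

    FMOD_3D_ATTRIBUTES attributes = {};
    attributes.position = ToFmod(virtualPosition);
    attributes.forward = { 0.0f, 0.0f, 1.0f };
    attributes.up = { 0.0f, 1.0f, 0.0f };
    instance->set3DAttributes(&attributes);
}
```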
If you only care about the distance-based attenuation component of spatialization and don’t actually require panning, there’s a much simpler method that uses snapshots:
1. Set each of the eight flower-specific group buses’ volume faders to -∞ dB.
2. Create one snapshot for each of your eight unique flower types.
3. Scope the volume fader of each flower’s group bus into the corresponding snapshot, and set the volume of that bus in each snapshot to 0 dB.
4. Add a distance parameter to your project’s parameters browser (if you’re using FMOD Studio version 2.03.00 or later) or preset parameter browser (if you’re using an earlier version).
5. In each snapshot, automate the snapshot’s intensity on the new distance parameter such that intensity is 100% at distance 0 and 0% at max distance.
6. In your game, create one instance of each of the flower events, and have them playing at all times. Their exact locations don’t matter, as they don’t have any spatialization.
7. Also in your game, create 40 instances of each of your snapshots, and position one of these instances at each of the flowers of the corresponding species (see the sketch after this list).
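On the game-code side, the setup described in the steps above might look roughly like this sketch. It assumes the FMOD Studio C++ API, that the distance parameter from step 4 is a user-defined parameter named "Distance" that your code sets directly, and that the event/snapshot paths are placeholders; the per-flower distances come from your own game code:

```cpp
#include <fmod_studio.hpp>
#include <vector>

struct FlowerType
{
    FMOD::Studio::EventInstance* musicInstance = nullptr;        // One per type, always playing.
    std::vector<FMOD::Studio::EventInstance*> snapshotInstances; // One per flower of this type.
};

FlowerType SetupFlowerType(FMOD::Studio::System* system,
                           const char* eventPath,     // e.g. "event:/Flowers/TypeA" (placeholder)
                           const char* snapshotPath,  // e.g. "snapshot:/Flowers/TypeA" (placeholder)
                           int flowerCount)
{
    FlowerType type;

    // One non-spatialized music instance per flower type, playing at all times.
    FMOD::Studio::EventDescription* eventDescription = nullptr;
    system->getEvent(eventPath, &eventDescription);
    eventDescription->createInstance(&type.musicInstance);
    type.musicInstance->start();

    // One snapshot instance per individual flower of this type.
    FMOD::Studio::EventDescription* snapshotDescription = nullptr;
    system->getEvent(snapshotPath, &snapshotDescription);
    for (int i = 0; i < flowerCount; ++i)
    {
        FMOD::Studio::EventInstance* snapshotInstance = nullptr;
        snapshotDescription->createInstance(&snapshotInstance);
        snapshotInstance->start();
        type.snapshotInstances.push_back(snapshotInstance);
    }
    return type;
}

// Whenever the listener moves, feed each snapshot instance the distance between
// the listener and its flower so the intensity automation can simulate
// distance attenuation.
void UpdateFlowerType(FlowerType& type, const std::vector<float>& listenerToFlowerDistances)
{
    for (size_t i = 0; i < type.snapshotInstances.size(); ++i)
    {
        type.snapshotInstances[i]->setParameterByName("Distance", listenerToFlowerDistances[i]);
    }
}
```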
This method works by taking advantage of the way multiple active instances of the same snapshot are blended according to their intensities, using that blend to simulate distance attenuation. Its biggest disadvantage is, as I mentioned, that it only reproduces the distance-based attenuation component of spatialization, and not the direction-based panning component.
*The voice virtualization system can, under some circumstances, have the side effect of causing individual event instances’ assets to restart playing from the beginning, potentially causing those assets to be out of sync with other sounds. There are two possible ways to prevent this.
The first is to set your music assets to not stream, and to play them with synchronous instruments rather than asynchronous instruments. This allows assets to resume playing in sync as if they had never stopped playing in the first place.
The second way is to place the instrument that plays an asset in one event, but use transceiver effects to send the signal to multiple other event instances, as described above. This bypasses the need for virtualization entirely, as transceiver effects are, under the hood, a type of routing, and so do not require any additional channels that might be virtualized.