How do I access the spread of sounds in Unity

So I’m trying to simulate river sounds in Unity 3D (version 2020.3.19). It’s a known issue with FMOD that it only supports point sources.

The dumbest way is, of course, to place a number of sound emitters along the river in a line. That is too cumbersome, however.

After some research I decided to draw a spline along the river and have the sound emitter follow the player when the player is within audible distance, placing it at the closest point on the spline to the player’s current position.
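For anyone curious how the closest-point lookup works: you can approximate the spline as a polyline and test each segment. A minimal sketch in Python (in Unity this would be C#; the function names here are my own, for illustration):

```python
import math

def closest_point_on_segment(p, a, b):
    """Project point p onto segment a-b and clamp the result to the segment."""
    ax, ay = a; bx, by = b; px, py = p
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:          # degenerate segment
        return a
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))      # clamp to the segment's endpoints
    return (ax + t * dx, ay + t * dy)

def closest_point_on_polyline(p, points):
    """Closest point to p over every segment of the polyline."""
    best, best_d = None, float("inf")
    for a, b in zip(points, points[1:]):
        c = closest_point_on_segment(p, a, b)
        d = math.dist(p, c)
        if d < best_d:
            best, best_d = c, d
    return best
```

Each frame you would feed in the player position and move the emitter to the returned point.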

So far it works, kinda. It has one big issue though.

It happens when the river, or rather the spline, takes a sharp geographic turn. Even a one-pixel movement of the player (I’m exaggerating for the sake of argument) can make the sound emitter jump from one place on the spline to another, creating an abrupt change in the perceived location of the sound source.

For example:

The illustration above shows a top-down view of a 3D space. The blue line represents the river; the red dots represent the sound source at the corresponding moments; and the green dots represent the player’s position at different times.
From this illustration we can see the problem very easily. When I move from green dot 1 to green dot 2, the sound source (the red dot) jumps all the way from the left position to the right. Theoretically, yes, the red dot on the right is indeed the closest point to the current player position (represented by the green dot on the right). But the actual change in the player’s position was tiny, while the output changed radically. In other words, there has to be a way to soften this transition.
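One cheap way to soften the jump is to stop teleporting the emitter and instead move it toward the new closest point at a capped speed. A hypothetical sketch (Python for brevity; `max_speed` and the function name are mine):

```python
def step_emitter(current, target, max_speed, dt):
    """Move the emitter toward the target by at most max_speed * dt,
    instead of teleporting it to the new closest point each frame."""
    dx, dy = target[0] - current[0], target[1] - current[1]
    dist = (dx * dx + dy * dy) ** 0.5
    max_step = max_speed * dt
    if dist <= max_step or dist == 0.0:
        return target              # close enough: snap to the target
    s = max_step / dist
    return (current[0] + dx * s, current[1] + dy * s)
```

The tradeoff: during the transition the emitter cuts across the bend in a straight line, so its perceived position briefly leaves the water. (Unity users would likely reach for `Vector3.MoveTowards`, which does the same clamped step.)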

Here is a solution that I came up with. See the illustration below:

The player now has a circular area of hearing. When the circle (represented by the black line) intersects the river (represented by the blue line), it yields the locations of points A and B. Then the angle APB (represented by the letter ‘theta’) is calculated, and the spread angle is set to that number. This means I need to be able to access and modify the spread value of a sound within FMOD dynamically.
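The angle part is plain vector math. A small sketch, assuming A and B are already known from the circle/river intersection test:

```python
import math

def angle_apb(p, a, b):
    """Angle at the player P between the rays P->A and P->B, in degrees.
    A and B are the two points where the hearing circle meets the river."""
    ang_a = math.atan2(a[1] - p[1], a[0] - p[0])
    ang_b = math.atan2(b[1] - p[1], b[0] - p[0])
    d = abs(ang_a - ang_b)
    if d > math.pi:                # take the smaller of the two arcs
        d = 2 * math.pi - d
    return math.degrees(d)
```

On the FMOD side, one option is to automate the spatializer’s sound size on an event parameter in FMOD Studio and drive that parameter each frame from Unity with something like `instance.setParameterByName("RiverSpread", theta)` — the parameter name is made up here, but `setParameterByName` is the parameter-setting call in the FMOD 2.x Unity integration.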

This would hypothetically solve the problem I raised previously. See the illustration:

However, here comes another problem with this solution I gave. See the illustration below:

This is a more complicated situation, but entirely possible in realistic environments. This little stream has a lot of twists and turns. Where should the sound emitter be when the player is literally surrounded by river streams? Theoretically, the player should be hearing sounds from all directions, but in this situation a single point emitter is not useful at all.

After doing some more research, I came across a video online in which two audio engineers from a big studio (an EA studio, iirc, but I’m not sure) demonstrated their solution. See the illustration below:

They used raycast to implement this. Since I’m not very familiar with programming I’ll try my best to explain it.

The idea is that the river cuts the rays off at their intersections, creating multiple sections of river that should be audible to the player. A sound emitter is then spawned and attached to every intersection.
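If I understand the technique correctly, it amounts to casting a fan of rays from the player and keeping the nearest river crossing along each ray. A rough Python sketch of that geometry (the river again as a polyline; all names are mine):

```python
import math

def ray_segment_hit(origin, direction, a, b):
    """Distance t >= 0 where origin + t*direction crosses segment a-b, or None."""
    ox, oy = origin; dx, dy = direction
    ex, ey = b[0] - a[0], b[1] - a[1]
    denom = dx * ey - dy * ex
    if abs(denom) < 1e-12:
        return None                       # ray parallel to the segment
    qx, qy = a[0] - ox, a[1] - oy
    t = (qx * ey - qy * ex) / denom       # distance along the ray
    u = (qx * dy - qy * dx) / denom       # position along the segment
    if t >= 0.0 and 0.0 <= u <= 1.0:
        return t
    return None

def emitter_points(player, river, n_rays=16, max_dist=30.0):
    """Cast n_rays evenly around the player; return the nearest river
    crossing of each ray within hearing range. One emitter per point."""
    hits = []
    for i in range(n_rays):
        ang = 2 * math.pi * i / n_rays
        d = (math.cos(ang), math.sin(ang))
        best = None
        for a, b in zip(river, river[1:]):
            t = ray_segment_hit(player, d, a, b)
            if t is not None and t <= max_dist and (best is None or t < best):
                best = t
        if best is not None:
            hits.append((player[0] + d[0] * best, player[1] + d[1] * best))
    return hits
```

In Unity you would more likely use the physics engine (`Physics.Raycast` against colliders on the river) than hand-rolled segment math, but the idea is the same.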

So here are my questions:

  1. How is my solution? Is there a better solution?

  2. How do I access the angle of spread in FMOD through code in Unity in real time?

  3. Looking at the solution the pros gave, how do I accomplish this through code? I imagine I would have to access the FMOD API; how do I do that?

Hi, and congratulations on all your research, which is very interesting. I certainly don’t have an answer to all of your questions, but I do have a few comments.
Concerning the spread of the sound based on the angle (3d image), you could automate the sound size based on that angle (it’s possible at least in my 2.02.05 version):
You can see in the 3d preview tab that it does change the spread as you expect:
I wouldn’t go into the pan override tab.

However, another solution comes to mind that might work: instead of calculating the “mid” green point, you could also play one sound at point A and another at point B (maybe the same asset with a different offset); the mix of both sounds heard by the player should then always reflect the correct spread.

That being said, there’s a much simpler solution I quite like for river sounds, which is using transceivers. Create a transmitter event which plays a long loop (with no direct output), and a 3D receiver event. Put receiver instances every x meters along the river, and tweak the sound size and other spatialization parameters. Configure the receiver event with a max instances setting and the virtualization stealing mode, for instance:
That’s an efficient way for the player to always hear a representative mix of the river points around him.
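Placing receivers "every x meters" means resampling the river line at equal arc-length intervals, which is easy to script (e.g. in an editor tool) instead of placing them by hand. A sketch, assuming the river is a polyline (names are mine):

```python
import math

def receiver_positions(river, spacing):
    """Walk the polyline and emit a receiver position every `spacing`
    meters of arc length, starting at the first vertex. spacing > 0."""
    out = [river[0]]
    since_last = 0.0                       # distance since the last receiver
    for a, b in zip(river, river[1:]):
        seg = math.dist(a, b)
        pos = 0.0                          # distance consumed on this segment
        while since_last + (seg - pos) >= spacing:
            pos += spacing - since_last
            t = pos / seg
            out.append((a[0] + (b[0] - a[0]) * t,
                        a[1] + (b[1] - a[1]) * t))
            since_last = 0.0
        since_last += seg - pos
    return out
```

Each returned position would get one instance of the 3D receiver event.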

Hi Alcibiade, thank you for the feedback! Very helpful!

I’m very interested in the last solution you mentioned, using a transceiver event. Would you please explain it in more detail? I tried to look for manuals but they seemed very vague and abstract. For example, what is stealing?

Also, where do I create a transceiver event? I can’t seem to find it.

If I understood correctly, I basically put multiple (x number of) 3D receivers along the course of the river, all playing the same sound at the same time. However, with a “max instances” of 5, only 5 of them will activate, according to the player’s current position (I guess??). Then, the virtualization stealing mode means the volume will differ across the 5 receivers so that the closer ones are louder and the farther ones are quieter. When you move along the river, the previously loud receivers fade away and the newly activated receivers become louder and louder.

Is this correct?

It sounds like you’ve conflated a few concepts. I think I can explain.

First: Virtualizing an event instance means that that event instance no longer produces audio output, but also consumes only a tiny fraction of the resources that a non-virtualized event needs. There are a few different things that can cause an event to become virtual.

Stealing is probably not relevant to what you’re trying to do, but is one of the things that can virtualize event instances.

Because FMOD has to process all events in real time, and every playing event instance consumes some resources, there is an upper limit to how many event instances can play at the same time. If you try to start a new event instance when the maximum allowed number of playing instances has already been reached, the new voice must “steal” the place of one of the existing instances.

A commonly-used stealing mode is “virtualization.” This stealing mode virtualizes any event instance that gets stolen. This is useful because, unlike other stealing modes, a stolen event instance that’s been virtualized can potentially be brought back if the number of playing voices drops below the maximum allowed number.
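To make the mechanics concrete, here is a toy model of that stealing mode. This is not FMOD’s actual code, and I'm simplifying the choice of which instance to steal down to "the quietest" (FMOD applies its own rules):

```python
def start_instance(playing, virtual, new, max_instances):
    """Toy model of the 'virtualize' stealing mode: when the instance cap
    is exceeded, the quietest playing instance is virtualized (kept alive
    but silent) to make room for the new one."""
    playing.append(new)
    if len(playing) > max_instances:
        quietest = min(playing, key=lambda inst: inst["volume"])
        playing.remove(quietest)
        virtual.append(quietest)   # can be revived later if a slot frees up
    return playing, virtual
```

The key point for the river setup: virtualized receivers keep their playback position, so they come back seamlessly when the player walks into range again.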

A transceiver event isn’t a special kind of event. It’s just an event that contains an FMOD Transceiver effect.

That’s more or less correct.

By default, event instances are automatically virtualized if they are too quiet to be audible, and are automatically un-virtualized if they become loud enough to be audible again. This is unrelated to the virtualization stealing mode: The stealing mode and events being silent can both virtualize events, but they do it in different circumstances and for unrelated reasons.

As I mentioned above, the virtualization stealing mode is not relevant here. The event instances may be virtualized if they become so quiet as to be inaudible, but will not be stolen.

You are right that the event instances nearest the listener will be the loudest and that instances that are far away will be inaudible, assuming those instances are subject to spatializing effects or some other source of distance-based attenuation.


If your intention is realism, and not saving resources at all costs, I think you conceived the fundamentals wrongly.

It’s about natural-sounding environments after all, right?

Your initial solution, the one you posted at the start of your thread, is correct.

You should create, as you did, a point that travels along the river line and follows the player, and set up the emitter’s parameters so the sound carries over the natural distance that the power of the river’s flow would transmit it.

Then, when the listener intercepts another part of the river or any other river, create another instance of the river sound emitter.

This is what would happen in real-life and it can be localized correctly by the player too.

So, in your first drawing, the player would hear both river sounds (probably use 1-3 different loops and vary the playback starting points to avoid phasing), and if you set things up correctly you will have a far more immersive experience, because as the listener moves towards the second point, the first point’s emitter will gradually fade out. It goes without saying that you should use attenuation over distance for this design.

If you need to control voices for optimization, you can add some control to your algorithm. Divide the circle around your listener into four quadrants and, once every 3 seconds, check whether more than four emitters are playing from various parts of the river because the listener has entered their trigger zones; if the playing emitters cover one quadrant each, don’t spawn another instance until that state goes away.

Another optimization would be to check whether the new emitter instance about to be triggered is within 15 degrees of another emitter instance already playing, and if so, don’t spawn it.

The 3-second interval and the 15-degree threshold I state above can be changed according to your game’s needs. The interval depends on the speed at which the listener moves through the world, and the angle on the need for accuracy (i.e. FPS vs. adventure games).
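Both throttles are straightforward to implement. A sketch combining them, as seen from the listener in 2D (top-down); the function name and thresholds are illustrative:

```python
import math

def should_spawn(player, new_pos, active_positions,
                 min_sep_deg=15.0, max_voices=4):
    """Skip the new emitter if it sits within min_sep_deg of an emitter
    already playing (as seen from the player), or if the voice budget is
    spent and every quadrant around the player is already covered."""
    def bearing(p):
        return math.degrees(math.atan2(p[1] - player[1],
                                       p[0] - player[0])) % 360.0
    nb = bearing(new_pos)
    for pos in active_positions:
        diff = abs(bearing(pos) - nb)
        diff = min(diff, 360.0 - diff)     # wrap around 0/360
        if diff < min_sep_deg:
            return False                   # angularly redundant
    quadrants = {int(bearing(pos) // 90) for pos in active_positions}
    if len(active_positions) >= max_voices and len(quadrants) == 4:
        return False                       # all four quadrants covered
    return True
```

The periodic 3-second check would simply call this for each candidate emitter instead of running it every frame.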

A nice trick you can add to this system: each time you spawn a new emitter, get the width, depth, and particle flow speed of the river, and use them to control the emitter’s range and volume and to mix 3 different loops in real time to simulate the river’s strength and flow. That way, when the listener is between a part of the river with a weak stream and another part (of the same or another river) with a stronger flow, the experience will be accurate and natural.
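One possible shape for that mapping: crossfade the loops by flow strength and widen the attenuation range with the river width. Everything here is a made-up tuning curve, just to show the idea:

```python
def loop_mix(flow, width):
    """Hypothetical mapping from river properties to emitter settings:
    crossfade three loops (calm / medium / rapids) by normalized flow
    strength, and grow the attenuation range with the river width."""
    f = max(0.0, min(1.0, flow))           # clamp flow to [0, 1]
    calm = max(0.0, 1.0 - 2.0 * f)         # fades out by f = 0.5
    rapids = max(0.0, 2.0 * f - 1.0)       # fades in from f = 0.5
    medium = 1.0 - calm - rapids           # fills the middle
    max_distance = 10.0 + 20.0 * width     # wider river carries farther
    return {"calm": calm, "medium": medium, "rapids": rapids,
            "max_distance": max_distance}
```

The returned volumes and range would then be fed to the emitter (in FMOD, typically via event parameters).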

How’s that sound?
