Steam Audio seems to be stereo-only in FMOD?

I’ve used Steam Audio in Unity with all the speaker layouts Unity supports, such as stereo and 5.1. Steam Audio’s C library itself even supports custom speaker layouts.

However, when I load the Steam Audio Spatializer in an FMOD project (set to 7.1), the output of the spatializer is always stereo.

Any idea why that is the case?

EDIT: Had a quick look at the source code, but it seems to just use the channel count that FMOD sends into the effect, so it’s really weird that it’s stuck at stereo: steam-audio/fmod/src/spatialize_effect.cpp at master · ValveSoftware/steam-audio · GitHub

Looking at the effect’s source code, it looks like its output buffer’s speaker layout is always FMOD_SPEAKERMODE_STEREO. I suspect this is intentional.

According to the Steam Audio documentation, Steam Audio is a tool for using HRTF to render binaural audio. Which is to say, this effect is designed to produce audio for headphones, and headphones are always stereo. I therefore suspect that this effect producing stereo output is simply what it needs to do to achieve its stated purpose.

Interesting: it sets the speaker layout to FMOD_SPEAKERMODE_STEREO in process() when the operation is FMOD_DSP_PROCESS_QUERY. That’s the only place it does so, and I overlooked it because I had only looked at the initialization code.

Have a look at the init function, where it initializes the speaker layout to whatever fits the number of channels.

There are more places like that in the init function, so it really looks like they wanted it to work with surround.
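The init-time mapping from channel count to speaker layout can be sketched roughly as follows. This is not Steam Audio’s actual code: the `Layout` enum is a hypothetical local stand-in for Steam Audio’s `IPLSpeakerLayoutType`, and `layoutForChannelCount` is an illustrative name.

```cpp
#include <optional>

// Hypothetical stand-in for Steam Audio's speaker layout types (the C API's
// IPLSpeakerLayoutType); defined locally so this sketch compiles on its own.
enum class Layout { Mono, Stereo, Quadraphonic, Surround5_1, Surround7_1 };

// Map the channel count FMOD passes into the effect onto a matching layout.
// Counts without an obvious standard layout return std::nullopt; a real
// plugin would need a fallback (or a custom layout) for those.
std::optional<Layout> layoutForChannelCount(int numChannels)
{
    switch (numChannels)
    {
        case 1:  return Layout::Mono;
        case 2:  return Layout::Stereo;
        case 4:  return Layout::Quadraphonic;
        case 6:  return Layout::Surround5_1;
        case 8:  return Layout::Surround7_1;
        default: return std::nullopt;
    }
}
```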

However, FMOD_DSP_PROCESS_QUERY is the one place where FMOD determines the actual effect’s output format:

The DSP is being queried for the expected output format and whether it needs to process audio or should be bypassed.
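The two jobs of a query pass (declare the output format, decide whether the DSP runs at all) can be modelled without the FMOD headers. In this minimal sketch, `SpeakerMode`, `QueryResult` and `BufferFormat` are hypothetical stand-ins mirroring `FMOD_SPEAKERMODE`, the `FMOD_OK`/`FMOD_ERR_DSP_DONTPROCESS` results and the output buffer array; the stereo-only policy reproduces the behavior observed in the plugin rather than quoting its code.

```cpp
// Local stand-ins so the sketch compiles without FMOD headers. They mirror, but
// are not, FMOD_SPEAKERMODE and the FMOD_OK / FMOD_ERR_DSP_DONTPROCESS results.
enum class SpeakerMode { Mono, Stereo, FivePointOne, SevenPointOne };
enum class QueryResult { Ok, DontProcess };

struct BufferFormat
{
    SpeakerMode speakerMode;
    int numChannels;
};

// Declare the output format and tell the mixer whether this DSP needs to run.
// Declaring stereo unconditionally, as here, reproduces the "always stereo"
// output described in this thread, whatever the input format is.
QueryResult handleQuery(const BufferFormat& input, BufferFormat& output, bool inputsIdle)
{
    (void)input;  // ignored by this stereo-only policy
    output = { SpeakerMode::Stereo, 2 };
    return inputsIdle ? QueryResult::DontProcess : QueryResult::Ok;
}
```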

So maybe that’s an oversight on Steam Audio’s part. I’ll try to get their build environment running on my machine and see whether this can be adapted, since the initialization seems to be pretty much surround-aware.

According to the Steam Audio documentation, Steam Audio is a tool for using HRTF to render binaural audio. Which is to say, this effect is designed to produce audio for headphones, and headphones are always stereo. I therefore suspect that this effect producing stereo output is simply what it needs to do to achieve its stated purpose.

Oh, Steam Audio is much more capable than that! It supports 7.1 when using it directly with Unity and Unreal and the C library supports even custom speaker setups: Audio Buffers — Steam Audio C API documentation

For some reason (likely the code you spotted in the process function) Steam Audio is limited to stereo when using it in FMOD.

I started adding surround support to the Steam Audio FMOD plugin on GitHub. There’s some confusion around how FMOD finds a matching input and output format for a spatializer plugin. Reading FMOD’s online docs, it seems that it uses FMOD_DSP_PROCESS_CALLBACK with FMOD_DSP_PROCESS_OPERATION::FMOD_DSP_PROCESS_QUERY. The docs state:

FMOD_DSP_PROCESS_QUERY is to be handled only by filling out the outputarray information, and returning a relevant return code.

This makes it sound like the spatializer should choose an output format based on the input format. The docs say something similar here:

Retrieves the output format this DSP will produce when processing based on the input specified.

So I set the output to follow the input format as long as it’s supported by Steam Audio. Doing it like that works fine over here.

But there are situations where you want to feed a mono event into the spatializer and play it out in stereo with HRTF enabled, which the spatializer should know about and act upon. Also, FMOD’s own spatializer plugin seems to follow the event’s output format (or the output format of the track the spatializer is placed on). Yet I can’t find a way to get the current track’s output format (the one on the far right of the currently selected track).
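The rule described above (follow the input format when Steam Audio supports it), together with the HRTF case, could look like this. It’s a sketch under assumptions: the `SpeakerMode` enum is a local stand-in for `FMOD_SPEAKERMODE`, the set of supported layouts is assumed, and falling back to stereo for unsupported inputs is an illustrative choice, not something stated in the thread.

```cpp
// Local stand-in mirroring a subset of FMOD_SPEAKERMODE (not the real enum).
enum class SpeakerMode { Mono, Stereo, Quad, FivePointOne, SevenPointOne, SevenPointOnePointFour };

// Assumed set of layouts the spatializer can render to; 7.1.4 is left out
// here purely for illustration.
bool isSupportedLayout(SpeakerMode mode)
{
    switch (mode)
    {
        case SpeakerMode::Mono:
        case SpeakerMode::Stereo:
        case SpeakerMode::Quad:
        case SpeakerMode::FivePointOne:
        case SpeakerMode::SevenPointOne:
            return true;
        default:
            return false;
    }
}

// Binaural (HRTF) rendering is always two-channel, so it overrides the
// input-following rule; otherwise follow the input, falling back to stereo.
SpeakerMode chooseOutputMode(SpeakerMode input, bool hrtfEnabled)
{
    if (hrtfEnabled)
        return SpeakerMode::Stereo;
    return isSupportedLayout(input) ? input : SpeakerMode::Stereo;
}
```

Note that with HRTF off, a mono input still produces mono output: the input format alone doesn’t tell the plugin what the track’s output format should be.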

But even when I change the Steam Audio spatializer to follow the format FMOD passes in outbufferarray (rather than overriding it based on the input format, as the docs led me to believe I should), a track with mono input and 7.1 output makes FMOD show a mono meter at both the input and the output of the Steam Audio Spatializer.

Debugging this part showed that the plugin was writing the 7.1 speaker mode (FMOD_SPEAKERMODE_7POINT1), channel mask (FMOD_CHANNELMASK_7POINT1) and numchannels (8) to the output buffer array during FMOD_DSP_PROCESS_QUERY. Yet a track with mono in and 7.1 out still forces the spatializer down to mono in, mono out.

@joseph can you or one of your engineers shed some light on this situation?

  • Why does FMOD give the spatializer plugin an outbufferarray containing 7.1 on a 7.1 track, yet drop to mono output for that plugin when the plugin fills outbufferarray with all the 7.1 information?
  • What should a spatializer plugin do to follow the output format of its current track?

The output meter should display the format assigned to the output FMOD_DSP_BUFFER_ARRAY::speakermode.
Can you please try setting that during FMOD_DSP_PROCESS_QUERY and confirm that the meter displays the correct layout? i.e.:

    if (operation == FMOD_DSP_PROCESS_QUERY)
    {
        if (outBuffers)
        {
+           outBuffers->speakermode = FMOD_SPEAKERMODE_7POINT1POINT4;
            outBuffers->buffernumchannels[0] = 2;
            outBuffers->bufferchannelmask[0] = 0;
        }
...
    }

It looks like our spatializers rely on an internal part of our API to follow the event’s output layout, and I’m not sure this is something we can expose easily. Perhaps you could just output 7.1 regardless of the input layout? Or maybe expose a new parameter to manually set the output layout in the UI?

Thanks a lot! Your suggestion forces the plugin to a 7.1.4 layout like expected. Maybe I made a mistake when I tried changing the plugin from following the inBuffers speaker mode to following the outBuffers speaker mode. I think I modified these lines to this:

	auto outputSpeakerMode = outBuffers->speakermode;

	switch (outBuffers->speakermode)

Trying it out again today, the strange behavior of the plugin’s output always being mono is gone. I’m only on a laptop atm, so maybe it had something to do with my desktop system.
However, it’s still just following the input format since the speaker modes in inBuffers and outBuffers are usually at the same value at the beginning of FMOD_DSP_PROCESS_QUERY.

I see the following issues ahead when changing Steam Audio to always output 7.1:

  • Let’s suppose the sound designer wants a project to work in binaural (stereo + HRTF), stereo, 5.1 and 7.1. Continuously outputting 7.1 after all of the project’s spatializers will create unnecessary CPU and RAM load on end users’ stereo or 5.1 systems, due to the additional buffers and up/downmixing involved.
  • For sound designers working on non-surround projects, it will be confusing when they insert the Steam Audio Spatializer to always see its 7.1 output layout. They’d probably then always have to force their event’s master output to stereo layout.
  • For VR projects targeting binaural-only, Steam Audio would have to ignore its 7.1 output speaker mode when any of its HRTF settings are enabled and output stereo-only on its 7.1 output in that case, which will add further confusion to users.
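To put a rough number on the extra buffer load from the first point, here is a back-of-the-envelope calculation. The 1024-sample block length and 4-byte float samples are assumptions for the estimate, not values taken from Steam Audio or from a specific FMOD configuration.

```cpp
#include <cstddef>

// Bytes needed for one block of 32-bit float samples across `channels`
// channels. The 1024-sample block length is an assumed default for this
// estimate; the real value depends on the FMOD system's DSP buffer settings.
constexpr std::size_t bytesPerBlock(std::size_t channels, std::size_t blockLength = 1024)
{
    return channels * blockLength * sizeof(float);
}

static_assert(sizeof(float) == 4, "estimate assumes 4-byte floats");

// Output buffer of a single spatializer instance: stereo vs. forced 7.1.
constexpr std::size_t stereoBytes   = bytesPerBlock(2);  // 8192 bytes
constexpr std::size_t sevenOneBytes = bytesPerBlock(8);  // 32768 bytes
```

So under these assumptions each instance forced to 7.1 carries four times the output buffer of a stereo instance, before counting the extra up/downmixing work, and that multiplies with the number of spatializer instances.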

An additional parameter in Steam Audio’s UI to select the output format could help with these problems to some degree. VR-only projects could set that option to stereo. Projects targeting multiple speaker formats could set this parameter via code to their current speaker layout. That feels a bit unnecessary, though, since it works automatically in Unreal and Unity, and it makes FMOD feel like a second-class citizen in Steam Audio.

Maybe there’s a way to communicate the current speaker mode at FMOD’s final output to the spatializer plugin? That would make it easier for the spatializer plugin to just work automatically.

If you want to follow the platform’s speaker mode (Platform Channel Format in FMOD Studio terms), you can retrieve it with FMOD_DSP_STATE_FUNCTIONS::getspeakermode.

if (operation == FMOD_DSP_PROCESS_QUERY)
{
    if (outBuffers)
    {
+       FMOD_SPEAKERMODE mixerMode;           // Your platform's speaker mode
+       FMOD_SPEAKERMODE outputMode;          // The final speaker mode to downmix/upmix to
+
+       dsp_state->functions->getspeakermode(
+           dsp_state,
+           &mixerMode,
+           &outputMode
+       );
+
+       outBuffers->speakermode = mixerMode;  // or outputMode if you want to follow the final downmix 
        outBuffers->buffernumchannels[0] = 2;
        outBuffers->bufferchannelmask[0] = 0;
    }
...
}
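The snippet above changes only the speaker mode and leaves `buffernumchannels[0]` at 2. If the plugin actually renders into the chosen layout, it would presumably also want the channel count to match; that’s an assumption, not something confirmed in this thread. A minimal sketch with a hypothetical local `SpeakerMode` enum (mirroring the `FMOD_SPEAKERMODE` values) and a simplified, hypothetical `OutputFormat` struct:

```cpp
// Local stand-in mirroring FMOD_SPEAKERMODE values (not the real enum).
enum class SpeakerMode { Mono, Stereo, Quad, Surround, FivePointOne, SevenPointOne, SevenPointOnePointFour };

// Channel count implied by each speaker mode (surround = 5, 5.1 = 6, 7.1 = 8, 7.1.4 = 12).
int channelCountFor(SpeakerMode mode)
{
    switch (mode)
    {
        case SpeakerMode::Mono:                   return 1;
        case SpeakerMode::Stereo:                 return 2;
        case SpeakerMode::Quad:                   return 4;
        case SpeakerMode::Surround:               return 5;
        case SpeakerMode::FivePointOne:           return 6;
        case SpeakerMode::SevenPointOne:          return 8;
        case SpeakerMode::SevenPointOnePointFour: return 12;
    }
    return 0;  // not reached for the enumerators above
}

// Simplified stand-in for the output buffer fields set during the query.
struct OutputFormat
{
    SpeakerMode speakerMode;
    int numChannels;
};

// Follow the platform (mixer) speaker mode and keep the channel count consistent with it.
OutputFormat formatFollowingMixer(SpeakerMode mixerMode)
{
    return { mixerMode, channelCountFor(mixerMode) };
}
```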

Thanks a lot, this is working great!

For Steam Audio to properly adapt a project to multiple output formats, an important consideration is whether the output format is headphone stereo or speaker stereo, which should control the HRTF property. It seems that FMOD’s speaker format doesn’t differentiate between those formats, so I wonder how this can be achieved?

I thought about various solutions but none seem to work or be good enough to be upstreamed:

  1. Automating HRTF Setting: FMOD’s limitation of only automating float values makes automating the HRTF setting on the Steam Audio Spatializer plugin impractical. Moreover, relying on sound designers to adjust this manually for each instance adds complexity.
  2. Global Parameter in FMOD Studio: a global parameter in the FMOD Studio project that describes whether headphones are currently in use seems to be inaccessible from the context of a spatializer plugin.
  3. New DSP Parameter: a new DSP parameter controlled from Steam Audio’s engine integration (Unity, Unreal) would require iterating over all DSP indices of all tracks of an event instance and comparing each DSP’s name, resulting in unnecessary overhead.

Currently, users can work around this by placing two instances on a track, one with HRTF enabled and one with it disabled, and then selectively excluding them based on the stereo format. However, this approach is error-prone and adds significant editing overhead.

@jeff_fmod, do you see any avenues for enabling spatializer plugins to recognize the intended stereo format? Maybe the integration could set a user-defined property within the FMOD core system that can be picked up by the spatializer plugins or something like that …

That does seem like a bit of an oversight there. I will pitch it to the Dev team and see if we can get a finer degree of semantic separation in our speaker modes.

In the meantime, I haven’t been able to find any key differences between the “Stereo” and “Headphone” speaker modes at the API level, so I think the best current solution would be to expose a DSP parameter that your users can set in the FMOD Studio UI, perhaps an “Enable HRTF” toggle button.

That is already the case (Steam Audio has an HRTF bool in the UI) but it doesn’t solve the issue:

  • You want it to be disabled for end users with a speaker setup
  • You want it to be enabled for end users with headphones but only if the sound designer wanted this event to be binauralized (yes, there will always be events that should be played back without HRTF even on headphones)

Effectively, the spatializer should treat the “Enable HRTF” option as the sound designer’s intent to apply HRTF if the end user is using headphones.

Having a bool DSP parameter on the spatializer’s UI that enables HRTF rendering for that instance leads to the situation I described in my previous post:

  • Sound designers cannot automate this parameter since it’s a bool (a limitation in FMOD), so the HRTF DSP parameter would always be either enabled or disabled, whereas you want it set depending on the end user’s setup (headphones vs. speakers)
  • Game code would have to iterate over all DSP indices of all tracks of an event and do a string compare for each of these to find a Steam Audio effect every time an event gets fired to set the HRTF DSP parameter
  • Accessing an FMOD Studio parameter from the context of a DSP plugin to read and set the HRTF DSP parameter accordingly is not possible because the DSP cannot access things that exist in the Studio system (to my understanding)
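To illustrate the overhead in the second point: in the real API, this walk would presumably go through the event instance’s channel group, using `ChannelControl::getNumDSPs`/`getDSP` on each group, recursing via `ChannelGroup::getNumGroups`/`getGroup`, and comparing the name returned by `DSP::getInfo`. The sketch below models that traversal with plain structs so it compiles on its own; `DspNode`, `TrackNode`, `setHrtfOnMatchingDsps` and the plugin name string are hypothetical.

```cpp
#include <string>
#include <vector>

// Plain-struct model of an event's mixer graph (hypothetical; stands in for
// FMOD ChannelGroups and DSPs so the sketch compiles without FMOD).
struct DspNode
{
    std::string name;
    bool hrtfEnabled = false;
};

struct TrackNode
{
    std::vector<DspNode> dsps;
    std::vector<TrackNode> children;
};

// Walk every track and every DSP, compare names, and set the HRTF flag on
// each match. Returns the number of string comparisons performed, to make the
// per-event cost visible: it grows with every DSP on every track.
int setHrtfOnMatchingDsps(TrackNode& track, const std::string& pluginName, bool enable)
{
    int comparisons = 0;
    for (DspNode& dsp : track.dsps)
    {
        ++comparisons;
        if (dsp.name == pluginName)
            dsp.hrtfEnabled = enable;
    }
    for (TrackNode& child : track.children)
        comparisons += setHrtfOnMatchingDsps(child, pluginName, enable);
    return comparisons;
}
```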

Generally this is handled with Platform Exclusion of individual effects, so you would have one spatializer with HRTF enabled, exclude that on a custom “Stereo Speaker” platform, then have a second spatializer with HRTF disabled, and exclude that on a custom “Stereo Headphones” platform. You can then save that as a Preset Effect Chain so you can easily add it to any other events.
You could then switch between these different configurations by unloading/loading the “Stereo Speaker” or “Stereo Headphones” banks, usually in some kind of audio settings page that the end user controls.

For events that should play back without HRTF even on headphones, add an instance with HRTF disabled instead of using the Preset Effect Chain mentioned above.

I don’t think the automation is necessary if you use the platform exclusion method mentioned above, but if you did want to go with a parameter automation approach, perhaps you could implement it in terms of a float parameter ranging from 0 → 1 instead of a bool?
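A minimal sketch of the float-instead-of-bool idea, assuming a hypothetical `HrtfSwitch` wrapper inside the plugin: the parameter stays a 0..1 float, so it can be automated or bound to a Studio parameter, and the plugin derives the on/off state from it. The 0.5 threshold and the default of “on” are arbitrary choices for the sketch.

```cpp
#include <algorithm>

// Hypothetical holder for a 0..1 float DSP parameter used as an HRTF on/off
// switch. Keeping it a float is what allows automation in FMOD Studio.
class HrtfSwitch
{
public:
    // Called whenever the host sets the parameter value; out-of-range values are clamped.
    void setValue(float value) { value_ = std::clamp(value, 0.0f, 1.0f); }

    // Treat the upper half of the range as "on".
    bool enabled() const { return value_ >= 0.5f; }

    float value() const { return value_; }

private:
    float value_ = 1.0f;  // default: HRTF on
};
```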

Regarding game code having to search for the effect: not if you set it up at design time in FMOD Studio using an FMOD Studio parameter, or use the platform exclusion approach.

Regarding reading FMOD Studio parameters from the DSP: the DSP can’t ask whether it needs to use HRTF, but it could be told to use HRTF via the hypothetical parameter mentioned above.

While I generally really like the preset feature, I don’t think it fits in this case. The Steam Audio Spatializer is not something that you should drop onto any audio source from a preset. It requires careful tweaking per source for the desired effect and the available performance budget (it can really get intense on the CPU). With this in mind, the suggested approach of platform exclusion adds a significant editing overhead because you basically have to keep the other platform’s instance in sync with the one you’re currently tweaking. It’s also error prone because you could very easily forget to keep the other instance in sync, even just for a single parameter.

This is speculative on my end because I haven’t tried it out yet, but when switching between those banks via unloading and loading, wouldn’t this stop all current sound events, requiring a level reload?

Interesting. We could suggest that users make a Steam Audio Spatializer preset with the HRTF float parameters bound to a global parameter that enables/disables the headphone mix, and use that to create new instances (they can easily detach it from the preset after instantiation).

When the DSP parameter is a float parameter, I guess there’s no way to still show it as a button in the UI?

Any feedback from talking to the Dev team about extending the speaker modes?

They said the intended way to handle device selection is with the platform interface and platform exclusion.
But you raise a good point about this precluding any real-time switching between headphones and speakers, and I can see how it wouldn’t scale well if you want to make lots of small tweaks across many instances.

Unfortunately not.

I think this is probably the best approach. At the device level we don’t really have any way of differentiating between headphones and speakers, so in any case it would have to be something the user configures, and, as you have pointed out, a “headphone” speaker mode combined with platform exclusion wouldn’t solve this problem because it would require bank reloading or system re-initialization.

Would there be any chance of discussing this further with your team? It would be incredibly helpful to have an official way to distinguish between headphone and speaker stereo setups!

As an update to this thread, I’ve implemented the missing surround capabilities across all of Steam Audio’s FMOD plugins (thanks again to @jeff_fmod for your assistance!), and this will be included in the next release of Steam Audio: Add surround to FMOD plugins by Schroedingers-Cat · Pull Request #329 · ValveSoftware/steam-audio · GitHub

Additionally, regarding the differentiation between speaker and headphone stereo, Steam Audio’s FMOD (and Unity) effects now support a new global variable, Disable HRTF Globally, which exists on the Steam Audio Settings ScriptableObject. This single option lets you disable HRTF processing globally across all of Steam Audio’s effect instances. It works as an override to each instance’s individual HRTF setting, so you can enable it when the player is using speakers (to bypass HRTFs) and disable it when they switch to headphones to apply HRTFs. This feature will also be available in the upcoming Steam Audio release: Feature/fix stereo headphone disambiguation by Schroedingers-Cat · Pull Request #356 · ValveSoftware/steam-audio · GitHub
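How a global override like this combines with each instance’s own setting can be sketched as below. This models the behavior described above; it is not Steam Audio’s code, and `GlobalSettings`, `disableHrtfGlobally` and `effectiveHrtf` are hypothetical names.

```cpp
// Hypothetical model of a global "Disable HRTF Globally" override combined
// with each effect instance's own HRTF setting. Names are illustrative only.
struct GlobalSettings
{
    bool disableHrtfGlobally = false;
};

// The instance setting expresses the sound designer's intent; the global flag
// lets the game switch HRTF off for every instance when the player uses speakers.
bool effectiveHrtf(bool instanceHrtfEnabled, const GlobalSettings& settings)
{
    return instanceHrtfEnabled && !settings.disableHrtfGlobally;
}
```

The per-instance flag still covers events that should never be binauralized, while the single global flag carries the player’s speakers/headphones choice.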

Jumping in here because Jeff is away at a conference this week.

The differentiation between stereo speakers and stereo headphones isn’t one we can make effectively by only looking at the connected hardware. It really needs to be a user choice, which results in game settings and ultimately parameters within FMOD effects.

The global override sounds like a good solution to this problem, it gives you a single setting that can be wired up to the game settings allowing the user to choose.

Thanks a lot for chiming in!

Thanks for your response! Just to clarify, I’m not asking FMOD to automatically detect headphones vs. speakers. I’m proposing a way for FMOD to differentiate internally between stereo for headphones and stereo for speakers.

This would allow FMOD plugins, such as spatializers and potentially even some built-in effects, to treat these formats differently (e.g., in terms of HRTF processing) without requiring complex workarounds or changes to the plugin’s source code, which isn’t possible with all spatializers.

I see, thanks for the clarification. There is a larger task that we are hoping to tackle in the more distant future that would satisfy this, but nothing on the near horizon that will help. We have plans to support more complex speaker modes, allowing different speaker modes to use the same channel count. Currently, every channel count maps to exactly one speaker mode, e.g. 6 = 5.1, 8 = 7.1. In the future, 4 could mean either quad or first-order ambisonics, and 2 could likewise mean either stereo or headphones. FMOD would still need to be told which one to produce (speakers or headphones), but the spatializer could then receive that information directly without needing custom settings.