Surround to Stereo Downmixing in v2.02: discrepancies between auditioning stereo in Studio vs in Engine

This is a question that was already brought up but it seems like it still stands.

There is a very distinct difference between auditioning a 5.1 mix in Studio through a stereo Windows setup and forcing the Master Bus to Stereo.

According to that previous post, the answer could have been that two different downmix techniques were employed (ATSC and SRS). However, the same post suggests that SRS was deprecated with v1.11.

In other words, in the first case I am hearing LtRt and in the second I am hearing LoRo.

Is my understanding correct?

SRS is no longer available in FMOD. We replaced it with a Dolby PL2 downmix, but that downmix is opt-in via FMOD_INIT_PREFER_DOLBY_DOWNMIX. The default downmix is a simple mix matrix performed by either the Core API or the Studio API.

When the master bus is set to 5.1 and the output device is stereo, the Core API performs a downmix like this:

{ 1, 0, 0.707, 0, 0.707, 0 }
{ 0, 1, 0.707, 0, 0, 0.707 }
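To make the Core API matrix concrete, here is a minimal Python sketch (not FMOD code) that applies it to a single 5.1 sample frame. It assumes the columns follow the channel order L, R, C, LFE, Ls, Rs, which the coefficients suggest but the post does not state; `downmix_frame` is a hypothetical helper name.

```python
# Core API 5.1 -> stereo matrix as quoted above.
# Assumption: columns are ordered L, R, C, LFE, Ls, Rs.
CORE_DOWNMIX = [
    [1.0, 0.0, 0.707, 0.0, 0.707, 0.0],  # left output
    [0.0, 1.0, 0.707, 0.0, 0.0, 0.707],  # right output
]


def downmix_frame(matrix, frame):
    """Multiply one multichannel sample frame by a downmix matrix."""
    return [sum(gain * sample for gain, sample in zip(row, frame))
            for row in matrix]


# Test frames: a signal present only in the centre, and only in the LFE.
center_only = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
lfe_only = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]

print("centre ->", downmix_frame(CORE_DOWNMIX, center_only))
print("LFE    ->", downmix_frame(CORE_DOWNMIX, lfe_only))
```

Under that channel-order assumption, a centre-only signal lands equally in both outputs at 0.707 (about -3 dB), and the LFE channel is dropped entirely.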

When a Studio bus takes 5.1 input and is forced to stereo output, the Studio API performs a downmix like the following. This is equivalent to adding a panner DSP effect with 5.1 input and stereo output (center panned).

{ 0.720, 0, 0.360, 0.499, 0.578, 0.129 }
{ 0, 0.720, 0.360, 0.499, 0.129, 0.578 }
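One way to compare the two matrices is to convert each coefficient to decibels. The Python sketch below does this for the left output row of each matrix. The channel order (L, R, C, LFE, Ls, Rs) and the helper names are assumptions made for illustration, not part of the FMOD API.

```python
import math

# Assumption: columns are ordered L, R, C, LFE, Ls, Rs.
CHANNELS = ["L", "R", "C", "LFE", "Ls", "Rs"]

CORE = [
    [1.0, 0.0, 0.707, 0.0, 0.707, 0.0],
    [0.0, 1.0, 0.707, 0.0, 0.0, 0.707],
]
STUDIO = [
    [0.720, 0.0, 0.360, 0.499, 0.578, 0.129],
    [0.0, 0.720, 0.360, 0.499, 0.129, 0.578],
]


def to_db(gain):
    """Convert a linear amplitude gain to dB; zero gain maps to -inf."""
    return 20.0 * math.log10(gain) if gain > 0 else float("-inf")


def left_output_gains_db(matrix):
    """Gain in dB that each input channel contributes to the left output."""
    return {name: to_db(g) for name, g in zip(CHANNELS, matrix[0])}


core_db = left_output_gains_db(CORE)
studio_db = left_output_gains_db(STUDIO)
for name in CHANNELS:
    print(f"{name:>3}: core {core_db[name]:7.2f} dB, "
          f"studio {studio_db[name]:7.2f} dB")
```

Read this way, the center sits at roughly -3 dB in the Core matrix but roughly -8.9 dB in the Studio matrix, and the LFE is discarded by the Core matrix while the Studio matrix keeps it at about -6 dB per side.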

Thanks Mathew for the prompt response.

The first example looks like a straight LoRo matrix downmix.
Would you be able to clarify what the second matrix example means? Is it equivalent to any other specific standard?

Looking at the numbers, I am confused by why the center channel is so quiet compared to all other ones.

The first matrix is pretty standard. Consider it a mastering downmix: it happens at the end of the signal chain and is in line with what you’d expect. Within the signal chain, however, up- and downmixing is performed by our panner / spatializer. So in the case of a 5.1 → 2.0 downmix, the algorithm calculates a 2.0 → 5.1 upmix first. This is a distribution upmix, in that it treats the stereo signal as covering two hemispheres (left and right) and distributes the signal among the speakers. The resulting matrix is flipped (inputs to outputs) and used as a downmix.

We use this method because, within the signal chain, we can handle any channel count to any channel count, speakers can be disabled, and rotations / extent can be applied anywhere. The upmix algorithm uses standard VBAP to produce the resulting matrix, and flipping it gives a set of predictable downmixes (when needed).
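The "flip" step can be illustrated with a short Python sketch. It assumes that flipping means a plain transpose with no further normalisation, which the post implies but does not spell out, and it does not attempt to reproduce the VBAP calculation itself. Starting from the Studio downmix quoted earlier, transposing recovers the 2.0 → 5.1 upmix matrix it would have been derived from. The channel order L, R, C, LFE, Ls, Rs is also an assumption.

```python
# Studio 5.1 -> stereo matrix as quoted earlier in the thread
# (2 output rows x 6 input columns).
STUDIO_DOWNMIX = [
    [0.720, 0.0, 0.360, 0.499, 0.578, 0.129],
    [0.0, 0.720, 0.360, 0.499, 0.129, 0.578],
]

# Assumption: 5.1 channels are ordered L, R, C, LFE, Ls, Rs.
CHANNELS = ["L", "R", "C", "LFE", "Ls", "Rs"]


def transpose(matrix):
    """Swap inputs and outputs: rows become columns."""
    return [list(column) for column in zip(*matrix)]


# Implied stereo -> 5.1 upmix: 6 output rows x 2 input columns.
implied_upmix = transpose(STUDIO_DOWNMIX)

for name, (from_left, from_right) in zip(CHANNELS, implied_upmix):
    print(f"{name:>3} = {from_left:.3f} * left_in + {from_right:.3f} * right_in")
```

In the implied upmix, each stereo input feeds mostly the speakers on its own side, with a small share reaching the opposite surround. That is consistent with the "two hemispheres" description.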

Usually the only place you have a downmix is at the final output. At that point we have known speaker modes in and out and can apply traditional, well-known algorithms.