What is the ideal mixing process like, from mono to 3D?

I've been doing music for 25 years and am now embarking on an Unreal Engine journey.
Typically, to get the best balance in a music mix, you start in mono to ensure the balance is good regardless of the end user's configuration, and then pan the tracks after all effects have been applied and the faders and level automation are in the box.
My project will use 3D audio over headphones quite a bit, and I'm curious how seasoned professionals approach this when integrating FMOD with a game, especially with regard to phase cancellation.
If the game can't deliver the mix in mono, it means we're compensating for poor initial sound design and might end up with a muddy mix that breaks cheap speakers and headphones.
Is this a good process, or should I be thinking about 3D panning from the get-go, even though it's merely for placement in the environment (bullets flying past your ears, for example)?
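For concreteness, here's roughly what I understand the integration side to look like: the asset itself stays mono, and placement comes entirely from 3D attributes set at runtime. This is just a minimal sketch with the FMOD Studio C++ API; the bank and event path ("event:/Weapons/BulletFlyby") are hypothetical, and the event's master track is assumed to be routed through a spatializer in FMOD Studio.

```cpp
// Minimal sketch: a mono-authored event placed in 3D at runtime.
// The bank and event path below are hypothetical examples.
#include "fmod_studio.hpp"

int main()
{
    FMOD::Studio::System* system = nullptr;
    FMOD::Studio::System::create(&system);
    system->initialize(512, FMOD_STUDIO_INIT_NORMAL, FMOD_INIT_NORMAL, nullptr);

    FMOD::Studio::Bank* bank = nullptr;
    system->loadBankFile("Master.bank", FMOD_STUDIO_LOAD_BANK_NORMAL, &bank);

    FMOD::Studio::EventDescription* description = nullptr;
    system->getEvent("event:/Weapons/BulletFlyby", &description);

    FMOD::Studio::EventInstance* instance = nullptr;
    description->createInstance(&instance);

    // The asset stays mono; placement comes from these 3D attributes,
    // which the spatializer turns into panning/HRTF at the output stage.
    FMOD_3D_ATTRIBUTES attributes = {};
    attributes.position = { 2.0f, 0.0f, 1.0f };  // just right of the listener
    attributes.forward  = { 0.0f, 0.0f, 1.0f };
    attributes.up       = { 0.0f, 1.0f, 0.0f };
    instance->set3DAttributes(&attributes);

    instance->start();
    instance->release();  // frees the instance once playback finishes

    // In a real game this is pumped once per frame in the game loop.
    system->update();

    system->release();
    return 0;
}
```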

There's no hard and fast rule for mixing music into your game. There are a lot of features available that can help your music sit comfortably in the mix, such as sidechaining and the multiband EQ. If you add the Loudness Meter to the master bus in the mixer, you can see whether your mix is getting too loud. As long as you're within your platform's suggested LUFS range, you shouldn't need to worry about breaking cheap speakers or headphones.
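If you also want a rough sanity check at runtime, the Core API exposes peak/RMS metering on any channel group. To be clear, this is only a sketch and not the LUFS reading itself; that comes from the Loudness Meter in the Studio mixer. It also assumes the master bus's channel group already exists because audio has been played through it:

```cpp
// Minimal sketch: runtime peak metering on the master bus as a rough
// clipping check (peak/RMS only, not LUFS).
#include "fmod_studio.hpp"
#include "fmod.hpp"

void checkMasterLevels(FMOD::Studio::System* system)
{
    // Resolve the master bus and its underlying Core channel group.
    FMOD::Studio::Bus* masterBus = nullptr;
    system->getBus("bus:/", &masterBus);

    FMOD::ChannelGroup* group = nullptr;
    masterBus->getChannelGroup(&group);  // only valid once the bus exists
    if (group == nullptr)
        return;

    // Enable output metering on the head DSP of the master group.
    FMOD::DSP* head = nullptr;
    group->getDSP(FMOD_CHANNELCONTROL_DSP_HEAD, &head);
    head->setMeteringEnabled(false, true);

    FMOD_DSP_METERING_INFO levels = {};
    head->getMeteringInfo(nullptr, &levels);

    for (int ch = 0; ch < levels.numchannels; ++ch)
    {
        // Peaks at or above 1.0f mean the master output is clipping.
        if (levels.peaklevel[ch] >= 1.0f)
        {
            // Log or flag the hot channel here.
        }
    }
}
```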