FMOD Programmer Sound and memory


I am using Programmer Sounds for all the voice lines in the game (over 10,000 lines).

After looking at how each line is called by code here:

Do I need to create separate banks for memory optimization? (Like grouping every 200 lines, which would result in 50 banks?)
It seems like it’s based on streaming, so would it be okay to just use 1 bank for all?

Thank you in advance.


There are a few factors to keep in mind when thinking about optimizing your audio table. The first applies to the whole project: the way you’re handling your audio assets - their format, size, loading mode, whether they’re set to stream, etc. The same principles apply to the assets in your audio table as well. Take a look at the Studio documentation on Compression and Platform Encoding, as well as the Core API Glossary entry on Sounds, for more info.

The second is the size of the audio table itself, and by extension the bank that it’s stored in - while asset/entity lookup for the bank and audio table should remain fairly fast, a larger bank will increase the amount of data loaded into memory at any one time, and will also make loading/unloading the bank slower. If you are concerned about optimization and are noticing some kind of negative impact from the 10,000+ audio files that you’re loading at once, I would recommend splitting your audio tables into separate banks.

Your audio table assets won’t be streamed unless they’re set to stream in FMOD Studio. Setting them to stream will decrease the amount of memory being used at once, which may make it more suitable to use a single bank for all of your voice lines, but streams do come with the downside of increased CPU usage.

Streams also have behavior that may make them undesirable for voice lines - as they’re loaded and played in real-time, it’s impossible to schedule streaming assets to be played with exact, sample-accurate timing, and they can also increase your audio latency. This can occasionally make them unsuitable if you’re trying to sync them up with in-game behavior (i.e. lip syncing), especially if you’re doing a lot with other audio at once, or working with less powerful hardware.
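For reference, with programmer sounds the streaming setting ultimately flows through the sound you create yourself in the CREATE_PROGRAMMER_SOUND callback - the loading mode you chose in FMOD Studio comes back in the sound info’s `mode` field, so passing it through (rather than hard-coding a mode) keeps the Studio-side stream/compressed-sample choice in effect. A minimal sketch of such a callback, assuming a global `gStudioSystem` pointer obtained elsewhere (that name is my assumption, not from this thread):

```cpp
#include <fmod_studio.hpp>
#include <fmod.hpp>

// Assumed to be initialized elsewhere in your engine code.
extern FMOD::Studio::System *gStudioSystem;

FMOD_RESULT F_CALLBACK ProgrammerSoundCallback(
    FMOD_STUDIO_EVENT_CALLBACK_TYPE type,
    FMOD_STUDIO_EVENTINSTANCE *event,
    void *parameters)
{
    if (type == FMOD_STUDIO_EVENT_CALLBACK_CREATE_PROGRAMMER_SOUND)
    {
        auto *props =
            static_cast<FMOD_STUDIO_PROGRAMMER_SOUND_PROPERTIES *>(parameters);

        // Look up the audio table entry by key (e.g. the dialogue line ID).
        FMOD_STUDIO_SOUND_INFO info;
        gStudioSystem->getSoundInfo(props->name, &info);

        // info.mode already carries the loading mode set in FMOD Studio
        // (it includes FMOD_CREATESTREAM if the asset is marked to stream),
        // so pass it through instead of hard-coding a mode.
        FMOD::System *coreSystem = nullptr;
        gStudioSystem->getCoreSystem(&coreSystem);

        FMOD::Sound *sound = nullptr;
        coreSystem->createSound(info.name_or_data,
                                info.mode | FMOD_NONBLOCKING,
                                &info.exinfo, &sound);

        props->sound = reinterpret_cast<FMOD_SOUND *>(sound);
        props->subsoundIndex = info.subsoundindex;
    }
    else if (type == FMOD_STUDIO_EVENT_CALLBACK_DESTROY_PROGRAMMER_SOUND)
    {
        auto *props =
            static_cast<FMOD_STUDIO_PROGRAMMER_SOUND_PROPERTIES *>(parameters);
        reinterpret_cast<FMOD::Sound *>(props->sound)->release();
    }
    return FMOD_OK;
}
```

This is just a sketch of the usual pattern from the FMOD examples; error checking is omitted for brevity.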

Hope this helps!

This is very helpful.

Each bank will have about 3,800 lines (the 10,000 lines were the total across all 3 localized voice sets).
I am planning to separate them into 3 banks: bank_en, bank_ja, bank_kr (English, Japanese, Korean).

Then each bank will have 3,800 voice lines with key strings something like “Tifa_6_20”.

Again, it probably depends on the audio usage of the game, but if I am only using 4-5 BGM + SFX at a time while a dialogue voice is playing, would you prefer to separate them out, or use 1 bank with 3,800 lines? (Only 9 lines are lip-synced, so maybe I will separate those into a different bank.)

A better way to put this question is:
Essentially, the CPU overhead of a bank with 3,800 streaming voice lines is the string comparison to find the key in the audio table, correct? (e.g. finding the key “Tifa_5_21”)

Small correction: there’s no way to “stream” a bank. Even if all assets in a bank are set to “stream”, you will still need to load the bank itself to play the streaming assets - the streaming assets won’t be loaded into memory until actually needed, but metadata regarding them will.

That said, the string comparison uses a radix tree, and as such shouldn’t incur too much of a CPU hit even with a large amount of audio table entries. The CPU overhead from actually playing back the streamed assets is probably more important to consider.
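To make the distinction concrete, here is a hedged sketch of the load path described above - loading a bank pulls in its metadata (including the audio table’s key index), while a key lookup only fetches the entry’s metadata, not the asset itself. The bank filename and key below are examples, not from the actual project:

```cpp
#include <fmod_studio.hpp>

void LoadDialogueBank(FMOD::Studio::System *studioSystem)
{
    // Loading the bank brings its metadata into memory, but not the
    // sample data of assets that are set to stream.
    FMOD::Studio::Bank *bank = nullptr;
    studioSystem->loadBankFile("bank_en.bank",
                               FMOD_STUDIO_LOAD_BANK_NORMAL, &bank);

    // The key lookup walks the radix tree; it only resolves metadata
    // about the entry - it does not load or play the asset.
    FMOD_STUDIO_SOUND_INFO info;
    studioSystem->getSoundInfo("Tifa_5_21", &info);
}
```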

Personally, I would separate audio tables into different banks, both to minimize memory usage when loading/unloading banks, and to compartmentalize them by some measure of relevancy to create a better workflow. However, I can’t comment on your project - you know it far better than I do, so you’re in a better position to judge whether separating your tables into different banks would be appropriate. If you aren’t running into any issues with loading/unloading banks, streaming latency, etc. then it might be fine as is - there’s not much point in pre-emptively optimizing, especially if you’ll need to spend time refactoring your voiceline playback scripts and adding a bunch more bank loads.