Creating Sound From Byte Array Is Corrupted

Working in C# here, I’m experimenting with the idea of creating sounds using byte arrays loaded from custom files.

As a prototype, I’m loading an existing .wav file, reading the byte array data from it, and then creating a new sound from that same byte array. The new sound is corrupted, though: it’s similar to the original, but it sounds as if the volume is set far too high, so it’s breaking up.

I’ve grabbed the frequency, sound type, sound format and the channels from the original sound and passed them to the CREATESOUNDEXINFO structure, which I pass to createSound().

I’ve tried various options for the mode when creating the new sound and all I get is either the same result or errors in the return.

The only thing I can think of that might be a cause is that I’m not setting the bits per sample, as I can’t find where to do that, or even where to get that info from the original sound.

Does anyone have any suggestions for what might be the issue?

The usual culprit for a sound distorting like you’ve described is indeed the bit depth or format of the samples. FMOD uses CREATESOUNDEXINFO.format to interpret the bit depth/format of PCM data when using OPENRAW or OPENUSER. If you need accurate info on the file you’re loading, I’d recommend using a tool like MediaInfo to check, and matching up the CREATESOUNDEXINFO struct to that info. You’ll also want to ensure that the correct sound creation mode is being used, which is probably OPENUSER in this case.

If this doesn’t solve the issue, could I get you to provide a few things?

  • Your exact FMOD Engine or FMOD for Unity version number
  • The format of the sound file you’re trying to load (a screenshot from MediaInfo should do)
  • A code snippet demonstrating how you’re loading the sample data to an array and creating the sound

Thanks!

Thanks for the reply, yeah I probably should’ve included a code example to begin with.

Not sure on the right way to get the FMOD version number, but the version I get from getVersion() is 131622.
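
For reference, that value looks like FMOD's packed version number. Assuming the 0xaaaabbcc layout (product version in the upper 16 bits, then major and minor versions in one byte each), 131622 is 0x00020226, which reads as 2.02.26. A minimal sketch of the decoding follows; `FmodVersionDecoder` is a hypothetical helper name, not part of the FMOD API:

```csharp
using System;

public static class FmodVersionDecoder
{
    // Assumption: the value is packed as 0xaaaabbcc, where
    // aaaa = product version, bb = major version, cc = minor version.
    public static string Decode(uint packed)
    {
        uint product = packed >> 16;        // upper 16 bits
        uint major = (packed >> 8) & 0xFF;  // next 8 bits
        uint minor = packed & 0xFF;         // lowest 8 bits

        // Each field is conventionally read as hex digits, e.g. 0x26 -> "26".
        return $"{product:X}.{major:X2}.{minor:X2}";
    }
}
```

Calling `FmodVersionDecoder.Decode(131622)` returns `"2.02.26"`.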

To preface the code, there are two things stored elsewhere:
1 - _FMod_System is the FMOD.System I’ve created
2 - State is simply a RESULT for storing error results

The function I’ve created is intended to (eventually) load a sound, extract the data needed to recreate it, and pass all of that back out.

I’ve included the screenshot from MediaInfo below, though there isn’t much to it.

I notice you mentioned OPENRAW ‘or’ OPENUSER. I’m actually using both, as I thought they were both needed. Should I be using only one?

Regarding the CREATESOUNDEXINFO.format, I’m using getFormat from the loaded sound to get both the SOUND_TYPE and SOUND_FORMAT and passing those into CREATESOUNDEXINFO.

The key thing I can say is, if I change the MODE.OPENONLY to MODE.DEFAULT and return that sound instead, it plays fine.

Here’s the code:

    public static byte[] Load_Sound_Data(string File_Name, out Sound New_Sound, out CREATESOUNDEXINFO Sound_Data)
    {
        State = _FMod_System.createSound(File_Name, MODE.OPENONLY, out Sound T_Sound);
        if (State != RESULT.OK) { System.Console.WriteLine("File could not be loaded: " + State); State = RESULT.OK; New_Sound = new(); Sound_Data = new(); return null; }

        _ = T_Sound.getLength(out uint Length, TIMEUNIT.PCMBYTES);
        byte[] Data = new byte[Length];
        _ = T_Sound.readData(Data);
        _ = T_Sound.getDefaults(out float Frequency, out _);
        _ = T_Sound.getFormat(out SOUND_TYPE Sound_Type, out SOUND_FORMAT Sound_Format, out int Channels, out _);
        _ = T_Sound.release();

        unsafe
        {
            Sound_Data = new()
            {
                length = Length,
                cbsize = sizeof(CREATESOUNDEXINFO),
                numchannels = Channels,
                defaultfrequency = (int)Frequency,
                format = Sound_Format,
                suggestedsoundtype = Sound_Type,
            };
        }

        _ = _FMod_System.createSound(Data, MODE.OPENMEMORY | MODE.OPENRAW | MODE.CREATESAMPLE | MODE._2D | MODE.LOOP_OFF | MODE.OPENUSER, ref Sound_Data, out New_Sound);

        return Data;
    }

The two modes are actually incompatible, and this is likely the source of your issue.

OPENRAW is used to tell FMOD to treat the sample data you’re providing as raw PCM, and that you’ll tell FMOD how to interpret the PCM data in the CREATESOUNDEXINFO struct. OPENUSER, on the other hand, disregards any path or buffer you pass in, and assumes you’ll either use Sound.lock() or a custom read callback to provide the sample data. In the case where you use both modes, OPENRAW will be dropped, which would cause the distortion you’re hearing - FMOD is likely interpreting garbage memory as sample data when playing the sound.

In your case, since you are providing a buffer and associated info, try dropping the use of OPENUSER and seeing whether that resolves the issue.

Ok, so I removed OPENUSER and I’m still getting exactly the same issue.

In case it affects any of this, I’ve grabbed the values I’m getting from the original sound and passing into CREATESOUNDEXINFO, and to me they appear to match what MediaInfo is showing.

Sound_Type: WAV
Sound_Format: PCM8
Frequency: 16000

I’ve also noticed that MediaInfo states the audio stream is unsigned. Is it possible that when I’m creating the sound the data is being interpreted as signed, and that this is causing the issue? If that is the case, how could I fix it?

The only detail in MediaInfo that I can’t find in any of the data I get from the sound in my code is the 128kb/s, but I suspect that’s because it’s derived from the 16 kHz sample rate and 8-bit format? I’m not 100% sure on that point.

I thought it might be helpful if I shared a video with the audio as I’m hearing it, so I’ve done that and linked it below.

Video Link

Thanks for the video demonstrating the issue.

The bitrate of 128kb/s is indeed just the product of 16 kHz * 8 bits, and doesn’t need to be provided anywhere.
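
As a quick sanity check of that arithmetic, here is a small sketch. `PcmBitrate` is a hypothetical helper, and the channel count of 1 is an assumption (it is what makes the numbers line up with the 128kb/s shown by MediaInfo):

```csharp
using System;

public static class PcmBitrate
{
    // Uncompressed PCM bitrate = sample rate * bits per sample * channel count.
    public static int BitsPerSecond(int sampleRateHz, int bitsPerSample, int channels)
    {
        return sampleRateHz * bitsPerSample * channels;
    }
}
```

With these assumed values, `PcmBitrate.BitsPerSecond(16000, 8, 1)` returns `128000`, i.e. 128kb/s.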

The signed/unsigned mismatch is probably the issue, yes - convention is usually that PCM is signed. There are a bunch of ways to address this, but fundamentally it involves shifting the range of the data from the unsigned range of 0 to 255 to the signed range of -128 to 127. Performing this operation for the whole array might look something like this:

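    // Requires: using System.Linq;
    byte[] Data = new byte[Length];
    /*
        sample data gets placed into Data array with sound.readData()...
    */
    // Subtract 128 (byte.MaxValue - sbyte.MaxValue) to shift 0..255 into -128..127
    sbyte[] SignedData = Data.Select(x => (sbyte)(x - (byte.MaxValue - sbyte.MaxValue))).ToArray();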

That has solved the issue, the sound created from the createSound() function is no longer over-loud and corrupted!

I was wondering if there was a setting somewhere to state that the data is signed rather than unsigned, but if altering the data fixes it, I’ll go with that for now.

First I did try your example code but the createSound() function does not accept an sbyte array, so instead I simply did this:

    for (int z = 0; z < Length; z++) { Data[z] += 128; }

Thankfully, adding 128 to a byte using += wraps the value automatically without needing any extra trickery.

I suspect this is going to be an issue specific to the sound files I’m using. They were extracted from an old console game using a tool I did not write, so it’s unlikely to arise with sounds from other sources.

I do find it odd that the data is expected to be signed but the datatype used is unsigned.

It’s also odd that the sound works perfectly fine when loading it directly from the file but the data extracted from that sound needs to be altered to be able to work correctly.

Thank you for your help and your patience.

No problem!

Sorry, yes - I used sbyte to communicate the change to a signed data type. Keeping it as a byte array is fine since FMOD will interpret the data as signed regardless of the datatype being byte.

As an aside, using a bitwise operator to convert from unsigned to signed should be more efficient, i.e. Data[z] ^= 128.
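
To illustrate why the two approaches are interchangeable for 8-bit samples, here is a small sketch. `Pcm8Converter` and its method names are hypothetical, not part of the FMOD API. It flips the top bit of each sample in place, and separately checks that adding 128 with wraparound gives the same byte as XOR-ing with 128 for every possible value:

```csharp
using System;

public static class Pcm8Converter
{
    // Converts unsigned 8-bit PCM (offset binary, 0..255) to the bit pattern
    // of signed 8-bit PCM (two's complement, -128..127) by flipping the top bit.
    public static void UnsignedToSignedInPlace(byte[] data)
    {
        for (int i = 0; i < data.Length; i++)
        {
            data[i] ^= 0x80;
        }
    }

    // Returns true if wrapping addition of 128 and XOR with 128 produce the
    // same result for all 256 byte values.
    public static bool AddAndXorAgree()
    {
        for (int v = 0; v <= byte.MaxValue; v++)
        {
            byte added = unchecked((byte)(v + 128)); // explicit wraparound
            byte xored = (byte)(v ^ 0x80);
            if (added != xored)
            {
                return false;
            }
        }
        return true;
    }
}
```

One practical difference: `+=` on a byte relies on unchecked arithmetic (the C# default) to wrap, and would throw an OverflowException if the project is compiled with overflow checking enabled, whereas the XOR form cannot overflow.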

After doing some digging, it seems this comes down to the distinction between loading from the file and loading raw PCM integer data. For example, according to the WAVE file spec, a .WAV file containing PCM data expects 1-8 bit PCM data to be unsigned:

  • PCM data is two’s-complement except for resolutions of 1-8 bits, which are represented as offset binary.

so FMOD will follow this specification when loading the file, which is why playing from the file works without an issue. However, raw PCM integer data doesn’t come with a header or expected file format, so FMOD expects raw PCM integer data to be signed - in this case, a linear remapping from unsigned to signed is needed.

This info isn’t present in our docs at the moment, so I’ve flagged the relevant pages for improvements internally.
