Hello!
I know this is an older post, but I stumbled upon it because I had a similar issue with the fMP4 format while doing some bug triaging on the Jellyfin app for Roku. Currently, if we have an AV1 video file with 5.1 surround audio that needs to be converted to a different format or number of channels, the Jellyfin server will convert the audio, leave the video untouched, and repackage everything to stream to the Roku. We have found that the video plays fine, but there is no audio. This seems to be in line with the problem you were having.
I saw that you last posted a command line that would send the audio in a separate stream. Did you ever roll this into a production solution that worked, and if so, could you share anything that might be helpful? Thanks!