Alan Allard for Eyevinn Video Dev-Team Blog

Working with audio in ffmpeg

Everyone working in video streaming development knows about ffmpeg, and most of us use it from time to time in our daily assignments. It's also well known that ffmpeg has a whole bunch of functionality for audio. I suspect fewer of us delve into that aspect, though, beyond perhaps converting the occasional audio track from one format to another. Ffmpeg goes way beyond that, offering a plethora of conversion, analysis and even sound generation tools.

To recap very briefly, a typical ffmpeg command consists of one or more inputs (-i), any number of operations and one or more outputs. (There are many more aspects to the command syntax, though - see the ffmpeg docs for the full spec.)

ffmpeg -i bunny.mp4 -c:v libvpx-vp9 -c:a libvorbis bunny.webm

The power lies in what the inputs can be and in the operations, which are extremely numerous, chainable - and also extensible (that's something we will address in the next article...). Here is a slightly more complex example, by a colleague of mine:

# Low Latency ScreenGrab
ffmpeg -f avfoundation -capture_cursor 1 -i 3 -r 30 -c:v libx264 -an -tune zerolatency -preset ultrafast -pix_fmt yuv420p -fflags flush_packets -f mpegts - | mpv - --no-cache --untimed --no-demuxer-thread --video-sync=audio --vd-lavc-threads=1

There, we are using the screen as an input. (This is quite a specific command: avfoundation is the macOS capture framework, and the device index will need adjusting when running on another machine.)

Download FFmpeg and try the following examples. I am using a track snippet that you can download from Freesound.

Audio conversion

A video streaming developer might work with encoding or transcoding various audio tracks for video. For example, you might want to convert from stereo to mono:

ffmpeg -i critics.flac -ac 1 critics_mono.flac

or 5.1 to LR stereo (here is the same file mixed as 5.1 so you can try it out):

ffmpeg -i critics_surround.wav -ac 2 critics.wav

or converting formats, which can be even simpler, for example to mp3 or flac:

ffmpeg -i critics.wav critics.mp3
ffmpeg -i critics.wav critics.flac

It's also worth mentioning the ffprobe tool that is typically bundled with ffmpeg. Use it on any of the above files to see all the available file info. In the case of the surround example, it reveals my song metadata from Logic Pro X:

Input #0, wav, from 'critics_surround.wav':
  Metadata:
    encoded_by      : Logic Pro X
    date            : 2022-02-25
    creation_time   : 12:04:39
    time_reference  : 163250181
    umid            : 0x0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000004500FA50
    coding_history  : 
  Duration: 00:00:15.02, bitrate: 4237 kb/s
  Chapters:
    Chapter #0:0: start 0.000000, end 15.018662
      Metadata:
        title           : Tempo: 132.0
    Chapter #0:1: start 0.000000, end 0.113628
      Metadata:
        title           : Bridge 1
    Chapter #0:2: start 0.113628, end 14.545442
      Metadata:
        title           : Chorus
    Chapter #0:3: start 14.545442, end 15.018662
      Metadata:
        title           : Chorus w/ Vox
  Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, 5.1, s16, 4233 kb/s
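
When you only need one field rather than the full dump, ffprobe can be asked for specific entries. The sketch below prints just the channel count of the first audio stream. The helper name channel_count and the temporary file path are my own choices for illustration, and the test tone is generated with ffmpeg's built-in sine source so that no input file is assumed:

```shell
# Sketch: print only the channel count of the first audio stream.
# channel_count is a hypothetical helper name, not part of ffmpeg.
channel_count() {
  ffprobe -v error -select_streams a:0 \
    -show_entries stream=channels \
    -of default=noprint_wrappers=1:nokey=1 "$1"
}

# Only run when both tools are installed. The tone comes from the lavfi
# sine source and is forced to two channels with -ac 2.
if command -v ffmpeg >/dev/null 2>&1 && command -v ffprobe >/dev/null 2>&1; then
  tone="${TMPDIR:-/tmp}/probe_sketch_tone.wav"
  ffmpeg -v error -y -f lavfi -i "sine=frequency=440:duration=1" -ac 2 "$tone"
  channel_count "$tone"
fi
```

Running channel_count on critics_surround.wav instead should report the six channels of the 5.1 layout shown above.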

While we're playing around with various channels, what happens if you want to convert from left-right stereo to mid-sides stereo...? Well, then you'll want to make use of a filter...

Filters

Simple audio filters are invoked with the -af flag. A filter in this case is a tool with an input and output that manipulates the incoming data in some way. Some examples of when you can make use of audio filters are as follows.

  • When you want to fade out some audio:
ffmpeg -i critics.wav -af 'afade=t=out:st=4:d=9' critics_faded.wav
  • To create a fairly naive stereo-widening effect by delaying the left and right channels by different, small amounts:
ffmpeg -i critics.wav -af 'adelay=5|0|10' critics_wide.wav
  • To create a flange effect using an echo algorithm with a very short delay time and regeneration:
ffmpeg -i critics.wav -af 'aecho=0.8:0.88:6:0.9' critics_echo_flange.wav
  • To shift every frequency down by 2400 Hz (afreqshift works in Hz rather than in musical intervals, which is why this one sounds wild...):
ffmpeg -i critics.wav -af 'afreqshift=shift=-2400' critics_shifted.wav
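
A note on the afreqshift example: the filter moves every frequency by a fixed number of Hz, which is not the same as transposing by a musical interval. The arithmetic below is a small sketch of the difference, using a made-up tone at 3000 Hz with a harmonic one octave above it (both values are assumptions chosen to keep the numbers simple):

```shell
# A made-up tone at 3000 Hz with a harmonic one octave above, at 6000 Hz.
fundamental=3000
harmonic=6000
shift=-2400   # same value as in the afreqshift example

# Frequency shift: the same number of Hz is added to every component.
shifted_fundamental=$((fundamental + shift))
shifted_harmonic=$((harmonic + shift))
echo "frequency shift: $shifted_fundamental Hz and $shifted_harmonic Hz"

# Pitch shift down two octaves: every component is divided by 4 (2^2).
pitched_fundamental=$((fundamental / 4))
pitched_harmonic=$((harmonic / 4))
echo "pitch shift:     $pitched_fundamental Hz and $pitched_harmonic Hz"
```

After the frequency shift the two components sit at 600 Hz and 3600 Hz, so the harmonic is no longer an octave above the fundamental. A true pitch shift keeps the 2:1 ratio (750 Hz and 1500 Hz), which is why the shifted file sounds inharmonic rather than simply lower.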

So back to the stereo conversion we wanted to do. We could use the stereotools audio filter for that:

ffmpeg -i critics.wav -af 'stereotools=mode=lr>ms' critics_ms.wav
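
For anyone unfamiliar with mid-side encoding, here is a rough sketch of the idea using the common convention mid = (L + R) / 2 and side = (L - R) / 2. I am assuming that convention for illustration; the exact scaling stereotools applies internally may differ. The sample values are made up, and the last line writes the same matrix for ffmpeg's pan filter as an alternative you could pass to -af:

```shell
# One stereo sample pair, as integers for easy arithmetic (made-up values).
left=6
right=2

# Encode: mid is the average of the channels, side is half their difference.
mid=$(( (left + right) / 2 ))
side=$(( (left - right) / 2 ))
echo "mid=$mid side=$side"

# Decode: the original channels come back by adding and subtracting.
decoded_left=$((mid + side))
decoded_right=$((mid - side))
echo "left=$decoded_left right=$decoded_right"

# The same matrix written for the pan filter: channel 0 carries mid,
# channel 1 carries side.
ms_pan='pan=stereo|c0=0.5*c0+0.5*c1|c1=0.5*c0-0.5*c1'
echo "$ms_pan"
```

The point of the round trip is that nothing is lost: mid holds what the two channels share and side holds what differs between them, so they can be processed separately and converted back.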

Complex filters

If you want to work with several inputs and/or outputs, you will need complex filters, which are invoked with the -filter_complex option instead of -af. Here is an example that outputs three separately filtered audio files from the original source:

ffmpeg -i critics.wav -filter_complex 'acrossover=split=900 7000:order=8th[LOW][MID][HIGH]' -map '[LOW]' critics_low.wav -map '[MID]' critics_mid.wav -map '[HIGH]' critics_high.wav

And an example - with multiple inputs - of performing a crossfade of two of the earlier files:

ffmpeg -i critics.wav -i critics_shifted.wav -filter_complex acrossfade=d=10:c1=exp:c2=exp critics_xfade.wav

You can chain filters inside -filter_complex one after the other, separating the steps with semicolons and passing each result along through a label in square brackets:

ffmpeg -i critics.wav -filter_complex 'afreqshift=shift=-1200[shifted];[shifted]chorus=0.5:0.9:50|60|40:0.4|0.32|0.3:0.25|0.4|0.3:2|2.3|1.3[chorused];[chorused]areverse' critics_destroyed.wav

To break this down into its three parts:

  1. afreqshift=shift=-1200[shifted] - shifts every frequency in the audio down by 1200 Hz
  2. [shifted]chorus=0.5:0.9:50|60|40:0.4|0.32|0.3:0.25|0.4|0.3:2|2.3|1.3[chorused] - takes the output of the shifted filter and applies chorus to it
  3. [chorused]areverse - takes the output of the chorused filter and reverses it
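
Because this particular chain is strictly linear (one input, one output, each filter feeding the next), the labels are optional: ffmpeg also accepts the filters separated by commas. The sketch below builds both forms from the same three filter strings and prints the resulting commands. The variable names are mine, chosen for illustration:

```shell
# The three filters from the breakdown above (arguments unchanged).
shift_f='afreqshift=shift=-1200'
chorus_f='chorus=0.5:0.9:50|60|40:0.4|0.32|0.3:0.25|0.4|0.3:2|2.3|1.3'
reverse_f='areverse'

# Labelled form, as used with -filter_complex.
labelled="${shift_f}[shifted];[shifted]${chorus_f}[chorused];[chorused]${reverse_f}"

# Comma form: each filter feeds the next, so no labels are needed.
linear="${shift_f},${chorus_f},${reverse_f}"

echo "ffmpeg -i critics.wav -filter_complex '$labelled' critics_destroyed.wav"
echo "ffmpeg -i critics.wav -af '$linear' critics_destroyed.wav"
```

Labels become necessary once the graph branches or merges, since that is the only way to say which output feeds which input.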

You can do things like this on parallel inputs and outputs, then mix them together:

ffmpeg -i critics.wav -i critics_low.wav -filter_complex '[0]afreqshift=shift=-1200[shifted];[shifted]chorus=0.5:0.9:50|60|40:0.4|0.32|0.3:0.25|0.4|0.3:2|2.3|1.3[chorused];[chorused]areverse[reversed];[1]afreqshift=shift=-3600[shifted2];[reversed][shifted2]amix=inputs=2' critics_annihilated.wav

Notice the similarity to the previous statement, but here we have added two input files and a new filter path for the second file. We add a [0] to the first audio path to capture the input from the first file then:

  1. [1]afreqshift=shift=-3600[shifted2]; - captures the input from the second audio file and shifts it down by 3600 Hz.
  2. [reversed][shifted2]amix=inputs=2 - combines the result of both audio flows into one file.
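
If you want to experiment with the two-branch structure without any source files, the following sketch mixes two generated sine tones instead. The frequencies, durations, volume values and output path are arbitrary assumptions on my part. The duration=shortest option ends the mix when the shorter input ends, whereas amix otherwise runs for as long as the longest input:

```shell
# Two labelled branches mixed into one output, mirroring the structure above.
graph='[0]volume=0.5[a];[1]volume=0.5[b];[a][b]amix=inputs=2:duration=shortest'
out="${TMPDIR:-/tmp}/amix_sketch.wav"

# Only run when ffmpeg is installed. Both inputs come from the lavfi
# sine source: a 2 second tone at 440 Hz and a 3 second tone at 660 Hz.
if command -v ffmpeg >/dev/null 2>&1; then
  ffmpeg -v error -y \
    -f lavfi -i "sine=frequency=440:duration=2" \
    -f lavfi -i "sine=frequency=660:duration=3" \
    -filter_complex "$graph" "$out"
fi
```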

These kinds of flows are what the ffmpeg documentation calls filtergraphs, and they can be much more complex than this if necessary. There are even tools included in the ffmpeg suite to represent such filtergraphs graphically. I hope to take a more detailed look at this in the future.

There are many more built-in audio filters available in ffmpeg. Beyond that, it's also possible to use audio plugins within ffmpeg. One of the formats you can use for this is the widely used LV2 format, the same format used in some recent audio effects hardware (e.g. Mod Devices' Mod Duo X). Support for this is not necessarily included in a typical ffmpeg installation, though. In the next article I will look at custom ffmpeg builds and getting LV2 plugins working within them.

I recommend another article on this subject that was very much the inspiration for my own. I've tried not to repeat what was said there and instead provided yet more useful examples as a complement to that article.

UPDATE: Similarly to Part 2 of this blog, which is now available, I added a frequency plot of the audio file as the blog post image using ffmpeg. I also converted the file to 16 bit, 44.1 kHz prior to plotting:

ffmpeg -i "critics (Excerpt).wav" -acodec pcm_s16le -ar 44100 critics16bit.wav

ffmpeg -i "critics16bit.wav" -lavfi \
showspectrumpic=s=1000x420:scale=sqrt:color=cividis:legend=0:gain=20 c.png
ffmpeg -i c.png -vf crop=900:600:0:0 -update 1 c1.png

Alan Allard is a developer at Eyevinn Technology, the leading independent European consultancy firm specializing in video technology and media distribution.

If you need assistance in the development and implementation of this, our team of video developers are happy to help out. If you have any questions or comments just drop us a line in the comments section to this post.
