How Do Video to Audio Converters Work?
A video to audio converter works by extracting the audio stream from a video file. The process is straightforward once you understand how video files are structured, so let's start there.
How Video Files Are Structured
Contrary to popular belief, video files are not a single piece of data. Each one is a container that holds multiple separate pieces of data, called streams.
The common streams contained in a video file include:
- Video stream: The images, or frames, that are played back, encoded with a video codec such as H.264, H.265, or VP9.
- Audio stream: The sound that accompanies the video. AAC is the most common audio codec in MP4 files, though MP3, Opus, FLAC, and uncompressed PCM audio are also used.
- Subtitle stream: Closed captions or subtitles for accessibility or translation.
- Metadata: Information about the file, such as its title, duration, chapters, and resolution.
The container format, such as MP4 (.mp4) or Matroska (.mkv), determines how these streams are packed together in one file. Each stream is compressed by a codec (short for coder-decoder), which encodes the data for storage in the container and decodes it again when the video is played.
A typical MP4 file, for example, holds an H.264 (AVC) video stream and an AAC audio stream. The single file users interact with is these separately encoded streams packaged into one container. This article has a fantastic, detailed breakdown of video files.
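To see a file's streams for yourself, FFmpeg's ffprobe tool can list them as JSON with `ffprobe -v error -show_streams -of json input.mp4`. The sketch below assumes that JSON shape (a top-level `streams` list whose entries carry `index`, `codec_type`, and `codec_name`); the sample string is abbreviated and illustrative rather than real output from a specific file, and the function name is hypothetical.

```python
import json

# Abbreviated, illustrative sample in the shape produced by:
#   ffprobe -v error -show_streams -of json input.mp4
# Real output contains many more fields for each stream.
SAMPLE_FFPROBE_JSON = """
{
  "streams": [
    {"index": 0, "codec_type": "video", "codec_name": "h264"},
    {"index": 1, "codec_type": "audio", "codec_name": "aac"},
    {"index": 2, "codec_type": "subtitle", "codec_name": "mov_text"}
  ]
}
"""


def list_streams(ffprobe_json: str) -> list[tuple[int, str, str]]:
    """Return (index, stream type, codec) for every stream in the container."""
    data = json.loads(ffprobe_json)
    return [
        (s["index"], s.get("codec_type", "unknown"), s.get("codec_name", "unknown"))
        for s in data.get("streams", [])
    ]


if __name__ == "__main__":
    # One line per stream: video, audio, and subtitles share a single file.
    for index, kind, codec in list_streams(SAMPLE_FFPROBE_JSON):
        print(f"stream {index}: {kind:<8} codec={codec}")
```

Each entry is a separate stream with its own codec, which is exactly what lets a converter pick out the audio and ignore everything else.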
What Video to Audio Converters Do
Now that we know the audio in a video file is simply one stream within the file's container, a converter only needs to extract that stream and leave the rest of the data alone.
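The most literal form of this is a stream copy: the encoded audio is copied bit-for-bit into a new, audio-only file without being decoded or re-encoded. A minimal sketch, using FFmpeg's real `-map` and `-c:a copy` options; the codec-to-extension table and the function name are hypothetical, and this is not necessarily how Vaudify or any other converter works. The output container must support the codec being copied, which is why AAC goes into `.m4a` rather than `.mp3`.

```python
# Hypothetical mapping from an audio codec name (as ffprobe reports it)
# to a file extension whose container can hold that codec unchanged.
COPY_EXTENSIONS = {
    "aac": "m4a",
    "mp3": "mp3",
    "opus": "opus",
    "vorbis": "ogg",
    "flac": "flac",
}


def stream_copy_command(src: str, audio_codec: str, stem: str = "output") -> list[str]:
    """Build an ffmpeg argument list that copies the first audio stream as-is."""
    ext = COPY_EXTENSIONS.get(audio_codec)
    if ext is None:
        raise ValueError(f"no copy target configured for codec {audio_codec!r}")
    return [
        "ffmpeg",
        "-i", src,
        "-map", "0:a:0",  # select only the first audio stream; video and subtitles are left out
        "-c:a", "copy",   # copy the encoded audio without re-encoding it
        f"{stem}.{ext}",
    ]
```

Assuming FFmpeg is installed and on your PATH, `subprocess.run(stream_copy_command("input.mp4", "aac"), check=True)` would produce `output.m4a` with the original audio untouched.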
There are three extraction approaches, which differ in how the extracted audio is stored:
Lossless extraction: The audio is compressed into a smaller file without discarding any data, so it decodes back to exactly the audio that was in the video. Sonos has a great article explaining lossless audio. Common lossless file types include FLAC, ALAC, and WMA Lossless.
Lossy extraction: The audio is compressed by permanently discarding some data, usually the details listeners are least likely to notice. The resulting files are much smaller, which makes them easier to store and less demanding on bandwidth. For this reason, lossy files are very popular for music streaming. Common lossy file types are MP3 and AAC.
Uncompressed extraction: The audio is stored without any compression, so no data is lost. The trade-off is very large files, which quickly use up hard drive space if you keep many of them. Common uncompressed file types include WAV and AIFF.
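All three approaches pull the same audio out of the container; they differ in how it is re-encoded afterwards. A minimal sketch that builds the matching FFmpeg commands, assuming an FFmpeg build that includes the libmp3lame encoder. The options `-vn`, `-sn`, and `-dn` (drop video, subtitle, and data streams) and the encoders `libmp3lame`, `flac`, and `pcm_s16le` are real FFmpeg names; the mode names, the function, and the 192k setting are illustrative.

```python
# Hypothetical table: extraction mode -> (ffmpeg audio codec arguments, file extension).
MODES = {
    # Lossy: MP3 at a fixed 192 kbps bitrate.
    "lossy": (["-c:a", "libmp3lame", "-b:a", "192k"], "mp3"),
    # Lossless: FLAC compresses without discarding any decoded audio data.
    "lossless": (["-c:a", "flac"], "flac"),
    # Uncompressed: 16-bit PCM in a WAV file (a 24-bit source would need pcm_s24le).
    "uncompressed": (["-c:a", "pcm_s16le"], "wav"),
}


def extraction_command(src: str, mode: str, stem: str = "output") -> list[str]:
    """Build an ffmpeg command that drops non-audio streams and encodes the audio per `mode`."""
    codec_args, ext = MODES[mode]
    return ["ffmpeg", "-i", src, "-vn", "-sn", "-dn", *codec_args, f"{stem}.{ext}"]
```

"Lossless" here is relative to the audio stored in the video: if that stream was AAC, the FLAC output reproduces the decoded AAC audio exactly but cannot recover anything the AAC encoder originally discarded.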
Most online converters, including Vaudify, currently use lossy extraction. Lossy may sound like a downgrade, but according to this article by Sonos, most people cannot tell the difference between lossy and lossless audio, especially at higher bitrates.
At a high bitrate, lossy extraction's smaller file sizes make it a great option, especially for music, podcasts, and lecture recordings.
What Is Audio Bitrate? (And Why It Matters for MP3 Quality)
Bitrate is the amount of data used to represent each second of audio, measured in kilobits per second (kbps). Higher bitrates preserve more detail but produce larger files.
| Bitrate | Quality | Use Case |
|---|---|---|
| 64 kbps | Low | Spoken word, voice memos |
| 128 kbps | Acceptable | General listening |
| 192 kbps | High | Music, podcasts |
| 320 kbps | Maximum (MP3) | Audiophile quality |
Vaudify outputs audio files at 192 kbps, which is a great default for all-purpose use. It sounds clean for both speech and music, and keeps files to a reasonable size. Other bitrate options will be available in future updates.
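Because bitrate measures data per second, it converts directly into file size. A worked example, assuming a constant bitrate, 1 kbps = 1,000 bits per second, 1 MB = 1,000,000 bytes, and ignoring small overheads such as tags and frame headers:

```latex
\begin{aligned}
\text{size (bytes)} &= \frac{\text{bitrate (bits/s)} \times \text{duration (s)}}{8} \\[4pt]
\text{128 kbps, 1 minute:}\quad &\frac{128{,}000 \times 60}{8} = 960{,}000 \text{ bytes} \approx 0.96\ \text{MB} \\
\text{192 kbps, 1 minute:}\quad &\frac{192{,}000 \times 60}{8} = 1{,}440{,}000 \text{ bytes} \approx 1.44\ \text{MB} \\
\text{320 kbps, 1 minute:}\quad &\frac{320{,}000 \times 60}{8} = 2{,}400{,}000 \text{ bytes} = 2.4\ \text{MB}
\end{aligned}
```

At 192 kbps, an hour-long recording therefore comes to roughly 60 × 1.44 ≈ 86 MB.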
Can Extraction Improve Audio Quality?
No. Extraction can preserve the quality that was already in the video, but if the original recording was noisy, muffled, or low-bitrate, the MP3 will be too.
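Extraction works only with the audio stream already in the file, so the audio in the video is the best the output can be. Converting to a higher bitrate or a lossless format does not restore detail that was lost when the video was recorded or encoded, and re-encoding to a lossy format can lose a little more. Video to audio converters do their best to preserve sound quality, but they cannot fix broken or low-quality audio.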
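Bit depth offers a simple way to see why lost detail cannot be recovered. The sketch below is an analogy rather than MP3 encoding: it stores a sine wave at 8 bits, then converts that result to 16 bits. The 16-bit copy has far more levels available, yet it still uses only the values that survived the 8-bit step. The helper names are hypothetical.

```python
import math


def to_levels(samples, bits):
    """Quantize samples in [-1.0, 1.0] to signed integer levels at the given bit depth."""
    scale = 2 ** (bits - 1) - 1
    return [round(s * scale) for s in samples]


def from_levels(levels, bits):
    """Convert integer levels back to floats in [-1.0, 1.0]."""
    scale = 2 ** (bits - 1) - 1
    return [lvl / scale for lvl in levels]


# One cycle of a sine wave, sampled 1,000 times, stands in for the original audio.
original = [math.sin(2 * math.pi * n / 1000) for n in range(1000)]

# A low-quality source: storing at 8 bits discards detail.
low_quality = to_levels(original, 8)

# "Upgrading" the low-quality source to 16 bits.
upgraded = to_levels(from_levels(low_quality, 8), 16)

# Count how many distinct values each version actually uses.
distinct_original_16 = len(set(to_levels(original, 16)))
distinct_low = len(set(low_quality))
distinct_upgraded = len(set(upgraded))
print(distinct_original_16, distinct_low, distinct_upgraded)
```

The upgraded copy uses exactly as many distinct values as the 8-bit source, far fewer than the original wave stored directly at 16 bits.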
Why MP3?
MP3 became popular in the late 1990s and remains the most universally supported audio format. It plays on virtually every device, operating system, music app, and podcast platform without special handling.
Newer formats like AAC and Opus offer better quality at the same bitrate, but MP3's universal compatibility makes it the practical choice for most use cases. A podcast or song exported as an AAC file may fail to upload to certain platforms, while an MP3 works everywhere without conversion, so publishing to multiple platforms is simple. Support for these newer formats is in the works for Vaudify.
What Happens to the Video?
Only the audio ends up in the output: the video stream is left out of the resulting MP3, which is why MP3 files are so much smaller than their source videos. An MP3's size depends on its duration and bitrate rather than on the size of the video, so a 500 MB MP4 might produce a 20–40 MB MP3, for example. This saves precious hard drive space, since you can often store several MP3 files in the space a single video would have taken up.
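The size estimate follows from the same bitrate arithmetic. A minimal sketch, assuming a constant-bitrate MP3 and ignoring tags and frame headers; the function names and the 25-minute, 500 MB example video are hypothetical, and 192 kbps is used as the default because it is Vaudify's stated output bitrate.

```python
def estimated_mp3_bytes(duration_seconds: float, bitrate_kbps: int = 192) -> int:
    """Approximate constant-bitrate MP3 size: bitrate x duration / 8 (bits to bytes)."""
    return round(bitrate_kbps * 1000 * duration_seconds / 8)


def share_of_video(mp3_bytes: int, video_bytes: int) -> float:
    """Fraction of the source video's size that the audio-only file occupies."""
    return mp3_bytes / video_bytes


if __name__ == "__main__":
    mp3 = estimated_mp3_bytes(25 * 60)  # hypothetical 25-minute video
    print(mp3, f"{share_of_video(mp3, 500_000_000):.1%}")
```

For the hypothetical 25-minute video, the estimate is 36,000,000 bytes (36 MB), about 7% of a 500 MB source. A 1 GB video of the same length would yield the same 36 MB MP3, because the video's size never enters the calculation.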