Isolated mix element (dialogue, music, or effects and atmosphere) exported as a separate submix, so that each element can be edited independently and international versions can be created.
Technical Details
Standard stems are delivered as 24-bit/48 kHz files in WAV or AIFF format, with 24-bit/96 kHz also in use for feature films. A typical 5.1 stem consists of six separate channels (L, C, R, Ls, Rs, LFE), while a 7.1.2 stem in a modern Atmos production carries ten channels. Common practice distinguishes between pre-dub stems (intermediate mixes of individual categories) and final stems (the audio groups that result from the final mix). Stems are typically aligned to a -20 dBFS reference level and delivered without additional bus compression or limiting, so that they can be recombined without changing the balance of the mix.
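The channel counts and the reference level quoted above can be checked with a little arithmetic. The following is a minimal sketch, not a delivery tool: the function names are hypothetical, the layout label is simply read as dot-separated channel groups, and file headers and metadata are ignored when estimating size.

```python
import math


def channel_count(layout: str) -> int:
    """Add up the dot-separated fields of a layout label, e.g. '7.1.2' -> 7 + 1 + 2."""
    return sum(int(part) for part in layout.split("."))


def pcm_bytes_per_minute(layout: str, sample_rate_hz: int = 48_000, bit_depth: int = 24) -> int:
    """Uncompressed PCM payload for one minute of a single stem (headers ignored)."""
    bytes_per_sample = bit_depth // 8
    return channel_count(layout) * sample_rate_hz * bytes_per_sample * 60


def dbfs_to_linear(level_dbfs: float) -> float:
    """Convert a level in dBFS to a linear amplitude relative to full scale (1.0)."""
    return 10 ** (level_dbfs / 20)


for layout in ("5.1", "7.1.2"):
    megabytes = pcm_bytes_per_minute(layout) / 1_000_000
    print(f"{layout}: {channel_count(layout)} channels, {megabytes:.2f} MB per minute")

# A -20 dBFS reference corresponds to one tenth of full-scale amplitude.
print(f"-20 dBFS = {dbfs_to_linear(-20.0):.3f} of full scale")
```

Under these assumptions, a 24-bit/48 kHz 5.1 stem needs about 51.84 MB per minute and a 7.1.2 stem about 86.40 MB per minute, which is worth keeping in mind when a delivery contains many stems.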
History & Development
Stem technology developed in the 1970s in parallel with multichannel audio technology, as studios began archiving separate audio elements for international distribution. With the introduction of digital workstations such as Fairlight (1979) and Pro Tools (1991), the creation of stems became standardized. The transition to object-based audio (Dolby Atmos, 2012) expanded the concept to include stem objects that carry spatial metadata. Modern streaming platforms such as Netflix have mandated separate stems for all original productions since 2018.
Practical Application in Film
Christopher Nolan's "Dunkirk" (2017) used separate stems for Hans Zimmer's Shepard tone composition so that its intensity could be varied later for different versions of a scene. For "Mad Max: Fury Road" (2015), sound designer Mark Mangini created over 40 vehicle stems that were recombined individually for each action sequence. Stems allow adjustments after the mix without access to the original Pro Tools sessions, and they are essential for international versions because the dialogue stem can be replaced with a localized one.
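Replacing a dialogue stem works because the stems of a mix add up to the full mix. The sketch below illustrates this with a few invented mono samples; the sample values and the helper `sum_stems` are assumptions made for illustration, and a real workflow would operate on multichannel audio files rather than short Python lists.

```python
def sum_stems(*stems):
    """Add equal-length mono sample lists together, sample by sample."""
    lengths = {len(stem) for stem in stems}
    if len(lengths) != 1:
        raise ValueError("all stems must have the same number of samples")
    return [sum(samples) for samples in zip(*stems)]


# Invented sample values in the range -1.0 to 1.0 (full scale).
dialogue_original = [0.10, 0.20, 0.00, -0.10]
dialogue_localized = [0.05, 0.25, 0.10, 0.00]
music = [0.30, 0.30, 0.30, 0.30]
effects = [0.00, -0.20, 0.10, 0.05]

# The original-language mix is the sum of all three stems.
original_mix = sum_stems(dialogue_original, music, effects)

# Music and effects (M&E): everything except dialogue.
music_and_effects = sum_stems(music, effects)

# A localized version reuses the M&E and swaps in the new dialogue stem.
localized_mix = sum_stems(dialogue_localized, music_and_effects)

print([round(sample, 2) for sample in localized_mix])
```

The music and effects are never touched in this process, which is why a localized version can be assembled without reopening the original mix session.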
Comparison & Alternatives
Stems differ from tracks (individual audio sources) in that they are already mixed, and from laybacks (finished mixes) in that they remain editable. ADM BWF files (Broadcast Wave Format files carrying Audio Definition Model metadata) are increasingly replacing traditional stems because they contain rendering information in addition to audio. For stereo-only productions, simple track exports are often sufficient, while complex VR productions rely on ambisonic stems in B-format. Netflix productions require M&E (Music & Effects) stems without dialogue for international distribution.