The Immersive Audio “Family Tree”
Published on June 7, 2022: https://www.sonarworks.com/blog/learn/the-immersive-audio-family-tree
How do we begin developing immersive audio skills? Guess what! You’ve likely already got some. Although immersive seems a totally new animal, it might be more helpful to realize that immersive audio is the most recent branch on the audio family tree. Immersive audio didn’t drop out of the sky. Multi-channel immersive arrays evolved organically from previous formats.
The goal of this article is to provide a historical introduction to formats that helped shape the technical and philosophical elements of immersive audio. From this, the reader will see how earlier formats are still recognizable in the architecture of immersive audio systems. What we can learn from these previous formats can be applied to how we work in a modern immersive environment. To quote a cliché: The more things change, the more they stay the same.
Mono
Consider the nascent audio industry, when everything came through one speaker (ca. 1877). Engineers were limited by the quality of their equipment, which in turn limited the quality of recordings. Audio recording was a technological marvel at that time, a miracle of science! Inventors of early recording machines like phonographs, graphophones, and gramophones were all vying for the title of most lifelike machine. From the very beginning, there was a desire to evolve, innovate, and outdo the competition. Labels couldn’t sell lifelike recordings yet, so a greater value was placed on composition and performance. Two channels of innovation were happening at once: inventors made improvements to recording devices, and engineers invented ways to make the music heard.
This axiom still holds true: more can be done with a rough recording of a great performance than with a clear recording of a poor one. Early recording engineers had to do the best with what they had, and learned quickly to work within the means of their chosen playback system. Loud marches were easier to record because they could cut through the noise of the medium. There was no mixing to be done after the recording process was over; everything had to be perfectly blended on the way in. Each musician had to be thoughtfully placed in the room to generate a natural acoustic blend at the recording horn. These recordings may have sounded relatively crude, but this was the nature of the medium. The machine did what it did; recording engineers did their best to add humanity. This is the purest form of our art.
Immersive engineers share a lot in common with early audio pioneers. Just as the first engineers experimented with the position of musicians in a room, immersive engineers are experimenting with recording in the round to try to find the most realistic way to recreate the raw excitement of a performance. One coincident sound-field microphone can capture an entire ensemble in stunning realism, creating an opportunity for the engineer to experiment with ways to compose the arrangement of players on the stage. Ingenious researchers are still dreaming up ways to improve capture and playback devices, and daring engineers are still composing ways to present music that connects emotionally with an audience.
Because early equipment limited the quality of recordings, the first engineers compensated by prioritizing composition and performance, elements any listener can appreciate. Music had to be fun, compelling, and worthy of preservation.
Immersive engineers should take inspiration from early engineers. The challenge of working within the confines of the format hasn’t changed. With any format (though especially immersive audio), the goal should not be to impress listeners with technical feats of strength. Creativity, feelings, and passion should always be the most salient elements of a recording. The best immersive recordings feature compositions that lend themselves to the strengths of the immersive platform, and showcase performances that connect emotionally with the audience. Peeling out in a fast car is fun for a moment, but it doesn’t win a race.
If mono were DNA, then immersive audio is a gene. Each speaker in an immersive array is a mono component of a larger sound field. Consider the center channel, placed directly in front of the listener. The center channel is sometimes utilized to present prominent elements of a mix like vocals, snare drum, kick, or bass. Even with oodles of output options, an engineer may still find themselves creating a mono mix in the center channel. The blend of these instruments should be seamless, clear, and impactful. The quality of the equipment has improved, but the art is the same as it ever was.
Stereo
Objectively, a mono mix isn’t very lifelike, even with a great performance. By the 1930s, the novelty of recording had begun to lose its luster, and inventors began to expand research beyond one speaker. British audio scientist Alan Blumlein became fascinated with the idea that engineers might be able to make recordings in a fashion similar to how we hear sound with our ears. His early experiments between 1933 and 1935 explored new methods of microphone placement that would mimic human perception (binaural). Although these experiments were intended to improve sound for early films, the concepts he developed were later used for music recording (ca. 1957). Blumlein published subsequent papers that illustrated his philosophies about binaural capture as a superior recording method. In a 1958 publication in the Journal of the Audio Engineering Society (printed posthumously; Blumlein died in 1942), Blumlein uses the term “immersive” as a way to describe his stated ideal of stereophonic sound. Record labels liked the idea of immersive sound, and partnered with manufacturers to produce stereo playback systems. By the 1970s, just about every new vinyl release was issued in stereo, and mono was relegated to television and AM radio broadcast. Blumlein demonstrated that the way we hear music matters. The more lifelike a recording seems, the easier it becomes for listeners to transport themselves into the music.
Today’s engineers are already adept at creating transcendent stereo mixes. Rest assured, the skills gained from working in stereo transfer directly to an immersive workflow. Just as with mono, an engineer will find that the left and right speakers are filled with guitars, piano, drum overheads, and any other prominent sound meant to sit in front of the listener. The same is true for rear surrounds that project an image behind the listener. An experienced immersive mixer understands the principles of stereo and uses them in all sorts of creative ways.
Spatial audio is another direct descendant of Blumlein’s binaural research. Spatial audio can be described as a simulated binaural picture of a multi-channel, immersive listening experience. More simply, spatialized audio attempts to deliver in two channels what the ears would hear if the listener were sitting in the fixed listening position of an immersive audio system. The processing used to generate this image builds on the principles of binaural hearing that Blumlein describes in his early research. The next time you listen to an immersive mix on Apple Music, be sure to thank Alan Blumlein for the privilege. Better yet, take some time to read his publications. They are well written, thought-provoking, educational, and give historical context to current industry trends.
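For readers who like to see an idea in code, here is a very rough Python sketch of what "delivering a speaker array in two channels" can mean. It is an illustration under loud assumptions, not how Apple or any commercial renderer works: each virtual speaker is reduced to an arrival-time difference between the ears (Woodworth's spherical-head approximation) and a simple level difference. The function names, the head radius, and the speaker angles are all hypothetical choices made for this example.

```python
import math

SPEED_OF_SOUND = 343.0  # metres per second (assumed room temperature)
HEAD_RADIUS = 0.0875    # metres (assumed average head, not a measured value)


def interaural_cues(azimuth_deg):
    """Return (itd_seconds, left_gain, right_gain) for one virtual speaker.

    Azimuth convention (an assumption): 0 = front, +90 = hard right.
    Rear angles are folded to the front, so this toy model cannot tell
    front from back the way a measured head response can.
    """
    az = (azimuth_deg + 180.0) % 360.0 - 180.0
    if az > 90.0:
        az = 180.0 - az
    elif az < -90.0:
        az = -180.0 - az
    theta = math.radians(az)
    # Woodworth approximation of the interaural time difference.
    itd = (HEAD_RADIUS / SPEED_OF_SOUND) * (theta + math.sin(theta))
    # Crude level difference: a constant-power pan between the ears.
    pan = (math.sin(theta) + 1.0) / 2.0
    return itd, math.cos(pan * math.pi / 2.0), math.sin(pan * math.pi / 2.0)


def binauralize(feeds, sample_rate=48000):
    """Mix (azimuth_deg, samples) speaker feeds down to two ear signals."""
    max_itd = (HEAD_RADIUS / SPEED_OF_SOUND) * (math.pi / 2.0 + 1.0)
    pad = math.ceil(max_itd * sample_rate)
    length = max(len(samples) for _, samples in feeds) + pad
    left = [0.0] * length
    right = [0.0] * length
    for azimuth_deg, samples in feeds:
        itd, left_gain, right_gain = interaural_cues(azimuth_deg)
        delay = round(abs(itd) * sample_rate)
        left_delay = delay if itd > 0 else 0    # source on the right: left ear is late
        right_delay = delay if itd < 0 else 0   # source on the left: right ear is late
        for n, x in enumerate(samples):
            left[n + left_delay] += left_gain * x
            right[n + right_delay] += right_gain * x
    return left, right


# A single click from a right surround speaker at an assumed 110 degrees.
left_ear, right_ear = binauralize([(110, [1.0])])
```

Real binaural renderers replace these two crude cues with measured head-related transfer functions, which also capture the filtering of the outer ear. The sketch only shows the shape of the idea: many speaker feeds in, two ear signals out.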
Quadrophonic (4.0)
We’ve enjoyed a long-lasting love affair with stereo. But one limitation is that the sound stage is always in front of the listener. Envelopment is possible, but it doesn’t quite match the experience of sitting in a room with a live band. One proposed solution to this problem was to add another set of stereo speakers directly behind the listening position. These speakers would “surround” the listener with ambient cues that would more realistically simulate the experience of a venue. This format was called Quadrophonic Sound (“Quad”), and it was the first commercially available multi-channel surround sound format. Quad derived its name from its delivery method, which consisted of four discrete channels:
Left
Right
Left Surround
Right Surround
In 1971, CBS introduced Quad in both vinyl and 8-track formats. Vinyl releases were specially encoded records: CBS’s SQ system used a matrix to fold the four channels into the two walls of a standard stereo groove, and a decoder in the playback chain recovered four signals from the single stylus. Each decoded signal was sent to an amplifier channel, which powered one of the dedicated speakers. 8-track offered the quad experience on analog tape, where the left-right and surround channels were laid onto four discrete tracks. When played in a quad 8-track player, the four channels were sent directly to the amplifier, and then to the appropriate speakers.
Quad delivered on its promise of a more immersive listening experience. However, consumers had difficulties accepting quad as a format. For one, the output level and frequency response of vinyl quad records were limited because four channels had to share a single groove. In addition, quad playback equipment and commercially released records were expensive. 8-track had similar issues in that the quality of the tape used in the cartridge was not as good as a regular cassette tape, and the quality of the sound was measurably weaker than that found in stereo formats. 8-track quad players were also expensive and didn’t play nice with traditional stereo sound systems.
Another drawback to quad was that it required the listener to stay in a fixed listening position. To enjoy the benefits of quad, the listener could not be engaged with anything that would take them away from that spot. The added expense and the required focus were too much for the average consumer. The hi-fi audiophile market was more interested, albeit frustrated with the lower overall fidelity and technical issues. Quad was a flop on the commercial market, but significant lessons were learned from the experience that laid the groundwork for subsequent surround formats.
At first, engineers were apprehensive about quad. Rear speakers were new, and it was not initially clear how engineers were supposed to use them. This format didn’t come with instructions, so most decided to focus on the stereo mix first. A common strategy was to wait for the stereo mix to be approved before moving on to the remix in quad. In most cases this process consisted of moving ambience effects from the stereo field into the rear channels. This strategy seems to be popular among some immersive engineers too. Once the relative balance and timbre of instruments have been established in stereo, an immersive version becomes easier to imagine.
Immersive engineers still haven’t come to consensus about what to do with rear channels. While it remains common to park ambience effects in the back, some engineers favor sending drums or other instrument groups to rear channels to expand the sound stage, engage the listener, or as a means of creative expression.
Similar to quad, immersive systems still rely on a fixed listening position. Today, this is not as much of an issue for consumers thanks to spatial audio. Headphones and smart speakers are capable of creating an immersive experience that can travel with the listener from room to room, to the grocery store, or on a plane. The fixed listening position is an aspect of immersive that may change in the future as head-tracking devices and augmented and virtual reality engines allow more interaction with the physical environment.
5.1
Despite the apparent failure of quad, surround sound remained an alluring idea for audio research labs like Dolby. Computers and digital audio would change the way that all music would be delivered, and this created an opening for surround to make a comeback. Digital audio offered improvements to frequency response, dynamic range, and noise floor. Compact Discs, DVDs, and Digital Audio Tape proved to be superior media for carrying large amounts of data, a necessity for multichannel content.
The film industry rushed to digital and worked closely with Dolby to develop a discrete six-channel format that consisted of a quadrophonic setup with an additional speaker in the center and a low frequency effects (LFE) channel. This format was called 5.1 surround: the 5 refers to the number of speakers at ear level, and the .1 refers to the dedicated LFE channel. In 5.1, dialog could be more precisely centered in the middle of the theater via the center channel, ambient sounds and panoramic audio had a home in the left-right channels, and ambience effects and music inhabited the rear channels. Subsequent versions of surround (7.1, 9.1, etc.) included more channels that improved coverage from the rear and sides and increased the size of the listening sweet spot in movie theaters.
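One way to see how the six channels relate to the formats that came before is to fold them back down to stereo. The Python sketch below is a minimal illustration, assuming the commonly used approach of mixing the center and each surround into the front pair at roughly -3 dB and leaving the LFE out. The channel names, their order, and the function name are assumptions made for this example, and real decoders may use different coefficients.

```python
import math

# Assumed channel names for one 5.1 frame; real files may order them differently.
CHANNELS = ("L", "R", "C", "LFE", "Ls", "Rs")

MINUS_3_DB = 1.0 / math.sqrt(2.0)  # about 0.707


def downmix_51_to_stereo(frame):
    """Fold one 5.1 frame (a dict of channel name -> sample) to (left, right).

    The center is shared equally between both sides, each surround goes to
    its own side, and the LFE is dropped in this sketch.
    """
    left = frame["L"] + MINUS_3_DB * frame["C"] + MINUS_3_DB * frame["Ls"]
    right = frame["R"] + MINUS_3_DB * frame["C"] + MINUS_3_DB * frame["Rs"]
    return left, right


# Dialog that lives only in the center channel lands equally in both speakers.
dialog_only = {name: 0.0 for name in CHANNELS}
dialog_only["C"] = 1.0
stereo_left, stereo_right = downmix_51_to_stereo(dialog_only)
```

Seen this way, the center channel is a mono mix, the front pair is a stereo mix, and the surrounds are the rear pair borrowed from quad.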
Batman Returns was the first blockbuster movie to be mixed in 5.1 (1992), and Jurassic Park was close behind (1993). The sound design and mix are considered to be among the most compelling elements of these films, and major contributors to their success. Those who are old enough to remember watching Jurassic Park in theaters will remember the sensation of feeling their sternum shake while watching water in a cup ripple into droplets. This was a new, exciting, and terrifying cinematic moment. The first surround consumer systems arrived in big box stores under the name “Home Theater Systems” after the success of these films. From then on, major motion pictures were available to buy or rent on DVD and, later, Blu-ray for your home-viewing pleasure in surround and on your big-screen TV. The future looked bright for the format, and the music industry began issuing releases in 5.1 on SA-CD and DVD-A.
While the film industry was able to make the most of surround, the music industry was less successful. There are many reasons why. For one, home theater systems were not cheap, and they were extraordinarily difficult for consumers to install. To get the best experience, the listening position must be a fixed point, and the speakers must surround the listener at specific angles. This was just not practical for the dimensions of the average living room, and consumers took a very lax approach to adhering to placement guidelines. The result was a complete breakdown of quality control and, ultimately, a disappointing listening experience.
Another drawback was the lack of portability. Computers were not yet capable of calculating a spatialized binaural render, and this compounded the limitations of a fixed listening position. To hear a surround mix the way it was intended, you needed to be in a properly treated space like a movie theater, or (if you were lucky) a professionally treated mix facility. It should be noted that around the same time (2001), Apple introduced the iPod. The iPod was revolutionary in that it allowed listeners to carry their entire music collection in their pocket and enjoy it everywhere. Goodbye, beloved CD wallet!
Lastly, the format widely chosen for music was out of alignment with the standard format of the film industry. Early music surround releases were distributed on Super Audio Compact Disc (SA-CD) or DVD-Audio (DVD-A), which were rarely compatible with the DVD or Blu-ray players that came with Home Theater Systems. It was also somewhat awkward to listen to music where you watched TV when there was nothing happening on screen. While no single issue did that much damage to the format, the combination of these obstacles frustrated consumers. 5.1 music sales slowly dimmed until the format shrank into obscurity.
Enter modern immersive audio systems, which expand upon the basic setup of a surround system. Surround formats generate a two-dimensional sound field where sounds can move laterally around the listening position in a circle, and also move from front to back. When a third dimension (height channels) is added, the system becomes immersive. This is because height channels allow the placement of sound on the vertical axis in addition to the lateral and depth axes.
The LFE channel is also assimilated into immersive audio architecture. The way the LFE is used in an immersive mix is borrowed from earlier surround applications. In music mixes, the LFE is often used to enhance instruments with low harmonic fundamentals like bass and kick drum. It should also be noted that many immersive engineers elect not to use an LFE in their music mixes. Similar to rear surround channels, there are no rules. Critical listening sessions of commercially released music in surround and immersive formats illustrate how different engineers address this question, and are a healthy part of any practice.
The Evolution
The architects of immersive formats are students of history. Immersive audio was designed to build off of the successes (and failures) of previous platforms, and to address unresolved issues. We have learned that multichannel audio is impractical for most consumers. We have learned that what works for the film industry should also work for the music industry. Most importantly, we have learned that music needs to be able to live with the consumer, no matter where they are.
Immersive audio delivers these solutions and more. The delivery method is simple, yet very flexible: one file delivers multiple playback options. Don’t have 11 speakers in your system? No problem! The rendering software will do the work to make it sound great on whatever you have.
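To make "one file, many playback options" a little more concrete, here is a toy Python sketch of the kind of decision a renderer makes. It is an assumption-heavy illustration, not the algorithm used by any commercial renderer: a sound is stored with an intended direction, and at playback its level is shared between the two nearest speakers of whatever layout is present, using a constant-power pan. The layout angles, the names, and the function are hypothetical, and height is ignored to keep the example short.

```python
import math

# Assumed speaker angles in degrees (0 = front, positive = clockwise to the right).
STEREO = {"L": -30, "R": 30}
QUAD = {"L": -45, "R": 45, "Ls": -135, "Rs": 135}
SURROUND_5 = {"L": -30, "R": 30, "C": 0, "Ls": -110, "Rs": 110}


def render_gains(azimuth_deg, layout):
    """Return per-speaker gains for a sound aimed at azimuth_deg.

    The sound is shared between the two adjacent speakers that enclose its
    direction, with a constant-power (sine/cosine) pan. The layout needs at
    least two speakers.
    """
    speakers = sorted(layout.items(), key=lambda item: item[1] % 360)
    target = azimuth_deg % 360
    gains = {name: 0.0 for name in layout}
    for i, (name_a, az_a) in enumerate(speakers):
        name_b, az_b = speakers[(i + 1) % len(speakers)]
        span = (az_b - az_a) % 360 or 360   # arc from speaker a to speaker b
        offset = (target - az_a) % 360      # how far the sound sits along that arc
        if offset <= span:
            fraction = offset / span
            gains[name_a] = math.cos(fraction * math.pi / 2.0)
            gains[name_b] = math.sin(fraction * math.pi / 2.0)
            break
    return gains


# The same "vocal, dead ahead" instruction rendered to three different systems.
for layout in (STEREO, QUAD, SURROUND_5):
    print(render_gains(0, layout))
```

With a center speaker available, the vocal goes straight to it. Without one, the same instruction becomes a phantom center shared equally between left and right. The mix stays the same and only the rendering changes.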
Immersive is easy to enjoy and relatively inexpensive. Immersive consumer devices are some of the easiest to operate in audio history. Smart speakers and sound bars seamlessly support immersive content with the quick push of a button or a voice command from the user. Immersive files can be streamed over the internet via most streaming platforms. This makes them widely accessible and entirely portable, at no extra cost to consumers. Spatial audio lets the listener enjoy a fixed-position listening experience on the go. We can get immersive at the mall, at the park, or on a plane! Immersive works just as well for music as it does for film and TV. What’s more, immersive formats are a natural fit with virtual and augmented reality applications.
A lesson we can take from all of this is that history lives among us in the present. Immersive audio is an evolved fusion of mono, stereo, quad, and surround, with height channels adding a new dimension. Audio engineers should never stop learning new skills, which can even mean learning to mix in older formats. Immersive engineers have fun because they get to mix in mono, refine their stereo mix skills, think creatively and critically about envelopment, and reflect on the history of our art, all in one project. Every immersive mix is a history lesson, and each mix can teach us something new. Oh, if Alan Blumlein could see us now…