OK, everyone has understood that adding sounds to a silent virtual environment could massively enhance the immersive experience. Now it is time to improve the way we use sounds in this context.
Some will choose to add more sounds, trying to precisely recreate the auditory landscape. Others will spend a lot of energy chasing the “perfect sound”. All of this will probably help the experience, but despite all your efforts it won’t be enough if you forget a golden rule of Virtual Reality:
Immersion is not about realism, it is all about interaction.
Low-resolution images with good interactions will be more effective than a photorealistic environment where all objects are fixed in place. This is one of the most important rules that I have learned in my career.
There are many possible interactions with sounds.
Some are obvious, like the sounds resulting from direct interactions with objects (shooting a firearm, opening a door). Others are subtler. Among them, the outcome of interactions between objects and the virtual environment is a key element in creating a sense of presence. We can’t imagine a firearm shot in a large hall without the corresponding echo. The way sounds propagate in the virtual environment depicts the space, the geometry of your surroundings. The room doesn’t have to be huge to generate such effects; even a closet has its own “sound signature”.
We use these auditory cues all the time, indoors and outdoors, to complement other information gathered from our eyes or our feet. Bringing this type of sound into your VR experience will contribute to immersion by improving the correlation between what you see and what you hear. There is another benefit to simulating these acoustic phenomena: because they vary with the positions of the sources and the listener, they help you situate yourself in the environment. Therefore, the purpose here is not realism for its own sake, but realism as a means of interaction with the environment. So, how can we achieve such realism?
How to simulate sound propagation?
For each acoustic phenomenon described below, we will try to identify the following: the information it conveys, the interactions it provides, and how complex it is to simulate.
When a sound wave hits an obstacle, part of its energy is absorbed, and the remaining part is divided between several new sound waves, such as a transmitted wave and a reflected wave. For simplicity, we assume that the reflected wave follows the law illustrated below:
So, unless the first obstacle is a very efficient sound trap, sound will bounce to the next surface, then bounce again, until the remaining energy becomes negligible and it can no longer be heard. Depending on the characteristics of the room, this process can take from 0.5 seconds for a classroom to 9 seconds for a large cathedral.
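To get a feel for why the decay time varies so much from one room to another, Sabine's classical formula can be used. It estimates the reverberation time (the time needed for the level to drop by 60 dB) from the room volume and the total absorption of its surfaces. The sketch below is only an illustration: the classroom dimensions and absorption coefficients are made-up values, not measurements.

```python
def sabine_rt60(volume_m3, surfaces):
    """Estimate the reverberation time in seconds with Sabine's formula.

    RT60 = 0.161 * V / A, where A is the sum of (area * absorption coefficient).
    surfaces: iterable of (area_m2, absorption_coefficient) pairs.
    """
    total_absorption = sum(area * alpha for area, alpha in surfaces)
    if total_absorption <= 0:
        raise ValueError("total absorption must be positive")
    return 0.161 * volume_m3 / total_absorption


# Hypothetical 8 m x 6 m x 3 m classroom (all coefficients are assumptions).
classroom_rt60 = sabine_rt60(
    8 * 6 * 3,
    [
        (48, 0.30),  # floor
        (48, 0.60),  # absorbent ceiling
        (84, 0.10),  # four walls
    ],
)
print(f"Estimated RT60: {classroom_rt60:.2f} s")
```

Larger volumes and harder (less absorbent) surfaces both push the result up, which is why a stone cathedral rings for so much longer than a furnished classroom.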
Each bounce conveys some sound energy that can reach the ears of a listener with some delay (because of the finite speed of sound). A common way to sum up this energy-time transfer is to use impulse responses as seen below.
Let’s consider the sequence from the moment a sound is emitted to the moment it “disappears”. We choose to use a common division of this sequence into three phases, depicted in the image below.
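As a minimal sketch of this idea, an impulse response can be stored as a list of samples in which each propagation path contributes one delayed, attenuated "tap". Applying the room to a dry signal is then a convolution. The path lengths, gains and the low sample rate below are hypothetical values chosen to keep the example readable; a real engine would use an optimized convolution rather than this naive loop.

```python
SPEED_OF_SOUND = 340.0  # m/s, the approximate value for air


def build_impulse_response(paths, sample_rate):
    """Build an impulse response from (path_length_m, gain) pairs.

    Each path adds its gain at the sample index matching its travel time.
    """
    delays = [round(length / SPEED_OF_SOUND * sample_rate) for length, _ in paths]
    ir = [0.0] * (max(delays) + 1)
    for delay, (_, gain) in zip(delays, paths):
        ir[delay] += gain
    return ir


def convolve(signal, ir):
    """Naive convolution of a dry signal with an impulse response."""
    out = [0.0] * (len(signal) + len(ir) - 1)
    for i, s in enumerate(signal):
        if s == 0.0:
            continue
        for j, h in enumerate(ir):
            out[i + j] += s * h
    return out


# Hypothetical paths: the direct sound plus two weaker, longer bounces.
paths = [(3.4, 1.0), (6.8, 0.5), (10.2, 0.25)]
ir = build_impulse_response(paths, sample_rate=1000)
echoed = convolve([1.0, 0.0, 0.0], ir)  # a single click played in the "room"
```

The click comes back three times, each copy later and quieter than the previous one, which is exactly what the impulse response encodes.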
Credits Torgny Lundmark
Direct Sound Roll-off
Sound, unlike light, needs a medium to travel in. It can be a gas, like air, a solid, like a concrete wall, or a liquid, like water. In air, sound travels at a speed of about 340 meters per second, depending on air temperature, humidity, etc. As the “original” sound journeys between the source and your ear, its characteristics are modified. Its amplitude decreases and its overall spectrum is altered, with the energy in high frequencies decreasing faster than the energy in low frequencies. This is known as the roll-off effect, and it is crucial for perceiving distances and, consequently, situating oneself in space.
A basic interaction provided by this effect occurs when you walk away from a source: the sound should get quieter and lose its high frequencies as the distance grows. If the roll-off is not properly rendered, you “lose” the sound too soon and end up with a silent moving object in your scene, which ruins the experience. Conversely, if the sound is not altered at all, your brain considers the source to be very close to your ear, triggering a danger reaction. Furthermore, if distance is properly rendered, players can judge how far away visible and invisible objects are and take action if necessary: “this one is too close and should be considered a danger”.
To simulate the roll-off, an attenuation (a negative gain in decibels) and a low-pass filter are usually applied. Both depend on distance and air properties (temperature and humidity). This process is not CPU intensive and can easily run on mobile devices.
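The two ingredients of a basic roll-off can be sketched as follows. The gain uses the inverse-distance law (about 6 dB lost per doubling of distance) and the filter is a simple one-pole low-pass. Note the assumptions: the mapping from distance to cutoff frequency is an invented heuristic, the function and parameter names are hypothetical, and the influence of temperature and humidity is ignored entirely.

```python
import math


def distance_gain_db(distance_m, reference_m=1.0):
    """Inverse-distance law: 0 dB at the reference distance, then about -6 dB per doubling."""
    return -20.0 * math.log10(max(distance_m, reference_m) / reference_m)


def cutoff_hz(distance_m, max_cutoff=20000.0, min_cutoff=2000.0, half_distance_m=50.0):
    """Made-up heuristic: the cutoff slides from max_cutoff towards min_cutoff with distance."""
    return min_cutoff + (max_cutoff - min_cutoff) * 0.5 ** (distance_m / half_distance_m)


def apply_rolloff(samples, distance_m, sample_rate=48000):
    """Attenuate the samples and dull their high frequencies according to distance."""
    gain = 10.0 ** (distance_gain_db(distance_m) / 20.0)
    # One-pole low-pass: y[n] = (1 - a) * x[n] + a * y[n - 1]
    a = math.exp(-2.0 * math.pi * cutoff_hz(distance_m) / sample_rate)
    out, previous = [], 0.0
    for x in samples:
        previous = (1.0 - a) * x + a * previous
        out.append(gain * previous)
    return out


far_away = apply_rolloff([1.0] * 2000, distance_m=10.0)
```

The per-sample cost is one multiply-add for the filter and one multiply for the gain, which is consistent with the low CPU footprint mentioned above.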
Sometimes direct sound can’t reach the listener because there is an obstacle in its path. A portion of the sound wave energy will travel through the object and be emitted on the other side. Dimensions and characteristics of the object will define how the direct sound is altered.
Sound waves can travel through walls or objects. Their materials and thickness define what will be heard on the other side. This effect is very important, especially when the obstacle is close to the listener or to the source. Obstacles can be static, like a wall or a door, or mobile, like a vehicle. We unconsciously expect the sound to be “muffled” when the source is visually masked. The way sound is “masked” doesn’t require a high level of detail, as some simple frequency filtering may suffice. What really matters here is audio-visual consistency: a source that is hidden from view should sound occluded. If there are many sources and movements, you have to manage the occlusion for all of them. If it is not automated, this process can quickly become tedious.
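One possible way to automate this, sketched here in 2D, is to test whether the straight line between the listener and each source crosses a wall, and to pick a gain and a low-pass cutoff accordingly. Everything in this example is an assumption made for illustration: the wall description, the `loss_db` and `cutoff_hz` values, and the function names are hypothetical and do not come from any particular engine.

```python
def _orient(a, b, c):
    """Sign of the cross product (b - a) x (c - a): 1, -1 or 0."""
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)


def segments_cross(p1, p2, q1, q2):
    """True if segment p1-p2 strictly crosses segment q1-q2 (mere touching is ignored)."""
    return (_orient(p1, p2, q1) * _orient(p1, p2, q2) < 0
            and _orient(q1, q2, p1) * _orient(q1, q2, p2) < 0)


def occlusion_settings(listener, sources, walls):
    """Return {source_name: (gain_db, lowpass_cutoff_hz)} for every source."""
    settings = {}
    for name, position in sources.items():
        gain_db, cutoff = 0.0, 20000.0  # unoccluded defaults
        for wall in walls:
            if segments_cross(listener, position, wall["a"], wall["b"]):
                gain_db += wall["loss_db"]               # each wall removes more level
                cutoff = min(cutoff, wall["cutoff_hz"])  # the dullest wall wins
        settings[name] = (gain_db, cutoff)
    return settings


# Hypothetical scene: one wall between the listener and the radio.
walls = [{"a": (5.0, -2.0), "b": (5.0, 2.0), "loss_db": -20.0, "cutoff_hz": 800.0}]
sources = {"radio": (10.0, 0.0), "door": (2.0, 1.0)}
result = occlusion_settings((0.0, 0.0), sources, walls)
```

Running this once per frame (or whenever something moves) keeps every source consistent with what the player sees, without hand-tuning each one.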
Early and late reflections
Early reflections (ER) are the sound waves that reach the listener after only a few bounces. They significantly help you estimate source distances and locate yourself in a room. They also convey information about the shape of the room and its “acoustical colour” (material and furnishing). In acoustics, the energy distribution in this part of the impulse response has a strong impact on intelligibility. Using room-scale tracking in combination with real-time ER updates helps create a true sensation of movement. In a 2000 article, Shinn-Cunningham considers variations in sound level and in reflections to be the most important cues for perceiving distances.
Because they have travelled a long distance and have bounced many times, the late reflections are usually quieter and diffuse, i.e. “coming from everywhere”. Depending on room size and materials, late reflections can last between 1 and 9 seconds. As such, they help the listener’s brain to picture the dimensions of the room.
As you may guess, reflections are the most complex sound propagation phenomenon to simulate, and therefore the most CPU intensive. This is especially true if you want to render their variations according to the listener’s movements or changes in the simulated environment.
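To give an idea of what computing reflections involves, here is a minimal sketch of the classical image-source method, limited to first-order reflections in a 2D rectangular room. It is not necessarily what any given engine does. The room size, the positions and the single `reflection_gain` shared by all walls are assumptions made for illustration.

```python
import math

SPEED_OF_SOUND = 340.0  # m/s, approximate value for air


def first_order_reflections(source, listener, width, depth, reflection_gain=0.7):
    """Return (delay_s, gain, label) tuples for the direct path and four wall bounces.

    The room spans x in [0, width] and y in [0, depth]. Each reflection is modelled
    as a mirror image of the source placed behind the corresponding wall.
    """
    sx, sy = source
    images = {
        "direct": (sx, sy),
        "left wall": (-sx, sy),
        "right wall": (2 * width - sx, sy),
        "front wall": (sx, -sy),
        "back wall": (sx, 2 * depth - sy),
    }
    paths = []
    for label, image in images.items():
        distance = math.dist(image, listener)
        wall_gain = 1.0 if label == "direct" else reflection_gain
        gain = wall_gain / max(distance, 1e-6)  # inverse-distance attenuation
        paths.append((distance / SPEED_OF_SOUND, gain, label))
    return sorted(paths)  # earliest arrival first


# Hypothetical 10 m x 6 m room.
paths = first_order_reflections(source=(2.0, 3.0), listener=(8.0, 3.0), width=10.0, depth=6.0)
```

Moving the listener changes every delay and gain, so the list has to be recomputed continuously. Higher-order reflections multiply the number of image sources, which is where the CPU cost mentioned above comes from.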
There are several other phenomena involved in sound wave propagation. Obstacles don’t just absorb, reflect or transmit parts of the direct sound. In some configurations, they enable a propagation of the sound that circumvents them, as shown below, which is known as a diffraction phenomenon.
Credits Adams Daramy
Because sound waves can “bend”, they can reach a listener behind a corner or above a wall. We experience this phenomenon every day: we perceive what is happening around a street corner because we are able to hear some of the sound.
Once again, the sound spectrum is altered by the diffraction process. In general, the smaller the wavelength, the less sound energy bends around an obstacle. Since wavelength is inversely proportional to frequency, high frequencies are diffracted less than low frequencies. In other words, we will usually perceive sounds as deeper when they come from around a corner. Despite the importance of this phenomenon, only a few sound engines are able to render it in real time.
We have tried to describe the main phenomena involved in sound propagation and why it is important to render them in virtual environments to maximise immersion. As sound engines evolve and get easier to use, an improved level of realism is achievable for anyone sufficiently concerned with sound in virtual environments. Sphere, Aspic Technologies’ sound propagation engine, gives you access to most of the phenomena listed in this article. Feel free to read more and ask for a trial version.
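The relationship between frequency and wavelength is easy to check numerically. The sketch below computes wavelengths and applies a common rule of thumb, used here as an assumption rather than a precise model: sound bends noticeably around an obstacle when its wavelength is at least as large as the obstacle.

```python
SPEED_OF_SOUND = 340.0  # m/s, approximate value for air


def wavelength_m(frequency_hz):
    """Wavelength in meters: speed of sound divided by frequency."""
    return SPEED_OF_SOUND / frequency_hz


def bends_easily(frequency_hz, obstacle_size_m):
    """Rule-of-thumb assumption: diffraction is strong when wavelength >= obstacle size."""
    return wavelength_m(frequency_hz) >= obstacle_size_m


for frequency in (100.0, 1000.0, 5000.0):
    print(frequency, "Hz:", round(wavelength_m(frequency), 3), "m,",
          "bends around a 1 m obstacle" if bends_easily(frequency, 1.0) else "mostly blocked")
```

A 100 Hz tone has a wavelength of several meters and passes around a one-meter obstacle, while a 5000 Hz tone, with a wavelength of a few centimeters, is largely shadowed by it.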
About the author
After a 15-year career as an audio engineer, programmer and trainer, Olivier Sebillotte expanded his horizons by taking a Research Master’s degree in Virtual Reality. He graduated in 2012, then worked for a famous French engineering institute to evangelize VR and AR, mostly in industry. In 2017, he joined Aspic Technologies, a French startup that provides innovative real-time audio solutions for Virtual Reality, 360 videos and video games.