Sound and Audio

“Hearing is the only sensory system that allows us to know what is going on everywhere in our environment.”

How does hearing work?


“In mammals, the inner workings of the ear are encased in the hardest bone of the body. It contains the smallest bones, the smallest muscles, and the smallest, yet one of the most elegant organs of the body, the cochlea (part of the inner ear).” [4]


The process of hearing is complex and still being studied, but in simple terms it happens in two stages: the ear, the sensory organ in charge of gathering the information, and the brain, which extracts and interprets that information.

The human species has several sensory organs that detect an energetic signal and convert it into electrical energy. That process of converting energy is called transduction. Each sensory organ detects a different type of energy: the nose and tongue detect chemical energy, the eye detects light energy, the skin detects heat and mechanical energy, and the ear detects mechanical energy. [4]

The ear captures mechanical energy (vibrations) and converts it into electrical stimuli that the brain interprets and makes sense of.

The hearing system is a mechanism for sensing vibrations coming from real-world objects. Most physical objects cause molecules to vibrate, and our ear is capable of recognizing those vibrations.

Sound can travel through different mediums at different velocities.

The evolution of the ear can be traced back millions of years, to when marine animals had hearing organs that detected vibrations in the water. As some of those animals moved onto land, the ear took on different shapes according to their needs.


“It is fascinating that the tiny bones in the middle ear appear to have evolved from gills that were no longer needed.” [4]


The scientific field that studies how the human brain perceives sound is called psychoacoustics.

The brain has the ability to distinguish multiple sounds, even if they are playing at the same time. You can easily tell a car from a human voice and a violin, even when they all sound simultaneously. If you think about it, what is physically happening is just molecules moving around with different characteristics, yet our brain can interpret those movements and make sense of them. All of this helps us survive by identifying which objects or animals are creating the movements. [3]

“Parts of the sound may be covered up by other sounds, or lost. The brain has to make a calculated guess about what is really out there. It does so very quickly and generally subconsciously.” [3]

The ear is not only responsible for hearing, but also for senses such as balance, posture control and gaze stabilization. [2]


“If a tree falls in a forest and no one is there to hear it, does it make a sound? (The question was first posed by the Irish philosopher George Berkeley) Simply, no – sound is a mental image created by the brain in response to vibrating molecules.” [1]


Sound during ancient times

Throughout history there has always been a need to communicate and to understand the surrounding environment. The value of the information usually depends on the context and the age, but it can be about spirituality, morality, astronomy or even how to bake bread.

Sound plays an important role in a diverse set of applications, and here are some examples that you might know or find curious.

Religious / spiritual chants, used to evoke a sense of tranquility when connecting to a higher power or inner peace.

Hymns that were used to prepare the spirit for some type of ritual, some peaceful, others dangerous, such as wars.

Ringing rocks, rocks that emit a distinctive sound due to their characteristics and were mysterious or intriguing enough to become important.

Monuments praised for their acoustical properties (reverberation), such as caverns, temples, tombs, churches, theaters and so on...

War sounds, made by the human voice, such as screams, or played by instruments meant to terrify the enemy. Even in modern history you can find examples, like the Nazi dive bombers known as ‘Stukas’ during World War 2, which had a siren that emitted a terrifying sound when diving to bomb.

Proverbs that were meant to be memorable, while passing important information through generations.

Music!

There are many more uses of sound from ancient times, but the point is to show how important the role of sound has been, independently of where and how it was used. Most of these sonic interactions are still in use today. There is even a tribe in Africa that for generations has been communicating with a bird called the honeyguide, which helps guide humans to harvest honey.

“Greek and Roman theaters are remarkable sonic wonders in which thousands of spectators can hear without the aid of modern electronics. They were clearly designed to achieve good acoustics…” [1]

Sound vs Audio

Sound is a physical phenomenon; audio is an analog or digital representation of sound.

Sound is molecules vibrating in a medium, coming closer together (compression) and then moving further apart (rarefaction). In essence, it is particles moving in two patterns (compression and rarefaction), which generates mechanical energy. Not all particle movement occurs as compression and rarefaction, but in terms of sound waves (mechanical energy), that is how it works.

To capture mechanical energy, you need a device capable of perceiving those vibrations. The vibrations can be stored and later reproduced using the exact opposite concept: when reproducing, the device has to re-create mechanical energy according to how those vibrations were stored. Those devices can be either analog or digital.

When talking about sound’s energy, we will be mainly focused on mechanical energy (sound pressure level - SPL) and electrical energy (volts - V).

It’s also possible to have a real-time process, where recording, storing and reproducing happen in real time (there’s always a delay, but it can be imperceptible to the brain).

Depending on the type of device used (analog or digital) and the desired goal, the process can be affected by:

Transduction: the term used for capturing one type of energy and converting it into another, like capturing mechanical energy and converting it into electrical energy.

Modulation: the process used to modify an already captured signal. There’s no energy conversion, but there is a process of encoding information, like encoding an analog signal into a digital one.

Let’s imagine that you’re on your smartphone having a conversation with a friend. When you speak, you produce mechanical energy that is captured by a small microphone inside your phone. The words you’re saying are then encoded by your phone (a digital medium) and reproduced by your friend’s phone speaker as mechanical energy. Their ear captures and converts those mechanical vibrations into electrical energy, and the brain makes sense of it, all in real time and without any conscious thought. The full process is much more complex, because it also involves long-distance communications and the brain mechanisms in charge of language. What seems like a simple phone call is actually the opposite: a complex set of several tasks.

So, during the conversation both transduction and modulation were happening.

Transduction happened every time the microphone captured mechanical energy and converted it into electrical energy, and every time the speaker converted electrical energy back into mechanical energy. Although our ears can convert mechanical energy into electrical energy, they only capture mechanical energy.

Modulation happened every time the voice (signal) had to be translated into a different medium (analog or digital). In this case, there was an Analog-to-Digital Conversion (ADC) when the electrical signal (already captured and converted by the microphone) was turned into a digital signal by the phone. The signal was then stored in your phone’s memory (temporarily or not) as a digital representation of your speech. When your friend’s phone speaker reproduces the conversation, it does exactly the opposite: it takes the digital signal and converts it back into an analog one, so the ears are able to capture it. That process is called Digital-to-Analog Conversion (DAC), where a digital signal is turned back into an analog signal. Both the microphone and the speaker membrane need electricity (voltage) to work.

Sound is physical vibration, and those vibrations can be represented as a signal, either analog or digital. That’s how we can record, store and reproduce the sonic characteristics of what’s happening in the real world.

Analog signals are represented by a continuously varying electrical voltage (V), while digital signals are represented by binary code (0s and 1s). Both mediums have different characteristics: digital can be a great medium for storing information essentially without altering it, whereas electrical (analog) systems vary and can have a larger margin of error when representing a signal. For a system to reproduce mechanical energy, such as a speaker, it needs a built-in electrical mechanism that re-creates sound vibrations.

Sound is not only important in the entertainment domain, it’s also vital for communications!

Modern use of sound


Why do most modern applications use digital audio?

With the introduction of electronic components such as microchips and higher computational power, digital processing gained ground in the audio community. There’s an ongoing debate (probably forever) about analog vs digital, but pretty much every modern application uses digital audio. There are pros and cons to each medium, but digital can be very cheap, very stable, easily editable and easy to migrate to other systems. So, in terms of cost / efficiency, fidelity and versatility, digital is the better solution. Since it’s easier to manipulate, it’s also better for crafting an experience.

Don’t forget that at the end of the day, our ear only captures mechanical energy, not binary code.


How is the analog signal modulated into a digital signal?

To modulate an analog signal into a digital one, there’s a process called Analog-to-Digital Conversion (ADC). The problem is: how do you turn a continuously varying electrical voltage into 0s and 1s?

Since the analog signal is a continuous wave, the first step is to get a representation of it in terms of samples, like recreating a line using points (samples). That process is called sampling, and the underlying concept of converting a continuous analog signal into a discrete digital representation is described by the Nyquist-Shannon sampling theorem.
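
As a rough illustration of sampling (a minimal sketch in Python with NumPy; the numbers are arbitrary examples, not tied to any particular converter), the continuous wave is simply read at fixed time intervals, and the Nyquist-Shannon theorem requires the sampling rate to be more than twice the highest frequency you want to capture:

```python
import numpy as np

# Hypothetical example: "sample" a 440 Hz sine wave (A4) at 48 kHz.
sample_rate = 48_000          # samples per second; must exceed 2 x 440 Hz
duration = 1.0                # seconds
freq = 440.0                  # Hz

# Discrete time instants at which the continuous wave is measured
t = np.arange(int(sample_rate * duration)) / sample_rate

# The sampled (still unquantized) signal: one amplitude value per instant
samples = np.sin(2 * np.pi * freq * t)

print(samples[:5])            # first few sample values
```

At 48,000 samples per second, frequencies up to 24 kHz can in principle be represented, which comfortably covers the 20 Hz to 20 kHz hearing range discussed later.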

Then there’s a process called quantization, which determines how many discrete values the amplitude of the analog wave can be mapped onto. The higher the number, the higher the fidelity. That’s why some formats are considered high-fidelity and others aren’t. You can convert the same analog signal into two or more different digital representations with more or less fidelity: one will be almost identical to the analog original (higher fidelity), while the other won’t be as good a representation (lower fidelity). The choice of format depends on the goal, but higher fidelity requires more processing power, which can introduce latency due to the calculations. The available processing power is determined by several factors, an important one being whether the application runs in real time or not. If it’s not a real-time application, a higher-fidelity format will result in a “more realistic” sound. If it is a real-time application, you need to find a good balance between latency and fidelity (quality).
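
Continuing the sketch above (again a hypothetical illustration, not a real converter), quantization rounds each sampled amplitude to the nearest of a fixed number of levels; a 16-bit converter has 2^16 = 65,536 levels, so its rounding error is far smaller than that of, say, a 4-bit one:

```python
import numpy as np

def quantize(signal, bits):
    """Round each sample in the range [-1.0, 1.0] to the nearest of 2**bits levels."""
    levels = 2 ** bits
    step = 2.0 / levels                       # distance between adjacent levels
    return np.round(signal / step) * step     # rounding is what introduces quantization error

x = np.sin(2 * np.pi * np.linspace(0, 1, 100))    # a dummy analog-like signal
hi_fi = quantize(x, bits=16)                       # fine steps: tiny error
lo_fi = quantize(x, bits=4)                        # coarse steps: clearly audible error

print(np.max(np.abs(x - hi_fi)), np.max(np.abs(x - lo_fi)))
```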

After the signal is digitized (binary code), it can be manipulated by digital tools. That stage is known as Digital Signal Processing (DSP). Conceptually the process is the same as converting a physical photo (analog) into a digital representation (pixels), although each field differs due to the nature of its analog signal.


Linear vs non-linear processing

When processing a sound, depending on the tool, it can be done in a linear or non-linear way.


Analog processing is always non-linear, since there are always small imperfections happening in the real physical world. In the digital realm you can have both, and what distinguishes them is that a sound processed by a linear process can be fully restored to its original signal by inverting the process.

Linear process = Predictable

Non-linear = Unpredictable
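
As a minimal sketch of the difference (a hypothetical NumPy example, not any specific tool): a plain gain change is linear and can be undone exactly by applying the inverse gain, while hard clipping is non-linear and discards information that no inverse operation can recover:

```python
import numpy as np

x = np.array([0.1, 0.5, 0.9, -0.8])    # a few sample values of some signal

# Linear process: a simple gain change; fully reversible
gained = x * 0.5
restored = gained / 0.5
print(np.allclose(restored, x))         # True: the original is recovered exactly

# Non-linear process: hard clipping at +/-0.6; not reversible
clipped = np.clip(x, -0.6, 0.6)
print(clipped)                          # 0.9 and -0.8 are gone for good
```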


Modern applications

There’s a variety of modern uses of sound; some will be presented later in greater detail, others won’t, but here’s a list of some modern applications that use sound:
- Music
- Films
- Games
- Digital interfaces
- Advertisement / Branding
- Medicine
- Public infrastructures / transportation
- Virtual reality
- ...

The list is vast and ongoing, but for now we’ll focus on the characteristics of a sound wave, and later on sound applications such as music and films.

Sonic characteristics

Although sound is transmitted through the air by particles vibrating at certain frequencies, there’s actually no information in those particles about where they came from or whether they’re associated with any object. It’s the brain that’s in charge of figuring this out. It’s not the ear that distinguishes the sound of rain from a musical piece, it’s the brain; there’s nothing in the particles that says ‘this is rain; this is music’.

To do that, the brain goes through two processes known as feature extraction and feature integration.

During feature extraction, the brain extracts low-level information using specialized neural networks tied to the sensory organ in use. In this particular case (the ear), the brain extracts several characteristics from a sound wave, such as:
- Pitch
- Timbre
- Spatial location
- Loudness
- Reverberant environment
- Tone durations
- ...

Then, happening in parallel, there’s the process of combining the low-level information gathered by feature extraction and making sense of it, so it can be translated into real-world representations. The process in charge of that is feature integration. It’s a crucial part of recognizing a sound. [3]

During the process of collecting information and making sense of it, the mechanisms of memory are also working.

Now we’ll take a deeper look at the sonic characteristics, the characteristics we are able to manipulate to create and enrich a detailed sonic experience. It’s also important to note that there’s always a trade-off when using tools, and the quality of the recording is what dictates the quality of the final project. Signal processors, even the best ones, won’t improve your signal’s fidelity; at best they’ll preserve it.


Pitch (Hertz - Hz)

“Pitch is a purely psychological construct, related both to the actual frequency of a particular tone and to its relative position in the musical scale.” [3]

The pitch is determined by the frequency of a sound wave (signal). It’s possible to have a pure tone (sine wave), composed of only one pitch, or a signal composed of several pitches (a complex signal). In the real world there are no 100% pure tones; every sound is a complex signal.

Human hearing covers a frequency range that goes roughly from 20 Hz to 20,000 Hz (20 kHz). This range depends on each person, and the older a person gets, the less capacity they have to hear sounds at the high end of the range.

“The range of human hearing is generally 20Hz to 20000Hz, but this doesn’t mean that the range of human pitch perception is the same; although we can hear sounds in this entire range, they don’t all sound musical.” [3]

In simple terms, the frequency range can be divided into three sections: lows, mids and highs.


Lows (20Hz – 250Hz)

Known for deep and powerful sounds, sounds that can be physically felt in areas such as the stomach and chest due to the powerful vibrations. The lower the value, the harder it is to hear / reproduce, and it’ll be perceived as a rumble.


Mids (250Hz - 2000Hz)

This range contains the intelligibility of the majority of instruments as well as the human voice. Unlike the low and high ends of the human hearing range, this is where our ears are most efficient at hearing.


Highs (2000Hz – 20.000Hz)

This is where our brain gets a bigger sense of clarity and detail in the sounds. The upper end of this interval is also known as ‘air’.

There are also signals in nature with pitches lower or higher than what humans can perceive; those are called infrasound and ultrasound, respectively.

Depending on the frequency spectrum, sounds are usually categorized into subjective words to facilitate communication.

Pitch is also a primary driver of how musical emotion is felt: higher notes tend to convey excitement, lower notes sadness.

In terms of manipulating the pitch or the frequency balance of a sound, the main tool is called an equalizer.
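
As a very crude sketch of what an equalizer does (assuming Python with NumPy and a recent SciPy; real equalizers use more refined designs such as dedicated shelving and peaking filters), you can isolate a band with a band-pass filter and add a scaled copy back to boost it, or subtract it to cut:

```python
import numpy as np
from scipy.signal import butter, lfilter

def boost_band(signal, sample_rate, low_hz, high_hz, gain_db):
    """Very rough one-band 'equalizer': boost or cut the band [low_hz, high_hz]."""
    # Design a 2nd-order Butterworth band-pass for the chosen band
    b, a = butter(2, [low_hz, high_hz], btype="band", fs=sample_rate)
    band = lfilter(b, a, signal)                  # the isolated band
    gain = 10 ** (gain_db / 20) - 1               # convert the dB change to a linear amount
    return signal + gain * band                   # add (boost) or subtract (cut) the band

sample_rate = 48_000
t = np.arange(sample_rate) / sample_rate
voice_like = np.sin(2 * np.pi * 300 * t) + 0.3 * np.sin(2 * np.pi * 5000 * t)

brighter = boost_band(voice_like, sample_rate, 2000, 8000, gain_db=6)   # add some "air"
```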


Timbre (tonality)


“Timbre distinguishes one instrument from another when both are playing the same written note.” [3]

When a note (tone) is played on an instrument, the instrument is not reproducing just one isolated pitch. Depending on how the instrument vibrates, the note played will generate additional vibrations, known as overtones. Those vibrations are what distinguish the same note being played on a violin from a piano. Depending on which frequency range the played note sits in (lows, mids or highs), the overtones also change. That’s why our brain can distinguish tonalities, even when they’re played on the same instrument.

‘If you play this note in this range, it really sounds sad’.


“Composers use timbre as a compositional tool; they chose musical instruments – and combinations of musical instruments – to express particular emotions, and to convey a sense of atmosphere or mood.” [3]

The way we perceive the frequency (pitch) range is directly related to loudness. Lower-frequency sounds require much more pressure than higher-frequency sounds to be perceived at the same loudness. This hearing relation between frequency and loudness is known as the Fletcher-Munson curves.

According to these curves, when you listen at quiet levels, the ear is more sensitive to the midrange frequencies. When listening at moderate to loud levels (around 80 dB), it perceives the three frequency ranges (low, mid, high) more equally.


Rhythm


“Technically, any series of sounds or events that has duration can be called a “rhythm”; the greater the number of component durations and the variety of their sequential organization, the more complex the rhythm.” [7]


At first glance, rhythm might look like a simple concept to explain, but there’s more to it than meets the eye. There are four different aspects that affect our temporal music perception and cognition: rhythmic pattern, meter, tempo and timing.


“A rhythm can be considered as consisting of several components, such as rhythmic pattern, meter, tempo and timing.” [7]


Rhythmic pattern is a pattern of durations that can be represented on a symbolic scale, as in notated musical scores.

Meter is based on the beat or pulse that a listener might assign. It’s also known as metrical structure.

Tempo is related to the impression of duration in terms of speed rate. The perception of tempo seems to be related to the metrical structure, or the notion of beat.

Timing is related to the expressive nuances in a rhythm. That expressiveness can be felt as a more “mechanical”, “laid back” or “rushed” sensation, which comes from the fact that some notes are played a bit earlier or later than expected by the imposed rhythm. It’s one of the most important aspects of a musical performance and creates the sensation of flow.

There’s also a phenomenon known as syncopation, when one or several notes are moved in time, avoiding a regular rhythm. It creates sensations that might seem more intriguing and complex, but in reality it’s an interaction between the rhythmic pattern and the meter.


“Beat induction is the cognitive skill that allows us to hear a regular pulse in music and enables our synchronization with it. Perceiving this regularity in music allows us to dance and make music together. As such, it can be considered a fundamental human trait that, arguably, played a decisive role in the origin of music.” [7]


These rhythmic aspects can be manipulated to create interesting patterns that build anticipation / tension or serve a different purpose. In electronic music it’s more common to have a “mechanical” feeling, where the rhythmic pattern is very predictable against the beat, while in classical or more emotional music the rhythmic feel is more “laid back”, to accentuate the emotion rather than the beat.


“Tempo is a major factor in conveying emotion. Songs with fast tempos tend to be regarded as happy, and songs with slow tempos as sad. Although this is an oversimplification…” [3]



Loudness


“Loudness is purely psychological construct that relates to how much energy an instrument creates – how much air it displaces – and what an acoustician would call the amplitude of a tone.” [1]

Loudness is related to the vibration of the particles: the more the particles vibrate, the louder the sound will be perceived. In reality there’s no “sound”, only mechanical energy (vibrations) that is translated by the brain into loudness. Loudness is a psychoacoustic concept; it only happens in the brain.

The vibrations of the particles cause a change in pressure, also known as sound pressure level (SPL). That physical pressure is measured in decibels (dB).

Although decibels are a representation of the pressure / vibrations caused by the particles in the physical world, they don’t map directly onto how the brain perceives loudness. Frequency (pitch) and loudness are related, and one affects the perception of the other.

Loudness can also be measured in different units, depending on the desired application.

The brain’s perception of loudness is logarithmic, not linear. Doubling the power of a sound doesn’t make it sound twice as loud; it only raises the level by about 3 dB.
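
A quick numerical illustration (a simple sketch; the “10 dB feels roughly twice as loud” figure is a common rule of thumb rather than an exact law):

```python
import math

def db_change(power_ratio):
    """Level change in decibels for a given ratio of acoustic power."""
    return 10 * math.log10(power_ratio)

print(db_change(2))     # ~3.0 dB  -> e.g. two identical sources playing together
print(db_change(10))    # 10.0 dB  -> roughly what listeners judge as "twice as loud"
```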

When exposed to loud sounds, our ears have a built-in mechanism that compresses sound in order to protect us from ear damage.

The loudness value when editing audio will depend on the final application. Mixing music requires a different approach than mixing for TV, or mixing for theater.


Loudness war


“The origin of “Loudness envy” is simply psychoacoustic: when two identical programs are presented at slightly differing loudness, the louder of the two appears to sound “better”, and therefore attracts listener’s attention.” [5]


“The louder the better!” It used to be a mantra, and it’s still common, but there’s a big misconception in it. Making a sound louder might make it seem “better” when compared to an identical one, but when both play back at equal loudness, it’s a different story.

If two sounds are 100% identical and differ only in level, then when compared at the same loudness they’ll sound exactly the same. The problem lies in the techniques used to achieve loudness. There’s a ceiling on how loud a sound can be made without damaging it, yet it’s common to see techniques that damage the sound in order to squeeze out more loudness. When both versions are then compared at the same loudness, the “loud” one will sound like it’s lacking quality.

Listeners have a volume knob, but not a quality knob.

“Astoundingly, just 0.2dB made the sound seem bigger, wider, and deeper on an instant comparison. We don’t perceive 0.2dB as louder per se, but we do hear it as a quality difference…The threshold for audible differences could be as low as 0.1dB.” [5]


Reverberation


“Reverberation refers to the perception of how distant the source is from us in combination with how large a room or a hall the music is in.” [3]

Every time a sound plays in a room, no matter how small or large, it generates reflections from the walls, floor, ceiling, objects, people… Those sounds are known as reflections, while the original sound is known as the direct sound (direct signal); it travels straight to your ears without any reflection. The amount of perceived reflections in relation to the direct signal is what dictates how much reverberation is happening.

Less reflections = More clarity on the direct signal, less reverberation

More reflections = Less clarity on the direct signal, more reverberation

The amount of reflections wanted will depend on the purpose of the room / application.


Reflections are also usually divided into two groups:

Early reflections - usually occur from about 5-15 milliseconds up to 100 milliseconds after the direct sound. They contribute to the perception of the depth and direction of the sound (shape and dimension). [5]

Late reflections – occur after the early reflections, creating a diffused sound that defines the size of the space. [5]

Reverberation helps enhance the emotional impact. The size of the room has an impact on our emotional response: smaller rooms are sensed as calmer and more pleasant than large spaces.

Larger rooms, on the other hand, transmit a different feeling; one example is how they enrich the sound of an orchestra playing in a large concert hall. [1]


“These reflections also make the orchestra appear physically wider than it is – an effect called source broadening, which listeners tend to like.” [1]


The acoustical properties of a room are dictated by its size, shape and layout.

Materials also play an important role, depending on their characteristics, because they affect the reflections in the room. Denser materials tend to absorb more energy, which results in fewer room reflections (less reverberation). Each material has a different density value.


“Every time a sound wave bounces or reflects, it loses some energy. The most reverberant spaces have not only smooth walls, but also very simple shapes; this means they are man-made.” [1]


When sound travels through different layers of materials, there will be a loss of energy, and the amount of loss / dissipation depends on the airflow resistivity imposed by the materials. This is an important concept for sound isolation and sound treatment: both aim at reducing the sound’s energy, but for different purposes.


Sound isolation aims at sonically isolating the room, preventing sounds from entering or leaving, while sound treatment aims at managing the room’s acoustical reflections, resulting in the desired acoustical environment.

Hard surfaces tend to reflect more of the sound, making it audible for a longer period of time, while softer surfaces tend to absorb it, reducing reflections.

Like in the visual field, there are also sound illusions that play with our perception. One of the most common psychoacoustic illusions found in contemporary sound applications is artificial reverberation. It shapes the acoustic environment of a sound, making it seem like it’s in a large hall or a small room, even though the listener may be a few meters, or even an inch, from the speakers or headphones. Today it can easily be found when listening to music, watching a film or playing a video game. It creates a bigger sense of immersion, contributing to a better experience.

There are primarily two types of artificial reverberation:

Convolution – based on real-world responses captured in physical places (impulse responses). Tends to sound more realistic.

Algorithmic – based on mathematical algorithms that simulate the acoustics of a space. It is more flexible, since you can easily manipulate its parameters.
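
A minimal sketch of the convolution approach (assuming Python with NumPy/SciPy; the impulse response here is faked for the example, whereas a real convolution reverb would load one measured in an actual space):

```python
import numpy as np
from scipy.signal import fftconvolve

def convolution_reverb(dry, ir, wet_mix=0.3):
    """Blend the dry signal with its convolution against an impulse response."""
    wet = fftconvolve(dry, ir)[: len(dry)]     # room reflections applied to the signal
    wet /= np.max(np.abs(wet)) + 1e-12         # crude level matching for the sketch
    return (1 - wet_mix) * dry + wet_mix * wet

# Hypothetical inputs: 'dry' stands in for a recorded sound, 'ir' for a measured room response
dry = np.random.randn(48_000) * 0.1
ir = np.exp(-np.linspace(0, 8, 24_000)) * np.random.randn(24_000)   # fake IR for illustration
wet_signal = convolution_reverb(dry, ir, wet_mix=0.4)
```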

Although artificial reverberation is an important tool, it’s still hard to replicate ‘real world’ acoustics exactly. If you aim for higher-fidelity sound, recording in a real space (using real reverberation) will sound more realistic than adding artificial reverberation.


When using modern tools to add artificial reverberation, you have the possibility, depending on the tool, of adjusting several parameters until you find the desired acoustical result. A small code sketch further below shows how a few of these parameters can be modeled.

Decay time - the time it takes for the last reflection to be heard

Room size – the size of the room will affect the reverberation

Reverb start / Pre-delay – the time it takes for the early reflections to arrive after the direct signal

Density - controls the number of reflections happening closely spaced in time; higher density tends to create complex soundscapes that can result in a more ‘realistic’ sound

Shape – the shape of the room affects the reflections; by manipulating this parameter you can choose a different shape variation, which results in a different reverberation

Width – allows you to control the width of the reverberation, making it sound wider or narrower (on the horizontal axis)

Diffuseness – determines the directionality of the reflections; higher diffuseness results in reflections going in several different directions, while lower diffuseness results in a more focused set of reflections

Frequency balance – it’s used to shape the frequency balance of the reverberation (Low, Mid, High); depending on the tool, the crossover points between each frequency range can also be adjusted

Some advanced tools also let you change the air roll-off frequency and the air absorption model. These determine at which frequency (Hz) the air starts to absorb the higher frequencies and at what rate. If the air is denser, the absorption will be greater, resulting in reflections with fewer perceived highs, also depending on the decay time.
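
To make a few of these parameters more concrete, here is a toy sketch (purely illustrative, not how any particular reverb plug-in is implemented) that builds a synthetic impulse response from a decay time, a pre-delay and a simple high-frequency damping control, and applies it by convolution as in the earlier example:

```python
import numpy as np
from scipy.signal import fftconvolve, lfilter

def toy_reverb_ir(sample_rate, decay_s=2.0, predelay_ms=20.0, damping=0.3):
    """Build a crude impulse response: exponentially decaying noise with HF damping."""
    n = int(sample_rate * decay_s)
    noise = np.random.randn(n)
    envelope = np.exp(-6.9 * np.arange(n) / n)       # roughly -60 dB by the end of decay_s
    tail = noise * envelope
    # One-pole low-pass on the tail: higher 'damping' removes more highs (a duller room)
    tail = lfilter([1 - damping], [1, -damping], tail)
    predelay = np.zeros(int(sample_rate * predelay_ms / 1000))
    return np.concatenate([predelay, tail])

sample_rate = 48_000
ir = toy_reverb_ir(sample_rate, decay_s=1.5, predelay_ms=30, damping=0.5)
dry = np.random.randn(sample_rate) * 0.1              # stand-in for a recorded sound
wet = fftconvolve(dry, ir)[: len(dry)] * 0.2 + dry * 0.8
```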


Why is the frequency balance important in reverberations?

It’s one of the main aspects that help us perceive the depth of a sound. It’s a psychoacoustic concept: sounds with loud high frequencies tend to sound closer, and sounds with quieter high frequencies tend to sound further away.

More highs = closer distance

Less highs = further distance

When manipulating reflections, be aware that they can mask the original sound if both are played at the same time (direct + reflections). That can result in a loss of definition of the direct signal, since its transient will be masked. One way to prevent this and bring clarity to the sound is to use a pre-delay on the reverberant signal (reflections).

Pre-delay can also help emphasize the time difference between the direct and reflected signal, creating a 3D-like effect by pushing the reverberant sound ‘further away’.


Spatial awareness



“Spatial awareness is the awareness of the surrounding space and the location and position of our own body within it.” [8]


A surrounding space is usually a dynamic environment resulting from the movements of surrounding objects, of the observer, or both. Our sense of awareness is a psychoacoustic phenomenon that correlates with the physical or virtual environment we are located in. This sensation of awareness can be built from different stimuli, but here we’ll focus on auditory spatial awareness.


“Auditory spatial awareness is a three-dimensional (3-D) ability; hearing is the only directional human telereceptor that operates in a full 360º range and is equally effective in darkness as in bright light.” [8]


Auditory spatial awareness results from the human abilities that help us identify the direction of a sound, its distance, the sound source and the characteristics of the physical space that affect sound propagation. The three main psychoacoustic elements that contribute to auditory spatial awareness are: auditory localization, auditory distance estimation and auditory spaciousness assessment.


Auditory localization (Vertical and horizontal axis)

It’s related to the spatial perception of where the sound comes from. This perception is usually done in terms of left-right and up-down.


Auditory Distance estimation (Depth axis)

This is the estimation the listener makes when judging the distance of a sound. In other terms, it’s concerned with the near-far relation, sometimes called depth.


Auditory Spaciousness (Multidimensional)

It is the perception of what type of space the listener is surrounded by. It’s not only dependent on the volume of the space, but also on how sounds interact within that space. Unlike the previous aspects, this one is multidimensional; it doesn’t obey a particular axis representation.


“The human auditory localization ability depends on a number of anatomical and physiological properties of the auditory system as well as on a number of behavioral factors. These properties and behaviors are referred to in the literature as localization cues.” [8]


A lot of these aspects can be manipulated with the respective tools, depending on the speaker / headphone setup. A sound can be represented in space in several ways, such as mono, stereo, 5.1, Atmos and others… Depending on the system used, in theory, the more speakers you can allocate to reproduce a sound, the more options you have to make it immersive. The placement of sounds along the speakers’ sound field axes is done using panning. There are different panning modes, such as balance, dual and binaural, which allow different kinds of control over the position of the sound elements. A small panning sketch follows the list below.

Mono = one source

Stereo = two sources

5.1 = six sources (5 normal speakers, 1 subwoofer)

Atmos = minimum 8 sources (5.1.2 - 5 normal speakers, 1 subwoofer and 2 height speakers)
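
As a small sketch of how panning places a mono source in a stereo field (using a common constant-power sine/cosine pan law; actual panner implementations and multichannel formats are more involved):

```python
import numpy as np

def constant_power_pan(mono, position):
    """Place a mono signal in a stereo field.

    position: -1.0 = hard left, 0.0 = center, +1.0 = hard right.
    Uses a constant-power (sine/cosine) law so perceived loudness stays steady.
    """
    angle = (position + 1) * np.pi / 4           # map [-1, 1] onto [0, pi/2]
    left = np.cos(angle) * mono
    right = np.sin(angle) * mono
    return np.stack([left, right], axis=-1)      # shape: (samples, 2)

mono = np.sin(2 * np.pi * 440 * np.arange(48_000) / 48_000)
stereo = constant_power_pan(mono, position=0.5)  # placed somewhat to the right
```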

The pitch / frequency response of an element also has a psychoacoustic effect on auditory spatial awareness. In general:

Darker sounds tend to sound further away in terms of depth and lower on the vertical axis

Brighter sounds tend to sound closer in terms of depth and higher on the vertical axis


Dynamic Range


“The term dynamic range refers to the difference between the loudest and softest passages of a recording, it should not be confused with loudness or a program’s average level.” [5]

A related measurement, loudness range, is formally defined in the European Broadcasting Union’s recommendation EBU R 128.

Dynamic range can be divided into two categories:

Macrodynamics – is about the loudness difference between two sections of a song / project

Microdynamics – is related to the expression or rhythmic property of smaller nuances, such as transient quality

Knowing how to edit both macro and microdynamics is one of the most important steps in creating a great-sounding experience.

With macrodynamics you can shape the overall emotional journey, by creating contrast and tension between sections. Quieter sections will make louder sections have more impact.

With microdynamics, you can manipulate a sound according to the desired goal by giving it more or less perceived transient impact. The impact of transients is one of the most important things in audio editing. Preserving the microdynamics is vastly important, since that’s where a lot of the excitement of the performance and the acoustical information lies.

There are four possible dynamic range modifications: downward compression, upward compression, downward expansion and upward expansion.

Both downward and upward compression diminish the dynamic range of the sound, hence the term “compressing”: the difference between the soft and loud parts is decreased. Downward compression lowers the loud part of the signal, while upward compression raises the quiet part.

Expansion is the opposite of compression; it works by “expanding” the dynamic range, so the difference between the soft and loud parts is increased.

Compression = Less dynamic range

Expansion = More dynamic range
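
As a sketch of downward compression reduced to a static gain curve (no attack/release smoothing, which real compressors add; the threshold and ratio are hypothetical values):

```python
import numpy as np

def downward_compress(signal, threshold_db=-20.0, ratio=4.0):
    """Reduce level above the threshold: every 'ratio' dB over it comes out as 1 dB."""
    level_db = 20 * np.log10(np.abs(signal) + 1e-12)       # instantaneous level in dB
    over = np.maximum(level_db - threshold_db, 0.0)         # how far above the threshold
    gain_db = -over * (1 - 1 / ratio)                       # attenuation to apply
    return signal * 10 ** (gain_db / 20)

x = np.concatenate([0.05 * np.random.randn(1000),           # quiet passage (left untouched)
                    0.8 * np.random.randn(1000)])            # loud passage (turned down)
compressed = downward_compress(x, threshold_db=-20, ratio=4)
```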

The use of these tools depends on the desired effect and the final application. Modern music is heavily compressed, resulting in less dynamic range. Films, on the contrary, are known for having a high dynamic range, resulting in an experience with more contrast and excitement.


Distortion


Distortion is a non-linear process that alters the sound by generating new pitches / frequencies that were not in the original signal.

The added content can be harmonically related to the original sound or not. If it falls on harmonics of the original signal, it’s called harmonic distortion, which tends to sound musical, since it follows the musical pitches of the signal. If it doesn’t, it’s called non-harmonic distortion (for example intermodulation distortion), which tends to sound more like noise, since it isn’t tied to the musical pitches of the original signal.

Harmonic distortion = musical distortion

Non-harmonic distortion = noise

The distortion effect is usually seen as “artistic”, and both types serve different purposes according to the artistic intent.

Although it’s “artistic”, distortion can also be problematic, damaging the fidelity of the sound. That’s why high-quality audio equipment, in order to stay faithful to the signal, should not introduce any type of distortion. The measure for this is total harmonic distortion (THD), which represents how much distortion a piece of audio equipment introduces. Lower THD means higher-fidelity equipment, which translates into better sonic clarity. In the digital realm this is not an issue, since processing can be “fully transparent”, without any THD.

Although distortion is a non-linear process, it can happen in both mediums, analog or digital. There are digital tools that purposely produce distortion or try to emulate analog devices. Moreover, some signal processors, regardless of the medium, will produce non-linear distortion, such as clipping, compression or saturation…
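
As a sketch of how a non-linear stage generates harmonic distortion (a generic tanh soft-clipping curve, not a model of any specific analog device):

```python
import numpy as np

def soft_clip(signal, drive=4.0):
    """Tanh waveshaper: the harder you drive it, the more harmonics it adds."""
    return np.tanh(drive * signal) / np.tanh(drive)

sample_rate = 48_000
t = np.arange(sample_rate) / sample_rate
pure_tone = np.sin(2 * np.pi * 220 * t)          # a single pitch going in

saturated = soft_clip(pure_tone, drive=6.0)      # comes out with odd harmonics added

# The new content shows up as energy at 660 Hz, 1100 Hz, ... in the spectrum
spectrum = np.abs(np.fft.rfft(saturated))
```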


What's music?


Our brains have the capacity to process vibrations according to their sonic characteristics to create the perception of what we call “music”.

Think of music as patterns that establish a sense of familiarity and interact with each other, yet those patterns create anticipation / tension that ends with a sense of resolution. This fits the majority of music, but there’s also a small percentage that avoids creating familiarity, tension and even resolution: it’s like creating a sequence of patterns that barely have anything in common and don’t reach a particular resolution. Atonal music is one example.

Depending on the musical genre, the approach to composing will be different, but we’ll explore some classical music concepts in an abstract way, to show how music is developed and interconnected. In classical music, composers are known for exploring thematic ideas. What are thematic ideas? They are musical concepts, such as a simple melody, that translate the composer’s narrative and the desired feeling.

From that thematic idea, usually the melody, the composer will figure out other elements to work alongside it and shape the narrative and feeling. Melodies can be short or long musical phrases and are usually built from a motif. A motif is like a small piece of the puzzle, recognizable by its harmonic or rhythmic content, that reappears throughout the music. It’s a building block of a thematic idea and can appear in a melody or in a harmonic element; that depends entirely on the composer’s idea. To support the melody, one or more harmonic elements are usually used. Harmonies can be simple or complex in how they interact with other elements, such as melodies. It’s also possible to have more than one melody playing at the same time, known as counterpoint: an independent melody that also works harmonically with the main melody. There’s another supportive element called a counter-melody, a musical phrase that is not as prominent as the melody but supports it in certain passages.

A piece of music can be made of several thematic ideas that evolve through variations of the main thematic idea (concept). Those variations can happen in several elements, such as: motif, melody, harmony, rhythm, dynamics, tempo, orchestration, articulation…

This is very similar to a designer’s approach, where everything is connected to the main concept, but several interconnected elements around it evolve into different elements, conveying different emotions according to the composer’s (designer’s) intent.

This type of process is easier to perceive in film soundtracks, which have a strong narrative component where music must play a supporting role. Some composers like to work with thematic ideas that mirror characters or aspects of the film. This also helps create thematic interactions and variations that follow the film’s interactions.

Sometimes the music is so well crafted that it transcends the film itself, no longer needing the visual aspect to support it. A great example of this achievement is the composer John Williams: a lot of his music transcends the films it was written for.

Listen closely to his piece called Hedwig’s Theme and you’ll hear melodic themes and harmonies being used and re-used throughout, with variations and complex interactions, yet the music perfectly conveys a magical and mysterious feeling while creating a great narrative development.

Not everyone can write a beautiful melody, but everyone (or at least the vast majority) can sing one. The brain has the ability to distinguish the notes, even if you don’t know them in the language of music theory. Music theory is a tool to communicate with musicians and composers, but it’s not music itself; it’s just a tool to convey information. Every genre usually has its set of rules, according to the genre’s music theory.

Music and emotions

“…through a lifetime of exposure to musical idioms, patterns, scales, lyrics and the associations between them. Each time we hear a musical pattern that is new to our ears, our brains try to make an association through whatever visual, auditory and other sensory cues accompany it; we try to contextualize the new sounds, and eventually, we create these memory links between a particular set of notes and a particular place, time, or set of events.” [3]

I’ve always been intrigued by how music can be so powerful at conveying emotions. I even wondered whether such feelings were mainly due to a genetic predisposition or to personal experiences. Although music is largely a form of entertainment and a lot of people enjoy it, for me it has always run deeper than what the vast majority seems to feel.

When listening to music, our brain is very active, with several brain regions interacting with one another. This also results in an increase of blood flow to brain regions in control of emotions. While listening to music, our brain also releases dopamine, making us feel “rewarded”, which can also decrease pain perception. [6]

One of the most curious things that happens when listening to certain pieces of music is the intense emotional and psychophysiological response known as frisson. Some people might know it as chills, or thrills, but in the scientific community there’s a debate on how to describe such concepts, and frisson seems more accurate.

Frisson is a “musically induced affect that shows close links to musical surprise and is associated with a pleasant tingling feeling, raised body hairs and gooseflesh.” [9]

This feeling is not only dependent on western musical culture, since many cultures around the world also perceive music as a full-body experience.

Music has the ability to activate the autonomic nervous system, mainly when loud, very high, low-frequency or rapidly changing sounds occur. These properties increase heart rate, respiratory depth and skin response. There’s a mechanism in our brain that makes us feel sad when listening to a sad song, even if initially we were not sad; we mirror the song’s feeling.

One way to induce frisson is through an expectancy violation: something that is musically expected but not exactly delivered, yet creates a peak emotional experience. You can achieve that by creating / manipulating variations of musical elements such as harmony, rhythm, melody and so on... It’s always important to keep the concept / narrative in mind when developing these variations, otherwise the violation might be too abrupt and worsen the experience.

Here’s a small list of pieces where I personally feel frisson when listening to them:

Max Richter – Infra 5

Max Richter – Spring 1

Ólafur Arnalds - Spiral


Sound for film (The big illusion)


Sound has a very important role in films, but it’s usually not as well recognized as it should be. Have you ever watched a muted film? What happens if you mute your favorite film? It’s easy to see that the experience can become boring or lose your interest. Sound’s role is so important that it binds the narrative with the picture, creates immersion and adds a dimension of realism.

Although most viewers don’t know it, sound for film is a big illusion.

The whole process happens in three phases: pre-production, production and post-production.

In the pre-production phase, the person / team in charge of the sound will study the recording locations, create and develop concepts and ideas about the overall sound according to the narrative / director’s vision… It’s mostly about planning, but it also allows field recording to test ideas.

During the production phase of a film (the shoot), there are too many noises going around, from people, to ambience, to equipment and so on… To avoid capturing those noises, recordists usually use a microphone that picks up sound in the direction it’s pointed and rejects everything else. Each microphone can capture a different sound field, called its polar pattern; the microphones used during shoots are typically hypercardioid. The good thing is that they keep the dialogue relatively clean, but they miss other important information, such as ambiences, objects and so on… Ideally, the person in charge of recording should separately record additional sounds that can be important during the post-production (editing) phase, such as ambiences and sound effects… Each type of sound requires a different microphone approach depending on the goal. For dialogue, one microphone recording a mono signal is enough, but for ambiences that would not be sufficient: ambiences add realism and dimension, and would require multiple microphones recording in an immersive format such as stereo, 5.1 and so on… Although this layering technique is standard procedure, it won’t sound as realistic as recording everything simultaneously, due to how reverberation affects the sound.

After planning and recording, it’s time for the post-production phase, which is in charge of editing all the sound. This phase is also more complex than meets the eye and can be divided into several roles, such as: dialogue editor, foley artist, sound designer, re-recording mixer (dubbing mixer)...

This phase is where the tools described above are used to help shape the experience, bringing it to the next level!

This phase is a mix of technical and creative approaches. Technical, because the recorded sounds need to be cleaned and corrected, while keeping everything within the loudness standards. Creative, because it allows exploring sonic ideas that improve the narrative and enhance the flow of the picture. It’s not a linear process, because technical knowledge is also needed to experiment with creative solutions, and a creative solution (idea) can happen at any time. It’s better to test an idea, even if it doesn’t sound great yet, than to lose it; you can always improve the technical aspect later.

This is the phase where the illusion takes place, and the post-production audio department will make sure that every layer of sound serves its purpose, then join them all together to create a cohesive experience. The viewer won’t notice that the vast majority of the sounds were introduced after the shoot, but they’ll notice the emotional impact of all those sounds.

Like in the visual field, where there are several capturing / editing techniques, exactly the same happens in the sound field. One of the techniques that differs the most between practitioners is the use of noise reduction. Some prefer a more artificial approach, reducing more noise, while others prefer to keep it more realistic by reducing less. It always depends on the scene and the quality of the recording, but I personally prefer the more realistic approach, where noise reduction is kept to a minimum. After all, we do live in a noisy world and are used to it.


Sound is extremely valuable in terms of enriching an experience.


Nowadays, with the advancement of AI, the sound industry is starting to see major innovative solutions appear. Some of these solutions are very creative; others try to automate older processing tools / processes.


It’s important to remember that critical listening is crucial to judge what you hear.


In order to obtain the most accurate listening situation, you need a properly treated acoustic space, a great set of monitors and converters, and a lot of experience in active listening. The market is full of “automated and innovative solutions”, but be aware that a lot of them can do more damage than actual good. The best solutions tend to come from experienced professionals with a background in physics, electronics, mathematics and software development.


Emotion is everything!

Understanding the technical aspects is very important to be able to accurately achieve the desired emotion, but the sense of feeling is even more crucial for a meaningful and impactful experience.








Looking for audio tips?

https://www.enhancingsound.com/audio-post-tips

https://www.digido.com/articles/


Looking for more specific sound theory?

https://www.sfu.ca/sonic-studio-webdav/cmns/Handbook%20Tutorial/index.html

https://digitalsoundandmusic.com

https://ethanwiner.com


Looking for audio post production services?

https://www.enhancingsound.com

Sources:

  • [1] - The Sound Book: The Science of the Sonic Wonders of the World - Trevor Cox

  • [2] - Evolution of the Mammalian Ear: An Evolvability Hypothesis - Anne Le Maître, Nicole D. S. Grunstra, Cathrin Pfaff, Philipp Mitteroecker

  • [3] - This Is Your Brain on Music: The Science of a Human Obsession - Daniel J. Levitin

  • [4] - How the Ear Works - Nature's Solutions for Listening - William E. Brownell

  • [5] - Mastering Audio: The Art and the Science - Bob Katz

  • [6] - https://www.pfizer.com/news/articles/why_and_how_music_moves_us

  • [7] - Structure and Interpretation of Rhythm in Music - Henkjan Honing

  • [8] - Auditory Spatial Perception: Auditory Localization - Tomasz R. Letowski, Szymon T. Letowski

  • [9] - Thrills, chills, frissons, and skin orgasms: toward an integrative model of transcendent psychophysiological experiences in music - Luke Harrison, Psyche Loui