Comparative Analysis of Songs through Audio Visualization in Python


In the world of music, there is an undeniable connection between sound and visuals. From album covers to music videos, artists have long recognized the power of visual representation to enhance the listening experience. However, what if we could go beyond the surface and delve into the very essence of a song through audio visualization? In this article, we will explore the concept of audio visualization and how it can be used to analyze and compare songs using Python.

Understanding Audio Visualization

Audio visualization is the process of representing sound in a visual form. It allows us to see the various components of a song, such as frequency, amplitude, and time, in a graphical format. By visualizing audio, we can gain insights into the structure, dynamics, and overall characteristics of a piece of music.

The Power of Python

Python, a versatile programming language, offers a wide range of libraries and tools for audio analysis and visualization. One such library is librosa, which provides a simple interface to extract useful information from audio signals. By leveraging Python and librosa, we can perform comparative analysis of songs and uncover interesting patterns and relationships.

Extracting Audio Features

Before we can visualize songs, we need to extract relevant audio features.

What is Audio Feature Extraction?

Imagine you’re trying to describe a painting to someone. Instead of describing every single stroke, you’d probably focus on the major features: “It has a big yellow sun, a blue river flowing through the middle, and a red house on the left.” By doing this, you’re extracting the most noticeable and relevant features of the painting. Audio feature extraction is a way of making computers do something similar with audio files.

Mel-frequency cepstral coefficients (MFCCs)

In simple terms, Mel-frequency cepstral coefficients are a way of describing sounds to a computer. The audio is broken into short frames, and the frequency content of each frame is measured on the mel scale, which mimics how we humans hear different pitches. Some frequency ranges matter more than others, just as our ears resolve certain frequencies more finely. The computer then condenses each frame into a small set of numbers, creating a compact code that helps it recognize and differentiate sounds. MFCCs are used in speech and music technology for tasks such as recognizing emotion in voices.

MFCCs essentially transform an audio signal into a set of features that capture the “character” of the sound in a way that’s meaningful and relevant to human hearing. For speech and voice recognition tasks, MFCCs help the computer to focus on the parts of the sound that matter most to us.

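MFCCs can also be used to represent the timbre of a musical instrument. A related but separate feature is spectral contrast, which measures the difference in amplitude between peaks and valleys in the frequency spectrum. High contrast generally points to clear, narrow-band sounds, while low contrast points to broadband, noise-like sounds. In this article we focus on MFCCs.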

MFCC extraction is rather straightforward using librosa:

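import librosa

# Load the audio file (by default librosa resamples to 22,050 Hz and mixes down to mono)
y, sr = librosa.load('shape_of_you.wav')

# Compute the MFCCs; the result has shape (n_mfcc, n_frames)
mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=4)
print(mfccs.shape)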

n_mfcc=4 means that for each frame of the audio, we want 4 coefficients to describe its spectral characteristics. We use 4 here only to keep the plots easy to read. A more conventional number is 13 (librosa itself defaults to 20), since the first 13 coefficients are usually sufficient to represent the timbral texture of an audio signal for many tasks. Depending on the application, you might choose to extract more or fewer coefficients. Feel free to experiment!
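To make the size of the result concrete, here is a small sketch of how many frames to expect for a given clip. It assumes librosa's default hop length of 512 samples and its default centered framing, under which the frame count is 1 + n_samples // hop_length. The helper name expected_mfcc_shape is ours for illustration and is not part of librosa.

```python
def expected_mfcc_shape(n_samples, n_mfcc, hop_length=512):
    """Shape of the MFCC matrix for a signal of n_samples samples.

    Assumes centered framing (librosa's default), so one frame
    starts every hop_length samples.
    """
    n_frames = 1 + n_samples // hop_length
    return (n_mfcc, n_frames)

# A 30-second clip at 22,050 Hz
n_samples = 30 * 22050
shape = expected_mfcc_shape(n_samples, n_mfcc=4)
print(shape)  # (4, 1292)
```

Each column of the matrix therefore covers roughly 23 milliseconds of audio (512 / 22,050 seconds), which is why a full song produces a heatmap thousands of columns wide.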

Visualizing Audio Features

Once we have extracted the audio features, we can visualize them to gain a better understanding of the songs. Python offers various libraries for creating visualizations, such as Matplotlib and Seaborn. These libraries allow us to generate plots, histograms, and heatmaps to represent the audio features in a meaningful way.

For instance, we can create a spectrogram, which displays the frequency content of a song over time. By analyzing the spectrogram, we can identify different sections of a song, such as verses, choruses, and bridges, based on their distinct frequency patterns. We can also compare the spectrograms of different songs to identify similarities or differences in their overall structure.
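To show what a spectrogram actually contains, here is a minimal sketch that builds one with NumPy alone. In practice you would more likely call librosa.stft and librosa.amplitude_to_db and plot the result with librosa.display.specshow; this version skips the padding librosa applies and uses a synthetic 440 Hz tone instead of a song so that it runs without an audio file. The function name spectrogram_db is our own.

```python
import numpy as np

def spectrogram_db(y, n_fft=2048, hop_length=512):
    """Magnitude spectrogram in decibels, shape (1 + n_fft // 2, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop_length
    # Slice the signal into overlapping, windowed frames
    frames = np.stack([
        y[i * hop_length : i * hop_length + n_fft] * window
        for i in range(n_frames)
    ])
    # One FFT per frame; transpose so rows are frequency bins
    magnitude = np.abs(np.fft.rfft(frames, axis=1)).T
    # Convert to decibels relative to the loudest bin
    return 20 * np.log10(np.maximum(magnitude, 1e-10) / magnitude.max())

# Synthetic test signal: one second of a 440 Hz tone
sr = 22050
t = np.arange(sr) / sr
y = np.sin(2 * np.pi * 440 * t)

S_db = spectrogram_db(y)
freqs = np.fft.rfftfreq(2048, d=1 / sr)
peak_hz = freqs[S_db.mean(axis=1).argmax()]
print(S_db.shape, round(peak_hz, 1))
```

The strongest row lands within one frequency bin of 440 Hz. Passing S_db to a heatmap or to plt.imshow with origin='lower' gives the familiar picture of frequency on the vertical axis and time on the horizontal axis.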

The visualization part of the code:

import seaborn as sns
import matplotlib.pyplot as plt

# Use Seaborn to visualize the MFCC matrix computed above
# (rows are coefficients, columns are frames in time order)
plt.figure(figsize=(10, 4))
sns.heatmap(mfccs, cmap='viridis', yticklabels=False, xticklabels=False)
plt.title('MFCCs of the audio: Shape of You')
plt.xlabel('Time (frames)')
plt.ylabel('MFCC Coefficients')
plt.tight_layout()
plt.show()

Comparative Analysis of Songs

Now that we have the tools to visualize audio features, we can perform comparative analysis of songs. By comparing the visual representations of different songs, we can uncover interesting insights and patterns.

For example, we can compare the MFCCs of two songs to determine their similarity in terms of timbre. If the MFCC patterns of two songs are similar, it suggests that they share similar instrumental characteristics. On the other hand, if the MFCC patterns are different, it indicates a variation in the timbre of the songs.
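Comparing heatmaps by eye can be backed up with a number. One simple option, sketched here as an illustration and not as the only or best method, is to summarize each song's MFCC matrix as a vector of per-coefficient means and standard deviations and then take the cosine similarity of the two vectors. The helper names and the random stand-in matrices below are assumptions; in real use the matrices would come from librosa.feature.mfcc.

```python
import numpy as np

def summarize(mfccs):
    """Collapse an (n_mfcc, n_frames) matrix into one vector:
    the mean and standard deviation of each coefficient over time."""
    return np.concatenate([mfccs.mean(axis=1), mfccs.std(axis=1)])

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-in MFCC matrices (hypothetical data, not real songs)
rng = np.random.default_rng(0)
loc_a = np.array([[-200.0], [100.0], [10.0], [30.0]])
loc_c = np.array([[-50.0], [-20.0], [60.0], [-40.0]])
song_a = rng.normal(loc=loc_a, scale=5.0, size=(4, 500))
song_b = song_a[:, ::2] + rng.normal(scale=1.0, size=(4, 250))  # close to song_a
song_c = rng.normal(loc=loc_c, scale=20.0, size=(4, 400))       # quite different

sim_ab = cosine_similarity(summarize(song_a), summarize(song_b))
sim_ac = cosine_similarity(summarize(song_a), summarize(song_c))
print(round(sim_ab, 3), round(sim_ac, 3))
```

Because the summary averages over time, the two songs do not need the same length. The first coefficient usually has a much larger magnitude than the rest and tends to dominate the score, so you may want to standardize the coefficients or drop the first one before comparing.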

Case Study: Comparative Analysis of Pop Songs

To illustrate the power of audio visualization in comparative analysis, let’s consider a case study of analyzing pop songs. We will compare the audio features of two popular pop songs: “Shape of You” by Ed Sheeran and “Bad Guy” by Billie Eilish.

By visualizing the spectrograms of these songs, we can observe the differences in their overall structure. “Shape of You” exhibits a more consistent frequency pattern throughout the song, indicating a repetitive structure. On the other hand, “Bad Guy” shows more variation in its frequency patterns, suggesting a more dynamic structure.

By comparing the MFCCs of these songs, we can identify differences in their timbre. “Shape of You” has a smoother and more uniform MFCC pattern, indicating a consistent timbre. In contrast, “Bad Guy” displays a more jagged and varied MFCC pattern, suggesting a diverse timbre.
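Words like "smooth" and "jagged" describe a visual impression, and one way to put a rough number on that impression is to measure how much each coefficient changes from one frame to the next. The sketch below does this on two stand-in matrices, not on the actual songs; the function name frame_to_frame_change is hypothetical, and the choice of mean absolute difference is just one reasonable option.

```python
import numpy as np

def frame_to_frame_change(mfccs):
    """Mean absolute change between consecutive frames, one value per coefficient."""
    return np.abs(np.diff(mfccs, axis=1)).mean(axis=1)

# Stand-in matrices (hypothetical data): one drifts slowly, one jumps around
rng = np.random.default_rng(1)
smooth = np.cumsum(rng.normal(scale=0.1, size=(4, 300)), axis=1)
jagged = rng.normal(scale=5.0, size=(4, 300))

change_smooth = frame_to_frame_change(smooth)
change_jagged = frame_to_frame_change(jagged)
print(change_smooth.round(2), change_jagged.round(2))
```

Running the same function on the MFCC matrices of two real songs gives a per-coefficient figure that can be compared directly, with higher values corresponding to the more jagged heatmap.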


Audio visualization in Python offers a powerful tool for comparative analysis of songs. By extracting and visualizing audio features, we can gain insights into the structure, dynamics, and overall characteristics of music. Whether it’s comparing the timbre of different songs or contrasting their spectrograms, audio visualization allows us to go beyond the surface and explore the very essence of music. With Python and libraries like librosa, the possibilities for audio analysis and visualization are endless. So, let’s dive into the world of audio visualization and uncover the hidden gems within our favorite songs.
