DEV Community

ECecillo

Workshop ⚒️ - Create sound with sin(x) 🎧

Before we start

Hello 👋, this article is part of a series on signal processing and is an opportunity for me to document the different concepts I had to grasp to code my audio encoder.

If this subject interests you and you want to learn more, I invite you to follow me or visit this post that I update as I write new articles 😁.

I am open to any remarks or suggestions for improvement; feel free to share your feedback in the comments to help enrich this series of articles 💪.

Happy reading 😉.

Introduction

What I propose today is a small workshop: creating a sound from a few basic mathematical formulas. It builds on the notions we have seen previously, plus new ones that will complete our understanding of analogue signals and digitisation.

The goal of this article is not to code a complex program but to illustrate concepts.

I will only give you the algorithms you need to do it yourself, and at the end, I will share a sandbox where you can directly run the program I coded on my side in the language of my choice 😉.

There will be no copy-paste in this article, so you will need to be at least comfortable with the language of your choice.

Settle in comfortably, grab something to drink ☕️, and if you're all set, let's go 🚀!

Code part

Creating a Curve and Retrieving Points

We saw that analogue signals are fundamentally built from sinusoidal curves.
Therefore, we will use the function sin(x) for this.

Then, once we have our signal curve, we will need to digitise it. Usually, to go from an analogue to a digital signal, small electronic components already do the work for us: ADCs (Analog-to-Digital Converters).

Analog-to-Digital Converter image

A small illustration of an ADC that one can find on the internet 😉.

However, for simplicity, we will not go through all the steps needed to digitise an analogue signal, because for a sin(x) curve, it would be a bit overkill 😅.

We will detail the digitization process in another article.

To play a sound, we need values computed from sin(x). To get them, we will use a process called sampling.

Sampling

Sampling is the first step in analog-to-digital conversion (digitization).
It consists of taking "pictures" or samples of the analogue signal at regular intervals to be able to retrieve the original signal from key points.

This step is very important since the rest of the digitization will be based on these data!

The frequency at which these samples are taken is called the sampling frequency.
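To make this concrete, here is a minimal JavaScript sketch (JavaScript being the language of the Bun sandbox at the end of the article). `sampleSine` is a hypothetical helper written for illustration, not code from the sandbox: it takes the n-th "picture" of a sine wave at the instant t = n / sampleRate, with an optional time offset.

```javascript
// Hypothetical helper (not from the article's sandbox): evaluate sin(2π · signalHz · t)
// at regular instants. The n-th sample is taken at t = offsetSeconds + n / sampleRate.
function sampleSine(signalHz, sampleRate, count, offsetSeconds = 0) {
  const samples = [];
  for (let n = 0; n < count; n++) {
    const t = offsetSeconds + n / sampleRate; // instant of the n-th "picture", in seconds
    samples.push(Math.sin(2 * Math.PI * signalHz * t));
  }
  return samples;
}

// A 1 Hz sine sampled 4 times per second: one sample every 0.25 s.
console.log(sampleSine(1, 4, 4).map((x) => x.toFixed(3)).join(" "));
// prints: 0.000 1.000 0.000 -1.000
```

With a sampling frequency of 4 Hz, the four samples land on the start, the peak, the middle and the trough of the 1 Hz cycle.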

But wait, I've already heard of frequency, does it correspond to this then?

No, so let me clarify the difference between the two and their impact on the sound we perceive, to avoid any confusion 😁.

Frequency of an Analog Signal

The frequency of an analogue signal indicates how many cycles occur in one second.
For example, a frequency of 1 Hz means that the wave takes one second to complete one cycle.

Increasing the frequency raises the pitch: the higher the frequency, the higher-pitched the sound we hear.

You can find examples here: Examples of the impact of frequency on sin(x)

We will see in another article the relationship between frequency and musical notes 🎼.

Sampling Frequency

The sampling frequency indicates the number of samples we take per second to represent the analogue signal digitally.
The higher the sampling frequency, the more precise the digital representation will be.

Frequency of an Analog Signal and Sampling Frequency

Analogy with sound frequency and sampling rate

Understanding the Correlation Between Wave Frequency and Sampling

  • Frequency of a signal: It determines the number of zigzags or cycles the signal has over a certain distance.
  • Sampling frequency: It determines how frequently we place points along these zigzags to capture the signal's information.

If you increase the signal frequency (more zigzags) but keep the sampling frequency constant (the markers are always the same distance apart), then yes, you will cross more zigzags, but you will still cross the same number of markers over a given distance.

In terms of audio, this means that if you keep a constant sampling frequency (say 44.1 kHz, i.e. 44,100 samples per second) but increase the frequency of the wave you generate, you get more wave cycles in the same period (a higher-pitched sound), while the number of samples per second stays the same. Each cycle is therefore described by fewer samples.

According to the Nyquist-Shannon theorem, the sampling frequency must be more than twice the highest frequency present in the analogue signal to avoid aliasing: frequencies above half the sampling frequency do not simply disappear, they fold back and show up as lower, false frequencies that distort the signal.
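A small sketch of these ideas, assuming a pure sine wave: with a fixed sampling frequency, a wave gets sampleRate / signal frequency samples per cycle, the theorem asks for strictly more than 2 of them, and a tone that breaks the rule shows up at a folded-back alias frequency. The three helpers are hypothetical names for illustration; the alias formula is the usual folding rule for a single tone.

```javascript
// Hypothetical helpers for illustration, assuming a pure sine wave.

// With a fixed sampling frequency, a faster wave gets fewer samples per cycle.
function samplesPerCycle(signalHz, sampleRate) {
  return sampleRate / signalHz;
}

// Nyquist-Shannon: strictly more than 2 samples per cycle,
// i.e. sampleRate > 2 * signalHz.
function isNyquistSafe(signalHz, sampleRate) {
  return samplesPerCycle(signalHz, sampleRate) > 2;
}

// Apparent frequency after sampling: the tone "folds back" to its distance
// from the nearest multiple of the sampling frequency.
function aliasFrequency(signalHz, sampleRate) {
  return Math.abs(signalHz - sampleRate * Math.round(signalHz / sampleRate));
}

const sampleRate = 44100;
for (const hz of [441, 4410, 22050, 30000]) {
  console.log(
    `${hz} Hz: ${samplesPerCycle(hz, sampleRate).toFixed(2)} samples/cycle,`,
    `safe=${isNyquistSafe(hz, sampleRate)}, heard as ${aliasFrequency(hz, sampleRate)} Hz`
  );
}
// 30000 Hz is above 22050 Hz (half of 44.1 kHz), so it is reported as heard at 14100 Hz.
```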

However, the more we increase the number of samples per second, the larger our file size will be because we need to store much more information.

Illustration of the Nyquist-Shannon Principle

The main idea of the theorem is that we must be able to place more than 2 points per cycle on a signal to identify its curve.

If we take the following example, where we have a signal of 20 Hz with a sampling frequency of 30 Hz, then we might end up with the following result:

Undersampling graph with sin function

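Some cycles get only one point 😬. Worse, assuming the first sample is taken at t = 0, our points are exactly those of a slower 10 Hz sine wave turned upside down, so from the samples alone we cannot tell the two signals apart: this is aliasing.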
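We can check this numerically with a short sketch, assuming the first sample is taken at t = 0: sampled at 30 Hz, a 20 Hz sine gives the same values as a 10 Hz sine turned upside down (up to floating-point rounding), because sin(4πn/3) = -sin(2πn/3) for every integer n.

```javascript
// Assumption: the first sample is taken at t = 0 (no sampling phase).
const signalHz = 20;
const sampleRate = 30;

let maxDiff = 0;
for (let n = 0; n < sampleRate; n++) { // one second of samples
  const t = n / sampleRate;
  const original = Math.sin(2 * Math.PI * signalHz * t); // the real 20 Hz signal
  const alias = -Math.sin(2 * Math.PI * 10 * t);         // a 10 Hz sine, upside down
  maxDiff = Math.max(maxDiff, Math.abs(original - alias));
}

// Only floating-point rounding separates the two sets of samples.
console.log(`largest difference: ${maxDiff.toExponential(1)}`);
```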

This time, let's see what happens with a sampling frequency of 80 Hz:

Good sampling graph of sin function

A very important thing to observe here is that the phase of the signal is zero (this is important for later).

The samples represented by our red crosses are taken every 12.5 ms, which we calculate from:

$$\frac{1}{\text{sampling frequency}} = \text{time between two samples (in seconds)}$$

Here, it is therefore 1 / 80 Hz = 0.0125 s, or 12.5 ms after conversion to milliseconds.
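The same conversion as a one-liner, in a hypothetical `samplingPeriodMs` helper (the factor 1000 converts seconds to milliseconds):

```javascript
// Hypothetical helper: time between two samples, in milliseconds.
function samplingPeriodMs(sampleRate) {
  return 1000 / sampleRate; // 1 / sampleRate seconds, times 1000 ms per second
}

console.log(samplingPeriodMs(80));    // 12.5
console.log(samplingPeriodMs(44100)); // about 0.0227
```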

But wait, why did we take four times the wave frequency here? Wouldn't twice be enough?

If we had taken a sampling frequency of 40 Hz, here's what it would have looked like:

Bad sampling because of the phase

Strange, right? Why are all our points at zero here?

Because of our phase value!

Yes, it can skew our sampling and make us believe that our original signal is flat.

Sampling at exactly twice the signal frequency sits right on the limit of the theorem, and this example shows why it is not enough. It also illustrates why choosing the sampling frequency and a value of sampling phase (not to be confused with the signal's phase value) can be important.

Sampling Phase

Just as the phase tells us where our signal starts, the sampling phase tells us when (at what time offset) we take our first sample. By creating a small offset, our samples no longer land exactly where the signal crosses zero, so we are not impacted by the signal's phase value.

If I now define an offset of 1 ms for our sampling phase, I should start at a moment where our curve is not at an amplitude of 0.

Graph with adjusted sampling phase

Great! We can now manage to observe when our curve seems to rise and fall.
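Here is a sketch that reproduces both situations, assuming a 20 Hz sine sampled at exactly 40 Hz: without an offset, every sample lands on a zero crossing, while a 1 ms sampling phase moves the samples away from those points. `sampleSine` is a hypothetical helper written for this illustration. The offset samples come out at about ±0.125 because sin(2π · 20 · 0.001) = sin(0.04π) ≈ 0.125.

```javascript
// Hypothetical helper: n-th sample taken at t = offsetSeconds + n / sampleRate.
function sampleSine(signalHz, sampleRate, count, offsetSeconds = 0) {
  const out = [];
  for (let n = 0; n < count; n++) {
    const t = offsetSeconds + n / sampleRate;
    out.push(Math.sin(2 * Math.PI * signalHz * t));
  }
  return out;
}

const fmt = (xs) => xs.map((x) => x.toFixed(3)).join(" ");

// 20 Hz signal, 40 Hz sampling, no sampling phase: every sample is (numerically) 0.
// Rounding noise may print as -0.000.
console.log(fmt(sampleSine(20, 40, 6)));

// Same, with a 1 ms sampling phase: the samples alternate around +0.125 and -0.125.
console.log(fmt(sampleSine(20, 40, 6, 0.001)));
```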

OK, I understand why we need a high frequency, but I see the frequency 44.1 kHz everywhere, why do we use this sampling frequency?

The Human Ear and Science

Humans are capable of hearing sounds with frequencies between roughly 20 Hz and 20 kHz.
As we saw earlier, the Nyquist-Shannon theorem indicates that the sampling frequency must be more than twice the highest frequency of the signal.

So 2 * 20 kHz (because 20 kHz is the highest frequency that the human ear can perceive) gives us 40 kHz.

We're not far from our 44.1 kHz, but why do we have 4.1 kHz extra?

There are several reasons for this, but the main one today is the following:

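  • Margin for anti-aliasing filters:
    • At 44.1 kHz, the highest frequency we can represent (half the sampling frequency) is 22.05 kHz. The extra 4.1 kHz therefore leaves a 2.05 kHz band, between 20 kHz and 22.05 kHz, where filters can progressively eliminate the frequencies beyond the audible bandwidth (wave frequencies higher than 20 kHz) before sampling.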

We will discuss the concept of psychoacoustics a bit later when we talk about quantization and signal filtering 😉.

Hey! I wanted to create a sound, not read a lecture on signal processing!

Now we should have almost all the elements to compute the values on our sin(x) curve and listen to the sound it generates!

Sampling with sin(x)

We will need the following values:

  • The frequency of our signal.
  • The sampling frequency.
  • The desired duration for our signal.

Today, we will play the pitch A440 (standard pitch), which corresponds to a signal frequency of 440 Hz, and we will sample it at 44.1 kHz because we can afford it, right!

Lastly, I decided that our signal would last 4 s.

To summarize:

  • Signal frequency: 440 Hz
  • Sampling frequency: 44.1 kHz
  • Duration: 4 s

If we make a graph of this thing, zooming in on the first ten milliseconds (otherwise it would be unreadable), we will have:

Sinusoidal curve at 440 Hz

All this is cool, but now let's move on to our program for calculating points on these curves.

Do you remember in the first article, when I started to represent my sin(x) curve over an interval in radians and explained that radians were practical because 2π corresponds to one period of sin(x)?

In fact, 2π is much more useful than that: in trigonometry, sine and cosine describe a point turning around a circle, and 2π is one full turn, which lets us define angles all along it.
We can express these angles in radians (π is half a turn) or in degrees (2π = 360°).

Trigonometric circle with radians and degrees associated
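In code, converting between the two units only relies on 360° = 2π radians. A minimal sketch with two hypothetical helpers:

```javascript
// Hypothetical helpers: a full turn is 360 degrees, or 2π radians.
const toRadians = (degrees) => (degrees * Math.PI) / 180;
const toDegrees = (radians) => (radians * 180) / Math.PI;

console.log(toRadians(180).toFixed(5));         // 3.14159 (π, half a turn)
console.log(toDegrees(2 * Math.PI).toFixed(1)); // 360.0 (one full turn)
console.log(toDegrees(Math.PI / 2).toFixed(1)); // 90.0 (a quarter turn)
```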

Uh ok, but why are you talking about this again 🥶?

Actually, we need a way to know how to move along our curve to find our points.

We need a kind of compass that tells us, from one sample to the next, how far to move along our sin(x) curve according to the curve's frequency.

Our compass will therefore be an angle, and to get an angle, we can use our 2π.

Reminder: the sampling frequency is the number of samples we take per second.

One full cycle of sin(x) corresponds to an angle of 2π. A 1 Hz wave completes one cycle per second, and we take SampleRate samples during that second, so each sample moves us forward by a constant angle:

$$\theta = \frac{2\pi}{\text{SampleRate}} = \frac{2\pi}{44100}$$

A wave of frequency f turns f times faster, so sample i, taken at the instant t = i / SampleRate, is:

$$\text{sample}_i = \sin(2\pi f t) = \sin(\theta \cdot f \cdot i)$$

The duration only tells us how many samples to compute. Over a four-second interval, the total number of samples is the sampling frequency multiplied by the number of seconds:

$$\text{nsamps} = 4 \times 44100 = 176400$$

Be careful not to divide 2π by nsamps to get the angle: the 440 cycles would then be spread over the whole four seconds instead of one, and the tone would come out at 440 / 4 = 110 Hz instead of 440 Hz.

We will proceed with the following steps:

  1. Calculate the total number of samples nsamps to be taken over the four seconds.

  2. Calculate the angle θ = 2π / SampleRate between two consecutive samples.

  3. For each sample i among the nsamps samples:

    1. Calculate the value sample of sample i as sin(θ * frequency * i).

As an algorithm:

Pre-define: 
    Duration, SampleRate, Frequency

Start:
    nsamps <- Duration * SampleRate          // total number of samples
    angle <- (2 * π) / SampleRate            // angle step for a 1 Hz wave
    foreach i from 0 to nsamps - 1
        sample <- sin(angle * Frequency * i)
End.
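If you want to compare with your own version, here is one possible JavaScript sketch of this algorithm (a hypothetical implementation, not the code from the sandbox). It advances the phase by 2π · Frequency / SampleRate per sample, i.e. it evaluates sin(2π f t) at t = i / SampleRate, and stores the values as 32-bit floats. `estimateFrequency` is an extra, rough check that counts upward zero crossings during the first second.

```javascript
// One possible sketch of the algorithm (not the sandbox code).
// Assumption: the phase advances by 2π · frequency / sampleRate per sample,
// i.e. sample i = sin(2π · frequency · t) with t = i / sampleRate.
function generateSine(durationSeconds, sampleRate, frequency) {
  const nsamps = durationSeconds * sampleRate; // total number of samples
  const angle = (2 * Math.PI) / sampleRate;    // angle step for a 1 Hz wave
  const samples = new Float32Array(nsamps);    // 32-bit floats, like the file we write later
  for (let i = 0; i < nsamps; i++) {
    samples[i] = Math.sin(angle * frequency * i);
  }
  return samples;
}

// Rough pitch check (hypothetical helper): count upward zero crossings in the first second.
function estimateFrequency(samples, sampleRate) {
  let crossings = 0;
  for (let i = 1; i < sampleRate; i++) {
    if (samples[i - 1] < 0 && samples[i] >= 0) crossings++;
  }
  return crossings;
}

const samples = generateSine(4, 44100, 440);
console.log(samples.length);                    // 176400
console.log(estimateFrequency(samples, 44100)); // close to 440
```

If you build the angle as 2π / nsamps instead, the same check reports roughly 110, a quarter of the intended pitch.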

I invite you to first code this rather simple algorithm, which already does everything we wanted.

Storing Our Values

I don't know if you remember, but in the first article I briefly introduced digital signals: I told you that they are a discrete representation (in binary form) of a signal.

To store our sample values as a digital signal, we need to transform these floating-point values into binary!

This is a very important process that usually occurs in the last stages of digitization to be able to store the processed information on our computers.

However, the more computer-savvy among you may know that, depending on the processor architecture, the bytes (8 bits each) of a multi-byte value, such as a 32-bit float, are not stored in the same order.

Little-Endian and Big-Endian

There are thus two byte orders, depending on your CPU:

  • Big-Endian: the most significant byte comes first (at the lowest memory address), the least significant byte last (at the highest address).
  • Little-Endian: the least significant byte comes first (at the lowest memory address), the most significant byte last (at the highest address).

Think of it a bit like the difference between reading a novel and reading a manga:

  • We read a novel from left to right, where the most important information will be at the end of the book. (Little-Endian)
  • We read a manga from right to left. (Big-Endian)
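To see the two byte orders on a real value, here is a sketch using the standard `DataView` API, whose `setFloat32(offset, value, littleEndian)` method lets us choose the order explicitly. The float 1.0 is stored in IEEE 754 single precision as 0x3F800000; only the order of its 4 bytes changes. `float32Bytes` is a hypothetical helper name.

```javascript
// Hypothetical helper: the 4 bytes of a 32-bit float, as hex, in the chosen byte order.
function float32Bytes(value, littleEndian) {
  const view = new DataView(new ArrayBuffer(4));
  view.setFloat32(0, value, littleEndian); // standard DataView API
  return Array.from(new Uint8Array(view.buffer), (b) => b.toString(16).padStart(2, "0"));
}

console.log("Big-Endian:   ", float32Bytes(1.0, false).join(" ")); // 3f 80 00 00
console.log("Little-Endian:", float32Bytes(1.0, true).join(" "));  // 00 00 80 3f
```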

If you are coding the program on your side on Linux, you can type the following command to find out whether your machine is Little-Endian or Big-Endian.

lscpu | grep "Byte Order"
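If you are not on Linux, or want to check from inside your program, here is a sketch that asks the JavaScript runtime directly by looking at how the 16-bit integer 1 is laid out in memory. `hostByteOrder` is a hypothetical helper; Node and Bun also expose `os.endianness()`, which returns "LE" or "BE".

```javascript
// Hypothetical helper: store the 16-bit integer 1 and look at its first byte.
// Little-Endian machines put the least significant byte (0x01) first.
function hostByteOrder() {
  const bytes = new Uint8Array(new Uint16Array([1]).buffer);
  return bytes[0] === 1 ? "Little-Endian" : "Big-Endian";
}

console.log(hostByteOrder()); // "Little-Endian" on x86-64 and most ARM machines
```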

We will therefore complete our algorithm so that, at each iteration of our loop, it writes the content of the variable sample to a file:

Pre-define: 
    Duration, SampleRate, Frequency

Start:
    --- New ---
    fileName <- "out.bin"
    fd <- os.Create("path/to/write/file/" + fileName)
    --- EndNew ---

    nsamps <- Duration * SampleRate
    angle <- (2 * π) / SampleRate
    foreach i from 0 to nsamps - 1
        sample <- sin(angle * Frequency * i)

        --- New ---
        bufByte <- LittleEndian(float32(sample))   // 4 bytes per sample (f32le)
        byteWritten <- fd.write(bufByte)
        show("Wrote " + sample + " in " + byteWritten + " bytes")
        --- EndNew ---

    --- New ---
    fd.close()
    --- EndNew ---
End.
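For reference, here is one way the complete program could look in JavaScript for Node or Bun, assuming the same parameters as above (4 s, 44.1 kHz, 440 Hz). It is a sketch, not the sandbox code: instead of writing each sample separately, it fills one buffer and writes it in a single call, and it uses `DataView.setFloat32(..., true)` so the file is little-endian whatever the host CPU is, which is the layout `ffplay -f f32le` expects.

```javascript
// Sketch of the full program for Node or Bun (not the sandbox code).
// Writes 4 s of A440 to out.bin as raw 32-bit little-endian floats (f32le).
const fs = require("node:fs");

const duration = 4;       // seconds
const sampleRate = 44100; // samples per second
const frequency = 440;    // Hz (A440)

const nsamps = duration * sampleRate;
const angle = (2 * Math.PI) / sampleRate; // angle step for a 1 Hz wave

// 4 bytes per sample; DataView forces little-endian whatever the host CPU is.
const bytes = new ArrayBuffer(nsamps * 4);
const view = new DataView(bytes);
for (let i = 0; i < nsamps; i++) {
  const sample = Math.sin(angle * frequency * i);
  view.setFloat32(i * 4, sample, true); // true = little-endian
}

const fileName = "out.bin";
fs.writeFileSync(fileName, new Uint8Array(bytes));
console.log(`Wrote ${nsamps} samples (${bytes.byteLength} bytes) to ${fileName}`);
```

Run it with `node` or `bun` followed by the file name; it should produce a 705,600-byte out.bin (176,400 samples × 4 bytes).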

If you have managed to code this algorithm, you should now have a file containing a binary representation of your signal 💪 🥳.

To listen to it, you can use software like Audacity and import the file as raw data (File > Import > Raw Data...).
You just need to select mono (1 channel), the correct encoding (32-bit float, little-endian) and a sample rate of 44100 Hz.

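Otherwise, you can run the following command, but you will need to install FFmpeg (which provides ffplay) on your machine:

ffplay -f f32le -ar 44100 -showmode 1 out.bin

Here, -f f32le reads the file as raw 32-bit little-endian floats, -ar 44100 gives the sample rate, and -showmode 1 displays the waveform while the sound plays.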

Demo 🖥

Sandbox JavaScript + Bun 🥟 😁

To Conclude

First of all, congratulations on making it this far! I know it was not easy, but I hope you learned a few things and had a little fun.

If you are interested in this series, you can follow me so as not to miss new articles and leave a little comment if you like it.

Otherwise, I'll see you next time for the next article, which will be a bit softer on math, frequency, and music 🎧.
