How to use phone key tones as an interaction controller and decode DTMF signals

My first article of 2020 accidentally landed on an old-school topic.

There is a scene in 名探偵コナン 戦慄の楽譜フルスコア (Detective Conan: Full Score of Fear), released more than ten years ago. Conan is standing in the middle of the water. First, a spectacular long-range kick knocks the receiver off a phone on the shore. Then he closes his eyes and sings out loud, and the emergency number 110 is dialed remotely.


This time I will talk about how to make a phone call with sound.
And, as advanced content, how to use sound waves as a carrier for interaction and how to decode DTMF signals.

Conan not only demonstrated the final effect but also explained the principle clearly. That is exactly my style. XD


Let me break it down a bit in English and pad out the article.


DTMF, as mentioned by Conan, stands for Dual-Tone Multi-Frequency.

When we make a call, especially in the era of feature phones with physical keys more than a decade ago, each key we press produces its own unique tone.
The call is actually dialed by these key tones, not by the buttons themselves.

Each key tone is a combination of two sound waves at different frequencies. Conan has already prepared the table of frequencies and keys:


For example, the 1 in 110 is composed of a low frequency of 697 Hz and a high frequency of 1209 Hz.
The 0 is composed of a low frequency of 941 Hz and a high frequency of 1336 Hz.
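The row and column lookup can be sketched in a few lines of JavaScript. This is only a minimal sketch: the frequencies are the standard DTMF values, while the helper name `dtmfFrequencies` is made up for this article.

```javascript
// Standard DTMF keypad: every row shares a low frequency, every column a high one.
const ROW_HZ = [697, 770, 852, 941];
const COL_HZ = [1209, 1336, 1477, 1633];
const KEYPAD = [
  ["1", "2", "3", "A"],
  ["4", "5", "6", "B"],
  ["7", "8", "9", "C"],
  ["*", "0", "#", "D"],
];

// Hypothetical helper: return { low, high } in Hz for a key, or null if the key is unknown.
function dtmfFrequencies(key) {
  for (let r = 0; r < KEYPAD.length; r++) {
    const c = KEYPAD[r].indexOf(key);
    if (c !== -1) return { low: ROW_HZ[r], high: COL_HZ[c] };
  }
  return null;
}

console.log(dtmfFrequencies("1")); // { low: 697, high: 1209 }
console.log(dtmfFrequencies("0")); // { low: 941, high: 1336 }
```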

Following the table above, Conan and the lady produced the combined low- and high-frequency tones for the three digits 1, 1, 0 and so called for help.

In the early years, many people imitated this process: making phone calls with a piano, with Windows Media Player, or with programs written to synthesize the tones.
In fact, any sound with the correct frequencies is enough.
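To make "the correct frequencies" concrete, here is a rough sketch that synthesizes one key tone as raw samples by adding two sine waves. The function name `synthTone`, the 8 kHz sample rate, and the 0.5 amplitudes are my own assumptions, not part of any standard API.

```javascript
// Hypothetical helper: synthesize one DTMF tone as PCM samples in the range [-1, 1].
function synthTone(lowHz, highHz, durationSec, sampleRate = 8000) {
  const n = Math.round(durationSec * sampleRate);
  const samples = new Float32Array(n);
  for (let i = 0; i < n; i++) {
    const t = i / sampleRate;
    // Each sine gets half the amplitude so that the sum never clips.
    samples[i] =
      0.5 * Math.sin(2 * Math.PI * lowHz * t) +
      0.5 * Math.sin(2 * Math.PI * highHz * t);
  }
  return samples;
}

const one = synthTone(697, 1209, 0.1); // the key "1" for 100 ms
console.log(one.length); // 800
```

The resulting samples can be written to a WAV file or handed to whatever audio output your environment offers.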


You may wonder: if someone is monitoring or recording while you dial a phone number or enter a password for phone banking, could the number and the password be leaked?

Answer: YES w(゚Д゚)w

But don’t worry too much. After all, the subject of this article is old school. Today’s telephone system is no longer the traditional telephone network, and banks have countermeasures:

“The call center and banking industries have long had a countermeasure (jamming tones): before and after the password is entered, some useless, random DTMF tones are played.”


Let’s review Conan’s table:


DTMF is highly resistant to interference. None of the 8 frequencies in the picture is a harmonic of another, which reduces signal interference.
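The harmonic claim is easy to check for the 8 standard DTMF frequencies. The sketch below only tests exact integer multiples, which is a simplification of what "no harmonic relationship" means in practice, and `harmonicPairs` is a name I made up.

```javascript
const DTMF_HZ = [697, 770, 852, 941, 1209, 1336, 1477, 1633];

// Collect every pair where the higher frequency is an exact integer multiple of the lower one.
function harmonicPairs(freqs) {
  const pairs = [];
  for (const lo of freqs) {
    for (const hi of freqs) {
      if (hi > lo && hi % lo === 0) pairs.push([lo, hi]);
    }
  }
  return pairs;
}

console.log(harmonicPairs(DTMF_HZ)); // []
```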

The principle of DTMF is very simple: each row and each column has its own frequency, and one row frequency plus one column frequency make up a key tone.

4 rows and 4 columns = 16 tones. That is, if DTMF is turned into a controller (like an OSC controller), there are already 16 switches even if each switch is just a single key tone.
If you also group the tones into sequences by dialing interval, there are many more: 110 to switch the lights, 114 to switch a particle effect, 119 to set off fireworks ...
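Grouping by dialing interval could look roughly like this. It assumes the decoder already delivers key events with timestamps; the event shape, the action names, and the 1000 ms gap are all invented for illustration.

```javascript
// Hypothetical mapping from dialed sequences to actions (the action names are made up).
const ACTIONS = {
  "110": "toggleLights",
  "114": "toggleParticles",
  "119": "fireworks",
};

// Group decoded keys into sequences: a pause longer than maxGapMs starts a new sequence.
function groupByInterval(events, maxGapMs = 1000) {
  const sequences = [];
  let current = "";
  let lastTime = null;
  for (const { key, timeMs } of events) {
    if (lastTime !== null && timeMs - lastTime > maxGapMs) {
      sequences.push(current);
      current = "";
    }
    current += key;
    lastTime = timeMs;
  }
  if (current !== "") sequences.push(current);
  return sequences;
}

const events = [
  { key: "1", timeMs: 0 }, { key: "1", timeMs: 300 }, { key: "0", timeMs: 600 },
  { key: "1", timeMs: 3000 }, { key: "1", timeMs: 3300 }, { key: "9", timeMs: 3600 },
];
const commands = groupByInterval(events).map((seq) => ACTIONS[seq] || "unknown");
console.log(commands); // [ 'toggleLights', 'fireworks' ]
```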

Like OSC-based controller interaction, DTMF interaction is split into a sending end and a receiving end.

Let’s talk about the sender first.

The piano mentioned earlier, Conan’s voice, and a phone keypad are all senders, but we want something that looks more “serious”.

Here are a few examples in MaxMSP and JavaScript.

MaxMSP example 1

MaxMSP ships with its own example, which I had never noticed before writing this article.
You can find it in the MaxMSP menu:
Help - Examples - synths - dialer


Note that the key 9 and its two frequencies in the red box match those in Conan’s table.

The sound is like this:

MaxMSP example 2


One of the examples in the book “Designing Sound” is DTMF dialing. Someone has published MaxMSP implementations of all the examples in this book:


The red box in the picture again shows the same frequency table as Conan’s.

JavaScript example

The front-end implementation follows the same principle as the MaxMSP ones. You can call the WebAudio API to play tones according to the DTMF frequency table.
Here is an example:
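As a minimal sketch of that idea (not the code of any particular demo), two oscillators per key can be scheduled on an `AudioContext`. The helper names `playKey` and `dial` and the timing and gain values are my own assumptions.

```javascript
// Low/high frequency pair (Hz) for each key of the standard DTMF keypad.
const DTMF = {
  "1": [697, 1209], "2": [697, 1336], "3": [697, 1477], "A": [697, 1633],
  "4": [770, 1209], "5": [770, 1336], "6": [770, 1477], "B": [770, 1633],
  "7": [852, 1209], "8": [852, 1336], "9": [852, 1477], "C": [852, 1633],
  "*": [941, 1209], "0": [941, 1336], "#": [941, 1477], "D": [941, 1633],
};

// Hypothetical helper: schedule one key tone on an existing AudioContext.
function playKey(ctx, key, startTime, durationSec = 0.2) {
  const gain = ctx.createGain();
  gain.gain.value = 0.25; // keeps the sum of both oscillators well below clipping
  gain.connect(ctx.destination);
  for (const hz of DTMF[key]) {
    const osc = ctx.createOscillator(); // the default oscillator type is "sine"
    osc.frequency.value = hz;
    osc.connect(gain);
    osc.start(startTime);
    osc.stop(startTime + durationSec);
  }
}

// Hypothetical helper: dial a string such as "110", leaving a short gap between keys.
function dial(ctx, digits, gapSec = 0.1, durationSec = 0.2) {
  let t = ctx.currentTime;
  for (const key of digits) {
    if (!DTMF[key]) continue; // skip characters that are not on the keypad
    playKey(ctx, key, t, durationSec);
    t += durationSec + gapSec;
  }
}

// In a browser, call it from a click handler, because autoplay policies
// usually require a user gesture before audio may start:
//   dial(new AudioContext(), "110");
```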



The receiver captures audio from the microphone and then decodes it to recover the dialed number.

Most environments already provide good APIs for audio capture, so the focus here is on how to decode DTMF.

A DTMF signal is just a piece of audio carrying specific frequencies, so decoding means finding out which of those frequencies are present in the sound wave.
Naturally, Fourier shows up again.
The Fourier transform takes the sound wave from the time domain into the frequency domain, where the analysis becomes easy.

In actual programs, techniques based on the DFT (Discrete Fourier Transform) are used.

The algorithm used here is the Goertzel algorithm.

The Goertzel algorithm exploits the periodicity of the DFT's phase factors to reduce the amount of computation. When the DFT is needed at only a few frequencies in the range 0 to 2π, it is more flexible and efficient than the FFT, which has to compute all the values. That makes it well suited to DTMF decoding.
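To show what that looks like in code, here is a bare-bones sketch of the Goertzel recurrence, used to pick the strongest row and column frequency in one block of samples. It is not the code of any particular library. The function names and the block size are my own choices, and a real decoder would also need a power threshold so that silence or noise is not reported as a key.

```javascript
// Goertzel: squared magnitude of a single frequency component in a block of samples.
function goertzelPower(samples, targetHz, sampleRate) {
  const coeff = 2 * Math.cos((2 * Math.PI * targetHz) / sampleRate);
  let s1 = 0; // state from one sample ago
  let s2 = 0; // state from two samples ago
  for (const x of samples) {
    const s0 = x + coeff * s1 - s2;
    s2 = s1;
    s1 = s0;
  }
  return s1 * s1 + s2 * s2 - coeff * s1 * s2;
}

const ROW_HZ = [697, 770, 852, 941];
const COL_HZ = [1209, 1336, 1477, 1633];
const KEYPAD = ["123A", "456B", "789C", "*0#D"];

// Index of the frequency with the highest Goertzel power.
function strongest(samples, freqs, sampleRate) {
  let best = 0;
  let bestPower = -Infinity;
  freqs.forEach((hz, i) => {
    const p = goertzelPower(samples, hz, sampleRate);
    if (p > bestPower) {
      bestPower = p;
      best = i;
    }
  });
  return best;
}

// Hypothetical helper: decode one block that is assumed to contain a key tone.
// No threshold is applied, so it always returns some key, even for silence.
function decodeKey(samples, sampleRate) {
  const r = strongest(samples, ROW_HZ, sampleRate);
  const c = strongest(samples, COL_HZ, sampleRate);
  return KEYPAD[r][c];
}

// Self-test with a synthetic "0" (941 Hz + 1336 Hz): 400 samples, i.e. 50 ms at 8 kHz.
const sampleRate = 8000;
const block = Float32Array.from({ length: 400 }, (_, i) => {
  const t = i / sampleRate;
  return 0.5 * Math.sin(2 * Math.PI * 941 * t) + 0.5 * Math.sin(2 * Math.PI * 1336 * t);
});
console.log(decodeKey(block, sampleRate)); // 0
```

Only 8 Goertzel filters run per block, one for each DTMF frequency, instead of a full FFT over every bin.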

Pure Data example
Sure enough, as soon as things got hardcore, MaxMSP resources turned out to be scarce, so out comes my old friend Pure Data!


JavaScript example

Goertzel.js: as the name suggests, it is not limited to DTMF decoding. It is a general-purpose Goertzel algorithm library that can be used in all kinds of other projects (instrument tuning, FSK decoding, spectrum plots, etc.).

It comes with a DTMF decoding demo.
I tested it by dialing directly from MaxMSP and from a mobile phone. Please watch the video:

This shows that the dialed numbers really do reach the receiver through audio signals (sound waves) rather than over the network as OSC does.

Reference Resources

Talk is cheap. Show me the code!

You can find more resources on my site.

You can buy me a coffee on my Patreon.
There you will find many articles, patches, source code, and some advanced Patron-only content.

Your encouragement is my driving energy!