Given that I am dyslexic and also suffer from aphantasia, which is a fascinating quirk of the brain, one can imagine I have a sort of love/hate relationship with reading.
On the one hand, I must keep up with the news and trends of the software engineering industry. On the other hand, the serious and useful information I need is in written form, and reading it always drains my reading battery because of the above-average amount of concentration it demands.
When daylight saving weekend rolled around, I decided to do something useful with the "additional" hour, so I set out to improve this situation for people like me.
I decided to build something that takes the text of any article in any language the TTS engine supports (English, German and Romanian in my case) and converts it to an mp3 file, so that we can "listen to the article" and not drain our weak reading battery.
Then I remembered one of the most important UNIX principles:
"DO ONE THING AND DO IT WELL"
so I challenged myself to:
- use WELL DONE existing THINGS
- write only a shell script
- have fewer than 31 lines (today is Halloween 🎃)
- use only CLI tools, pipes and shell commands
- time-box it to 1h 🤪
I did it! But this was only possible because so many wonderful people have developed so many great projects and shared them with the rest of us. There is a multitude of software out there, and any UNIX-based OS allows us to interconnect it seamlessly. Simply amazing 🤩
Result
The script:
#!/bin/sh
# this script does some text editing for the:
# $1 - input file
# and stores it into the:
# out-$1 - output file
# which it then later utilises to CURL to http://localhost:5002/api/tts running:
# docker run -it -p 5002:5002 synesthesiam/mozillatts:en
# you can replace the TAG at the end with any language supported by TTS
# it will:
# - produce a .wav file for each sentence in the output file
# - join the wav files into a single one
# - turn it into a .mp3 file named audio-$1.mp3
awk -F'\.' '{ for (i=1; i<NF; i++) print $i "." }' "$1" | tr "'" "´" | tr "\"" "´" > "out-$1"
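input="out-$1"
n=0
while IFS= read -r line
do
# number the files: $(date +%s) can repeat within one second and overwrite a file
n=$((n+1))
curl --location --request POST 'http://localhost:5002/api/tts' --header 'Content-Type: text/plain' --data-raw "$line" -o "audio_$(printf '%04d' "$n").wav"
done < "$input"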
ls audio*wav |awk '{print "file " $0}' > wav-list.txt
ffmpeg -f concat -safe 0 -i wav-list.txt -vn -ar 44100 -ac 2 -b:a 128k "audio-$1.mp3"
rm "out-$1"
rm audio*wav
rm wav-list.txt
open "audio-$1.mp3" # macOS; on Linux use xdg-open
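To see what the text-only stages of the script do without a running TTS server, the two helpers below can be tried on their own. This is a minimal sketch, not part of the original script: the function names `split_sentences` and `concat_list` are made up for illustration, the sample text is arbitrary, and the splitter additionally trims leading spaces, which the original does not.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not from the original script).

# Print one sentence per line, splitting on "." like the awk stage above.
# Text after the last "." on a line is dropped, as in the original.
# Unlike the original, leading spaces are trimmed from each sentence.
split_sentences() {
  awk -F'[.]' '{
    for (i = 1; i < NF; i++) {
      s = $i
      sub(/^ +/, "", s)
      print s "."
    }
  }'
}

# Turn file names (one per line) into entries for ffmpeg's concat demuxer.
concat_list() {
  awk '{ print "file " $0 }'
}

# Prints two lines: "Reading is hard." and "Listening is easy."
printf 'Reading is hard. Listening is easy.\n' | split_sentences

# Prints two lines: "file audio_0001.wav" and "file audio_0002.wav"
printf 'audio_0001.wav\naudio_0002.wav\n' | concat_list
```

Each line printed by the first helper corresponds to one request to the TTS server, and each line printed by the second corresponds to one .wav file that ffmpeg joins.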
The list of ingredients:
- Mozilla/TTS
- FFmpeg CLI (this thing went to Mars 🔴)
- UNIX-based OS
- Docker CLI
- the above script
The "recipe":
# in your terminal run:
docker run -it -p 5002:5002 synesthesiam/mozillatts:en
# replace the 'en' tag with 'de', 'fr', 'ro' etc.
# select and copy the text you want to listen to
# paste it into a file: `article.txt`
# save the script as `audiofy.sh` next to the text file
# in the terminal run:
sh audiofy.sh article.txt
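Before running the recipe, it can help to confirm that the command-line tools are installed. The following is a small sketch under my own assumptions: the helper name `missing_tools` is hypothetical, and the tool list is simply what the script above calls. It does not check whether the TTS container is actually running.

```shell
#!/bin/sh
# Hypothetical pre-flight helper; the function name is illustrative.

# Print every command from the argument list that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

missing=$(missing_tools awk tr curl ffmpeg docker)
if [ -n "$missing" ]; then
  # $missing is left unquoted on purpose so the names print on one line.
  echo "Please install first:" $missing
else
  echo "All tools found."
fi
```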
Demo
For my demo, I used a snippet from the README file of the fantastic Mozilla/TTS.
Have a listen to the output on SoundCloud. (Use the "open in new tab" function so you can see the text and marvel at the natural-sounding synthetic voice.)
Conclusion
Even though I set aside only 1h for this project, I do love the result and will return to it for improvements. Implementing this was way too much fun.
Also, in the time it took me to write this post, I listened to most of the news articles that popped up on my Google News feed, because I had converted them as a test for my script.
Synergy: achieved!
Productivity: increased!
Reading battery: protected!
Top comments (2)
So... You invented a screen reader, that instead of reading, compiles the text into an audio file..?
Not a bad summary. Yes, it turns text into an .mp3, but it all runs locally on your machine (no privacy concerns). And with TTS you could train and use your own voice. It also produces files longer than 34 seconds, which is currently a limit of TTS itself.
PS: I did not invent any of it, I just did "some plumbing" by piping together some really cool tools!