DEV Community

jeikabu

Originally published at rendered-obsolete.github.io

Home Assistant Voice Recognition with Snips

After the initial install of Home Assistant, I’ve been eager to get some basic voice recognition working. One of my early goals was for it to be “offline”, meaning it shouldn’t use Amazon or Google.

Hardware

I was originally working with the HD-3000, but wasn’t very happy with the recording quality. I’m still experimenting with the ReSpeaker, but it definitely seems better. In any case, configuration was pretty similar, and the same likely goes for any other USB microphone.

Basic Alsa Audio

First, we need to get audio working; both a microphone and speaker.

Good, concise documentation that explains what’s going on with Raspberry Pi/Debian audio has eluded me thus far. Most of this is extracted from random forum posts, Stack Overflow, and a smattering of trial and error.

You can record from a microphone with arecord. Abridged arecord --help output:

Usage: arecord [OPTION]... [FILE]...

-l, --list-devices list all soundcards and digital audio devices
-L, --list-pcms list device names
-D, --device=NAME select PCM by name
-t, --file-type TYPE file type (voc, wav, raw or au)
-c, --channels=# channels
-f, --format=FORMAT sample format (case insensitive)
-r, --rate=# sample rate
-d, --duration=# interrupt after # seconds
-v, --verbose show PCM structure and setup (accumulative)

List the capture hardware devices with arecord -l:

****List of CAPTURE Hardware Devices****
card 1: Dummy [Dummy], device 0: Dummy PCM [Dummy PCM]
  <SNIP>
card 2: ArrayUAC10 [ReSpeaker 4 Mic Array (UAC1.0)], device 0: USB Audio [USB Audio]
  Subdevices: 1/1
  Subdevice #0: subdevice #0

Then list the PCM device names with arecord -L:

null
    Discard all samples (playback) or generate zero samples (capture)
default
<Bunch of CARD=Dummy>
sysdefault:CARD=ArrayUAC10
    ReSpeaker 4 Mic Array (UAC1.0), USB Audio
    Default Audio Device
<Bunch of CARD=ArrayUAC10 speakers/output>
dmix:CARD=ArrayUAC10,DEV=0
    ReSpeaker 4 Mic Array (UAC1.0), USB Audio
    Direct sample mixing device
dsnoop:CARD=ArrayUAC10,DEV=0
    ReSpeaker 4 Mic Array (UAC1.0), USB Audio
    Direct sample snooping device
hw:CARD=ArrayUAC10,DEV=0
    ReSpeaker 4 Mic Array (UAC1.0), USB Audio
    Direct hardware device without any conversions
plughw:CARD=ArrayUAC10,DEV=0
    ReSpeaker 4 Mic Array (UAC1.0), USB Audio
    Hardware device with all software conversions

To record using the ReSpeaker (card ArrayUAC10):

# `-d 3` records for 3 seconds (otherwise `Ctrl+c` to stop)
# `-D` sets the PCM device
arecord -d 3 -D hw:ArrayUAC10 tmp_file.wav

It may output:

Recording WAVE 'tmp_file.wav' : Unsigned 8 bit, Rate 8000 Hz, Mono
arecord: set_params:1299: Sample format non available
Available formats:
- S16_LE

As arecord -L says, hw: is a “Direct hardware device without any conversions”, and the error shows the ReSpeaker only offers S16_LE. We either need to record in a supported format, or use plughw: (“Hardware device with all software conversions”). Either of these works:

arecord -d 3 -D plughw:ArrayUAC10 tmp_file.wav
# `-f S16_LE` signed 16-bit little endian
# `-c 6` six channels
# `-r 16000` 16kHz
arecord -f S16_LE -c 6 -r 16000 -d 3 -D hw:ArrayUAC10 tmp_file.wav
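To check what actually ended up in the file, Python’s standard wave module can read the WAV header back. This is a minimal sketch: the script and the helper name describe_wav are hypothetical, and tmp_file.wav is just the file recorded above. The plughw: recording with default settings should report 8-bit mono at 8000 Hz, while the hw: recording with explicit flags should report 6 channels, 16-bit, 16000 Hz.

```python
import sys
import wave


def describe_wav(path):
    """Return (channels, bytes per sample, rate in Hz, duration in s) of a WAV file."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        width = wav.getsampwidth()  # 1 for U8, 2 for S16_LE
        rate = wav.getframerate()
        frames = wav.getnframes()
    return channels, width, rate, frames / rate


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python3 describe_wav.py tmp_file.wav
    ch, width, rate, secs = describe_wav(sys.argv[1])
    print(f"{ch} channel(s), {8 * width}-bit, {rate} Hz, {secs:.2f} s")
```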

You can get a list of supported parameters with arecord --dump-hw-params -D hw:ArrayUAC10:

HW Params of device "hw:ArrayUAC10":
--------------------
ACCESS: MMAP_INTERLEAVED RW_INTERLEAVED
FORMAT: S16_LE
SUBFORMAT: STD
SAMPLE_BITS: 16
FRAME_BITS: 96
CHANNELS: 6
RATE: 16000
PERIOD_TIME: [1000 2730625]
PERIOD_SIZE: [16 43690]
PERIOD_BYTES: [192 524280]
PERIODS: [2 1024]
BUFFER_TIME: [2000 5461313)
BUFFER_SIZE: [32 87381]
BUFFER_BYTES: [384 1048572]
TICK_TIME: ALL
--------------------
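The values in this dump hang together: a frame holds one sample per channel, so FRAME_BITS = SAMPLE_BITS × CHANNELS = 16 × 6 = 96 bits (12 bytes). The *_BYTES ranges are the *_SIZE ranges (in frames) times 12, and the *_TIME ranges (in microseconds) are those frame counts divided by the 16 kHz rate. A small sketch of that arithmetic; the helper names are hypothetical.

```python
def frame_bytes(sample_bits, channels):
    """Bytes per interleaved frame: one sample for each channel."""
    return sample_bits // 8 * channels


def bytes_per_second(sample_bits, channels, rate_hz):
    """Raw PCM data rate in bytes per second."""
    return frame_bytes(sample_bits, channels) * rate_hz


def frames_to_usec(frames, rate_hz):
    """Duration of `frames` frames in microseconds (the unit of ALSA's *_TIME values)."""
    return frames * 1_000_000 / rate_hz


# Values reported by `arecord --dump-hw-params -D hw:ArrayUAC10`
SAMPLE_BITS, CHANNELS, RATE = 16, 6, 16000

print(frame_bytes(SAMPLE_BITS, CHANNELS) * 8)         # FRAME_BITS: 96
print(16 * frame_bytes(SAMPLE_BITS, CHANNELS))        # minimum PERIOD_BYTES: 192
print(frames_to_usec(16, RATE))                       # minimum PERIOD_TIME: 1000.0
print(bytes_per_second(SAMPLE_BITS, CHANNELS, RATE))  # 192000 bytes/s
```

At that rate, the 3-second recording above is 576,000 bytes of sample data, compared with 24,000 bytes for arecord’s default of unsigned 8-bit mono at 8000 Hz.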

In online resources you’ll often see values like hw:2,0, which means “card 2, device 0”. According to the arecord -l output, the ReSpeaker is card 2 with a single device 0, so hw:2,0 is the same as hw:ArrayUAC10.
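To see both naming styles for every device at once, here is a minimal sketch that parses arecord -l (or aplay -l) output with a regular expression. The pattern assumes the line layout shown above, and the function name hw_names is hypothetical.

```python
import re

# Matches device lines such as:
# card 2: ArrayUAC10 [ReSpeaker 4 Mic Array (UAC1.0)], device 0: USB Audio [USB Audio]
CARD_LINE = re.compile(r"^card (\d+): (\S+) \[.*?\], device (\d+):")


def hw_names(listing):
    """Yield (numeric, named) ALSA hw: identifiers for each device line."""
    for line in listing.splitlines():
        m = CARD_LINE.match(line)
        if m:  # indented "Subdevices"/"<SNIP>" lines are skipped
            card, card_id, dev = m.groups()
            yield f"hw:{card},{dev}", f"hw:CARD={card_id},DEV={dev}"
```

Feeding it subprocess.run(["arecord", "-l"], capture_output=True, text=True).stdout should pair hw:2,0 with hw:CARD=ArrayUAC10,DEV=0 on the setup above.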

You can play the recorded audio with aplay. Based on the output of aplay -L, I can use:

aplay -D plughw:SoundLink tmp_file.wav

There are at least two configuration files that can affect the behaviour of arecord/aplay:

  • /etc/asound.conf
  • ~/.asoundrc

For example, after I changed the default sound card via Audio Device Settings, my ~/.asoundrc contains:

pcm.!default {
    type hw
    card 2
}

ctl.!default {
    type hw
    card 2
}

Checking aplay -l shows that “card 2” is my Bose Revolve SoundLink USB speaker:

****List of PLAYBACK Hardware Devices****
card 0: ALSA [bcm2835 ALSA], device 0: bcm2835 ALSA [bcm2835 ALSA]
  <SNIP>
card 0: ALSA [bcm2835 ALSA], device 1: bcm2835 IEC958/HDMI [bcm2835 IEC958/HDMI]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
card 0: ALSA [bcm2835 ALSA], device 2: bcm2835 IEC958/HDMI1 [bcm2835 IEC958/HDMI1]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
card 1: Dummy [Dummy], device 0: Dummy PCM [Dummy PCM]
  <SNIP>
card 2: SoundLink [Bose Revolve SoundLink], device 0: USB Audio [USB Audio]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
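One caveat, as far as I understand it: USB card numbers are assigned at enumeration time, so “card 2” may point at a different device if the USB devices come up in another order. ALSA’s hw plugin also accepts a card ID in place of the number (e.g. SoundLink), which should be more stable. A hypothetical helper that renders the same two entries for either form:

```python
import sys


def asoundrc(card):
    """Render default PCM and control entries pointing at `card`.

    `card` is either an ALSA card number (2) or a card ID ("SoundLink").
    """
    body = f"    type hw\n    card {card}\n"
    return f"pcm.!default {{\n{body}}}\n\nctl.!default {{\n{body}}}\n"


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python3 make_asoundrc.py SoundLink > ~/.asoundrc
    sys.stdout.write(asoundrc(sys.argv[1]))
```

Calling asoundrc(2) reproduces the file above exactly; asoundrc("SoundLink") swaps in the card ID.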

Voice Recognition with Snips

At first I was thrilled to find Snips:

But after initial (successful) experimentation, there are a few largish problems:

Oops. Hopefully it will return in another form.

Install

Following the manual setup instructions:

sudo apt-get install -y dirmngr
sudo bash -c 'echo "deb https://raspbian.snips.ai/$(lsb_release -cs) stable main" > /etc/apt/sources.list.d/snips.list'
sudo apt-key adv --fetch-keys https://raspbian.snips.ai/531DD1A7B702B14D.pub
sudo apt-get update
sudo apt-get install -y snips-platform-voice

Create an Assistant

An “assistant” defines which voice commands Snips handles. You need to create one via the (soon to be shut down) Snips Console:

  1. Click Add an App
  2. Click + on the apps of interest
  3. Click Add Apps button
  4. Wait for training to complete
  5. Click Deploy Assistant button
  6. Download and install manually

Install the assistant:

pc> scp ~/Downloads/assistant_proj_XYZ.zip pi@pi3.local:~
pc> ssh pi@pi3.local

sudo rm -rf /usr/share/snips/assistant/
sudo unzip ~/assistant_proj_XYZ.zip -d /usr/share/snips/
sudo systemctl restart 'snips-*'

At this point Snips should be working. When triggered with the wake word (“hey snips” by default), it should publish “intents” over MQTT.
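Each intent arrives as a JSON message on a topic under hermes/intent/. A minimal parsing sketch follows; the field names (input, intent.intentName, intent.confidenceScore) are my assumption of the Hermes message layout rather than something verified here, and parse_intent is a hypothetical helper.

```python
import json


def parse_intent(payload):
    """Return (intent name, confidence, recognised text) from a Hermes intent message.

    ASSUMPTION: the payload follows the Snips Hermes layout, i.e.
    {"input": ..., "intent": {"intentName": ..., "confidenceScore": ...}, ...}.
    """
    msg = json.loads(payload)  # MQTT payloads are bytes; json.loads accepts bytes or str
    intent = msg["intent"]
    return intent["intentName"], intent.get("confidenceScore"), msg.get("input")
```

Called on each payload received from a subscription to hermes/intent/#, it should surface the same intent name and score that snips-watch reports below.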

Verification/Troubleshooting

Check that all services are green and active (running):

sudo systemctl status 'snips-*'

Initially, the Snips Audio Server failed to start. Check its output in syslog:

tail -f /var/log/syslog

It was unable to open the “default” audio capture device:

Dec 5 07:22:25 pi3 snips-audio-server[28216]: INFO:snips_audio_alsa::capture: Starting ALSA capture on device "default"
Dec 5 07:22:25 pi3 snips-audio-server[28216]: ERROR:snips_audio_server : an error occured in the audio pipeline: Error("snd_pcm_open", Sys(ENOENT))
Dec 5 07:22:25 pi3 snips-audio-server[28216]: -> caused by: ALSA function 'snd_pcm_open' failed with error 'ENOENT: No such file or directory'

We could set the “default” device. Alternatively, /etc/snips.toml contains the platform configuration, where we can specify the device names from above:

[snips-audio-server]
alsa_capture = "plughw:ArrayUAC10"
alsa_playback = "plughw:SoundLink"

snips-watch shows a lot of information:

sudo apt-get install -y snips-watch
snips-watch -vv

I installed the weather app, so if I say “hey snips, what’s the weather?”, snips-watch should output:

[15:00:52] [Hotword] detected on site default, for model hey_snips
[15:00:52] [Asr] was asked to stop listening on site default
[15:00:52] [Hotword] was asked to toggle itself 'off' on site default
[15:00:52] [Dialogue] session with id 'e39a4367-e167-467c-912a-e047f49bea7a' was started on site default
[15:00:52] [Asr] was asked to listen on site default
[15:00:54] [Asr] captured text "what 's the weather" in 2.0s with tokens: what[0.950], 's[0.950], the[1.000], weather[1.000]
[15:00:54] [Asr] was asked to stop listening on site default
[15:00:55] [Nlu] was asked to parse input "what 's the weather"
[15:00:55] [Nlu] detected intent searchWeatherForecast with confidence score 1.000 for input "what 's the weather"
[15:00:55] [Dialogue] New intent detected searchWeatherForecast with confidence 1.000

Instead of snips-watch, you can probably use any MQTT client:

sudo apt-get install -y mosquitto-clients
# Subscribe to all topics
mosquitto_sub -p 1883 -t "#"
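The "#" here is an MQTT wildcard, the same kind used with hermes/intent/# in the Home Assistant step. As I read the MQTT 3.1.1 specification (section 4.7), + matches exactly one topic level, # must come last and matches the parent level plus everything below it, and wildcards in the first level do not match topics starting with $ (such as $SYS/...). A sketch of those rules; the function name is hypothetical, and brokers like Mosquitto do this matching themselves.

```python
def topic_matches(topic_filter, topic):
    """Return True if `topic` matches the MQTT subscription `topic_filter`.

    Assumes a well-formed filter (e.g. "#" only as the last level).
    """
    # A leading wildcard never matches "$"-prefixed topics such as $SYS/...
    if topic.startswith("$") and topic_filter[:1] in ("+", "#"):
        return False
    f_parts = topic_filter.split("/")
    t_parts = topic.split("/")
    for i, level in enumerate(f_parts):
        if level == "#":
            return True  # matches the parent level and any remaining levels
        if i >= len(t_parts):
            return False  # filter is longer than the topic
        if level != "+" and level != t_parts[i]:
            return False  # literal level mismatch
    return len(f_parts) == len(t_parts)
```

For example, topic_matches("hermes/intent/#", "hermes/intent/searchWeatherForecast") is true, while a hermes/asr/... topic does not match that filter.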

Home Assistant and Snips

Both Home Assistant and Snips are designed to use MQTT. You can either:

Since we did everything from scratch, Hass doesn’t have a broker, so we can just point Hass at the one that was installed with Snips. In configuration.yaml:

# Enable snips (VERY IMPORTANT)
snips:
# Setup MQTT
mqtt:
  broker: 127.0.0.1
  port: 1883

Restart Hass, then from the UI pick ☰ > Developer Tools > MQTT > Listen to a Topic, enter hermes/intent/# (all Snips intents), and click Start Listening.

Now say “hey snips, what’s the weather” and you should see a message for the searchWeatherForecast intent pop up.

To test TTS, go to Developer Tools > Services, pick the snips.say service with data text: hello, and click Call Service. You should be greeted by a robo-voice from the speaker.

Let’s try a basic intent script triggered by the intent. In configuration.yaml:

intent_script:
  searchWeatherForecast:
    speech:
      text: 'hello intent'
    action:
      - service: system_log.write
        data_template:
          message: 'Hello intent'
          level: warning

Now, when Hass receives the intent, the TTS engine will say “hello intent” and a “Hello intent” warning will show up in Developer > Logs.

The End?

It’s a total bummer that the future of Snips is uncertain, because it was perfect for voice-controlled home automation. Then again, that’s probably why it was acquired.
