Getting Started with Web Speech Synthesis API and Svelte

#svelte

Browsers get new APIs all the time, and one of them is Web Speech Synthesis. Let's explore it with Svelte.

Getting Started with the API

The entry point of the API is the speechSynthesis object. To get the list of available voices we can do:

console.log(speechSynthesis.getVoices())

On Chrome on my OSX laptop, this returns a list of 67 voices.

So let's just create a new page, and put that in JavaScript, and... it returns an empty array. What just happened?

Unfortunately the Web Speech Synthesis API is terribly designed. The list of voices is populated asynchronously, which is fair enough, but instead of returning a promise or firing some event like onSpeechSynthesisReady, it will just happily return an empty array if you call it too early.

There is a voiceschanged event (with a matching onvoiceschanged handler property). Nothing in the spec says or even implies that it will trigger only once during page load, and I think a browser could just as well work the other way around (have the voice list pre-populated, so the event never triggers), but it seems to work fine in Chrome.

For any production use you would likely need a lot more robust code and some cross-browser, cross-OS testing. Arguably a polling loop that checks every 16ms until the list is non-empty or some max timeout has elapsed might even be more robust, but we're just exploring the API here.
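For the curious, the event and the polling idea can be combined in one helper. This is only a sketch under assumptions: the name loadVoices, the 50ms interval, and the 2s timeout are made up for illustration, and the synth parameter exists only so the function can be tried with a stand-in object instead of the real speechSynthesis.

```javascript
// Resolve with the voice list as soon as it is non-empty, or with whatever
// getVoices() returns once the timeout has elapsed.
// `synth` defaults to the browser's speechSynthesis object.
function loadVoices(synth = globalThis.speechSynthesis, { intervalMs = 50, timeoutMs = 2000 } = {}) {
  return new Promise((resolve) => {
    let done = false
    let timer

    // Clean up the timer and the listener, then resolve exactly once
    const finish = (voices) => {
      if (done) return
      done = true
      clearInterval(timer)
      synth.removeEventListener("voiceschanged", check)
      resolve(voices)
    }

    // Resolve only when the list has actually been populated
    const check = () => {
      const voices = synth.getVoices()
      if (voices.length > 0) finish(voices)
    }

    const startedAt = Date.now()
    synth.addEventListener("voiceschanged", check)
    timer = setInterval(() => {
      check()
      // Give up after the deadline, even if the list is still empty
      if (Date.now() - startedAt >= timeoutMs) finish(synth.getVoices())
    }, intervalMs)

    // The list may already be there, in which case no event will fire
    check()
  })
}
```

Since it returns a promise, something like let voicesPromise = loadVoices() should work with Svelte's {#await} block. Note that on timeout it resolves with an empty array rather than rejecting, which is a design choice of this sketch, not something the API prescribes.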

Display list of available voices

To get started we can wrap this event in a promise, and use Svelte to await the promise.

<script>
  let voicesPromise = new Promise((resolve) => {
    speechSynthesis.addEventListener("voiceschanged", ev => {
      resolve(speechSynthesis.getVoices())
    })
  })
</script>

<div>Available Voices:</div>
{#await voicesPromise then voices}
  <ul>
    {#each voices as voice}
      <li>{voice.name} - {voice.lang}</li>
    {/each}
  </ul>
{/await}

<style>
:global(body) {
  margin: 0;
  min-height: 100vh;
  display: flex;
  flex-direction: column;
  justify-content: center;
  align-items: center;
}
</style>
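With dozens of voices, the list gets long, so narrowing it down by language might be handy. Here is a small sketch of that, assuming a helper name (voicesForLanguage) that is not part of the API. It relies only on the name, lang, and default properties that voice objects expose.

```javascript
// Keep voices whose language tag matches the given prefix,
// so "en" matches "en", "en-US" and "en-GB" but not "eno-XX".
// Default voices are listed first, then the rest alphabetically by name.
function voicesForLanguage(voices, prefix) {
  const wanted = prefix.toLowerCase()
  return voices
    .filter((v) => {
      const lang = v.lang.toLowerCase()
      return lang === wanted || lang.startsWith(wanted + "-")
    })
    .sort((a, b) => Number(b.default) - Number(a.default) || a.name.localeCompare(b.name))
}
```

In the component above, this could be applied inside the {#await} block, for example {#each voicesForLanguage(voices, "en") as voice}. The input array is left untouched, because filter returns a new array before sorting.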

Say something

Now it's just a matter of adding some radio buttons for voice selection, a text input for the text to say, and a button to run it, and it works.

<script>
  let loading = true
  let voices = []
  speechSynthesis.addEventListener("voiceschanged", ev => {
    loading = false
    voices = speechSynthesis.getVoices()
  })
  let text = "Hello, world!"
  let voiceIndex = 0
  $: voice = voices[voiceIndex]

  function sayIt() {
    let u = new SpeechSynthesisUtterance(text)
    u.voice = voice
    speechSynthesis.speak(u)
  }
</script>

<div>
  <label>Text to say:
    <input bind:value={text} />
  </label>
  <button on:click={sayIt}>Say it</button>
</div>

{#if loading}
  <div>Please wait for voices to load</div>
{:else}
  <div>Available Voices:</div>
  <ul>
    {#each voices as v, i}
      <li>
        <label>
          <input type="radio" bind:group={voiceIndex} value={i}>
          {v.name} - {v.lang}
        </label>
      </li>
    {/each}
  </ul>
{/if}

<style>
:global(body) {
  margin: 0;
  min-height: 100vh;
  display: flex;
  flex-direction: column;
  justify-content: center;
  align-items: center;
}
</style>

Some notes:

I changed from a promise to a loading flag, as I want the array of voices in the script as well as in the rendered markup, and it's slightly easier this way.
We need to create new SpeechSynthesisUtterance(text) instead of just doing the more obvious voice.say("some text"). The SpeechSynthesisUtterance object has some additional properties like rate (speed) and pitch, so you can use it for speed reading, for an UwU voice, etc.
The API has some additional events for when the speech starts, ends, etc., so you might consider listening to them to know whether the browser is speaking right now if you need some visual feedback as well.
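The last two notes can be sketched together in one helper. The function name say, its options object, and the deps parameter are all assumptions made for illustration (deps only exists so the code can be tried with stand-in objects). The rate and pitch properties and the start, end, and error events are part of SpeechSynthesisUtterance.

```javascript
// Build and speak an utterance, reporting start and end through callbacks.
function say(text, { voice = null, rate = 1, pitch = 1, onStart, onEnd } = {}, deps = {}) {
  const synth = deps.synth || globalThis.speechSynthesis
  const Utterance = deps.Utterance || globalThis.SpeechSynthesisUtterance

  const u = new Utterance(text)
  if (voice) u.voice = voice // otherwise the browser picks its default voice
  u.rate = rate   // 1 is normal speed; the spec allows 0.1 to 10
  u.pitch = pitch // 1 is normal pitch; the spec allows 0 to 2
  if (onStart) u.addEventListener("start", onStart)
  if (onEnd) {
    u.addEventListener("end", onEnd)
    u.addEventListener("error", onEnd) // speech can also stop because of an error
  }
  synth.speak(u)
  return u
}
```

In the component above, sayIt could then become something like say(text, { voice, rate: 1.5, onStart: () => speaking = true, onEnd: () => speaking = false }), with a hypothetical speaking variable driving the visual feedback.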