👀 Stumbled here by accident? Start with the introduction!
📚 The aim of this article is to incorporate the Meta Voice SDK into our application, enabling it to respond to specific word sequences. This functionality will allow us to interact with the door we rendered in the previous article, providing a more immersive and interactive experience.
ℹ️ If you run into any difficulties, remember that you can always refer to or download the code from our accompanying GitHub repository.
Log in to the Unity Asset Store and add Meta - Voice SDK - Immersive Voice Commands to your library. Return to the Unity Editor and install the Meta Voice SDK through the Package Manager. You can access My Assets directly via Window → My Assets.
Before we begin using the Meta Voice SDK, it's necessary to create an account on Wit.ai. You can conveniently use your existing Meta developer account for this purpose by clicking on Continue with Meta on the Wit.ai landing page.
ℹ️ Wit.ai is a natural language processing (NLP) service owned by Meta (formerly Facebook). It enables developers to build applications that can understand human language by providing a powerful and easy-to-use API.
Once you've set up your Wit.ai account, you can create a new application at Wit.ai. If you're unsure about what to name the application, simply go with unity_example.
After your app is created, click on unity_example. Then, as illustrated in the upcoming screenshot, add an Utterance for the Intent open_door. This step is crucial for training your application to recognize and respond to specific user inputs related to the action of opening a door.
ℹ️ An Utterance, in the context of natural language processing (NLP), linguistics, and conversational AI, refers to a sequence of words or sounds made by a speaker. It's essentially a unit of speech. In practical terms, an utterance can be as short as a single word (like a command or an exclamation) or as long as a complete sentence or multiple sentences.
ℹ️ In Wit.ai, an Intent represents the purpose or goal behind a user's input, typically a spoken or written phrase. It's a fundamental concept in natural language understanding (NLU) and is used to categorize user utterances into specific actions that the application should perform. For example, the utterance “open the door” should resolve to the open_door Intent.
1. Fill in the phrase “open the door” as the Utterance and select the open_door intent in the Intent dropdown.
2. In the Utterance input field, select the word “open”. This opens the entity form. Fill in action as the entity name and click on + Create Entity.
3. In the Utterance input field, select the word “door”. This opens the entity form. Fill in entity as the entity name and click on + Create Entity.
4. After creating the entities, they will be highlighted as shown in the next screenshot.
Now, click on Train and Validate. Once the training process is complete (see the indicator at the top left, next to the app name), return to the Unity Editor and navigate to Oculus → Voice SDK → Get Started. In the first dialog, enter the Wit Server Access Token. The access token can be found on the Wit.ai website under Management → Settings within your unity_example app.
You will be prompted to choose a location for saving your Wit asset. Save it in your Assets/Settings folder and name it wit. Once you have saved the asset, the following screen will appear.
Now, click on Specify Assemblies and uncheck everything except the first entry:
Click on Generate Manifest, then close the Voice Hub for now.
The next step is to respond to the Utterance “open the door”. Create a new Script under Assets/Scripts/VoiceSDK and name it OpenDoorConduit. Add the Script to the XR Origin (XR Rig) via Add Component.
The Script looks as follows:
using System.Collections;
using System.Collections.Generic;
using Meta.WitAi;
using UnityEngine;

namespace Taikonauten.Unity.ArticleSeries
{
    public class OpenDoorConduit : MonoBehaviour
    {
        private const string OPEN_DOOR_INTENT = "open_door";

        [MatchIntent(OPEN_DOOR_INTENT)]
        public void OpenDoor(string[] values)
        {
            Debug.Log("OpenDoorConduit -> OpenDoor()");

            // The matched entity values are passed in as a string array:
            // values[0] -> "action" entity, values[1] -> "entity" entity.
            string action = values[0];
            string entity = values[1];

            if (!string.IsNullOrEmpty(action) && !string.IsNullOrEmpty(entity))
            {
                Debug.Log("OpenDoorConduit -> OpenDoor(): match");
            }
        }
    }
}
As you can see, we are using MatchIntent with the open_door Intent that we created in a previous step. The OpenDoor method is automatically called by the ConduitDispatcher.
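If you want to sanity-check the handler before everything is wired up to Wit, you can call it directly with the values we expect for “open the door”. The following throwaway sketch is our own (the class name is made up, and the assumption that the values arrive in the order action, entity matches the Response Matcher setup described below):

using UnityEngine;

namespace Taikonauten.Unity.ArticleSeries
{
    // Hypothetical smoke test: invokes OpenDoor() directly, bypassing Wit.
    public class OpenDoorConduitSmokeTest : MonoBehaviour
    {
        [SerializeField] private OpenDoorConduit conduit;

        // Right-click the component header in the Inspector to run this.
        [ContextMenu("Simulate \"open the door\"")]
        private void Simulate()
        {
            // "open" -> action entity, "door" -> entity entity (see the Wit.ai setup above).
            conduit.OpenDoor(new[] { "open", "door" });
        }
    }
}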
ℹ️ The ConduitDispatcher is used to manage voice commands, direct them to the appropriate processing channels, or handle the distribution of responses or actions triggered by voice input.
Now, proceed by adding the App Voice Experience component to the XR Origin (XR Rig) GameObject. Once you have added this component, select the Wit configuration that you created during the Get Started process for Wit.
For the final step, we must add the Response Matcher. This component listens for responses from Wit that match the open_door Intent and, upon a successful match, triggers our OpenDoor method, which we defined earlier.
To do this, open the Understanding Viewer via Oculus → Understanding Viewer. Enter “open the door” in the Utterance field and click Send.
Now right-click value and select Add response matcher to XR Origin (XR Rig).
The result should look like the following screenshot:
Next, create an entry under On Multi Value Event and select the values as follows:
Android Setup
To ensure compatibility with the Voice SDK, some adjustments are needed for the Android build. Access the Project Settings by going to Edit → Project Settings.
1. In the Project Settings, navigate to the Player section. Under Other Settings, change the Minimum API Level to Android 10.0 (API Level 29) and the Target API Level to Android 12L (API Level 32). (If you prefer to apply these levels from code, see the optional editor sketch after this list.)
2. In the Project Settings, go to the Player section, find Application Entry Point under Other Settings and change the value from GameActivity to Activity.
ℹ️ You can find more information about application entry points in the Unity documentation: Unity - Manual: Android application entry points. As of the time of writing, GameActivity is not compatible with the Meta Voice SDK.
3. In the Project Settings, go to the Player section, open the Publishing Settings tab and enable Custom Main Manifest. After activating this option, you will find the manifest file at Assets/Plugins/Android/AndroidManifest.xml. This step gives you direct control over the Android manifest file, allowing you to make the specific customizations needed for this project.
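As mentioned in step 1, the two API levels can also be applied from an editor script instead of the Project Settings window. The sketch below is our own and entirely optional (the menu path and class name are made up); it casts the raw API levels so it compiles regardless of which values your Unity version's AndroidSdkVersions enum names explicitly.

// Editor-only sketch; place it in an Editor folder (e.g. Assets/Editor).
using UnityEditor;

namespace Taikonauten.Unity.ArticleSeries.Editor
{
    public static class VoiceSdkAndroidSettings
    {
        [MenuItem("Tools/Apply Voice SDK Android API Levels")]
        public static void Apply()
        {
            // Minimum API Level: Android 10.0 (API level 29).
            PlayerSettings.Android.minSdkVersion = (AndroidSdkVersions)29;

            // Target API Level: Android 12L (API level 32).
            PlayerSettings.Android.targetSdkVersion = (AndroidSdkVersions)32;
        }
    }
}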
Open the AndroidManifest.xml file in your code editor and modify its content as follows.
<?xml version="1.0" encoding="utf-8"?>
<manifest
    xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:tools="http://schemas.android.com/tools">
  <application>
    <activity android:name="com.unity3d.player.UnityPlayerActivity"
              android:theme="@style/UnityThemeSelector">
      <intent-filter>
        <action android:name="android.intent.action.MAIN" />
        <category android:name="android.intent.category.LAUNCHER" />
      </intent-filter>
      <meta-data android:name="unityplayer.UnityActivity" android:value="true" />
      <meta-data android:name="unityplayer.SkipPermissionsDialog" android:value="false" />
    </activity>
  </application>
</manifest>
Let's review the changes we've made. This will help us understand the adjustments and their impact on the application's functionality and compatibility.
- We have removed the GameActivity block, as only one Activity is permitted and we previously opted for Activity (UnityPlayerActivity) instead of GameActivity in the Project Settings.
- We included the unityplayer.SkipPermissionsDialog setting with a value of false to ensure that required permission dialogs are not automatically bypassed. This guarantees that the application appropriately prompts users for the necessary permissions, such as microphone access (see the sketch after this list), aligning with best practices for user consent and app functionality.
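Because unityplayer.SkipPermissionsDialog stays at false, Unity shows the system permission prompt at startup. If you prefer to check or request the microphone permission explicitly from code, Unity's Android permission API can be used. The following is a minimal sketch under our own assumptions (the class name and placement are hypothetical; it is not required by this article):

using UnityEngine;
#if UNITY_ANDROID
using UnityEngine.Android;
#endif

namespace Taikonauten.Unity.ArticleSeries
{
    // Hypothetical helper: ensures the microphone permission is granted
    // before the Voice Service starts listening.
    public class MicrophonePermissionCheck : MonoBehaviour
    {
        void Start()
        {
#if UNITY_ANDROID
            if (!Permission.HasUserAuthorizedPermission(Permission.Microphone))
            {
                // Opens the system permission dialog on the device.
                Permission.RequestUserPermission(Permission.Microphone);
            }
#endif
        }
    }
}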
Adding some UI
Before we can test our implementation, we need to add some user interface elements to the scene. We will add a UI label that becomes visible when the application starts listening to the user's voice and disappears once recording stops, triggered by a successful response from Wit. This UI component provides visual feedback to the user about the state of voice recognition within the application.
To set up the user interface for voice recognition feedback, follow these steps:
1. Create an empty GameObject in your scene and name it UI.
2. Within the UI GameObject, add another empty GameObject and name it VoiceSDK.
3. With the VoiceSDK GameObject selected in the hierarchy, attach the Lazy Follow script via Add Component. Configure the script as follows:
4. Add a Canvas to the VoiceSDK GameObject by right-clicking it and navigating to UI → Canvas.
5. Inside the Canvas, add a Text element by choosing UI → Text - TextMeshPro.
This setup creates a structured UI hierarchy in your scene, with the VoiceSDK GameObject serving as a container for the elements that will provide visual feedback for voice recognition. The Lazy Follow script will manage the positioning, and the TextMeshPro element will display the necessary information or status messages.
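The controller script we update later in this article simply toggles the entire VoiceSDK GameObject on and off. If you also want to set the label text from code, a small helper could look like the following sketch (hypothetical and optional; it assumes a reference to the Text (TMP) element as a TextMeshProUGUI field):

using TMPro;
using UnityEngine;

namespace Taikonauten.Unity.ArticleSeries
{
    // Hypothetical helper, attached to the VoiceSDK GameObject.
    public class VoiceStatusLabel : MonoBehaviour
    {
        [SerializeField] private TextMeshProUGUI label; // the Text (TMP) element created above

        // Show the label with the given status text, e.g. "...Listening...".
        public void Show(string status)
        {
            label.text = status;
            gameObject.SetActive(true);
        }

        // Hide the label again once the voice request has completed.
        public void Hide()
        {
            gameObject.SetActive(false);
        }
    }
}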
Your hierarchy should now look as follows:
Select the EventSystem GameObject in your hierarchy and remove and add components as follows:
Don’t forget to remove the Standalone Input Module, if present.
Next, we need to configure our Canvas and Text (TMP) elements. Select the Canvas GameObject in your hierarchy and set it up as follows (add the components shown in the screenshot via Add Component).
ℹ️ We won't be delving into UI-related topics in this article series. For those who need assistance or guidance with Unity's UI system, I recommend checking out the Unity documentation. It provides comprehensive resources and tutorials that can help you understand and effectively use Unity's UI tools in your projects.
Lastly, deactivate the VoiceSDK GameObject, as we won't be displaying it immediately. The visibility of the UI will be managed later through our script.
ℹ️ If you're not familiar with how to deactivate a GameObject in the Unity Inspector, I recommend consulting the Unity documentation: Unity - Manual: Deactivate GameObjects.
Updating our MRArticleSeriesController Script
For the final step in this article, we'll update our MRArticleSeriesController Script to enable and disable the Voice Service using the left Trigger. This modification will allow for straightforward control of the Voice Service directly through user input, enhancing the interactive capabilities of our application.
using System.Collections;
using System.Collections.Generic;
using Meta.WitAi;
using Meta.WitAi.Requests;
using UnityEngine;
using UnityEngine.InputSystem;
using UnityEngine.XR.ARFoundation;
using UnityEngine.XR.ARSubsystems;
using UnityEngine.XR.Interaction.Toolkit;

namespace Taikonauten.Unity.ArticleSeries
{
    public class MRArticleSeriesController : MonoBehaviour
    {
        [SerializeField] private ARAnchorManager anchorManager;
        [SerializeField] private GameObject door;
        [SerializeField] private GameObject uI;
        [SerializeField] private InputActionReference buttonActionLeft;
        [SerializeField] private InputActionReference buttonActionRight;
        [SerializeField] private VoiceService voiceService;
        [SerializeField] private XRRayInteractor rayInteractor;

        private VoiceServiceRequest voiceServiceRequest;
        private VoiceServiceRequestEvents voiceServiceRequestEvents;

        void OnEnable()
        {
            Debug.Log("MRArticleSeriesController -> OnEnable()");

            buttonActionRight.action.performed += OnButtonPressedRightAsync;
            buttonActionLeft.action.performed += OnButtonPressedLeft;
        }

        void OnDisable()
        {
            Debug.Log("MRArticleSeriesController -> OnDisable()");

            buttonActionRight.action.performed -= OnButtonPressedRightAsync;
            buttonActionLeft.action.performed -= OnButtonPressedLeft;
        }

        private void ActivateVoiceService()
        {
            Debug.Log("MRArticleSeriesController -> ActivateVoiceService()");

            if (voiceServiceRequestEvents == null)
            {
                voiceServiceRequestEvents = new VoiceServiceRequestEvents();
                voiceServiceRequestEvents.OnInit.AddListener(OnInit);
                voiceServiceRequestEvents.OnComplete.AddListener(OnComplete);
            }

            voiceServiceRequest = voiceService.Activate(voiceServiceRequestEvents);
        }

        private void DeactivateVoiceService()
        {
            Debug.Log("MRArticleSeriesController -> DeactivateVoiceService()");

            voiceServiceRequest.DeactivateAudio();
        }

        private void OnInit(VoiceServiceRequest request)
        {
            uI.SetActive(true);
        }

        private void OnComplete(VoiceServiceRequest request)
        {
            uI.SetActive(false);
            DeactivateVoiceService();
        }

        private async void OnButtonPressedRightAsync(InputAction.CallbackContext context)
        {
            Debug.Log("MRArticleSeriesController -> OnButtonPressedRightAsync()");

            if (rayInteractor.TryGetCurrent3DRaycastHit(out RaycastHit hit))
            {
                Pose pose = new(hit.point, Quaternion.identity);
                Result<ARAnchor> result = await anchorManager.TryAddAnchorAsync(pose);
                result.TryGetResult(out ARAnchor anchor);

                if (anchor != null)
                {
                    // Instantiate the door Prefab
                    GameObject _door = Instantiate(door, hit.point, Quaternion.identity);

                    // Unity recommends parenting your content to the anchor.
                    _door.transform.parent = anchor.transform;
                }
            }
        }

        private void OnButtonPressedLeft(InputAction.CallbackContext context)
        {
            Debug.Log("MRArticleSeriesController -> OnButtonPressedLeft()");

            ActivateVoiceService();
        }
    }
}
Let's review the updates quickly:
- Added a field named uI to hold the GameObject that will be activated or deactivated based on the VoiceService state.
- Included a field called voiceService to reference the App Voice Experience component added to the XR Origin (XR Rig) GameObject.
- Introduced a field voiceServiceRequest to store the active request to the VoiceService.
- Added a voiceServiceRequestEvents field, which is passed to the VoiceService. This ensures that the OnInit and OnComplete methods are called by the VoiceService.
- In OnEnable and OnDisable we add and remove the OnButtonPressedLeft action, so we can respond to the left controller Trigger press.
- ActivateVoiceService activates the VoiceService, creates the VoiceServiceRequestEvents if not already initialized, and is called via OnButtonPressedLeft when the user presses the left Trigger.
- DeactivateVoiceService simply deactivates the recording of the VoiceService.
- OnInit is invoked when the VoiceService starts listening. It enables the UI GameObject to inform the user that the app is now listening.
- OnComplete is called when a voice request succeeds. It also invokes DeactivateVoiceService to stop the VoiceService from listening.
- OnButtonPressedLeft is triggered when the left Trigger is pressed.
Please be aware that we have also renamed some variables and methods. After saving these changes, remember to return to the Unity Editor and update the fields for the Player component accordingly.
Testing the app
We are now prepared to test the app. Select Build and Run, and once the app is running, press the trigger on the left controller. You should see a label appear in front of you indicating "...Listening...". At this point, say the phrase "open the door". After a brief delay, the label should disappear. This process allows you to verify the functionality of the voice recognition feature in your application.
Additionally, you can check the console to see whether the Intent was triggered, as shown in the following screenshot. This provides a clear indication of whether the voice command was successfully recognized.
Next article
In our upcoming article, we will take an exciting step forward by leveraging the voice command functionality we've established to initiate an animation that opens the door. This integration represents a significant enhancement in our application, blending voice recognition with dynamic visual feedback.