In this hands-on tutorial, we'll learn how to build a powerful audio transcriber and analyzer using ToolJet and OpenAI. We'll quickly design an intuitive UI using ToolJet's pre-built components, and then use the platform's query builder to interact with OpenAI for audio transcription and analysis.
By the end of this tutorial, we'll have a solid foundation for building more sophisticated transcription and audio analysis applications.
Prerequisites:
- ToolJet (https://github.com/ToolJet/ToolJet): An open-source, low-code platform designed for quickly building and deploying internal tools. Sign up for a free ToolJet cloud account here.
- OpenAI Account: Register for an OpenAI account to utilize AI-powered features in your ToolJet applications. Sign up here.
Here's a quick preview of what we are going to build:
Before you begin, go to the OpenAI console and copy your secret key. Next, log in to your ToolJet account, locate the Data Sources section on the left sidebar, and configure OpenAI as a data source using the secret key.
Once the data source is configured, create a new app called "Speech Insight" from the dashboard. And with that, we are ready to start building our application.
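Optionally, you can sanity-check the secret key before wiring everything up. Here is a minimal Node.js sketch (run as an ES module on Node 18+; the endpoint and response shape are the standard OpenAI API, but the variable name is our own):

```javascript
// Optional sanity check (not part of the app itself): list the models
// your key can access. Node 18+ provides a global fetch.
const OPENAI_API_KEY = "sk-..."; // paste your secret key here

const res = await fetch("https://api.openai.com/v1/models", {
  headers: { Authorization: `Bearer ${OPENAI_API_KEY}` },
});
if (!res.ok) throw new Error(`Key check failed with status ${res.status}`);
const models = await res.json();
console.log(`Key OK: ${models.data.length} models visible`);
```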
Step 1: Building the UI for the Audio Transcriber
Let's use ToolJet's visual app builder to design our UI.
- For the app header, drag and drop an Icon component on the canvas. Navigate to its properties panel on the right, and select the `IconBrandDingtalk` icon.
- Drop a Text component next to it, and enter "Speech Insight" under its Data property.
- Change the color of both components to blue (`#3E63DD`). This will be the primary color scheme of our app; update the colors of the remaining components accordingly.
- Place a Container component below the header. We will organize the upcoming components inside the Container component.
- Rename it to `mainContainer`.
- On the top left of the Container component, place a Text component with the label "Output".
- Below it, add another Container and place two Text components inside it. We will use these components to display the transcribed text and feedback. Name them `transcribedText` and `feedback` respectively.
- Add a File Picker component below it and rename it to `uploader`. Change its `Accept file types` property to `"audio/*"`.
- Place a Button component below it, rename it to `analyzeButton`, and give it the label "Analyze Audio".
Note: We are renaming key components to make them easier to reference in other parts of our application.
- Finally, place four Statistics components on the right for Fluency, Pronunciation, Intonation, and Vocabulary scores.
- Drop a Button component below it labeled "Copy Output" and rename it to `copyButton`.
The UI is now ready! Time to configure the interactions with OpenAI.
Step 2: Interact With OpenAI
In the steps below, we will configure interactions with OpenAI using both the REST API data source and ToolJet's native OpenAI integration.
- Expand the query panel at the bottom and click on the Add button to create a new REST API query. Rename the query to `transcribe`.
- Enter the OpenAI endpoint under the `URL` property: `https://api.openai.com/v1/audio/transcriptions`.
- Add a new row under Header; enter `Content-Type` as the key and `multipart/form-data` as the value.
- Add another key for `Authorization`. Enter `Bearer <OPEN_AI_KEY>` as the value.
- Under Body, add `file` as the key and `{{ components.uploader.file[0] }}` as the value. This ensures the audio file selected in our uploader/File Picker component is sent.
- Add `model` as the key and enter `whisper-1` as the value.
Now if we select an audio file in the uploader/File Picker component and click on the Run button, we will see the transcribed audio as the output.
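Under the hood, this query performs the same multipart upload you could make directly against OpenAI's transcription endpoint. Here is a rough Node.js equivalent (a sketch, not ToolJet's internals; it assumes an `audio.mp3` file on disk and our own `OPENAI_API_KEY` variable):

```javascript
// Rough equivalent of the transcribe query. Run as an ES module on
// Node 18+, which provides global fetch, FormData, and Blob.
import { readFile } from "node:fs/promises";

const OPENAI_API_KEY = "sk-..."; // your secret key

const form = new FormData();
form.append("file", new Blob([await readFile("audio.mp3")]), "audio.mp3");
form.append("model", "whisper-1");

const res = await fetch("https://api.openai.com/v1/audio/transcriptions", {
  method: "POST",
  // fetch sets the multipart Content-Type and boundary automatically
  headers: { Authorization: `Bearer ${OPENAI_API_KEY}` },
  body: form,
});
const { text } = await res.json(); // response shape: { "text": "..." }
console.log(text);
```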
Once the audio is transcribed, we need to analyze it to provide a score. Let's use the native OpenAI integration for this query.
- Click on the Add button and add a new query. Select OpenAI as the data source for this query; this is the same data source that we set up at the beginning. Rename it to `analyze`.
- Select Chat as the operation and Message as the input, and enter the below prompt:
Based on the transcribed audio below, provide a JSON object in response with the following details:
- Fluency (out of 10)
- Pronunciation (out of 10)
- Vocabulary (out of 10)
- Intonation (out of 10)
- A paragraph that gives general feedback on the transcribed text's quality and overall improvement suggestions.
Return the object in the following format:
{fluency: "...", pronunciation: "...", vocabulary: "...", intonation: "...", feedback: "..."}
Transcribed text:
{{queries.transcribe.data.text}}
In this prompt, we are using OpenAI to perform a detailed analysis of the audio transcription. We are referencing the data returned by the `transcribe` query in the prompt along with the scoring criteria.
Running this query will return the analysis as a JSON-formatted string.
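The exact wording will vary from run to run, but since the prompt asks for a JSON object, `queries.analyze.data` should hold a string along these lines (the values below are made up for illustration):

```javascript
// Illustrative shape of queries.analyze.data; the scores are invented.
const sample = `{
  "fluency": "8",
  "pronunciation": "7",
  "vocabulary": "9",
  "intonation": "8",
  "feedback": "Clear delivery overall; slow down on longer sentences."
}`;
console.log(JSON.parse(sample).feedback);
```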
Both queries are ready. As a final step, let's automate the process of triggering the `analyze` query every time the `transcribe` query is successfully executed.
- Go back to the `transcribe` query, navigate to Events, and add a new event handler.
- Select Query Success as the Event, Run Query as the Action, and `analyze` as the Query.
By using events, we have configured the `analyze` query to run automatically once the `transcribe` query completes and its output is ready for analysis.
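As an aside, the same chain can also be expressed in code. ToolJet exposes a `queries.<name>.run()` helper inside Run JavaScript code queries, so a minimal sketch of the equivalent logic (assuming the query names above) would be:

```javascript
// Run the transcription first, then the analysis once it completes.
await queries.transcribe.run();
await queries.analyze.run();
```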
Step 3: Binding the Transcripts and Analysis to Components
Onto the final step. We have built our UI and also built queries to interact with Open AI. Now we can connect it all together and see the app in action.
- Select the Analyze Audio button, navigate to its properties panel on the right and add a new event.
- Select On click as the Event, Run Query as the Action, and `transcribe` as the Query.
Now every time the Analyze Audio button is clicked, it will trigger the `transcribe` query.
- Select the Copy Output button and add a new event to it.
- Select On click as the Event, Copy to clipboard as the Action, and `{{queries.analyze.data}}` as the Text.
This configuration will ensure that the analyzed output gets copied when you click on the Copy Output button.
Select the Text component that we placed to display the transcript. Enter the following value under its Data property:
`Transcript: {{queries.transcribe.data.text}}`
Select the Text component meant to display the feedback. Enter the following value under its Data property:
`Feedback: {{JSON.parse(queries.analyze.data).feedback}}`
Note: We received a JSON string in response to our `analyze` query. Therefore, we need to parse it into a JavaScript object before displaying it.
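JSON.parse will throw if the model ever wraps the object in extra prose. If you want to be defensive, you could run the response through a small helper like the sketch below (our own addition, not part of the original setup) in a transformation or Run JavaScript code query:

```javascript
// Defensive parsing: extract the outermost {...} block before parsing,
// in case the model surrounds the JSON with explanatory text.
function parseAnalysis(raw) {
  const match = raw.match(/\{[\s\S]*\}/);
  try {
    return match ? JSON.parse(match[0]) : null;
  } catch {
    return null; // not valid JSON; let the caller decide what to show
  }
}
```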
Select the Statistics component for "Fluency" and enter the below value under its `Primary value` property:
`{{JSON.parse(queries.analyze.data).fluency}}`
Update the rest of the Statistics components using the same logic.
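For reference, the remaining three bindings follow the same pattern:

```
{{JSON.parse(queries.analyze.data).pronunciation}}
{{JSON.parse(queries.analyze.data).intonation}}
{{JSON.parse(queries.analyze.data).vocabulary}}
```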
Our audio transcriber and analyzer is now fully complete. Upload an audio file and click on the Analyze Audio button to see the transcription, feedback, and scores getting populated in the UI.
Conclusion
In this tutorial, we learned how to create a complete audio transcription and analysis tool using ToolJet and OpenAI. We walked through designing an intuitive user interface, setting up API queries to interact with OpenAI, and binding the results to display transcriptions, feedback, and speech analysis scores.
To further customize the application, experiment with different UI components to enhance the user experience or integrate additional APIs to analyze other aspects of the audio, such as emotion detection or language translation.
To learn more, check out ToolJet's official documentation or connect on Slack with any questions.