Alessandro T.

Posted on • Originally published at trinca.tornidor.com

AI Pronunciation Trainer

In this article, I present the project I am working on: AI Pronunciation Trainer (online here), a tool designed to help you improve your pronunciation with the help of artificial intelligence. It is a refactor of the original AI Pronunciation Trainer by Thiagohgl, to which I have made several improvements to make the tool more effective and easier to use.

What it is and what it does

AI Pronunciation Trainer is a tool that uses AI to evaluate your pronunciation and give you feedback, helping you improve and be understood more clearly. It relies on the Silero STT / TTS models for speech-to-text and text-to-speech, which form the basis of the pronunciation assessment.
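
To make the idea of "evaluating pronunciation" a bit more concrete, here is a minimal sketch. It is not the project's actual scoring code: it assumes the speech-to-text step has already produced a transcript, and simply aligns that transcript word by word with the sentence you were asked to read. The helper names `score_words` and `accuracy`, and the choice of `difflib` from the standard library, are my own assumptions for illustration; the real tool may well compare phonemes rather than whole words.

```python
from difflib import SequenceMatcher


def score_words(expected: str, transcribed: str) -> list[tuple[str, bool]]:
    """Mark each expected word as matched (True) or missed (False).

    Hypothetical helper: aligns the two word sequences and counts only
    exact, case-insensitive matches as correctly pronounced.
    """
    expected_words = expected.lower().split()
    transcribed_words = transcribed.lower().split()
    matcher = SequenceMatcher(a=expected_words, b=transcribed_words, autojunk=False)
    matched = [False] * len(expected_words)
    # Every matching block is a run of words present in both sequences.
    for block in matcher.get_matching_blocks():
        for offset in range(block.size):
            matched[block.a + offset] = True
    return list(zip(expected_words, matched))


def accuracy(scores: list[tuple[str, bool]]) -> float:
    """Fraction of expected words found in the transcript (0.0 if empty)."""
    if not scores:
        return 0.0
    return sum(ok for _, ok in scores) / len(scores)
```

For example, if the sentence to read is "the cat sat on the mat" and the recognizer hears "the cat sad on the mat", only "sat" is flagged as missed, so that word is the one worth practising again.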

Refactor: upgraded frontend and backend libraries

I have updated the backend libraries: PyTorch is now at version 2.5.x. I also changed the version of the German Speech-to-Text model to fix a bug that prevented the use of PyTorch versions later than 1.13.x. The other changes, mostly on the frontend:

  • Updated the JavaScript libraries using the latest versions of jQuery (3.7.1) and Bootstrap (5.3.3)
  • New frontend based on Gradio 5.x
  • Added E2E tests with Playwright
  • Added the ability to insert custom sentences to read and evaluate
  • Onboarding tour for new users made with driver.js and custom CSS/JavaScript in Gradio blocks
  • Playback of individual words in the recording followed by the 'ideal' pronunciation of the same word read by the Text-to-Speech engine
  • Added in-browser Text-to-Speech (on Windows 11, it only works if the English and German language packs are installed)
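
The per-word playback in the list above can be pictured as cutting the recording at word boundaries. Here is a minimal sketch, assuming each word comes with a start and end time in seconds and the recording is a plain sequence of samples. The names `WordSpan` and `slice_word` are hypothetical, and the project itself may obtain and apply its boundaries differently.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class WordSpan:
    """A word and its (assumed) position in the recording, in seconds."""
    word: str
    start_s: float
    end_s: float


def slice_word(samples: Sequence[float], sample_rate: int, span: WordSpan) -> Sequence[float]:
    """Return the samples covering one word, clamped to the recording length."""
    start = max(0, int(span.start_s * sample_rate))
    end = min(len(samples), int(span.end_s * sample_rate))
    return samples[start:end]
```

The slice for a word can then be played back right before the Text-to-Speech rendering of the same word, so you hear your version and the reference one after the other.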

Online version: the HuggingFace Space Demo

You can try it online using the HuggingFace Space. The demo lets you test AI Pronunciation Trainer without any installation or configuration and see how it can help you improve your pronunciation. Please be patient: the Space is sometimes a bit slow, or asleep and in need of a moment to wake up (running the tool locally is much faster, especially on a powerful computer). There is also an embedded version of my HuggingFace Space.

Future Work

Although this tool works pretty well, there are still some areas for improvement. Here are some of the enhancements I plan to work on:

  • Receive feedback from the original author on my documentation and changes
  • Ask the original author for explanations on the architectural and functional choices he made
  • Explore transitioning from PyTorch to ONNX Runtime
  • Add more E2E tests with Playwright

Conclusion

I believe AI Pronunciation Trainer is a valuable tool for anyone looking to improve their pronunciation. Thanks to the AI models behind it and the improvements made in this refactor, it gives you feedback that helps you speak more clearly and confidently. I invite you to try the HuggingFace Space demo and see how this little project can help you on your journey to better pronunciation.
