DEV Community

Mustafa Unal

Posted on Apr 11, 2022

Add Speech Recognition to Your PC, and Even to Your TV

#hackwithdg #csharp

Overview of My Submission

Most of our day-to-day computer use involves the computer as a sound device, so I thought it would be nice to connect the default audio output to voice recognition. That way every word is recognized regardless of the software you use: Teams, YouTube, TikTok, Twitter, Edge, VLC, … you name it (unfortunately for Windows only ☹). And how far can we push it? How about subtitles for cable TV 😊

Submission Category:

Accessibility Advocates

Link to Code on GitHub

bleakview / deepgramwinsys

Deepgram sound-to-text converter for all sounds emitted in Windows

What can you find in this repository?

How to get started with Deepgram in Windows Forms
Sample of a custom label control with borders in Windows Forms
How to get and capture the system-wide default audio output
How to record captured audio as MP3
How to save and load system settings

Additional Resources / Info

To get system-wide voice recognition you basically need two components: a voice recognition service such as Deepgram, and some way to eavesdrop on the sounds the PC generates. I chose C# as the language because I connect directly to the system. To loop back (the technical term for capturing the machine's own sound output) on Windows you use WASAPI, the Windows Audio Session API. I chose the NAudio library to access it.
And here are some of the results.

It works on Teams

It works in the browser

While the system also has settings, transparent … properties, the most critical part is capturing the audio and recognizing it.

private async void ConvertAndTranscript()
{
    //enter credentials for deepgram
    var credentials = new Credentials(textBoxApiKey.Text);
    //Create our export folder to record sound and CSV file
    var outputFolder = CreateRecordingFolder();
    //File settings
    var dateTimeNow = DateTime.Now;
    var fileName = $"{dateTimeNow.Year}_{dateTimeNow.Month}_{dateTimeNow.Day}#{dateTimeNow.Hour}_{dateTimeNow.Minute}_{dateTimeNow.Second}_record";
    var soundFileName = $"{fileName}.mp3";
    var csvFileName = $"{fileName}.csv";
    var outputSoundFilePath = Path.Combine(outputFolder, soundFileName);
    var outputCSVFilePath = Path.Combine(outputFolder, csvFileName);
    //init deepgram
    var deepgramClient = new DeepgramClient(credentials);
    //init loopback interface
    _WasapiLoopbackCapture = new WasapiLoopbackCapture();
    //generate memory stream and deepgram client
    using (var memoryStream = new MemoryStream())
    using (var deepgramLive = deepgramClient.CreateLiveTranscriptionClient())
    {
        //the format that we will send to deepgram is 24 kHz, 16 bit, 2 channels
        var waveFormat = new WaveFormat(24000, 16, 2);
        var deepgramWriter = new WaveFileWriter(memoryStream, waveFormat);
        //mp3 writer if we wanted to save audio
        LameMP3FileWriter? mp3Writer = checkBoxSaveMP3.Checked ?
            new LameMP3FileWriter(outputSoundFilePath, _WasapiLoopbackCapture.WaveFormat, LAMEPreset.STANDARD_FAST) : null;

        //file writer if we wanted to save as csv
        StreamWriter? csvWriter = checkBoxSaveAsCSV.Checked ? File.CreateText(outputCSVFilePath) : null;
        //deepgram options
        var options = new LiveTranscriptionOptions()
        {
            Punctuate = true,
            Diarize = true,
            Encoding = Deepgram.Common.AudioEncoding.Linear16,
            ProfanityFilter = checkBoxProfinityAllowed.Checked,
            Language = _SelectedLanguage.LanguageCode,
            Model = _SelectedModel.ModelCode,
        };
        //connect 
        await deepgramLive.StartConnectionAsync(options);
        //when we receive data from deepgram this is mostly taken from their samples
        deepgramLive.TranscriptReceived += (s, e) =>
        {
            try
            {
                if (e.Transcript.IsFinal &&
                   e.Transcript.Channel.Alternatives.First().Transcript.Length > 0)
                {
                    var transcript = e.Transcript;
                    var text = $"{transcript.Channel.Alternatives.First().Transcript}";
                    _CaptionForm?.captionLabel.BeginInvoke((Action)(() =>
                    {
                        csvWriter?.WriteLine($@"{DateTime.Now.ToString("yyyy-MM-dd HH:mm:ss \"GMT\"zzz")},""{text}""");
                        _CaptionForm.captionLabel.Text = text;
                        _CaptionForm?.captionLabel.Refresh();
                    }));
                }
            }
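            catch (Exception ex)
            {
                //do not let a bad transcript take down the event handler, but leave a trace
                System.Diagnostics.Debug.WriteLine(ex);
            }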
        };
        deepgramLive.ConnectionError += (s, e) =>
        {

        };
        //when windows tell us that there is sound data ready to be processed
        //better than polling
        _WasapiLoopbackCapture.DataAvailable += (s, a) =>
        {
            mp3Writer?.Write(a.Buffer, 0, a.BytesRecorded);
            var buffer = ToPCM16(a.Buffer, a.BytesRecorded, _WasapiLoopbackCapture.WaveFormat);
            deepgramWriter.Write(buffer, 0, buffer.Length);
            deepgramLive.SendData(memoryStream.ToArray());
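            //empty the buffer so the next send contains only new audio;
            //rewinding Position alone keeps the old Length and ToArray() would resend stale bytes
            memoryStream.SetLength(0);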
        };
        //when recording stopped release and flush all file pointers 
        _WasapiLoopbackCapture.RecordingStopped += (s, a) =>
        {
            if (mp3Writer != null)
            {
                mp3Writer.Dispose();
                mp3Writer = null;
            }
            if (csvWriter != null)
            {
                csvWriter.Dispose();
                csvWriter = null;
            }
            _WasapiLoopbackCapture.Dispose();
        };
        _WasapiLoopbackCapture.StartRecording();
        while (_WasapiLoopbackCapture.CaptureState != NAudio.CoreAudioApi.CaptureState.Stopped)
        {
            if (_CancellationTokenSource?.IsCancellationRequested == true)
            {
                _CancellationTokenSource?.Dispose();
                _CancellationTokenSource = null;
                return;
            }
            //wait without blocking the calling (possibly UI) thread
            await Task.Delay(500);
        }
    }
}
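
The listing calls a ToPCM16 helper that is not shown here. As a rough idea of what such a conversion involves, below is a minimal sketch that assumes the loopback buffer holds interleaved 32-bit IEEE float samples (the usual WASAPI shared-mode format, but not guaranteed). PcmSketch and FloatToPcm16 are hypothetical names, not the repository's code, and the sketch does not resample or change the channel count.

```csharp
using System;

// Hypothetical stand-in for a float-to-PCM16 conversion; not the repository's ToPCM16.
public static class PcmSketch
{
    // Converts interleaved 32-bit float samples to 16-bit little-endian PCM.
    // Assumes a little-endian host, which holds for Windows on x86/x64/ARM64.
    public static byte[] FloatToPcm16(byte[] input, int bytesRecorded)
    {
        int sampleCount = bytesRecorded / sizeof(float);
        var output = new byte[sampleCount * sizeof(short)];
        for (int i = 0; i < sampleCount; i++)
        {
            float sample = BitConverter.ToSingle(input, i * sizeof(float));
            // Clamp to [-1, 1] so out-of-range samples saturate instead of wrapping.
            sample = Math.Clamp(sample, -1f, 1f);
            short pcm = (short)(sample * short.MaxValue);
            output[2 * i] = (byte)(pcm & 0xFF);            // low byte first
            output[2 * i + 1] = (byte)((pcm >> 8) & 0xFF); // then high byte
        }
        return output;
    }
}
```

Each 4-byte float becomes a 2-byte integer, so the output is half the size of the input. If the capture rate differs from the rate announced to the recognizer, a resampling step would still be needed on top of this.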

The rest of the code gets everything ready to execute: showing and hiding forms, etc.

So, after all that, how can you have subtitles on a TV? You need to get the TV signal into the PC somehow; I use a USB capture card for this. To process the capture card input I use OBS. Once I have the audio signal, the source makes no difference to me, since I process every sound the PC outputs. Then I use the computer's HDMI output to send the signal to the TV. It makes no difference to the TV or the cable box.

P.S. If you run into lag, check your network connection. There also seems to be a problem with the memory stream, which is not happy with my hacky solution. Any PR is welcome.
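
On the memory stream: one thing worth knowing when a MemoryStream is reused as a per-chunk buffer is that rewinding Position to 0 does not shrink the stream, so ToArray() keeps returning everything up to the old Length. The sketch below only uses the .NET base library and compares rewinding with SetLength(0). ChunkBufferSketch and its methods are hypothetical names for illustration, and this is a possible contributor to the problem rather than a confirmed diagnosis.

```csharp
using System;
using System.IO;

// Hypothetical demo of two ways to reuse a MemoryStream between chunks.
public static class ChunkBufferSketch
{
    // Rewinds with Position = 0 after each chunk and reports how many bytes
    // ToArray() would hand to the sender each time.
    public static int[] SizesWithPositionReset(int[] chunkSizes)
    {
        var sizes = new int[chunkSizes.Length];
        using var stream = new MemoryStream();
        for (int i = 0; i < chunkSizes.Length; i++)
        {
            stream.Write(new byte[chunkSizes[i]], 0, chunkSizes[i]);
            sizes[i] = stream.ToArray().Length;
            stream.Position = 0; // Length is unchanged, old bytes stay in the stream
        }
        return sizes;
    }

    // Empties the stream with SetLength(0) after each chunk instead.
    public static int[] SizesWithSetLength(int[] chunkSizes)
    {
        var sizes = new int[chunkSizes.Length];
        using var stream = new MemoryStream();
        for (int i = 0; i < chunkSizes.Length; i++)
        {
            stream.Write(new byte[chunkSizes[i]], 0, chunkSizes[i]);
            sizes[i] = stream.ToArray().Length;
            stream.SetLength(0); // drops the old bytes and moves Position back to 0
        }
        return sizes;
    }
}
```

For chunks of 100, 40 and 60 bytes, the rewinding variant reports 100 bytes every time, because the shorter chunks only overwrite the front of the old data. The SetLength(0) variant reports 100, 40 and 60.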
