Overview of My Submission
Much of our day-to-day computer use involves sound, so I thought it would be nice to connect the default audio output to voice recognition, so that every spoken word is recognized regardless of which software you use: Teams, YouTube, TikTok, Twitter, Edge, VLC, … you name it (unfortunately for Windows only ☹). And how far can we push it? How about subtitles for cable TV 😊
Submission Category:
Accessibility Advocates
Link to Code on GitHub
bleakview / deepgramwinsys
Deepgram sound-to-text converter for all sounds emitted in Windows
What can you find in this repository?
- How to get started with Deepgram in Windows Forms
- A sample custom label control with borders in Windows Forms
- How to capture the system-wide default audio output
- How to record captured audio as MP3
- How to save and load system settings
Additional Resources / Info
To get system-wide voice recognition you basically need two components: a voice recognition service such as Deepgram, and some way to eavesdrop on the sounds the PC generates. I chose C# as the language because I connect directly to the system. To loop back (the technical term for capturing the system's own sound output) on Windows you use WASAPI, the Windows Audio Session API. I chose the NAudio library to access it.
And here are some of the results.
While the application also has settings, transparent … properties, the most critical part is capturing the audio and recognizing it.
private async void ConvertAndTranscript()
{
    //enter credentials for Deepgram
    var credentials = new Credentials(textBoxApiKey.Text);
    //create our export folder for the sound recording and the CSV file
    var outputFolder = CreateRecordingFolder();
    //file settings: year_month_day#hour_minute_second
    var dateTimeNow = DateTime.Now;
    var fileName = $"{dateTimeNow.Year}_{dateTimeNow.Month}_{dateTimeNow.Day}#{dateTimeNow.Hour}_{dateTimeNow.Minute}_{dateTimeNow.Second}_record";
    var soundFileName = $"{fileName}.mp3";
    var csvFileName = $"{fileName}.csv";
    var outputSoundFilePath = Path.Combine(outputFolder, soundFileName);
    var outputCSVFilePath = Path.Combine(outputFolder, csvFileName);
    //init Deepgram
    var deepgramClient = new DeepgramClient(credentials);
    //init loopback interface
    _WasapiLoopbackCapture = new WasapiLoopbackCapture();
    //generate memory stream and Deepgram live client
    using (var memoryStream = new MemoryStream())
    using (var deepgramLive = deepgramClient.CreateLiveTranscriptionClient())
    {
        //the format that we will send to Deepgram is 24 kHz, 16-bit, 2 channels
        var waveFormat = new WaveFormat(24000, 16, 2);
        var deepgramWriter = new WaveFileWriter(memoryStream, waveFormat);
        //MP3 writer, only if we want to save the audio
        LameMP3FileWriter? mp3Writer = checkBoxSaveMP3.Checked ?
            new LameMP3FileWriter(outputSoundFilePath, _WasapiLoopbackCapture.WaveFormat, LAMEPreset.STANDARD_FAST) : null;
        //file writer, only if we want to save the transcript as CSV
        StreamWriter? csvWriter = checkBoxSaveAsCSV.Checked ? File.CreateText(outputCSVFilePath) : null;
        //Deepgram options
        var options = new LiveTranscriptionOptions()
        {
            Punctuate = true,
            Diarize = true,
            Encoding = Deepgram.Common.AudioEncoding.Linear16,
            ProfanityFilter = checkBoxProfinityAllowed.Checked,
            Language = _SelectedLanguage.LanguageCode,
            Model = _SelectedModel.ModelCode,
        };
        //connect
        await deepgramLive.StartConnectionAsync(options);
        //when we receive data from Deepgram; this is mostly taken from their samples
        deepgramLive.TranscriptReceived += (s, e) =>
        {
            try
            {
                //only show final, non-empty transcripts
                if (e.Transcript.IsFinal &&
                    e.Transcript.Channel.Alternatives.First().Transcript.Length > 0)
                {
                    var transcript = e.Transcript;
                    var text = $"{transcript.Channel.Alternatives.First().Transcript}";
                    //update the caption on the UI thread
                    _CaptionForm?.captionLabel.BeginInvoke((Action)(() =>
                    {
                        csvWriter?.WriteLine($@"{DateTime.Now.ToString("yyyy-MM-dd HH:mm:ss \"GMT\"zzz")},""{text}""");
                        _CaptionForm.captionLabel.Text = text;
                        _CaptionForm?.captionLabel.Refresh();
                    }));
                }
            }
            catch (Exception)
            {
                //a malformed transcript event is ignored so that captioning keeps running
            }
        };
        deepgramLive.ConnectionError += (s, e) =>
        {
            //connection errors are currently ignored
        };
        //when Windows tells us that there is sound data ready to be processed
        //better than polling
        _WasapiLoopbackCapture.DataAvailable += (s, a) =>
        {
            mp3Writer?.Write(a.Buffer, 0, a.BytesRecorded);
            var buffer = ToPCM16(a.Buffer, a.BytesRecorded, _WasapiLoopbackCapture.WaveFormat);
            deepgramWriter.Write(buffer, 0, buffer.Length);
            deepgramLive.SendData(memoryStream.ToArray());
            memoryStream.Position = 0;
        };
        //when recording stops, flush and release all file handles
        _WasapiLoopbackCapture.RecordingStopped += (s, a) =>
        {
            if (mp3Writer != null)
            {
                mp3Writer.Dispose();
                mp3Writer = null;
            }
            if (csvWriter != null)
            {
                csvWriter.Dispose();
                csvWriter = null;
            }
            _WasapiLoopbackCapture.Dispose();
        };
        _WasapiLoopbackCapture.StartRecording();
        //keep the method (and the using blocks) alive until capture stops or is cancelled
        while (_WasapiLoopbackCapture.CaptureState != NAudio.CoreAudioApi.CaptureState.Stopped)
        {
            if (_CancellationTokenSource?.IsCancellationRequested == true)
            {
                _CancellationTokenSource?.Dispose();
                _CancellationTokenSource = null;
                return;
            }
            //wait without blocking the calling thread
            await Task.Delay(500);
        }
    }
}
The rest of the code gets everything ready to execute: showing and hiding forms, etc.
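The ToPCM16 helper called in the DataAvailable handler is not shown in this post. As a rough idea of what such a conversion involves, here is a minimal sketch that assumes the loopback device delivers 32-bit IEEE float samples (a common WASAPI shared-mode format, but an assumption here). The class PcmConverter and the method FloatToPcm16 are hypothetical names, not the repository's code, and the sketch does not resample or change the channel count.

```csharp
using System;

public static class PcmConverter
{
    // Hypothetical helper (not the repository's ToPCM16): converts 32-bit float
    // samples to 16-bit little-endian PCM. Sample rate and channels are unchanged.
    public static byte[] FloatToPcm16(byte[] input, int bytesRecorded)
    {
        int sampleCount = bytesRecorded / sizeof(float);
        var output = new byte[sampleCount * sizeof(short)];
        for (int i = 0; i < sampleCount; i++)
        {
            float sample = BitConverter.ToSingle(input, i * sizeof(float));
            // Clamp to [-1, 1] so loud samples do not wrap around when cast to short.
            sample = Math.Max(-1.0f, Math.Min(1.0f, sample));
            short pcm = (short)(sample * short.MaxValue);
            output[2 * i] = (byte)(pcm & 0xFF);            // low byte first
            output[2 * i + 1] = (byte)((pcm >> 8) & 0xFF); // then high byte
        }
        return output;
    }
}
```

Only the first bytesRecorded bytes of the capture buffer are valid, which is why the length is passed separately instead of using input.Length.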
So, after all that, how can you get subtitles on TV? You need to feed the TV signal into the PC somehow, and I use a USB capture card for this. To process the capture card input I use OBS. Once I have the audio signal, the rest makes no difference to me, since I process all output sound. Then I use the computer's HDMI output to send the signal to the TV. It makes no difference to the TV or the cable box.
PS: If you have issues with lag, check your network connection. There also seems to be a problem with the memory stream, which is not happy with my hacky solution. Any PR is welcome.
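One possible contributor to the memory stream trouble (a guess, not a confirmed diagnosis): MemoryStream.ToArray() returns everything up to the stream's length regardless of Position, so rewinding with Position = 0 can leave bytes from an earlier, longer chunk at the tail of later sends. The sketch below uses a hypothetical StreamResetDemo class to compare rewinding against clearing the stream with SetLength(0). It only reports how many bytes each ToArray() call would hand over.

```csharp
using System;
using System.IO;

public static class StreamResetDemo
{
    // Mimics the write / ToArray / rewind pattern: the stream length never shrinks.
    public static int[] ChunkSizesWithRewind(byte[][] chunks)
    {
        var sizes = new int[chunks.Length];
        using (var stream = new MemoryStream())
        {
            for (int i = 0; i < chunks.Length; i++)
            {
                stream.Write(chunks[i], 0, chunks[i].Length);
                sizes[i] = stream.ToArray().Length; // includes stale bytes past the new chunk
                stream.Position = 0;
            }
        }
        return sizes;
    }

    // Same loop, but the stream is emptied after each send.
    public static int[] ChunkSizesWithSetLength(byte[][] chunks)
    {
        var sizes = new int[chunks.Length];
        using (var stream = new MemoryStream())
        {
            for (int i = 0; i < chunks.Length; i++)
            {
                stream.Write(chunks[i], 0, chunks[i].Length);
                sizes[i] = stream.ToArray().Length; // only the bytes just written
                stream.SetLength(0);                // truncates and moves Position back to 0
            }
        }
        return sizes;
    }
}
```

With a 100-byte chunk followed by a 40-byte chunk, the rewind variant reports 100 and 100, while the SetLength variant reports 100 and 40. Whether this accounts for the lag seen in practice is untested.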