About
GitHub recently published "Propelling your DevOps to new heights | GitHub InFocus", an exciting piece of DevOps content.
Around the same time, I watched an episode of "The Download" series (hosted by @film_girl).
This episode introduced videogrep:
antiboredom / videogrep
automatic video supercuts with python
Videogrep
Videogrep is a command line tool that searches through dialog in video files and makes supercuts based on what it finds. It will recognize .srt or .vtt subtitle tracks, or transcriptions that can be generated with vosk, pocketsphinx, and other tools.
Examples
- The Meta Experience
- All the instances of the phrase "time" in the movie "In Time"
- All the one to two second silences in "Total Recall"
- A former press secretary telling us what he can tell us
Tutorial
See my blog for a short tutorial on videogrep and yt-dlp, and part 2, on videogrep and natural language processing.
Installation
Videogrep is compatible with Python versions 3.6 to 3.10.
To install:
pip install videogrep
If you want to transcribe videos, you also need to install vosk:
pip install vosk
Note: the previous version of videogrep supported pocketsphinx for speech-to-text. Vosk seems much better so I've added…
Then came the idea:
What if I analyzed "GitHub InFocus" with videogrep?
This short post walks through my first experiment with videogrep: what I was able to produce and discover... and the fun I had along the way.
Note that I followed an excellent tutorial to run this experiment.
Get the video with yt-dlp
First, I want to get the YouTube video https://youtu.be/awQ7LFxfXWE locally. yt-dlp can list the available formats (-F option) so you can choose the one that best fits your needs, but in our case we'll get the default one, plus YouTube's automatic subtitles (--write-auto-sub):
yt-dlp https://youtu.be/awQ7LFxfXWE -o propelling_your_devops.mp4 --write-auto-sub
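In my case, yt-dlp saved the video as propelling_your_devops.mp4.webm (the file name used in the commands below), alongside the .vtt subtitle file. You are now ready for the next step: videogrep.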
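The `--write-auto-sub` flag stores YouTube's automatic captions as a WebVTT (.vtt) file, and that text is what videogrep works on. To make the format concrete, here is a minimal Python sketch (not videogrep's own parser) that turns cues into `(start, end, text)` tuples. It assumes `hh:mm:ss.mmm` timestamps and strips the inline `<c>` and word-timing tags found in YouTube auto-captions; `parse_vtt` and `to_seconds` are hypothetical names.

```python
import re

# Cue timing line, e.g. "00:00:01.000 --> 00:00:03.500 align:start position:0%"
# (assumes hh:mm:ss.mmm timestamps, as in YouTube captions)
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})\.(\d{3})\s+-->\s+(\d+):(\d{2}):(\d{2})\.(\d{3})"
)
# Inline tags such as <00:00:01.400>, <c> and </c>
TAG = re.compile(r"<[^>]+>")


def to_seconds(h, m, s, ms):
    """Convert timestamp fields (strings) to seconds."""
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_vtt(text):
    """Return a list of (start, end, text) tuples, one per WebVTT cue."""
    lines = text.splitlines()
    cues = []
    i = 0
    while i < len(lines):
        match = TIMING.search(lines[i])
        i += 1
        if not match:
            continue  # header, blank line or cue identifier
        fields = match.groups()
        start, end = to_seconds(*fields[:4]), to_seconds(*fields[4:])
        payload = []
        # The cue payload runs until the next blank line
        while i < len(lines) and lines[i].strip():
            payload.append(TAG.sub("", lines[i]).strip())
            i += 1
        cues.append((start, end, " ".join(p for p in payload if p)))
    return cues
```

Real auto-captions also tend to repeat each line across consecutive cues as they scroll; this sketch does not de-duplicate them.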
Text analysis with ngrams
videogrep makes it possible (and super easy) to analyze the text of the downloaded .vtt subtitle files.
So, which groups of words (called ngrams) come up most often in the video? Let's find out!
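Conceptually, an n-gram count slides a window of n words over the transcript and tallies each word sequence. The following Python sketch illustrates that idea; it is our own illustration, not videogrep's implementation. It lower-cases the text, keeps apostrophes so "i'm" stays one token, and, as a simplification, lets n-grams span sentence boundaries. `count_ngrams` is a hypothetical name.

```python
import re
from collections import Counter


def count_ngrams(text, n):
    """Return (ngram, count) pairs, most frequent first."""
    # Lower-case words; keep apostrophes so "i'm" stays one token
    words = re.findall(r"[a-z0-9']+", text.lower())
    # zip over n shifted copies of the word list yields every window of n words
    windows = zip(*(words[i:] for i in range(n)))
    return Counter(" ".join(w) for w in windows).most_common()


print(count_ngrams("We want to ship. You want to ship too.", 2)[:2])
# [('want to', 2), ('to ship', 2)]
```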
Single-word analysis is not really interesting, since the top results are just common function words:
videogrep --input propelling_your_devops.mp4.webm --ngrams 1 | head -10
to 449
and 352
that 347
you 323
the 322
we 306
a 255
of 251
so 167
is 157
2-grams are much more revealing about the underlying intent of the video:
videogrep --input propelling_your_devops.mp4.webm --ngrams 2 | head -7
want to 97
that we 61
you can 55
you know 54
going to 51
we have 45
we can 45
... which the 3-grams soon confirm:
videogrep --input propelling_your_devops.mp4.webm --ngrams 3 | head -9
we want to 30
you want to 20
a lot of 19
want to make 19
make sure that 18
i'm going to 17
to make sure 17
we have a 16
i want to 13
Short analysis
With the help of ngrams, grepping the text of the video reveals, in less than a second, that "GitHub focuses its attention on what they want... and also on what you want to achieve... and make".
That first fact already tells us a lot.
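To put a rough number on that first fact, we can tally the 3-gram counts printed above. This small Python sketch parses the `ngram count` lines exactly as they appear in this post and sums those containing "want"; `parse_ngram_output` is a hypothetical helper. Since overlapping 3-grams (for instance "we want to" and "want to make") can come from the same utterance, the result counts n-gram hits, not distinct sentences.

```python
def parse_ngram_output(output):
    """Parse lines of the form "<ngram> <count>" into a dict."""
    counts = {}
    for line in output.strip().splitlines():
        ngram, _, count = line.rpartition(" ")
        counts[ngram] = int(count)
    return counts


# The 3-gram output shown above
trigrams = parse_ngram_output("""
we want to 30
you want to 20
a lot of 19
want to make 19
make sure that 18
i'm going to 17
to make sure 17
we have a 16
i want to 13
""")

want_total = sum(c for g, c in trigrams.items() if "want" in g.split())
total = sum(trigrams.values())
print(want_total, total, round(want_total / total, 2))  # 82 169 0.49
```

So roughly half of the hits among these top nine 3-grams contain "want".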
It also highlights "the inclusive approach, using a lot of 'I' and 'We'"... which is also a pretty exciting way to onboard us onto the product they are showcasing.
Cut & get shorts
Now, the fun part.
You have done a text analysis, but... wouldn't it be fun to watch a movie of these grepped terms?
Spoiler alert: yes, it is (and it's easy).
These are called supercuts (and, when only the matching words are kept, fragments). Let's get some of them.
The "Want" movie
Let's get all the sentences containing "want":
videogrep --input propelling_your_devops.mp4.webm --search 'want' --resyncsubs 0.1 --output want_sentence.mp4
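Under the hood, a sentence search boils down to keeping the subtitle cues whose text matches the query, then cutting the video at their timestamps. Here is a minimal Python sketch of that filtering step, assuming cues are `(start, end, text)` tuples. The case-insensitive regex match and the way `resync` shifts every timestamp (our stand-in for `--resyncsubs`) are assumptions, not videogrep's exact behavior, and `search_sentences` is a hypothetical name.

```python
import re


def search_sentences(cues, query, resync=0.0):
    """Return (start, end) ranges of the cues whose text matches query.

    cues: list of (start, end, text) tuples.
    resync: seconds added to every timestamp (assumed model of --resyncsubs).
    """
    pattern = re.compile(query, re.IGNORECASE)  # case-insensitive: an assumption
    return [
        # Clamp at 0 so a negative shift never produces a negative timestamp
        (max(0.0, start + resync), max(0.0, end + resync))
        for start, end, text in cues
        if pattern.search(text)
    ]


cues = [(1.0, 3.5, "so we want to"), (3.5, 5.0, "ship code faster"), (6.0, 8.0, "What you WANT")]
print(search_sentences(cues, "want", resync=0.5))  # [(1.5, 4.0), (6.5, 8.5)]
```

In this sketch, the pattern `want` also matches "wanted" or "wants"; `\bwant\b` would be stricter.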
Also, "we want" to get the "want" movie:
videogrep --input propelling_your_devops.mp4.webm --search 'we want to' --search-type fragment --resyncsubs 0.1 --output want.mp4
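With `--search-type fragment`, videogrep cuts only the matching words instead of whole sentences, which relies on word-level timestamps (YouTube's auto-captions include them). The Python sketch below shows only the phrase-matching part, assuming the transcript has already been turned into hypothetical `(word, start, end)` tuples; `search_fragments` is a hypothetical name, not videogrep's API.

```python
import re


def search_fragments(words, phrase):
    """Return (start, end) for each occurrence of phrase in a word-level transcript.

    words: list of (word, start, end) tuples in spoken order (hypothetical format).
    Each match spans from the first matched word's start to the last one's end.
    """
    target = phrase.lower().split()
    if not target:
        return []
    # Normalize tokens: lower-case, drop punctuation such as "ship." -> "ship"
    tokens = [re.sub(r"[^\w']", "", w.lower()) for w, _, _ in words]
    k = len(target)
    return [
        (words[i][1], words[i + k - 1][2])
        for i in range(len(tokens) - k + 1)
        if tokens[i:i + k] == target
    ]


words = [("So", 0.0, 0.4), ("we", 0.4, 0.6), ("want", 0.6, 0.9), ("to", 0.9, 1.0),
         ("ship.", 1.0, 1.5), ("We", 2.0, 2.2), ("want", 2.2, 2.5), ("to", 2.5, 2.6)]
print(search_fragments(words, "we want to"))  # [(0.4, 1.0), (2.0, 2.6)]
```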
GH talking about code
What comes to mind first when we think about GitHub services is: "code".
Let's make them talk about "code":
videogrep --input propelling_your_devops.mp4.webm --search 'code' --search-type fragment --resyncsubs 0.1 --output code.mp4
GitHub about GitHub
Last but not least, I'd love to see how GitHub talks about GitHub:
videogrep --input propelling_your_devops.mp4.webm --search 'github' --search-type fragment --resyncsubs 0.1 --output github.mp4
Conclusion
These tools open up a very wide field for speech and video analysis... making it possible to highlight patterns and intentions, or simply to have fun.
Also, since yt-dlp can download complete channels, playlists, or search results... the possibilities are endless.
Resources
- Vosk: speech recognition toolkit
- yt-dlp: A youtube-dl fork with additional features and fixes
- @sam_lavigne
- "GitHub Infocus 2022 analysis" playlist on YT
News
In version 2.1.1, videogrep adds some really cool features, including:
- Finding "non-english vtt subtitle files"
- "Examples that integrate with spaCy"