The only way to build the web for everyone is to make web apps, including the related media (audio, images, videos), as accessible as possible for your entire audience.
Visual media is instrumental in conveying information. Images pass on information in picture format. Videos take that to the next level. Concise videos, in particular, attract attention and effectively tell stories.
However, video is only partially accessible to visually impaired users, who miss the visuals, and to deaf and hard-of-hearing users, who miss the audio. The same goes for viewers who don’t speak the language of the content.
A solution to make images accessible is to add `alt` text, but what about the audio in videos? You add subtitles and transcripts, which are also welcomed by those who are, say, watching the video next to a sleeping partner or who don’t want to wake up a child.
With Cloudinary, you can enable people with hearing or visual challenges to engage with video and audio. This tutorial shows you how.
Acquiring the Prerequisites
To follow the steps in this tutorial, you need the following:
- A grasp of the basics of JavaScript.
- Adeptness with Node.js and Cloudinary.
- An ability to integrate Cloudinary into Node.js apps.
- A Cloudinary account. Sign up for a free account if you don’t have one.
Getting Started
As a start, upload a video, such as this one from YouTube. Follow these steps:
- Download the video to your computer.
- Create a project with a basic front end and a back end that supports media upload, e.g., a Node.js server with Multer.
Note: To avoid storing copies of uploaded videos, upload them to Cloudinary with the [Cloudinary upload widget](https://cloudinary.com/documentation/upload_widget).
Your back end contains this Cloudinary configuration and API route:
```javascript
const multer = require('multer')
const express = require('express')
const cors = require('cors')
const cloudinary = require('cloudinary').v2
require('dotenv').config()

const upload = multer({ dest: 'uploads/' })

cloudinary.config({
  cloud_name: process.env.CLOUD_NAME,
  api_key: process.env.API_KEY,
  api_secret: process.env.API_SECRET,
})

const app = express()
app.use(cors())
app.use(express.json())

app.post('/video/upload', upload.single('video'), uploadVideo)

function uploadVideo(req, res) {
  cloudinary.uploader.upload(
    req.file.path,
    {
      public_id: 'videos/video1',
      resource_type: 'video',
    },
    (error) => {
      if (error) {
        return res.status(500).json({ message: 'Upload failed', error })
      }
      res.json({ message: 'Successfully uploaded video' })
    }
  )
}

// The port is an assumption; use whichever your project prefers.
app.listen(3000)
```
- Install the dependencies and save the correct environment variables in a `.env` file. Replace `CLOUD_NAME`, `API_KEY`, and `API_SECRET` with the values from your account’s dashboard. On the front end, send the video to the back end with a `file` input.
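The front-end wiring isn’t shown in this post; a minimal sketch might look like the following, assuming an `<input type="file" id="video-input">` on the page and the back end running on `localhost:3000`:

```javascript
// Hypothetical front-end script that sends the selected video to the
// back end's /video/upload route. The element ID and port are assumptions.
function buildUploadRequest(file) {
  const formData = new FormData()
  // The field name must match upload.single('video') on the back end.
  formData.append('video', file)
  return { method: 'POST', body: formData }
}

// Guard so the snippet is inert outside the browser.
if (typeof document !== 'undefined') {
  document.querySelector('#video-input').addEventListener('change', async (event) => {
    const request = buildUploadRequest(event.target.files[0])
    const response = await fetch('http://localhost:3000/video/upload', request)
    console.log(await response.json())
  })
}
```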
Improving Video Accessibility
Cloudinary supports metadata for resources, including tags and, for video, subtitles. You can fetch videos from Cloudinary with integrated subtitles, which must originate from existing transcripts, much as a media player must be told where to find a video’s subtitle file.
Manually generating tags and subtitles can be tedious. A much more efficient alternative is to generate them through Cloudinary in these two steps:
- Create transcripts in various languages to cater to those who are hearing challenged or foreign to the video’s language.
- Generate and display tags that relate to the video for the visually impaired, including those who determine the video’s relevance with screen readers.
Leveraging the Google AI Video Transcription Add-On
In conjunction with Google’s Speech-to-Text API, Cloudinary’s Google AI Video Transcription add-on automatically generates transcripts for videos. As a result, when uploading or updating a video with Cloudinary’s API, you can create transcripts in the same folder as the video.
Here are the steps:
- Activate the add-on for your account. A free plan is available.
- Add the `raw_convert` option (documented in the Upload API reference) to the Cloudinary `upload` method. `raw_convert` asynchronously generates a file based on the uploaded file. With the `google_speech` value, Google creates a transcript for the uploaded video. Here’s how:
```javascript
function uploadVideo(req, res) {
  cloudinary.uploader.upload(
    req.file.path,
    {
      public_id: 'videos/video2',
      resource_type: 'video',
      raw_convert: 'google_speech',
    },
    (error) => {
      if (error) {
        return res.status(500).json({ message: 'Upload failed', error })
      }
      res.json({ message: 'Successfully uploaded video' })
    }
  )
}
```
Note: The `videos/video2` value for `public_id` identifies the video with subtitles. Assign any value you desire and jot it down for use later.
- Go back to the front end and upload the same video.
Cloudinary then generates another file in your account’s Media Library:
The `video2.transcript` file reads as follows in a code editor:
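The screenshot of that file isn’t reproduced here, but a `.transcript` file is a JSON array of entries in roughly this shape (the confidence score and word timings below are illustrative, not the add-on’s exact output):

```json
[
  {
    "transcript": "If you only have 24 hours in a day, your success is dependent upon how you use the 24",
    "confidence": 0.93,
    "words": [
      { "word": "If", "start_time": 0.1, "end_time": 0.4 },
      { "word": "you", "start_time": 0.4, "end_time": 0.6 }
    ]
  }
]
```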
The above JSON structure shows that the line “If you only have 24 hours in a day, your success is dependent upon how you use the 24” is displayed between 0.1 and 7.3 seconds in the video.
You can also generate the following:
- Other standard subtitle formats like SubRip (SRT) and WebVTT (VTT), which are supported by other media players.
- Other transcriptions in different languages, which make the video’s audio accessible to more viewers. French, for example, has this `raw_convert` value:
```javascript
...
raw_convert: 'google_speech:fr:BE'
...
```
That code generates a `.transcript` file with a French translation. `fr:BE` denotes the language and region, Belgian French in this case. Google supports numerous languages and dialects.
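If you also want the standard subtitle formats mentioned earlier, Cloudinary’s add-on documentation describes appending the format names to the same option; a sketch, assuming that syntax:

```javascript
// Request SRT and VTT files alongside the default .transcript output.
// The ':srt:vtt' suffix follows Cloudinary's documented syntax for the
// Google AI Video Transcription add-on.
...
raw_convert: 'google_speech:srt:vtt'
...
```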
Adding Subtitles to Videos
Next, add subtitles to videos on request with video transformations. To do so, add a route on the back end that returns the uploaded video transformed with the generated `.transcript` file as subtitles:
```javascript
app.get('/video', getVideo)

function getVideo(req, res) {
  cloudinary.api.resource('videos/video2', {}, (err, result) => {
    if (err) {
      return res.status(500).json({ err })
    }
    const video = cloudinary.video('videos/video2', {
      resource_type: 'video',
      type: 'upload',
      transformation: [
        {
          overlay: {
            resource_type: 'subtitles',
            public_id: 'videos/video2.transcript',
          },
        },
        { flags: 'layer_apply' },
      ],
    })
    res.json({
      ...result,
      videoElem: video.replace(/poster=/, 'controls poster='),
    })
  })
}
```
A few explanations:
- In the `transformation` property, you’ve added an overlay of the `subtitles` resource type and specified the path to that transcript file.
- The return value of the `cloudinary.video()` method is in this format:
```html
<video poster='http://res.cloudinary.com/dillionmegida/video/upload/l_subtitles:videos:motivational-video.transcript/fl_layer_apply/v1/videos/motivational-video.jpg'>
  <source src='http://res.cloudinary.com/dillionmegida/video/upload/l_subtitles:videos:motivational-video.transcript/fl_layer_apply/v1/videos/motivational-video.webm' type='video/webm'>
  <source src='http://res.cloudinary.com/dillionmegida/video/upload/l_subtitles:videos:motivational-video.transcript/fl_layer_apply/v1/videos/motivational-video.mp4' type='video/mp4'>
  <source src='http://res.cloudinary.com/dillionmegida/video/upload/l_subtitles:videos:motivational-video.transcript/fl_layer_apply/v1/videos/motivational-video.ogv' type='video/ogg'>
</video>
```
- You’ve replaced `poster=` with the string `controls poster=`, which adds the `controls` attribute to the `video` element.
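The string replacement is plain JavaScript; here’s the idea in isolation, with a made-up tag for illustration:

```javascript
// A minimal stand-in for the markup cloudinary.video() returns:
const tag = "<video poster='thumb.jpg'><source src='clip.mp4' type='video/mp4'></video>"

// Inserting 'controls ' before 'poster=' switches on the player controls:
const withControls = tag.replace(/poster=/, 'controls poster=')
// withControls → "<video controls poster='thumb.jpg'>..."
```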
The Get Video button at the top makes a GET request to the back end, grabs the `video` element, and renders it on the user interface.
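The button’s click handler isn’t shown in the post; a hypothetical version, assuming a `#video-container` element and the back end on `localhost:3000`, could be:

```javascript
// Pulls videoElem out of the back end's response. The back end spreads
// the Cloudinary resource details into the JSON, so videoElem sits
// alongside fields like public_id.
function pickVideoMarkup(payload) {
  return payload.videoElem
}

// Hypothetical handler for the Get Video button.
async function showVideo() {
  const response = await fetch('http://localhost:3000/video')
  const markup = pickVideoMarkup(await response.json())
  document.querySelector('#video-container').innerHTML = markup
}
```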
Your video is now more accessible, complete with subtitles. If you’ve specified a different language for the transcript, the subtitles are in that language.
Capitalizing on Google’s Automatic Video-Tagging Capability
Besides categorizing or grouping your resources, tags also display for viewers a video’s category or related topics before they start watching it. That information greatly helps people with poor vision.
To manually add tags to a video:
- Click the video’s Manage button and then click the Metadata tab:
- Input the tags:
Such a manual process is tedious and time consuming. Automate it with Google’s automatic video-tagging capability instead. Follow the steps below.
- Activate the Google Video Tagging add-on. A free plan is available.
- Update the `uploadVideo` function in the back end:
```javascript
function uploadVideo(req, res) {
  cloudinary.uploader.upload(
    req.file.path,
    {
      public_id: 'videos/video3',
      resource_type: 'video',
      raw_convert: 'google_speech',
      categorization: 'google_video_tagging',
      auto_tagging: 0.7,
    },
    (error) => {
      if (error) {
        return res.status(500).json({ message: 'Upload failed', error })
      }
      res.json({ message: 'Successfully uploaded video' })
    }
  )
}
```
The `categorization` property specifies the add-on that automatically generates the video’s tags.

The confidence level you specify for the `auto_tagging` property denotes the degree of assurance with which a label relates to a resource. `auto_tagging` accepts only tags with a confidence level higher than the one specified. A level near 1 yields precise keywords, but only a few of them. In the code above, the 0.7 level serves as a compromise between relevant tags and sufficient tags.
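To see what the threshold does, here’s the filtering logic in miniature, with made-up tags and confidence scores:

```javascript
// Hypothetical candidate labels as the add-on might score them:
const candidates = [
  { tag: 'car', confidence: 0.92 },
  { tag: 'vehicle', confidence: 0.85 },
  { tag: 'road trip', confidence: 0.55 },
]

// With auto_tagging: 0.7, only labels scoring above the threshold survive:
const accepted = candidates
  .filter((c) => c.confidence > 0.7)
  .map((c) => c.tag)
// accepted → ['car', 'vehicle']
```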
Since the add-on generates tags asynchronously, they might take a while to appear.
Refresh the screen after a while and you’ll see these results:
Depending on the video’s context, the generated tags might or might not be meaningful for a particular viewer. Nonetheless, the tags always describe the images in the video, such as “cars” and “environments.”
Displaying a Video’s Related Tags
Now obtain the video from Cloudinary by updating the `getVideo` function in the back end to read as follows:

```javascript
...
cloudinary.api.resource('videos/video3', {}, (err, result) => {
...
```
In your browser’s Network tab (or in Postman or any API client), the response looks like this:
You can display video tags any way you desire, for example:
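For instance, a small helper could turn the tags array into hashtag-style labels (the markup and class name are arbitrary):

```javascript
// Renders an array of tag strings as hashtag-style spans.
function renderTags(tags) {
  return tags.map((tag) => `<span class="tag">#${tag}</span>`).join(' ')
}

// renderTags(['car', 'city']) → '<span class="tag">#car</span> <span class="tag">#city</span>'
```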
The tags might not be completely accurate, so feel free to manually edit them in the dashboard or add other tags. For this video, you could add the tag “motivational quotes,” for example.
Adding Translations With the Google Translation Add-On
The tags you just generated are accessible only to English-speaking viewers. With the Google Translation add-on, which you can use during image upload or in conjunction with automatic video tagging, you can add translations.
Follow these steps:
- Activate the add-on and select the free plan:
- Update the `uploadVideo` function to use the Google Translation add-on with the Google auto-tagging feature for video:
```javascript
function uploadVideo(req, res) {
  cloudinary.uploader.upload(
    req.file.path,
    {
      public_id: 'videos/video4',
      resource_type: 'video',
      raw_convert: 'google_speech',
      categorization: 'google_video_tagging:en:fr',
      auto_tagging: 0.7,
    },
    (error) => {
      if (error) {
        return res.status(500).json({ message: 'Upload failed', error })
      }
      res.json({ message: 'Successfully uploaded video' })
    }
  )
}
```
The suffix `:en:fr` in the `categorization` property tells the add-on to generate tags, save them in English and French, and display them in the Cloudinary dashboard:
A look at the resource details through the API yields the following:
The add-on’s data populates the `info` property, nested in this flow:

categorization → google_video_tagging → data

Here, the array of generated tags contains `tag` objects with `en` (the English translation) and `fr` (the French translation) properties.
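Putting that flow together, a helper could pick out the tags in a viewer’s language; the entry shape below mirrors the `tag` objects just described, though the sample data is made up:

```javascript
// Picks tags in the requested language out of the resource details.
function tagsForLanguage(resource, lang) {
  const entries = resource.info.categorization.google_video_tagging.data
  return entries.map((entry) => entry.tag[lang]).filter(Boolean)
}

// Made-up resource details following the described nesting:
const sample = {
  info: {
    categorization: {
      google_video_tagging: {
        data: [
          { tag: { en: 'car', fr: 'voiture' } },
          { tag: { en: 'city', fr: 'ville' } },
        ],
      },
    },
  },
}
// tagsForLanguage(sample, 'fr') → ['voiture', 'ville']
```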
Ultimately, by leveraging this add-on, you can display tags that match the viewer’s location or language.
Summing Up
It’s crucial that web apps containing media be accessible to everyone in your audience.
You’ve now learned how to use Cloudinary’s add-ons to improve video accessibility by adding subtitles and automatically generating and displaying the related tags—all in multiple languages as you desire.
Afterwards, your video can reach a broader audience, including people who are deaf or hard of hearing, people with impaired vision, those who speak other languages, and even those who enjoy watching videos with the audio muted.
Cloudinary offers many other robust and effective add-ons. Do check them out.
Top comments (1)
One thing I would say about the Google AI captions is that they can be very inaccurate, sometimes to the point that they become outright obscene. I wrote about this last year when Google removed the Community Captions feature from YouTube videos: ashleysheridan.co.uk/blog/Everybod...
The further an accent differs from what the AI is trained on, the less accurate the captions become.
Also, one last point. There are other types of track that can accompany a video, like subtitles (different from captions), and audio descriptions (of things that might not be speech, for example). The MDN documentation goes into some more detail about what kinds of things you can do with this, but they can be time consuming to create depending on the video.