Daily powers real-time audio and video for millions of people all over the world. Our customers are developers who use our APIs and client SDKs to build audio and video features into applications and websites.
This week at Daily, we're writing about the infrastructure that underpins everything we do. This infrastructure is mostly invisible when it's working as intended. But it's critically important, and we're very proud of it.
Our kickoff post this week goes into more detail about what topics we’re diving into and how we think about our infrastructure’s “job to be done.” Feel free to click over and read that intro before (or after) reading this post. You can also listen to our teammate Chad, a Solutions Engineer extraordinaire here at Daily, who's sharing videos on topics throughout the week.
Today, we'll talk about Interactive Live Streaming.
Interactive Live Streaming — ILS — is the video industry's umbrella term for very low-latency streaming to large audiences.
There isn't a formal definition for ILS, but we'll give you our take on what qualifies (and what doesn't) as Interactive Live Streaming. We'll explain why the lowest possible latency is important. And we'll outline the benefits of APIs that allow flexible, client-side manipulation of video and audio streams.
We'll also talk more generally about the technologies that are used to broadcast video via the Internet, and how to scale an app to hundreds of thousands of users. You can also watch this video walkthrough of Interactive Live Streaming, by our Solutions Engineer Chad.
Live video with 100,000 of your closest friends
Daily's customers do a wide range of things with our video building blocks, ranging from telehealth consultations, to webinars, to creator economy production studios in the cloud, to social games in the metaverse.
When we're getting to know a new customer, we usually start off asking three questions:
- For your application, how important is latency?
- How many participants will join your live video sessions?
- What kinds of interactivity do you want in your application?
The answers tend to be interrelated in interesting ways, but let's take these topics one at a time.
"Latency" is the total delay between a sender of video (or audio) and a receiver. It's helpful to think of latency as a spectrum from very low latency to infinitely high latency.
Latency is critical for “conversational” use cases. For conversations to feel natural, latency needs to be lower than about 200ms.
Latency can also be important for many broadcast use cases. Any UX that includes a “bring to stage” experience, for example, is much better if the delay in bringing a participant on-stage is imperceptibly low. Similarly, for webinars that include text chat, any delay between the video stream and the chat — "chat lag" — will be frustrating.
We're also seeing emerging use cases that require low latency so that actions between participants are synchronized. Live shopping apps that have auctions need sub-second latency. So do social games that have audience participation, such as trivia contests.
The technology that enables low-latency, real-time delivery of video is called WebRTC. Daily's global infrastructure was designed from the ground up to power WebRTC-based applications. Our goal is to offer the highest performance, broadest array of features, and best developer experience to engineers and product teams building real-time applications.
Let's move on to talking about scale: how many simultaneous participants will join each video session?
Daily allows 100,000 participants in a real-time video session.
And, very importantly, the latency for viewers in a 100,000-participant session is just as low as the latency for the hosts in the session. In fact, latency in our 100,000-participant sessions is the same as the latency in our 1:1 video calls. The underlying technology (WebRTC), the available features and building blocks of Daily's APIs, and the global infrastructure and media routing are the same no matter how small or how big your Daily session is.
When we talk about Interactive Live Streaming, we're really talking about a collection of use cases, not about a separate technology stack. That's important because it means that developers only need to use one set of APIs to support both small- and large-group experiences. And product teams don't need to rewrite code as usage of a product grows.
If you're building a fitness app, for example, you can seamlessly support 1:1 personal training sessions and 50,000-person super-trainer live sessions with exactly the same code. Of course, you can also always customize the user experience as much as you want to for the two ends of that spectrum!
One other important thing to note: Daily's pricing for Interactive Live Streaming at scale is the same as, or better than, our competitors' pricing for traditional (much higher latency) live streaming. You can now deliver real-time WebRTC video for the same cost as HLS video (and sometimes at lower cost).
Finally, interactivity. Are viewers in your app completely passive, sitting back from the screen? Or do they engage with your application?
- Do you have chat?
- Do you let users change the layout of video and UX elements in your app? For example, in a virtual classroom can a user change the relative size of the teacher's video and the whiteboard video?
- Can users respond to polls or give other real-time feedback?
- Can viewers become active participants? For example, in a virtual event can you invite people to the stage to ask a question?
- Is there a social aspect to your user experience? For example, in a fitness class, do viewers see their friends in addition to the instructor?
- Can you deliver a different UX for different screen sizes and device orientations?
So what is Interactive Live Streaming?
At Daily, we define Interactive Live Streaming as the combination of real-time, large-scale, and interactive capabilities described in the previous section:
- video and audio delivered at sub-200ms latencies,
- scaling to very large numbers of viewers,
- built on APIs that support rich, interactive, client-side features.
Latency, the engagement killer
We pay a lot of attention to latency metrics at Daily. Keeping latency low for all users of our systems, no matter where they are in the world and what devices or platforms they are using, is always top-of-mind for us.
The reason we think latency is so important for Interactive Live Streaming is that delays make it hard to build interactive, engaging applications.
If you're running a webinar and there's a five-second delay on your video stream, that delay doubles to ten seconds any time you have an interactive element in your app. For example, when a viewer hears the presenter say something and has a question, they are already five seconds behind the presenter's live stream. The viewer types their question in the chat window, the presenter reads it and answers it, and then the viewer still has to wait another five seconds for the answer to reach them. This delay disrupts the flow of the presentation from both the presenter's and the viewer's perspective.
Similarly, for a virtual events app with a “bring to stage” user flow for Q&A, any time a viewer is invited to the stage, there's a long delay before the viewer can go live and ask a question.
And for a virtual classroom application, if a teacher wants to answer student questions or call on students individually, the same mechanics apply.
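The webinar example above reduces to simple arithmetic: every question-and-answer loop pays the stream latency twice, once on the way in and once on the way back. A minimal sketch (the function name and the chat-latency figure are our own assumptions, not part of any Daily API):

```javascript
// Rough model of the question-and-answer loop described above.
// All values are in milliseconds; chatLatencyMs is an assumed
// near-instant delivery time for the text chat message.
function interactionDelayMs(streamLatencyMs, chatLatencyMs = 100) {
  // The viewer hears the presenter streamLatencyMs late, their chat
  // question arrives almost immediately, and the spoken answer then
  // travels back down the same high-latency stream.
  return streamLatencyMs + chatLatencyMs + streamLatencyMs;
}

console.log(interactionDelayMs(5000)); // 5s HLS-style stream: 10100ms loop
console.log(interactionDelayMs(200));  // sub-200ms WebRTC: 500ms loop
```

The asymmetry is the point: cutting stream latency from five seconds to 200ms shrinks the whole interaction loop by a factor of twenty.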
The only way to build truly interactive experiences is to build on top of a technology capable of real-time, sub-200ms latencies.
Scaling low-latency video
Why don't all live video platforms deliver video at real-time latencies?
Historically, it was hard to deliver real-time video over the Internet, and impossible to do so at scale to large audiences. Digital video standards have generally optimized for recorded video use cases. For live video, interoperability with mature, non-real-time, TCP-based content delivery infrastructure has also been important. Traditional live-streaming protocols like RTMP and HLS are powerful technologies, but they don't address real-time use cases.
The development and adoption of the WebRTC standard over the past few years has opened up a pathway to truly real-time, interactive live streaming. WebRTC is now built into all major web browsers. And WebRTC works great in native mobile applications, too.
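Because WebRTC ships inside the browsers themselves, an app can verify support with a couple of property lookups. A minimal sketch; the helper function is our own, but `RTCPeerConnection` and `navigator.mediaDevices.getUserMedia` are the standard browser APIs:

```javascript
// Capability check for WebRTC support. Pass the browser's `window`
// object (taking it as a parameter keeps the function testable
// outside a browser).
function supportsWebRTC(g) {
  return typeof g.RTCPeerConnection === 'function' &&
         typeof g.navigator?.mediaDevices?.getUserMedia === 'function';
}
```

In practice, current releases of Chrome, Firefox, Safari, and Edge all pass a check like this.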
It's now possible to send video at real-time latencies to hundreds of thousands of simultaneous viewers. Hosts can “go live” on any device. Viewers can watch immediately from anywhere in the world, just by clicking a link.
Scaling WebRTC video to large sessions does pose unique challenges, though. For more on how Daily's purpose-built WebRTC infrastructure solves scaling problems, read our Infrastructure Week post about Daily's Global Mesh Network of media servers. And for background on RTMP, HLS, and WebRTC, read our in-depth explainer on how live-streaming protocols work.
User experience, flexibility, and engagement
Real-time delivery, plus the flexibility of a full-featured, WebRTC-based technology stack, opens up new opportunities for interactive applications.
Some of the new and emerging UX patterns that benefit from this flexibility are:
- having multiple hosts,
- using lots of animated, context-specific elements,
- implementing flexible, dynamic layouts,
- and creating opportunities for audience participation.
Bring your audience to the stage, even on 100K-participant calls.
Multiple hosts
Co-hosting is the biggest trend in live streaming today. YouTube calls this feature Go Live Together. TikTok calls it Multi-Guest. Twitch calls it Guest Star.
The value of having multiple hosts is obvious for education, events, and creator content applications. And combining co-hosting with flexible layouts, dynamic elements, and audience participation is even more powerful.
Daily today supports up to 25 hosts and 100,000 viewers for Interactive Live Streaming use cases. (Alternatively, for use cases where more people need to have their cams and mics on at the same time, Daily supports sessions of up to 1,000 participants with all cams and mics on.)
Dynamic elements
New production styles that have evolved organically on TikTok and other creator economy apps are now familiar to most of us. Text overlays drawn from comment threads, countdown timers, flying emojis, live captions, polls and voting, tipping interfaces, context-specific action buttons ... these are all table stakes now for creator-oriented apps.
Daily's APIs make it easy to build custom versions of these elements for your app and your user base. And with ILS, your user interface code runs on every client, so you can customize the user experience for different roles or even different individual users.
For example, a virtual events app often has hosts, producers, and viewers. Hosts start out in a "green room" before they go on stage. The producer (or producers) helps the hosts with sound and video check, and runs the event. Viewers watch the event, but can be promoted to hosts if that's appropriate for the use case.
In the most sophisticated virtual events apps, the producer's interface is a specialized production studio running in the cloud. And the host and viewer UX is also specialized for each role. Hosts might have presentation notes visible, a manually filtered view of the live text chat, countdown timers, and more. Viewers see live overlay elements triggered by the producer: poll results, graphics, and invitations to the green room when a question is approved and the viewer is about to be brought up on stage.
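Because ILS user-interface code runs on every client, role-specific experiences like these can be ordinary client-side logic. A sketch, with hypothetical role names and feature flags (none of this is a Daily API):

```javascript
// Each client derives its own feature set from the user's role.
// The role names and flags are illustrative app-level conventions.
function uiFeaturesFor(role) {
  switch (role) {
    case 'producer': // runs the event from a production interface
      return { studioControls: true, greenRoom: true, chatModeration: true };
    case 'host':     // on stage, with presenter-only aids visible
      return { studioControls: false, greenRoom: true, presenterNotes: true };
    default:         // viewer: sees producer-triggered overlays
      return { studioControls: false, overlays: true, canBePromoted: true };
  }
}
```

Promoting a viewer to the stage then becomes a matter of changing their role and re-deriving the UI, rather than loading a different application.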
Flexible layouts
With dynamic UX elements and multiple video streams from co-hosts, having client-side control of the video layout is very valuable.
On a phone in landscape orientation, the best view for two co-hosts is to place the two videos side by side. But in portrait orientation, it's a much better user experience to arrange the videos one above the other.
Daily's APIs allow your app to detect device orientation and rearrange the videos whenever orientation changes. This is possible because Daily's end-to-end WebRTC architecture delivers multiple video tracks rather than a single, composited video stream. For more on how this works, we're discussing our simulcast implementation in a blog post tomorrow.
Adjusting to screen (or window) size and aspect ratio is powerful. But you can also do much more. Students using an educational application will have differing needs, preferences, and learning styles in addition to different screen sizes. Some students will primarily want to see the teacher's camera view. Some will want to pin a whiteboard view as large as possible and make the teacher's camera view as small as possible. Some students may need to foreground the camera stream of a sign language interpreter.
You can give students control over size and placement of video (and all other) UX elements. You can also allow the teacher to automatically control the layout for all students. Or you can build a hybrid of those two approaches.
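Because each client receives separate video tracks rather than one composited stream, the portrait/landscape choice described above is purely local presentation logic. A minimal sketch (the layout shape and tile names are our own):

```javascript
// Choose a two-host layout from the viewport dimensions: side by
// side in landscape, stacked in portrait. The returned object is an
// illustrative shape an app might feed to its rendering code.
function coHostLayout(viewportWidth, viewportHeight) {
  const landscape = viewportWidth >= viewportHeight;
  return {
    direction: landscape ? 'row' : 'column',
    tiles: ['host-a', 'host-b'],
  };
}
```

In a real app you would re-run a function like this whenever the browser fires a resize or orientation-change event, and could extend it with per-user preferences such as a pinned whiteboard or interpreter tile.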
Audience participation
Most successful creator live streams heavily incorporate audience participation.
Globally visible text chat is a common feature. Other communication features like displaying streams of audience reaction emojis and voting on what a streamer should talk about next are also powerful drivers of engagement and community.
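A feature like voting on what a streamer should talk about next can be as small as tallying messages that each client broadcasts over a data-messaging channel. A hypothetical sketch (the `{ vote: ... }` message shape is an app-level convention we made up):

```javascript
// Tally audience votes and return the winning option.
function tallyVotes(messages) {
  const counts = {};
  for (const { vote } of messages) {
    counts[vote] = (counts[vote] || 0) + 1;
  }
  // Sort options by count, highest first, and take the winner.
  return Object.entries(counts).sort((a, b) => b[1] - a[1])[0][0];
}

console.log(tallyVotes([
  { vote: 'unboxing' }, { vote: 'q-and-a' }, { vote: 'q-and-a' },
])); // → 'q-and-a'
```

With sub-second latency, the winning option can be displayed to everyone while the moment is still live, which is exactly what makes this kind of participation feel interactive.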
In general, live audiences increasingly expect to be able to interact with streamers and with each other. These expectations are spreading beyond creator live streams to other experiences like online education, virtual fitness classes, live music shows, and live-streamed comedy events.
Relatedly, virtual conference apps are experimenting with how to facilitate the serendipity of the “hallway track” at in-person conferences. Topical drop-in spaces, themed chat channels, and timed breakout rooms are all ways for viewers to become participants.
The value of content on the Internet is always as much about community as about the content itself. Interactive Live Streaming creates new ways to foster community around all kinds of video content.
Live streaming and recording go together like peanut butter and chocolate
Live-stream content often needs to be recorded as well as streamed.
Daily's Video Component System is a toolkit for implementing flexible layouts, custom graphics, and animated UX elements in recordings. VCS makes it possible for recordings to have the same dynamic feel and high production value as apps built with Daily's real-time SDK. For more about VCS, stay tuned for an upcoming Infrastructure Week post.
VCS also enables Daily sessions to be live-streamed out via distribution platforms like YouTube Live and AWS IVS. Daily’s VCS toolkit supports output to mp4 files, HLS storage buckets, and RTMP streams.
Conclusion
The ability to deliver real-time video flexibly to large audiences fundamentally changes what “live streaming” looks like.
At Daily, we’re starting to see a wave of amazingly creative applications that are experimenting with new ways to teach and learn, new styles of video podcasting, new approaches to live commerce, and new modes of hanging out and having fun with friends.
If you have thoughts about these topics or have questions about WebRTC, please come join us on the peerConnection community forum. Everyone is welcome, and discussions range from community notes about new WebRTC releases to feature requests for Daily, and everything in between.