Ongoing work in 2022 for TimedMediaHandler...

Phase 1: video.js player (done)

The new videojs-based player is coming out of Beta at the end of February or so. This is a modern open-source video player frontend that's widely used, pluggable, and can easily be extended for the next stages of work.

Once it's fully activated, we will retire the old frontend using the Kaltura-based MwEmbedPlayer code, which had gotten a bit rickety and hard to maintain, and was one of the few remaining places using jQueryUI and other old code.

Video player popup is cleaner

The popup player for videos is cleaner and behaves more consistently, and should generally be an improvement.

Mobile compatibility

The new front-end code runs on mobile browsers, and in the MinervaNeue skin. On Android, videos will play back using hardware decoding; on iOS they will currently use a software decoder and won't work with Picture-In-Picture, but we have a solution brewing for better integration later on...

New audio experience

We are still working on the best way to handle audio; currently, in beta mode, we use the same dialog as for video, but a bit smaller. This provides plenty of room to interact with the controls -- previously packed into a tiny micro-player -- as well as room for captions, but blocks you from interacting with the rest of the page. Based on feedback, we may change this to something that doesn't block the page.

Phase 2: adaptive streaming

This work will start after the rollout of the new frontend is complete; so far there's been some research to ensure we can get the formats we need.

Like large image files, videos must be re-scaled and re-compressed to suitable resolutions and file sizes for playback, and there are uses for those different sizes. Unlike image files, videos are too big and expensive to encode 'on demand' well, so we have to run long batch jobs.

Currently, the WebM VP8 and VP9 playback transcodes use a number of standard resolutions, and one is picked at playback time based on the size of the window -- if you then change to full screen, or resize the window down small, you're either missing out on detail or wasting download time on pixels you can't see. In addition, network conditions vary between users, and from moment to moment.

The industry standard solution is adaptive streaming, where the encodings are made in a consistent way such that the player can switch resolutions at regular points in time (say, every 5 or 10 seconds) without any visible interruption to playback. This is how YouTube, Netflix, and pretty much whatever else you might watch online all work nowadays.
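As a rough sketch of the decision a player makes at each switch point: pick the tallest rendition that is both useful for the current display size and affordable at the measured throughput. The ladder values, the `Rendition` type, the `choose_rendition` name, and the 0.8 safety factor are all assumptions for illustration; video.js's actual selection logic is more involved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    height: int   # frame height in pixels
    bitrate: int  # average bits per second


# Hypothetical ladder; heights and bitrates are illustrative only.
LADDER = [
    Rendition(240, 250_000),
    Rendition(480, 1_000_000),
    Rendition(720, 2_500_000),
    Rendition(1080, 5_000_000),
]


def choose_rendition(ladder, display_height, throughput_bps, safety=0.8):
    """Pick the rendition to fetch for the next segment."""
    budget = throughput_bps * safety
    by_height = sorted(ladder, key=lambda r: r.height)

    # Anything taller than the first rendition that covers the display
    # would only waste download time on pixels nobody can see.
    useful = []
    for rendition in by_height:
        useful.append(rendition)
        if rendition.height >= display_height:
            break

    # Of the useful renditions, take the best one the network can sustain;
    # fall back to the smallest if even that is over budget.
    affordable = [r for r in useful if r.bitrate <= budget]
    return affordable[-1] if affordable else by_height[0]
```

Calling this again at every segment boundary is what lets the player follow both window resizes and changing network conditions.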

HLS: HTTP Live Streaming

There are two main formats for the manifest describing how the audio and video track files are laid out: MPEG-DASH and HLS. HLS was created by Apple, but the manifest format was opened up and is now published as an IETF RFC. The track files themselves can either be raw frame data (like MP3) or use the MPEG Transport Stream or MPEG-4 Part 14 / ISO BMFF / ".mp4" container format. See also T44591.
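To make the manifest idea concrete, here is a minimal sketch that writes an HLS multivariant (master) playlist with one shared audio track and several video renditions. The tags are the standard ones from the HLS RFC; the file names, resolutions, bandwidth figures, and the `multivariant_playlist` helper are hypothetical, and the optional CODECS attribute is left out for brevity.

```python
# Hypothetical renditions: (width, height, peak bits per second, playlist URI).
EXAMPLE_VARIANTS = [
    (426, 240, 300_000, "240p.m3u8"),
    (854, 480, 1_200_000, "480p.m3u8"),
    (1920, 1080, 5_500_000, "1080p.m3u8"),
]


def multivariant_playlist(variants, audio_uri="audio.m3u8"):
    """Build the text of a multivariant playlist with a separate audio track."""
    lines = ["#EXTM3U"]

    # One audio rendition, shared by every video variant through its GROUP-ID.
    lines.append(
        '#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="audio",NAME="Main",'
        f'DEFAULT=YES,AUTOSELECT=YES,URI="{audio_uri}"'
    )

    # Each variant is a tag line followed by the URI of its media playlist.
    for width, height, bandwidth, uri in variants:
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={bandwidth},"
            f'RESOLUTION={width}x{height},AUDIO="audio"'
        )
        lines.append(uri)

    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(multivariant_playlist(EXAMPLE_VARIANTS), end="")
```

Each referenced `.m3u8` would in turn be a media playlist listing the time segments of that one track.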

VP9 on iPhone

Traditionally, the common but patent-encumbered H.264 or HEVC codecs are used, but iOS devices now support the open VP9 codec as well on recent hardware, meaning we don't yet have to revisit our file formats policy about H.264. (But if that ever gets worked out, we could easily add an H.264 encoding for wider compatibility.)

On older iOS devices we would continue to use the ogv.js software codec for compatibility until format wars are over...

Desktop

Desktop browsers other than Safari don't grok HLS directly, and Safari doesn't seem to have VP9 enabled for it, but they all work with video.js's HLS implementation that runs on top of Media Source Extensions.

2-step encode

I'd like to split the streaming encodes into two levels; a fast 1-pass encode in VP9 is a lot faster than a 2-pass encode with higher quality settings, but will take more bandwidth to match quality. Getting things online and playable fast, and then engaging 2-pass encodes to downsize the files only when there's idle time in the system, might be wise.

There should be a way to poke a particular file to the top of the queue for this, and it should be possible to rig that up to a list of top-viewed files, to reduce bandwidth usage for viewers.
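One way this could be modelled is a single priority queue in which fast 1-pass jobs always sort ahead of 2-pass re-encodes, with a flag that pokes a given file to the front of its tier. This is only a sketch: `EncodeQueue`, the tier constants, and the file names are hypothetical and not part of the existing job queue.

```python
import heapq
import itertools

FAST, SLOW = 0, 1  # fast 1-pass jobs sort ahead of slow 2-pass re-encodes


class EncodeQueue:
    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # arrival order breaks ties (FIFO)

    def push(self, filename, tier):
        # Sort key: tier, then boosted (0) before normal (1), then arrival order.
        heapq.heappush(self._heap, (tier, 1, next(self._counter), filename))

    def boost(self, filename):
        """Poke every queued job for this file to the top of its tier."""
        self._heap = [
            (tier, 0 if name == filename else flag, order, name)
            for tier, flag, order, name in self._heap
        ]
        heapq.heapify(self._heap)

    def pop(self):
        tier, _flag, _order, filename = heapq.heappop(self._heap)
        return filename, tier

    def __len__(self):
        return len(self._heap)
```

Feeding `boost()` from a list of top-viewed files would get their smaller 2-pass encodes made first, while any pending fast encode still runs before all of them.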

Segmented encode

HLS splits the input into roughly evenly-spaced segments of time; I've been testing encodings based on 10-second segments so far. This gives a lot of room between key frames to allow for good compression, but means that a resolution change could take as much as 10 seconds to switch in at the next boundary. I will also do tests with 5-second segments for comparison before settling.

Because each segment is independent in terms of contents, and more or less distinct in terms of rate control, it should be possible to break up the encoding jobs along segment boundaries, and dispatch as many segments as there are available job runners.

This should allow for distributing encoding time across many CPU cores even at lower resolutions that do not offer a lot of parallelism within libvpx, and could really improve the response time of making each resolution transcode available shortly after a new upload.
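A minimal sketch of the splitting and dispatch, assuming fixed-length segments and a pool of job runners. `encode_segment` is a placeholder standing in for the real encoder invocation, and the function names and output naming scheme are made up for illustration.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def segment_jobs(duration, segment_length=10.0):
    """Return (index, start, end) in seconds for each segment of the input."""
    count = math.ceil(duration / segment_length)
    return [
        (i, i * segment_length, min((i + 1) * segment_length, duration))
        for i in range(count)
    ]


def encode_segment(job):
    """Placeholder for the real encoder call; returns a hypothetical output name."""
    index, _start, _end = job
    return f"segment-{index:05d}"


def encode_all(duration, runners=4, segment_length=10.0):
    """Encode every segment, running up to `runners` jobs at the same time."""
    jobs = segment_jobs(duration, segment_length)
    with ThreadPoolExecutor(max_workers=runners) as pool:
        # map() returns results in segment order even if jobs finish out of order.
        return list(pool.map(encode_segment, jobs))
```

For a 25-second input with 10-second segments this yields three jobs, the last one covering only the final 5 seconds.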

However, if this works, it'll be necessary to plan how to schedule and prioritize jobs in case there are a lot of batch uploads.

Transcode management tools

The new transcode system will need better management tools; Special:TimedMediaHandler as it currently stands doesn't scale to present operations, and doesn't allow the control we'll want for new files.

Quotas / limits

Video files can be quite large, and can be quite long. Set some resource limits:

maximum duration to encode, per resolution
- based on target bitrate of the fast encoding
- note that higher frame rates require higher bitrates; scale but cap at 60fps
- stop when the next transcode would exceed the quota (say a max of 4 GiB might be 15-20 minutes of 4K60, 1 hour at 1080p, 3 hours at 480p60, or 12 hours at 240p60)
- allow for admins to approve further encodings? (this would only work if there's not a physical max file limit reached, or if the files are chunked)
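The arithmetic behind those limits can be sketched as follows. The 4 GiB figure comes from the list above; the linear frame-rate scaling, the ladder bitrates, and the function names are assumptions for illustration only.

```python
GIB = 1024 ** 3


def max_duration_seconds(bitrate_bps, quota_bytes=4 * GIB):
    """Longest duration that fits in the quota at a given average bitrate."""
    return quota_bytes * 8 / bitrate_bps


def scaled_bitrate(base_bitrate_bps, fps, base_fps=30, cap_fps=60):
    """Scale the target bitrate with frame rate (assumed linear), capped at cap_fps."""
    return base_bitrate_bps * min(fps, cap_fps) / base_fps


def transcodes_within_quota(duration_s, ladder, quota_bytes=4 * GIB):
    """Walk the ladder from smallest to largest; stop at the first that won't fit."""
    allowed = []
    for height, bitrate_bps in sorted(ladder):
        estimated_bytes = bitrate_bps * duration_s / 8
        if estimated_bytes > quota_bytes:
            break
        allowed.append(height)
    return allowed


# Hypothetical (height, bits per second) targets for the fast encoding.
EXAMPLE_LADDER = [(240, 800_000), (480, 3_000_000), (1080, 9_500_000)]
```

With these made-up targets, a two-hour video would get the 240p and 480p transcodes but stop before 1080p, whose estimated size of about 8.5 GB is over the quota. Working backwards, one hour in 4 GiB corresponds to roughly 9.5 Mbit/s, which is in line with the 1080p figure quoted above.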

Downloading

One complication is that it will become harder to download playback derivatives, because audio and video tracks have to be separated in distinct files. We might even choose to keep time segments in separate files, as this has some helpful properties for distributing encoding.

If it's necessary to download scaled derivatives as ready-to-play files, a "remuxer" can be built that pulls the chunks and produces a WebM or MP4 download (using the VP9 encoding).

Future ideas

Things I can't commit time to yet but would love to work more on:

Trim/crop tools
Upload helper using WebCodecs to transcode from MP4 into VP9+Opus client-side
Download helper using WebCodecs to transcode from VP9+Opus into MP4 H.264 client-side
Subtitle/captions editor
Support creating VTT subtitles
Support marking subtitles as being SDH or normal subtitles
Share widget for video and iframe embedding
Extend metadata support
360 video support