Extension:TimedMediaHandler/TimedTextRework

Content: All of this should be rewritten to use ContentHandler. The Proofread Page extension might serve as inspiration.

  • TimedTextContent:
    • subtitle format definition
    • current format
    • current type (captions/subtitles/chapters etc)
    • language ?
    • getDataInFormat( SRT/VTT/SSA )
    • Possibly also separate wikitext block for license ?
    • Validation ?
  • TimedTextContentHandler
    • serialization
    • initially keep as wikitext
    • in future, convert to JSON ? Also store language, type and license ?
  • EditPage
    • allow you to link with a file ?
    • allow you to set language and type ? (captions/subtitles/chapters etc)
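
The content model outlined above could look roughly like the following. This is only a sketch in Python to show the shape of getDataInFormat(); the real implementation would be a PHP ContentHandler subclass. The Cue class, the millisecond fields and the format_timestamp helper are assumptions made for illustration, not existing TMH code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Cue:
    """One timed line of text (hypothetical internal representation)."""
    start_ms: int
    end_ms: int
    text: str


def format_timestamp(ms: int, separator: str) -> str:
    """Render milliseconds as HH:MM:SS<sep>mmm (SRT uses ',', VTT uses '.')."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}{separator}{millis:03d}"


@dataclass
class TimedTextContent:
    language: str
    kind: str  # "captions", "subtitles", "chapters", ...
    cues: List[Cue] = field(default_factory=list)

    def get_data_in_format(self, fmt: str) -> str:
        """Serialize the cues as SRT or VTT; other formats are rejected."""
        fmt = fmt.lower()
        if fmt == "srt":
            # SRT: numbered blocks, comma before the milliseconds.
            blocks = [
                f"{i}\n{format_timestamp(c.start_ms, ',')} --> "
                f"{format_timestamp(c.end_ms, ',')}\n{c.text}"
                for i, c in enumerate(self.cues, start=1)
            ]
            return "\n\n".join(blocks) + "\n"
        if fmt == "vtt":
            # VTT: "WEBVTT" header, period before the milliseconds.
            blocks = [
                f"{format_timestamp(c.start_ms, '.')} --> "
                f"{format_timestamp(c.end_ms, '.')}\n{c.text}"
                for c in self.cues
            ]
            return "WEBVTT\n\n" + "\n\n".join(blocks) + "\n"
        raise ValueError(f"unsupported subtitle format: {fmt}")
```

Keeping the cues in a format-neutral structure means that adding SSA later would only need another branch, without touching the stored content.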


Problems:

  • The format is SRT, but with embedded wikitext, and it is served as something HTML-ish
  • VTT (WebVTT) has become the de facto HTML5 captioning standard
  • Our text tracks reference action=raw, but our usage is api.php
  • We currently assume one single format, but things change
  • Old versions cannot be served
    • But I don't think we need them either...
  • The current SRT wikitext is trivially translatable with the Translate extension, except that TMH checks the wrong titles by default (title.$code.srt instead of title.srt/$code). The new format should play well with Translate from the start.
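
To make the title mismatch from the last point concrete, here is a small Python sketch of the two naming schemes and a conversion between them. The function names are hypothetical, and the "TimedText:" prefix in the usage example is simply passed through as part of the title.

```python
def legacy_title(title: str, code: str) -> str:
    """Scheme TMH checks today: title.$code.srt"""
    return f"{title}.{code}.srt"


def translate_friendly_title(title: str, code: str) -> str:
    """Scheme Translate expects: title.srt/$code (language as a subpage)."""
    return f"{title}.srt/{code}"


def migrate_title(legacy: str) -> str:
    """Rewrite title.$code.srt into title.srt/$code.

    Assumes the language code contains no dots, so the last two
    dot-separated parts are always the code and the extension.
    """
    base, code, ext = legacy.rsplit(".", 2)
    if ext != "srt":
        raise ValueError(f"not an SRT title: {legacy}")
    return f"{base}.{ext}/{code}"
```

For example, migrate_title("TimedText:Example.ogv.en.srt") gives "TimedText:Example.ogv.srt/en".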

Todo:

  • √ Create new <track> URL
  • √ Something like endpoint.php?title=File:Example.ogv&format=[srt|vtt]&lang=en&type=[captions|subtitles|chapters]
  • √ would find a subtitle in a language that goes with the file
  • √ Allow to output as either VTT or SRT
  • √ This will allow us to switch more easily from SRT in TMH to VTT in VideoJS
  • ? Validate content. VTT in particular allows a subset of HTML-like tags, which we should probably validate
  • √ What is the canonical entry point for subtitles ?
    • Do we reference the video file and then look up the subtitle file (possibly even one embedded in the video file) ?
    • Or do we use the subtitle file name directly ?
    • Or do we support both ? That might be wise, since WebM can contain subtitles but Ogg cannot, and we already have the .srt subtitles...
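
Two of the items above can be sketched in Python: building the <track> URL, and checking VTT cue text for tags outside the allowed subset. The endpoint name is the placeholder used above, and the function names are assumptions. The tag list (c, i, b, u, ruby, rt, v, lang, plus timestamp tags) is the set of cue-text tags in the WebVTT specification; this check only looks at tag names and is not a full VTT parser.

```python
import re
from typing import List
from urllib.parse import urlencode

ALLOWED_FORMATS = {"srt", "vtt"}
ALLOWED_TYPES = {"captions", "subtitles", "chapters"}
# Cue-text tags defined by WebVTT; anything else is reported.
ALLOWED_VTT_TAGS = {"c", "i", "b", "u", "ruby", "rt", "v", "lang"}

_TAG = re.compile(r"<(/?)([^>]*)>")
# Timestamp tags such as <00:05.000> or <00:00:05.000> are also legal.
_TIMESTAMP = re.compile(r"^(\d{2,}:)?\d{2}:\d{2}\.\d{3}$")


def build_track_url(endpoint: str, title: str, fmt: str, lang: str, kind: str) -> str:
    """Build the URL a <track> element would point at."""
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    if kind not in ALLOWED_TYPES:
        raise ValueError(f"unsupported type: {kind}")
    query = urlencode({"title": title, "format": fmt, "lang": lang, "type": kind})
    return f"{endpoint}?{query}"


def find_disallowed_vtt_tags(cue_text: str) -> List[str]:
    """Return the names of tags in cue_text that WebVTT does not define."""
    bad = []
    for _slash, body in _TAG.findall(cue_text):
        if _TIMESTAMP.match(body):
            continue
        # The tag name ends at the first '.' (class) or whitespace (annotation).
        name = re.split(r"[.\s]", body, maxsplit=1)[0]
        if name not in ALLOWED_VTT_TAGS:
            bad.append(name)
    return bad
```

A validator built this way could either reject the edit or strip the reported tags before serving; which of the two is preferable is left open here.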

API idea:

  • √ Introduce a new API module named "timed text" -> ApiTimedText
  • √ Module has two output formats srt and vtt -> ApiFormatSrt, ApiFormatVtt
    • Better to follow the approach of the various action=feedfoobar modules than to pretend to have a generic formatting module: take subtitleformat=[srt|vtt] and simply ignore the global format=... parameter. In this case you probably don't even need an equivalent of ApiFormatFeedWrapper; just use ApiFormatRaw.
    • First go at an API module for this
  • Use Captioning project to convert between formats
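
The dispatch described above, where the module reads its own subtitleformat parameter and ignores the global format, might look like this in outline. It is a Python sketch, not the PHP ApiTimedText module: handle_timed_text_request, the load_srt callback, the default language and the SRT content type are all assumptions, and the regex-based srt_to_vtt merely stands in for a proper converter such as the Captioning library.

```python
import re
from typing import Callable, Dict, Tuple

# text/vtt is the registered WebVTT type; the SRT value is an assumption.
MIME_TYPES = {
    "srt": "text/plain; charset=utf-8",
    "vtt": "text/vtt; charset=utf-8",
}

_SRT_TIMING = re.compile(
    r"^(\d{2,}:\d{2}:\d{2}),(\d{3}) --> (\d{2,}:\d{2}:\d{2}),(\d{3})",
    re.MULTILINE,
)


def srt_to_vtt(srt: str) -> str:
    """Minimal conversion: add the header and switch ',' to '.' in timings.

    SRT cue numbers are kept; VTT accepts them as cue identifiers.
    """
    body = _SRT_TIMING.sub(r"\1.\2 --> \3.\4", srt.strip())
    return "WEBVTT\n\n" + body + "\n"


def handle_timed_text_request(
    params: Dict[str, str],
    load_srt: Callable[[str, str], str],
) -> Tuple[str, str]:
    """Return (content type, raw body) for a timed text request.

    Only 'subtitleformat' selects the output; a global 'format'
    parameter, if present, is deliberately ignored.
    """
    fmt = params.get("subtitleformat", "srt")
    if fmt not in MIME_TYPES:
        raise ValueError(f"unsupported subtitleformat: {fmt}")
    srt = load_srt(params["title"], params.get("lang", "en"))
    body = srt if fmt == "srt" else srt_to_vtt(srt)
    return MIME_TYPES[fmt], body
```

Returning the content type together with an already serialized body mirrors what handing the result to a raw formatter would involve: no further wrapping is applied to the subtitle text.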

Alternative:

  • RESTBase, but we need a core endpoint for that as well
  • completely new endpoint ?
  • ResourceLoader?
    • To me this feels similar to the use case of serving gadget JS or CSS sourced from pages. RL should be a good fit in theory? --brion