Reading/Multimedia/Structured Data/Developer Diary

This is a collection of writings from the Multimedia team on their work, challenges, and successes on the Structured Data project.

Q3-Q4 2017-2018
During Q3 and Q4, the Multimedia team focused on several different areas of work.

Search indexing for SDC
Cormac was responsible for adding various SDC concepts to the search indices, namely captions, descriptions, and some properties. Work on qualifiers, a much more complex subject, is currently underway. There are significant open questions as to whether qualifiers can be represented usefully in the search index, mostly relating to non-text fields. For example, an image whose metadata includes "depicts -> dog, quantity -> 3" might be indexed so that searching for images that include exactly three dogs is possible. But searching for images that include more than two dogs may not be possible with our current search indexing strategies, or at least not in a performant way.
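
The difference between the two kinds of query can be sketched in a few lines of Python. This is only an illustration of the general problem, not the actual search indexing code: the file names, the token format, and the helper functions are all hypothetical, and it assumes the qualifier is flattened into a plain text token alongside the property it qualifies.

```python
from collections import defaultdict

# Hypothetical files: file name -> list of (property, value, quantity) statements.
FILES = {
    "Three_dogs.jpg": [("depicts", "dog", 3)],
    "Five_dogs.jpg": [("depicts", "dog", 5)],
    "One_cat.jpg": [("depicts", "cat", 1)],
}


def build_index(files):
    """Flatten each statement and its qualifier into one opaque text token."""
    index = defaultdict(set)
    for name, statements in files.items():
        for prop, value, quantity in statements:
            index[f"{prop}={value}|quantity={quantity}"].add(name)
    return index


def exact(index, prop, value, quantity):
    """Exact match: a single token lookup."""
    return set(index.get(f"{prop}={value}|quantity={quantity}", set()))


def more_than(index, prop, value, minimum):
    """Range match: text tokens have no numeric order, so every token is scanned."""
    prefix = f"{prop}={value}|quantity="
    hits = set()
    for token, names in index.items():
        if token.startswith(prefix) and int(token[len(prefix):]) > minimum:
            hits |= names
    return hits


index = build_index(FILES)
print(exact(index, "depicts", "dog", 3))
print(more_than(index, "depicts", "dog", 2))
```

In this sketch the exact query costs one lookup, while the "more than" query has to visit and parse every token in the index, which is the kind of cost that makes it hard to do in a performant way.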

File page integration
This work, primarily undertaken by Mark, was largely done in the Wikibase and WikibaseMediaInfo code bases.

First, Mark began familiarizing himself with the systems already present. Much of the system had already been defined, so this familiarization process took some time. Then, building on the API work done in Q1 and Q2, Mark added a hook to the file page that retrieved the MediaInfo page based on the existing system for determining that relationship.
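
The diary does not spell out how that relationship is determined. As an illustration only, and assuming the convention used on Wikimedia Commons in which a file's MediaInfo entity ID is the letter M followed by the file page's page ID, the mapping might look like the sketch below. The function name and the example page ID are hypothetical.

```python
def media_info_id(page_id: int) -> str:
    """Derive a MediaInfo entity ID from a file page ID (assumed convention: 'M' + page ID)."""
    if page_id <= 0:
        raise ValueError("page_id must be a positive integer")
    return f"M{page_id}"


# Hypothetical page ID, used only for illustration.
print(media_info_id(12345))
```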

First attempt
The first attempt was to simply get the JSON representing the entity and render it in a helpful way. This approach was complex, and would have significantly hindered future work, because it would have split the code path for rendering MediaInfo objects in two. Especially looking forward to the completion of the MCR work, it didn't make sense to continue down that path, so it was abandoned.

Second attempt
Second, Mark attempted to use the ParserOutput obtained from various MediaInfo entity objects to simply dump the rendered page onto the file page. This approach also worked, but had significant shortfalls: mostly, the ParserOutput didn't have the proper context, and this would have been yet another split in the code paths for very little benefit.

Final version
Finally, the file page prototype was completed by temporarily copying much of the code from Wikibase to render the MediaInfo entity in the file page hook. This solution also meant circumventing the usual TermsList placeholders by using a SimpleTermsListView object instead of the usual PlaceholderEmitting version. This approach should be used in other non-Wikibase pages to render an entity's TermsList.

API work
During this time, Mark took on the task of learning about the existing MediaInfo code and modifying how it handled API requests for file page titles, especially for wbgetentities. The API work proceeded slowly at first because Mark had a difficult time asking the right questions of WMDE folks, but ultimately the API change was successfully completed and merged, and wbgetentities now does the right thing when asked for entities related to a file page.
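
As a rough sketch of what such a request could look like, the Python below builds a wbgetentities URL that identifies the entity by site and title rather than by ID. The endpoint, the site ID "commonswiki", and the file title are assumptions made for illustration; the code only constructs the URL and does not contact any server.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed endpoint; substitute the api.php URL of the wiki being queried.
API_ENDPOINT = "https://commons.wikimedia.org/w/api.php"


def build_wbgetentities_url(file_title, site="commonswiki"):
    """Build a wbgetentities request that looks an entity up by file page title."""
    params = {
        "action": "wbgetentities",
        "sites": site,          # assumed site ID for the wiki hosting the file page
        "titles": file_title,   # the file page title, including the File: prefix
        "format": "json",
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


# Hypothetical file title, used only for illustration.
url = build_wbgetentities_url("File:Example.jpg")
print(url)
print(parse_qs(urlsplit(url).query))
```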