Wikimedia Technical Conference/2018/Session notes/Integrating data into our products

Description: Content is the key offering on Wikimedia projects, but it is also important to provide useful data about that content. Metadata, usage metrics, and content analysis are just a few areas where data can enhance our projects. This session will explore methods and motivations for using data of various types to expand and improve Wikimedia content and tools.

Questions discussed
= Features and Goals =

= Important decisions to make =

= Action items =

= New Questions =

= Detailed notes =
Place detailed ongoing notes here. The secondary note-taker should focus on filling any [?] gaps the primary scribe misses, and writing the highlights into the structured sections above. This allows the topic-leader/facilitator to check on missing items/answers, and thus steer the discussion.


 * Voting Legend for Session:
 * Pink post-it - non-page
 * Blue post-it - page element
 * Orange dot - storage on wiki
 * Red dot - storage on Wikidata
 * Purple dot - not stored at all
 * Green dot - curation on wiki
 * Yellow dot - curation on Wikidata

Goal: we want to know the big YES ideas when it comes to data. We don’t want the meh ideas; we want the passion projects.

Generated Ideas:
 * Adding a data tab alongside “read” and “history” that takes you to the Wikidata item for that page
 * Storage questions: one contributor’s long-term hope is that the wiki and Wikidata will one day be merged into a single entity.
 * Contention around where it is curated - Sam, Lydia, Bryan, Dmitry all thought that it should be curated through both
 * Multi-language auto-generated descriptions from Wikidata statements (ideally a short piece of code, especially for short page previews)
 * Some thought ideally not stored but will end up being cached
 * Not curated because not an edit
 * Lead image + focus rect
 * Example: thumbnails in search results; right now we don’t have a good way of focusing on the center of each image to generate thumbnails
 * Question of storage - complicated; coordinates could be stored on Commons
 * Curation votes were split - the lead image is curated on the wiki, but the region should be in structured data, though it will probably be overwritten
 * Does it always have to be done on wiki?
 * Open Graph metadata to interact with social media
 * Cached but not actually stored?
 * Could be an MCR slot
 * Not curated according to votes
 * Related articles / see also
 * Storage: inferred based on Elasticsearch; however, community members thought they should be able to manually override it. JonKatz says all three are viable storage options
 * Lydia: it makes sense to override that locally, but it should generally be inferred (after Lydia’s statement, Jon got rid of the Wikidata half of his sticker)
 * Categories (MCR)
 * Especially in MCR, since currently the entire content is duplicated every time the category changes
 * Question about storage - presumably this could span languages and could be stored on Wikidata
 * Contention
 * Not currently consistent
 * Content creation metadata
 * Examples: number of contributors, page activity, most disputed content, blame tool
 * Templates leading to ontology and semantics
 * Do we add potentially multiple concept Wikidata IDs to the template, OR do we have it all be done on Wikidata?
 * If you can map templates to ontologies no matter where they come from, that gives you a plug for it
 * Already possible?
 * Split vote concerning storage AND concerning curation - definitely a question to be answered
 * Structured page / section data + Semantic article content mark-up + Clear separation of in-article content from its presentation
 * Opens the door to things like mixing and matching data from different projects
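
The idea of multi-language descriptions auto-generated from Wikidata statements could look roughly like the sketch below. It is only an illustration under stated assumptions: the statement data is a hypothetical, simplified in-memory mapping (property ID to value labels per language), the patterns and the `describe` helper are invented for this example, and real grammar (articles, cases, word order) would need far more than a format string. P31 (instance of) and P17 (country) are real Wikidata property IDs. Nothing is stored here; the description is computed on request, which matches the "not stored, but probably cached" view from the votes.

```python
from typing import Dict, Optional

Labels = Dict[str, str]          # language code -> label
Statements = Dict[str, Labels]   # property ID -> labels of the statement's value

# Hypothetical per-language patterns. "{kind}" comes from P31 (instance of),
# "{place}" from P17 (country). Real languages need richer grammar rules.
PATTERNS: Dict[str, str] = {
    "en": "{kind} in {place}",
    "de": "{kind} in {place}",
}


def describe(statements: Statements, lang: str) -> Optional[str]:
    """Return a short description in `lang`, or None if data is missing."""
    pattern = PATTERNS.get(lang)
    kind = statements.get("P31", {}).get(lang)
    place = statements.get("P17", {}).get(lang)
    if pattern is None or kind is None:
        return None  # no pattern or no "instance of" label for this language
    if place is None:
        return kind  # fall back to the bare type when no country is known
    return pattern.format(kind=kind, place=place)


# Made-up example data in the simplified shape assumed above.
example: Statements = {
    "P31": {"en": "city", "de": "Stadt"},
    "P17": {"en": "Germany", "de": "Deutschland"},
}

print(describe(example, "en"))  # city in Germany
print(describe(example, "de"))  # Stadt in Deutschland
```

Because the output depends only on the statements and the language, a result like this could be cached per item and language and regenerated whenever the underlying statements change.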