Technical decision making/Decision records/T274181

With feedback from the Technical Decision Forum's representatives, the Structured Data Across Wikimedia Architecture (SDAW) team was able to exit the process early. The Technical Decision Forum process helped the team realize that the proposal was at a larger scale than originally anticipated. With a clearer understanding of their decision proposal, the SDAW team will work on a more focused framework to bring to the Technical Decision Forum. The feedback the SDAW team received is as follows:

Question:
Was the problem clearly stated?

Response Percentage:

 * The Problem statement talks about tagging articles, paragraphs and even sentences with relevant language-independent Wikidata concepts. Leaving aside the fact that there exist concepts that are not language independent (e.g. https://en.wikipedia.org/wiki/Saudade), I'd like to focus on the fact that tagging single sentences is a completely different scale of problem than tagging articles (or even paragraphs). Regardless of implementation, the amount of computing resources required for tagging paragraphs or sentences will be orders of magnitude more than for tagging articles. Could this problem be broken down more? Perhaps starting with just tagging articles and scaling up from there? From my understanding, a big part of the gains will come from tagging paragraphs (e.g. an intro/summary being an answer to a question), so maybe paragraphs can fit in the initial plan as well. But adding sentences to start with sounds a bit too much to me.


 * I have read the "What" section at least 5 times now. I finally realized that "The first most useful type of metadata is the topic of the content." is the closest thing to an explanation of the goal of this body of work that I can understand. The size of the grant funding the work is irrelevant (as is the fact of an earmarked grant being involved at all). Having this as the lead of the section obfuscates rather than informs. It is still not clear to me after several re-reads whether extracting/identifying/cataloging "sections" is part of the work expected, or whether this will instead build on some existing structural decomposition of articles. There is a paragraph on "Structuring content into discrete sections" but it does not contain statements of proposed action. Instead it merely states that this might be a nice thing for external reusers of content and other workflows, without establishing a concrete basis for those statements. I would personally expect the What to be statements in active voice about a problem domain and the high-level course of action to be explored. Ideally this would also be written in inverted pyramid/journalistic style so that one does not have to hunt for the important ideas within other tangentially related prose.


 * As stated, it sounds like the problem is the presence of the grant itself.

 * 1. Will there be a difference between tagging cited and un-cited content? Will there be a preference for getting tagged information from cited content? 2. Will people be able to access the website of the citation?
 * I asked members of my team to review the SDAW document, and one of my teammates has several questions (posted below verbatim):
 * 1) "Looking through the presentation, I'm a bit confused by the application of this idea. I see that they are trying to apply section-level concepts to the lead section, which (ideally) summarizes every section in the article. shouldn't this just reflect article-level concepts, and if so, couldn't they use existing ways of describing a topic (eg category tree - although this might be a useful way of replacing that; existing descriptors on Wikidata)? and how would this work with abstracting out references, since again leads are more typically supported only in body? also how does link analysis interact with project-determined linking standards like enwp's MOS:LINK - eg something linked in an earlier section may not be linked again later? they argue that it minimizes bias vs machine learning - I don't agree that it would"
 * 2) Questions I had are the following: