Talk:Wikimedia Technology/Annual Plans/FY2019/CDP3: Knowledge Integrity

Re: Contributors, tool developers and partner organizations can understand and accelerate the referencing and linking of knowledge statements to external sources, catalogs, metadata providers, and content repositories.

Kerry Raymond (talkcontribs)

I would like CopyPatrol to have tools that understand CC sources. It would be nice if the people using it did too, along with the other random editors who think they can spot copyright violations.

I work with a lot of material released under CC-BY, and I get tired of the accusations of copyvio (including speedy deletion of my new articles) despite my clear attribution within the article that it is drawing from an identified CC-BY source. I follow the rules but others don't even seem to understand those rules.

The first common misunderstanding is that an assertion of copyright over a webpage overrides a CC notice. It does not. If an organisation is to license its content under CC-BY, then it must be the copyright holder to do so. There is no incompatibility between a copyright notice and a CC license; it's to be expected. It is not a case of "find a copyright statement and declare a copyvio"; you have to look for the CC license.

In particular, our tools need to look beyond the individual webpage, which may not contain a CC-BY statement, to the copyright notice at the bottom of the page, which may link to a copyright statement that licenses all (or most) pages within the website under CC-BY. As an example, suppose I were to copy content (appropriately attributed) from this webpage onto Wikipedia:

https://www.business.qld.gov.au/industries/farms-fishing-forestry/agriculture/land-management/health-pests-weeds-diseases/weeds-diseases/woww

As you can see, there is nothing on this page that says it is CC-BY, but the link at the bottom of the page labelled Copyright links to this page:

https://www.qld.gov.au/legal/copyright

which says that the whole website (except where explicitly excluded) is CC-BY-4.0.
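The two-step lookup described above (check the page itself, then follow its "Copyright" link) could be sketched roughly as follows. This is only an illustration, not how CopyPatrol works: the function names, the injected `fetch` callable, and the regular expression are all assumptions made for the example, and a human would still need to check for exclusions such as the "except where explicitly excluded" clause.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumption: a CC license is recognised by a link or mention of its
# creativecommons.org URL, e.g. .../licenses/by/4.0/
CC_URL = re.compile(
    r"creativecommons\.org/(licenses|publicdomain)/([a-z\-]+)/(\d\.\d)", re.I
)


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from <a> elements."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def cc_license_in(html):
    """Return e.g. 'by/4.0' if the HTML mentions a CC license URL, else None."""
    match = CC_URL.search(html)
    return f"{match.group(2).lower()}/{match.group(3)}" if match else None


def find_license(page_url, fetch):
    """Look on the page first, then on any page linked with the label 'Copyright'.

    `fetch` is a hypothetical callable mapping a URL to its HTML text.
    Returns (license, url_where_found) or (None, None).
    """
    html = fetch(page_url)
    found = cc_license_in(html)
    if found:
        return found, page_url
    collector = LinkCollector()
    collector.feed(html)
    for href, text in collector.links:
        if "copyright" in text.lower():
            target = urljoin(page_url, href)
            found = cc_license_in(fetch(target))
            if found:
                return found, target
    return None, None
```

Passing `fetch` in as a parameter keeps the sketch testable without network access; a real tool would also need caching and polite rate limiting.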

We also need easier ways to attribute suitably licensed CC-material, particularly when there are large CC-BY websites (such as the one above - the primary portal for the Queensland Government). I don't mind doing a "big" attribution statement when I create an article largely from a CC-BY source, e.g.

https://en.wikipedia.org/wiki/Junction_Park_State_School

but if I just want a few sentences from that source, and then a few sentences from another CC-BY source etc, then the attribution starts to become very heavyweight (could end up with more bytes of attribution than the text taken from those sources!) and indeed such "end of article" attribution cannot indicate which portions of the article were derived from it. I think it would be much better if the cite templates took a parameter to signal the licensing of a source where this is relevant and displayed the little "CC-BY" icon (or whichever license icon is applicable). Alternatively we need better attribution templates. Currently I have to write my own attribution templates as in the case above, or manually write an attribution as in:

https://en.wikipedia.org/wiki/New_Mapoon,_Queensland

because the existing templates are inadequate. Another piece of the problem is that some people believe you must use one of the existing templates to avoid the accusation of copyright violation.

CC-BY material is a great way to get a lot of content onto Wikipedia. In the example above, we were using such content to fill knowledge gaps on Wikipedia about Indigenous Australian communities (several articles were expanded using that Qld Govt website). Let's encourage the use of CC-BY content instead of the current practice of making it as unpleasant and as difficult as possible.

Now, above I am discussing CC-BY content. But if the content is CC0 or PD, then I don't have to make attribution at all. But again we have no clear policies and processes in place about how to do this. I often write biographies based on obituaries that appeared in the pre-1955 Australian newspapers (which are both out-of-copyright in Australia and extensively digitised), e.g. I might want to write a biography based on this newspaper article:

https://trove.nla.gov.au/newspaper/article/53048297

Do I need to do anything more than cite it? Is there anywhere I should be asserting {{PD-Australia}}? Again, if the cite templates included the capability for me to add the {{PD-Australia}} to an appropriate field, there would be a clear assertion by me that this is public domain material and hence not a copyvio. At the moment, I am left at the untender mercies of any random editor who decides to call it a copyright violation.

Ocaasi (WMF) (talkcontribs)

Hi Kerry,

Again, it's a hard problem to distinguish license metadata on web pages, most of which is unstructured and not machine-readable. It's also hard to know whether a CC statement is accurate, again because most people don't use a machine-readable format.
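For contrast, here is a rough sketch of what the machine-readable case looks like. It assumes the page marks its license with the HTML `rel="license"` link type on an `<a>` or `<link>` element, which is one common convention; the class and function names are made up for this example. Pages that only state their license in free text would return nothing, which is exactly the hard case described above.

```python
from html.parser import HTMLParser


class RelLicenseFinder(HTMLParser):
    """Collect href values of <a> and <link> elements carrying rel="license"."""

    def __init__(self):
        super().__init__()
        self.license_urls = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attr = dict(attrs)
        # rel may hold several space-separated tokens, e.g. rel="license noopener"
        rels = (attr.get("rel") or "").lower().split()
        if "license" in rels and attr.get("href"):
            self.license_urls.append(attr["href"])


def declared_licenses(html):
    """Return the license URLs a page declares in machine-readable form."""
    finder = RelLicenseFinder()
    finder.feed(html)
    return finder.license_urls
```

An empty result does not mean the page is unlicensed, only that no machine-readable declaration was found.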

Speaking from my experience more as an editor than Foundation Staff, I believe it is customary to just link to the CC content in the edit summary of the addition. As far as citation template fields go, you'd likely want to propose this at https://en.wikipedia.org/wiki/Help_talk:Citation_Style_1 which is the main citation template for ENWP.

Cheers, Jake

Kerry Raymond (talkcontribs)

I have previously asked at the citation template page without success. They don't see it as a matter for the reader.

Kerry Raymond (talkcontribs)

I think linking in the edit summary is insufficient for attribution, particularly when the license specifies a desired attribution. I don't think we encourage people to license under CC-BY if we don't make a genuine effort to attribute.

Ocaasi (WMF) (talkcontribs)

Kerry, I follow the guidance here: https://en.wikipedia.org/wiki/Wikipedia:Copying_within_Wikipedia.

"Copying content from another page within Wikipedia requires supplementary attribution to indicate it. At minimum, this means providing an edit summary at the destination page – that is, the page into which the material is copied – stating that content was copied, together with a link to the source (copied-from) page, e.g., copied content from [[page name]]; see that page's history for attribution. It is good practice, especially if copying is extensive, to make a note in an edit summary at the source page as well. Content reusers should also consider leaving notes at the talk pages of both source and destination....

The Wikimedia Foundation's Terms of Use are clear that attribution will be supplied: in any of the following fashions: a) through a hyperlink (where possible) or URL to the article or articles you contributed to, b) through a hyperlink (where possible) or URL to an alternative, stable online copy which is freely accessible, which conforms with the license, and which provides credit to the authors in a manner equivalent to the credit given on this website, or c) through a list of all authors. (Any list of authors may be filtered to exclude very small or irrelevant contributions.)...

Attribution can be provided in any of the fashions detailed in the Terms of Use (listed above), although methods (a) and (c) — i.e., through a hyperlink (where possible) or URL to the article or articles you contributed to; or through a list of all authors — are the most practical for transferring text from one Wikipedia page to another. Both methods have strengths and weaknesses, but either satisfies the licensing requirements if properly done.


Qualification of citations, retractions, and other annotation of primary sources

Egon Willighagen (talkcontribs)

To build trust (I generally don't like the term, and prefer transparency) we need to be able to indicate if primary sources are still, umm, "valid". Some research turned out to be useful for some time, and still presents interesting studies, but the conclusions are now false. Some research was based on fraudulent work and got retracted. Annotating primary sources in Wikidata with various forms of qualification is essential to moving science forward: ignoring mistakes and misconduct, and not showing we know how to handle them, will contribute to a further blur of facts, fake, and fiction. If we cannot indicate a source was proven wrong, we will forget it, cite it again and again, and generally not learn from our mistakes.

Therefore, I think part of this proposal should be to make a start with developing models that adopt various community proposals that work in this area. I do not anticipate this to be solved in the first year, but doing something impactful over the period of the full project sounds quite achievable: data and tools are around, but integration and awareness are missing, two actions core to the proposal already. The first year (this annual plan) could work out a plan to integrate the resources.

The first resources I would like to see made interoperable are those that provide information about retractions. These include the RetractionWatch database and PubMed (CrossRef may also have retraction information). Interoperability would start with the creation of suitable properties and a model that describes how retractions show up in Wikidata (probably with suitable ShEx). For the RetractionWatch database, a property for their database entries may be sufficient. The more provenance about the retraction, the better, however.

The second source of information is citations. We already have a rich and growing citation network, but the current citations do not reflect the reason why each citation was made: this can include agreement, reuse of knowledge and data, but also disagreement, etc. The Citation Typing Ontology nicely captures various reasons why an article is cited. Some articles are cited a lot, but not because the paper turned out solid (e.g. http://science.sciencemag.org/content/332/6034/1163).

The importance is huge. The effort will be substantial, but the work can be largely done by the community, once the foundation and standards are laid out by Wikipedia/Wikidata. Repeatedly we find people citing retracted papers and papers with false information, and these are scholars who read a substantial part of the research in their field. The impact will be substantial and use cases are easy to envision: use in policy development (which research should our governance be based on, and which not), research funding (what is the long-term quality of research at some institute: boring but solid, versus exciting but risky), and doing research itself (is this paper still reflecting our best knowledge).

Of course, without this foundation we keep running into questions of reliability in Wikipedia too: can you automate alerting editors of articles where a cited paper is now considered false? Or regarding research, can false/true source ratio information be used to identify Wikipedia articles of dubious nature?
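Both questions could be prototyped once source status is recorded somewhere. The sketch below is purely illustrative: the `Source` record, its `status` values, and the example identifiers in the usage note are hypothetical and do not correspond to an existing Wikidata property or model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    doi: str
    status: str = "ok"  # hypothetical values: "ok", "retracted", "disputed"


def _is_flagged(doi, sources):
    """True if the DOI is known and its status is anything other than 'ok'."""
    return doi in sources and sources[doi].status != "ok"


def flag_articles(article_citations, sources):
    """Map each article title to the cited DOIs that are flagged.

    article_citations: dict of article title -> list of cited DOIs
    sources: dict of DOI -> Source
    Articles with no flagged citations are left out of the result.
    """
    alerts = {}
    for title, dois in article_citations.items():
        flagged = [d for d in dois if _is_flagged(d, sources)]
        if flagged:
            alerts[title] = flagged
    return alerts


def flagged_ratio(dois, sources):
    """Share of an article's cited sources that are flagged (0.0 if none cited)."""
    if not dois:
        return 0.0
    return sum(1 for d in dois if _is_flagged(d, sources)) / len(dois)
```

For example, with `sources = {"10.1000/x": Source("10.1000/x", "retracted")}` (a made-up DOI), an article citing `10.1000/x` would appear in the output of `flag_articles`. Unknown DOIs are treated as unflagged, so the result is only as good as the data coverage.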

I fully understand that Wikimedia would go beyond the state of the art of the research community, but the community is not doing this itself, just as it was not building open, domain-independent resources, which turned out to be of great use in and to science. If our goal is the collection of all knowledge, this collection is not a mere pile of more and more knowledge, but must be bound by carefully judging the quality of that knowledge. For this, tracking the above types of information (retractions, citation types) is essential, IMHO.

Fnielsen (talkcontribs)

Retractions and fraudulent papers are a minuscule part of problematic papers. By far the largest problem comes from "normal" scientific papers. There is in my opinion a massive problem with ordinary science. Some parts of science refer to this as the "reproducibility crisis". It is my guess that this affects a considerable part of bioinformatics, including parts of the information already entered in Wikidata. We already have an "erratum" property. Perhaps we need a "paper status" property to capture retracted papers. But I do not see how we can easily handle the problem with "normal" papers. I suggested a property to record pre-registered meta-analysis. I guess properties of that kind could help.

Dario (WMF) (talkcontribs)

@Egon Willighagen @Fnielsen thank you for taking the time to write this up. I very much agree with you, being able to annotate "source quality" through a mix of manual and automated curation, and propagating this signal to data reusers, to me is one of the key value propositions of WikiCite. There has been some discussion on how to model formal retractions in Wikidata, and I believe this is something the contributor community should be fully empowered to decide and implement autonomously. I like Finn's suggestions to start hacking a possible model (though I am not sure a single "paper status" property would be sufficient). If there's anything specific that you think we could support programmatically in the next (fiscal) year to facilitate these efforts on WMF's end, I'd be very interested in discussing it.

Also, as you probably remember, because I've been a bit of a broken record about this :), the advanced user stories you mention (such as notifications to editors when they try to add a disputed source) are important but currently blocked on:

1) the existence of sufficiently rich and vetted data models for the variety of publication types cited across Wikimedia projects (if the goal is to extend this beyond the scholarly domain);

2) high-quality and comprehensive data coverage: we're still far from being at a point where all sources cited in Wikimedia projects are captured in near real time in Wikidata. My hope is that some of the proposed technical directions mentioned in the program (the reference event stream, Citoid integration in Wikidata) can help significantly accelerate this work;

3) the existence of a community supported data model to represent source quality in all its relevant dimensions (see above);

4) most importantly, and depending on all of the above, the existence of APIs for reading source metadata from Wikidata and gracefully allowing Wikipedia editors to rely on this. This is a big product intervention fraught with concerns about cross-project contributions that need to be addressed before your idea becomes a reality.

Regarding the second proposal–which I read as adding some kind of support for a Citation Ontology in Wikidata–much as I like the idea in theory (I have been one of CiTO's early enthusiastic supporters) I am skeptical we will ever get a critical mass of contributors to annotate citation types for a meaningful portion of a bibliographic corpus. The problem is 1-2 orders of magnitude larger than the creation and curation of bibliographic records, and requires much more effort (actually reading papers), than other types of curation workflow (such as entity disambiguation, reconciliation etc). I don't see yet how this could be viable until a solid bibliographic corpus exists, a more complete open citation graph (right?), and much more scalable human curation systems.

GerardM (talkcontribs)

Hoi, I tried to get this discussion going. Someone decided that a "retracted paper" should be a subclass of a paper. He insisted on edit warring on the subject and consequently I find that the community is not empowered to decide and implement. I have no interest in such nonsense. Thanks,

Fnielsen (talkcontribs)

I also see some difficulty with CiTO. Some citation types can already be inferred from Wikidata, e.g., Self Citation, while others may be "up for interpretation": where is the border between ''disputed by'' and ''disagreed with by''?
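As a small illustration of the "can already be inferred" case: if author identifiers are known for both works, a self-citation can be detected by checking for a shared author. The function name and the idea of passing author identifiers as plain strings are assumptions for this sketch; it is not an existing Wikidata query or tool.

```python
def is_self_citation(citing_authors, cited_authors):
    """Return True if the citing and cited works share at least one author.

    Both arguments are iterables of author identifiers (for instance item IDs).
    Works with no recorded authors are never reported as self-citations.
    """
    return not set(citing_authors).isdisjoint(cited_authors)
```

Nothing comparable is possible for judgement calls such as ''disputed by'' versus ''disagreed with by'', which need a human reading of the paper.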


Re: understand and accelerate the referencing and linking of knowledge statements to external sources, catalogs, metadata providers, and content repositories.

Kerry Raymond (talkcontribs)

One of the problems I encounter when trying to resolve a "citation needed" is that often I can find the information on a website (thanks Google) but, as most websites don't cite their sources and Wikipedia is soooo popular, it's not uncommon to find the same sentence/paragraph, word-for-word or substantially similar. Is the Wikipedia text a copyright violation of this website? Am I seeing an unattributed copy of Wikipedia material on this website? I can (with some effort) find the diff that added the information to the Wikipedia article, so I know the date the info appeared on Wikipedia, but most websites don't even date the webpage (beyond perhaps refreshing their annual copyright notice C. 2018), let alone provide a history. And even if the sentences are different (although how many ways can you say "Joe Bloggs was born in Sydney on 20 December 1880"), there's still no guarantee that the information content didn't come from Wikipedia. The line between Wikipedia and external sources is now totally blurred. After 17 years of Wikipedia, even offline sources like books (once seen as "authoritative" and definitely distinct from Wikipedia) may now contain Wikipedia information content. Are pre-2001 sources the only safe haven?

Should we have a campaign to ask websites to stamp pages with a "Guaranteed: No Wikipedia inside" so we can use them with more confidence?

It's all very well to say Wikipedia is a tertiary source drawing on secondary sources, but if we can't distinguish between a secondary source and a quaternary source, maybe we need to revisit the role of primary sources in Wikipedia. Maybe we have to stop being crowdsourcers and start being scholars?

Ocaasi (WMF) (talkcontribs)

Hi Kerry, you've identified a "hard" problem of knowing what came first, Wikipedia or the publication. With many websites lacking datestamps, resolving this issue is not trivial in many cases. The closest work being done on this that I'm aware of is based off a partnership we formed with plagiarism detection company Turnitin (iThenticate), now live as CopyPatrol at https://tools.wmflabs.org/copypatrol/en. It's a great tool, but even with its very sophisticated algorithm and huge corpus of materials to check against, determining if Wikipedia was first or second requires human review.

Do you think that this program should incorporate improvements on CopyPatrol or other plagiarism tools? At the moment that is handled by the Community Tech Team, although it'd be something we could look into further developing if you think new features could help address this conundrum. Cheers, Jake

Kerry Raymond (talkcontribs)

If the webpage in question can be found in the Internet Archive, we do have some way to timestamp that webpage. If we had a version of WikiBlame

http://wikipedia.ramselehof.de/wikiblame.php?lang=en

that worked over the Internet Archive for the same content in parallel with the Wikipedia article, we may be able to get a range of time in which that information first appeared on that webpage (but nothing like the precision we get with our versioning). But it might be sufficient to establish that the Wikipedia content was definitely before or definitely after that webpage, which then allows us to decide if one is likely to be the source (or the copyvio) of the other. Of course, if the Wikipedia timestamp is in the middle of the range for the webpage, we are none the wiser. And it doesn't rule out that both have a common third source (which might be another Wikipedia article or another webpage on that same website, since both can get refactored).
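The comparison itself is simple once the dates are known. The sketch below assumes some other step has already found the latest archived snapshot without the text and the earliest snapshot with it; the function and parameter names are invented for the example, and no Internet Archive or WikiBlame API is called.

```python
from datetime import date


def who_was_first(wiki_added, last_absent, first_present):
    """Compare a Wikipedia diff date with an archive-derived date range.

    wiki_added:    date the text was added to the Wikipedia article
    last_absent:   date of the latest archived snapshot WITHOUT the text, or None
    first_present: date of the earliest archived snapshot WITH the text, or None

    The text appeared on the webpage somewhere after last_absent and no later
    than first_present, so only dates outside that range are conclusive.
    """
    if first_present is not None and wiki_added > first_present:
        return "webpage first"
    if last_absent is not None and wiki_added <= last_absent:
        return "wikipedia first"
    return "inconclusive"
```

For instance, `who_was_first(date(2010, 1, 1), date(2012, 1, 1), date(2013, 1, 1))` reports that Wikipedia had the text first. Even a conclusive answer only orders the two copies; as noted above, it cannot rule out a common third source, and it assumes the text was not removed from the webpage and later restored.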

Ocaasi (WMF) (talkcontribs)

This is a neat idea, and I wonder what it would take to intersect Turnitin and Wayback Machine. It doesn't sound trivial, but I'm happy to talk to Mark Graham at IA about it (he runs Wayback). -Jake


outreach to librarians via webinars

Slowking4 (talkcontribs)

i would like to see outreach (outcome 4) follow-up to training library professionals like the OCLC webinar effort. it provides basic skills and confidence building. you could incorporate learning patterns and collaborate more with OCLC.

Ocaasi (WMF) (talkcontribs)

Hey Slowking, conversations about how to adapt/adopt/expand the amazing OCLC program are happening among TWL, WMF grants, and separately in the Wikimedia and Libraries Usergroup. I am not sure that it fits within this CDP but it is definitely on a bunch of people's radar. TWL continues to collaborate with OCLC and their Wikipedia + Libraries course gives us all the more reason to continue doing so. It was phenomenally well done. They are releasing their videos and course materials and we will be taking a close look at them while looking at opportunities for OCLC to continue leading in this critical area.

Slowking4 (talkcontribs)

thanks for your conversations. i would like to see outreach formalized, so the multiple approaches / methods are documented. to the extent WMF can partner knowledge production with others, and build in knowledge quality as a community value is important. not sure how to make metrics, but documenting effort (contacts made; partners engaged) would be good.

Ocaasi (WMF) (talkcontribs)

I appreciate that desire. The partnership has been multi-pronged over nearly 7 years now, and parts of it are at least roughly documented at https://en.wikipedia.org/wiki/Wikipedia:OCLC. Their Wikipedia + Libraries course was itself a partnership in that their WIR was funded by a WMF grant. There are good learnings documented on the WMF blog (https://blog.wikimedia.org/2017/12/13/monika-sengul-jones-interview/) and in the grant report. I'll also ping the grant officer for W+L @Mjohnson (WMF) to see if she has ideas about how we can document our relationship more publicly and consistently. Cheers, Jake

Kerry Raymond (talkcontribs)

As a member of the Wikipedia and Libraries User Group and as a member of the Wikimedia Australia chapter, I can confirm that there is a lot of interest in trying to roll out the OCLC webinar program more widely. Certainly I would like to run it in Australia, but, while the materials will shortly become available from OCLC under a re-usable license, there are some significant issues that I would have to address to make it work.

The biggest hurdle is that it's a huge piece of work to roll out. OCLC had paid staff involved in their program. If I run it in Australia, we have no staff in Wikimedia Australia so running a 9 week program with just volunteers will be challenging. Wikipedians can contribute on-wiki when they wish according to their own timetables of free time (when I am not at work, when I am not doing family things, etc). It is very difficult to get volunteers to support individual outreach activities, let alone commit to a 9 week program which would be most likely delivered during the working day.

A second hurdle is getting librarians engaged in the program. The best way to make that happen is to get the program into their professional development regime so doing the program would tick a box for them professionally and be useful to them for applying for jobs and for promotion. So we have to persuade those who run that PD regime that it is something suitable for their program and we may have to repackage it in some way to fit their framework.

A third problem, and I know it was a concern in the OCLC program (where I was one of the mentor/guides), is not to falsely raise expectations in librarians that there is a large local volunteer community of Wikipedians that they can rely on for advice, events and so on. We don't know where Wikipedians live (unless they choose to disclose it); we are not generally able to introduce librarians to "local Wikipedians" which comes as some surprise to librarians. If the rollout of the OCLC program in Australia was successful, it might lead to more Wikipedia events held by libraries which, if not supported by experienced Wikipedians, may lead to problematic outcomes. There is already hostility within the Wikipedia community about edit-a-thons and other events which bring a lot of newbie Wikipedians on to Wikipedia with inadequate supervision resulting in a lot of edits that are reverted. But finding volunteers for events is like pulling teeth.

As you can see, my first and third points are about Wikipedian volunteers willing to do outreach. This is a major stumbling block for us in rolling out these large programs. We have too few people willing to do outreach and we are spread too thinly across too many activities as is.

A final issue is the internationalisation of the program in terms of language (it's only in English) and in terms of examples (which are drawn from English Wikipedia and predominantly about USA topics). Being an English-speaking country with a culture not that dissimilar to the USA, I would probably run the program as is, although it would probably be more effective in Australia if we had more Australian examples, including more interviews with Australian librarians and Australian Wikipedians (but there's a time and effort cost to doing this). But more generally, where the language spoken is different, the Wikipedia is not English Wikipedia (meaning all the examples and policy content have to be rewritten, interviews translated or new ones made) and the cultural gap is greater (what's realistic in terms of library programming is different in a more affluent country to a poorer country), there is a LOT of work to roll it out.

To roll out the OCLC program in other places is likely to need funding both to make necessary redevelopment of the material suitable for a local audience (e.g. translating and local examples) and to support participants during delivery.


Does "Knowledge as a Service - increase reach" require linking to, or from external resources?

James Salsman (talkcontribs)

Why do the several outcomes, targets, and outputs associated with the top-line goal to "increase reach" discuss only linking to external resources?

Does that actually even increase reach? Isn't reach defined as the number of people who (can) find us, not the number of things people can find from us?

Shouldn't we, if we wish to work to satisfy the goal as stated, work to make sure that the external sources who re-use our content for their customers link back to us, so that the content can be further updated? Or is there a new definition of reach being used for this goal? James Salsman (talk) 11:30, 18 April 2018 (UTC)

Ocaasi (WMF) (talkcontribs)

Hi James. Increasing reach in this instance is primarily through building a web of interconnected, structured, linked data and relationships. We don't control when others link to us, but we can map out the entire 'graph' of who we link to and how those links are related to other data components and sources. We are also increasing reach in other ways, by empowering better research which reaches readers--and by better informing the public of Wikipedia's verifiability and citation practices, to increase their understanding and better inform their usage of the site. Cheers, Jake

James Salsman (talkcontribs)

Hi Jake, are there any sources which define "reach" as the number of outgoing external links instead of the number of incoming channels?

Do the Executive Director and the Board agree with this nonstandard definition? If so, would you please publish their directly quoted statements to that effect? Thanks. James Salsman (talk) 19:40, 19 April 2018 (UTC)

Ocaasi (WMF) (talkcontribs)

James, it seems like you may be misinterpreting what we mean (or we're miscommunicating it). Take link archiving for example. If those links are dead, readers cannot reach content we cite. Yes, it exists outside our site domain network, but it's definitely a benefit to readers to be able to reach that content through working URLs. In the context of mapping how we link to external sources, that map itself is a product that increases our reach, because citation graphs are traditionally proprietary and closed access. The link graph *is* the product, and a tool in itself. Not only that, but having the link graph makes it easier to find related sources and do better research, which ultimately reaches readers.

James Salsman (talkcontribs)

@Eloquence: what is your definition of "reach" in the context of Wikimedia metrics? James Salsman (talk) 14:26, 20 April 2018 (UTC)

James Salsman (talkcontribs)

I have proposed a resolution to this at .

Does it require additional budget, FTE resources, or training? James Salsman (talk) 15:03, 18 April 2018 (UTC)

Ocaasi (WMF) (talkcontribs)

Hi James, I have reverted your edit so we can first discuss it here. Issues of privacy, "fuzzing", and compliance would definitely be significant increases in the scope of this program and would involve not only additional technical capacity but investment of legal counsel as well. At the moment all issues concerning privacy should be addressed directly to privacy@wikimedia.org. If you would like to propose those measures, that is the place to start. Cheers, Jake

James Salsman (talkcontribs)

Jake, has anyone performed a cost-benefit analysis of subpoena processing costs under Congress's new amendments to the Section 230 safe harbor provisions? Do you agree that it would be good to consider how the subpoena processing cost burden changes under storing and not storing personally identifying information (note I am not suggesting refraining from storing cryptographic hashes of IP addresses such as would be necessary for checkusers to investigate sockpuppetry)? If so, would you please ask for the question to be considered by Legal? Thank you. James Salsman (talk) 19:44, 19 April 2018 (UTC)

Ocaasi (WMF) (talkcontribs)

I will certainly pass the question on to Legal. Thank you! Jake

Ocaasi (WMF) (talkcontribs)

Hi James, these questions are way beyond my area of expertise and are squarely in the realm of the Foundation legal/privacy team. I will forward your concerns to them. Cheers, Jake

Kerry Raymond (talkcontribs)

I think the notion of "reach" is linked to our notion of "gap". We see a content "gap" and then we think maybe there is a contributor "gap" underlying it. But maybe there is also a source "gap" which underlies it and so on.

I think the same is true about "reach". While reach is inherently about readers (people), I imagine that (apart from inability to access the web for whatever reason), any shortfall in our reach must relate to people's lack of interest in our content. Given that search engines tend to drive people toward Wikipedia, I don't think lack of awareness of Wikipedia is the major issue here. Lack of interest in our content might relate to a content gap, e.g. we don't have articles about topics they are interested in, or not have it in their language, or because they judge the content to be inferior (or have been pre-conditioned to think it will be inferior, either by personal past experience or teacher/librarian instruction to "not use Wikipedia").

I think we can increase our reach by ensuring that the content we present is as broad and as deep and as authoritative as we can. And obviously the more citations we can provide to authoritative sources and/or online sources the better.

As an aside, we do need to convince librarians that Wikipedia is not as unsafe as they fear. But teachers are a different issue. Speaking as a former university professor, the concern about Wikipedia is not necessarily about the quality of the content but that the purpose of the assigned task is to develop the student's own skills in researching and writing (which is not achieved if they just copy from Wikipedia or any other source). As someone who does outreach with university librarians, I can say that these librarians' position on Wikipedia is now shifting to a more reasonable position of "it's a good place to start but you need to dig deeper using its citations and other sources". Many of them explicitly discuss Wikipedia now as part of workshops they run for students on "how to research an assignment" etc.


Research into citation practices of users and editors

6
Hfordsa (talkcontribs)

So exciting to see plans around this really important issue! I was particularly interested in the plan to "Conduct research to understand how readers access sources and to help contributors improve citation quality" (on the slideshow), but I can't see where this (particularly the first part) is reflected in this annual planning doc. Would love to hear more!

Ocaasi (WMF) (talkcontribs)

Hi Heather! For the past 2 years we've increasingly been working together with a team based out of Stanford. Here is the Research page: https://meta.wikimedia.org/wiki/Research:Citation_Click_Data . You can see the first output here: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0190046. There's a lot more to do with understanding which sources readers access and what the OA-ness of those sources is. We have proceeded slowly to ensure proper privacy and data protections are in place, but it's starting to ramp up now.

Dario (WMF) (talkcontribs)

Hey Heather, on top of the pointer Jake mentioned, there are some overarching questions related to the usage of citations to understand how and when readers rely on them, that we've never been able to answer at scale. These go beyond the questions described in Lauren's proposal (which is very much focused on reference consumption in health-related articles). We're scoping out a more comprehensive project based on this data that we'll pilot in this quarter and expect to extend in the coming fiscal year (from July on) if the Knowledge Integrity project is approved. There's a qualitative component to this work that we're also looking to spec out in the next couple of months (led by @Jmorgan (WMF)) and could use some of your input and expertise (as soon as it's up on Meta).

Hfordsa (talkcontribs)

Super cool stuff from Stanford @Ocaasi (WMF) re. the biomedical data but yes, very important to go beyond medical data. Having studied citation practice in the context of global media events (disasters, elections or terrorist attacks) and read most of the research coming out about citation impact in medical research and the natural sciences, I'm really struck by how different the practices and impact might be, but also, importantly, by how Wikipedia's citation practices in the context of media events demonstrate the *value* of Wikipedia's approach to current problems of misinformation, fake news etc. I think it's in these areas that we need to really demonstrate how readers and editors understand the (social/technical) *meaning* of citations. It is great to see the beginnings of qualitative work planned in this area, @Dario (WMF)! Personally I think that the approach should involve mixed methods - ethnographers working with data scientists - but @Jmorgan (WMF) is super experienced with this, so I'm excited to see what you come up with. I have the germs of some new work in this area in mind so I'm very interested to see whether we might collaborate... :)

Jtmorgan (talkcontribs)

@Hfordsa glad to hear that you're interested! Let's set up some time to talk. I'm just starting to wrap my head around how we should approach this problem space, so your timing is excellent :)

Hfordsa (talkcontribs)