Wikimedia Enterprise/Breaking news

From mediawiki.org

Wikimedia Enterprise's "Breaking News" feature identifies new articles about large-scale, globally "newsworthy" events as they are being written across Wikipedia language editions. These articles are marked with a boolean field, allowing API users to easily identify this kind of content within their copy of the dataset. This is a beta feature; please send us your feedback.

https://breakingnews-beta.enterprise.wikimedia.com/ 

        Note that this is a desktop-only app and is in beta.

The feature is built primarily for Enterprise API users, with community editors in mind. You can help us improve it by testing it and sending us the combinations of inputs that return better results. Pointing out templates we may be missing across language projects would help us capture more results. Using the thumbs-up and thumbs-down buttons in the demo to confirm or reject entries identified as breaking news will help us build a better, more accurate tool in the medium and long term.

If you choose to build on top of this feature, you can access it through the WME Realtime API via WM Cloud Services.

It is important to note that this feature does not imply a change in Wikipedia editorial policy, particularly regarding notability and reliable sources, as summarized in the English Wikipedia policy "NOTNEWS".

Details

Screenshot of a demonstration visualization

The Breaking News feature of the Wikimedia Enterprise API (documentation on Meta) identifies likely "newsworthy" events as they are being written about across Wikipedia language editions. Articles about these events are marked with a boolean field, allowing API users to easily identify this kind of content within their copy of the dataset.
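As an illustration of what "easily identify" can mean on the consumer side, here is a minimal sketch of an API user splitting articles into re-index lanes by the boolean flag. It assumes each article arrives as a Python dict; the field names `is_breaking_news` and `name` are hypothetical placeholders, not confirmed parts of the Enterprise API schema.

```python
from dataclasses import dataclass, field
from typing import Iterable

# HYPOTHETICAL field names: check the Enterprise API documentation for the
# actual name of the breaking-news flag and of the article title field.
BREAKING_NEWS_FIELD = "is_breaking_news"
TITLE_FIELD = "name"


@dataclass
class ReindexPlan:
    fast_lane: list[str] = field(default_factory=list)    # flagged: re-index soon
    normal_lane: list[str] = field(default_factory=list)  # usual schedule


def plan_reindex(articles: Iterable[dict]) -> ReindexPlan:
    """Split articles into re-index lanes using the boolean flag.

    A missing flag is treated as False, so unflagged articles keep
    the normal schedule.
    """
    plan = ReindexPlan()
    for article in articles:
        title = article[TITLE_FIELD]
        if article.get(BREAKING_NEWS_FIELD, False):
            plan.fast_lane.append(title)
        else:
            plan.normal_lane.append(title)
    return plan
```

A consumer that would rather wait for the content to settle could invert the policy and hold flagged articles back instead of re-indexing them sooner.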

The feature is currently in beta and will continue to evolve to deliver more accurate and comprehensive results over time.

The Wikimedia Enterprise engineering team draws on community editing practices when designing features. Consistent with the Wikimedia Enterprise principles (in particular that of "no exclusive content"), this API feature is built on public information that Wikimedia editors already commonly use in content moderation workflows, for example: "does this article have a sudden increase in the number of pageviews, or of unique editors?" or "was this article recently created or moved, and does it have a 'current event' template?". The feature turns that kind of information into a feed of articles that API users can treat differently if they wish, for example by re-indexing these articles more rapidly, or by pausing re-indexing entirely until the content becomes more stable.
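To make the editor-style questions above concrete, here is a sketch of how such signals could be combined into a single boolean. The thresholds, the rule "recent, plus either a current-event template or an editing burst", and the template names are all hypothetical illustrations; this is not the feature's actual algorithm. Pageviews are left out, matching the demo notes further down.

```python
from dataclasses import dataclass, field

# HYPOTHETICAL template names and thresholds, for illustration only; the
# real feature's templates vary per language edition and are not listed here.
CURRENT_EVENT_TEMPLATES = {"Template:Current", "Template:Current event"}


@dataclass
class ArticleSignals:
    hours_since_created_or_moved: float
    unique_editors_last_hour: int
    edits_last_hour: int
    templates: set[str] = field(default_factory=set)


def looks_like_breaking_news(
    s: ArticleSignals,
    max_age_hours: float = 24.0,
    min_unique_editors: int = 5,
    min_edits: int = 10,
) -> bool:
    """Combine editor-visible signals into one boolean flag."""
    is_new = s.hours_since_created_or_moved <= max_age_hours
    has_current_template = bool(s.templates & CURRENT_EVENT_TEMPLATES)
    busy = (s.unique_editors_last_hour >= min_unique_editors
            and s.edits_last_hour >= min_edits)
    # Require recency, plus either an explicit "current event" template
    # or a burst of editing activity from several people.
    return is_new and (has_current_template or busy)
```

Keeping the thresholds as keyword arguments makes them easy to tune, which is the kind of experimentation the feedback process asks for.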

Please contact us if you have recommendations for templates or other metadata used on smaller language projects, or if you know of editing or reading behaviours this tool should be aware of. Each language edition's editorial policy (especially on whether events that are still occurring may have an article) will affect the output of this feature. Using the demo is itself a form of feedback; the next section explains how.

We want to capture every new and existing news event being worked on across Wikipedia, no matter how big or small, how niche its audience, or in which country or continent it takes place. That is a huge challenge, especially when crafting a feature that works across Wikipedia language projects. We hope that, with your feedback, we can improve this feature by cutting out false positives and tuning the underlying algorithm to produce better, faster, and more accurate results.

The work is heavily based on the research I conducted for our product strategy last year, specifically the section "Sourcing and Source Quality for reusers' success".

We also wish to acknowledge the adage that "Wikipedia is not a newspaper", commonly summarised in English as "NOTNEWS". This feature does not imply a change in Wikipedia editorial policy, particularly the requirements that article topics be of enduring notability and built upon reliable sources.

— FNavas-WMF (talk) 19:49, 31 August 2023, Product Manager with the Wikimedia Foundation, Enterprise team

How to use this demo

  1. Using the "Reaction" column
    • For each row, you'll see a red thumbs-down and a green thumbs-up. If the article in the row relates to a news event, click the green thumb; otherwise, click the red thumb. After you vote, you can optionally add a comment explaining your reasoning. You can vote only once per row, but don't worry if you make a mistake.
  2. Using the toggles
    • At the top of the page, you'll see switches that adjust parameters of the algorithm and change which results surface. We'd love to hear which combination of parameters returns the best results.
Toggles atop the demo page

Other notes

  • The list of results updates daily, and the number of editors and edits updates every half-hour. We are working to cut the update interval for the latter two to around five minutes.
  • You may notice that this demo does not include pageviews. That's because pageview data is still delayed by 24 hours, so it is not very useful for diagnosing live events. We're exploring what pageviews can mean for this feature.
  • This is a desktop-only app.

How to send us feedback

There are many ways to get in touch with us. Any feedback is good feedback.

  • Post questions or answers on our Meta-Wiki talk page.
  • Get in touch with me directly with ideas and thoughts at fnavas@wikimedia.org. I’ll get back to you as soon as possible.
  • Finally, as with all of Enterprise's work, if you find a bug or see a clear opportunity for improvement, please comment on our Phabricator task or create a new task with our Phabricator tag. We will do our best to respond in a timely manner.

Technical details

Consistent with the Wikimedia Enterprise principles (in particular that of "no exclusive content"), this API feature is built on public information that Wikimedia editors already commonly use in their content moderation workflows.

The WMF provides continuous streams of events happening on Wikimedia projects as Server-Sent Events, through a service known as EventStreams. These events include article creation, article deletion, and article revision on a given Wikimedia project. The streams relevant to breaking news are page-create, page-move, and revision-create. The page-create stream emits an event when an article is created. The page-move stream emits an event whenever an article is moved, for example from one namespace to another. The revision-create stream emits an event when a new revision (edit) is made to an existing article on a Wikimedia project. Each event carries information about the editor performing the action (create, move, or edit).
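Consuming these streams mostly comes down to parsing the Server-Sent Events wire format. The sketch below parses SSE lines into JSON payloads and extracts the fields relevant here. The field names (`meta.stream`, `meta.domain`, `meta.dt`, `page_title`, `page_namespace`, `performer.user_text`) follow the public MediaWiki event schemas as we understand them and should be treated as assumptions; `summarize` is a hypothetical helper, and this is not the Enterprise team's implementation.

```python
import json
from typing import Iterable, Iterator

# Public EventStreams endpoint; several streams can be requested at once by
# comma-separating their names (assumption: check the EventStreams docs).
STREAM_URL = ("https://stream.wikimedia.org/v2/stream/"
              "page-create,page-move,revision-create")


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the JSON payload of each Server-Sent Event.

    An event is a block of "field: value" lines ended by a blank line;
    multiple "data:" lines are joined with newlines. As in the SSE spec,
    an unterminated event at end of stream is discarded.
    """
    data_lines: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if data_lines:
                yield json.loads("\n".join(data_lines))
                data_lines = []
            continue
        if line.startswith(":"):
            continue  # comment or keep-alive line
        name, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]
        if name == "data":
            data_lines.append(value)  # "event:" and "id:" lines are ignored


def summarize(event: dict) -> dict:
    """Keep only the fields the breaking-news logic needs (schema assumed)."""
    return {
        "stream": event["meta"]["stream"],
        "domain": event["meta"]["domain"],
        "dt": event["meta"]["dt"],
        "title": event["page_title"],
        "namespace": event["page_namespace"],
        "editor": event.get("performer", {}).get("user_text"),
    }

# Live use (requires network access; Wikimedia asks clients to send a
# descriptive User-Agent):
#   import urllib.request
#   req = urllib.request.Request(STREAM_URL, headers={"User-Agent": "my-tool/0.1"})
#   with urllib.request.urlopen(req) as resp:
#       for event in parse_sse(b.decode("utf-8") for b in resp):
#           print(summarize(event))
```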

The current implementation only considers articles that were either created in, or moved to, the main namespace within the last 24 hours.
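One way to apply this 24-hour rule is to remember when each article entered the main namespace (namespace 0) and only treat later edits as relevant while that timestamp is within the window. The sketch below assumes events shaped like the EventStreams payloads described above. `CandidateTracker` is a hypothetical name, and counting any move whose destination is namespace 0 is a simplification of ours.

```python
from datetime import datetime, timedelta

MAIN_NAMESPACE = 0
WINDOW = timedelta(hours=24)


def parse_dt(value: str) -> datetime:
    """Parse an ISO 8601 UTC timestamp such as '2023-08-31T19:49:00Z'."""
    # datetime.fromisoformat only accepts a trailing 'Z' from Python 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


class CandidateTracker:
    """Track articles created in, or moved into, the main namespace."""

    def __init__(self) -> None:
        # (domain, title) -> when the article entered the main namespace
        self.first_seen: dict[tuple[str, str], datetime] = {}

    def observe(self, event: dict) -> None:
        if event["page_namespace"] != MAIN_NAMESPACE:
            return
        stream = event["meta"]["stream"]
        # Simplification: any move whose destination is namespace 0 counts.
        if stream.endswith("page-create") or stream.endswith("page-move"):
            key = (event["meta"]["domain"], event["page_title"])
            # setdefault keeps the earliest time if an article is seen twice.
            self.first_seen.setdefault(key, parse_dt(event["meta"]["dt"]))

    def is_candidate(self, domain: str, title: str, now: datetime) -> bool:
        """True if a revision to this article should be considered."""
        seen = self.first_seen.get((domain, title))
        return seen is not None and now - seen <= WINDOW

    def prune(self, now: datetime) -> None:
        """Forget articles that have aged out of the 24-hour window."""
        self.first_seen = {
            k: t for k, t in self.first_seen.items() if now - t <= WINDOW
        }
```

With this shape, revision-create events are simply checked with `is_candidate` and ignored otherwise, and `prune` keeps memory bounded on a long-running consumer.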

For each event, we fetch the templates and categories associated with the article using the MediaWiki Action API. From these, we keep the ones that may indicate that a) the article is currently being heavily edited, or b) the article concerns a current event. Finally, we save the events, along with the editor information and the relevant templates and categories, into a database.
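The fetch, filter, and save steps above can be sketched as follows. The query uses the standard Action API modules `prop=templates` and `prop=categories`. The name prefixes used for filtering (`in use`, `under construction`, `current`) and the SQLite table layout are hypothetical stand-ins, since the real template and category lists differ per language edition and are not given here.

```python
import json
import sqlite3
from urllib.parse import urlencode

# HYPOTHETICAL name prefixes (lower-case, namespace prefix stripped) for
# a) heavy editing and b) current events; real lists vary per wiki.
HEAVY_EDITING_HINTS = ("in use", "under construction")
CURRENT_EVENT_HINTS = ("current",)


def templates_and_categories_url(domain: str, title: str) -> str:
    """Build an Action API query for one page's templates and categories."""
    params = {
        "action": "query",
        "prop": "templates|categories",
        "titles": title,
        "tllimit": "max",
        "cllimit": "max",
        "format": "json",
        "formatversion": "2",
    }
    return f"https://{domain}/w/api.php?{urlencode(params)}"


def relevant_signals(response: dict) -> list[str]:
    """Return template/category titles whose names hint at breaking news."""
    hints = HEAVY_EDITING_HINTS + CURRENT_EVENT_HINTS
    found = []
    for page in response.get("query", {}).get("pages", []):
        for item in page.get("templates", []) + page.get("categories", []):
            # "Template:Current" -> "current"; prefix match avoids hits
            # such as "Concurrent".
            name = item["title"].split(":", 1)[-1].lower()
            if name.startswith(hints):
                found.append(item["title"])
    return found


def save(db: sqlite3.Connection, domain: str, title: str,
         editor: str, signals: list[str]) -> None:
    """Store one event with its editor and matching signals (schema assumed)."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS events "
        "(domain TEXT, title TEXT, editor TEXT, signals TEXT)"
    )
    db.execute(
        "INSERT INTO events VALUES (?, ?, ?, ?)",
        (domain, title, editor, json.dumps(signals)),
    )
    db.commit()
```

A production client would also follow the API's `continue` parameters for pages with very many templates or categories, and would batch several titles per request.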