Wikimedia Enterprise/Breaking news

Summary
Wikimedia Enterprise's "Breaking News" feature identifies new articles related to large-scale, global "newsworthy" events as they are being written about across Wikipedia language editions. These events are then marked with a boolean field, allowing API users to easily identify this kind of content within their copy of the dataset.

Please send us feedback on this beta feature: https://breakingnews-beta.enterprise.wikimedia.com/ (note that the demo is a desktop-only app and is in beta). The feature is created primarily for Enterprise API users, with community editors in mind. You can help us improve it by testing the demo and sending us the combinations of inputs that retrieved better results. Pointing out any templates we are missing from across language projects would help us capture more results. Using the thumbs up and thumbs down buttons in the demo to confirm or deny that entries are accurately identified as breaking news will help us build a better, more accurate tool in the medium and long term.

It is important to note that this feature does not imply a change of Wikipedia editorial policy, particularly regarding notability and reliable sources, as summarized in the English Wikipedia policy "NOTNEWS".

The feature is currently in its beta phase, continuously evolving to deliver even more accurate and comprehensive results over time.

The Wikimedia Enterprise engineering team relies on community approaches to editing when crafting features. Consistent with the Wikimedia Enterprise principles (in particular that of "no exclusive content"), the information this API feature is built upon is public information that Wikimedia editors already commonly use in content moderation workflows. For example: "does this article have a sudden increase in the number of pageviews, or of unique editors?" or "was this article recently created or moved, and does it have a 'current event' template?". The feature turns that kind of information into a feed of articles which API users can treat differently, if they wish, for example by re-indexing these articles more rapidly, or by pausing re-indexing entirely until the content becomes more stable.
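As a rough illustration of how an API user might treat flagged articles differently, the sketch below chooses between the two strategies mentioned above. The field name `is_breaking_news` and the `choose_action` helper are assumptions made for this example; check the API documentation for the actual name of the boolean field.

```python
from enum import Enum


class Action(Enum):
    REINDEX_NOW = "reindex_now"  # refresh flagged articles more rapidly
    PAUSE = "pause"              # wait until the content becomes more stable
    NORMAL = "normal"            # regular re-indexing schedule


def choose_action(article: dict, prefer_stability: bool) -> Action:
    """Pick a re-indexing strategy for one article record.

    `is_breaking_news` is a hypothetical name for the boolean field.
    """
    if not article.get("is_breaking_news", False):
        return Action.NORMAL
    return Action.PAUSE if prefer_stability else Action.REINDEX_NOW


if __name__ == "__main__":
    flagged = {"name": "Example article", "is_breaking_news": True}
    print(choose_action(flagged, prefer_stability=False))
```

Which of the two strategies is appropriate depends on the reuser: a search index may want fresher content, while a system that values stability may prefer to wait.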

Please contact us if you have recommendations for templates or other pieces of metadata which are used on smaller language projects, or if you know of any editing or reading behaviours which this tool should be aware of. The different editorial policy on each language edition (especially relating to whether events which are still occurring are permitted to have an article) will affect the output of this feature. Using the demo is, in itself, feedback. Please read the section "How to use this demo" to find out how.

We want to be able to capture every new and existing news event being worked on across Wikipedia, no matter how big or small it is, how niche its audience, or in what country or continent it takes place. It’s a huge challenge, especially crafting a feature that works across Wikipedia language projects. Our hope is that with your feedback, we can improve this feature by cutting out false positives and tuning the underlying algorithm to generate better, faster and more accurate results.

The work is heavily based on the research I conducted for our product strategy last year, specifically the section "Sourcing and Source Quality for reusers' success".

We also wish to acknowledge the adage that "Wikipedia is not a newspaper", commonly summarised in English as "NOTNEWS". This feature does not imply a change of Wikipedia editorial policy, particularly the requirements that article topics be of enduring notability and that articles be built upon reliable sources.

— Francisco Navas, Product Manager with the Wikimedia Foundation, Enterprise team

How to use this demo
You can access this feature through the Wikimedia Enterprise (WME) API via Wikimedia Cloud Services.

In the ‘Reaction’ column, you’ll see a red thumbs down and a green thumbs up. If the article in the row relates to a news event, please click the green thumb; if it does not, click the red thumb.

After you vote, you can add a comment as to your reasoning if you'd like.

You can only vote once per row. If you made a mistake, don’t worry.

Use the toggles at the top to switch between results for different language projects, and to set the number of editors and the number of edits at the point that our algorithm surfaces the content.

The displayed results update every day, while the number of editors and the number of edits update every half-hour. We will work to cut the update time for the latter two to around five minutes.

You may notice that this demo does not include pageviews. That’s because they are still delayed by 24 hours, so they are not very useful for diagnosing live events. We’re working on what pageviews can mean for this feature.

Note that this is a desktop-only app.

How to send us feedback
There are many ways to get in touch with us. Any feedback is good feedback.


 * Post questions or answers on our Meta-Wiki talk page.
 * Get in touch with me directly with ideas and thoughts at fnavas@wikimedia.org. I’ll get back to you as soon as possible.
 * Finally, as with all of Enterprise’s work, should you find a bug or see a clear opportunity, please file a ticket via our Phabricator board. We will do our best to respond in a timely manner.

Technical details
Consistent with the Wikimedia Enterprise principles (in particular that of "no exclusive content"), the information this API feature is built upon is public information that Wikimedia editors already commonly use in their content moderation workflows.

The Wikimedia Foundation (WMF) provides continuous streams of events happening on Wikimedia projects as Server-Sent Events, through a service known as EventStreams. These events include article creation, article deletion, article revision and so on, for a given Wikimedia project. The streams relevant to breaking news are page-create, page-move and revision-create. The page-create stream emits an event when an article is created. The page-move stream emits an event whenever an article is moved, for example from one namespace to another. The revision-create stream emits an event when a new revision (edit) is made to an existing article on a Wikimedia project. Each event carries information about the editor performing the action (create, move or edit).
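To make the stream format concrete, here is a minimal sketch of reading such Server-Sent Events and pulling out the editor information. It is not the Enterprise team's implementation. It assumes the public EventStreams endpoint, where the three streams carry a `mediawiki.` prefix, and it assumes the event fields `meta.stream`, `database`, `page_title` and `performer.user_text`. The helper names `parse_sse_events` and `summarize` are made up for this example.

```python
import json
from typing import Iterable, Iterator, Optional

# Assumed public endpoint; several streams can be requested as a comma-separated list.
STREAM_URL = (
    "https://stream.wikimedia.org/v2/stream/"
    "mediawiki.page-create,mediawiki.page-move,mediawiki.revision-create"
)

RELEVANT_STREAMS = frozenset({
    "mediawiki.page-create",
    "mediawiki.page-move",
    "mediawiki.revision-create",
})


def parse_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one decoded JSON payload per Server-Sent Event.

    An SSE message is a run of "field: value" lines ended by a blank line;
    the payload is the concatenation of its "data:" lines.
    """
    data_parts = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            value = line[len("data:"):]
            # A single space after the colon is part of the framing, not the payload.
            data_parts.append(value[1:] if value.startswith(" ") else value)
        elif line == "":
            if data_parts:
                payload = "\n".join(data_parts)
                data_parts = []
                try:
                    event = json.loads(payload)
                except json.JSONDecodeError:
                    continue  # skip malformed payloads
                yield event
        # Other fields ("event:", "id:") and comment lines (":...") are ignored.


def summarize(event: dict) -> Optional[dict]:
    """Return the stream, wiki, title and editor, or None for other streams."""
    stream = event.get("meta", {}).get("stream")
    if stream not in RELEVANT_STREAMS:
        return None
    return {
        "stream": stream,
        "wiki": event.get("database"),
        "title": event.get("page_title"),
        "editor": event.get("performer", {}).get("user_text"),
    }
```

To run this against the live service, the lines of a streaming HTTP response for `STREAM_URL` could be passed to `parse_sse_events`; any HTTP client that can iterate over response lines should work.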

The current implementation only considers articles that were either created in, or moved to, the main namespace within the last 24-hour period.
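One way to picture this 24-hour rule is a small in-memory tracker, sketched below. It is an illustration under assumptions, not the production logic: it assumes events expose `meta.dt` (an ISO 8601 timestamp), `database`, `page_id`, `page_namespace` and, for moves, `prior_state.page_namespace`. The `RecentArticles` class is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Tuple

MAIN_NAMESPACE = 0           # article namespace
WINDOW = timedelta(hours=24)


def parse_timestamp(value: str) -> datetime:
    # Older Python versions reject a trailing "Z" in fromisoformat.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


class RecentArticles:
    """Remember when each article was created in, or moved into, the main namespace."""

    def __init__(self) -> None:
        self._entered: Dict[Tuple[str, int], datetime] = {}

    @staticmethod
    def _key(event: dict) -> Tuple[str, int]:
        return (event["database"], event["page_id"])

    def observe(self, event: dict) -> None:
        """Record page-create and page-move events that land in the main namespace."""
        if event.get("page_namespace") != MAIN_NAMESPACE:
            return
        stream = event["meta"]["stream"]
        if stream.endswith("page-create"):
            entered = True
        elif stream.endswith("page-move"):
            # A rename inside the main namespace does not count as entering it.
            # If prior_state is missing, the move is treated as entering.
            prior_ns = event.get("prior_state", {}).get("page_namespace")
            entered = prior_ns != MAIN_NAMESPACE
        else:
            entered = False
        if entered:
            self._entered[self._key(event)] = parse_timestamp(event["meta"]["dt"])

    def is_in_window(self, event: dict, now: datetime) -> bool:
        """True if the article entered the main namespace within the last 24 hours."""
        entered_at = self._entered.get(self._key(event))
        return entered_at is not None and now - entered_at <= WINDOW


if __name__ == "__main__":
    tracker = RecentArticles()
    tracker.observe({
        "meta": {"stream": "mediawiki.page-create", "dt": "2024-01-01T00:00:00Z"},
        "database": "enwiki", "page_id": 1, "page_namespace": 0,
    })
    edit = {"database": "enwiki", "page_id": 1}
    print(tracker.is_in_window(edit, datetime(2024, 1, 1, 12, tzinfo=timezone.utc)))
```

A real service would also need to expire old entries and persist them; both are omitted here to keep the sketch short.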

For each event, we fetch the templates and categories associated with the article using the MediaWiki Action API. From all the associated templates and categories, we keep the ones that may indicate that: a) the article is currently being heavily edited; or b) the article concerns a current event. Finally, we save the events, along with the editor information and the relevant templates and categories, into a database.
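The fetch-and-filter step can be sketched as follows, assuming the Action API `query` module with `prop=templates|categories` and `formatversion=2`. The signal lists are illustrative English Wikipedia examples chosen for this sketch, not the set the feature actually uses, and `build_query_params` and `extract_signals` are hypothetical helpers.

```python
from typing import Dict, List

# Illustrative signals only; the real list differs per language project.
SIGNAL_TEMPLATES = frozenset({
    "Template:Current",             # article concerns a current event
    "Template:In use",              # article is being heavily edited
    "Template:Under construction",  # article is being heavily edited
})
SIGNAL_CATEGORIES = frozenset({
    "Category:Current events",
})


def build_query_params(title: str) -> Dict[str, str]:
    """Parameters for a GET request to a wiki's /w/api.php endpoint."""
    return {
        "action": "query",
        "format": "json",
        "formatversion": "2",
        "prop": "templates|categories",
        "titles": title,
        "tllimit": "max",
        "cllimit": "max",
    }


def extract_signals(response: dict) -> Dict[str, List[str]]:
    """Keep only the templates and categories that appear in the signal lists."""
    templates: List[str] = []
    categories: List[str] = []
    for page in response.get("query", {}).get("pages", []):
        templates += [
            t["title"] for t in page.get("templates", [])
            if t["title"] in SIGNAL_TEMPLATES
        ]
        categories += [
            c["title"] for c in page.get("categories", [])
            if c["title"] in SIGNAL_CATEGORIES
        ]
    return {"templates": templates, "categories": categories}


if __name__ == "__main__":
    sample_response = {
        "query": {"pages": [{
            "pageid": 1, "ns": 0, "title": "Example",
            "templates": [
                {"ns": 10, "title": "Template:Current"},
                {"ns": 10, "title": "Template:Infobox event"},
            ],
            "categories": [{"ns": 14, "title": "Category:Current events"}],
        }]}
    }
    print(extract_signals(sample_response))
```

The filtered result, together with the editor information from the event, is the kind of record that would then be written to the database.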