Extension:Graph/Plans

From mediawiki.org

Update April 2024

Hello everyone – I’m Marshall Miller; I’m a Senior Director of Product at WMF working with the product managers and teams that focus on the user experience of reading and editing the wikis.  Thank you all for being part of this ongoing conversation and being patient during the frustrating outage of the Graph extension.  I gave the last update about graphs here and on wikimedia-l. I have since talked to many volunteers about their experiences and needs with graphs, and gathered a group of staff members to propose a plan.  I am back with a proposed plan for your feedback and input. I'm posting here on the project page instead of the talk page so that this update can be marked for translation to other languages. There is a new header on the talk page for discussion.

Summary

In short, we at the Wikimedia Foundation propose moving forward with an approach that many community members have suggested: building a new service to replace the Graph extension. This approach will enable editors to create basic visualizations, will require coordination with communities around migrating existing graphs, and will be extensible by developers who want to build and maintain additional functionality.

We’ve needed some time to consider all of the architectural questions and how to resource this work, and now we want to hear from volunteers on whether this sounds like the right approach. This work will be led by Chris Ciufo, the product manager with the Design System team.  You can expect to hear from him going forward.  There’s more information below for those who want to see the details and considerations for this approach.

Since this work hasn’t started yet, there are still several months to go before the new graphs are operational.  We’ll be getting the right engineers involved and starting on the architecture over the coming weeks, making sure we have a strong plan and are ready to iterate on it, and then likely starting this work in July as staff members become available from their previous projects.  We do not know yet how long it will take before the first types of graphs are operational.  We’re happy to discuss ideas that community members have about what, if anything, to do about the graphs continuing to be unavailable during these upcoming months.

Rationale

Chris and I are proposing this approach based on looking at how people have used graphs in the past, how we think they will use them in the future, and considerations on making sure our technology will be secure, scalable, and maintainable going forward.

In looking at how people have used graphs in the past, we see that graphs are a valuable but not overwhelmingly common tool in the wikis. In English Wikipedia, graphs are used on about 10,000 articles, which is 0.15% of all articles, and across all Wikipedias, they are used on about 178,000 articles, which is 0.28% of all articles.  Outside the main namespace, graphs are used more often, frequently because they are part of templates that are displayed heavily.  For instance, in Arabic Wikipedia, there was a pageviews graph on every Article Talk page (until they were recently removed).  Importantly, we’ve noticed that the large majority of graphs are relatively simple (bar, line, pie, etc.) and use data inline in the wikitext or in the Data namespace on Commons.  The resourcing for graphs should match this moderate usage – sufficient support, but not for complex functionality that isn’t widely used.

Technical discussion

The functionality of the new extension would be more limited than that of the old one, especially in that it won’t support all the visualization types and data sources of the old extension, but this approach represents a fresh start toward a more sustainable future with graphs.

In terms of security, scalability, and maintainability, we decided in December that there was not a viable way to fix and continue with the legacy Graph extension.  Among other options, we attempted upgrading to Vega 5 (only to continue to find the same security issues), and we tried wrapping the Vega canvas in a sandboxed iframe (which caused significant performance issues).  This meant that a path forward for graphs would require a new extension.

Here is the brief outline of the approach we’re thinking about:

  • The legacy Graph extension would be sunset.
  • The Foundation would build a new parser tag extension that supports a limited set of predetermined visualization types, like basic charts and maps, that cover the majority of existing use cases. Editors would specify these in wikitext, and they would be displayed as static images on wiki pages.
  • Rendering server-side would avoid known or substantial security risks, such as those in the legacy Graph extension.
  • We do not know yet which visualization library or libraries it would use, whether Vega, d3 (which powers Vega), something like Our World in Data-Grapher, or something else.
  • The new extension would support graph definition data specified inline or through Commons tabular data (in the Data: namespace), as was supported by the Graph extension. We would try to offer assistance to migrate legacy graphs using these data sources.
  • It would be able to be extended with new visualization types by staff or volunteer developers through a controlled, centralized, and code-reviewable process.
  • It would be able to be extended to draw data from other sources, such as Wikidata, which it won’t be built to do at the outset.
  • It would display graphs on the Wikipedia iOS and Android apps (this was not possible with the Graph extension after Graphoid was decommissioned).
  • It would be officially maintained by WMF to address bugs.
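
To make the outline above more concrete, here is a minimal sketch of what "tabular data in, static image out" could look like. It is only an illustration under stated assumptions, not the planned implementation: the function name `render_bar_chart_svg`, its parameters, and the choice of a hand-written SVG bar chart are all hypothetical, and the input dictionary only loosely mirrors the `schema.fields` and `data` layout of Commons tabular data. No visualization library has been chosen yet.

```python
from html import escape


def render_bar_chart_svg(tabular, label_field, value_field,
                         width=400, bar_height=20, gap=6):
    """Render a static horizontal bar chart as an SVG string.

    Hypothetical helper: `tabular` is assumed to be a dict with
    `schema.fields` (a list of {"name": ...}) and `data` (a list of rows).
    """
    names = [field["name"] for field in tabular["schema"]["fields"]]
    label_idx = names.index(label_field)
    value_idx = names.index(value_field)
    rows = [(str(row[label_idx]), float(row[value_idx]))
            for row in tabular["data"]]
    if not rows:
        raise ValueError("no data rows to plot")

    # Scale bars against the largest non-negative value (avoid dividing by 0).
    max_value = max(max(value for _, value in rows), 0.0) or 1.0
    label_width = 120
    plot_width = width - label_width
    height = len(rows) * (bar_height + gap) + gap

    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" '
             f'width="{width}" height="{height}" role="img">']
    for i, (label, value) in enumerate(rows):
        y = gap + i * (bar_height + gap)
        bar = max(value, 0.0) / max_value * plot_width
        # Labels are escaped so editor-supplied text cannot inject markup.
        parts.append(f'<text x="0" y="{y + bar_height - 5}" '
                     f'font-size="12">{escape(label)}</text>')
        parts.append(f'<rect x="{label_width}" y="{y}" width="{bar:.1f}" '
                     f'height="{bar_height}" fill="#36c"/>')
    parts.append('</svg>')
    return "\n".join(parts)


# Made-up example data, including a hostile label to show the escaping.
example = {
    "schema": {"fields": [{"name": "year", "type": "string"},
                          {"name": "edits", "type": "number"}]},
    "data": [["2022", 120], ["2023", 80], ["<script>", 40]],
}
svg = render_bar_chart_svg(example, "year", "edits")
```

The output is a fixed piece of markup produced entirely on the server, with no script executed in the reader's browser, which is the property the outline relies on. A real extension would also need caching, accessibility text, and a reviewed set of chart types.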

In the many conversations around graphs, volunteers have also raised longer-term questions about “interactive content”, such as timelines and 3D objects. Rebuilding the capability to serve simple graphs securely will be a large amount of work for staff and volunteers. As part of this, the new extension will be readily extensible by volunteers who have the technical skill to add more sophisticated visualizations and more data sources. This may be an open door to some kinds of interactive content, but the larger topic of interactive content is worthy of separate, continued conversations moving forward.

Moving forward

Moving forward, we want to hear your thoughts on this approach:

  • Does this seem like the right way to proceed?
  • What are the basic visualization types that are most important to support? Which ones can we do without?
  • Which use cases are you concerned about being missed?
  • How will communities need to participate or react to these changes?

As we discuss, there are many important questions to sort through. One that is top of mind for me is what will happen with the ecosystem of templates and data sources that has been built around the Graph extension over the last ten years. While we want to make it easy for many of the existing graph specifications to work in the new system, we will need to think through this together.

Thank you for reading this long update and for continuing to be part of this effort. I know many of you have spent a lot of time over the past months discussing graphs and building workarounds. We’re looking forward to continuing the work.

Discussion for this update

Previous technical proposals

The previous technical proposals can be seen at this archive link. Unfortunately, our research found that there were security and/or performance problems with these proposals. The update above, and the related discussion on the talk page, have details on the newer proposal.