Extension:Graph/Plans

This page is a place for WMF staff and volunteers to gather the information the WMF needs to decide what role it will play in ensuring that the needs the Graph Extension was created to serve continue to be met.

Decisions to be made
The Open questions listed below are meant to surface the information the WMF will need to make the following decisions:


 * 1) What needs will any proposed path forward need to meet?
 * 2) What role will the WMF play in ensuring these needs are met?

Proposal
To safely restore the information and capabilities people lost when the Graph Extension was disabled, and to foster the volunteer and staff collaboration this requires, we (the WMF) are committing to:

Re-enable the Graph Extension in a sandboxed iframe with a restrictive Content Security Policy (CSP).
 * 1) Once the Graph Extension is re-enabled, it will continue to work with Vega 2 for a yet-to-be-defined period of time. Note: we'll need to define this window together.
 * 2) After that period ends, Vega 2 support will be discontinued and using the Graph Extension will require volunteers to make graphs with Vega 5.
 * 3) As soon as possible, make the sandboxed Graph extension available on the beta cluster for testing. See: T346292.
 * 4) Investigate the viability of adding logging to increase our awareness of instances where people are exploiting the security vulnerabilities inherent in restoring support for Vega on our platform. See T346414.
 * 5) Publish the technical documentation developers across the Movement need to understand how we implemented the sandboxed iframe and CSP approach.
 * 6) Publish a clear timeline for when you all can expect all of the above to happen.
 * 7) Note: exploratory work to redeploy the Graph Extension in a sandboxed iframe has started. See T222807.
 * 8) Share regular updates about the progress we're making on the commitments named above on Phabricator and MediaWiki.
 * 9) Support volunteers with code and processes that will ease the transition from Vega 2 to Vega 5 when the time for this transition comes.
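To make the "sandboxed iframe with a restrictive CSP" idea concrete, here is a minimal sketch, not the WMF implementation. It assumes the graph is rendered in a separate frame document that is served with its own `Content-Security-Policy` header. The directive values, the frame URL, and the helper names (`build_csp`, `sandboxed_iframe`, `frame_response_headers`) are hypothetical; the real parameters are still to be decided in T346291.

```python
from html import escape

# Hypothetical policy: the real directives and allowed origins are still undecided.
# An empty source list is serialized as 'none' (nothing allowed for that directive).
DEFAULT_DIRECTIVES = {
    "default-src": ["'none'"],
    "script-src": ["'self'"],      # only the bundled renderer/Vega scripts
    "style-src": ["'self'"],
    "img-src": ["'self'", "data:"],
    "connect-src": [],             # no live data fetches from inside the frame
}


def build_csp(directives):
    """Serialize a directive -> sources mapping into a CSP header value."""
    parts = []
    for name, sources in directives.items():
        value = " ".join(sources) if sources else "'none'"
        parts.append(f"{name} {value}")
    return "; ".join(parts)


def frame_response_headers(directives=DEFAULT_DIRECTIVES):
    """Headers the (hypothetical) frame endpoint would send with the graph document."""
    return {"Content-Security-Policy": build_csp(directives)}


def sandboxed_iframe(frame_url, width, height):
    """Embed markup for the graph frame.

    "allow-scripts" lets Vega run; leaving out "allow-same-origin" gives the frame
    an opaque origin, so its scripts cannot read the wiki's cookies or DOM.
    """
    return (
        f'<iframe sandbox="allow-scripts" src="{escape(frame_url)}" '
        f'width="{width}" height="{height}"></iframe>'
    )
```

With these defaults, `build_csp(DEFAULT_DIRECTIVES)` yields `default-src 'none'; script-src 'self'; style-src 'self'; img-src 'self' data:; connect-src 'none'`. Starting from `default-src 'none'` and allowing sources one by one keeps the policy deny-by-default, which is why any live-data pseudo-protocols would need an explicit `connect-src` decision.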

In support of the above, we would need to depend on y'all (volunteers) to:


 * 1) Spread awareness of this proposal and the updates that will come as we start implementation, assuming this proposal moves forward.
 * 2) Manually migrate some proportion of Vega 2-based graphs to be compatible with Vega 5. See the "Vega 2 → Vega 5 transition" section below.
 * 3) Potentially, fix/port graphs that attempted to fetch live data using methods that the sandboxing approach inhibits.
 * 4) Note: the extent of the above will become clear once we decide whether to restore the pseudo-protocols that were used to fetch live data from the Action API, the REST API, WDQS, etc., and which precise sandbox parameters (allowed domains, ports, and HTTP methods) we select. This decision will be made in T346291.
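Until T346291 is decided, one way to estimate how many graphs fall under item 3 is to scan stored graph specifications for data URLs. The sketch below is a hypothetical helper, not an existing tool. The set of pseudo-protocol names is an assumption based on the Extension:Graph documentation and should be checked against it; the sketch only classifies URLs and says nothing about which ones the sandbox will allow.

```python
import json
from urllib.parse import urlsplit

# Assumed list of Graph extension pseudo-protocols; verify before relying on it.
PSEUDO_PROTOCOLS = {"wikiraw", "wikiapi", "wikirest", "wikidatasparql", "tabular", "map"}


def find_data_urls(node, found=None):
    """Recursively collect every string stored under a "url" key in a Vega spec."""
    if found is None:
        found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "url" and isinstance(value, str):
                found.append(value)
            else:
                find_data_urls(value, found)
    elif isinstance(node, list):
        for item in node:
            find_data_urls(item, found)
    return found


def classify_spec(spec_json):
    """Split a spec's data URLs into pseudo-protocol, external, and other URLs."""
    result = {"pseudo": [], "external": [], "other": []}
    for url in find_data_urls(json.loads(spec_json)):
        parts = urlsplit(url)
        if parts.scheme in PSEUDO_PROTOCOLS:
            result["pseudo"].append(url)
        elif parts.scheme in ("http", "https") or (not parts.scheme and parts.netloc):
            result["external"].append(url)  # includes protocol-relative //host/...
        else:
            result["other"].append(url)
    return result
```

Graphs with a non-empty `pseudo` or `external` list are the ones whose behaviour depends on the outcome of T346291; graphs with inline `values` only should be unaffected by the fetch restrictions.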

Vega 2 → Vega 5 Transition

 * Why do we think it's worthwhile to migrate from Vega 2 to Vega 5?
 * Vega 2 has been superseded by Vega 3, 4, then 5 upstream.  Upstream and third-party documentation exclusively refers to syntax in “Vega version 3.0 and later”, and it is difficult for new contributors to find documentation relevant to Vega 2.  The last upstream release (bugfix or security) of Vega 2.x was in January 2017.  Vega 5 was released in March 2019 and is still under active maintenance and development, with the latest 5.25.0 release in April 2023.
 * Volunteers have reported issues with Vega 2's accessibility, syntax, and overall functionality, per this 2023 wish.
 * Vega 5 has made improvements to the library's expression layer that harden it from a security perspective compared to Vega 2.  It is not perfect, but by introducing a parsed expression grammar it offers a more robust foundation for additional security hardening in the future if it proves necessary.
 * Maintaining multiple versions of Vega concurrently is unsustainable in the long run. The wiki community is taxed by trying to independently support software that is no longer maintained upstream. Our efforts are best spent cooperating with upstream and third-party developers, and to do this we need to work from the upstream Vega 5 code base.
 * What might be required to migrate graphs from Vega 2 to Vega 5?
 * Create a converter that would migrate Vega 2-based graphs to be compatible with Vega 5. @Jdlrobson started work on an initial approach in T335048#8794138. The initial work needs to be restructured slightly to refocus it on being an aid to manual porting, instead of the automatic translator that was its original goal. Note: we estimate this converter currently works for ~80% of graphs, with diminishing returns on additional engineering effort to cover more. We do not plan to continue investing significant engineering resources here; instead, we will simply repurpose the existing codebase as an aid to manual porting.
 * Volunteers would need to update syntax on a case-by-case basis, aided by (1) the ability to run the existing Vega 2 and new Vega 5 specifications side by side, (2) the partial Vega 2-to-5 porting tool, which handles 80% of the “obvious” keyword changes and other mechanical conversions, (3) [https://vega.github.io/vega/docs/porting-guide/ the upstream Vega 2 porting guide], and (4) additional documentation or tools the wiki community might create.
 * Update the limited number of Scribunto templates on-wiki that generate output in Vega 2 format so that they output Vega 5 instead. This requires both Lua and Vega expertise, but fixes a larger number of Vega 2 uses on-wiki at once.
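To illustrate what a "partial porting aid" can and cannot do, here is a minimal sketch. It is not the converter from T335048. It applies only a handful of mechanical Vega 2 to Vega 3+ renames as we understand them from the upstream porting guide (mark `properties` to `encode`, axis `type` to `orient`, formula `field` to `as`) and, rather than guessing, lists the changes that need a human (ordinal scales, signal `streams`, text templates). The function names are hypothetical.

```python
import copy

VEGA5_SCHEMA = "https://vega.github.io/schema/vega/v5.json"
# Vega 2 axes used "type": "x"/"y"; Vega 3+ axes use "orient" instead.
AXIS_ORIENT = {"x": "bottom", "y": "left"}


def _port_marks(marks, notes, path="marks"):
    """Rename mark "properties" to "encode", recursing into group marks."""
    for i, mark in enumerate(marks):
        where = f"{path}[{i}]"
        if "properties" in mark:
            # Vega 2 "properties" (enter/update/hover sets) became "encode" in Vega 3+.
            mark["encode"] = mark.pop("properties")
        for block in mark.get("encode", {}).values():
            text = block.get("text") if isinstance(block, dict) else None
            if isinstance(text, dict) and "template" in text:
                notes.append(f"{where}: text 'template' must be rewritten as a signal expression")
        _port_marks(mark.get("marks", []), notes, where + ".marks")


def port_vega2_to_vega5(spec):
    """Return (ported copy of spec, notes listing what still needs manual porting)."""
    out = copy.deepcopy(spec)  # never modify the original spec
    notes = []
    out["$schema"] = VEGA5_SCHEMA

    for i, axis in enumerate(out.get("axes", [])):
        axis_type = axis.pop("type", None)
        if axis_type in AXIS_ORIENT:
            axis.setdefault("orient", AXIS_ORIENT[axis_type])  # keep an explicit orient
        elif axis_type is not None:
            notes.append(f"axes[{i}]: unrecognized axis type {axis_type!r}")

    for d in out.get("data", []):
        for t in d.get("transform", []):
            if t.get("type") == "formula" and "field" in t:
                t["as"] = t.pop("field")  # formula output field: "field" -> "as"

    for i, scale in enumerate(out.get("scales", [])):
        if scale.get("type") == "ordinal":
            notes.append(f"scales[{i}]: Vega 2 'ordinal' may need to become 'band' or 'point'")

    for i, signal in enumerate(out.get("signals", [])):
        if "streams" in signal:
            notes.append(f"signals[{i}]: 'streams' must be rewritten as 'on' event handlers")

    _port_marks(out.get("marks", []), notes)
    return out, notes
```

Returning notes alongside the partially ported spec matches the "aid to manual porting" goal: the mechanical part is done automatically, and each remaining note points a volunteer at the exact place in the spec that still needs attention.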

Research
In order to make the Decisions to be made above, we think we need to answer the questions listed below. We anticipate this list will evolve over time as new information emerges.

=== 1) Who are the people (e.g., WMF teams, volunteers, wikis, etc.) that depended on the Graph Extension? In what way(s) had people been using the Graph Extension? ===

 * 1) An (old, but I doubt much has changed) analysis of this is at User:Bawolff/Reflections on graphs
 * 2) Volunteers
 * 3) Generate infographics to present data to readers in a variety of forms, e.g. bar charts, stacked graphs, pie charts, scatter plots, timelines, histograms, geographic maps.
 * 4) See Category:Pages with disabled graphs for the range of pages/contexts where the Graph Extension was used
 * 5) Generate graphs on talk pages to show article and user page views over time.
 * 6) Use the Graph extension to generate maps with more features than the Kartographer extension provides.
 * 7) Generate page view graphs on ?action=info as part of Extension:PageViewInfo
 * 8) Generate interactive COVID-19 maps
 * 9) Pamputt: Generate up-to-date graphs from Wikidata data. The only workaround is to create an image by hand, upload it to Commons, and redo it each time the values are updated.
 * 10) WMF Staff
 * 11) PPelberg (WMF): While WMF teams do not yet seem to depend on the Graph Extension, teams' longer-term plans/strategies do depend on a future wherein functionality exists on-wiki to: 1) ingest data stored off-wiki and 2) "transform" it into infographics/visualizations that people can explore/interact with on-wiki.
 * 12) "1)" and "2)" named above are strategically important to the Movement being able to evaluate the impact of the work it's doing and monitor its health. At present, storing data and generating "artifacts" that enable people to interact and explore this data happens off-wiki and takes a great deal of time and technical expertise.

=== 2) What is no longer possible as a result of the Graph Extension being disabled? Asked another way: what capabilities did the Graph Extension provide for which you have not yet found a viable workaround? ===
 * 1) Nux: Generating page view charts for popular pages, e.g. the page views template on pl.wiki
 * 2) Nux: Generating charts based on Wikidata. E.g. en:Template:Airport-Statistics (used by multiple other languages).
 * 3) TheDJ: Generate page view graphs on ?action=info.
 * 4) Pamputt: Generate up-to-date graphs from Wikidata data.
 * 5) Sj: Viewing graphs already generated by this extension, which illustrate tens of thousands of pages and articles; fixing them would require a mass migration.
 * 6) colt_browning: Generating population history graphs using ru:Template:Население from sophisticated in-wiki census data tables.

=== 3) What did you notice yourself (and other people) using the Graph Extension to do that you hadn't been doing before? ===

 * 1) Ahecht: Treating charts as collaborative content equivalent to the rest of a Wikipedia article. Editors are reluctant to tweak a chart when it requires downloading the source data, recreating a similar chart from scratch in external software, converting it to an image, and uploading it over another editor's image; therefore each chart would only be edited by its initial creator. +1
 * 2) Sj: Being able to inspect individual data points in a graph (rather than guessing by zooming in), and to get readable diffs of updates

=== 4) What – if anything – did the Graph Extension make it easier/more convenient to do? If you can, please describe what each "task" you used the Graph Extension for looked like before and after it existed. ===
 * 1) RobinLeicester: As with the charts, improving/adding to an annotated map, either your own or someone else's, became much more like proper Wikipedia editing, compared to uploading a 'finished' item to Commons. With such limited options within maplink, external graphics programs would have to be used and the result is then fossilised on the page, or never attempted in the first place. Also, placing annotations using 'coords' on a reliable base map makes them more verifiable and correctable.

=== 5) What workarounds have you developed and/or seen other volunteers develop in the time between now and when the Graph Extension was disabled? ===

 * 1) 86.122.161.172: Nothing; there are just too many pages, and too much effort went into developing the existing graphs to replicate them all at once on smaller wikis. I really hope that if another solution is provided, an automatic converter will also exist.
 * +1, a lot of pages just look broken now.
 * 1) Nux: Moved a bar chart from Graph to Timeline. This kind of works, but Timeline is rather static (no tooltips on hover) and its fonts are less readable.
 * TheDJ: EasyTimeline itself is also no longer receiving updates and depends on Perl. I think we should consider that this too will require replacement at some point!!
 * 1) Nux: Added a generic pie chart with 2 values, pl:Szablon:Wykres Smooth Pie 2. This is a fully functional replacement for pl:Szablon:Demografia_powiatu, though it doesn't look the same (especially the chart labels could be better), and it's not easy to add more than 2 values to this pie chart.
 * 2) Edu!: At Portuguese Wikinews, the Graph 2 module was developed, which uses CSS to generate line, area and column graphs in the Gráfico template. The module does not have limitations that exist in alternative templates used in some Wikipedia languages. The community is working on improvements.
 * 3) TheDJ: The limitation of CSS based graphs is that they generally are not that accessible. There are problems with color inversion, screenreaders etc. This particular example is even worse, as it uses HTML tables to do layout (there is no reason to use tables).
 * 4) TheDJ: en:Template:OSM Location map was switched from the Graph mode to pure Kartographer. This made it lose some features (custom pogs, minimap, custom labels, etc.). See also comments by RobinLeicester.
 * 5) User:Snævar: Created is:Module:Flugvallar tölfræði to replace Template:Airport-statistics using Wikibase Lua access and Easytimeline.
 * Great but excuse me, isn't there a pb on the years at, for instance, https://is.wikipedia.org/wiki/Reykjav%C3%ADkurflugv%C3%B6llur given the max year was 2007 with 471372 pax ?Bouzinac (talk) 12:38, 10 August 2023 (UTC)
 * Use whole words instead of pb and pax, you are not making yourself clear.--Snævar (talk) 23:45, 8 September 2023 (UTC)
 * 1) colt_browning: Using bar chart instead of a line graph. Unfortunately, the former is significantly inferior for visualizing the dynamics with unevenly distributed data points.

=== 6) What tools (on- and off-wiki)/templates/gadgets/etc. are you currently using to create data visualizations for Wikipedia? ===

 * 1) Nux: I know en.wiki is linking to Wikidata for some stats... But the UX of having to look at a huge SPARQL query on WikidataQueryService is terrible (see the Airport-Statistics template)...
 * Lightly fixed with +"embed.html" on the link, but I agree, the direct graph is far more desirable. Bouzinac (talk) 12:52, 10 August 2023 (UTC)
 * 1) There's always the unmaintained Extension:pChart4mw

=== 7) Why do you think it is or is not strategically important for the Wikimedia Movement to offer on-wiki tools for storing, editing, and visualizing data? ===

 * 1) Nux: A picture is worth a thousand words. And a dynamic chart with context is even more precious...

=== 8) What requests (e.g. wishes) have volunteers made over time to improve support for storing and representing/visualizing data on-wiki? ===

 * 1) T195627: Support Vega 3.0 in Graphoid
 * 2) T195628: Support Vega Lite 2.0 in Graphoid
 * 3) T100444: Graph localization support
 * 4) T165118: Support Vega 5.0+
 * 5) meta:Improve graphs and interactive content
 * 6) Many were translated into Graph feature requests + tasks. (Generate graph from wikitable;
 * 7) T236892#6135375: Don’t break the Compatibility promise by not showing anything to non-JS users

=== 9) What – if anything – could be done to safely reenable the Graph Extension? ===

 * 1) Some proposals have included the following (these may or may not be sufficient, and different people have different opinions):
 * 2) T222807 - Use browser based iframe sandboxing
 * 3) PPelberg (WMF): per what Tgr shared in T222807#8858868, and what SBassett (WMF), Tgr, and I discussed offline on 20 July 2023, an iframe sandbox enhanced with a Content Security Policy seems potentially viable.
 * 4) Render server side (or render client side with some sanitization layer) to allow static but not interactive graphs
 * 5) T336595 - Make editing graphs be restricted similar to MediaWiki:Common.js
 * 6) Bawolff: The elephant in the room is that Graph was a high-maintenance extension even before this, and on the whole it is not that widely used; where it is used, it is mostly used in a fairly simple fashion. Given its usage numbers and how it is used, it is unclear that further investment is worth it.
 * 7) Bawolff: It was pointed out on phab that vega has not fixed the underlying security issue despite presumably having been aware of it for a while now. In order to be deployed again we would want both the known security issue fixed and some assurance there won't be more or have some sandboxing to lower the impact of any currently unknown issues. If Vega is not fixing it in a timely fashion, WMF could perhaps contribute fixes themselves, but being required to take that on is a very large red flag for the overall security of the library.
 * 8) T336544: Codex, Graph, and Wikistats date

=== 10) Of the pages that display graphs/infographics, what proportion of them depended on the Graph Extension? ===
TheDJ: any sort of listing will be hard to trust, as people have already started to migrate away in various ways. It would be good if we could find some numbers from before the extension was disabled. When it was initially disabled, I mentioned that usage was "about 60 000 [pages] for en.wp, or just 16k articles". However, I'm not sure where I pulled those numbers from.

PPelberg (WMF): also see User:Bawolff/Reflections on graphs via SBassett (WMF).
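As a rough, present-day complement to those historical figures, pages still in the tracking category can be counted per namespace (namespace 0 being articles), which separates "pages" from "articles" the way TheDJ's numbers do. This is a minimal sketch using the MediaWiki Action API's `list=categorymembers` module with standard continuation; the category name is taken from question 1 above and may be localized on other wikis. As TheDJ notes, pages that have already been migrated will no longer appear, so the result is a lower bound. The `fetch` parameter is a hypothetical hook that lets the function be tested without network access.

```python
import json
from collections import Counter
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API = "https://en.wikipedia.org/w/api.php"


def http_fetch(params):
    """Call the Action API; Wikimedia asks clients to send a descriptive User-Agent."""
    req = Request(f"{API}?{urlencode(params)}",
                  headers={"User-Agent": "graph-usage-count-sketch/0.1 (example)"})
    with urlopen(req) as resp:
        return json.load(resp)


def count_category_members(category, fetch=http_fetch):
    """Count members of a category per namespace ID, following API continuation."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmprop": "title",   # the title prop also returns the namespace ID ("ns")
        "cmlimit": "max",
        "format": "json",
    }
    counts = Counter()
    cont = {}
    while True:
        data = fetch({**params, **cont})
        for member in data["query"]["categorymembers"]:
            counts[member["ns"]] += 1
        if "continue" not in data:
            return counts
        cont = data["continue"]  # merge continuation values into the next request
```

For example, `count_category_members("Category:Pages with disabled graphs")[0]` would give the number of articles, and `sum(...values())` the number of pages of all kinds.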

Background
The Graph extension uses the older Vega 1 & Vega 2 libraries, which had a number of security vulnerabilities.

In the interest of the security of the people who use Wikipedia, the Graph extension was disabled on all Wikimedia wikis in April 2023.