Topic on Talk:Wikimedia Enterprise

Summary by JCrespo (WMF)

My question was answered and feedback was provided.

JCrespo (WMF)

Why were external contractors using open source technologies that are not well supported by Wikimedia's stack and know-how, given that there are many open source alternatives that we can support much better? While Redis and Postgres are good options if you are starting a project from 0, and the only options for some uses (e.g. PostGIS), there are technologies that the Foundation knows better, and others that it has worked for years to eliminate from our infrastructure (Redis), mostly to reduce the proliferation of technologies that serve roughly the same use cases.

Why were the contractors not told to use alternative stacks that are well known and well supported by Wikimedia employees, who will eventually have to support the stack anyway?

I know they are open source, and they are great tools. Using something like S3 instead of OpenStack Swift I understand, and it won't be as problematic to change if needed. But Postgres and Redis won't be easy to change in an existing application (as our own years of migration showed, see T212129), and choosing them runs completely opposite to what the rest of the organization has been working toward for years, despite there being obvious alternatives.

Are people aware that we will have to double our staff, services (monitoring, backups, support) and automation every time a new technology (no matter how good it is) is introduced? Were people in SRE, Security, Performance, etc. consulted about this?

If this is a small project, then why was a new technology used, given that any small project has lots of flexibility about its underlying tech? If this is a large project, why was a new technology used, given that it will take lots of effort to migrate away to a supported tech?

RBrounley (WMF)

Thanks for popping on here - you have good points and it’s definitely something we are considering as we expand the project further.

Redis/Postgres were chosen because we were starting from 0 here and building this separate from the WMF stack on a different service. To add some transparency on the decision-making, the external contractors were proficient in Redis/Postgres, which is why we went in that direction - it was quick and efficient to start building. Representatives from SRE, Core Platform, Architecture, and Data Engineering have been in the loop and are key members of our technical decision making as part of our technical committee, which meets every 6 weeks. We also host office hours for WMF staff every two weeks, which you are encouraged to attend, and we have had plenty of good feedback there. As of now, this is still a prototype built around a potential business case that we’re still exploring, and frankly we still maintain a lot of flexibility with what we have built - compared to the scale of MediaWiki, this is still small potatoes.

In terms of staffing, we don’t anticipate doubling the staff to support this - the current plan is to add support staff as appropriate on Wikimedia Enterprise who work with WMF SRE in some capacity (we are still ironing out the exact details with leadership), in the spirit of having this service remain separate from everything else but still in the Wikimedia orbit. But oversight is good and we’ve taken steps to get more of that; I personally welcome as much as possible - if you would like to join in, I’m happy to have your voice around our work - we can discuss more at the next office hours.

JCrespo (WMF)

The documentation stated (or I read between the lines) that "we are using contractors for a first phase". Absolutely no issue with that. The worry is that, at some point, the contractors will go away and, as has happened many times in the past, the employees will have to handle the load. My concern is not "our stack is the best" (it is not! :-D), but "supporting a completely different parallel stack". Things like the alerting workflow, containers, backups, configuration management, tracking system, orchestration, security incident workflow, etc. that exist around the core development are often shared even between separate realms and teams such as wiki production, fundraising, analytics, cloud and office IT. This helps us work better and faster, even if it takes a bit longer to start up, and there is always expertise somewhere to help you be productive.

Given "this is still a prototype", wouldn't it be wise to encourage them to use technologies similar to those of the rest of the organization (even if it is a separate organization) when possible, to reduce the overhead of technology proliferation and long-term maintenance costs? I am not asking them to use PHP, or to stop using S3, or to have a list of preapproved technologies, just to avoid technology and workflow overhead when possible, especially for things such as caching and relational databases, and especially for the things I mentioned above (high-level workflows), before it is too late to change them. That is my only feedback.

In particular, both Redis and Postgres work nicely in small numbers, but they tend not to scale operationally in large numbers (over the long term, with geographic redundancy and upgrade cycles).