Talk:Wikimedia Engineering/2012-13 Goals

"integration of collaborative editing work from GSoC student, if applicable"

awesome! What is this brave student's name? Is there a page for this project yet?

Adamw (talk) 23:12, 25 March 2012 (UTC)
 * Ashish Dubey -- see User:Dash1291/GSoC 2012 Application, which I'll link to. Thanks. Sumana Harihareswara, Wikimedia Foundation Volunteer Development Coordinator (talk) 12:40, 26 March 2012 (UTC)

IPv6
See en:Wikipedia:WikiProject IPv6. I think IPv6 is an important goal here. It sure wouldn't be good for the WMF's PR if we miss World IPv6 Launch. Jasper Deng (talk) 03:43, 27 March 2012 (UTC)
 * But World IPv6 Day is 6 June 2012, which is before 1 July 2012 (the start of the 2012-2013 fiscal year). So you wouldn't be seeing that in these plans, which are for the 2012-2013 fiscal year. Sumana Harihareswara, Wikimedia Foundation Volunteer Development Coordinator (talk) 12:07, 27 March 2012 (UTC)
 * But let's say that we miss it. Surely IPv6 will still require lots of activity after the launch. Jasper Deng (talk) 19:29, 27 March 2012 (UTC)

previous goals?
I know about the page "Roadmap" and the strategy wiki's "product whitepaper", but is there anything like a "2011-2012 Goals" page, so we can see how past planning/prioritisation efforts have gone and assess the scale of what's actually achievable in a 12-month period? I think all of the things that are listed here are great and would be brilliant if we had them, but is it feasible that even a majority of them can actually be delivered on time? Looking at the "big picture" section of this page, there are 7 key areas, the first one of which is a massive undertaking just by itself [modernise the editor]. Don't you think that trying to do that in parallel with all the other things is likely to be a classic case of "biting off more than we can chew"? I'd love to see the budget approved to achieve all this, but I'd hate for it to look like a plan that is over-promising and likely to under-deliver. Wittylama (talk) 05:36, 2 April 2012 (UTC)


 * Hi Liam, all good questions. The best public record of the 2011-12 engineering objectives is the high-level recap in the Annual Plan. We did develop a detailed goals document last year as well; however, we did not have time to take it past the initial submission (this was around the time of the CTO transition), which means that it doesn't reflect the funding that was ultimately approved. The difference was pretty significant, about 17 FTEs total, much of which was a proposed new offshore unit for additional development capacity. (I was in favor of removing that from the plan, because I didn't feel we'd be able to pull it off.) So to do a fair comparison, one would have to heavily annotate and reverse engineer that document, which I don't think would be a good use of anyone's time. But at the high level, in addition to the stuff you find in the plan deck and in the mid-year review, we aspired to do the following:

 
 * full DB replication to the new data-center (done)
 * fail-over capability to the new data-center (available in case of emergency, but hot failover will only be done next FY)
 * rearchitecting of analytics log collection infrastructure (lots of work already done, lots more left)
 * Git migration (done)
 * Heterogeneous deployment capability (done)
 * More frequent releases (done)
 * Completion of mobile port (done)
 * Mobile login (backend work done, including SSL)
 * Completion of mobile field research (done)
 * Multiple releases of i18n features such as input methods, font delivery, etc. (done, with much higher release frequency than planned)
 * Improved profiling tools (done)
 * Improved massively scalable media storage architecture (done for thumbnails, in progress for originals)
 * Work on next generation of AFT with external vendor (first releases done, work still in progress)


 * There's one big project we've deliberately postponed, which is the adoption of HipHop as a performance improvement -- instead we're focusing on Lua scripting.


 * There are a couple of big items that are still pending: mobile uploads (we're just kicking off work on the first implementation) and the launch of a new caching center (we're waiting for a final decision from a potential partner).


 * And there are two areas where we've been able to do substantially less than we had hoped: QA and analytics. In both areas, it's taken us a lot longer to fill key positions than planned, so the work has only recently kicked off in earnest. But QA was also an area where we had first proposed to build a larger team sooner, and ultimately only added a single hire.


 * On the other hand, there are a few things we've been able to get done that weren't included in the goals:

 
 * We've developed powerful and successful Android and iOS Wikipedia apps, and have started significantly enhancing the mobile web experience
 * We have set up a continuous integration server for running unit tests and nightly builds of the mobile apps
 * We managed the unplanned integration of a complex new international payment processing system (Global Collect) for fundraising purposes
 * We're developing a dedicated extension to manage the Global Education program
 * We've made available the first-ever experimental mirror of Wikimedia Commons media files
 * We've created support for RSS feeds from featured content
 * We've achieved cost savings through various donations of hardware and services.


 * Plus a bunch more stuff that's not quite production-ready yet but will probably make it before the end of FY.


 * With that said, the list above illustrates that a one-year plan needs to be flexible and allow for rational changes -- this is "what we ought to do, as far as we know", not "what we will do, come hell or high water". Ideally, teams adjust their priorities on a daily basis, continuously delivering changes to the site, and understanding their own velocity in order to arrive at realistic estimates about what's doable, close to the time they're actually setting out to do it.


 * As for whether it's ambitious, it definitely is. In general, I would say that we're now in the middle of a continuum of risk aversion. In the early days, we were accepting a high risk of failure for most projects that we launched, because we had no resources, and therefore typically had part of one person's time per project. For example, Extension:AbuseFilter, Extension:LiquidThreads, Extension:FlaggedRevs and Extension:CentralAuth all had the bulk of the work done by 1-2 people, often student contractors who were also working on lots of other things. The bulkier projects in the goals we're discussing today all have teams of about 4-5 behind them, usually full-time employees. So that's a nice beefing up of capacity.


 * Building larger teams has come at the cost of saying "No" to lots of requests and desires; for example, while we'll aim to improve user-to-user messaging next year, we're not going to undergo any wholesale re-architecture of all talk pages yet, even though it's clearly going to be needed eventually.


 * Leaving aside the unavoidable risks of diving into the unknown, we are still taking on a significant degree of risk whenever a key person leaves (or is out sick for a long time), a key hire is late, there's an important skills gap on the team, a higher-priority project needs more resources, etc. So almost certainly, we'll miss the mark on some of the objectives. That risk could be reduced by having a lot more redundancy, and in turn, fewer projects. So instead of 500K you might pay 1M for a project, without necessarily getting a lot more out of it other than confidence that you have enough people to cover certain risks (having those extra people will in fact slow you down some of the time; see Brooks's law). I don't think we need to be that kind of organization -- we can afford to accept some risks of late delivery while being more nimble and being able to parallelize a bit more.


 * Relatedly, I can make reasonable assumptions about which hires are going to be easy and which ones are going to be hard, based on past experience, but I'm not going to be able to predict the future, nor will I tolerate bad hires to hit a deadline. I could make only worst case assumptions, but I don't see the point of that. So while we've been able to significantly pick up the pace on hiring, that's an area that will likely cause some lateness again. We'll post more positions sooner, though, to also make some opportunity hires when we can.


 * In general, I'm comfortable with our place in the risk continuum now. There are some areas where I'd like to have more solid ground under our feet. One of those areas is testing. We're proposing, here, a pretty risky QA strategy that's grounded entirely on test automation and community-driven testing, with very little room for paid testers. That's highly unorthodox and risky, but I think it's a good risk for us to take. Given the many permutations of Wikimedia from different users' perspectives, it would be very expensive to test thoroughly in any other way. With that said, I don't know whether it'll work, and I'll sleep better when I do. :-) ---Eloquence (talk) 07:58, 3 April 2012 (UTC)

Community-oriented CRM
Wanted to say that I was very pleased to see that the plan includes the Q3 rollout of a community-oriented CRM solution. I think this will be very useful for groups, including committees, that have to track multiple functions over time, and it reduces the incentive to consider solutions outside of the WMF/Wikimedia/MediaWiki framework. Certainly from my perspective as an arbitrator on the very busy English Wikipedia Arbitration Committee, I can see it having the potential for a very positive effect on the manner in which the Committee manages its business. Thanks for adding it into the plan. Risker (talk) 03:05, 8 April 2012 (UTC)


 * Thanks! I hope we can get the IT Systems Engineer position funded, which would be critical to making these types of projects happen.--Eloquence (talk) 18:17, 20 April 2012 (UTC)

Keep the Lights on
«An uptime metrics of 99.85% for *.wikipedia.org and *.m.wikipedia.org for wikipedia readers, and a 99.8% for editors.» What, sister projects don't deserve even lights to be kept on? --Nemo 18:12, 15 April 2012 (UTC)
 * Not to mention being fast, of course. Wikimedia_Engineering/2012-13_Goals: «Wikipedia must remain usable and responsive in order for the movement to sustain its mission [while the other wikis can be as slow as they want]». --Nemo 18:36, 15 April 2012 (UTC)
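For scale, the quoted uptime targets can be turned into an allowed-downtime budget. This is only a rough sketch: it assumes the percentages are measured over a full 365-day year, which the goals page does not state, and the function name is made up for illustration.

```python
# Illustrative arithmetic only: converts an uptime target into allowed downtime.
# Assumption: the target is measured over a 365-day year (not stated in the goals).
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def allowed_downtime_hours(uptime_percent: float,
                           period_hours: float = HOURS_PER_YEAR) -> float:
    """Return the hours of downtime permitted by an uptime percentage."""
    return period_hours * (1 - uptime_percent / 100)


for label, target in [("readers", 99.85), ("editors", 99.8)]:
    hours = allowed_downtime_hours(target)
    print(f"{label}: {target}% uptime allows about {hours:.1f} hours of downtime per year")
```

Under that assumption, the reader target works out to roughly 13 hours of downtime per year and the editor target to roughly 17.5 hours.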

West Coast data center transition
«Improve/reduce response times by 70ms for users/readers in Asia and West Coast USA (by redirecting users to the nearest server cluster)» -> «Operations contractor to work on West Coast data center transition (3 months)» -> «Launch of caching center in West Coast DC». Is this directed to response times in Asia too? It wouldn't seem fair to give higher priority to the reduction of response times for the small fraction of the world's population living in the West Coast, over all the other less well served areas of the world, which apparently will be served better only if sponsors will allow: «Reach out to get sponsorship for more Caching Centers around the world, with goals of caching locations in Asia and South America». --Nemo 18:18, 15 April 2012 (UTC)


 * As the engineer seriously pushing for this and working on this: if we only get a single location for a caching center (each caching center has not only the cost of machines but also the larger recurring costs of network, power, rack space, etc.), the West Coast of the US makes sense. It will serve both the West Coast of the USA and APAC, which will reduce latency. Assuming we only get one center at this time, some of the big reasons to build it on the West Coast are: the close proximity of engineers, so that while building this site we do not incur the costs of flying people around, hotels, or "smart hands" when breakages happen; relationships with existing US vendors, which also reduce costs; and the fact that the West Coast is very well networked to APAC and to a huge number of ISPs. While having an APAC caching site would be even better for the region, the West Coast site would have a larger impact on latency than just the physical move (70ms) because we can interconnect to many more networks. If we just put an APAC caching center in, it would not have quite the same impact, as we would not be able to connect to West Coast North American networks. So for a first step, this would be the biggest "bang for the buck".
 * I want to put caching centers everywhere, however with our budget, this goal is unlikely as it will increase our operations costs far beyond the budgeted amount.
 * Of course, disclaimer, while I am working for WMF and am an engineer working on this project, I do not have final say over how our money is spent and have a limited viewpoint (my POV is heavily network and latency influenced).
 * PS - If you know anyone willing to host service in other locations, especially with free access to transit and peering exchanges, let us know. LeslieCarr (talk) 20:45, 18 April 2012 (UTC)


 * Wikimedia_Engineering/2012-13_Goals says: «Reduce ping times from all worldwide locations to <150ms according to Watchmouse and site24x7.com. Reduce ping times to all European and North American locations to <80ms according to the above resources». It's not clear how the two are connected, if Europe and USA are treated in the same way also for caching, how much resources are put into it for Europe+USA vs. rest of the world (none of the milestones seems related to this goal). --Nemo 18:47, 15 April 2012 (UTC)

Wikimedia Labs
The goals are not clear at all, and the bunch of tasks below is not clearly connected to those goals. For instance, with regard to the Tool labs: how will it compare to the Toolserver, if WL will double its users and projects? I think the project should be focused on giving new services and features which are needed and not available anywhere, and avoid spending resources on things which already exist. For instance, it's not worth spending a lot to give database access if it will be the same as on the Toolserver; but it would be very useful to provide full database access which is not available on the TS (revisions' text etc.). --Nemo 18:33, 15 April 2012 (UTC)


 * In general, WMF and Wikimedia Germany are in agreement that Wikimedia Labs will increasingly take on services and projects currently hosted by the toolserver. Toolserver (which, BTW, is hosted in a WMF data-center and we're covering part of its operating costs) is a tremendously valuable project, but it doesn't really make long-term sense for chapter organizations to operate key service infrastructure, especially as these services increase in complexity and you get economies of scale benefits (both in terms of staffing and infrastructure costs). Labs already offers lots of functionality that toolserver doesn't, but our initial goal with Tool Labs is feature parity.--Eloquence (talk) 18:58, 20 April 2012 (UTC)

Analytics
I see that this project will have multiple staffers and an enormous amount of hardware. Will it replace and supersede WikiStats soon? Especially if it doesn't, how hard would it be to get all the (very useful) already existing tools back to regular functioning and updating? For instance (see missing columns) etc., many statistics have not been updated for years or have several bugs in edge cases (some of them are in bugzilla). WikiStats tools have been created by a single person with a single very old server, so I hope fixing them shouldn't cost much. --Nemo 19:19, 15 April 2012 (UTC)

Fundraising Engineering and translations
I don't see anything about translation here. Translations with the old system are a nightmare, as Fundraising 2011/Translation/Project report testifies. Integration of the CentralNotice extension with the Translate extension at the very least, and if possible also pushing of translations to other wikis (or donatewiki), is of the highest priority for the very large community of translators, and would save a lot of time both for volunteers and for WMF staff (hence reducing costs, besides improving user experience and quality). --Nemo 19:29, 15 April 2012 (UTC)
 * Most of the backend heavy lifting for translations has already been done. Translate is now in active use on meta. Grouping of translatable pages, signing up as a translator for notifications, a translation memory shared among Wikimedia sites, and notifying signed-up translators of new translatable material will become available within two weeks. We are currently conducting a user experience analysis of the translation process, to further enhance it. Hooking translations into CentralNotice is a relatively small thing, which is why it has not been specified explicitly. Bot-based solutions are possible already, although not implemented. A skilled volunteer developer could probably implement it in a day or two, taking the workflow states message group in Translate as an example, together with a minor UI change in CentralNotice to allow for making the translations through Translate. mw:Extension:Push, transwiki import and the like should be considered for moving pages between wikis, in my opinion. -siebrand (talk) 20:43, 20 April 2012 (UTC)

MediaWiki development process
I read: «there have been significant regressions in productivity from Subversion (such as code review tooling), which need to be addressed». I guess this is supposed to be handled by «High priority Git/Gerrit usability fixes» and then «Additional Git workflow enhancements» and that code review tools fall within the first category, but most bugs/feature requests in this category have been marked upstream and nobody seems able/available to work on them. Gerrit needs a lot of changes to work for MediaWiki developers and the Wikimedia community, so staff resources seem needed for this or it will be a bottleneck for all software development projects. --Nemo 19:51, 15 April 2012 (UTC)

Documentation
The API seems a weird place to focus on given how well it is documented, especially relative to other areas of MediaWiki (like the tutorials on how to write various types of extensions). Bawolff (talk) 20:36, 15 April 2012 (UTC)
 * "Goal: a team of on-call MediaWiki documenters who can sprint on specific areas, and up-to-date documentation for the MediaWiki API and for the extensions that Wikimedia Foundation deploys"

Super-protection
What is super-protection? --MZMcBride (talk) 20:39, 15 April 2012 (UTC)
 * From the name I'm guessing it means fiddling around with $wgProtectionLevels to create a level higher than full protection.--Jasper Deng (talk) 23:13, 15 April 2012 (UTC)
 * Yes, specifically for the MediaWiki: namespace, which can be used to make drastic site-wide changes and where there's currently no mechanism for managing problematic changes short of nuclear options (removing the editinterface permission, disabling site JS/CSS altogether, etc.).--Eloquence (talk) 20:26, 20 April 2012 (UTC)
 * Out of all the things that are pressing needs, such as true global blocks for users or a CA that doesn't leak OS'ed info (that being just the last in a long list of CA bugs that pop up almost weekly), super-protection for the MediaWiki: namespace seems like it should be fairly low priority. Snowolf How can I help? 14:36, 21 April 2012 (UTC)

Bug links?
I see almost no links to Bugzilla in the current document. There should be associated bugs for every feature or enhancement request listed, right? --MZMcBride (talk) 14:08, 21 April 2012 (UTC)