Language Testing Plan/Testing Strategy

Goals of this document
This document describes the current and future testing procedures and strategy of the Wikimedia Language Engineering team.

Testing workflow
The current testing workflow is not well defined or well documented. We use two types of testing in our workflow: manual and automated.

For each developer's individual testing methodology, see the spreadsheet at [1].

Design references
Before testing begins, we spend time on feature conception and on writing GWTs (Given-When-Then scenarios), which form the basis for various tests such as browser tests.


 * Feature conception
 * GWTs

Manual testing
 * Manual running of unit tests such as PHPUnit and QUnit tests, where applicable. See the 'Unit tests' section below for the number of unit tests in our extensions at the moment.
 * Manual testing of patches submitted to Gerrit.
 * Manual testing in MLEB [2] is done once a month (the release is usually near the end of the month). Currently this covers the Universal Language Selector (ULS) extension; the other extensions receive little or no manual testing in MLEB.

Automated testing
 * Automated browser testing on beta wikis and several instances.
 * Automated testing after a patch is submitted to Gerrit. This checks the patch with jslint, PHP CodeSniffer and other syntax checks. This procedure (along with the release process) is well documented in the Continuous Delivery Process diagram.


 * Browser tests - coding, review, maintenance (i.e. how they are changed when the original feature changes or bugs are fixed).


 * Pairing with QA:
 * Jenkins configuration fixes (optional)
 * Bug fixes.
 * Beta labs configuration-related fixes.

Minimal testing
'Minimal testing' is defined as the testing we should always do before and after submitting changes to production.

Manual

 * The change passes the QUnit tests.
 * The change passes the PHPUnit tests.
 * The change passes the browser tests.

Automated

 * The change passes all browser tests. Failures are reported on CloudBees or via email (if an email address is configured in the job).
 * The change satisfies all the criteria of what the patch is supposed to do, as described in the commit message.

Average testing
An average testing scenario is what we do in the minimal testing case plus the following test cases:

Manual browser testing

 * We run the browser tests ourselves, i.e. we run the tests manually against the patch.
 * We test with different browsers.

Ideal testing
Ideal testing includes all scenarios from minimal and average testing plus the following cases. The ideal scenario also focuses on a testing procedure that is well documented and uses systems like a TCMS [3].


 * A unit test is written for each backend patch/change.
 * A browser test is written for each frontend change or update.
 * We test changes to make sure that several components are taken into account. For example, we do stress testing, try to make tests fail, try edge cases, etc.
 * We use different browsers to test changes.
 * Several people are involved in testing when needed, i.e. more feedback on critical changes.
 * If a change is big, we set up instance(s) on Labs to allow more robust testing (we already did this in CX, for example).
 * We use a Test Case Management System for manual testing. (For example, MLEB has been using a TCMS since March 2014.)
 * Browser tests are always green! [4]

Browser tests
We run browser tests on several beta wikis and instances [5]. The following are the numbers of scenarios for each extension our team maintains:


 * Universal Language Selector: 45
 * Translate: 35
 * Content Translation: 11
 * Twn Main Page: 23

Unit tests
We have two types of unit tests for our extensions. There is an ongoing discussion about writing node.js unit tests for Content Translation.

QUnit tests
ULS also inherited the following QUnit tests from the upstream jquery.* libraries. Note that these are the total numbers of assertions in the unit tests.
 * Universal Language Selector: 3
 * jquery.i18n: 160
 * jquery.ime: 5109
 * jquery.uls: 48
 * jquery.webfonts: 29


 * Translate: 3
 * Content Translation: 0 (recently QUnit infrastructure has been added!)
 * Twn Main Page: 0

PHPUnit tests
Note: these numbers can be misleading, as I counted each file as one 'unit', and in some cases several tests share a single file. Still, they should give a clear idea of which extension has the most PHPUnit tests as of now.


 * Universal Language Selector: 2
 * Translate: 45
 * Content Translation: 0
 * Twn Main Page: 2

ULS

 * Stage 0:
 * ULS inherited testing from the upstream jquery components (jquery.uls, jquery.ime, jquery.i18n and jquery.webfonts).
 * Unit tests are mostly written for the jquery.* libraries. This is a one-time procedure, with updates to the code as required.


 * Stage 1:
 * Frontend features are tested with browser testing, which involves coding, review, local testing, various validations and pairing with QA.
 * Once a browser test is merged in Gerrit, Jenkins runs it regularly, twice a day. Failures are reported via email to the developers listed in the Jenkins configuration file.


 * Stage 2:
 * Failures of browser tests are fixed by developers and/or QA. Pairing sessions are held for this.


 * Manual tests are often performed for ULS's various features, such as manual verification of the Autonym font, IMEs and other components.

WebFonts
Webfonts are part of the main ULS repository.


 * The webfonts code is in the jquery.webfonts GitHub repository.
 * The actual webfonts are in the ULS fontrepo, which contains a webfont testing interface that needs manual testing when fonts are updated or changed by a developer. Manual testing of fonts also comes as feedback from various language communities.
 * Some webfont testing is often done as part of Translate.
 * RTL testing and feedback is done by the team and the community as part of automated and manual testing.

MLEB
MLEB test cases are repeated for 4 browsers (Google Chrome, Firefox, Safari and Internet Explorer) and 2 releases (stable and legacy-stable at the time of testing).


 * Total number of test cases per browser per release: 153
 * Total test cases (all): 612

Update: the TestLink setup has reduced the MLEB test cases to almost 60% of the original number, after reviewing and removing possible duplicates.

Content Translation
Content Translation follows this testing process:
 * GWTs and feature conception for frontend testing were written when the ContentTranslation project was started.
 * Automated browser testing was set up for frontend testing.
 * Automated browser testing was stopped because we couldn't keep the browser tests in pace with the refactoring of the ContentTranslation backend code.
 * Manual testing was adopted for the backend.
 * Manual testing was done for important features like segmentation, which is tested against in-production Wikipedia articles.
 * Instances were set up for the ContentTranslation server and client.
 * A server testing playground (cxserver.wmflabs.org) was set up to showcase and test features like segmentation and to check server logs, etc.
 * A node.js testing framework is in progress; we need it for the backend.
 * Unit tests for features like ROT13 and segmentation have been written.

Recommendations

 * Instances: infrastructure - setup, maintenance (how they get development updates, cron jobs, etc.), outages, backup infrastructure, staging.
 * Manual testing: when is it to be done? Where will the results be documented? When can redundant tests be considered for automation? What criteria determine the selection? When do tests fall back to manual mode?
 * Unit testing:

Platforms
We use the following platforms in our testing:

Beta labs

 * Beta labs has a setup (almost) similar to the production wikis.
 * It is used as the test wiki in browser testing.
 * Current list of test wikis used by Language Engineering team is: https://www.mediawiki.org/wiki/Language_portal/Test_wikis

CloudBees
CloudBees is used for automated browser testing jobs.

Recommendations
1. Recommendations for each project are listed in the Use Cases section.

2. Labs instance for each project where applicable.

Example: the instance for Content Translation (CX) at http://language-stage1.wmflabs.org has been set up exclusively for testing. The Content Translation server is set up at http://cxserver.wmflabs.org. This server is also used in the automated browser testing procedure.

3. Automated browser testing instances with controlled environment.

Example: an instance with web fonts enabled by default in the ULS extension at http://language-browsertests.wmflabs.org. Such a setup eases testing of things that are difficult to test in default scenarios.

4. Test Case Management System(s).

This will reduce the time spent on manual testing such as MLEB testing. We can easily compare previous results with current results. It will also help us track bugs reported by such tests.

5. Automated unit testing. This is an experimental idea and should be discussed with the team.

6. Test driven development.

7. System- and performance-level testing, for example page loading.