
Selenium/How-to/Reduce-test-runtime

From mediawiki.org

In 2026 we worked on T420590 to bring MediaWiki's CI feedback loop under 10 minutes, and a big chunk of that work landed in the WebdriverIO browser tests. This page collects the patterns we ended up using: both the configuration changes that ship in the shared wdio-mediawiki package and the spec-level patterns that each repository has to apply itself.

If you write or maintain selenium specs in mediawiki/core, an extension or a skin, this is for you.

The configuration changes propagate everywhere automatically because all of us depend on the same wdio-mediawiki package. The test-code patterns don't: those live in your repo, so you need to update them yourself.

Why this matters


Why bother with all this? Because the slowest CI job sets the feedback loop for everybody. Before this work, quibble-with-gated-extensions-selenium-php83 took around 24 minutes — long enough that you'd context-switch, forget what you were doing, and come back to find half your morning gone. The combined changes brought the core-only browser-test job from a January 2026 average of 3:16 down to 1:23 in April 2026 (more than a 50 % reduction), and the broader gated-extensions job to a ~10:13 median. The point is simple: when you push a change, you should still remember why you pushed it by the time CI tells you whether it worked.

TL;DR — the patterns

  1. If you don't need a browser, don't write a browser test — use api-testing or QUnit instead.
  2. Make tests independent so they can run in any order, in parallel.
  3. Run tests in parallel, scaled to the host's CPUs.
  4. Run headless by default. Record video only when debugging.
  5. Take screenshots only on failure.
  6. Log in once per suite, not once per test.
  7. Use the API to arrange fixtures, not the UI.
  8. Use isExisting() for "should not be there" assertions, never .not.toExist().
  9. Wait for a condition (waitForDisplayed, waitUntil), never browser.pause( N ).
  10. Split big spec files; put slowest specs first.
  11. Skip work that the job doesn't need (e.g. composer dev-requires for a browser-only job, or splitting an extension into its own CI job).
  12. Measure before optimising — instrument per-command timings when you need to find the hotspot.

Ok, that's the short version. The rest of this page explains each pattern, with code references and the Phabricator tasks where the work happened.


1. Parallelization scaled to the machine


The single biggest lever is running test suites in parallel. We do that with maxInstances, which wdio-mediawiki now scales to the CPUs available in CI:

// tests/selenium/wdio-mediawiki/wdio-defaults.conf.js
maxInstances: process.env.CI ? Math.floor( os.cpus().length * 0.75 ) : 1,
  • In CI we use 75 % of available CPUs, leaving headroom for the rest of the job (database, PHP, Apache, FFmpeg if recording, …).
  • Locally we default to 1 so a developer's laptop is not flooded.
  • The configured value is logged at start-up so it's visible in CI output.
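The maxInstances rule above is easy to check in isolation. The sketch below is not the code in wdio-mediawiki, which reads process.env and os.cpus() directly. `resolveMaxInstances` is a hypothetical helper that takes the environment and the CPU count as parameters, so it can be tested without a CI machine. It also adds a Math.max( 1, … ) guard that the shipped one-liner does not have, so that a single-CPU CI host still gets one worker instead of zero.

```javascript
// Hypothetical helper mirroring the maxInstances rule from wdio-defaults.conf.js.
// env: an object shaped like process.env; cpuCount: e.g. os.cpus().length
function resolveMaxInstances( env, cpuCount ) {
    if ( !env.CI ) {
        // Locally: a single worker, so a developer's laptop is not flooded
        return 1;
    }
    // In CI: 75 % of the CPUs, rounded down.
    // The Math.max guard is an addition, not part of the shipped config.
    return Math.max( 1, Math.floor( cpuCount * 0.75 ) );
}

console.log( resolveMaxInstances( { CI: 'true' }, 16 ) ); // 12
console.log( resolveMaxInstances( { CI: 'true' }, 1 ) ); // 1
console.log( resolveMaxInstances( {}, 16 ) ); // 1
```

Passing the environment in as a parameter is only a testing convenience. In a wdio.conf.js you would call it as resolveMaxInstances( process.env, os.cpus().length ).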

There's a catch though: if your specs assume sequential execution, raising maxInstances will start surfacing flakes that were never really fixed, just hidden. The fix is to make the specs independent (see §5.4), not to turn parallelization back off.

2. Headless by default

// tests/selenium/wdio-mediawiki/wdio-defaults.conf.js
useBrowserHeadless: Boolean( process.env.CI ) || !process.env.DISPLAY,

Chrome in --headless mode uses noticeably less CPU than a headful browser. A headful browser also needs an X server (XVFB), and recording video needs FFmpeg on top of that. So when you turn headless on, you also get to skip all of that in the common case, which is where the real saving comes from.

To see the browser locally, just run npm run selenium-test from a desktop session. The default is non-headless only when there is no CI env var and DISPLAY is set. Without a DISPLAY it falls back to headless.

To run headless locally:

npm run selenium-test --useBrowserHeadless
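The headless default is a single boolean expression, and the local behaviour described above follows directly from it. The sketch below uses `resolveHeadless`, a hypothetical helper that takes the environment as a parameter instead of reading process.env.

```javascript
// Hypothetical helper mirroring the useBrowserHeadless default in wdio-defaults.conf.js
function resolveHeadless( env ) {
    // Headless in CI, or whenever there is no X display to open a window on
    return Boolean( env.CI ) || !env.DISPLAY;
}

console.log( resolveHeadless( { CI: 'true', DISPLAY: ':0' } ) ); // true: CI is always headless
console.log( resolveHeadless( { DISPLAY: ':0' } ) ); // false: local desktop session, browser visible
console.log( resolveHeadless( {} ) ); // true: no DISPLAY, so headless
```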

3. Screenshots only on failure

// tests/selenium/wdio-mediawiki/wdio-defaults.conf.js
screenshotsOnFailureOnly: true,

Previously every test wrote a screenshot whether it passed or not. Now we only do it on failure (T416704). If you need a screenshot on every step while debugging a flake, re-enable it in your project's wdio.conf.js:

// tests/selenium/wdio.conf.js
import { config as wdioDefaults } from 'wdio-mediawiki/wdio-defaults.conf.js';

export const config = { ...wdioDefaults,
    // To enable screenshots on all tests:
    screenshotsOnFailureOnly: false,
    // ...
};

4. Video recording is off by default

recordVideo: false,

FFmpeg recording is expensive (both CPU and disk), and let's be honest: on most CI runs nobody ever looks at the video. So we only turn it on when we're actually debugging something. To turn it back on in a project's wdio.conf.js:

// tests/selenium/wdio.conf.js (inside the exported config object)
recordVideo: true,
useBrowserHeadless: false,

When recordVideo is on, wdio-mediawiki starts one XVFB per parallel worker so video capture works alongside parallelization (T344754).


5. Test code patterns


Configuration alone won't speed up a slow spec — these are the patterns each spec author needs to apply, and most of them are about not doing things rather than doing new things.

But before any of that: do you actually need a browser test? A browser test exercises the full stack — PHP, JS, CSS, the DOM, the request cycle — and it is the most expensive test we have. If the thing you want to verify is backend behaviour, a tests/api-testing/ test is much cheaper and runs in a different, faster CI job. If it's pure JS, a QUnit test runs in milliseconds. Use the lightest test that proves the thing you actually care about, and reserve selenium and Cypress for genuine end-to-end UI behaviour.

5.1 Log in once per suite (before), not per test (beforeEach)

// tests/selenium/specs/page.js
import { LoginPage, createApiClient } from 'wdio-mediawiki';

describe( 'Page', () => {
    let apiClient;

    // Runs once for the whole suite, not before every test
    before( async () => {
        apiClient = await createApiClient();
        await LoginPage.loginAdmin();
    } );

    it( 'should be re-creatable', async () => {
        // ... uses apiClient and the already-logged-in browser session
    } );

    it( 'should be editable', async () => {
        // ... same
    } );
} );

Logging in is a multi-second UI flow. Doing it once per suite instead of once per test cuts measurable seconds off every spec (T420081).

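This is safe as long as the tests in the suite don't depend on a fresh login state, i.e. they don't log out or expect to be anonymous. If a test needs an anonymous browser, do await browser.deleteAllCookies() inside that specific it(). Bear in mind that this also logs the browser out for every later test in the suite, because before() does not run again. Keep such tests in their own describe() block.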

5.2 Use the API to arrange fixtures, not the UI


The API client from wdio-mediawiki (the apiClient returned by createApiClient() in §5.1) gives you edit, delete, read, login, etc. directly via api.php:

// Bad — slow: clicking through the editor just to set up a fixture
await EditPage.edit( name, initialContent );

// Good — fast: API call to create the page
await apiClient.edit( name, initialContent, 'create for delete' );
await apiClient.delete( name, 'delete prior to recreate' );

The general rule: only drive the UI for the thing you are actually asserting about. Everything else — accounts, pages, protections, files, preferences — should be set up via the API.
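An API fixture costs one HTTP request, whereas the UI route loads the editor, types into it and submits. The sketch below shows what such a request carries. `buildEditParams` is a hypothetical helper, not how wdio-mediawiki's client is implemented. It assembles the POST body for the MediaWiki Action API's action=edit module. Fetching the CSRF token (action=query&meta=tokens) and sending the request are left out.

```javascript
// Hypothetical helper (not wdio-mediawiki's client): POST body for api.php?action=edit.
// `token` is a CSRF token fetched beforehand with action=query&meta=tokens.
function buildEditParams( title, text, summary, token ) {
    return new URLSearchParams( {
        action: 'edit',
        title,
        text,
        summary,
        format: 'json',
        formatversion: '2',
        // Sent last, after text, as the API:Edit documentation advises
        token
    } );
}

const body = buildEditParams( 'Page-abc123', 'initial content', 'create for delete', '<csrf-token>' );
console.log( body.get( 'action' ) ); // edit
console.log( [ ...body.keys() ].join( ', ' ) ); // action, title, text, summary, format, formatversion, token
```

In a spec, keep using apiClient.edit(). The sketch only illustrates that arranging a fixture this way is a single request rather than a page load plus form interaction.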

5.3 Use isExisting() for negative assertions


If there's one thing in this whole document to take away, it's this. This single change (T419803) gave us the biggest test-code win of the round.

// Bad — waits the full retry window (often 10 s) for the element to "go away"
await expect( EditPage.save ).not.toExist();

// Good — returns immediately
expect( await EditPage.save.isExisting() ).toBe( false );

expect(...).not.toExist() is a retrying assertion: WebdriverIO will keep polling until either the element disappears or the configured timeout elapses. For an element that should already be absent, this means we always wait the full timeout. One spec (temporaryuser.js) dropped from 41 s to a couple of seconds after this swap.

Real example from the codebase:

// tests/selenium/specs/temporaryuser.js
expect( await CreateAccountPage.tempPasswordInput.isExisting() ).toBe( false,
    { message: 'Temporary users should not have the option to have a temporary password sent on signup (T328718)' }
);

When is not.toExist() correct? Only when you genuinely need to wait for an element to disappear (e.g. a spinner, a toast). For "this element should not be on this page" use isExisting().

5.4 Make tests independent


Tests don't depend on others. The test suite should pass when tests are running in random order or in parallel. — Selenium/Anti-patterns

With maxInstances > 1, multiple spec files run at the same time, in separate browser sessions. So:

  • Don't rely on global state from another spec.
  • Use getTestString() from wdio-mediawiki/Util to generate unique page/user names so two parallel specs can't collide.
  • Don't assume execution order between it() blocks within a spec; if you do depend on it, document why and keep the dependency chain short.
import { Util } from 'wdio-mediawiki';
const name = Util.getTestString( 'Page-' );  // 'Page-' + random suffix
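Why unique names matter is easier to see with a stand-in. `makeTestString` below is a hypothetical helper, not wdio-mediawiki's getTestString(), whose exact suffix format this page doesn't specify. It appends a random suffix, so two parallel workers that both create a 'Page-…' title are very unlikely to pick the same one.

```javascript
// Hypothetical stand-in for Util.getTestString(): prefix + random suffix.
function makeTestString( prefix = '' ) {
    // Base-36 digits from Math.random(); a collision is possible but very unlikely
    return prefix + Math.random().toString( 36 ).slice( 2, 10 );
}

// Two workers running the same spec at the same time get different titles
const workerA = makeTestString( 'Page-' );
const workerB = makeTestString( 'Page-' );
console.log( workerA, workerB );
```

A hard-coded 'TestPage' would instead make both workers edit, delete and assert against the same page, which is exactly the cross-spec interference that parallel runs expose.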

5.5 Split big spec files for parallelism


maxInstances parallelizes across spec files, not within them. A single 500-line spec is one worker's job. If a project has one giant spec, splitting it into two or three smaller specs along feature lines lets the runner spread the work.

5.6 Run the slowest specs first


The wall-clock time of a parallel job is set by the worker that finishes last. If a 90-second spec is the last thing dispatched, it holds up the whole job while the other workers sit idle. Listing the slowest specs first in the specs: array of wdio.conf.js keeps the long pole out of the tail.
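A small simulation shows the effect. It assumes, as a simplification, that the runner hands the next spec in list order to whichever worker frees up first, and it ignores start-up overhead. `jobWallClock` is a hypothetical helper, and the durations are made-up numbers, not CI measurements.

```javascript
// Simulates `workers` workers that each take the next spec in list order
// as soon as they are free. Returns the wall-clock time of the whole job.
function jobWallClock( specDurations, workers ) {
    const freeAt = new Array( workers ).fill( 0 ); // time at which each worker becomes free
    for ( const duration of specDurations ) {
        // Dispatch to the worker that becomes free first
        let next = 0;
        for ( let w = 1; w < workers; w++ ) {
            if ( freeAt[ w ] < freeAt[ next ] ) {
                next = w;
            }
        }
        freeAt[ next ] += duration;
    }
    return Math.max( ...freeAt );
}

// Four 20 s specs and one 90 s spec on two workers
console.log( jobWallClock( [ 20, 20, 20, 20, 90 ], 2 ) ); // 130: slow spec dispatched last
console.log( jobWallClock( [ 90, 20, 20, 20, 20 ], 2 ) ); // 90: slow spec dispatched first

// Splitting (§5.5): one 90 s spec vs the same work as three 30 s specs, on three workers
console.log( jobWallClock( [ 90 ], 3 ) ); // 90
console.log( jobWallClock( [ 30, 30, 30 ], 3 ) ); // 30
```

The same model explains §5.5. A spec file is the smallest unit the runner can hand out, so one giant spec caps the job at its own duration, no matter how many workers are free.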

5.7 Don't browser.pause() — wait for a condition

// Bad — sleep is dead time, and either too short (flake) or too long (waste)
await browser.pause( 5000 );
await EditPage.heading.click();

// Good — wait for the actual condition you care about
await EditPage.heading.waitForDisplayed();
await EditPage.heading.click();

WebdriverIO already auto-waits for elements to be present and interactable when you act on them, so most explicit waits are unnecessary in the first place. When you genuinely need to wait, wait for a condition (waitForDisplayed, waitForExist, waitForClickable, waitUntil( fn )) — never for a fixed amount of time. A browser.pause( N ) is dead time: too short and the test flakes, too long and you pay it on every run.

When touching a spec, grep it for browser.pause and replace each with the appropriate condition wait.
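WebdriverIO's own waitUntil already implements a condition wait. The sketch below is only meant to show why such a wait costs less than a pause. `pollUntil` is a hypothetical helper, not WebdriverIO's implementation. It re-checks a condition every `interval` ms, resolves as soon as the condition holds, and rejects only if `timeout` elapses first.

```javascript
// Hypothetical condition wait (not WebdriverIO's waitUntil): re-check `condition`
// every `interval` ms; resolve with the elapsed ms as soon as it returns true,
// reject if `timeout` ms pass first.
async function pollUntil( condition, { timeout = 5000, interval = 50 } = {} ) {
    const start = Date.now();
    for ( ;; ) {
        if ( await condition() ) {
            return Date.now() - start;
        }
        if ( Date.now() - start >= timeout ) {
            throw new Error( `condition not met within ${ timeout } ms` );
        }
        await new Promise( ( resolve ) => setTimeout( resolve, interval ) );
    }
}

// Simulated element that becomes ready about 200 ms from now
const readyAt = Date.now() + 200;
pollUntil( () => Date.now() >= readyAt ).then( ( elapsed ) => {
    // Roughly 200 ms, instead of a fixed 5000 ms pause
    console.log( `waited ${ elapsed } ms` );
} );
```

The timeout only matters when something is actually wrong. On a healthy run the wait ends as soon as the condition holds, so there is no dead time to tune.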


6. Don't do work the job doesn't need


It sounds obvious when you say it out loud, but a surprising amount of time goes into things the job will never use. A browser-test-only Quibble job, for example:

  • doesn't need PHPUnit's composer dev-requires, so don't install them for that job (composer install --no-dev).
  • shouldn't run npm install twice; Quibble was changed to install npm dependencies only once.

These are job-config changes, not spec changes, but they shave noticeable time off every run, and unlike spec changes you only have to do them once.


7. Split slow extensions into their own job


Remember, CI feedback time is set by the slowest job, not the average. A few extensions (Wikibase, GrowthExperiments) have Cypress suites large enough that bundling them with the rest of the gated extensions makes quibble-with-gated-extensions-selenium-php83 the long pole for everyone who touches MediaWiki. The fix is structural: split such an extension out into its own standalone Quibble job, so it runs in parallel with the rest of CI instead of in series inside the gated job.

This is a CI-config change in integration/config, not a spec change. A reasonable criterion for splitting is that the extension's Cypress tests dominate the gated job's runtime.

It's a one-time refactor with ongoing benefit: every commit afterwards gets faster feedback.


8. Diagnostics: measure before you optimise


You can't fix what you can't see. And most of the wins above only became visible after we'd added the right measurements.

Sometimes a single spec is the long pole, and the per-test number alone doesn't tell you which command inside it is slow. When that happens, instrument the WebdriverIO or Cypress commands themselves. This is temporary measurement: paste it in, run the job, grep the build log, fix the hotspot, then take it out again. Don't leave it in CI: it produces noisy logs that hide real failures.

WebdriverIO


Add beforeCommand / afterCommand hooks in your project's wdio.conf.js:

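// tests/selenium/wdio.conf.js
import { config as wdioDefaults } from 'wdio-mediawiki/wdio-defaults.conf.js';

// Commands still running. Commands can nest (a wait command typically runs
// other commands while it polls), so a single shared start time would be
// overwritten by the inner command and the outer one would never be logged.
const running = [];

export const config = { ...wdioDefaults,
    beforeCommand: function ( commandName ) {
        running.push( { name: commandName, start: process.hrtime.bigint() } );
    },
    afterCommand: function ( commandName ) {
        // Match the most recently started command with this name
        const i = running.map( ( cmd ) => cmd.name ).lastIndexOf( commandName );
        if ( i !== -1 ) {
            const [ cmd ] = running.splice( i, 1 );
            const ms = Number( process.hrtime.bigint() - cmd.start ) / 1e6;
            console.log( `[TIMING] ${ commandName } ${ ms.toFixed( 0 ) }ms` );
        }
    }
};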

Reading the output


Check the Jenkins build log for [TIMING] lines and look for the slowest commands. Common culprits:

  • a slow waitForDisplayed hiding a slow page load — fix the load, not the wait.
  • repeated setValue on a form that re-renders on each keystroke — type in fewer chunks or set the value via execute().
  • a navigation that could have been an apiClient call (see §5.2).
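With several workers, the build log interleaves many [TIMING] lines, so it helps to aggregate them. The sketch below assumes each line contains the `[TIMING] <command> <n>ms` text printed by the hook above, possibly after a prefix such as a worker id. `summarizeTimings` is a hypothetical helper, and the sample log is made up.

```javascript
// Hypothetical helper: aggregate "[TIMING] <command> <n>ms" entries from a build log.
function summarizeTimings( logText ) {
    const totals = new Map();
    for ( const [ , command, ms ] of logText.matchAll( /\[TIMING\] (\S+) (\d+)ms/g ) ) {
        const entry = totals.get( command ) || { command, count: 0, totalMs: 0, maxMs: 0 };
        entry.count += 1;
        entry.totalMs += Number( ms );
        entry.maxMs = Math.max( entry.maxMs, Number( ms ) );
        totals.set( command, entry );
    }
    // Largest total first: that is where the time went
    return [ ...totals.values() ].sort( ( a, b ) => b.totalMs - a.totalMs );
}

const sampleLog = [
    '[TIMING] url 850ms',
    '[TIMING] waitForDisplayed 4200ms',
    '[TIMING] setValue 300ms',
    '[TIMING] waitForDisplayed 3900ms',
    '[TIMING] click 120ms'
].join( '\n' );

console.table( summarizeTimings( sampleLog ) );
```

Sorting by total rather than by the single slowest call surfaces commands that are individually cheap but repeated many times, such as the re-rendering setValue case above.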

Once you've identified the hotspot and fixed it, revert the instrumentation before merging. In the future we could add automatic instrumentation via configuration.


9. Cheat sheet for spec authors

Smell | Use instead
beforeEach( async () => LoginPage.loginAdmin() ) | before( async () => LoginPage.loginAdmin() )
await EditPage.edit( name, content ) just to set up | await apiClient.edit( name, content, '…' )
await expect( foo ).not.toExist() | expect( await foo.isExisting() ).toBe( false )
await browser.pause( 5000 ) | waitForDisplayed() / waitUntil()
Browser test that doesn't need a browser | tests/api-testing/ test, or a QUnit test
Hard-coded page name 'TestPage' | getTestString( 'TestPage-' )
Single 2 000-line spec file | Split by feature
Slow spec listed last in specs: | List slow specs first
recordVideo: true always | Off by default; enable when debugging
screenshotsOnFailureOnly: false always | true by default; flip when debugging