
Product Safety and Integrity/Anti-abuse signals/hCaptcha/vi

From mediawiki.org
This page is a translated version of the page Product Safety and Integrity/Anti-abuse signals/hCaptcha and the translation is 3% complete.

The Product Safety and Integrity team is testing a bot detection service (hCaptcha) as a possible replacement for our current CAPTCHA system. We want to see whether this service can better protect Wikimedia projects from malicious bots while potentially also improving usability and accessibility. See our blog post announcing this project.

Overview

Current CAPTCHA system, FancyCaptcha

The current CAPTCHA system, FancyCaptcha, was introduced to the wikis in the 2000s and was built for an earlier era of the web. For years, the FancyCaptcha page has displayed a warning: "This type is used by very few wikis outside WMF, if any, probably because of scarce effectiveness."

The visual challenge frustrates people, especially those with visual impairments or those who use non-Latin keyboards. It also lacks good non-visual alternatives. More fundamentally, it is too easy for bots to get around. As a result, Wikimedia projects are left too exposed to large-scale vandalism and other abusive operations, which puts a heavy strain on both volunteers and the Foundation's anti-abuse teams.

To help solve this, the Wikimedia Foundation is running a trial of hCaptcha Enterprise as a possible replacement for the current system. The goal is to evaluate whether this new bot detection system can help identify automated actors in a variety of situations.

Trial

Screenshot of hCaptcha on the account creation page, with privacy policy and terms links showing.
During the trial, a line mentioning hCaptcha, with links to its terms of service and privacy policy, appears above the "Create your account" button. It also appears in the editing interfaces.

The trial began on several Wikipedias, where hCaptcha was enabled on Special:CreateAccount. As the next step, it will also be enabled in the editing flow for users without the skipcaptcha right. These are generally logged-out users, temporary account users, and newly registered users whose accounts are not yet autoconfirmed. We may expand the trial to more wikis to evaluate hCaptcha's effectiveness in different situations.
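The eligibility rule above can be sketched as follows. This is a hypothetical illustration, not ConfirmEdit's actual code: the `Session` type, the action names `createaccount` and `edit`, the `edits_enabled` switch, and the placeholder wiki name are all assumptions; only the role of the `skipcaptcha` right comes from this page.

```python
from dataclasses import dataclass, field
from typing import Set

# Placeholder: the actual list of trial wikis is not reproduced here.
TRIAL_WIKIS = {"examplewiki"}


@dataclass
class Session:
    """Hypothetical view of a request's context."""
    wiki: str
    rights: Set[str] = field(default_factory=set)


def needs_hcaptcha(session: Session, action: str, edits_enabled: bool) -> bool:
    """Return True if this action should go through hCaptcha during the trial."""
    if session.wiki not in TRIAL_WIKIS:
        return False  # wikis outside the trial keep the existing CAPTCHA
    if "skipcaptcha" in session.rights:
        return False  # sessions holding skipcaptcha are never challenged
    if action == "createaccount":
        return True  # the first phase of the trial
    if action == "edit":
        return edits_enabled  # the planned next phase
    return False
```

Logged-out sessions, temporary accounts, and accounts that are not yet autoconfirmed generally lack `skipcaptcha`, so in this sketch they fall through to the per-action checks.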

During the trial, we are collecting data on how effectively hCaptcha detects bots, evaluating its overall usability and accessibility, and assessing community feedback. Evaluating hCaptcha on account creation requires running the trial for at least three months. Throughout the trial, we are monitoring:

  • Correlation between accounts flagged as bots by hCaptcha and blocks issued on Wikimedia sites
  • Qualitative review of accounts flagged as bots by hCaptcha
  • Rate of account creations
  • Rate of locks that mention spam or spambots
  • Failure rates for hCaptcha challenges
  • Analytics information for the Special:CreateAccount funnel
  • Impacts on the edit funnel
  • Performance impacts to editing workflows
  • Correlation between risk scores assigned to edits and account blocks
  • Community feedback
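The first item in the list above, the correlation between hCaptcha bot flags and blocks, can be pictured as a 2x2 comparison. This is a hypothetical analysis sketch, not the Foundation's actual methodology: the record format and the use of the phi coefficient as the correlation measure are assumptions.

```python
import math
from typing import Dict, Iterable, Tuple


def flag_block_summary(records: Iterable[Tuple[bool, bool]]) -> Dict[str, float]:
    """Compare block rates for accounts hCaptcha flagged vs. did not flag.

    Each record is (flagged_by_hcaptcha, later_blocked).
    """
    a = b = c = d = 0  # 2x2 contingency table counts
    for flagged, blocked in records:
        if flagged and blocked:
            a += 1
        elif flagged:
            b += 1
        elif blocked:
            c += 1
        else:
            d += 1
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return {
        "block_rate_flagged": a / (a + b) if a + b else 0.0,
        "block_rate_unflagged": c / (c + d) if c + d else 0.0,
        # Phi coefficient ranges from -1 to 1; 0 means no association.
        "phi": (a * d - b * c) / denom if denom else 0.0,
    }
```

A flagged group with a much higher block rate than the unflagged group (and a clearly positive phi) would suggest the flags track real abuse; a qualitative review, also listed above, is still needed because blocks are an imperfect proxy for bot activity.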

After at least three months, we will be able to report back to the communities with our conclusions and recommendations for next steps.

We recognize that sending data to a third-party service has privacy risks, and this is not something that we take lightly. See below for more technical detail about how we are reducing the risks to user privacy in this trial. Our view is that we need to balance the risks of the current state, where automated actors can still likely create accounts and make edits, with the privacy risks of using a third-party system like hCaptcha.

Our Legal department has approved this implementation of hCaptcha and confirmed that it is in line with our Privacy Policy and Terms of Use.

How hCaptcha works

  • A small share of users (about 0.1%) who create new accounts or publish changes encounter a challenge. Visitors who receive a challenge must complete it to create an account or publish their changes. hCaptcha calls this "99.9% Passive mode". You can see screenshots here.
  • Users with sight issues or other accessibility needs can choose a text-based challenge that can be completed using only a keyboard.
  • The service returns a "risk score" expressing its confidence that the account or edit came from an inauthentic user. This risk score is not public; the Foundation stores it privately to support analysis of, and responses to, potentially bot-driven activity.
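On the server side, hCaptcha's public developer documentation describes a `siteverify` endpoint that accepts a `secret`, the client's `response` token, and optional `remoteip` and `sitekey` fields, and returns `success` plus, for Enterprise customers, a `score`. The sketch below builds such a request without `remoteip`, consistent with the FAQ's statement that server-side interactions do not share the client IP, and reads the score from the reply. It is an assumption-based illustration; how Wikimedia's integration actually calls the endpoint or stores the score is not described on this page.

```python
import json
from typing import Optional, Tuple
from urllib.parse import urlencode

# Public siteverify endpoint from hCaptcha's developer documentation.
SITEVERIFY_URL = "https://api.hcaptcha.com/siteverify"


def build_siteverify_body(secret: str, token: str, sitekey: Optional[str] = None) -> bytes:
    """Form-encode a siteverify request body.

    The optional 'remoteip' parameter is deliberately never set, so the
    client's IP address is not passed to hCaptcha server-side.
    """
    params = {"secret": secret, "response": token}
    if sitekey:
        params["sitekey"] = sitekey
    return urlencode(params).encode("ascii")


def parse_siteverify(raw: bytes) -> Tuple[bool, Optional[float]]:
    """Extract the pass/fail flag and, if present, the Enterprise risk score."""
    data = json.loads(raw)
    score = data.get("score")  # only returned on Enterprise plans
    return bool(data.get("success")), (float(score) if score is not None else None)
```

The body would be sent as an `application/x-www-form-urlencoded` POST to `SITEVERIFY_URL`; the network call itself is omitted to keep the sketch self-contained.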

Privacy safeguards and risks

We are taking specific technical measures to reduce the sensitivity of the information sent to, or available to, hCaptcha:

  • User traffic is routed through a Wikimedia-controlled proxy (hcaptcha.wikimedia.org) to ensure that users' IP addresses are not transmitted to hCaptcha. The proxy sends a hashed IP address to hCaptcha to allow their service to aggregate repeat requests from the same IP without needing to see the raw IP.
  • The proxy also ensures that sensitive headers and cookies are removed. This includes cookies set by Wikimedia Foundation's own traffic infrastructure (GeoIP and WMF-Uniq). These cookies will be visible in user browsers as having been sent to Wikimedia's proxies, but are dropped before being sent to hCaptcha.
  • The hCaptcha script is sandboxed when loaded into the Wikipedia session, using what hCaptcha calls "Secure Enclave" mode. This prevents their code from seeing or interfering with the page context of the user session, and prevents hCaptcha from seeing the specific URL of the page, or reading or modifying application state variables.
  • Additionally, hCaptcha discards whatever data it collects about clients visiting Wikimedia properties within 10 days.
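The page does not say which hash the proxy uses for IP addresses. One common way to give a third party a stable per-IP token that it cannot reverse is a keyed hash (HMAC) with a secret held only by the proxy. The sketch below assumes HMAC-SHA256; the function name and key handling are hypothetical.

```python
import hashlib
import hmac
import ipaddress


def hash_client_ip(ip: str, secret_key: bytes) -> str:
    """Return a keyed, non-reversible token for an IP address (hypothetical helper)."""
    # Normalise first so equivalent spellings (e.g. IPv6 zero compression)
    # produce the same token.
    canonical = ipaddress.ip_address(ip).compressed
    return hmac.new(secret_key, canonical.encode("ascii"), hashlib.sha256).hexdigest()
```

A plain, unkeyed SHA-256 of an IPv4 address would offer little protection, because the entire IPv4 space (about 4.3 billion addresses) can be hashed exhaustively. Keeping the key secret prevents that, while identical IPs still map to identical tokens, so repeat requests can be aggregated.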

These are significant mitigations, but some risks remain present. For example, a bad actor with access to internal hCaptcha data, who knew that the trial was currently limited to account creation, could correlate hCaptcha's data with a Wikimedia account creation event. This is potentially possible because the Wikimedia projects, unlike most websites, publicly log precise timestamps of many user actions, including account creation.

We expect this risk to decrease naturally as the service is used more broadly, especially once additional actions such as editing are included. Because hCaptcha does not see URLs, all Wikipedia actions look the same from its perspective and cannot be directly tied to specific actions.
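The correlation risk, and why broader coverage reduces it, can be illustrated by counting how many hCaptcha-side requests fall near a publicly logged timestamp. This is a simplified, hypothetical model: the five-second window and the event lists are arbitrary, and real correlation would involve more signals.

```python
from datetime import datetime, timedelta
from typing import List


def candidate_matches(public_ts: datetime, hcaptcha_ts: List[datetime],
                      window: timedelta = timedelta(seconds=5)) -> List[datetime]:
    """Return hCaptcha-side request times within `window` of a public log entry."""
    return [t for t in hcaptcha_ts if abs(t - public_ts) <= window]
```

When only account creation is protected, a single nearby request is a strong match for a logged account creation. Once edits also generate hCaptcha traffic, several requests share the same window, and any one match becomes ambiguous.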

More generally, the web is a complex platform, and despite our best efforts to drop cookies and sandbox iframes, there is always some risk of gaps in how we constrain the security and privacy implications of embedding third-party code in the wikis. We will continue looking for ways to strengthen our sandboxing approach, and we welcome community analysis and recommendations in this area.

Deployment timeline

  • : Deployment to test2wiki
  • : Rollout to selected trial Wikipedias
  • October 2025 onwards: Account creation trial evaluation & reporting
  • : Rollout of hCaptcha bot detection on edits for wikitext editor
  • : Work on VisualEditor, MobileFrontend, DiscussionTools and other editing interfaces

Contact

Subscribe to the newsletter

FAQ

Does hCaptcha have access to my IP address?

No. All client-side interactions with hCaptcha's servers are routed through a Wikimedia proxy, and IP addresses are hashed before reaching hCaptcha. Additionally, server-side interactions with hCaptcha do not share the client IP.

Is hCaptcha performing browser fingerprinting?

hCaptcha evaluates thousands of signals from the browser, and the requests that browser makes to hCaptcha, in order to generate a bot detection risk score.

Exactly what is being analyzed is deliberately opaque and can change over time. Our privacy model for this trial is designed around the assumption that what hCaptcha does analyze is useful in identifying a device over time. Our privacy protections are intended to reduce the sensitivity of any collected data by disconnecting it from other information (including the user's IP, specific URL, and other cookies), and for that collected data to be discarded after a short time.

What is the data retention period for Wikimedia data sent to hCaptcha?

hCaptcha discards data about clients visiting Wikimedia properties within 10 days.
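As a generic illustration of a 10-day retention rule (hCaptcha's actual deletion mechanism is not described here), a periodic purge could drop every record older than the window. The record layout and function name are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Any, List, Tuple

RETENTION = timedelta(days=10)


def purge_expired(records: List[Tuple[datetime, Any]],
                  now: datetime) -> List[Tuple[datetime, Any]]:
    """Keep only records collected within the last RETENTION period."""
    cutoff = now - RETENTION
    return [(ts, payload) for ts, payload in records if ts >= cutoff]
```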

Why does it look like the GeoIP and WMF-Uniq cookies are being sent to hCaptcha?

These cookies are generated by WMF's traffic layer. We unset these cookies at the proxy level (code), so even though you see them being sent to the proxy, they are not forwarded onwards to hCaptcha.
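The proxy's real implementation is the one linked as "code" above and is not reproduced here. The sketch below only illustrates the idea: drop the GeoIP and WMF-Uniq cookies, plus headers that could reveal the client IP or the page URL, before forwarding a request. The specific header list (`X-Forwarded-For`, `X-Real-IP`, `Forwarded`, `Referer`) is an assumption.

```python
from typing import Dict

STRIPPED_COOKIES = {"GeoIP", "WMF-Uniq"}
# Assumed set of headers that could expose the client IP or the page being viewed.
STRIPPED_HEADERS = {"x-forwarded-for", "x-real-ip", "forwarded", "referer"}


def filter_cookie_header(value: str) -> str:
    """Remove the WMF traffic-layer cookies from a Cookie header value."""
    kept = []
    for part in value.split(";"):
        part = part.strip()
        if not part:
            continue
        name = part.split("=", 1)[0].strip()
        if name not in STRIPPED_COOKIES:
            kept.append(part)
    return "; ".join(kept)


def sanitize_headers(headers: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the request headers that is safe to forward."""
    out = {}
    for name, value in headers.items():
        lower = name.lower()
        if lower in STRIPPED_HEADERS:
            continue
        if lower == "cookie":
            value = filter_cookie_header(value)
            if not value:
                continue  # drop the header entirely if no cookies remain
        out[name] = value
    return out
```

This is why the browser still shows the cookies on the request to the proxy: filtering happens on the proxy's outbound side.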

Where will hCaptcha appear on Wikimedia projects?

At the beginning of the trial, only on Special:CreateAccount on the wikis that are part of the trial. We are likely to expand the trial later to cover some kinds of higher-risk editing. Any workflows not covered by hCaptcha will continue to be covered by the existing CAPTCHA system, for now.

What does 99.9% passive mode mean?

It means hCaptcha runs silently in the background for almost everyone. Only about 0.1% of users will see a challenge. The challenge is issued when hCaptcha needs more interaction data to generate its risk score.
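A rough worked example of what "about 0.1%" means at scale follows. It assumes each protected action is challenged independently at a fixed 0.1% rate; in reality hCaptcha decides based on its signals, so challenges probably concentrate on sessions it is unsure about rather than falling at random.

```python
def expected_challenges(attempts: int, rate: float = 0.001) -> float:
    """Expected number of visible challenges if ~0.1% of attempts are challenged."""
    return attempts * rate


def chance_of_any_challenge(actions: int, rate: float = 0.001) -> float:
    """Probability that one user sees at least one challenge over `actions`
    protected actions, assuming each is challenged independently at `rate`."""
    return 1 - (1 - rate) ** actions
```

Under these assumptions, 100,000 protected actions would produce about 100 challenges, and a user performing 100 protected actions would have roughly a 9.5% chance of seeing at least one.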

How difficult is it for an attacker to bypass the hCaptcha challenge?

We expect hCaptcha to raise the barrier significantly for bad actors, while making it less likely that a good-faith human user will be flagged as a bot.

What about accessibility?

hCaptcha offers text-based challenges that work with screen readers. These are available in around 110 languages, with other languages available via machine translation. Visitors with sight issues or other accessibility needs can complete the text-based challenges using only their keyboard.

Can we use hCaptcha's accessibility cookie feature?

Unfortunately, not on Wikimedia properties. Users can set an accessibility cookie for any hCaptcha-using website, as described on hCaptcha's accessibility page. However, this is not enabled on Wikimedia's hCaptcha integration, because the feature is not compatible with the privacy proxy approach that Wikimedia uses to avoid sending IP addresses and other metadata to hCaptcha.

How will hCaptcha work for no-JavaScript users?

During this trial, JavaScript will be required to perform any action protected by hCaptcha, starting with account creation. We will specifically measure the level of impact on users without JavaScript during the trial.

Why not build our own technology instead of relying on a third-party service?

Wikimedia already has a CAPTCHA system, but improvements to it over time haven't stopped large-scale spamming. With bots and abuse becoming more sophisticated, especially with AI, it makes sense to try a more modern solution. Organizations dedicated to running bot detection services have far more expertise and resources to devote to this problem than we do, especially for the ongoing cat-and-mouse game of bot detection and evasion, which changes every year.

How will hCaptcha work with unsupported languages?

hCaptcha supports 100+ languages. However, Wikipedia is available in more languages than that, so we are investigating (T399491) how we could integrate our language-fallback system to fill these gaps.
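One possible shape for that integration is to walk a MediaWiki-style fallback chain until reaching a language hCaptcha supports, then pass that code through hCaptcha's `hl` language parameter. Both data sets below are small illustrative stand-ins, not hCaptcha's actual supported list or MediaWiki's real fallback table, and the investigation in T399491 may settle on a different approach.

```python
from typing import Dict, List, Set

# Illustrative stand-ins only, not real hCaptcha or MediaWiki data.
HCAPTCHA_LANGS: Set[str] = {"en", "de", "fr", "pt", "ru", "vi"}
FALLBACKS: Dict[str, List[str]] = {"de-at": ["de"], "pt-br": ["pt"]}


def pick_hcaptcha_lang(code: str, supported: Set[str] = HCAPTCHA_LANGS,
                       fallbacks: Dict[str, List[str]] = FALLBACKS,
                       default: str = "en") -> str:
    """Return the first language in `code`'s fallback chain that hCaptcha supports."""
    for candidate in [code, *fallbacks.get(code, [])]:
        if candidate in supported:
            return candidate
    return default  # MediaWiki fallback chains ultimately end in English
```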

What happens if hCaptcha has an outage or otherwise stops functioning?

We would rely on our existing CAPTCHA as a fallback until we identify another solution.
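The answer above only says the existing CAPTCHA would take over. One common way to automate such a switch is a circuit breaker; the sketch below is hypothetical, including the class name, the thresholds, and the choice of technique itself.

```python
import time
from typing import Callable


class CaptchaSelector:
    """Route to hCaptcha while healthy; fall back to FancyCaptcha otherwise.

    After `max_failures` consecutive errors, FancyCaptcha is used for
    `cooldown` seconds before hCaptcha is tried again.
    """

    def __init__(self, max_failures: int = 3, cooldown: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.cooldown = cooldown
        self.clock = clock
        self.failures = 0
        self.open_until = float("-inf")  # breaker starts closed

    def choose(self) -> str:
        """Pick which CAPTCHA to serve for the next request."""
        return "fancycaptcha" if self.clock() < self.open_until else "hcaptcha"

    def record(self, ok: bool) -> None:
        """Report the outcome of a call to hCaptcha."""
        if ok:
            self.failures = 0
            return
        self.failures += 1
        if self.failures >= self.max_failures:
            self.open_until = self.clock() + self.cooldown
            self.failures = 0
```

The cooldown lets the system retry hCaptcha automatically once it recovers, instead of requiring a manual configuration change.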

See also