
User:AKlapper (WMF)/Bitergia data quality queries

From mediawiki.org

The data behind wikimedia.biterg.io regularly needs updates to keep our metrics reliable. The database can be queried and edited via the Sortinghat Identities API; it can also be edited via the web interface.

For convenience this page lists GraphQL queries and bash scripts that User:AKlapper (WMF) may occasionally run.

Find accounts which likely should have an affiliation / enrollment

  • By potential email address:
    • query { individuals(filters:{term: "@wikimedia.org", isEnrolled:false, isBot:false}, page: 1, pageSize: 100) { pageInfo { page pageSize numPages hasNext totalResults } entities { mk profile { name email isBot } } } }
    • query { individuals(filters:{term: "@wikimedia.de", isEnrolled:false, isBot:false}, page: 1, pageSize: 100) { pageInfo { page pageSize numPages hasNext totalResults } entities { mk profile { name email isBot } } } }
    • query { individuals(filters:{term: "@wikimedia.se", isEnrolled:false, isBot:false}, page: 1, pageSize: 100) { pageInfo { page pageSize numPages hasNext totalResults } entities { mk profile { name email isBot } } } }
    • query { individuals(filters:{term: "hallowelt", isEnrolled:false, isBot:false}, page: 1, pageSize: 100) { pageInfo { page pageSize numPages hasNext totalResults } entities { mk profile { name email isBot } } } }
    • query { individuals(filters:{term: "speedandfunction", isEnrolled:false, isBot:false}, page: 1, pageSize: 100) { pageInfo { page pageSize numPages hasNext totalResults } entities { mk profile { name email isBot } } } }
    • query { individuals(filters:{term: "thisdot.co", isEnrolled:false, isBot:false}, page: 1, pageSize: 100) { pageInfo { page pageSize numPages hasNext totalResults } entities { mk profile { name email isBot } } } }
  • By potential username:
    • query { individuals(filters:{term: "(WMF)", isEnrolled:false, isBot:false}, page: 1, pageSize: 100) { pageInfo { page pageSize numPages hasNext totalResults } entities { mk profile { name email isBot } } } }
    • query { individuals(filters:{term: "-WMF", isEnrolled:false, isBot:false}, page: 1, pageSize: 100) { pageInfo { page pageSize numPages hasNext totalResults } entities { mk profile { name email isBot } } } }
    • query { individuals(filters:{term: "(WMDE)", isEnrolled:false, isBot:false}, page: 1, pageSize: 100) { pageInfo { page pageSize numPages hasNext totalResults } entities { mk profile { name email isBot } } } }
    • query { individuals(filters:{term: "-WMDE", isEnrolled:false, isBot:false}, page: 1, pageSize: 100) { pageInfo { page pageSize numPages hasNext totalResults } entities { mk profile { name email isBot } } } }
  • Look at GitLab accounts and check whether they should be merged into existing accounts (very cumbersome, see phab:T306770; email addresses and/or group membership could be checked manually on https://ldap.toolforge.org/user/someusername but that does not scale):
    • query { individuals(filters:{isEnrolled:false, isBot:false, source:"gitlab"}, page: 1, pageSize: 100) { pageInfo { page pageSize numPages hasNext totalResults } entities { mk profile { name email isBot } } } }
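The term-based queries above differ only in the search term, so they can be generated instead of maintained by hand. The following is a minimal sketch, not an existing script: the function name build_query and the two term lists are made up for illustration, and only the query text itself is taken from the list above. Sending the query to the Sortinghat API (endpoint, authentication) is deliberately left out.

```python
import json

# Field selection copied from the queries listed above.
SELECTION = (
    "pageInfo { page pageSize numPages hasNext totalResults } "
    "entities { mk profile { name email isBot } }"
)


def build_query(term, page=1, page_size=100):
    """Return the query text for unenrolled, non-bot individuals matching term.

    Hypothetical helper: it only builds the string, it does not send it.
    """
    # json.dumps produces a double-quoted string literal with escaping.
    filters = f"term: {json.dumps(term)}, isEnrolled:false, isBot:false"
    return (
        f"query {{ individuals(filters:{{{filters}}}, "
        f"page: {page}, pageSize: {page_size}) {{ {SELECTION} }} }}"
    )


# Search terms taken from the queries above.
EMAIL_TERMS = ["@wikimedia.org", "@wikimedia.de", "@wikimedia.se",
               "hallowelt", "speedandfunction", "thisdot.co"]
USERNAME_TERMS = ["(WMF)", "-WMF", "(WMDE)", "-WMDE"]

if __name__ == "__main__":
    for term in EMAIL_TERMS + USERNAME_TERMS:
        print(build_query(term))
```

When hasNext is true in a result, the same helper can be called again with page=2 and so on to get the query for the next page.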

Queries not possible due to GraphQL limitations

  • To identify folks that should have an affiliation set, use hostnames of email addresses of user accounts in the Phabricator database, then re-use those usernames as a condition in a GraphQL query on the Bitergia database.
  • To find duplicate Phabricator accounts which only changed their "Also Known As" (as long as phab:T305230 remains unresolved): Query for individuals which share the very same name and both have source:"phabricator" but have different mks.
    • Same applies to any other source which allows renaming accounts.
  • To find accounts with the same email address to merge: Query for individuals which share the very same email but have different mks.
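The duplicate checks in this list cannot be expressed as a single GraphQL query, but they could be done client-side once all pages of a query have been downloaded. The following is a rough sketch under that assumption: the function name find_shared_values is hypothetical, the input is assumed to be a list of entities shaped like the results of the queries on this page (mk plus profile with name and email), and the sample records are made up.

```python
from collections import defaultdict


def find_shared_values(entities, field, ignore_case=False):
    """Map each profile value of `field` to the mks sharing it.

    Only values used by at least two different mks are returned.
    Hypothetical helper working on already downloaded query results.
    """
    mks_by_value = defaultdict(set)
    for entity in entities:
        value = (entity.get("profile") or {}).get(field)
        if not value:
            continue  # skip profiles without this field
        key = value.strip()
        if ignore_case:
            key = key.lower()
        mks_by_value[key].add(entity["mk"])
    return {
        value: sorted(mks)
        for value, mks in mks_by_value.items()
        if len(mks) > 1
    }


if __name__ == "__main__":
    # Made-up sample records, not real data.
    sample = [
        {"mk": "aaa111", "profile": {"name": "Example User", "email": "user@example.org"}},
        {"mk": "bbb222", "profile": {"name": "Example User", "email": "User@example.org"}},
        {"mk": "ccc333", "profile": {"name": "Someone Else", "email": None}},
    ]
    print(find_shared_values(sample, "name"))
    print(find_shared_values(sample, "email", ignore_case=True))
```

For the Phabricator case, the input would be limited to entities fetched with source:"phabricator" in the filters and grouped by name; for the email case, grouping by email with ignore_case=True is an assumption about how addresses should be compared.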

Check whether detached accounts with the same mw and phab usernames are connected, in order to merge them


Expensive / time-intensive. See the script and DB commands.

Query all existing Phab accounts about their connected MediaWiki.org accounts


As of 2025 this is not easily possible. The old script or phab:T170091 do not work anymore as we have no local database dump.

Notes on automated server-side merging / unifying and recommendations


Note that automatic affiliation assignment to an organization based on the email address only works when manually adding the email address to an identity (one single data source) but not when adding to a profile.