Extension:Cognate


Release status: stable
Implementation: Database
Description: Link different language versions of a page by using the page title.
Author(s): Gabriel Birke, Addshore
Latest version: Continuous updates
Compatibility policy: release branches
MediaWiki: 1.29+
Database changes: Yes
Tables: cognate_sites, cognate_pages, cognate_titles
License: GNU General Public License 2.0 or later
Hooks used: PageContentSaveComplete, LanguageLinks, ArticleDeleteComplete

The Cognate extension creates a central store where the page titles for a group of sites are stored. It can then generate interwiki links across wiki projects in cases where the titles are the same. It was developed to solve the "Centralize interwiki language links for Wiktionary" task.

"Cognate" is a linguistic concept, referring to words in different languages developed from the same origin. This means that this extension is misnamed—since this extension links pages with the same title across wikis, a proper name would be “Homograph”.

Assumptions and restrictions

  • Pages must be in one of the standard MediaWiki namespaces.
  • Page titles are the same across languages (with some simple normalization applied).
  • Sites should have the same interwiki structure for language links.
  • Pages should not contain interlanguage links in wikitext, as these will override the links provided by Cognate.
  • Unexpected hash conflicts are unlikely but could occur, and would result in unexpected language links.

How it works

Title Normalization

Very simple title normalization (reduction to ASCII) occurs within the extension. This can be seen in the StringNormalizer class.

The initial set of normalizations is deliberately small. Requests to expand it can be made and will be considered on a case-by-case basis.

String | Normalized | Notes
Hello… | Hello... | The raw string contains an ellipsis character. This is normalized to three "." characters.
lepelle’ | lepelle' | The normalized string has a normalized apostrophe.
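
The two rows above can be sketched as a small character-replacement step. This is only an illustration in Python, not the extension's code: the `REPLACEMENTS` table and the `normalize_title` name are hypothetical, and the real StringNormalizer class may handle more characters than the two documented here.

```python
# Hypothetical replacement table covering only the two documented examples;
# the real StringNormalizer class may normalize more characters than this.
REPLACEMENTS = {
    "\u2026": "...",  # horizontal ellipsis -> three period characters
    "\u2019": "'",    # right single quotation mark -> ASCII apostrophe
}


def normalize_title(title: str) -> str:
    """Return the title with each mapped character replaced by its ASCII form."""
    for raw, plain in REPLACEMENTS.items():
        title = title.replace(raw, plain)
    return title


print(normalize_title("Hello\u2026"))    # Hello...
print(normalize_title("lepelle\u2019"))  # lepelle'
```

Titles that contain none of the mapped characters pass through unchanged, so normalizing an already normalized title is a no-op.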

Title Hashing

Titles are hashed using SHA-256. This can be seen in the StringHasher class.

Part of the hash is then stored in the database in a BIGINT field for efficient lookups.

A 64-bit field can hold 2^64 = 18,446,744,073,709,551,616 distinct values.

String | Hash | Int
A | 559AEAD08264D5795D3909718CDD05ABD49572E84FE55590EEF31A88A08FDFFD | 6168500820899059065
Foo | 1CBEC737F863E4922CEE63CC2EBBFAAFCD1CFF8B790D8CFD2E6A5D550B648AFA | 2071311921841431698
1234567890 | C775E7B757EDE630CD0AA1113BD102661AB38829CA52A6422AB782862F268646 | -4074095513246505424
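
The integers in the table are consistent with taking the first 8 bytes of the SHA-256 digest and reading them as a signed, big-endian 64-bit number (for example, 0x559AEAD08264D579 is 6168500820899059065). The Python sketch below works under that assumption; the function names are hypothetical and the StringHasher class itself may be implemented differently.

```python
import hashlib


def title_hash_hex(title: str) -> str:
    """Full SHA-256 digest of the UTF-8 encoded title, as uppercase hex."""
    return hashlib.sha256(title.encode("utf-8")).hexdigest().upper()


def title_hash_int(title: str) -> int:
    """Integer form of the hash that would fit a signed 64-bit column.

    Assumption: the stored value is the first 8 bytes of the digest,
    interpreted as a signed big-endian integer.
    """
    digest = hashlib.sha256(title.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], byteorder="big", signed=True)


for title in ("A", "Foo", "1234567890"):
    print(title, title_hash_hex(title), title_hash_int(title))
```

Because the value is signed, any digest whose first byte is 0x80 or higher yields a negative integer, which is why the "1234567890" row (digest starting with C7) is negative.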

Matching Hashes

Titles that should be linked are assumed to be identical after normalization, so they produce the same hash and therefore the same integer in the database.

Some sample data might look as follows when loading the "Foo..." page on enwiktionary.

Wiki | Title | Hash Int | Normalized Hash Int | Notes
enwiktionary | Foo... | 395730596998145766 | 395730596998145766 | Matched row
frwiktionary | Foo… | -7435652355441782233 | 395730596998145766 | Matched row, even though the pre-normalization title includes the ellipsis character.
dewiktionary | Foo... | 395730596998145766 | 395730596998145766 | Matched row
arwiktionary | Foo | 2071311921841431698 | 2071311921841431698 |
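
The matching in the sample data can be sketched as grouping pages by the hash of their normalized title. The Python below is a self-contained illustration, not the extension's code or its actual database queries: the helper names are hypothetical, normalization covers only the ellipsis case, and the integer is assumed to be the first 8 bytes of the SHA-256 digest read as a signed big-endian value.

```python
import hashlib
from collections import defaultdict


def normalize_title(title: str) -> str:
    """Hypothetical normalizer: only the ellipsis replacement is modelled."""
    return title.replace("\u2026", "...")


def title_hash_int(title: str) -> int:
    """Assumed scheme: first 8 digest bytes as a signed big-endian integer."""
    digest = hashlib.sha256(title.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], byteorder="big", signed=True)


def group_by_normalized_hash(pages):
    """Map each normalized-title hash to the (wiki, title) pairs sharing it."""
    groups = defaultdict(list)
    for wiki, title in pages:
        groups[title_hash_int(normalize_title(title))].append((wiki, title))
    return dict(groups)


pages = [
    ("enwiktionary", "Foo..."),
    ("frwiktionary", "Foo\u2026"),  # raw title uses the ellipsis character
    ("dewiktionary", "Foo..."),
    ("arwiktionary", "Foo"),
]

groups = group_by_normalized_hash(pages)
for key, members in groups.items():
    print(key, members)
```

Under these assumptions the first three pages land in one group and would link to each other, while the arwiktionary "Foo" page ends up in a group of its own.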

Overwriting

The automatic links provided by Cognate can be overridden simply by adding one or more interwiki links to the page.

Consequently, for Cognate's links to take effect once the extension is deployed, pages should not contain interlanguage links in their wikitext.

Testing

The extension can be tested on beta wiktionary sites:

These sites are linked together using the Cognate extension with added interwiki sorting provided by the InterwikiSorting extension.

Installation

  • Download and place the file(s) in a directory called Cognate in your extensions/ folder.
  • Add the following code at the bottom of your LocalSettings.php:
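    wfLoadExtension( 'Cognate' );
    // Name of the database that holds the shared Cognate tables.
    $wgCognateDb = 'cognate_wiktionary';
    // Database cluster on which that database lives.
    $wgCognateCluster = 'cognate';
    // Namespace IDs that Cognate operates on (0 is the main namespace).
    $wgCognateNamespaces = [ 0 ];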
    
  • Run the update script (maintenance/update.php), which will automatically create the database tables that this extension needs.
  • Done - Navigate to Special:Version on your wiki to verify that the extension is successfully installed.
  • Populate the sites table by running the populateCognateSites.php maintenance script. Sites must already exist in the MediaWiki sites table with the correct groupings.
    php ./maintenance/populateCognateSites.php --site-group=wiktionary
  • Populate the page and title tables by running the populateCognatePages.php maintenance script.
    php ./maintenance/populateCognatePages.php