Extension talk:Cognate

Cognate for Wiktionary FAQ
Hello there,

Following the announcement on all Wiktionaries, I will collect here the questions that people frequently ask me, then provide answers to all of them :) If a question doesn't have an answer yet, don't worry: it can take a few more hours to collect all the technical details needed to answer you properly. Lea Lacroix (WMDE) (talk) 11:24, 13 April 2017 (UTC)

What kind of Wiktionary pages will Cognate affect?
Only the main namespace pages. All the other pages (community discussion, etc.) will in the future be organized via sitelinks on Wikidata, just as is done for Wikipedia articles and community pages. For now, all these pages are still linked via wikitext links. We will let you know once this part of the plan starts. Lea Lacroix (WMDE) (talk) 11:24, 13 April 2017 (UTC)
 * The sorting of interwiki links will occur on ALL namespaces.  ·addshore·  talk to me! 14:21, 13 April 2017 (UTC)

How can we access the list of links?
Using an example on beta wiktionary, the page 232 contains no interwiki links in the wikitext, but interwiki links still appear in the sidebar, generated by the Cognate extension. These interwiki links work in exactly the same way as the links provided by Wikibase on WikibaseClients: they are added to the parser output. As a result, you can see them through the API using the parse action API module (as well as other API modules that expose interlanguage links). Once the Cognate database tables are replicated to the labs databases, you will also be able to query for interlanguage links there.  ·addshore·  talk to me! 14:19, 13 April 2017 (UTC)
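As an illustration of the parse-action route described above, here is a minimal Python sketch. It only builds the request URL and reads language links out of an already-decoded JSON response; it makes no network call. The endpoint in the usage note and the helper names `langlinks_query_url` and `extract_langlinks` are assumptions for illustration, and the field that carries the target title depends on the API format version, so the sketch accepts either `title` or `*`.

```python
from urllib.parse import urlencode


def langlinks_query_url(api_base: str, title: str) -> str:
    """Build an action=parse request URL that asks only for language links."""
    params = {
        "action": "parse",
        "page": title,
        "prop": "langlinks",
        "format": "json",
    }
    return f"{api_base}?{urlencode(params)}"


def extract_langlinks(response: dict) -> list:
    """Return (language code, title) pairs from a decoded parse response.

    The title key is "*" in the legacy JSON format and "title" in the
    newer one; which one you get is an assumption about the request.
    """
    links = response.get("parse", {}).get("langlinks", [])
    return [(link["lang"], link.get("title", link.get("*"))) for link in links]
```

For example, `langlinks_query_url("https://en.wiktionary.org/w/api.php", "veni, vidi, vici")` yields a URL that can be fetched with any HTTP client, and the decoded result can be passed to `extract_langlinks`.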

How could we notify/filter the edits containing manual interlanguage links?
An abuse filter could be set up to detect this using a regex. en:Special:AbuseFilter/270 was a filter created to detect the removal of interwiki links on enwiki and can likely easily be modified to detect additions.  ·addshore·  talk to me! 14:21, 13 April 2017 (UTC)
 * On German Wiktionary, two abusefilters have been created: 21 and 23. Lea Lacroix (WMDE) (talk) 15:14, 24 April 2017 (UTC)
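To give an idea of what such a regex could look like, here is a rough Python sketch that finds manual interlanguage links in wikitext. It is not the syntax of the abuse filters mentioned above, and the pattern is an approximation: it assumes language codes are lowercase, two or three letters long with optional hyphenated suffixes, so short interwiki prefixes that are not languages would also match. The function name `find_manual_langlinks` is hypothetical.

```python
import re

# [[code:Title]] with no leading colon; [[:fr:chat]] is an inline link and is skipped.
MANUAL_LANGLINK = re.compile(
    r"\[\[\s*([a-z]{2,3}(?:-[a-z]+)*)\s*:\s*[^\[\]|\n]+\]\]"
)


def find_manual_langlinks(wikitext: str) -> list:
    """Return the language codes of manual interlanguage links, in order."""
    return MANUAL_LANGLINK.findall(wikitext)
```

A filter built on this idea would compare the result for the old and new wikitext of an edit and flag the edit when new codes appear.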

Does Cognate create links to redirection pages?
Not for now. It is technically possible, and we can enable this feature if the communities reach a consensus on this and request it. Lea Lacroix (WMDE) (talk) 15:14, 24 April 2017 (UTC)

Wiktionary discussion
Was there any discussion about this on Wiktionary with Wiktionary people? I wrote wiktionary-l/2016-December/001376.html for now. --Nemo 16:19, 18 December 2016 (UTC)
 * Yes, we asked the communities and collected their feedback here and below. Lea Lacroix (WMDE) (talk) 09:59, 14 February 2017 (UTC)
 * Sounds like a "no". I specifically wrote "on Wiktionary with Wiktionary people". --Nemo 18:01, 14 February 2017 (UTC)
 * We have been contacted multiple times on fr: and en: Beer Parlours, although not so much recently because the extension was in development. Darkdadaah (talk) 14:01, 15 February 2017 (UTC)

Title whinge (minor)
It is usually a really bad idea to take a specific jargon and apply it as the title to something which is similar but different yet intended to be used by the audience which uses the precise jargon you are coopting. E.g. log rollers used under heavy objects to facilitate movement should be called wheels (even when adapted to factory use - Lineshaft roller conveyors should be wheel conveyors, right?) Or all systems for converting source code to commands should be called assemblers. - Amgine (talk) 20:39, 18 December 2016 (UTC)
 * I tend to agree, it is not immediately clear that all this extension does is replace interlanguage links. Darkdadaah (talk) 14:03, 15 February 2017 (UTC)
 * I also. I was terribly confused by what this was supposed to do, even though I remember the early discussion. The extension would be better called "sametitle". --EncycloPetey (talk) 23:37, 18 March 2017 (UTC)
 * Indeed. The functionality has exactly zero to do with cognates. Very poor choice of names. ‑‑ Eiríkr Útlendi │ Tala við mig 17:17, 21 March 2017 (UTC)

Interwiki sorting
Is it normal that the French interwiki link appears before the Deutsch one here: http://enwiktionary-cognate.wmflabs.org/index.php/Test ? Looks like a bug. Automatik (talk) 20:21, 6 March 2017 (UTC)

I wish IW links would still follow the MediaWiki:Interwiki config-sorting order on each local wiki. See also Interwiki sorting order. --Octahedron80 (talk) 08:36, 13 April 2017 (UTC)


 * Hi both!
 * The labs instance has now been turned off; I'll update the extension page shortly. This was indeed a bug and has been fixed.
 * The IW link sort order will follow the sort defined in the configuration, which can be different for each wiki. The sort orders are defined in this file, and the setting for each wiki can be found in InitialiseSettings.php under wgInterwikiSortingSort. Currently all Wiktionaries default to 'code', which simply sorts by language code.
 *  ·addshore·  talk to me! 14:09, 13 April 2017 (UTC)
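The difference between sorting by code and sorting by a configured order can be sketched as follows. This is a Python illustration only, not the extension's own code, and the function names and the sample custom order are made up for the example.

```python
def sort_links_by_code(links):
    """Sort (language code, title) pairs by language code, as 'code' does."""
    return sorted(links, key=lambda link: link[0])


def sort_links_by_order(links, order):
    """Sort by position in a configured list of codes; unknown codes go last, by code."""
    rank = {code: index for index, code in enumerate(order)}
    return sorted(links, key=lambda link: (rank.get(link[0], len(order)), link[0]))
```

With `[("fr", "Test"), ("de", "Test")]`, sorting by code puts `de` first, which matches the expected result in the bug report above; a wiki-specific order list can place the same links differently.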


 * Thanks for those details. It looks like the sort order for en.wikt is not quite right: we use the sort order found at m:MediaWiki:Interwiki config-sorting order-native-languagename, which does not appear at https://phabricator.wikimedia.org/source/mediawiki-config/browse/master/wmf-config/InterwikiSortOrders.php. https://noc.wikimedia.org/conf/highlight.php?file=InitialiseSettings.php has us set to alphabetic, which is close, but has a number of differences. Can the m:MediaWiki:Interwiki config-sorting order-native-languagename order be added, and en.wikt set to that? (Or, if you believe that the order at m:MediaWiki:Interwiki config-sorting order-native-languagename is wrong and yours is right, can the order at m:MediaWiki:Interwiki config-sorting order-native-languagename be corrected?) —Ruakh TALK 18:40, 16 April 2017 (UTC)

zh.wiktionary
Hi there,

zh.wiktionary has a special way of organizing pages (/zh-hant, /zh-hans, traditional entries redirecting to simplified entries, etc.), and pywikibot had some trouble with this. Will Cognate work correctly with zh.wiktionary? Can we expect links to zh.wiktionary on traditional and simplified Chinese entries on other Wiktionaries? Thanks. --Thibaut120094 (talk) 08:44, 14 April 2017 (UTC)


 * Basically, the exact name must match, as with the current IW links, so it should work fine. I don't think it will become a problem. However, zh.wiktionary may already have some simplified/traditional page pairs that do not redirect to each other. If you see a bug, please file a ticket on Phabricator and they will solve it. --Octahedron80 (talk) 02:29, 25 April 2017 (UTC)

Can the "title normalization" feature be turned off?
Can the "title normalization" feature be turned off? It seems to be a mistake: that's not how interwikis currently work (at least on the English Wiktionary).

Thanks in advance!

—Ruakh TALK 18:32, 16 April 2017 (UTC)

Removeiw bot
Good stuff from Hydriz: pywikibot extension for removing interlanguage links on Wiktionaries. --Octahedron80 (talk) 02:40, 25 April 2017 (UTC)

Special:WithoutInterwiki
As Cognate is available on Wiktionaries, Special:WithoutInterwiki on each one should not be populated, should it? On the other hand, I wish Special:WithInterwiki were available instead. 😅 --Octahedron80 (talk) 04:37, 28 April 2017 (UTC)
 * +1 for Special:WithInterwiki, I created a ticket: phab:T164066. --Thibaut120094 (talk) 12:08, 28 April 2017 (UTC)

Wrong links on Category page
Please have a look at wikt:el:Category:Προφορικές γλώσσες. The interlanguage links are completely wrong. It seems that this page is treated as belonging to namespace 0 and the Greek characters are misinterpreted. Actually, there was a page in ns 0 that redirected to that category page; I have deleted it, yet the wrong links are still there. --Flyax (talk) 06:34, 1 May 2017 (UTC)
 * This was because of el:Πρότυπο:vsi containing interwiki links. JAn Dudík (talk) 09:02, 2 May 2017 (UTC)
 * Thanks. --Flyax (talk) 17:44, 2 May 2017 (UTC)

Mysterious malfunction
Can you please urgently look into veni, vidi, vici on WikiWoordenboek, where Cognate seems to malfunction for mysterious reasons? There are 12 other Wiktionaries with this page, but none of them shows up. These pages, on the other hand, do show an interwiki link to WikiWoordenboek. At the Polish veni, vidi, vici, where the explicit interwiki links have been removed too, Cognate seems to work fine. So why doesn't it work on the Dutch page? It probably has nothing to do with the space or the comma, because on oog om oog, tand om tand Cognate seems to do OK. The problem does not seem to be system dependent: it shows up both on a Windows 10 laptop and on an Android smartphone. --MarcoSwart (talk) 14:02, 4 May 2017 (UTC)
 * Follow-up: Cognate stops working after any page is edited. So editing WikiWoordenboek is now slowly making us lose all interwiki links. --MarcoSwart (talk) 14:37, 4 May 2017 (UTC)
 * A practical interim solution was implemented. After purging a few hundred pages everything appears to be in order again. --MarcoSwart (talk) 21:58, 4 May 2017 (UTC)

Links missing
Hello. It seems that the Cognate extension sometimes doesn't work for certain words. Maybe this error is temporary and it will work without this kind of problem in the next few days, but I thought reporting it would be helpful. For example, in the word eu:wikt:finnois interwiki links don't appear, while fr:wikt:finnois (and the same word in different languages) exists. Best regards, --Enzaiklopedia (talk) 16:43, 4 May 2017 (UTC)


 * It seems that the problem has been solved only for eu:wikt:finnois, while it still exists on lots of other pages. Also, when looking for the Basque interwiki link on, for example, fr:wikt:finnois or de:wikt:finnois (or any other language), you'll find that it doesn't appear. Thank you, --Bengoa (talk) 14:18, 5 May 2017 (UTC)

Statistics and tools
Hello,
 * Will it be possible to get statistics about Cognate?
 * Will a PetScan integration be available?
Thanks! Otourly (talk) 08:27, 25 May 2017 (UTC)
 * Hello Otourly, can you give some examples of statistics you would like to have? :) Lea Lacroix (WMDE) (talk) 10:27, 25 May 2017 (UTC)
 * Hello, I am especially thinking about "most interlinked entries" & "uninterlinked entries". Otourly (talk) 11:01, 26 May 2017 (UTC)
 * It would be useful to see the "most interlinked entries not having a page at your own wiki"; presuming that if many different wiktionaries have this entry we might want to have it too. And we might detect spelling errors on WikiWoordenboek this way. --MarcoSwart (talk) 08:52, 28 May 2017 (UTC)
 * A valuable statistic for gauging the relative completeness of a Wiktionary would be its number of interwiki-linked entries relative to the total number of all interwiki-linked entries. And in relation to the total number of entries of a Wiktionary, it would give an impression of how much "unique material" it preserves. --MarcoSwart (talk) 08:52, 28 May 2017 (UTC)
 * Another interesting feature would be a matrix of the number of interwikis between each possible pair of Wiktionaries. It could be useful in the search for volunteers with particular language skills if we can give them an impression of what is possible in quantitative terms. --MarcoSwart (talk) 08:52, 28 May 2017 (UTC)
 * Thanks! I created a ticket to list all the ideas and discuss whether and how we can provide this. Lea Lacroix (WMDE) (talk) 07:59, 29 May 2017 (UTC)
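Two of the statistics suggested above (the pairwise matrix and the "most interlinked entries missing on your own wiki" list) can be sketched in Python as follows. This is only an illustration of the counting involved, assuming the data is available as a mapping from a title to the set of wikis that have it; that input shape and both function names are hypothetical and say nothing about how the Cognate tables are actually laid out.

```python
from collections import Counter
from itertools import combinations


def pair_matrix(entries):
    """Count, for each pair of wikis, how many titles they share."""
    counts = Counter()
    for wikis in entries.values():
        for a, b in combinations(sorted(wikis), 2):
            counts[(a, b)] += 1
    return counts


def most_linked_missing_on(entries, wiki, top=10):
    """Titles absent from `wiki`, ordered by how many other wikis have them."""
    missing = [(len(wikis), title) for title, wikis in entries.items() if wiki not in wikis]
    ranked = sorted(missing, key=lambda item: (-item[0], item[1]))
    return [title for _, title in ranked[:top]]
```

Pairs are stored with the two wiki codes in alphabetical order, so each pair appears once in the matrix.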

Cognate on Wiktionary: discussion about redirects
Hello all,

I am creating this topic to have a global discussion about the way Cognate deals with redirects. This is the place where editors of different Wiktionaries can discuss it, share their views and uses, and hopefully reach a common decision, so the developers can provide a technical solution as close as possible to your needs.

''When you join this discussion, please try to keep a friendly and constructive state of mind. When you describe something, please provide examples, with links if possible. When you describe a need, please explain clearly what your problem is and what you need to fix it. If you refer to a community decision, please provide a link. Our goal is to find a solution that fits as many editors as possible, not personal needs. Thank you very much.''

Current status

 * Cognate provides automatic links between pages with the exact same name. It distinguishes every character, including letter case (foo is treated differently from FOO).
 * Cognate applies a short normalization to the title, such as replacing the ellipsis character with three dots, or the right single quotation mark (’) with a straight apostrophe ('). You can find the list of these replacements here. Some rules can be deleted or added if we find a consensus amongst the different language communities. Note, however, that changing normalization rules requires us to rebuild the Cognate database.
 * Cognate doesn't show links to redirects. Pages that are redirects are ignored by the extension and are not displayed in the automatic interlanguage links.
 * Discussions happened about this redirects issue, both for and against.
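The matching rule described above (exact, case-sensitive titles after a short normalization) can be sketched in Python. This is an illustration, not the extension's code: the rule table contains only the two replacements named in the list, and the real list of replacements may be longer. The names `NORMALIZATION_RULES`, `normalize_title` and `same_entry` are made up for the example.

```python
# Only the two replacements mentioned in the list above; assumed incomplete.
NORMALIZATION_RULES = {
    "\u2019": "'",    # right single quotation mark -> straight apostrophe
    "\u2026": "...",  # ellipsis character -> three dots
}


def normalize_title(title: str) -> str:
    """Apply each replacement rule to the title; case is left untouched."""
    for source, target in NORMALIZATION_RULES.items():
        title = title.replace(source, target)
    return title


def same_entry(title_a: str, title_b: str) -> bool:
    """Two titles are linked when their normalized forms are identical."""
    return normalize_title(title_a) == normalize_title(title_b)
```

Under these rules `’cause` and `'cause` count as the same entry, while `foo` and `FOO` do not.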

Proposal
Our proposal is to allow redirects in Cognate, which means that the extension will include pages that are redirects but will not follow them automatically.

Let's take an existing example:
 * On French Wiktionary, エッチする and エッチ are two different pages, while on English Wiktionary, エッチする is a redirect to エッチ.
 * Currently, エッチ contains an automatic link to エッチ in the sidebar, and vice versa. But エッチする has no automatic link to English, because エッチする is a redirect there.
 * What we suggest is to allow linking to redirects, so エッチする can have a link to エッチする. When clicking on this link, the user will be redirected to エッチ, but we won't link エッチする directly to エッチ.

Discussion
What do you think about this proposal? Do you have examples of uses where it would or would not work? Do you have other issues regarding the redirects? Thanks, Lea Lacroix (WMDE) (talk) 10:30, 1 June 2017 (UTC)
 * Note that currently en:エッチする with redirect=no links to fr:エッチする through Cognate, so it is reasonable to show it the other way round too. --Vriullop (talk) 14:22, 1 June 2017 (UTC)
 * I agree that normalization is not sufficient, and that redirects should be taken into account. For example, mener quelqu’un par le bout du nez should redirect to mener par le bout du nez. It's indisputably the same phrase, with the same spelling; the redirect should help readers searching for it, that's all. Lmaltier (talk) 05:50, 2 June 2017 (UTC)

Interaction of Redirects and Normalization
Allowing Cognate to include redirects in the automatic language links may have undesirable consequences. Consider what happens when redirects are created between titles that are equivalent according to Cognate normalization:
 * enwikt has a redirect ’cause -> 'cause.
 * frwikt has a redirect 'cause -> ’cause.

If we just change Cognate to include redirects, then the following will happen:
 * 'cause will have two language links to frwikt, to 'cause and to ’cause, with no way for the user to distinguish them. Clicking either link will lead to the same page (once directly, the other via a redirect).
 * On ’cause, the converse is true: it will have two language links to enwikt, one to ’cause and one to 'cause.

It seems like such redirects are quite frequent; a quick (and dirty) database query reveals about 1000 redirects of this kind on English Wiktionary (Query: )

I see two ways to address the issue of multiple language links being shown due to redirects with an "equivalent" title:
 * Make Cognate prefer the language links with the exact same title as the local page. This means only one language link is shown, but in some cases this may be the "wrong" one (the one pointing to the redirect). In particular, since English and French Wiktionaries use different conventions for the apostrophe, all the links between pages on English and French Wiktionary with an apostrophe in the title would point to redirects.
 * Make Cognate know which pages are redirects, and prefer non-redirects. Since it would be too slow to look this up in each wiki database separately, we'd have to change the Cognate database schema for this. That's not horrible, but a bit of work for developers and database administrators.

I would propose to go with the first option for now. If this is not sufficient, we can still go for the second option later. -- Daniel Kinzler (WMDE) (talk) 14:48, 1 June 2017 (UTC)
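The two options can be compared with a small Python sketch. It is only a model of the selection logic under stated assumptions: the candidates are the titles on one target wiki that normalize to the same form as the local title, and for the second option a redirect flag is assumed to be available for each candidate. Both function names are hypothetical.

```python
def pick_prefer_exact(local_title, candidates):
    """Option 1: show the candidate whose title is identical to the local one."""
    if not candidates:
        return None
    if local_title in candidates:
        return local_title
    return sorted(candidates)[0]


def pick_prefer_non_redirect(candidates):
    """Option 2: show a non-redirect candidate; needs a redirect flag per title."""
    if not candidates:
        return None
    # False sorts before True, so non-redirects come first.
    return sorted(candidates, key=lambda title: (candidates[title], title))[0]
```

Taking the example above, on enwikt the local page is `'cause` and frwikt has `'cause` (a redirect) and `’cause` (the entry). Option 1 picks `'cause`, the redirect, and the reader reaches the entry after being redirected; option 2 picks `’cause` directly.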


 * Your Option 1 would have the same result as Option 2 for the user, wouldn't it? They would click on the interwiki link and be taken to the other wiki, where they would be redirected and sent to the content-having page. So, Option 1 seems unproblematic.
 * Note also that one big reason there are redirects from the curly to the straight apostrophe on en.Wikt (for, in theory, every single page whose title contains an apostrophe) is because that was needed in order for interwiki links to fr.Wikt (via fr.Wikt having reciprocal interwikis and each page on each wiki linking to the other wiki's exactly-identically-titled page). If Cognate links L’Hôpital and wikt:L'Hôpital automatically, that reason for the redirects is gone. Another reason for redirects was to handle people who copied-and-pasted and searched one form, but improvements to the search function now handle that automatically, so that reason is gone, too. In other words, each community should discuss whether they want to simply delete the apostrophe redirects. The situation for other characters which are normalized, like ellipses, is probably similar. -sche (talk) 06:17, 2 June 2017 (UTC)
 * Note that redirects remain needed on fr.wikt, since most manual links within the wiki use the straight apostrophe. But that is an internal problem for fr and any other wikt which chooses to use curly apostrophes.
 * As for the proposal, I think that option 1 is enough. Linking to redirects is not a problem. Darkdadaah (talk) 08:45, 2 June 2017 (UTC)

Could you please link dimerc'her and dimercʼher? — TAKASUGI Shinji (talk) 03:26, 27 June 2017 (UTC)
 * This is a specific case, as the letter « cʼh » (with yet another apostrophe: U+02BC; it is 3 characters but only one letter, the 4th letter of the alphabet, between ch and d) is very specific and exists only in Breton (and in loanwords from Breton, like Breton names in French). For that case, I suggest considering the three apostrophes as equivalent. Cdlt, VIGNERON (talk) 09:44, 11 July 2017 (UTC)

I can't do a database query, so I used the grep tool to run some tests and checks. Can you confirm the following assumption: « there are never two entries which differ only by the apostrophe » (if we exclude redirects; hopefully this is true for all Wiktionaries)? There is one obvious exception: the one-character entries (', ’, same on most Wiktionaries), and there are some temporary exceptions (I've corrected a couple of wrong duplicate entries). It means that some Wiktionaries have chosen the ', some the ’ (and there are other apostrophes like ʼ), but there is never a mix. Cdlt, VIGNERON (talk) 10:10, 11 July 2017 (UTC)

Possibility of templates accessing the Cognate database
On en.Wikt, words' translations into other languages have a superscript link to a language's home wiki if it has an entry on the word in question. For example, wikt:cat has the line "Maltese: qattus (mt) m, qattusa f", because mt.Wikt has an entry for "qattus" and no entry for "qattusa".

Currently, this system is maintained by having one template that generates links, and another template that doesn't, and by having bots periodically cross-check entries in translations' tables against entries other wikis have. Hence, if mt.Wikt were to delete its entry on "qattus" but create an entry for "qattusa", a bot would eventually update wikt:cat to say "Maltese: qattus m, qattusa (mt) f".

If it were possible for templates to access the database Cognate uses to determine which wikis to show interwiki links to ("which wikis have which pages"), then the translation template could "automatically" determine whether the other wiki had a page and hence whether to display a link or not.

Is it possible for a template/module to access the database like that? If not, would it be problematic to make it possible? (Let me know if I should paste this into Phabricator as a feature request.) -sche (talk) 06:38, 2 June 2017 (UTC)
 * Note that from a Lua perspective only one request to get the whole list of interwikis would be enough for a whole page, as it can be cached. This is important because there can be hundreds of translation templates requesting interwikis, and we would not want to make a separate request for each one. This also applies to templates in fr.wiktionary and probably other languages.
 * The main issue I think would be how we refresh the pages, since currently we have to modify the pages or the templates/modules for them to be updated. So we may want to trigger an update for a page whenever its interwiki list changes, but I don't know how we can do that.
 * Finally, we should ideally expect those translation interwiki links to only appear if the corresponding lexeme/meaning exists in the other wikis, not just the whole page. But that's a problem for another time. Darkdadaah (talk) 08:33, 2 June 2017 (UTC)
 * See T163734. --Vriullop (talk) 18:51, 2 June 2017 (UTC)
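The caching point raised above can be illustrated with a short Python sketch (templates and modules on the wikis would use Lua; Python is used here only to show the idea). No such lookup interface is described in this discussion, so the `fetch` callback standing in for a Cognate lookup is purely hypothetical, as is the class name.

```python
class CognateLookupCache:
    """Remember lookups so many template calls on one page share one request per title."""

    def __init__(self, fetch):
        self._fetch = fetch        # hypothetical backend: title -> iterable of wiki codes
        self._cache = {}
        self.backend_calls = 0     # counts how often the backend was actually asked

    def has_entry(self, wiki: str, title: str) -> bool:
        """Tell whether `wiki` has a page called `title`, asking the backend at most once."""
        if title not in self._cache:
            self.backend_calls += 1
            self._cache[title] = frozenset(self._fetch(title))
        return wiki in self._cache[title]
```

In the Maltese example, a cache built over a backend that knows only "qattus" on mt would answer `has_entry("mt", "qattus")` with true and `has_entry("mt", "qattusa")` with false, and repeating either question would not trigger another backend request. Refreshing the page when the other wiki changes, the second concern above, is not addressed by this sketch.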