Extension talk:Cognate

From mediawiki.org
Cognate for Wiktionary FAQ

Hello there,

Following the announcement on all Wiktionaries, I will collect here the frequent questions that people ask me, then provide answers to all of them :) If a question doesn't have an answer yet, don't worry: it can take a few more hours to collect all the technical details needed to answer you properly. Lea Lacroix (WMDE) (talk) 11:24, 13 April 2017 (UTC)

What kind of Wiktionary pages will Cognate impact?

Only the main namespace pages. All the other pages (community discussion, etc.) will in the future be organized via sitelinks on Wikidata, just as is done for Wikipedia articles and community pages. For now, all these pages are still linked via wikitext links. We will let you know once this part of the plan starts. Lea Lacroix (WMDE) (talk) 11:24, 13 April 2017 (UTC)
The sorting of interwiki links will occur on ALL namespaces. ·addshore· talk to me! 14:21, 13 April 2017 (UTC)

How can we access the list of the links?

Using an example on beta Wiktionary, the page 232 contains no interwiki links in the wikitext, but interwiki links still appear in the sidebar, generated by the Cognate extension. These interwiki links work in exactly the same way as the links provided by Wikibase on WikibaseClients: they are added to the parser output. As a result, you can see them through the API using the parse action API module (as well as other API modules that expose interlanguage links). Once the Cognate database tables are replicated to the labs databases, you will also be able to query for interlanguage links there. ·addshore· talk to me! 14:19, 13 April 2017 (UTC)
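A minimal sketch of reading these links through the parse API module, as described above. The endpoint URL and the function names are assumptions chosen for illustration; the sketch only builds the request URL and reads an already decoded JSON response, so it does not touch the network.

```python
from urllib.parse import urlencode

# Assumed endpoint, for illustration only; any wiki's api.php can be passed instead.
API_ENDPOINT = "https://en.wiktionary.org/w/api.php"


def build_langlinks_url(title, endpoint=API_ENDPOINT):
    """Build a URL for action=parse that asks only for interlanguage links."""
    params = {
        "action": "parse",
        "page": title,
        "prop": "langlinks",
        "format": "json",
    }
    return endpoint + "?" + urlencode(params)


def extract_langlinks(response):
    """Return (language code, title) pairs from a decoded action=parse response."""
    links = response.get("parse", {}).get("langlinks", [])
    # The title sits under "*" in the legacy JSON format and under "title"
    # in formatversion=2, so both keys are checked.
    return [(link["lang"], link.get("title", link.get("*"))) for link in links]
```

Because Cognate adds its links to the parser output, a page with no interwiki links in its wikitext should still return entries here.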

How could we notify/filter the edits containing manual interlanguage links?

An abuse filter could be set up to detect this using a regex. en:Special:AbuseFilter/270 was a filter created to detect the removal of interwiki links on enwiki and can likely easily be modified to detect additions. ·addshore· talk to me! 14:21, 13 April 2017 (UTC)

On German Wiktionary, two abusefilters have been created: 21 and 23. Lea Lacroix (WMDE) (talk) 15:14, 24 April 2017 (UTC)
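As a rough illustration of the regex idea, here is a Python sketch that finds manual interlanguage links in wikitext. The pattern is an assumption, not the one used by the filters mentioned in this thread, and AbuseFilter has its own rule syntax; a real filter would also check the prefix against the list of known language codes, since a short non-language prefix would match this pattern too.

```python
import re

# Assumed pattern: "[[" + lowercase language code (2-3 letters, optional
# "-suffix") + ":" + title + "]]". A leading colon ([[:fr:chat]]) produces an
# inline link rather than an interlanguage link, so it is deliberately not matched.
INTERLANGUAGE_LINK = re.compile(
    r"\[\[\s*([a-z]{2,3}(?:-[a-z0-9-]+)?)\s*:\s*([^\]|]+?)\s*\]\]"
)


def find_interlanguage_links(wikitext):
    """Return (language code, title) pairs for links that look like interlanguage links."""
    return [(m.group(1), m.group(2)) for m in INTERLANGUAGE_LINK.finditer(wikitext)]
```

Running this on the lines added by an edit would flag additions; running it on the removed lines would flag removals.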

Does Cognate create links to redirection pages?

Not for now. It could be technically possible, and we can enable this feature if the communities reach a consensus on this and request it. Lea Lacroix (WMDE) (talk) 15:14, 24 April 2017 (UTC)

Wiktionary discussion

Was there any discussion about this on Wiktionary with Wiktionary people? I wrote mailarchive:wiktionary-l/2016-December/001376.html for now. --Nemo 16:19, 18 December 2016 (UTC)

Yes, we asked the communities and collected their feedback here and below. Lea Lacroix (WMDE) (talk) 09:59, 14 February 2017 (UTC)
Sounds like a "no". I specifically wrote "on Wiktionary with Wiktionary people". --Nemo 18:01, 14 February 2017 (UTC)
We have been contacted multiple times on fr: and en: Beer Parlours, although not so much recently because the extension was in development. Darkdadaah (talk) 14:01, 15 February 2017 (UTC)

Title whinge (minor)

It is usually a really bad idea to take a specific jargon and apply it as the title to something which is similar but different yet intended to be used by the audience which uses the precise jargon you are coopting. E.g. log rollers used under heavy objects to facilitate movement should be called wheels (even when adapted to factory use - w:Lineshaft roller conveyors should be wheel conveyors, right?) Or all systems for converting source code to commands should be called assemblers. - Amgine (talk) 20:39, 18 December 2016 (UTC)

I tend to agree, it is not immediately clear that all this extension does is replace interlanguage links. Darkdadaah (talk) 14:03, 15 February 2017 (UTC)
I also. I was terribly confused by what this was supposed to do, even though I remember the early discussion. The extension would be better called "sametitle". --EncycloPetey (talk) 23:37, 18 March 2017 (UTC)

Interwiki sorting

@Lea Lacroix (WMDE): is it normal that the French interwiki link appears before the Deutsch one here: http://enwiktionary-cognate.wmflabs.org/index.php/Test ? Looks like a bug. Automatik (talk) 20:21, 6 March 2017 (UTC)

I wish IW links would still follow the MediaWiki:Interwiki config-sorting order on each local wiki. See also meta:Interwiki sorting order. --Octahedron80 (talk) 08:36, 13 April 2017 (UTC)

Hi both!
@Automatik: The labs instance has now been turned off, I'll update the extension page shortly. This was indeed a bug and has been fixed.
The IW links sort order will follow the sort order defined in the configuration, which can be different for each wiki. The sort orders are defined in this file; the setting for each wiki can be found in InitialiseSettings.php under wgInterwikiSortingSort. Currently all Wiktionaries default to 'code', which simply sorts by language code.
·addshore· talk to me! 14:09, 13 April 2017 (UTC)
Thanks for those details. It looks like the sort order for en.wikt is not quite right: we use the sort order found at m:MediaWiki:Interwiki config-sorting order-native-languagename, which does not appear at https://phabricator.wikimedia.org/source/mediawiki-config/browse/master/wmf-config/InterwikiSortOrders.php. https://noc.wikimedia.org/conf/highlight.php?file=InitialiseSettings.php has us set to alphabetic, which is close, but has a number of differences. Can the m:MediaWiki:Interwiki config-sorting order-native-languagename order be added, and en.wikt set to that? (Or, if you believe that the order at m:MediaWiki:Interwiki config-sorting order-native-languagename is wrong and yours is right, can the order at m:MediaWiki:Interwiki config-sorting order-native-languagename be corrected?) —RuakhTALK 18:40, 16 April 2017 (UTC)
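To make the difference between the 'code' sort and a wiki-specific order concrete, here is a small sketch. The function name and the shape of the `order` argument are assumptions for illustration; this is not how the extension itself is implemented.

```python
def sort_langlinks(langlinks, order=None):
    """Sort (language code, title) pairs.

    With no explicit order this behaves like a plain sort by language code.
    With an order list (hypothetical per-wiki configuration), listed codes
    come first in that order and unlisted codes follow, sorted by code.
    """
    if order is None:
        return sorted(langlinks, key=lambda link: link[0])
    rank = {code: index for index, code in enumerate(order)}
    return sorted(langlinks, key=lambda link: (rank.get(link[0], len(rank)), link[0]))
```

Under a plain sort by code, "de" comes before "fr", which is why the French link appearing before the German one looked like a bug.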

zh.wiktionary

Hi there,

zh.wiktionary has a special way to organize pages (/zh-hant, /zh-hans, traditional entries redirect to simplified entries, etc.), and pywikibot had some trouble with this. Will Cognate work correctly with zh.wiktionary? Can we expect links to zh.wiktionary on traditional and simplified Chinese entries on other Wiktionaries? Thanks. --Thibaut120094 (talk) 08:44, 14 April 2017 (UTC)

Basically, the exact name must be mapped like recent IW links, so it should work fine. I think it will not become a problem. However, zh.wiktionary may already have some pairs of simplified/traditional pages that do not redirect to each other. If you see a bug, please file a ticket at Phabricator and they will solve it. --Octahedron80 (talk) 02:29, 25 April 2017 (UTC)

Can the "title normalization" feature be turned off?

Can the "title normalization" feature be turned off? It seems to be a mistake: that's not how interwikis currently work (at least on the English Wiktionary).

Thanks in advance!
RuakhTALK 18:32, 16 April 2017 (UTC)

Removeiw bot

Good stuff from Hydriz: pywikibot extension for removing interlanguage links on Wiktionaries. [1] --Octahedron80 (talk) 02:40, 25 April 2017 (UTC)

Special:WithoutInterwiki

@Lea Lacroix (WMDE): As Cognate is available on Wiktionaries, Special:WithoutInterwiki of each one should not populate, should it? On the other hand, I wish Special:WithInterwiki were available instead. 😅 --Octahedron80 (talk) 04:37, 28 April 2017 (UTC)

+1 for Special:WithInterwiki, I created a ticket: phab:T164066. --Thibaut120094 (talk) 12:08, 28 April 2017 (UTC)

Wrong links on Category page

Please have a look at wikt:el:Category:Προφορικές γλώσσες. The interlanguage links are completely wrong. It seems that this page is treated as belonging to namespace 0 and the Greek characters are misinterpreted. Actually, there was a page in ns 0 that redirected to that category page; I have deleted it, yet the wrong links are still there. --Flyax (talk) 06:34, 1 May 2017 (UTC)

This was because of el:Πρότυπο:vsi containing interwiki links. JAn Dudík (talk) 09:02, 2 May 2017 (UTC)
Thanks. --Flyax (talk) 17:44, 2 May 2017 (UTC)

Mysterious malfunction

Can you please urgently look into veni, vidi, vici on WikiWoordenboek, where Cognate seems to malfunction for mysterious reasons. There are 12 other wiktionaries with this page, but none of them shows up. These pages on the other hand do show an interwikilink to WikiWoordenboek. At the Polish veni, vidi, vici, where the explicit interwikilinks are removed too, Cognate seems to work fine. So why doesn't it on the Dutch page? It probably does not have to do with the space or the comma, because on oog om oog, tand om tand Cognate seems to do OK. The problem seems not to be system dependent: it shows up both on a Windows 10 laptop and an Android smartphone. --MarcoSwart (talk) 14:02, 4 May 2017 (UTC)

Follow up. Cognate is not working after editing any page. So editing WikiWoordenboek now slowly leads us to losing all interwikilinks. --MarcoSwart (talk) 14:37, 4 May 2017 (UTC)
A practical interim solution was implemented. After purging a few hundred pages everything appears to be in order again. --MarcoSwart (talk) 21:58, 4 May 2017 (UTC)
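For anyone who needs to purge a few hundred pages, this sketch prepares batched parameters for the API's purge module. The batch size of 50 and the function name are assumptions, and the comment above does not say how the pages were actually purged. The parameters must be sent as a POST request, which the sketch leaves to the caller.

```python
def purge_requests(titles, batch_size=50):
    """Yield POST parameter dicts for action=purge, batch_size titles at a time."""
    for start in range(0, len(titles), batch_size):
        batch = titles[start:start + batch_size]
        yield {
            "action": "purge",
            "titles": "|".join(batch),  # the API separates multiple titles with "|"
            "forcelinkupdate": "1",     # also ask for the links tables to be refreshed
            "format": "json",
        }
```

Whether a purge is enough to bring back missing Cognate links in a given situation is not confirmed here; a later thread on this page reports that null edits on both sides were needed.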

Links missing

Hello. It seems that the Cognate extension sometimes doesn't work for certain words. Maybe this error is temporary and it will work without this kind of problem in the next few days, but I thought reporting it would be useful. For example, in the word eu:wikt:finnois interwiki links don't appear, while fr:wikt:finnois (and the same word in different languages) exists. Best regards, --Enzaiklopedia (talk) 16:43, 4 May 2017 (UTC)

It seems that the problem has been solved only in eu:wikt:finnois, while it still exists in lots of other pages. Also, when looking for the Basque interwiki on, for example, fr:wikt:finnois or de:wikt:finnois (or any other language), you'll find that it doesn't appear. Thank you, --Bengoa (talk) 14:18, 5 May 2017 (UTC)

Statistics and tools

@Lea Lacroix (WMDE) and Magnus Manske: Hello,

  • Will it be possible to get statistics about Cognate?
  • Will a PetScan integration be available?

Thanks! Otourly (talk) 08:27, 25 May 2017 (UTC)

Hello Otourly, can you give some examples of statistics you would like to have? :) Lea Lacroix (WMDE) (talk) 10:27, 25 May 2017 (UTC)
@Lea Lacroix (WMDE): Hello, I am especially thinking about "most interlinked entries" & "uninterlinked entries". Otourly (talk) 11:01, 26 May 2017 (UTC)
It would be useful to see the "most interlinked entries not having a page at your own wiki"; presuming that if many different wiktionaries have this entry we might want to have it too. And we might detect spelling errors on WikiWoordenboek this way. --MarcoSwart (talk) 08:52, 28 May 2017 (UTC)
A valuable statistic to get an aspect of the relative completeness of a wiktionary would be its number of interwikilinked entries relative to the total number of all interwikilinked entries. And in relation to the total number of entries of a wiktionary it would give an impression of how much "unique material" it preserves. --MarcoSwart (talk) 08:52, 28 May 2017 (UTC)
Another interesting feature would be a matrix of the number of interwikis between each possible pair of wiktionaries. It could be useful in the search for volunteers with particular language skills if we can give them an impression of what is possible in quantitative terms. --MarcoSwart (talk) 08:52, 28 May 2017 (UTC)
Thanks! I created a ticket to list all the ideas and discuss if and how we can provide this. Lea Lacroix (WMDE) (talk) 07:59, 29 May 2017 (UTC)
@Lea Lacroix (WMDE): So, is there some hope to see stats about Cognate? Otourly (talk) 06:24, 25 July 2018 (UTC)
Hey @Otourly: , yes, a dedicated dashboard is about to be released :) It's here, but I would not share it broadly as long as we don't have proper documentation to describe what it does (we're working on the doc right now). Lea Lacroix (WMDE) (talk) 08:09, 25 July 2018 (UTC)
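Two of the statistics suggested above can be sketched in a few lines, assuming the data is available as a mapping from a title to the set of wikis that have a page with that title. That input shape and the function names are assumptions for illustration; nothing here reflects how Cognate stores its data.

```python
from collections import Counter
from itertools import combinations


def pair_counts(entries):
    """Count shared titles for every pair of wikis.

    entries maps a title to the set of wiki codes having a page with that title.
    The result maps an alphabetically ordered (wiki, wiki) pair to a count.
    """
    counts = Counter()
    for wikis in entries.values():
        for pair in combinations(sorted(wikis), 2):
            counts[pair] += 1
    return counts


def missing_locally(entries, local_wiki):
    """Titles absent from local_wiki, most widely covered elsewhere first."""
    missing = [(len(wikis), title) for title, wikis in entries.items()
               if local_wiki not in wikis]
    return [title for _, title in sorted(missing, key=lambda item: (-item[0], item[1]))]
```

`pair_counts` gives the cells of the proposed matrix, and `missing_locally` corresponds to the "most interlinked entries not having a page at your own wiki" list.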

Cognate on Wiktionary: discussion about redirects

Hello all,

I am creating this topic to have a global discussion about the way Cognate deals with redirects. This is the place where editors of different Wiktionaries can discuss it, share their views and uses, and hopefully make a common decision, so the developers can provide a technical solution as close as possible to your needs.

When you join this discussion, please try to keep a nice and constructive state of mind. When you're describing something, please provide examples, with links if possible. When you're describing a need, please explain clearly what your problem is and what you need to fix it. If you refer to a community decision, please provide a link. Our goal is to find a solution that fits the largest number of editors, not personal needs. Thank you very much.

Current status

  • Cognate provides automatic links between pages with the exact same name. It distinguishes between all characters, including capitals (foo is treated differently from FOO).
  • Cognate applies a short normalization to the title, such as replacing the ellipsis character with three dots, or the right quotation mark apostrophe with a normalized apostrophe. You can find the list of these replacements here. Some rules can be deleted or added if we find a consensus amongst the different language communities. Note however that changing normalization rules requires us to rebuild the Cognate database.
  • Cognate doesn't show links to redirects. Pages that are redirects are ignored by the extension and are not displayed in the automatic interlanguage links.
  • There have been discussions about this redirect issue, with arguments both for and against.
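The matching rule in the list above can be sketched as follows. The replacement table contains only the two examples named in the list and is an assumption, not the extension's actual rule set.

```python
# Hypothetical subset of the normalization rules: only the two examples
# mentioned in the list above are included.
REPLACEMENTS = {
    "\u2026": "...",  # horizontal ellipsis character -> three dots
    "\u2019": "'",    # right single quotation mark -> straight apostrophe
}


def normalize_title(title):
    """Apply the replacements; letter case is deliberately left untouched."""
    for source, target in REPLACEMENTS.items():
        title = title.replace(source, target)
    return title


def titles_match(title_a, title_b):
    """Two pages are linked when their normalized titles are identical."""
    return normalize_title(title_a) == normalize_title(title_b)
```

With these rules, a title written with a curly apostrophe matches the same title written with a straight one, while foo and FOO remain different.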

Proposal

Our proposal is to allow redirects in Cognate. This means that the extension will include pages that are redirects, but will not follow them automatically.

Let's take an existing example:

Discussion

What do you think about this proposal? Do you have examples of uses where it would or would not work? Do you have other issues regarding the redirects? Thanks, Lea Lacroix (WMDE) (talk) 10:30, 1 June 2017 (UTC)

Note that currently en:エッチする with redirect=no links to fr:エッチする through Cognate, so it is reasonable to show it also vice versa. --Vriullop (talk) 14:22, 1 June 2017 (UTC)
I agree that normalization is not sufficient, and that redirects should be taken into account. An example: mener quelqu’un par le bout du nez should redirect to mener par le bout du nez. It's the same phrase, indisputably, with the same spelling; the redirect should help readers searching for it, that's all. Lmaltier (talk) 05:50, 2 June 2017 (UTC)

Interaction of Redirects and Normalization

Allowing Cognate to include redirects in the automatic language links may have undesirable consequences. Consider what happens when redirects are created between titles that are equivalent according to Cognate normalization:

If we just change Cognate to include redirects, then the following will happen:

It seems like such redirects are quite frequent; a quick (and dirty) database query reveals about 1000 redirects of this kind on English Wiktionary (Query: MariaDB [enwiktionary_p]> select count(*) from page join redirect on page_id = rd_from where rd_title like "%'%" and page_title like "%’%";)

I see two ways to address the issue of multiple language links being shown due to redirects with an "equivalent" title:

  • Make Cognate prefer the language links with the exact same title as the local page. This means only one language link is shown - but in some cases, this may be the "wrong" one (the one pointing to the redirect). In particular, since English and French Wiktionaries use different conventions for the apostrophe, all the links between pages on English and French Wiktionary with an apostrophe in the title would point to redirects.
  • Make Cognate know which pages are redirects, and prefer non-redirects. Since it would be too slow to look this up in each wiki database separately, we'd have to change the Cognate database schema for this. That's not horrible, but a bit of work for developers and database administrators.

I would propose to go with the first option for now. If this is not sufficient, we can still go for the second option later. -- Daniel Kinzler (WMDE) (talk) 14:48, 1 June 2017 (UTC)
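The two options above can be compared with a small sketch. It assumes that, for one target wiki, the candidate pages whose normalized title matches the local page are already known; the `Candidate` type and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    title: str
    is_redirect: bool


def pick_exact_title(local_title, candidates):
    """Option 1: prefer the candidate whose title is identical to the local title."""
    for candidate in candidates:
        if candidate.title == local_title:
            return candidate
    return candidates[0] if candidates else None


def pick_non_redirect(candidates):
    """Option 2: prefer a candidate that is not a redirect (needs redirect data)."""
    for candidate in candidates:
        if not candidate.is_redirect:
            return candidate
    return candidates[0] if candidates else None
```

For a local title with a curly apostrophe and a target wiki that keeps its content under the straight apostrophe with a curly redirect, option 1 returns the redirect and option 2 returns the content page. Only option 2 needs to know which pages are redirects, which is why it would require the schema change.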

Your Option 1 would have the same result as Option 2 for the user, wouldn't it? They would click on the interwiki link and be taken to the other wiki, where they would be redirected and sent to the content-having page. So, Option 1 seems unproblematic.
Note also that one big reason there are redirects from the curly to the straight apostrophe on en.Wikt (for, in theory, every single page whose title contains an apostrophe) is because that was needed in order for interwiki links to fr.Wikt to work (via fr.Wikt having reciprocal interwikis and each page on each wiki linking to the other wiki's exactly-identically-titled page). If Cognate links wikt:fr:L’Hôpital and wikt:L'Hôpital automatically, that reason for the redirects is gone. Another reason for redirects was to handle people who copied-and-pasted and searched one form, but improvements to the search function now handle that automatically, so that reason is gone, too. In other words, each community should discuss whether they want to simply delete the apostrophe redirects. The situation for other characters which are normalized, like ellipses, is probably similar. -sche (talk) 06:17, 2 June 2017 (UTC)
Note that redirects remain needed in fr.wikt since most manual links use the straight apostrophe within the wiki. But that is an internal problem only for fr and any wikt which chooses to use curly apostrophes.
As for the proposal, I think that option 1 is enough. Linking to redirects is not a problem. Darkdadaah (talk) 08:45, 2 June 2017 (UTC)

Could you please link wikt:en:dimerc'her and wikt:fr:dimercʼher? — TAKASUGI Shinji (talk) 03:26, 27 June 2017 (UTC)

This is a specific case, as the letter « cʼh » (with yet another apostrophe: U+02BC; it is 3 characters but only one letter, the 4th letter of the alphabet, between ch and d) is very specific and exists only in Breton (and loanwords from Breton, like Breton names in French). For that case, I suggest considering the three apostrophes as equivalent. Cdlt, VIGNERON (talk) 09:44, 11 July 2017 (UTC)

@Daniel Kinzler (WMDE): I can't do a database query so I used the grep tool to do some tests and checks. Can you confirm the following assumption: « there are never two entries which differ only by the apostrophe » (if we except redirects; hopefully, this is true for all wiktionaries). There is one obvious exception: the one-character entries (wikt:fr:', wikt:fr:’, same on most wiktionaries), and there are some temporary exceptions (I've corrected a couple of wrong duplicate entries). It means that some wiktionaries have chosen the ', some the ’ (and there are other apostrophes like ʼ), but there is never a mix. Cdlt, VIGNERON (talk) 10:10, 11 July 2017 (UTC)
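The assumption above could be checked against a list of page titles (for example from a dump) with a sketch like this one. It folds the three apostrophes mentioned in this thread into one and reports titles that collide; the function names are hypothetical, and redirects would have to be filtered out of the input beforehand.

```python
from collections import defaultdict

# U+2019 (right single quotation mark) and U+02BC (modifier letter apostrophe)
# are folded into the straight apostrophe U+0027.
APOSTROPHES = ("\u2019", "\u02bc")


def fold_apostrophes(title):
    for char in APOSTROPHES:
        title = title.replace(char, "'")
    return title


def apostrophe_collisions(titles):
    """Group titles that differ only by which apostrophe character they use."""
    groups = defaultdict(set)
    for title in titles:
        groups[fold_apostrophes(title)].add(title)
    return {key: sorted(group) for key, group in groups.items() if len(group) > 1}
```

An empty result for the non-redirect titles of one wiki would support the assumption for that wiki, apart from the one-character entries noted above.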

Support. @Lea Lacroix (WMDE): 4 years later. A solution is needed for "https://sv.wiktionary.org/wiki/When_in_Rome,_do_as_the_Romans_do." vs "https://en.wiktionary.org/wiki/when_in_Rome,_do_as_the_Romans_do". Taylor 49 (talk) 17:49, 6 February 2021 (UTC)
Thanks @Taylor 49: for the ping! Could you add a comment on the related Phabricator task and describe more precisely what you would need? This would help us move this task forward. Thanks, Lea Lacroix (WMDE) (talk) 17:32, 9 February 2021 (UTC)
@Lea Lacroix (WMDE): There seem to be at least 2 open bugs about the issue: https://phabricator.wikimedia.org/T163717 https://phabricator.wikimedia.org/T165061 and I improved the description now. Taylor 49 (talk) 18:00, 9 February 2021 (UTC)
@Lea Lacroix (WMDE): phabricator.wikimedia.org/T165062 "step 2" declined phabricator.wikimedia.org/T165061 "step 1" open news "This should be getting rolled out with the train this week" date Mar-17. It does not work yet. Taylor 49 (talk) 15:12, 21 March 2021 (UTC)
Hi @Taylor 49: , did you try to do an edit on one of the pages to see if the sitelink is shown? This may be needed for cache reasons.
If it doesn't work, can you indicate an example of a page where the sitelink is expected and doesn't appear? Thanks Lea Lacroix (WMDE) (talk) 08:42, 22 March 2021 (UTC)
I tried to purge and nulledit this one: "https://sv.wiktionary.org/wiki/When_in_Rome,_do_as_the_Romans_do." and "https://en.wiktionary.org/w/index.php?title=When_in_Rome,_do_as_the_Romans_do." and still no interwikis. Taylor 49 (talk) 08:50, 22 March 2021 (UTC)
UPDATE: @Lea Lacroix (WMDE): The links just appeared after nullediting on both sides. Taylor 49 (talk) 08:54, 22 March 2021 (UTC)
Great, thanks. Feel free to ping me if you encounter other issues. Lea Lacroix (WMDE) (talk) 10:26, 22 March 2021 (UTC)

Possibility of templates accessing the Cognate database

On en.Wikt, words' translations into other languages have a superscript link to a language's home wiki if it has an entry on the word in question. For example, wikt:cat#Translations has the line "Maltese: qattus (mt) m, qattusa f", because mt.Wikt has an entry for "qattus" and no entry for "qattusa".

Currently, this system is maintained by having one template {{t+}} that generates links, and another template {{t}} that doesn't, and by having bots periodically cross-check entries in translations' tables against entries other wikis have. Hence, if mt.Wikt were to delete its entry on "qattus" but create an entry for "qattusa", a bot would eventually update wikt:cat#Translations to say "Maltese: qattus m, qattusa (mt) f".

If it were possible for templates to access the database Cognate uses to determine which wikis to show interwiki links to ("which wikis have which pages"), then {{t}} could "automatically" determine whether the other wiki had a page and hence whether to display a link or not.

Is it possible for a template/module to access the database like that? If not, would it be problematic to make it possible? (Let me know if I should paste this into Phabricator as a feature request.) -sche (talk) 06:38, 2 June 2017 (UTC)

Note that from a Lua perspective only one request to get the whole list of interwikis would be enough for a whole page, as it can be cached. This is important because there can be hundreds of translation templates requesting interwikis, and we would not want to make a separate request for each one. This also applies to templates in fr.wiktionary and probably other languages.
The main issue I think would be how we refresh the pages, since currently we have to modify the pages or the templates/modules for them to be updated. So we may want to trigger an update for a page whenever its interwiki list changes, but I don't know how we can do that.
Finally, we should ideally expect those translation interwiki links to only appear if the corresponding lexeme/meaning exists in the other wikis, not just the whole page. But that's a problem for another time. Darkdadaah (talk) 08:33, 2 June 2017 (UTC)
See phab:T163734. --Vriullop (talk) 18:51, 2 June 2017 (UTC)
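The caching idea can be illustrated with a short sketch. It is written in Python rather than Lua, and the `fetch` callable stands in for a lookup that does not exist today: as the question above makes clear, templates and modules currently have no access to the Cognate data.

```python
class InterwikiCache:
    """Remember the set of wikis per title so repeated templates share one lookup."""

    def __init__(self, fetch):
        self._fetch = fetch  # hypothetical callable: title -> iterable of wiki codes
        self._cache = {}
        self.lookups = 0     # number of real lookups, to show the effect of caching

    def wikis_for(self, title):
        if title not in self._cache:
            self.lookups += 1
            self._cache[title] = frozenset(self._fetch(title))
        return self._cache[title]

    def has_entry(self, wiki, title):
        """What a translation template would ask: does this wiki have the page?"""
        return wiki in self.wikis_for(title)
```

A translation template could then choose between a linked and an unlinked rendering based on `has_entry`, without bots having to switch between {{t}} and {{t+}}. The refresh problem raised above is not addressed by this sketch.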

Can someone explain this for normal people?

So I just wanted to add something to wikidata, and got "Warning: Wikidata's notability policy does not allow links to Wiktionary entries unless the interlanguage links cannot be automatically provided. By clicking on "save", you confirm that this is the case. In general, connecting Wiktionary words to Wikidata concepts is not correct", which told me NOTHING. Then I followed the link to notability policy, which basically forwarded me to this extension page, and I still have NO IDEA what any of this means... Think of the normal users please, this is a terrible experience. 09:36, 7 September 2017 (UTC)

Translation into Korean

The page on the Cognate extension ought to be translated into Korean, because the Korean Wiktionary apparently didn't get the memo on this like the English Wiktionary did. --Lo Ximiendo (talk) 02:20, 12 October 2017 (UTC)

@Lo Ximiendo: translatewiki:Special:Translate/ext-cognate, please. --Liuxinyu970226 (talk) 09:44, 6 March 2021 (UTC)

Database dumps

Are the dumps of the Cognate database available somewhere? – Jberkel (talk) 16:25, 21 October 2020 (UTC)

Add to sidebar of Wikidata Lexemes

It would help to see which Wiktionaries have an entry with the same title as the Lexeme, and would also allow navigating to the Wiktionary entry quickly. See d:Wikidata:Lexicographical_data/Ideas_of_tools#Show_interlanguage_sitelinks_to_Wiktionaries_on_sidebars_of_WD:Lexeme_namespace. Thanks. Vis M (talk) 10:52, 23 June 2021 (UTC)