Topic on Talk:Universal Language Selector/Compact Language Links

Language discrimination

60 comments • 21:23, 9 July 2017 6 years ago


It is not a feature, it is a bug! It discriminates against many languages. These are languages with few speakers (tens of languages in Russia, hundreds of languages in Africa, etc.), and also languages most of whose speakers are not familiar with computers and the Internet. For example, I know that the Saam language exists thanks to links to the Wikipedia in it. But the new bug will hide links to articles in "small" languages, and Wikipedia readers will think that those languages do not exist. So, please do not enable this bug and do not develop it any further!

Reply Edited 13:16, 8 July 2016 7 years ago

DidiWeidmann (talkcontribs)

Dear Gamliel Fishkin, I strongly support what you say: the new policy is serious discrimination against small languages and is completely unacceptable. It is against the principles of human rights and lacks respect for small cultures!

Reply 13:46, 4 August 2016 7 years ago

Leofil2 (talkcontribs)

I agree with you 100%. This feature is a nuisance and contrary to the encyclopedic spirit. An encyclopedia is a place where you can find all knowledge. All of it! Not a selection of it, according to some algorithm's idea of the languages the reader should be concerned with. I want to be able to see all the languages in which an article is available, even if I cannot read them. I want to be able to browse through such a list, with its variety of scripts, because it captures the very beauty of the Internet. I want to be able to reach the language in which any article is most relevant. I want to be able to do that even when I'm not logged in. Please make this feature optional or, better, cancel it.

Reply 21:17, 9 July 2017 6 years ago

Amire80 (talkcontribs)

This feature will make these languages more prominent. Now languages of Russia, such as Tatar, Bashkir and Udmurt, will be shown prominently to people who connect from Russia. Earlier, you had to look for them in a list of more than 100 languages. Same for Saami—it will be shown prominently to people who connect from Norway or Finland.

Reply 14:01, 8 July 2016 7 years ago

DidiWeidmann (talkcontribs)

It gives the impression that this new feature was designed especially and expressly with the intention of discriminating against several languages like Esperanto or Yiddish! There was no real need for such a system. I ask that the old system be restored!

Reply 14:36, 4 August 2016 7 years ago

Eugrus (talkcontribs)

This is simply not true. The minor languages of Russia are not shown to me in the compact list on the Russian Wikipedia. What is being shown are just the wikis I use frequently. See w:ru:Земля, for instance, which has interwikis in a dozen minor languages of Russia, but none are shown.

Reply 13:26, 24 January 2017 7 years ago

Amire80 (talkcontribs)

@Eugrus, from which country are you connecting?

Which languages do you see? If you see languages that you use frequently, then it works as it is supposed to. Languages that you use frequently are probably the languages that you need the most. Languages of your country are shown if languages that you use frequently are not known, which will be true for all people the first time they see compact interlanguage links.

Reply Edited 18:57, 24 January 2017 7 years ago

Leofil2 (talkcontribs)

What gives you the right to decide which languages I "need the most", and show me only these? Are we still on a free Wikipedia, or was it purchased by Facebook:(?

Reply 21:19, 9 July 2017 6 years ago

Gamliel Fishkin (talkcontribs)

So, human beings outside of Russia will think that the Tatar language does not exist, etc. It is simply discrimination. As a final result of such discrimination, almost every human being in the world will think that only two languages exist in the world: his or her native language and English.

Reply Edited 14:36, 8 July 2016 7 years ago

Amire80 (talkcontribs)

The user interface shows a list of languages that is customized for every user and helps people find information in their language. In articles with a lot of languages the list will have nine languages, and not two, and there will be a button that says "X more languages", where X is the number.

Reply 15:08, 8 July 2016 7 years ago
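
The split described in the reply above (a short list plus an "X more languages" button) can be illustrated with a minimal Python sketch. This is not the extension's code: the function name `split_language_list`, the padding rule for unfilled slots, and the sample language codes are assumptions made for illustration; only the cap of nine comes from the comment.

```python
from typing import List, Sequence, Tuple


def split_language_list(available: Sequence[str],
                        preferred: Sequence[str],
                        limit: int = 9) -> Tuple[List[str], int]:
    """Return (compact list, count hidden behind the "X more languages" button).

    `available` holds the languages the article exists in; `preferred` holds
    the languages guessed for this reader. Both names are hypothetical.
    """
    # Drop duplicates while keeping the original order.
    unique_available = list(dict.fromkeys(available))
    compact: List[str] = []
    # Preferred languages come first; leftover slots are padded with the
    # remaining available languages in their original order (an assumption).
    for code in list(preferred) + unique_available:
        if len(compact) >= limit:
            break
        if code in unique_available and code not in compact:
            compact.append(code)
    hidden = len(unique_available) - len(compact)
    return compact, hidden


article_languages = ["en", "de", "fr", "es", "ru", "tt", "ba", "udm",
                     "ja", "zh", "it", "pl"]
compact, hidden = split_language_list(article_languages, ["tt", "ba", "xx"])
print(compact, f"{hidden} more languages")
```

In this sketch a preferred language with no corresponding article (`xx`) is silently skipped, so the button count always reflects languages that actually exist for the article.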

Holder (talkcontribs)

To me, too, this feature looks like discrimination, especially against small languages. It does not help people find information in their own language; it only helps people find information in the big, dominant languages!

Reply 11:46, 11 July 2016 7 years ago

Amire80 (talkcontribs)

Hi @Holder,

Thanks a lot for your comment.

As I explained above, this feature doesn't discriminate against minor languages, but actually helps them by showing them more prominently to the users who are most likely to know them.

I noticed on your user page that you are writing in the Alemannic Wikipedia. I checked the CLDR territory-language information table, and this language is supposed to be shown prominently to people who are connecting from France, Liechtenstein and Switzerland (search that table for "gsw"). At the moment, however, there is a particular bug for this language because of which it is not actually shown. I filed this as a task with high priority, and it will be fixed very soon. Once it's fixed, it will be shown prominently to everybody who is connecting from these countries.

Reply 15:13, 11 July 2016 7 years ago

Holder (talkcontribs)

@Amire80, that's indeed interesting news, thank you very much.

This long known problem is much more complicated: Alemannic language (and therefore also Alemannic Wikipedia) covers gsw, swg, wae and gct. That's why it hasn't been solved over the last ten years.

How will this be fixed in this case? It would be nice if als:wp were also shown to readers in Germany, where Alemannic is also spoken by about five million people.

Reply 09:57, 12 July 2016 7 years ago

Amire80 (talkcontribs)

Hi,

The data that we use can be found at http://www.unicode.org/cldr/charts/29/supplemental/territory_language_information.html

I can see that swg is listed in Germany and wae is listed under Liechtenstein and Switzerland. gct is not listed anywhere, but you can ask to add it by clicking "add new" under the relevant country and supplying information about the number of speakers of this language in that country.

Technically, we can probably redirect all these codes to als, but I'll have to discuss it with the team. I added a comment at the bug report: https://phabricator.wikimedia.org/T139949

Reply 10:24, 12 July 2016 7 years ago
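
The redirect idea mentioned in the reply above could look roughly like the following Python sketch. It only illustrates mapping the four codes named in this thread to `als`; the names `ALEMANNIC_ALIASES`, `resolve_wiki_code` and `territory_wiki_codes` are hypothetical, and this is not a description of how the linked task was actually resolved.

```python
from typing import Dict, Iterable, List

# Hypothetical alias table: the codes discussed in this thread, all pointing
# at the Alemannic Wikipedia's code.
ALEMANNIC_ALIASES: Dict[str, str] = {
    "gsw": "als",
    "swg": "als",
    "wae": "als",
    "gct": "als",
}


def resolve_wiki_code(code: str) -> str:
    """Map a language code from the territory data to a wiki code."""
    return ALEMANNIC_ALIASES.get(code, code)


def territory_wiki_codes(territory_languages: Iterable[str]) -> List[str]:
    """Resolve every code for a territory and drop duplicates, keeping order."""
    resolved = [resolve_wiki_code(code) for code in territory_languages]
    return list(dict.fromkeys(resolved))


# Sample input is illustrative, not real CLDR data.
print(territory_wiki_codes(["de", "gsw", "wae", "fr"]))
```

Deduplicating after the lookup matters here because a territory that lists two of the aliased codes would otherwise offer the same wiki twice.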

C933103 (talkcontribs)

It relies on CLDR, and CLDR relies on some official figures, so if a country refuses to recognize that a language is spoken in it, the data could be skewed. Different countries also have different standards for which languages spoken there are common enough to be listed: for instance, some languages spoken by only 0.x% of the population are listed for some countries, while they are not listed for other regions.

Reply 19:26, 12 July 2016 7 years ago

Amire80 (talkcontribs)

From my experience, CLDR is fairly flexible with sources, and they listen to people who send reasonable bugs. If you have data that a language is spoken by a certain number of people, I encourage you to submit a bug there.

Reply 19:39, 12 July 2016 7 years ago

C933103 (talkcontribs)

And so we need people who have enough knowledge of an individual country's situation, who are willing to put effort into searching for unbiased information about language usage, and who understand that some languages not traditionally considered languages in their own right actually are languages. Such a person must also be neutral enough on the matter to avoid intentionally overlooking some languages, and must be willing to spend time reporting the problem to CLDR.

Reply 23:44, 12 July 2016 7 years ago

Gamliel Fishkin (talkcontribs)

Firstly, some human beings speaking the Alemannic language may live outside of the countries where most of its speakers live. Secondly, as a result of this universal language selector, human beings outside of these countries will not know that the Alemannic language exists.

This happened in the first years of the twentieth century in the Russian Empire. One day, a little Russian girl saw a nameplate in Yiddish or Hebrew on the door of a Jewish family. She was not Jewish, just Russian, but the letters interested her; she learned a great deal and became a Soviet Semitologist. Similarly, someone can become interested in the language of another nation thanks to seeing the language's name in the interwikis; but the universal language selector destroys that chance.

Reply 23:14, 11 July 2016 7 years ago

Amire80 (talkcontribs)

@Gamliel Fishkin, I understand, but there is also another possibility: That somebody who lives in Russia and thinks that there is no Wikipedia in the Tatar language, will find out that there is one. Compact Language Links make this more likely.

Reply 09:17, 12 July 2016 7 years ago

C933103 (talkcontribs)

@Amire80 When most major languages are displayed outside the panel, the need to look for an interlanguage link in the panel is minimized. This reduces the chance that users will discover what they might want if they didn't previously know that such a Wikipedia exists. Even in a huge list, users would have a higher chance of discovering their familiar small language than with the panel, because users are more familiar with language names written in their native script and native language; but if they never click into the panel, the chance of them discovering the Wikipedia in their language becomes 0.

Reply 19:38, 12 July 2016 7 years ago

Amire80 (talkcontribs)

The languages are tailored for each user, and they are not necessarily major. A minor language of the user's country will be preferred to a major language spoken outside of the user's country.

Reply 19:46, 12 July 2016 7 years ago

C933103 (talkcontribs)

In countries like Russia, India or China, far more than 10 languages are spoken, so inevitably the native language of some users can only be found in the expanded panel.

Reply 20:03, 12 July 2016 7 years ago

Amire80 (talkcontribs)

This is indeed an issue: https://phabricator.wikimedia.org/T133029

There's no easy solution for it, but we'll definitely get there.

Reply 20:07, 12 July 2016 7 years ago

Leofil2 (talkcontribs)

There is an easy solution: just cancel your very bad idea:( (known as a fausse bonne idée in my mother tongue...)

Reply 21:23, 9 July 2017 6 years ago

Gamliel Fishkin (talkcontribs)

The only solution for such an issue is to turn this "feature" off and forget it.

Reply 20:16, 12 July 2016 7 years ago

C933103 (talkcontribs)

Even if you enable subregion-based filtering, there will always be regions like Moscow or Shanghai, where people from every community in the country go for economic reasons, resulting in more than 10 languages being spoken in the same subregion.

Reply 21:21, 12 July 2016 7 years ago

Jørgen (talkcontribs)

I can see great possibilities in this feature, if it is changed a little bit. It is impossible to satisfy everybody with a uniform solution. Let the user decide! Have a list in 'preferences' where you can tick all the languages you want shown, and a button below to show the full list. As a Dane, I see English, Spanish and German, but need French, Swedish and Norwegian too. I have Arabic, Urdu, Chinese and Hindi. These languages are probably spoken by some immigrant inhabitants of my country, but are useless to the vast majority. Føroysk and Kalaallisut are languages from the former North Atlantic possessions. I have no idea what to do with them; most Danes cannot understand these languages, let alone write them.

Reply 07:52, 15 July 2016 7 years ago

Holder (talkcontribs)

@Jørgen, the problem is that there has to be a decision what is shown to readers.

Reply 07:59, 15 July 2016 7 years ago

Jørgen (talkcontribs)

Yes, let the readers decide for themselves by ticking a list. And for IP readers, let the list default to 'all', as it used to be.

Reply 08:01, 15 July 2016 7 years ago

Amire80 (talkcontribs)

You can pre-select the languages according to instructions at Universal Language Selector/Compact Language Links.

Also, every language that you select simply by clicking is remembered, so this feature adapts itself to every user (including anonymous readers).

Reply Edited 09:14, 15 July 2016 7 years ago

Madglad (talkcontribs)

Agree with Jørgen. The list is unusable, because it shows "big languages", many of which nobody in a faraway region understands (like Indonesian languages in Denmark, on the other side of the planet). It does not show the languages of the neighbouring countries, which most people understand. Note also that most users are not registered and cannot change their settings.

Reply 08:30, 15 July 2016 7 years ago

Amire80 (talkcontribs)

Hi @Madglad,

Are you connecting from Denmark? May I ask on which article do you see Indonesian?

Reply 09:12, 15 July 2016 7 years ago

Madglad (talkcontribs)

I saw Indonesian on several articles as far as I remember. Logged in from Denmark, and visiting da-wiki. But the languages are changing, depending on how I search around. But on another clean browser (tor) and logged out, I see

(still visiting da-wiki). This list of languages is not a good default starting point for Danish speakers. It should be assumed that most people visiting da-wiki also know the neighbouring languages, and that almost no Danish speakers understand Indonesian and Indian languages. The algorithm shouldn't try to guess known languages, but should pick them from a list when a language-specific Wikipedia is visited. When someone connects to da-wiki from Australia, it should be considered more likely that the user understands Norwegian than some Aboriginal language. I don't understand why things like these are rolled out without prior discussion in the Wikipedias.

Reply 10:59, 15 July 2016 7 years ago

Amire80 (talkcontribs)

In a usual working scenario, your previously selected languages are supposed to be remembered. If you are using Tor or other anonymizers or proxies, the system cannot know anything about you, so it shows the biggest global languages, and Indonesian happens to be one of them. If you are using a private browser window, you also won't see any of your previous selections.

You can configure your own preferred languages in the browser according to the instructions in Universal Language Selector/Compact Language Links.

Configuring preferred languages per specific project, as you suggest, will be possible very soon. See https://phabricator.wikimedia.org/T138973 .

Reply 11:37, 15 July 2016 7 years ago

Madglad (talkcontribs)

What is "a usual working scenario"?

I guess a typical scenario is a user who is not logged in, visiting one of the versions of Wikipedia, possibly not the English one. The IP is located somewhere on the planet, in a region where the Indonesian languages etc. are not known, but the languages of the region are.

Quote: "Configuring preferred languages per specific project ... will be possible" - this gives me the impression, that this change is designed for en-wiki, and is not ready for implementation in the other wikipedias yet. Roll it out when it is developed and tested. Roll back for now.

Reply 08:36, 16 July 2016 7 years ago

Höyhens (talkcontribs)

Yes. This is a change that should not be made.

Reply Edited 09:17, 15 July 2016 7 years ago

Gamliel Fishkin (talkcontribs)

There is one more topic. I see no problem if the system uses IP-address and other current information about an unregistered visitor. But if the system not only uses current information, but also remembers pages visited by this human being, it is a privacy gap.

Reply 14:09, 15 July 2016 7 years ago

Amire80 (talkcontribs)

No, the Compact Language Links feature doesn't remember this information.

Reply 15:15, 15 July 2016 7 years ago

Madglad (talkcontribs)

If a user visits the Danish language Wikipedia from a Danish IP address it will be reasonable to assume that interesting language versions to the person would be:

*sv=Swedish (neighbour country, mutually intelligible with Danish)

*no=Norwegian Bokmål (neighbour country, mutually intelligible with Danish)

*nn=Norwegian Nynorsk (neighbour country, mutually intelligible with Danish)

*de=German (minority language in part of Denmark, neighbour country, language taught in Danish schools)

*en=English (language taught in Danish schools)

Languages spoken in overseas countries of The Danish Realm:

*fo=Faroese

*kl=Kalaallisut

Languages taught in some schools:

*fr=French

*es=Spanish

Now, this example focused on Danish (and Denmark proper) can probably be generalised to most languages; languages, which have no contact with Indian, Indonesian, Chinese languages, but have contact with a lot of neighbour languages.

Reply 17:09, 15 July 2016 7 years ago

Amire80 (talkcontribs)

If the IP is identified as Denmark, then German, Faroese and Kalaallisut are supposed to be shown in the initial list (if, of course, a corresponding article in these languages is available).

Your IP probably wasn't identified as Denmark, which is quite possible if you used something like Tor, so the world's largest languages were shown.

If you don't see a language that interests you in the initial list, you can click "X more" and select the language that you need, and the next time it will be shown in the short list.

I know that Danish is similar to Norwegian and Swedish, but do you have data about the number of people in Denmark who are actually reading in these languages?

Reply 17:42, 15 July 2016 7 years ago

C933103 (talkcontribs)

IIRC, Wikipedia has data about the percentage of visits per language version per country? It should be possible to use that data in reverse to find out the percentage of visits to a specific language version in a specific country or subregion.
You can also check the MediaWiki language fallback tree?

Reply 20:44, 15 July 2016 7 years ago

Madglad (talkcontribs)

Many times more people read Swedish and Norwegian than distant languages like Chinese, especially if the article is better than the Danish one. Exact numbers are unknown, but almost nobody in Denmark is able to read Chinese, while almost everybody is able to read Swedish.

But I think Danish/Denmark is just an example, the problem is general for language-wikipedias. The solution is usable for en-wiki, not for the Wikipedias of other languages, and should not be implemented in these, in the current form.

Reply 20:03, 15 July 2016 7 years ago

Höyhens (talkcontribs)

I must admit that I am extremely worried and saddened by this attack against Wikipedia as a free dictionary. Cancel it as soon as possible, please.

Reply 17:18, 15 July 2016 7 years ago

Madglad (talkcontribs)

I now see that part of the problem is that the guess is made based on the number of native speakers in a country, not the number of readers, which is a very big mistake.

Another important issue is the assumption that everybody is logged in and has set up their browser languages etc. Setting up browser languages manually is what nerds were doing in the Netscape days. It is 2016 now. And most users, by the way, do not have registered Wikipedia accounts.

And finally, the assumption should be made on the basis of the language of the Wikipedia being visited; the IP solution was developed for en-wiki.

This experiment should be rolled back on all language Wikipedias except en-wiki, until an acceptable algorithm is found.

Reply 20:51, 15 July 2016 7 years ago

Amire80 (talkcontribs)

The guess is not based on the number of native speakers. We use the data from CLDR, which clearly doesn't refer only to native speakers—for example, the entry for Denmark puts English at 86%, which is obviously not the number of native English speakers in Denmark, but probably the number who know it in one way or another. If you can cite data about the number of people in Denmark who can read Swedish or any other language, you should add it there by clicking "add new" in the table.

Also, the software really doesn't assume that everybody is logged in. Obviously, the vast majority of readers are not registered. The languages that they click in the "more" panel are automatically added to their preferred languages, and the research that we conducted showed that it works for casual readers.

As the FAQ says, the languages defined in the browser and the languages identified by geolocation are secondary to what users actually click. Once you click Swedish for example, you will see it in the compact list.

Reply 10:02, 16 July 2016 7 years ago
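
The priority order described in the reply above can be sketched in Python as follows. This is a simplified illustration, not the extension's code: the function name `pick_compact_languages` and all sample values are hypothetical, the relative order of browser languages and geolocation is an assumption (the reply only says both are secondary to clicks), and the global fallback is taken from an earlier reply in this thread about readers the system knows nothing about.

```python
from itertools import chain
from typing import Iterable, List, Sequence


def pick_compact_languages(available: Iterable[str],
                           clicked: Sequence[str],
                           browser: Sequence[str] = (),
                           geo: Sequence[str] = (),
                           fallback: Sequence[str] = (),
                           limit: int = 9) -> List[str]:
    """Pick up to `limit` languages for the short list.

    Sources are consulted in priority order: languages the reader clicked
    before, then browser languages, then languages of the detected country,
    then a global fallback. Only languages in which the article actually
    exists are kept.
    """
    available_set = set(available)
    picked: List[str] = []
    for code in chain(clicked, browser, geo, fallback):
        if len(picked) >= limit:
            break
        if code in available_set and code not in picked:
            picked.append(code)
    return picked


# Illustrative values only: a reader in Denmark who once clicked Swedish.
article_languages = ["sv", "de", "fo", "kl", "en", "id", "zh"]
print(pick_compact_languages(article_languages,
                             clicked=["sv"],
                             browser=["en"],
                             geo=["de", "fo", "kl"],
                             fallback=["en", "zh", "id"]))
```

With empty `clicked`, `browser` and `geo` inputs the sketch falls through to the global list, which matches the behaviour reported earlier in the thread for a clean Tor browser.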

Madglad (talkcontribs)

Technical question: How are »The languages that they click in the "more" panel« added? Cookie? IP address? Or something else?

Reply 19:30, 16 July 2016 7 years ago

Nikerabbit (talkcontribs)

Using LocalStorage.

Reply Edited 12:26, 18 July 2016 7 years ago

Amire80 (talkcontribs)

Our research shows that Swedish (and any other language) is less accessible when it is part of a long list than it is through the panel that opens when you click the "more" button.

The Nynorsk Wikipedia defined other Scandinavian languages as preferred, so they would appear at the top of the long list. The same can be done in the Danish Wikipedia, and Compact Language Links will pick it up (not yet today, but soon).

Reply 10:23, 16 July 2016 7 years ago

Madglad (talkcontribs)

Which research? Link?

Reply 19:20, 16 July 2016 7 years ago

C933103 (talkcontribs)

The CLDR list only covers how many people speak a language, not how many people understand it. For instance, for Iceland it says 100% for Icelandic but only 0.7% for Danish and nothing for the other Scandinavian languages. Hindi-Urdu and Malaysian-Indonesian are each the same language with different vocabulary under different names, but CLDR has no data for Hindi in Pakistan or for Indonesian in Malaysia, and the figures for Urdu in India and for Malay in Indonesia are only ~10% of the figures for Hindi and Indonesian in the respective countries. Almost every country in the world has a number of people who have at least some understanding of English, but you can see how many regions in the list actually have English listed. And the list doesn't even have Libyan Arabic in Libya or Taiwanese (Min-nan) in Taiwan. There are also Iran and Iraq, which list only Central and/or Southern Kurdish but not Kurdish in general.

And did your test ask users about the accessibility of a specific language edition with that language's name given, or about the case when users don't know what they are looking for, which is the most common case?

Reply Edited 12:42, 16 July 2016 7 years ago

Madglad (talkcontribs)

In Iceland they learn Danish in school, so they are able to read Danish as well as their native Icelandic. (The languages are not mutually intelligible). In many countries especially in Europe it's most common to speak more than one language.

Reply 19:19, 16 July 2016 7 years ago

C933103 (talkcontribs)

ah I see, the example might be not that good.

Reply 00:11, 18 July 2016 7 years ago

Amire80 (talkcontribs)

The CLDR list is not perfect and it can be improved. It has direct and visible ways to add languages and report bugs.

Without Compact Language Links there is nothing that helps a user who is reading the Indonesian Wikipedia to find a link to the Malay language, because it is lost in the long list. With Compact Language Links, it is easier to find, because there is a search box to find the language, and after the user clicks it once, it will be remembered.

As for the other question—it's a good question; I'll find it and I'll get back to you.

Reply 12:30, 16 July 2016 7 years ago

C933103 (talkcontribs)

Have you got any answers?

Reply 19:53, 28 July 2016 7 years ago

DidiWeidmann (talkcontribs)

That not all language links are now shown on Wikipedia is discrimination against the small languages and nothing else. There was no real need for such a change! All your arguments are contrived and artificial and only serve to hide a political decision whose sole goal is discrimination against the small language communities. There was absolutely no reason to change the policy on language links: in an alphabetical list of languages there was never a problem finding the right link, even in a list of 200 or more languages! So I can only protest against this arbitrary decision and ask for a return to the old system!

Reply 14:27, 4 August 2016 7 years ago

C933103 (talkcontribs)

How about adding a link to CLDR in the CLL panel so that people can tell CLDR which languages are missing in their region?
CLDR can be improved, but the process seems long. For example, I submitted some additional languages used in Hong Kong last year and they have already been accepted, but that information is still not available in the CLDR v.30 beta, which is supposed to be the last CLDR release this year according to the release cycle. That means the process of fixing something via CLDR's issue tracker takes more than 1 year.
Even if the CLDR data gets improved, that still does not resolve the problem that its scope is "population that is able to read and write each language, and is comfortable enough to use it with computers.", instead of "able to understand and retrieve info from the page", which would be a lower standard. For instance, most Chinese users can't write in Japanese, but they can read basic information from the Japanese Wikipedia because of the shared use of Chinese characters. That will certainly not be reflected in CLDR.

Reply Edited 12:55, 16 July 2016 7 years ago

Amire80 (talkcontribs)

Thanks for submitting fixes for CLDR!

People who can read Japanese (or any other language) will find the language in the panel, and after they click it, they will always see it.

Reply 12:59, 16 July 2016 7 years ago

C933103 (talkcontribs)

But I'm talking about people in general who don't know what they're looking for, rather than people looking for a specific language.
Always showing a language after it has been clicked is also problematic, because sometimes I only want to use a language once, and when those one-time clicks accumulate, they push the frequently used languages out of the short list.

Reply 15:30, 16 July 2016 7 years ago

Lingveno (talkcontribs)

Thanks for discriminating against languages without an exact location, such as Esperanto and Yiddish.

Reply 13:16, 29 July 2016 7 years ago

Frenezulo (talkcontribs)

I'm going to restate some things that were posted above, because they don't seem to have penetrated. In New York City, Wolof, Tajik, Newari, Marathi, Hausa, Guarani, Azerbaijani, and Aymara are all spoken and read, but they will not be likely to appear on any location-based list of languages, because not enough people there speak and read them, and New York is a long way from the places where they originated. Users of these languages won't always think to click a "more languages" link, because they may not realize what the link is supposed to do, and they won't know their language is represented if they don't click to see. The natural assumption, given that some of these languages don't have much internet content to begin with, will be that they aren't there. Allowing users to customize which languages they see only helps if they know which languages they have to choose from.

Reply 11:44, 9 August 2016 7 years ago
