It is not a feature, it is a bug! It discriminates against many languages. These are languages with few speakers (tens of languages in Russia, hundreds of languages in Africa, etc.), and also languages most of whose speakers are not familiar with computers and the Internet. For example, I know that the Saam language exists thanks to links to the Wikipedia in it. But the new bug will hide links to articles in "small" languages, and Wikipedia readers will think that those languages do not exist. So, please do not enable this bug, and stop developing it!
Topic on Talk:Universal Language Selector/Compact Language Links
Dear Gamliel Fishkin, I strongly support what you say: the new policy discriminates heavily against small languages and is completely unacceptable. It goes against the principles of human rights and shows a lack of respect for small cultures!
I agree with you 100%. This feature is a nuisance and contrary to the encyclopedic spirit. An encyclopedia is a place where you can find all knowledge. All of it! Not a selection of it, according to some algorithm's idea of the languages the reader should be concerned with. I want to be able to see all the languages in which an article is available, even if I cannot read them. I want to be able to browse through such a list, with its variety of scripts, because it captures the very beauty of the Internet. I want to be able to reach the language in which any article is most relevant. I want to be able to do that even when I'm not logged in. Please make this feature optional, or better, cancel it.
This feature will make these languages more prominent. Now languages of Russia, such as Tatar, Bashkir and Udmurt, will be shown prominently to people who connect from Russia. Earlier, you had to look for them in a list of more than 100 languages. Same for Saami—it will be shown prominently to people who connect from Norway or Finland.
It gives the impression that this new feature is especially and expressly designed to discriminate against several languages like Esperanto or Yiddish! There was no real need for such a system. I ask you to restore the old system!
This is simply not true. The minor languages of Russia are not shown to me in the compact list on the Russian Wikipedia. What is shown are just the wikis I use frequently. See w:ru:Земля, for instance, which has interwikis in a dozen minor languages of Russia, but none of them are shown.
@Eugrus, from which country are you connecting?
Which languages do you see? If you see languages that you use frequently, then it works as it is supposed to. Languages that you use frequently are probably the languages that you need the most. Languages of your country are shown if languages that you use frequently are not known, which will be true for all people the first time they see compact interlanguage links.
What gives you the right to decide which languages I "need the most", and show me only these? Are we still on a free Wikipedia, or was it purchased by Facebook:(?
So, human beings outside of Russia will think that the Tatar language does not exist, etc. It is just discrimination. As a final result of such discrimination, almost every human being in the world will think that only two languages exist in the world: his or her native language and English.
The user interface shows a list of languages that is customized for every user and helps people find information in their language. In articles with a lot of languages the list will have nine languages, and not two, and there will be a button that says "X more languages", where X is the number.
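For illustration only, the compact list described in this comment could be sketched roughly as follows. This is not the actual Universal Language Selector code: the function name `compact_links`, the padding rule (filling leftover slots with the article's other languages in their existing order), and the reading of X as the number of hidden languages are all assumptions made for the sketch.

```python
def compact_links(preferred, available, limit=9):
    """Split an article's language links into a short list plus a "more" label.

    preferred: language codes ranked for this user (hypothetical input).
    available: language codes the article actually exists in.
    limit: size of the short list; nine is the figure mentioned in the thread.
    """
    shown = []
    # Assumption: preferred languages come first, then the remaining
    # available languages fill any leftover slots in their original order.
    for code in list(preferred) + list(available):
        if code in available and code not in shown:
            shown.append(code)
        if len(shown) == limit:
            break
    # Assumption: X in "X more languages" counts the links left out.
    hidden = len(set(available)) - len(shown)
    label = f"{hidden} more languages" if hidden > 0 else None
    return shown, label


if __name__ == "__main__":
    article = ["en", "de", "fr", "es", "it", "pt", "nl", "pl", "sv", "tt", "ba", "udm"]
    print(compact_links(["tt", "ba"], article))
```

In this sketch a preferred language that has no article (or an article with few languages) simply produces a shorter list and no button.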
To me this feature also looks like discrimination, especially against small languages. It does not help people find information in their language; it only helps people find information in the big dominant languages!
Thanks a lot for your comment.
As I explained above, this feature doesn't discriminate against minor languages, but actually helps them by showing them more prominently to users who are most likely to know them.
I noticed on your user page that you are writing in the Alemannic Wikipedia. I checked the CLDR territory-language information table, and this language is supposed to be shown prominently to people who are connecting from France, Liechtenstein and Switzerland (search that table for "gsw"). At the moment, however, there is a particular bug for this language because of which it is not actually shown. I filed this as a task with high priority, and it will be fixed very soon. Once it's fixed, it will be shown prominently to everybody who is connecting from these countries.
@Amire80, that's indeed interesting news, thank you very much.
This long known problem is much more complicated: Alemannic language (and therefore also Alemannic Wikipedia) covers gsw, swg, wae and gct. That's why it hasn't been solved over the last ten years.
How will this be fixed in this case? It would be nice if als:wp were also shown for readers in Germany, where Alemannic is also spoken by about five million people.
The data that we use can be found at http://www.unicode.org/cldr/charts/29/supplemental/territory_language_information.html
I can see that swg is listed in Germany and wae is listed under Liechtenstein and Switzerland. gct is not listed anywhere, but you can ask to add it by clicking "add new" under the relevant country and supplying information about the number of speakers of this language in that country.
Technically, we can probably redirect all these codes to als, but I'll have to discuss it with the team. I added a comment at the bug report: https://phabricator.wikimedia.org/T139949
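As a rough sketch of what "redirecting all these codes to als" could mean in practice (an assumption for illustration, not the fix actually tracked in the bug report), a small alias table can map the CLDR codes mentioned above onto the wiki's code before the list is built. The names `ALIASES` and `normalize_codes` are hypothetical.

```python
# Hypothetical alias table: CLDR language codes mentioned in this thread,
# all pointing at the code used by the Alemannic Wikipedia.
ALIASES = {"gsw": "als", "swg": "als", "wae": "als", "gct": "als"}


def normalize_codes(codes):
    """Map CLDR codes to wiki codes, dropping duplicates but keeping order."""
    result = []
    for code in codes:
        wiki_code = ALIASES.get(code, code)  # unknown codes pass through unchanged
        if wiki_code not in result:
            result.append(wiki_code)
    return result


if __name__ == "__main__":
    # A territory list containing two Alemannic codes yields a single "als" entry.
    print(normalize_codes(["de", "swg", "gsw", "fr"]))
```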
It relies on CLDR, and CLDR relies on some official figures, so if a country refuses to recognize that a language is spoken in it, the data could be skewed. Different countries also have different standards for which languages are common enough to be listed; for instance, some languages spoken by only 0.x% of the population are listed for some countries, while they are not listed in other regions.
From my experience, CLDR is fairly flexible with sources, and they listen to people who send reasonable bugs. If you have data that a language is spoken by a certain number of people, I encourage you to submit a bug there.
And so we need people who know enough about an individual country's situation and are willing to put effort into searching for unbiased information about language usage, and who understand that some languages that are traditionally not considered languages in their own right actually are languages. Such a person must also be neutral enough on the matter to avoid intentionally overlooking some languages, and must be willing to spend time reporting the problem to CLDR.
Firstly, some human beings who speak the Alemannic language may live outside the countries where most of its speakers live. Secondly, as a result of this universal language selector, human beings outside these countries will not know that the Alemannic language exists.
It happened in one of the first years of the twentieth century in the Russian Empire. One day, a little Russian girl saw a nameplate in Yiddish or Hebrew on the door of a Jewish family. She was not Jewish, just Russian, but the letters interested her; she studied a great deal and became a Soviet Semitologist. Similarly, someone can become interested in the language of another nation thanks to seeing the language's name in the interwikis, but the universal language selector destroys that chance.
@Gamliel Fishkin, I understand, but there is also another possibility: That somebody who lives in Russia and thinks that there is no Wikipedia in the Tatar language, will find out that there is one. Compact Language Links make this more likely.
@Amire80 When most major languages are displayed outside the panel, the need to look for an interlanguage link in the panel is minimized. This reduces the chance that users discover what they might want if they did not already know that such a Wikipedia exists. Even in a huge list, users have a better chance of discovering their familiar small language, because they are more familiar with language names written in their native script and native language; but if they never click into the panel, the chance of discovering the Wikipedia in their language becomes zero.
The languages are tailored for each user, and they are not necessarily major. A minor language of the user's country will be preferred to a major language spoken outside of the user's country.
In countries like Russia, India or China, far more than 10 languages are spoken, and inevitably the native language of some users can only be found in the expanded panel.
This is indeed an issue: https://phabricator.wikimedia.org/T133029
There's no easy solution for it, but we'll definitely get there.
There is an easy solution: just cancel your very bad idea:( (also known as a fausse bonne idée in my mother tongue...)
The only solution for such an issue is to turn this "feature" off and forget it.
Even if you enable subregion-based filtering, there will always be regions like Moscow or Shanghai, where people from every community in the country go for economic reasons, resulting in more than 10 languages spoken in the same subregion.
I can see great possibilities in this feature, if it is changed a little bit. It is impossible to satisfy everyone with a uniform solution. Let the user decide! Have a list in 'preferences' where you can tick all the languages you want shown, and a button below to show the full list. As a Dane, I see English, Spanish and German, but need French, Swedish and Norwegian too. I have Arabic, Urdu, Chinese and Hindi. These languages are probably spoken by some immigrant inhabitants of my country, but are useless to the vast majority. Føroysk and Kalaallisut are languages from the former North Atlantic possessions. I have no idea what to do with them; most Danes cannot understand them, let alone write them.
@Jørgen, the problem is that there has to be a decision about what is shown to readers.
Yes, let the readers decide for themselves by ticking a list. And for IP readers, let the default be 'all', as it used to be.
You can pre-select the languages according to instructions at Universal Language Selector/Compact Language Links.
Also, every language that you select simply by clicking is remembered, so this feature adapts itself to every user (including anonymous readers).
Agree with Jørgen. The list is unusable, because it shows "big languages", many of which nobody in a faraway region understands (like Indonesian languages in Denmark, on the other side of the planet). It does not show the languages of the neighbouring countries, which most people understand. Note also that most users are not registered and cannot change their settings.
Are you connecting from Denmark? May I ask in which article you see Indonesian?
I saw Indonesian on several articles as far as I remember. Logged in from Denmark, and visiting da-wiki. But the languages are changing, depending on how I search around. But on another clean browser (tor) and logged out, I see
(still visiting da-wiki). This list of languages is not a good default for Danish speakers. It should be assumed that most people visiting da-wiki also know the neighbouring languages, and that almost no Danish speakers understand Indonesian and Indian languages. The algorithm shouldn't try to guess known languages, but should pick them from a list when someone visits a language-specific Wikipedia. When someone connects to da-wiki from Australia, it should be considered more likely that the user understands Norwegian than some Aboriginal language. I don't understand why things like these are rolled out without prior discussion in the Wikipedias.
In a usual working scenario, your previously selected languages are supposed to be remembered. If you are using Tor or other anonymizers or proxies, the system cannot know anything about you, so it shows the biggest global languages, and Indonesian happens to be one of them. If you are using a private browser window, you also won't see any of your previous selections.
You can configure your own preferred languages in the browser according to the instructions in Universal Language Selector/Compact Language Links.
Configuring preferred languages per specific project, as you suggest, will be possible very soon. See https://phabricator.wikimedia.org/T138973 .
What is "a usual working scenario"?
I guess a typical scenario is a user who is not logged in, visiting one of the versions of Wikipedia, possibly not English. The IP is located somewhere on the planet in a region where the Indonesian languages etc. are not known, but the languages of the region are.
Quote: "Configuring preferred languages per specific project ... will be possible" - this gives me the impression, that this change is designed for en-wiki, and is not ready for implementation in the other wikipedias yet. Roll it out when it is developed and tested. Roll back for now.
Yes. This is a change that should not be made.
There is one more topic. I see no problem if the system uses IP-address and other current information about an unregistered visitor. But if the system not only uses current information, but also remembers pages visited by this human being, it is a privacy gap.
No, the Compact Language Links feature doesn't remember this information.
If a user visits the Danish language Wikipedia from a Danish IP address it will be reasonable to assume that interesting language versions to the person would be:
*sv=Swedish (neighbour country, mutually intelligible with Danish)
*no=Norwegian Bokmål (neighbour country, mutually intelligible with Danish)
*nn=Norwegian Nynorsk (neighbour country, mutually intelligible with Danish)
*de=German (minority language in part of Denmark, neighbour country, language taught in Danish schools)
*en=English (language taught in Danish schools)
Languages spoken in overseas countries of The Danish Realm:
Languages taught in some schools:
Now, this example focused on Danish (and Denmark proper) can probably be generalised to most languages; languages, which have no contact with Indian, Indonesian, Chinese languages, but have contact with a lot of neighbour languages.
If the IP is identified as Denmark, then German, Faroese and Kalaallisut are supposed to be shown in the initial list (if, of course, a corresponding article in these languages is available).
Your IP probably wasn't identified as Denmark, which is quite possible if you used something like Tor, so the world's largest languages were shown.
If you don't see a language that interests you in the initial list, you can click "X more" and select the language that you need, and the next time it will be shown in the short list.
I know that Danish is similar to Norwegian and Swedish, but do you have data about the number of people in Denmark who are actually reading in these languages?
- IIRC Wikipedia has data on the percentage of visits per language version per country? It should be possible to use that data in reverse to find the percentage of visits to a specific language version from a specific country or subregion.
- You could also check the MediaWiki language fallback tree?
Many times more people read Swedish and Norwegian than distant languages like Chinese, especially if the article is better than the Danish one. Exact numbers are unknown, but almost nobody in Denmark is able to read Chinese, while almost everybody is able to read Swedish.
But I think Danish/Denmark is just an example, the problem is general for language-wikipedias. The solution is usable for en-wiki, not for the Wikipedias of other languages, and should not be implemented in these, in the current form.
I must admit I am extremely worried and sad about this attack on Wikipedia as a free dictionary. Please cancel it as soon as possible.
I now see that part of the problem is that the guess is made based on the number of native speakers in a country, not the number of readers, which is a very big mistake.
An important issue is the assumption that everybody is logged in and has set up their browser languages etc. Setting up browser languages manually is what nerds did in the Netscape days. This is 2016. And most users, by the way, do not have registered Wikipedia accounts.
And finally, the assumption should be made on the basis of the language of the Wikipedia being visited; the IP solution was developed for en-wiki.
This experiment should be rolled back on all language Wikipedias except en-wiki, until an acceptable algorithm is found.
The guess is not based on the number of native speakers. We use the data from CLDR, which clearly doesn't refer only to native speakers—for example, the entry for Denmark puts English at 86%, which is obviously not the number of native English speakers in Denmark, but probably the number who know it in one way or another. If you can cite data about the number of people in Denmark who can read Swedish or any other language, you should add it there by clicking "add new" in the table.
Also, the software really doesn't assume that everybody is logged in. Obviously, the vast majority of readers are not registered. The languages that they click in the "more" panel are automatically added to their preferred languages, and the research that we conducted showed that it works for casual readers.
As the FAQ says, the languages defined in the browser and the languages identified by geolocation are secondary to what users actually click. Once you click Swedish for example, you will see it in the compact list.
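The precedence described in this comment can be sketched as a simple merge of ranked sources. This is only an illustration under stated assumptions: the function name `candidate_languages` is hypothetical, the relative order of browser languages and geolocation is a guess (the thread only says both are secondary to clicks), and the global fallback list in the example is a placeholder rather than the real one.

```python
def candidate_languages(clicked, browser, geo, global_fallback):
    """Merge language sources in priority order, removing duplicates.

    clicked: languages the reader previously selected (highest priority).
    browser: languages configured in the browser.
    geo: languages associated with the reader's country (from CLDR data).
    global_fallback: large global languages, used when nothing else is known.
    """
    ordered = []
    # Assumption: browser languages rank above geolocation; the source
    # only states that both rank below the reader's own clicks.
    for source in (clicked, browser, geo, global_fallback):
        for code in source:
            if code not in ordered:
                ordered.append(code)
    return ordered


if __name__ == "__main__":
    # A reader in Denmark who once clicked Swedish; the fallback list is illustrative.
    print(candidate_languages(["sv"], ["da", "en"], ["de", "fo", "kl"], ["en", "es", "id"]))
```

With no clicks, no browser languages and no usable geolocation (for example over Tor), only the fallback list remains, which matches the behavior reported earlier in the thread.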
Technical question: How are »The languages that they click in the "more" panel« added? Cookie? IP address? Or?
Our research shows that Swedish (and any other language) is less accessible when it is part of a long list than it is through the panel that opens when you click the "more" button.
The Nynorsk Wikipedia defined other Scandinavian languages as preferred in , so they would appear at the top of the long list. The same can be done in the Danish Wikipedia, and Compact Language Links will pick it up (not yet today, but soon).
Which research? Link?
The CLDR list only covers how many people speak a language, not how many people understand it. For instance, for Iceland it says 100% for Icelandic but only 0.7% for Danish, and nothing for the other Scandinavian languages. Hindi-Urdu and Malaysian-Indonesian are each the same language with different vocabulary under a different name, but there is no data for Hindi in Pakistan or for Indonesian in Malaysia, and the figures for Urdu in India and for Malay in Indonesia are only ~10% of the figures for Hindi and Indonesian in the respective countries. Almost every country in the world has a number of people with at least some understanding of English, but look at how many regions in the list actually have English listed. And the list doesn't even have Libyan Arabic in Libya or Taiwanese (Min-nan) in Taiwan. There are also Iran and Iraq, which list only Central and/or Southern Kurdish but not Kurdish in general.
And did your test ask users to reach a specific language edition with that language's name given, or did it cover the case where users don't know what they are looking for, which is the most common case?
In Iceland they learn Danish in school, so they are able to read Danish as well as their native Icelandic. (The languages are not mutually intelligible). In many countries especially in Europe it's most common to speak more than one language.
ah I see, the example might be not that good.
The CLDR list is not perfect and it can be improved. It has direct and visible ways to add languages and report bugs.
Without Compact Language Links there is nothing that helps a user who is reading the Indonesian Wikipedia to find a link to the Malay language, because it is lost in the long list. With Compact Language Links, it is easier to find, because there is a search box to find the language, and after the user clicks it once, it will be remembered.
As for the other question—it's a good question; I'll find it and I'll get back to you.
Have you gotten any answers?
That not all language links are now shown on Wikipedia is discrimination against the small languages and nothing else. There was no real need for such a change! All your arguments are very contrived and artificial and only serve to hide a political decision whose sole goal is discrimination against the small language communities. There was absolutely no reason to change the policy on language links: in an alphabetical list of languages there was never a problem finding the right link, even in a list of 200 or more languages! So I can only protest against this arbitrary decision and ask to go back to the old system!
- How about adding a link to CLDR in CLL panel so that people can tell CLDR what languages are missing in their region?
- CLDR can be improved, but the process seems long. I submitted some additional languages used in Hong Kong last year, and they have already been accepted, but that information is still not available in the CLDR v.30 beta, which is supposed to be the last CLDR release this year according to the release cycle. That means fixing something via CLDR's issue tracker takes more than a year.
- Even if the CLDR data gets improved, that still does not resolve the problem that its scope is "population that is able to read and write each language, and is comfortable enough to use it with computers", instead of "able to understand and retrieve info from the page", which would be a lower standard. For instance, most Chinese users can't write in Japanese, but they can read basic information from the Japanese Wikipedia because of the shared use of Chinese characters. That will certainly not be reflected in CLDR.
Thanks for submitting fixes for CLDR!
People who can read Japanese (or any other language) will find the language in the panel, and after they click it, they will always see it.
- But I'm talking about people in general who don't know what they're looking for, rather than people looking for a specific language.
- "After they click it, they will always see it" is also problematic, because sometimes I only want to use a language once, and when those one-time clicks accumulate, they push the frequently used languages out of the short list.
Thanks for discriminating languages without an exact location, such as Esperanto and Yiddish.
I'm going to restate some things that were posted above, because they don't seem to have penetrated. In New York City, Wolof, Tajik, Newari, Marathi, Hausa, Guarani, Azerbaijani, and Aymara are all spoken and read, but they will not be likely to appear on any location-based list of languages, because not enough people there speak and read them, and New York is a long way from the places where they originated. Users of these languages won't always think to click a "more languages" link, because they may not realize what the link is supposed to do, and they won't know their language is represented if they don't click to see. The natural assumption, given that some of these languages don't have much internet content to begin with, will be that they aren't there. Allowing users to customize which languages they see only helps if they know which languages they have to choose from.